RLlib Reinforcement Learning Framework Tutorial 004: Using the Training APIs (Part 3): Advanced Python APIs

  Custom Training Workflows

  Global Coordination

  Callbacks and Custom Metrics

  Visualizing Custom Metrics

  Customizing Exploration Behavior

  Customized Evaluation During Training

  Rewriting Trajectories

  Curriculum Learning

  References

Custom Training Workflows

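In the basic training example, Tune calls train() once per training iteration and reports the new training results. Sometimes we want to control the whole training process ourselves while still running inside Tune. For this, Tune supports custom training functions, which can be used to implement a custom training workflow.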
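The idea behind a custom training function can be sketched without Ray installed. The example below is only an illustration under stated assumptions: StubTrainer is a hypothetical stand-in for an RLlib trainer (it only mimics train() and save()), and the config keys lr_start, lr_end and iters_per_phase are made up for this sketch. What it shows is the control flow: the function owns the training loop, reports each result through a reporter callback, and switches from a high to a low learning rate halfway through.

```python
from typing import Any, Callable, Dict, Optional


class StubTrainer:
    """Hypothetical stand-in for an RLlib trainer; not a real RLlib class."""

    def __init__(self, config: Dict[str, Any], state: Optional[Dict[str, Any]] = None):
        self.lr = config["lr"]
        # Resume from a previous "checkpoint" if one is given.
        self.timesteps = state["timesteps"] if state else 0
        self.reward = state["reward"] if state else 0.0

    def train(self) -> Dict[str, Any]:
        # One fake iteration: the reward grows in proportion to the learning rate.
        self.timesteps += 100
        self.reward += self.lr * 100
        return {"timesteps_total": self.timesteps, "episode_reward_mean": self.reward}

    def save(self) -> Dict[str, Any]:
        return {"timesteps": self.timesteps, "reward": self.reward}


def my_train_fn(config: Dict[str, Any], reporter: Callable[..., None]) -> None:
    """Two-phase workflow: train with a high learning rate, then a low one."""
    phase1 = StubTrainer({"lr": config["lr_start"]})
    for _ in range(config["iters_per_phase"]):
        reporter(phase=1, **phase1.train())
    checkpoint = phase1.save()

    # Phase 2 continues from the phase 1 checkpoint with a smaller learning rate.
    phase2 = StubTrainer({"lr": config["lr_end"]}, state=checkpoint)
    for _ in range(config["iters_per_phase"]):
        reporter(phase=2, **phase2.train())


# Collect the reported results in a list instead of sending them to Tune.
reports = []
my_train_fn(
    {"lr_start": 0.01, "lr_end": 0.001, "iters_per_phase": 3},
    lambda **result: reports.append(result),
)
```

In a real setup, a function of this shape would be handed to Tune in place of a trainer name, with real trainers taking the place of the stub. The exact function signature Tune expects depends on the Ray version, so check the Tune documentation for the release you are using.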

For even finer-grained control over training, you can use RLlib's lower-level building blocks directly to build a fully customized training workflow.


Global Coordination

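Sometimes we need to coordinate code running in different processes, for example to maintain a global variable or a hyperparameter shared by the policies. Ray provides a general mechanism for this: named actors. Such an actor is assigned a global name, and a handle to it can be retrieved by that name from any process. For example, suppose we want to maintain a shared global counter that is incremented by the environments and read from time to time by the driver program: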

import ray

ray.init()


@ray.remote
class Counter:
    """Actor holding a counter shared across processes."""

    def __init__(self):
        self.count = 0

    def inc(self, n):
        self.count += n

    def get(self):
        return self.count


# on the driver: create the actor and register it under a global name
counter = Counter.options(name="global_counter").remote()
print(ray.get(counter.get.remote()))  # get the latest count

# in your envs: look the actor up by its global name
counter = ray.get_actor("global_counter")
counter.inc.remote(1)  # async call to increment the global count
print(ray.get(counter.get.remote()))  # get the latest count
