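OpenAI Gym observation space representation: answers

【Question title】: OpenAI Gym observation space representation
【Posted】: 2021-06-24 10:08:27
【Question】:

I have a question about representing observations in a gym environment. I have several observation sources of different dimensions: for example, a 24x24-pixel camera, an X-ray machine producing 1x25 values, and 10 temperature sensors (so 1x1, ten times). At the moment I represent this with a spaces.Dict that wraps several spaces.Box for the continuous values: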

import gym
import numpy as np

class MyEnv(gym.Env):
    def __init__(self, ...):
        # One unbounded Box per sensor, grouped by name in a Dict space
        spaces = {
            'xray': gym.spaces.Box(low=-np.inf, high=np.inf, shape=(nbcaptors,)),
            'cam1': gym.spaces.Box(low=-np.inf, high=np.inf, shape=(cam1width, cam1height)),
            'cam2': gym.spaces.Box(low=-np.inf, high=np.inf, shape=(cam2width, cam2height)),
            'thermal': gym.spaces.Box(low=-np.inf, high=np.inf, shape=(thermalwidth, thermalheight)),
        }
        self.observation_space = gym.spaces.Dict(spaces)

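A custom agent can then process the data through observation['cam1'], observation['xray'], and so on.

The problem arises when I want to use a third-party algorithm, for example one from stable-baselines3, which does not support spaces.Dict. So my question is: how should I work around this? Should I represent my observation_space as a single 1xn Box, for example: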

# The flattened length is the sum of the per-sensor element counts
self.observation_space = gym.spaces.Box(
    low=-np.inf, high=np.inf,
    shape=(nbcaptors + cam1width*cam1height + cam2width*cam2height + thermalwidth*thermalheight,))

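Does this make sense? Even if it does, I see 3 problems with this approach:

1. The low and high of my 1D space may not be good enough, because some of the individual spaces may have well-defined bounds of their own.
2. The implementation becomes more error-prone.
3. The data really are 2D matrices, so I would have to map the 4 matrices to positions in the 1D observation_space, and the custom agent would then have to rebuild the 4 matrices from the 1D observation. My initial quick, non-RL implementation already takes a long time to run, so I am worried that this overhead will slow things down further.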

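At this point I only see two ways forward:

1. Map all 4 of my matrices into a single 1D array.
2. Wrap my spaces.Dict gym.Env in another gym.Env that handles the conversion from spaces.Dict to spaces.Box, and use one env or the other depending on whether I want a custom agent or a third-party one.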
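
As a rough illustration of the first option and of the overhead concern, here is a minimal NumPy-only sketch. The sensor shapes, the `SHAPES` table, and the `pack`/`unpack` helpers are hypothetical names chosen for this example, not part of gym. Packing copies the data once; unpacking only slices and reshapes, which returns views into the flat buffer rather than new copies.

```python
import numpy as np

# Hypothetical sensor shapes, loosely following the sizes mentioned in the question.
SHAPES = {
    "xray": (25,),
    "cam1": (24, 24),
    "cam2": (24, 24),
    "thermal": (10, 1),
}
SIZES = {name: int(np.prod(shape)) for name, shape in SHAPES.items()}
FLAT_DIM = sum(SIZES.values())  # sum of the element counts, not their product


def pack(obs):
    """Concatenate the per-sensor arrays into one 1-D float32 vector (this copies)."""
    return np.concatenate(
        [np.asarray(obs[name], dtype=np.float32).ravel() for name in SHAPES]
    )


def unpack(flat):
    """Rebuild the per-sensor arrays; slicing and reshaping give views, not copies."""
    out, start = {}, 0
    for name, shape in SHAPES.items():
        out[name] = flat[start:start + SIZES[name]].reshape(shape)
        start += SIZES[name]
    return out


rng = np.random.default_rng(0)
obs = {name: rng.random(shape, dtype=np.float32) for name, shape in SHAPES.items()}
flat = pack(obs)          # what a third-party algorithm would consume
restored = unpack(flat)   # what a custom agent would consume
```

Because `unpack` does no copying, its cost is small next to rendering or simulating the sensors. Whether the single concatenation in `pack` matters depends on the environment and would need to be measured.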

I would appreciate any input on how best to approach this in terms of performance and simplicity.

Thanks!

【Comments on the question】:

Tags: python reinforcement-learning

【Solution 1】:

Actually, the wrapping option seems to be exactly what the good people at OpenAI did:

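from gym.wrappers import FlattenObservation
from gym.spaces.utils import unflatten

# env is the original environment with a spaces.Dict observation space
wrapped_env = FlattenObservation(env)
obs1 = wrapped_env.reset()  # flat 1-D observation (gym >= 0.26 returns (obs, info) instead)
# Rebuild the dict observation from the flat vector
unflattened_obs = unflatten(wrapped_env.unwrapped.observation_space, obs1)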
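
To see what the flattening does to the bounds, the space utilities can be tried without an environment. The sketch below uses two hypothetical sensors with made-up bounds, and falls back to gymnasium (the maintained fork of gym) if gym is not installed. That fallback is an assumption of this example, not something the answer relies on.

```python
import numpy as np

try:
    from gym import spaces
    from gym.spaces.utils import flatdim, flatten, flatten_space, unflatten
except ImportError:
    # Assumption: gymnasium, the maintained fork of gym, is installed instead.
    from gymnasium import spaces
    from gymnasium.spaces.utils import flatdim, flatten, flatten_space, unflatten

# Hypothetical sensors with different, finite bounds.
dict_space = spaces.Dict({
    "cam": spaces.Box(low=0.0, high=255.0, shape=(2, 2), dtype=np.float32),
    "temp": spaces.Box(low=-50.0, high=150.0, shape=(3,), dtype=np.float32),
})

# The flattened space is a 1-D Box whose low/high are the per-key bounds concatenated.
flat_space = flatten_space(dict_space)

sample = dict_space.sample()                # dict-style observation
flat_obs = flatten(dict_space, sample)      # 1-D vector for a third-party algorithm
restored = unflatten(dict_space, flat_obs)  # dict-style observation again

print(flat_space.shape, flatdim(dict_space))
print(flat_space.low)
print(flat_space.high)
```

In this sketch each element of the flat Box keeps the bounds of the sensor it came from, which speaks to the concern about losing per-sensor low and high values.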

【Comments】:

【Solution 2】:

You can try defining the observation space this way:

low = np.array([nbcaptors_low, cam1width_low, cam1height_low, cam2width_low, cam2height_low, thermalwidth_low, thermalheight_low])
high = np.array([nbcaptors_high, cam1width_high, cam1height_high, cam2width_high, cam2height_high, thermalwidth_high, thermalheight_high])

# Pass the arrays themselves so that each dimension gets its own bounds
self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)
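
For a flattened observation, the bound arrays need one entry per observation value rather than one per sensor. A minimal NumPy sketch of building them follows; the `SENSOR_BOUNDS` table and its numbers are hypothetical placeholders.

```python
import numpy as np

# Hypothetical per-sensor element counts and bounds; replace with real values.
SENSOR_BOUNDS = [
    # (name, number of values, low, high)
    ("xray", 25, 0.0, 1.0),
    ("cam1", 24 * 24, 0.0, 255.0),
    ("thermal", 10, -50.0, 150.0),
]

# Repeat each sensor's bound once per value, in the same order the
# observation vector is packed.
low = np.concatenate(
    [np.full(n, lo, dtype=np.float32) for _, n, lo, _ in SENSOR_BOUNDS]
)
high = np.concatenate(
    [np.full(n, hi, dtype=np.float32) for _, n, _, hi in SENSOR_BOUNDS]
)
```

These arrays can then be passed as `gym.spaces.Box(low=low, high=high, dtype=np.float32)`. When `low` and `high` are arrays, Box takes its shape from them, so no separate `shape` argument is needed.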

【Comments】: