Where does the RecoGym dataset come from? Answer

【Question title】: Where does the RecoGym dataset come from?
【Posted】: 2019-03-04 11:07:47
【Question】:

I'm trying to build a reinforcement learning algorithm for an online shopping system whose data I have.

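To do this I decided to use RecoGym, but I can't find a way to feed my own data into it. Is its data purely made up? Is there a way to have the reinforcement learning algorithm learn only from the historical data I have?

I'm attaching the RecoGym usage code so you can take a look.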

import gym
import reco_gym

# env_1_args is a dictionary of default parameters (e.g. number of products)
from reco_gym import env_1_args

# you can overwrite environment arguments here:
env_1_args['random_seed'] = 42

# initialize the gym for the first time by calling .make() and .init_gym()
env = gym.make('reco-gym-v1')
env.init_gym(env_1_args)

# .reset() env before each episode (one episode per user)
env.reset()
done = False

# counting how many steps
i = 0 

while not done:
    action, observation, reward, done, info = env.step_offline()
    print(f"Step: {i} - Action: {action} - Observation: {observation} - Reward: {reward}")
    i += 1

# instantiate instance of PopularityAgent class
# (PopularityAgent is not defined in this snippet; it must be defined or imported first)
num_products = 10
agent = PopularityAgent(num_products)

# resets random seed back to 42, or whatever we set it to in env_1_args
env.reset_random_seed()

# train on 1000 users offline
num_offline_users = 1000

for _ in range(num_offline_users):

    # reset env, set done to False, and clear the previous user's last observation
    env.reset()
    done = False
    observation = None

    while not done:
        old_observation = observation
        action, observation, reward, done, info = env.step_offline()
        agent.train(old_observation, action, reward, done)

# evaluate on 100 users online (the agent is not trained in this loop) and track click-through rate
num_online_users = 100
num_clicks, num_events = 0, 0

for _ in range(num_online_users):

    # reset env, then step once with no action to get the first observation
    env.reset()
    observation, _, done, _ = env.step(None)
    # no reward has been observed yet before the agent's first action
    reward = None
    while not done:
        action = agent.act(observation, reward, done)
        observation, reward, done, info = env.step(action)

        # used for calculating click through rate
        if reward == 1:
            num_clicks += 1
        num_events += 1

ctr = num_clicks / num_events


print(f"Click Through Rate: {ctr:.4f}")

The paper for the environment is here: https://arxiv.org/pdf/1808.00720.pdf
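The snippet above calls `PopularityAgent(num_products)`, `agent.train(...)` and `agent.act(...)` but never defines the class. Below is a minimal, hypothetical stand-in with that interface, written in plain Python so it does not depend on any reco_gym base class. It assumes, without checking against reco_gym, that an observation is either `None` or a sequence of integer product ids the user viewed organically; it recommends a product with probability proportional to how often that product was viewed.

```python
import random
from typing import List, Optional, Sequence


class PopularityAgent:
    """Hypothetical stand-in exposing the train/act methods the loops above call.

    Assumption (not verified against reco_gym): an observation is None or a
    sequence of integer product ids viewed organically.
    """

    def __init__(self, num_products: int, seed: Optional[int] = None) -> None:
        self.num_products = num_products
        # Organic view count per product id.
        self.organic_views: List[int] = [0] * num_products
        self.rng = random.Random(seed)

    def train(self, observation: Optional[Sequence[int]], action, reward, done) -> None:
        # Only organic views are counted; action, reward and done are ignored.
        if observation:
            for product in observation:
                self.organic_views[product] += 1

    def act(self, observation, reward, done) -> int:
        if sum(self.organic_views) == 0:
            # No views seen yet: fall back to a uniformly random product.
            return self.rng.randrange(self.num_products)
        # Sample a product with probability proportional to its view count.
        return self.rng.choices(range(self.num_products), weights=self.organic_views, k=1)[0]
```

For example, after `agent.train([0, 0, 1], None, None, False)` on a 3-product agent, `agent.organic_views` is `[2, 1, 0]`, and `agent.act(...)` only ever returns product 0 or 1. Whether this plugs into the loops above depends on the assumed observation format matching what your reco_gym version actually returns.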

【Question comments】:

Tags: python recommendation-engine reinforcement-learning openai-gym

【Answer 1】:

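The data is purely simulated. We believe the simulation is reasonable, but that is purely a judgment call. With real-world data, you will find that you have only logged the actions taken in the past and how they performed. This makes it hard to evaluate an algorithm that would have taken different actions. You can use inverse propensity scoring (IPS), but for many important applications its estimates are unacceptably noisy.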
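To make the IPS point concrete, here is a small self-contained sketch (not RecoGym code). It assumes the simplest setting: no user context, a logging policy mu that shows one of K products uniformly at random, and logs that record each shown product a_i, its reward r_i (click or not), and the propensity mu(a_i) with which it was shown. The IPS estimate of a target policy pi is (1/n) * sum over i of r_i * pi(a_i) / mu(a_i). The click rates and the target policy are made up for illustration.

```python
import random
from typing import Callable, List, Tuple

# One logged event: (product shown, reward observed, propensity of showing that product).
LoggedEvent = Tuple[int, float, float]


def ips_estimate(logs: List[LoggedEvent], target_prob: Callable[[int], float]) -> float:
    """Inverse propensity scoring estimate of a target policy's expected reward.

    Each logged reward is reweighted by target_prob(a) / mu(a), where mu(a) is
    the propensity with which the logging policy showed product a.
    """
    if not logs:
        raise ValueError("need at least one logged event")
    total = 0.0
    for action, reward, propensity in logs:
        total += reward * target_prob(action) / propensity
    return total / len(logs)


def simulate_logs(click_probs: List[float], n: int, rng: random.Random) -> List[LoggedEvent]:
    """Toy logging policy: show one of K products uniformly at random, n times."""
    k = len(click_probs)
    logs: List[LoggedEvent] = []
    for _ in range(n):
        action = rng.randrange(k)
        reward = 1.0 if rng.random() < click_probs[action] else 0.0
        logs.append((action, reward, 1.0 / k))
    return logs


def always_first(action: int) -> float:
    """Target policy: always show product 0 (probability 1 for product 0, else 0)."""
    return 1.0 if action == 0 else 0.0


if __name__ == "__main__":
    click_probs = [0.05, 0.01, 0.02, 0.01]  # hypothetical true click rates
    # The true CTR of always_first is click_probs[0] = 0.05; IPS only sees the logs.
    for seed in range(5):
        logs = simulate_logs(click_probs, n=1000, rng=random.Random(seed))
        print(f"seed {seed}: IPS estimate = {ips_estimate(logs, always_first):.4f}")
```

Running it with several seeds shows the estimates for the same target policy scattering around its true CTR of 0.05. That scatter is the noise referred to above, and it tends to grow as clicks become rarer or as the target policy moves further from the logging policy (larger pi/mu weights).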

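What RecoGym does is let you evaluate algorithms with simulated A/B tests. It includes some agents you can try (and more are being added), but its purpose is not to provide an out-of-the-box solution to your problem; rather, it is a sandbox to help you test and evaluate algorithms.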
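The difference a simulator makes can be sketched with a toy stand-in (this is not RecoGym's API). Because the simulator knows every product's true click probability, any policy can be run on fresh simulated users and its CTR measured directly, with no logged propensities or reweighting. Here each simulated user sees a single recommendation, a deliberate simplification of RecoGym's multi-step user sessions; `run_ab_test`, the click rates and both policies are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

# A policy picks a product id, using the shared random generator if it needs randomness.
Policy = Callable[[random.Random], int]


def run_ab_test(
    click_probs: List[float],
    policy_a: Policy,
    policy_b: Policy,
    num_users: int,
    rng: random.Random,
) -> Tuple[float, float]:
    """Split simulated users 50/50 between two policies and return (CTR_A, CTR_B)."""
    shown: Dict[str, int] = {"A": 0, "B": 0}
    clicks: Dict[str, int] = {"A": 0, "B": 0}
    for _ in range(num_users):
        arm = "A" if rng.random() < 0.5 else "B"
        policy = policy_a if arm == "A" else policy_b
        product = policy(rng)
        shown[arm] += 1
        # The simulator knows the true click probability of every product,
        # so the outcome of any action can be sampled, not just logged ones.
        if rng.random() < click_probs[product]:
            clicks[arm] += 1

    def ctr(arm: str) -> float:
        return clicks[arm] / shown[arm] if shown[arm] else 0.0

    return ctr("A"), ctr("B")


if __name__ == "__main__":
    click_probs = [0.05, 0.01, 0.02, 0.01]  # hypothetical true click rates

    def always_first(rng: random.Random) -> int:
        return 0  # always recommend product 0

    def uniform(rng: random.Random) -> int:
        return rng.randrange(len(click_probs))

    ctr_a, ctr_b = run_ab_test(click_probs, always_first, uniform, 10_000, random.Random(42))
    print(f"CTR A: {ctr_a:.4f}  CTR B: {ctr_b:.4f}")
```

The catch, as the answer says, is that the measured CTRs are only as meaningful as the simulator's click model; with a real shop you would not know `click_probs`.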

【Comments】:

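Thank you very much. I will be submitting my master's thesis, which is based on this RL environment, in September. Could I get an email address from you, so I can confirm that you really are David Rohde and include this in my presentation? Thanks in advance. (I don't know how to message people privately on this platform so that I could give you my email, so any suggestions are welcome.)
Feel free to email me. It's in the paper.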