【Posted】: 2018-07-12 09:32:27
【Question】:
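I am trying to implement the deep Q-learning procedure that DeepMind used to train an AI to play Atari games. One feature they use, which several tutorials also mention, is keeping two versions of the neural network: one (called Q) that is updated as you loop through minibatches of training data, and another (Q') that is called during that loop to help build the training data. Then, periodically (say, every 10k data points), the weights in Q' are set to the current values of Q.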
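The schedule described above can be sketched without any framework. This is a toy illustration only: the single scalar "weight", the constant reward, the learning rate, and the names `train`, `q_weights`, and `target_weights` are all hypothetical stand-ins, not DeepMind's code.

```python
import copy

SYNC_EVERY = 10_000  # assumed interval, matching the "every 10k data points" example


def train(num_steps, sync_every=SYNC_EVERY, gamma=0.9, lr=0.1):
    """Toy loop: Q is updated every step, Q' is refreshed only every `sync_every` steps."""
    q_weights = {"w": 0.0}                     # stand-in for the online network Q
    target_weights = copy.deepcopy(q_weights)  # stand-in for the frozen copy Q'
    sync_steps = []
    for step in range(1, num_steps + 1):
        # The training target is built from Q' (frozen), not from Q.
        reward = 1.0  # hypothetical constant reward
        target = reward + gamma * target_weights["w"]
        # Only Q moves toward the target.
        q_weights["w"] += lr * (target - q_weights["w"])
        # Periodically overwrite Q' with the current values of Q.
        if step % sync_every == 0:
            target_weights = copy.deepcopy(q_weights)
            sync_steps.append(step)
    return q_weights, target_weights, sync_steps


q, q_target, synced_at = train(25, sync_every=10)
```

Between syncs, `q` drifts away from `q_target`; at each sync step the two become identical again.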
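My question is: what is the best way to do this in TensorFlow, that is, to keep two networks with identical architecture at the same time and periodically copy the weights of one into the other? My current network is shown below; for now it just uses the default graph and an interactive session.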
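import tensorflow as tf  # TensorFlow 1.x API
# height, width, m, env, weight_variable, bias_variable and conv2d are defined elsewhere (not shown)
sess = tf.InteractiveSession()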
x = tf.placeholder(tf.float32, shape=[None, height, width, m])
y_ = tf.placeholder(tf.float32, shape=[None, env.action_space.n])
W_conv1 = weight_variable([8, 8, 4, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x, W_conv1, 4, 4) + b_conv1)
W_conv2 = weight_variable([4, 4, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_conv1, W_conv2, 2, 2) + b_conv2)
W_conv3 = weight_variable([3, 3, 64, 64])
b_conv3 = bias_variable([64])
h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3, 1, 1) + b_conv3)
# Flatten conv output for the dense layers
flat_input_size = 14*10*64
h_conv3_reshape = tf.reshape(h_conv3, [-1, flat_input_size])
# Dense layers
W_fc1 = weight_variable([flat_input_size, 512])
b_fc1 = bias_variable([512])
h_fc1 = tf.nn.relu(tf.matmul(h_conv3_reshape, W_fc1) + b_fc1)
W_fc2 = weight_variable([512, env.action_space.n])
b_fc2 = bias_variable([env.action_space.n])
y_conv = tf.matmul(h_fc1, W_fc2) + b_fc2
# Mean squared error between target and predicted Q-values
squared_error = tf.squared_difference(y_, y_conv)
loss = tf.reduce_mean(squared_error)
optimizer = tf.train.AdamOptimizer(0.0001).minimize(loss)
tf.global_variables_initializer().run()
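One possible way to get the two-network setup asked about is to build the same architecture twice under different variable scopes and group `assign` ops into a single sync op. The following is a minimal sketch under stated assumptions, not a definitive implementation: `build_q_network`, the scope names `"q"` and `"q_target"`, and the tiny dense layers are hypothetical stand-ins for the conv stack above, and `tensorflow.compat.v1` is used only so the graph-mode code also runs on TensorFlow 2.x.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # on TensorFlow 1.x: `import tensorflow as tf`

tf.disable_v2_behavior()   # keep graph/session semantics, as in the question
tf.reset_default_graph()   # start from an empty default graph

N_FEATURES, N_ACTIONS = 4, 2  # hypothetical sizes, for illustration only


def build_q_network(x, scope):
    """Hypothetical two-layer network; the conv stack from the question would go here."""
    with tf.variable_scope(scope):
        init = tf.truncated_normal_initializer(stddev=0.1)
        w1 = tf.get_variable("w1", [N_FEATURES, 16], initializer=init)
        b1 = tf.get_variable("b1", [16], initializer=tf.constant_initializer(0.1))
        h1 = tf.nn.relu(tf.matmul(x, w1) + b1)
        w2 = tf.get_variable("w2", [16, N_ACTIONS], initializer=init)
        b2 = tf.get_variable("b2", [N_ACTIONS], initializer=tf.constant_initializer(0.1))
        return tf.matmul(h1, w2) + b2


x = tf.placeholder(tf.float32, shape=[None, N_FEATURES])
q_values = build_q_network(x, "q")                 # Q: trained every minibatch
target_q_values = build_q_network(x, "q_target")   # Q': used to build targets

# The trailing "/" matters: the scope filter is a prefix match, so "q" alone
# would also match variables under "q_target".
q_vars = sorted(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="q/"),
                key=lambda v: v.name)
target_vars = sorted(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="q_target/"),
                     key=lambda v: v.name)

# One op that copies every Q variable into the matching Q' variable.
sync_op = tf.group(*[t.assign(q) for q, t in zip(q_vars, target_vars)])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    obs = np.random.rand(3, N_FEATURES).astype(np.float32)
    sess.run(sync_op)  # in training, run this every N steps (e.g. every 10k)
    q_out, target_out = sess.run([q_values, target_q_values], feed_dict={x: obs})
```

In a training loop, `sess.run(sync_op)` would be called on the chosen schedule, and passing `var_list=q_vars` to the optimizer's `minimize` call is one way to make sure only Q is trained.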
【Discussion】:
Tags: python tensorflow neural-network reinforcement-learning q-learning