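【Posted】: 2017-06-08 20:23:53
【Question】:
I am working on a DDPG implementation, which requires computing the gradients of one network (below: critic) with respect to the output of another network (below: actor). My code already uses queues instead of feed dicts for the most part, but I have not yet been able to do that for this specific part: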
import tensorflow as tf
tf.reset_default_graph()
states = tf.placeholder(tf.float32, (None,))
actions = tf.placeholder(tf.float32, (None,))
actor = states * 1
critic = states * 1 + actions
grads_indirect = tf.gradients(critic, actions)
grads_direct = tf.gradients(critic, actor)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    act = sess.run(actor, {states: [1.]})
    print(act) # -> [1.]
    cri = sess.run(critic, {states: [1.], actions: [2.]})
    print(cri) # -> [3.]
    grad1 = sess.run(grads_indirect, {states: [1.], actions: act})
    print(grad1) # -> [[1.]]
    grad2 = sess.run(grads_direct, {states: [1.], actions: [2.]})
    print(grad2) # -> TypeError: Fetch argument has invalid type 'NoneType'
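Here grad1 computes the gradient with respect to the fed-in actions, which were previously computed by actor. grad2 should do the same thing, but directly inside the graph: instead of feeding the actions back in, it should evaluate actor directly. The problem is that grads_direct is None: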
print(grads_direct) # [None]
How can I achieve this? Is there a dedicated "evaluate this tensor" operation I could use? Thanks!
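For context, tf.gradients returns None for any input that the target does not depend on. In the graph above, critic is built from states and the actions placeholder only, so there is no path from actor to critic. The following is a minimal sketch of one possible way to create that path, not a definitive solution: it assumes that tf.placeholder_with_default is acceptable here, so that actions defaults to the actor output but can still be overridden through a feed dict. The import via tensorflow.compat.v1 and the names grads_wrt_actions, grads_wrt_actor, grad_fed and grad_direct are assumptions made for this illustration and are not part of the original code.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # assumption: only needed when running under TF 2.x
tf.reset_default_graph()

states = tf.placeholder(tf.float32, (None,))
actor = states * 1

# Assumption for this sketch: `actions` takes the actor output by default,
# but an explicit value can still be fed for it.
actions = tf.placeholder_with_default(actor, (None,))
critic = states * 1 + actions

grads_wrt_actions = tf.gradients(critic, actions)
# critic now depends on actor through `actions`, so this is no longer [None].
grads_wrt_actor = tf.gradients(critic, actor)

with tf.Session() as sess:
    # Gradient with explicitly fed actions (the "indirect" case).
    grad_fed = sess.run(grads_wrt_actions, {states: [1.], actions: [2.]})
    print(grad_fed)
    # Gradient through the actor output, without feeding any actions.
    grad_direct = sess.run(grads_wrt_actor, {states: [1.]})
    print(grad_direct)
```

With this wiring, the second sess.run call only needs states, because the action values come from actor inside the graph. Whether this fits a queue-based input pipeline depends on the rest of the code, which is not shown here.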
【Comments】:
Tags: graph tensorflow reinforcement-learning gradient