【Question title】: Tensorflow: tf.gradients between different paths of the graph
【Posted】: 2017-06-08 20:23:53
【Question】:

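I'm working on a DDPG implementation, which requires computing the gradient of one network (below: critic) with respect to the output of another network (below: actor). Most of my code already uses queues instead of feed dicts, but I haven't managed to do that for this particular part yet: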

import tensorflow as tf
tf.reset_default_graph()

states = tf.placeholder(tf.float32, (None,))
actions = tf.placeholder(tf.float32, (None,))

actor = states * 1
critic = states * 1 + actions

grads_indirect = tf.gradients(critic, actions)
grads_direct = tf.gradients(critic, actor)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    act = sess.run(actor, {states: [1.]})
    print(act)  # -> [1.]
    cri = sess.run(critic, {states: [1.], actions: [2.]})
    print(cri)  # -> [3.]
    grad1 = sess.run(grads_indirect, {states: [1.], actions: act})
    print(grad1)  # -> [[1.]]
    grad2 = sess.run(grads_direct, {states: [1.], actions: [2.]})
    print(grad2)  # -> TypeError: Fetch argument has invalid type 'NoneType'

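Here, grad1 computes the gradient w.r.t. the fed-in actions, which were previously computed by actor. grad2 should do the same thing, but directly inside the graph: instead of feeding the actions back in, it should evaluate actor directly. The problem is that grads_direct is [None]: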

print(grads_direct)  # [None]

How can I achieve this? Is there a dedicated "evaluate this tensor" operation I could use? Thanks!

【Comments】:

    Tags: graph tensorflow reinforcement-learning gradient


    【Answer 1】:

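    In your example, you don't use actor to compute critic: `states * 1` inside critic creates a new, separate op, so no path leads from actor to critic and the gradient is None.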
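To make the "no path, no gradient" point concrete, here is a toy reverse-mode sketch in plain Python. It is not TensorFlow's implementation; the `Node`, `add`, `scale` and `gradient` names are hypothetical helpers invented for this illustration. It only mirrors the idea that a gradient exists when the target node is an ancestor of the output.

```python
# Toy computation graph; NOT TensorFlow's implementation.
# Node, add, scale and gradient are hypothetical helpers for illustration.
class Node:
    def __init__(self, name, parents=()):
        self.name = name
        # Each entry is (parent_node, local_derivative_wrt_that_parent).
        self.parents = list(parents)


def add(name, a, b):
    return Node(name, [(a, 1.0), (b, 1.0)])


def scale(name, a, k):
    return Node(name, [(a, k)])


def gradient(y, x):
    """Return dy/dx, or None if x is not an ancestor of y."""
    if y is x:
        return 1.0
    total = None
    for parent, local in y.parents:
        g = gradient(parent, x)
        if g is not None:
            total = (total or 0.0) + local * g
    return total


states = Node("states")
actions = Node("actions")
actor = scale("actor", states, 1.0)

# As in the question: a second, independent `states * 1` op feeds critic.
states_copy = scale("states_times_one", states, 1.0)
critic_original = add("critic", states_copy, actions)

# As in the answer: critic is built on top of actor.
critic_fixed = add("critic", actor, actions)

grad_original = gradient(critic_original, actor)
grad_fixed = gradient(critic_fixed, actor)
print(grad_original)  # None: actor is not on any path to critic_original
print(grad_fixed)     # 1.0
```

The two `states * 1` expressions compute the same value, but they are different nodes, which is why only the second graph has a gradient with respect to actor.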

    You should do this instead:

    actor = states * 1
    critic = actor + actions  # change here
    
    grads_indirect = tf.gradients(critic, actions)
    grads_direct = tf.gradients(critic, actor)
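For completeness, a full script with this change might look like the sketch below. It assumes a TensorFlow build that ships the `tf.compat.v1` module (an assumption about the environment, since the question itself uses the plain 1.x API); the `tf1` alias is just a local name chosen here.

```python
import tensorflow as tf

# Assumption: tf.compat.v1 is available, so the graph-mode API from the
# question can be used unchanged apart from the `tf1` prefix.
tf1 = tf.compat.v1
tf1.disable_eager_execution()
tf1.reset_default_graph()

states = tf1.placeholder(tf.float32, (None,))
actions = tf1.placeholder(tf.float32, (None,))

actor = states * 1
critic = actor + actions  # critic now depends on the actor tensor itself

grads_indirect = tf1.gradients(critic, actions)
grads_direct = tf1.gradients(critic, actor)

with tf1.Session() as sess:
    feed = {states: [1.], actions: [2.]}
    grad1 = sess.run(grads_indirect, feed)
    grad2 = sess.run(grads_direct, feed)
    # Both are lists holding one array whose single entry is 1.
    print(grad1)
    print(grad2)
```

With this graph, `grads_direct` is no longer `[None]`, so `sess.run` can fetch it without the TypeError from the question.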
    

    【Comments】:
