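Loop inside tf.function, impossible to compute gradients

【Question title】: Loop inside tf.function, impossible to compute gradients
【Posted】: 2021-03-18 18:00:37
【Question description】:

I am trying to compute gradients with respect to some variables defined inside a loop in a tf.function, but I always get None as the result. Here is a basic example that reproduces the problem: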

import tensorflow as tf

@tf.function
def problem():
  test = tf.constant(1.0)

  with tf.GradientTape() as tape:
    for i in tf.range(5):
      test1 = test
      tape.watch(test1)
      test2 = test1

  grad = tape.gradient(test2, test1)

  return grad

print(problem())  # None

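Of course, in this particular case I do not even need the loop. In a more general case, however, I would like to store the test1 variable (and possibly others) in a TensorArray (or a similar structure) during the loop, and then compute gradients with respect to those variables. Is this possible?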

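【Question comments】:

Tags: tensorflow tensorflow2.0

【Solution 1】:

Using .range() may work, but I think that when you are writing tf ops, calling np may prevent the op from running on the GPU. This issue has been raised before, though; see here.

I can share two possible workarounds. One is to use tf.range outside the gradient tape.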

import tensorflow as tf

@tf.function
def problem(step):
    test = tf.constant(1.0)
    for i in tf.range(step):
        with tf.GradientTape() as tape:    
            tape.watch(test)
            test1 = test
            test2 = test1
        grad = tape.gradient(test2, test1)
    return grad

print(problem(5)) # 1

Or run it without @tf.function.

def problem(step):
    test = tf.constant(1.0)
    with tf.GradientTape() as tape:   
        for _ in tf.range(step):
            tape.watch(test)
            test1 = test
            test2 = test1
    grad = tape.gradient(test2, test1)
    return grad

print(problem(5))  # 1

Alternatively, you can look at tf.while_loop.
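
A minimal sketch of the tf.while_loop route, not taken from the original answer. It assumes the tensor you differentiate with respect to (x here, a hypothetical name) is created and watched before the loop rather than inside its body. The function power_grad and the x**steps computation are purely illustrative.

```python
import tensorflow as tf

@tf.function
def power_grad(steps):
    x = tf.constant(2.0)  # hypothetical input, defined before the loop
    with tf.GradientTape() as tape:
        tape.watch(x)  # x is a constant, so it has to be watched explicitly
        # Loop state is (counter, accumulator); each step multiplies by x.
        _, y = tf.while_loop(
            cond=lambda i, acc: i < steps,
            body=lambda i, acc: (i + 1, acc * x),
            loop_vars=(tf.constant(0), tf.constant(1.0)),
        )
    # y == x**steps, so dy/dx == steps * x**(steps - 1)
    return tape.gradient(y, x)

grad = power_grad(tf.constant(3))
print(grad)
```

Because steps is passed as a tensor, the loop stays a single graph op instead of being unrolled, which is the property the asker wants for long loops.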

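Update

.range works fine. It does not cause any significant bottleneck.

【Comments】:

Thanks. Using tf.range outside the tape is not very useful, because the loop is only one part of the code and I ultimately need gradients of things defined after the loop. Running without tf.function works, but it evaluates eagerly, so it is much slower when there are many steps.
Then use .range().
tf.range() without tf.function() has no effect on run time. The two are only really useful when used together.
I see. I was not looking at the right code base. In custom training loops (whether tf or pytorch), for .. range(epoch) is what is commonly used. On the other hand, compiled graph mode should be much faster than eager mode.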

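【Solution 2】:

I cannot fully explain why this happens, but you can replace tf.range with range to get the desired result. Note that you can also keep the original code and just remove the tf.function decorator, and you will get the same result.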

import tensorflow as tf

@tf.function
def problem():
  test = tf.constant(1.0)
  with tf.GradientTape() as tape:     
    for i in range(5):
      test1 = test
      tape.watch(test1)
      test2 = test1

  grad = tape.gradient(test2, test1)

  return grad

print(problem())  # tf.Tensor(1.0, shape=(), dtype=float32)
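
A sketch that may help explain the difference, offered as an illustration rather than as part of the original answer. Inside tf.function, a Python range loop is unrolled while the function is traced, so tensors created in it are ordinary nodes of the graph. A tf.range loop is converted by AutoGraph into a single while op, and values created in its body are only visible afterwards as outputs of that op, which is a likely reason the gradient between them comes back as None. The helper names below (unrolled, looped, op_types) are hypothetical.

```python
import tensorflow as tf

@tf.function
def unrolled(x):
    for _ in range(3):      # Python loop: unrolled at trace time
        x = x * 2.0
    return x

@tf.function
def looped(x):
    for _ in tf.range(3):   # tensor loop: AutoGraph builds a while op
        x = x * 2.0
    return x

def op_types(fn):
    """Return the op types in the traced graph of fn for a scalar float input."""
    spec = tf.TensorSpec(shape=[], dtype=tf.float32)
    graph = fn.get_concrete_function(spec).graph
    return [op.type for op in graph.get_operations()]

unrolled_ops = op_types(unrolled)
looped_ops = op_types(looped)
print(unrolled_ops)  # contains three separate Mul ops
print(looped_ops)    # contains a while op instead
```

The unrolled graph grows with the number of iterations, which is why the range version becomes expensive to trace for long loops.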

【Comments】:

Yes, both solutions work, but they make the code run slower when there are many steps.
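
For the general case described in the question, here is a hedged sketch that keeps tf.range inside both tf.function and the tape. It assumes it is enough to differentiate with respect to a tensor that is created and watched before the loop (x, a hypothetical name). It does not give gradients with respect to tensors created inside the loop body. The values stored in the TensorArray and the sum taken after the loop are illustrative only.

```python
import tensorflow as tf

@tf.function
def sum_of_powers_grad(steps):
    x = tf.constant(2.0)  # hypothetical input, watched before the loop
    with tf.GradientTape() as tape:
        tape.watch(x)
        history = tf.TensorArray(tf.float32, size=steps, element_shape=())
        y = x
        for i in tf.range(steps):  # stays a graph loop, not unrolled
            y = y * x
            history = history.write(i, y)
        # Something defined after the loop, built from the stored values.
        total = tf.reduce_sum(history.stack())
    return tape.gradient(total, x)

grad = sum_of_powers_grad(tf.constant(3))
print(grad)
```

With steps = 3 the stored values are x**2, x**3 and x**4, so the gradient of their sum is 2x + 3x**2 + 4x**3.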