Is there a way to clip intermediate exploded gradients in TensorFlow: answers

[Question title]: Is there a way to clip intermediate exploded gradients in tensorflow
[Posted]: 2017-02-21 01:59:21
[Question description]:

Problem: a very long RNN network

N1 -- N2 -- ... --- N100

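For an optimizer such as AdamOptimizer, compute_gradients() returns the gradients for all trainable variables.

However, the gradients may explode at some intermediate step of the backward pass.

How can those intermediate gradients be clipped?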

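One approach would be to run backpropagation by hand from "N100 --> N99", clip the gradients, then continue with "N99 --> N98", and so on, but that is too complicated.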
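To make the manual idea concrete, here is a minimal sketch in plain Python, not TensorFlow. It assumes a toy elementwise recurrence h_t = weights * h_{t-1}; the names `clip_by_norm`, `backprop_chain`, `weights` and `grad_out` are made up for illustration. It walks backward one node at a time and clips after each step.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * max_norm / norm for g in grad]

def backprop_chain(weights, grad_out, steps, max_norm=None):
    """Walk backward through `steps` copies of h_t = weights * h_{t-1}
    (elementwise), optionally clipping after every step.

    Returns the gradient at the first node and the largest norm seen.
    """
    grad = list(grad_out)
    peak = math.sqrt(sum(g * g for g in grad))
    for _ in range(steps):
        # One step back, e.g. N100 --> N99: multiply by the local Jacobian.
        grad = [w * g for w, g in zip(weights, grad)]
        if max_norm is not None:
            grad = clip_by_norm(grad, max_norm)
        peak = max(peak, math.sqrt(sum(g * g for g in grad)))
    return grad, peak

weights = [1.5, 1.5, 1.5, 1.5]   # toy recurrence that amplifies gradients
grad_out = [1.0, 1.0, 1.0, 1.0]  # gradient arriving at the last node
_, unclipped_peak = backprop_chain(weights, grad_out, steps=99)
_, clipped_peak = backprop_chain(weights, grad_out, steps=99, max_norm=10.0)
```

Without clipping, the norm grows by a factor of 1.5 per step; with clipping, it never exceeds `max_norm`. Doing this for a real RNN means re-implementing every local backward step, which is the complexity the question wants to avoid.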

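So my question is: is there an easier way to clip the intermediate gradients? (Strictly speaking, of course, they are then no longer gradients in the mathematical sense.)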

[Comments]:

[Solution 1]:

You can use the custom_gradient decorator to make a version of tf.identity that clips intermediate exploded gradients.

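```
import tensorflow as tf
from tensorflow.contrib.eager.python import tfe  # TF 1.x only; tf.contrib was removed in TF 2.x

@tfe.custom_gradient
def gradient_clipping_identity(tensor, max_norm):
  # Forward pass: behaves exactly like tf.identity.
  result = tf.identity(tensor)

  def grad(dresult):
    # Backward pass: clip the incoming gradient; max_norm itself gets no gradient.
    return tf.clip_by_norm(dresult, max_norm), None

  return result, grad
```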

Then use gradient_clipping_identity wherever you would normally use identity, and your gradients will be clipped in the backward pass.
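On TensorFlow 2.x, where tf.contrib is no longer available, the same idea can probably be expressed with tf.custom_gradient. The sketch below assumes TF 2.x with eager execution. The factory `make_clipping_identity`, the helper `grad_norm_wrt_h0` and the toy weight matrix `W` are hypothetical names used only for illustration; they stand in for a real RNN cell.

```python
import tensorflow as tf

def make_clipping_identity(max_norm):
    """Build an identity op whose backward pass clips the gradient norm."""
    @tf.custom_gradient
    def clipping_identity(tensor):
        def grad(dresult):
            return tf.clip_by_norm(dresult, max_norm)
        return tf.identity(tensor), grad
    return clipping_identity

# Toy "recurrent weights" that amplify the gradient by 1.5 per step.
W = 1.5 * tf.eye(4)

def grad_norm_wrt_h0(clip_fn, steps=100):
    """Unroll h_t = clip_fn(h_{t-1}) @ W and return the norm of dloss/dh_0."""
    h0 = tf.ones([1, 4])
    with tf.GradientTape() as tape:
        tape.watch(h0)
        h = h0
        for _ in range(steps):
            # Forward: clip_fn is a no-op. Backward: the gradient flowing
            # into the previous step is clipped here.
            h = tf.matmul(clip_fn(h), W)
        loss = tf.reduce_sum(h)
    return float(tf.norm(tape.gradient(loss, h0)))

clipped_norm = grad_norm_wrt_h0(make_clipping_identity(10.0))
unclipped_norm = grad_norm_wrt_h0(tf.identity)
```

The forward values are identical in both runs; only the backward pass differs. Placing the clipped identity on the state passed between steps bounds the gradient at every step, not just the final one that the optimizer sees.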

[Comments]:

[Solution 2]:

@tf.custom_gradient
def gradient_clipping(x):
  return x, lambda dy: tf.clip_by_norm(dy, 10.0)
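To see what this does numerically, here is a small check, assuming TensorFlow 2.x with eager execution. The input values and the `scale` tensor are made up so that the upstream gradient has norm 50, well above the hard-coded limit of 10.

```python
import tensorflow as tf

@tf.custom_gradient
def gradient_clipping(x):
    # Forward: pass x through unchanged. Backward: clip dy to norm 10.
    return x, lambda dy: tf.clip_by_norm(dy, 10.0)

x = tf.constant([1.0, 2.0])
scale = tf.constant([30.0, 40.0])  # upstream gradient becomes [30, 40], norm 50

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(gradient_clipping(x) * scale)

# Upstream gradient after it has passed back through gradient_clipping.
grad = tape.gradient(y, x).numpy()
```

Without the wrapper, the gradient with respect to x would be [30, 40]; with it, the gradient is rescaled to norm 10, roughly [6, 8], while its direction is preserved.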

[Comments]: