【Posted】:2021-02-22 20:32:10
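【Question】:
Why does the tf.contrib.layers.instance_norm layer contain a StopGradient op? In other words, why is it needed?
StopGradient seems to show up even in the simpler tf.nn.moments, which may be a building block of tf.contrib.layers.instance_norm: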
x_m, x_v = tf.nn.moments(x, [1, 2], keep_dims=True)
I also found a comment about StopGradient in the tf.nn.moments source code:
# The dynamic range of fp16 is too limited to support the collection of
# sufficient statistics. As a workaround we simply perform the operations
# on 32-bit floats before converting the mean and variance back to fp16
y = math_ops.cast(x, dtypes.float32) if x.dtype == dtypes.float16 else x
# Compute true mean while keeping the dims for proper broadcasting.
mean = math_ops.reduce_mean(y, axes, keepdims=True, name="mean")
# sample variance, not unbiased variance
# Note: stop_gradient does not change the gradient that gets
# backpropagated to the mean from the variance calculation,
# because that gradient is zero
variance = math_ops.reduce_mean(
    math_ops.squared_difference(y, array_ops.stop_gradient(mean)),
    axes,
    keepdims=True,
    name="variance")
So is this just an optimization, since that gradient is always zero anyway?
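The claim in the source comment can be checked numerically. Below is a minimal sketch in plain Python (no TensorFlow); the helper names are made up for illustration and are not part of any library. It compares the gradient of the variance with the mean held constant (what stop_gradient does) against a finite-difference gradient of the full variance, and also computes the extra term that would flow back through the mean.

```python
# Sketch: does treating the mean as a constant change d(variance)/dx?
# Plain Python only; all function names here are hypothetical helpers.

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # Sample (biased) variance, matching the comment in tf.nn.moments.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def grad_with_stop_gradient(xs):
    # Gradient w.r.t. each x_i when the mean is treated as a constant.
    m = mean(xs)
    n = len(xs)
    return [2.0 * (x - m) / n for x in xs]

def grad_through_mean(xs):
    # Extra chain-rule term via the mean: dV/dm * dm/dx_i.
    m = mean(xs)
    n = len(xs)
    # dV/dm is proportional to the sum of deviations from the mean,
    # which is zero up to floating-point rounding.
    dv_dm = sum(-2.0 * (x - m) for x in xs) / n
    return [dv_dm / n for _ in xs]

def grad_finite_difference(xs, eps=1e-6):
    # Central differences on the full variance, mean included.
    grads = []
    for i in range(len(xs)):
        up = list(xs)
        down = list(xs)
        up[i] += eps
        down[i] -= eps
        grads.append((variance(up) - variance(down)) / (2 * eps))
    return grads

xs = [0.5, -1.25, 3.0, 2.0]
print("mean held constant:", grad_with_stop_gradient(xs))
print("term through mean: ", grad_through_mean(xs))
print("finite difference: ", grad_finite_difference(xs))
```

If the comment is right, the first and third lists should agree up to numerical error and the second should be essentially zero, because the deviations from the mean sum to zero. That would make stop_gradient here a way to skip a backward path that contributes nothing, rather than a change to the result.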
【Discussion】:
Tags: tensorflow deep-learning batch-normalization