Backpropagating gradients through nested tf.map_fn: answers

【Question title】: Backpropagating gradients through nested tf.map_fn
【Posted】: 2019-11-28 18:58:18
【Question description】:

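I want to map a TensorFlow function over each vector that corresponds to the depth channel of a pixel in a tensor of shape [batch_size, H, W, n_channels].

In other words, for each H x W image in the batch:

1. I extract some feature maps F_k (n_channels of them), each with the same H x W size, so that together they form a tensor of shape [H, W, n_channels];
2. I then want to apply a custom function to the vector v_ij associated with the i-th row and j-th column of every feature map F_k, spanning the entire depth channel (i.e. v has dimension [1 x 1 x n_channels]). Ideally, all of this would happen in parallel.

A picture illustrating the process can be found below. The only difference from the picture is that the input and output "receptive fields" both have size 1x1 (the function is applied to each pixel independently).

This is similar to applying a 1x1 convolution to the tensor; however, I need to apply a more general function over the depth channel rather than a simple sum.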
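To make the intended semantics concrete, here is a minimal NumPy sketch (NumPy is used only as a stand-in for TensorFlow). The function my_custom_fun below is a hypothetical placeholder, since the question does not show the real one; the only property assumed is that it maps a vector of length n_channels to a vector of the same length.

```python
import numpy as np


def my_custom_fun(v):
    # Hypothetical per-pixel function: L2-normalise the channel vector.
    # It maps a vector of shape [n_channels] to one of the same shape.
    return v / (np.linalg.norm(v) + 1e-8)


batch_size, H, W, n_channels = 2, 4, 5, 8
rng = np.random.default_rng(0)
features = rng.normal(size=(batch_size, H, W, n_channels))

# Apply the function independently to every 1-D slice along the channel axis,
# i.e. once per pixel of every image in the batch.
out = np.apply_along_axis(my_custom_fun, -1, features)

print(out.shape)  # (2, 4, 5, 8)
```

Each output vector out[b, i, j] depends only on features[b, i, j], which is the "1x1 receptive field" behaviour described above.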

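I thought tf.map_fn() could be an option, and I tried the following solution, in which I call tf.map_fn() recursively to reach the feature vector associated with each pixel. However, this approach seems suboptimal and, most importantly, it raises an error when trying to backpropagate the gradients.

Do you have any idea why this happens and how I should structure my code to avoid the error?

This is my current implementation of the function: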

import tensorflow as tf
from tensorflow import layers


def apply_function_on_pixel_features(incoming):
    # at first the input is [None, H, W, n_channels]
    if len(incoming.get_shape()) > 1:
        return tf.map_fn(lambda x: apply_function_on_pixel_features(x), incoming)
    else:
        # here the input is [n_channels]
        # apply some function that applies a transformation and returns a vector of the same size
        output = my_custom_fun(incoming) # my_custom_fun() doesn't change the shape
        return output

And the main body of my code:

H = 128
W = 132
n_channels = 8

x1 = tf.placeholder(tf.float32, [None, H, W, 1])
x2 = layers.conv2d(x1, filters=n_channels, kernel_size=3, padding='same')

# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)  
x4 = tf.nn.softmax(x3)

loss = cross_entropy(x4, labels)
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.minimize(loss)  # <--- ERROR HERE!

The exact error is:

File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2481, in AddOp
    self._AddOpInternal(op)

File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2509, in _AddOpInternal
    self._MaybeAddControlDependency(op)
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2547, in _MaybeAddControlDependency
    op._add_control_input(self.GetControlPivot().op)

AttributeError: 'NoneType' object has no attribute 'op'

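The full error stack and the code can be found here. Thanks for your help.

Update:

Following @thushv89's suggestion, I added a possible solution. I still don't know why my previous code didn't work. Any insight into that would still be much appreciated.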

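【Question comments】:

See stackoverflow.com/questions/49977236/tensorflow-broadcasting
@geometrikal Thanks for your answer. I'm afraid I didn't explain the problem well. I updated the question, so maybe it is clearer now. If you still think broadcasting is the best option, could you explain in more detail how to use it in my case? (I didn't get it)
I updated the question with my current code and the problem
I think the error is related to the if statement and the recursion in the apply function. Can you share the exact function you are applying? I think broadcasting can be used for basic math, and some other TensorFlow functions take an axis argument. I'm not sure whether it can be applied to an arbitrary function.
@gabriele, looking at the image, it seems you are trying to apply some custom function to each pixel in the feature maps? Is that correct, and if so, why do you need recursion? Why not just do one reshape, run map_fn, and reshape again to get back to the original shape?

Tags: tensorflow nested gradient backpropagation map-function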

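【Solution 1】:

@gabriele, regarding having to depend on batch_size, have you tried the following? This function does not depend on batch_size. You can replace the map_fn with anything you like.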

def apply_function_on_pixel_features(incoming):

    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])

    # apply function on every vector of shape [1, C]
    out_matrix = tf.map_fn(lambda x: x+1, incoming_flat)  # dimension remains unchanged

    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])

    return out_matrix

The full code I tested is below.

import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import categorical_crossentropy

def apply_function_on_pixel_features(incoming):

    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])

    # apply function on every vector of shape [1, C]
    out_matrix = tf.map_fn(lambda x: x+1, incoming_flat)  # dimension remains unchanged

    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])

    return out_matrix

H = 32
W = 32
x1 = tf.placeholder(tf.float32, [None, H, W, 1])
labels = tf.placeholder(tf.float32, [None, 10])
x2 = tf.layers.conv2d(x1, filters=1, kernel_size=3, padding='same')

# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)  
x4 = tf.layers.flatten(x3)
x4 = tf.layers.dense(x4, units=10, activation='softmax')

loss = categorical_crossentropy(labels, x4)
optimizer = tf.train.AdamOptimizer(0.001)
train_op = optimizer.minimize(loss)


x = np.zeros(shape=(10, H, W, 1))
y = np.random.choice([0,1], size=(10, 10))


with tf.Session() as sess:
  tf.global_variables_initializer().run()
  sess.run(train_op, feed_dict={x1: x, labels:y})
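As a quick sanity check of the reshape step used in this answer, the following NumPy sketch (NumPy stands in for TensorFlow here; both reshape in row-major order) illustrates that flattening to [-1, C] keeps each pixel's channel vector together as one row, and that reshaping back with -1 for the batch dimension restores the original tensor. The sizes are arbitrary values chosen for illustration.

```python
import numpy as np

B, H, W, C = 2, 3, 4, 5
x = np.arange(B * H * W * C, dtype=np.float32).reshape(B, H, W, C)

# Flatten so that every row is the channel vector of one pixel.
flat = x.reshape(-1, C)

# Row-major layout: pixel (b, i, j) ends up in row (b * H + i) * W + j.
b, i, j = 1, 2, 3
row = (b * H + i) * W + j
same_vector = np.array_equal(flat[row], x[b, i, j])

# Reshaping back with -1 for the batch dimension restores the original tensor.
restored = flat.reshape(-1, H, W, C)
round_trip_ok = np.array_equal(restored, x)

# Flattening to [-1] instead yields scalars, so a mapped function
# would see one number at a time rather than a length-C vector.
scalars = x.reshape(-1)

print(flat.shape, scalars.shape, same_vector, round_trip_ok)
```

This is why the flattened shape matters: a function mapped over a [-1] tensor receives single values, while one mapped over a [-1, C] tensor receives the full feature vector of each pixel.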

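【Comments】:

Hi @thushv89, thanks for your suggestion. However, following your suggestion I would reshape the tensor to shape [-1] rather than [batch_size * W * H, C] (which is what I need in order to apply the function consistently to all the features of each pixel). Also, I think reshaping to [-1, C] and then to [-1, W, H, C] gives me an error. TensorFlow seems to complain because it cannot convert an object of unknown shape to a tensor.
@gabriele, it actually works fine for me. Are you getting an error?
Hi @thushv89, sorry for the late reply, but I couldn't test it earlier. I tried again replacing batch_size with -1, and now it seems to work. I probably had something wrong. Thanks for your help! :) You should update the line incoming_flat = tf.reshape(incoming, shape=[-1]) to use shape=[-1, C], which is what I wanted to obtain, and then I'll award you the bounty
@gabriele, glad to hear that. Updated my answer. :)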

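【Solution 2】:

As suggested by @thushv89, I reshaped the array, applied the function, and then reshaped it back (to avoid the tf.map_fn recursion). I still don't know why the previous code didn't work, but the current implementation allows the gradients to propagate back to the previous layers. I'll leave it below for anyone who might be interested: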

def apply_function_on_pixel_features(incoming, batch_size):

    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[batch_size * W * H, C])

    # apply function on every vector of shape [1, C]
    out_matrix = my_custom_fun(incoming_flat)  # dimension remains unchanged

    # go back to the input shape [None, W, H, C]
    out_shape = tf.convert_to_tensor([batch_size, W, H, C])
    out_matrix = tf.reshape(out_matrix, shape=out_shape)

    return out_matrix

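Note that I now need to provide the batch size to reshape the tensor correctly, because TensorFlow complains if I pass None or -1 as a dimension.

Any comments on and insights into the above code are still much appreciated.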
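One observation that may help here: if my_custom_fun can be written entirely with operations that act along the last axis, the flatten-and-restore step (and with it the explicit batch size) might not be needed at all. The NumPy sketch below illustrates the idea under that assumption. The body of my_custom_fun is a hypothetical stand-in (a per-pixel standardisation), not the function from the question.

```python
import numpy as np


def my_custom_fun(t):
    # Hypothetical stand-in: standardise each channel vector, written with
    # axis=-1 so that it accepts any number of leading dimensions.
    mean = t.mean(axis=-1, keepdims=True)
    std = t.std(axis=-1, keepdims=True)
    return (t - mean) / (std + 1e-8)


B, H, W, C = 3, 4, 5, 6
rng = np.random.default_rng(0)
x = rng.normal(size=(B, H, W, C))

# Route 1: flatten to [B*H*W, C], apply, reshape back (as in the answer).
via_reshape = my_custom_fun(x.reshape(-1, C)).reshape(-1, H, W, C)

# Route 2: apply directly to the 4-D array; no batch size is involved.
direct = my_custom_fun(x)

print(np.allclose(via_reshape, direct))
```

Whether this carries over depends on the actual function: it only applies when every operation inside it supports an axis argument or broadcasts over the leading dimensions.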

【Comments】: