TensorFlow error with ConvLSTMCell: Dimensions of inputs should match

【Question title】: Tensorflow error with ConvLSTMCell: Dimensions of inputs should match
【Posted】: 2018-02-22 07:12:54
【Question】:

I tried to pass the ConvLSTMCell arguments as described in the TensorFlow documentation, but I still get this error:

InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [10,64,64,1] vs. shape[1] = [1,64,64,16]
     [[Node: rnn/while/rnn/Encoder_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](rnn/while/TensorArrayReadV3, rnn/while/Switch_4:1, rnn/while/rnn/Encoder_1/split/split_dim)]]

My code is:

num_channels = 1
img_size = 64
filter_size1 = 5 
num_filters1 = 16
#If time_major == True, this must be a Tensor of shape: [max_time, batch_size, ...], or a nested tuple of such elements.
x = tf.placeholder(tf.float32, shape=[None,1, img_size, img_size, num_channels], name='x')
InputShape = [img_size,img_size, 1]
encoder_1_KernelShape = [filter_size1,filter_size1]
# create a ConvLSTMCell
rnn_cell = ConvLSTMCell(2, InputShape, num_filters1, encoder_1_KernelShape, use_bias=True, forget_bias=1.0, name='Encoder_1')

# 'outputs' is a tensor of shape [batch_size, max_time, cell_state_size]

# defining initial state
#initial_state = rnn_cell.zero_state(batch_size, dtype=tf.float32)
initial_state = rnn_cell.zero_state(1, dtype=tf.float32)
# 'state' is a tensor of shape [batch_size, cell_state_size]
encoder_1_outputs, encoder_1_state = tf.nn.dynamic_rnn(rnn_cell, x,
                                   initial_state=initial_state,
                                   dtype=tf.float32)

for i in range(2):
    x_train = data_3[0:10, i, :, :]
    x_train = x_train.flatten()
    x_train = x_train.reshape([10, 1, img_size, img_size, 1])
    x_train = np.float32(x_train)
    feed_dict_train = {x: x_train}

【Comments】:

The same problem has already been answered at this link: stackoverflow.com/questions/41088064/…

Tags: python tensorflow lstm convolution recurrent-neural-network

【Solution 1】:

Try this:

# TensorFlow 1.x; ConvLSTMCell lives in tf.contrib.rnn
import numpy as np
import tensorflow as tf
from tensorflow.contrib.rnn import ConvLSTMCell

num_channels = 1
img_size = 64
filter_size1 = 5
num_filters1 = 16

# Batch-major input: [batch_size, max_time, height, width, channels]
x = tf.placeholder(tf.float32, shape=[None,None,img_size,img_size,num_channels],
                   name='x')
InputShape = [img_size, img_size, num_channels]
encoder_1_KernelShape = [filter_size1, filter_size1]
rnn_cell = ConvLSTMCell(2, InputShape, num_filters1, encoder_1_KernelShape,
                        use_bias=True, forget_bias=1.0, name='Encoder_1')

# The batch size of the initial state (10) must equal the batch size of the fed data
initial_state = rnn_cell.zero_state(10, dtype=tf.float32)
encoder_1_outputs, encoder_1_state = tf.nn.dynamic_rnn(rnn_cell, x,
                                                       initial_state=initial_state,
                                                       dtype=tf.float32)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  # 10 sequences, each 1 frame long
  x_train = np.zeros([10, 1, img_size, img_size, num_channels], dtype=np.float32)
  sess.run(encoder_1_outputs, feed_dict={x: x_train})

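Note that the first dimension of x is batch_size (10 in this example) and the second is sequence_num (1 here). The error in the question comes from zero_state(1, ...): it creates a state with batch size 1, while the fed data has batch size 10, so concatenating the input with the hidden state fails.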
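
To make the shape rule behind the error message concrete, here is a minimal plain-Python sketch. It only imitates the check a concat performs (every dimension except the concat axis must match) and is not TensorFlow's implementation; the helper name concat_shape is hypothetical. It assumes the cell joins the input frame and the hidden state along the last (channel) axis, which is what the failing Encoder_1/concat node suggests.

```python
def concat_shape(shapes, axis=-1):
    """Return the result shape of concatenating tensors of `shapes` along `axis`.

    Hypothetical helper: mimics the concat shape rule, not TensorFlow code.
    """
    first = list(shapes[0])
    axis = axis % len(first)
    total = 0
    for i, shape in enumerate(shapes):
        shape = list(shape)
        if len(shape) != len(first):
            raise ValueError("Ranks differ: %s vs. %s" % (first, shape))
        for d, (a, b) in enumerate(zip(first, shape)):
            # Every dimension except the concat axis must agree.
            if d != axis and a != b:
                raise ValueError(
                    "Dimensions of inputs should match: shape[0] = %s vs. shape[%d] = %s"
                    % (first, i, shape))
        total += shape[axis]
    result = first[:]
    result[axis] = total
    return result


batch_size, img_size, num_channels, num_filters1 = 10, 64, 1, 16
frame = [batch_size, img_size, img_size, num_channels]  # one time step of x

# State built with zero_state(1, ...): batch dimension 1 vs. 10 -> rejected.
try:
    concat_shape([frame, [1, img_size, img_size, num_filters1]])
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)

# State built with zero_state(10, ...): batch dimensions agree.
ok_shape = concat_shape([frame, [batch_size, img_size, img_size, num_filters1]])
print(ok_shape)  # [10, 64, 64, 17]
```

With matching batch sizes, only the channel axis grows (1 input channel plus 16 hidden channels).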

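【Comments】:

Thanks! Your code works. So if I want to feed in 10 sequences at once, the second dimension of x is 10? @Maxim
Another question: shouldn't I use "rnn_cell.LSTMCell.zero_state(batch_size, dtype)" instead of "rnn_cell.zero_state"? And if so, is the first input the number of frames in a sequence, or the number of sequences I feed in each time? Thanks a lot @Maxim
batch_size and sequence_num are independent, so you can change the 1 to any value; it only affects the length of the RNN (the number of cell steps that dynamic_rnn unrolls). And using rnn_cell.zero_state is better, because the initial state should correspond to the cell argument passed to dynamic_rnn.
With this setup, using "encoder_1_state.h", I get: . But I want a single representation of all 10 pictures in the sequence; in other words, I want the model to learn the temporal trend in the data. @Maxim
Look carefully: 10 is the batch size, not the sequence length. The state always accumulates over all the pictures in a sequence.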
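
The difference between batch size and sequence length discussed in these comments can be sketched with a toy recurrence. This is only an illustration under stated assumptions: the "cell" below is a plain running sum standing in for the ConvLSTM update, each frame is a single number instead of an image, and run_toy_rnn is a hypothetical helper, not a TensorFlow function.

```python
def run_toy_rnn(batch):
    """batch: list of sequences; each sequence is a list of numeric frames.

    Returns (outputs, final_states), laid out like a batch-major RNN:
    outputs[b][t] is the state of sequence b after step t,
    final_states[b] is the last state of sequence b.
    """
    outputs, final_states = [], []
    for sequence in batch:
        state = 0.0  # plays the role of zero_state for one batch element
        seq_outputs = []
        for frame in sequence:
            state = state + frame  # toy stand-in for the cell update
            seq_outputs.append(state)
        outputs.append(seq_outputs)
        final_states.append(state)
    return outputs, final_states


frames = [float(i) for i in range(1, 11)]  # ten "pictures"

# Shape [10, 1]: ten independent sequences, one frame each.
as_batch = [[f] for f in frames]
_, states_batch = run_toy_rnn(as_batch)

# Shape [1, 10]: a single sequence of ten frames.
as_sequence = [frames]
outputs_seq, states_seq = run_toy_rnn(as_sequence)

print(states_batch)  # ten separate states, one per sequence
print(states_seq)    # [55.0]
```

Fed as a batch of 10, the frames never interact, so there are ten unrelated final states. Fed as one sequence of length 10, a single final state summarizes all ten frames, which corresponds to x having shape [1, 10, img_size, img_size, num_channels] and zero_state(1, ...) in the code above.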