TensorFlow | Store gradient in-between runs (answers)

【Question title】: TensorFlow | Store gradient in-between runs
【Posted】: 2018-09-13 13:11:45
【Question】:

For illustration purposes, suppose I have a simple LSTM network and an input sequence X = (X1, ..., XT):

input Xt = (x1,...,xn) --> [LSTM] --> [output_layer] --> output(y1,...,yk)

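Is there a way to feed the network one time step at a time and only call train_op at the end of the sequence? Pseudocode of what I would like to achieve: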

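import numpy as np
import tensorflow as tf

# Define computational graph (TensorFlow 1.x)
x = tf.placeholder(tf.float32, [batch_size, num_features])
y = tf.placeholder(tf.float32, [batch_size, output_size])
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# The BasicLSTMCell state is a (c, h) tuple, so each part gets its own placeholder
c_in = tf.placeholder(tf.float32, [batch_size, lstm_size])
h_in = tf.placeholder(tf.float32, [batch_size, lstm_size])
state = tf.contrib.rnn.LSTMStateTuple(c_in, h_in)
lstm_output, next_state = lstm(x, state)
# The output width must match y for the squared-error loss
output = tf.layers.dense(lstm_output, units=output_size)
loss = tf.losses.mean_squared_error(y, output)
train_op = tf.train.AdamOptimizer(lr).minimize(loss)

# Train loop
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for batch in batches:
    state_val = np.zeros(...)
    for timestep in batch:
      feed_dict = construct_feed_dict(timestep, state_val)
      # Fetch the new state so it can be fed back in at the next time step
      out, _, state_val = sess.run([output, loss, next_state], feed_dict)
      # Defer the weight update until the end of sequence
    sess.run(train_op, feed_dict=???)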

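My understanding is that the values returned by sess.run are plain NumPy arrays, so if I later feed them back into the network as part of the input, the information about how those values were computed is lost.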
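To make that concern concrete, here is a minimal sketch in plain Python rather than TensorFlow. It assumes a hypothetical scalar recurrence h_t = tanh(w * h_{t-1} + x_t) with a squared-error loss at every step; the function names and the numbers are made up for illustration. It compares the gradient obtained when each step is a separate run (the previous state is treated as a constant) with the gradient of the whole unrolled sequence.

```python
import math


def forward(w, xs):
    """Run h_t = tanh(w * h_{t-1} + x_t) from h_0 = 0; return [h_0, ..., h_T]."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + x))
    return hs


def loss(w, xs, ys):
    """Sum of per-step squared errors over the whole sequence."""
    hs = forward(w, xs)
    return sum(0.5 * (h - y) ** 2 for h, y in zip(hs[1:], ys))


def grad_per_step(w, xs, ys):
    """dL/dw when h_{t-1} is fed back as a constant at every step."""
    hs = forward(w, xs)
    g = 0.0
    for t in range(1, len(hs)):
        g += (hs[t] - ys[t - 1]) * (1.0 - hs[t] ** 2) * hs[t - 1]
    return g


def grad_full(w, xs, ys):
    """dL/dw with backpropagation through all time steps."""
    hs = forward(w, xs)
    g, dh = 0.0, 0.0
    for t in range(len(hs) - 1, 0, -1):
        dh += hs[t] - ys[t - 1]       # loss term attached to step t
        da = dh * (1.0 - hs[t] ** 2)  # back through tanh
        g += da * hs[t - 1]           # contribution of w at step t
        dh = da * w                   # gradient flowing into h_{t-1}
    return g


def grad_numeric(w, xs, ys, eps=1e-5):
    """Central finite difference of the full-sequence loss."""
    return (loss(w + eps, xs, ys) - loss(w - eps, xs, ys)) / (2 * eps)


if __name__ == "__main__":
    xs, ys, w = [0.5, -0.3, 0.9], [0.2, 0.1, -0.4], 0.8
    print("per-step:", grad_per_step(w, xs, ys))
    print("full    :", grad_full(w, xs, ys))
    print("numeric :", grad_numeric(w, xs, ys))
```

In this toy setting the full gradient agrees with the finite-difference check, while the per-step gradient does not, because the dependence of each state on earlier steps is dropped once the state is passed around as a plain number.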

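I am well aware that I could feed the input in the shape [total_timesteps, batch_size, num_features]. However, I find myself in a situation where that approach is not possible: 1) the input for the next time step is created from the network output, f(y_t-1); 2) the hidden state of the LSTM cell is fed to another layer as input at every time step.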

【Comments】:

Tags: python-3.x tensorflow machine-learning

【Solution 1】:

I managed to achieve this by implementing my own raw_rnn cell.
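The answer does not include code. The following is only a rough sketch of what such a setup might look like, assuming it refers to tf.nn.raw_rnn and assuming a TensorFlow version that still exposes the 1.x graph API under tensorflow.compat.v1. The sizes, the names w_out and b_out, and the choice of feeding the output back unchanged as the next input (f is the identity here) are all hypothetical. The whole sequence is unrolled inside one graph, so a single train_op call backpropagates through every time step.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Hypothetical sizes, chosen only for illustration.
batch_size, num_features, lstm_size, max_time = 4, 3, 8, 5

x0 = tf.placeholder(tf.float32, [batch_size, num_features])            # first input X1
y = tf.placeholder(tf.float32, [max_time, batch_size, num_features])   # targets

cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
# Output-layer weights live outside loop_fn so every time step shares them.
w_out = tf.get_variable("w_out", [lstm_size, num_features])
b_out = tf.get_variable("b_out", [num_features],
                        initializer=tf.zeros_initializer())
sequence_length = tf.fill([batch_size], max_time)


def loop_fn(time, cell_output, cell_state, loop_state):
    if cell_output is None:
        # First call (time == 0): provide the initial state, the first input,
        # and a dummy tensor describing the per-example shape of the emits.
        next_state = cell.zero_state(batch_size, tf.float32)
        next_input = x0
        emit = tf.zeros([num_features])
    else:
        next_state = cell_state
        emit = tf.matmul(cell_output, w_out) + b_out  # output layer
        next_input = emit  # assumption: next input is the previous output
    elements_finished = time >= sequence_length
    return elements_finished, next_input, next_state, emit, None


outputs_ta, final_state, _ = tf.nn.raw_rnn(cell, loop_fn)
outputs = outputs_ta.stack()  # [max_time, batch_size, num_features]

loss = tf.losses.mean_squared_error(y, outputs)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

feed = {x0: np.random.rand(batch_size, num_features),
        y: np.random.rand(max_time, batch_size, num_features)}
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out, loss_val = sess.run([outputs, loss], feed)
    sess.run(train_op, feed)  # one update for the whole sequence
```

Because the feedback happens inside loop_fn, nothing has to leave the graph as a NumPy array between time steps. The cell state is also available in loop_fn at every step, so it could be passed to another layer there.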

【Comments】: