How to load TensorFlow 1.1x checkpoint weights into a TF2.2 LSTM layer - results are different (Python, Keras)

【Question title】: How to Load Tensorflow 1.1x Checkpoint weights into a TF2.2 LSTM layer - results are different (Python, Keras)
【Posted】: 2021-04-03 16:15:51
【Question】:

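I have an old TF1.1x checkpoint that includes an LSTM layer, and I also have the layer activations from an earlier run for every layer of the old network. I am trying to recreate this network in TF2.2 and Keras using Python. The layer used in the old network is 'tf.contrib.rnn.LSTMBlockFusedCell'.

I split the checkpoint's LSTM kernel weights into the corresponding 'kernel' and 'recurrent_kernel' and loaded each of them into the LSTM layer in TF2.2 (together with the 'bias').

However, when I run model.predict on the old activations, the output of the new LSTM layer is completely different from the old model's activations.

I loaded only the weights listed above, i.e. kernel, recurrent_kernel and bias. The layer has no other parameters.

Hopefully the code snippet below captures the essentials: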

# Create minimalistic Model, and Build it
#
modelC = keras.Sequential()
modelC.add( keras.layers.Reshape([-1,2048], name='l4_lstm' ))   
modelC.add( keras.layers.LSTM( units=2048 ) )

modelC.build(input_shape = (batch_size, 2048))


# Load Weights from Checkpoint Dictionary 'ckptdict', 
#
weights_ds = []
weights_ds.append(ckptdict['lstm_fused_cell/kernel'][:2048] ) # "W" 
weights_ds.append(ckptdict['lstm_fused_cell/kernel'][2048:] ) # "U" 
weights_ds.append(ckptdict['lstm_fused_cell/bias'])           # "b" 
modelC.set_weights(weights_ds)

# Run the minimal model on Activations from last layer before LSTM 
# (data corresponding to the Checkpointed TF1.1x model)
#
l3pred = modelC.predict( l3 )

# At this point, l3pred is wildly different from the TF1.1x version,
#

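For the other layers of the network, a similar approach to importing the weights works fine (== same results as the old activations). Those layers are all 'Dense', but the LSTM layer eludes me.

Can anyone point me to a description of how to import and run an LSTM layer correctly? Many thanks!

(A similar question was asked in July 2019, but I have not seen an answer to it.)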

【Comments】:

Tags: python-3.x tensorflow keras lstm

【Solution 1】:

I ran into this two days ago and found a few things:

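tf.keras.layers.LSTM has a different default for 'recurrent_activation' in tf1 and tf2: in tf1 it defaults to 'hard_sigmoid', whereas in tf2 it defaults to 'sigmoid'.
tf.contrib.rnn.LSTMBlockCell and tf.contrib.rnn.LSTMBlockFusedCell store their weights in a different gate order from tf.keras.layers.LSTM.

For the block LSTM cells, the order is [Wi, Wci, Wf, Wo]

For the Keras LSTM, the order is [Wi, Wf, Wci, Wo]. The biases follow the same order as the weights.

As for your problem, I think you should reorder the weights and biases before loading them into the tf2 LSTM.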
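To make the first point concrete, here is a minimal sketch that compares the two activations numerically. It assumes the Keras 2.x definition of hard_sigmoid, clip(0.2 * x + 0.5, 0, 1); the function names are illustrative helpers, not library calls.

```python
import math

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def hard_sigmoid(x):
    """Piecewise-linear approximation; assumed Keras 2.x form clip(0.2*x + 0.5, 0, 1)."""
    return min(1.0, max(0.0, 0.2 * x + 0.5))

# Print both activations side by side for a few gate pre-activations.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  hard_sigmoid={hard_sigmoid(x):.4f}")
```

The two functions differ for most inputs, so a layer built with the wrong recurrent_activation produces different gate values even when the weights are identical. The block cells use a plain sigmoid for the gates, which matches the tf2 default, so this point should mainly matter when porting weights from an old tf.keras.layers.LSTM rather than from a contrib cell.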

Put simply, in equations:

xh = [x, h_prev]

In tf.keras.layers.LSTM:

[i, f, ci, o] = xh * W + b

In tf.contrib.rnn.LSTMBlockCell:

[i, ci, f, o] = xh * W + b
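The equations above use one stacked matrix W acting on xh = [x, h_prev], while the Keras layer keeps a separate kernel and recurrent_kernel. The sketch below uses random data and tiny, made-up sizes to check that splitting W by rows gives the same pre-activations. It assumes the stacked kernel holds the input rows first and the recurrent rows after them, consistent with xh = [x, h_prev].

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, units, batch = 3, 2, 4  # tiny illustrative sizes

x = rng.normal(size=(batch, input_size))
h_prev = rng.normal(size=(batch, units))
W = rng.normal(size=(input_size + units, 4 * units))  # stacked kernel
b = rng.normal(size=(4 * units,))

# Stacked form: one matmul on the concatenated [x, h_prev].
xh = np.concatenate([x, h_prev], axis=1)
stacked = xh @ W + b

# Split form: the first input_size rows act on x, the remaining rows on h_prev.
kernel, recurrent_kernel = W[:input_size], W[input_size:]
split = x @ kernel + h_prev @ recurrent_kernel + b

print(np.allclose(stacked, split))
```

This is why the row split at input_size is the right place to cut the checkpoint kernel, independently of the gate order along the columns.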

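The rest of the computation is the same (shown here in the LSTMBlockCell form, where forget_bias, use_peephole and cell_clip are options of the block cell):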

f = f + forget_bias
if not use_peephole:
  wci = wcf = wco = 0

i = sigmoid(cs_prev * wci + i)
f = sigmoid(cs_prev * wcf + f)
ci = tanh(ci)

cs = ci .* i + cs_prev .* f
cs = clip(cs, cell_clip)

o = sigmoid(cs * wco + o)
co = tanh(cs)
h = co .* o

Finally, the code using numpy functions:

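import numpy as np

# kernel in LSTMBlockCell has shape [input_size + units, 4 * units]
def adjust_order(var):
  # Block order is [i, ci, f, o]; the Keras LSTM expects [i, f, ci, o].
  i, ci, f, o = np.split(var, 4, axis=-1)
  return np.concatenate([i, f, ci, o], axis=-1)

new_kernel = adjust_order(kernel)
new_bias = adjust_order(bias)

# Rows [:input_size] form the input kernel, the remaining rows the recurrent kernel.
lstm_layer.set_weights([new_kernel[:input_size, :], new_kernel[input_size:, :], new_bias])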
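One detail worth checking when the outputs still differ after reordering: the pseudocode above adds forget_bias to the forget gate at run time. As far as I know, the contrib block cells default to forget_bias=1.0, while tf.keras.layers.LSTM has no such run-time term, so the value has to be folded into the forget-gate slice of the bias. The numpy-only sketch below checks this on random data. All function names are hypothetical, it ignores peepholes and cell clipping, and the forget_bias value of the original model is an assumption you should verify.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def block_step(x, h_prev, cs_prev, W, b, forget_bias=1.0):
    """One step in block order [i, ci, f, o]; no peepholes, no cell clipping."""
    xh = np.concatenate([x, h_prev], axis=1)
    i, ci, f, o = np.split(xh @ W + b, 4, axis=1)
    f = f + forget_bias
    cs = np.tanh(ci) * sigmoid(i) + cs_prev * sigmoid(f)
    h = np.tanh(cs) * sigmoid(o)
    return h, cs

def keras_style_step(x, h_prev, c_prev, kernel, recurrent_kernel, bias):
    """One step in Keras order [i, f, c, o], sigmoid gates, no forget_bias term."""
    z = x @ kernel + h_prev @ recurrent_kernel + bias
    i, f, c, o = np.split(z, 4, axis=1)
    c_new = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(c)
    h = sigmoid(o) * np.tanh(c_new)
    return h, c_new

def convert(W, b, input_size, forget_bias=1.0):
    """Reorder gate columns and fold forget_bias into the forget-gate bias."""
    def reorder(v):
        i, ci, f, o = np.split(v, 4, axis=-1)
        return np.concatenate([i, f, ci, o], axis=-1)
    units = b.shape[0] // 4
    new_W, new_b = reorder(W), reorder(b).copy()
    new_b[units:2 * units] += forget_bias  # forget gate is the second block in Keras order
    return new_W[:input_size], new_W[input_size:], new_b

rng = np.random.default_rng(0)
input_size, units, batch = 3, 2, 4  # tiny illustrative sizes
W = rng.normal(size=(input_size + units, 4 * units))
b = rng.normal(size=(4 * units,))
x = rng.normal(size=(batch, input_size))
h0 = rng.normal(size=(batch, units))
c0 = rng.normal(size=(batch, units))

h_block, c_block = block_step(x, h0, c0, W, b)
h_keras, c_keras = keras_style_step(x, h0, c0, *convert(W, b, input_size))
print(np.allclose(h_block, h_keras), np.allclose(c_block, c_keras))
```

The three arrays returned by convert are in the order that set_weights expects for a Keras LSTM layer: kernel, recurrent_kernel, bias. If the original model was built with forget_bias=0.0, pass that value and the fold becomes a no-op.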

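【Comments】:

Thank you for the clear answer. Because I later hit an error elsewhere in my processing, I am not sure whether this solved my checkpoint-reading problem, but it felt impolite to leave you waiting without any response.
Thanks. In my experience, reducing things to a minimal problem and digging into it will solve almost anything.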