According to your comment:
[The] data I have is like t-48, t-47, t-46, ..... , t-1 as past data and
t+1, t+2, ......, t+12 as the values I want to forecast
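you may not need a TimeDistributed layer at all.
First, just remove the return_sequences=True argument of the LSTM layer. Once that is done, the LSTM layer encodes the past input time series into a vector of shape (50,). You can then feed that vector directly to a Dense layer with 12 units: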
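import numpy as np
from keras.layers import Input, LSTM, Dense

# X, y and power_lstm_init are the ones defined in your own code
# make sure the labels are in shape (num_samples, 12)
y = np.reshape(y, (-1, 12))

# shape must be a plain tuple such as (48, 1), not a tuple nested in a tuple
power_in = Input(shape=X.shape[1:])
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563,
                  kernel_initializer=power_lstm_init)(power_in)
main_out = Dense(12, kernel_initializer=power_lstm_init)(power_lstm)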
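If it helps, here is a minimal NumPy sketch of how arrays with the shapes assumed above could be built from a single 1-D series. The function name make_windows and the toy series are hypothetical, and the sketch assumes the 12 targets immediately follow the 48 inputs, which may differ from how your data is actually laid out.

```python
import numpy as np

def make_windows(series, n_past=48, n_future=12):
    """Slice a 1-D series into (past, future) training pairs (hypothetical helper)."""
    series = np.asarray(series, dtype="float32")
    n_samples = len(series) - n_past - n_future + 1
    if n_samples <= 0:
        raise ValueError("series is shorter than n_past + n_future")
    # Row i of idx holds the positions i, i+1, ..., i + n_past + n_future - 1
    idx = np.arange(n_past + n_future)[None, :] + np.arange(n_samples)[:, None]
    windows = series[idx]
    X = windows[:, :n_past, None]  # (n_samples, n_past, 1): one feature per time step
    y = windows[:, n_past:]        # (n_samples, n_future): the values to forecast
    return X, y

series = np.arange(100)            # toy data standing in for the real measurements
X, y = make_windows(series)
print(X.shape, y.shape)            # (41, 48, 1) (41, 12)
```

With arrays like these, X.shape[1:] is (48, 1), which is what the Input layer expects, and y already has the (num_samples, 12) shape that the Dense(12) output needs.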
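Alternatively, if you would like to use the TimeDistributed layer, and considering that the output is itself a sequence, you can explicitly enforce this temporal dependency in the model by putting another LSTM layer before the Dense layer. A RepeatVector layer is added after the first LSTM layer to turn its output into a sequence of length 12, i.e. the same length as the output sequence: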
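import numpy as np
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from keras.models import Model

# y and power_lstm_init are the ones defined in your own code
# make sure the labels are in shape (num_samples, 12, 1)
y = np.reshape(y, (-1, 12, 1))

power_in = Input(shape=(48, 1))
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563,
                  kernel_initializer=power_lstm_init)(power_in)

# repeat the encoded vector 12 times so the second LSTM sees a length-12 sequence
rep = RepeatVector(12)(power_lstm)
out_lstm = LSTM(32, return_sequences=True)(rep)
main_out = TimeDistributed(Dense(1))(out_lstm)

model = Model(power_in, main_out)
model.summary()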
Model summary:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_3 (InputLayer)         (None, 48, 1)             0
_________________________________________________________________
lstm_3 (LSTM)                (None, 50)                10400
_________________________________________________________________
repeat_vector_2 (RepeatVecto (None, 12, 50)            0
_________________________________________________________________
lstm_4 (LSTM)                (None, 12, 32)            10624
_________________________________________________________________
time_distributed_1 (TimeDist (None, 12, 1)             33
=================================================================
Total params: 21,057
Trainable params: 21,057
Non-trainable params: 0
_________________________________________________________________
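As a sanity check, the parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for an LSTM layer with biases (four gates, each with input weights, recurrent weights and a bias) and for a Dense layer; the helper names lstm_params and dense_params are made up for illustration.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with (input_dim x units) input weights,
    # (units x units) recurrent weights and a bias of length units
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # one weight per (input, unit) pair plus one bias per unit
    return units * input_dim + units

counts = [
    lstm_params(50, 1),    # lstm_3: 1 input feature -> 50 units
    lstm_params(32, 50),   # lstm_4: 50 repeated features -> 32 units
    dense_params(1, 32),   # Dense(1) inside TimeDistributed, shared across the 12 steps
]
print(counts, sum(counts))  # [10400, 10624, 33] 21057
```

The Dense layer contributes only 33 parameters even though it produces 12 outputs, because the same weights are reused at every time step.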
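Of course, in both models you may need to tune the hyperparameters (e.g. the number of LSTM layers, the dimension of the LSTM layers, etc.) before you can compare them fairly and get good results.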
Side note: actually, in your scenario you don't need the TimeDistributed layer at all, because (currently) the Dense layer is applied on the last axis. Therefore, TimeDistributed(Dense(...)) and Dense(...) are equivalent here.
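To see why the two are equivalent, here is a small NumPy sketch that imitates what each variant computes, rather than calling Keras itself. The arrays x, W and b are random stand-ins for the output of the second LSTM and for the kernel and bias of Dense(1); they are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 12, 32))   # (batch, time steps, features), shaped like out_lstm
W = rng.normal(size=(32, 1))       # stand-in for the Dense(1) kernel
b = rng.normal(size=(1,))          # stand-in for the Dense(1) bias

# Dense on a 3-D input: the kernel contracts the last axis only
dense_out = x @ W + b              # shape (4, 12, 1)

# TimeDistributed(Dense): apply the same kernel to each time step separately
td_out = np.stack([x[:, t, :] @ W + b for t in range(x.shape[1])], axis=1)

print(dense_out.shape, np.allclose(dense_out, td_out))  # (4, 12, 1) True
```

Both paths use one shared set of weights for all 12 steps, so the results match and the parameter count is the same either way.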