LSTM Keras: sorting out the X and y input dimensions

【Question title】: LSTM Keras sorting out the X and y input dimensions
【Posted】: 2020-05-05 00:01:32
【Question】:

I am trying to build an LSTM, but I am confused about the best way to shape my data.

I have a dataframe that looks like this:

df.head(5)


 data                                                     labels
0  [0.0009808844009380855, 0.0008974465127279559]             1
1  [0.0007158940267629654, 0.0008202958833774329]             3
2  [0.00040971929722210984, 0.000393972522972382]             3
3  [7.916243163372941e-05, 7.401835468434177e243]             3
4  [8.447556379936086e-05, 8.600626393842705e-05]             3

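The "data" column is my X and "labels" is my y. The df has 34890 rows, and each row contains 2 floats. The data represents a body of sequential text, where each observation is a representation of one sentence. There are 5 classes.

I am trying to fit an LSTM to this data, but I am confused about how to use the timestep parameter.

With this code, I get the following: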

data = np.array(df.class_proba.to_list())

labels = pd.get_dummies(df['speaker_spaff']).values

print('Shape of data tensor:', data.shape)
print('Shape of label tensor:', labels.shape)

Shape of data tensor: (34890, 2)
Shape of label tensor: (34890, 5)

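I think my label tensor is correct, but I am confused about my data tensor.

The Keras LSTM layer expects input of shape (samples, timesteps, features).

If I understand correctly, my number of samples is 34890 and my number of features is 2, but what about the timesteps? What should the timestep dimension be, and how do I reshape my data to fit it?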


Tags: python tensorflow keras deep-learning lstm

【Answer 1】:

If you need more than one timestep, you have to write a sliding-window function that reshapes the data into overlapping sequences. Keras's TimeseriesGenerator is a good tool for this.
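The sliding-window idea can be sketched in plain NumPy. This is only an illustration, not the Keras utility itself: `make_windows` is a hypothetical helper, the random arrays stand in for the question's data, and pairing each window with the label of its last row is an assumption about what the model should predict.

```python
import numpy as np

def make_windows(data, labels, timesteps):
    """Group `timesteps` consecutive rows into one sample.

    Assumption: each window is labeled with the label of its last row.
    """
    n_windows = len(data) - timesteps + 1
    # Window i covers rows i .. i + timesteps - 1
    X = np.stack([data[i:i + timesteps] for i in range(n_windows)])
    # Label of the last row in each window
    y = labels[timesteps - 1:]
    return X, y

# Stand-in data with the same layout as in the question:
# 2 features per row and 5 one-hot encoded classes.
rng = np.random.default_rng(0)
data = rng.random((100, 2))
labels = np.eye(5)[rng.integers(0, 5, size=100)]

X, y = make_windows(data, labels, timesteps=10)
print(X.shape, y.shape)  # (91, 10, 2) (91, 5)
```

With 100 rows and 10 timesteps there are 100 - 10 + 1 = 91 windows, so X already has the (samples, timesteps, features) layout the LSTM layer expects.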

If you think your data should have a single timestep, you only need to add a dimension:

data[:,None,:] ==> new shape: (34890, 1, 2); the labels are fine as they are.
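As a quick check of the single-timestep case, the snippet below uses a zero-filled placeholder array with the same shape as the data tensor in the question (the real values are not needed to see the effect on the shape):

```python
import numpy as np

# Placeholder with the shape reported in the question: (samples, features)
data = np.zeros((34890, 2))

# Insert a timestep axis of length 1 between samples and features.
# Indexing with None is equivalent to np.expand_dims(data, axis=1).
X = data[:, None, :]

print(X.shape)  # (34890, 1, 2)
```

Each sentence then becomes a sequence of length 1, so the LSTM layer would be configured for 1 timestep and 2 features.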
