Keras - Need help implementing an LSTM to make predictions on a very simple dataset

【Question title】: Keras - Need help implementing an LSTM to make predictions on a very simple dataset
【Posted】: 2020-11-30 15:58:11
【Question】:

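I am building a Hidden Markov Model and an LSTM neural network to make predictions on the same dataset, so that I can compare the performance of the two models. My HMM works well, but when I try to train my LSTM on the same dataset, I cannot get the network to learn anything. For reference, here is a generalized diagram of what I am trying to accomplish: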

LSTM Representational Diagram

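To implement the LSTM neural network, I followed this article, which uses a small Keras model to make predictions on a dataset with multiple inputs, like my problem. However, after implementing a model very similar to the one in the tutorial (code below), my accuracy never exceeds 40%. In fact, the accuracy is exactly the same from epoch 1 through whichever epoch I choose to stop training at. For some reason my loss is also extremely low regardless, which makes me think the accuracy should be higher. Because the loss and the accuracy disagree, I suspect that either my model represents my data completely incorrectly or the parameters in my model are completely wrong.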

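My dataset is very basic, so I feel like I am missing something big. I previously created a CNN fairly easily, and I assumed that making an LSTM would be easy as long as I followed a tutorial. If I want to create a very basic LSTM to make very basic predictions, what kind of model should I create? What loss function should I use for categorical classification with an LSTM? And the last specific question I can think of: what usually causes accuracy to never improve and always stay the same?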

Here is what I have so far for my LSTM implementation:

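# Imports required by the code below.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# Number of games to go back for next prediction.
TIME_STEPS = 1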

# Gets the game data from the generated CSV file.
# Column 1 - Game Number
# Column 2 - Result
# Column 3 - My Rating
# Column 4 - Opponent's Rating
dataFile = 'ChessData.csv'
data = pd.read_csv(dataFile, index_col='Game Number')
df = data.copy()

# Splits the CSV file into training and validation data.
train_size = int(len(df) * 0.8)
train_dataset, test_dataset = df.iloc[:train_size], df.iloc[train_size:]

# Splits the data based on target/dependent variables.
# Also creates the X and y for supervised learning.
X_train = train_dataset.drop('Result', axis=1)
y_train = train_dataset.loc[:, ['Result']]

# Splits the test data for X and y as well.
X_test = test_dataset.drop('Result', axis=1)
y_test = test_dataset.loc[:, ['Result']]

# Different scaler for input and output
scaler_x = MinMaxScaler(feature_range = (0,1))
scaler_y = MinMaxScaler(feature_range = (0,1))
# Fit the scaler using available training data
input_scaler = scaler_x.fit(X_train)
output_scaler = scaler_y.fit(y_train)
# Apply the scaler to training data
y_train = output_scaler.transform(y_train)
X_train = input_scaler.transform(X_train)
# Apply the scaler to test data
y_test = output_scaler.transform(y_test)
X_test = input_scaler.transform(X_test)

# Create a 3D input
def create_dataset (X, y, time_steps = 1):
    Xs, ys = [], []
    for i in range(len(X)-time_steps):
        v = X[i:i+time_steps, :]
        Xs.append(v)
        ys.append(y[i+time_steps])
    return np.array(Xs), np.array(ys)

# Creates the 3D input by calling create_dataset for both
# the training data and the testing data.
X_test, y_test = create_dataset(X_test, y_test, TIME_STEPS)
X_train, y_train = create_dataset(X_train, y_train, TIME_STEPS)


# Defines the LSTM Model
def create_model(units, m):
    model = Sequential()
    model.add(m (units = units, return_sequences = True,
                input_shape = [X_train.shape[1], X_train.shape[2]]))
    model.add(Dropout(0.2))
    model.add(m (units = units))
    model.add(Dropout(0.2))
    model.add(Dense(units = 1))
    #Compile model
    model.compile(optimizer=keras.optimizers.Adam(0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"])
    return model

# Creates an LSTM model instance
model_lstm = create_model(128, LSTM)

# Fits the LSTM Model
def fit_model(model):
    early_stop = keras.callbacks.EarlyStopping(monitor = 'val_loss',
                                              patience = 10)
    history = model.fit(X_train, y_train, epochs = 100, 
                      validation_split = 0.2, batch_size = 32,
                        shuffle = False, callbacks = [early_stop])
    return history

history_lstm = fit_model(model_lstm)

# Make prediction
def prediction(model):
    prediction = model.predict(X_test)
    prediction = scaler_y.inverse_transform(prediction)
    return prediction

prediction_lstm = prediction(model_lstm)
print(prediction_lstm)

【Question comments】:

Tags: python tensorflow machine-learning keras lstm

【Solution 1】:

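I see one problem in your network: you are using categorical cross-entropy with an output size of 1. I don't know what you are predicting, but if it is a binary classification (e.g. 0 or 1), you should use binary_crossentropy. If it is a multi-class problem, you should use categorical cross-entropy, set the size of the last layer to the number of classes you want to predict, and one-hot encode the labels.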
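To illustrate why a single output unit and categorical cross-entropy do not fit together, here is a small NumPy sketch of the loss. It is a re-implementation for illustration only, written under the assumption that the loss on probability outputs rescales each row to sum to 1, clips it, and takes the negative log-likelihood; the function and variable names are made up for this example. With one output column, the rescaling turns every prediction into 1, so the loss is close to zero whatever the network outputs, which may explain a very low loss next to an accuracy that never changes.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """NumPy sketch of categorical cross-entropy on probabilities (not logits)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Rescale each row so that the class scores sum to 1.
    y_pred = y_pred / y_pred.sum(axis=-1, keepdims=True)
    # Clip to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -(y_true * np.log(y_pred)).sum(axis=-1)

# One output column, as produced by Dense(units=1):
# three very different predictions for three different targets.
y_true = np.array([[0.0], [0.5], [1.0]])
y_pred = np.array([[0.1], [0.9], [5.0]])
single_unit_loss = categorical_crossentropy(y_true, y_pred)
print(single_unit_loss)  # every entry is practically zero

# Three output columns with one-hot targets:
# the loss now depends on how good the prediction is.
y_true_3 = np.array([[0, 0, 1], [0, 0, 1]])
y_pred_3 = np.array([[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]])
three_unit_loss = categorical_crossentropy(y_true_3, y_pred_3)
print(three_unit_loss)  # small for the good row, large for the bad row
```

Because the single-unit loss barely moves with the prediction, there is almost no gradient to learn from, which is consistent with the symptoms described in the question.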

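One-hot encoding, taking 4 classes as an example, means that each label is an array of length 4 that is all zeros except for a 1 at the position of the corresponding class: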

y1 = [0,0,1,0] #means third class 
y2 = [0,1,0,0] #means second class
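The same encoding can be produced programmatically. The following is a minimal sketch using NumPy; the helper name `one_hot` is hypothetical, and it assumes the classes are numbered from 0, so the third class has index 2.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=int)[labels]

# Class indices are zero-based: index 2 is the third class, index 1 the second.
encoded = one_hot([2, 1], num_classes=4)
print(encoded)
# [[0 0 1 0]
#  [0 1 0 0]]

# argmax recovers the original class indices.
decoded = encoded.argmax(axis=1)
print(decoded)  # [2 1]
```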

【Comments】:

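Thanks for the reply! This definitely helps, since my problem is a multi-class problem, which is why I used categorical cross-entropy. How do I change my Y so that each y is an array representing the one-hot encoding?
You have several options: you can do it yourself, use tf.one_hot (tensorflow.org/api_docs/python/tf/one_hot), or use this: stackoverflow.com/questions/29831489/….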
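As a rough sketch of how these suggestions might fit together, the snippet below one-hot encodes integer class labels and pairs them with a softmax output layer. Several things here are assumptions rather than details from the question: three classes (for example loss, draw, win), labels already stored as integers 0 to 2 and not passed through MinMaxScaler, a small 16-unit LSTM, and random dummy inputs standing in for the two rating features.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

NUM_CLASSES = 3   # assumption: e.g. loss / draw / win
TIME_STEPS = 1    # same window length as in the question
NUM_FEATURES = 2  # the two rating columns

# Hypothetical integer class labels; in the question these would be derived
# from the Result column, left unscaled.
y_int = np.array([0, 2, 1, 2])

# Option 1: Keras utility, returns a NumPy array.
y_onehot = keras.utils.to_categorical(y_int, num_classes=NUM_CLASSES)
# Option 2: tf.one_hot, returns a tensor holding the same values.
y_onehot_tf = tf.one_hot(y_int, depth=NUM_CLASSES)

# The last layer has one unit per class and a softmax activation, so each
# output row is a probability distribution over the classes.
model = keras.Sequential([
    keras.Input(shape=(TIME_STEPS, NUM_FEATURES)),
    keras.layers.LSTM(16),
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Dummy inputs, only to show the shapes involved.
X_dummy = np.random.rand(len(y_int), TIME_STEPS, NUM_FEATURES).astype("float32")
probs = model.predict(X_dummy, verbose=0)
predicted_class = probs.argmax(axis=1)  # class index per sample
```

With this layout the predicted class comes from `argmax` over the output probabilities, so an inverse transform of the target through a scaler would presumably no longer be needed.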