Posted: 2021-12-31 18:33:20
Question:
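I have a transformer model that is almost identical to the Keras example code for time-series data. I'm using a stream of stock data to practice classification with a transformer, aiming for a simple {0,1} separation. The problem is that all I get is a nan loss, and the accuracy never improves. Here is my model: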
```python
from tensorflow import keras
from tensorflow.keras import layers

def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
    # Attention and Normalization
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(inputs, inputs)
    x = layers.Dropout(dropout)(x)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    res = x + inputs
    # Feed Forward Part
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(res)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    return x + res
```
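Each encoder block above normalizes with `LayerNormalization`, which rescales every time step's feature vector as (x - mean) / sqrt(var + epsilon). The plain-Python sketch below is a simplified stand-in for that layer (it omits the learned scale and offset, and works on one vector rather than a tensor). It shows that a single non-finite feature is enough to turn the whole vector into nan, which attention can then mix into the other time steps.

```python
import math

def layer_norm(vec, epsilon=1e-6):
    """Simplified layer normalization over one feature vector.

    Omits the learned gamma/beta parameters of keras.layers.LayerNormalization.
    """
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + epsilon) for v in vec]

clean = layer_norm([1.0, 2.0, 3.0])
dirty = layer_norm([1.0, float("nan"), 3.0])  # one bad feature

print([round(v, 4) for v in clean])        # → [-1.2247, 0.0, 1.2247]
print(all(math.isnan(v) for v in dirty))   # → True
```

Because the mean and variance are shared across the vector, the nan in one feature reaches every output element, not just its own position.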
```python
def build_model(
    input_shape,
    head_size,
    num_heads,
    ff_dim,
    num_transformer_blocks,
    mlp_units,
    dropout=0,
    mlp_dropout=0,
    n_classes=0,
):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for _ in range(num_transformer_blocks):
        x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout)
    x = layers.GlobalAveragePooling1D(data_format="channels_first")(x)
    for dim in mlp_units:
        x = layers.Dense(dim, activation="relu")(x)
        x = layers.Dropout(mlp_dropout)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)
```
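In this model `GlobalAveragePooling1D(data_format="channels_first")` treats axis 1 of each `(14, 9)` sample as channels and averages over the last axis, so it collapses the 9 features and keeps one value per time step. A plain-Python sketch of that reduction for a single sample (the nested-list layout and the function name are illustrative assumptions; Keras operates on batched tensors):

```python
def global_avg_pool_1d(sample, data_format="channels_last"):
    """Average a (rows, cols) nested list the way GlobalAveragePooling1D would.

    channels_last:  rows are steps, cols are features -> average over rows.
    channels_first: rows are features, cols are steps -> average over cols.
    """
    if data_format == "channels_first":
        return [sum(row) / len(row) for row in sample]
    n_rows = len(sample)
    return [sum(row[j] for row in sample) / n_rows for j in range(len(sample[0]))]

# Hypothetical sample: 14 time steps x 9 features, like one item of the input.
sample = [[float(t)] * 9 for t in range(14)]

print(len(global_avg_pool_1d(sample, "channels_first")))  # → 14
print(len(global_avg_pool_1d(sample, "channels_last")))   # → 9
```

With `channels_first` the pooled vector has one entry per time step; switching to `channels_last` would instead average over time and keep one entry per feature.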
```python
model = build_model(
    training_input_data[0].shape,
    head_size=10,
    num_heads=4,
    ff_dim=2,
    num_transformer_blocks=8,
    mlp_units=[128],
    mlp_dropout=0.4,
    dropout=0.25,
    n_classes=num_class
)
```
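With `n_classes=num_class` and a softmax output trained on `categorical_crossentropy`, each label has to be a one-hot vector of length `num_class`. Assuming `num_class == 2` and integer labels in {0, 1} (the post does not show how `training_output_data` is built), the conversion that `keras.utils.to_categorical` performs looks roughly like this in plain Python; `one_hot` is a hypothetical helper name:

```python
def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows (like keras.utils.to_categorical)."""
    rows = []
    for label in labels:
        # Reject ids that would silently produce a wrong-width or empty target.
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} out of range for {num_classes} classes")
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

print(one_hot([0, 1, 1], num_classes=2))  # → [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
```

The range check catches labels such as -1 or 2 before they reach the loss.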
Below is how I compile and fit the model:
```python
model.compile(
    loss="categorical_crossentropy",
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    metrics=["binary_accuracy"],
)
callbacks = [keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)]
model.fit(
    training_input_data,
    training_output_data,
    epochs=EOPCHS,
    batch_size=32,
    callbacks=callbacks,
    validation_split=0.1
)
```
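For reference, categorical cross-entropy on a softmax output is -sum(y_true * log(y_pred)) per sample. Keras clips the predicted probabilities away from 0 and 1 before taking the log, so a finite softmax output by itself should give a finite loss even when the prediction is confidently wrong. The plain-Python sketch below assumes a clipping epsilon of 1e-7 (the default of `keras.backend.epsilon()`) and, unlike any guarantee about TensorFlow, simply lets nan pass through its clip; it shows a finite loss for finite outputs and a nan loss as soon as the prediction is nan.

```python
import math

EPSILON = 1e-7  # assumed to match the default of keras.backend.epsilon()

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_pred, eps=EPSILON):
    """-sum(y_true * log(clip(y_pred))) for one sample."""
    loss = 0.0
    for t, p in zip(y_true, y_pred):
        if not math.isnan(p):  # in this sketch NaN is left unclipped
            p = min(max(p, eps), 1.0 - eps)
        loss -= t * math.log(p)
    return loss

print(round(categorical_crossentropy([1.0, 0.0], softmax([0.0, 0.0])), 4))        # → 0.6931
print(round(categorical_crossentropy([0.0, 1.0], softmax([1000.0, -1000.0])), 2))  # → 16.12
print(math.isnan(categorical_crossentropy([1.0, 0.0], softmax([float("nan"), 0.0]))))  # → True
```

Since even an extreme wrong prediction stays finite (about -log(1e-7)), a loss that is nan from the very first epoch points at non-finite values reaching the softmax rather than at the loss function itself.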
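Note that all training data has been normalized as in the sample below (a small part of the whole dataset). The "Date" column is actually popped before the data is converted to a NumPy ndarray.
[Screenshot of training data]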
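The post does not say which normalization was applied. Assuming a per-column z-score, the normalization itself can introduce nan when a column has zero variance (for example, a column that happens to be constant), because the division becomes 0/0; NaN or inf values already present in the raw data would also pass straight through. The plain-Python sketch below (hypothetical helper names `zscore_columns` and `non_finite_positions`) normalizes column by column, reports such columns, and lists any non-finite positions; the same kind of check is worth running on `training_input_data` and `training_output_data` before calling `fit`.

```python
import math

def zscore_columns(rows):
    """Z-score each column of a 2-D nested list; report columns with bad std.

    Returns (normalized_rows, bad_columns). A zero or non-finite std would
    produce NaN, so those columns are reported and left as zeros.
    """
    n_cols = len(rows[0])
    out = [[0.0] * n_cols for _ in rows]
    bad_columns = []
    for j in range(n_cols):
        col = [row[j] for row in rows]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        if std == 0.0 or not math.isfinite(std):
            bad_columns.append(j)
            continue
        for i, v in enumerate(col):
            out[i][j] = (v - mean) / std
    return out, bad_columns

def non_finite_positions(rows):
    """List (row, col) positions holding NaN or +/-inf."""
    return [(i, j) for i, row in enumerate(rows)
            for j, v in enumerate(row) if not math.isfinite(v)]

raw = [[1.0, 5.0, 2.0], [3.0, 5.0, float("nan")]]  # made-up example rows
normalized, bad = zscore_columns(raw)
print(bad)                                 # → [1, 2]
print(non_finite_positions(raw))           # → [(1, 2)]
print(normalized[0][0], normalized[1][0])  # → -1.0 1.0
```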
The training output looks like this:
```
Epoch 1/200
1049/1049 [==============================] - 22s 17ms/step - loss: nan - binary_accuracy: 0.5000 - val_loss: nan - val_binary_accuracy: 0.5000
Epoch 2/200
1049/1049 [==============================] - 17s 16ms/step - loss: nan - binary_accuracy: 0.5000 - val_loss: nan - val_binary_accuracy: 0.5000
Epoch 3/200
1049/1049 [==============================] - 17s 16ms/step - loss: nan - binary_accuracy: 0.5000 - val_loss: nan - val_binary_accuracy: 0.5000
Epoch 4/200
1049/1049 [==============================] - 17s 16ms/step - loss: nan - binary_accuracy: 0.5000 - val_loss: nan - val_binary_accuracy: 0.5000
Epoch 5/200
1049/1049 [==============================] - 17s 16ms/step - loss: nan - binary_accuracy: 0.5000 - val_loss: nan - val_binary_accuracy: 0.5000
Epoch 6/200
  59/1049 [>.............................] - ETA: 16s - loss: nan - binary_accuracy: 0.5000
```
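It seems that categorical_crossentropy loss should work for a simple 0/1 classification whose last layer outputs a softmax. Yet the model seems to learn nothing: for a 0/1 task, the accuracy stays stuck at 0.5.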
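The constant 0.5 is consistent with the nan loss rather than a separate symptom. `binary_accuracy` thresholds each predicted value at 0.5, compares it element-wise with the label, and averages over the last axis. A comparison involving nan is always false, so a nan prediction counts as 0 for both outputs, and against a two-class one-hot label such as [1, 0] exactly one of the two elements then matches. The plain-Python re-implementation below is a sketch of the metric's logic, not the Keras source:

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Mean over samples of the per-element match rate after thresholding."""
    per_sample = []
    for t_row, p_row in zip(y_true, y_pred):
        # NaN > threshold is False, so a NaN prediction is treated as 0.
        hits = [float(t == (1.0 if p > threshold else 0.0)) for t, p in zip(t_row, p_row)]
        per_sample.append(sum(hits) / len(hits))
    return sum(per_sample) / len(per_sample)

nan = float("nan")
labels = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # one-hot labels for 2 classes
print(binary_accuracy(labels, [[nan, nan]] * 3))  # → 0.5
```

For two-class one-hot labels this gives exactly 0.5 whatever the class balance, which matches both `binary_accuracy` and `val_binary_accuracy` in the log.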
Any ideas?
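Comments:
- How big is your dataset?
- The input shape is (37286, 14, 9). Is that too small for training?
- By default we don't add dropout layers (they can be harmful to training); we add them only when we suspect overfitting. But this is not a programming question.
- OK, I'll try removing the dropout in the encoder loop. I'm also trying some new strategies for getting training data, hopefully more of it. Anything else?
- Also, are you suggesting removing the dropout after the multi-head attention, the ones in the feed-forward part, or both? I'll give it a try anyway.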
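The step count in the training log is consistent with the input shape given in the comments. With `validation_split=0.1`, Keras holds out the last 10% of the arrays (before shuffling) for validation and trains on the first floor(37286 × 0.9) samples; at `batch_size=32` that is ceil(33557 / 32) batches per epoch, matching the `1049/1049` in the log. A quick check:

```python
import math

n_samples = 37286        # first dimension of the input shape from the comments
validation_split = 0.1
batch_size = 32

n_train = math.floor(n_samples * (1.0 - validation_split))
n_val = n_samples - n_train
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)  # → 33557 3729 1049
```

If the rows are in date order, those 3729 validation samples are the most recent period of the stock data.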
Tags: tensorflow machine-learning keras