【Question Title】: Evaluation of loss function returns error in LSTM model
【Posted】: 2021-04-04 17:03:33
【Question Description】:

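I am trying to fit an LSTM model with pretrained embeddings for text generation, using the TensorFlow tf.keras.Sequential API. Evaluating the loss raises the following error: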

tensorflow.python.framework.errors_impl.InvalidArgumentError:  assertion failed: [Condition x == y did not hold element-wise:] [x (sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/Shape_1:0) = ] [5 199] [y (sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/strided_slice:0) = ] [200 199]
     [[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/assert_equal_1/Assert/Assert (defined at <input>:161) ]] [Op:__inference_train_function_4885]

My model is as follows:

def build_model(vocab_size, embedding_dim, rnn_units, batch_size, embedding_matrix):
    model = tf.keras.Sequential([
        #vocab_size = 30000, embedding_dim = 300, batch_size=64, embedding_matrix.shape = (30000, 300) 
        tf.keras.layers.Embedding(vocab_size, embedding_dim, weights=[embedding_matrix], trainable=False, batch_input_shape=[max_len, None]),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.LSTM(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.LSTM(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model


model = build_model(
    vocab_size=len(vocab),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=batch_size,
    embedding_matrix=embedding_matrix
)

optimizer = tf.keras.optimizers.Adam()
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
patience = 10
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=patience)
checkpoint_dir = './checkpoints'+ datetime.datetime.now().strftime("_%Y.%m.%d-%H:%M:%S")
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True
)

history = model.fit(text_ds, epochs=epochs, callbacks=[checkpoint_callback, early_stop], validation_data=text_ds)

After looking at similar questions, the problem appears to be related to the input and output shapes. Even so, I cannot work out what is wrong.

The model summary is:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (200, None, 300)          9000000   
_________________________________________________________________
dropout (Dropout)            (200, None, 300)          0         
_________________________________________________________________
lstm (LSTM)                  (200, None, 1024)         5427200   
_________________________________________________________________
dropout_1 (Dropout)          (200, None, 1024)         0         
_________________________________________________________________
lstm_1 (LSTM)                (200, None, 1024)         8392704   
_________________________________________________________________
dropout_2 (Dropout)          (200, None, 1024)         0         
_________________________________________________________________
dense (Dense)                (200, None, 30000)        30750000  
=================================================================
Total params: 53,569,904
Trainable params: 44,569,904
Non-trainable params: 9,000,000
_________________________________________________________________

The input and output shapes of the layers are as follows:

Output: 
(200, None, 300)
(200, None, 300)
(200, None, 1024)
(200, None, 1024)
(200, None, 1024)
(200, None, 1024)
(200, None, 30000)

Input: 
(200, None)
(200, None, 300)
(200, None, 300)
(200, None, 1024)
(200, None, 1024)
(200, None, 1024)
(200, None, 1024)

Edit:

After setting return_sequences=False on the last LSTM layer, I get:

tensorflow.python.framework.errors_impl.InvalidArgumentError:  Incompatible shapes: [200,199,300] vs. [5,199,300]
     [[node sequential/dropout/dropout/Mul_1 (defined at <input>:161) ]] [Op:__inference_train_function_4801]

In this case, the model summary is:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (200, None, 300)          9000000   
_________________________________________________________________
dropout (Dropout)            (200, None, 300)          0         
_________________________________________________________________
lstm (LSTM)                  (200, None, 1024)         5427200   
_________________________________________________________________
dropout_1 (Dropout)          (200, None, 1024)         0         
_________________________________________________________________
lstm_1 (LSTM)                (200, 1024)               8392704   
_________________________________________________________________
dropout_2 (Dropout)          (200, 1024)               0         
_________________________________________________________________
dense (Dense)                (200, 30000)              30750000  
=================================================================
Total params: 53,569,904
Trainable params: 44,569,904
Non-trainable params: 9,000,000
_________________________________________________________________

With input shapes:

(200, None)
(200, None, 300)
(200, None, 300)
(200, None, 1024)
(200, None, 1024)
(200, 1024)
(200, 1024)

【Question Comments】:

  • Please add the output of model.summary().
  • @Andrey Thanks for your reply. The question has been updated.
  • @Joachim I think you need return_sequences=False in the last LSTM layer.
  • @MarcoCerliani The error I get with return_sequences=False on the last layer seems closely related: 'tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [200,199,300] vs. [5,199,300] [[node sequential/dropout/dropout/Mul_1 (defined at <input>:161) ]] [Op:__inference_train_function_4801]'. Note that [200,199] and [5,199] also appear in the error from the original post.

Tags: python tensorflow keras keras-layer word-embedding


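【Solution 1】:

Change the batch_input_shape argument so that its first entry is the batch size of the data (5, according to the error message) rather than max_len: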

    tf.keras.layers.Embedding(vocab_size, embedding_dim, weights=[embedding_matrix], trainable=False, batch_input_shape=[5, None]),
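
To see why the first entry matters, here is a minimal pure-Python sketch of the shape bookkeeping. It does not call TensorFlow. The helper names logits_shape and labels_match are hypothetical, and the numbers (200, 5, 199, 30000) are read off the question's error message and model summary. The sketch assumes the rule that sparse cross-entropy expects the labels' shape to equal the logits' shape without its last (class) axis, which is what the failed assertion compares.

```python
def logits_shape(batch_input_shape, seq_len, vocab_size, return_sequences=True):
    """Shape the model emits; the batch size is taken from batch_input_shape[0]."""
    batch = batch_input_shape[0]
    if return_sequences:
        return (batch, seq_len, vocab_size)
    return (batch, vocab_size)


def labels_match(logits, labels):
    """Sparse cross-entropy expects labels.shape == logits.shape[:-1]."""
    return tuple(labels) == tuple(logits[:-1])


labels = (5, 199)  # label batch shape reported in the error message

# Question's setup: max_len (200) was passed where the batch size belongs.
wrong = logits_shape([200, None], seq_len=199, vocab_size=30000)
# Suggested change: the first entry equals the dataset's batch size.
fixed = logits_shape([5, None], seq_len=199, vocab_size=30000)

print(wrong[:-1], labels_match(wrong, labels))  # (200, 199) False
print(fixed[:-1], labels_match(fixed, labels))  # (5, 199) True
```

Because the LSTM layers are stateful, the batch size is fixed when the model is built, so every batch the dataset yields would presumably need exactly that size.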

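【Comments】:

  • Thanks for your reply. I have tried that and still get: 'ValueError: If a RNN is stateful, it needs to know its batch size. Specify the batch size of your input tensors: - If using a Sequential model, specify the batch size by passing a `batch_input_shape` argument to your first layer. - If using the functional API, specify the batch size by passing a `batch_shape` argument to your Input layer.' From what I have read, the batch size needs to be passed to the first layer of the network.
  • Thank you very much for the hint and the help. I still cannot get it to work; I now get: 'tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [5,30000] and labels shape [995] [[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at <input>:161) ]] [Op:__inference_train_function_4844]'. Still, this may be another path to explore.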
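
The numbers in that last error can be checked by hand: 995 is 5 × 199, one label per time step, while logits of shape [5,30000] contain only one prediction per sequence. The sketch below is plain Python, not TensorFlow. The helper flattened_first_dims is hypothetical, and it assumes the loss flattens the labels and collapses every logits axis except the last, which is consistent with the shapes the message reports.

```python
from math import prod


def flattened_first_dims(logits_shape, labels_shape):
    """Return (rows of logits, number of labels) after flattening."""
    return prod(logits_shape[:-1]), prod(labels_shape)


# return_sequences=False on the last LSTM: one prediction per sequence.
print(flattened_first_dims((5, 30000), (5, 199)))       # (5, 995)
# return_sequences=True on the last LSTM: one prediction per time step.
print(flattened_first_dims((5, 199, 30000), (5, 199)))  # (995, 995)
```

Under these assumptions, per-token targets of shape (5, 199) only line up with the model output when the last LSTM keeps return_sequences=True.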