【Title】:TypeError: Inputs to a layer should be tensors. Got: last_hidden_state
【Posted】:2021-12-30 00:20:04
【Question】:

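I have been trying to train a sentence similarity model with BERT, but I keep running into this error. I have searched everywhere and could not find a solution. Can someone help me fix it? The code is attached for reference.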

# Create the model under a distribution strategy scope.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Encoded token ids from BERT tokenizer.
    input_ids = tf.keras.layers.Input(
        shape=(max_length,), dtype=tf.int32, name="input_ids"
    )
    # Attention masks indicate to the model which tokens should be attended to.
    attention_masks = tf.keras.layers.Input(
        shape=(max_length,), dtype=tf.int32, name="attention_masks"
    )
    # Token type ids are binary masks identifying different sequences in the model.
    token_type_ids = tf.keras.layers.Input(
        shape=(max_length,), dtype=tf.int32, name="token_type_ids"
    )
    # Loading pretrained BERT model.
    bert_model = transformers.TFBertModel.from_pretrained("bert-base-uncased")
    # Freeze the BERT model to reuse the pretrained features without modifying them.
    bert_model.trainable = False

    sequence_output, pooled_output = bert_model(
        input_ids, attention_mask=attention_masks, token_type_ids=token_type_ids
    )
    # Add trainable layers on top of frozen layers to adapt the pretrained features on the new data.
    bi_lstm = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True)
    )(sequence_output)
    # Applying hybrid pooling approach to bi_lstm sequence output.
    avg_pool = tf.keras.layers.GlobalAveragePooling1D()(bi_lstm)
    max_pool = tf.keras.layers.GlobalMaxPooling1D()(bi_lstm)
    concat = tf.keras.layers.concatenate([avg_pool, max_pool])
    dropout = tf.keras.layers.Dropout(0.3)(concat)
    output = tf.keras.layers.Dense(3, activation="softmax")(dropout)
    model = tf.keras.models.Model(
        inputs=[input_ids, attention_masks, token_type_ids], outputs=output
    )

    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="categorical_crossentropy",
        metrics=["acc"],
    )


print(f"Strategy: {strategy}")
model.summary()

【Comments】:

    Tags: python tensorflow huggingface-transformers bert-language-model


    【Solution 1】:

    You have to explicitly access the last_hidden_state entry of the BERT model's output:

    with strategy.scope():
        # Encoded token ids from BERT tokenizer.
        input_ids = tf.keras.layers.Input(
            shape=(max_length,), dtype=tf.int32, name="input_ids"
        )
        # Attention masks indicate to the model which tokens should be attended to.
        attention_masks = tf.keras.layers.Input(
            shape=(max_length,), dtype=tf.int32, name="attention_masks"
        )
        # Token type ids are binary masks identifying different sequences in the model.
        token_type_ids = tf.keras.layers.Input(
            shape=(max_length,), dtype=tf.int32, name="token_type_ids"
        )
        # Loading pretrained BERT model.
        bert_model = transformers.TFBertModel.from_pretrained("bert-base-uncased")
        # Freeze the BERT model to reuse the pretrained features without modifying them.
        bert_model.trainable = False
    
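        # Keep the whole model output here and pick the tensor out by key below,
        # instead of tuple-unpacking it.
        bert_output = bert_model(
            input_ids, attention_mask=attention_masks, token_type_ids=token_type_ids
        )
    
        # Add trainable layers on top of frozen layers to adapt the pretrained features on the new data.
        bi_lstm = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(64, return_sequences=True)
        )(bert_output["last_hidden_state"])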
        # Applying hybrid pooling approach to bi_lstm sequence output.
        avg_pool = tf.keras.layers.GlobalAveragePooling1D()(bi_lstm)
        max_pool = tf.keras.layers.GlobalMaxPooling1D()(bi_lstm)
        concat = tf.keras.layers.concatenate([avg_pool, max_pool])
        dropout = tf.keras.layers.Dropout(0.3)(concat)
        output = tf.keras.layers.Dense(3, activation="softmax")(dropout)
        model = tf.keras.models.Model(
            inputs=[input_ids, attention_masks, token_type_ids], outputs=output
        )
    
        model.compile(
            optimizer=tf.keras.optimizers.Adam(),
            loss="categorical_crossentropy",
            metrics=["acc"],
        )
    
    print(f"Strategy: {strategy}")
    model.summary()
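
A likely reason the original code fails, sketched without TensorFlow: if the model output behaves like an ordered mapping (as it appears to in recent transformers versions), then tuple-unpacking it iterates over the keys, so `sequence_output` ends up being the string "last_hidden_state" rather than a tensor. The `fake_output` object and its placeholder values below are hypothetical stand-ins, not the real model output.

```python
from collections import OrderedDict

# Hypothetical stand-in for the model output: an ordered mapping from
# output names to values (placeholder strings here instead of tensors).
fake_output = OrderedDict(
    last_hidden_state="<tensor: batch x seq_len x hidden>",
    pooler_output="<tensor: batch x hidden>",
)

# Tuple-unpacking a mapping iterates over its keys, not its values.
unpacked_first, unpacked_second = fake_output
print(type(unpacked_first).__name__, unpacked_first)

# Looking the entry up by key returns the value itself.
looked_up = fake_output["last_hidden_state"]
print(looked_up)
```

This matches the error message, which reports the key name where a tensor was expected.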
    

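    If you want to use all hidden states rather than only the last one, try the following. Note that you have to set the output_hidden_states parameter of BertConfig to True and pass this config to the BERT model. The output then contains a sequence of 13 hidden states (for bert-base-uncased, the embedding output plus one per encoder layer), which is why they are concatenated before being passed to the LSTM layer.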

    import tensorflow as tf
    import transformers
    from transformers import BertConfig
    
    # Loading pretrained BERT model, configured to return every hidden state.
    config = BertConfig.from_pretrained("bert-base-uncased", output_hidden_states=True)
    bert_model = transformers.TFBertModel.from_pretrained("bert-base-uncased", config=config)
    # Freeze the BERT model to reuse the pretrained features without modifying them.
    bert_model.trainable = False
    
    # input_ids, attention_masks and token_type_ids are the Keras Input layers defined in the previous snippet.
    bert_output = bert_model(
        input_ids, attention_mask=attention_masks, token_type_ids=token_type_ids
    )
    
    # Concatenate the hidden states along the sequence axis (axis=1).
    all_hidden_states = tf.concat(bert_output["hidden_states"], axis=1)
    bi_lstm = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True)
    )(all_hidden_states)
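
To see what that concatenation does to the tensor shape, here is a small shape-arithmetic sketch in plain Python. The `concat_shape` helper is hypothetical, and the batch size and max_length values are assumed for illustration; 768 is the hidden size of bert-base-uncased. With axis=1 the LSTM sees a sequence 13 times longer; concatenating along the last axis instead (not what the snippet above does) would keep the sequence length and widen the features.

```python
def concat_shape(shapes, axis):
    """Return the shape produced by concatenating arrays with `shapes` along `axis`."""
    rank = len(shapes[0])
    axis = axis % rank  # normalise negative axes such as -1
    for shape in shapes:
        if len(shape) != rank:
            raise ValueError("all inputs must have the same rank")
        for dim in range(rank):
            # Every dimension except the concatenation axis has to match.
            if dim != axis and shape[dim] != shapes[0][dim]:
                raise ValueError("non-concatenated dimensions must match")
    result = list(shapes[0])
    result[axis] = sum(shape[axis] for shape in shapes)
    return tuple(result)


# Assumed example sizes: batch of 32, max_length of 128.
batch, max_length, hidden = 32, 128, 768
hidden_state_shapes = [(batch, max_length, hidden)] * 13

along_sequence = concat_shape(hidden_state_shapes, axis=1)
along_features = concat_shape(hidden_state_shapes, axis=-1)
print(along_sequence)  # (32, 1664, 768)
print(along_features)  # (32, 128, 9984)
```

Which axis is preferable depends on the task; the longer sequence from axis=1 makes the LSTM correspondingly slower.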
    

    【Comments】:
