How to use Sentence-BERT with transformers and torch

【Question title】: How to use Sentence-BERT with transformers and torch
【Posted】: 2021-12-08 13:23:46
【Question description】:

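I want to use sentence_transformers,
but due to policy restrictions I cannot install the sentence-transformers package.

I do have the transformers and torch packages, though.

I went to this page and tried to run the code below.

Before that, I went to the page and downloaded all the files.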

import os
path="/yz/sentence-transformers/multi-qa-mpnet-base-dot-v1/" #local path where I have stored files
os.listdir(path)

['.dominokeep',
 'config.json',
 'data_config.json',
 'modules.json',
 'sentence_bert_config.json',
 'special_tokens_map.json',
 'tokenizer_config.json',
 'train_script.py',
 'vocab.txt',
 'tokenizer.json',
 'config_sentence_transformers.json',
 'README.md',
 'gitattributes',
 '9e1e76b7a067f72e49c7f571cd8e811f7a1567bec49f17e5eaaea899e7bc2c9e']

The code I ran is

from transformers import AutoTokenizer, AutoModel
import torch

# Load model from a local copy of the HuggingFace Hub files (the Hub calls are disabled below)

path="/yz/sentence-transformers/multi-qa-mpnet-base-dot-v1/"

"""tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/multi-qa-mpnet-base-dot-v1")
model = AutoModel.from_pretrained("sentence-transformers/multi-qa-mpnet-base-dot-v1")"""

tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModel.from_pretrained(path)

The error I get is as follows

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-18-bb33f7c519e0> in <module>
     32 model = AutoModel.from_pretrained("sentence-transformers/multi-qa-mpnet-base-dot-v1")"""
     33 
---> 34 tokenizer = AutoTokenizer.from_pretrained(path)
     35 model = AutoModel.from_pretrained(path)
     36 

/usr/local/anaconda/lib/python3.6/site-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    308         config = kwargs.pop("config", None)
    309         if not isinstance(config, PretrainedConfig):
--> 310             config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
    311 
    312         if "bert-base-japanese" in str(pretrained_model_name_or_path):

/usr/local/anaconda/lib/python3.6/site-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    342 
    343         if "model_type" in config_dict:
--> 344             config_class = CONFIG_MAPPING[config_dict["model_type"]]
    345             return config_class.from_dict(config_dict, **kwargs)
    346         else:

KeyError: 'mpnet'

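My questions:

How should I resolve this error?
Is there a way to use the same approach with MiniLM-L6-H384-uncased? I would like to use it because it seems to be faster.

=============================== Package versions are as follows -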

transformers - 4.0.0
torch - 1.4.0

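【Comments】:

I'll share my transformers version shortly. Were you able to get MiniLM-L6-H384-uncased to work?
The package versions are transformers 4.0.0 and torch 1.4.0... Which version of transformers are you using?
MPNet was added in transformers 4.1.0. Can you upgrade your package? I haven't tried it, but MiniLM-L6-H384-uncased appears to be a BERT model, so you should be able to load it with 4.0.0.
Could you try MiniLM-L6-H384-uncased? I'm running into problems... I may not be able to update my packages, so MiniLM-L6-H384-uncased seems to be the only option. I don't remember exactly right now, but I think I only managed to get the tokenizer working for it; model = AutoModel.from_pretrained(path) fails :(.
You are right, you get an error message because pytorch_model.bin was created with a newer version: RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132). Maybe you can check whether someone has created a conversion script.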
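
The KeyError discussed in the comments comes from looking up the `model_type` value of config.json in the installed library's CONFIG_MAPPING. A minimal sketch of checking this before calling from_pretrained is shown here. The helper names (`read_model_type`, `supported_model_types`, `explain`) are hypothetical, and the sketch assumes CONFIG_MAPPING can be imported from the top-level transformers package and exposes `keys()`, which I believe holds for the 4.x releases but have not verified for every version.

```python
import json
import os


def read_model_type(model_dir):
    """Return the `model_type` declared in <model_dir>/config.json, or None."""
    with open(os.path.join(model_dir, "config.json"), encoding="utf-8") as fh:
        return json.load(fh).get("model_type")


def supported_model_types():
    """Return the model types the installed transformers knows, or None.

    Assumption: CONFIG_MAPPING is importable from the top-level package.
    """
    try:
        from transformers import CONFIG_MAPPING
    except ImportError:
        return None
    return set(CONFIG_MAPPING.keys())


def explain(model_dir, known_types=None):
    """Describe whether the local model's architecture is supported."""
    model_type = read_model_type(model_dir)
    if known_types is None:
        known_types = supported_model_types()
    if known_types is None:
        return f"model_type={model_type!r}; transformers could not be imported"
    if model_type in known_types:
        return f"model_type={model_type!r} is supported"
    return f"model_type={model_type!r} is NOT supported; AutoConfig would raise KeyError"


# Example call, using the local directory from the question:
# print(explain("/yz/sentence-transformers/multi-qa-mpnet-base-dot-v1/"))
```

With transformers 4.0.0 and the mpnet directory this should report the "NOT supported" case, matching the traceback in the question. A BERT-type config should pass this check, although loading the weights can still fail for the separate PyTorch reason described in the last comment.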

Tags: nlp huggingface-transformers transformer sentence-similarity sentence-transformers

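【Solution 1】:

The answer is simple: PyTorch 1.4.0 cannot load the "MiniLM-L6-H384-uncased" model, because its pytorch_model.bin was saved in a newer file format (version 3) than 1.4.0 can read (version 2 at most).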

import torch

print(torch.__version__)
# 1.4.0

# map_location="cpu" loads the tensors onto the CPU
torch.load("/content/MiniLM-L6-H384-uncased/pytorch_model.bin", map_location="cpu")

"""RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED 
at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to 
PyTorch. Attempted to read a PyTorch file with version 3, but the maximum 
supported version for reading is 2. Your PyTorch installation may be too old. 
(init at /pytorch/caffe2/serialize/inline_container.cc:132)"""
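
To see which format a downloaded checkpoint uses without relying on torch at all, a rough sketch using only the standard library is given here. It assumes the newer checkpoints are zip archives containing a record named `<prefix>/version` whose content is the format number; that matches my understanding of the zip-based format but is an assumption, not documented API. The function name `checkpoint_format` is hypothetical.

```python
import zipfile


def checkpoint_format(path):
    """Describe how a checkpoint file is serialized, using only the stdlib.

    Returns "legacy" for a non-zip file, otherwise the integer stored in the
    archive's `version` record (None if no such record is found).
    """
    if not zipfile.is_zipfile(path):
        return "legacy"
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Assumption: the record is stored as "version" or "<prefix>/version".
            if name == "version" or name.endswith("/version"):
                return int(archive.read(name).decode("ascii").strip())
    return None


# Example call, using the path from the answer:
# print(checkpoint_format("/content/MiniLM-L6-H384-uncased/pytorch_model.bin"))
```

If this reports 3 while torch 1.4.0 is installed, the failure above is expected. One possible way around it, if a one-off conversion on another machine is allowed, is to load the state dict there with a newer PyTorch and save it again with torch.save(state_dict, path, _use_new_zipfile_serialization=False). Whether the converted file then loads under 1.4.0 would have to be tested.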
