【Posted】:2021-11-10 18:31:22
【Question】:
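I am using transformers.pipeline to compute BERT embeddings for my input. Without the pipeline I can get an output of constant size, but not with the pipeline, because I could not find a way to pass the tokenizer arguments to it.
How can I pass tokenizer-related arguments to my pipeline?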
from transformers import AutoModel, AutoTokenizer, pipeline
# These are the BERT model and tokenizer definitions
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
inputs = ['hello world']
# Normally I would do something like this to initialize the tokenizer and get the result with constant output
tokens = tokenizer(inputs,padding='max_length', truncation=True, max_length = 500, return_tensors="pt")
model(**tokens)[0].detach().numpy().shape
# using the pipeline
pipeline("feature-extraction", model=model, tokenizer=tokenizer, device=0)
# or other option
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT",padding='max_length', truncation=True, max_length = 500, return_tensors="pt")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
nlp=pipeline("feature-extraction", model=model, tokenizer=tokenizer, device=0)
# to call the pipeline
nlp("hello world")
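I have tried several approaches, such as the options listed above, but could not get results with a constant output size. A constant output size can be achieved by setting the tokenizer arguments, but I don't know how to supply those arguments to the pipeline.
Any ideas?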
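One possible direction, offered as a sketch rather than a confirmed answer: to my knowledge, recent transformers releases let the feature-extraction pipeline take a `tokenize_kwargs` dictionary at call time and forward it to the tokenizer. Older releases may not accept it, so please check against your installed version. The helper names `build_tokenize_kwargs`, `extract_fixed_length` and `demo` below are hypothetical and exist only for illustration.

```python
def build_tokenize_kwargs(max_length=500):
    """Tokenizer settings from the question that force a fixed sequence length."""
    return {"padding": "max_length", "truncation": True, "max_length": max_length}


def extract_fixed_length(extractor, texts, max_length=500):
    """Call a feature-extraction pipeline and forward the tokenizer settings.

    Assumption: `extractor` accepts a `tokenize_kwargs` argument at call time,
    as recent transformers feature-extraction pipelines are understood to do.
    """
    return extractor(texts, tokenize_kwargs=build_tokenize_kwargs(max_length))


def demo():
    """Not called automatically: needs transformers, torch and a model download."""
    from transformers import pipeline

    # Runs on CPU here; add device=0 as in the question to use a GPU.
    nlp = pipeline(
        "feature-extraction",
        model="emilyalsentzer/Bio_ClinicalBERT",
        tokenizer="emilyalsentzer/Bio_ClinicalBERT",
    )
    features = extract_fixed_length(nlp, "hello world", max_length=500)
    # For a single string the result is a nested list: [1][tokens][hidden size].
    print(len(features[0]))
```

Calling `demo()` prints the number of token positions in the result. If the installed version honors `tokenize_kwargs`, that number should match `max_length` regardless of how long the input text is.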
【Comments】:
- Can you add an example of inputs? What do you mean by constant output?
- Updated the question.
Tags: pytorch huggingface-transformers bert-language-model transformer huggingface-tokenizers