【Posted】: 2021-08-23 06:02:36
【Question】:
I'm currently using a Hugging Face pipeline for sentiment analysis, like so:
from transformers import pipeline
classifier = pipeline('sentiment-analysis', device=0)
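The problem is that when I pass a text longer than 512 tokens, it crashes with an error saying the input is too long. Is there a way to pass the tokenizer's max_length and truncation arguments directly to the pipeline?
My workaround is:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline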
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer, device=0)
And then I call the tokenizer myself:
pt_batch = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")
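For a single sequence, a BERT-style tokenizer wraps the text as [CLS] ... [SEP], so with truncation=True and max_length=512 at most 510 of the text's own tokens survive, and the default truncation side is the right, so the end of the text is what gets cut. The sketch below only illustrates that budget arithmetic in plain Python. The helper truncate_ids and the ids 101/102 for [CLS]/[SEP] are hypothetical placeholders, not the tokenizer's real implementation.

```python
from typing import List


def truncate_ids(content_ids: List[int], max_length: int = 512,
                 cls_id: int = 101, sep_id: int = 102) -> List[int]:
    """Illustrative right-side truncation: [CLS] + content + [SEP] must fit in max_length.

    cls_id and sep_id are placeholder ids, not read from any real vocabulary.
    """
    num_special = 2  # one [CLS] at the start and one [SEP] at the end
    budget = max_length - num_special
    if budget < 0:
        raise ValueError("max_length is too small to hold the special tokens")
    # Keep the first `budget` content tokens and drop the rest.
    return [cls_id] + content_ids[:budget] + [sep_id]


long_text_ids = list(range(1000, 1800))  # 800 fake content token ids
ids = truncate_ids(long_text_ids)
print(len(ids))  # 512
```

Inputs that already fit are left alone apart from the two special tokens, e.g. truncate_ids([5, 6]) gives [101, 5, 6, 102].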
But it would be much nicer to pass these arguments straight to the pipeline call, like this:
classifier(text, padding=True, truncation=True, max_length=512)
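In recent transformers releases, the text-classification pipeline forwards extra keyword arguments it does not recognize (such as truncation and max_length) to the tokenizer, so a call of this shape should work there. Older releases may not forward max_length, so check the version you have installed. The sketch below shows both the direct call and the manual workaround completed through the model. To run without downloads it assumes a hypothetical tiny vocabulary and a randomly initialized two-layer BERT in place of "nlptown/bert-base-multilingual-uncased-sentiment", so its labels and scores are meaningless. Only the shapes and the fact that an over-long input no longer crashes are the point.

```python
import os
import tempfile

import torch
from transformers import (BertConfig, BertForSequenceClassification,
                          BertTokenizer, pipeline)

# Hypothetical tiny vocabulary so the sketch runs offline; in practice use
# AutoTokenizer/AutoModelForSequenceClassification.from_pretrained(model_name).
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "good", "bad", "movie"]
vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_path, "w") as f:
    f.write("\n".join(vocab) + "\n")

tokenizer = BertTokenizer(vocab_file=vocab_path, model_max_length=512)
config = BertConfig(vocab_size=len(vocab), hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64,
                    max_position_embeddings=512, num_labels=2)
model = BertForSequenceClassification(config).eval()  # random weights

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

text = "good movie " * 400  # 800 word tokens, more than the 512 positions

# Option 1: pass the tokenizer arguments through the pipeline call.
result = classifier(text, truncation=True, max_length=512)
print(result)

# Option 2: the manual workaround, running the truncated batch through the model.
pt_batch = tokenizer(text, padding=True, truncation=True, max_length=512,
                     return_tensors="pt")
with torch.no_grad():
    logits = model(**pt_batch).logits
probs = torch.softmax(logits, dim=-1)
label_id = int(probs.argmax(dim=-1))
print(model.config.id2label[label_id], float(probs[0, label_id]))
```

Option 2 is roughly what the pipeline does internally for a single text (tokenize, forward pass, softmax, pick the top label). Option 1 just saves writing it by hand.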
【Discussion】:
Tags: huggingface-transformers huggingface-tokenizers