【Posted】:2021-12-16 04:17:11
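【Question】:
I am trying to run sentiment analysis on a dataset of millions of tweets on a server. I call an API prediction function that takes a list of 100 tweets, iterates over the text of each tweet to get a Hugging Face sentiment value, and writes that sentiment to a Solr database. However, after a few hundred tweets have been processed, I get the error below. Any suggestions?
API code: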
from transformers import pipeline

model = pipeline(task='sentiment-analysis', model="finiteautomata/bertweet-base-sentiment-analysis")

# huggingface sentiment analyser
def huggingface_sent(sentence):
    text = preprocess(sentence)
    if len(text) > 0:
        predicted_dic = {'NEG': 'Negative', 'NEU': 'Neutral', 'POS': 'Positive'}
        return predicted_dic[model(text)[0]['label']]
    else:
        return 'Neutral'

def predict_list(tweets):
    print('Data Processing\n')
    predictions = {}
    for t_id in tweets.keys():
        if tweets[t_id]['language'] == 'en':
            predictions[t_id] = huggingface_sent(str(tweets[t_id]['full_text']))
        else:
            predictions[t_id] = 'NoneEnglish'
    print('processed ', len(tweets.keys()))
    print('\n first element is ', predictions[t_id])
    return predictions

print('Running analyser ....\n')
Error log:
Token indices sequence length is longer than the specified maximum sequence length for this model (211 > 128). Running this sequence through the model will result in indexing errors
[2021-11-01 12:24:20,649] ERROR in app: Exception on /api/predict [POST]
Traceback (most recent call last):
  File "/myusername/anaconda3/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/myusername/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/myusername/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/myusername/anaconda3/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/myusername/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/myusername/anaconda3/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/mnt/raid1/diil/sentiment_api/analyser_main.py", line 11, in api_predict_list
    predictions = predict_list(tweets)
  File "/mnt/raid1/diil/sentiment_api/analyser_core.py", line 84, in predict_list
    predictions[t_id] = huggingface_sent(str(tweets[t_id]['full_text']))
  File "/mnt/raid1/diil/sentiment_api/analyser_core.py", line 70, in huggingface_sent
    if model(text):
  File "/myusername/anaconda3/lib/python3.8/site-packages/transformers/pipelines/text_classification.py", line 126, in __call__
    return super().__call__(*args, **kwargs)
  File "/myusername/anaconda3/lib/python3.8/site-packages/transformers/pipelines/base.py", line 915, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/myusername/anaconda3/lib/python3.8/site-packages/transformers/pipelines/text_classification.py", line 172, in run_single
    return [super().run_single(inputs, preprocess_params, forward_params, postprocess_params)]
  File "/myusername/anaconda3/lib/python3.8/site-packages/transformers/pipelines/base.py", line 922, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/myusername/anaconda3/lib/python3.8/site-packages/transformers/pipelines/base.py", line 871, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/myusername/anaconda3/lib/python3.8/site-packages/transformers/pipelines/text_classification.py", line 133, in _forward
    return self.model(**model_inputs)
  File "/myusername/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/myusername/anaconda3/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 1198, in forward
    outputs = self.roberta(
  File "/myusername/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/myusername/anaconda3/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 841, in forward
    embedding_output = self.embeddings(
  File "/myusername/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/myusername/anaconda3/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 136, in forward
    position_embeddings = self.position_embeddings(position_ids)
  File "/myusername/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/myusername/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2043, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
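The warning at the top of the log reports an input of 211 tokens against a limit of 128, so one way to confirm the cause is to flag over-length tweets before they reach the model. The following is only a rough sketch: `find_overlong` and `whitespace_count` are hypothetical helper names, and the whitespace counter is a stand-in for the model's real subword tokenizer, which will usually report higher counts.

```python
from typing import Callable, Dict, List

MAX_TOKENS = 128  # limit taken from the warning in the log


def find_overlong(tweets: Dict[str, dict],
                  count_tokens: Callable[[str], int],
                  max_tokens: int = MAX_TOKENS) -> List[str]:
    """Return the ids of tweets whose text has more than max_tokens tokens."""
    too_long = []
    for t_id, tweet in tweets.items():
        n_tokens = count_tokens(str(tweet.get("full_text", "")))
        if n_tokens > max_tokens:
            too_long.append(t_id)
    return too_long


def whitespace_count(text: str) -> int:
    # Stand-in only: the real model counts subword tokens, not words.
    return len(text.split())
```

With a tokenizer loaded through `AutoTokenizer.from_pretrained`, a closer count could be passed in as `lambda s: len(tokenizer(s)["input_ids"])`.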
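【Comments】:
- "Token indices sequence length is longer than the specified maximum sequence length for this model" probably means the sentence/text is too long?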
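If the text is indeed too long, one possible workaround is to truncate at tokenization time. The sketch below is not the asker's code: it loads the tokenizer and model directly instead of using `pipeline`, takes `max_length=128` from the warning message, and uses hypothetical names (`build_truncating_classifier`, `sentiment_or_neutral`). Since `preprocess` is not shown in the question, a plain `strip()` stands in for it here.

```python
from typing import Callable

LABELS = {"NEG": "Negative", "NEU": "Neutral", "POS": "Positive"}


def build_truncating_classifier(
        model_name: str = "finiteautomata/bertweet-base-sentiment-analysis",
        max_length: int = 128) -> Callable[[str], str]:
    """Return a function that classifies text, truncating over-long inputs."""
    # Imported lazily so the helpers below can be used without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    def classify(text: str) -> str:
        # truncation=True cuts the input down to max_length tokens, so the
        # position embeddings are never indexed out of range.
        encoded = tokenizer(text, truncation=True, max_length=max_length,
                            return_tensors="pt")
        with torch.no_grad():
            logits = model(**encoded).logits
        label = model.config.id2label[int(logits.argmax(dim=-1))]
        return LABELS.get(label, label)

    return classify


def sentiment_or_neutral(sentence: str, classify: Callable[[str], str]) -> str:
    """Classify non-empty text; treat empty text as 'Neutral' as in the question."""
    text = sentence.strip()  # stand-in for the question's preprocess()
    return classify(text) if text else "Neutral"
```

Usage would look like `classify = build_truncating_classifier()` once at startup, then `sentiment_or_neutral(tweet_text, classify)` per tweet. Truncation drops the tail of long tweets, so the predicted label reflects only the first 128 tokens.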
Tags: python pytorch sentiment-analysis huggingface-transformers bert-language-model