【Posted】: 2021-06-30 10:19:41
【Question】:
The training data contains about 20,000 rows with the columns: id, sentiment, text
I mapped the sentiment labels to integers as follows:
df.sentiment = df.sentiment.map({"Neutral": 1, "Negative": 0, "Positive": 2})
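A quick sanity check on this mapping may be worth running, because `Series.map` with a dict returns NaN for any value that is not one of the dict's keys (for example a label with different capitalization or trailing whitespace). The sketch below uses a small made-up frame, not the original data, and the `strip`/`capitalize` normalization is only an assumption about how labels might be malformed:

```python
import pandas as pd

# Hypothetical sample; the real df has about 20,000 rows.
df = pd.DataFrame({
    "sentiment": ["Neutral", "Negative", "Positive", "positive ", "Neutral"],
})

label_map = {"Neutral": 1, "Negative": 0, "Positive": 2}

# Mapping the raw column: values not found in label_map become NaN.
raw_unmapped = int(df["sentiment"].map(label_map).isna().sum())

# Normalize whitespace and capitalization first (assumed failure mode).
cleaned = df["sentiment"].str.strip().str.capitalize()
mapped = cleaned.map(label_map)
n_unmapped = int(mapped.isna().sum())

# Class distribution after mapping, useful for spotting imbalance.
class_counts = mapped.value_counts().sort_index()

print(raw_unmapped, n_unmapped)
print(class_counts)
```

If `n_unmapped` is non-zero on the real data, some rows carry labels that the mapping does not cover and would need to be cleaned or dropped before training.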
After cleaning and pre-processing the text, I used Logistic Regression as follows:
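from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline

# Hold out 20% of the rows for testing
XTR, XTST, YTR, YTST = train_test_split(df.text, df.sentiment, test_size=.2, random_state=100)
lg = LogisticRegression(max_iter=20000)
pp = make_pipeline(TfidfVectorizer(), lg)
# Grid over the regularization strength C of the logistic regression step
pg = {'logisticregression__C': [0.01, 0.1, 1, 10, 100]}
m = GridSearchCV(pp, pg, cv=5)  # 5-fold cross-validated search over the pipeline
m.fit(XTR, YTR)
pr = m.predict(XTST)
print(f"Accuracy: {accuracy_score(YTST, pr):.2f}")
print(classification_report(YTST, pr))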
The output is as follows:
Accuracy: 0.59
              precision    recall  f1-score   support

           0       0.00      0.00      0.00       686
           1       0.59      1.00      0.74      2374
           2       0.00      0.00      0.00       940

    accuracy                           0.59      4000
   macro avg       0.20      0.33      0.25      4000
weighted avg       0.35      0.59      0.44      4000
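The numbers in this report are exactly what a model that predicts class 1 (Neutral) for every test row would produce. The following sketch reproduces them from the support counts alone, using plain arithmetic rather than scikit-learn; the helper name `constant_predictor_metrics` is made up for illustration:

```python
# Support per class, copied from the report above.
support = {0: 686, 1: 2374, 2: 940}
total = sum(support.values())
majority = max(support, key=support.get)  # class with the most test rows


def constant_predictor_metrics(cls):
    """Precision, recall, F1 for `cls` when every prediction is `majority`."""
    if cls != majority:
        # The class is never predicted, so there are no true positives.
        # Precision is 0/0 here; scikit-learn reports it as 0.0 with a warning.
        return 0.0, 0.0, 0.0
    precision = support[cls] / total  # all rows are predicted as this class
    recall = 1.0                      # every true member is "found"
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_class = {c: constant_predictor_metrics(c) for c in support}
accuracy = support[majority] / total
macro = [sum(m[i] for m in per_class.values()) / len(per_class) for i in range(3)]
weighted = [sum(per_class[c][i] * support[c] for c in support) / total for i in range(3)]

print(f"accuracy     {accuracy:.2f}")
print("macro avg    " + "  ".join(f"{x:.2f}" for x in macro))
print("weighted avg " + "  ".join(f"{x:.2f}" for x in weighted))
```

The accuracy of 0.59 is simply 2374 / 4000, the share of Neutral rows in the test set, so the fitted model behaves like a majority-class baseline.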
Why do both Negative (0) and Positive (2) get 0.00 for precision, recall and f1-score? Please help.
【Discussion】:
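One way to investigate the question above, offered as a minimal sketch rather than a confirmed diagnosis: inspect which `C` the grid search selected and how the predictions are distributed, and try `class_weight="balanced"` together with `scoring="f1_macro"` so that the search is not driven by accuracy alone (the default score for a classifier in `GridSearchCV`). The corpus below is made up and tiny; whether these settings help on the original data is an assumption that would need checking.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Made-up corpus, imbalanced toward class 1 (Neutral) like the report.
texts = (
    ["the package arrived on monday", "it is a phone"] * 15
    + ["terrible awful broken", "i hate this bad phone"] * 5
    + ["great lovely phone", "i love this good phone"] * 5
)
labels = [1] * 30 + [0] * 10 + [2] * 10

pipe = make_pipeline(
    TfidfVectorizer(),
    # Reweight classes inversely to their frequency.
    LogisticRegression(max_iter=20000, class_weight="balanced"),
)
grid = {"logisticregression__C": [0.01, 0.1, 1, 10, 100]}

# Select C by macro-averaged F1 so minority classes count equally.
search = GridSearchCV(pipe, grid, cv=5, scoring="f1_macro")
search.fit(texts, labels)

pred = search.predict(texts)
print(search.best_params_)  # which C was chosen
print(Counter(pred))        # how many times each class is predicted
```

On the real data, printing `m.best_params_` and `Counter(pr)` from the original code would show the same two things. If every prediction is 1, it may also be worth checking that the cleaned `text` column is not empty or nearly identical across rows.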
Tags: python-3.x pandas scikit-learn logistic-regression sentiment-analysis