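[Posted]: 2019-02-15 03:12:23
[Question]:
I'm trying to compute a score between two files. Both contain the same data but with different labels. The labels in the training data are correct; the labels in the test data are not necessarily correct. I'd like to know the accuracy, recall, and F-score.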
import pandas
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn import cross_validation
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_score, recall_score, confusion_matrix, classification_report, accuracy_score, f1_score
df_train = pd.read_csv('train.csv', sep = ',')
df_test = pd.read_csv('teste.csv', sep = ',')
vec_train = TfidfVectorizer()
X_train = vec_train.fit_transform(df_train['text'])
y_train = df_train['label']
vec_test = TfidfVectorizer()
X_test = vec_test.fit_transform(df_train['text'])
y_test = df_test['label']
clf = LogisticRegression(penalty='l2', multi_class = 'multinomial',solver ='newton-cg')
y_pred = clf.predict(X_test)
print ("Accuracy on training set:")
print (clf.score(X_train, y_train))
print ("Accuracy on testing set:")
print (clf.score(X_test, y_test))
print ("Classification Report:")
print (metrics.classification_report(y_test, y_pred))
A silly example of the data:
TRAIN
text,label
dogs are cool,animal
flowers are beautifil,plants
pen is mine,objet
beyonce is an artist,person
TEST
text,label
dogs are cool,objet
flowers are beautifil,plants
pen is mine,person
beyonce is an artist,animal
Error:
Traceback (most recent call last):
  File "accuracy.py", line 30, in <module>
    y_pred = clf.predict(X_test)
  File "/usr/lib/python3/dist-packages/sklearn/linear_model/base.py", line 324, in predict
    scores = self.decision_function(X)
  File "/usr/lib/python3/dist-packages/sklearn/linear_model/base.py", line 298, in decision_function
    "yet" % {'name': type(self).__name__})
sklearn.exceptions.NotFittedError: This LogisticRegression instance is not fitted yet
I just want to compute the accuracy on the test set.
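[Comments]:
- You haven't fitted your model at all! First you should call fit(), and only then predict(). You can also use confusion_matrix to count the correct and incorrect predictions.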
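To illustrate the comment above, here is a minimal sketch of the fit-then-predict order, under a few assumptions that are not in the original post. The sample rows from the question are inlined as plain lists so the snippet runs without train.csv and teste.csv. A single TfidfVectorizer is fitted on the training text and reused with transform() for the test text; the question's script fits a second vectorizer (on df_train['text']), which would generally produce a feature space that does not match the trained model. The multi_class and penalty arguments are left at their defaults here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Sample rows from the question. In the real script these would come from
# df_train['text'], df_train['label'], df_test['text'] and df_test['label'].
train_text = ["dogs are cool", "flowers are beautifil",
              "pen is mine", "beyonce is an artist"]
train_label = ["animal", "plants", "objet", "person"]
test_text = list(train_text)  # same texts, different labels
test_label = ["objet", "plants", "person", "animal"]

# Fit ONE vectorizer on the training text, then only transform the test text,
# so both matrices share the same vocabulary and column order.
vec = TfidfVectorizer()
X_train = vec.fit_transform(train_text)
X_test = vec.transform(test_text)

# The step missing from the question: fit the model before predicting.
clf = LogisticRegression(solver="newton-cg")
clf.fit(X_train, train_label)
y_pred = clf.predict(X_test)

# Accuracy of the predictions measured against the test file's labels.
test_accuracy = accuracy_score(test_label, y_pred)
print("Accuracy on testing set:", test_accuracy)

# Rows are true labels, columns are predicted labels, in the order of `labels`.
labels = sorted(set(train_label))
print(confusion_matrix(test_label, y_pred, labels=labels))

# Per-class precision, recall and F-score.
print(classification_report(test_label, y_pred, labels=labels))
```

clf.score(X_test, test_label) should report the same accuracy as accuracy_score here. With only four rows the numbers mean very little, and scikit-learn may warn about undefined precision for classes that are never predicted.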
Tags: python python-3.x machine-learning scikit-learn metrics