How to change from normal machine learning technique to cross validation? Answers

[Question title]: How to change from normal machine learning technique to cross validation?
[Posted]: 2020-07-16 18:04:21
[Question]:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# data: a pandas DataFrame with 'Review' (text) and 'Category' (label) columns, loaded elsewhere
X = data['Review']
y = data['Category']

tfidf = TfidfVectorizer(ngram_range=(1, 1))
classifier = LinearSVC()

# Single hold-out split: 70% for training, 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

clf = Pipeline([
    ('tfidf', tfidf),
    ('clf', classifier)
])

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(classification_report(y_test, y_pred))
print(accuracy_score(y_test, y_pred))

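This is my code for training the model and making predictions. I need to know how well my model performs. Where should I change it to use cross_val_score?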

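[Question discussion]:

And how can I print classification_report(y_test, y_pred) for each cross-validation result?
1) There is no such thing as a "normal" machine learning technique. 2) Please take some time to learn how to format your code properly. 3) Don't use comments to update the question; edit the question instead.
Hi, OK, this is my first time doing this. Sorry, and thank you.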

Tags: python machine-learning scikit-learn data-science cross-validation

[Solution 1]:

From the sklearn documentation:

The simplest way to use cross-validation is to call the cross_val_score helper function on the estimator and the dataset.

In your case:

from sklearn.model_selection import cross_val_score
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(scores)
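A minimal self-contained sketch of the same call, assuming a small hypothetical set of reviews and labels in place of the asker's `data` DataFrame. Passing the whole Pipeline to cross_val_score means the TF-IDF vocabulary is re-fitted inside each training fold, so it never sees the held-out fold. Here the full X and y are passed instead of X_train and y_train, which is a common choice when cross-validation replaces the single train/test split; the StratifiedKFold settings are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for data['Review'] / data['Category']
X = ([f"great product, works well {i}" for i in range(10)]
     + [f"terrible product, broke quickly {i}" for i in range(10)])
y = ["pos"] * 10 + ["neg"] * 10

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),
    ("clf", LinearSVC()),
])

# 5 stratified folds: each fold keeps the pos/neg ratio of the full data
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

print(scores)                       # one accuracy value per fold
print(scores.mean(), scores.std())  # summary across the 5 folds
```

Reporting the mean together with the standard deviation gives a sense of how much the score varies from fold to fold.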

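[Discussion]:

Is cross_val_score a replacement for the clf.fit(x, y) function?
Yes, you no longer need to call fit yourself, because cross_val_score fits your model on different partitions of the data and evaluates it on the remaining partition.
OK, thanks. Could you tell me how to print a classification report for each cross-validation run?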
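One way to answer the follow-up is to replace cross_val_score with an explicit loop over the folds, so each fold's predictions are available for classification_report. The sketch below assumes hypothetical toy data in place of the asker's `data`. In each fold it fits a fresh clone of the pipeline on the training split, which roughly mirrors what cross_val_score does internally for a single metric. Because only clones are fitted, the original `clf` stays unfitted, so call `clf.fit(X, y)` yourself if you need a model for later predictions. At the end, sklearn's `cross_val_predict` is shown as an alternative that produces one report over every sample's out-of-fold prediction.

```python
import numpy as np
from sklearn.base import clone
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for data['Review'] / data['Category']
X = np.array([f"great product, works well {i}" for i in range(10)]
             + [f"terrible product, broke quickly {i}" for i in range(10)])
y = np.array(["pos"] * 10 + ["neg"] * 10)

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),
    ("clf", LinearSVC()),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_reports = []
for fold, (train_idx, test_idx) in enumerate(cv.split(X, y), start=1):
    model = clone(clf)                     # fresh, unfitted copy for this fold
    model.fit(X[train_idx], y[train_idx])  # train on the other 4 folds
    y_pred = model.predict(X[test_idx])    # predict the held-out fold
    print(f"=== Fold {fold} ===")
    print(classification_report(y[test_idx], y_pred))
    # Dict version of the same report, e.g. for averaging metrics across folds
    fold_reports.append(
        classification_report(y[test_idx], y_pred, output_dict=True))

# Alternative: a single report built from every sample's out-of-fold prediction
y_oof = cross_val_predict(clf, X, y, cv=cv)
print(classification_report(y, y_oof))
```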

[Solution 2]:

Use this (an example from one of my earlier projects):

import numpy as np
from sklearn.model_selection import KFold, cross_val_score

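# 5 shuffled folds with a fixed seed so the splits are reproducible
kfolds = KFold(n_splits=5, shuffle=True, random_state=42)

def cv_f1(model, X, y):
    # Mean F1 over the 5 folds; scoring="f1" expects binary labels
    score = np.mean(cross_val_score(model, X, y,
                                    scoring="f1",
                                    cv=kfolds))
    return score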


model = ....

score_f1 = cv_f1(model, X_train, y_train)

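You can use other scoring metrics; just change scoring="f1" (for multiclass labels, "f1" needs to become something like "f1_macro" or "f1_weighted"). If you want to see the score for each fold, just remove np.mean.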
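If you want several metrics at once instead of editing `scoring` each time, sklearn's `cross_validate` accepts a list of scorer names and returns a per-fold array for each metric under a `test_<name>` key. The sketch below assumes hypothetical three-class toy data and uses `"f1_macro"` because a review `Category` column may well have more than two classes. That is an assumption about the asker's data. The `KFold` settings are copied from the answer above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical three-class toy data
X = ([f"great product, works well {i}" for i in range(10)]
     + [f"terrible product, broke quickly {i}" for i in range(10)]
     + [f"okay product, average quality {i}" for i in range(10)])
y = ["pos"] * 10 + ["neg"] * 10 + ["neutral"] * 10

model = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])
kfolds = KFold(n_splits=5, shuffle=True, random_state=42)

# One call, several metrics: keys are "test_accuracy" and "test_f1_macro"
results = cross_validate(model, X, y, cv=kfolds,
                         scoring=["accuracy", "f1_macro"])

# Per-fold scores (the equivalent of removing np.mean in cv_f1)
print(results["test_accuracy"])
print(results["test_f1_macro"])
# Averages across the folds
print(np.mean(results["test_accuracy"]), np.mean(results["test_f1_macro"]))
```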

【讨论】：