Use Scikit-Learn's GridSearchCV to capture precision, recall, and f1 for all permutations?

【Question title】: Use Scikit-Learn's GridSearchCV to capture precision, recall, and f1 for all permutations?
【Posted】: 2021-10-27 18:33:48
【Question】:

I'd like to use Scikit-Learn's GridSearchCV to run a bunch of experiments and then print out the recall, precision, and f1 of each experiment.

This article (https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html) suggests that I need to run .fit and .predict multiple times:

...
scores = ['precision', 'recall']
...
for score in scores:
    ...
    clf = GridSearchCV(
        SVC(), tuned_parameters, scoring='%s_macro' % score
    )
    clf.fit(X_train, y_train) # running for each scoring metric
    ...
    for mean, std, params in zip(means, stds, clf.cv_results_['params']):
        print("%0.3f (+/-%0.03f) for %r"
              % (mean, std * 2, params))
    ...
    y_true, y_pred = y_test, clf.predict(X_test) # running for each scoring metric
    print(classification_report(y_true, y_pred))

I'd like to run .fit only once and record all of the recall, precision, and f1 metrics. For example, something like the following:

clf = GridSearchCV(
    SVC(), tuned_parameters, scoring=['recall', 'precision', 'f1'] # I don't think this syntax is even possible
)

clf.fit(X_train, y_train)

for metric in clf.something_that_i_cannot_find:
    ### does something like this exist?
    print(metric['precision'])
    print(metric['recall'])
    print(metric['f1'])
    ###:end does something like this exist?

Or maybe even:

...
for run in clf.something_that_i_cannot_find:
    ### does something like this exist?
    print(classification_report(run.y_true, run.y_pred))
    ###:end does something like this exist?

This post (Scoring in Gridsearch CV) suggests that GridSearchCV can accept multiple scorers, but I still can't figure out how to access each score for every experiment.

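Does GridSearchCV not support what I'm looking for? Is the approach used in the article (i.e., running .fit and .predict multiple times) the simplest way to accomplish something like what I'm asking for?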

Thanks for your time!

【Question discussion】:

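You'll have to do this manually, which takes a lot of code: looping over the folds and over the parameters yourself with scikit-learn. I'd suggest setting the random state and running the grid search 3 times.
Thanks for the suggestion. I'll take that approach. If you'd like to post your comment as an answer, I'll accept it to close the loop.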

Tags: python scikit-learn metrics grid-search gridsearchcv

【Solution 1】:

You can do multi-metric evaluation for binary classification. When I tried it on the iris dataset I ran into ValueError: Multi-class not supported.

I've implemented it below on basic binary data, computing four different scores:

['AUC', 'F1', 'Precision', 'Recall']

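Note: the point is not to draw conclusions from the model, only to show how multi-metric evaluation works. The data is just synthetic random data.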

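from sklearn import datasets
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = datasets.make_classification(n_classes=2, random_state=0)

# The scorers can be either one of the predefined metric strings or a scorer
# callable, like the one returned by make_scorer
f1_scorer = make_scorer(f1_score, average='binary')
# Use the binary F1 scorer: 'f1_micro' on a binary problem is equal to accuracy
scoring = {'AUC': 'roc_auc', 'F1': f1_scorer, 'Precision': 'precision', 'Recall': 'recall'}

# split data to train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = GridSearchCV(
    SVC(),
    param_grid={'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
    scoring=scoring,
    refit='AUC',              # refit the best-AUC candidate on all of X_train
    return_train_score=True,  # also record train scores (plotted below)
)
clf.fit(X_train, y_train)
results = clf.cv_results_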
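To answer the question directly: cv_results_ already holds every metric for every parameter combination, under keys named mean_test_<scorer name>, std_test_<scorer name>, and split<k>_test_<scorer name>. A minimal sketch that prints them one row per candidate, repeating the setup above so it runs on its own (random_state=0 in the split is an added assumption for reproducibility):

```python
from sklearn import datasets
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = datasets.make_classification(n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

scoring = {'AUC': 'roc_auc',
           'F1': make_scorer(f1_score, average='binary'),
           'Precision': 'precision',
           'Recall': 'recall'}

clf = GridSearchCV(SVC(),
                   param_grid={'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
                   scoring=scoring, refit='AUC')
clf.fit(X_train, y_train)  # one fit call scores all four metrics
results = clf.cv_results_

# One row per parameter combination: mean +/- std of each metric over the CV folds.
for i, params in enumerate(results['params']):
    summary = ', '.join(
        '%s=%.3f (+/-%.3f)' % (name,
                               results['mean_test_%s' % name][i],
                               results['std_test_%s' % name][i])
        for name in scoring)
    print(params, summary)
```

Per-fold values are in the same dictionary, e.g. results['split0_test_Precision'].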


**Plotting the result**

import numpy as np
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
plt.title("GridSearchCV evaluating using multiple scorers simultaneously",
          fontsize=16)

plt.xlabel("C")
plt.ylabel("Score")

ax = plt.gca()
ax.set_xlim(1, 1000)
ax.set_ylim(0.40, 1)

# Get the regular numpy array from the MaskedArray
X_axis = np.array(results['param_C'].data, dtype=float)

for scorer, color in zip(sorted(scoring), ['g', 'k', 'b', 'r']):
    # 'test' here means the cross-validation validation folds, not X_test
    for sample, style in (('train', '--'), ('test', '-')):
        sample_score_mean = results['mean_%s_%s' % (sample, scorer)]
        sample_score_std = results['std_%s_%s' % (sample, scorer)]
        ax.fill_between(X_axis, sample_score_mean - sample_score_std,
                        sample_score_mean + sample_score_std,
                        alpha=0.1 if sample == 'test' else 0, color=color)
        ax.plot(X_axis, sample_score_mean, style, color=color,
                alpha=1 if sample == 'test' else 0.7,
                label="%s (%s)" % (scorer, sample))

    best_index = np.nonzero(results['rank_test_%s' % scorer] == 1)[0][0]
    best_score = results['mean_test_%s' % scorer][best_index]

    # Plot a dash-dotted vertical line at the best score for that scorer, marked by x
    ax.plot([X_axis[best_index], ] * 2, [0, best_score],
            linestyle='-.', color=color, marker='x', markeredgewidth=3, ms=8)

    # Annotate the best score for that scorer
    ax.annotate("%0.2f" % best_score,
                (X_axis[best_index], best_score + 0.005))

plt.legend(loc="best")
plt.grid(False)
plt.show()

[Output plot]
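About the iris error mentioned at the start of this answer: it most likely comes from scorers that assume a binary target, such as 'precision', 'recall', and 'roc_auc' as used above. scikit-learn also predefines averaged scoring strings such as 'precision_macro', 'recall_macro', and 'f1_macro' (the same '%s_macro' pattern as in the question's snippet), which accept multiclass targets. A minimal sketch on iris, assuming a small illustrative grid:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Iris has three classes, so use macro-averaged scorers instead of the
# binary-only 'precision' / 'recall' / 'f1'.
X, y = load_iris(return_X_y=True)
scoring = ['precision_macro', 'recall_macro', 'f1_macro']

clf = GridSearchCV(
    SVC(),
    param_grid={'kernel': ['linear', 'rbf'], 'C': [1, 10]},  # illustrative grid
    scoring=scoring,
    refit='f1_macro',  # with several metrics, refit must name one of them (or be False)
)
clf.fit(X, y)  # a single fit call evaluates all three metrics for every candidate

for i, params in enumerate(clf.cv_results_['params']):
    print(params,
          'precision=%.3f' % clf.cv_results_['mean_test_precision_macro'][i],
          'recall=%.3f' % clf.cv_results_['mean_test_recall_macro'][i],
          'f1=%.3f' % clf.cv_results_['mean_test_f1_macro'][i])
```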

【Discussion】:

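Thanks for the sample code. I'll give it a try on Monday. Have a great weekend!
I was looking over the code to prepare for Monday and wanted to check that I'm following it correctly. Can you confirm the following? The test cases in the plot show the test results from cross-validation, so they come from a subset of the training data (X_train and y_train). The test cases in the plot do not show the results of applying the model to the test data from train_test_split (X_test and y_test). Is my understanding correct, or am I missing something in the sample code?
Thanks again for your help and guidance!
@zhao Li... yes, you're right. The test cases are not the actual X_test and y_test. The test cases in the plot are the validation splits created by the cross-validation logic. Sorry if labeling them test in the plot caused confusion.
Glad to help!... You're welcome!... If it was useful to you, please upvote.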
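If you also want precision, recall, and f1 on the held-out split from train_test_split, you can score the refit best model once on X_test. A minimal sketch under the same setup as the answer (random_state=0 in the split is an added assumption, and the built-in 'f1' scoring string stands in for the make_scorer callable):

```python
from sklearn import datasets
from sklearn.metrics import classification_report, precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = datasets.make_classification(n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = GridSearchCV(SVC(),
                   param_grid={'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
                   scoring={'AUC': 'roc_auc', 'F1': 'f1',
                            'Precision': 'precision', 'Recall': 'recall'},
                   refit='AUC')
clf.fit(X_train, y_train)

# Because refit='AUC', clf now wraps a model retrained on all of X_train with
# the best-AUC parameters; X_test played no part in the search.
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))

# The same three numbers for the positive class, as plain floats.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average='binary')
print('precision=%.3f recall=%.3f f1=%.3f' % (precision, recall, f1))
```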

【Solution 2】:

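You'll have to do this manually, which takes a lot of code: looping over the folds with sklearn plus more loops over the parameters. I'd suggest setting the random state for the fold strategy, the grid search, and the model, and then running the grid search 3 times, once per metric.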
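A minimal sketch of the manual loop, assuming a binary target and using ParameterGrid with a fixed-seed StratifiedKFold (the grid and fold count are illustrative). Because each fold model is scored on all three metrics from the same predictions, this needs one fit per candidate per fold rather than one full grid search per metric:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import ParameterGrid, StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_classes=2, random_state=0)
param_grid = {'kernel': ['linear'], 'C': [1, 10, 100]}  # illustrative grid
# Fixed seed so every candidate is evaluated on the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

metrics = {'precision': precision_score, 'recall': recall_score, 'f1': f1_score}
rows = []
for params in ParameterGrid(param_grid):
    fold_scores = {name: [] for name in metrics}
    for train_idx, val_idx in cv.split(X, y):
        model = SVC(**params).fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[val_idx])
        # One fit per fold; all three metrics come from the same predictions.
        for name, fn in metrics.items():
            fold_scores[name].append(fn(y[val_idx], y_pred))
    rows.append((params, {name: float(np.mean(s)) for name, s in fold_scores.items()}))

for params, means in rows:
    print(params, means)
```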

【Discussion】: