What is the threshold for sklearn roc_auc_score?

【Question title】: What is the threshold for the sklearn roc_auc_score
【Posted】: 2021-05-27 07:51:17
【Question】:

In my classification problem I want to check whether my model performs well, so I computed roc_auc_score to measure its accuracy and got 0.9856825361839688.

My question

Here is my code:

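from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data (weights summing to more than 1 may return more than n_samples samples)
x, y = make_classification(n_samples=2000, n_classes=2, weights=[1, 1], random_state=24)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=43)

knn_classifier = KNeighborsClassifier()
knn_classifier.fit(x_train, y_train)
# Positive-class probabilities (column 1), scored on the training set
ytrain_pred = knn_classifier.predict_proba(x_train)
print('train roc-auc: {}'.format(roc_auc_score(y_train, ytrain_pred[:, 1])))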

train roc-auc: 0.9856825361839688

Next I plotted the ROC curve to look for the best score:
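roc_auc_score is meant to receive continuous scores, such as the predict_proba column used above, rather than hard labels. As a minimal sketch on made-up labels and scores (the arrays are hypothetical, not the question's data), compare the AUC of the scores with the AUC of 0/1 predictions cut at 0.5: the hard predictions retain only a single operating point, so their AUC describes that one threshold, while the score-based AUC covers every threshold.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and positive-class probabilities
y_true = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.4, 0.6, 0.55, 0.8, 0.9])

# AUC from the continuous scores: every possible threshold contributes
auc_scores = roc_auc_score(y_true, scores)

# AUC from hard labels cut at 0.5: only one operating point survives
hard = (scores >= 0.5).astype(int)
auc_hard = roc_auc_score(y_true, hard)

print(auc_scores, auc_hard)  # 0.888... (8/9) and 0.833... (5/6)
```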

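import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import roc_curve

# One (FPR, TPR) point per candidate threshold in thresholds_1
fpr_1, tpr_1, thresholds_1 = roc_curve(y_train, ytrain_pred[:, 1])
fig, ax = plt.subplots(1, 1, figsize=(15, 7))
# estimator=None keeps seaborn from averaging TPR values that share the same FPR
g = sns.lineplot(x=fpr_1, y=tpr_1, ax=ax, color='green', estimator=None)
g.set_xlabel('False Positive Rate')
g.set_ylabel('True Positive Rate')
g.set(xlim=(0, 0.8))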
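The value printed by roc_auc_score is the area under exactly this curve. A quick check on synthetic data (the random labels and noisy scores are assumptions for illustration): integrating the curve returned by roc_curve with the trapezoidal rule, via sklearn.metrics.auc, reproduces roc_auc_score.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
# Hypothetical noisy scores that are higher on average for the positive class
scores = y_true + rng.normal(scale=0.8, size=200)

fpr, tpr, thresholds = roc_curve(y_true, scores)
area_from_curve = auc(fpr, tpr)          # trapezoidal area under the ROC curve
area_direct = roc_auc_score(y_true, scores)
print(area_from_curve, area_direct)      # the two numbers agree
```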

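From the plot I can see that TPR reaches its maximum from FPR = 0.2 onwards. So, given the roc_auc_score I got, should I assume the method used 0.2 as the threshold?

I also explicitly computed the accuracy score for each threshold: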
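A point on the ROC curve is an (FPR, TPR) pair, not a threshold; the score threshold that produces that point is stored at the same index of the thresholds array returned by roc_curve. A sketch on hypothetical data of looking up the threshold at which FPR first reaches 0.2 (the target 0.2 is simply the value read off the plot in the question):

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical labels and scores
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
scores = np.array([0.05, 0.1, 0.3, 0.45, 0.7, 0.4, 0.6, 0.75, 0.85, 0.95])

fpr, tpr, thresholds = roc_curve(y_true, scores)

# fpr is non-decreasing, so argmax on the boolean mask gives the first
# operating point whose false positive rate reaches the target
idx = np.argmax(fpr >= 0.2)
print(thresholds[idx], fpr[idx], tpr[idx])  # 0.7 0.2 0.6
```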

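import pandas as pd
# accuracy_ls holds the accuracy at each entry of thresholds_1 (its computation is not shown)
_result = pd.concat([pd.Series(thresholds_1), pd.Series(accuracy_ls)], axis=1)
_result.columns = ['threshold', 'accuracy score']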
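One way accuracy_ls could be built, sketched on hypothetical data: predict 1 whenever the score is at least the threshold (the same convention roc_curve uses for its thresholds), score each cut with accuracy_score, and pick the row with the highest accuracy. When several thresholds tie, idxmax returns the first, i.e. the highest threshold among them.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, roc_curve

# Hypothetical labels and scores
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
scores = np.array([0.05, 0.1, 0.3, 0.45, 0.7, 0.4, 0.6, 0.75, 0.85, 0.95])

fpr, tpr, thresholds = roc_curve(y_true, scores)

# Accuracy of the hard predictions produced by each candidate threshold
accuracy_ls = [accuracy_score(y_true, (scores >= t).astype(int)) for t in thresholds]

result = pd.DataFrame({'threshold': thresholds, 'accuracy score': accuracy_ls})
best = result.loc[result['accuracy score'].idxmax()]
print(best['threshold'], best['accuracy score'])  # 0.75 0.8
```

The maximum accuracy is only one possible criterion; it ignores whether the errors left over are false positives or false negatives.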

So should I assume that roc_auc_score reports the highest score, regardless of the threshold?

【Question comments】:

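So for binary classification, is the threshold 0.5?
Which operating point (threshold) is best depends on your application. What is worse: false positives or false negatives?
@couka, please see my updated question
Please see the updated answer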
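The "what is worse" question in the comments can be turned into a threshold choice by putting a price on each error type. A sketch on hypothetical data, assuming a missed positive costs five times as much as a false alarm (COST_FP and COST_FN are made-up values): convert each ROC point into error counts and keep the threshold with the lowest total cost.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical labels and scores
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
scores = np.array([0.05, 0.1, 0.3, 0.45, 0.7, 0.4, 0.6, 0.75, 0.85, 0.95])

# Hypothetical costs: a false negative is assumed 5x as bad as a false positive
COST_FP = 1.0
COST_FN = 5.0

fpr, tpr, thresholds = roc_curve(y_true, scores)
n_pos = y_true.sum()
n_neg = len(y_true) - n_pos

# FPR * n_neg = number of false positives, (1 - TPR) * n_pos = number of false negatives
costs = COST_FP * fpr * n_neg + COST_FN * (1 - tpr) * n_pos
best = np.argmin(costs)
print(thresholds[best], costs[best])  # 0.4 2.0
```

Changing the two cost constants moves the chosen threshold, which is the point made in the comment: the best operating point is an application decision, not something the AUC fixes.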

Tags: python machine-learning scikit-learn roc

【Answer 1】:

The roc_auc_score method is used to evaluate a classifier: it returns the area under the ROC curve. (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html)
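A useful equivalent reading of that area: it is the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one (ties counting as half). The helper pairwise_auc below is a hypothetical name for this illustration, and the data is made up; the sketch counts correctly ordered pairs directly and matches roc_auc_score without choosing any threshold.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher;
    ties count as half a pair."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score via broadcasting
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


# Hypothetical data with one tied pair (0.5 vs 0.5)
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.2, 0.5, 0.5, 0.9, 0.1, 0.3]
print(pairwise_auc(y_true, scores), roc_auc_score(y_true, scores))  # both 0.833... (5/6)
```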

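roc_auc_score == 1: an ideal classifier.

For binary classification where the evaluation set has equal numbers of samples from both classes: roc_auc_score == 0.5 means a random classifier.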
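Both reference points can be reproduced on synthetic data. In this sketch (all data generated for illustration), an "ideal" classifier scores every positive above every negative and gets exactly 1.0, while scores drawn independently of the labels land close to 0.5.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
# Roughly balanced synthetic labels
y_true = rng.integers(0, 2, size=10_000)

# Ideal classifier: negatives score in [0, 0.1), positives in [1, 1.1)
ideal_scores = y_true + 0.1 * rng.random(size=y_true.size)
# Random classifier: scores carry no information about the label
random_scores = rng.random(size=y_true.size)

auc_ideal = roc_auc_score(y_true, ideal_scores)
auc_random = roc_auc_score(y_true, random_scores)
print(auc_ideal)   # 1.0
print(auc_random)  # close to 0.5
```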

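This method does not pick or compare thresholds: the area summarizes the TPR/FPR trade-off over all thresholds at once, so no single threshold is behind the score.

Which threshold is better is up to you and depends on the business problem you are solving. Which matters more to you: precision or recall?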
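If recall is the priority, one option is to fix a minimum recall and then take the threshold with the best precision among those meeting it. A sketch on hypothetical data, where TARGET_RECALL = 0.8 stands in for a made-up business requirement; note that precision_recall_curve returns one more precision/recall value than thresholds.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical labels and scores
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
scores = np.array([0.05, 0.1, 0.3, 0.45, 0.7, 0.4, 0.6, 0.75, 0.85, 0.95])

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# Hypothetical business rule: recall must be at least 0.8
TARGET_RECALL = 0.8
# The last precision/recall entry (precision=1, recall=0) has no threshold, so drop it
ok = recall[:-1] >= TARGET_RECALL
# Among thresholds meeting the recall target, keep the one with the best precision
best = np.argmax(np.where(ok, precision[:-1], -1.0))
print(thresholds[best], precision[best], recall[best])  # 0.6 0.8 0.8
```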
