Different result with roc_auc_score() and auc(): answers

【Question title】: Different result with roc_auc_score() and auc()
【Posted】: 2015-09-18 11:35:09
【Question】:

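I have trouble understanding the difference (if there is one) between roc_auc_score() and auc() in scikit-learn.

I'm trying to predict a binary output with imbalanced classes (around 1.5% for Y=1).

Classifier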

model_logit = LogisticRegression(class_weight='auto')
model_logit.fit(X_train_ridge, Y_train)

ROC curve

false_positive_rate, true_positive_rate, thresholds = roc_curve(Y_test, clf.predict_proba(xtest)[:,1])

AUCs

auc(false_positive_rate, true_positive_rate)
Out[490]: 0.82338034042531527

and

roc_auc_score(Y_test, clf.predict(xtest))
Out[493]: 0.75944737191205602

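Can somebody explain this difference? I thought both were just calculating the area under the ROC curve. It might be because of the imbalanced dataset, but I could not figure out why.

Thanks!

【Comments】:

Tags: python machine-learning scikit-learn

【Solution 1】:

predict returns only one class or the other. When you compute a ROC from the results of predict on a classifier, there are only three thresholds (trivially all one class, trivially all the other class, and one in between). Your ROC curve looks like this: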

      ..............................
      |
      |
      |
......|
|
|
|
|
|
|
|
|
|
|
|

Meanwhile, predict_proba() returns an entire range of probabilities, so now you can put more than three thresholds on your data.

             .......................
             |
             |
             |
          ...|
          |
          |
     .....|
     |
     |
 ....|
.|
|
|
|
|

Hence the different areas.
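A minimal sketch of the point above, on synthetic data (the dataset, split, and variable names are assumptions for illustration, not the asker's setup). It counts how many thresholds roc_curve returns for hard labels versus probabilities:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption: stands in for the asker's dataset)
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)

labels = clf.predict(X_test)              # hard 0/1 decisions
probs = clf.predict_proba(X_test)[:, 1]   # probability of the positive class

fpr_l, tpr_l, thr_l = roc_curve(y_test, labels)
fpr_p, tpr_p, thr_p = roc_curve(y_test, probs)

print("thresholds from predict():      ", len(thr_l))
print("thresholds from predict_proba():", len(thr_p))
```

The label-based curve has at most three points, so its area is that of a single-kink polyline, while the probability-based curve has many steps.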

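【Comments】:

【Solution 2】:

When you use y_pred (class labels), you have already decided on the threshold. When you use y_prob (positive class probability), you are still open on the threshold, and the ROC curve should help you decide it.

For the first case you are using the probabilities: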

y_probs = clf.predict_proba(xtest)[:,1]
fp_rate, tp_rate, thresholds = roc_curve(y_true, y_probs)
auc(fp_rate, tp_rate)

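When you do that, you are considering the AUC "before" taking a decision on the threshold you will be using.

In the second case you are using the prediction (not the probabilities). In that case, use 'predict' instead of 'predict_proba' for both, and you should get the same result.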

y_pred = clf.predict(xtest)
fp_rate, tp_rate, thresholds = roc_curve(y_true, y_pred)
print(auc(fp_rate, tp_rate))
# 0.857142857143

print(roc_auc_score(y_true, y_pred))
# 0.857142857143
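As a rough check of the claim that the two functions agree whenever they receive the same input, here is a sketch on synthetic data (the dataset and the helper auc_both_ways are hypothetical, introduced only for this illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Synthetic imbalanced data (assumption, not the asker's dataset)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)


def auc_both_ways(y_true, y_score):
    """Hypothetical helper: the area via roc_curve + auc, and via roc_auc_score."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return auc(fpr, tpr), roc_auc_score(y_true, y_score)


from_labels = auc_both_ways(y, clf.predict(X))
from_probs = auc_both_ways(y, clf.predict_proba(X)[:, 1])

print("labels:", from_labels)         # the two numbers match each other
print("probabilities:", from_probs)   # the two numbers match each other
```

Each pair matches internally; the mismatch in the question comes from comparing a number computed from probabilities with one computed from labels.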

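【Comments】:

In the case of using predict instead of predict_proba, where, as you said, one ends up choosing a specific threshold, how would roc_auc_score be calculated? Any idea?
@Ophilia, just from the docs scikit-learn.org/stable/modules/generated/… roc_auc_score(y_true, y_score...), where y_score is "Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions". So it would be the same as calculating AUC with predict_proba().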
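On the question of what roc_auc_score computes when it is handed hard labels: the ROC then consists of the points (0, 0), (FPR, TPR) and (1, 1), so the area works out to (1 + TPR - FPR) / 2, which is the balanced accuracy. A small sketch with made-up labels (the arrays are assumptions chosen for illustration):

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, confusion_matrix,
                             roc_auc_score)

# Made-up ground truth and hard predictions (illustrative only)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)   # true positive rate at the single threshold
fpr = fp / (fp + tn)   # false positive rate at the single threshold

# Area under the polyline (0, 0) -> (fpr, tpr) -> (1, 1)
area = (1.0 + tpr - fpr) / 2.0

print(area)
print(roc_auc_score(y_true, y_pred))
print(balanced_accuracy_score(y_true, y_pred))
```

All three printed values coincide, which is why a label-based "AUC" says nothing about how well the model ranks examples away from that one threshold.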

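【Solution 3】:

AUC is not always the area under a ROC curve. Area Under the Curve is an (abstract) area under some curve, so it is a more general term than AUROC. With imbalanced classes, it may be better to find the AUC for a precision-recall curve.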
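A hedged sketch of the precision-recall alternative, assuming synthetic data with about 5% positives (the dataset and names are illustrative). It uses auc() on a precision-recall curve and, for comparison, average_precision_score, which summarizes the same curve without trapezoidal interpolation:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (auc, average_precision_score,
                             precision_recall_curve, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic data with about 5% positives (assumption for illustration)
X, y = make_classification(n_samples=4000, weights=[0.95, 0.05],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_test)[:, 1]

# auc() accepts any curve as (x, y); here x is recall and y is precision
precision, recall, _ = precision_recall_curve(y_test, probs)
pr_auc = auc(recall, precision)

ap = average_precision_score(y_test, probs)
roc = roc_auc_score(y_test, probs)

print("PR AUC (trapezoidal):", pr_auc)
print("average precision:   ", ap)
print("ROC AUC:             ", roc)
```

The same auc() function serves both curves; only the (x, y) points passed to it decide which area is measured.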

See the sklearn source for roc_auc_score:

def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
    # <...> docstring <...>
    def _binary_roc_auc_score(y_true, y_score, sample_weight=None):
            # <...> bla-bla <...>

            fpr, tpr, tresholds = roc_curve(y_true, y_score,
                                            sample_weight=sample_weight)
            return auc(fpr, tpr, reorder=True)

    return _average_binary_score(
        _binary_roc_auc_score, y_true, y_score, average,
        sample_weight=sample_weight)

As you can see, this first gets a ROC curve and then calls auc() to get the area.
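To illustrate that two-step structure, the area can be recomputed by hand with the trapezoidal rule over the (fpr, tpr) points. This is a sketch on random scores (the data is made up), not the library's actual implementation:

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Made-up labels and scores that are loosely correlated with the labels
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = rng.random(200) + 0.5 * y_true

# Step 1: the ROC curve as a list of (fpr, tpr) points
fpr, tpr, _ = roc_curve(y_true, y_score)

# Step 2: trapezoidal rule, i.e. sum of width * average height per segment
manual = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

print(manual)
print(auc(fpr, tpr))
print(roc_auc_score(y_true, y_score))
```

The three values agree up to floating-point error, so any difference between auc() and roc_auc_score() has to come from the inputs.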

I guess your problem is the predict_proba() call. For a normal predict() the outputs are always the same:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Newer scikit-learn versions use 'balanced' in place of the removed 'auto'
est = LogisticRegression(class_weight='balanced')
X = np.random.rand(10, 2)
y = np.random.randint(2, size=10)
est.fit(X, y)

# Both calls receive the same hard labels, so the two areas agree
false_positive_rate, true_positive_rate, thresholds = roc_curve(y, est.predict(X))
print(auc(false_positive_rate, true_positive_rate))
# 0.857142857143 (value from the original run; it varies because the data is random)
print(roc_auc_score(y, est.predict(X)))
# 0.857142857143

If you change the above to this, you will sometimes get different outputs:

false_positive_rate, true_positive_rate, thresholds = roc_curve(y, est.predict_proba(X)[:,1])
# may differ: auc() sees probabilities, roc_auc_score() sees hard labels
print(auc(false_positive_rate, true_positive_rate))
print(roc_auc_score(y, est.predict(X)))

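【Comments】:

Thank you for pointing out the importance of the precision-recall curve, but in this case the curve is the ROC. The question was: why am I getting two different results, since both methods should calculate the same area?
Why should they? It all depends on how you got the input for the auc() function. Say, sklearn suggests fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2); metrics.auc(fpr, tpr), and then it is natural that auc() and roc_auc_score() return the same result. But it is not clear from your post how you got false_positive_rate, true_positive_rate.
By the way, I like the ROC curve precisely because it is insensitive to imbalanced classes (see fastml.com/what-you-wanted-to-know-about-auc).
My bad, I copied the wrong line of code. It is fixed now, thanks for pointing it out!
You are right. Since est.predict(X) outputs binary labels, it does not make sense to use roc_auc_score(y, est.predict(X)). Writing roc_auc_score(y, est.predict_proba(X)[:,1]) fixes the problem. Thank you!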