
【Question title】: Why are the 95% CI values calculated by different functions of the pROC package different?
【Posted】: 2020-11-23 11:33:31
【Question description】:

I am using the pROC package to calculate the specificity and its 95% CI at the "best" threshold. My code is as follows:

library(pROC)
data(aSAH)
myroc <- roc(aSAH$outcome, aSAH$s100b)
ci.thresholds(myroc, thresholds = "best")

95% CI (2000 stratified bootstrap replicates):
 thresholds sp.low sp.median sp.high se.low se.median se.high
      0.205 0.7083    0.8056  0.8889 0.4878    0.6341  0.7805

The values I get from the ci.coords function are:

ci.coords(myroc, x = "best", ret = c("specificity"))
95% CI (2000 stratified bootstrap replicates):
 threshold specificity.low specificity.median specificity.high
      best          0.6663             0.8194           0.9865

And the values from the ci.thresholds function are:

ci.thresholds(myroc)
95% CI (2000 stratified bootstrap replicates):
 thresholds  sp.low sp.median sp.high se.low se.median se.high
       -Inf 0.00000    0.0000  0.0000 1.0000    1.0000  1.0000
      0.065 0.06944    0.1389  0.2222 0.9268    0.9756  1.0000
      0.075 0.12500    0.2222  0.3194 0.8049    0.9024  0.9756
      0.085 0.19440    0.3056  0.4167 0.7805    0.8780  0.9756
      0.095 0.27780    0.3889  0.5000 0.7073    0.8293  0.9268
      0.105 0.37500    0.4861  0.5972 0.6579    0.7805  0.9024
      0.115 0.43060    0.5417  0.6528 0.6098    0.7561  0.8780
      0.135 0.47220    0.5833  0.6944 0.5366    0.6829  0.8293
      0.155 0.58330    0.6944  0.7917 0.5122    0.6585  0.8049
      0.205 0.70830    0.8056  0.8889 0.4878    0.6341  0.7805
      0.245 0.72220    0.8194  0.9028 0.4390    0.5854  0.7317
      0.290 0.75000    0.8333  0.9167 0.3659    0.5122  0.6585
      0.325 0.76390    0.8472  0.9306 0.3171    0.4634  0.6098
      0.345 0.79170    0.8750  0.9444 0.2927    0.4390  0.5854
      0.395 0.81910    0.8889  0.9583 0.2683    0.4146  0.5610
      0.435 0.83330    0.9028  0.9583 0.2439    0.3902  0.5366
      0.475 0.90280    0.9583  1.0000 0.1951    0.3415  0.4878
      0.485 0.93060    0.9722  1.0000 0.1707    0.3171  0.4634
      0.510 1.00000    1.0000  1.0000 0.1707    0.2927  0.4390

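At threshold 0.205, the specificity is 0.8056 (from ci.thresholds(myroc, thresholds = "best")), but ci.coords(myroc, x = "best", ret = c("specificity")) gives 0.8194, which corresponds to threshold 0.245 in the table above. Why do the different functions give different thresholds?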

Also, ci.coords(myroc, x = "best", ret = c("specificity")) gives a specificity of 0.8194 with a 95% CI of 0.6806-0.9861, whereas ci.thresholds(myroc) gives 0.8194 (at threshold 0.245) with a 95% CI of 0.7222-0.9028.

Update:

> coords(myroc, x = "best", ret="all", transpose = FALSE)
          threshold specificity sensitivity  accuracy tn tp fn fp       npv  ppv  fdr       fpr       tpr       tnr
threshold     0.205   0.8055556   0.6341463 0.7433628 58 26 15 14 0.7945205 0.65 0.35 0.1944444 0.6341463 0.8055556
                fnr 1-specificity 1-sensitivity 1-accuracy     1-npv 1-ppv precision    recall   youden
threshold 0.3658537     0.1944444     0.3658537  0.2566372 0.2054795  0.35      0.65 0.6341463 1.439702
          closest.topleft
threshold       0.1716575



> ci.coords(myroc, x = "best", ret = "all", transpose = TRUE)
95% CI (2000 stratified bootstrap replicates):
     threshold threshold.low threshold.median threshold.high specificity.low specificity.median specificity.high
best      best          0.12            0.205           0.51          0.6663             0.8194                1
     sensitivity.low sensitivity.median sensitivity.high accuracy.low accuracy.median accuracy.high tn.low tn.median
best          0.3902             0.6341           0.8049       0.6637          0.7522         0.823  47.98        59
     tn.high tp.low tp.median tp.high fn.low fn.median fn.high fp.low fp.median fp.high npv.low npv.median npv.high
best      72     16        26      33      8        15      25      0        13   24.02  0.7273     0.7973   0.8732
     ppv.low ppv.median ppv.high fdr.low fdr.median fdr.high fpr.low fpr.median fpr.high tpr.low tpr.median tpr.high
best  0.5366     0.6667        1       0     0.3333   0.4634       0     0.1806   0.3337  0.3902     0.6341   0.8049
     tnr.low tnr.median tnr.high fnr.low fnr.median fnr.high 1-specificity.low 1-specificity.median 1-specificity.high
best  0.6663     0.8194        1  0.1951     0.3659   0.6098                 0               0.1806             0.3337
     1-sensitivity.low 1-sensitivity.median 1-sensitivity.high 1-accuracy.low 1-accuracy.median 1-accuracy.high
best            0.1951               0.3659             0.6098          0.177            0.2478          0.3363
     1-npv.low 1-npv.median 1-npv.high 1-ppv.low 1-ppv.median 1-ppv.high precision.low precision.median precision.high
best    0.1268       0.2027     0.2727         0       0.3333     0.4634        0.5366           0.6667              1
     recall.low recall.median recall.high youden.low youden.median youden.high closest.topleft.low
best     0.3902        0.6341      0.8049      1.279         1.447        1.61             0.08148
     closest.topleft.median closest.topleft.high
best                 0.1717               0.4021

coords and ci.coords give specificities of 0.8055556 and 0.8194 respectively, and several of the other results above differ as well.
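The coords values can be checked by hand from the confusion-matrix counts in the same output (tn = 58, fp = 14, tp = 26, fn = 15 at threshold 0.205). A minimal base R sketch of that arithmetic follows; it assumes, as the printed numbers suggest, that the youden column in this output is simply sensitivity + specificity. These are single values computed on the full data set, whereas the ci.coords columns are the low/median/high of bootstrap replicates.

```r
# Counts copied from the coords() output above (threshold 0.205, full data)
tn <- 58; fp <- 14; tp <- 26; fn <- 15

specificity <- tn / (tn + fp)                    # 58 / 72
sensitivity <- tp / (tp + fn)                    # 26 / 41
accuracy    <- (tn + tp) / (tn + fp + tp + fn)   # 84 / 113
# Assumption: "youden" in this output equals sensitivity + specificity
youden      <- sensitivity + specificity

round(c(specificity = specificity, sensitivity = sensitivity,
        accuracy = accuracy, youden = youden), 7)
```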

【Comments】:

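ci.coords and ci.thresholds run 2000 stratified bootstrap replicates, so you should expect different values on each run: the bootstrap is a resampling method based on the random number generator (RNG). Call set.seed() to get reproducible results.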
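Thanks for your answer. I changed my code to call set.seed(), but some of the values obtained from the different functions are still different (the code and results are in the update to my post).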
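See the 3rd paragraph of the Details section of ?pROC::ci.thresholds. These functions don't compute (exactly) the same thing, so their results are expected to differ (though not by much).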
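I noticed that ci.thresholds computes the confidence intervals (CI) of sensitivity and specificity at the thresholds given in its arguments, while ci.coords computes the CI of ROC curve coordinates using the coords function, which returns the coordinates of the ROC curve at the specified points. But why do the different methods give different results for the best threshold? Why do the coords and ci.coords results differ? And in the end, which function (and which value) should I use?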
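To illustrate the set.seed() advice in the first comment, here is a minimal base R sketch of a percentile bootstrap CI for specificity at a fixed threshold. boot_spec_ci() is a hypothetical helper, not a pROC function, the control values are simulated, and specificity is assumed to be the share of controls strictly below the threshold. Resetting the same seed reproduces the interval exactly; a different seed typically shifts it slightly.

```r
# Hypothetical helper (not part of pROC): percentile bootstrap CI for
# specificity at a fixed threshold. Specificity is taken as the share of
# controls below the threshold, so only the controls are resampled.
boot_spec_ci <- function(controls, threshold, n_boot = 2000) {
  stats <- replicate(n_boot, mean(sample(controls, replace = TRUE) < threshold))
  quantile(stats, probs = c(0.025, 0.5, 0.975), names = FALSE)
}

# Simulated control values (hypothetical data, not aSAH)
set.seed(123)
controls <- rnorm(72)

set.seed(42)
run1 <- boot_spec_ci(controls, threshold = 0.8)
set.seed(42)
run2 <- boot_spec_ci(controls, threshold = 0.8)  # same seed as run1
set.seed(7)
run3 <- boot_spec_ci(controls, threshold = 0.8)  # different seed

identical(run1, run2)  # TRUE: same seed, same resamples
rbind(run1, run2, run3)
```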

Tags: r roc proc-r-package

【Solution 1】:

When you run

ci.coords(myroc, x = "best" [...]

you are effectively computing a confidence interval for the best threshold itself.

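Internally, pROC resamples the data, determines the best threshold on the resampled curve, computes the coordinates at that threshold, and repeats this 2000 times. This is different from fixing the threshold at the best point of the full ROC curve and resampling at that given threshold.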

You can see this if you look at the confidence interval of the threshold:

ci.coords(myroc, x = "best", ret = "all", transpose = TRUE)
95% CI (2000 stratified bootstrap replicates):
     threshold threshold.low threshold.median threshold.high [...]
best      best          0.12            0.205           0.51

See how the "best" threshold varies around 0.205, between 0.12 and 0.51? As a result, all the other coordinates have wider confidence intervals too.

The ci.thresholds function behaves differently: it uses the second option mentioned above and sets the "best" threshold on the full ROC curve:

ci.thresholds(myroc, thresholds = "best")

95% CI (2000 stratified bootstrap replicates):
 thresholds 
      0.205

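See how there is no confidence interval around the threshold? It is set before resampling. You can get the same behavior with ci.coords by setting x to a numeric threshold (the one that happens to be best on the full ROC curve, 0.205 here):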

> ci.coords(myroc, x = 0.205)
95% CI (2000 stratified bootstrap replicates):
      threshold threshold.low threshold.median threshold.high specificity.low specificity.median specificity.high sensitivity.low sensitivity.median sensitivity.high
0.205     0.205         0.205            0.205          0.205          0.7083             0.8056           0.8889          0.4878             0.6341           0.7805

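You can see that the threshold is not resampled (its confidence interval does not vary around the 0.205 value), and the confidence intervals are similar to those obtained with ci.thresholds.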
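The two resampling schemes described in this answer can be contrasted in a small base R sketch on simulated data. It is a simplified illustration, not pROC's implementation: best_threshold() and specificity_at() are hypothetical helpers, candidate thresholds are assumed to be midpoints between sorted unique values, and "best" is assumed to mean the Youden criterion (maximum sensitivity + specificity). In the first scheme the best threshold is re-chosen on every resample, like ci.coords(x = "best"); in the second it is fixed on the full data before resampling, like ci.thresholds(thresholds = "best") or ci.coords with a numeric x.

```r
# Simulated data (hypothetical; not the aSAH data set)
set.seed(1)
controls <- rnorm(72, mean = 0, sd = 1)
cases    <- rnorm(41, mean = 1.2, sd = 1)

# Hypothetical helper: threshold maximizing sensitivity + specificity,
# assuming controls < threshold <= cases. Candidates are midpoints
# between consecutive sorted unique values.
best_threshold <- function(controls, cases) {
  v <- sort(unique(c(controls, cases)))
  candidates <- (head(v, -1) + tail(v, -1)) / 2
  youden <- vapply(candidates,
                   function(thr) mean(cases >= thr) + mean(controls < thr),
                   numeric(1))
  candidates[which.max(youden)]
}

# Hypothetical helper: share of controls below the threshold
specificity_at <- function(controls, thr) mean(controls < thr)

# Scheme 2: threshold chosen once on the full data, before resampling
full_best <- best_threshold(controls, cases)

set.seed(42)
res <- t(replicate(2000, {
  # Stratified bootstrap: controls and cases resampled separately
  ctl <- sample(controls, replace = TRUE)
  cas <- sample(cases, replace = TRUE)
  t_boot <- best_threshold(ctl, cas)  # scheme 1: re-chosen per replicate
  c(reoptimized_threshold = t_boot,
    sp_reoptimized = specificity_at(ctl, t_boot),
    sp_fixed = specificity_at(ctl, full_best))
}))

# 2.5%, 50% and 97.5% percentiles of each column
ci <- apply(res, 2, quantile, probs = c(0.025, 0.5, 0.975))
round(ci, 3)
```

The reoptimized_threshold column changes from replicate to replicate, and that extra source of variability is what tends to widen the sp_reoptimized interval compared with sp_fixed.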

I realize this could be better documented in ?ci.coords, and I will do so in a future release.

【Comments】:

Thanks for your answer. I think I understand the reason for these differences, and I look forward to the new version of the pROC package :)