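【Posted】: 2019-04-07 19:15:56
【Question】:
I am trying to understand how to calculate the true positive rate when the model's FPR is 0.5, and then produce a ROC curve. But I am clearly having some trouble with the code...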
> library(nycflights13)
> late_arrival<- flights$arr_delay>50
> summary(late_arrival)
   Mode   FALSE    TRUE    NA's
logical  275847   51499    9430
> late_arrival.lr <- glm(late_arrival~carrier+dep_delay+month+year, data=flights, family='binomial')
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
> summary(late_arrival.lr)
Call:
glm(formula = late_arrival ~ carrier + dep_delay + month + year,
family = "binomial", data = flights)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.0972  -0.2445  -0.1920  -0.1570   3.9217
Coefficients: (1 not defined because of singularities)
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.9122786  0.0430834 -90.807  < 2e-16 ***
carrierAA    0.2174443  0.0485813   4.476 7.61e-06 ***
carrierAS   -0.3549507  0.2540636  -1.397  0.16239
carrierB6    0.5142442  0.0428985  11.987  < 2e-16 ***
carrierDL    0.2228855  0.0449833   4.955 7.24e-07 ***
carrierEV    0.3230899  0.0431394   7.489 6.92e-14 ***
carrierF9    1.1544420  0.1444764   7.991 1.34e-15 ***
carrierFL    0.7190162  0.0812251   8.852  < 2e-16 ***
carrierHA   -0.2276957  0.4115495  -0.553  0.58008
carrierMQ    0.8086500  0.0475393  17.010  < 2e-16 ***
carrierOO    1.0138755  0.9037621   1.122  0.26193
carrierUA    0.0919203  0.0431571   2.130  0.03318 *
carrierUS    0.6063731  0.0525429  11.541  < 2e-16 ***
carrierVX   -0.0485832  0.0852892  -0.570  0.56893
carrierWN   -0.1551747  0.0574042  -2.703  0.00687 **
carrierYV    0.5737826  0.1999578   2.870  0.00411 **
dep_delay    0.1000536  0.0004308 232.263  < 2e-16 ***
month        0.0009126  0.0024337   0.375  0.70767
year                NA         NA      NA       NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 284924 on 327345 degrees of freedom
Residual deviance: 108708 on 327328 degrees of freedom
AIC: 108744
Number of Fisher Scoring iterations: 7
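It keeps showing me this warning: (Dispersion parameter for binomial family taken to be 1)
How do I actually make predictions from here? I know I somehow have to produce the predicted values and the actual values in order to get the true positive rate. Can anyone guide me? Thank you very much!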
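The computation being asked about (predicted probabilities against actual outcomes, then the TPR at a chosen FPR) might be sketched in base R roughly as follows. This is only a minimal sketch under stated assumptions: `roc_points()` and `tpr_at_fpr()` are hypothetical helper names, not functions from any package, and the demo uses simulated data so that it runs without `nycflights13`. The commented line at the end shows how it would presumably apply to `late_arrival.lr`, assuming the default `na.action`, under which `glm` drops rows containing `NA` so that `fitted()` and `$y` stay aligned.

```r
# Hypothetical helper: ROC points from predicted probabilities and 0/1 (or logical) outcomes.
roc_points <- function(prob, actual) {
  # Sort cases from highest to lowest predicted probability.
  ord <- order(prob, decreasing = TRUE)
  prob <- prob[ord]
  actual <- as.logical(actual[ord])
  # Cumulative true and false positives as the threshold is lowered past each case.
  tp <- cumsum(actual)
  fp <- cumsum(!actual)
  # Keep one point per distinct probability (the last row of each run of ties).
  keep <- !duplicated(prob, fromLast = TRUE)
  data.frame(
    threshold = c(Inf, prob[keep]),
    fpr = c(0, fp[keep] / sum(!actual)),
    tpr = c(0, tp[keep] / sum(actual))
  )
}

# Hypothetical helper: TPR at a target FPR, by linear interpolation along the curve.
# ties = max takes the highest TPR where several points share the same FPR.
tpr_at_fpr <- function(roc, target_fpr) {
  approx(roc$fpr, roc$tpr, xout = target_fpr, ties = max)$y
}

# Demo on simulated data (an assumption for illustration, not the flights data).
set.seed(1)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 2 * x))
fit <- glm(y ~ x, family = "binomial")

# fitted() gives predicted probabilities; fit$y holds the outcomes actually used in the fit.
roc <- roc_points(fitted(fit), fit$y)
tpr_half <- tpr_at_fpr(roc, 0.5)
print(tpr_half)

plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # reference line for a random classifier

# For the model in the question (assumption: default na.action):
# roc <- roc_points(fitted(late_arrival.lr), late_arrival.lr$y)
```

Taking both the probabilities and the outcomes from the fitted model object sidesteps the length mismatch that would otherwise arise from the 9430 `NA` values in `late_arrival`.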
【Comments】: