【Question title】: How to do a GLM when "contrasts can be applied only to factors with 2 or more levels"?
【Posted】: 2018-10-22 03:53:54
【Question】:

I want to fit a regression in R with glm, but I get a contrasts error. Is there a way to make this work?

mydf <- data.frame(Group=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12),
                   WL=rep(c(1,0),12), 
                   New.Runner=c("N","N","N","N","N","N","Y","N","N","N","N","N","N","Y","N","N","N","Y","N","N","N","N","N","Y"), 
                   Last.Run=c(1,5,2,6,5,4,NA,3,7,2,4,9,8,NA,3,5,1,NA,6,10,7,9,2,NA))

mod <- glm(formula = WL~New.Runner+Last.Run, family = binomial, data = mydf)
#Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
# contrasts can be applied only to factors with 2 or more levels


    Tags: r regression glm


    【Solution 1】:

    Using the debug_contr_error and debug_contr_error2 functions defined in "How to debug “contrasts can be applied only to factors with 2 or more levels” error?", we can easily locate the problem: only a single level is left in the variable New.Runner.

    info <- debug_contr_error2(WL ~ New.Runner + Last.Run, mydf)
    
    info[c(2, 3)]
    #$nlevels
    #New.Runner 
    #         1 
    #
    #$levels
    #$levels$New.Runner
    #[1] "N"
    
    ## the data frame that is actually used by `glm`
    dat <- info$mf
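If you do not have those helper functions at hand, the same diagnosis can be approximated in base R. This is only a sketch: it assumes the default `na.action` option (`na.omit`), which is what `glm` uses unless told otherwise, and the name `mf` is just illustrative.

```r
# Rebuild the data frame from the question.
mydf <- data.frame(
  Group = rep(1:12, each = 2),
  WL = rep(c(1, 0), 12),
  New.Runner = c("N","N","N","N","N","N","Y","N","N","N","N","N",
                 "N","Y","N","N","N","Y","N","N","N","N","N","Y"),
  Last.Run = c(1,5,2,6,5,4,NA,3,7,2,4,9,8,NA,3,5,1,NA,6,10,7,9,2,NA)
)

# model.frame() drops incomplete rows under the default na.action,
# so it shows the rows that glm() would actually use.
mf <- model.frame(WL ~ New.Runner + Last.Run, data = mydf)

nrow(mydf)   # rows before NA removal
nrow(mf)     # rows after NA removal

# Every "Y" row has a missing Last.Run, so only "N" survives.
table(mydf$New.Runner, is.na(mydf$Last.Run))
unique(as.character(mf$New.Runner))
```

The cross-tabulation makes the cause visible: `Last.Run` is missing exactly for the new runners, so removing incomplete cases removes every "Y".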
    

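    We cannot apply contrasts to a single-level factor, because any kind of contrasts reduces the number of levels by 1. Since 1 - 1 = 0, this variable would be dropped from the model matrix.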
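A quick way to see the "levels minus one" rule is to count the columns of a contrast matrix. The snippet below is a small illustration using `contr.treatment`; the variable names `k` and `n_cols` are made up for the example.

```r
# For a factor with k levels, a contrast matrix has k rows and k - 1 columns.
k <- 2:5
n_cols <- sapply(k, function(n) ncol(contr.treatment(n)))

# One row per k: number of levels and number of contrast columns.
data.frame(levels = k, contrast_columns = n_cols)
```

Extrapolating the pattern to one level gives zero columns, which is why R refuses to build contrasts in that case.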

    Well then, can we simply ask for no contrasts to be applied to a single-level factor? No. All contrasts methods forbid this:

    contr.helmert(n = 1, contrasts = FALSE)
    #Error in contr.helmert(n = 1, contrasts = FALSE) : 
    #  not enough degrees of freedom to define contrasts
    
    contr.poly(n = 1, contrasts = FALSE)
    #Error in contr.poly(n = 1, contrasts = FALSE) : 
    #  contrasts not defined for 0 degrees of freedom
    
    contr.sum(n = 1, contrasts = FALSE)
    #Error in contr.sum(n = 1, contrasts = FALSE) : 
    #  not enough degrees of freedom to define contrasts
    
    contr.treatment(n = 1, contrasts = FALSE)
    #Error in contr.treatment(n = 1, contrasts = FALSE) : 
    #  not enough degrees of freedom to define contrasts
    
    contr.SAS(n = 1, contrasts = FALSE)
    #Error in contr.treatment(n, base = if (is.numeric(n) && length(n) == 1L) n else length(n),  : 
    #  not enough degrees of freedom to define contrasts
    

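    In fact, if you think it through, you will conclude that without contrasts a single-level factor is just a dummy variable of all 1s, i.e., the intercept. So you can definitely do the following: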

    dat$New.Runner <- 1    ## set it to 1, as if no contrasts were applied
    
    mod <- glm(formula = WL ~ New.Runner + Last.Run, family = binomial, data = dat)
    #(Intercept)   New.Runner     Last.Run  
    #     1.4582           NA      -0.2507
    

    You get an NA coefficient for New.Runner because of rank-deficiency. In fact, applying contrasts is a fundamental way to avoid rank-deficiency. It is just that when a factor has only one level, applying contrasts becomes a paradox.
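To illustrate how contrasts avoid rank-deficiency, here is a sketch with a made-up two-level factor `g` (not from the question). It compares the model matrix built with full dummy coding against the one built with the default treatment contrasts, using `qr()` to get the numerical rank.

```r
# Hypothetical two-level factor, for illustration only.
g <- factor(c("A", "A", "B", "B", "A", "B"))

# Full dummy coding: intercept plus one indicator column per level.
X_full <- model.matrix(~ g,
                       contrasts.arg = list(g = contrasts(g, contrasts = FALSE)))

# Default treatment contrasts: intercept plus one column for level "B".
X_trt <- model.matrix(~ g)

# Columns versus rank: the indicator columns of X_full add up to the
# intercept column, so X_full has one more column than its rank.
c(columns = ncol(X_full), rank = qr(X_full)$rank)
c(columns = ncol(X_trt),  rank = qr(X_trt)$rank)
```

With contrasts applied, the number of columns equals the rank, so every coefficient can be estimated.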

    Let's also take a look at the model matrix:

    model.matrix(mod)
    #   (Intercept) New.Runner Last.Run
    #1            1          1        1
    #2            1          1        5
    #3            1          1        2
    #4            1          1        6
    #5            1          1        5
    #6            1          1        4
    #8            1          1        3
    #9            1          1        7
    #10           1          1        2
    #11           1          1        4
    #12           1          1        9
    #13           1          1        8
    #15           1          1        3
    #16           1          1        5
    #17           1          1        1
    #19           1          1        6
    #20           1          1       10
    #21           1          1        7
    #22           1          1        9
    #23           1          1        2
    

    (Intercept) and New.Runner have identical columns, so only one of them can be estimated. If you want to estimate New.Runner, drop the intercept:

    glm(formula = WL ~ 0 + New.Runner + Last.Run, family = binomial, data = dat)
    #New.Runner    Last.Run  
    #    1.4582     -0.2507 
    

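    Make sure you digest the rank-deficiency problem thoroughly. If you have more than one single-level factor and replace all of them by 1, dropping the single intercept still leaves the model rank-deficient.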

    dat$foo.factor <- 1
    glm(formula = WL ~ 0 + New.Runner + foo.factor + Last.Run, family = binomial, data = dat)
    #New.Runner  foo.factor    Last.Run  
    #    1.4582          NA     -0.2507 
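The same point can be seen from the design matrix alone. In this small sketch the values of `x` are arbitrary and the column names simply mirror the example above.

```r
# Two constant columns plus one genuine covariate (arbitrary values).
x <- c(1, 5, 2, 6, 5, 4)
X <- cbind(New.Runner = 1, foo.factor = 1, Last.Run = x)

# Three columns, but the two constant columns are identical,
# so the rank is lower than the column count.
c(columns = ncol(X), rank = qr(X)$rank)
```

Only as many coefficients as the rank can be estimated, so one of the constant columns necessarily gets NA.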
    
