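【Question title】: Finding model (returned from for loops) with lowest AIC in R
【Posted】: 2017-12-28 11:59:28
【Question description】:

I am trying to find the model with the lowest AIC. The models come from two nested for loops that build every combination of columns. I cannot get the function to return the model with the lowest AIC. The code below shows where I am stuck: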

rm(list = ls())

data <- iris

data <- data[data$Species %in% c("setosa", "virginica"),]

data$Species = ifelse(data$Species == 'virginica', 0, 1)

mod_headers <- names(data[1:ncol(data)-1])

f <- function(mod_headers){
    for(i in 1:length(mod_headers)){
    tab <- combn(mod_headers,i)
    for(j in 1:ncol(tab)){
      tab_new <- c(tab[,j])
      mod_tab_new <- c(tab_new, "Species")
      model <- glm(Species ~., data=data[c(mod_tab_new)], family = binomial(link = "logit"))
    }
    }
  best_model <- model[which(AIC(model)[order(AIC(model))][1])]
  print(best_model)
}

f(mod_headers)

Any suggestions? Thanks!

【Question comments】:

Tags: r for-loop glm model-comparison


【Solution 1】:

I replaced your for loops with vectorized (map/lapply) alternatives:

library(tidyverse)
library(iterators)
# Column names you want to use in glm model, saved as list
whichcols <- Reduce("c", map(1:length(mod_headers), ~lapply(iter(combn(mod_headers,.x), by="col"),function(y) c(y))))

# glm model results using selected column names, saved as list
models <- map(1:length(whichcols), ~glm(Species ~., data=data[c(whichcols[[.x]], "Species")], family = binomial(link = "logit")))

# selects model with lowest AIC
best <- models[[which.min(sapply(1:length(models),function(x)AIC(models[[x]])))]]

Output

Call:  glm(formula = Species ~ ., family = binomial(link = "logit"), 
data = data[c(whichcols[[.x]], "Species")])

Coefficients:
 (Intercept)  Petal.Length  
       55.40        -17.17  

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      138.6 
Residual Deviance: 1.208e-09    AIC: 4

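【Comments】:

    【Solution 2】:

    glm() uses an iteratively reweighted least squares algorithm. Here the algorithm hits its maximum number of iterations before converging; raising the maxit control parameter helps in your case: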

     glm(Species ~., data=data[mod_tab_new], family = binomial(link = "logit"), control = list(maxit = 50))
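
One way to check whether the iteration limit matters is to inspect the `converged` and `iter` components of the fitted object. The sketch below rebuilds the question's data and fits the full model twice. Suppressing the warnings is my own choice for readability, not part of the answer: setosa and virginica are perfectly separated by the petal measurements, so glm() is expected to warn about fitted probabilities of 0 or 1.

```r
# Rebuild the question's data: two iris species, binary response
data <- iris[iris$Species %in% c("setosa", "virginica"), ]
data$Species <- ifelse(data$Species == "virginica", 0, 1)

# Default control settings (maxit = 25)
fit_default <- suppressWarnings(
  glm(Species ~ ., data = data, family = binomial(link = "logit"))
)

# Same model with a higher iteration limit
fit_more <- suppressWarnings(
  glm(Species ~ ., data = data, family = binomial(link = "logit"),
      control = glm.control(maxit = 50))
)

# `converged` reports whether IWLS met its tolerance,
# `iter` reports how many iterations were actually used
print(c(default = fit_default$converged, maxit50 = fit_more$converged))
print(c(default = fit_default$iter, maxit50 = fit_more$iter))
```

If `converged` is FALSE for a fit, its AIC comes from an unfinished optimisation and is not a reliable basis for comparing models.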
    

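    The call to which caused a second problem. I replaced it with an if after each model fit that compares the new AIC with the lowest AIC seen so far. That said, I think there are better solutions than this for-loop approach.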

    f <- function(mod_headers){
      lowest_aic <- Inf     # added
      best_model <- NULL    # added
    
      for(i in 1:length(mod_headers)){
        tab <- combn(mod_headers,i)
        for(j in 1:ncol(tab)){
          tab_new <- tab[, j]
          mod_tab_new <- c(tab_new, "Species")
          model <- glm(Species ~., data=data[mod_tab_new], family = binomial(link = "logit"), control = list(maxit = 50))
          if(AIC(model) < lowest_aic){ # added
            lowest_aic <- AIC(model)   # added
            best_model <- model        # added
          }
        }
      }
      return(best_model)
    }
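
To illustrate why the original which() call fails and what the if-based replacement does, here is a minimal sketch on a made-up vector of AIC values. The names `model_a`, `model_b` and `model_c` and their numbers are hypothetical, chosen only for the demonstration.

```r
# Hypothetical AIC values for three candidate models (illustrative only)
aics <- c(model_a = 12.3, model_b = 4.0, model_c = 7.5)

# which() expects a logical vector, so passing a number raises an error;
# tryCatch captures the error message instead of stopping the script
res <- tryCatch(which(aics[1]), error = function(e) conditionMessage(e))
print(res)

# which.min() returns the position of the smallest value directly
print(which.min(aics))

# The running-minimum pattern used in the function above
lowest <- Inf
best <- NA_character_
for (nm in names(aics)) {
  if (aics[[nm]] < lowest) {
    lowest <- aics[[nm]]
    best <- nm
  }
}
print(best)
```

The running minimum keeps only one model in memory at a time, which may matter when the number of column combinations is large.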
    

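    【Comments】:

    • This actually works, and seems to be the most efficient way to find the best model with a bunch of columns. Thanks!
    【Solution 3】:

    Keep your loops and simply store every model in a list. Then compute the AIC of each model, and finally return the model with the smallest AIC.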

    f <- function(mod_headers) {
    
      models <- list()
      k <- 1
      for (i in 1:length(mod_headers)) {
        tab <- combn(mod_headers, i)
        for(j in 1:ncol(tab)) {
          mod_tab_new <- c(tab[, j], "Species")
          models[[k]] <- glm(Species ~ ., data = data[mod_tab_new], 
                             family = binomial(link = "logit"))
          k <- k + 1
        }
      }
    
      models[[which.min(sapply(models, AIC))]]
    }
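
Because every model is kept in the list, the same idea extends to a ranked table of all candidates rather than only the winner. The following is a sketch under my own assumptions: the names `subsets` and `aic_table`, the use of `reformulate()` to build formulas, and the warning suppression are not part of the answer above.

```r
# Rebuild the question's data: two iris species, binary response
data <- iris[iris$Species %in% c("setosa", "virginica"), ]
data$Species <- ifelse(data$Species == "virginica", 0, 1)
mod_headers <- setdiff(names(data), "Species")

# All non-empty predictor subsets as a flat list of character vectors
subsets <- unlist(
  lapply(seq_along(mod_headers),
         function(i) combn(mod_headers, i, simplify = FALSE)),
  recursive = FALSE
)

# Fit one logistic regression per subset; warnings about perfectly
# separated classes are suppressed here for readability
models <- lapply(subsets, function(cols) {
  suppressWarnings(
    glm(reformulate(cols, response = "Species"),
        data = data, family = binomial(link = "logit"))
  )
})

# One row per model, sorted from lowest to highest AIC
aic_table <- data.frame(
  predictors = vapply(subsets, paste, character(1), collapse = " + "),
  AIC = vapply(models, AIC, numeric(1))
)
aic_table <- aic_table[order(aic_table$AIC), ]
print(head(aic_table))

# The best model is the same one which.min() would pick
best_model <- models[[which.min(vapply(models, AIC, numeric(1)))]]
```

Looking at the whole table helps spot near-ties, where several models have almost the same AIC and the "best" one is not clearly better than the runners-up.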
    

    【Comments】:
