【Question title】: Broom package - Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases
【Posted】: 2020-06-09 03:55:31
【Question】:

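I have a data frame of student attributes and test scores, and I am trying to fit a linear model for each grade level (1 through 12). I am using the broom package to create the models for each grade efficiently. Below is a simplified example dataset and the code I am using.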

#start df creation 

grade <- rep(1:12, each = 40)
attendance_rate <- round(runif(480, min=25, max=100), 1)
test_growth <- round(runif(480, min = -12, max = 38))
binary_flag <- round(runif(480, min = 0, max = 1))
score <- round(runif(480, min = 92, max = 370))
survey_response <- round(runif(480, min = 1, max = 4))

df <- data.frame(grade, attendance_rate, test_growth, binary_flag, score, survey_response) 

df$survey_response[df$grade == 1] <- NA

# end df creation

#create train test split for each grade level
set.seed(123)

df_train <- lapply(split(seq_len(nrow(df)), df$grade), function(x) sample(x, floor(.6*length(x))))
df_test <- mapply(function(x,y) setdiff(x,y), x = split(seq_len(nrow(df)), df$grade), y = df_train, SIMPLIFY = FALSE)

df_train <- df[unlist(df_train),]

df_test <- df[unlist(df_test),]



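# create models (needs dplyr, tidyr, purrr, and broom loaded)
library(dplyr)
library(tidyr)
library(purrr)
library(broom)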
models_nested <- df_train %>%
  group_by(grade) %>% nest() %>% 
  mutate(
    fit = map(data, ~ lm(score ~ attendance_rate + test_growth + binary_flag + survey_response, data = .x)),
    tidied = map(fit, tidy),
    augmented = map(fit, augment),
    glanced = map(fit, glance)
  )

Unfortunately, when I try to run the code block beginning with models_nested, I get the following error:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

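I understand that this happens because every grade 1 student has an NA value in the survey_response column. Short of running a separate regression for grade 1 with the survey_response column/variable dropped entirely, I'm not sure how to solve this. Is there a way to tell the lm function to simply ignore a variable if it contains only null values for a particular grade subset? I obviously want to keep the variable in the regressions for the other grade-level models.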
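The diagnosis above can be checked in isolation. The following is a minimal sketch, assuming a made-up three-column data frame (`d`, `y`, `x`, and `z` are illustrative names, not part of the original data): with the default NA handling, `lm` drops every row that has an NA in any model variable, so an all-NA predictor leaves zero rows to fit.

```r
# Toy data: z is NA for every row, like survey_response for grade 1.
d <- data.frame(y = c(1, 2, 3, 4), x = c(2, 1, 4, 3), z = NA_real_)

# Capture the error message instead of stopping the script.
res <- tryCatch(
  lm(y ~ x + z, data = d),
  error = function(e) conditionMessage(e)
)
print(res)

# Dropping the all-NA predictor lets the fit go through.
ok <- lm(y ~ x, data = d)
print(coef(ok))
```

In an English locale the captured message should match the one reported above.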

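I have done my best to make this question clear, but I'm happy to clarify in the comments if necessary.

Edit 6/9/2020: I don't want NA returned for the grade 1 model; I just want the grade 1 linear model to run without the survey_response column. I want the survey_response column included in all the other grade-level models.

Hoping someone can help!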

【Question comments】:

    Tags: r linear-regression broom


    【Solution 1】:

    We can check whether survey_response is all NA within each grade and choose the model formula accordingly.

    library(broom)
    library(dplyr)
    library(tidyr)
    library(purrr)
    
    df_train %>%
      group_by(grade) %>%
      nest() %>%
      mutate(
        # drop survey_response from the formula when it is all NA in this grade
        fit = map(data, ~ if (all(is.na(.x$survey_response))) {
          lm(score ~ attendance_rate + test_growth + binary_flag, data = .x)
        } else {
          lm(score ~ attendance_rate + test_growth + binary_flag + survey_response, data = .x)
        }),
        tidied = map(fit, tidy),
        augmented = map(fit, augment),
        glanced = map(fit, glance)
      )
    
    
    #   grade data              fit    tidied           augmented          glanced          
    #   <int> <list>            <list> <list>           <list>             <list>           
    # 1     1 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
    # 2     2 <tibble [24 × 5]> <lm>   <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 3     3 <tibble [24 × 5]> <lm>   <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 4     4 <tibble [24 × 5]> <lm>   <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 5     5 <tibble [24 × 5]> <lm>   <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 6     6 <tibble [24 × 5]> <lm>   <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 7     7 <tibble [24 × 5]> <lm>   <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 8     8 <tibble [24 × 5]> <lm>   <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 9     9 <tibble [24 × 5]> <lm>   <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    #10    10 <tibble [24 × 5]> <lm>   <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    #11    11 <tibble [24 × 5]> <lm>   <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    #12    12 <tibble [24 × 5]> <lm>   <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    

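    【Comments】:

    • Thank you. I realize I could have made my question clearer, and I apologize. I don't want to simply return NA for grade 1; I want to run an alternative linear model that excludes the survey_response column. So I would like the following model for grade 1 only: lm(score ~ attendance_rate + test_growth + binary_flag, data = .x). Note that the survey_response variable has been dropped, but only for grade 1. Is this possible? Thanks again for your time and input.
    • @rachael_learns But it won't always be only survey_response that gives you an error. Sometimes test_growth could give you an error too, right? How would you know?
    • It is always survey_response that causes the problem. I'm working with education data, and it is simply a feature of this particular dataset that grade 1 children do not take this survey, while children in all other grades do.
    • Yes, this works! Thank you so much. I hope to be as good at R as you are someday! :D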
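The concern raised in the comments (what if some other predictor is all NA in a group?) could be handled by building the formula from whichever predictors have data. The following is a rough sketch in base R, not part of the original answer; `fit_available` and the `toy` data frame are hypothetical names introduced for illustration. It uses `reformulate()` from the stats package to assemble the formula.

```r
# Hypothetical helper: fit `response` on every predictor that has at least
# one non-NA value in `data`; fall back to an intercept-only model if none do.
fit_available <- function(data, response, predictors) {
  has_data <- vapply(predictors, function(p) !all(is.na(data[[p]])), logical(1))
  usable <- predictors[has_data]
  if (length(usable) == 0) usable <- "1"
  lm(reformulate(usable, response = response), data = data)
}

# Toy group in which survey_response is entirely missing.
set.seed(1)
toy <- data.frame(
  score = rnorm(10),
  attendance_rate = runif(10),
  survey_response = NA_real_
)

fit <- fit_available(toy, "score", c("attendance_rate", "survey_response"))
print(names(coef(fit)))
```

Inside the nested pipeline, the same helper could presumably be called as `map(data, ~ fit_available(.x, "score", c("attendance_rate", "test_growth", "binary_flag", "survey_response")))`. Note that a predictor with only some values missing is still kept, so rows with NA in it are dropped by `lm` as usual.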
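    【Solution 2】:

    We can use possibly from purrr, which returns a fallback value (here NA) instead of raising an error when a fit fails.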

    library(broom)
    library(dplyr)
    library(tidyr)
    library(purrr)
    
    # wrap lm so that a failed fit returns NA instead of raising an error
    poslm <- possibly(lm, otherwise = NA)
    df_train %>%
      group_by(grade) %>%
      nest() %>%
      mutate(
        fit = map(data, ~ poslm(score ~ attendance_rate + test_growth +
                                  binary_flag + survey_response, data = .x)),
        tidied = map(fit, possibly(tidy, otherwise = NA)),
        augmented = map(fit, possibly(augment, otherwise = NA)),
        glanced = map(fit, possibly(glance, otherwise = NA))
      )
    # A tibble: 12 x 6
    # Groups:   grade [12]
    #   grade data              fit       tidied           augmented          glanced          
    #   <int> <list>            <list>    <list>           <list>             <list>           
    # 1     1 <tibble [24 × 5]> <lgl [1]> <lgl [1]>        <lgl [1]>          <lgl [1]>        
    # 2     2 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 3     3 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 4     4 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 5     5 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 6     6 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 7     7 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 8     8 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    # 9     9 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    #10    10 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    #11    11 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    #12    12 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
    

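    【Comments】:

    • Thanks for the reply. I realize my question wasn't clear enough. I don't want NA returned for the grade 1 model; I just want the grade 1 linear model to run without the survey_response column. I want the survey_response column included in all the other grade-level models. Is this possible?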
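One way to keep the error-handling idea while still getting a grade 1 model is to fall back to a reduced formula when the full fit fails. The sketch below swaps in base R's `tryCatch` rather than `purrr::possibly`, and uses a shortened formula and made-up data; `fit_with_fallback`, `g1`, and `g2` are hypothetical names, not from the original answer.

```r
# Hypothetical helper: try the full model first; if it errors,
# refit without survey_response.
fit_with_fallback <- function(data) {
  tryCatch(
    lm(score ~ attendance_rate + survey_response, data = data),
    error = function(e) lm(score ~ attendance_rate, data = data)
  )
}

set.seed(42)
# g1 mimics grade 1 (survey_response all NA); g2 mimics any other grade.
g1 <- data.frame(score = rnorm(8), attendance_rate = runif(8),
                 survey_response = NA_real_)
g2 <- data.frame(score = rnorm(8), attendance_rate = runif(8),
                 survey_response = sample(1:4, 8, replace = TRUE))

fits <- lapply(list(grade1 = g1, grade2 = g2), fit_with_fallback)
n_coef <- vapply(fits, function(f) length(coef(f)), integer(1))
print(n_coef)
```

A caveat on this design: the fallback fires on any error, not only the all-NA case, so it can hide unrelated problems. The explicit `all(is.na(...))` check in Solution 1 is more transparent when the cause is known in advance.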