Using function() to generate linear regressions for multiple subsets of a df

【Question title】: use of function() to generate linear regression for multiple subsets of df
【Posted】: 2021-07-22 14:03:57
【Question】:

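As a newcomer to R, I am struggling to define a function and run it on multiple subsets. (I have been trying to solve this for three days, and even with the help of other threads I cannot get the hang of it...)

Setup: I need to compute linear regressions of the bacterial kill rate over time for several antibiotics at different concentrations ("MIC"). Since loops and map are too advanced for me, I wanted to solve this by applying a function to each subset. In the end, I would like a data frame showing antibiotic, MIC, coefficient, and p-value.

Problem: both of the approaches below produce error messages that I cannot fully understand.

I hope you can share your coding wisdom with me!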

library(tidyverse)
library(ggplot2)
library(drc)

# calculating linear regressions
## exclude bacteria ("CFU") <= 50, which is below the detection limit of the assay

rawdata2 <- structure(
        list(
                antibiotic = c(
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX"
                ),
                MIC = structure(
                        c(1L, 1L,
                          1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                        .Label = c("0", "0.25", "0.5",
                                   "1", "2", "4"),
                        class = "factor"
                ),
                minutes = c(0L, 10L, 30L,
                            60L, 120L, 300L, 10L, 30L, 60L, 120L),
                CFU_mean = c(
                        1044444.44444444,
                        1050000,
                        1141666.66666667,
                        2425000,
                        16916666.6666667,
                        157500000,
                        883333.333333333,
                        1175000,
                        1758333.33333333,
                        12408333.3333333
                )
        ),
        row.names = c(NA,-10L),
        groups = structure(
                list(
                        antibiotic = c("CHX",
                                       "CHX"),
                        MIC = structure(
                                1:2,
                                .Label = c("0", "0.25", "0.5", "1",
                                           "2", "4"),
                                class = "factor"
                        ),
                        .rows = structure(
                                list(1:6, 7:10),
                                ptype = integer(0),
                                class = c("vctrs_list_of",
                                          "vctrs_vctr", "list")
                        )
                ),
                row.names = c(NA,-2L),
                class = c("tbl_df",
                          "tbl", "data.frame"),
                .drop = TRUE
        ),
        class = c("grouped_df",
                  "tbl_df", "tbl", "data.frame")
)


regressions <- function(x) {
        model <- lm(CFU ~ minutes, data = .)
        coeff <- coef(model)
        return(coeff)
}

lm_abx_by_MIC <- rawdata2 %>% 
        group_by(antibiotic, MIC) %>% 
        filter(CFU >50) %>% 
        do(regressions(.))

Error in is.data.frame(data) : object '.' not found

> regressions <- function(x) {
+         model <- lm(CFU_mean ~ minutes, data = x)
+         coeff <- coef(model)
+         return(coeff)
+ }
> lm_abx_by_MIC <- rawdata2 %>% 
+         group_by(antibiotic, MIC) %>% 
+         filter(CFU_mean >50) %>% 
+         do(regressions(.))

Error: Results 1, 2, 3, 4, 5, ... must be data frames, not numeric
Run `rlang::last_error()` to see where the error occurred.

【Comments】:

Please provide a sample of your data using dput so that we can help you more efficiently.

Tags: r linear-regression

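【Solution 1】:

I am not sure whether this is exactly what you are looking for. With broom::tidy() we can get a solution like this (although it does not define a function the way you did):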

rawdata2 %>% 
  ungroup() %>% 
  filter(CFU_mean > 50) %>% 
  nest_by(antibiotic, MIC) %>% 
  mutate(model = list(lm(CFU_mean ~ minutes, data = data))) %>% 
  summarise(broom::tidy(model))
#> # A tibble: 4 x 7
#> # Groups:   antibiotic, MIC [2]
#>  antibiotic MIC   term          estimate std.error statistic p.value
#>  <chr>      <fct> <chr>            <dbl>     <dbl>     <dbl>   <dbl>
#> 1 CHX        0     (Intercept) -15891690. 11183665.    -1.42  0.228  
#> 2 CHX        0     minutes        529669.    82975.     6.38  0.00309
#> 3 CHX        0.25  (Intercept)  -1891787.  2091374.    -0.905 0.461  
#> 4 CHX        0.25  minutes        108146.    30345.     3.56  0.0705
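
The question asks for one row per subset with antibiotic, MIC, coefficient, and p-value. A minimal sketch of that final step follows, assuming the coefficient of interest is the slope on minutes. The names dat, fit_subset, and slopes are made up for illustration, and dat is just the rawdata2 values rebuilt as an ungrouped data frame. It uses nest() and map() rather than a multi-row summarise(), which newer dplyr versions deprecate in favor of reframe().

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Same values as rawdata2, rebuilt as a plain (ungrouped) data frame
dat <- data.frame(
  antibiotic = "CHX",
  MIC = factor(c(rep("0", 6), rep("0.25", 4))),
  minutes = c(0, 10, 30, 60, 120, 300, 10, 30, 60, 120),
  CFU_mean = c(1044444.44444444, 1050000, 1141666.66666667, 2425000,
               16916666.6666667, 157500000, 883333.333333333, 1175000,
               1758333.33333333, 12408333.3333333)
)

# Hypothetical helper: fit one subset and return its tidy coefficient table
fit_subset <- function(d) {
  broom::tidy(lm(CFU_mean ~ minutes, data = d))
}

slopes <- dat %>%
  filter(CFU_mean > 50) %>%                  # drop values under the detection limit
  nest(data = c(minutes, CFU_mean)) %>%      # one row per antibiotic/MIC subset
  mutate(fit = map(data, fit_subset)) %>%    # one coefficient table per subset
  select(-data) %>%
  unnest(fit) %>%
  filter(term == "minutes") %>%              # keep only the slope row
  select(antibiotic, MIC, estimate, p.value)

print(slopes)
```

Dropping the filter(term == "minutes") line keeps the intercept rows as well, which reproduces the four-row table shown above.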

Does this help?

【Comments】:

【Solution 2】:

A slight variation that uses your original regressions function (the version with CFU_mean and data = x):

rawdata2 %>% 
  group_by(antibiotic, MIC) %>% 
  filter(CFU_mean >50) %>%
  nest() %>%
  mutate(coeff = map(data, regressions)) %>%
  unnest(coeff)

#------
# A tibble: 4 x 4
# Groups:   antibiotic, MIC [2]
  antibiotic MIC   data                       coeff
  <chr>      <fct> <list>                     <dbl>
1 CHX        0     <tibble[,2] [6 x 2]>  -15891690.
2 CHX        0     <tibble[,2] [6 x 2]>     529669.
3 CHX        0.25  <tibble[,2] [4 x 2]>   -1891787.
4 CHX        0.25  <tibble[,2] [4 x 2]>     108146.
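
On the two errors in the question: in the first attempt the function body refers to `.` (and to a column CFU that does not exist) instead of its argument x and CFU_mean, so lm() cannot find the data. In the second attempt do() receives the named numeric vector returned by coef(), while it expects a data frame for each group. A minimal sketch of a fix that stays close to the original code follows, assuming it is acceptable for the function to return a one-row data frame. The names dat and regressions_df are hypothetical, and dat is the rawdata2 values rebuilt as an ungrouped data frame.

```r
library(dplyr)

# Same values as rawdata2, rebuilt as a plain (ungrouped) data frame
dat <- data.frame(
  antibiotic = "CHX",
  MIC = factor(c(rep("0", 6), rep("0.25", 4))),
  minutes = c(0, 10, 30, 60, 120, 300, 10, 30, 60, 120),
  CFU_mean = c(1044444.44444444, 1050000, 1141666.66666667, 2425000,
               16916666.6666667, 157500000, 883333.333333333, 1175000,
               1758333.33333333, 12408333.3333333)
)

# Hypothetical variant of regressions(): returns a one-row data frame
# instead of a numeric vector, so do() can bind the per-group results
regressions_df <- function(x) {
  model <- lm(CFU_mean ~ minutes, data = x)   # use the argument x, not `.`
  coefs <- summary(model)$coefficients        # matrix of estimates and p-values
  data.frame(
    intercept = coefs["(Intercept)", "Estimate"],
    slope     = coefs["minutes", "Estimate"],
    p_value   = coefs["minutes", "Pr(>|t|)"]
  )
}

lm_abx_by_MIC <- dat %>%
  filter(CFU_mean > 50) %>%
  group_by(antibiotic, MIC) %>%
  do(regressions_df(.)) %>%                   # one data frame per group
  ungroup()

print(lm_abx_by_MIC)
```

Recent dplyr releases mark do() as superseded. Replacing the do() line with group_modify(~ regressions_df(.x)) should give the same table.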

【Comments】: