【Question title】: Use of function() to generate linear regression for multiple subsets of df
【Posted】: 2021-07-22 14:03:57
【Question description】:

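As a newcomer to R, I am struggling to define a function and run it on multiple subsets. (I have been trying to solve this for three days, and even with the help of existing threads I cannot get the hang of it...)

Setup: I need to compute linear regressions of the bacterial kill rate over time for several antibiotics at different concentrations ("MIC"). Since loops and map are too advanced for me, I would like to solve this by applying a function to each subset. In the end, I want a data frame showing antibiotic, MIC, coefficient, and p-value.

Problem: both of the following approaches produce error messages that I cannot fully understand.

I hope you can share your coding wisdom with me!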

library(tidyverse)
library(ggplot2)
library(drc)

# calculating linear regressions
## exclude bacteria ("CFU") <= 50; that is below the detection limit of the assay

rawdata2 <- structure(
        list(
                antibiotic = c(
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX",
                        "CHX"
                ),
                MIC = structure(
                        c(1L, 1L,
                          1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                        .Label = c("0", "0.25", "0.5",
                                   "1", "2", "4"),
                        class = "factor"
                ),
                minutes = c(0L, 10L, 30L,
                            60L, 120L, 300L, 10L, 30L, 60L, 120L),
                CFU_mean = c(
                        1044444.44444444,
                        1050000,
                        1141666.66666667,
                        2425000,
                        16916666.6666667,
                        157500000,
                        883333.333333333,
                        1175000,
                        1758333.33333333,
                        12408333.3333333
                )
        ),
        row.names = c(NA,-10L),
        groups = structure(
                list(
                        antibiotic = c("CHX",
                                       "CHX"),
                        MIC = structure(
                                1:2,
                                .Label = c("0", "0.25", "0.5", "1",
                                           "2", "4"),
                                class = "factor"
                        ),
                        .rows = structure(
                                list(1:6, 7:10),
                                ptype = integer(0),
                                class = c("vctrs_list_of",
                                          "vctrs_vctr", "list")
                        )
                ),
                row.names = c(NA,-2L),
                class = c("tbl_df",
                          "tbl", "data.frame"),
                .drop = TRUE
        ),
        class = c("grouped_df",
                  "tbl_df", "tbl", "data.frame")
)


regressions <- function(x) {
        model <- lm(CFU ~ minutes, data = .)
        coeff <- coef(model)
        return(coeff)
}

lm_abx_by_MIC <- rawdata2 %>% 
        group_by(antibiotic, MIC) %>% 
        filter(CFU >50) %>% 
        do(regressions(.))

Error in is.data.frame(data) : object '.' not found

> regressions <- function(x) {
+         model <- lm(CFU_mean ~ minutes, data = x)
+         coeff <- coef(model)
+         return(coeff)
+ }
> lm_abx_by_MIC <- rawdata2 %>% 
+         group_by(antibiotic, MIC) %>% 
+         filter(CFU_mean >50) %>% 
+         do(regressions(.))

Error: Results 1, 2, 3, 4, 5, ... must be data frames, not numeric. Run rlang::last_error() to see where the error occurred.
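The two messages have separate causes. In the first attempt, `.` is not defined inside the function body (the argument is called `x`), and the column is named `CFU_mean`, not `CFU`. In the second attempt, `do()` requires each group's result to be a data frame, while `coef()` returns a named numeric vector. A minimal sketch of one way to satisfy `do()` is shown below. It assumes a small made-up data set `toy` in place of `rawdata2`, and the output column names `term`, `estimate` and `p_value` are illustrative choices, not part of the original post.

```r
library(dplyr)

# Made-up stand-in for rawdata2; the values are illustrative only
toy <- data.frame(
  antibiotic = rep("CHX", 8),
  MIC        = factor(rep(c("0", "0.25"), each = 4)),
  minutes    = rep(c(0, 10, 30, 60), times = 2),
  CFU_mean   = c(100, 210, 390, 720, 100, 150, 260, 400)
)

# Return a data frame (one row per coefficient) instead of a numeric vector
regressions <- function(x) {
  model <- lm(CFU_mean ~ minutes, data = x)   # use the argument x, not `.`
  coefs <- summary(model)$coefficients        # matrix: estimate, SE, t, p
  data.frame(
    term      = rownames(coefs),
    estimate  = coefs[, "Estimate"],
    p_value   = coefs[, "Pr(>|t|)"],
    row.names = NULL
  )
}

lm_abx_by_MIC <- toy %>%
  group_by(antibiotic, MIC) %>%
  filter(CFU_mean > 50) %>%
  do(regressions(.))   # `.` is valid here: do() supplies the current group

print(lm_abx_by_MIC)
```

Because the function now returns a data frame, `do()` can prepend the grouping columns and stack the per-group results into one table.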

【Comments】:

  • Please provide a data sample using dput so that we can help you more effectively.

Tags: r linear-regression


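【Solution 1】:

Not sure whether this is exactly what you are looking for. Using broom::tidy(), we can arrive at a solution like this (although it does not define a function the way you did):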

rawdata2 %>% 
  ungroup() %>% 
  filter(CFU_mean > 50) %>% 
  nest_by(antibiotic, MIC) %>% 
  mutate(model = list(lm(CFU_mean ~ minutes, data = data))) %>% 
  summarise(broom::tidy(model))
#> # A tibble: 4 x 7
#> # Groups:   antibiotic, MIC [2]
#>  antibiotic MIC   term          estimate std.error statistic p.value
#>  <chr>      <fct> <chr>            <dbl>     <dbl>     <dbl>   <dbl>
#> 1 CHX        0     (Intercept) -15891690. 11183665.    -1.42  0.228  
#> 2 CHX        0     minutes        529669.    82975.     6.38  0.00309
#> 3 CHX        0.25  (Intercept)  -1891787.  2091374.    -0.905 0.461  
#> 4 CHX        0.25  minutes        108146.    30345.     3.56  0.0705 

Does this help?

【Comments】:
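If only the slope per group is needed, in the shape the question asks for (antibiotic, MIC, coefficient, p-value), the tidy output can be filtered and trimmed. The sketch below is one possible variant rather than the code from this answer: it swaps in `group_by()` plus `group_modify()` instead of `nest_by()` plus `summarise()`, and it runs on a made-up data set `toy` that stands in for `rawdata2`. It assumes the broom package is installed.

```r
library(dplyr)

# Made-up stand-in for rawdata2; the values are illustrative only
toy <- data.frame(
  antibiotic = rep("CHX", 8),
  MIC        = factor(rep(c("0", "0.25"), each = 4)),
  minutes    = rep(c(0, 10, 30, 60), times = 2),
  CFU_mean   = c(100, 210, 390, 720, 100, 150, 260, 400)
)

slopes <- toy %>%
  filter(CFU_mean > 50) %>%
  group_by(antibiotic, MIC) %>%
  # .x is the data of the current group, without the grouping columns
  group_modify(~ broom::tidy(lm(CFU_mean ~ minutes, data = .x))) %>%
  ungroup() %>%
  filter(term == "minutes") %>%               # keep the slope, drop the intercept
  select(antibiotic, MIC, estimate, p.value)

print(slopes)
```

This yields one row per antibiotic and MIC combination, with `estimate` holding the slope of `CFU_mean` over `minutes`.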

    【Solution 2】:

    A slight variation using your original regressions function (the second version, with data = x):

    rawdata2 %>% 
      group_by(antibiotic, MIC) %>% 
      filter(CFU_mean >50) %>%
      nest() %>%
      mutate(coeff = map(data, regressions)) %>%
      unnest(coeff)
    
    #------
    # A tibble: 4 x 4
    # Groups:   antibiotic, MIC [2]
      antibiotic MIC   data                       coeff
      <chr>      <fct> <list>                     <dbl>
    1 CHX        0     <tibble[,2] [6 x 2]>  -15891690.
    2 CHX        0     <tibble[,2] [6 x 2]>     529669.
    3 CHX        0.25  <tibble[,2] [4 x 2]>   -1891787.
    4 CHX        0.25  <tibble[,2] [4 x 2]>     108146.
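In the output above, the `coeff` column no longer says which value is the intercept and which is the slope, because `unnest()` drops the names of the numeric vector. A minimal sketch of one way to keep them is to convert each coefficient vector with `tibble::enframe()` before unnesting. The data set `toy` is a made-up stand-in for `rawdata2`, and the column names `term` and `estimate` are illustrative choices.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Made-up stand-in for rawdata2; the values are illustrative only
toy <- data.frame(
  antibiotic = rep("CHX", 8),
  MIC        = factor(rep(c("0", "0.25"), each = 4)),
  minutes    = rep(c(0, 10, 30, 60), times = 2),
  CFU_mean   = c(100, 210, 390, 720, 100, 150, 260, 400)
)

# Second version of the function from the question: returns a named numeric vector
regressions <- function(x) {
  model <- lm(CFU_mean ~ minutes, data = x)
  coef(model)
}

named_coeffs <- toy %>%
  group_by(antibiotic, MIC) %>%
  filter(CFU_mean > 50) %>%
  nest() %>%
  # enframe() turns the named vector into a two-column tibble (name, value)
  mutate(coeff = map(data, ~ enframe(regressions(.x),
                                     name = "term", value = "estimate"))) %>%
  select(-data) %>%
  unnest(coeff) %>%
  ungroup()

print(named_coeffs)
```

Each group then contributes two labelled rows, one for `(Intercept)` and one for `minutes`.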
    

    【Comments】:
