【Question title】: Functional programming principles to perform multiple statistical computations
【Posted】: 2018-01-20 09:50:21
【Question】:

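I want to apply several statistical computations that include reliability measures, such as the ICC or the coefficient of variation. Although I can compute each of them individually, I am not yet familiar with R's functional programming practices for running multiple computations directly without a lot of code duplication.

Consider the following example data.frame, which contains repeated measurements (T1, T2) of five different variables (Var1, ..., Var5):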

set.seed(123)
df = data.frame(matrix(rnorm(100), nrow=10))
names(df) <- c("T1.Var1", "T1.Var2", "T1.Var3", "T1.Var4", "T1.Var5",
               "T2.Var1", "T2.Var2", "T2.Var3", "T2.Var4", "T2.Var5")

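If I want to compute the intraclass correlation coefficient (ICC) between the two repeated measurements of each variable, I can: 1) create a function that returns the ICC together with its lower and upper bounds: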

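library(psych)  # provides ICC()

# Return the ICC estimate and its lower/upper bounds for two repeated
# measurements, taken from row 3 of the psych::ICC results table.
calcula_ICC <- function(a, b) {
  ICc <- ICC(matrix(c(a, b), ncol = 2))
  icc <- ICc$results[[2]][3]  # column 2: ICC estimate
  lo  <- ICc$results[[7]][3]  # column 7: lower bound
  up  <- ICc$results[[8]][3]  # column 8: upper bound
  round(c(icc, lo, up), 2)
}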

and 2) apply it to each pair of corresponding variables, as follows:

calcula_ICC(df$T1.Var1, df$T2.Var1)
calcula_ICC(df$T1.Var2, df$T2.Var2)
calcula_ICC(df$T1.Var3, df$T2.Var3)
calcula_ICC(df$T1.Var4, df$T2.Var4)
calcula_ICC(df$T1.Var5, df$T2.Var5)

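I would then do the same for other statistical computations on each variable, such as the coefficient of variation or the standard error between the repeated measurements.

But how could I use some functional programming principles here? For example, how would I create a function that takes each pair of corresponding variables at T1 and T2, together with the desired function, as arguments?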

【Comments】:

Tags: r dataframe functional-programming purrr


【Solution 1】:

The functional programming approach is to use mapply. No "tidying" is needed:

result = mapply(calcula_ICC, df[, 1:5], df[, 6:10], USE.NAMES=FALSE)

colnames(result) = paste0('Var', 1:5)

# Better than setting rownames here is to have calcula_ICC() return a named vector
rownames(result) = c('icc','lo','up')

> result
#      Var1  Var2  Var3  Var4  Var5
# icc  0.09  0.08 -0.37 -0.23 -0.17
# lo  -0.54 -0.55 -0.80 -0.73 -0.70
# up   0.66  0.65  0.29  0.43  0.48

(Note that the result is a matrix.)
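
The comment in the code above suggests having the function return a named vector, and the question asks for a function that takes the statistic itself as an argument. A minimal base R sketch of both ideas might look like the following. The helper `apply_paired()` and the stand-in statistic `diff_summary()` are hypothetical names introduced here for illustration; they are not part of the answer, and `diff_summary()` is used in place of `calcula_ICC()` only so the example runs without the psych package.

```r
# Same example data as in the question
set.seed(123)
df <- data.frame(matrix(rnorm(100), nrow = 10))
names(df) <- c(paste0("T1.Var", 1:5), paste0("T2.Var", 1:5))

# Hypothetical helper: apply any two-argument statistic `f`
# to each T1/T2 pair of columns.
apply_paired <- function(data, f, vars = paste0("Var", 1:5)) {
  t1 <- data[paste0("T1.", vars)]
  t2 <- data[paste0("T2.", vars)]
  out <- mapply(f, t1, t2)
  # Label the result by variable: columns for a matrix, names for a vector
  if (is.matrix(out)) colnames(out) <- vars else names(out) <- vars
  out
}

# Stand-in statistic (an assumption, not the ICC): it returns a *named*
# vector, so the row names of the result come for free.
diff_summary <- function(a, b) {
  d <- b - a
  round(c(mean_diff = mean(d), sd_diff = sd(d)), 2)
}

res_diff <- apply_paired(df, diff_summary)  # 2 x 5 matrix with named rows
res_cor  <- apply_paired(df, cor)           # named numeric vector of length 5
```

Any function of two numeric vectors should work the same way, so `apply_paired(df, calcula_ICC)` would presumably reproduce the matrix shown above.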

【Comments】:

    【Solution 2】:

    There are many ways to approach this and I don't have time to post them all, but I may come back and add an lapply solution using R's apply family of functions.
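
The lapply solution mentioned above was not posted. As a rough sketch of what such an approach could look like (an assumption, not the answerer's code), one can keep the statistics in a named list of functions and loop over the variable names. The entries of `stats` are illustrative stand-ins; `typical_error` uses sd(difference) / sqrt(2), which is one common definition and is assumed here rather than taken from the thread.

```r
# Same example data as in the question
set.seed(123)
df <- data.frame(matrix(rnorm(100), nrow = 10))
names(df) <- c(paste0("T1.Var", 1:5), paste0("T2.Var", 1:5))

vars <- paste0("Var", 1:5)

# Named list of two-argument statistics (illustrative stand-ins)
stats <- list(
  cor           = function(a, b) cor(a, b),
  mean_diff     = function(a, b) mean(b) - mean(a),
  typical_error = function(a, b) sd(b - a) / sqrt(2)
)

# For each variable, apply every statistic to its T1/T2 pair
rows <- lapply(vars, function(v) {
  a <- df[[paste0("T1.", v)]]
  b <- df[[paste0("T2.", v)]]
  vapply(stats, function(f) f(a, b), numeric(1))
})

# One row per variable, one column per statistic
result <- do.call(rbind, rows)
rownames(result) <- vars
```

Adding another statistic then only means adding one more entry to `stats`.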

    Using dplyr and tidyr

    Here is a solution where dplyr and tidyr may help:

    require(dplyr)
    require(tidyr)
    
    # let's have a function for each value you want eventually
    GetICC <- function(x, y) {
      require(psych)
      ICC(matrix(c(x, y), ncol = 2))$results[[2]][3]
    }
    
    GetICCLo <- function(x, y) {
      require(psych)
      ICC(matrix(c(x, y), ncol = 2))$results[[7]][3]
    }
    
    GetICCUp <- function(x, y) {
      require(psych)
      ICC(matrix(c(x, y), ncol = 2))$results[[8]][3]
    }
    
    # tidy up your data, take a look at what this looks like
    mydata <- df %>%
      mutate(id = row_number()) %>%
      gather(key = time, value = value, -id) %>%
      separate(time, c("Time", "Var")) %>%
      spread(key = Time, value = value)
    
    # group by variable, then run your functions
    # notice I added mean difference between the two
    # times as an example of how you can extend this
    # to include whatever summaries you need
    myresults <- mydata %>%
      group_by(Var) %>%
      summarize(icc = GetICC(T1, T2),
                icc_lo = GetICCLo(T1, T2),
                icc_up = GetICCUp(T1, T2),
                mean_diff = mean(T2) - mean(T1))
    

    This approach works well as long as everything you pass to summarize is aggregated/computed at the same level.
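
Each of the three helper functions above refits the same model. One possible variation, sketched here under the assumption that dplyr 1.0.0 or later is available, is a single function that returns a one-row data frame, which summarize() then unpacks into several columns. The function `cor_ci()` is a hypothetical stand-in built on `cor.test()` from base R's stats package (not the ICC), chosen only so the sketch runs without psych; the long data frame is built directly in the same id/Var/T1/T2 layout as `mydata`.

```r
library(dplyr)

# Long-format data in the same layout as `mydata` above:
# one row per subject and variable, with T1 and T2 as columns
set.seed(123)
mydata <- data.frame(
  id  = rep(1:10, times = 5),
  Var = rep(paste0("Var", 1:5), each = 10),
  T1  = rnorm(50),
  T2  = rnorm(50)
)

# Hypothetical stand-in: estimate plus confidence bounds from ONE fit,
# returned as a one-row data frame
cor_ci <- function(x, y) {
  ct <- cor.test(x, y)
  data.frame(
    r    = unname(ct$estimate),
    r_lo = ct$conf.int[1],
    r_up = ct$conf.int[2]
  )
}

# An unnamed data-frame result is spread into its columns by summarize()
myresults <- mydata %>%
  group_by(Var) %>%
  summarize(cor_ci(T1, T2),
            mean_diff = mean(T2) - mean(T1))
```

The same pattern should carry over to the ICC by returning `icc`, `icc_lo` and `icc_up` from a single call to `ICC()`.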

    【Comments】:
