[Question title]: R: calculate multiple t.tests of a data frame using column names stored in a list
[Posted]: 2020-12-10 20:21:52
[Question description]:

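I have been working on this for a few days and can't seem to get it to work. I have a numeric data frame and I am trying to run a t.test on every row across multiple columns. I feel like I am missing something very basic here.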

df1 = data.frame(a1 = rnorm(10, 2, 1),
               a2 = rnorm(10, 3, 1),
               a3 = rnorm(10, 1, 1.5),
               b1 = rnorm(10, 4, 2),
               b2 = rnorm(10, 2.5, 4),
               b3 = rnorm(10, 3, 3.5),
               c1 = rnorm(10, 7, 2.0),
               c2 = rnorm(10, 4, 9),
               c3 = rnorm(10, 5, 5))

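Below I define which sets of columns to compare. For example, I want to compare all columns starting with a against all columns starting with b, and all columns starting with a against all columns starting with c. I set it up this way so that I don't have to create a new variable for each comparison.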

set1 = c("a", "a")
set2 = c("b", "c")

Then I extract the exact column names from df1 and put them into lists:

g1 = lapply(set1, function(x) grep(x, names(df1), value=T, fixed=T))

g2 = lapply(set2, function(x) grep(x, names(df1), value=T, fixed=T))

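Then I try to map the function over both lists. The idea is that R should compute a p-value for each row of the data frame, once comparing the a columns with the b columns and once comparing the a columns with the c columns.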

test = map2(g1, g2, function(x,y){t_test = apply(df1, 1, function(z) {t.test(z[g1[[x]]], z[g2[[y]]], alternative = "two.sided", var.equal = T)$p.value}) })

If I skip the loop entirely and index the lists by hand, this works fine:

t_test = apply(df1, 1, function(z) {t.test(z[g1[[1]]], z[g2[[1]]], alternative = "two.sided", var.equal = T)$p.value})

Any suggestions or help would be greatly appreciated.

[Question discussion]:

    Tags: r


    [Solution 1]:

    You can use expand.grid together with apply:

    comb <- expand.grid(grep("a", names(df1)), grep("c", names(df1)))
    
    setNames(apply(comb, 1, function(i) t.test(df1[,i[1]], df1[,i[2]])),
             apply(comb, 1, function(i) paste(names(df1)[i], collapse = " v ")))
    
    #> $`a1 v c1`
    #> 
    #>  Welch Two Sample t-test
    #> 
    #> data:  df1[, i[1]] and df1[, i[2]]
    #> t = -7.3843, df = 11.962, p-value = 8.607e-06
    #> alternative hypothesis: true difference in means is not equal to 0
    #> 95 percent confidence interval:
    #>  -7.114060 -3.871511
    #> sample estimates:
    #> mean of x mean of y 
    #>  1.963807  7.456593 
    #> 
    #> 
    #> $`a2 v c1`
    #> 
    #>  Welch Two Sample t-test
    #> 
    #> data:  df1[, i[1]] and df1[, i[2]]
    #> t = -5.0853, df = 12.938, p-value = 0.0002122
    #> alternative hypothesis: true difference in means is not equal to 0
    #> 95 percent confidence interval:
    #>  -5.529495 -2.231017
    #> sample estimates:
    #> mean of x mean of y 
    #>  3.576337  7.456593 
    #> 
    #> 
    #> $`a3 v c1`
    #> 
    #>  Welch Two Sample t-test
    #> 
    #> data:  df1[, i[1]] and df1[, i[2]]
    #> t = -7.8122, df = 15.439, p-value = 9.538e-07
    #> alternative hypothesis: true difference in means is not equal to 0
    #> 95 percent confidence interval:
    #>  -8.150119 -4.662903
    #> sample estimates:
    #> mean of x mean of y 
    #>  1.050081  7.456593 
    #> 
    #> 
    #> $`a1 v c2`
    #> 
    #>  Welch Two Sample t-test
    #> 
    #> data:  df1[, i[1]] and df1[, i[2]]
    #> t = -0.19935, df = 9.1996, p-value = 0.8463
    #> alternative hypothesis: true difference in means is not equal to 0
    #> 95 percent confidence interval:
    #>  -6.632092  5.554581
    #> sample estimates:
    #> mean of x mean of y 
    #>  1.963807  2.502563 
    #> 
    #> 
    #> $`a2 v c2`
    #> 
    #>  Welch Two Sample t-test
    #> 
    #> data:  df1[, i[1]] and df1[, i[2]]
    #> t = 0.39654, df = 9.2716, p-value = 0.7007
    #> alternative hypothesis: true difference in means is not equal to 0
    #> 95 percent confidence interval:
    #>  -5.024565  7.172113
    #> sample estimates:
    #> mean of x mean of y 
    #>  3.576337  2.502563 
    #> 
    #> 
    #> $`a3 v c2`
    #> 
    #>  Welch Two Sample t-test
    #> 
    #> data:  df1[, i[1]] and df1[, i[2]]
    #> t = -0.53312, df = 9.4962, p-value = 0.6062
    #> alternative hypothesis: true difference in means is not equal to 0
    #> 95 percent confidence interval:
    #>  -7.566913  4.661950
    #> sample estimates:
    #> mean of x mean of y 
    #>  1.050081  2.502563 
    #> 
    #> 
    #> $`a1 v c3`
    #> 
    #>  Welch Two Sample t-test
    #> 
    #> data:  df1[, i[1]] and df1[, i[2]]
    #> t = -2.4041, df = 9.5786, p-value = 0.03807
    #> alternative hypothesis: true difference in means is not equal to 0
    #> 95 percent confidence interval:
    #>  -7.4463155 -0.2605803
    #> sample estimates:
    #> mean of x mean of y 
    #>  1.963807  5.817255 
    #> 
    #> 
    #> $`a2 v c3`
    #> 
    #>  Welch Two Sample t-test
    #> 
    #> data:  df1[, i[1]] and df1[, i[2]]
    #> t = -1.3903, df = 9.7868, p-value = 0.1953
    #> alternative hypothesis: true difference in means is not equal to 0
    #> 95 percent confidence interval:
    #>  -5.843047  1.361209
    #> sample estimates:
    #> mean of x mean of y 
    #>  3.576337  5.817255 
    #> 
    #> 
    #> $`a3 v c3`
    #> 
    #>  Welch Two Sample t-test
    #> 
    #> data:  df1[, i[1]] and df1[, i[2]]
    #> t = -2.9074, df = 10.432, p-value = 0.015
    #> alternative hypothesis: true difference in means is not equal to 0
    #> 95 percent confidence interval:
    #>  -8.400133 -1.134214
    #> sample estimates:
    #> mean of x mean of y 
    #>  1.050081  5.817255
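
The list above holds the full htest object for every pair of columns. If only the p-values are needed, a more compact result can be built the same way. The following is a minimal sketch, not part of the original answer: the `set.seed()` call, the anchored patterns `"^a"` and `"^c"` (which match only names that start with the letter, rather than names that merely contain it), and the name `p_values` are assumptions added for illustration.

```r
# Assumption: a fixed seed so the sketch is reproducible.
set.seed(1)
df1 <- data.frame(a1 = rnorm(10, 2, 1),
                  a2 = rnorm(10, 3, 1),
                  a3 = rnorm(10, 1, 1.5),
                  c1 = rnorm(10, 7, 2.0),
                  c2 = rnorm(10, 4, 9),
                  c3 = rnorm(10, 5, 5))

# All pairs of (a column index, c column index).
comb <- expand.grid(grep("^a", names(df1)), grep("^c", names(df1)))

# One Welch t-test per pair, keeping only the p-value.
p_values <- apply(comb, 1, function(i) t.test(df1[, i[1]], df1[, i[2]])$p.value)

# Label each p-value with the pair of column names it belongs to.
names(p_values) <- apply(comb, 1, function(i) paste(names(df1)[i], collapse = " v "))

p_values
```

Because `p_values` is a plain named numeric vector, it can be passed straight to `p.adjust()` if a multiple-testing correction is wanted.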
    

    Created on 2020-12-10 by the reprex package (v0.3.0)
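
The code above tests whole columns against each other, whereas the question asks for one p-value per row. One likely reason the `map2()` call in the question fails is that `x` and `y` are already the character vectors of column names taken from `g1` and `g2`, so `g1[[x]]` is not a valid index. Using `x` and `y` directly should be enough. The sketch below uses base R's `Map()` instead of `purrr::map2()` to avoid an extra dependency; the `set.seed()` call and the name `row_p` are assumptions added for illustration.

```r
# Assumption: a fixed seed so the sketch is reproducible.
set.seed(1)
df1 <- data.frame(a1 = rnorm(10, 2, 1),
                  a2 = rnorm(10, 3, 1),
                  a3 = rnorm(10, 1, 1.5),
                  b1 = rnorm(10, 4, 2),
                  b2 = rnorm(10, 2.5, 4),
                  b3 = rnorm(10, 3, 3.5),
                  c1 = rnorm(10, 7, 2.0),
                  c2 = rnorm(10, 4, 9),
                  c3 = rnorm(10, 5, 5))

set1 <- c("a", "a")
set2 <- c("b", "c")

# Lists of column-name vectors, built as in the question.
g1 <- lapply(set1, function(x) grep(x, names(df1), value = TRUE, fixed = TRUE))
g2 <- lapply(set2, function(x) grep(x, names(df1), value = TRUE, fixed = TRUE))

# Map() hands each pair of name vectors to the function as x and y,
# so they index the row z directly (no g1[[x]] lookup).
row_p <- Map(function(x, y) {
  apply(df1, 1, function(z) {
    t.test(z[x], z[y], alternative = "two.sided", var.equal = TRUE)$p.value
  })
}, g1, g2)

# Label each comparison, e.g. "a v b" and "a v c".
names(row_p) <- paste(set1, set2, sep = " v ")

row_p
```

Each element of `row_p` is a numeric vector with one p-value per row of `df1`. With only three values per group in each row, these tests have very little power, so the results should be read with caution.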
