R: calculate multiple t.tests of a data frame using column names stored in a list

【Question title】: R: calculate multiple t.tests of a data frame using column names stored in a list
【Posted】: 2020-12-10 20:21:52
【Question】:

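I have been working on this for a few days and cannot get it to work. I have a numeric data frame, and I am trying to run a t.test on every row, comparing several groups of columns. I feel like I am missing something very basic here.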

df1 = data.frame(a1 = rnorm(10, 2, 1),
               a2 = rnorm(10, 3, 1),
               a3 = rnorm(10, 1, 1.5),
               b1 = rnorm(10, 4, 2),
               b2 = rnorm(10, 2.5, 4),
               b3 = rnorm(10, 3, 3.5),
               c1 = rnorm(10, 7, 2.0),
               c2 = rnorm(10, 4, 9),
               c3 = rnorm(10, 5, 5))

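Below I define which sets of columns to compare. For example, I want to compare all columns starting with a against all columns starting with b, and also all columns starting with a against all columns starting with c. I do it this way so that I do not need to create a new variable for every comparison.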

set1 = c("a", "a")
set2 = c("b", "c")

Then I extract the exact column names from df1 and put them into a list:

g1 = lapply(set1, function(x) grep(x, names(df1), value=T, fixed=T))

g2 = lapply(set2, function(x) grep(x, names(df1), value=T, fixed=T))
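To make the shape of these lists concrete, here is a minimal sketch using only the column names from the data frame above. The variable `col_names` and the `startsWith()` alternative are illustrative additions, not part of the original post. Note that `grep()` with `fixed = TRUE` matches the letter anywhere in a name, which happens to be equivalent to a prefix match for these particular names.

```r
# Column names of df1 as defined in the question (hypothetical helper vector)
col_names <- c("a1", "a2", "a3", "b1", "b2", "b3", "c1", "c2", "c3")

set1 <- c("a", "a")

# Same call as in the question, applied to the plain vector of names
g1 <- lapply(set1, function(x) grep(x, col_names, value = TRUE, fixed = TRUE))

# g1 is an unnamed list with one character vector per entry of set1;
# here both g1[[1]] and g1[[2]] are c("a1", "a2", "a3")
print(g1)

# Stricter alternative: keep only names that actually start with the prefix
prefix_a <- col_names[startsWith(col_names, "a")]
print(prefix_a)
```

Each element of `g1` is therefore already a vector of column names, not a position in the list.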

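Then I try to map the function over the two lists. The idea is that R should calculate a p-value for each row of the data frame, first for the a vs. b comparison and then for the a vs. c comparison.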

test = map2(g1, g2, function(x,y){t_test = apply(df1, 1, function(z) {t.test(z[g1[[x]]], z[g2[[y]]], alternative = "two.sided", var.equal = T)$p.value}) })

If I skip the loop entirely and index the lists by hand, it works fine:

t_test = apply(df1, 1, function(z) {t.test(z[g1[[1]]], z[g2[[1]]], alternative = "two.sided", var.equal = T)$p.value})

Any advice or help is much appreciated.

【Comments on the question】:

Tags: r

【Solution 1】:

You can use expand.grid and apply:

comb <- expand.grid(grep("a", names(df1)), grep("c", names(df1)))

setNames(apply(comb, 1, function(i) t.test(df1[,i[1]], df1[,i[2]])),
         apply(comb, 1, function(i) paste(names(df1)[i], collapse = " v ")))

#> $`a1 v c1`
#> 
#>  Welch Two Sample t-test
#> 
#> data:  df1[, i[1]] and df1[, i[2]]
#> t = -7.3843, df = 11.962, p-value = 8.607e-06
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -7.114060 -3.871511
#> sample estimates:
#> mean of x mean of y 
#>  1.963807  7.456593 
#> 
#> 
#> $`a2 v c1`
#> 
#>  Welch Two Sample t-test
#> 
#> data:  df1[, i[1]] and df1[, i[2]]
#> t = -5.0853, df = 12.938, p-value = 0.0002122
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -5.529495 -2.231017
#> sample estimates:
#> mean of x mean of y 
#>  3.576337  7.456593 
#> 
#> 
#> $`a3 v c1`
#> 
#>  Welch Two Sample t-test
#> 
#> data:  df1[, i[1]] and df1[, i[2]]
#> t = -7.8122, df = 15.439, p-value = 9.538e-07
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -8.150119 -4.662903
#> sample estimates:
#> mean of x mean of y 
#>  1.050081  7.456593 
#> 
#> 
#> $`a1 v c2`
#> 
#>  Welch Two Sample t-test
#> 
#> data:  df1[, i[1]] and df1[, i[2]]
#> t = -0.19935, df = 9.1996, p-value = 0.8463
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -6.632092  5.554581
#> sample estimates:
#> mean of x mean of y 
#>  1.963807  2.502563 
#> 
#> 
#> $`a2 v c2`
#> 
#>  Welch Two Sample t-test
#> 
#> data:  df1[, i[1]] and df1[, i[2]]
#> t = 0.39654, df = 9.2716, p-value = 0.7007
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -5.024565  7.172113
#> sample estimates:
#> mean of x mean of y 
#>  3.576337  2.502563 
#> 
#> 
#> $`a3 v c2`
#> 
#>  Welch Two Sample t-test
#> 
#> data:  df1[, i[1]] and df1[, i[2]]
#> t = -0.53312, df = 9.4962, p-value = 0.6062
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -7.566913  4.661950
#> sample estimates:
#> mean of x mean of y 
#>  1.050081  2.502563 
#> 
#> 
#> $`a1 v c3`
#> 
#>  Welch Two Sample t-test
#> 
#> data:  df1[, i[1]] and df1[, i[2]]
#> t = -2.4041, df = 9.5786, p-value = 0.03807
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -7.4463155 -0.2605803
#> sample estimates:
#> mean of x mean of y 
#>  1.963807  5.817255 
#> 
#> 
#> $`a2 v c3`
#> 
#>  Welch Two Sample t-test
#> 
#> data:  df1[, i[1]] and df1[, i[2]]
#> t = -1.3903, df = 9.7868, p-value = 0.1953
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -5.843047  1.361209
#> sample estimates:
#> mean of x mean of y 
#>  3.576337  5.817255 
#> 
#> 
#> $`a3 v c3`
#> 
#>  Welch Two Sample t-test
#> 
#> data:  df1[, i[1]] and df1[, i[2]]
#> t = -2.9074, df = 10.432, p-value = 0.015
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -8.400133 -1.134214
#> sample estimates:
#> mean of x mean of y 
#>  1.050081  5.817255

Created on 2020-12-10 by the reprex package (v0.3.0)
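The output above compares whole columns with each other, whereas the question asks for one p-value per row. A possible sketch of the row-wise version follows. It assumes the likely problem in the question's `map2` call is that `x` and `y` are already the character vectors of column names, so indexing `g1[[x]]` again fails. Base R `Map` is swapped in for `purrr::map2` so the sketch has no package dependency; the same function body should also work with `map2`. The `set.seed()` call and the names `p_values` and `by_hand` are illustrative assumptions.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

df1 <- data.frame(a1 = rnorm(10, 2, 1),
                  a2 = rnorm(10, 3, 1),
                  a3 = rnorm(10, 1, 1.5),
                  b1 = rnorm(10, 4, 2),
                  b2 = rnorm(10, 2.5, 4),
                  b3 = rnorm(10, 3, 3.5),
                  c1 = rnorm(10, 7, 2.0),
                  c2 = rnorm(10, 4, 9),
                  c3 = rnorm(10, 5, 5))

set1 <- c("a", "a")
set2 <- c("b", "c")

g1 <- lapply(set1, function(x) grep(x, names(df1), value = TRUE, fixed = TRUE))
g2 <- lapply(set2, function(x) grep(x, names(df1), value = TRUE, fixed = TRUE))

# Map passes the elements of g1 and g2 pairwise, so x and y are
# character vectors of column names and can index the row z directly.
p_values <- Map(function(x, y) {
  apply(df1, 1, function(z) {
    t.test(z[x], z[y], alternative = "two.sided", var.equal = TRUE)$p.value
  })
}, g1, g2)

# Label each result with the comparison it belongs to, e.g. "a v b"
names(p_values) <- paste(set1, set2, sep = " v ")

# Cross-check against the hand-indexed version from the question
by_hand <- apply(df1, 1, function(z) {
  t.test(z[g1[[1]]], z[g2[[1]]], alternative = "two.sided", var.equal = TRUE)$p.value
})

str(p_values)
```

The result is a list with one numeric vector per comparison, each holding one p-value per row of `df1`. With only three values per group in each row, these tests have very little power, so the p-values should be read with caution.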

【Comments】: