【Posted】: 2017-02-11 16:39:45
【Question】:
I have a data frame with more than 200 variables (a sample is shown below):
  | x | P  | Var1_mean | Var2_mean | Var3_mean | Var1_sd | Var2_sd | Var3_sd
------------------------------------------------------------------------------
1 | A | P1 | 100       | 50.47     | 298.2     | 2.33    | 0.04    | 8.77
2 | A | P2 | 98        | 18        | 350.33    | 2.32    | 0.04    | 10.3
3 | B | P1 | 100       | 30.93     | 152.73    | 2.33    | 0.04    | 4.49
4 | B | P2 | 100       | 25.33     | 237.67    | 2.33    | 0.04    | 6.99
5 | C | P1 | 99.9      | 25.07     | 184.93    | 2.32    | 0.04    | 5.44
6 | C | P2 | 100       | 18.33     | 132.33    | 2.32    | 0.04    | 3.89
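Each variable has N observations (A, B, C, etc.), each recorded for a reference period P1 and a measurement period P2.
For each observation, I want to compute the difference between the two periods for every variable and divide it by the standard deviation of the reference period.
Using the example above: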
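Written out, the quantity described above could look like the following. The symbols are notation introduced here for illustration only: $i$ indexes the observation (A, B, C, ...), $v$ the variable, $\bar{x}$ is the `_mean` column and $s$ the `_sd` column. The sign (P1 minus P2) follows the code in this question.

```latex
Z_{i,v} = \frac{\bar{x}_{i,v}^{(P1)} - \bar{x}_{i,v}^{(P2)}}{s_{i,v}^{(P1)}}
```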
# Example data: one row per observation (x) and period (P)
df <- data.frame(x = c("A","A","B","B","C","C"),
                 P = c("P1","P2","P1","P2","P1","P2"),
                 Var1_mean = c(100.0,98,100.0,100.0,99.9,100.0),
                 Var2_mean = c(50.47,18,30.93,25.33,25.07,18.33),
                 Var3_mean = c(298.2,350.33,152.73,237.67,184.93,132.33),
                 Var1_sd = c(2.33,2.32,2.33,2.33,2.32,2.32),
                 Var2_sd = c(0.04,0.04,0.04,0.04,0.04,0.04),
                 Var3_sd = c(8.77,10.3,4.49,6.99,5.44,3.89))
# (P1 mean - P2 mean) / P1 sd for observation A.
# The "/" must end the first line: a line that starts with "/" is a syntax error in R.
Z.A.Var1 <- (df$Var1_mean[df$x=="A" & df$P=="P1"] - df$Var1_mean[df$x=="A" & df$P=="P2"]) /
  df$Var1_sd[df$x=="A" & df$P=="P1"]
Z.A.Var2 <- (df$Var2_mean[df$x=="A" & df$P=="P1"] - df$Var2_mean[df$x=="A" & df$P=="P2"]) /
  df$Var2_sd[df$x=="A" & df$P=="P1"]
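and so on.
I could do the calculation with "for" loops over the observations and variables, but that would be cumbersome to write and slow to run.
Does anyone have a suggestion for a smarter way to do this, for example using dplyr or something similar?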
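For reference, here is a minimal base R sketch of one vectorized way to do the calculation described above, without dplyr. It assumes every observation has exactly one P1 row and one P2 row, and that each `VarK_mean` column has a matching `VarK_sd` column. The names `ref`, `meas`, `mean_cols`, `sd_cols`, `z` and `result` are made up for this illustration.

```r
# Example data from the question.
df <- data.frame(
  x = c("A", "A", "B", "B", "C", "C"),
  P = c("P1", "P2", "P1", "P2", "P1", "P2"),
  Var1_mean = c(100.0, 98, 100.0, 100.0, 99.9, 100.0),
  Var2_mean = c(50.47, 18, 30.93, 25.33, 25.07, 18.33),
  Var3_mean = c(298.2, 350.33, 152.73, 237.67, 184.93, 132.33),
  Var1_sd = c(2.33, 2.32, 2.33, 2.33, 2.32, 2.32),
  Var2_sd = c(0.04, 0.04, 0.04, 0.04, 0.04, 0.04),
  Var3_sd = c(8.77, 10.3, 4.49, 6.99, 5.44, 3.89)
)

# Split into reference (P1) and measurement (P2) rows,
# then reorder the P2 rows so they line up with the P1 rows by observation.
ref  <- df[df$P == "P1", ]
meas <- df[df$P == "P2", ]
meas <- meas[match(ref$x, meas$x), ]

# Locate the *_mean columns and derive the matching *_sd column names.
mean_cols <- grep("_mean$", names(df), value = TRUE)
sd_cols   <- sub("_mean$", "_sd", mean_cols)

# Element-wise (P1 mean - P2 mean) / P1 sd for all variables at once.
z <- (as.matrix(ref[mean_cols]) - as.matrix(meas[mean_cols])) /
  as.matrix(ref[sd_cols])
colnames(z) <- sub("_mean$", "", mean_cols)

# One row per observation, one column per variable.
result <- data.frame(x = ref$x, z)
print(result)
```

Because the columns are picked by name pattern, the same code should work unchanged for 200+ variables as long as the `_mean` / `_sd` naming holds.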
【Discussion】:
Tags: r