Run calculations on specific columns in a dataframe by looping: answers

【Question title】: Run calculations on specific columns in a dataframe by looping
【Posted】: 2018-01-23 19:41:55
【Question】:

Here is a sample of the data:

  Test.Statistic          P     FDR_P Bonferroni_P Control_mean NH4._mean NO3._mean
1       8.203199 0.01654619 0.7405529            1         0.00  0.000000 0.4285714
2       7.622793 0.02211727 0.7405529            1         0.00  1.095238 1.1904762
3       7.501205 0.02350357 0.7405529            1         2.10  1.761905 1.1428571
4       6.510000 0.03858082 0.7405529            1         0.85  0.000000 0.0000000
5       6.149339 0.04620490 0.7405529            1         0.65  5.095238 3.4285714
6       6.052381 0.04850005 0.7405529            1         0.00  1.428571 0.0000000

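I want to apply the formula (trt_mean/control_mean)-1 to each treatment column (NH4 and NO3). I've incorporated some of the suggestions from the comments, but I still can't reference the control column (Control_mean) in dt from inside my function.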
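As a quick check of the formula itself, the sketch below applies (trt_mean/control_mean)-1 to the NH4._mean values from the six sample rows. Several Control_mean values are 0, so R returns NaN (0/0) or Inf (positive/0) for those rows. How to treat them (drop, set to NA, etc.) is a decision the sample data does not settle.

```r
# Control_mean and NH4._mean copied from the six sample rows in the question
control <- c(0.00, 0.00, 2.10, 0.85, 0.65, 0.00)
nh4     <- c(0.000000, 1.095238, 1.761905, 0.000000, 5.095238, 1.428571)

# R arithmetic is vectorised, so the formula is applied row by row
rel_nh4 <- nh4 / control - 1
print(rel_nh4)
# Rows 1, 2 and 6 divide by zero: 0/0 gives NaN, a positive value / 0 gives Inf
# Row 3 is about -0.161, row 4 is exactly -1, row 5 is about 6.84
```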

dt <- as.data.frame.table(kw_res)
cols <- grep("_mean", colnames(dt))
rel_abund_function <- function(z) {
  return((z / z[, 1])-1)
}

dt[, lapply(cols, rel_abund_function)]
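The attempt above has three problems. `cols` holds column positions (integers), so `lapply(cols, ...)` passes each number to the function rather than the column's values. `z[, 1]` then fails because a single number has no dimensions. Finally, `dt[, lapply(...)]` is data.table syntax, and a plain data.frame (which is what `as.data.frame.table()` returns) does not evaluate `j` against its columns.

Below is a minimal base-R loop. It assumes the control is the first `_mean` column, as in the sample. It uses a hypothetical `_rel` suffix for the new columns.

```r
# Rebuild the sample's *_mean columns (values copied from the question's printout)
dt <- data.frame(
  Control_mean = c(0.00, 0.00, 2.10, 0.85, 0.65, 0.00),
  NH4._mean    = c(0.000000, 1.095238, 1.761905, 0.000000, 5.095238, 1.428571),
  NO3._mean    = c(0.4285714, 1.1904762, 1.1428571, 0.0000000, 3.4285714, 0.0000000)
)

cols <- grep("_mean", colnames(dt))  # positions of all *_mean columns
ctrl <- cols[1]                      # assumption: the control is the first *_mean column
trt  <- cols[-1]                     # the remaining *_mean columns are treatments

for (j in trt) {
  # Hypothetical naming scheme: NH4._mean -> NH4._rel
  new_name <- sub("_mean$", "_rel", colnames(dt)[j])
  # Index columns with [[ ]] so each side is a plain numeric vector
  dt[[new_name]] <- dt[[j]] / dt[[ctrl]] - 1
}
print(dt)
```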

Any suggestions?

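【Question comments】:

Can you post some sample data?
Please provide a reproducible example for us to work with, and post the expected output for that example. You can use dput(head(df)) to show some of your data.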
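`dput()` prints R code that rebuilds an object, which is why it is preferred over pasted console output. The sketch below uses made-up values, not the asker's data.

```r
# Hypothetical stand-in for the asker's data frame
df <- data.frame(Control_mean = c(0.00, 2.10, 0.85),
                 NH4._mean    = c(0.00, 1.50, 0.00))

# Prints a structure(list(...)) call that can be pasted into a question
dput(head(df))

# Evaluating that printed text recreates an equivalent data frame
txt  <- capture.output(dput(head(df)))
copy <- eval(parse(text = txt))
```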

Tags: r loops dataframe subset

【Solution 1】:

Something like this might work:

library(data.table)
dt <- as.data.table(mtcars)
# Rename the 11 mtcars columns: 5 "ctrl" columns, 5 "treatment*_mean" columns, 1 extra
colnames(dt) <- c(paste0("ctrl", 1:5),
                  paste0("treatment", 1:5, "_mean"),
                  "rawval")

Data

> head(dt)
   ctrl1 ctrl2 ctrl3 ctrl4 ctrl5 treatment1_mean treatment2_mean treatment3_mean treatment4_mean treatment5_mean rawval
1:  21.0     6   160   110  3.90           2.620           16.46               0               1               4      4
2:  21.0     6   160   110  3.90           2.875           17.02               0               1               4      4
3:  22.8     4   108    93  3.85           2.320           18.61               1               1               4      1
4:  21.4     6   258   110  3.08           3.215           19.44               1               0               3      1
5:  18.7     8   360   175  3.15           3.440           17.02               0               0               3      2
6:  18.1     6   225   105  2.76           3.460           20.22               1               0               3      1

Code

In this example, select the columns whose names match a pattern (_mean) and apply a custom function to each of them:

cols <- grep("_mean", colnames(dt))
my_mean_func <- function(z){
  return((z-mean(z))/100)
}

dt[, lapply(.SD, my_mean_func), .SDcols = cols]
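You can also do this without data.table. This is a base-R alternative, not part of the original answer. The same select-by-pattern-then-apply idea works by subsetting the data frame with `cols` and calling `lapply()` on that subset. It reuses the answer's column renaming and `my_mean_func`.

```r
# Same renaming as the answer, on a plain data.frame
df <- mtcars
colnames(df) <- c(paste0("ctrl", 1:5), paste0("treatment", 1:5, "_mean"), "rawval")

cols <- grep("_mean", colnames(df))
my_mean_func <- function(z) (z - mean(z)) / 100

# df[cols] is a data frame (a list of columns), so lapply() visits each column;
# as.data.frame() turns the resulting list back into a data frame
res <- as.data.frame(lapply(df[cols], my_mean_func))
head(res)
```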

Output

> head(dt[, lapply(.SD, my_mean_func), .SDcols = cols])
   treatment1_mean treatment2_mean treatment3_mean treatment4_mean treatment5_mean
1:      -0.0059725      -0.0138875       -0.004375       0.0059375        0.003125
2:      -0.0034225      -0.0082875       -0.004375       0.0059375        0.003125
3:      -0.0089725       0.0076125        0.005625       0.0059375        0.003125
4:      -0.0000225       0.0159125        0.005625      -0.0040625       -0.006875
5:       0.0022275      -0.0082875       -0.004375      -0.0040625       -0.006875
6:       0.0024275       0.0237125        0.005625      -0.0040625       -0.006875

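【Discussion】:

I think I'm on the right track. Could you help me with referencing the control_mean column?
@Becca I can try to help, but could you provide a proper example of what you're seeing and what output you want? It's not very clear at the moment. Also, please use dput(data_frame_name) for the raw data, since that is much easier to read into R. For the output, pasting it as text is fine.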
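The follow-up asks how to reference Control_mean. One loop-free base-R option is to divide the treatment columns by the control column directly. Arithmetic between a data frame and a vector of length `nrow()` is applied element-wise down each column. The sketch uses three rows copied from the question's sample and assumes the control column is literally named `Control_mean`.

```r
df <- data.frame(
  Control_mean = c(0.00, 2.10, 0.85),
  NH4._mean    = c(1.095238, 1.761905, 0.000000),
  NO3._mean    = c(1.1904762, 1.1428571, 0.0000000)
)

# Treatment columns: every *_mean column except the control
trt <- setdiff(grep("_mean$", names(df), value = TRUE), "Control_mean")

# The control vector is recycled down each treatment column, so this is
# (trt_mean / control_mean) - 1 for every treatment column at once
rel <- df[trt] / df$Control_mean - 1
print(rel)
```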