Apply a succinct function over subsets of columns of data frames in a list: answers

【Question title】: Apply succinct function over subsets of columns of data frames in a list
【Posted】: 2018-04-19 05:26:37
【Question description】:

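I have a list (28 items) of data frames (12 columns, 8 rows each) named "n.l.df". A statistic needs to be applied row-wise, within each data frame, to columns 1:3, 4:6, 7:9, and 10:12 separately. I am iterating over the list and computing the statistic like this: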

library(tidyverse)
avgs <- n.l.df
avgs <- lapply(avgs, function(x) {
x[1,1] <-mean(as.numeric(x[1,1:3]))
x[2,1] <-mean(as.numeric(x[2,1:3]))
x[3,1] <-mean(as.numeric(x[3,1:3]))
x[4,1] <-mean(as.numeric(x[4,1:3]))
x[5,1] <-mean(as.numeric(x[5,1:3]))
x[6,1] <-mean(as.numeric(x[6,1:3]))
x[7,1] <-mean(as.numeric(x[7,1:3]))
x[8,1] <-mean(as.numeric(x[8,1:3]))
x[1,4] <-mean(as.numeric(x[1,4:6]))
x[2,4] <-mean(as.numeric(x[2,4:6]))
x[3,4] <-mean(as.numeric(x[3,4:6]))
x[4,4] <-mean(as.numeric(x[4,4:6]))
x[5,4] <-mean(as.numeric(x[5,4:6]))
x[6,4] <-mean(as.numeric(x[6,4:6]))
x[7,4] <-mean(as.numeric(x[7,4:6]))
x[8,4] <-mean(as.numeric(x[8,4:6]))
x[1,7] <-mean(as.numeric(x[1,7:9]))
x[2,7] <-mean(as.numeric(x[2,7:9]))
x[3,7] <-mean(as.numeric(x[3,7:9]))
x[4,7] <-mean(as.numeric(x[4,7:9]))
x[5,7] <-mean(as.numeric(x[5,7:9]))
x[6,7] <-mean(as.numeric(x[6,7:9]))
x[7,7] <-mean(as.numeric(x[7,7:9]))
x[8,7] <-mean(as.numeric(x[8,7:9]))
x[1,10] <-mean(as.numeric(x[1,10:12]))
x[2,10] <-mean(as.numeric(x[2,10:12]))
x[3,10] <-mean(as.numeric(x[3,10:12]))
x[4,10] <-mean(as.numeric(x[4,10:12]))
x[5,10] <-mean(as.numeric(x[5,10:12]))
x[6,10] <-mean(as.numeric(x[6,10:12]))
x[7,10] <-mean(as.numeric(x[7,10:12]))
x[8,10] <-mean(as.numeric(x[8,10:12]))
return(x)
})

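This works fine, and I can drop the unneeded values in columns 2, 3, 5, 6, 8, 9, 11, and 12 when I need to. I like that I don't have to gather the data frames into long format, and keeping them as a list of data frames is desirable.

Obviously this approach is far too repetitive. I assume there must be a way to do a nested lapply/apply, but that is beyond my level. How can I simplify and shorten this code?

Thanks.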

【Question comments】:

Tags: r dataframe iteration mean tidyverse

【Solution 1】:

library(tidyverse)

# For reproducibility
set.seed(100)

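# list of 28 random data frames (8 rows x 12 columns each)
# Note: rerun() is deprecated as of purrr 1.0.0; map(1:28, ~ data.frame(...)) is the replacement.
df_list <- rerun(28, data.frame(replicate(12,sample(1:100,8))))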

# Use map to iterate over the list, using rowMeans and select to get means of select columns.
map(df_list, ~mutate(., rm_1_3 = rowMeans(select(., 1:3)),
                           rm_4_6 = rowMeans(select(., 4:6)),
                           rm_7_9 = rowMeans(select(., 7:9)),
                           rm_10_12 = rowMeans(select(., 10:12))))


[[1]]
  X1 X2 X3 X4 X5 X6 X7 X8  X9 X10 X11 X12   rm_1_3   rm_4_6   rm_7_9 rm_10_12
1 31 55 21 43 35 34 21 13  45  58  46  31 35.66667 37.33333 26.33333 45.00000
2 26 17 36 17 95 86 31 23  36  96  60  73 26.33333 66.00000 30.00000 76.33333
3 55 62 99 76 69 77 33 59 100  65  91  89 72.00000 74.00000 64.00000 81.66667
4  6 86 67 86 87 81 20 21  44  61  96  21 53.00000 84.66667 28.33333 59.33333
5 45 27 52 53 18 58 23 45  24  83   4  35 41.33333 43.00000 30.66667 40.66667
6 46 38 68 27 60 47 27 62  66  74  55  43 50.66667 44.66667 51.66667 57.33333
7 77 72 51 46 94 74 56 91  39  79  69  86 66.66667 71.33333 62.00000 78.00000
8 35 63 70 87 13 83 24 63  31   9  24  37 56.00000 61.00000 39.33333 23.33333

This gives you a list of 28 data frames, each with 4 added columns of statistics. If you only want the means, use transmute in place of mutate.
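A minimal sketch of the transmute variant, assuming data frames of the same shape as in the answer (8 rows, 12 numeric columns). The two-element list built here is a made-up stand-in, not the asker's data:

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for df_list: 2 data frames, 8 rows x 12 numeric columns
set.seed(1)
df_list <- map(1:2, ~ as.data.frame(matrix(sample(1:100, 96, replace = TRUE), nrow = 8)))

# transmute() keeps only the newly created columns and drops the 12 originals
means_only <- map(df_list, ~ transmute(.x,
  rm_1_3   = rowMeans(select(.x, 1:3)),
  rm_4_6   = rowMeans(select(.x, 4:6)),
  rm_7_9   = rowMeans(select(.x, 7:9)),
  rm_10_12 = rowMeans(select(.x, 10:12))
))
```

Each element of `means_only` should then be a data frame with 8 rows and only the four `rm_*` columns.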

【Comments】:

Thanks, Jake! I didn't know about the map function. Once I coerced my objects to numeric data frames, it worked well.
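The coercion step mentioned in this comment is not shown in the thread. One possible way to do it in base R, assuming the columns were stored as character strings or factors (an assumption, since the original `n.l.df` is not shown); `chr_list` and `num_list` are hypothetical names:

```r
# Hypothetical example: a list of data frames whose columns are character
chr_list <- list(
  data.frame(a = c("1", "2"), b = c("3.5", "4"), stringsAsFactors = FALSE),
  data.frame(a = c("10", "20"), b = c("30", "40"), stringsAsFactors = FALSE)
)

# Convert every column of every data frame to numeric.
# Going through as.character() first means factor columns are converted
# by their labels rather than by their internal integer codes.
num_list <- lapply(chr_list, function(df) {
  df[] <- lapply(df, function(col) as.numeric(as.character(col)))
  df
})
```

Assigning into `df[]` keeps the result a data frame with the original column names, so the list can be passed straight to `map()` with `rowMeans()`.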
It comes specifically from `purrr`, a set of functional programming tools in the tidyverse. I recommend reading up on it.
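For comparison, `purrr::map()` plays much the same role here as base R's `lapply()`. Below is a dependency-free sketch of the same computation as a nested `lapply()`, looping over column groups instead of writing each one out. The `col_groups` name and the sample list are illustrative assumptions, not part of the thread:

```r
# Hypothetical stand-in for the list of 8 x 12 numeric data frames
set.seed(1)
df_list <- lapply(1:3, function(i) {
  as.data.frame(matrix(sample(1:100, 96, replace = TRUE), nrow = 8))
})

# Column groups to average over, named after the resulting columns
col_groups <- list(rm_1_3 = 1:3, rm_4_6 = 4:6, rm_7_9 = 7:9, rm_10_12 = 10:12)

# Outer lapply walks the list of data frames;
# inner lapply computes the row means for each column group
row_means <- lapply(df_list, function(df) {
  as.data.frame(lapply(col_groups, function(cols) rowMeans(df[, cols])))
})
```

Changing the grouping then only requires editing `col_groups`, and swapping `rowMeans` for another row-wise function would cover other statistics.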