Aggregate columns by variable group

【Question title】: Aggregate columns by variable group
【Posted】: 2020-07-19 19:08:35
【Question】:

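I am using R and I want to aggregate columns according to their group, so that in this example, instead of ten columns, I end up with three columns, high, medium and low, holding the aggregated values. If these were rows I would use aggregate, but I don't know how to handle columns.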

set.seed(4)
a<-matrix(runif(40),ncol=10,nrow=4)
colnames(a)<-letters[1:10]
a
               a         b          c         d         e
[1,] 0.585800305 0.8135742 0.94904022 0.1000535 0.9710557
[2,] 0.008945796 0.2604278 0.07314447 0.9540688 0.5839880
[3,] 0.293739612 0.7244059 0.75467503 0.4156071 0.9622046
[4,] 0.277374958 0.9060922 0.28600062 0.4551024 0.7617024
             f         g         h         i           j
[1,] 0.7145085 0.6491614 0.5137017 0.8779959 0.460025911
[2,] 0.9966129 0.8308064 0.5297775 0.6545220 0.622056487
[3,] 0.5062709 0.4819990 0.5671122 0.4823709 0.388418035
[4,] 0.4899432 0.8417462 0.2389489 0.9710298 0.006592727

type<-c("high","high","low","high","medium","high","medium","high","low","low")

【Comments】:

Similar to "Row-wise sum of values grouped by columns with same name"? I.e. t(rowsum(t(a), type)). You may first need to convert type to a factor and define the desired order of its levels: type = factor(type, levels = c("high", "medium", "low")).
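
The suggestion in this comment can be sketched as follows. This is a minimal example that rebuilds a and type from the question; the names type_f and res are introduced here only for illustration. rowsum() sums rows by group, so the matrix is transposed first and transposed back afterwards, and the factor levels only control the order of the resulting columns.

```r
# Rebuild the example data from the question
set.seed(4)
a <- matrix(runif(40), ncol = 10, nrow = 4)
colnames(a) <- letters[1:10]
type <- c("high", "high", "low", "high", "medium",
          "high", "medium", "high", "low", "low")

# Fix the desired column order through the factor levels
type_f <- factor(type, levels = c("high", "medium", "low"))

# rowsum() groups rows, so transpose, sum by group, then transpose back
res <- t(rowsum(t(a), type_f))
res
```

The result is a 4 x 3 matrix with one column per group, in the order given by the levels.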

Tags: r aggregate

【Solution 1】:

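We can replicate type so that every element of a gets its column's group label (type[col(a)]) and use that in tapply. This returns one overall sum per group.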

tapply(a, type[col(a)], FUN = sum)
#    high       low    medium 
#10.352068  6.525872  6.082664
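
To see why type[col(a)] works, the sketch below (again rebuilding the question's data; g and totals are illustrative names, not part of the original answer) shows that col(a) holds the column index of every cell, so indexing type with it yields one group label per matrix element.

```r
set.seed(4)
a <- matrix(runif(40), ncol = 10, nrow = 4)
type <- c("high", "high", "low", "high", "medium",
          "high", "medium", "high", "low", "low")

# col(a) has the same shape as a; each cell contains its column index
col(a)[1:2, 1:3]

# One group label per element of a, in the same column-major order
g <- type[col(a)]
length(g) == length(a)

# tapply then sums all elements that share a label
totals <- tapply(a, g, FUN = sum)
totals
```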

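Or, if the sums are wanted row by row for each group (drop = FALSE keeps a group with a single column as a matrix, so rowSums still works)

sapply(split(seq_along(type), type), function(i) rowSums(a[, i, drop = FALSE]))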
#         high      low   medium
#[1,] 2.727638 2.287062 1.620217
#[2,] 2.749833 1.349723 1.414794
#[3,] 2.507136 1.625464 1.444204
#[4,] 2.367462 1.263623 1.603449

Or slightly more compactly

sapply(split.default(as.data.frame(a), type), rowSums)

Or with aggregate, which returns the row-wise group sums in long format

aggregate(Freq ~ ., as.data.frame.table(`colnames<-`(a, type)), FUN = sum)
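
The aggregate call returns one row per (matrix row, group) pair, with the default column names Var1, Var2 and Freq produced by as.data.frame.table. If a wide layout is preferred, one possible way (a sketch, not part of the original answer; long and wide are illustrative names) is to cross-tabulate the result with xtabs.

```r
set.seed(4)
a <- matrix(runif(40), ncol = 10, nrow = 4)
type <- c("high", "high", "low", "high", "medium",
          "high", "medium", "high", "low", "low")

# Long format: Var1 = matrix row (labelled A-D by default), Var2 = group
long <- aggregate(Freq ~ ., as.data.frame.table(`colnames<-`(a, type)), FUN = sum)

# Spread the groups back into columns
wide <- xtabs(Freq ~ Var1 + Var2, data = long)
wide
```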

Or use split to break the data into a list of vectors, one per group, and loop over the list to return the sum of each

sapply(split(a, type[col(a)]), sum)
#    high       low    medium 
#10.352068  6.525872  6.082664
