【Question title】: Is there a higher order replacement for do.call(rbind, ...)?
【Posted】: 2014-08-28 16:44:24
【Question】:

Consider the following data frame A

A <- data.frame(ID = c(1,1,1,2,2,2), num = c(6,2,8,3,3,1))

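For A, I'd like to split by ID and then calculate the differences of num within each group. The desired result can (almost) be obtained with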

do.call(rbind, Map(function(x) { x$new <- c(diff(x$num), NA); x }, 
                   split(A, A$ID)))
#     ID num new
# 1.1  1   6  -4
# 1.2  1   2   6
# 1.3  1   8  NA
# 2.4  2   3   0
# 2.5  2   3  -2
# 2.6  2   1  NA

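It's no secret that do.call(rbind, ...) is widely popular among R users. But given the higher-order functional programming functions on the ?Map help page (Reduce, Filter, etc.), I'm thinking there may be something I don't know about that could replace do.call(rbind, ...) and also reset the row names in the process. I've tried the following.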

> Reduce(function(x) { x$new <- c(diff(x$num), NA); x }, Map, split(A, A$ID))
# Error in f(init, x[[i]]) : unused argument (x[[i]])
> Reduce(function(x) { x$new <- c(diff(x$num), NA); x }, split(A, A$ID))
# Error in f(init, x[[i]]) : unused argument (x[[i]])
> Reduce(Map(function(x) { x$new <- c(diff(x$num), NA); x }, split(A, A$ID)))
# Error in Reduce(Map(function(x) { : 
#   argument "x" is missing, with no default

To get the exact result I want, I currently have to do

> M <- do.call(rbind, Map(function(x) { x$new <- c(diff(x$num), NA); x }, 
                          split(A, A$ID)))
> rownames(M) <- NULL
> M
#   ID num new
# 1  1   6  -4
# 2  1   2   6
# 3  1   8  NA
# 4  2   3   0
# 5  2   3  -2
# 6  2   1  NA

Is there a higher-order function that can replace do.call(rbind, ...) and also incorporate rownames(x) <- NULL?

Note: I'm really looking for a ?Map-related answer, but I'm open to others.

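【Comments】:

You could Reduce(rbind, Map...), but why not directly use ave or aggregate (or wrap a function around them), which hide the split, lapply and _bind steps?
@alexis_laz - Reduce(rbind, Map...) is the specific answer I was looking for in this question.
One of you should post that as an answer.
I'd like @alexis_laz to post it. It's not my answer. He should get the credit.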
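
The two suggestions in these comments can be sketched as follows. This is only an illustration: add_diff is a hypothetical helper name wrapping the per-group step from the question, and via_reduce and via_ave are likewise made-up names. The attempts in the question fail because Reduce calls its function with two arguments (the accumulated value and the next element), while the function passed there takes only one.

```r
A <- data.frame(ID = c(1, 1, 1, 2, 2, 2), num = c(6, 2, 8, 3, 3, 1))

# Hypothetical helper: the per-group step from the question
# (differences of num, padded with a trailing NA).
add_diff <- function(x) { x$new <- c(diff(x$num), NA); x }

# Reduce() calls f(accumulated, next_element), so f must accept two
# arguments. rbind does, which is why it can be folded over the list.
via_reduce <- Reduce(rbind, Map(add_diff, split(A, A$ID)))

# ave() hides the split/apply/recombine steps and returns a vector
# aligned with the original rows, so row names are never touched.
via_ave <- transform(A, new = ave(num, ID, FUN = function(v) c(diff(v), NA)))
```

Here the rows of via_reduce come out labelled 1 to 6 because A is already ordered by ID and rbind carries over the original row labels. For data where the groups are interleaved, the labels would follow the original row positions, so rownames(x) <- NULL may still be needed.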

Tags: r

【Solution 1】:

You can check out rbindlist from "data.table":

library(data.table)

rbindlist(Map(function(x) { 
  x$new <- c(diff(x$num), NA)
  x}, split(A, A$ID)))
#    ID num new
# 1:  1   6  -4
# 2:  1   2   6
# 3:  1   8  NA
# 4:  2   3   0
# 5:  2   3  -2
# 6:  2   1  NA

However, a pure "data.table" approach is much more direct:

DT <- as.data.table(A)

DT[, new := c(diff(num), NA), by = ID][]
#    ID num new
# 1:  1   6  -4
# 2:  1   2   6
# 3:  1   8  NA
# 4:  2   3   0
# 5:  2   3  -2
# 6:  2   1  NA

【Comments】:

Would you care to explain what the [] at the end is doing...? (The rest makes sense to me.)
@BenBolker, it just prints the result.
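
To spell out that exchange: in data.table, := adds the column by reference and returns the table invisibly, so the assignment on its own prints nothing at the console. A minimal sketch, assuming the data.table package is installed and reusing the question's data:

```r
library(data.table)

A <- data.frame(ID = c(1, 1, 1, 2, 2, 2), num = c(6, 2, 8, 3, 3, 1))
DT <- as.data.table(A)   # as.data.table() copies, so A itself is unchanged

# `:=` adds the column in place and returns DT invisibly,
# so this line shows nothing when run at the console.
DT[, new := c(diff(num), NA), by = ID]

# Chaining an empty [] (or calling print(DT)) displays the updated table.
DT[]
```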

【Solution 2】:

Arguably this kind of split-apply-combine approach is the whole point of plyr. It's not in base R, but it is indeed "higher-order".

library("plyr")
ddply(A,"ID",transform,new=c(diff(num),NA))

dplyr version (apparently transform isn't dplyr-aware: you have to use mutate instead...)

library("dplyr")
# Group by the ID column (unquoted); a quoted "ID" would group by a constant string.
A %>% group_by(ID) %>%
     mutate(new=c(diff(num),NA))

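【Comments】:

These are great. I've always taken the "why load a package if you don't need to" approach. But these days it's almost foolish not to take advantage of all the great packages. Maybe I should update the question to "Give me all your do.call(rbind, ...) replacements".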
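
In that spirit, one more base R variant that keeps do.call(rbind, ...) but drops the separate rownames(M) <- NULL step. This is a sketch that assumes an R version whose data frame method for rbind accepts make.row.names; add_diff and pieces are hypothetical names used only for illustration.

```r
A <- data.frame(ID = c(1, 1, 1, 2, 2, 2), num = c(6, 2, 8, 3, 3, 1))

# Hypothetical helper: the per-group step from the question.
add_diff <- function(x) { x$new <- c(diff(x$num), NA); x }
pieces <- Map(add_diff, split(A, A$ID))

# Extra named arguments can be appended to the list handed to do.call();
# make.row.names = FALSE asks rbind's data frame method for automatic
# row names instead of labels like "1.1", "1.2", ...
M <- do.call(rbind, c(pieces, make.row.names = FALSE))
```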