data.table: group by a non-id column in a joined table

【Question title】: data.table group by non-id column in joined table
【Posted】: 2016-05-17 19:57:30
【Question】:

Consider the following two data.tables:

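library(data.table)

# x holds the costs, y maps each id to a group; both are keyed on id
x <- data.table(id = c(1, 2, 3, 4), cost = c(0.7, 0.2, 0.5, 0.9))
y <- data.table(id = c(1, 2, 3, 4), group = c(1, 2, 1, 2))
setkey(x, id)
setkey(y, id)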

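I want to normalize cost by subtracting the mean cost of each group, where the groups are defined in y.

My attempt is below, but R throws an error saying it cannot find 'group':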

x[y,cost:=(cost-mean(cost)),by=.(group)]

Is there a good way to do this without adding a column to x?

【Comments】:

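What you wrote should work, but it is not supported yet.
I think for now you can do x[y, group := i.group][, cost := cost - mean(cost), by = group]. Or this: x[y, cost := cost - ave(cost, i.group)]
Thanks David, the latter seems to work well for me.
But keep in mind that it will be about 4x slower than the former (I just benchmarked it).
@jangorecki The OP asked for a way that does not create a new column... although the ave solution is clearly not idiomatic, given that it is 4x slower. As eddi pointed out, there currently seems to be no solution.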
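
The two workarounds from the comments can be put side by side. The following is only a minimal sketch, assuming the data.table package is installed; x1 and x2 are hypothetical copies introduced here so that both variants start from the same data, and the final group := NULL step is an addition not present in the original comment (it drops the helper column again).

```r
library(data.table)

x <- data.table(id = c(1, 2, 3, 4), cost = c(0.7, 0.2, 0.5, 0.9))
y <- data.table(id = c(1, 2, 3, 4), group = c(1, 2, 1, 2))
setkey(x, id)
setkey(y, id)

# Workaround 1: pull group into the table via an update join, demean by group,
# then remove the helper column (the removal step is an assumption added here).
x1 <- copy(x)
x1[y, group := i.group][, cost := cost - mean(cost), by = group][, group := NULL]

# Workaround 2: let ave() compute the per-group means inside the join,
# so no helper column is ever added.
x2 <- copy(x)
x2[y, cost := cost - ave(cost, i.group)]

# Both variants should yield the same normalized costs.
all.equal(x1$cost, x2$cost)
```

The first variant temporarily adds a column but uses data.table's grouped by; the second avoids the column at the cost of the slowdown mentioned above.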

Tags: r data.table

【Solution 1】:

Does this work for you?

output <- y[x][, normcost:=(cost-mean(cost)), by=group]

output 
#    id group cost normcost
# 1:  1     1  0.7     0.10
# 2:  2     2  0.2    -0.35
# 3:  3     1  0.5    -0.10
# 4:  4     2  0.9     0.35
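
Because y[x] builds a new joined table, x itself gains no columns in this approach. If the normalized values are needed back in x, one possible follow-up is a second update join by id. This is a sketch under that assumption, not part of the original answer; it uses on = "id" instead of keys so that it runs on its own.

```r
library(data.table)

x <- data.table(id = c(1, 2, 3, 4), cost = c(0.7, 0.2, 0.5, 0.9))
y <- data.table(id = c(1, 2, 3, 4), group = c(1, 2, 1, 2))

# Join into a separate table and compute the demeaned cost per group there.
output <- y[x, on = "id"][, normcost := cost - mean(cost), by = group]

# Write the normalized values back into x, matching rows by id.
x[output, cost := i.normcost, on = "id"]

# x still has only its original two columns.
names(x)
```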

【Comments】: