【Posted】: 2016-02-19 08:49:24
【Question】:
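Is .SDcols allowed to vary with the by grouping variable? I have the following situation, where I would like .SDcols to be a different set of columns for each year. The values for .SDcols are stored in one data.table, and I am trying to use those values to apply a function to .SD in another table.
Quite possibly I am missing the obvious approach and going about this the wrong way, but here is what I am trying: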
## Contains the .SDcols applicable to each year
dat1 <- data.table(
  year = 1:4,
  vals = lapply(1:4, function(i) letters[1:i])
)
## Make the sample data (with NAs)
set.seed(1775)
dat2 <- data.table( year = sample(1:4, 10, TRUE) )
dat2[, letters[1:4] := replicate(4, sample(c(NA, 1:5), 10, TRUE), simplify=FALSE)]
## Goal: Sum up the columns in the corresponding .SDcols for each year
## Attempt, doesn't work -- I think b/c .SDcols must be fixed?
dat2[, SUM := rowSums(.SD, na.rm=TRUE), by=year,
     .SDcols=unlist(dat1[year == .BY[[1]], vals])]
## Desired result, by simply iterating through each possible year
for (i in 1:4) {
  dat2[year==i, SUM := rowSums(.SD, na.rm=TRUE),
       .SDcols=unlist(dat1[year == i, vals])]
}
dat2[]
#     year  a  b c  d SUM
#  1:    1  3  1 5  1   3
#  2:    2  1  3 3  1   4
#  3:    1  5  4 3 NA   5
#  4:    4  1 NA 4  5  10
#  5:    2  2  2 2 NA   4
#  6:    2 NA  3 3 NA   3
#  7:    4  2  3 2 NA   7
#  8:    1  2 NA 5  4   2
#  9:    2  3  3 5  1   6
# 10:    3 NA  4 2 NA   6
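For comparison, here is a minimal sketch of one way the goal above might be written without the explicit loop. It assumes, as far as I understand data.table, that .SDcols is evaluated once before grouping (so .BY is not available to it), and therefore keeps .SDcols fixed at all four letter columns and picks the per-year subset inside j instead. The local name `cols` is only illustrative, and this is not a confirmed answer to the question.

```r
library(data.table)

## Same lookup table and sample data as in the question
dat1 <- data.table(
  year = 1:4,
  vals = lapply(1:4, function(i) letters[1:i])
)
set.seed(1775)
dat2 <- data.table(year = sample(1:4, 10, TRUE))
dat2[, letters[1:4] := replicate(4, sample(c(NA, 1:5), 10, TRUE), simplify = FALSE)]

## Keep .SDcols fixed; choose the columns for the current group inside j
dat2[, SUM := {
  ## cols is a hypothetical helper: the column names listed for this year in dat1
  cols <- dat1$vals[[match(.BY$year, dat1$year)]]
  rowSums(.SD[, cols, with = FALSE], na.rm = TRUE)
}, by = year, .SDcols = letters[1:4]]

dat2[]
```

The lookup uses `match()` on the year value rather than indexing `dat1$vals` by position, so it should not depend on the years in dat1 being 1, 2, 3, ... in order.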
【Discussion】:
Tags: r data.table