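Posted: 2016-10-27 11:22:03
Question:
I want to compute the cumsum of a value that restarts at the beginning of each run where signal == 1 (rows where signal == 0 should get 0).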
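Stated precisely: for row t, the desired value is the running total of value since the first row of the current run of 1s, and 0 outside such runs. The start-of-run function s(t) is notation introduced only for this sketch:

$$
\mathrm{pl}_t =
\begin{cases}
\displaystyle\sum_{u = s(t)}^{t} \mathrm{value}_u, & \mathrm{signal}_t = 1,\\[4pt]
0, & \mathrm{signal}_t = 0,
\end{cases}
\qquad
s(t) = \min\{\, u \le t : \mathrm{signal}_v = 1 \text{ for all } u \le v \le t \,\}.
$$

For example, row 9 (2016-09-09) belongs to the run that starts at row 7, so pl_9 = 8 + 3 + 4 = 15, which matches the output the asker shows.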
Sample data:
set.seed(123)
df <- data.frame(Date   = seq.Date(as.Date('2016-09-01'), as.Date('2016-09-30'), by = 'days'),
                 value  = sample(1:10, size = 30, replace = TRUE),
                 signal = c(rep(0,3), rep(1,2), rep(0,1), rep(1,5), rep(0,6), rep(1,3), rep(0,5), rep(1,5)))
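The runs the question refers to can be inspected with base R's rle(). The sketch below rebuilds only the deterministic signal column (no random numbers involved) and derives the first and last row of every run with signal == 1 from the run lengths, filtering on runs$values instead of assuming which value the series starts or ends with. The names ends, starts and one_runs are illustrative only.

```r
# Rebuild only the deterministic signal column from the sample data
signal <- c(rep(0, 3), rep(1, 2), rep(0, 1), rep(1, 5),
            rep(0, 6), rep(1, 3), rep(0, 5), rep(1, 5))

runs   <- rle(signal)                 # run lengths and the value of each run
ends   <- cumsum(runs$lengths)        # last row of every run
starts <- ends - runs$lengths + 1     # first row of every run

# Keep only the runs where signal == 1
one_runs <- data.frame(start = starts[runs$values == 1],
                       end   = ends[runs$values == 1])
print(one_runs)
```

For the sample signal this yields the runs covering rows 4 to 5, 7 to 11, 18 to 20 and 26 to 30.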
> head(df,12)
Date value signal
1 2016-09-01 10 0
2 2016-09-02 10 0
3 2016-09-03 7 0
4 2016-09-04 8 1
5 2016-09-05 1 1
6 2016-09-06 5 0
7 2016-09-07 8 1
8 2016-09-08 3 1
9 2016-09-09 4 1
10 2016-09-10 3 1
11 2016-09-11 2 1
12 2016-09-12 5 0
What I have done so far:
My solution works, but I suspect there is a more efficient and elegant way to do this with dplyr or data.table.
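df$pl <- rep(0, nrow(df))   # one 0 per row (length(df) counts columns, not rows)
# calculating the indices of start/end of runs where signal == 1
# (the every-other indexing assumes the series starts with a run of 0s
#  and ends with a run of 1s, as in the sample data)
runs <- rle(df$signal)
start <- cumsum(runs$lengths) + 1
start <- start[seq(1, length(start), 2)]   # first row after each run of 0s
end <- cumsum(runs$lengths)[-1]
end <- end[seq(1, length(end), 2)]         # last row of each run of 1s
for (i in seq_along(start))
{
  # restart the cumulative sum inside each run of 1s
  df$pl[start[i]:end[i]] <- cumsum(df$value[start[i]:end[i]])
}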
> head(df,12)
Date value signal pl
1 2016-09-01 10 0 0
2 2016-09-02 10 0 0
3 2016-09-03 7 0 0
4 2016-09-04 8 1 8
5 2016-09-05 1 1 9
6 2016-09-06 5 0 0
7 2016-09-07 8 1 8
8 2016-09-08 3 1 11
9 2016-09-09 4 1 15
10 2016-09-10 3 1 18
11 2016-09-11 2 1 20
12 2016-09-12 5 0 0
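A vectorized base R sketch of the same computation, without an explicit loop: label consecutive runs with cumsum(c(TRUE, diff(signal) != 0)), which numbers runs the same way data.table::rleid() does, then let ave() apply cumsum within each run. Multiplying by signal assumes signal is coded 0/1, as in the question. Because sample() results changed in R 3.6.0, the input here is the 12 rows printed above rather than a fresh set.seed(123) draw; run_id is an illustrative name.

```r
# First 12 rows of the sample data, copied from the printed head(df, 12)
df <- data.frame(
  value  = c(10, 10, 7, 8, 1, 5, 8, 3, 4, 3, 2, 5),
  signal = c(0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0)
)

# Label consecutive runs of equal signal: 1, 1, 1, 2, 2, 3, 4, ...
run_id <- cumsum(c(TRUE, diff(df$signal) != 0))

# Cumulative sum of value within each run; multiplying by signal (0/1)
# sets the rows where signal == 0 to 0
df$pl <- ave(df$value, run_id, FUN = cumsum) * df$signal
print(df)
```

On the full data frame from the question, the last two statements can be applied unchanged.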
Comments:
- setDT(df)[, pl := cumsum(value), rleid(signal)][signal == 0, pl := 0]
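The data.table one-liner works like this: setDT() converts df to a data.table by reference, rleid(signal) gives every run of equal signal values its own id and is passed as the third (by) argument, so cumsum(value) restarts in each run, and the chained [signal == 0, pl := 0] then overwrites the rows outside the runs of 1s. A dplyr sketch of the same idea follows; it assumes dplyr is installed, builds the run id from lag() and cumsum() instead of rleid(), and uses only the 12 rows printed in the question. The temporary column name run is illustrative.

```r
library(dplyr)

# First 12 rows of the sample data, copied from the printed head(df, 12)
df <- data.frame(
  value  = c(10, 10, 7, 8, 1, 5, 8, 3, 4, 3, 2, 5),
  signal = c(0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0)
)

df <- df %>%
  # run id: increments whenever signal differs from the previous row
  group_by(run = cumsum(signal != lag(signal, default = first(signal)))) %>%
  # restart the cumulative sum in every run; signal (0/1) zeroes the 0-runs
  mutate(pl = cumsum(value) * signal) %>%
  ungroup() %>%
  select(-run)
print(df)
```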
Tags: r data.table dplyr