[Posted]: 2016-08-20 01:44:55
[Question]:
I want to check my data for overlapping intervals. Here is the data:
ID <- c(rep(1,3), rep(3, 5), rep(4,4),rep(5,5))
Begin <- c(0,2.5,3,7,8,7,25,25,10,15,17,20,1,NA,10,11,13)
End <- c(1.5,3.5,6,12,8,11,29,35, 12,19,NA,28,5,20,30,20,25)
df <- data.frame(ID, Begin, End)
df
ID Begin End
1 1 0.0 1.5
2 1 2.5 3.5
3 1 3.0 6.0*
4 3 7.0 12.0
5 3 8.0 8.0*
6 3 7.0 11.0*
7 3 25.0 29.0
8 3 25.0 35.0*
9 4 10.0 12.0
10 4 15.0 19.0
11 4 17.0 NA*
12 4 20.0 28.0
13 5 1.0 5.0
14 5 NA 20.0
15 5 10.0 30.0
16 5 11.0 20.0*
17 5 13.0 25.0*
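* marks an overlap:
- In row 3 (ID = 1), Begin = 3.0 is less than the previous End of 3.5, so Begin_New is set to 3.5.
- ID = 3 is different, though. In row 5, Begin = 8.0 is less than 12.0, so we set Begin_New = 12. The comparison then continues with row 6, and comparing its Begin = 7.0 with the previous row's End = 8.0 is not correct, because the highest End so far is 12, which is larger than that immediately preceding value.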
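The rule described above seems to amount to comparing each Begin with the largest End seen on any earlier row of the same ID, not just the row directly before it. A minimal base R sketch of that reading for a single ID group follows. `adjust_begin` is a hypothetical helper name, and treating a missing End as contributing nothing to the running maximum is an assumption inferred from rows 11 and 12 of the data.

```r
# Hypothetical helper: adjust the Begin values of one ID group.
adjust_begin <- function(begin, end) {
  # Running maximum of End; a missing End is treated as -Inf (assumption).
  run_max <- cummax(ifelse(is.na(end), -Inf, end))
  # Largest End strictly before each row; the first row has no predecessor.
  prev_max <- c(-Inf, head(run_max, -1))
  # Keep Begin unless an earlier End is larger; a missing Begin stays NA.
  pmax(begin, prev_max)
}

# Rows 4 to 8 of the sample data (ID = 3)
adjust_begin(c(7, 8, 7, 25, 25), c(12, 8, 11, 29, 35))
# 7 12 12 25 29
```

Row 6 is moved to 12 here because the running maximum still remembers the End of row 4, even though row 5 ends at 8.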
Here is my desired output:
ID Begin End Begin_New1
1 1 0.0 1.5 0.0
2 1 2.5 3.5 2.5
3 1 3.0 6.0 3.5*
4 3 7.0 12.0 7.0
5 3 8.0 8.0 12.0*
6 3 7.0 11.0 12.0*
7 3 25.0 29.0 25.0
8 3 25.0 35.0 29.0*
9 4 10.0 12.0 10.0
10 4 15.0 19.0 15.0
11 4 17.0 NA 19.0*
12 4 20.0 28.0 20.0
13 5 1.0 5.0 1.0
14 5 NA 20.0 NA
15 5 10.0 30.0 20.0*
16 5 11.0 20.0 30.0*
17 5 13.0 25.0 30.0*
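When I use the following code I do not get the output I want. It only shifts End by one row and compares each row with the one directly before it: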
setDT(df)[, Begin_New := shift(End), by = ID][!which(Begin < Begin_New), Begin_New:= Begin]
ID Begin End Begin_New
1: 1 0.0 1.5 0.0
2: 1 2.5 3.5 2.5
3: 1 3.0 6.0 3.5
4: 3 7.0 12.0 7.0
5: 3 8.0 8.0 12.0
6: 3 7.0 11.0 8.0
7: 3 25.0 29.0 25.0
8: 3 25.0 35.0 29.0
9: 4 10.0 12.0 10.0
10: 4 15.0 19.0 15.0
11: 4 17.0 NA 19.0
12: 4 20.0 28.0 20.0
13: 5 1.0 5.0 1.0
14: 5 NA 20.0 NA
15: 5 10.0 30.0 20.0
16: 5 11.0 20.0 30.0
17: 5 13.0 25.0 20.0
This is not the output I want.
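One possible way to get the desired column, sketched under the assumption that each Begin should be compared with the running maximum of all earlier End values in the same ID rather than only the previous row's End: `cummax()` carries the largest End forward and `shift()` lags it by one row. `PrevMaxEnd` is a hypothetical helper column, and replacing a missing End with -Inf is an assumption chosen to match rows 11 and 12 of the desired output.

```r
library(data.table)

ID    <- c(rep(1, 3), rep(3, 5), rep(4, 4), rep(5, 5))
Begin <- c(0, 2.5, 3, 7, 8, 7, 25, 25, 10, 15, 17, 20, 1, NA, 10, 11, 13)
End   <- c(1.5, 3.5, 6, 12, 8, 11, 29, 35, 12, 19, NA, 28, 5, 20, 30, 20, 25)
dt <- data.table(ID, Begin, End)

# Largest End among all earlier rows of the same ID.
# A missing End is replaced by -Inf so it does not poison cummax() (assumption).
# The first row of each ID has no earlier rows, so it is filled with -Inf.
dt[, PrevMaxEnd := shift(cummax(fcoalesce(End, -Inf)), fill = -Inf), by = ID]

# Move Begin forward only when an earlier End is larger; a missing Begin stays NA.
dt[, Begin_New := pmax(Begin, PrevMaxEnd)]

# Drop the helper column.
dt[, PrevMaxEnd := NULL]
dt
```

On the sample data this gives Begin_New = 12 for row 6 and 30 for row 17, where the one-row shift gave 8 and 20.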
[Discussion]:
-
I then added one more line: setDT(df)[, Begin_New1 := shift(Begin_New), by = ID][!which(Begin_New
Tags: r data.table lag shift