如果我理解正确,OP 希望为每个 ID 创建一个自己的时间尺度,其中第一年分别从 startdate 开始。最后,将结果汇总到各个年份。
下面的代码结合了data.table 用于分组和聚合的语法和magrittr 用于算术的管道。 (顺便说一句,利用magrittr 的大量管道函数是一个很好的练习。)
library(data.table)
library(magrittr)
cml_by_ID <- DT[, {
cmltime <- seq(min(startdate), max(enddate), by = "year") %>%
append(max(enddate)) %>%
diff() %>%
as.numeric() %>%
divide_by(365.25) %>%
round(1)
year <- cmltime %>% seq_along()
cmlevent <- year %>%
is_in(
eventdate %>%
subtract(min(startdate)) %>%
as.numeric() %>%
divide_by(365.25) %>%
ceiling()
) %>%
as.integer()
list(year = year, cmltime = cmltime, cmlevent = cmlevent)
},
by = ID]
核心思想是创建以startdate 开头的年度日期序列,附加enddate 并计算以年为单位的差异。所以,cmltime 包含每年观察时间贡献的份额(除去年外,大部分为 1.0)。 year 与 cmltime 一起计算。最后,确定事件发生的年份。 cmlevent 在那些年被设置为 1。
每个ID 的结果合并到一个数据对象中
cml_by_ID
ID year cmltime cmlevent
1: 1 1 1.0 0
2: 1 2 1.0 0
3: 1 3 1.0 0
4: 1 4 1.0 0
5: 1 5 1.0 1
6: 1 6 1.0 1
7: 1 7 1.0 0
8: 1 8 1.0 0
9: 1 9 1.0 0
10: 1 10 1.0 1
11: 1 11 1.0 0
12: 1 12 1.0 0
13: 1 13 1.0 0
14: 1 14 1.0 0
15: 1 15 0.1 0
16: 2 1 1.0 0
17: 2 2 1.0 0
18: 2 3 1.0 1
19: 2 4 1.0 0
20: 2 5 1.0 0
21: 2 6 0.2 0
22: 3 1 1.0 0
23: 3 2 1.0 0
24: 3 3 0.2 1
ID year cmltime cmlevent
最后,每年汇总一下:
cml_by_ID[, lapply(.SD, sum), .SDcols = c("cmltime", "cmlevent"), by = year]
year cmltime cmlevent
1: 1 3.0 0
2: 2 3.0 0
3: 3 2.2 2
4: 4 2.0 0
5: 5 2.0 1
6: 6 1.2 1
7: 7 1.0 0
8: 8 1.0 0
9: 9 1.0 0
10: 10 1.0 1
11: 11 1.0 0
12: 12 1.0 0
13: 13 1.0 0
14: 14 1.0 0
15: 15 0.1 0
数据
library(data.table)
DT <- fread(
"ID eventdate startdate enddate timeyrs eventyr
1 20-10-2007 16-06-2003 21-07-2017 14.1 4.34
1 11-11-08 16-06-2003 21-07-2017 14.1 5.41
1 26-09-2012 16-06-2003 21-07-2017 14.1 9.28
2 11-05-2014 20-04-2012 16-06-2017 5.2 2.06
3 11-04-2017 6-02-2015 21-04-2017 2.2 2.18",
select = 1:4
)
# convert date strings to Date class
cols <- names(DT)[names(DT) %like% "date$"]
DT[, (cols) := lapply(.SD, lubridate::dmy), .SDcols = cols]
DT
ID eventdate startdate enddate
1: 1 2007-10-20 2003-06-16 2017-07-21
2: 1 2008-11-11 2003-06-16 2017-07-21
3: 1 2012-09-26 2003-06-16 2017-07-21
4: 2 2014-05-11 2012-04-20 2017-06-16
5: 3 2017-04-11 2015-02-06 2017-04-21