【Posted】: 2016-04-10 10:00:16
【Question】:
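I have a large data set.
I want to create a column showing, for each repeated ID, the number of days between a row's start date and the end date of the previous row. For example, R1 is not repeated, so I do not calculate an interval for it. For R2, I first need to sort its rows by start date in ascending order. Then I calculate the number of days between the second-earliest start date and the end date of the previous (earliest) row. Next I calculate the number of days between the third-earliest start date and the end date of the second-earliest row, and so on. I want to do the same for every other repeated ID.
Then I want to create a second new column that counts the days in the same way as the first part, but only among rows with the same ID and the same event level. How can I do this?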
ID<-c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
START<-c("3-4-2013","4-5-2018","4-5-2015","4-6-2011","5-5-2012","1-9-2010","23-4-1999","25-6-2011","3-6-2011","4-5-2014",
"6-6-2016","5-7-2014","7-7-1990","3-3-1998","4-4-1990","7-8-2014","22-4-1970","23-5-1984")
End<-c("3-4-2014","4-5-2019","5-5-2015","4-6-2013","5-5-2014","1-9-2012","23-4-2010","25-6-2015","3-6-2013","6-5-2014",
"6-8-2016","5-8-2014","7-9-1990","3-7-1998","4-9-1990","7-12-2014","22-7-1970","23-8-1984")
event<-c("a","b","b","s","s","f","f","b","b","a","a","a","s","c","c","b","m","a")
df<-data.frame(ID,START,End,event)
So the result would look like this:
ID START End event Time1 Time2
1 R1 3-4-2013 3-4-2014 a NA NA
14 R2 3-3-1998 3-7-1998 c NA NA
15 R2 4-4-1990 4-9-1990 c (4-4-1990)-(3-7-1998) (4-4-1990)-(3-7-1998)
3 R2 4-5-2015 5-5-2015 b (4-5-2015)-(4-9-1990) NA
2 R2 4-5-2018 4-5-2019 b (4-5-2018)-(5-5-2015) (4-5-2018)-(5-5-2015)
16 R2 7-8-2014 7-12-2014 b (7-8-2014)-(4-5-2019) (7-8-2014)-(4-5-2019)
10 R3 4-5-2014 6-5-2014 a NA NA
4 R3 4-6-2011 4-6-2013 s (4-6-2011)-(6-5-2014) NA
5 R3 5-5-2012 5-5-2014 s (5-5-2012)-(4-6-2013) (5-5-2012)-(4-6-2013)
12 R3 5-7-2014 5-8-2014 a (5-7-2014)-(5-5-2014) (5-7-2014)-(6-5-2014)
11 R3 6-6-2016 6-8-2016 a (6-6-2016)-(5-8-2014) (6-6-2016)-(5-8-2014)
13 R3 7-7-1990 7-9-1990 s (7-7-1990)-(5-5-2014)
6 R4 1-9-2010 1-9-2012 f
7 R4 23-4-1999 23-4-2010 f
8 R4 25-6-2011 25-6-2015 b
9 R4 3-6-2011 3-6-2013 b
17 R5 22-4-1970 22-7-1970 m
18 R6 23-5-1984 23-8-1984 a
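The two columns described above could be computed along the following lines. This is only a minimal base R sketch, not a definitive answer: it assumes the dates are day-month-year, it sorts chronologically (the table above appears to order START as text, so row order will differ), and `gap_days` is a hypothetical helper name introduced here for illustration.

```r
# Data as given in the question.
ID <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
START <- c("3-4-2013","4-5-2018","4-5-2015","4-6-2011","5-5-2012","1-9-2010","23-4-1999","25-6-2011","3-6-2011","4-5-2014",
           "6-6-2016","5-7-2014","7-7-1990","3-3-1998","4-4-1990","7-8-2014","22-4-1970","23-5-1984")
End <- c("3-4-2014","4-5-2019","5-5-2015","4-6-2013","5-5-2014","1-9-2012","23-4-2010","25-6-2015","3-6-2013","6-5-2014",
         "6-8-2016","5-8-2014","7-9-1990","3-7-1998","4-9-1990","7-12-2014","22-7-1970","23-8-1984")
event <- c("a","b","b","s","s","f","f","b","b","a","a","a","s","c","c","b","m","a")
df <- data.frame(ID, START, End, event, stringsAsFactors = FALSE)

# Assumption: strings are day-month-year. Converting to Date makes the
# sort chronological instead of alphabetical.
df$START <- as.Date(df$START, format = "%d-%m-%Y")
df$End   <- as.Date(df$End,   format = "%d-%m-%Y")

# Sort by ID, then by start date (ascending).
df <- df[order(df$ID, df$START), ]

# Hypothetical helper: within each group, days from the previous row's
# end date to this row's start date. The first row of a group gets NA.
gap_days <- function(start, end, group) {
  prev_end <- ave(as.numeric(end), group,
                  FUN = function(x) c(NA, head(x, -1)))
  as.numeric(start) - prev_end
}

# Time1: gaps within each ID.
df$Time1 <- gap_days(df$START, df$End, df$ID)

# Time2: gaps within each ID and event level combination.
df$Time2 <- gap_days(df$START, df$End, paste(df$ID, df$event))

print(df)
```

IDs that occur only once (R1, R5, R6) end up with NA in both columns. A negative value means a period started before the previous one in the same group had ended.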
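【Comments】:
- Can you show us what the expected output should look like? Do you want to count the days elapsed only when the same event occurred, or regardless of the event?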