【发布时间】:2017-04-17 16:33:42
【问题描述】:
我有一个数据集,其中包含一个人离开网络的日期。一个人可以多次离开网络,因为他们可能会在离开网络后再次加入网络。以下代码复制了该场景。
library(data.table)
Leaving_Date<- data.table(Id= c(1,2,3,4,3,5),Date = as.Date(
c("2017-01-01","2017-02-03","2017-01-01","2017-03-10","2017-02-09","2017-02-05")))
(ID 在此表中重复多次,因为一个人可以多次离开网络,只要他们再次加入该网络)
> Leaving_Date
Id Date
1: 1 2017-01-01
2: 2 2017-02-03
3: 3 2017-01-01
4: 4 2017-03-10
5: 3 2017-02-09
6: 5 2017-02-05
我有另一个数据集,给出了跟踪特定人的日期,可以是他们离开网络之前或之后的日期。以下代码复制了该场景。
FOLLOWUPs <- data.table(Id = c(1,2,3,2,2,3,3,4,1,5),
Date =as.Date(c("2016-10-01","2017-02-04",
"2017-01-17","2017-02-23", "2017-03-03",
"2017-02-10","2017-02-11","2017-01-01",
"2017-01-15","2017-01-01")))
> FOLLOWUPs
Id Date
1: 1 2016-10-01
2: 2 2017-02-04
3: 3 2017-01-17
4: 2 2017-02-23
5: 2 2017-03-03
6: 3 2017-02-10
7: 3 2017-02-11
8: 4 2017-01-01
9: 1 2017-01-15
10: 5 2017-01-01
现在我想在 Leaving_Date 中查找每个案例并找到他们被跟进的日期,并创建三列(SevenDay、FourteenDay、ThirtyDay)以 0 和 1 表示跟进的时间段(如果有的话)。我正在使用以下代码:
SEVENDAY_FOLLOWUP <- vector()
FOURTEEN_FOLLOWUP <- vector()
THIRTYDAY_FOLLOWUP <- vector()
for(i in 1:nrow(Leaving_Date)){
sub_data <- FOLLOWUPs[Id== Leaving_Date[i,1]]
if(nrow(sub_data[Date > Leaving_Date[i,Date] &
Date < (Leaving_Date[i,Date]+7)])== 0){
SEVENDAY_FOLLOWUP <- rbind(SEVENDAY_FOLLOWUP,0)
}
else{
SEVENDAY_FOLLOWUP <- rbind(SEVENDAY_FOLLOWUP,1)
}
if(nrow(sub_data[Date > Leaving_Date[i,Date] &
Date < (Leaving_Date[i,Date]+14)])== 0){
FOURTEEN_FOLLOWUP <- rbind(FOURTEEN_FOLLOWUP,0)
}
else{
FOURTEEN_FOLLOWUP <- rbind(FOURTEEN_FOLLOWUP,1)
}
if(nrow(sub_data[Date > Leaving_Date[i,Date] &
Date < (Leaving_Date[i,Date]+30)])== 0){
THIRTYDAY_FOLLOWUP <- rbind(THIRTYDAY_FOLLOWUP,0)
}
else{
THIRTYDAY_FOLLOWUP <- rbind(THIRTYDAY_FOLLOWUP,1)
}
}
Leaving_Date$SEVENDAY <- as.vector(SEVENDAY_FOLLOWUP)
Leaving_Date$FOURTEENDAY <- as.vector(FOURTEEN_FOLLOWUP)
Leaving_Date$THIRTYDAY <- as.vector(THIRTYDAY_FOLLOWUP)
最终数据
> Leaving_Date
Id Date SEVENDAY FOURTEENDAY THIRTYDAY
1: 1 2017-01-01 0 0 1
2: 2 2017-02-03 1 1 1
3: 3 2017-01-01 0 0 1
4: 4 2017-03-10 0 0 0
5: 3 2017-02-09 1 1 1
6: 5 2017-02-05 0 0 0
这段代码效率非常低,因为我必须运行它进行 100k 次观察,而且需要很多时间。有没有有效的方法来做到这一点。
【问题讨论】:
-
您可能想阅读R Inferno的第二个圈子
-
@Frank 我已经编辑过了
标签: r performance loops data.table data-cleaning