[Posted]: 2020-08-29 17:06:20
[Problem description]:
I have the following data frame, stored as a data.table:
library(data.table)
df <- data.table(id=c(1,1,2,2,3,3,4,4),
date=c("2013-11-22","2017-01-24","2017-06-24","2020-02-10","2011-01-03","2013-11-24","2015-01-24","2017-08-24"),
status=c("Former","Current","Former","Never","Current",NA,"Current","Former"))
df
id date status
1: 1 2013-11-22 Former
2: 1 2017-01-24 Current
3: 2 2017-06-24 Former
4: 2 2020-02-10 Never
5: 3 2011-01-03 Current
6: 3 2013-11-24 <NA>
7: 4 2015-01-24 Current
8: 4 2017-08-24 Former
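I want to reduce this to one row per id using the following logic. The row with the latest date should be kept. However, if the status at the latest date is <NA> or Never and an earlier date has a status (Former or Current), the row with the earlier date should be kept instead.
I solved this with the following code: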
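library(anytime)  # provides anydate()
# Latest row per id among rows whose status is Former or Current
unique1 <- df[df$status %in% c("Former","Current"),]
unique1 <- unique1[,.SD[which.max(anydate(date))],by=.(id)]
# Fallback: first row per id by status priority (rows with NA status sort last)
unique_final <- unique(df[order(id,ordered(status,c("Former","Current","Never",NA)))],by='id')
# Overwrite the fallback rows for ids that have a Former/Current row
unique_final[match(unique1$id,unique_final$id),]<-unique1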
and got these results:
id date status
1: 1 2017-01-24 Current
2: 2 2017-06-24 Former
3: 3 2011-01-03 Current
4: 4 2017-08-24 Former
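Is there a way to combine these two subsetting steps? I would like to avoid creating a second data frame and then matching it back against the first.
I am using data.table, so a solution that scales to larger datasets would be great.
Thanks!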
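For reference, the rule described above can be written as a single sort followed by `unique()`. This is only a minimal sketch, not a tested solution for large data. It assumes that "has a status" means Former or Current, and that an id with no such row should simply keep its latest date. That second assumption differs slightly from the code above, which picks by status priority in that case. The name `res` is made up for illustration.

```r
library(data.table)

df <- data.table(
  id     = c(1, 1, 2, 2, 3, 3, 4, 4),
  date   = c("2013-11-22", "2017-01-24", "2017-06-24", "2020-02-10",
             "2011-01-03", "2013-11-24", "2015-01-24", "2017-08-24"),
  status = c("Former", "Current", "Former", "Never", "Current", NA,
             "Current", "Former")
)

# Sort within each id so that rows with an informative status
# (Former/Current) come first, and within those the latest date comes first.
# `%in%` returns FALSE for NA, so NA statuses count as non-informative.
res <- df[order(id,
                !(status %in% c("Former", "Current")),
                -as.numeric(as.IDate(date)))]

# unique() keeps the first row per id, i.e. the preferred one.
res <- unique(res, by = "id")
res
```

On the sample data this selects the same four rows as the expected output shown above, without building and matching a second table.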
[Discussion]:
Tags: r data.table unique