Posted: 2023-03-31 20:54:01
Question:
I have the following problem. I have a dataset that looks like this:
library(data.table)
dt <- data.table(
  student = c(rep(1, 8), rep(2, 8)),
  year    = rep(2001:2008, 2),
  track   = c(rep("Highschool", 3), rep("Vocational", 2), rep("Uni", 1), rep("Vocational", 2),
              rep("Vocational", 2), rep("Highschool", 4), rep("Vocational", 2))
)
# student year track
# 1: 1 2001 Highschool
# 2: 1 2002 Highschool
# 3: 1 2003 Highschool
# 4: 1 2004 Vocational
# 5: 1 2005 Vocational
# 6: 1 2006 Uni
# 7: 1 2007 Vocational
# 8: 1 2008 Vocational
# 9: 2 2001 Vocational
#10: 2 2002 Vocational
#11: 2 2003 Highschool
#12: 2 2004 Highschool
#13: 2 2005 Highschool
#14: 2 2006 Highschool
#15: 2 2007 Vocational
#16: 2 2008 Vocational
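As you can see, the data record, in chronological order, the type of education each student was enrolled in during a given year. I want to assign a sequential identifier to each consecutive spell of a program type a student was enrolled in, while preserving the chronological order: the counter starts at 1 for each student and increases by one whenever the track differs from the previous year's, so returning to an earlier track (e.g. Vocational after Uni) counts as a new spell. So I want my data.table to look like this: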
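The rule described above can be sketched in base R without any packages. This is a minimal illustration that assumes rows are ordered by year within each student; `df` and `spell_id()` are hypothetical names introduced only for this example.

```r
# Same example data as above, as a plain data.frame (no packages needed)
df <- data.frame(
  student = c(rep(1, 8), rep(2, 8)),
  year    = rep(2001:2008, 2),
  track   = c(rep("Highschool", 3), rep("Vocational", 2), rep("Uni", 1), rep("Vocational", 2),
              rep("Vocational", 2), rep("Highschool", 4), rep("Vocational", 2))
)

# Rows must be in chronological order within each student
df <- df[order(df$student, df$year), ]

# Hypothetical helper for one student's track vector: TRUE in the first year and
# whenever the track differs from the previous year; the running sum is the spell id
spell_id <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

# ave() splits the row positions by student, applies the helper to each student's
# tracks, and writes the results back in the original row positions
df$tracker <- ave(seq_len(nrow(df)), df$student,
                  FUN = function(i) spell_id(df$track[i]))

df
```

Because `ave()` is applied to integer row positions, `tracker` stays an integer column.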
dt[, tracker := c(rep(1, 3), rep(2, 2), rep(3, 1), rep(4, 2),
rep(1, 2), rep(2, 4), rep(3, 2))]
# student year track tracker
# 1: 1 2001 Highschool 1
# 2: 1 2002 Highschool 1
# 3: 1 2003 Highschool 1
# 4: 1 2004 Vocational 2
# 5: 1 2005 Vocational 2
# 6: 1 2006 Uni 3
# 7: 1 2007 Vocational 4
# 8: 1 2008 Vocational 4
# 9: 2 2001 Vocational 1
#10: 2 2002 Vocational 1
#11: 2 2003 Highschool 2
#12: 2 2004 Highschool 2
#13: 2 2005 Highschool 2
#14: 2 2006 Highschool 2
#15: 2 2007 Vocational 3
#16: 2 2008 Vocational 3
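For now, I have come up with the following solution:
# 1 when the track differs from the previous year's track, computed within each student
dt[, helper := ifelse(shift(track) == track, 0, 1), by = "student"]
# each student's first year has no previous year (shift() gives NA), so count it as 0
dt[is.na(helper), helper := 0]
# running count of changes per student, starting at 1
dt[, tracker := cumsum(helper) + 1, by = "student"]
dt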
# student year track helper tracker
# 1: 1 2001 Highschool 0 1
# 2: 1 2002 Highschool 0 1
# 3: 1 2003 Highschool 0 1
# 4: 1 2004 Vocational 1 2
# 5: 1 2005 Vocational 0 2
# 6: 1 2006 Uni 1 3
# 7: 1 2007 Vocational 1 4
# 8: 1 2008 Vocational 0 4
# 9: 2 2001 Vocational 0 1
#10: 2 2002 Vocational 0 1
#11: 2 2003 Highschool 1 2
#12: 2 2004 Highschool 0 2
#13: 2 2005 Highschool 0 2
#14: 2 2006 Highschool 0 2
#15: 2 2007 Vocational 1 3
#16: 2 2008 Vocational 0 3
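A possibly more direct data.table variant, offered as a sketch: data.table's `rleid()` numbers consecutive runs of identical values, and grouping by `student` restarts the count for each student. This assumes rows are sorted by `year` within `student`, which `setorder()` enforces here. Unlike the helper/`cumsum()` approach above, which produces doubles, `rleid()` returns integers.

```r
library(data.table)

# Same example data as in the question
dt <- data.table(
  student = c(rep(1, 8), rep(2, 8)),
  year    = rep(2001:2008, 2),
  track   = c(rep("Highschool", 3), rep("Vocational", 2), rep("Uni", 1), rep("Vocational", 2),
              rep("Vocational", 2), rep("Highschool", 4), rep("Vocational", 2))
)

# Make sure rows are in chronological order within each student
setorder(dt, student, year)

# rleid() gives each run of identical consecutive tracks its own number;
# by = student restarts the numbering for every student
dt[, tracker := rleid(track), by = student]
dt
```

No helper column is needed, and the first year of each student is handled automatically because each group starts a new run.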
Now I'm wondering: is there a more "direct" way to achieve this using data.table, dplyr, or base R syntax?
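For dplyr, one possible sketch (assuming the dplyr package is installed; `df` is a hypothetical name) compares each year's track with the previous year's via `lag()` within each student and takes a running sum of the changes.

```r
library(dplyr)

# Same example data as in the question, as a plain data.frame
df <- data.frame(
  student = c(rep(1, 8), rep(2, 8)),
  year    = rep(2001:2008, 2),
  track   = c(rep("Highschool", 3), rep("Vocational", 2), rep("Uni", 1), rep("Vocational", 2),
              rep("Vocational", 2), rep("Highschool", 4), rep("Vocational", 2))
)

df <- df %>%
  arrange(student, year) %>%
  group_by(student) %>%
  # lag() takes the previous row's track within each student; default = first(track)
  # makes the first year compare equal to itself, so it does not count as a change
  mutate(tracker = cumsum(track != lag(track, default = first(track))) + 1L) %>%
  ungroup()

df
```

If you have dplyr 1.1.0 or later, `consecutive_id()` is meant for this kind of run numbering (e.g. `mutate(tracker = consecutive_id(track))` after `group_by(student)`), but check the documentation of your installed version.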
Tags: r dplyr data.table