【Posted】: 2021-11-16 06:09:14
【Question】:
I want to create a new identifier column in a dataset I have.
ex <- structure(list(id = c("8210109300002", "8210109300002", "8210109300002",
"8210109300002", "8210109300002", "8210109300002", "8210109300002",
"8210109300002", "8210109300002"), serv_from_dt = structure(c(18262,
18263, 18267, 18267, 18268, 18269, 18269, 18275, 18276), class = "Date"),
serv_to_dt = structure(c(18262, 18263, 18267, 18267, 18268,
18269, 18269, 18275, 18276), class = "Date"), date_plus1 = structure(c(18263,
18264, 18268, 18268, 18269, 18270, 18270, 18276, 18277), class = "Date")),
row.names = c(NA, -9L), class = c("data.table", "data.frame"))
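This identifier will be based on the serv_to_dt, serv_from_dt, and date_plus1 columns. The data is sorted by serv_from_dt; if a row's serv_to_dt equals the previous row's serv_from_dt, or its serv_to_dt equals the previous row's serv_from_dt + 1 (i.e. the previous row's date_plus1 column), those rows should be labeled with the same identifier.
The final output I want is: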
want <- structure(list(id = c("8210109300002", "8210109300002", "8210109300002",
"8210109300002", "8210109300002", "8210109300002", "8210109300002",
"8210109300002", "8210109300002"), serv_from_dt = structure(c(18262,
18263, 18267, 18267, 18268, 18269, 18269, 18275, 18276), class = "Date"),
serv_to_dt = structure(c(18262, 18263, 18267, 18267, 18268,
18269, 18269, 18275, 18276), class = "Date"), date_plus1 = structure(c(18263,
18264, 18268, 18268, 18269, 18270, 18270, 18276, 18277), class = "Date"),
identifier = c("1", "1", "2",
"2", "2", "2", "2",
"3", "3")), row.names = c(NA, -9L), class = c("data.table", "data.frame"))
My first step was to create a column that flags whether each row's dates line up with the dates of the previous row:
ex %>%
mutate(NewCol = ifelse((lag(serv_from_dt) == date_plus1 | lag(serv_from_dt) == serv_to_dt), "yes", "no"))
However, this code does not correctly say "yes" for a serv_from_dt that matches the previous row's date_plus1.
Thanks in advance for any help!
【Comments】:
Tags: r dataframe if-statement data.table tidyverse
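One possible reading of the problem, offered as a sketch rather than a definitive answer: the attempt above compares lag(serv_from_dt) with the current row's date_plus1, which looks one day too far ahead; the comparison probably needs to go the other way, with the current row's date checked against lag(date_plus1). The example below assumes that a row continues the previous run when its serv_from_dt equals the previous row's serv_to_dt or date_plus1 (in the sample data serv_from_dt and serv_to_dt are identical, so either column gives the same result). The names ex_df, linked, new_run, and res are made up for illustration, and the grouping by id is an assumption for datasets with more than one id.

```r
library(dplyr)

# Same dates as `ex`, rebuilt as a plain data frame so the sketch runs on its own.
from <- as.Date(c(18262, 18263, 18267, 18267, 18268, 18269, 18269, 18275, 18276),
                origin = "1970-01-01")
ex_df <- data.frame(
  id = "8210109300002",
  serv_from_dt = from,
  serv_to_dt = from,
  date_plus1 = from + 1
)

res <- ex_df %>%
  group_by(id) %>%  # assumption: runs should restart for each id
  mutate(
    # Compare the CURRENT row's start date with the PREVIOUS row's dates.
    linked = serv_from_dt == lag(serv_to_dt) | serv_from_dt == lag(date_plus1),
    # The first row of a group has no predecessor (NA), so it opens a new run.
    new_run = is.na(linked) | !linked,
    # Each new run bumps the counter; rows inside a run keep the same value.
    identifier = as.character(cumsum(new_run))
  ) %>%
  ungroup()

res$identifier
```

On the sample data this gives the same identifier values as the want table. The helper columns linked and new_run can be dropped afterwards with select(-linked, -new_run).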