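[Posted]: 2021-01-09 00:29:18
[Question]:
I have a data frame of attendance data from a Zoom event, containing email address, join time, and leave time. Many attendees logged in, logged out, and then logged back in, so they appear on multiple rows. I want to calculate the total number of minutes each attendee was logged in. While inspecting the data, I noticed that one person has overlapping time intervals (see email3 in the example below), and I would like to be able to identify anyone else in the dataset for whom this is the case.
Here is a sample data frame that includes the desired new column, "Overlap":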
structure(list(Email= c("email1@gmail.com", "email2@gmail.com", "email2@gmail.com", "email3@gmail.com",
"email3@gmail.com", "email3@gmail.com"), Join.Time = structure(c(as.POSIXct("2020-12-09 13:04:00"),
as.POSIXct("2020-12-09 13:20:00"), as.POSIXct("2020-12-09 13:30:00"),as.POSIXct("2020-12-09 13:07:00"),
as.POSIXct("2020-12-09 13:46:00"),as.POSIXct("2020-12-09 13:29:00")), class = c("POSIXct", "POSIXt"),
tzone = ""), Leave.Time = structure(c(as.POSIXct("2020-12-09 13:25:00"), as.POSIXct("2020-12-09 13:22:00"),
as.POSIXct("2020-12-09 14:01:00"), as.POSIXct("2020-12-09 13:29:00"), as.POSIXct("2020-12-09 14:00:00"),
as.POSIXct("2020-12-09 14:33:00")), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"),
Overlap = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)), .Names = c("Email", "Join.Time", "Leave.Time", "Overlap"
), row.names = c(NA, -6L), class = "data.frame")
             Email           Join.Time          Leave.Time Overlap
1 email1@gmail.com 2020-12-09 13:04:00 2020-12-09 13:25:00   FALSE
2 email2@gmail.com 2020-12-09 13:20:00 2020-12-09 13:22:00   FALSE
3 email2@gmail.com 2020-12-09 13:30:00 2020-12-09 14:01:00   FALSE
4 email3@gmail.com 2020-12-09 13:07:00 2020-12-09 13:29:00    TRUE
5 email3@gmail.com 2020-12-09 13:46:00 2020-12-09 14:00:00    TRUE
6 email3@gmail.com 2020-12-09 13:29:00 2020-12-09 14:33:00    TRUE
I tried the solution suggested here: R Find overlap among time periods, but when I do, I get the error "Error in if (int_overlaps(intervals[i], interval[j])) { : missing value where TRUE/FALSE needed".
Any help would be greatly appreciated!
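To make the goal concrete, here is a minimal base R sketch of what I am after. It is not taken from the linked answer. The names `att`, `has_overlap`, and `total_minutes` are made up for illustration. It assumes there are no missing join or leave times, that both columns share one time zone (UTC here), and that a session starting at the exact minute another one ends does not count as an overlap on its own.

```r
# Sample data mirroring the example above (time zone set to UTC as an assumption).
att <- data.frame(
  Email = c("email1@gmail.com", "email2@gmail.com", "email2@gmail.com",
            "email3@gmail.com", "email3@gmail.com", "email3@gmail.com"),
  Join.Time = as.POSIXct(c("2020-12-09 13:04:00", "2020-12-09 13:20:00",
                           "2020-12-09 13:30:00", "2020-12-09 13:07:00",
                           "2020-12-09 13:46:00", "2020-12-09 13:29:00"),
                         tz = "UTC"),
  Leave.Time = as.POSIXct(c("2020-12-09 13:25:00", "2020-12-09 13:22:00",
                            "2020-12-09 14:01:00", "2020-12-09 13:29:00",
                            "2020-12-09 14:00:00", "2020-12-09 14:33:00"),
                          tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if any two sessions of one attendee strictly overlap.
has_overlap <- function(join, leave) {
  join <- as.numeric(join)    # seconds since the epoch
  leave <- as.numeric(leave)
  o <- order(join, leave)
  join <- join[o]
  leave <- leave[o]
  n <- length(join)
  if (n < 2) return(FALSE)
  # Latest leave time among all earlier sessions, for sessions 2..n.
  prev_max_leave <- cummax(leave)[-n]
  any(join[-1] < prev_max_leave)
}

# Hypothetical helper: minutes logged in, counting overlapping time only once.
total_minutes <- function(join, leave) {
  join <- as.numeric(join)
  leave <- as.numeric(leave)
  o <- order(join, leave)
  join <- join[o]
  leave <- leave[o]
  # Time already covered by earlier sessions; nothing is covered before the first.
  covered_until <- c(-Inf, cummax(leave)[-length(leave)])
  start <- pmax(join, covered_until)
  sum(pmax(leave - start, 0)) / 60
}

# Apply both helpers per email address.
rows_by_email <- split(seq_len(nrow(att)), att$Email)
flag <- sapply(rows_by_email, function(idx)
  has_overlap(att$Join.Time[idx], att$Leave.Time[idx]))
minutes <- sapply(rows_by_email, function(idx)
  total_minutes(att$Join.Time[idx], att$Leave.Time[idx]))

# Copy the per-attendee flag back onto every row of that attendee.
att$Overlap <- unname(flag[att$Email])
```

Sorting each attendee's sessions by join time and comparing every join time with the running maximum of earlier leave times avoids the pairwise loop over intervals. If rows with missing times are present, they would have to be dropped or handled before calling these helpers.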
[Comments]:
-
Does your full data set have an Overlap column?
-
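@AcidCatfish No, I am hoping to create an Overlap column, or something similar, to identify the email addresses that have overlapping time intervals.
Tags: r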