【Question title】: Use R to create start time and end time columns based on first occurrence of events in another column
【Posted】: 2020-02-08 11:58:45
【Question description】:

I would like to know how to derive a start time and an end time from the data frame below. The data shows one call handler's records over a period of calls.

Id  CallTime    CallHandler CallStatus
1   01/01/2020 00:05    A   Busy
2   01/01/2020 00:10    A   Free
3   01/01/2020 00:25    A   Free
4   01/01/2020 00:57    A   Free
5   01/01/2020 01:30    A   Busy
6   01/01/2020 01:45    A   Busy
7   01/01/2020 02:20    A   Busy
8   01/01/2020 02:25    A   Busy
9   01/01/2020 02:50    A   Free
10  01/01/2020 02:25    A   Free
11  01/01/2020 02:55    A   Busy
12  01/01/2020 03:25    A   Busy
13  01/01/2020 04:55    A   Free
14  01/01/2020 05:25    A   Busy
15  01/01/2020 05:55    A   Free
16  01/01/2020 06:25    A   Busy

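Expected output

The output should return the start and end times in separate columns, based on the Busy and Free call status: each interval starts at the first Busy record and ends at the first Free record that follows it.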


CallHandler StartTime   EndTime
A   01/01/2020 00:05    01/01/2020 00:10
A   01/01/2020 01:30    01/01/2020 02:50
A   01/01/2020 02:55    01/01/2020 04:55
A   01/01/2020 05:25    01/01/2020 05:55
A   01/01/2020 06:25    N/A

I used

df %>%
  group_by(CallStatus) %>%
  mutate(StartTime = ifelse(CallStatus == "Free", CallTime, 0), EndTime = ifelse(CallStatus == "Busy", CallTime, 0))

and got the following, which is not really what I want:

Id      CallTime         CallHandler CallStatus StartTime        EndTime               
1       01/01/2020 00:05 A           Busy       0                01/01/2020 00:05
2       01/01/2020 00:10 A           Free       01/01/2020 00:10 0               
3       01/01/2020 00:25 A           Free       01/01/2020 00:25 0               
4       01/01/2020 00:57 A           Free       01/01/2020 00:57 0               
5       01/01/2020 01:30 A           Busy       0                01/01/2020 01:30
6       01/01/2020 01:45 A           Busy       0                01/01/2020 01:45
7       01/01/2020 02:20 A           Busy       0                01/01/2020 02:20
8       01/01/2020 02:25 A           Busy       0                01/01/2020 02:25
9       01/01/2020 02:50 A           Free       01/01/2020 02:50 0               
10      01/01/2020 02:25 A           Free       01/01/2020 02:25 0               
11      01/01/2020 02:55 A           Busy       0                01/01/2020 02:55
12      01/01/2020 03:25 A           Busy       0                01/01/2020 03:25
13      01/01/2020 04:55 A           Free       01/01/2020 04:55 0               
14      01/01/2020 05:25 A           Busy       0                01/01/2020 05:25
15      01/01/2020 05:55 A           Free       01/01/2020 05:55 0               
16      01/01/2020 06:25 A           Busy       0                01/01/2020 06:25

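【Question comments】:

  • Please add a minimal reproducible example. That way others can easily test suggestions, and you are more likely to get a good answer! The sample data you provided is not easy to use.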
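
A reproducible version of the sample data could be built roughly like this. It is only a sketch: the question does not show how `df` was created, so keeping `CallTime` and `CallStatus` as character (via `stringsAsFactors = FALSE`) is an assumption.

```r
# Sample data from the question, rebuilt as a data frame.
# Assumption: CallTime stays a character string in "dd/mm/yyyy HH:MM" form.
df <- data.frame(
  Id = 1:16,
  CallTime = paste("01/01/2020", c(
    "00:05", "00:10", "00:25", "00:57", "01:30", "01:45", "02:20", "02:25",
    "02:50", "02:25", "02:55", "03:25", "04:55", "05:25", "05:55", "06:25"
  )),
  CallHandler = "A",
  CallStatus = c(
    "Busy", "Free", "Free", "Free", "Busy", "Busy", "Busy", "Busy",
    "Free", "Free", "Busy", "Busy", "Free", "Busy", "Free", "Busy"
  ),
  stringsAsFactors = FALSE
)

# Inspect the structure to confirm the column types.
str(df)
```

Pasting the output of `dput(df)` into a question serves the same purpose and preserves the original column types exactly.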

Tags: r time-series


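【Solution 1】:

We can first `filter` the rows where `CallStatus` is `"Busy"` or where the previous value of `CallStatus` is `"Busy"`, build groups from them, and select the `first` and `last` entry in each group. When `StartTime` and `EndTime` are the same, we replace `EndTime` with `NA`.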

library(dplyr)

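df %>%
  # keep rows that are "Busy" or that directly follow a "Busy" row
  filter(CallStatus == "Busy" | lag(CallStatus) == "Busy") %>%
  # start a new group after every non-"Busy" row
  group_by(CallHandler, gr = cumsum(lag(CallStatus != "Busy", default = TRUE))) %>%
  summarise(StartTime = first(CallTime), 
            EndTime = last(CallTime)) %>%
  # a group with a single row has no end time yet
  mutate(EndTime = replace(EndTime, StartTime == EndTime, NA)) %>%
  select(-gr)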


# CallHandler StartTime         EndTime        
# <fct>       <fct>             <fct>          
#1 A           01/01/2020 00:05 01/01/2020 00:10
#2 A           01/01/2020 01:30 01/01/2020 02:50
#3 A           01/01/2020 02:55 01/01/2020 04:55
#4 A           01/01/2020 05:25 01/01/2020 05:55
#5 A           01/01/2020 06:25 NA             
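
To see how the grouping works, the same steps can be traced in base R. This is a sketch rather than the answer's own code: it assumes a single call handler and character columns, and the helper names (`prev`, `keep`, `starts`, `gr`) are made up for illustration.

```r
# Sample data from the question (one handler, character values assumed).
status <- c("Busy", "Free", "Free", "Free", "Busy", "Busy", "Busy", "Busy",
            "Free", "Free", "Busy", "Busy", "Free", "Busy", "Free", "Busy")
time <- paste("01/01/2020", c(
  "00:05", "00:10", "00:25", "00:57", "01:30", "01:45", "02:20", "02:25",
  "02:50", "02:25", "02:55", "03:25", "04:55", "05:25", "05:55", "06:25"))

# Step 1: keep rows that are Busy or that directly follow a Busy row.
prev <- c(NA, head(status, -1))   # previous status, like dplyr::lag()
keep <- status == "Busy" | (!is.na(prev) & prev == "Busy")
status_k <- status[keep]
time_k <- time[keep]

# Step 2: a new group starts at the first row and after every non-Busy row.
starts <- c(TRUE, head(status_k != "Busy", -1))
gr <- cumsum(starts)

# Step 3: take the first and last time of each group.
start_time <- as.vector(tapply(time_k, gr, function(x) x[1]))
end_time <- as.vector(tapply(time_k, gr, function(x) x[length(x)]))

# A group with a single row is still Busy, so it has no end time yet.
end_time[start_time == end_time] <- NA

result <- data.frame(StartTime = start_time, EndTime = end_time,
                     stringsAsFactors = FALSE)
print(gr)
print(result)
```

Here `gr` comes out as 1 1 2 2 2 2 2 3 3 3 4 4 5, so each group runs from the first Busy row to the Free row that closes it. With more than one handler, the same steps would have to be applied to each handler separately.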

【Comments】:
