【Question title】: Find average incidents per business day
【Posted】: 2021-09-24 22:52:51
【Question】:

I have a dataset like the following:

+----+-------+---------------------+
| ID | SUBID |        date         |
+----+-------+---------------------+
| A  |     1 | 2021-01-01 12:00:00 |
| A  |     1 | 2021-01-02 01:00:00 |
| A  |     1 | 2021-01-02 02:00:00 |
| A  |     1 | 2021-01-03 03:00:00 |
| A  |     2 | 2021-01-05 16:00:00 |
| A  |     2 | 2021-01-06 13:00:00 |
| A  |     2 | 2021-01-07 06:00:00 |
| A  |     2 | 2021-01-08 08:00:00 |
| A  |     2 | 2021-01-08 10:00:00 |
| A  |     2 | 2021-01-08 11:00:00 |
| A  |     3 | 2021-01-09 09:00:00 |
| A  |     3 | 2021-01-10 19:00:00 |
| A  |     3 | 2021-01-11 20:00:00 |
| A  |     3 | 2021-01-12 22:00:00 |
| B  |     1 | 2021-02-01 23:00:00 |
| B  |     1 | 2021-02-02 15:00:00 |
| B  |     1 | 2021-02-03 06:00:00 |
| B  |     1 | 2021-02-04 08:00:00 |
| B  |     2 | 2021-02-05 18:00:00 |
| B  |     2 | 2021-02-05 19:00:00 |
| B  |     2 | 2021-02-06 22:00:00 |
| B  |     2 | 2021-02-06 23:00:00 |
| B  |     2 | 2021-02-07 04:00:00 |
| B  |     2 | 2021-02-08 02:00:00 |
| B  |     3 | 2021-02-09 01:00:00 |
| B  |     3 | 2021-02-10 03:00:00 |
| B  |     3 | 2021-02-11 13:00:00 |
| B  |     3 | 2021-02-12 14:00:00 |
+----+-------+---------------------+

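I would like to get the time difference, in hours, between consecutive dates within each (ID, SUBID) group, preferably in business time: any date that falls on a weekend or a federal holiday would be moved to the nearest business day (before or after) at 23:59:59, as follows: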

+----+-------+---------------------+------------------------------------------------------------------+
| ID | SUBID |        date         | timediff (hours) with preceding date for each group (ID, SUBID) |
+----+-------+---------------------+------------------------------------------------------------------+
| A  |     1 | 2021-01-01 12:00:00 |                                                                0 |
| A  |     1 | 2021-01-02 01:00:00 |                                                               13 |
| A  |     1 | 2021-01-02 02:00:00 |                                                                1 |
| A  |     1 | 2021-01-03 03:00:00 |                                                                1 |
| A  |     2 | 2021-01-05 16:00:00 |                                                                0 |
| A  |     2 | 2021-01-06 13:00:00 |                                                               21 |
| A  |     2 | 2021-01-07 06:00:00 |                                                               17 |
| A  |     2 | 2021-01-08 08:00:00 |                                                                2 |
| A  |     2 | 2021-01-08 10:00:00 |                                                                2 |
| A  |     2 | 2021-01-08 11:00:00 |                                                                1 |
| A  |     3 | 2021-01-09 09:00:00 |                                                                0 |
| A  |     3 | 2021-01-10 19:00:00 |                                                               36 |
| A  |     3 | 2021-01-11 20:00:00 |                                                                1 |
| A  |     3 | 2021-01-12 22:00:00 |                                                                1 |
| B  |     1 | 2021-02-01 23:00:00 |                                                                0 |
| B  |     1 | 2021-02-02 15:00:00 |                                                               16 |
| B  |     1 | 2021-02-03 06:00:00 |                                                               15 |
| B  |     1 | 2021-02-04 08:00:00 |                                                               26 |
| B  |     2 | 2021-02-05 18:00:00 |                                                                0 |
| B  |     2 | 2021-02-05 19:00:00 |                                                                1 |
| B  |     2 | 2021-02-06 22:00:00 |                                                               27 |
| B  |     2 | 2021-02-06 23:00:00 |                                                                1 |
| B  |     2 | 2021-02-07 04:00:00 |                                                                5 |
| B  |     2 | 2021-02-08 02:00:00 |                                                               22 |
| B  |     3 | 2021-02-09 01:00:00 |                                                                0 |
| B  |     3 | 2021-02-10 03:00:00 |                                                               26 |
| B  |     3 | 2021-02-11 13:00:00 |                                                               11 |
| B  |     3 | 2021-02-12 14:00:00 |                                                                1 |
+----+-------+---------------------+------------------------------------------------------------------+

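Finally, I want to compute the average time, i.e. the sum of the time differences of each group (ID, SUBID) divided by the number of rows in that group, as shown below: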

+----+-------+------------------------------------------------------------+
| ID | SUBID | Average time (total time diff of group / count per group)  |
+----+-------+------------------------------------------------------------+
| A  |     1 | 15/4                                                       |
| A  |     2 | 43/6                                                       |
| A  |     3 | 38/4                                                       |
| B  |     1 | 57/4                                                       |
| B  |     2 | 56/6                                                       |
| B  |     3 | 38/4                                                       |
+----+-------+------------------------------------------------------------+
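For reference, the average in the last table is the group's total time difference divided by its row count, which equals the mean of the per-row differences when each group's first row counts as 0. A minimal base R sketch, using the hour values for groups A/1 and A/2 copied by hand from the expected-output table above (the data frame name hours is only for illustration):

```r
# Hour differences typed in from the expected-output table for groups A/1 and A/2;
# the first row of each group has no predecessor and counts as 0.
hours <- data.frame(
  ID    = rep("A", 10),
  SUBID = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  diff  = c(0, 13, 1, 1, 0, 21, 17, 2, 2, 1)
)

# total / count per (ID, SUBID) is the group mean of diff
avg <- aggregate(diff ~ ID + SUBID, data = hours, FUN = mean)
avg$diff   # 15/4 = 3.75 for A/1 and 43/6 (about 7.17) for A/2
```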

I'm new to R. I came across lubridate, which helps me format the dates, and I can get the time differences with the code below:

library(dplyr)

# minutes between each row and the previous row of the same (ID, SUBID) group
df %>%
  group_by(ID, SUBID) %>%
  mutate(time_diff = difftime(date, lag(date), units = "mins"))

However, I'm having trouble getting the difference in business time, and computing the average time shown in the last table.

【Discussion】:

    Tags: r dplyr lubridate bizdays


    【Solution 1】:

    Welcome! Using dplyr and lubridate:

    Data used:

    library(tidyverse)
    library(lubridate)
    df <- data.frame(ID = c("A","A","A","A"),
                     SUBID = c(1,1,2,2),
                     Date = lubridate::as_datetime(c("2021-01-01 12:00:00","2021-01-02 1:00:00","2021-01-01 2:00:00","2021-01-01 13:00:00")))
    
      ID SUBID                Date
    1  A     1 2021-01-01 12:00:00
    2  A     1 2021-01-02 01:00:00
    3  A     2 2021-01-01 02:00:00
    4  A     2 2021-01-01 13:00:00
    

    Code:

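    df %>% 
      group_by(ID, SUBID) %>% 
      # plain subtraction picks its unit automatically; force hours so every group uses the same unit
      mutate(diff = as.numeric(difftime(Date, lag(Date), units = "hours"))) %>% 
      mutate(diff = coalesce(diff, 0)) %>%   # first row of a group has no predecessor
      summarise(Average = sum(diff) / n(), .groups = "drop")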
    

    Output:

      ID    SUBID Average
      <chr> <dbl>   <dbl>
    1 A         1     6.5
    2 A         2     5.5
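Why force the unit: subtracting two date-times with - returns a difftime whose unit R chooses from the size of the gaps (seconds, minutes, hours or days), so one group can come back in days while another is in hours. Calling difftime() with an explicit units argument and converting with as.numeric() keeps every group on the same scale. A small base R illustration:

```r
a <- as.POSIXct("2021-01-01 12:00:00", tz = "UTC")
b <- as.POSIXct("2021-01-03 12:00:00", tz = "UTC")

# automatic unit: a two-day gap is reported in days
auto_gap <- b - a
units(auto_gap)        # "days"
as.numeric(auto_gap)   # 2

# explicit unit: the same gap expressed in hours
hour_gap <- as.numeric(difftime(b, a, units = "hours"))
hour_gap               # 48
```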
    

    EDIT: how to handle weekends

    For weekends, a simpler solution is to move the date to the following Monday:

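    df %>% 
      # week_start = 1 numbers the days 1 = Monday ... 7 = Sunday, whatever the locale
      mutate(week_day = wday(Date, week_start = 1)) %>%
      # Saturday (6) moves forward 2 days and Sunday (7) 1 day, so both land on Monday;
      # if_else() keeps the POSIXct class, so no as_datetime() round trip is needed
      mutate(Date = if_else(week_day == 6, Date + days(2),
                            if_else(week_day == 7, Date + days(1), Date)))
    

    This creates a week_day column holding the weekday number. If the day is a Saturday (6) or a Sunday (7), Date is moved forward by 2 or 1 days respectively, keeping the time of day, so it lands on the following Monday. Then you just re-sort the dates (df %>% arrange(ID, SUBID, Date)) and rerun the first code.

    My locale is French, which is why a label-based version (wday(Date, label = TRUE, abbr = FALSE)) would compare against "samedi" and "dimanche"; in an English locale those labels are "Saturday" and "Sunday". Comparing the weekday number avoids that dependence on the locale.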
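The weekend code above moves Saturday and Sunday forward to the following Monday and keeps the time of day. The question (and the follow-up comment) asks for something slightly different: the nearest business day, before or after, at 23:59:59. A minimal base R sketch of that rule follows. The helper name to_business_day, the holidays argument (a Date vector you would supply yourself, e.g. the federal holidays you care about) and the tie rule (the earlier day wins when both sides are equally close) are assumptions made for illustration, not part of the question or the answer.

```r
# Hypothetical helper: move each POSIXct timestamp that falls on a weekend or
# on one of `holidays` (a Date vector you supply) to the nearest business day,
# at 23:59:59. On a tie (same distance before and after) the earlier day wins.
to_business_day <- function(x, holidays = as.Date(character()), tz = "UTC") {
  is_business <- function(d) {
    wd <- as.POSIXlt(d)$wday              # 0 = Sunday ... 6 = Saturday
    !(wd %in% c(0, 6)) & !(d %in% holidays)
  }
  day <- as.Date(x, tz = tz)
  out <- x
  for (i in seq_along(x)) {
    # business-day (and missing) timestamps are kept as they are
    if (is.na(day[i]) || is_business(day[i])) next
    for (k in 1:30) {                     # search outwards one day at a time
      cand <- c(day[i] - k, day[i] + k)   # earlier candidate first
      hit <- cand[is_business(cand)]
      if (length(hit) > 0) {
        out[i] <- as.POSIXct(paste(format(hit[1]), "23:59:59"), tz = tz)
        break
      }
    }
  }
  out
}

# Friday 12:00, Saturday 01:00 and Sunday 03:00 from group A/1
x <- as.POSIXct(c("2021-01-01 12:00:00", "2021-01-02 01:00:00",
                  "2021-01-03 03:00:00"), tz = "UTC")
adjusted <- to_business_day(x)
format(adjusted, "%Y-%m-%d %H:%M:%S")
# "2021-01-01 12:00:00" "2021-01-01 23:59:59" "2021-01-04 23:59:59"

# treating 2021-01-01 (New Year's Day) as a holiday: Thursday and Monday are
# equally close to that Saturday, and the tie rule picks Thursday
with_holiday <- to_business_day(x[2], holidays = as.Date("2021-01-01"))
format(with_holiday, "%Y-%m-%d %H:%M:%S")
# "2020-12-31 23:59:59"
```

Because the function is vectorised over x, it could be applied to the whole column, e.g. mutate(date = to_business_day(date)), before arrange(ID, SUBID, date) and the grouped hour-difference code. The 30-day search window is an arbitrary cap.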

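    For holidays, you can apply the same idea: create an interval variable representing each holiday, test whether each date falls within that interval, and if it does, move the date to the end of the interval.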
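The interval approach for holidays can be sketched in base R by storing each holiday period as a start/end pair and moving any timestamp inside a period to that period's end. The period below (New Year's Day 2021 plus the following weekend, ending at Monday 2021-01-04 00:00:00) and the helper name shift_out_of_intervals are illustrative assumptions; with lubridate, interval() and %within% express the same membership test.

```r
# Hypothetical holiday period: New Year's Day 2021 plus the following weekend,
# ending at Monday 2021-01-04 00:00:00. Replace with your own calendar.
holiday_periods <- data.frame(
  start = as.POSIXct("2021-01-01 00:00:00", tz = "UTC"),
  end   = as.POSIXct("2021-01-04 00:00:00", tz = "UTC")
)

# Move every timestamp that lies inside [start, end] of a period to that period's end.
shift_out_of_intervals <- function(x, periods) {
  for (j in seq_len(nrow(periods))) {
    inside <- which(x >= periods$start[j] & x <= periods$end[j])
    x[inside] <- periods$end[j]
  }
  x
}

x <- as.POSIXct(c("2021-01-01 12:00:00", "2021-01-02 01:00:00",
                  "2021-01-05 16:00:00"), tz = "UTC")
shifted <- shift_out_of_intervals(x, holiday_periods)
format(shifted, "%Y-%m-%d %H:%M:%S")
# "2021-01-04 00:00:00" "2021-01-04 00:00:00" "2021-01-05 16:00:00"
```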

    【Comments】:

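    • Thanks. Could you show me how to move dates that fall on weekends or holidays to the nearest business day before doing the calculation?
    • Sorry, I missed that part of your requirements. I'll think about it.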