[Question title]: R create shift ID where counter increases based on change in row values by group
[Posted]: 2023-02-10 13:39:36
[Question description]:

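What I have are timestamps for different "users" and an indicator that flags when there is a gap of more than 1 hour between a user's timestamps (indicating a new "shift"). The dataset looks like this: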

user  datetime              shift_change_ind
1     2017-08-24 22:42:52   0       
1     2017-08-24 22:53:52   0       
1     2017-08-24 22:59:37   0       
1     2017-09-01 22:34:56   1       
1     2017-09-01 22:42:22   0       
1     2017-09-01 22:48:49   0       
1     2017-09-01 22:51:53   0       
1     2017-09-02 00:27:09   1       
1     2017-10-26 22:11:35   1       
1     2017-10-26 22:12:44   0       
1     2017-10-26 22:13:10   0       
1     2017-10-26 22:22:20   0       
1     2017-10-27 03:50:05   1       
1     2017-11-10 23:47:55   1       
1     2018-03-02 09:14:40   1       
1     2018-03-02 09:36:17   0       
1     2018-03-02 09:38:33   0       
2     2017-07-10 20:30:52   0       
2     2017-07-10 20:49:48   0       
2     2017-07-10 20:52:37   0       
2     2017-07-12 17:13:11   1       
2     2017-07-12 17:19:52   0       
2     2017-07-12 19:14:21   1       
2     2017-07-12 19:17:12   0   

Here is the code:

data = structure(list(user = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2), datetime = structure(c(1503614572.35, 
1503615232.527, 1503615577.937, 1504305296.2, 1504305742.53, 
1504306129.867, 1504306313.847, 1504312029.627, 1509055895.44, 
1509055964.003, 1509055990.587, 1509056540.84, 1509076205.797, 
1510357675.767, 1519982080, 1519983377, 1519983513, 1499718652.61, 
1499719788.737, 1499719957.883, 1499879591.997, 1499879992.94, 
1499886861.447, 1499887032.547), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), shift_change_ind = c(0, 0, 0, 1, 0, 
0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0)), row.names = c(NA, 
-24L), class = c("tbl_df", "tbl", "data.frame"))
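
The question does not show how shift_change_ind was produced. As a minimal sketch of the rule described above (a gap of more than 1 hour to the same user's previous timestamp), the following base R code derives such an indicator. The `toy` data frame and the `gap_flag` helper are hypothetical names used only for illustration, and the rows are assumed to be sorted by user and datetime.

```r
# Toy data (a hypothetical subset of the question's rows), sorted by user and time.
toy <- data.frame(
  user = c(1, 1, 1, 1, 2, 2, 2),
  datetime = as.POSIXct(
    c("2017-08-24 22:42:52", "2017-08-24 22:53:52", "2017-09-01 22:34:56",
      "2017-09-01 22:42:22", "2017-07-12 17:13:11", "2017-07-12 17:19:52",
      "2017-07-12 19:14:21"),
    tz = "UTC"
  )
)

# Flag rows whose gap to the previous row exceeds 3600 seconds.
# The first row of each user has no predecessor, so it gets 0.
gap_flag <- function(secs) c(0, as.numeric(diff(secs) > 3600))

# ave() applies gap_flag separately within each user.
toy$shift_change_ind <- ave(as.numeric(toy$datetime), toy$user, FUN = gap_flag)

toy$shift_change_ind
# 0 0 1 0 0 0 1
```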

What I need is to create a "shift ID" column, grouped by user, that increments the ID counter whenever a 1 appears, resulting in a dataset like this:

user  datetime              shift_change_ind  shift_id
1     2017-08-24 22:42:52   0                 1 
1     2017-08-24 22:53:52   0                 1 
1     2017-08-24 22:59:37   0                 1 
1     2017-09-01 22:34:56   1                 2 
1     2017-09-01 22:42:22   0                 2 
1     2017-09-01 22:48:49   0                 2 
1     2017-09-01 22:51:53   0                 2 
1     2017-09-02 00:27:09   1                 3 
1     2017-10-26 22:11:35   1                 4 
1     2017-10-26 22:12:44   0                 4 
1     2017-10-26 22:13:10   0                 4 
1     2017-10-26 22:22:20   0                 4 
1     2017-10-27 03:50:05   1                 5 
1     2017-11-10 23:47:55   1                 6 
1     2018-03-02 09:14:40   1                 7 
1     2018-03-02 09:36:17   0                 7 
1     2018-03-02 09:38:33   0                 7 
2     2017-07-10 20:30:52   0                 1 
2     2017-07-10 20:49:48   0                 1 
2     2017-07-10 20:52:37   0                 1 
2     2017-07-12 17:13:11   1                 2 
2     2017-07-12 17:19:52   0                 2 
2     2017-07-12 19:14:21   1                 3 
2     2017-07-12 19:17:12   0                 3 

Here is the code:

new_data = structure(list(user = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2), datetime = structure(c(1503614572.35, 
1503615232.527, 1503615577.937, 1504305296.2, 1504305742.53, 
1504306129.867, 1504306313.847, 1504312029.627, 1509055895.44, 
1509055964.003, 1509055990.587, 1509056540.84, 1509076205.797, 
1510357675.767, 1519982080, 1519983377, 1519983513, 1499718652.61, 
1499719788.737, 1499719957.883, 1499879591.997, 1499879992.94, 
1499886861.447, 1499887032.547), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), shift_change_ind = c(0, 0, 0, 1, 0, 
0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0), shift_id = c(1, 
1, 1, 2, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 7, 1, 1, 1, 2, 2, 
3, 3)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-24L))

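I have millions of rows, so a for loop seems like a nightmare. I tried using rleid() as a starting point for the shift_id column, with ifelse() conditions checking whether a leading or lagging 1 or 0 appears in the shift_change_ind column to adjust the counter in the shift_id column, but I ran into problems with repeated 1s (in the shift_change_ind column). Also, I know my approach is hacky and there must be a cleaner, better way to do this. Any help is greatly appreciated.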
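
To illustrate why a run-length ID probably struggles with repeated 1s, here is a small sketch. It is only a guess at the attempted approach: rleid() is emulated with base R's rle(), and the vector `x` is a made-up indicator sequence. Consecutive 1s fall into a single run, so they share one run ID, whereas a running sum advances once for every 1.

```r
# Hypothetical indicator sequence containing two consecutive 1s.
x <- c(0, 0, 0, 1, 0, 0, 0, 1, 1, 0)

# Base R stand-in for a run-length ID: one ID per run of identical values.
runs <- rle(x)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Counter that advances once per 1, regardless of the neighbouring values.
shift_id <- cumsum(x) + 1

rbind(x, run_id, shift_id)
# run_id:   1 1 1 2 3 3 3 4 4 5   (the two adjacent 1s share ID 4)
# shift_id: 1 1 1 2 2 2 2 3 4 4   (each 1 starts a new shift)
```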

[Question discussion]:

    Tags: r datetime grouping counter


    [Solution 1]:

    We can use cumsum after grouping by 'user'

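    library(dplyr)

    # Within each user, the running total of the indicator counts the
    # shift changes seen so far; adding 1 makes the first shift ID 1.
    out <- data %>%
      group_by(user) %>%
      mutate(shift_id = cumsum(shift_change_ind) + 1) %>%
      ungroup()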
    

    -output

    as.data.frame(out)
     user            datetime shift_change_ind shift_id
    1     1 2017-08-24 22:42:52                0        1
    2     1 2017-08-24 22:53:52                0        1
    3     1 2017-08-24 22:59:37                0        1
    4     1 2017-09-01 22:34:56                1        2
    5     1 2017-09-01 22:42:22                0        2
    6     1 2017-09-01 22:48:49                0        2
    7     1 2017-09-01 22:51:53                0        2
    8     1 2017-09-02 00:27:09                1        3
    9     1 2017-10-26 22:11:35                1        4
    10    1 2017-10-26 22:12:44                0        4
    11    1 2017-10-26 22:13:10                0        4
    12    1 2017-10-26 22:22:20                0        4
    13    1 2017-10-27 03:50:05                1        5
    14    1 2017-11-10 23:47:55                1        6
    15    1 2018-03-02 09:14:40                1        7
    16    1 2018-03-02 09:36:17                0        7
    17    1 2018-03-02 09:38:33                0        7
    18    2 2017-07-10 20:30:52                0        1
    19    2 2017-07-10 20:49:48                0        1
    20    2 2017-07-10 20:52:37                0        1
    21    2 2017-07-12 17:13:11                1        2
    22    2 2017-07-12 17:19:52                0        2
    23    2 2017-07-12 19:14:21                1        3
    24    2 2017-07-12 19:17:12                0        3
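
    For comparison, the same grouped running sum can be sketched in base R with ave(), which avoids the dplyr dependency. This is an illustrative alternative rather than part of the answer above; the `toy` data frame is hypothetical, and the rows are assumed to be already ordered by datetime within each user.

```r
# Hypothetical toy data: two users, indicator already computed.
toy <- data.frame(
  user = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  shift_change_ind = c(0, 0, 1, 1, 0, 0, 1, 0, 1)
)

# ave() applies cumsum() within each user; + 1 starts the counter at 1.
toy$shift_id <- ave(toy$shift_change_ind, toy$user, FUN = cumsum) + 1

toy$shift_id
# 1 1 2 3 3 1 2 2 3
```

    One caveat with either version: if a user's first row carries a 1, that user's IDs start at 2 rather than 1, and any NA in the indicator propagates through cumsum().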
    

    [Discussion]:

    • Thank you so much! I realized right after posting that the cumsum function would work too. I knew I was overthinking it, haha.