【Question Title】: Calculate Rolling 12 Hours by Group in R
【Posted】: 2023-03-26 23:56:01
【Question】:

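I'm working on a project where I need to include only patients whose lab tests are at least 12 hours apart, and keep the timestamp of each included lab test. The problem is that many patients had several labs within 12 hours, and the client has asked that those tests be excluded. This is as far as I've got: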

#Create dummy dataset
df = data.frame(
  "Encounter" = c(rep("12345", times=16), rep("67890", times = 5)),
  "Timestamp" = c("01/06/2022 04:00:00", "01/07/2022 08:00:00",
                   "01/08/2022 00:00:00", "01/08/2022 04:00:00",
                   "01/08/2022 08:00:00", "01/08/2022 20:00:00",
                   "01/09/2022 04:00:00", "01/09/2022 08:00:00",
                   "01/09/2022 20:00:00", "01/09/2022 23:26:00",
                   "01/10/2022 00:00:00", "01/10/2022 08:00:00",
                   "01/10/2022 20:00:00", "01/11/2022 00:00:00",
                   "01/11/2022 20:00:00", "01/12/2022 04:00:00",
                   "11/10/2021 11:00:00", "11/10/2021 12:00:00",
                   "11/10/2021 13:00:00", "11/10/2021 14:00:00",
                   "11/11/2021 00:00:00"))

#Convert timestamp to POSIXct (dplyr handles POSIXct more reliably than POSIXlt; UTC assumed)
df$Timestamp <- as.POSIXct(as.character(df$Timestamp), format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

#Calculate time (in hours) since the previous timestamp, by Encounter
library(dplyr)
df <- df %>% 
  group_by(Encounter) %>% 
  arrange(Encounter, Timestamp) %>% 
  mutate(time_diff = difftime(Timestamp, lag(Timestamp), units = "hours"))

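I can't figure out what to do next. It seems I need to compute a rolling 12-hour total and reset it to 0 once a row reaches 12 hours, but I don't know how to do that. Here is my ideal result: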

df$Keep.Row <- c(1,1,1,0,0,1,0,1,1,0,0,1,1,0,1,0,1,0,0,0,1)
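Reading the expected Keep.Row, the rule appears to be: the first test in each Encounter is kept, and every later test is kept only if it falls at least 12 hours after the most recently *kept* test (not merely the previous test). A minimal base-R sketch of that rule as an explicit loop, assuming the timestamps are parsed to POSIXct in UTC; the helper name `keep_since_last_kept` is hypothetical, not from any package:

```r
# Question's dummy data, parsed to POSIXct (UTC is an assumption)
encounter <- c(rep("12345", times = 16), rep("67890", times = 5))
timestamp <- as.POSIXct(c("01/06/2022 04:00:00", "01/07/2022 08:00:00",
                          "01/08/2022 00:00:00", "01/08/2022 04:00:00",
                          "01/08/2022 08:00:00", "01/08/2022 20:00:00",
                          "01/09/2022 04:00:00", "01/09/2022 08:00:00",
                          "01/09/2022 20:00:00", "01/09/2022 23:26:00",
                          "01/10/2022 00:00:00", "01/10/2022 08:00:00",
                          "01/10/2022 20:00:00", "01/11/2022 00:00:00",
                          "01/11/2022 20:00:00", "01/12/2022 04:00:00",
                          "11/10/2021 11:00:00", "11/10/2021 12:00:00",
                          "11/10/2021 13:00:00", "11/10/2021 14:00:00",
                          "11/11/2021 00:00:00"),
                        format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Hypothetical helper: 1 if a test is >= `thresh` hours after the last kept
# test, else 0. Assumes `times` is sorted and belongs to a single Encounter.
keep_since_last_kept <- function(times, thresh = 12) {
  keep <- integer(length(times))
  last_kept <- NULL
  for (i in seq_along(times)) {
    # The first test is always kept; later ones are measured from last_kept
    if (is.null(last_kept) ||
        as.numeric(difftime(times[i], last_kept, units = "hours")) >= thresh) {
      keep[i] <- 1L
      last_kept <- times[i]
    }
  }
  keep
}

# Apply per Encounter, then put the results back in the original row order
keep_row <- unsplit(lapply(split(timestamp, encounter), keep_since_last_kept),
                    encounter)
print(keep_row)
```

On the dummy data this reproduces the Keep.Row vector above. Because each keep/drop decision depends on which earlier test was kept, the answers below carry a running total forward rather than looking at a fixed window of rows.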

【Question Comments】:

  • You're looking for zoo::rollapply. The help page has an example of how to use a time window.

Tags: r timestamp rolling-computation


【Solution 1】:

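There's nothing elegant about this, but I believe it does what you need. I use a temporary variable to store the "rolling" sum, and reset it whenever the hours between tests reach 12 or more.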

library(tidyverse)
df <- df %>% 
  group_by(Encounter) %>% 
  arrange(Encounter, Timestamp) %>% 
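  # as.numeric() keeps time_diff a plain number (recent tidyr may refuse to put a plain 0 into a difftime column)
  mutate(time_diff = as.numeric(difftime(Timestamp, lag(Timestamp), units = "hours"))) %>%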
  replace_na(list(time_diff = 0)) %>%
  mutate(temp = ifelse(time_diff < 12 & lag(time_diff) >= 12, time_diff, lag(time_diff) + time_diff),
         temp = ifelse(is.na(temp), 0, temp),
         hours_between = ifelse(time_diff >= 12, time_diff,
                        ifelse(time_diff < 12 & lag(time_diff) >= 12, time_diff, lag(temp) + time_diff)),
         keep = ifelse(hours_between >= 12 | is.na(hours_between), 1, 0)) %>%
  select(-temp)

Created on 2022-01-27 by the reprex package (v2.0.1)
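One caveat with the nested ifelse() above, as far as I can tell: hours_between only adds up the gap of the current row and at most the two rows before it, so the "rolling" sum is not carried across a longer run of short gaps. With hourly tests, for example, it never reaches 12 and only the first test of the Encounter is kept. A running sum that resets can be carried across any number of rows with base R's Reduce(accumulate = TRUE), the same reset idea the accumulate() answer below uses. A minimal sketch on made-up hourly gaps; the helper name `keep_from_gaps` is hypothetical:

```r
# Hypothetical helper: given the gaps (in hours) between consecutive tests of
# one Encounter, keep a test once the running sum of gaps since the last kept
# test reaches `thresh`, then restart the sum from the next gap.
keep_from_gaps <- function(gaps, thresh = 12) {
  # gaps[1] belongs to the first test (it has no previous test); set it to
  # the threshold so that the first test is always kept
  gaps[1] <- thresh
  running <- Reduce(function(acc, gap) if (acc >= thresh) gap else acc + gap,
                    gaps, accumulate = TRUE)
  as.integer(running >= thresh)
}

# Made-up example: 25 hourly tests, i.e. 24 one-hour gaps after the first
hourly_gaps <- c(NA, rep(1, 24))
kept <- keep_from_gaps(hourly_gaps)
print(which(kept == 1))  # tests at hours 0, 12 and 24
```

Here the running sum climbs 1, 2, ..., 12 and restarts after each kept test, so tests 1, 13 and 25 are kept.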

【Comments】:

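  • I'll take practicality over elegance any day! Thank you so much, this is exactly what I needed!
  • @JulianneKubes Glad it worked! As a heads-up, I noticed I had > instead of >= in one of the conditions, so I've corrected it; hours_between will now show the correct values.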

【Solution 2】:

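Here is an alternative using accumulate. You work with the differences, and once the running total passes the 12-hour threshold you reset it by taking the next diff value on its own (starting over) instead of adding it to the cumulative sum. To include the first time for each Encounter, you can either give it a diff of 12 hours, or add a separate mutate that checks where Timestamp == first(Timestamp) and sets keep to 1 in those cases.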

library(tidyverse)

thresh <- 12

df %>%
  group_by(Encounter) %>% 
  arrange(Encounter, Timestamp) %>% 
  mutate(diff = difftime(Timestamp, lag(Timestamp, default = first(Timestamp) - (thresh * 60 * 60)), units = "hours"),
         keep = +(accumulate(diff, ~if_else(.x >= thresh, .y, .x + .y)) >= thresh))

Output

   Encounter Timestamp           diff              keep
   <chr>     <dttm>              <drtn>           <int>
 1 12345     2022-01-06 04:00:00 12.0000000 hours     1
 2 12345     2022-01-07 08:00:00 28.0000000 hours     1
 3 12345     2022-01-08 00:00:00 16.0000000 hours     1
 4 12345     2022-01-08 04:00:00  4.0000000 hours     0
 5 12345     2022-01-08 08:00:00  4.0000000 hours     0
 6 12345     2022-01-08 20:00:00 12.0000000 hours     1
 7 12345     2022-01-09 04:00:00  8.0000000 hours     0
 8 12345     2022-01-09 08:00:00  4.0000000 hours     1
 9 12345     2022-01-09 20:00:00 12.0000000 hours     1
10 12345     2022-01-09 23:26:00  3.4333333 hours     0
11 12345     2022-01-10 00:00:00  0.5666667 hours     0
12 12345     2022-01-10 08:00:00  8.0000000 hours     1
13 12345     2022-01-10 20:00:00 12.0000000 hours     1
14 12345     2022-01-11 00:00:00  4.0000000 hours     0
15 12345     2022-01-11 20:00:00 20.0000000 hours     1
16 12345     2022-01-12 04:00:00  8.0000000 hours     0
17 67890     2021-11-10 11:00:00 12.0000000 hours     1
18 67890     2021-11-10 12:00:00  1.0000000 hours     0
19 67890     2021-11-10 13:00:00  1.0000000 hours     0
20 67890     2021-11-10 14:00:00  1.0000000 hours     0
21 67890     2021-11-11 00:00:00 10.0000000 hours     1
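The second option mentioned above (leave the first gap at 0 and force keep = 1 on each Encounter's first timestamp in a separate step) might look like the following sketch. It assumes dplyr and purrr are installed and rebuilds the question's data with POSIXct timestamps in UTC; converting the gaps to plain numbers with as.numeric() is my own choice, so that accumulate() works on doubles rather than difftime objects.

```r
library(dplyr)
library(purrr)

thresh <- 12

# Question's dummy data, with POSIXct timestamps (UTC assumed)
df <- data.frame(
  Encounter = c(rep("12345", times = 16), rep("67890", times = 5)),
  Timestamp = as.POSIXct(c("01/06/2022 04:00:00", "01/07/2022 08:00:00",
                           "01/08/2022 00:00:00", "01/08/2022 04:00:00",
                           "01/08/2022 08:00:00", "01/08/2022 20:00:00",
                           "01/09/2022 04:00:00", "01/09/2022 08:00:00",
                           "01/09/2022 20:00:00", "01/09/2022 23:26:00",
                           "01/10/2022 00:00:00", "01/10/2022 08:00:00",
                           "01/10/2022 20:00:00", "01/11/2022 00:00:00",
                           "01/11/2022 20:00:00", "01/12/2022 04:00:00",
                           "11/10/2021 11:00:00", "11/10/2021 12:00:00",
                           "11/10/2021 13:00:00", "11/10/2021 14:00:00",
                           "11/11/2021 00:00:00"),
                         format = "%m/%d/%Y %H:%M:%S", tz = "UTC"))

res <- df %>%
  group_by(Encounter) %>%
  arrange(Encounter, Timestamp) %>%
  mutate(
    # Gap to the previous test in hours; 0 for the first test of the Encounter
    diff = coalesce(as.numeric(difftime(Timestamp, lag(Timestamp),
                                        units = "hours")), 0),
    # Running sum of gaps that restarts after it reaches the threshold
    keep = +(accumulate(diff, ~ if (.x >= thresh) .y else .x + .y) >= thresh),
    # Separate step: always keep the first test of each Encounter
    keep = if_else(Timestamp == first(Timestamp), 1L, keep)
  ) %>%
  ungroup()

print(res$keep)
```

Both variants should give the same keep column; this one just makes the "first test is always kept" rule explicit instead of encoding it in a synthetic 12-hour first gap.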

【Comments】:

【Solution 3】:

I might be missing something, but wouldn't this work?

library(dplyr)

df %>% 
  group_by(Encounter) %>% 
  arrange(Encounter, Timestamp) %>% 
  mutate(time_dif = difftime(Timestamp, lag(Timestamp), units="hours")) %>% 
  filter(time_dif > 12)

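【Comments】:

  • Thanks for your reply. The problem is that I need to calculate the cumulative hours between lab tests within each Encounter and reset the calculation once the sum reaches >= 12. This code gets me closer: library(MESS); df <- df %>% group_by(group_12 = cumsumbinning(Hours.Num, 12)) %>% mutate(cumsum_12 = cumsum(Hours.Num))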
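The difference this comment describes shows up in Encounter 67890: every gap to the *previous* test is under 12 hours, so filtering on consecutive gaps drops every row (including the first, whose gap is NA), even though the last test comes 13 hours after the first, kept one. A small base-R illustration using only that Encounter's timestamps (UTC assumed):

```r
# Encounter 67890 from the question (UTC assumed)
times <- as.POSIXct(c("2021-11-10 11:00:00", "2021-11-10 12:00:00",
                      "2021-11-10 13:00:00", "2021-11-10 14:00:00",
                      "2021-11-11 00:00:00"), tz = "UTC")

# Gap to the previous test only, which is what filter(time_dif > 12) looks at
consecutive_gap <- c(NA, as.numeric(diff(times), units = "hours"))
consecutive_keep <- !is.na(consecutive_gap) & consecutive_gap > 12

# Hours since the first test, which is the last kept test for every later row
since_first <- as.numeric(difftime(times, times[1], units = "hours"))

print(consecutive_gap)   # NA 1 1 1 10
print(consecutive_keep)  # no row passes the consecutive-gap filter
print(since_first)       # 0 1 2 3 13
```

The last row is 13 hours after the kept 11:00 test, so the desired output keeps it, but a filter on consecutive gaps cannot see that.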