【发布时间】:2020-12-18 19:18:56
【问题描述】:
我正在尝试为一组以超前/滞后方式运行的过滤器创建汇总统计信息。
关于领先/滞后的简短描述:
当一个新的过滤器上线时,它被置于滞后位置,这意味着水在通过初级(又名铅)过滤器后通过它。当前置过滤器堵塞时,当前的滞后过滤器移动到前置位置。总而言之,过滤器从滞后位置开始,然后进入领先位置。
视觉上,你可以想象成这样:
我需要做的是总结单个过滤器在线的整个时间,无论是领先还是落后。
这里是示例数据:
structure(list(record_timestamp = structure(c(1608192000, 1608192060,1608192120, 1608192180, 1608192240, 1608192300, 1608192360, 1608192420,1608192480, 1608192540, 1608192600, 1608192660, 1608192720, 1608192780,1608192840, 1608192900, 1608192960, 1608193020, 1608193080, 1608193140,1608193200, 1608193260, 1608193320, 1608193380, 1608193440, 1608193500,1608193560, 1608193620, 1608193680, 1608193740, 1608193800), class = c("POSIXct","POSIXt"), tzone = "UTC"), flow = c(20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10), lag_start = structure(c(1608192000,1608192000, 1608192000, 1608192000, 1608192000, 1608192000, 1608192000,1608192000, 1608192000, 1608192000, 1608192000, 1608192660, 1608192660,1608192660, 1608192660, 1608192660, 1608192660, 1608192660, 1608192660,1608192660, 1608192660, 1608193260, 1608193260, 1608193260, 1608193260,1608193260, 1608193260, 1608193260, 1608193260, 1608193260, 1608193260), class = c("POSIXct", "POSIXt"), tzone = "UTC"), lead_start = c("#N/A","#N/A", "#N/A", "#N/A", "#N/A", "#N/A", "#N/A", "#N/A", "#N/A","#N/A", "#N/A", "12/17/2020 8:11", "12/17/2020 8:11", "12/17/2020 8:11","12/17/2020 8:11", "12/17/2020 8:11", "12/17/2020 8:11", "12/17/2020 8:11","12/17/2020 8:11", "12/17/2020 8:11", "12/17/2020 8:11", "12/17/2020 8:21","12/17/2020 8:21", "12/17/2020 8:21", "12/17/2020 8:21", "12/17/2020 8:21","12/17/2020 8:21", "12/17/2020 8:21", "12/17/2020 8:21", "12/17/2020 8:21","12/17/2020 8:21")), class = c("spec_tbl_df", "tbl_df", "tbl","data.frame"), row.names = c(NA, -31L), spec = structure(list(cols = list(record_timestamp = structure(list(), class = c("collector_character","collector")), flow = structure(list(), class = c("collector_double","collector")), polish_start = structure(list(), class = c("collector_character", "collector")), lead_start = structure(list(), class = c("collector_character","collector"))), default = structure(list(), class = c("collector_guess","collector")), skip = 1), class = "col_spec"))
我的想法是“取消嵌套”它们并接受会有重复的时间戳,但每一行只会与一个过滤器相关联。关于如何做到这一点的任何想法?未嵌套的 DF 看起来像:
structure(list(record_timestamp = structure(c(1608192000, 1608192060,1608192120, 1608192180, 1608192240, 1608192300, 1608192360, 1608192420,1608192480, 1608192540, 1608192600, 1608192660, 1608192720, 1608192780,1608192840, 1608192900, 1608192960, 1608193020, 1608193080, 1608193140,1608193200, 1608192660, 1608192720, 1608192780, 1608192840, 1608192900,1608192960, 1608193020, 1608193080, 1608193140, 1608193200, 1608193260,1608193320, 1608193380, 1608193440, 1608193500, 1608193560,1608193620,1608193680, 1608193740, 1608193800), class = c("POSIXct", "POSIXt"), tzone = "UTC"), flow = c(20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), lag_start = structure(c(1608192000, 1608192000, 1608192000,1608192000, 1608192000, 1608192000, 1608192000, 1608192000,1608192000,1608192000, 1608192000, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1608192660, 1608192660, 1608192660, 1608192660, 1608192660, 1608192660,1608192660, 1608192660, 1608192660, 1608192660, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), lead_start = structure(c(NA, NA, NA, NA, NA, NA, NA, NA,NA, NA, NA, 1608192660, 1608192660, 1608192660, 1608192660,1608192660, 1608192660, 1608192660, 1608192660, 1608192660,1608192660, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1608193260,1608193260, 1608193260, 1608193260, 1608193260, 1608193260,1608193260, 1608193260, 1608193260, 1608193260), class = c("POSIXct","POSIXt"), tzone = "UTC"), filter_id = c(1, 1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -41L), spec = structure(list(cols = list(record_timestamp = structure(list(), class = c("collector_character","collector")), flow = structure(list(), class = c("collector_double","collector")), polish_start = structure(list(), class = c("collector_character","collector")), lead_start = structure(list(), class = c("collector_character", "collector")), filter_id = structure(list(), class = c("collector_double","collector"))), default = structure(list(), class = c("collector_guess","collector")), skip = 1), class = "col_spec"))
但我意识到,这将使我正在处理的数据的大小增加一倍,这已经是几年的一分钟数据了。因此,如果有一种方法可以在不加倍时间戳的情况下做到这一点,那将是首选。
最后,最终目标是有一个小的摘要 DF,如下所示:
Filter ID | Total Flow
----------------------------
1 | 370
2 | 250
... | ...
【问题讨论】:
-
将列更改为 POSIXct 而不是 chr。没有一个专栏。我能做的是导出每个过滤器的所有间隔。例如:数据 %>% distinct(lag_start, .keep_all = TRUE) %>% mutate(changeout_interval = interval(lag_start, lead(lead_start))))
-
这样你就可以得到每个过滤器的开始和停止时间,或者是当前正在使用的过滤器的 NA。我想当前日期时间可以用作这些过滤器的终点。这有帮助吗?
-
感谢您提供详细信息。请看下面的答案。这是否给出了预期的结果?或者,如果我遗漏了一些其他细节,请告诉我。