Sum time across different continuous time events across date and time combinations in R

【Question title】: Sum time across different continuous time events across date and time combinations in R
【Posted】: 2021-11-27 04:46:32
【Question description】:

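I can't figure out how to compute and sum continuous time observations across multiple date and time events in a dataset. I found a similar question, but it only covers a single instance of a continuous time event. My dataset contains multiple date and time combinations. Here is a sample of it, which I am working with in R: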

date.1 <- c("2021-07-21", "2021-07-21", "2021-07-21", "2021-07-29", "2021-07-29", "2021-07-30", "2021-08-01","2021-08-01","2021-08-01")
time.1 <- c("15:57:59", "15:58:00", "15:58:01", "15:46:10", "15:46:13", "18:12:10", "18:12:10","18:12:11","18:12:13")
df <- data.frame(date.1, time.1)
df
       date.1   time.1
1 2021-07-21 15:57:59
2 2021-07-21 15:58:00
3 2021-07-21 15:58:01
4 2021-07-29 15:46:10
5 2021-07-29 15:46:13
6 2021-07-30 18:12:10
7 2021-08-01 18:12:10
8 2021-08-01 18:12:11
9 2021-08-01 18:12:13

I tried following this script from the linked question:

df$missingflag <-  c(1, diff(as.POSIXct(df$time.1, format="%H:%M:%S", tz="UTC"))) > 1
df
      date.1   time.1 missingflag
1 2021-07-21 15:57:59       FALSE
2 2021-07-21 15:58:00        TRUE
3 2021-07-21 15:58:01       FALSE
4 2021-07-29 15:46:10       FALSE
5 2021-07-29 15:46:13        TRUE
6 2021-07-30 18:12:10        TRUE
7 2021-08-01 18:12:10       FALSE
8 2021-08-01 18:12:11       FALSE
9 2021-08-01 18:12:13        TRUE

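But it did not work as expected and got me nowhere near my answer. It was only meant as an intermediate step anyway and probably would not have answered my question.

My goal is to identify every run of continuous time observations and put them into a new table like this: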

      date.1   time.1    secs
1 2021-07-21 15:57:59       3
4 2021-07-29 15:46:10       1
5 2021-07-29 15:46:13       1
6 2021-07-30 18:12:10       1
7 2021-08-01 18:12:10       2
9 2021-08-01 18:12:13       1

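As you can see, the table records the start time of each run of continuous observations, along with the total number of seconds observed since that run began (secs). The script needs to take date.1 into account because the dataset contains multiple dates.

Thank you in advance.

【Discussion】:

Oops! That was a typo. I'll fix it now.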

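Tags: r date time sum subset

【Solution 1】:

You can combine the date and time columns into a single datetime object, take the difference between consecutive values, and build groups so that all observations 1 second apart fall into the same group. For each group, count the number of rows and take its first datetime value.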

library(dplyr)
library(tidyr)

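df %>%
  # combine date and time into one column, then parse it as a datetime
  unite(datetime, date.1, time.1, sep = ' ') %>%
  mutate(datetime = lubridate::ymd_hms(datetime)) %>%
  # start a new group whenever the gap to the previous row exceeds 1 second
  group_by(grp = cumsum(difftime(datetime, 
           lag(datetime, default = first(datetime)), units = 'secs') > 1)) %>%
  # keep the first datetime of each run and count its rows
  summarise(datetime = first(datetime), 
            secs = n(), .groups = 'drop') %>%
  select(-grp)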

#  datetime             secs
#  <dttm>              <int>
#1 2021-07-21 15:57:59     3
#2 2021-07-29 15:46:10     1
#3 2021-07-29 15:46:13     1
#4 2021-07-30 18:12:10     1
#5 2021-08-01 18:12:10     2
#6 2021-08-01 18:12:13     1

I have kept datetime as a single combined column here, but if needed you can split it back into two separate columns by appending

 %>% separate(datetime, c('date', 'time'), sep = ' ')
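
For comparison, the same grouping idea can be sketched in base R without dplyr or tidyr. This is only an illustration, not part of the original answer. It assumes the rows are already sorted by date and time and that all timestamps can be treated as UTC. The names gap, grp and runs are made up for this example. Like n() above, secs here is a count of rows in each run, so it equals elapsed seconds only when there is one observation per second.

```r
# Sample data from the question
date.1 <- c("2021-07-21", "2021-07-21", "2021-07-21", "2021-07-29", "2021-07-29",
            "2021-07-30", "2021-08-01", "2021-08-01", "2021-08-01")
time.1 <- c("15:57:59", "15:58:00", "15:58:01", "15:46:10", "15:46:13",
            "18:12:10", "18:12:10", "18:12:11", "18:12:13")
df <- data.frame(date.1, time.1)

# Parse date and time together so gaps across different days are measured correctly
# (tz = "UTC" is an assumption made for this sketch)
datetime <- as.POSIXct(paste(df$date.1, df$time.1),
                       format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Gap in seconds to the previous row; the first row has no predecessor, so use 0
gap <- c(0, diff(as.numeric(datetime)))

# A new run starts whenever the gap exceeds 1 second
grp <- cumsum(gap > 1)

# Keep the first row of each run and attach the number of rows in that run
runs <- df[!duplicated(grp), ]
runs$secs <- rle(grp)$lengths
runs
```

Because this version never merges date.1 and time.1 in the data frame itself, the result keeps both original columns and no separate() step is needed.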

【Discussion】: