【Question title】: R Time Intervals: Grouping by hour of day when sample goes over the hour mark
【Posted】: 2020-06-16 07:41:35
【Question】:

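I have video data of bird behavior with two kinds of durations: when the bird is on the nest and when it is off the nest. For my analysis I need to calculate the on-nest and off-nest durations per hour. However, a behavior often overlaps the hour mark. For example, if the bird is on the nest 4:10-4:42 and 4:50-5:20, I need to split the second bout into 4:50-5:00 and 5:00-5:20 so that I can calculate per hour. I have looked through the lubridate package for quite a while without finding a way to do this, but I figure there must be something out there. Any suggestions?

Sample data is below. "off.time.diff" is the difference in seconds between the events of an "off.bout.ID", and likewise for "on.time.diff". For example, the bird here is on the nest from 17:25:39 to 18:03:29. I can get the total time (2270 seconds), but I don't know how to split it by hour.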

Event   DT.event        off.bout.ID  on.bout.ID  off.time.diff  on.time.diff
off     4/27/12 17:25:13    1          0           NA               NA
on      4/27/12 17:25:39    1          1           26               NA
off     4/27/12 18:03:29    2          1           NA              2270
on      4/27/12 18:03:57    2          2           28               NA
off     4/27/12 19:41:16    3          2           NA              5839
on      4/27/12 19:43:50    3          3           154              NA
off     4/28/12 6:23:57     4          3           NA              38407
on      4/28/12 6:32:13     4          4           496              NA
off     4/28/12 6:40:20     5          4           NA              487
on      4/28/12 6:40:48     5          5           28               NA
off     4/28/12 8:16:07     6          5           NA              5719

【Comments】:

  • Hi EagleEye. Did one of the suggested solutions answer your question?

Tags: r datetime time intervals lubridate


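【Solution 1】:

The idea behind my suggestion is to check how many full-hour marks each event crosses, insert an additional row for each of those hours, and adjust the on and off times accordingly...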
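
As a compact illustration of that idea, here is a minimal base-R sketch that splits a single interval at every full hour it crosses. The helper name `split_at_hours` is hypothetical and not part of this answer's code; it assumes `from < to`, `POSIXct` inputs, and a time zone without daylight-saving shifts (UTC here).

```r
# Hypothetical helper: split one interval [from, to) at each full hour.
# Assumes from < to and POSIXct inputs in a zone without DST shifts.
split_at_hours <- function(from, to) {
  # first full hour strictly after `from`
  first_mark <- as.POSIXct(trunc(from, units = "hours")) + 3600
  # all hour marks that fall strictly inside the interval
  marks <- if (first_mark < to) seq(first_mark, to, by = "1 hour") else from[0]
  marks <- marks[marks < to]
  starts <- c(from, marks)
  ends   <- c(marks, to)
  data.frame(start = starts, end = ends,
             secs = as.numeric(difftime(ends, starts, units = "secs")))
}

from <- as.POSIXct("2012-04-27 17:25:39", tz = "UTC")
to   <- as.POSIXct("2012-04-27 18:03:29", tz = "UTC")
pieces <- split_at_hours(from, to)
pieces$secs       # 2061 209
sum(pieces$secs)  # 2270, the on.time.diff of bout 1
```

The 2270-second bout from the question comes out as 2061 s in the 17:00 hour and 209 s in the 18:00 hour.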

Load the sample data:

df <- read.table(text='Event   DT.event        off.bout.ID  on.bout.ID  off.time.diff  on.time.diff
off     4/27/12-17:25:13    1          0           NA               NA
on      4/27/12-17:25:39    1          1           26               NA
off     4/27/12-18:03:29    2          1           NA              2270
on      4/27/12-18:03:57    2          2           28               NA
off     4/27/12-19:41:16    3          2           NA              5839
on      4/27/12-19:43:50    3          3           154              NA
off     4/28/12-6:23:57     4          3           NA              38407
on      4/28/12-6:32:13     4          4           496              NA
off     4/28/12-6:40:20     5          4           NA              487
on      4/28/12-6:40:48     5          5           28               NA
off     4/28/12-8:16:07     6          5           NA              5719', header=T, stringsAsFactors=F)

Set up the date-time variable. Adjust the tz argument if necessary:

df$DT.event <- as.POSIXct(df$DT.event, format = "%m/%d/%y-%H:%M:%S")


library(dplyr)
library(tidyr)

# reshape data
# 
df2 <- df %>%
  select(Event, DT.event, on.bout.ID) %>% 
  pivot_wider(names_from = Event,
              values_from = DT.event) %>% 
  select(on.bout.ID, on, off)

df2 is a wider form of some of the information in df:

  on.bout.ID on                  off                
       <int> <dttm>              <dttm>             
1          0 NA                  2012-04-27 17:25:13
2          1 2012-04-27 17:25:39 2012-04-27 18:03:29
3          2 2012-04-27 18:03:57 2012-04-27 19:41:16
4          3 2012-04-27 19:43:50 2012-04-28 06:23:57
5          4 2012-04-28 06:32:13 2012-04-28 06:40:20
6          5 2012-04-28 06:40:48 2012-04-28 08:16:07
# Make a copy so we don't mutate the object we are using to iterate
#
df3 <- df2

for (i in seq_along(df2$on.bout.ID)) {

  # extract current iterations start and end time
  # 
  id <- df2$on.bout.ID[i]
  from <- df2$on[i]
  to <- df2$off[i]

  # calculate number of rows to insert
  # 
  hoursDiff <- as.numeric(format(to, "%H")) - as.numeric(format(from , "%H"))

  # compensate for crossing of midnight (00:00AM)
  # by adding 24
  #
  hoursDiff <- as.integer(difftime(as.Date(to), as.Date(from), unit="days")) * 24 + hoursDiff

  # if there is at least one pass of the full hour, insert a copy of the
  # current row but adapt on and off times
  # 
  if (!is.na(hoursDiff) & hoursDiff > 0) {
    for (hour in 1:hoursDiff) {

      # start time of this additional row
      # 
      fromTime <- as.POSIXct(paste0(format(from  + 3600 * hour, "%m/%d/%y-%H"), ":00:00"), format="%m/%d/%y-%H:%M:%S")

      # Maximal endtime of this additional row
      # 
      toTime <- fromTime + 3599

      # copy current line
      # 
      insert <- df2[i, ]

      # set start time for this new row to full hour
      #
      insert$on <- fromTime

      # if this is the last row to insert do NOT adapt off time
      # 
      if (!(toTime > to)) {
        insert$off <- toTime
      } 

      # add additional row
      # 
      df3 <- rbind(df3, insert)
    }

  # set off-time for the current line to end of first hour
  # 
  df3[df3$on.bout.ID == id & df3$on == from & df3$off == to,]$off <-  as.POSIXct(paste0(format(from, "%m/%d/%y-%H"), ":59:59"), format="%m/%d/%y-%H:%M:%S")
  }
}

# Use `dplyr` to sort result
#
library(dplyr)    
df3 %>% arrange(on.bout.ID, on)
    # A tibble: 21 x 3
      on.bout.ID on                  off                
           <int> <dttm>              <dttm>             
    1          0 NA                  2012-04-27 17:25:13
    2          1 2012-04-27 17:25:39 2012-04-27 17:59:59
    3          1 2012-04-27 18:00:00 2012-04-27 18:03:29
    4          2 2012-04-27 18:03:57 2012-04-27 18:59:59
    5          2 2012-04-27 19:00:00 2012-04-27 19:41:16
    6          3 2012-04-27 19:43:50 2012-04-27 19:59:59
    7          3 2012-04-27 20:00:00 2012-04-27 20:59:59
    8          3 2012-04-27 21:00:00 2012-04-27 21:59:59
    9          3 2012-04-27 22:00:00 2012-04-27 22:59:59
    10          3 2012-04-27 23:00:00 2012-04-27 23:59:59
    # … with 11 more rows

Is it pretty? No! Does it work? I think so.

Edit:

Added

 hoursDiff <- as.integer(difftime(as.Date(to), as.Date(from), unit="days")) * 24 + hoursDiff

to extend the approach to bouts that cross midnight.
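
The same count can be sketched without a separate date term by truncating both timestamps to the hour and taking the difference in hours. This is an alternative formulation, not the code used above; it assumes a time zone without daylight-saving shifts, and `hours_crossed` is a made-up name.

```r
# Hypothetical helper: number of full-hour marks between two timestamps.
hours_crossed <- function(from, to) {
  as.integer(round(as.numeric(difftime(
    as.POSIXct(trunc(to,   units = "hours")),
    as.POSIXct(trunc(from, units = "hours")),
    units = "hours"))))
}

from <- as.POSIXct("2012-04-27 19:43:50", tz = "UTC")
to   <- as.POSIXct("2012-04-28 06:23:57", tz = "UTC")
hours_crossed(from, to)  # 11, same as (6 - 19) + 24
```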

【Comments】:

    【Solution 2】:

    With the tidyverse you can build a prettier solution than Dario's:

    Read the data

    library(dplyr)
    library(tidyr)
    library(lubridate)

    raw <- "off     4/27/12 17:25:13    1          0           NA               NA
            on      4/27/12 17:25:39    1          1           26               NA
            off     4/27/12 18:03:29    2          1           NA               2270
            on      4/27/12 18:03:57    2          2           28               NA
            off     4/27/12 19:41:16    3          2           NA               5839
            on      4/27/12 19:43:50    3          3           154              NA
            off     4/28/12 6:23:57     4          3           NA               38407
            on      4/28/12 6:32:13     4          4           496              NA
            off     4/28/12 6:40:20     5          4           NA               487
            on      4/28/12 6:40:48     5          5           28               NA
            off     4/28/12 8:16:07     6          5           NA               5719"

    # Trim each line, then turn runs of two or more spaces into ";" separators
    # (the single space inside the date-time stays intact)
    lines <- gsub(pattern = " {2,}", replacement = ";", x = trimws(strsplit(raw, "\n")[[1]]))

    a = read.csv(text = lines, header = F, sep = ";",
                 col.names = c("Event","DT.event","off.bout.ID","on.bout.ID","off.time.diff","on.time.diff"))
    
    a$DT.event <- mdy_hms(a$DT.event)
    
    

    Add new rows containing the hour marks that may be of interest

    b <- a %>% select(DT.event) %>%
            mutate(DT.event = floor_date(DT.event,"hours")) %>%
            group_by(DT.event) %>%
            summarise() %>%
            full_join(a) %>%
            arrange(DT.event)
    
    

    Find the differences

    # units must be passed by name: the third positional argument of difftime() is tz
    c <- b %>% fill(Event, .direction = "up") %>%
            mutate(on.time.diff.hour = ifelse(Event == "off",
                                              difftime(DT.event, lag(DT.event),
                                                       units = "secs"), NA)) 
    
    

    Just watch out for the extra value in the second row (1513 here): it appears because no "on" event precedes the first "off".

    Result

    # A tibble: 16 x 7
       DT.event            Event off.bout.ID on.bout.ID off.time.diff on.time.diff on.time.diff.hour
       <dttm>              <fct>       <int>      <int>         <int>        <int>             <dbl>
     1 2012-04-27 17:00:00 off            NA         NA            NA           NA                NA
     2 2012-04-27 17:25:13 off             1          0            NA           NA              1513
     3 2012-04-27 17:25:39 on              1          1            26           NA                NA
     4 2012-04-27 18:00:00 off            NA         NA            NA           NA              2061
     5 2012-04-27 18:03:29 off             2          1            NA         2270               209
     6 2012-04-27 18:03:57 on              2          2            28           NA                NA
     7 2012-04-27 19:00:00 off            NA         NA            NA           NA              3363
     8 2012-04-27 19:41:16 off             3          2            NA         5839              2476
     9 2012-04-27 19:43:50 on              3          3           154           NA                NA
    10 2012-04-28 06:00:00 off            NA         NA            NA           NA             36970
    11 2012-04-28 06:23:57 off             4          3            NA        38407              1437
    12 2012-04-28 06:32:13 on              4          4           496           NA                NA
    13 2012-04-28 06:40:20 off             5          4            NA          487               487
    14 2012-04-28 06:40:48 on              5          5            28           NA                NA
    15 2012-04-28 08:00:00 off            NA         NA            NA           NA              4752
    16 2012-04-28 08:16:07 off             6          5            NA         5719               967
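
Once the bouts are cut at the hour marks, per-hour totals are a plain grouped sum. A minimal base-R sketch, assuming the pieces are already available as start times and durations; the four on-nest durations are copied from rows 4, 5, 7 and 8 of the result above, and the data frame `pieces` is made up for illustration.

```r
pieces <- data.frame(
  start = as.POSIXct(c("2012-04-27 17:25:39", "2012-04-27 18:00:00",
                       "2012-04-27 18:03:57", "2012-04-27 19:00:00"), tz = "UTC"),
  secs  = c(2061, 209, 3363, 2476)
)
# label each piece with the hour it starts in, then sum within each label
pieces$hour <- format(pieces$start, "%Y-%m-%d %H:00")
totals <- tapply(pieces$secs, pieces$hour, sum)
totals  # 17:00 -> 2061, 18:00 -> 3572, 19:00 -> 2476
```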
    

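    【Comments】:

    • Maybe I'm a bit dense, but I don't see how this splits the observations into hourly bins. If it does work, I'd prefer the dplyr solution... ;)
    • This is exactly what I needed, thank you! I can also tweak it a little to deal with the extra-time issue.
    【Solution 3】:

    Here is an option using data.table: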

    #create a lookup table of hourly data (to be dyn, you can use round(min()-1hr) and round(max()+1hr) to generate your hourly data
    hourly <- data.table(HOUR=seq(as.POSIXct("20120427 170000", format="%Y%m%d %H%M%S"), 
        as.POSIXct("20120428 090000", format="%Y%m%d %H%M%S"), 
        by="1 hour"))[, DT.event := HOUR]
    
    #get end of event from the row below
    DT[, endDT.event := shift(DT.event, -1L)]
    
    #perform rolling join to find the closest hour after this event time
    DT[, hr_aft := hourly[.SD, on=.(DT.event), roll=-Inf, HOUR]]
    
    #for those that cut across the hour mark, split into 2, if it can be more than 1hr, we can update this part to include that possibility
    ovlhr <- DT[hr_aft < endDT.event]
    ovlhr <- ovlhr[, .(Event, DT.event=c(DT.event, hr_aft), endDT.event=c(hr_aft, endDT.event)), 
        1L:nrow(ovlhr)][, (1L) := NULL]
    
    #append both dataset to get final desired output
    rbindlist(list(DT[hr_aft>=endDT.event][, hr_aft := NULL], ovlhr))[order(DT.event)]
    

    Output:

        Event            DT.event         endDT.event
     1:   off 2012-04-27 17:25:13 2012-04-27 17:25:39
     2:    on 2012-04-27 17:25:39 2012-04-27 18:00:00
     3:    on 2012-04-27 18:00:00 2012-04-27 18:03:29
     4:   off 2012-04-27 18:03:29 2012-04-27 18:03:57
     5:    on 2012-04-27 18:03:57 2012-04-27 19:00:00
     6:    on 2012-04-27 19:00:00 2012-04-27 19:41:16
     7:   off 2012-04-27 19:41:16 2012-04-27 19:43:50
     8:    on 2012-04-27 19:43:50 2012-04-27 20:00:00
     9:    on 2012-04-27 20:00:00 2012-04-28 06:23:57
    10:   off 2012-04-28 06:23:57 2012-04-28 06:32:13
    11:    on 2012-04-28 06:32:13 2012-04-28 06:40:20
    12:   off 2012-04-28 06:40:20 2012-04-28 06:40:48
    13:    on 2012-04-28 06:40:48 2012-04-28 07:00:00
    14:    on 2012-04-28 07:00:00 2012-04-28 08:16:07
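
For readers without data.table, the "closest hour after this event time" lookup can be approximated in base R with `findInterval` on a sorted vector of hour marks. This is a sketch of the lookup step only, assuming the hourly grid covers all events; in this sketch an event lying exactly on an hour mark is mapped to the following hour.

```r
events <- as.POSIXct(c("2012-04-27 17:25:39", "2012-04-27 18:03:57"), tz = "UTC")
hourly <- seq(as.POSIXct("2012-04-27 17:00:00", tz = "UTC"),
              as.POSIXct("2012-04-27 20:00:00", tz = "UTC"), by = "1 hour")
# findInterval() gives the index of the last hour mark <= each event;
# the next element is the first hour mark after it
idx <- findInterval(as.numeric(events), as.numeric(hourly))
hr_aft <- hourly[idx + 1]
format(hr_aft, "%H:%M:%S")  # "18:00:00" "19:00:00"
```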
    

    Data:

    library(data.table)
    DT <- fread("Event   DT.event
    off     4/27/12_17:25:13     
    on      4/27/12_17:25:39     
    off     4/27/12_18:03:29     
    on      4/27/12_18:03:57     
    off     4/27/12_19:41:16     
    on      4/27/12_19:43:50     
    off     4/28/12_6:23:57      
    on      4/28/12_6:32:13      
    off     4/28/12_6:40:20      
    on      4/28/12_6:40:48      
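    off     4/28/12_8:16:07")
    # Added assumption: the steps above need DT.event as POSIXct, so convert it here
    DT[, DT.event := as.POSIXct(DT.event, format = "%m/%d/%y_%H:%M:%S")]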
    

    【Comments】:

      【Solution 4】:

      Here is an idea

      library(dplyr)
      library(lubridate)
      
      # Your data
      raw <- "off     4/27/12 17:25:13    1          0           NA               NA
              on      4/27/12 17:25:39    1          1           26               NA
              off     4/27/12 18:03:29    2          1           NA               2270
              on      4/27/12 18:03:57    2          2           28               NA
              off     4/27/12 19:41:16    3          2           NA               5839
              on      4/27/12 19:43:50    3          3           154              NA
              off     4/28/12 6:23:57     4          3           NA               38407
              on      4/28/12 6:32:13     4          4           496              NA
              off     4/28/12 6:40:20     5          4           NA               487
              on      4/28/12 6:40:48     5          5           28               NA
              off     4/28/12 8:16:07     6          5           NA               5719"

      # Trim each line, then turn runs of two or more spaces into ";" separators
      lines <- gsub(pattern = " {2,}", replacement = ";", x = trimws(strsplit(raw, "\n")[[1]]))

      a = read.csv(text = lines, header = F, sep = ";", stringsAsFactors = F,
                   col.names = c("Event","DT.event","off.bout.ID","on.bout.ID","off.time.diff","on.time.diff")) %>%
        # %y rather than %Y, because the years in the data have two digits
        mutate(DT.event = as.POSIXct(DT.event, format = "%m/%d/%y %H:%M:%S"))

      # Order by time, in case the data is not already sorted
      a = a[order(a$DT.event),]
      
      # Helper column holding the time of the next event
      a[,"next_eve"] = as.POSIXct(c(a$DT.event[2:nrow(a)],NA))
      
      # Column with the time difference, meant to stop at the "complete" hour.
      # Note: the first condition compares floor_date(next_eve) with itself, so it
      # is never TRUE and every row falls through to the plain difference.
      a = a %>%
            mutate(dif_comp_hour_sec =  
                     case_when(
                       floor_date(next_eve,unit = "hour") > floor_date(next_eve,unit = "hour") ~ as.numeric(difftime(floor_date(next_eve,unit = "hour"), DT.event, units = "secs")),
                                        T ~ as.numeric(difftime(next_eve, DT.event, units = "secs"))
                                       )
                  )
      

      If needed, you can use the "Event" column to split this into on/off columns again.

      Here is the output:

      #    Event            DT.event off.bout.ID on.bout.ID off.time.diff on.time.diff            next_eve dif_comp_hour_sec
      # 1    off 2012-04-27 17:25:13           1          0            NA           NA 2012-04-27 17:25:39                26
      # 2     on 2012-04-27 17:25:39           1          1            26           NA 2012-04-27 18:03:29              2270
      # 3    off 2012-04-27 18:03:29           2          1            NA         2270 2012-04-27 18:03:57                28
      # 4     on 2012-04-27 18:03:57           2          2            28           NA 2012-04-27 19:41:16              5839
      # 5    off 2012-04-27 19:41:16           3          2            NA         5839 2012-04-27 19:43:50               154
      # 6     on 2012-04-27 19:43:50           3          3           154           NA 2012-04-28 06:23:57             38407
      # 7    off 2012-04-28 06:23:57           4          3            NA        38407 2012-04-28 06:32:13               496
      # 8     on 2012-04-28 06:32:13           4          4           496           NA 2012-04-28 06:40:20               487
      # 9    off 2012-04-28 06:40:20           5          4            NA          487 2012-04-28 06:40:48                28
      # 10    on 2012-04-28 06:40:48           5          5            28           NA 2012-04-28 08:16:07              5719
      # 11   off 2012-04-28 08:16:07           6          5            NA         5719                <NA>                NA
      

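      【Comments】:

      • No, because the time differences have not been split into hourly "bins" yet. For example, the 2,270-second duration starts at 17:25:39 and runs until 18:03:29, so I need to cut off those last 3 minutes and 29 seconds and make them part of the 18:00 hour "bin". I have many more days of data beyond these 11 events, so I would like to automate this rather than, say, subtracting 3 minutes and 29 seconds from the 2,270-second nest bout by hand (and doing the same for every event that overlaps an hour mark).
      • Hmm, now I think I understand. The bins you have in mind are [00:00:00; 01:00:00) .... [13:00:00; 14:00:00) ... ?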
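
Under that reading of the bins, each timestamp belongs to the hour it falls in, with the lower bound included and the upper bound excluded. A small base-R sketch of the convention (the half-open interpretation is the one proposed in the comment above, not something the asker confirmed):

```r
stamps <- as.POSIXct(c("2012-04-27 17:59:59", "2012-04-27 18:00:00",
                       "2012-04-28 00:00:00"), tz = "UTC")
# hour-of-day bin h covers [h:00:00, (h+1):00:00)
hour_bin <- as.integer(format(stamps, "%H"))
hour_bin  # 17 18 0
```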