【Question title】:Grouping time and counting instances by 12 hour bins in R
【Posted】:2016-06-18 12:16:37
【Question】:

I have a data frame df1 like this:

    timestamp
01-12-2015 00:04
01-12-2015 02:20
01-12-2015 02:43
01-12-2015 04:31
01-12-2015 08:51
01-12-2015 11:28
01-12-2015 20:53
01-12-2015 21:28
02-12-2015 00:30
02-12-2015 20:22

It contains timestamps. I want to count them by binning the hours into 12-hour intervals, i.e. 01-12-2015 [0-9], 01-12-2015 [9-21], and so on.

Sample output:

DayOfMonth Group count
    1   1   5
    1   2   2
    2   1   2
    2   2   1

The day of the month could also be replaced by a sequence number starting from 1. Any help with this is much appreciated.

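【Comments】:

  • You can use cut after converting to Date
  • Thanks, I have been trying the cut function but could not get the desired output.
  • No problem. I will post it as a solution.
  • Ehm... [0-9] - [9-21] is just a 9-hour interval, not 12... am I missing something?
  • Of course, [21-0] of the previous day is combined with [0-9] of the following day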
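
To illustrate the cut suggestion from the comments, here is a minimal base R sketch. It assumes the timestamps are in day-month-year order and are parsed in UTC; the names `ts` and `bins` are chosen here for illustration. Note that cut numbers the bins consecutively over the whole range rather than restarting at 1 each day.

```r
# Sample timestamps from the question
ts <- as.POSIXct(c("01-12-2015 00:04", "01-12-2015 02:20", "01-12-2015 02:43",
                   "01-12-2015 04:31", "01-12-2015 08:51", "01-12-2015 11:28",
                   "01-12-2015 20:53", "01-12-2015 21:28", "02-12-2015 00:30",
                   "02-12-2015 20:22"),
                 format = "%d-%m-%Y %H:%M", tz = "UTC")

# cut() starts the first bin at the beginning of the earliest hour (00:00 here)
# and creates consecutive 12-hour bins from there
bins <- cut(ts, breaks = "12 hours")

# Number of timestamps per bin
counts <- table(bins)
counts
```

With this data the bins are clock-aligned (00:00 and 12:00), so this gives the midnight/noon split rather than the 21:00/09:00 split asked for in the question.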

Tags: r


【Solution 1】:

A possible solution in base R:

# convert the 'timestamp' column to a datetime format
df1$timestamp <- as.POSIXct(strptime(df1$timestamp, format = '%d-%m-%Y %H:%M'))
# create day.of.month variable
df1$day.of.month <- format(df1$timestamp, '%d')
# extract the 12 hour interval as am/pm values by stripping digits, colons and
# spaces from the '%r' (12-hour clock) output; the letter case depends on the locale
df1$group <- gsub('[0-9: ]+', '', format(df1$timestamp, '%r'))
# count the timestamps per group and day
aggregate(. ~ group + day.of.month, df1, length)

which gives:

  group day.of.month timestamp
1    am           01         6
2    pm           01         2
3    am           02         1
4    pm           02         1

Another solution using data.table and the pm function from lubridate:

library(lubridate)
library(data.table)
setDT(df1)[, timestamp := dmy_hm(timestamp)
           ][, group := pm(timestamp)+1
             ][, .N, .(day.of.month = day(timestamp),group)]

which gives:

   day.of.month group N
1:            1     1 6
2:            1     2 2
3:            2     1 1
4:            2     2 1

Data used:

df1 <- structure(list(timestamp = c("01-12-2015 00:04", "01-12-2015 02:20", "01-12-2015 02:43", "01-12-2015 04:31", "01-12-2015 08:51", 
                                    "01-12-2015 11:28", "01-12-2015 20:53", "01-12-2015 21:28", "02-12-2015 00:30", "02-12-2015 20:22")),
                 .Names = "timestamp", class = "data.frame", row.names = c(NA,-10L))

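【Comments】:

  • Thanks, this works well, but if I have data for several months it pools the counts across months. To deal with that I added df1$month as a variable.
  • @warwick12 You can use month(df1$timestamp) (data.table/lubridate) or format(df1$timestamp, '%m') (base R) to create the month variable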
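
As a sketch of the suggestion in the comments, the base R approach can be extended with month and year columns so that the same day of different months is counted separately. The sample data and the names `d2`, `ts` and `res` are made up here for illustration and are not from the original post.

```r
# Hypothetical sample spanning two months (illustration only)
d2 <- data.frame(timestamp = c("01-12-2015 00:04", "01-12-2015 20:53",
                               "01-01-2016 02:20", "01-01-2016 03:10"),
                 stringsAsFactors = FALSE)
ts <- as.POSIXct(d2$timestamp, format = "%d-%m-%Y %H:%M", tz = "UTC")

d2$year         <- as.integer(format(ts, "%Y"))
d2$month        <- as.integer(format(ts, "%m"))
d2$day.of.month <- as.integer(format(ts, "%d"))
# 1 = 00:00-11:59, 2 = 12:00-23:59
d2$group        <- (as.integer(format(ts, "%H")) >= 12) + 1L

# Count timestamps per group, day, month and year
res <- aggregate(timestamp ~ group + day.of.month + month + year, d2, length)
names(res)[names(res) == "timestamp"] <- "count"
res
```

Including the year as well avoids pooling December 2015 with December of another year.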
【Solution 2】:

Compared with the base R approach, we can use lubridate functions to convert to a 'Datetime' class easily and use dplyr to get the output more efficiently.

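library(lubridate)
library(dplyr)
df1 %>% 
    mutate(timestamp = dmy_hm(timestamp)) %>%
    group_by(DayOfMonth = day(timestamp)) %>%
    # compute the 12-hour bin within each day in a separate mutate() step, because
    # expressions passed to group_by() are evaluated on the ungrouped data
    mutate(Group = as.numeric(cut(timestamp, breaks = "12 hour"))) %>%
    # .add = TRUE (dplyr >= 1.0.0) replaces the deprecated add = TRUE
    group_by(Group, .add = TRUE) %>% 
    summarise(GroupCount = n())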
#     DayOfMonth Group GroupCount
#         <int> <dbl>      <int>
#1          1     1          6
#2          1     2          2
#3          2     1          1
#4          2     2          1

Or we can use a compact option with data.table

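library(lubridate)  # for dmy_hm() and day()
library(data.table)
# hour >= 12 so that 12:00-12:59 falls into the second half of the day
setDT(df1)[, {t1 <- dmy_hm(timestamp); .(DayOfMonth = day(t1), 
   Group = (hour(t1) >= 12) + 1L)}][, .(GroupCount = .N), .(DayOfMonth, Group)]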
#     DayOfMonth Group GroupCount
#1:          1     1          6
#2:          1     2          2
#3:          2     1          1
#4:          2     2          1

Note: the data.table solution gets this done in just two steps...

Data

df1 <- structure(list(timestamp = c("01-12-2015 00:04", "01-12-2015 02:20", 
"01-12-2015 02:43", "01-12-2015 04:31", "01-12-2015 08:51", "01-12-2015 11:28", 
"01-12-2015 20:53", "01-12-2015 21:28", "02-12-2015 00:30", "02-12-2015 20:22"
)), .Names = "timestamp", class = "data.frame", row.names = c(NA,-10L))

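【Comments】:

  • Thanks, but I get this error: Error in UseMethod("mutate_"): no applicable method for 'mutate_' applied to an object of class "c('POSIXct', 'POSIXt')". Do I need to change the timestamp format?
  • Thanks, I think the problem is with the timestamp format. I still get the error: All formats failed to parse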
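
The first error suggests that a bare POSIXct vector rather than a data frame was passed to mutate. For the parse failure, a minimal base R sketch for finding the offending entries is shown below. It assumes the expected layout is day-month-year hour:minute; the vector `raw` with one malformed entry is hypothetical.

```r
# Hypothetical input with one entry in a different layout (illustration only)
raw <- c("01-12-2015 00:04", "2015/12/01 02:20", "02-12-2015 20:22")

# as.character() guards against the column having been read in as a factor
parsed <- as.POSIXct(as.character(raw), format = "%d-%m-%Y %H:%M", tz = "UTC")

# Entries that do not match the expected layout come back as NA
bad <- raw[is.na(parsed)]
bad
```

Inspecting `bad` shows which strings need a different format before dmy_hm or strptime can handle the whole column.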
【Solution 3】:

Another possible solution in base R:

timeStamp <- c("01-12-2015 00:04","01-12-2015 02:20","01-12-2015 02:43","01-12-2015 04:31",
               "01-12-2015 08:51","01-12-2015 11:28","01-12-2015 20:53","01-12-2015 21:28",
               "02-12-2015 00:30","02-12-2015 20:22")
times <- as.POSIXlt(timeStamp,format="%d-%m-%Y %H:%M",tz='GMT')

DF <- data.frame(Times=times)
DF$Group <- (times$hour >= 12) + 1  # 1 = 00:00-11:59, 2 = 12:00-23:59
DF$DayOfMonth <- times$mday

res <- aggregate(Times ~ DayOfMonth + Group,data=DF, FUN = length)

# res :
#   DayOfMonth Group Times
# 1          1     1     6
# 2          2     1     1
# 3          1     2     2
# 4          2     2     1

Or, if you want timestamps in the hour range [21-0] of the previous day to be counted with the following day:

timeStamp <- c("01-12-2015 00:04","01-12-2015 02:20","01-12-2015 02:43","01-12-2015 04:31",
               "01-12-2015 08:51","01-12-2015 11:28","01-12-2015 20:53","01-12-2015 21:28",
               "02-12-2015 00:30","02-12-2015 20:22")
times <- as.POSIXlt(timeStamp,format="%d-%m-%Y %H:%M",tz='GMT')
h <- times$hour + times$min*1/60 + times$sec*1/3600
# here we add 3 hours to the dates in hours range [21-0] in this way we
# push them to the next day
times[h >= 21] <- times[h >= 21] + 3*3600

DF <- data.frame(Times=times)
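# h holds the original clock time: [9-21) is group 2; everything else, including
# the [21-0] timestamps that were pushed to the next day, is group 1
DF$Group <- ifelse(h >= 9 & h < 21, 2, 1)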
DF$DayOfMonth <- times$mday

res <- aggregate(Times ~ DayOfMonth + Group,data=na.omit(DF), FUN = length)

# res :
#   DayOfMonth Group Times
# 1          1     1     5
# 2          2     1     2
# 3          1     2     2
# 4          2     2     1
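
The question also mentions replacing the day of the month with a sequence number starting from 1. A minimal sketch of one way to do that, assuming the same 21:00/09:00 bin boundaries as above: shift every timestamp forward by 3 hours so that 21:00 maps to midnight, then number the shifted dates. The names `shifted`, `DaySeq` and `n` are chosen here for illustration.

```r
timeStamp <- c("01-12-2015 00:04","01-12-2015 02:20","01-12-2015 02:43","01-12-2015 04:31",
               "01-12-2015 08:51","01-12-2015 11:28","01-12-2015 20:53","01-12-2015 21:28",
               "02-12-2015 00:30","02-12-2015 20:22")
times <- as.POSIXct(timeStamp, format = "%d-%m-%Y %H:%M", tz = "GMT")

# Shift by 3 hours: original [21-9) becomes [0-12), original [9-21) becomes [12-24)
shifted <- times + 3 * 3600
day <- as.Date(shifted, tz = "GMT")

# Sequence number of the (shifted) day, starting from 1
DaySeq <- as.integer(day - min(day)) + 1L
# 1 = original [21-9), 2 = original [9-21)
Group <- (as.integer(format(shifted, "%H")) >= 12) + 1L

DF <- data.frame(DaySeq = DaySeq, Group = Group, n = 1L)
res <- aggregate(n ~ DaySeq + Group, data = DF, FUN = sum)
res
```

For the sample data this reproduces the counts 5, 2, 2 and 1 requested in the question, and the sequence number keeps increasing across month boundaries.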

【Comments】:

【Solution 4】:

In addition to the options already provided, the stringi package also has some date-parsing functions:

    library(stringi)
    # in ICU patterns 'MM' is the month and 'mm' is the minute
    df1$timestamp <- stri_datetime_parse(df1$timestamp, format = 'dd-MM-yyyy HH:mm')
    df1$DayOfMonth <- stri_datetime_format(df1$timestamp, format = 'd')
    df1$Group <- stri_datetime_format(df1$timestamp, format = 'a')
    

After that, you can count using either of the following options:

    # option 1:
    aggregate(. ~ Group + DayOfMonth, df1, length) # copied from @ProcrastinatusMaximus
    
    # option 2a:
    library(dplyr)
    df1 %>% 
      group_by(DayOfMonth, Group) %>% 
      tally()
    
    # option 2b:
    count(df1, DayOfMonth, Group)
    

Output of the latter:

      DayOfMonth Group     n
           (chr) (chr) (int)
    1          1  a.m.     6
    2          1  p.m.     2
    3          2  a.m.     1
    4          2  p.m.     1
    

【Comments】:
