【问题标题】:Calculating minutes per day given the start and stop times在给定开始和停止时间的情况下计算每天的分钟数
【发布时间】:2013-06-29 22:43:12
【问题描述】:

如何从时间间隔中提取分钟,以便返回每个日期的分钟数?

我有一个数据框,其中包含案例 ID、开始和停止时间以及总经过时间间隔(以分钟为单位)。我需要弄清楚每个案例的每个日期有多少分钟。

这有点类似于this question,但我无法将这种方法转移到这种情况。

原始数据集的结构是这样的,但最终的解决方案需要扩大规模以允许更多情况(请参阅下面的替代数据集)。

原始数据

>so
  Case               Start                Stop   Minutes
1    A 2013-01-10 18:00:00 2013-01-10 20:30:00  150 mins
2    B 2013-01-15 22:45:00 2013-01-16 00:15:00   90 mins
3    C 2013-01-20 22:00:00 2013-01-22 02:00:00 1680 mins
4    D 2013-01-27 12:00:00 2013-01-28 06:00:00 1080 mins
5    E 2013-01-01 00:00:00 2013-01-01 02:00:00  120 mins
6    F 2013-01-02 08:00:00 2013-01-03 07:00:00 1380 mins

想要的输出

  Case       Date Minutes
1    A 2013-01-10     150
2    B 2013-01-15      75
3    B 2013-01-16      15
4    C 2013-01-20     120
5    C 2013-01-21    1440
6    C 2013-01-22     120
7    D 2013-01-27     720
8    D 2013-01-28     360
9    E 2013-01-01     120
10   F 2013-01-02     960 
11   F 2013-01-03     420

原始数据的 DPUT

> dput(so)  
structure(list(  
Case = structure(1:5, .Label = c("A", "B", "C", "D", "E"), class = "factor"), 
Start = structure(c(2L, 3L, 4L, 5L, 1L), .Label = c("2013-01-01 00:00:00", "2013-01-10 18:00:00", "2013-01-15 22:45:00", "2013-01-20 22:00:00", "2013-01-27 12:00:00"), class = "factor"),  
Stop = structure(c(2L, 3L, 4L, 5L, 1L), .Label = c("2013-01-01 02:00:00", "2013-01-10 20:30:00", "2013-01-16 00:15:00", "2013-01-22 02:00:00", "2013-01-28 06:00:00"), class = "factor"), 
Minutes = structure(c(150, 90, 1680, 1080, 120), tzone = "", units = "mins", class = "difftime")
), 
.Names = c("Case","Start", "Stop", "Minutes"), 
row.names = c(NA, -5L), 
class = "data.frame")  

替代较大数据集的 DPUT

> dput(so_alt)
structure(list(Case = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L), .Label = c("A", "B", "C", "D", "E", "F", "G", "H"), class = "factor"),     Start = structure(c(12L, 13L, 4L, 5L, 19L, 20L, 21L, 30L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 9L, 6L, 7L, 8L, 3L, 10L, 11L, 14L, 15L, 16L), .Label = c("2012-10-25 18:10:00", "2012-11-01 20:18:00", "2012-11-16 17:45:00", "2012-12-06 01:00:00", "2012-12-06 08:00:00", "2012-12-26 13:15:00", "2012-12-29 19:50:00", "2012-12-30 00:00:00", "2013-01-01 01:46:00", "2013-01-10 20:15:00", "2013-01-11 09:00:00", "2013-01-29 17:00:00", "2013-02-05 21:30:00", "2013-02-21 01:50:00", "2013-02-21 09:25:00", "2013-02-21 12:20:00", "2013-02-22 21:45:00", "2013-02-24 15:30:00", "2013-03-01 10:10:00", "2013-03-06 16:15:00", "2013-03-07 20:00:00", "2013-03-12 21:00:00", "2013-03-13 05:15:00", "2013-03-14 00:45:00", "2013-03-14 11:30:00", "2013-03-15 21:00:00", "2013-03-16 08:15:00", "2013-03-17 06:45:00", "2013-03-18 18:04:00", "2013-03-21 21:40:00"), class = "factor"), Stop = structure(c(12L, 13L, 4L, 5L, 19L, 20L, 21L, 30L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 9L, 6L, 7L, 8L, 3L, 10L, 11L, 14L, 15L, 16L), .Label = c("2012-10-25 21:30:00", "2012-11-03 08:00:00", "2012-11-17 08:45:00", "2012-12-06 05:00:00", "2012-12-06 09:30:00", "2012-12-29 19:50:00", "2012-12-30 00:00:00", "2012-12-31 20:00:00", "2013-01-02 06:00:00", "2013-01-10 22:13:00", "2013-01-12 03:30:00", "2013-01-30 07:45:00", "2013-02-06 07:30:00", "2013-02-21 09:25:00", "2013-02-21 12:20:00", "2013-02-22 11:30:00", "2013-02-24 10:15:00", "2013-02-24 16:45:00", "2013-03-06 07:45:00", "2013-03-07 07:30:00", "2013-03-08 08:00:00", "2013-03-13 05:00:00", "2013-03-14 00:45:00", "2013-03-14 10:15:00", "2013-03-14 19:00:00", "2013-03-15 22:36:00", "2013-03-16 19:00:00", "2013-03-18 15:15:00", "2013-03-18 22:00:00", "2013-03-22 08:30:00"), class = "factor"), Minutes = structure(c(885, 600, 240, 90, 7055, 915, 720, 650, 480, 1170, 570, 450, 96, 645, 1950, 236, 1694, 4715, 250, 2640, 900, 118, 1110, 455, 175, 1390), class = "difftime", units = "mins")), .Names = c("Case", "Start", "Stop", "Minutes"), row.names = c(NA, 26L), class = "data.frame")  
>  

【问题讨论】:

    标签: r time


    【解决方案1】:

    这是一种方法。

    library(plyr)
    
    ddply(ddply(so, 1, function(x) {
        d1 <- as.Date(x$Start)
        d2 <- as.Date(x$Stop)
        if (d1 == d2) data.frame(date = d1, min = x$Minutes)
        else {
            ret <- rbind(data.frame(date=d1, min = difftime(as.POSIXct(paste(d1+1, "00:00:00")), as.POSIXct(x$Start), units = "mins")), 
            data.frame(date=d2, min=difftime(as.POSIXct(x$Stop), as.POSIXct(paste(d2, "00:00:00")), units = "mins")))
            if (d2-d1>1) {
                ret <- rbind(ret, data.frame(date = seq(d1+1, d2-1, by = "day"), min = 60*24))
            }
            ret
        }
    }), .(Case, date), summarise, min = sum(min))
    

    输出是:

      Case       date       min
    1    A 2013-01-10  150 mins
    2    B 2013-01-15   75 mins
    3    B 2013-01-16   15 mins
    4    C 2013-01-20  120 mins
    5    C 2013-01-21 1440 mins
    6    C 2013-01-22  120 mins
    

    The final version here:
    
    ddply(ddply(data.frame(.id = 1:nrow(so_alt), so_alt), .(.id, Case), function(x) {
        d1 <- as.Date(x$Start)
        d2 <- as.Date(x$Stop)
        if (d1 == d2) data.frame(date = d1, min = x$Minutes)
        else {
            ret <- rbind(data.frame(date=d1, min = difftime(as.POSIXct(paste(d1+1, "00:00:00")), as.POSIXct(x$Start), units = "mins")), 
            data.frame(date=d2, min=difftime(as.POSIXct(x$Stop), as.POSIXct(paste(d2, "00:00:00")), units = "mins")))
            if (d2-d1>1) {
                ret <- rbind(ret, data.frame(date = seq(d1+1, d2-1, by = "day"), min = 60*24))
            }
            ret
        }
    }), .(Case, date), summarise, min = sum(min))
    

    和输出:

       Case       date       min
    1     A 2013-01-01 1334 mins
    2     A 2013-01-02  360 mins
    3     A 2013-01-29  420 mins
    4     A 2013-01-30  465 mins
    5     A 2013-02-21  175 mins
    6     A 2013-03-12  180 mins
    7     A 2013-03-13  300 mins
    8     B 2012-12-26  645 mins
    9     B 2012-12-27 1440 mins
    10    B 2012-12-28 1440 mins
    11    B 2012-12-29 1190 mins
    12    B 2013-02-05  150 mins
    13    B 2013-02-06  450 mins
    14    B 2013-02-21  700 mins
    15    B 2013-02-22  690 mins
    16    B 2013-03-13 1125 mins
    17    B 2013-03-14   45 mins
    18    C 2012-12-06  240 mins
    19    C 2012-12-29  250 mins
    20    C 2012-12-30    0 mins
    21    C 2013-03-14  570 mins
    22    D 2012-12-06   90 mins
    23    D 2012-12-30 1440 mins
    24    D 2012-12-31 1200 mins
    25    D 2013-03-14  450 mins
    26    E 2012-11-16  375 mins
    27    E 2012-11-17  525 mins
    28    E 2013-03-01  830 mins
    29    E 2013-03-02 1440 mins
    30    E 2013-03-03 1440 mins
    31    E 2013-03-04 1440 mins
    32    E 2013-03-05 1440 mins
    33    E 2013-03-06  465 mins
    34    E 2013-03-15   96 mins
    35    F 2013-01-10  118 mins
    36    F 2013-03-06  465 mins
    37    F 2013-03-07  450 mins
    38    F 2013-03-16  645 mins
    39    G 2013-01-11  900 mins
    40    G 2013-01-12  210 mins
    41    G 2013-03-07  240 mins
    42    G 2013-03-08  480 mins
    43    G 2013-03-17 1035 mins
    44    G 2013-03-18  915 mins
    45    H 2013-02-21  455 mins
    46    H 2013-03-18  236 mins
    47    H 2013-03-21  140 mins
    48    H 2013-03-22  510 mins
    

    【讨论】:

    • 这很好用。但是,当“so”数据框中的案例超过 4 个时,此版本不起作用。您能否修改一下以扩大规模以允许更多案例?
    • 谢谢。这有所帮助,但它仍然不适用于更大的数据集。我添加了一个更大的替代数据集作为示例。
    • @Rollie 它看起来与so_alt 合作。你能把输出或错误信息放上去吗?
    • 问题是我的真实世界数据集中每个独特案例的事件比我在此处发布的示例要多。我已经更新了替代的 so_alt 数据集,它会产生这个错误:Error in seq.Date(d1 + 1, d2 - 1, by = "day") : 'from' must be of length 1 In addition: Warning messages: ......
    • 完美。感谢您的坚持和耐心。我非常感谢专业知识。
    【解决方案2】:

    这里是data.table 解决方案。这个想法很简单(可能效率不高): 为每个案例创建一个分钟向量,将其转换为天向量并使用表格计算频率。

    DT <- as.data.table(dat)
    DT[,list(seqD= seq(as.Date(Start),as.Date(Stop),'day'),
             duration= { 
               seq.minus <- seq(as.POSIXct(Start),
                                       as.POSIXct(Stop),'min')[-1]
               as.data.frame(table(format(seq.minus,'%Y-%m-%d')))$Freq
               }),
       by = list(Case)]
    
       Case       seqD duration
    1:    A 2013-01-10      150
    2:    B 2013-01-15       74
    3:    B 2013-01-16       16
    4:    C 2013-01-20      119
    5:    C 2013-01-21     1440
    6:    C 2013-01-22      121
    

    【讨论】:

      猜你喜欢
      • 2021-12-10
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2022-10-23
      • 1970-01-01
      • 2022-01-07
      • 1970-01-01
      • 2022-08-02
      相关资源
      最近更新 更多