[Question title]: How to make timing differences between groups
[Posted]: 2018-07-07 05:53:54
[Question]:

I have a problem involving time differences that I am trying to solve with dplyr. My initial data frame looks like this:

Paper <- data.frame(
  Student = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"), 
  Dates = c("2014-04-17", "2014-04-17", "2014-04-17", "2014-04-17", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18","2014-04-18"),
  Time = c("10:35:00", "11:25:00", "19:15:00", "21:00:00", "22:00:00", "22:21:26", "10:25:00", "11:15:00", "16:05:00", "17:25:00"),
  Connection = c("Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final")
)

   Student      Dates     Time  Connection
       A    2014-04-17  10:35:00    Initial
       A    2014-04-17  11:25:00      Final
       A    2014-04-17  19:15:00    Initial
       A    2014-04-17  21:00:00      Final
       A    2014-04-18  22:00:00    Initial
       A    2014-04-18  22:21:26      Final
       B    2014-04-18  10:25:00    Initial
       B    2014-04-18  11:15:00      Final
       B    2014-04-18  16:05:00    Initial
       B    2014-04-18  17:25:00      Final

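I want to know the time spent on each date, counting only the time that actually elapsed between each "Initial" and "Final" Connection.

So my expected data frame should look like this: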

  Student    Dates    Time (Minutes)
     A    14-04-17     155
     A    14-04-18   21.43
     B    14-04-18     130

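I have tried the following and I am almost there, but I don't know how to restrict the calculation to the time difference between connections ("Initial"/"Final"), so I got this: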

library(dplyr)

Paper$Dates <- as.Date(Paper$Dates, "%Y-%m-%d")

Paper$Time <- as.numeric(as.POSIXct(as.character(Paper$Time),
                         format = "%H:%M:%S"))

FinalPaper <- 
  Paper %>% 
  group_by(Student, Dates) %>% 
  summarise(TimeSpent = sum(diff(Time))) %>% 
  mutate(TimeSpent = TimeSpent/60) %>% 
  mutate(TimeSpent = round(TimeSpent, digits = 2))

Result:

  Student      Dates   TimeSpent
1       A   2014-04-17    625.00
2       A   2014-04-18     21.43
3       B   2014-04-18    420.00

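As TimeSpent shows, the times are too high. This is because I did not take Connection into account, so the wrong intervals are being counted. For example, for student A it counts all the time between 10:35:00 and 21:00:00, which is wrong.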
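
The inflated totals follow from a property of `diff`: consecutive differences telescope, so `sum(diff(x))` is simply the last value minus the first. A minimal base R sketch, using student A's times on 2014-04-17 expressed as minutes since midnight (the variable names are illustrative and not part of the original post):

```r
# Student A on 2014-04-17 as minutes since midnight
# (10:35, 11:25, 19:15, 21:00 from the Time column).
times_min <- c(10 * 60 + 35, 11 * 60 + 25, 19 * 60 + 15, 21 * 60)

# Consecutive differences include the idle gap between the two sessions.
gaps <- diff(times_min)

# The sum telescopes to last - first, which is the 625 minutes reported above.
naive_total <- sum(gaps)

# Keeping only the Initial -> Final gaps (positions 1 and 3) gives 155 minutes.
connected_total <- sum(gaps[c(TRUE, FALSE, TRUE)])
```

So the grouping has to separate individual sessions, not only students and dates.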

Many thanks!!

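[Comments on the question]:

  • Good question, well explained, with reproducible data and code. Thank you ;) One thing, though: in your expected data.frame, isn't the row A 14-04-18 60 wrong?
  • OK, thanks!

Tags: r datetime dataframe dplyr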


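[Solution 1]:

You can use cumsum(Connection == "Initial") to add an ID for each "session". This requires the data to be sorted the way you show it here. We can then compute the time difference within each session, and aggregate again to get the total time spent per student per date: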

Paper <- data.frame(
  Student = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"), 
  Dates= c("2014-04-17", "2014-04-17", "2014-04-17", "2014-04-17", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18","2014-04-18"),
  Time = c("10:35:00", "11:25:00", "19:15:00", "21:00:00", "22:00:00", "22:21:26", "10:25:00", "11:15:00", "16:05:00", "17:25:00"), 
  Connection = c("Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final")
)

library(dplyr)

Paper$Dates <- as.Date(Paper$Dates, "%Y-%m-%d")
Paper$Time <- as.numeric(as.POSIXct(as.character(Paper$Time),
                                    format = "%H:%M:%S"))

FinalPaper <- Paper %>% 
  mutate(seqid = cumsum(Connection == "Initial")) %>% 
  group_by(Student, Dates, seqid) %>% 
  summarise(TimeSpent = sum(diff(Time))) %>% 
  group_by(Student, Dates) %>% 
  summarise(TimeSpent = round(sum(TimeSpent)/60,2))

Output:

# A tibble: 3 x 3
# Groups:   Student [2]
  Student      Dates TimeSpent
   <fctr>     <date>     <dbl>
1       A 2014-04-17    155.00
2       A 2014-04-18     21.43
3       B 2014-04-18    130.00
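
To see why the `seqid` column works, here is a minimal base R sketch of what `cumsum(Connection == "Initial")` produces. The counter steps up at every "Initial" row, so each Initial/Final pair shares one ID. The vectors `connection` and `minutes` are illustrative stand-ins for student A's rows on 2014-04-17, not part of the original answer:

```r
connection <- c("Initial", "Final", "Initial", "Final")
minutes    <- c(635, 685, 1155, 1260)   # 10:35, 11:25, 19:15, 21:00

# TRUE counts as 1, so the running sum increases at each "Initial".
seqid <- cumsum(connection == "Initial")

# Duration of each session: Final minus Initial within the same ID.
per_session <- tapply(minutes, seqid, function(x) sum(diff(x)))

# Total connected time for the day, in minutes.
total <- sum(per_session)
```

The IDs line up only if every "Initial" is followed by its own "Final"; if the rows might arrive unsorted, sorting by Student, Dates and Time first is a reasonable precaution.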

Hope this helps!

[Comments]:

    [Solution 2]:

    Here is a solution based on data.table:

    library(data.table)
    setDT(Paper)
    Paper[order(Student, Time), .(
        TimeSpend = sum(c(0,diff(Time))[Connection == "Final"])/60
      ), by = .(Student, Dates)]
    
       Student      Dates TimeSpend
    1:       A 2014-04-17 155.00000
    2:       A 2014-04-18  21.43333
    3:       B 2014-04-18 130.00000
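
The indexing trick in this answer can be unpacked in base R. Prepending 0 to `diff(Time)` aligns each difference with the row on which it ends, so filtering on `Connection == "Final"` keeps only the Initial-to-Final durations. A sketch with illustrative vectors (student A's rows on 2014-04-17, in seconds since midnight):

```r
connection <- c("Initial", "Final", "Initial", "Final")
seconds    <- c(38100, 41100, 69300, 75600)   # 10:35, 11:25, 19:15, 21:00

# Element i holds the time elapsed since row i - 1 (0 for the first row).
elapsed <- c(0, diff(seconds))

# Differences ending on a "Final" row are session lengths; the rest are idle gaps.
time_spent_min <- sum(elapsed[connection == "Final"]) / 60
```

Like the `cumsum` approach, this assumes the rows alternate Initial/Final in time order within each group.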
    

    [Comments]: