【Posted】: 2018-07-07 05:53:54
【Question】:
I have a problem involving time differences that I am trying to solve with dplyr. My initial data frame looks like this:
Paper <- data.frame(
Student = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
Dates = c("2014-04-17", "2014-04-17", "2014-04-17", "2014-04-17", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18","2014-04-18"),
Time = c("10:35:00", "11:25:00", "19:15:00", "21:00:00", "22:00:00", "22:21:26", "10:25:00", "11:15:00", "16:05:00", "17:25:00"),
Connection = c("Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final")
)
or, printed as a table:
Student Dates Time Connection
A 2014-04-17 10:35:00 Initial
A 2014-04-17 11:25:00 Final
A 2014-04-17 19:15:00 Initial
A 2014-04-17 21:00:00 Final
A 2014-04-18 22:00:00 Initial
A 2014-04-18 22:21:26 Final
B 2014-04-18 10:25:00 Initial
B 2014-04-18 11:15:00 Final
B 2014-04-18 16:05:00 Initial
B 2014-04-18 17:25:00 Final
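I want to know the time spent on each date, counting only the time that elapses between an "Initial" and the following "Final" Connection.
So my expected data frame should look like this: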
Student Dates Time (Minutes)
A 14-04-17 155
A 14-04-18 21.43
B 14-04-18 130
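As a quick check of where these expected values come from, here is a minimal base R sketch for Student A on 2014-04-17. The variable names and the choice of tz = "UTC" are assumptions made for illustration only; they are not part of the original data.

```r
# Student A has two Initial/Final pairs on 2014-04-17.
# tz = "UTC" is an assumption so the result does not depend on the local time zone.
initial <- as.POSIXct(c("2014-04-17 10:35:00", "2014-04-17 19:15:00"), tz = "UTC")
final   <- as.POSIXct(c("2014-04-17 11:25:00", "2014-04-17 21:00:00"), tz = "UTC")

# Duration of each connection in minutes: 50 and 105
session_minutes <- as.numeric(difftime(final, initial, units = "mins"))

# Total for that student and date: 155
total_minutes <- sum(session_minutes)
print(total_minutes)
```

Only the time inside each pair is counted; the gap between 11:25:00 and 19:15:00 is not.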
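I have tried the following and I am almost there, but I do not know how to restrict the calculation to the time difference within each connection ("Initial"/"Final"). This is what I have so far: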
library(dplyr)

# Parse the dates and convert the clock times to numeric seconds
Paper$Dates <- as.Date(Paper$Dates, "%Y-%m-%d")
Paper$Time <- as.numeric(as.POSIXct(as.character(Paper$Time),
                                    format = "%H:%M:%S"))
# Sum the differences between consecutive rows per student and date
FinalPaper <-
  Paper %>%
  group_by(Student, Dates) %>%
  summarise(TimeSpent = sum(diff(Time))) %>%
  mutate(TimeSpent = TimeSpent/60) %>%
  mutate(TimeSpent = round(TimeSpent, digits = 2))
Result:
Student Dates TimeSpent
1 A 2014-04-17 625.00
2 A 2014-04-18 21.43
3 B 2014-04-18 420.00
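As TimeSpent shows, the times are too high. That is because I am not taking Connection into account, so sum(diff(Time)) also counts the gaps between connections. For Student A on 2014-04-17, for example, it computes the whole span from 10:35:00 to 21:00:00 (625 minutes), which is wrong.
Many thanks!!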
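One possible way to take Connection into account, offered as a sketch rather than a definitive solution: within each Student/Dates group, the sum of the "Final" timestamps minus the sum of the "Initial" timestamps equals the sum of the individual connection durations. This assumes every "Initial" has a matching "Final" on the same date. The column name Stamp and tz = "UTC" are assumptions introduced for illustration.

```r
library(dplyr)

Paper <- data.frame(
  Student = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Dates = c("2014-04-17", "2014-04-17", "2014-04-17", "2014-04-17", "2014-04-18",
            "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18"),
  Time = c("10:35:00", "11:25:00", "19:15:00", "21:00:00", "22:00:00",
           "22:21:26", "10:25:00", "11:15:00", "16:05:00", "17:25:00"),
  Connection = c("Initial", "Final", "Initial", "Final", "Initial",
                 "Final", "Initial", "Final", "Initial", "Final"),
  stringsAsFactors = FALSE
)

FinalPaper <- Paper %>%
  # Combine date and time into seconds since the epoch (tz = "UTC" is an assumption)
  mutate(Stamp = as.numeric(as.POSIXct(paste(Dates, Time),
                                       format = "%Y-%m-%d %H:%M:%S",
                                       tz = "UTC"))) %>%
  group_by(Student, Dates) %>%
  # sum(Final) - sum(Initial) is the total connected time, in seconds
  summarise(TimeSpent = sum(Stamp[Connection == "Final"]) -
                        sum(Stamp[Connection == "Initial"])) %>%
  ungroup() %>%
  # Convert to minutes and round to two decimals
  mutate(TimeSpent = round(TimeSpent / 60, digits = 2))

print(FinalPaper)
```

With the sample data this gives 155, 21.43 and 130 minutes, matching the expected table. If a connection could start on one date and end on the next, or if a "Final" row could be missing, the pairs would need to be matched explicitly instead.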
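【Comments】:
- Good question: well explained, with reproducible data and code. Thank you ;) One thing, though: in your expected data.frame, isn't the row "A 14-04-18 60" wrong?
- OK, thanks!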
Tags: r datetime dataframe dplyr