【Question Title】:Update two columns that are interdependent row-wise using data.table
【Posted】:2018-05-16 00:32:17
【Question】:

I want to build a data.table that holds the arrival and departure times at each bus stop. This is the format of my data.table (reproducible dataset below).

    trip_id stop_sequence arrival_time departure_time travel_time
 1:       a             1     07:00:00       07:00:00    00:00:00
 2:       a             2     00:00:00       00:00:00    00:02:41
 3:       a             3     00:00:00       00:00:00    00:01:36
 4:       a             4     00:00:00       00:00:00    00:02:39
 5:       a             5     00:00:00       00:00:00    00:02:28
 6:       b             1     07:00:00       07:00:00    00:00:00
 7:       b             2     00:00:00       00:00:00    00:00:45
 8:       b             3     00:00:00       00:00:00    00:01:36
 9:       b             4     00:00:00       00:00:00    00:00:37
10:       b             5     00:00:00       00:00:00    00:03:00

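Here is how it should work. The idea is that the vehicle travels along the stop sequence. For example, in trip a the vehicle takes 00:02:41 to travel from stop 1 to stop 2, so it arrives at stop 2 at "07:02:41". Given a fixed 40 seconds at every stop for passengers to board and alight, the bus then departs from stop 2 at "07:03:21".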

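The problem is that this is a row-wise iterative process across two columns: each arrival depends on the previous departure, and each departure depends on the arrival in the same row. Intuitively I would reach for a for loop with set() in data.table, but I can't work out how to write it. Any help?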

Reproducible dataset:

library(data.table)
library(chron)

dt <- structure(list(trip_id = c("a", "a", "a", "a", "a", "b", "b", 
      "b", "b", "b"), stop_sequence = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 
      3L, 4L, 5L), arrival_time = structure(c(0.291666666666667, 0, 
      0, 0, 0, 0.291666666666667, 0, 0, 0, 0), format = "h:m:s", class = "times"), 
      departure_time = structure(c(0.291666666666667, 0, 0, 0, 
      0, 0.291666666666667, 0, 0, 0, 0), format = "h:m:s", class = "times"), 
      travel_time = structure(c(0, 0.00186598685444013, 0.00110857958406301, 
      0.00183749407361369, 0.00171664297781446, 0, 0.000522388450578203, 
      0.00111473367541453, 0.000427755975518318, 0.00207918951573377
      ), format = "h:m:s", class = "times")), .Names = c("trip_id", 
      "stop_sequence", "arrival_time", "departure_time", "travel_time"
      ), class = c("data.table", "data.frame"), row.names = c(NA, -10L
      ))

Expected output (first four rows):

   trip_id stop_sequence arrival_time departure_time travel_time
1:       a             1     07:00:00       07:00:00    00:00:00
2:       a             2     07:02:41       07:03:21    00:02:41
3:       a             3     07:04:57       07:05:37    00:01:36
4:       a             4     07:08:16       07:08:56    00:02:39

【Comments】:

    Tags: r dataframe data.table gtfs


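    【Solution 1】:

    I think this can be done without a loop. You can compute departure_time without looping, and once you have it, arrival_time is simply departure_time minus 40 seconds (except at the first stop of each trip, where the two are equal).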

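    # Work on a copy so that dt itself is left unchanged
    dt2 <- copy(dt)
    # Within each trip only the first stop has a non-zero arrival_time (and travel_time == 0),
    # so the running sum starts at 07:00:00 and adds travel_time + 40 s at every later stop.
    # The first list element is arrival_time (departure minus the 40 s dwell), the second is departure_time.
    dt2[,c("arrival_time", "departure_time") := .(cumsum(arrival_time + ifelse(travel_time==0, 0, travel_time + times("00:00:40"))) - ifelse(travel_time == 0 , 0, times("00:00:40")),
                                                  cumsum(arrival_time + ifelse(travel_time==0, 0, travel_time + times("00:00:40")))),
        by = trip_id]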
    
    dt2
    
     #   trip_id stop_sequence arrival_time departure_time travel_time
     #1:       a             1     07:00:00       07:00:00    00:00:00
     #2:       a             2     07:02:41       07:03:21    00:02:41
     #3:       a             3     07:04:57       07:05:37    00:01:36
     #4:       a             4     07:08:16       07:08:56    00:02:39
     #5:       a             5     07:11:24       07:12:04    00:02:28
     #6:       b             1     07:00:00       07:00:00    00:00:00
     #7:       b             2     07:00:45       07:01:25    00:00:45
     #8:       b             3     07:03:01       07:03:41    00:01:36
     #9:       b             4     07:04:18       07:04:58    00:00:37
    #10:       b             5     07:07:58       07:08:38    00:03:00
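
The loop can be dropped because the recurrence unrolls into a running sum: each departure is the first departure plus all travel times so far plus one 40-second dwell per stop already visited. The following is a minimal base-R sketch of that equivalence for trip a, not part of the original answer. It assumes travel times rounded to whole seconds (the real data carries fractions of a second), and the names travel, start, dwell and fmt are made up for illustration.

```r
# Travel times for trip "a" in whole seconds (assumed rounding of the chron values):
# 00:00:00, 00:02:41, 00:01:36, 00:02:39, 00:02:28
travel <- c(0, 161, 96, 159, 148)
start  <- 7 * 3600   # 07:00:00 as seconds since midnight
dwell  <- 40         # fixed time spent at each stop, in seconds
n      <- length(travel)

# Explicit row-by-row recurrence: arrival depends on the previous departure
arr_loop <- numeric(n)
dep_loop <- numeric(n)
arr_loop[1] <- start
dep_loop[1] <- start
for (i in seq_len(n)[-1]) {
  arr_loop[i] <- dep_loop[i - 1] + travel[i]
  dep_loop[i] <- arr_loop[i] + dwell
}

# Vectorised closed form: cumulative travel time plus one dwell per earlier stop
dep_vec <- start + cumsum(travel) + (seq_len(n) - 1) * dwell
arr_vec <- c(start, dep_vec[-1] - dwell)

# Both routes give the same schedule
stopifnot(isTRUE(all.equal(arr_loop, arr_vec)),
          isTRUE(all.equal(dep_loop, dep_vec)))

# Helper (hypothetical) to print seconds since midnight as hh:mm:ss
fmt <- function(s) {
  s <- as.integer(s)
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}
print(data.frame(arrival = fmt(arr_vec), departure = fmt(dep_vec)))
```

With these rounded inputs the printed schedule agrees with the rows for trip a shown above.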
    

    Alternatively, so that you don't have to repeat the long cumsum for departure_time just to get arrival_time, you can do:

    dt2[,departure_time := cumsum(arrival_time + ifelse(travel_time==0, 0, travel_time + times("00:00:40"))), by = trip_id]
    dt2[, arrival_time := departure_time - ifelse(travel_time == 0 , 0, times("00:00:40"))]
    

    A third option, posted by @eddi:

    dt[, departure_time := arrival_time[1] + cumsum(travel_time) + (0:(.N-1))*times('00:00:40'), by = trip_id]
    dt[, arrival_time := c(arrival_time[1], tail(departure_time, -1) - times('00:00:40')), by = trip_id]
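
For comparison, the explicit loop the question had in mind could look roughly like the sketch below, using data.table's set(). This is only an illustration and not what the answer recommends: dt_sec and its seconds-based columns are hypothetical stand-ins for dt (travel times rounded to whole seconds), and the rows are assumed to be sorted by trip_id and stop_sequence.

```r
library(data.table)

# Hypothetical seconds-based version of the question's table (times as seconds since midnight)
dt_sec <- data.table(
  trip_id       = rep(c("a", "b"), each = 5),
  stop_sequence = rep(1:5, times = 2),
  arrival       = rep(c(25200, 0, 0, 0, 0), times = 2),   # 25200 s = 07:00:00
  departure     = rep(c(25200, 0, 0, 0, 0), times = 2),
  travel        = c(0, 161, 96, 159, 148, 0, 45, 96, 37, 180)
)
dwell <- 40  # fixed time spent at each stop, in seconds

# Walk the rows in order; the first stop of each trip keeps its given times
for (i in seq_len(nrow(dt_sec))) {
  if (dt_sec$stop_sequence[i] == 1L) next
  # Arrival = previous row's departure (already updated by reference) + travel time
  arr <- dt_sec$departure[i - 1L] + dt_sec$travel[i]
  set(dt_sec, i = i, j = "arrival",   value = arr)
  set(dt_sec, i = i, j = "departure", value = arr + dwell)
}

print(dt_sec)
```

The loop is easier to read as a direct statement of the rule, but the grouped cumsum versions above do the same work in one vectorised pass per trip.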
    

    【Discussion】:

    • dt[, departure_time := arrival_time[1] + cumsum(travel_time) + (0:(.N-1))*times('00:00:40'), by = trip_id]; dt[, arrival_time := c(arrival_time[1], tail(departure_time, -1) - times('00:00:40')), by = trip_id]