【Question title】: Time data populating in R
【Posted】: 2019-11-27 13:03:21
【Question description】:

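I have a data frame that records the trip times of cabs. I need to convert this data into a 24-hour timeline showing when each cab is travelling and when it is idle. Sample data and the required data frame are shared below.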

df <- data.frame(cab_id = c("c1","c1","c2","c3","c3","c3","c4","c4"),
                 trip_id = c("101","102","103","104","105","106","107","108"),
                 trip_start = c("15:00", "21:27", "23:11", "09:33", "17:41", "22:11", "21:31", "23:47"),
                 trip_end = c("18:30", "23:33", "02:30", "12:11", "20:18", "01:15", "22:45", "02:12"))



reqd_df <- data.frame(time = c("0","0.25","0.5","0.75","1","1.25","1.5","1.75","2","2.25","2.5","2.75","3","3.25","3.5","3.75","4","4.25","4.5","4.75","5","5.25","5.5","5.75","6","6.25","6.5","6.75","7","7.25","7.5","7.75","8","8.25","8.5","8.75","9","9.25","9.5","9.75","10","10.25","10.5","10.75","11","11.25","11.5","11.75","12","12.25","12.5","12.75","13","13.25","13.5","13.75","14","14.25","14.5","14.75","15","15.25","15.5","15.75","16","16.25","16.5","16.75","17","17.25","17.5","17.75","18","18.25","18.5","18.75","19","19.25","19.5","19.75","20","20.25","20.5","20.75","21","21.25","21.5","21.75","22","22.25","22.5","22.75","23","23.25","23.5","23.75"),
                      c1 = c("0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","1","1","1","1","1","1","1","1","1","1","1","1","1","1","0","0","0","0","0","0","0","0","0","0","0","1","1","1","1","1","1","1","1","1","0"),
                      c2 = c("1","1","1","1","1","1","1","1","1","1","1","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","1","1"),
                      c3 = c("1","1","1","0","1","1","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","1","1","1","1","1","1","1","1","1","1","1","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","1","1","1","1","1","1","1","1","1","1","0","0","0","0","0","0","0","1","1","1","1","1","1","1"),
                      c4 = c("1","1","1","1","1","1","1","1","1","1","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1"))

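# The post does not show its library calls; melt() is assumed to come from reshape2
library(reshape2)
library(ggplot2)

reqd_df_2 <- melt(reqd_df, id = 1)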


ggplot(reqd_df_2, aes(x = time, y = value,  label = value, fill= variable ))+
  geom_bar(stat = "identity")+
  facet_grid(rows = vars(variable))



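Q1: How can I convert the data frame df into the data frame reqd_df, where times are rounded to the nearest quarter hour (e.g. 1:15, 1:30, 1:45, 2:00)? If a cab is on a trip during a given quarter hour, the slot is filled with 1; if the cab is idle, it is filled with 0.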

Note 1: 0, 0.25, 0.50, 0.75, 1.00 represent 0:00, 0:15, 0:30, 0:45, 1:00 respectively, and so on.

Note 2: Some trips start late in the day, e.g. at 22:11, and end around 02:15. In that case the slots from 22:15 to 23:45 and from 00:00 to 02:15 are filled with 1.
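The rounding and midnight wrap-around described in the two notes can be sketched in base R. This is only an illustration of the requirement, not the accepted solution; `to_quarter` and `busy_slots` are hypothetical helper names, and the sketch assumes that a time rounding up to 24:00 should wrap to 0.

```r
# Convert "HH:MM" strings to decimal hours rounded to the nearest quarter hour.
# `to_quarter` is a hypothetical helper, not part of any package.
to_quarter <- function(hhmm) {
  parts <- strsplit(trimws(hhmm), ":", fixed = TRUE)
  mins  <- vapply(parts,
                  function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
                  numeric(1))
  # Assumption: a time that rounds up to 24:00 wraps around to 0
  (round(mins / 15) * 15 / 60) %% 24
}

# 0/1 indicator over the 96 quarter-hour slots of a day for a single trip.
# When the rounded end is earlier than the rounded start, the trip is
# treated as crossing midnight.
busy_slots <- function(start, end) {
  slots <- seq(0, 23.75, by = 0.25)
  s <- to_quarter(start)
  e <- to_quarter(end)
  busy <- if (s <= e) slots >= s & slots <= e else slots >= s | slots <= e
  as.integer(busy)
}

slots <- seq(0, 23.75, by = 0.25)
trip  <- busy_slots("22:11", "02:15")
slots[trip == 1]  # the slots marked busy for this trip
```

For the trip in Note 2, the busy slots are 0 through 2.25 and 22.25 through 23.75, which matches the description above.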

Thanks in advance.

【Comments】:

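  • You are already plotting a chart for each cab (c1, c2, c3, c4), with the y-axis indicating busy or idle... what is wrong with that chart?
  • Are you only concerned with making time a continuous variable instead of a factor? If so, use time=seq(0,23.75,0.25) when creating the data frame and that will do it.
  • @Paolo Lorenzini How do I convert df into reqd_df?
  • @phalteman I need to convert the data frame df into reqd_df.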

Tags: r data.table tidyverse tidyr lubridate


【Solution 1】:

Use data.table::as.ITime and a rolling join to the nearest value (i.e. roll="nearest"):

library(data.table)
setDT(df)

#convert into ITime
cols <- c("trip_start", "trip_end")
df[, (cols) := lapply(.SD, function(x) as.ITime(x)), .SDcols=cols]

#sequence of times with 15 mins interval
sec15min <- 15 * 60
alltimes <- as.ITime(seq(as.ITime("00:00"), as.ITime("23:59"), sec15min))
intl <- data.table(INTL=alltimes)[, VAL:=INTL]

#find nearest 15min interval
df[, near_start := intl[df, on=.(INTL=trip_start), roll="nearest", VAL]]    
df[, near_end := intl[df, on=.(INTL=trip_end), roll="nearest", VAL]]

#get straight to reqd_df_2 
df[, {
        if (near_start <= near_end)
            biz <- seq(near_start, near_end, sec15min)
        else
            biz <- c(seq(as.ITime("00:00"), near_end, sec15min),
                seq(near_start, as.ITime("23:59"), sec15min))

        .(time=as.ITime(alltimes),
            busy=replace(logical(length(alltimes)), alltimes %in% biz, TRUE))
    },
    by=.(cab_id, trip_id)][,
        .(busy=+any(busy)), by=.(time, cab_id)]

Output:

         time cab_id busy
  1: 00:00:00     c1    0
  2: 00:15:00     c1    0
  3: 00:30:00     c1    0
  4: 00:45:00     c1    0
  5: 01:00:00     c1    0
 ---                     
380: 22:45:00     c4    1
381: 23:00:00     c4    0
382: 23:15:00     c4    0
383: 23:30:00     c4    0
384: 23:45:00     c4    1
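The result is in long format (one row per time slot and cab), whereas the question's reqd_df is wide with decimal-hour times. A minimal sketch of that last reshaping step follows, using a small made-up table with the same column names rather than the real output; the names `long`, `wide` and `hour` are chosen here for illustration only.

```r
library(data.table)

# Toy long-format table with the same columns as the output above.
# The values are made up for illustration.
long <- data.table(
  time   = as.ITime(rep(c("00:00", "00:15", "00:30"), times = 2)),
  cab_id = rep(c("c1", "c2"), each = 3),
  busy   = c(0L, 0L, 1L, 1L, 1L, 0L)
)

# ITime stores seconds since midnight, so dividing by 3600 gives decimal hours
long[, hour := as.integer(time) / 3600]

# One row per quarter hour, one column per cab
wide <- dcast(long, hour ~ cab_id, value.var = "busy")
wide
```

Applied to the full result, the same two steps should give a 96-row table with one column per cab.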

P.S.: I think the OP's required output is missing trip 107.
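For readers unfamiliar with rolling joins, here is a small standalone sketch of how roll="nearest" snaps values onto a grid. It works on plain integer seconds since midnight instead of ITime to keep the example minimal; the table and column names (`grid`, `mark`, `snapped`, `trips`, `t`) are hypothetical.

```r
library(data.table)

# Lookup table of quarter-hour marks, in seconds since midnight (00:00 to 23:45)
grid <- data.table(mark = seq(0L, 23L * 3600L + 45L * 60L, by = 900L))
grid[, snapped := mark]

# Three trip times from the sample data, as seconds since midnight:
# 15:00, 21:27 and 23:47
trips <- data.table(t = c(15L * 3600L,
                          21L * 3600L + 27L * 60L,
                          23L * 3600L + 47L * 60L))

# For each row of `trips`, take the row of `grid` whose `mark` is closest to `t`
res <- grid[trips, snapped, on = .(mark = t), roll = "nearest"]
res / 3600  # decimal hours of the matched quarter-hour marks
```

Here 15:00 stays at 15, 21:27 snaps to 21.5 (21:30) and 23:47 snaps to 23.75 (23:45). Because the grid stops at 23:45, times just before midnight snap down to 23:45 rather than wrapping to 00:00.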
