【Question title】: Cleaning origin-destination data in R
【Posted】: 2018-02-15 07:50:47
【Question description】:

I have trip data like this:

ClientID <- c("45675")
Date <- c("10/10/2016")
PickUpAddress <- c("123 Street", "45 Way", "66 Blvd")
DropOffAddress <- c("45 Way", "66 Blvd", "123 Street")
PickUpTime <- c("08:00", "17:00", "18:00")
DropOffTime <- c("8:30", "17:30", "19:00")

df <- data.frame(ClientID, Date, PickUpAddress, DropOffAddress, PickUpTime, DropOffTime)

df
  ClientID       Date PickUpAddress DropOffAddress PickUpTime DropOffTime
1    45675 10/10/2016    123 Street         45 Way      08:00        8:30
2    45675 10/10/2016        45 Way        66 Blvd      17:00       17:30
3    45675 10/10/2016       66 Blvd     123 Street      18:00       19:00

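The real data has thousands of records per year, and the number of trips differs from client to client.

The third row in this example is a return trip (a trip back to the original origin). I want to remove all return trips from the dataset.

Any suggestions?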

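【Comments】:

  • Do all trips have a return leg? Can a client take multiple trips in one day (or over several days)?
  • Trips do not necessarily have a return leg, and a client can take multiple trips in one day.
  • How do you define the original origin?
  • There is no good way to define the original origin, other than that I would like it to be the pick-up address of the first trip of the day.

Tags: r dplyr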


【Solution 1】:

You can try the following solution, which is based on defining a home address for each client.

library(dplyr)
library(lubridate)

# create date/time format variables
df$Date_PickUpTime <- paste(df$Date, df$PickUpTime, sep = " ")
df$Date_DropOffTime <- paste(df$Date, df$DropOffTime, sep = " ")

df$Date_PickUpTime <- mdy_hm(df$Date_PickUpTime)
df$Date_DropOffTime <- mdy_hm(df$Date_DropOffTime)

str(df) # as you can see Date_PickUpTime and Date_DropOffTime are in POSIXct format

# define the client home address
df %>%
  group_by(ClientID) %>%                 # group by client
  arrange(Date_PickUpTime) %>%           # order the data by Date_PickUpTime
  mutate(HomeAddress = PickUpAddress[1]) # client home address is the first PickUpAddress

# ... then add filter to the above code

df %>%
  group_by(ClientID) %>% # group by client
  arrange(Date_PickUpTime) %>%      # order the data
  mutate(HomeAddress = PickUpAddress[1]) %>% # client home address
  filter(DropOffAddress != HomeAddress) # condition for filter:
                                        # DropOffAddress is different to HomeAddress
                                        # return trip (3rd) is not selected
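
The code above takes one home address per client across the whole dataset. In the comments, the asker describes the origin as the pick-up address of the first trip of the day, so a per-day variant might look like the sketch below. It is only an illustration under stated assumptions: the names `trips`, `PickUpDateTime`, `DayOrigin` and `no_returns` are made up here, the rows for 10/11/2016 are hypothetical data added to show the per-day behaviour, and timestamps are parsed with base `as.POSIXct()` instead of lubridate. `stringsAsFactors = FALSE` keeps the addresses as character vectors, which avoids comparing factors with different level sets on R versions before 4.0.

```r
library(dplyr)

# Rows 1-3 come from the question; rows 4-5 are a hypothetical second day.
trips <- data.frame(
  ClientID       = "45675",
  Date           = c("10/10/2016", "10/10/2016", "10/10/2016",
                     "10/11/2016", "10/11/2016"),
  PickUpAddress  = c("123 Street", "45 Way", "66 Blvd", "45 Way", "9 Road"),
  DropOffAddress = c("45 Way", "66 Blvd", "123 Street", "9 Road", "45 Way"),
  PickUpTime     = c("08:00", "17:00", "18:00", "09:00", "12:00"),
  DropOffTime    = c("8:30", "17:30", "19:00", "9:30", "12:30"),
  stringsAsFactors = FALSE
)

no_returns <- trips %>%
  # combine date and time into a single POSIXct pick-up timestamp
  mutate(PickUpDateTime = as.POSIXct(paste(Date, PickUpTime),
                                     format = "%m/%d/%Y %H:%M", tz = "UTC")) %>%
  group_by(ClientID, Date) %>%                   # one group per client per day
  arrange(PickUpDateTime, .by_group = TRUE) %>%  # order trips within each group
  mutate(DayOrigin = first(PickUpAddress)) %>%   # first pick-up of that day
  filter(DropOffAddress != DayOrigin) %>%        # drop trips ending at the day's origin
  ungroup()

no_returns
```

With this definition, any trip that ends at the day's first pick-up address is treated as a return trip, including one taken in the middle of the day. Whether that matches the intended meaning of "return trip" depends on the data.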
