【Question title】: Fuzzy Join with POSIXct and POSIXt
【Posted】: 2020-11-26 14:18:14
【Question】:
test1 <- structure(list(trip_count = 1:10, pickup_datetime = structure(c(1357019059, 
1357019939, 1357022493, 1357023065, 1357024439, 1357025235, 1357026348, 
1357026924, 1357027562, 1357028863), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), dropoff_datetime = structure(c(1357019158, 1357021384, 
1357023008, 1357024189, 1357024694, 1357025815, 1357026604, 1357027240, 
1357027830, 1357029381), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl", 
"data.frame"))
test2 <- structure(list(DATE = structure(c(1357001460, 1357005060, 1357008660, 
1357012260, 1357015860, 1357019460, 1357023060, 1357026660, 1357030260, 
1357033860), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    HourlyDryBulbTemperature = c(39L, 38L, 39L, 39L, 39L, 39L, 
    39L, 38L, 39L, 39L), HourlyPrecipitation = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0)), row.names = c(NA, 10L), class = "data.frame")

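Hi everyone, I have two data frames that I want to join on date and time. It should be a fuzzy join, so that a row matches whenever DATE from test2 falls between pickup_datetime and dropoff_datetime from test1. I tried: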

test1 <- fuzzy_left_join(test1,test2,by = c("DATE" = "pickup_datetime", "DATE" = "dropoff_datetime"),match_fun = list(`>=`, `<=`))

But this returns: Error: All columns in a tibble must be vectors. Column "col" is NULL.

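Update: I found a solution. It joins on the calendar date and the hour of day rather than on the exact interval:

library(stringr)    # str_split_fixed()
library(lubridate)  # hms(), hour()
library(dplyr)      # left_join()

dropoff_data <- str_split_fixed(test1$dropoff_datetime, " ", 2)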
colnames(dropoff_data) <- c("join_date","dropoff_time")
test1 <- cbind.data.frame(test1,dropoff_data)
test1$join_time <- hour(hms(as.character(test1$dropoff_time)))
rm(dropoff_data)

dropoff_data <- str_split_fixed(test2$DATE, " ", 2)
colnames(dropoff_data) <- c("join_date","time")
test2 <- cbind.data.frame(test2,dropoff_data)
test2$join_time <- hour(hms(as.character(test2$time)))
rm(dropoff_data)

test1 <- left_join(test1,test2,by = c("join_date","join_time"))

Thanks, everyone!
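A possible shortcut for the update above, sketched in base R: instead of splitting the timestamps into date and time strings, both columns can be truncated to the start of their hour and joined on that single key. The names to_utc, trips, weather and join_hour are made up for this sketch; the rows are the first three dropoffs from test1 and three readings from test2.

```r
# Helper for this sketch only: epoch seconds -> POSIXct in UTC.
to_utc <- function(secs) as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# First three dropoffs from test1.
trips <- data.frame(
  trip_count = 1:3,
  dropoff_datetime = to_utc(c(1357019158, 1357021384, 1357023008))
)

# Three hourly readings from test2.
weather <- data.frame(
  DATE = to_utc(c(1357019460, 1357023060, 1357026660)),
  HourlyDryBulbTemperature = c(39L, 39L, 38L)
)

# Truncate both timestamps to the hour; this key stands in for
# join_date + join_time in the update.
trips$join_hour   <- as.POSIXct(trunc(trips$dropoff_datetime, units = "hours"))
weather$join_hour <- as.POSIXct(trunc(weather$DATE, units = "hours"))

# all.x = TRUE keeps every trip, like left_join().
joined <- merge(trips, weather, by = "join_hour", all.x = TRUE)
joined <- joined[order(joined$trip_count), ]
print(joined)
```

This keeps the columns as POSIXct throughout, so the time zone handling stays in one place. Like the update, it matches on the hour of the dropoff, not on the pickup-to-dropoff interval.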

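【Comments】:

  • Maybe you need to reverse your column names... Try: fuzzy_left_join(test1, test2, by = c("pickup_datetime" = "DATE", "dropoff_datetime" = "DATE"), match_fun = list(`<=`, `>=`))
  • Hi, thanks for the reply. Unfortunately the error message comes up again either way. It looks like the error may be somewhere else...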
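Independently of the error message, it may be worth checking what the interval condition would match at all. The base R sketch below uses the first four trips of test1 and the two test2 readings closest to them; to_utc, trips, weather and inside are names invented for the sketch. On these rows each hourly reading falls in a gap between two consecutive trips, so a strict "DATE between pickup and dropoff" condition matches nothing.

```r
# Helper for this sketch only: epoch seconds -> POSIXct in UTC.
to_utc <- function(secs) as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# First four trips from test1.
trips <- data.frame(
  trip_count = 1:4,
  pickup_datetime  = to_utc(c(1357019059, 1357019939, 1357022493, 1357023065)),
  dropoff_datetime = to_utc(c(1357019158, 1357021384, 1357023008, 1357024189))
)

# The two test2 readings nearest to those trips.
weather <- data.frame(
  DATE = to_utc(c(1357019460, 1357023060)),
  HourlyDryBulbTemperature = c(39L, 39L)
)

# Compare as plain numbers (seconds since the epoch).
pickup  <- as.numeric(trips$pickup_datetime)
dropoff <- as.numeric(trips$dropoff_datetime)
reading <- as.numeric(weather$DATE)

# inside[i, j] is TRUE when reading j lies within trip i's interval.
inside <- outer(pickup, reading, `<=`) & outer(dropoff, reading, `>=`)

print(inside)
print(rowSums(inside))  # number of matching readings per trip
```

If the full data behaves like these rows, a left join on the interval would return only NA for the weather columns, which might be a reason to match on the hour instead.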

Tags: r datetime tibble fuzzyjoin


【Solution 1】:

Maybe something like this can help you:

library(data.table)
setDT(test1)
setDT(test2)
t <- test1[, c(.SD, as.list(test2)), by = 1:nrow(test1)]
t[DATE >= pickup_datetime & DATE <= dropoff_datetime]

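【Comments】:

  • Thanks! This solution does join the two tables, but not on the right rows. The result doesn't really make sense.
  • It is just a Cartesian product, and the last line keeps only the rows where DATE lies between pickup_datetime and dropoff_datetime. I don't understand why you think it doesn't make sense?
  • Maybe you want to store the result in the table t; if so, add t <- at the start of the last line.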
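As an alternative to building the full Cartesian product, data.table can evaluate the interval condition directly with a non-equi join. The sketch below is not the answerer's code; it uses hypothetical toy rows (not the question's data) so that one trip has a match and one does not, and the names to_utc, trips, weather, temp and reading_time are invented here. In a non-equi join the join column in the result takes its values from the other table, so DATE is copied to reading_time first.

```r
library(data.table)

# Helper for this sketch only: epoch seconds -> POSIXct in UTC.
to_utc <- function(secs) as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Hypothetical toy rows: trip 1 contains one reading, trip 2 contains none.
trips <- data.table(
  trip_count = 1:2,
  pickup_datetime  = to_utc(c(1000, 5000)),
  dropoff_datetime = to_utc(c(2000, 6000))
)
weather <- data.table(
  DATE = to_utc(c(1800, 3600)),
  temp = c(39L, 38L)
)

# Keep a copy of the reading time, because the join column DATE is
# filled with the interval bounds in the join result.
weather[, reading_time := DATE]

# For every trip, look up readings with pickup <= DATE <= dropoff.
# Trips without a match are kept with NA (left-join behaviour).
res <- weather[trips, on = .(DATE >= pickup_datetime, DATE <= dropoff_datetime)]
res <- res[, .(trip_count, reading_time, temp)]
print(res)
```

With the question's tables, the analogous call after setDT() would be test2[test1, on = .(DATE >= pickup_datetime, DATE <= dropoff_datetime)].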