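【Posted】: 2022-01-14 16:51:46
【Question】:
I have two datasets that I want to merge in R with inner_join. The problem is that the second dataset contains date ranges, and I want to keep that information. How can I match the dates in the first dataset against the date ranges in the second one? A working example is below.
Many thanks.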
library(data.table)
library(dplyr)
# First dataset
dt_1 <- data.table(city    = c("madrid", "milan", "milan", "paris", "Rome"),
                   address = c("a", "a", "b", "c", "d"),
                   date_1  = c("2017", "2013", "2008", "1901", "2009"))
dt_1
# Second dataset
dt_2 <- data.table(city    = c("milan", "madrid", "Porto", "Barcelona", "Rome"),
                   address = c("a", "a", "b", "c", "d"),
                   date_1  = c("2012", "2016", "2006", "1900", "2009"),
                   date_2  = c("2015", "NA", "2022", "1930", "NA"))
dt_2
## How can I match rows whose dates are identical in both datasets, but ALSO
## rows whose dt_1 date falls within the dt_2 range (date_1 to date_2)?
# This keeps a row only when date_1 is exactly the same in both datasets
dt_match <- inner_join(dt_1, dt_2, by = c("city","address","date_1"), keep = TRUE)
# How can I get this result instead?
dt_match <- data.table(city    = c("milan", "Rome"),
                       address = c("a", "d"),
                       date    = c("2013", "2009"))
dt_match
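For comparison, here is a minimal sketch (not code from the thread) of one way to get the desired result with dplyr. It assumes the ranges are inclusive, that a missing date_2 (the string "NA") means the range covers only the date_1 year, and that city and address must match exactly; the column names `start`, `end` and `date` are illustrative. It joins on the exact keys first and then keeps only the rows whose year lies inside the range.

```r
library(dplyr)

# Same example data as in the question, built as plain data frames
dt_1 <- data.frame(city    = c("madrid", "milan", "milan", "paris", "Rome"),
                   address = c("a", "a", "b", "c", "d"),
                   date_1  = c("2017", "2013", "2008", "1901", "2009"))
dt_2 <- data.frame(city    = c("milan", "madrid", "Porto", "Barcelona", "Rome"),
                   address = c("a", "a", "b", "c", "d"),
                   date_1  = c("2012", "2016", "2006", "1900", "2009"),
                   date_2  = c("2015", "NA", "2022", "1930", "NA"))

# Turn each dt_2 row into an integer [start, end] range;
# a missing end year ("NA" -> NA) collapses the range to the start year
ranges <- dt_2 %>%
  mutate(start = as.integer(date_1),
         end   = coalesce(suppressWarnings(as.integer(date_2)), start)) %>%
  select(city, address, start, end)

# Join on the exact keys, then keep rows whose year lies in the range
dt_match <- dt_1 %>%
  mutate(date = as.integer(date_1)) %>%
  inner_join(ranges, by = c("city", "address")) %>%
  filter(date >= start, date <= end) %>%
  select(city, address, date)

dt_match
```

With these data the result has two rows, milan/a/2013 and Rome/d/2009. In dplyr 1.1.0 and later, the range condition can also be written inside the join with `join_by()` and `between()` instead of filtering afterwards.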
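【Discussion】:
- (1) Your values need to be numbers, not strings. Why? "222" >= "2017" is TRUE because strings are compared lexicographically. Fix this with as.integer (or as.numeric).
(2) This is not a join: if it were, dt_1[2,] would find a match in dt_2[3,]. It looks like Waldi is right that this is just a row-by-row comparison.
(3) If it were a join, however, then after fixing the numbers a starting point could be dt_1[dt_2, date_2 := i.date_2, on = .(date_1 >= date_1, date_1 <= date_2)].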
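Comment (3) can be extended into a full non-equi join that also matches on city and address. The sketch below is an illustration under assumptions, not code from the thread: it converts the years to integers as suggested in (1), treats a missing date_2 as a one-year range, and introduces the illustrative columns `date`, `start` and `end`. In data.table's `on =` clause, the left side of each condition names a column of the outer table (dt_2 here) and the right side a column of `i` (dt_1).

```r
library(data.table)

# Same example data as in the question
dt_1 <- data.table(city    = c("madrid", "milan", "milan", "paris", "Rome"),
                   address = c("a", "a", "b", "c", "d"),
                   date_1  = c("2017", "2013", "2008", "1901", "2009"))
dt_2 <- data.table(city    = c("milan", "madrid", "Porto", "Barcelona", "Rome"),
                   address = c("a", "a", "b", "c", "d"),
                   date_1  = c("2012", "2016", "2006", "1900", "2009"),
                   date_2  = c("2015", "NA", "2022", "1930", "NA"))

# Compare years as integers, not strings (see comment (1))
dt_1[, date := as.integer(date_1)]
dt_2[, `:=`(start = as.integer(date_1),
            end   = suppressWarnings(as.integer(date_2)))]  # "NA" becomes NA
# Treat a missing end year as a single-year range
dt_2[is.na(end), end := start]

# Non-equi join: same city and address, and start <= date <= end.
# nomatch = NULL drops dt_1 rows without a matching range;
# i.date explicitly takes the year from dt_1.
dt_match <- dt_2[dt_1,
                 .(city, address, date = i.date),
                 on = .(city, address, start <= date, end >= date),
                 nomatch = NULL]

dt_match
```

Because start and end are equal for the "NA" rows, the exact-match case (Rome, 2009) and the range case (milan, 2013) are handled by the same join.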
Tags: r data.table