【Question title】: Merge dataframes based on date criteria in R
【Posted】: 2021-02-18 18:01:34
【Question description】:

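I have two data frames (say DF1 and DF2) that I want to merge on multiple criteria. If the 'state' and 'city' of DF1 match those of DF2, and the 'date' of DF2 is within four years of the 'date' of DF1, I want to add the 'margin' column of DF2 to DF1. If the conditions are not met, the 'margin' value in DF1 should be NA.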

DF1 <- structure(list(date = c("2001-02-14", "2001-06-14", "2004-03-31", 
"2003-03-11", "2003-06-29"), state = c("DE", "NY", "NY", "NY", 
"AZ"), city = c("Wilmington", "New York", "Buffalo", "New York", 
"Phoenix"), industry = c("Retail", "Computers and Software", 
"Manufacturing (Misc.)", "Healthcare and Medical", "Construction and Supplies"
), SIC = c(5331, 3571, 2541, 8063, 2421)), row.names = c(2937L, 
2817L, 2117L, 2298L, 2228L), class = "data.frame")

DF2 <- structure(list(date = c("2000-11-07", "2000-11-07", "2008-11-04", 
"2000-11-07", "2000-11-07", "2008-11-04", "2004-11-02", "2004-11-02", 
"2008-11-04", "2012-11-06"), state = c("MA", "NY", "OH", "VA", 
"CA", "DE", "NY", "NY", "NY", "AZ"), city = c("Boston", "New York", 
"Cleveland", "Richmond", "Los Angeles", "Wilmington", "New York", 
"Buffalo", "New York", "Phoenix"), margin = c(-3.61895488477766, -41.5805022156573, -40.2049010106604, 
24.8839947364776, 17.2042747593408, -55.4514285714286, -35.5094126201826, 
-61.9743406985032, -39.9718177548145, 7.47655435915248)), row.names = c(9849L, 
10041L, 29268L, 11941L, 7365L, 31116L, 13227L, 17397L, 23352L, 
32571L), class = "data.frame")

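【Question comments】:

  • In your example, all the dates are within 4 years.
  • Also, does the column 'merge' refer to 'margin'? Both points need clarifying.
  • Thanks @akrun. My data has many observations, so I had to draw a random sample. In this random sample the dates may all fall within 4 years, but in general they do not.
  • Thanks @EJJ. I fixed the typo.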

Tags: r dataframe merge


【Solution 1】:

Something like this? It depends on the interval you want.

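    library(lubridate)
    library(fuzzyjoin)

    # Parse the character dates into Date objects
    DF1$date <- ymd(DF1$date)
    DF2$date <- ymd(DF2$date)

    # Window that starts at each DF2 date and runs four years forward;
    # use DF2$date - years(4) as the start for a window on both sides
    DF2$interval <- interval(DF2$date, DF2$date + years(4))

    # Keep every DF1 row; attach DF2 columns where city and state match
    # and the DF1 date falls inside the DF2 interval
    fuzzy_left_join(DF1, DF2,
                    by = c("city" = "city",
                           "state" = "state",
                           "date" = "interval"),
                    match_fun = list(`==`, `==`, `%within%`))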
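For comparison, here is a minimal base R sketch of the same idea that avoids fuzzyjoin. It is not part of the original answer and rests on some assumptions: it uses a three-row subset of the question's data, it reads "within four years" as a window on both sides of the DF1 date, and it approximates four years as 4 * 365.25 days. The names `row_id`, `cand`, `window_days` and `result` are made up for illustration.

```r
# Three-row stand-ins for DF1 and DF2, taken from the question's sample data
DF1 <- data.frame(
  date  = as.Date(c("2001-02-14", "2004-03-31", "2003-06-29")),
  state = c("DE", "NY", "AZ"),
  city  = c("Wilmington", "Buffalo", "Phoenix"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  date   = as.Date(c("2008-11-04", "2004-11-02", "2012-11-06")),
  state  = c("DE", "NY", "AZ"),
  city   = c("Wilmington", "Buffalo", "Phoenix"),
  margin = c(-55.4514285714286, -61.9743406985032, 7.47655435915248),
  stringsAsFactors = FALSE
)

# Hypothetical helper column so DF1 rows can be matched back after filtering
DF1$row_id <- seq_len(nrow(DF1))

# Step 1: all state/city matches, ignoring dates for now
cand <- merge(DF1, DF2, by = c("state", "city"), suffixes = c("", ".df2"))

# Step 2: keep pairs whose dates are at most about four years apart
# (assumption: 4 * 365.25 days, counted in either direction)
window_days <- 4 * 365.25
keep <- abs(as.numeric(cand$date.df2 - cand$date)) <= window_days
cand <- cand[keep, c("row_id", "margin")]

# Step 3: left join back to DF1 so rows without a match get margin = NA
result <- merge(DF1, cand, by = "row_id", all.x = TRUE)
result <- result[order(result$row_id), ]
result$row_id <- NULL

print(result)
```

As with the fuzzy join, a DF1 row that matches several DF2 rows inside the window appears once per match, so deduplicate afterwards if only one margin per row is wanted.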

【Comments】:

  • Thanks @gravertje. With a slight correction, it does work.