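【Question title】: Merge dataframes by date columns within range from one another
【Posted】: 2018-07-07 14:50:49
【Question description】:

I want to merge df1 (which contains patients' treatment intervals) with df2 (which contains lab values) by patient ID and date, so that the date of each lab value falls within 5 days (before or after) of the medication start date. See below.

For df1: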

ID = c(2, 2, 2, 2, 3, 5) 
Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil") 
Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017")
Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017")
df1 = data.frame(ID, Medication, Start.Date, Stop.Date) 

  ID Medication Start.Date  Stop.Date
   2    aspirin 05/01/2017 05/30/2017
   2    tylenol 05/01/2017 05/15/2017
   3    lipitor 05/06/2017 05/18/2017
   5      advil 05/28/2017 06/13/2017

For df2:

ID = c(2,2,2,3,3,5)
Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/2017", "05/15/2017")
Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3)
df2 = data.frame(ID, Lab.date, Lab.wbc)

  ID   Lab.date Lab.wbc
   2 04/30/2017     5.4
   2 05/03/2017     3.2
   2 05/15/2017     7.1
   3 05/05/2017     6.0
   3 05/18/2017    10.8
   5 05/15/2017    11.3

The merge should produce the following, where Lab.date is within 5 days before or after the medication's Start.Date:

   ID Medication Start.Date Stop.Date  Lab.date   Lab.wbc
   2    aspirin  05/01/2017 05/30/2017 04/30/2017 5.4
   2    aspirin  05/01/2017 05/30/2017 05/03/2017 3.2
   2    tylenol  05/01/2017 05/15/2017 04/30/2017 5.4
   2    tylenol  05/01/2017 05/15/2017 05/03/2017 3.2
   3    lipitor  05/06/2017 05/18/2017 05/05/2017 6.0
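
A minimal sketch of the ± 5 day rule for a single start date, assuming the dates are strings in month/day/four-digit-year form as in the tables above. The names `start`, `labs`, `day_diff`, and `within_window` are illustrative only and do not come from the question.

```r
# Parse the month/day/year strings into Date objects
start <- as.Date("05/01/2017", format = "%m/%d/%Y")
labs  <- as.Date(c("04/30/2017", "05/03/2017", "05/15/2017"), format = "%m/%d/%Y")

# Absolute distance in days between each lab date and the start date
day_diff <- abs(as.numeric(labs - start))

# TRUE where the lab date is within 5 days before or after the start date
within_window <- day_diff <= 5

print(data.frame(Lab.date = labs, day_diff, within_window))
```

Here the first two lab dates are 1 and 2 days from the start date and pass the check, while the third is 14 days away and does not.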

【Comments】:

Tags: r dataframe merge date-range


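【Solution 1】:

Here is one possible solution. Note that the final data frame contains additional matching rows that the expected output at the end of your question does not account for.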

library(dplyr)

# reproducing your setup
ID = c(2, 2, 2, 2, 3, 5) 
Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil") 
Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017")
Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017")
df1 = data.frame(ID, Medication, Start.Date, Stop.Date) 

ID = c(2,2,2,3,3,5)
Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/2017", "05/15/2017")
Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3)
df2 = data.frame(ID, Lab.date, Lab.wbc)

# having a full join by patient ID
full_df <- full_join(df1, df2, by = "ID")

# note: the correct result has more rows than the expected output given in the question
result <- full_df %>%
  # including the day difference for your reference
  mutate(Day.diff = abs(as.Date(Start.Date, "%m/%d/%Y") - as.Date(Lab.date, "%m/%d/%Y"))) %>%
  # filtering the data frame to keep the difference within 5 days
  filter(Day.diff <= 5)
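
For comparison, the same join-then-filter idea can be sketched in base R without dplyr. This is an assumption-laden alternative, not the answer's method: `merged` and `result_base` are illustrative names, `merge()` performs an inner join on ID rather than a full join, and the fifth lab date is written with a four-digit year so that it parses with `%Y`.

```r
# Same data as defined in the question's code
df1 <- data.frame(
  ID = c(2, 2, 2, 2, 3, 5),
  Medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil"),
  Start.Date = c("05/01/2017", "05/05/2017", "06/20/2017", "05/01/2017", "05/06/2017", "05/28/2017"),
  Stop.Date = c("05/04/2017", "05/10/2017", "06/27/2017", "05/15/2017", "05/12/2017", "06/13/2017")
)
df2 <- data.frame(
  ID = c(2, 2, 2, 3, 3, 5),
  Lab.date = c("04/30/2017", "05/03/2017", "05/15/2017", "05/05/2017", "05/18/2017", "05/15/2017"),
  Lab.wbc = c(5.4, 3.2, 7.1, 6.0, 10.8, 11.3)
)

# Pair every treatment row with every lab row of the same patient
merged <- merge(df1, df2, by = "ID")

# Absolute number of days between the medication start and the lab date
merged$Day.diff <- abs(as.numeric(
  as.Date(merged$Start.Date, format = "%m/%d/%Y") -
    as.Date(merged$Lab.date, format = "%m/%d/%Y")
))

# Keep only the pairs whose dates are at most 5 days apart
result_base <- merged[merged$Day.diff <= 5, ]
print(result_base)
```

With the data as defined in the code (six treatment rows), this keeps 7 of the 15 ID-matched pairs: two more than the five rows listed in the question, because the aspirin interval starting 05/05/2017 also lies within 5 days of the 04/30/2017 and 05/03/2017 lab dates.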

【Discussion】: