【Question title】: In R, how do I calculate the difference between dates when a condition is met?
【Posted】: 2020-06-15 20:11:01
【Question】:

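I have two data frames (df1 and df2) that contain the start and end dates of certain events. I have identified which dates have overlapping events, defined here as a start date in df1 falling within the start and end dates of an event in df2. Rows with an overlap are flagged TRUE and rows without one are flagged FALSE. What I would like to know is: when Overlap is TRUE, how do I calculate the difference between the start dates in df2 and df1?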

> df1$aa
    date_start  date_end    Site
1   2002-04-14  2002-04-21  aa
2   2002-06-26  2002-07-05  aa
3   2002-08-15  2002-08-20  aa
4   2004-05-12  2004-05-19  aa
> df2$bb
    date_start  date_end    Site
1   2002-04-13  2002-04-19  bb
2   2002-08-11  2002-08-19  bb
3   2005-06-09  2005-06-14  bb
4   2005-08-10  2005-08-14  bb

This code determines whether there is an overlap:

df1$aa$Overlap <- df1$aa$date_start %in% unlist(Map(':', df2$bb$date_start, df2$bb$date_end))
> df1$aa
    date_start  date_end    Site    Overlap
1   2002-04-14  2002-04-21  aa      TRUE
2   2002-06-26  2002-07-05  aa      FALSE
3   2002-08-15  2002-08-20  aa      TRUE
4   2004-05-12  2004-05-19  aa      FALSE
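
A minimal sketch of what the overlap check above does, using standalone vectors rather than the nested `df1$aa` / `df2$bb` objects. The names `a_start`, `b_start`, `b_end`, `days_covered`, `overlap_in` and `overlap_cmp` are illustrative and not from the post. `Map(':', ...)` expands every df2 interval into a run of day counts; the same test can be written with direct comparisons, which avoids depending on how `%in%` coerces a Date against integers (this may vary with the R version).

```r
# Illustrative vectors copied from the tables above (names are hypothetical)
a_start <- as.Date(c("2002-04-14", "2002-06-26", "2002-08-15", "2004-05-12"))
b_start <- as.Date(c("2002-04-13", "2002-08-11", "2005-06-09", "2005-08-10"))
b_end   <- as.Date(c("2002-04-19", "2002-08-19", "2005-06-14", "2005-08-14"))

# Each interval becomes a run of day counts since 1970-01-01
days_covered <- unlist(Map(`:`, as.integer(b_start), as.integer(b_end)))

# Test 1: membership in the expanded day counts (integer vs integer)
overlap_in <- as.integer(a_start) %in% days_covered

# Test 2: direct comparison of each start date against every interval
overlap_cmp <- vapply(
  seq_along(a_start),
  function(i) any(a_start[i] >= b_start & a_start[i] <= b_end),
  logical(1)
)

print(overlap_in)
print(identical(overlap_in, overlap_cmp))
```

Both tests flag rows 1 and 3 for this data. The comparison form also avoids materialising every day of every interval, which can matter for long date ranges.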

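You can see that there are two events (rows 1 and 3) where Overlap is TRUE. When Overlap equals TRUE, I want to determine the time difference (Diff) in date_start between df1 and df2.

The result I am looking for should look like this: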

    date_start  date_end    Site    Overlap   Diff
1   2002-04-13  2002-04-21  aa      TRUE      1
2   2002-08-13  2002-08-20  aa      TRUE      4

【Comments】:

    Tags: r dataframe date


    【Solution 1】:

    This should solve your problem, using nested for loops.

    # Setup df1
    df1 <- read.table(textConnection(
      '    date_start  date_end    Site
    1   2002-04-14  2002-04-21  aa
    2   2002-06-26  2002-07-05  aa
    3   2002-08-15  2002-08-20  aa
    4   2004-05-12  2004-05-19  aa'
    ))
    df1$date_start <- as.Date(df1$date_start)
    df1$date_end <- as.Date(df1$date_end)
    
    # Setup df2
    df2 <- read.table(textConnection(
      '    date_start  date_end    Site
    1   2002-04-13  2002-04-19  bb
    2   2002-08-11  2002-08-19  bb
    3   2005-06-09  2005-06-14  bb
    4   2005-08-10  2005-08-14  bb'
    ))
    df2$date_start <- as.Date(df2$date_start)
    df2$date_end <- as.Date(df2$date_end)
    
    
    # Find overlap of dates (as.integer() so that day counts are matched against day counts)
    df1$Overlap <- as.integer(df1$date_start) %in% unlist(Map(':', df2$date_start, df2$date_end))
    
    
    # Loop through rows
    for (i in 1:nrow(df1)) {
    
      # Go through only those that overlap
      if (df1[i, "Overlap"]) {
    
        # Loop through all rows in other data frame
        for (j in 1:nrow(df2)) {
    
          # Check if the df1 start date falls within this df2 date range
          sec_date_range <- df2[j, "date_start"]:df2[j, "date_end"]
          if (as.integer(df1[i, "date_start"]) %in% sec_date_range) {
    
            # Find absolute difference in start dates
            df1[i, "diff"] <- df1[i, "date_start"] - df2[j, "date_start"]
            df1[i, "diff"] <- abs(df1[i, "diff"])
          }
        }
      }
    }
    
    # Filter and print result
    df1[df1$Overlap, ]
    #>   date_start   date_end Site Overlap    diff
    #> 1 2002-04-14 2002-04-21   aa    TRUE  1 days
    #> 3 2002-08-15 2002-08-20   aa    TRUE  4 days
    

    Created on 2020-06-15 by the reprex package (v0.3.0)
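
As a possible alternative to the nested loops, here is a minimal vectorized sketch on the same data. It is not the answer author's method: the helper names `inside`, `hit` and `result` are illustrative, and the data frames are rebuilt with `data.frame()` so the block runs on its own. It assumes each df1 start date falls inside at most one df2 interval; if several match, it keeps the first one, whereas the loop above keeps the last.

```r
# Same data as above, rebuilt so this block is self-contained
df1 <- data.frame(
  date_start = as.Date(c("2002-04-14", "2002-06-26", "2002-08-15", "2004-05-12")),
  date_end   = as.Date(c("2002-04-21", "2002-07-05", "2002-08-20", "2004-05-19")),
  Site       = "aa"
)
df2 <- data.frame(
  date_start = as.Date(c("2002-04-13", "2002-08-11", "2005-06-09", "2005-08-10")),
  date_end   = as.Date(c("2002-04-19", "2002-08-19", "2005-06-14", "2005-08-14")),
  Site       = "bb"
)

# Rows = df1 events, columns = df2 events; TRUE where the df1 start date
# lies inside the df2 interval (compared as day counts)
inside <- outer(as.numeric(df1$date_start), as.numeric(df2$date_start), `>=`) &
          outer(as.numeric(df1$date_start), as.numeric(df2$date_end),   `<=`)

# Index of the first matching df2 interval for each df1 row (NA if none)
hit <- apply(inside, 1, function(r) if (any(r)) which(r)[1] else NA_integer_)

df1$Overlap <- !is.na(hit)

# Absolute difference in start dates, in whole days (NA where no overlap)
df1$Diff <- as.integer(abs(df1$date_start - df2$date_start[hit]))

result <- df1[df1$Overlap, ]
print(result)
```

For this data, `result` holds rows 1 and 3 with `Diff` values of 1 and 4 days. The `outer()` step builds an nrow(df1) by nrow(df2) matrix, so it suits small to moderate tables.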
