【Question Title】: Extracting the nearest date prior to a fixed date in R
【Posted】: 2017-10-13 05:50:07
【Question】:

I have a dataset, df, as follows:

df <- read.table(text = "
                 ID INDEX_DATE  DATE    VALUE
                 1  14/06/2017  16/02/2015  7
                 1  14/06/2017  16/02/2015  6.5
                 1  14/06/2017  21/07/2015  7
                 1  14/06/2017  08/09/2015  9.5
                 1  14/06/2017  04/12/2015  8.9
                 1  14/06/2017  28/01/2016  8.4
                 1  14/06/2017  30/03/2016  8.2
                 1  14/06/2017  25/05/2016  7.6
                 1  14/06/2017  17/08/2016  8.2
                 1  14/06/2017  16/11/2016  8
                 1  14/06/2017  18/01/2017  8.8
                 1  14/06/2017  24/03/2017  9.1
                 1  14/06/2017  14/06/2017  7.9
                 1  14/06/2017  14/06/2017  8.0
                 2  10/01/2017  26/03/2015  7.4
                 2  10/01/2017  26/03/2015  7.7
                 2  10/01/2017  14/04/2016  7
                 2  10/01/2017  03/04/2017  7.6
                 3  14/01/2015  06/05/2015  7.2
                 3  14/01/2015  29/07/2015  7.1
                 3  14/01/2015  18/11/2015  7.2
                 3  14/01/2015  17/02/2016  7.3
                 3  14/01/2015  03/08/2016  7.3
                 4  08/12/2016          NA  NA
                 ", header = TRUE, stringsAsFactors = FALSE)

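For each ID, I want to extract the VALUE that corresponds to the DATE closest to, and on or before, INDEX_DATE. If there is no DATE on or before INDEX_DATE, the nearest DATE after INDEX_DATE should be used instead.

There are 3 cases:

  1. ID == 1: there are only DATEs before (or on) INDEX_DATE
  2. ID == 2: there are DATEs both before and after INDEX_DATE
  3. ID == 3: there are only DATEs after INDEX_DATE

If the chosen DATE has more than one VALUE, I am happy to pick any one of them at random.

My desired output: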

df2 <- read.table(text = "
                 ID INDEX_DATE  DATE    VALUE
                 1  14/06/2017  14/06/2017  7.9
                 2  10/01/2017  14/04/2016  7
                 3  14/01/2015  06/05/2015  7.2
                 4  08/12/2016          NA  NA
                 ", header = TRUE, stringsAsFactors = FALSE)

I tried writing my own code, but it does not work for the second case (i.e. ID == 2):

library(lubridate); library(dplyr)
df2 <- df %>%
  mutate_at(vars(INDEX_DATE, DATE), funs(dmy)) %>%
  mutate(DATEDIFF = (INDEX_DATE - DATE)) %>%
  group_by(ID) %>%
  mutate(PRIORPOST = if_else(any(DATEDIFF >= 0), "PRIOR", "POST_ONLY"),
         CHOSEN_VALUE = if_else(PRIORPOST == "PRIOR" & DATEDIFF >= 0 & DATEDIFF == min(DATEDIFF), VALUE, 
                        if_else(PRIORPOST == "POST_ONLY" & DATEDIFF < 0 & DATEDIFF == max(DATEDIFF), VALUE, NA_real_))) %>%
  filter(!is.na(CHOSEN_VALUE))

【Comments on the question】:

    Tags: r dplyr data-manipulation lubridate


    【Solution 1】:

    Try this; I hope it helps:

    df$INDEX_DATE <- as.Date(df$INDEX_DATE, format = "%d/%m/%Y")
    df$DATE <- as.Date(df$DATE, format = "%d/%m/%Y")
    # Signed gap in days: >= 0 means DATE is on or before INDEX_DATE.
    df$gap <- as.numeric(difftime(df$INDEX_DATE, df$DATE, units = "days"))

    output <- NULL
    for (id in unique(df$ID)) {
      temp <- subset(df, ID == id)
      prior <- temp[!is.na(temp$gap) & temp$gap >= 0, ]
      post <- temp[!is.na(temp$gap) & temp$gap < 0, ]
      if (nrow(prior) > 0) {
        temp <- prior[prior$gap == min(prior$gap), ] # Nearest DATE on or before INDEX_DATE.
      } else if (nrow(post) > 0) {
        temp <- post[post$gap == max(post$gap), ] # Otherwise the nearest DATE after it.
      } # IDs with no DATE at all keep their NA row.
      if (nrow(temp) > 1) temp <- temp[sample(nrow(temp), 1), ] # Randomly pick 1 row on ties.
      temp$gap <- NULL # Remove the helper column.
      output <- rbind(output, temp) # Combine each ID's chosen row.
    }
    output
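
The selection rule from the question (nearest DATE on or before INDEX_DATE, otherwise nearest DATE after it) could also be sketched with dplyr, which the question is tagged with. This is only a sketch under a few assumptions: the helper column `GAP` and the name `result` are made up for illustration, the data is a reduced copy of the question's table, and ties are resolved by taking the first row rather than a random one (the question allows either).

```r
library(dplyr)

# Reduced copy of the question's data (a few rows per ID), for illustration only.
df <- read.table(text = "
ID INDEX_DATE DATE VALUE
1 14/06/2017 24/03/2017 9.1
1 14/06/2017 14/06/2017 7.9
1 14/06/2017 14/06/2017 8.0
2 10/01/2017 26/03/2015 7.4
2 10/01/2017 14/04/2016 7
2 10/01/2017 03/04/2017 7.6
3 14/01/2015 06/05/2015 7.2
3 14/01/2015 29/07/2015 7.1
4 08/12/2016 NA NA
", header = TRUE, stringsAsFactors = FALSE)

result <- df %>%
  mutate(INDEX_DATE = as.Date(INDEX_DATE, format = "%d/%m/%Y"),
         DATE = as.Date(DATE, format = "%d/%m/%Y"),
         # Hypothetical helper: gap in days, >= 0 when DATE is on or before INDEX_DATE.
         GAP = as.numeric(INDEX_DATE - DATE)) %>%
  group_by(ID) %>%
  # Within each ID: prior dates first, then later dates, nearest first in
  # both cases. Rows with a missing DATE sort last.
  arrange(GAP < 0, abs(GAP), .by_group = TRUE) %>%
  slice(1) %>%      # keep the best-ranked row per ID
  ungroup() %>%
  select(-GAP)

result
```

Sorting on `GAP < 0` before `abs(GAP)` is what makes a prior date win over a closer later date, which is the ID == 2 case. An ID whose only row has a missing DATE keeps that row.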
    

    【Comments】:
