【Question title】:Is there an R function that can calculate the days since a condition of another column?
【Posted】:2022-01-27 08:15:38
【Question】:

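I want to mutate the original df to add a column that counts the days since the last math ("M") absence. I would like the first occurrence of a student's math absence to be NA, and if there has not been any math absence before a row, I want the value to be Inf.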

I ordered the df by date and then wrote this line of code:

df %>% group_by(Student_ID) %>% mutate(dayssinceM = ifelse(Subject == "M", c(NA, diff(Absent_Date)), Inf))

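This works until a student has a math absence later in the data. I tried adding another ifelse statement: ifelse(lag(Subject == "M", c(NA, diff(Absent_Date)), Inf)), but it only works when the immediately preceding absence was math. I want it to work whenever the student has any earlier math absence. I was thinking of maybe adding rollapply somehow. I would love to hear your thoughts and get some help.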

Original df:

 Student_ID      Absent_Date       Subject

    4567           08/30/2018          M
    4567           09/22/2019          M
    8345           09/01/2019          SS
    8345           03/30/2019          S         
    8345           07/18/2017          S
    8345           01/08/2019          M

Here is the desired output:

 Student_ID       Absent_Date       Subject         dayssinceM            

    4567           08/30/2018          M                 NA
    4567           09/22/2019          M                 388
    8345           07/18/2017          S                 Inf
    8345           01/08/2019          M                 NA        
    8345           03/30/2019          S                 81
    8345           09/01/2019         SS                 236

【Comments】:

    Tags: r date dplyr


    【Solution 1】:

    This is not a very elegant approach, but you can build the answer with the following joins:

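    library(tidyverse)
    
    df <- data.frame(
      Student_ID = c(rep(4567,2), rep(8345,4)),
      # store dates as Date so comparisons and differences are date-aware
      Absent_Date = as.Date(c("2018-08-30","2019-09-22","2019-09-01","2019-03-30","2017-07-18","2019-01-08")),
      Subject = c("M","M","SS","S","S","M")
    )
    
    # every math absence, renamed so it can be joined back onto df
    df_m <- df %>% filter(Subject == "M") %>%
      mutate(dummy = 1) %>%
      rename(M_Date = Absent_Date,
             M_Subject = Subject)
    
    # pair each absence with the same student's earlier math absences
    df_daysSinceM <- df %>%
      mutate(dummy = 1) %>%
      full_join(df_m, by=c("Student_ID","dummy")) %>%
      filter(M_Date < Absent_Date) %>%
      mutate(daysSinceM = as.numeric(Absent_Date - M_Date)) %>%
      # keep only the most recent earlier math absence for each row
      group_by(Student_ID, Absent_Date, Subject) %>%
      slice_min(daysSinceM, n = 1, with_ties = FALSE) %>%
      ungroup() %>%
      select(Student_ID, Absent_Date, Subject, daysSinceM)
    
    # rows with no earlier math absence are added back with daysSinceM = NA
    result <- df %>%
      anti_join(df_daysSinceM, by=c("Student_ID","Absent_Date","Subject")) %>%
      bind_rows(df_daysSinceM)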
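
The join above leaves daysSinceM as NA both for a student's first math absence and for absences that have no earlier math absence at all, whereas the question asks for NA in the first case and Inf in the second. A minimal sketch of one way to separate the two follows. It assumes daysSinceM is a plain numeric column (wrap it in as.numeric() if it is a difftime); the data frame below is a hand-built stand-in for `result`, and `result_flagged` is a hypothetical name introduced only for illustration.

```r
library(dplyr)

# Hypothetical stand-in for `result` from the join-based answer:
# daysSinceM is NA wherever no earlier math absence was found
result <- data.frame(
  Student_ID  = c(4567, 8345, 8345, 4567, 8345, 8345),
  Absent_Date = as.Date(c("2018-08-30", "2017-07-18", "2019-01-08",
                          "2019-09-22", "2019-03-30", "2019-09-01")),
  Subject     = c("M", "S", "M", "M", "S", "SS"),
  daysSinceM  = c(NA, NA, NA, 388, 81, 236)
)

result_flagged <- result %>%
  # an NA on a non-math row means "no math absence so far", so mark it Inf;
  # an NA on a math row is the student's first math absence and stays NA
  mutate(daysSinceM = if_else(is.na(daysSinceM) & Subject != "M", Inf, daysSinceM)) %>%
  arrange(Student_ID, Absent_Date)

print(result_flagged)
```

This relies on the fact that a math row with an earlier math absence always gets a number from the join, so any remaining NA on a math row can only be the first one for that student.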
    

    【Comments】:

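      【Solution 2】:

      Something like the following might do the trick for you:

      1. Create an index marking whether each row is an "M"
      2. Carry it forward so that every row has the index of the most recent "M"
      3. Use this index to pull the date of the last "M" for each row
      4. Compute the time difference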
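      library(dplyr)
      library(lubridate) # for as_date (more consistent than as.Date)
      # assumes Absent_Date is a Date or an ISO "YYYY-MM-DD" string
      df %>% group_by(Student_ID) %>% 
             arrange(Absent_Date, .by_group = TRUE) %>%
             mutate(# row of the latest "M" strictly before each row (0 if none)
                    mIndex = lag(cummax(ifelse(Subject == "M", row_number(), 0L)), default = 0L),
                    # index 0 becomes NA so lastMdate keeps the group's length
                    lastMdate = as_date(Absent_Date)[na_if(mIndex, 0L)],
                    DaysSinceM = as.numeric(as_date(Absent_Date) - lastMdate))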
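
The four steps above can be sketched end to end against the question's sample data, including the NA versus Inf distinction the question asks for. This is only a sketch under a few assumptions: the dates are parsed from the question's MM/DD/YYYY strings with base as.Date(), "last M" is read as the most recent math absence strictly before the current row, and the helper columns lastM, prevM and prevMdate are hypothetical names introduced for illustration.

```r
library(dplyr)

# Sample data from the question, with dates in its MM/DD/YYYY format
df <- data.frame(
  Student_ID  = c(4567, 4567, 8345, 8345, 8345, 8345),
  Absent_Date = c("08/30/2018", "09/22/2019", "09/01/2019",
                  "03/30/2019", "07/18/2017", "01/08/2019"),
  Subject     = c("M", "M", "SS", "S", "S", "M")
)

result <- df %>%
  mutate(Absent_Date = as.Date(Absent_Date, format = "%m/%d/%Y")) %>%
  group_by(Student_ID) %>%
  arrange(Absent_Date, .by_group = TRUE) %>%
  mutate(
    # row position of the latest "M" at or before each row (0 if none yet)
    lastM = cummax(ifelse(Subject == "M", row_number(), 0L)),
    # shift by one row so a math absence looks at the previous "M", not itself
    prevM = lag(lastM, default = 0L),
    # indexing with NA yields an NA date where there is no earlier "M"
    prevMdate = Absent_Date[ifelse(prevM == 0L, NA_integer_, prevM)],
    dayssinceM = case_when(
      !is.na(prevMdate) ~ as.numeric(Absent_Date - prevMdate),
      Subject == "M"    ~ NA_real_,  # first math absence for this student
      TRUE              ~ Inf        # no math absence so far
    )
  ) %>%
  ungroup() %>%
  select(Student_ID, Absent_Date, Subject, dayssinceM)

print(result)
```

Because cummax() only ever moves forward, each row sees the latest math absence no matter how many other subjects lie in between, which is what the lag(Subject) attempt in the question could not do. Two absences by the same student on the same date are not handled specially here.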
      

      【Comments】:
