【Question title】: Get the mean between two dates
【Posted】: 2020-06-30 10:30:29
【Question】:

I want to compute the mean of a variable between two dates.

Here is an example data frame:

library(lubridate) #ymd function

day= rep(seq.Date(from=ymd("2020-03-01"),to=ymd("2020-04-15"),by="day"), times=4)
center= rep(c("A", "B", "C", "D"), each=46)
ocupation= as.numeric(round(runif(184,20,40),1))
df <- data.frame(day,center,ocupation)


start <- mdy("03/15/2020","04/12/2020","05/01/2020","02/13/2020")
end <- mdy("03/20/2020","04/28/2020","05/14/2020","03/01/2020")
center<-c("A", "A", "B", "C")
id<-c(1,2,3,4)
patients <- data.frame(id, center,start,end)

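The patients data frame shown here is only a sample; the original contains more than 12,000 ids.

For each id, I want the mean occupation of its center between the start date and the end date.

【Comments】:

    Tags: r date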


    【Solution 1】:

    You can do this with the dplyr package from the tidyverse.

    library(dplyr) # 1.0.0
    
    df <- as_tibble(df) # as_tibble() needs dplyr (or tibble) loaded first
    
    df %>%
            # find only the days in df corresponding to day ranges in patients
            filter(day %in% c(seq(patients[1, 3], patients[1, 4], by = "days"),
                             seq(patients[2, 3], patients[2, 4], by = "days"),
                             seq(patients[3, 3], patients[3, 4], by = "days"),
                             seq(patients[4, 3], patients[4, 4], by = "days"))) %>%
            # add id column
            mutate(id = ifelse(day %in% seq(patients[1, 3], patients[1, 4], by = "days"), patients$id[1],
                               ifelse(day %in% seq(patients[2, 3], patients[2, 4], by = "days"), patients$id[2],
                                      ifelse(day %in% seq(patients[3, 3], patients[3, 4], by = "days"), patients$id[3], patients$id[4])))) %>%
            # group by id
            group_by(id) %>%
            # find mean occupation for each id
            summarise(mean_occupation = mean(ocupation))
    
    # A tibble: 3 x 2
         id mean_occupation
      <dbl>           <dbl>
    1     1            29.7
    2     2            31.7
    3     4            32.2
    

    Edit

    A version with a for loop, to handle many ids:

    library(dplyr)
    df <- as_tibble(df)
    
    # create days vector from patients
    days <- list()
    for (i in seq_len(nrow(patients))) {
            dates <- seq(patients[i, 3], patients[i, 4], by = "days")
            # label every date in the range with the patient's id
            names(dates) <- rep(patients$id[i], length(dates))
            days[[i]] <- dates
    }
    days <- as.Date(unlist(days), origin = "1970-01-01")
    
    # filter df for days
    mid <- df %>%
            filter(day %in% days)
    
    # create id col (I couldn't do this directly in mutate())
    # note: when several ids cover the same day, only the first match is kept
    id <- character()
    for (i in seq_len(nrow(mid))) {
            id[i] <- names(days)[which(days == mid$day[i])[1]]
    }
    
    # bind together and finish
    final <- mid %>%
            cbind(id) %>% as_tibble() %>%
            group_by(id) %>%
            summarise(mean_occupation = mean(ocupation))
    
    > final
    # A tibble: 3 x 2
      id    mean_occupation
      <chr>           <dbl>
    1 1                29.7
    2 2                31.7
    3 4                32.2
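
The loop version above labels each day with a single id and does not look at center. As a rough alternative, the sketch below joins patients to df on center and then filters by date range, so a day can count for several ids. It is only a sketch under assumptions: it uses base R `merge()` and `aggregate()` rather than dplyr, treats start and end as inclusive, builds the dates with `as.Date()` instead of lubridate, and adds `set.seed(1)` so the random example is reproducible.

```r
# Rebuild the question's example data (set.seed is an added assumption)
set.seed(1)
day <- rep(seq(as.Date("2020-03-01"), as.Date("2020-04-15"), by = "day"), times = 4)
df <- data.frame(day = day,
                 center = rep(c("A", "B", "C", "D"), each = 46),
                 ocupation = round(runif(184, 20, 40), 1))
patients <- data.frame(
  id = 1:4,
  center = c("A", "A", "B", "C"),
  start = as.Date(c("2020-03-15", "2020-04-12", "2020-05-01", "2020-02-13")),
  end = as.Date(c("2020-03-20", "2020-04-28", "2020-05-14", "2020-03-01"))
)

# Pair every patient with all rows of its own center
joined <- merge(patients, df, by = "center")

# Keep only the days inside each patient's stay (endpoints included)
in_stay <- joined[joined$day >= joined$start & joined$day <= joined$end, ]

# One mean per id; ids without any matching day are absent here
means <- aggregate(ocupation ~ id, data = in_stay, FUN = mean)

# Left join back so every id is kept (NA when no day matched)
result <- merge(patients, means, by = "id", all.x = TRUE)
names(result)[names(result) == "ocupation"] <- "mean_occupation"
result
```

In this example id 3 ends up with NA, because its stay lies entirely after the last day in df. The intermediate join has one row per patient per day of its center, which should stay manageable for about 12,000 ids but grows with the number of days per center.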
    

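    【Comments】:

    • The problem with this approach is that there are not just 4 ids but 12,000.
    • No problem, I'll write a for loop so the code is less repetitive.
    • The problem now is that each day can belong to several ids, not just one, and your solution (at least the way I tried it) assigns only one id per day.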
    【Solution 2】:

    I would write a function that returns the mean occupancy for one id:

    mean.occ = function(id, patients, occupency, day, center){
      to.select = day > patients[id, "start"] & day < patients[id, "end"] & center == patients[id, "center"]
      return(mean(occupency[to.select]))
    }
    

    Here, day > patients[id, "start"] & day < patients[id, "end"] & center == patients[id, "center"] selects the occupancy values that lie between the start date and the end date of a given id and that belong to that id's center.
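
To make the selection concrete, here is a tiny illustration of the same kind of logical mask. The vectors are made up for this example and are not the question's data. Note that the strict comparisons leave out the start and end dates themselves; `>=` and `<=` would include them.

```r
# Toy vectors (made up for illustration, not the question's data)
day    <- as.Date(c("2020-03-14", "2020-03-15", "2020-03-16", "2020-03-16", "2020-03-20"))
center <- c("A", "A", "A", "B", "A")
occ    <- c(10, 20, 30, 40, 50)

start <- as.Date("2020-03-15")
end   <- as.Date("2020-03-20")

# TRUE only where the day is strictly between start and end and the center is "A"
to_select <- day > start & day < end & center == "A"
to_select            # FALSE FALSE  TRUE FALSE FALSE
mean(occ[to_select]) # 30
```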

    Then apply it to every id with sapply:

    # pass the columns of df: the global `center` was overwritten when patients was built
    mean.occupancies = sapply(patients$id, FUN = mean.occ, patients, df$ocupation, df$day, df$center)
    
    

    Finally, the result can be added to the patients data frame:

    patients = cbind.data.frame(patients, mean.occupancies)
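
The function above indexes patients by row number, so it relies on each id being equal to its row position. A possible variant that avoids this is sketched below. The helper name `mean_occ_stay`, the inclusive date bounds, and the small data set are all assumptions made for illustration, not part of the answer.

```r
# Hypothetical helper (not from the answer): mean occupancy for one stay
mean_occ_stay <- function(ctr, start, end, df) {
  keep <- df$center == ctr & df$day >= start & df$day <= end
  if (!any(keep)) return(NA_real_)  # no data for this stay
  mean(df$ocupation[keep])
}

# Small made-up data set; ids deliberately differ from row numbers
df <- data.frame(
  day = rep(as.Date("2020-03-01") + 0:4, times = 2),
  center = rep(c("A", "B"), each = 5),
  ocupation = c(10, 20, 30, 40, 50, 1, 2, 3, 4, 5)
)
patients <- data.frame(
  id = c(101, 205, 309),
  center = c("A", "B", "A"),
  start = as.Date(c("2020-03-02", "2020-03-01", "2020-04-01")),
  end = as.Date(c("2020-03-04", "2020-03-02", "2020-04-05"))
)

# Loop over rows rather than ids, so ids can be arbitrary values
patients$mean_occupation <- vapply(
  seq_len(nrow(patients)),
  function(i) mean_occ_stay(patients$center[i], patients$start[i], patients$end[i], df),
  numeric(1)
)
patients$mean_occupation # 30.0 1.5 NA
```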
    

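    【Comments】:

    • The problem with this approach is that it does not take into account that each center has its own occupancy.
    • Indeed, I missed that. I have edited the answer accordingly.
    • It gives me an error: incompatible methods ("Ops.data.table", ">.Date") for ">"
    • It works fine for me (I had to replace rbind.data.frame with cbind.data.frame...). Maybe the data.table package is loaded and masks some functions? If so, restarting R should fix it (as long as data.table is not loaded).
    • ahhh.... and it is written ocupation in sapply...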