【Question title】: The tidyverse way to grow a data set
【Posted】: 2019-05-01 14:34:02
【Question】:

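I'm trying to understand the tidyverse approach to problems that grow the length of a data set. group_by + mutate can't handle them, because mutate must return either a single value or one value per existing row, and here each input row turns into several output rows.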
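To make the row-count constraint concrete, here is a minimal sketch on a hypothetical toy tibble (toy, reps, g and i are illustrative names, not from the question). mutate() errors when a group's result has more values than the group has rows; reframe(), which assumes dplyr 1.1.0 or later and postdates this question, accepts any number of rows per group.

```r
library(dplyr)

# Hypothetical toy data: group "a" should grow to 2 rows, group "b" to 3
toy <- tibble(g = c("a", "b"), reps = c(2L, 3L))

# mutate() needs a result of size 1 or of the group size (1 row per group here),
# so returning seq_len(reps) values raises an error
res <- tryCatch(
  toy %>% group_by(g) %>% mutate(i = seq_len(reps)),
  error = function(e) e
)
inherits(res, "error")  # TRUE

# reframe() (assumes dplyr >= 1.1.0) may return any number of rows per group
grown <- toy %>% group_by(g) %>% reframe(i = seq_len(reps))
grown$i  # 1 2 1 2 3
```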

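Below is an example data set and a task: for each id, I want the sequence of weekly dates between the start date (installdate) and the end date (usageenddate). I show how to do this in a non-tidy way. How can I do it with the tidyverse?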

dat <- structure(list(id = c("01", "02", "03", "04", "05", "06", "07", 
"08", "09", "10"), race = structure(c(1L, 1L, 1L, 1L, 3L, 1L, 
1L, 1L, 2L, 1L), .Label = c("White", "Hispanic", "Black", "Asian", 
"Bi-Racial", "Native", "Other", "Hawaiian"), class = "factor"), 
    installdate = structure(c(17683, 17713, 17713, 17744, 17744, 
    17744, 17805, 17836, 17836, 17897), class = "Date"), usageenddate = structure(c(17758, 
    17759, 17726, 17809, 17773, 17777, 17821, 17863, 17899, 17964
    ), class = "Date")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))

##    id    race     installdate usageenddate
##    <chr> <fct>    <date>      <date>      
##  1 01    White    2018-06-01  2018-08-15  
##  2 02    White    2018-07-01  2018-08-16  
##  3 03    White    2018-07-01  2018-07-14  
##  4 04    White    2018-08-01  2018-10-05  
##  5 05    Black    2018-08-01  2018-08-30  
##  6 06    White    2018-08-01  2018-09-03  
##  7 07    White    2018-10-01  2018-10-17  
##  8 08    White    2018-11-01  2018-11-28  
##  9 09    Hispanic 2018-11-01  2019-01-03  
## 10 10    White    2019-01-01  2019-03-09  

library(tidyverse)

dat2 <- dat %>%
    group_by(id) %>%
    mutate(
        weeks2 = length(seq.Date(installdate, usageenddate, by = 'weeks'))
    )

dat2[rep(seq_len(nrow(dat2)), dat2$weeks2),] %>%
    group_by(id) %>%
    mutate(
        weeks = as.Date(cut(seq.Date(installdate[1], usageenddate[1], by = 'weeks'), 'week'))
    ) %>%
    select(id, race, weeks)


    ##    id    race  weeks     
    ##    <chr> <fct> <date>    
    ##  1 01    White 2018-05-28
    ##  2 01    White 2018-06-04
    ##  3 01    White 2018-06-11
    ##  4 01    White 2018-06-18
    ##  5 01    White 2018-06-25
    ##  6 01    White 2018-07-02
    ##  7 01    White 2018-07-09
    ##  8 01    White 2018-07-16
    ##  9 01    White 2018-07-23
    ## 10 01    White 2018-07-30
    ## # ... with 57 more rows
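The non-tidy step above works by indexing the data frame with a repeated row index: rep(seq_len(nrow(dat2)), dat2$weeks2) lists each row number weeks2 times, so subsetting with it duplicates rows. A minimal base R sketch of just that mechanism, on hypothetical toy data (df and times are illustrative names):

```r
# Hypothetical toy data: row 1 should appear twice, row 2 three times
df <- data.frame(id = c("01", "02"), times = c(2L, 3L),
                 stringsAsFactors = FALSE)

# rep() expands the row numbers according to `times`
idx <- rep(seq_len(nrow(df)), df$times)

# Subsetting with a repeated index duplicates the corresponding rows
grown <- df[idx, , drop = FALSE]

idx       # 1 1 2 2 2
grown$id  # "01" "01" "02" "02" "02"
```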

【Question discussion】:

    Tags: r tidyverse


    【Solution 1】:

    If we want a single %>% chain, use uncount

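    library(tidyverse)

    dat %>%
        group_by(id) %>%
        # number of weekly steps between install date and end date, per id
        mutate(weeks2 = length(seq.Date(installdate, usageenddate, by = 'weeks'))) %>%
        # repeat each row weeks2 times (the weeks2 column is dropped)
        uncount(weeks2) %>%
        group_by(id) %>%
        # fill the repeated rows with week-start (Monday) dates
        mutate(weeks = as.Date(cut(seq.Date(installdate[1], usageenddate[1], by = 'weeks'), 'week'))) %>%
        select(id, race, weeks)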
    # A tibble: 67 x 3
    # Groups:   id [10]
    #   id    race  weeks     
    #   <chr> <fct> <date>    
    # 1 01    White 2018-05-28
    # 2 01    White 2018-06-04
    # 3 01    White 2018-06-11
    # 4 01    White 2018-06-18
    # 5 01    White 2018-06-25
    # 6 01    White 2018-07-02
    # 7 01    White 2018-07-09
    # 8 01    White 2018-07-16
    # 9 01    White 2018-07-23
    #10 01    White 2018-07-30
    # … with 57 more rows
    
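uncount() duplicates each row according to a weights column and, by default, drops that column (.remove = TRUE). Its .id argument adds a per-row copy number, which is useful when the duplicated rows need to differ. A minimal sketch on a hypothetical toy tibble (toy, reps and k are illustrative names):

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: "01" should become 2 rows, "02" 3 rows
toy <- tibble(id = c("01", "02"), reps = c(2L, 3L))

# Repeat rows by `reps`; .id = "k" numbers the copies 1, 2, ... for each original row
out <- uncount(toy, reps, .id = "k")

out$id  # "01" "01" "02" "02" "02"
out$k   # 1 2 1 2 3
```

With a copy number like k, the dates could in principle be derived arithmetically (for example, a week start plus 7 * (k - 1)) instead of calling seq a second time; that is a design alternative, not what the answer does.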

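    Alternatively, instead of an intermediate step that expands the rows (note that the previous approach runs seq twice: once to get the length, and again for the cut step), group by 'id', loop over the corresponding elements of 'installdate' and 'usageenddate' with map2, take the seq, cut it by 'week', and convert it to Date.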

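    dat %>% 
       group_by(id) %>%
       # per row: weekly dates from install to end, floored to the Monday of each week
       mutate(weeks = map2(installdate, usageenddate, ~ 
          seq(.x, .y, by = 'weeks') %>% 
            cut('week') %>%
            as.Date)) %>% 
       select(id, race, weeks) %>% 
       # expand the list-column: one row per week (tidyr >= 1.0 expects cols)
       unnest(cols = weeks)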
    # A tibble: 67 x 3
    # Groups:   id [10]
    #   id    race  weeks     
    #   <chr> <fct> <date>    
    # 1 01    White 2018-05-28
    # 2 01    White 2018-06-04
    # 3 01    White 2018-06-11
    # 4 01    White 2018-06-18
    # 5 01    White 2018-06-25
    # 6 01    White 2018-07-02
    # 7 01    White 2018-07-09
    # 8 01    White 2018-07-16
    # 9 01    White 2018-07-23
    #10 01    White 2018-07-30
    # … with 57 more rows
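Both answers step from installdate in 7-day increments and then floor each date to its week start with cut(..., 'week'), which starts weeks on Monday by default. If the goal is every calendar week the interval touches, this can miss the last week whenever usageenddate falls earlier in its week than installdate does. For id 01 (2018-06-01, a Friday, to 2018-08-15, a Wednesday) the last stepped date is 2018-08-10, so the week starting 2018-08-13 is not produced. A minimal sketch comparing that with a variant that floors both endpoints first; week_start, one, stepped and floored are hypothetical names, and it assumes tidyr 1.0 or later for unnest(cols = ...):

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical helper: the Monday that starts the calendar week of each date
week_start <- function(d) as.Date(cut(d, "week"))

# First row of the question's data (id 01)
one <- tibble(
  id = "01",
  installdate  = as.Date("2018-06-01"),
  usageenddate = as.Date("2018-08-15")
)

# As in the answers: step from installdate by 7 days, then floor each date
stepped <- one %>%
  mutate(weeks = map2(installdate, usageenddate,
                      ~ week_start(seq(.x, .y, by = "week")))) %>%
  unnest(cols = weeks)

# Variant: floor both endpoints first, then step one week at a time
floored <- one %>%
  mutate(weeks = map2(installdate, usageenddate,
                      ~ seq(week_start(.x), week_start(.y), by = "week"))) %>%
  unnest(cols = weeks)

nrow(stepped)       # 11
nrow(floored)       # 12
max(stepped$weeks)  # "2018-08-06"
max(floored$weeks)  # "2018-08-13"
```

Which version is right depends on whether the week containing usageenddate should count; the question does not say.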
    
