【Question title】: strptime range and make a date column
【Posted】: 2017-11-09 18:46:04
【Question】:

I have dates in the following form:

Date                      Value
<chr>                      <dbl>
[2014-1-24 - 2014-2-2]      1.1
[2014-2-3 - 2014-3-2]       2.2
.                           .
.                           .
.                           .

This continues for many years. I would like to convert it to long format, with one row per day, as shown below:

Date          Value
<date>        <dbl>
2014-01-24     1.1
2014-01-25     1.1
2014-01-26     1.1
2014-01-27     1.1
2014-01-28     1.1
2014-01-29     1.1
2014-01-30     1.1
2014-01-31     1.1
2014-02-01     1.1
2014-02-02     1.1
2014-02-03     2.2
2014-02-04     2.2
2014-02-05     2.2
.               .
.               .
.               .

What is a clean way to accomplish this?
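
To make the question reproducible, the two rows shown above can be rebuilt as a small data frame. This is only a sketch: the name `df` is an assumption (it is the name the first answer uses), and the rows elided in the question are not filled in.

```r
# Hypothetical reconstruction of the two rows visible in the question.
df <- data.frame(
  Date  = c("[2014-1-24 - 2014-2-2]", "[2014-2-3 - 2014-3-2]"),
  Value = c(1.1, 2.2),
  stringsAsFactors = FALSE  # keep Date as character, matching the <chr> column
)

str(df)
```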

【Comments】:

    Tags: r dplyr lubridate stringr


    【Solution 1】:

    Using dplyr and tidyr (with stringr for the pattern matching):

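    library(dplyr); library(tidyr); library(stringr)  # str_match_all() is from stringr
    
    df %>% 
        # pull the start and end date strings out of each range
        mutate(Date = str_match_all(Date, '\\d{4}-\\d{1,2}-\\d{1,2}'), 
               # expand each start/end pair into a daily sequence (a list-column of Dates)
               Date = lapply(Date, function(d) seq(as.Date(d[1]), as.Date(d[2]), by='day'))) %>% 
        # unnest the list-column into one row per day
        unnest(Date) 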
    
    #   Value       Date
    #1    1.1 2014-01-24
    #2    1.1 2014-01-25
    #3    1.1 2014-01-26
    #4    1.1 2014-01-27
    #5    1.1 2014-01-28
    #6    1.1 2014-01-29
    #7    1.1 2014-01-30
    #8    1.1 2014-01-31
    #9    1.1 2014-02-01
    #10   1.1 2014-02-02
    #11   2.2 2014-02-03
    #12   2.2 2014-02-04
    # ...
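
The two steps inside `mutate()` can be followed on a single string. The sketch below is illustrative only; the variable names `x`, `m` and `days` are not part of the answer.

```r
library(stringr)

x <- "[2014-1-24 - 2014-2-2]"

# str_match_all() returns a list with one character matrix per input string.
# With no capture groups, the matrix has a single column of full matches.
m <- str_match_all(x, "\\d{4}-\\d{1,2}-\\d{1,2}")[[1]]
m[, 1]

# A single subscript on the matrix picks the first and second match,
# which is what d[1] and d[2] do in the answer.
days <- seq(as.Date(m[1]), as.Date(m[2]), by = "day")
length(days)
range(days)
```

Because `as.Date()` accepts months and days without a leading zero, the strings do not need to be padded before conversion.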
    

    Using purrr:

    library(stringr); library(purrr); library(tibble)
    
    # extract the start and end date from each Date string
    df$Date <- map(str_match_all(df$Date, '\\d{4}-\\d{1,2}-\\d{1,2}'), as.Date)
    
    # map over rows and expand each date range into a sequence using seq.Date;
    # tibble() builds one small data frame per row and pmap_df() row-binds them
    pmap_df(df, ~ tibble(Date = seq(.x[1], .x[2], by='day'), Value = .y))
    
    # A tibble: 38 x 2
    #         Date Value
    #       <date> <dbl>
    # 1 2014-01-24   1.1
    # 2 2014-01-25   1.1
    # 3 2014-01-26   1.1
    # 4 2014-01-27   1.1
    # 5 2014-01-28   1.1
    # 6 2014-01-29   1.1
    # 7 2014-01-30   1.1
    # 8 2014-01-31   1.1
    # 9 2014-02-01   1.1
    #10 2014-02-02   1.1
    # ... with 28 more rows
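
As a sketch of what the `pmap` call does: it walks over the rows of the data frame and passes each column's value for that row to the function. In the formula form, `.x` is the first column and `.y` the second. Writing the function with named arguments may make this easier to follow. The shortened date ranges and the helper name `expand_row` are hypothetical, chosen only to keep the example small.

```r
library(purrr)
library(tibble)
library(dplyr)

# Toy input: Date is a list-column holding a start and an end date per row.
df <- tibble(
  Date  = list(as.Date(c("2014-01-24", "2014-01-26")),
               as.Date(c("2014-02-03", "2014-02-04"))),
  Value = c(1.1, 2.2)
)

# pmap() calls this once per row, matching arguments to columns by name.
expand_row <- function(Date, Value) {
  tibble(Date = seq(Date[1], Date[2], by = "day"), Value = Value)
}

# pmap() returns a list of small tibbles; bind_rows() stacks them.
long <- bind_rows(pmap(df, expand_row))
long
```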
    

    【Comments】:

    • Thanks. I used the tidyr approach; for some reason purrr still hasn't clicked for me. If you have a good purrr reference, please let me know.
    【Solution 2】:

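    Here is an option using data.table and lubridate. Group by 'Value' (assuming it is unique; if it is not, group by a row sequence instead), split 'Date' into two columns with tstrsplit, convert each to Date class with ymd (from lubridate), and use Reduce to get the sequence of dates.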

    library(data.table)
    library(lubridate)
    setDT(df1)[, .(Date = Reduce(function(...) seq(..., by = '1 day'), 
                   lapply(tstrsplit(Date, "\\s-\\s"), ymd))), Value][, .(Date, Value)]
    #          Date Value
    # 1: 2014-01-24   1.1
    # 2: 2014-01-25   1.1
    # 3: 2014-01-26   1.1
    # 4: 2014-01-27   1.1
    # 5: 2014-01-28   1.1
    # 6: 2014-01-29   1.1
    # 7: 2014-01-30   1.1
    # 8: 2014-01-31   1.1
    # 9: 2014-02-01   1.1
    #10: 2014-02-02   1.1
    #11: 2014-02-03   2.2
    #12: 2014-02-04   2.2
    #13: 2014-02-05   2.2
    #14: 2014-02-06   2.2
    # - -
    # - -
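
If 'Value' is not unique, grouping by it would merge separate ranges. A minimal sketch of the row-sequence alternative mentioned above follows. The toy data, the `rid` column and the explicit bracket stripping with `gsub()` are assumptions added for illustration and are not part of the answer.

```r
library(data.table)
library(lubridate)

# Hypothetical data in which Value repeats across rows.
df1 <- data.table(
  Date  = c("[2014-1-24 - 2014-1-26]", "[2014-2-3 - 2014-2-4]"),
  Value = c(1.1, 1.1)
)

# Use the row number as the grouping key instead of Value.
df1[, rid := .I]

long <- df1[, {
  clean <- gsub("\\[|\\]", "", Date)                # drop the square brackets
  ends  <- ymd(unlist(tstrsplit(clean, "\\s-\\s"))) # start and end as Dates
  .(Date = seq(ends[1], ends[2], by = "1 day"), Value = Value)
}, by = rid]

long[, .(Date, Value)]
```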
    
