【问题标题】:Expand row year by year using end and start date使用结束日期和开始日期逐年展开行
【发布时间】:2021-07-25 22:28:21
【问题描述】:

我正在处理一个包含不同高管职业的数据集。每位高管都有一个独特的DirectorID 和一个独特的CompanyID,用于他/她所担任的不同职位。我想使用开始和结束日期逐年扩展行。所以第一个观察有 startdate 2009-01-01 和 enddate 2020-11-01,我希望第一行的日期范围从 2009-01-01 到 2009-12-31,第二行 2010-01- 01 至 2010 年 12 月 31 日。等和最后一行 2020-01-01 到 2020-11-01。其次,对于几个观察,日期是未知的,用 NA 表示,如果一行缺少日期,则应该省略,但我希望这些观察出现在最终输出中。

 DirectorID CompanyID DateStartRole DateEndRole
        <dbl>     <dbl> <date>        <date>     
 1    2363928   3262324 2009-01-01    2020-11-01 
 2    2374506   2301528 2008-01-01    2009-01-01 
 3     777766     44584 NA            NA         
 4     892379     32342 NA            2018-01-01 
 5     328171    583820 NA            NA         
 6     599185     54210 1988-01-01    2000-01-01 
 7    1973398      4142 2018-04-01    2020-11-01 
 8    1973398    550686 NA            2010-01-01 
 9    1041558    829450 NA            NA         
10     542061     10961 NA            2009-09-28 
11    1262068    922842 2011-01-01    2018-08-01 
12     330769     21696 NA            2000-11-01 
13      32554     15457 1997-01-01    2001-01-01 
14    2305779   1378145 2007-01-01    2012-01-01 
15    2305779    829105 2020-05-07    2020-11-01 
16    1935158    573779 2011-07-01    2017-07-03 
17     836789    829244 2006-01-01    2007-01-01 
18     836789    591529 NA            NA         
19     490624      3443 2004-01-01    2007-01-01 
20     554430     30279 1975-05-15    2011-10-28

所需的输出应如下所示

 DirectorID CompanyID DateStartRole DateEndRole
        <dbl>     <dbl> <date>        <date>     
 1    2363928   3262324 2009-01-01    2009-12-31
 2    2363928   3262324 2010-01-01    2010-12-31
---
 12   2363928   3262324 2020-01-01    2020-11-01
 13   2374506   2301528 2008-01-01    2009-01-01

【问题讨论】:

    标签: r


    【解决方案1】:

    这是使用tidyverselubridate 的一种方法。

    首先,删除缺少日期的行(它们将在末尾添加回来)。然后,从开始到结束创建间隔,并将行划分为年份。然后你可以intersect这些间隔与日历年。

    之后,您可以right_join 用原始数据替换缺失的行,并使用coalesce 替换任何缺失的日期(如果有)。

    library(tidyverse)
    library(lubridate)
    
    df %>%
      filter(!is.na(DateStartRole) & !is.na(DateEndRole)) %>%
      mutate(date_int = interval(DateStartRole, DateEndRole),
             year = map2(year(DateStartRole), year(DateEndRole), seq)) %>%
      unnest(year) %>%
      mutate(year_int = interval(as.Date(paste0(year, '-01-01')), as.Date(paste0(year, '-12-31'))),
             year_sect = intersect(date_int, year_int),
             start_new = as.Date(int_start(year_sect)),
             end_new = as.Date(int_end(year_sect))) %>%
      select(DirectorID, CompanyID, start_new, end_new) %>%
      right_join(df) %>%
      mutate(start_new = coalesce(start_new, DateStartRole),
             end_new = coalesce(end_new, DateEndRole))
    

    输出

        DirectorID CompanyID start_new  end_new    DateStartRole DateEndRole
             <int>     <int> <date>     <date>     <date>        <date>     
      1    2363928   3262324 2009-01-01 2009-12-31 2009-01-01    2020-11-01 
      2    2363928   3262324 2010-01-01 2010-12-31 2009-01-01    2020-11-01 
      3    2363928   3262324 2011-01-01 2011-12-31 2009-01-01    2020-11-01 
      4    2363928   3262324 2012-01-01 2012-12-31 2009-01-01    2020-11-01 
      5    2363928   3262324 2013-01-01 2013-12-31 2009-01-01    2020-11-01 
      6    2363928   3262324 2014-01-01 2014-12-31 2009-01-01    2020-11-01 
      7    2363928   3262324 2015-01-01 2015-12-31 2009-01-01    2020-11-01 
      8    2363928   3262324 2016-01-01 2016-12-31 2009-01-01    2020-11-01 
      9    2363928   3262324 2017-01-01 2017-12-31 2009-01-01    2020-11-01 
     10    2363928   3262324 2018-01-01 2018-12-31 2009-01-01    2020-11-01 
     11    2363928   3262324 2019-01-01 2019-12-31 2009-01-01    2020-11-01 
     12    2363928   3262324 2020-01-01 2020-11-01 2009-01-01    2020-11-01 
     13    2374506   2301528 2008-01-01 2008-12-31 2008-01-01    2009-01-01 
     14    2374506   2301528 2009-01-01 2009-01-01 2008-01-01    2009-01-01 
     15     599185     54210 1988-01-01 1988-12-31 1988-01-01    2000-01-01 
     16     599185     54210 1989-01-01 1989-12-31 1988-01-01    2000-01-01 
     17     599185     54210 1990-01-01 1990-12-31 1988-01-01    2000-01-01 
     18     599185     54210 1991-01-01 1991-12-31 1988-01-01    2000-01-01 
     19     599185     54210 1992-01-01 1992-12-31 1988-01-01    2000-01-01 
     20     599185     54210 1993-01-01 1993-12-31 1988-01-01    2000-01-01 
     21     599185     54210 1994-01-01 1994-12-31 1988-01-01    2000-01-01 
     22     599185     54210 1995-01-01 1995-12-31 1988-01-01    2000-01-01 
     23     599185     54210 1996-01-01 1996-12-31 1988-01-01    2000-01-01 
     24     599185     54210 1997-01-01 1997-12-31 1988-01-01    2000-01-01 
     25     599185     54210 1998-01-01 1998-12-31 1988-01-01    2000-01-01 
     26     599185     54210 1999-01-01 1999-12-31 1988-01-01    2000-01-01 
     27     599185     54210 2000-01-01 2000-01-01 1988-01-01    2000-01-01 
     28    1973398      4142 2018-04-01 2018-12-31 2018-04-01    2020-11-01 
     29    1973398      4142 2019-01-01 2019-12-31 2018-04-01    2020-11-01 
     30    1973398      4142 2020-01-01 2020-11-01 2018-04-01    2020-11-01 
     31    1262068    922842 2011-01-01 2011-12-31 2011-01-01    2018-08-01 
     32    1262068    922842 2012-01-01 2012-12-31 2011-01-01    2018-08-01 
     33    1262068    922842 2013-01-01 2013-12-31 2011-01-01    2018-08-01 
     34    1262068    922842 2014-01-01 2014-12-31 2011-01-01    2018-08-01 
     35    1262068    922842 2015-01-01 2015-12-31 2011-01-01    2018-08-01 
     36    1262068    922842 2016-01-01 2016-12-31 2011-01-01    2018-08-01 
     37    1262068    922842 2017-01-01 2017-12-31 2011-01-01    2018-08-01 
     38    1262068    922842 2018-01-01 2018-08-01 2011-01-01    2018-08-01 
     39      32554     15457 1997-01-01 1997-12-31 1997-01-01    2001-01-01 
     40      32554     15457 1998-01-01 1998-12-31 1997-01-01    2001-01-01 
     41      32554     15457 1999-01-01 1999-12-31 1997-01-01    2001-01-01 
     42      32554     15457 2000-01-01 2000-12-31 1997-01-01    2001-01-01 
     43      32554     15457 2001-01-01 2001-01-01 1997-01-01    2001-01-01 
     44    2305779   1378145 2007-01-01 2007-12-31 2007-01-01    2012-01-01 
     45    2305779   1378145 2008-01-01 2008-12-31 2007-01-01    2012-01-01 
     46    2305779   1378145 2009-01-01 2009-12-31 2007-01-01    2012-01-01 
     47    2305779   1378145 2010-01-01 2010-12-31 2007-01-01    2012-01-01 
     48    2305779   1378145 2011-01-01 2011-12-31 2007-01-01    2012-01-01 
     49    2305779   1378145 2012-01-01 2012-01-01 2007-01-01    2012-01-01 
     50    2305779    829105 2020-05-07 2020-11-01 2020-05-07    2020-11-01 
     51    1935158    573779 2011-07-01 2011-12-31 2011-07-01    2017-07-03 
     52    1935158    573779 2012-01-01 2012-12-31 2011-07-01    2017-07-03 
     53    1935158    573779 2013-01-01 2013-12-31 2011-07-01    2017-07-03 
     54    1935158    573779 2014-01-01 2014-12-31 2011-07-01    2017-07-03 
     55    1935158    573779 2015-01-01 2015-12-31 2011-07-01    2017-07-03 
     56    1935158    573779 2016-01-01 2016-12-31 2011-07-01    2017-07-03 
     57    1935158    573779 2017-01-01 2017-07-03 2011-07-01    2017-07-03 
     58     836789    829244 2006-01-01 2006-12-31 2006-01-01    2007-01-01 
     59     836789    829244 2007-01-01 2007-01-01 2006-01-01    2007-01-01 
     60     490624      3443 2004-01-01 2004-12-31 2004-01-01    2007-01-01 
     61     490624      3443 2005-01-01 2005-12-31 2004-01-01    2007-01-01 
     62     490624      3443 2006-01-01 2006-12-31 2004-01-01    2007-01-01 
     63     490624      3443 2007-01-01 2007-01-01 2004-01-01    2007-01-01 
     64     554430     30279 1975-05-15 1975-12-31 1975-05-15    2011-10-28 
     65     554430     30279 1976-01-01 1976-12-31 1975-05-15    2011-10-28 
     66     554430     30279 1977-01-01 1977-12-31 1975-05-15    2011-10-28 
     67     554430     30279 1978-01-01 1978-12-31 1975-05-15    2011-10-28 
     68     554430     30279 1979-01-01 1979-12-31 1975-05-15    2011-10-28 
     69     554430     30279 1980-01-01 1980-12-31 1975-05-15    2011-10-28 
     70     554430     30279 1981-01-01 1981-12-31 1975-05-15    2011-10-28 
     71     554430     30279 1982-01-01 1982-12-31 1975-05-15    2011-10-28 
     72     554430     30279 1983-01-01 1983-12-31 1975-05-15    2011-10-28 
     73     554430     30279 1984-01-01 1984-12-31 1975-05-15    2011-10-28 
     74     554430     30279 1985-01-01 1985-12-31 1975-05-15    2011-10-28 
     75     554430     30279 1986-01-01 1986-12-31 1975-05-15    2011-10-28 
     76     554430     30279 1987-01-01 1987-12-31 1975-05-15    2011-10-28 
     77     554430     30279 1988-01-01 1988-12-31 1975-05-15    2011-10-28 
     78     554430     30279 1989-01-01 1989-12-31 1975-05-15    2011-10-28 
     79     554430     30279 1990-01-01 1990-12-31 1975-05-15    2011-10-28 
     80     554430     30279 1991-01-01 1991-12-31 1975-05-15    2011-10-28 
     81     554430     30279 1992-01-01 1992-12-31 1975-05-15    2011-10-28 
     82     554430     30279 1993-01-01 1993-12-31 1975-05-15    2011-10-28 
     83     554430     30279 1994-01-01 1994-12-31 1975-05-15    2011-10-28 
     84     554430     30279 1995-01-01 1995-12-31 1975-05-15    2011-10-28 
     85     554430     30279 1996-01-01 1996-12-31 1975-05-15    2011-10-28 
     86     554430     30279 1997-01-01 1997-12-31 1975-05-15    2011-10-28 
     87     554430     30279 1998-01-01 1998-12-31 1975-05-15    2011-10-28 
     88     554430     30279 1999-01-01 1999-12-31 1975-05-15    2011-10-28 
     89     554430     30279 2000-01-01 2000-12-31 1975-05-15    2011-10-28 
     90     554430     30279 2001-01-01 2001-12-31 1975-05-15    2011-10-28 
     91     554430     30279 2002-01-01 2002-12-31 1975-05-15    2011-10-28 
     92     554430     30279 2003-01-01 2003-12-31 1975-05-15    2011-10-28 
     93     554430     30279 2004-01-01 2004-12-31 1975-05-15    2011-10-28 
     94     554430     30279 2005-01-01 2005-12-31 1975-05-15    2011-10-28 
     95     554430     30279 2006-01-01 2006-12-31 1975-05-15    2011-10-28 
     96     554430     30279 2007-01-01 2007-12-31 1975-05-15    2011-10-28 
     97     554430     30279 2008-01-01 2008-12-31 1975-05-15    2011-10-28 
     98     554430     30279 2009-01-01 2009-12-31 1975-05-15    2011-10-28 
     99     554430     30279 2010-01-01 2010-12-31 1975-05-15    2011-10-28 
    100     554430     30279 2011-01-01 2011-10-28 1975-05-15    2011-10-28 
    101     777766     44584 NA         NA         NA            NA         
    102     892379     32342 NA         2018-01-01 NA            2018-01-01 
    103     328171    583820 NA         NA         NA            NA         
    104    1973398    550686 NA         2010-01-01 NA            2010-01-01 
    105    1041558    829450 NA         NA         NA            NA         
    106     542061     10961 NA         2009-09-28 NA            2009-09-28 
    107     330769     21696 NA         2000-11-01 NA            2000-11-01 
    108     836789    591529 NA         NA         NA            NA         
    

    【讨论】:

      猜你喜欢
      • 2023-03-17
      • 2012-08-19
      • 1970-01-01
      • 1970-01-01
      • 2022-01-23
      • 2014-03-02
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多