【Question title】: Generate grouped time series based on Open and Close date
【Posted】: 2021-07-14 21:26:49
【Question description】:

I have a dataset with 3 columns: the ID, the open week and the close week. Some IDs have not closed yet, so their close week is NA, but every ID has an open week.

library(dplyr)

set.seed(1990)
mydf <- tibble(id = as.vector(outer(letters, letters, paste0))[1:10]
               , open_week = rep(1:5, 2)) %>%
  mutate(close_week = open_week + sample(1:5, 10, replace = TRUE)) %>%
  arrange(open_week)
mydf
# roughly 10% of IDs are still open: set their close_week to NA
mydf$close_week[sample(c(TRUE, FALSE), 10, replace = TRUE, prob = c(0.1, 0.9))] <- NA

> mydf
# A tibble: 10 x 3
   id    open_week close_week
   <chr>     <int>      <int>
 1 aa            1          2
 2 fa            1          4
 3 ba            2          4
 4 ga            2         NA
 5 ca            3          7
 6 ha            3          6
 7 da            4          6
 8 ia            4          5
 9 ea            5          7
10 ja            5          9
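The printed table depends on the random draws above (and on R's sample() implementation, whose default algorithm changed in R 3.6.0). To follow the exact numbers used in the rest of the question without relying on the seed, the tibble can be typed in literally. This is a minimal sketch that only copies the printed values:

```r
library(tibble)

# Literal copy of the printed tibble, so the numbers do not depend on the
# random draws made by sample()
mydf <- tibble(
  id         = c("aa", "fa", "ba", "ga", "ca", "ha", "da", "ia", "ea", "ja"),
  open_week  = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
  close_week = c(2L, 4L, 4L, NA, 7L, 6L, 6L, 5L, 7L, 9L)
)
mydf
```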

Based on the data above, I generate the following weekly metrics:

have <- tibble(Week = seq_len(max(mydf$close_week, na.rm = TRUE))) %>%
  rowwise() %>%
  mutate(opened = sum(Week == mydf$open_week),
         closed = sum(Week == mydf$close_week, na.rm = TRUE),
         # ages of IDs active this week (open_week <= Week < close_week; NA = still open)
         active_ages_med = list(Week - mydf$open_week[Week >= mydf$open_week & 
                                                     Week < ifelse(is.na(mydf$close_week),
                                                                   max(mydf$close_week, na.rm = TRUE) + 1,
                                                                   mydf$close_week)]),
         # ages of IDs that close this week
         closed_ages_med = list((Week - mydf$open_week[Week == mydf$close_week]) %>% na.omit()),
         active = length(active_ages_med),
         active_ages_med = median(active_ages_med),
         closed_ages_med = median(closed_ages_med)) %>% 
  ungroup() %>%
  mutate(active_growth = (active - lag(active))*100/lag(active))
have
> have
# A tibble: 9 x 7
   Week opened closed active_ages_med closed_ages_med active active_growth
  <int>  <int>  <int>        <dbl>        <dbl>  <int>       <dbl>
1     1      2      0          0           NA        2        NA  
2     2      2      1          0            1        3        50  
3     3      2      0          1           NA        5        66.7
4     4      2      2          1            2.5      5         0  
5     5      2      1          1.5          1        6        20  
6     6      0      2          2            2.5      4       -33.3
7     7      0      2          3.5          3        2       -50  
8     8      0      0          4.5         NA        2         0  
9     9      0      1          7            4        1       -50 
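The active column in have counts, for each week, the IDs with open_week <= Week < close_week, treating a missing close_week as still open. A minimal vectorised sketch of that same rule without rowwise(), assuming the literal data printed above; is_active() is a hypothetical helper name, not part of any package:

```r
library(tibble)

mydf <- tibble(
  id         = c("aa", "fa", "ba", "ga", "ca", "ha", "da", "ia", "ea", "ja"),
  open_week  = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
  close_week = c(2L, 4L, 4L, NA, 7L, 6L, 6L, 5L, 7L, 9L)
)

# Hypothetical helper: TRUE for IDs active in week w, i.e. opened on or before w
# and not yet closed (a missing close_week means the ID is still open)
is_active <- function(w, open_week, close_week) {
  open_week <= w & (is.na(close_week) | w < close_week)
}

weeks  <- seq_len(max(mydf$close_week, na.rm = TRUE))
active <- vapply(weeks,
                 function(w) sum(is_active(w, mydf$open_week, mydf$close_week)),
                 integer(1))
active
```

Printing active gives 2 3 5 5 6 4 2 2 1, the same values as the active column of have.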

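Using have, I track the number of active IDs each week based on their open and close weeks. What have is missing is how much each predefined group contributes to the active IDs. For example, suppose I classify active IDs by their active age: IDs with Active Age < 1 day and IDs with Active Age >= 1 day.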

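So instead of a single count of active IDs per week, I want the number of active IDs in each group per week, and then the growth rate of each group. Note that an ID can move from one group to another, because its classification depends on the reference week as well as on its open week. For example, in week 1 the ID fa (open_week = 1) is classified as Active Age < 1 day, but in week 3 it should be counted in the Active Age >= 1 day group.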
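One way to get these per-group counts is to pair every week with every ID, keep the pairs where the ID is active, and compute the age relative to that week, so the same ID can land in a different group in different weeks. A sketch under these assumptions: dplyr and tidyr are available, the data is the literal table printed above, and the cut-off is the 1-unit age from the question (grp_levels is a hypothetical name):

```r
library(dplyr)
library(tidyr)

mydf <- tibble(
  id         = c("aa", "fa", "ba", "ga", "ca", "ha", "da", "ia", "ea", "ja"),
  open_week  = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
  close_week = c(2L, 4L, 4L, NA, 7L, 6L, 6L, 5L, 7L, 9L)
)
week_last  <- max(mydf$close_week, na.rm = TRUE)
grp_levels <- c("Active Age < 1 day", "Active Age >= 1 day")

counts <- expand_grid(Week = seq_len(week_last), id = mydf$id) %>%
  left_join(mydf, by = "id") %>%
  # keep the (week, ID) pairs where the ID is active in that week
  filter(open_week <= Week, is.na(close_week) | Week < close_week) %>%
  # age is measured from each reference week, so an ID can change group
  mutate(age   = Week - open_week,
         group = factor(if_else(age < 1, grp_levels[1], grp_levels[2]),
                        levels = grp_levels)) %>%
  count(Week, group, name = "active") %>%
  # add explicit zero rows for empty week/group combinations (all factor levels)
  complete(Week = seq_len(week_last), group, fill = list(active = 0L)) %>%
  arrange(Week, group)
counts
```

Making group a factor lets complete() fill in every level, and arrange() then sorts by level order rather than by locale-dependent string order.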

want <- tibble(Week = rep(c(1:9),each=2),
               group = rep(c('Active Age < 1 day','Active Age >= 1 day'),9),
               active = c(2,0,2,1,2,3,2,3,2,4,0,4,0,2,0,2,0,1),
               active_growth = c(NA,NA,0,NA,0,200,0,0,0,33,-100,0,0,-50,0,0,0,-50))
> want
# A tibble: 18 x 4
    Week group               active active_growth
   <int> <chr>                <dbl>       <dbl>
 1     1 Active Age < 1 day       2          NA
 2     1 Active Age >= 1 day      0          NA
 3     2 Active Age < 1 day       2           0
 4     2 Active Age >= 1 day      1          NA
 5     3 Active Age < 1 day       2           0
 6     3 Active Age >= 1 day      3         200
 7     4 Active Age < 1 day       2           0
 8     4 Active Age >= 1 day      3           0
 9     5 Active Age < 1 day       2           0
10     5 Active Age >= 1 day      4          33
11     6 Active Age < 1 day       0        -100
12     6 Active Age >= 1 day      4           0
13     7 Active Age < 1 day       0           0
14     7 Active Age >= 1 day      2         -50
15     8 Active Age < 1 day       0           0
16     8 Active Age >= 1 day      2           0
17     9 Active Age < 1 day       0           0
18     9 Active Age >= 1 day      1         -50
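The active_growth column in want implies two conventions that the plain formula (active - lag(active)) * 100 / lag(active) used for have would not give: a change from 0 to 0 is reported as 0, and a change from 0 to a positive count as NA (the plain formula yields NaN and Inf). A sketch that applies those inferred conventions per group; pct_growth() is a hypothetical helper:

```r
library(dplyr)

# Week-by-group counts copied from the want tibble above
want <- tibble(Week   = rep(1:9, each = 2),
               group  = rep(c("Active Age < 1 day", "Active Age >= 1 day"), 9),
               active = c(2, 0, 2, 1, 2, 3, 2, 3, 2, 4, 0, 4, 0, 2, 0, 2, 0, 1))

# Hypothetical helper; conventions inferred from want:
#   no previous week -> NA, 0 then 0 -> 0 %, 0 then n > 0 -> NA
pct_growth <- function(curr, prev) {
  case_when(
    is.na(prev)           ~ NA_real_,
    prev == 0 & curr == 0 ~ 0,
    prev == 0             ~ NA_real_,
    TRUE                  ~ (curr - prev) * 100 / prev
  )
}

# Rows are already in Week order within each group, so lag() inside
# group_by() picks up the previous week of the same group
want <- want %>%
  group_by(group) %>%
  mutate(active_growth = pct_growth(active, lag(active))) %>%
  ungroup()
want
```

The week 5 value for Active Age >= 1 day comes out as 33.33..., which want shows rounded to 33.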

Here is a visual aid showing how the ID ages change as the weeks go by:

【Question discussion】:

    Tags: r dplyr time-series data-transform


    【Solution 1】:

    So I used rowwise() to get the desired per-group counts:

      library(dplyr)

      # last week to report: the latest close week
      week_last <- max(mydf$close_week, na.rm = TRUE)
    
      # create complete week grid
      df <- as_tibble(data.frame(week = seq(from = min(mydf$open_week, na.rm = TRUE)
                                           , to = week_last, by = 1)))
      
      want <- df %>% 
        rowwise() %>%
        # IDs active this week (open_week <= week < close_week; NA close = still open)
        mutate(act_ticket_number_list = list(mydf$id[week >= mydf$open_week & 
                                                                week < ifelse(is.na(mydf$close_week),
                                                                              week_last + 1,
                                                                              mydf$close_week)]),
               # ages of the same IDs, measured from this week
               act_ticket_age_list = list(week - mydf$open_week[week >= mydf$open_week & 
                                                                  week < ifelse(is.na(mydf$close_week),
                                                                                week_last + 1,
                                                                                mydf$close_week)]),
               
               act_number_group_0_1 = list(act_ticket_number_list[act_ticket_age_list < 1]),
               act_number_group_1_above = list(act_ticket_number_list[act_ticket_age_list >= 1]),
    
               act_ages_group_0_1 = sum(act_ticket_age_list < 1, na.rm = T),
               act_ages_group_1_above = sum(act_ticket_age_list >= 1, na.rm = T),
    
               active_cnt = length(act_ticket_age_list)) %>% 
        ungroup() %>%
        dplyr::select(!where(is.list))
      
      want
    
    # A tibble: 9 x 4
       week act_ages_group_0_1 act_ages_group_1_above active_cnt
      <dbl>              <int>                  <int>      <int>
    1     1                  2                      0          2
    2     2                  2                      1          3
    3     3                  2                      3          5
    4     4                  2                      3          5
    5     5                  2                      4          6
    6     6                  0                      4          4
    7     7                  0                      2          2
    8     8                  0                      2          2
    9     9                  0                      1          1
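    This wide result already holds the per-group counts; it still needs reshaping to the long Week/group layout of want and a per-group growth rate. A sketch using tidyr::pivot_longer() on the printed values, assuming the same growth conventions that want implies (0 then 0 gives 0, 0 then a positive count gives NA):

```r
library(dplyr)
library(tidyr)

# Wide counts as printed by the answer above
wide <- tibble(week                   = 1:9,
               act_ages_group_0_1     = c(2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L),
               act_ages_group_1_above = c(0L, 1L, 3L, 3L, 4L, 4L, 2L, 2L, 1L))

long <- wide %>%
  # one row per week and group, in the same order as want
  pivot_longer(-week, names_to = "group", values_to = "active") %>%
  mutate(group = if_else(group == "act_ages_group_0_1",
                         "Active Age < 1 day", "Active Age >= 1 day")) %>%
  group_by(group) %>%
  mutate(prev = lag(active),
         # assumed conventions matching want: 0 then 0 is 0, 0 then n > 0 is NA
         active_growth = case_when(prev == 0 & active == 0 ~ 0,
                                   prev == 0               ~ NA_real_,
                                   TRUE                    ~ (active - prev) * 100 / prev)) %>%
  ungroup() %>%
  select(week, group, active, active_growth)
long
```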
    
    
