Posted: 2021-04-15 04:04:29
Question:
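I have a tibble in which each row holds an ID's open date and close date. From these two pieces of information I want to derive how many IDs are active each week, how many are closed each week, and the growth rate of active IDs over time.
For example, ID aa has an open date of week 1 and a close date of week 5.
So ID aa counts as an active_id from week 1 through week 5.
Another ID, bb, has an open date of week 1 but no close date (NA), meaning it has been open since week 1 and has not been closed yet (say the current week is week 10). So ID bb counts as an active_id from week 1 through week 10.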
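The activity rule described above can be written as a small predicate. This is a minimal sketch, not the asker's code: `is_active` is a hypothetical helper, the current week (week 10 in the example) is passed in explicitly, and both the open week and the close week are assumed to count as active, as the aa example suggests.

```r
# Hypothetical helper: is an ID active in week `w`?
# Active from open_week through close_week, both inclusive (assumption);
# a missing close week (NA) means still open, so active through current_week.
is_active <- function(open_week, close_week, w, current_week) {
  end_week <- ifelse(is.na(close_week), current_week, close_week)
  open_week <= w & w <= end_week
}

# The two IDs from the example: aa (weeks 1-5) and bb (opened week 1, never closed)
is_active(open_week = c(1, 1), close_week = c(5, NA), w = 7, current_week = 10)
# aa is no longer active in week 7, bb still is
```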
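library(dplyr)  # provides tibble(), %>%, mutate(), arrange(), if_else(), lag()

# create one row per ID with its open week, age (in weeks) and close week
set.seed(1990)
have <- tibble(id = as.vector(outer(letters, letters, paste0))[1:48],
               open_week = sample(1:10, 48, replace = TRUE),
               age_week = sample(1:7, 48, replace = TRUE)) %>%
  mutate(close_week = open_week + age_week) %>%
  arrange(open_week)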
# some IDs are closed, some are not (about 30% stay open);
# for IDs that are not closed, set close_week to NA
have$close_week[sample(c(TRUE, FALSE), 48, replace = TRUE, prob = c(0.3, 0.7))] <- NA
# for still-open IDs, recompute age up to the current week, max(open_week)
have <- have %>%
  mutate(age_week = if_else(is.na(close_week), max(open_week) - open_week, age_week))
have
> have
# A tibble: 48 x 4
id open_week age_week close_week
<chr> <int> <int> <int>
1 wa 10 0 NA
2 sb 4 1 5
3 ja 8 1 9
4 cb 9 1 NA
5 tb 9 1 NA
6 hb 10 1 11
7 pb 1 2 3
8 la 3 2 5
9 oa 6 2 8
10 rb 6 2 8
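As you may have noticed, I want to generate these metrics for every week (at least over the time range of this reproducible data) for feature-engineering purposes. I don't have enough history to take weekly snapshots, which would certainly simplify many of these operations. Still, I find it an interesting problem to regenerate weekly snapshots of the data from just these three columns (ID, open time, close time).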
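One way to regenerate weekly snapshots from only the three columns is to expand each row into one row per week in which the ID is active. This is a base-R sketch under stated assumptions: the inclusive rule from the example, a current week of `max(open_week)` by default, open_week never later than close_week, and close weeks after the current week (such as hb closing in week 11) clipped to the current week. `expand_weekly` and the toy data are hypothetical.

```r
# Hypothetical helper: expand (id, open_week, close_week) rows into an
# id-by-week panel with one row per week in which the ID is active.
expand_weekly <- function(df, current_week = max(df$open_week)) {
  stopifnot(all(df$open_week <= current_week))
  # NA close_week = still open, so active through the current week;
  # close weeks after the current week are clipped to it (an assumption)
  end_week <- pmin(ifelse(is.na(df$close_week), current_week, df$close_week),
                   current_week)
  n_weeks <- end_week - df$open_week + 1

  data.frame(
    id         = rep(df$id, times = n_weeks),
    open_week  = rep(df$open_week, times = n_weeks),
    close_week = rep(df$close_week, times = n_weeks),
    week       = unlist(Map(seq, df$open_week, end_week))
  )
}

toy <- data.frame(id = c("aa", "bb"),
                  open_week = c(1L, 1L),
                  close_week = c(5L, NA))
snap <- expand_weekly(toy, current_week = 10)
table(snap$id)   # aa appears in 5 weeks (1-5), bb in 10 weeks (1-10)
```

With the question's data this would be `expand_weekly(have)`; `table(snap$week)` on the resulting panel then gives the number of active IDs per week.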
# Weekly time series
# the active_id, close_id, median age week and active_growth_rate values below are fictitious,
# not actual values computed from the have data above
want <- tibble(open_week = seq(min(have$open_week), max(have$open_week)),
               active_id = sample(10:18, length(open_week), replace = TRUE),
               close_id = 20 - active_id,
               median_age_week_active = sample(2:6, length(open_week), replace = TRUE),
               median_age_week_closed = sample(2:6, length(open_week), replace = TRUE),
               active_growth_rate = ((active_id - lag(active_id)) / active_id) * 100)
> want
# A tibble: 10 x 6
   open_week active_id close_id median_age_week_active median_age_week_closed active_growth_rate
<int> <int> <dbl> <int> <int> <dbl>
1 1 12 8 4 2 NA
2 2 10 10 3 4 -20
3 3 11 9 6 6 9.09
4 4 11 9 4 3 0
5 5 16 4 3 5 31.2
6 6 10 10 3 3 -60
7 7 14 6 4 5 28.6
8 8 10 10 4 2 -40
9 9 18 2 4 6 44.4
10 10 18 2 4 4 0
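The `want` columns can also be computed directly from the open and close weeks without building snapshots first. This is a base-R sketch under assumptions the question leaves open: close_id counts IDs whose close_week equals that week; the age of an active ID in week w is w - open_week; the age of a closed ID is close_week - open_week; and the current week is the last entry of `weeks`. The growth rate reproduces the question's own formula (change divided by the current week's count). `weekly_metrics` and the toy data are hypothetical.

```r
# Hypothetical helper: weekly metrics computed directly from open/close weeks.
# Assumes open_week <= close_week and that `weeks` ends at the current week.
weekly_metrics <- function(df, weeks = seq(min(df$open_week), max(df$open_week))) {
  current_week <- max(weeks)
  # Still-open IDs (NA close_week) stay active through the current week
  end_week <- ifelse(is.na(df$close_week), current_week, df$close_week)

  per_week <- lapply(weeks, function(w) {
    active <- df$open_week <= w & w <= end_week            # active in week w
    closed <- !is.na(df$close_week) & df$close_week == w   # closed in week w
    data.frame(
      week      = w,
      active_id = sum(active),
      close_id  = sum(closed),
      # Assumed definition: age of an active ID in week w is w - open_week
      median_age_week_active = if (any(active))
        as.numeric(median(w - df$open_week[active])) else NA_real_,
      # Assumed definition: age of a closed ID is close_week - open_week
      median_age_week_closed = if (any(closed))
        as.numeric(median(df$close_week[closed] - df$open_week[closed])) else NA_real_
    )
  })
  out <- do.call(rbind, per_week)

  # Same formula as in the question: change divided by the *current* count
  prev <- c(NA, head(out$active_id, -1))
  out$active_growth_rate <- (out$active_id - prev) / out$active_id * 100
  out
}

toy <- data.frame(id = c("aa", "bb", "cc"),
                  open_week = c(1L, 1L, 3L),
                  close_week = c(5L, NA, 4L))
m <- weekly_metrics(toy, weeks = 1:6)
m
```

Applied to the question's data, `weekly_metrics(have)` returns one row per week from 1 to 10. The growth rate is undefined (Inf or NaN) in any week with zero active IDs; dividing by the previous week's count instead would give the more conventional week-over-week growth rate.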
Discussion:
-
Did the strategy not work? If it isn't working as expected, I can try a different approach.
-
@AnilGoyal, I'll get back to you as soon as I can, thanks :)
Tags: r time-series data-transform