【发布时间】:2020-07-09 14:35:50
【问题描述】:
如果满足多个条件,我正在尝试使用重置选项进行累积总和。更具体地说,我想对由id 分组的变量amount 和count 进行累积求和,如果满足这两个条件,则再次从0 重置/开始:amount >= 10 和count >= 3。我还想创建一个新列,如果满足这些条件,则包含 1,否则为 0。
数据样本:
df <- data.frame(
date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01", "2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01", "2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01")),
id = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C"),
amount = c(1, 9, 5, 5, 6, 2, 10, 4, 8, 10, 6, 5, 5, 1, 6, 5, 5, 5),
count = c(0, 2, 5, 4, 5, 1, 0, 0, 0, 0, 2, 1, 1, 1, 1, 2, 1, 0)
)
期望的输出:
df <- data.frame(
date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01", "2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01", "2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01", "2020-06-01")),
id = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C"),
amount = c(1, 9, 5, 5, 6, 2, 10, 4, 8, 10, 6, 5, 5, 1, 6, 5, 5, 5),
count = c(0, 2, 5, 4, 5, 1, 0, 0, 0, 0, 2, 1, 1, 1, 1, 2, 1, 0),
amount_cumsum = c(1, 10, 15, 5, 11, 2, 10, 14, 22, 32, 38, 43, 5, 6, 12, 5, 10, 5),
count_cumsum = c(0, 2, 7, 4, 9, 1, 0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 3, 0),
condition_met = c(0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0)
)
如果可能,我想要dplyr 解决方案,但也欢迎使用替代方案。谢谢!
更新:一个被作者删掉的答案几乎解决了问题:
df %>% group_by(id) %>%
mutate(
amount_cumsum = purrr::accumulate(.x = amount, .f = ~ if_else(condition = .x < 10, true = .x + .y, false = .y)),
count_cumsum = purrr::accumulate(.x = count, .f = ~ if_else(condition = .x < 3, true = .x + .y, false = .y)),
condition_met = as.integer(amount_cumsum >= 10 & count_cumsum >= 3)
)
或者,或者:
df %>% group_by(id) %>%
mutate(
amount_cumsum = purrr::accumulate(.x = amount, .f = ~ case_when(.x < 10 ~ .x + .y, TRUE ~ .y)),
count_cumsum = purrr::accumulate(.x = count, .f = ~ case_when(.x < 3 ~ .x + .y, TRUE ~ .y)),
condition_met = as.integer(amount_cumsum >= 10 & count_cumsum >= 3)
)
如果一个变量满足条件,上面的答案会重置累积总和,但不考虑是否满足另一个条件。
【问题讨论】: