R - Function or script to populate a dataframe with accumulative partial calculations (not cum sum): answers

【Question title】: R - Function or script to populate a dataframe with accumulative partial calculations (not cum sum)
【Posted】: 2019-10-30 14:55:39
【Question description】:

This is a very specific challenge. Suppose I have this table; think of a typical bank database (I'm using data.table, by the way):

customer_id; month; balance
1;1;100
1;2;110
1;3;140
1;4;70

I need a script or function that returns, for each row, the ratio of the balance to that customer's maximum historical balance so far.

customer_id; month; balance; ratio
1;1;100;1       # 1 because 100 balance is both the current datapoint and the max value so far
1;2;110;1.1     # 1.1 because 110 balance is 1.1 of the prior max value, 100
1;3;140;1.27    # 1.27 because it's 140 divided by the prior max value, 110
1;4;70;0.5      # 0.5 because it's 70 divided by the prior max value, 140

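I know some dplyr or data.table methods that can be used for cumulative calculations, such as cumsum. However, this one has a twist that I can't find anywhere online.

Thanks.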

【Question discussion】:

Tags: r dplyr data.table

【Solution 1】:

You can do this fairly easily with dplyr, using cummax (cumulative maximum) and lag (to get the previous value):

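library(dplyr)
# For each customer, divide the balance by the running max of the earlier rows;
# the first row has no earlier value, so it is divided by its own balance.
dd %>% 
  group_by(customer_id) %>% 
  mutate(ratio = balance/lag(cummax(balance), default=first(balance)))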

#   customer_id month balance ratio
#         <int> <int>   <int> <dbl>
# 1           1     1     100  1   
# 2           1     2     110  1.1 
# 3           1     3     140  1.27
# 4           1     4      70  0.5

where

dd <- read.table(text="
customer_id; month; balance
1;1;100
1;2;110
1;3;140
1;4;70", sep=";", header=TRUE)
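
Since the question mentions data.table, the same idea could probably be written there along the following lines. This is a minimal sketch, not part of the original answer: it assumes rows should be sorted by month within each customer, and the second customer is made-up data added only to show that the grouping works.

```r
library(data.table)

# Toy data: customer 1 is from the question, customer 2 is hypothetical
dt <- data.table(
  customer_id = c(1L, 1L, 1L, 1L, 2L, 2L),
  month       = c(1:4, 1:2),
  balance     = c(100, 110, 140, 70, 50, 40)
)

# Make sure rows are in chronological order within each customer
setorder(dt, customer_id, month)

# Prior running max = cummax shifted down by one row;
# the first row of each customer falls back to its own balance
dt[, ratio := balance / shift(cummax(balance), fill = balance[1L]),
   by = customer_id]

print(dt)
```

Here shift() plays the role of dplyr's lag(), and by = customer_id replaces group_by().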

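【Discussion】:

Thank you very much! This code does what I need, but it now needs an extra tweak to make it fail-safe for datasets with NAs. Could you take a look at the new thread I opened? link. Thanks!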
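
The NA issue comes from cummax itself: once it meets an NA, every later value of the running max is NA too. The linked thread is not reproduced here, so the following is only one possible reading of "fail-safe", sketched in base R. It assumes that NA balances should be skipped when tracking the running max and should yield an NA ratio for their own row. The helper name ratio_to_prior_max is hypothetical.

```r
# Hypothetical helper: ratio of each balance to the max of the earlier,
# non-NA balances (assumption: NAs are ignored by the running max)
ratio_to_prior_max <- function(balance) {
  # Treat NA as -Inf so it never raises the running max
  running_max <- cummax(ifelse(is.na(balance), -Inf, balance))
  # Shift down one row: the max over all earlier rows
  prior_max <- c(NA_real_, running_max)[seq_along(balance)]
  # No valid earlier balance yet: fall back to the current balance
  no_prior <- is.na(prior_max) | is.infinite(prior_max)
  prior_max[no_prior] <- balance[no_prior]
  balance / prior_max
}

# Toy data based on the question, with one balance replaced by NA
dd <- data.frame(
  customer_id = c(1, 1, 1, 1),
  month       = 1:4,
  balance     = c(100, NA, 140, 70)
)

# Apply the helper per customer
dd$ratio <- ave(dd$balance, dd$customer_id, FUN = ratio_to_prior_max)
print(dd)
```

With this choice, month 3 is compared against 100 (the last valid maximum) rather than against the NA in month 2. Other policies, such as carrying the previous balance forward, would need a different helper.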