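R: Define column where first record is a calculation from other columns and following records are updated using previous records from the same column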

【Question title】: R: Define column where first record is a calculation from other columns and following records are updated using previous records from the same column
【Posted】: 2018-12-07 08:15:18
【Question】:

I want to create a column "result" like this (the calculation behind each value is shown in parentheses):

df:
  policy income expense rate  result
1      1     50     250    2   75     (250/2 - 50)
2      1     50      35    2    5     ((75 + 35)/2 - 50)
3      1     50      35    2  -30     ((5 + 35)/2 - 50)
4      2     70     600    3  130     (600/3 - 70)
5      2     70      50    3  -10     ((130 + 50)/3 - 70)
6      2     70      50    3  -56.67  ((-10 + 50)/3 - 70)

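The data frame is grouped, and I can't work out how to repeat the first-record logic for each group. Please let me know how to achieve this.

Thanks for your help.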

【Question comments】:

Tags: r group-by calculated-columns dplyr

【Solution 1】:

Here is a solution using a for loop.

Data:

df <- data.frame(policy = c(1,1,1,2,2,2),
                 income = c(50,50,50,70,70,70),
                 expense = c(250,35,35,600,50,50),
                 rate = c(2,2,2,3,3,3))

First we split the data into groups by policy:

dftemp <- split(df, df$policy)

Then we initialize a list for the results and fill each of its vectors with NA, so that they don't have to grow inside the loop:

# Pre-allocate one NA vector per group, sized to that group's row count
resulttemp <- vector("list", length(dftemp))
for (i in seq_along(resulttemp)) {
  resulttemp[[i]] <- rep(NA_real_, nrow(dftemp[[i]]))
}

Now we loop over the split data to compute the results:

for (i in seq_along(dftemp)) {
  d <- dftemp[[i]]  # rows of the current policy
  for (j in seq_len(nrow(d))) {
    if (j == 1) {
      # first record of the group: computed from the other columns only
      resulttemp[[i]][j] <- d$expense[j] / d$rate[j] - d$income[j]
    } else {
      # later records: feed in the previous result of the same group
      resulttemp[[i]][j] <- (resulttemp[[i]][j - 1] + d$expense[j]) / d$rate[j] - d$income[j]
    }
  }
}

Then we unlist the results and add them to the original data:

df$result <- unlist(resulttemp)

df
  policy income expense rate    result
1      1     50     250    2  75.00000
2      1     50      35    2   5.00000
3      1     50      35    2 -30.00000
4      2     70     600    3 130.00000
5      2     70      50    3 -10.00000
6      2     70      50    3 -56.66667

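Note that the original data must be sorted by group before splitting! unlist() returns the results group by group, and they are assigned back to df purely by position.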
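If sorting first is inconvenient, one possible workaround is base R's unsplit(), which reverses split() and puts each value back on the row it came from. The following is only a sketch under assumptions: df2 is a made-up interleaved version of the toy data, and group_result is a hypothetical helper name. It uses the fact that the first-record formula expense/rate - income is the general formula with a previous result of 0.

```r
# Same toy values as above, but with the two policies interleaved (hypothetical ordering)
df2 <- data.frame(policy  = c(1, 2, 1, 2, 1, 2),
                  income  = c(50, 70, 50, 70, 50, 70),
                  expense = c(250, 600, 35, 50, 35, 50),
                  rate    = c(2, 3, 2, 3, 2, 3))

# Run the recurrence over the rows of one group (helper name chosen for this sketch)
group_result <- function(d) {
  out <- numeric(nrow(d))
  for (j in seq_len(nrow(d))) {
    prev <- if (j == 1) 0 else out[j - 1]  # the first row of a group starts from 0
    out[j] <- (prev + d$expense[j]) / d$rate[j] - d$income[j]
  }
  out
}

per_group <- lapply(split(df2, df2$policy), group_result)

# unsplit() sends every per-group value back to its original row position
df2$result <- unsplit(per_group, df2$policy)
df2
```

Within each policy the rows are still processed in the order they appear in the data, so that order has to be the intended one.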

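【Comments】:

Thanks, LAP. I did want to ask whether there is an alternative to the split function in the first statement, because I have millions of records and it is creating a huge list in memory. Thanks.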
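On the memory concern, one possible way to avoid split() altogether is a single pass over plain column vectors that restarts whenever policy changes. This is only a sketch, not a benchmarked solution: it assumes the rows are already sorted by policy, and the local names (policy, income, expense, rate, result, new_group) are chosen here for illustration. As before, the first record is treated as the general formula with a previous result of 0.

```r
df <- data.frame(policy  = c(1, 1, 1, 2, 2, 2),
                 income  = c(50, 50, 50, 70, 70, 70),
                 expense = c(250, 35, 35, 600, 50, 50),
                 rate    = c(2, 2, 2, 3, 3, 3))

# Pull the columns out once so the loop indexes plain vectors, not the data frame
policy  <- df$policy
income  <- df$income
expense <- df$expense
rate    <- df$rate

n <- nrow(df)
result <- numeric(n)  # pre-allocated, so nothing grows inside the loop

for (k in seq_len(n)) {
  # a row opens a new group if it is the very first row or its policy differs from the row before
  new_group <- k == 1 || policy[k] != policy[k - 1]
  prev <- if (new_group) 0 else result[k - 1]
  result[k] <- (prev + expense[k]) / rate[k] - income[k]
}

df$result <- result
df
```

The only extra memory here is the result vector and the four column references, instead of one data frame copy per group.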
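It might be possible with dplyr's group_by() function, but I'm not sure how to feed each newly computed value into the next one to build the new vector without a for loop.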
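For what it's worth, a running value like this can be written without an explicit for loop using base R's Reduce() with accumulate = TRUE, applied per group through ave(). This is a sketch in base R rather than dplyr, and it swaps in a different technique from the answer above; step is a hypothetical helper name. It should reproduce the result column shown in the answer, but it is not necessarily faster than the loop.

```r
df <- data.frame(policy  = c(1, 1, 1, 2, 2, 2),
                 income  = c(50, 50, 50, 70, 70, 70),
                 expense = c(250, 35, 35, 600, 50, 50),
                 rate    = c(2, 2, 2, 3, 3, 3))

# One accumulation step: 'prev' is the previous result, 'k' is a row number of df
step <- function(prev, k) (prev + df$expense[k]) / df$rate[k] - df$income[k]

# ave() passes each group's row numbers to FUN and writes the answers back in place.
# Reduce() starts from init = 0 and keeps every intermediate value;
# [-1] drops the initial 0 so one value per row remains.
df$result <- ave(seq_len(nrow(df)), df$policy, FUN = function(rows) {
  Reduce(step, rows, init = 0, accumulate = TRUE)[-1]
})
df
```

Because ave() works on row numbers, this version does not require the data to be sorted by policy, only that rows within each policy are in the intended order.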