How to obtain the mean of the values in the previous and next rows in R? (Answers)

【Question title】: How to obtain the mean between values located on the previous and next row in R?
【Posted】: 2020-10-26 09:11:21
【Question description】:

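I have a data frame in R that contains the expenditures of many groups over several years. It basically looks like this (grey columns):

I would like to add, for each year, the mean of the previous year's and the next year's expenditure, as shown in the yellow column.

I have tried this code: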

expenditures %>%
 group_by(id) %>%
 mutate(
   avg_exp = ifelse(year != 2011 && year != 2008,
                        mean(c(
                          Spending[Year %in% (Year-1)],
                          Spending[Year %in% (Year+1)])),
                        NA)) %>%
 View()

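However, I keep getting all sorts of strange numbers. First, ifelse only ever applies the else branch, even though the Year column is stored as integer. Second, even when I move the mean calculation into the else branch, every row (within each group) is filled with the same number, and I don't know where it comes from (it is close to the group's overall mean, but not identical).

Is there a simple way to do this? Thanks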
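
For reference, both symptoms can be reproduced outside dplyr on a single group. The sketch below assumes one group holds the values of ID 1 from the sample data posted with the answers; the variable names are illustrative only. Inside mutate(), Year is the whole column for the group, so Year %in% (Year - 1) is a mask over all rows rather than a per-row lookup, and mean() collapses everything to one number that is then recycled. Separately, && only ever looks at a single element: older R versions silently used the first one (2008 here, so the test is FALSE and only the else value comes back), and as far as I know R 4.3.0 and later raise an error for longer vectors instead.

```r
# One group's worth of data (values of ID 1 from the sample data in the answers)
Year     <- 2008:2011
Spending <- c(55, 57, 65, 70)

# Not a per-row lookup: each mask covers the whole group at once
prev_mask <- Year %in% (Year - 1)   # TRUE TRUE TRUE FALSE
next_mask <- Year %in% (Year + 1)   # FALSE TRUE TRUE TRUE

# A single number for the whole group, which ifelse() then recycles
mystery <- mean(c(Spending[prev_mask], Spending[next_mask]))
mystery          # 61.5
mean(Spending)   # 61.75, close but not the same

# What `&&` effectively tested in older R: the first element only
first_only <- (Year != 2011)[1] && (Year != 2008)[1]   # FALSE

# The vectorised `&` gives one TRUE/FALSE per row instead
per_row <- Year != 2011 & Year != 2008   # FALSE TRUE TRUE FALSE
```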

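【Comments on the question】:

I think you need lag, and to change && to &
Oh, right, switching to & solved the ifelse problem, but it still fills all the correct rows with the same mysterious number
Another option is to treat this as a linear filtering problem, e.g. as.vector(stats::filter(c(80,87,90,95), c(0.5,0,0.5))). There are more elaborate applications here: stackoverflow.com/questions/18436574/…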
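
The filtering idea from the last comment can be applied per group with base R's ave(). This is a minimal sketch, assuming the sample data from the answers, rows already sorted by Year within each ID, and at least three rows per group (as far as I know stats::filter() stops when the filter is longer than the series). neighbour_mean is a made-up helper name.

```r
# Sample data, rebuilt here so the snippet runs on its own
expenditures <- data.frame(
  ID       = rep(1:3, each = 4),
  Year     = rep(2008:2011, times = 3),
  Spending = c(55, 57, 65, 70, 80, 87, 90, 95, 120, 123, 130, 135)
)

# Centred 3-point filter: half the previous value, none of the current one,
# half the next value; the ends have no full window and come back as NA
neighbour_mean <- function(x) as.numeric(stats::filter(x, c(0.5, 0, 0.5)))

# ave() applies the helper within each ID and keeps the original row order
expenditures$Mean <- ave(expenditures$Spending, expenditures$ID,
                         FUN = neighbour_mean)
expenditures
```

The weights c(0.5, 0, 0.5) are symmetric, so it does not matter which end of the filter lines up with the earlier year.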

Tags: r dplyr

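【Solution 1】:

After grouping by "ID", we can add the lag and the lead of Spending with + and divide by 2. The default fill value in both lead and lag is NA, so the first and last "Year" of each group will be NA in the "Mean" column.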

library(dplyr)
expenditures %>% 
    group_by(ID) %>%
    mutate(Mean = (lead(Spending) + lag(Spending))/2)

Output

# A tibble: 12 x 4
# Groups:   ID [3]
#      ID  Year Spending  Mean
#   <int> <int>    <dbl> <dbl>
# 1     1  2008       55  NA  
# 2     1  2009       57  60  
# 3     1  2010       65  63.5
# 4     1  2011       70  NA  
# 5     2  2008       80  NA  
# 6     2  2009       87  85  
# 7     2  2010       90  91  
# 8     2  2011       95  NA  
# 9     3  2008      120  NA  
#10     3  2009      123 125  
#11     3  2010      130 129  
#12     3  2011      135  NA
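
To see where the NA values at the ends come from, here is what lag and lead return on a plain vector. A small sketch using the Spending values of ID 1; dplyr::lag is written out in full because stats also has a function called lag. Both functions shift by row position, not by the value of Year, so this assumes the rows are sorted by Year within each group and that no year is missing.

```r
library(dplyr)

x <- c(55, 57, 65, 70)   # Spending for ID 1 in the sample data

dplyr::lag(x)    # NA 55 57 65  (value from the previous row)
dplyr::lead(x)   # 57 65 70 NA  (value from the next row)

# NA + number is NA, so the first and last positions stay NA
neighbour_mean <- (dplyr::lead(x) + dplyr::lag(x)) / 2
neighbour_mean   # NA 60.0 63.5 NA
```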

Another option is to cbind the lead/lag outputs and then use rowMeans

expenditures %>%
   group_by(ID) %>%
   mutate(Mean = rowMeans(cbind(lead(Spending), lag(Spending))))
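
One practical difference of the rowMeans form is that it accepts na.rm. The variation below is not what the question asks for (it wants NA in the first and last year); it is only a sketch of what happens if na.rm = TRUE is added, assuming the same sample data. The edge rows then fall back to their single available neighbour. edge_filled is an illustrative name.

```r
library(dplyr)

# Sample data, rebuilt here so the snippet runs on its own
expenditures <- data.frame(
  ID       = rep(1:3, each = 4),
  Year     = rep(2008:2011, times = 3),
  Spending = c(55, 57, 65, 70, 80, 87, 90, 95, 120, 123, 130, 135)
)

edge_filled <- expenditures %>%
  group_by(ID) %>%
  # na.rm = TRUE averages whatever neighbours exist in each row
  mutate(Mean = rowMeans(cbind(lead(Spending), lag(Spending)), na.rm = TRUE)) %>%
  ungroup()

edge_filled$Mean[1:4]   # 57.0 60.0 63.5 65.0
```

A group with a single row has no neighbours at all, and rowMeans then returns NaN rather than NA.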

Data

expenditures <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L), Year = c(2008L, 2009L, 2010L, 2011L, 2008L, 2009L, 2010L, 
2011L, 2008L, 2009L, 2010L, 2011L), Spending = c(55, 57, 65, 
70, 80, 87, 90, 95, 120, 123, 130, 135)), class = "data.frame",
row.names = c(NA, 
-12L))


【Solution 2】:

Here is a base R option using embed within ave

transform(
  expenditures,
  Mean = ave(Spending,ID,FUN = function(x) c(NA,rowMeans(embed(x,3)[,-2]),NA))
)

which gives

   ID Year Spending  Mean
1   1 2008       55    NA
2   1 2009       57  60.0
3   1 2010       65  63.5
4   1 2011       70    NA
5   2 2008       80    NA
6   2 2009       87  85.0
7   2 2010       90  91.0
8   2 2011       95    NA
9   3 2008      120    NA
10  3 2009      123 125.0
11  3 2010      130 129.0
12  3 2011      135    NA

Data

> dput(expenditures)
structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L), Year = c(2008L, 2009L, 2010L, 2011L, 2008L, 2009L, 2010L,
2011L, 2008L, 2009L, 2010L, 2011L), Spending = c(55, 57, 65,
70, 80, 87, 90, 95, 120, 123, 130, 135)), class = "data.frame", row.names = c(NA, 
-12L))
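
The embed step is compact, so here it is unpacked on a single group. A sketch using the Spending values of ID 1; the intermediate names are illustrative. The drop = FALSE is my addition: with exactly three rows in a group, embed returns a one-row matrix, and plain [, -2] would reduce it to a vector that rowMeans rejects. As far as I can tell, embed itself stops on groups shorter than three rows, so this approach assumes at least three years per ID.

```r
x <- c(55, 57, 65, 70)   # Spending for ID 1

# Each row is a sliding window of 3 values, newest first
m <- embed(x, 3)
m
#      [,1] [,2] [,3]
# [1,]   65   57   55
# [2,]   70   65   57

# Column 1 is the next value, column 2 the current one, column 3 the previous one;
# dropping column 2 leaves just the two neighbours
inner <- rowMeans(m[, -2, drop = FALSE])   # 60.0 63.5

# Pad with NA for the first and last rows, which have only one neighbour
padded <- c(NA, inner, NA)
padded   # NA 60.0 63.5 NA
```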


【Solution 3】:

For completeness, here is a data.table answer using shift:

library(data.table)

setDT(expenditures)
expenditures[, Mean := (shift(Spending) + shift(Spending, type = 'lead'))/2, ID]
expenditures

#    ID Year Spending  Mean
# 1:  1 2008       55    NA
# 2:  1 2009       57  60.0
# 3:  1 2010       65  63.5
# 4:  1 2011       70    NA
# 5:  2 2008       80    NA
# 6:  2 2009       87  85.0
# 7:  2 2010       90  91.0
# 8:  2 2011       95    NA
# 9:  3 2008      120    NA
#10:  3 2009      123 125.0
#11:  3 2010      130 129.0
#12:  3 2011      135    NA
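
Note that setDT converts expenditures to a data.table in place, and := then adds the column by reference. If the original data.frame should stay untouched, a copy can be made first. A minimal sketch, assuming the same sample data; dt is an illustrative name, and the type and by arguments are spelled out for readability.

```r
library(data.table)

# Sample data, rebuilt here so the snippet runs on its own
expenditures <- data.frame(
  ID       = rep(1:3, each = 4),
  Year     = rep(2008:2011, times = 3),
  Spending = c(55, 57, 65, 70, 80, 87, 90, 95, 120, 123, 130, 135)
)

# as.data.table() returns a copy, so `expenditures` stays a plain data.frame
dt <- as.data.table(expenditures)

# shift() defaults to a lag of 1 filled with NA; type = "lead" looks forward instead
dt[, Mean := (shift(Spending, n = 1, type = "lag") +
              shift(Spending, n = 1, type = "lead")) / 2,
   by = ID]
dt
```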
