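【Question title】: Apply a rolling mean to a database by an index
【Posted】: 2017-08-03 22:34:27
【Question】:

I want to calculate a rolling mean of the data in a single data frame, separately for each of several ids. See the example dataset below.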

date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04", 
          "2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",  
          "2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02", 
          "2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06", 
          "2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
           "b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)

df <- data.frame(date, index, x, y, z)

I want to calculate the rolling means of x, y and z, first for a and then for b.

I tried the following, but got an error.

test <- tapply(df, df$index, FUN = rollmean(df, 5, fill=NA))

Error:

Error in xu[k:n] - xu[c(1, seq_len(n - k))] : 
  non-numeric argument to binary operator

The fact that index is a character column seems to be the problem, but I need it in order to compute the means by group...

【Comments】:

    Tags: r statistics zoo


    【Solution 1】:

    1) ave  Try ave rather than tapply, and make sure it is applied only to the columns of interest, i.e. columns 3, 4 and 5.

    library(zoo)  # provides rollmean
    roll <- function(x) rollmean(x, 5, fill = NA)
    cbind(df[1:2], lapply(df[3:5], function(x) ave(x, df$index, FUN = roll)))
    

    giving:

             date index        x         y        z
    1  2015-02-01     a       NA        NA       NA
    2  2015-02-02     a       NA        NA       NA
    3  2015-02-03     a 66.50522 127.45650 129.8472
    4  2015-02-04     a 61.71320 123.83633 129.7673
    5  2015-02-05     a 56.56125 120.86158 126.1371
    6  2015-02-06     a 66.13340 119.93428 127.1819
    7  2015-02-07     a 59.56807 105.83208 125.1244
    8  2015-02-08     a 49.98779  95.66024 139.2321
    9  2015-02-09     a       NA        NA       NA
    10 2015-02-10     a       NA        NA       NA
    11 2015-02-01     b       NA        NA       NA
    12 2015-02-02     b       NA        NA       NA
    13 2015-02-03     b 55.71327 117.52219 139.3961
    14 2015-02-04     b 54.58450 107.81763 142.6101
    15 2015-02-05     b 50.48102 104.94084 136.3167
    16 2015-02-06     b 37.89790  95.45489 135.4044
    17 2015-02-07     b 33.05259  85.90916 150.8673
    18 2015-02-08     b 49.91385  90.04940 147.1376
    19 2015-02-09     b       NA        NA       NA
    20 2015-02-10     b       NA        NA       NA
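
The original tapply call failed because rollmean(df, 5, fill = NA) is evaluated once, on the whole data frame including its non-numeric columns, before tapply ever sees it. As a minimal sketch of the group-wise behavior of ave, the following uses a hypothetical toy data frame (toy, not part of the original post) whose values are easy to check by hand. It assumes the zoo package is installed.

```r
library(zoo)

# Hypothetical toy data: two groups with easily checked values.
toy <- data.frame(
  index = rep(c("a", "b"), each = 10),
  x = as.numeric(c(1:10, seq(10, 100, by = 10)))
)

# Centered 5-point rolling mean, padded with NA at both ends.
roll <- function(v) rollmean(v, 5, fill = NA)

# ave() applies roll() within each level of index and keeps the row order,
# so no window spans the boundary between groups a and b.
toy$x_roll <- ave(toy$x, toy$index, FUN = roll)
print(toy)
```

Each group gets its own two leading and two trailing NA values, which is the same pattern as in the output above.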
    

    2) by  Another approach is to use by. roll2 handles a single group, by applies it to each group and produces a by list, and do.call("rbind", ...) puts the pieces back together.

    roll2 <- function(x) cbind(x[1:2], rollmean(x[3:5], 5, fill = NA))
    do.call("rbind", by(df, df$index, roll2))
    

    giving:

               date index        x         y        z
    a.1  2015-02-01     a       NA        NA       NA
    a.2  2015-02-02     a       NA        NA       NA
    a.3  2015-02-03     a 66.50522 127.45650 129.8472
    a.4  2015-02-04     a 61.71320 123.83633 129.7673
    a.5  2015-02-05     a 56.56125 120.86158 126.1371
    a.6  2015-02-06     a 66.13340 119.93428 127.1819
    a.7  2015-02-07     a 59.56807 105.83208 125.1244
    a.8  2015-02-08     a 49.98779  95.66024 139.2321
    a.9  2015-02-09     a       NA        NA       NA
    a.10 2015-02-10     a       NA        NA       NA
    b.11 2015-02-01     b       NA        NA       NA
    b.12 2015-02-02     b       NA        NA       NA
    b.13 2015-02-03     b 55.71327 117.52219 139.3961
    b.14 2015-02-04     b 54.58450 107.81763 142.6101
    b.15 2015-02-05     b 50.48102 104.94084 136.3167
    b.16 2015-02-06     b 37.89790  95.45489 135.4044
    b.17 2015-02-07     b 33.05259  85.90916 150.8673
    b.18 2015-02-08     b 49.91385  90.04940 147.1376
    b.19 2015-02-09     b       NA        NA       NA
    b.20 2015-02-10     b       NA        NA       NA
    

    3) Wide format  Another approach is to convert df from long format to wide format, in which case a plain rollmean will do.

    rollmean(read.zoo(df, split = 2), 5, fill = NA)
    

    giving:

                    x.a       y.a      z.a      x.b       y.b      z.b
    2015-02-01       NA        NA       NA       NA        NA       NA
    2015-02-02       NA        NA       NA       NA        NA       NA
    2015-02-03 66.50522 127.45650 129.8472 55.71327 117.52219 139.3961
    2015-02-04 61.71320 123.83633 129.7673 54.58450 107.81763 142.6101
    2015-02-05 56.56125 120.86158 126.1371 50.48102 104.94084 136.3167
    2015-02-06 66.13340 119.93428 127.1819 37.89790  95.45489 135.4044
    2015-02-07 59.56807 105.83208 125.1244 33.05259  85.90916 150.8673
    2015-02-08 49.98779  95.66024 139.2321 49.91385  90.04940 147.1376
    2015-02-09       NA        NA       NA       NA        NA       NA
    2015-02-10       NA        NA       NA       NA        NA       NA
    

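    This works because the dates are the same for both groups. If the dates differed, the conversion to wide format could introduce NAs, which rollmean cannot handle. In that case use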

    rollapply(read.zoo(df, split = 2), 5, mean, fill = NA)
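
To illustrate how rollapply with mean behaves around missing values, here is a small sketch on a hypothetical vector v (not from the original post). The na.rm = TRUE variant is an addition for comparison, not something the answer recommends; whether averaging over fewer points is acceptable depends on the analysis.

```r
library(zoo)

# Hypothetical input with a single missing value.
v <- c(1, 2, NA, 4, 5, 6, 7)

# Plain mean: only the windows that contain the NA come out as NA.
strict <- rollapply(v, 3, mean, fill = NA)

# na.rm = TRUE is forwarded to mean(), so each window averages
# whatever non-missing values it contains.
lenient <- rollapply(v, 3, mean, na.rm = TRUE, fill = NA)

print(strict)
print(lenient)
```

In both cases the effect of the NA stays local to the windows that include it.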
    

    Note: Because the input uses random numbers in its definition, we must first call set.seed to make it reproducible. We used this:

    set.seed(123)
    date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04", 
              "2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",  
              "2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02", 
              "2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06", 
              "2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
    index <- c("a","a","a","a","a","a","a","a","a","a",
               "b","b","b","b","b","b","b","b","b","b")
    x <- runif(20,1,100)
    y <- runif(20,50,150)
    z <- runif(20,100,200)
    df <- data.frame(date, index, x, y, z)
    

    【Comments】:

      【Solution 2】:

      This should do the trick using the dplyr and zoo libraries:

      library(dplyr)
      library(zoo)
      
      df %>% 
        group_by(index) %>% 
        mutate(x_mean = rollmean(x, 5, fill = NA),
               y_mean = rollmean(y, 5, fill = NA),
               z_mean = rollmean(z, 5, fill = NA))
      

      You could probably tidy this up with mutate_each or another form of mutate.
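
mutate_each has since been deprecated in dplyr. One possible way to tidy up the repeated calls is across, sketched below on a hypothetical toy data frame (toy is not from the original post). This assumes dplyr 1.0.0 or later and the zoo package.

```r
library(dplyr)
library(zoo)

# Hypothetical toy data with values that are easy to verify.
toy <- data.frame(
  index = rep(c("a", "b"), each = 10),
  x = as.numeric(rep(1:10, times = 2)),
  y = as.numeric(rep(seq(10, 100, by = 10), times = 2))
)

res <- toy %>%
  group_by(index) %>%
  # Apply the same rolling mean to every listed column;
  # .names appends "_mean" to each original column name.
  mutate(across(c(x, y),
                ~ rollmean(.x, 5, fill = NA),
                .names = "{.col}_mean")) %>%
  ungroup()

print(res)
```

The original columns are kept, and x_mean and y_mean are added alongside them, computed within each group.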

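      You can also change the arguments to rollmean to suit your needs, for example align = "right" or na.pad = TRUE (recent versions of zoo deprecate na.pad in favor of fill = NA).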
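
As a quick sketch of what the align argument changes, the following compares the default centered window with a right-aligned (trailing) one on a hypothetical vector v. It uses fill = NA for padding.

```r
library(zoo)

# Hypothetical input: the numbers 1 to 10.
v <- as.numeric(1:10)

# Default align = "center": the window is centered on each point.
centered <- rollmean(v, 5, fill = NA)

# align = "right": the window ends at each point (a trailing mean).
trailing <- rollmean(v, 5, fill = NA, align = "right")

print(centered)
print(trailing)
```

With align = "right" all the padding falls at the start of the series, which is usually what you want when each value should depend only on past observations.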

      【讨论】:
