【问题标题】:Compute all fixed window averages with dplyr and RcppRoll使用 dplyr 和 RcppRoll 计算所有固定窗口平均值
【发布时间】:2018-09-29 06:36:41
【问题描述】:

我想使用 dplyr 和 RcppRoll 计算所有(或至少许多)固定窗口平均值。例如,如果我想根据storms 数据计算前 4、5 和 6 个时间步的平均风速,我可以使用以下方法:

library(dplyr)
library(RcppRoll)

set.seed(1)
storms <- storms[storms$name %in% sample(storms$name, size = 4),]

storms %>%
  select(name, year, month, day, hour, wind) %>%
  group_by(name) %>%
  arrange(name, year, month, day, hour) %>%
  mutate_at("wind", .funs = funs(
    "avg_4" = roll_meanr(., n = 4, fill = NA),
    "avg_5" = roll_meanr(., n = 5, fill = NA),
    "avg_6" = roll_meanr(., n = 6, fill = NA)
  ))

这可行,但是如果我想计算 2 到 20 窗口的所有固定窗口平均值,我会厌倦复制和粘贴 funs() 内的行。

似乎我应该能够以某种方式对其进行参数化,但我还没有弄清楚如何。

【问题讨论】:

    标签: r dplyr


    【解决方案1】:

    使用 Base R,希望对您有所帮助:

    storms_wind <- storms %>%
        select(name, year, month, day, hour, wind) %>%
        group_by(name) %>%
        arrange(name, year, month, day, hour)
    
    multi_avg <- function(df, start, end) {
                     for(i in (strat:end)){
                     varname <- paste("avg", i , sep="_")
                     df[[varname]] <- with(df, roll_meanr(wind, n = i, fill = NA))
                    }
                 df
               }
    
    
    multi_avg(df=storms_wind, start=4,end=20) 
    

    【讨论】:

      【解决方案2】:

      只需使用引用和取消引用的力量! 这就是你所拥有的:

      library(dplyr)
      library(RcppRoll)
      
      set.seed(1)
      storms <- storms[storms$name %in% sample(storms$name, size = 4),]
      
      storms_subset <- storms %>%
        select(name, year, month, day, hour, wind) %>%
        group_by(name) %>%
        arrange(name, year, month, day, hour) %>%
        mutate_at("wind", .funs = funs(
          "avg_4" = roll_meanr(., n = 4, fill = NA),
          "avg_5" = roll_meanr(., n = 5, fill = NA),
          "avg_6" = roll_meanr(., n = 6, fill = NA)
        ))
      

      现在让我们创建一个函数,为不同的xs 和ns 构建一组表达式,例如roll_meanr(x, n)

      make_rollmeans <- function(..., .n = 3) {
        # this line captures vars you typed in
        .dots <- rlang::exprs(...)
      
        # now you iterate over captured variables...
        q <- purrr::map(.dots, function(.var) {
          # ... and over window sizes
          purrr::map(.n, function(.nn) {
            # for each (variable, window) pair make an expression
            rlang::expr(RcppRoll::roll_meanr(!!.var, !!(.nn)))
          }) %>% 
            # set proper names by combining variable name, "avg", and window size
            purrr::set_names(paste0(as.character(.var), "_avg_", .n))
        }) %>%
          # and finally remove inner structure of list of expressions
          # after that you'll have a list of expressions with depth 1 
          purrr::flatten() 
        q
      }
      

      所有的魔法都来自rlang::expr(RcppRoll::roll_meanr(!!.var, !!(.nn)))。 使用!!.var,您可以将.var 替换为输入变量名称,即wind。 使用!!.nn,您将.nn 替换为number。 接下来,用rlang::expr(...) 引用表达式。

      此函数获取不带"" 的变量名和窗口大小向量。输出如下所示:

      make_rollmeans(wind, pressure, .n = c(3, 5))
      #> $wind_avg_3
      #> RcppRoll::roll_meanr(wind, 3)
      #> 
      #> $wind_avg_5
      #> RcppRoll::roll_meanr(wind, 5)
      #> 
      #> $pressure_avg_3
      #> RcppRoll::roll_meanr(pressure, 3)
      #> 
      #> $pressure_avg_5
      #> RcppRoll::roll_meanr(pressure, 5)
      

      你可以看到你正在寻找的表达式。

      接下来,您可以将 make_rollmeans 放入 mutate() 调用中,使用 !!! (bang-bang-bang) 运算符取消引用由它构建的表达式。

      select(storms_subset, wind) %>% mutate(!!!make_rollmeans(wind, .n = 3:20))
      #> Adding missing grouping variables: `name`
      #> # A tibble: 261 x 20
      #> # Groups:   name [4]
      #>    name     wind wind_avg_3 wind_avg_4 wind_avg_5 wind_avg_6 wind_avg_7
      #>    <chr>   <int>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
      #>  1 Ernesto    30       NA         NA          NA        NA         NA  
      #>  2 Ernesto    30       NA         NA          NA        NA         NA  
      #>  3 Ernesto    30       30.0       NA          NA        NA         NA  
      #>  4 Ernesto    35       31.7       31.2        NA        NA         NA  
      #>  5 Ernesto    40       35.0       33.8        33.       NA         NA  
      #>  6 Ernesto    50       41.7       38.8        37.       35.8       NA  
      #>  7 Ernesto    60       50.0       46.2        43.       40.8       39.3
      #>  8 Ernesto    55       55.0       51.2        48.       45.0       42.9
      #>  9 Ernesto    50       55.0       53.8        51.       48.3       45.7
      #> 10 Ernesto    45       50.0       52.5        52.       50.0       47.9
      #> # ... with 251 more rows, and 13 more variables: wind_avg_8 <dbl>,
      #> #   wind_avg_9 <dbl>, wind_avg_10 <dbl>, wind_avg_11 <dbl>,
      #> #   wind_avg_12 <dbl>, wind_avg_13 <dbl>, wind_avg_14 <dbl>,
      #> #   wind_avg_15 <dbl>, wind_avg_16 <dbl>, wind_avg_17 <dbl>,
      #> #   wind_avg_18 <dbl>, wind_avg_19 <dbl>, wind_avg_20 <dbl>
      

      我希望结果与您所要求的相同。 :)

      【讨论】:

        猜你喜欢
        • 1970-01-01
        • 2018-03-21
        • 1970-01-01
        • 2021-05-08
        • 1970-01-01
        • 2014-12-09
        • 1970-01-01
        • 2018-10-25
        相关资源
        最近更新 更多