Fast R implementation of an Exponentially Weighted Moving Average? (Answers)

【Question title】: Fast R implementation of an Exponentially Weighted Moving Average?
【Posted】: 2017-03-13 21:45:45
【Question description】:

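I want to compute an exponentially weighted moving average (with the parameterization defined here) on a vector in R. Is there a better implementation than my first attempt below?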

My first attempt is:

ewma <- function(x, a) {
  n <- length(x)
  s <- rep(NA,n)
  s[1] <- x[1]
  if (n > 1) {
    for (i in 2:n) {
      s[i] <- a * x[i] + (1 - a) * s[i-1]
    }
  }
  return(s)
}

y <- 1:1e7
system.time(s <- ewma(y,0.5))
#user  system elapsed 
#   2.48    0.00    2.50

In my second attempt, I thought I could do better by vectorizing:

ewma_vectorized <- function(x, a) {
  n <- length(x)
  w <- cumprod(c(1, rep(1-a, n-1)))
  x1_contribution <- w * x[1]
  w <- a * w
  x <- x[-1]
  s <- apply(as.array(1:(n-1)), 1, function(i,x,w){sum(w[i:1] * x[1:i])}, x=x, w=w)
  s <- x1_contribution + c(0,s)
  return(s)
}

system.time(s <- ewma_vectorized(y,0.5))
# I stopped the program after it continued to run for 4min

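I suppose I shouldn't be too surprised by the result of the second attempt. It was a pretty ugly attempt at vectorization. But there has to be something along these lines that improves on my first attempt... right?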
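For what it's worth, the recursion can be unrolled into a closed form, s[t] = (1 - a)^(t - 1) * x[1] + a * sum over k = 2..t of (1 - a)^(t - k) * x[k], which allows a genuinely vectorized version built on cumsum. The following is only a minimal sketch for short inputs: the name ewma_cumsum is hypothetical, and because it divides by (1 - a)^(t - 1), the weights underflow for long vectors or large a, so it would not work on the 1e7-element example above.

```r
# Hypothetical helper: closed-form EWMA via cumsum (short vectors only).
ewma_cumsum <- function(x, a) {
  w <- (1 - a)^(seq_along(x) - 1)   # w[t] = (1 - a)^(t - 1)
  # s[t] = w[t] * (x[1] + a * sum_{k = 2..t} x[k] / w[k])
  w * cumsum(c(x[1], a * x[-1] / w[-1]))
}

ewma_cumsum(c(1, 2, 3, 4), 0.5)
# [1] 1.000 1.500 2.250 3.125
```

The trade-off is numerical: the loop is stable for any length, while this form trades stability for a single pass through vectorized primitives.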

Update:

I did find a better implementation here and adapted it as follows:

ewma_vectorized_v2 <- function(x, a) {
  s1 <- x[1]
  sk <- s1
  # a weights the new observation, matching ewma() above
  s <- vapply(x[-1], function(x) sk <<- a * x + (1 - a) * sk, 0)
  s <- c(s1, s)
  return(s)
}

system.time(s <- ewma_vectorized_v2(y,0.5))
# user  system elapsed 
#   1.74    0.01    1.76

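【Question comments】:

Are you just looking to use a fast implementation? The TTR package implements it in C.
I was trying to do this in R and avoid using any packages. But that's a useful link.
@Gregor: That's a terrible package. :P The OP could use stats::filter instead.
@JoshuaUlrich Ha! On that matter, I will of course defer to you. I admit I haven't used it; I was just following Gabor's link here.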

Tags: r apply

【Solution 1】:

You can use stats::filter:

ewma.filter <- function (x, ratio) {
  # recursive filter: s[i] = ratio * x[i] + (1 - ratio) * s[i-1], seeded with x[1]
  c(stats::filter(x * ratio, 1 - ratio, "recursive", init = x[1]))
}
set.seed(21)
x <- rnorm(1e4)
all.equal(ewma.filter(x, 0.9), ewma(x, 0.9))
# [1] TRUE
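
To see why this reproduces the loop: with method "recursive", stats::filter computes y[i] = x[i] + f * y[i-1], so passing ratio * x with the coefficient 1 - ratio gives the EWMA recursion, and init = x[1] makes the first output equal to x[1]. A tiny hand-checkable sketch (the input vector and the names x_small, s_filter and s_manual are just for illustration):

```r
x_small <- c(1, 2, 3, 4)
a <- 0.5

# Recursive filter: y[i] = a * x[i] + (1 - a) * y[i-1], with y[0] = x[1]
s_filter <- c(stats::filter(x_small * a, 1 - a, method = "recursive",
                            init = x_small[1]))

# The same recursion written out by hand
s_manual <- c(1,
              0.5 * 2 + 0.5 * 1,
              0.5 * 3 + 0.5 * 1.5,
              0.5 * 4 + 0.5 * 2.25)

print(s_filter)
# [1] 1.000 1.500 2.250 3.125
```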

This is faster than a byte-compiled version of your first attempt (roughly 4x by median in the benchmark below):

ewma <- compiler::cmpfun(function(x, a) {
  n <- length(x)
  s <- rep(NA,n)
  s[1] <- x[1]
  if (n > 1) {
    for (i in 2:n) {
      s[i] <- a * x[i] + (1 - a) * s[i-1]
    }
  }
  return(s)
})
library(microbenchmark)
microbenchmark(ewma.filter(x, 0.9), ewma(x, 0.9))
# Unit: microseconds
#                 expr      min        lq   median       uq      max neval
#  ewma.filter(x, 0.9)  318.508  341.7395  368.737  473.254 1477.000   100
#         ewma(x, 0.9) 1364.997 1403.4015 1458.961 1503.876 2221.252   100

【Comments】:

【Solution 2】:

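On my machine (R 3.3.2, Windows), your first loop takes about 16 seconds. With JIT compilation enabled, by adding the line compiler::enableJIT(2) before the function definition, the code runs in ~1 second.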
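
A minimal sketch of that setup follows. The names ewma_loop, y_demo and old_level are illustrative, and the input is shortened so it runs quickly. As far as I know, R 3.4.0 and later enable the JIT by default, so the explicit call mainly matters on older versions such as the one mentioned above.

```r
old_level <- compiler::enableJIT(2)   # turn on JIT; returns the previous level

# Same loop as in the question, defined after JIT is enabled
ewma_loop <- function(x, a) {
  n <- length(x)
  s <- numeric(n)
  s[1] <- x[1]
  if (n > 1) {
    for (i in 2:n) {
      s[i] <- a * x[i] + (1 - a) * s[i - 1]
    }
  }
  s
}

y_demo <- as.numeric(1:1e5)
timing <- system.time(s_demo <- ewma_loop(y_demo, 0.5))
print(timing)

compiler::enableJIT(old_level)        # restore the previous JIT setting
```

Wrapping the function in compiler::cmpfun() is an alternative that compiles just that one function without changing the session-wide JIT level.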

Anyway, if you really want speed, I think you should use C/C++, as you can see in the example below using Rcpp:

library(Rcpp)

sourceCpp(
  code = 
    "
     #include <Rcpp.h>
     // [[Rcpp::export]]
     Rcpp::NumericVector ewmaRcpp(Rcpp::NumericVector x, double a){
       int n = x.length();
       Rcpp::NumericVector s(n);
       s[0] = x[0];
       if (n > 1) {
         for (int i = 1; i < n; i++) {
           s[i] = a * x[i] + (1 - a) * s[i-1];
         }
       }
       return s;
     }

    ")

y <- 1:1e7
system.time(s2 <- ewmaRcpp(y,0.5))
# user  system elapsed 
# 0.06    0.01    0.07

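【Comments】:

【Solution 3】:

Very nice of @digEmAll to do the Rcpp version, but also note that you can just use the TTR package or, as its author notes, the stats::filter() approach I used in a post at the (now-defunct) R Graph Gallery a decade ago.

Anyway, a quick benchmark shows the Rcpp version to be a lot faster... which may mean we have the parameters set wrong: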

R> sourceCpp("/tmp/ema.cpp")

R> library(TTR)

R> library(microbenchmark)

R> y <- as.numeric(1:1e6)   # else the sequence creates ints

R> microbenchmark(ewmaRcpp(y,0.5), EMA(y, n=10))
Unit: milliseconds
             expr      min       lq     mean   median       uq      max neval cld
 ewmaRcpp(y, 0.5)  2.43666  2.45705  3.06699  2.47283  2.51439  9.76883   100  a 
   EMA(y, n = 10) 15.13208 15.37910 21.36930 15.59278 17.26318 76.45934   100   b
R>

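Indeed, lambda=0.5 is an unusually strong decay, corresponding to a half-life of one day, or N=1. If I use that, the gap is even wider.

For completeness, here is the whole file, which can be Rcpp::sourceCpp()-ed: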

#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericVector ewmaRcpp(Rcpp::NumericVector x, double a){
  int n = x.length();
  Rcpp::NumericVector s(n);
  s[0] = x[0];
  if (n > 1) {
    for (int i = 1; i < n; i++) {
      s[i] = a * x[i] + (1 - a) * s[i-1];
    }
  }
  return s;
}

/*** R
library(TTR)
library(microbenchmark)

y <- as.numeric(1:1e6)   # else the sequence creates ints
microbenchmark(ewmaRcpp(y,0.5), EMA(y, n=1))
*/

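【Comments】:

I would use TTR::EMA(y, ratio=0.5) for the benchmark (not that I think it would make much difference). Also note that TTR::EMA does more than stats::filter and this Rcpp version: it does some error checking, handles leading NAs, and uses the try.xts and reclass paradigm to handle many different types of objects internally.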
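
To make the parameter mismatch concrete: as far as I know, TTR::EMA defaults to ratio = 2/(n+1) when wilder = FALSE, so n = 10 corresponds to a much gentler smoothing ratio than the a = 0.5 passed to the Rcpp version. The helpers below are hypothetical (they are not part of TTR) and only sketch the conversions under that assumption:

```r
# Hypothetical conversion helpers, assuming ratio = 2 / (n + 1)
ratio_from_n <- function(n) 2 / (n + 1)
n_from_ratio <- function(ratio) 2 / ratio - 1

# Half-life h solves (1 - ratio)^h = 0.5
halflife_from_ratio <- function(ratio) log(0.5) / log(1 - ratio)

ratio_from_n(10)           # about 0.18, versus a = 0.5 in ewmaRcpp(y, 0.5)
n_from_ratio(0.5)          # 3
halflife_from_ratio(0.5)   # 1
```

Passing ratio directly, as suggested above, sidesteps the conversion entirely.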