Efficient implementation of a double for-loop: answers

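【Question title】: Efficient implementation of double for-loop
【Posted】: 2019-05-14 13:26:12
【Question description】:

I am new to R and I am wondering whether there is a more efficient implementation of the following setup. The time series (x, y) have a length of about 5000, and h != nrow(q).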

set.seed(1)
h = 21
x <- rnorm(5e3, 1)
y <- rnorm(5e3, 2)

q <- c(0.1, 0.3, 0.5, 0.7, 0.9)
qx <- quantile(x, probs = q)
qx <- expand.grid(qx, qx)
qy <- quantile(y, probs = q)
qy <- expand.grid(qy, qy)
q <- expand.grid(q, q)

f <- function(z, l, qz) {
  n <- length(z)
  1/(n - l) * sum((z[1:(n-l)] <= qz[[1]]) * (z[(1+l):n] <= qz[[2]])) - prod(q[i,])
}

sum = 0
for (i in 1:h) {
  for (j in 1:nrow(q)) {
    sum = sum + (f(x, l = i, qx[j,]) - f(y, l = i, qy[j,]))^2
  }
}
sum
# 0.0008698279

Thank you very much!

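【Question comments】:

It might be easier to help you if you explained the purpose of the code in words.
@JuliusVainora The purpose of the code is to compute a distance measure between the time series x and y (for clustering), based on the estimated covariances of quantile indicators. For details, see https://link.springer.com/content/pdf/10.1007%2Fs11634-015-0208-8.pdf pp. 395-396, equations (3) and (6).

Tags: r performance for-loop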

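【Solution 1】:

In some cases, the sapply function can be a faster alternative to an explicit loop. It works as follows: it applies a given function to each element of a vector and collects the results.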
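As a minimal illustration of that behaviour (a toy sketch; the names squares and grid are made up for this example and are not part of the question), sapply returns a vector when the function yields one value per element, and nesting two sapply calls yields a matrix with one column per element of the outer vector:

```r
# Toy example: apply a function to every element of a vector
squares <- sapply(1:5, function(k) k^2)
squares

# Nested sapply: the inner call returns a vector of length 2 for each i,
# so the outer call simplifies the result to a 2 x 3 matrix
grid <- sapply(1:3, function(i) sapply(1:2, function(j) i * 10 + j))
dim(grid)
```

The nested form has the same shape as the double loop in the question: the outer index plays the role of i, the inner index the role of j.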

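Alternatively, you could look at the foreach package, which offers another looping construct and can run iterations in parallel when a parallel backend is registered.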

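Below is an example using sapply. Depending on your exact needs, you may want to use either of the two sapply variants. Note also that sapply is just one of the faster options, not necessarily the fastest.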

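# setup from the question (h is set to 1 here; the question uses h = 21)
set.seed(1)
h <- 1
x <- rnorm(5e3, 1)
y <- rnorm(5e3, 2)

q <- c(0.1, 0.3, 0.5, 0.7, 0.9)
qx <- quantile(x, probs = q)
qx <- expand.grid(qx, qx)
qy <- quantile(y, probs = q)
qy <- expand.grid(qy, qy)
q <- expand.grid(q, q)

# p is the pair of probability levels that belongs to qz. The question's f()
# read q[i,] through a global i, which does not exist inside the sapply calls
# unless the for loop has run first. Passing q[j,] is an assumption about the
# intended term; it cancels in the difference f(x, ...) - f(y, ...) either way.
f <- function(z, l, qz, p) {
  n <- length(z)
  1/(n - l) * sum((z[1:(n-l)] <= qz[[1]]) * (z[(1+l):n] <= qz[[2]])) - prod(p)
}

# load microbenchmark library for comparison of execution times
library(microbenchmark)

microbenchmark(
  for_loop = {
    # the version from the question with a double for loop
    # (accumulator renamed to total so it does not mask base::sum)
    total <- 0
    for (i in 1:h) {
      for (j in 1:nrow(q)) {
        total <- total + (f(x, l = i, qx[j,], q[j,]) - f(y, l = i, qy[j,], q[j,]))^2
      }
    }
  },
  sapply_accumulate = {
    # nested sapply that updates total via <<-; the call itself returns an
    # nrow(q) x h matrix of running totals, and total holds the final sum
    total <- 0
    sapply(1:h, function(i) sapply(1:nrow(q), function(j) {
      total <<- total + (f(x, l = i, qx[j,], q[j,]) - f(y, l = i, qy[j,], q[j,]))^2
    }))
  },
  sapply_sum = {
    # nested sapply that returns the squared differences, summed afterwards
    sum(sapply(1:h, function(i) sapply(1:nrow(q), function(j) {
      (f(x, l = i, qx[j,], q[j,]) - f(y, l = i, qy[j,], q[j,]))^2
    })))
  },
  # run each expression 200 times to get the time comparison
  times = 200
)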
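
Beyond swapping the loop construct, the inner loop over quantile pairs can be removed with matrix algebra. The following is a sketch under stated assumptions, not the answer's own method: indicator_matrix, lagged_joint, ix, iy and total are illustrative names, and the prod(...) term from the question's f() is left out because it is identical for x and y and therefore cancels in the difference. For each series, a 0/1 matrix flags which observations fall at or below each sample quantile; a single crossprod per lag then gives the joint frequencies for all 25 quantile pairs at once.

```r
# Sketch only: indicator_matrix(), lagged_joint() and total are illustrative names
set.seed(1)
h <- 21
x <- rnorm(5e3, 1)
y <- rnorm(5e3, 2)
p <- c(0.1, 0.3, 0.5, 0.7, 0.9)

# n x length(p) matrix of 0/1 flags: column a marks where z <= its a-th sample quantile
indicator_matrix <- function(z, p) {
  outer(z, quantile(z, probs = p, names = FALSE), "<=") * 1
}

# length(p) x length(p) matrix; entry [a, b] is the share of time points t
# with z[t] <= quantile a and z[t + l] <= quantile b
lagged_joint <- function(ind, l) {
  n <- nrow(ind)
  crossprod(ind[seq_len(n - l), , drop = FALSE],
            ind[(1 + l):n, , drop = FALSE]) / (n - l)
}

# The indicator matrices do not depend on the lag, so build them once
ix <- indicator_matrix(x, p)
iy <- indicator_matrix(y, p)

# One crossprod per series and lag replaces the inner loop over quantile pairs
total <- 0
for (l in seq_len(h)) {
  total <- total + sum((lagged_joint(ix, l) - lagged_joint(iy, l))^2)
}
total
```

The remaining loop runs only h times, and the comparisons against the quantiles are computed once per series instead of once per (lag, pair) combination. Whether this is actually faster on your data is worth checking with microbenchmark.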

【Comments】: