How to compute the correlation of all subsets of two vectors in R?

【Question title】: How to compute the correlation of all subsets of two vectors in R?
【Posted】: 2015-04-07 10:57:54
【Question】:

I am working in R. I have two vectors of length n, say a and b. I want to compute the correlation of every consecutive subset of length m, like this:

cor(a[1:m], b[1:m])
cor(a[(m+1):(2*m)], b[(m+1):(2*m)])
...
cor(a[(k*m+1):n], b[(k*m+1):n])

Right now I am using a loop, but it is too slow. How can I do this faster?

【Comments】:

The syntax suggests you are still using MATLAB... de.mathworks.com/matlabcentral/fileexchange/…

Tags: r time-series subset apply correlation

【Solution 1】:

First create a grouping variable (index) that assigns each element to its subset, then compute the correlation by group:

# Some fake data:
set.seed(123)
df <- data.frame(cbind(a = rnorm(100), b = rnorm(100), index = rep(1:10, each = 10)))

# Loading the plyr package:
library(plyr)

ddply(df, .(index), summarise, "corr" = cor(a, b))
   index        corr
1      1  0.26831285
2      2  0.14373593
3      3  0.21555988
4      4 -0.27461416
5      5 -0.08825786
6      6 -0.58680476
7      7 -0.02613450
8      8 -0.29408586
9      9  0.12030810
10    10 -0.04391428
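
The fake data above builds index with rep(), which fits because 100 divides evenly into 10 groups of 10. For arbitrary n and m, a minimal sketch (an illustration, not part of the original answer) could derive the index from the element positions instead. The sizes n = 25 and m = 10 are hypothetical values chosen so that the last group is shorter than m:

```r
# Hypothetical sizes: n is the vector length, m the subset length.
set.seed(123)
n <- 25
m <- 10
a <- rnorm(n)
b <- rnorm(n)

# Elements 1..m get index 1, (m+1)..(2*m) get index 2, and so on.
# The last group holds the leftover n %% m elements (5 here).
index <- ceiling(seq_len(n) / m)

# One correlation per group, using base R only.
chunk_cor <- sapply(split(seq_len(n), index),
                    function(i) cor(a[i], b[i]))
chunk_cor
```

The same index vector can be passed as the grouping column to any of the grouped approaches in this answer.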

Or with dplyr:

library(dplyr)
df %>% group_by(index) %>% summarise(cor(a,b))

Or with data.table:

library(data.table)
setDT(df)[,cor(a, b), by = index]

【Comments】:

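The problem remains: my vectors have length 3172000 and my subsets have length 61. The script is still running.
How long has it been running? The second version is the fastest for me and takes less than a second.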
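
For sizes like these, one alternative that nobody in this thread proposed is to skip grouping altogether: reshape each vector into a matrix with one subset per column and compute the Pearson correlation with column-wise sums. The sketch below assumes n is an exact multiple of m (3172000 = 52000 * 61, so that holds for the sizes in the comment). The number of subsets k = 1000 is a hypothetical smaller value used to keep the example quick:

```r
set.seed(123)
m <- 61     # subset length from the comment above
k <- 1000   # hypothetical number of subsets, so n = k * m
a <- rnorm(k * m)
b <- rnorm(k * m)

# matrix() fills column by column, so column j holds
# a[((j - 1) * m + 1):(j * m)], i.e. the j-th subset.
A <- matrix(a, nrow = m)
B <- matrix(b, nrow = m)

# Center every column (subset) on its own mean.
A <- sweep(A, 2, colMeans(A))
B <- sweep(B, 2, colMeans(B))

# Pearson correlation per column:
# sum(x * y) / sqrt(sum(x^2) * sum(y^2)) on the centered values.
chunk_cor <- colSums(A * B) / sqrt(colSums(A^2) * colSums(B^2))
head(chunk_cor)
```

If n is not a multiple of m, the trailing partial subset would need to be split off and handled with a separate cor() call before reshaping.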