R中时间序列的相关系数运行窗口答案

【问题标题】：Running window of correlation coefficients for time series in RR中时间序列的相关系数运行窗口
【发布时间】：2017-03-09 00:26:33
【问题描述】：

我希望使用 R 中的运行窗口计算 12 个变量的相关系数。

我的数据存储在一个带有 %m.%d.%Y %H:%M:%S 索引的动物园对象中，12 个变量中的每一个都有 1343 个观察值。我不知道我要使用什么窗口大小，但如果需要我可以更改它。

@Joshua Ulrich 已发布here 如何使用 rollapplyr 计算滚动相关性，但此示例只有两个变量。以我有限的 R 经验，我不确定如何结合应用族函数之一来运行所有 12 个变量的相关性。

任何指针将不胜感激

我的数据如下：

> head(wideRawXTS)
                    DO0182U09A3 DO0182U09B3 DO0182U09C3 DO0182U21A1 DO0182U21A2 DO0182U21A3 DO0182U21B1 DO0182U21B2 DO0182U21B3
2017-01-20 16:30:00     -101.50     -103.37     -103.86     -104.78     -104.95     -105.33     -102.50      -99.43     -104.05
2017-01-20 16:45:00     -101.32     -102.75     -104.22     -104.51     -103.94     -105.29     -102.82     -101.99     -103.94
2017-01-20 17:00:00     -101.45     -103.30     -103.93     -104.70     -104.82     -105.13     -103.72     -103.95     -104.25
2017-01-20 17:15:00     -100.91      -95.92      -99.22     -103.83     -104.72     -105.19     -103.57     -101.36     -104.09
2017-01-20 17:30:00     -100.91     -103.04     -104.09     -102.15     -104.91     -105.18     -103.88     -104.09     -103.96
2017-01-20 17:45:00     -100.97     -103.67     -104.12     -105.07     -104.23      -97.48     -103.92     -103.89     -104.01
                    DO0182U21C1 DO0182U21C2 DO0182U21C3
2017-01-20 16:30:00     -104.51     -104.42     -105.17
2017-01-20 16:45:00     -104.74     -104.65     -105.25
2017-01-20 17:00:00     -105.02     -105.04     -105.32
2017-01-20 17:15:00     -103.90     -102.95     -105.16
2017-01-20 17:30:00     -104.75     -105.07     -105.23
2017-01-20 17:45:00     -105.08     -105.14     -104.89

#Importing Data
rawDF <- read.csv("RTWP_DO0182_14Day_Raw.csv",
                  header = F,
                  col.names = c("Period Start Time",
                                "PLMNname",
                                "RNCname",
                                "WBTSname",
                                "WBTSID",
                                "WCELname",
                                "WCELID",
                                "Overload",
                                "AverageRTWP",
                                "DC_RTWP_High_PRX",
                                "RTWP_Threshold"),
                  colClasses = c("character",
                                 "NULL", #drops PLMN name
                                 "NULL", #drops RNC name
                                 "character", 
                                 "integer",
                                 "character",
                                 "integer",
                                 "NULL", #drops Overload
                                 "numeric",
                                 "NULL", #drops DC_RTWP_High_PRX
                                 "NULL"), #drops RTWP_Threshold
                  strip.white=TRUE,
                  na.strings = c("NA", 
                                 "NULL"),
                  as.is = T,
                  skip=2)

#convert period.start.time to POSIXlt
rawDF$Period.Start.Time <- as.POSIXlt(strptime(rawDF$Period.Start.Time, 
                                               format="%m.%d.%Y %H:%M:%S"))

#dcast the long data frame to a wide data frame
wideRawDF <- dcast(rawDF, 
                   Period.Start.Time ~ WCELname,
                   value.var = "AverageRTWP")

#assign the date times as rownames for converting to XTS
rownames(wideRawDF) = wideRawDF[[1]]

#drops the duplicate Period Start Time column since date times are rownames
wideRawDF$Period.Start.Time <- NULL

#Convert wideRawDF to XTS time series
wideRawDF <- as.xts(wideRawDF)

#NA values replaced by interpolated values
wideRawDF <- na.approx(wideRawDF, na.rm = FALSE)

#DF is centered by subtracting the column means and scaled by dividing the
#centered columns by their standard deviations
wideRawDFscaled <- scale(wideRawDF, center = TRUE, scale = TRUE)

#window <- 20
#cors <- combn(names(wideRawDFscaled),2,
#              function(p) rollapply(wideRawDFscaled, window ,
#                                    function(x) cor(x[,p[1]],x[,p[2]]),
#                                    by.column = FALSE))
#colnames(cors) <- combn(names(wideRawDFscaled),2,paste,collapse=".")

#Running window of correlation coefficients
Cor <- function(x) {
  corr <- cor(wideRawDFscaled)
  out <- as.data.frame.table(corr)[lower.tri(corr), ]
  with(out, setNames(Freq, paste(Var1, Var2)))
}
slidingCor <- rollapplyr(wideRawDFscaled, 6, Cor, by.column = FALSE)

【问题讨论】：

标签： r time-series correlation zoo

【解决方案1】：

假设注释中可重复显示的 3 列输入并使用宽度为 3 的窗口进行说明，定义相关函数 Cor，该函数接受一个矩阵并计算其相关矩阵，提取下三角部分以消除冗余并添加列名.现在将它与rollapplyr 一起使用：

library(xts)

Cor <- function(x) {
    corr <- cor(x)
    out <- as.data.frame.table(corr)[lower.tri(corr), ]
    with(out, setNames(Freq, paste(Var1, Var2)))
}
rollapplyr(xx, 3, Cor, by.column = FALSE)

给予：

                    DO0182U09B3.DO0182U09A3 DO0182U09C3.DO0182U09A3 DO0182U09C3.DO0182U09B3
2017-01-20 16:30:00                      NA                      NA                      NA
2017-01-20 16:45:00                      NA                      NA                      NA
2017-01-20 17:00:00                 0.98573                -0.99613                -0.99671
2017-01-20 17:15:00                 0.98629                 0.95983                 0.99297
2017-01-20 17:30:00                 0.52664                 0.47475                 0.99820
2017-01-20 17:45:00                 0.56204                 0.50460                 0.99769

注意：输入xx的可重复形式为：

xx <- structure(c(-101.5, -101.32, -101.45, -100.91, -100.91, -100.97, 
-103.37, -102.75, -103.3, -95.92, -103.04, -103.67, -103.86, 
-104.22, -103.93, -99.22, -104.09, -104.12), .Dim = c(6L, 3L), .Dimnames = list(
    NULL, c("DO0182U09A3", "DO0182U09B3", "DO0182U09C3")), index = structure(c(1484947800, 
1484948700, 1484949600, 1484950500, 1484951400, 1484952300), tzone = "", tclass = c("POSIXct", 
"POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct", 
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "")

【讨论】：

@G，格洛腾迪克，感谢您的回复。我已经使用scale() 缩放了我的wideRawXTS 数据框，现在称为wideRawDFScaled，当我在wideRawDFScaled 上运行cor() 时，wideRawDFScaled 是一个 XTS 对象，尺寸为 [1:1343, 1:12] 我看到 DO0182U09B3.DO0182U09A3 的相关性为 0.2759166。当我在上面运行您的代码时，对于相同的两个变量，我得到相同的数字，但它在所有时间序列中重复。我已将图像附加到 OP，显示我所指的内容。数组返回也不应该是 12x12x1343 吗？如果我有 12 个变量，我不应该有 12x12 行和列吗？
如果您有 n 个变量，则在相关矩阵的对角线下方（或上方）有 choose(n, 2) = n * (n-1) / 2 个值。请将您的问题简化为具有相同问题且完全可重现的最小问题。
我再次尝试了您的代码，并按照您的示例将x 替换为我的动物园对象wideRawDFscaled，我相信它现在可以工作了。以我有限的编程经验，函数的定义让我很困惑，我见过很多人定义function(x)，我猜我的函数定义应该是Cor <- function(wideRawDFscaled) { corr <- cor(wideRawDFscaled)
您可以在函数的参数中使用任何您喜欢的变量名称，只要您在函数体中使用相同的名称即可。它们与函数外的同名变量无关。例如，将 function(x) x 更改为 function(y) y 没有任何作用。它们都为相同的输入产生相同的输出。
我已将输入名称更改为xx，以清楚地将其与Cor 中的形式参数x 区分开来。这不会改变任何东西，只是为了根据您的评论清楚。

【解决方案2】：

您可以使用combn 将关联应用于每对列。调整您引用的答案：

window <- 20
cors <- combn(colnames(wideRawXTS),2,
              function(p) rollapply(wideRawXTS, window  ,
                                    function(x) cor(x[,p[1]],x[,p[2]]), 
                                    by.column=FALSE))
colnames(cors) <- combn(colnames(wideRawXTS),2, paste, collapse=".")

【讨论】：

感谢您抽出宝贵时间回复。我有一个我无法弄清楚的错误，> colnames(cors) <- combn(names(wideRawDFscaled),2,paste,collapse=".") Error in colnames(*tmp*, value = c("DO0182U09A3.DO0182U09B3", "DO0182U09A3.DO0182U09C3", : length of 'dimnames' [2] not equal to array extent
尝试用colnames替换names
请输入您的数据 - 我无法重现错误