How to use PCA for time-series predictions on highly correlated variables? Answers

【Question title】: How to use PCA for time-series predictions on highly correlated variables?
【Posted】: 2018-03-04 21:50:04
【Question】:

Suppose I have a dataset called Economic_Trend with 5 columns and 10 years of data (one row per year). For example,

Economic_Trend <- matrix(c(1:10, 2:11, rnorm(30)), byrow = FALSE, nrow = 10, ncol = 5)

Then this is the code I ran:

mu <- colMeans(Economic_Trend)

Econ_pca <- prcomp(Economic_Trend)

PCA1 <- Econ_pca$x[,1]
PCA2 <- Econ_pca$x[,2]


plot(Econ_pca$x[,1], Econ_pca$x[,2])
plot(lm(PCA1 ~ PCA2))

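I can see that I now have two principal components, but I'm not sure where to find their order: what if the years get scrambled? For example, I want to forecast the next 10 years. How do I fit my PCs to a regression model, find the next steps of the time series, and then reconstruct the original data?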

Thanks!

【Question discussion】:

Tags: r pca

【Answer 1】:

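The most important thing is to save your PCs as time-series (ts) objects so that their order is kept. prcomp() returns the scores in Econ_pca$x in the same row order as the input rows; storing them as ts objects attaches that time index explicitly so it travels with each series. Here is some code that does this and plots the PCs: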

library(ggplot2)    ## ggtitle(), ylab(), theme_bw(), theme()
library(ggfortify)  ## autoplot() method for ts objects; facets = FALSE draws all PCs in one graph

## Given code:
Economic_Trend <- matrix(c(1:10, 2:11, rnorm(30)), byrow = FALSE, nrow = 10,
                         ncol = 5)
mu <- colMeans(Economic_Trend)

## Tweaked code:
Econ_pca <- prcomp(Economic_Trend[, 3:5])  ## First two columns only encode the order (1:10, 2:11)
PCA1 <- ts(Econ_pca$x[, 1])  ## Save first PC as a time series object
PCA2 <- ts(Econ_pca$x[, 2])  ## Save second PC as a time series object
p <- autoplot(ts(cbind(PCA1, PCA2)), facets = FALSE) +
     ## Create ggplot object plotting the first 2 PCs
     ggtitle("First two PCs") + ylab("Econ ($?)") +  ## Add plot title and y-axis label
     theme_bw() +  ## White background instead of the default gray one
     theme(axis.text = element_text(size = rel(1)), legend.text = element_text(size = 12))
p  ## Display plot
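The question also asks how to forecast the PCs and get back to the original variables. One minimal way to sketch that, assuming a straight-line time trend is an adequate model for each PC score (a simplification; a proper time-series model could be swapped in), is to regress each retained score on the year index, extrapolate, and undo the PCA. The values k = 2, h = 10, the set.seed(1) call and the helper center_mat() are illustrative assumptions, not part of the answer above. The back-transform relies on prcomp() keeping rows in input order and on the data being equal to the scores times the transposed rotation matrix plus the column means (prcomp centers but does not scale by default).

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
Economic_Trend <- matrix(c(1:10, 2:11, rnorm(30)), byrow = FALSE, nrow = 10, ncol = 5)

X <- Economic_Trend[, 3:5]  # same columns as in the answer above
Econ_pca <- prcomp(X)       # centered, not scaled (prcomp defaults)
n <- nrow(X)

# Hypothetical helper: the column means repeated on 'rows' rows
center_mat <- function(rows) matrix(Econ_pca$center, rows, ncol(X), byrow = TRUE)

# Rows of Econ_pca$x follow the row order of X, so row i is still year i.
# Keeping every PC, the inverse transform gives back X exactly.
X_back <- Econ_pca$x %*% t(Econ_pca$rotation) + center_mat(n)
stopifnot(isTRUE(all.equal(unname(X_back), unname(X))))

k <- 2   # number of PCs to keep (assumption)
h <- 10  # number of future years (assumption)
future <- data.frame(year = n + seq_len(h))

# Regress each retained PC score on the year index and extrapolate h steps
score_fc <- sapply(seq_len(k), function(j) {
  d <- data.frame(score = Econ_pca$x[, j], year = seq_len(n))
  predict(lm(score ~ year, data = d), newdata = future)
})

# Map the forecast scores back to the original variables
X_fc <- score_fc %*% t(Econ_pca$rotation[, seq_len(k), drop = FALSE]) + center_mat(h)
X_fc
```

With k smaller than the number of columns, the forecasts lie in the subspace spanned by the first k loadings, so they are a rank-k approximation rather than an exact reconstruction.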

【Discussion】:

【Answer 2】:

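I haven't seen your data. However, if the name contains "Trend", the series probably has a trend and you want to forecast it. It is better to apply a special version of PCA designed for time series: singular spectrum analysis (SSA).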

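library(Rssa)     # ssa(), reconstruct(), rforecast()
library(lattice)  # xyplot()
# Decompose the four 'EuStockMarkets' series jointly (multivariate SSA)
# with default parameters
ss <- ssa(EuStockMarkets, kind = "mssa")
# Reconstruct the trend from the first two eigentriples
rec <- reconstruct(ss, groups = list(Trend = 1:2))

# Recurrent forecast of the trend 200 steps ahead; only.new = TRUE keeps just the new points
foreca <- rforecast(ss, groups = list(Trend = 1:2),
                    len = 200, only.new = TRUE)
# Data, reconstructed trend and forecast in one object (cbind pads with NA)
data <- cbind(EuStockMarkets, rec$Trend, foreca)
xyplot(data, type = "l", superpose = TRUE,
       auto.key = list(columns = 3),
       col = c("blue", "darkgreen", "red", "violet"),
       lty = c(rep(2, 4), rep(1, 4), rep(3, 4)))  # dashed: data, solid: trend, dotted: forecast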
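To show why SSA counts as "PCA for time series", here is a minimal base-R sketch of basic univariate SSA: the series is cut into overlapping windows of length L (the trajectory, or Hankel, matrix), the SVD plays the role of PCA on those windows, and the selected components are turned back into a series by averaging along anti-diagonals. This is not how Rssa implements it; the function name ssa_reconstruct(), the window length L = 24 and the use of the DAX column alone are assumptions made for illustration, and the sketch only reconstructs, it does not forecast.

```r
# Minimal basic (univariate) SSA: embed -> SVD -> group -> diagonal averaging.
# Hypothetical helper for illustration, not an Rssa function.
ssa_reconstruct <- function(x, L, groups) {
  x <- as.numeric(x)
  K <- length(x) - L + 1
  # Trajectory (Hankel) matrix: column j is the window x[j], ..., x[j + L - 1]
  traj <- outer(seq_len(L), seq_len(K), function(i, j) x[i + j - 1])
  s <- svd(traj)  # the "PCA" step, applied to lagged windows instead of columns
  # Sum of the selected rank-one components u_i * d_i * v_i^T
  approx <- s$u[, groups, drop = FALSE] %*%
    (s$d[groups] * t(s$v[, groups, drop = FALSE]))
  # Diagonal averaging: entries with equal i + j map to the same time point
  time_idx <- row(approx) + col(approx) - 1
  as.numeric(tapply(as.vector(approx), as.vector(time_idx), mean))
}

dax <- EuStockMarkets[, "DAX"]
trend <- ssa_reconstruct(dax, L = 24, groups = 1:2)  # L and groups are assumptions
head(cbind(DAX = as.numeric(dax), Trend = trend))
```

Keeping all L components (groups = 1:24 here) reproduces the series exactly, which is a quick sanity check; keeping only the leading ones gives a smooth trend, the univariate counterpart of rec$Trend above.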

【Discussion】: