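What you are looking for is a panel data structure. Panel data, also called cross-sectional time series data, is data that varies both over time and across entities. In your case, the `value` in your waves varies over time within each entity, while `group` varies across entities. A simple `gather` followed by a join gives a typical panel data format.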
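library(tidyr)
library(dplyr)
# `df` and `lookup` are defined in the data section at the end
panel_df <- df %>%
  gather(index, value) %>%               # wide to long: one row per (wave, observation)
  inner_join(lookup, by = "index") %>%   # attach each wave's group
  group_by(index) %>%
  mutate(time = 1:n())                   # running time index within each wave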
#     index     value   group  time
#     <chr>     <dbl>   <chr> <int>
# 1  waves1 0.0000000 healthy     1
# 2  waves1 0.2474040 healthy     2
# 3  waves1 0.4794255 healthy     3
# 4  waves1 0.6816388 healthy     4
# 5  waves1 0.8414710 healthy     5
# 6  waves1 0.9489846 healthy     6
# 7  waves1 0.9974950 healthy     7
# 8  waves1 0.9839859 healthy     8
# 9  waves1 0.9092974 healthy     9
# 10 waves1 0.7780732 healthy    10
# # ... with 476 more rows
Here `index` represents the entity dimension, and I manually created a `time` variable to represent the time dimension of the panel data.
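Since `gather()` is marked as superseded in current tidyr releases, the same reshaping can probably be written with `pivot_longer()`. The following is only a minimal sketch, assuming tidyr 1.0.0 or later; it rebuilds the example data so it runs on its own, and the name `panel_alt` is purely illustrative.

```r
library(tidyr)
library(dplyr)

# Rebuild the example data so this block runs on its own
w1 <- sin(seq(0, 20, 0.25))
w2 <- cos(seq(0, 20, 0.25))
df <- data.frame(w1, w2, w1 * 2, w2 * 0.5, w1 * w2, w2^2)
names(df) <- paste0("waves", 1:6)
lookup <- data.frame(index = paste0("waves", 1:6),
                     group = rep(c("healthy", "unhealthy"), times = 3),
                     stringsAsFactors = FALSE)

# panel_alt is a hypothetical name; it mirrors panel_df from the answer
panel_alt <- df %>%
  pivot_longer(cols = everything(),
               names_to = "index", values_to = "value") %>%
  inner_join(lookup, by = "index") %>%
  group_by(index) %>%
  mutate(time = row_number()) %>%   # observation order within each entity
  ungroup() %>%
  arrange(index, time)              # stack entity by entity, like gather()
```

Note that `pivot_longer()` emits rows in a different order than `gather()` (row by row instead of column by column), which is why the sketch sorts by `index` and `time` at the end.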
To visualize the panel data, you can do the following with ggplot2:
library(ggplot2)
# Visualize all waves, grouped by health status
ggplot(panel_df, aes(x = time, y = value, group = index)) +
  geom_line(aes(color = group))
# Only healthy people
panel_df %>%
  filter(group == "healthy") %>%
  ggplot(aes(x = time, y = value, color = index)) +
  geom_line()
# Compare healthy and unhealthy people's waves
panel_df %>%
  ggplot(aes(x = time, y = value, color = index)) +
  geom_line() +
  facet_grid(. ~ group)
Working with the time dimension:
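# Plot the ACF of each entity's `value` time series in a 3 x 2 grid
par(mfrow = c(3, 2))
by(panel_df$value, panel_df$index, function(x) acf(x))
par(mfrow = c(1, 1))  # reset the plotting layout
library(forecast)
# ggplot2-style ACF plot for one entity; plot = FALSE skips the base plot
panel_df %>%
  filter(index == "waves1") %>%
  {autoplot(acf(.$value, plot = FALSE))}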
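If you want numbers rather than plots, one option is to pull a single autocorrelation out of `acf()` for each entity. This is a rough sketch on a small stand-in panel (`toy_panel` and `lag1_by_entity` are illustrative names, not part of the original data); it relies on `acf(..., plot = FALSE)` returning the correlations without drawing anything.

```r
library(dplyr)

# Small stand-in panel: two entities, 81 time points each (illustrative data)
t_seq <- seq(0, 20, 0.25)
toy_panel <- data.frame(
  index = rep(c("waves1", "waves2"), each = length(t_seq)),
  time  = rep(seq_along(t_seq), times = 2),
  value = c(sin(t_seq), cos(t_seq)),
  stringsAsFactors = FALSE
)

# Element 1 of $acf is lag 0 (always 1), so element 2 is the lag-1 value
lag1_by_entity <- toy_panel %>%
  group_by(index) %>%
  summarise(lag1 = acf(value, plot = FALSE)$acf[2])

print(lag1_by_entity)
```

The same pattern should carry over to `panel_df` by swapping it in for `toy_panel`.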
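Finally, the plm package is great for working with panel data. It implements a variety of panel regression models from econometrics, but to keep this answer from getting any longer, I will just leave some links for your own research. `pdim` tells you the entity and time dimensions of your panel data and whether it is balanced: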
library(plm)
# Check dimension of Panel
pdim(panel_df, index = c("index", "time"))
# Balanced Panel: n=6, T=81, N=486
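To make the `n`, `T`, `N` and "balanced" terms concrete, here is a rough base-R sketch of the same quantities on a toy panel. It is not how plm computes them internally; the data and all variable names are made up for illustration.

```r
# Toy long-format panel (illustrative): 3 entities observed at 4 time points
toy <- data.frame(
  index = rep(c("a", "b", "c"), each = 4),
  time  = rep(1:4, times = 3),
  value = seq_len(12)
)

obs_per_entity <- table(toy$index)        # number of rows per entity
n_entities <- length(obs_per_entity)      # n
n_periods  <- length(unique(toy$time))    # T
n_obs      <- nrow(toy)                   # N

# Balanced: every entity is observed exactly once in every period
balanced <- all(obs_per_entity == n_periods) &&
  !any(duplicated(toy[c("index", "time")]))

# Dropping one row makes the panel unbalanced
toy_unbal <- toy[-1, ]
balanced_unbal <- all(table(toy_unbal$index) ==
                        length(unique(toy_unbal$time)))
```

For a balanced panel, `N` equals `n * T`, which matches the 6 * 81 = 486 reported above.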
- What is Panel Data?
- Getting Started in Fixed/Random Effects Models using R
- Regressions with Panel Data
For a better demonstration, I have modified your data.
Data:
library(zoo)
w1 <- sin(seq(0,20,0.25))
w2 <- cos(seq(0,20,0.25))
w3 = w1*2
w4 = w2*0.5
w5 = w1*w2
w6 = w2^2
df <- data.frame(w1,w2,w3,w4,w5,w6, stringsAsFactors = FALSE)
names(df) <- paste("waves", 1:6, sep="")
waves <- zoo(df)
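# `group` has length 2 and is recycled across the 6 waves:
# odd-numbered waves are "healthy", even-numbered waves are "unhealthy"
lookup <- data.frame(index = paste("waves", 1:6, sep=""),
                     group = c("healthy", "unhealthy"),
                     stringsAsFactors = FALSE)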