Generate a list of time series from a data frame: answers

【Question title】: Generate a List of TimeSeries from a dataframe
【Posted】: 2018-10-29 13:13:57
【Question description】:

I have a data frame that looks like this:

# A tibble: 6 x 4
# Groups:   IND_LOC [1]
  year_month    total_this_month mean_this_month IND_LOC
  <S3: yearmon>            <dbl>           <dbl> <fct>  
1 Jan 2013              3960268.         360024. 8_9    
2 Feb 2013              3051909.         277446. 8_9    
3 Mar 2013              3504636.         318603. 8_9    
4 Apr 2013              3234451.         294041. 8_9    
5 May 2013              3409146.         284096. 8_9    
6 Jun 2013              3619219.         301602. 8_9

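The last column, "IND_LOC", has 89 unique values (1_1, 1_2 ... 8_9).

I want to generate a list of time series, one per "IND_LOC" value, with the following structure (this is only an example from a different dataset; read "$1_1" in place of "$Germany", and so on):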

> str(time_series)
List of 9
 $ Germany    : Time-Series [1:52] from 1960 to 2011: 684721 716424 749838   ...
 $ Singapore  : Time-Series [1:52] from 1960 to 2011: 7208 7795 8349   ...
 $ Finland    : Time-Series [1:37] from 1975 to 2011: 85842 86137 86344   ...

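Any help is much appreciated!

【Question comments】:

Which variable should be the time series variable, and if grouping by 'IND_LOC', should the start and end be taken from 'year_month'?
I want the values of 'IND_LOC' ('1_1' ... '8_9') to be the group objects, like $Germany (i.e. each time series holds all the values corresponding to one IND_LOC value, '1_1' ... '8_9'). You can think of 'IND_LOC' as the 'country' variable in the expected-result example. The actual time series index would be the year_month column (the first column).
You may need to map 'IND_LOC' to countries later, since there is no country column in the example.

Tags: r dplyr xts zoo lubridate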

【Solution 1】:

Another option is to use split and lapply, with zoo as a helper for the conversion to ts.

dat <- read.csv(text="year_month,total_this_month,mean_this_month,IND_LOC
Jan 2013,3960268,360024,8_9
Feb 2013,3051909,277446,8_9
Mar 2013,3504636,318603,8_9
Apr 2013,3234451,294041,8_9
May 2013,3409146,284096,8_9
Jun 2013,3619219,301602,8_9
Jan 2013,3960268,360024,9_9
Feb 2013,3051909,277446,9_9
Mar 2013,3504636,318603,9_9
Apr 2013,3234451,294041,9_9
May 2013,3409146,284096,9_9
Jun 2013,3619219,301602,9_9")
library(zoo)  # must be loaded first: as.yearmon() comes from zoo
dat$year_month <- as.yearmon(dat$year_month)  # "Jan 2013" -> yearmon

time_series <- lapply(split(dat, dat$IND_LOC),
  function(x) as.ts(zoo(x$total_this_month, x$year_month)))
str(time_series)
# List of 2 (8_9 and 9_9): each element is a monthly Time-Series [1:6]
# starting in Jan 2013, with the values in calendar order
# (3960268, 3051909, 3504636, ...)
sapply(time_series, frequency)
# 8_9 9_9
#  12  12

【Comments】:

Ah, true. I forgot that you can use read.zoo() on data.frames and similar objects.
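
For reference, the read.zoo() route mentioned in the comment above might look roughly like the sketch below. The toy data is invented for illustration, and the approach assumes an English locale (so that "Jan 2013" parses) and a single value column, so that read.zoo() produces one zoo column per IND_LOC value.

```r
library(zoo)

# Toy long-format data, made up for illustration only.
dat <- data.frame(
  year_month = rep(c("Jan 2013", "Feb 2013", "Mar 2013"), times = 2),
  total_this_month = c(10, 20, 30, 40, 50, 60),
  IND_LOC = rep(c("8_9", "9_9"), each = 3)
)

# split = "IND_LOC" spreads the long data into one zoo column per group.
# FUN = as.yearmon converts the index, which is the first column by default.
z <- read.zoo(dat, split = "IND_LOC", FUN = as.yearmon)

# Convert each column of the wide zoo object into its own ts,
# keeping the column names as list names.
time_series <- sapply(colnames(z), function(nm) as.ts(z[, nm]),
                      simplify = FALSE)
sapply(time_series, frequency)
```

With more than one value column, select the columns you need before calling read.zoo(), otherwise the wide object gets one column per value/group combination.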

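【Solution 2】:

We can group by IND_LOC and build each series inside summarise, wrapping it in list() so that it is stored as a list column: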

library(dplyr)
library(lubridate)
df %>%
  group_by(IND_LOC) %>%
  summarise(time_series = list(ts(total_this_month, 
       start= c(year(year_month[1]), month(year_month[1])), frequency = 12)))

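【Comments】:

This is great - I have named the new variable Akrun, in your honour :) Is there a way to do this while naming each list element after its corresponding "IND_LOC"?
@DavideLorino Thanks for your comments.. you made my day :-). Regarding the naming: assuming the output object is out <- df %>% group_by(.., then names(out$time_series) <- unique(out$IND_LOC)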
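
Putting the naming suggestion together with Solution 2, a minimal end-to-end sketch could look like this. The toy data is invented for illustration, the rows are assumed to be sorted by month within each group, and the start is taken from as.numeric() of the first yearmon value rather than from lubridate's year()/month() as in the answer.

```r
library(dplyr)
library(zoo)

# Toy data, made up for illustration: two groups, three months each.
df <- data.frame(
  IND_LOC = rep(c("1_1", "8_9"), each = 3),
  total_this_month = c(10, 20, 30, 40, 50, 60)
)
df$year_month <- as.yearmon(rep(2013 + (0:2) / 12, times = 2))

out <- df %>%
  group_by(IND_LOC) %>%
  summarise(time_series = list(ts(total_this_month,
                                  # yearmon is stored as year + (month - 1) / 12
                                  start = as.numeric(year_month[1]),
                                  frequency = 12)))

# summarise() returns one row per group, so IND_LOC lines up
# element by element with the list column.
time_series <- setNames(out$time_series, out$IND_LOC)
str(time_series)
```

Since out has exactly one row per group, out$IND_LOC and unique(out$IND_LOC) give the same names here.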