【Question title】: Cannot add dates of zero demand to zoo time series due to duplicate dates
【Posted】: 2017-03-29 10:19:17
【Question】:

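I apologize in advance for anything that breaks the rules for posting questions. The data table below is a sample of the data I want to convert into a time series.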

> Materials
MaterialID  Date     Quantity
   1      2011-01-04      13
   1      2011-01-04      5
   2      2011-01-07      9
   3      2011-01-09      3
   3      2011-01-11      10

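It consists of transaction entries for several material items between 2011 and 2014. The date range of the whole dataset is 4 January 2011 to 31 December 2014. For each material, I want an entry for every date in that period, filling in the missing dates by setting their Quantity to zero. In other words, the result I want has, for every date between 4 January 2011 and 31 December 2014, an entry for each material in the dataset, as shown below: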

   Date     MaterialID_1  MaterialID_2  MaterialID_3
2011-01-04       13             0             0
2011-01-04        5             0             0
2011-01-05        0             0             0
2011-01-06        0             0             0
2011-01-07        0             9             0
2011-01-08        0             0             0
2011-01-09        0             0             3
2011-01-10        0             0             0
2011-01-11        0             0            10
    .             .             .             .
    .             .             .             .
    .             .             .             .
2014-12-31        0             0             0

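I tried some approaches I found on the forum, such as "Add months of zero demand to zoo time series", but because my data has duplicate dates I get the error "index entries in 'order.by' are not unique". Any suggestions or help would be much appreciated.

Once the data is in this format, I intend to reshape the dataset for batch forecasting. Thanks.

The input data is below: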

dput(Data)
structure(list(MaterialID = c(1L, 1L, 2L, 3L, 1L), Date = c("2011-01-04", 
"2011-01-04", "2011-01-07", "2011-01-09", "2011-01-11"), Quantity = c(13L, 
5L, 9L, 3L, 10L)), .Names = c("MaterialID", "Date", "Quantity"
), class = "data.frame", row.names = c(NA, -5L))
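For reference, here is a minimal base R sketch (not part of the original thread) of the target layout described above, built from the dput() data. It assumes one column per material named MaterialID_<id>, as in the expected output, and the end date 2014-12-31 stated in the question. Duplicate same-day transactions stay as separate rows. Note that the dput() lists the 2011-01-11 transaction under material 1, whereas the printed table shows material 3; the sketch follows the dput().

```r
# Data from the question's dput(), with Date converted to the Date class
Data <- data.frame(
  MaterialID = c(1L, 1L, 2L, 3L, 1L),
  Date = as.Date(c("2011-01-04", "2011-01-04", "2011-01-07",
                   "2011-01-09", "2011-01-11")),
  Quantity = c(13L, 5L, 9L, 3L, 10L)
)
ids <- sort(unique(Data$MaterialID))

# One row per transaction: the material's own column gets the quantity, the others 0
wide <- data.frame(Date = Data$Date)
for (id in ids) {
  wide[[paste0("MaterialID_", id)]] <- ifelse(Data$MaterialID == id, Data$Quantity, 0L)
}

# Assumed range: first transaction date through 2014-12-31 (from the question)
all_days <- seq(min(Data$Date), as.Date("2014-12-31"), by = "day")
missing_days <- all_days[!all_days %in% wide$Date]

# One all-zero row for every day without any transaction
zeros <- data.frame(Date = missing_days)
for (id in ids) {
  zeros[[paste0("MaterialID_", id)]] <- 0L
}

# Stack and sort by date; order() is stable, so same-day rows keep their order
result <- rbind(wide, zeros)
result <- result[order(result$Date), ]
rownames(result) <- NULL
head(result, 10)
```

Because the zero rows are only added for days with no transaction at all, the two 2011-01-04 rows for material 1 are kept as they are rather than summed.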

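【Comments】:

Please don't use images to show input data. If DF is the 9 rows shown, put the output of dput(DF) in your question and show what the expected output is. If the output is too long, change your question so that it isn't. Read minimal reproducible example.
@G.Grothendieck. Thanks for the guidance. I'm still learning, but I'll make sure my future posts and examples better match what is expected here.
@G.Grothendieck. Yes, that is useful in this case. For now, I just want to use this data to make a 12-month forecast. Thanks!
@G.Grothendieck I have tried to fix the question. Hopefully it looks better now?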

Tags: r time-series zoo

【Solution 1】:

You can do this with a split-apply-combine operation on xts objects. Unlike zoo objects, xts objects allow duplicate index values.

library(xts)
# sample data (dates are written year-day-month, e.g. 2014-31-12)
Data <- read.csv(text = "MaterialID,Date,Quantity
1,2011-01-04,13
1,2011-01-04,5
1,2011-05-06,9
1,2011-08-07,3
1,2011-12-08,10
2,2011-03-09,4
3,2011-02-10,7
3,2011-10-11,78
3,2014-31-12,32", as.is = TRUE)
# split data into groups by material id
dataByMaterialId <- split(Data, Data$MaterialID)
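# create a single-column xts object for each material; the "%Y-%d-%m" format
# matches the year-day-month sample dates (ISO dates such as those in the
# question's dput() would need "%Y-%m-%d", the as.Date() default)
xts_list <- lapply(dataByMaterialId, function(grp) {
  dn <- list(NULL, paste0("Qty.", grp$MaterialID[1]))
  xts(grp$Quantity, as.Date(grp$Date, "%Y-%d-%m"), dimnames = dn)
})
# use do.call + merge to combine all the xts objects into one, filling gaps with 0
xts_merged <- do.call(merge, c(xts_list, fill = 0))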
#            Qty.1 Qty.2 Qty.3
# 2011-04-01    13     0     0
# 2011-04-01     5     0     0
# 2011-06-05     9     0     0
# 2011-07-08     3     0     0
# 2011-08-12    10     0     0
# 2011-09-03     0     4     0
# 2011-10-02     0     0     7
# 2011-11-10     0     0    78
# 2014-12-31     0     0    32
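The merged result above only has rows for dates on which some material had a transaction, while the question asks for every calendar day from 2011-01-04 to 2014-12-31. One way to add the empty days, sketched here as an assumption rather than as part of the original answer, is to merge in a zero-width xts object indexed by the missing days, with fill = 0. The sketch uses the question's dput() data (ISO dates, so the default as.Date() format) instead of the year-day-month sample, and sets column names with colnames() instead of the dimnames argument.

```r
library(xts)

# The question's data (from its dput()); dates are ISO "%Y-%m-%d" strings
Data <- data.frame(
  MaterialID = c(1L, 1L, 2L, 3L, 1L),
  Date = c("2011-01-04", "2011-01-04", "2011-01-07", "2011-01-09", "2011-01-11"),
  Quantity = c(13L, 5L, 9L, 3L, 10L)
)

# Split-apply-combine as in the answer: one single-column xts per material
xts_list <- lapply(split(Data, Data$MaterialID), function(grp) {
  x <- xts(grp$Quantity, order.by = as.Date(grp$Date))
  colnames(x) <- paste0("Qty.", grp$MaterialID[1])
  x
})
xts_merged <- do.call(merge, c(xts_list, fill = 0))

# Days in the assumed range (from the question) with no transaction for any material
all_days <- seq(as.Date("2011-01-04"), as.Date("2014-12-31"), by = "day")
missing_days <- all_days[!all_days %in% index(xts_merged)]

# A zero-width xts contributes only its index; fill = 0 zeroes the new rows
xts_daily <- merge(xts_merged, xts(, missing_days), fill = 0)
head(xts_daily, 10)
```

Merging only the missing days (rather than the full date sequence) keeps the new index disjoint from the existing one, so the duplicated 2011-01-04 rows are left untouched.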

【Comments】:

Thanks Joshua, this is exactly what I needed. It works very well. Bless you!

【Solution 2】:

I use expand.grid to get all combinations and then merge(). I'm using random data here.

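set.seed(1)  # make the random example reproducible
# round() keeps whole days; fractional dates would not match the daily grid in merge()
df <- data.frame(materialid = rpois(10, 3), date = as.Date(round(seq(1, 365 * 4, length.out = 10)), origin = '2011-01-01'), quantity = rpois(10, 100))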

df2 <- expand.grid(unique(df$materialid), as.Date(min(df$date):max(df$date), origin = '1970-01-01'))
names(df2) <- c('materialid', 'date')

df2 <- merge(df2, df, by = c('materialid', 'date'), all.x = TRUE)
df2$quantity[is.na(df2$quantity)] <- 0
summary(df2)
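Because the example above uses random data, here is a sketch of the same expand.grid() + merge() recipe applied to the question's dput() data. The range used here (first to last transaction date) is an assumption to keep the output small; the question's full range would use 2011-01-04 to 2014-12-31 instead. The result stays in long format (one row per material and day), and a day with two transactions for the same material, such as material 1 on 2011-01-04, appears as two rows.

```r
# The question's data (from its dput()), with Date converted to the Date class
Data <- data.frame(
  MaterialID = c(1L, 1L, 2L, 3L, 1L),
  Date = as.Date(c("2011-01-04", "2011-01-04", "2011-01-07",
                   "2011-01-09", "2011-01-11")),
  Quantity = c(13L, 5L, 9L, 3L, 10L)
)

# Every (material, day) pair between the first and last transaction dates
grid <- expand.grid(MaterialID = sort(unique(Data$MaterialID)),
                    Date = seq(min(Data$Date), max(Data$Date), by = "day"))

# Left join keeps every grid row; a pair with two transactions yields two rows
long <- merge(grid, Data, by = c("MaterialID", "Date"), all.x = TRUE)

# Pairs with no transaction come back as NA; treat them as zero demand
long$Quantity[is.na(long$Quantity)] <- 0L
head(long)
```

With 3 materials and 8 days the grid has 24 rows, and the duplicated 2011-01-04 transaction for material 1 adds one more.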

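【Comments】:

Thanks, Dirk. I tested your solution and it does a good job of filling in the missing dates. However, Joshua's solution gives me the final output in exactly the format I need. Thanks again, and bless you too!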