【Question title】:R time series, complicated sequence
【Posted】:2011-03-18 06:37:32
【Question】:

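I am trying to merge two different time series in R that have the following characteristics:

  1. The data must fall between 08:30 and 15:00 every day.
  2. The data span multiple weeks, not just a single day.
  3. There are gaps in the data at random intervals.
  4. The two datasets do not necessarily have gaps at the same times.

I would like to merge the two datasets onto a sequence containing every time from 08:30 to 15:00, and wherever an individual dataset has a gap, I would like the previous value (or the next value) carried forward.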

# I have verified that the csv files are imported correctly
# The first column contains dates, and the strptime
# function can convert strings into Date/Time objects.
#
sec1_dates <- strptime(sec1[,1], "%m/%d/%Y %H:%M:%S")
sec2_dates <- strptime(sec2[,1], "%m/%d/%Y %H:%M:%S")

# The second column contains the close.
# I use the zoo function to create zoo objects from that data.
# But for some reason this ends up creating duplicates PROBLEM 1
#
a <- zoo(sec1[,2], sec1_dates)
b <- zoo(sec2[,2], sec2_dates)

# I know that I need to use seq to fill in gaps but I am clueless as to how
# Once I have the proper seq I can just use na.locf to fill the appropriate values
# HOWEVER seq(start(sec1_dates), end(sec1_dates), "min") would end up returning
# every minute for each day, and I only want 08:30 to 15:30. PROBLEM 2

# The merge function can combine two zoo objects, in union
# Obviously this fails because the two index sizes don't match PROBLEM 3
#
t.zoo <- merge(a, b, all=TRUE)

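James, you were right about Problem 1. Thank you. I verified that the csv files had pulled the data twice, and removing the repeated data fixed the problem. I also used your solution for Problem 2, but I am not sure it is the most efficient way to do what I want. Eventually I may want to use this to run regressions, at which point some kind of loop may be needed to pull in an arbitrary number of datasets. Any optimizations would be greatly appreciated.

Updated solution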

library(zoo)
library(tseries)

# Read the CSV files into data frames
sec1 <- read.csv("C:\\exportdata\\sec1.csv", stringsAsFactors=FALSE, header=FALSE)
sec2 <- read.csv("C:\\exportdata\\sec2.csv", stringsAsFactors=FALSE, header=FALSE)

# The first column contains dates.  
# I use strptime to tell it what format these appear in.
sec1_dates <- strptime(sec1[,1], "%m/%d/%Y %H:%M:%S")
sec2_dates <- strptime(sec2[,1], "%m/%d/%Y %H:%M:%S")

# The second column contains the close prices for the securities.
# I use the zoo function to create zoo objects from that data.
# Input =  a vector of data and a vector of dates.
a <- zoo(sec1[,2], sec1_dates)
b <- zoo(sec2[,2], sec2_dates)

# create a discrete time-series with the exact time frame desired
# per tip from James
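template <- zoo(NULL, seq(sec1_dates[1], tail(sec1_dates, 1), "min"))
# Zero-padded "HH:MM" strings compare in clock order.
# >= and <= keep the 08:30 and 15:00 rows themselves; > and < would drop them.
hhmm <- strftime(time(template), "%H:%M")
template <- template[which(hhmm >= "08:30" & hhmm <= "15:00")]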

# The merge function is then used to merge
# 1) each security to the template (uses the discrete date/time range)
# 2) remove the column of data from template (used only for dates)
# 3) each security to one another (this was the ultimate goal anyway)
a.zoo <- merge(a, template, all=TRUE)
a.zoo$template <- NULL
b.zoo <- merge(b, template, all=TRUE)
b.zoo$template <- NULL
t.zoo <- merge(a.zoo, b.zoo, all=TRUE)

# Fill each NA with the most recent earlier non-NA value (last observation carried forward).
t <- na.locf(t.zoo)
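
The question asks about efficiency and about handling an arbitrary number of datasets. The following is only a sketch under stated assumptions, not a tested replacement for the code above: `session_grid`, `days`, and the sample `series` list are hypothetical names and made-up data. It builds the minute grid one trading day at a time, so no out-of-session minutes are ever generated, and it merges a whole list of zoo series in a single `do.call(merge, ...)` call.

```r
library(zoo)

# Hypothetical helper: one-minute grid from open to close for each day.
session_grid <- function(days, open = "08:30:00", close = "15:00:00", tz = "UTC") {
  per_day <- lapply(as.character(days), function(d) {
    seq(as.POSIXct(paste(d, open), tz = tz),
        as.POSIXct(paste(d, close), tz = tz),
        by = "min")
  })
  do.call(c, per_day)
}

# Made-up inputs: two trading days and two sparse series.
days <- as.Date(c("2011-03-17", "2011-03-18"))
grid <- session_grid(days)

series <- list(
  sec1 = zoo(c(1, 2), as.POSIXct(c("2011-03-17 08:30:00", "2011-03-18 09:00:00"), tz = "UTC")),
  sec2 = zoo(c(5, 6), as.POSIXct(c("2011-03-17 08:45:00", "2011-03-17 14:00:00"), tz = "UTC"))
)

# A zero-width zoo object contributes only its index to the merge.
aligned <- do.call(merge, c(list(zoo(, grid)), series))

# Carry the last observation forward; na.rm = FALSE keeps leading NA rows.
filled <- na.locf(aligned, na.rm = FALSE)
```

Because the merge is a union of indexes, any observation stamped outside the session would still appear in `aligned`; the time-of-day filter from the updated solution could be applied afterwards to drop such rows.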

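【Comments】:

  • -1 Please clarify the question by providing sample data. Use dput to do that. Show what you get and how it differs from what you want. "Obviously this fails" is not obvious at all. merge.zoo does not require matching indexes.

Tags: r merge zoo seq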


【Answer 1】:

Problem 1

?zoo has details on how to handle duplicates, but this is probably because there are duplicates in the dates created by strptime.
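
A minimal sketch of checking for repeated timestamps before building the zoo object. The data frame `raw` and its column names are made up for illustration. Dropping repeats is only appropriate when they are exact copies, which is what the asker later reported.

```r
# Hypothetical sample: the second row is an exact copy of the first.
raw <- data.frame(
  stamp = c("03/17/2011 08:30:00", "03/17/2011 08:30:00", "03/17/2011 08:31:00"),
  close = c(10.0, 10.0, 10.5),
  stringsAsFactors = FALSE
)

# Parse with an explicit time zone so the result does not depend on the locale.
stamps <- as.POSIXct(raw$stamp, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# duplicated() flags every repeat after the first occurrence.
n_dup <- sum(duplicated(stamps))

# Keep only the first occurrence of each timestamp.
keep <- !duplicated(stamps)
clean <- data.frame(stamp = stamps[keep], close = raw$close[keep])
```

A zoo series built from `clean$close` and `clean$stamp` would then have a unique index.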

Problem 2

You can use [, which and time together with zoo objects; see ?zoo, e.g.:

t.zoo[which(strftime(time(t.zoo),"%H:%M")>"08:30" & strftime(time(t.zoo),"%H:%M")<"15:30")]
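
The same idea on a small made-up series, as a sketch only. The index `idx` and series `z` are hypothetical. The bounds follow the 08:30 and 15:30 used in the line above, with `>=` and `<=` so that both endpoints are kept.

```r
library(zoo)

# Hypothetical sample: one value every half hour across a single day (UTC).
idx <- seq(as.POSIXct("2011-03-17 08:00:00", tz = "UTC"),
           as.POSIXct("2011-03-17 16:00:00", tz = "UTC"),
           by = "30 min")
z <- zoo(seq_along(idx), idx)

# Zero-padded "HH:MM" strings sort in clock order,
# so plain string comparison is enough here.
hhmm <- format(time(z), "%H:%M", tz = "UTC")

# >= and <= keep both endpoints; > and < would drop them.
session <- z[hhmm >= "08:30" & hhmm <= "15:30"]
```

Computing `hhmm` once avoids formatting the whole index twice.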

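Problem 3

Use c to combine: t.zoo <- c(a,b)

【Comments】:

  • James, thank you very much for your help! I used your solutions for Problems 1 and 2, and I am fairly sure that using c to combine in place of the final merge will improve my code's processing speed. Do you have any additional modifications to suggest? (Note: I did update the code above.)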
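
For comparison, a small sketch on made-up data showing what `merge` and `c` each return for two zoo series. The timestamps and values are hypothetical. As far as I understand zoo, `c` stacks the observations into one series and expects the indexes not to overlap, whereas `merge` keeps one column per input, so which one fits depends on whether the securities need to stay separate.

```r
library(zoo)

t0 <- as.POSIXct("2011-03-17 08:30:00", tz = "UTC")
a <- zoo(c(1, 2, 3), t0 + c(0, 60, 180))  # 08:30, 08:31, 08:33
b <- zoo(c(10, 20), t0 + c(120, 240))     # 08:32, 08:34

# merge: union of the two indexes, one column per series, NA where a series is silent.
merged <- merge(a, b, all = TRUE)

# Carry the last observation forward; na.rm = FALSE keeps the leading NA rows.
filled <- na.locf(merged, na.rm = FALSE)

# c: a single series containing all five observations, ordered by time.
stacked <- c(a, b)
```

Here `merged` has five rows and two columns, while `stacked` is a single series of length five.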