Identify changes in time series (character) values and flag location of new value in corresponding dataset: Answers

【Question title】: Identify changes in time series (character) values and flag location of new value in corresponding dataset
【Posted】: 2020-03-18 11:15:10
【Question】:

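I am working with a time series dataset that has character inputs (specifically, futures contract details). I would like to identify the dates on which the character value differs from the previous available date and, for those specific dates, determine which of the subsequent columns holds that value. I have provided a sample dataset below. I considered using lag() on this xts object, but I get the error: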

Error in `[.xts`(x, seq_len(xlen - n)) : subscript out of bounds

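Also, my current approach is somewhat brute-force, which I would like to avoid (especially because the number of columns differs from one dataset to another).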
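One way to sidestep lag() on character data is to compare the underlying values directly, each element against its predecessor. The sketch below uses a plain character vector with made-up contract codes; for an xts column, the assumption is that the same vector could be obtained with as.vector(coredata(tempContracts[, 1])).

```r
# Toy continuous-contract column (hypothetical values, not the poster's data)
contracts <- c("SPU19", "SPU19", "SPZ19", "SPZ19", "SPH20")

# Compare each element with the one before it. The first row has no
# predecessor, so it is treated as "no change".
changed <- c(FALSE, contracts[-1] != contracts[-length(contracts)])

which(changed)  # 3 5: rows where a new contract first appears
```

Because the comparison never pads with NA, the resulting logical vector can be used for indexing without further cleanup.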

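Objective: I have a corresponding returns time series in the same format as the character time series. By identifying the column and date position [new position] at which the new character value (contract detail) appears, I want to replace the existing return in the first column on that date with the return from the new position.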

Sample dput output of the character time series tempContracts:

structure(c("SPU19-USA", "SPU19-USA", "SPU19-USA", "SPU19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPU19-USA", "SPU19-USA", "SPU19-USA", 
"SPU19-USA", "SPU19-USA", "SPU19-USA", "SPU19-USA", "SPU19-USA", 
"SPU19-USA", "SPU19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPH20-USA", "SPH20-USA", 
"SPH20-USA", "SPH20-USA", "SPH20-USA", "SPH20-USA", "SPH20-USA", 
"SPH20-USA", "SPH20-USA", "SPH20-USA", "SPH20-USA"), class = c("xts", 
"zoo"), .indexCLASS = "Date", .indexTZ = "UTC", tclass = "Date", tzone = "UTC", index = structure(c(1567728000, 
1567987200, 1568073600, 1568160000, 1568246400, 1568332800, 1568592000, 
1568678400, 1568764800, 1568851200, 1568937600, 1569196800, 1569283200, 
1569369600, 1569456000, 1569542400, 1569801600, 1569888000, 1569974400, 
1570060800, 1570147200), tzone = "UTC", tclass = "Date"), .Dim = c(21L, 
3L), .Dimnames = list(NULL, c("SP00.USA", "SP.1.USA", "SP.2.USA"
)))

Sample dput output of the returns time series tempRI:

structure(c(0.00295659400967452, -0.000872629691220261, 0.000100726912638294, 
0.00785891512466552, 0.00388982653805137, -0.00169370546773528, 
-0.00236269057182703, 0.00212999714436535, 0.000232693427461683, 
-0.000232693427461683, -0.00613601151530396, 0.00253900513908256, 
-0.00901586386319586, 0.00540587231028766, -0.001944091247152, 
-0.00561884290390235, 0.00494758931266404, -0.0137588161714284, 
-0.0196623961323645, 0.0107728408742762, 0.0133726493037134, 
0.00295659400967452, -0.000872629691220261, 0.000100726912638294, 
0.00785891512466552, 0.00325917345931082, -0.00179455699351827, 
-0.00243110601795671, 0.00209842695038276, 0.00029941614076634, 
-9.97954194730255e-05, -0.00550414199196148, 0.00253900513908256, 
-0.00901586386319586, 0.00540587231028766, -0.001944091247152, 
-0.00561884290390235, 0.00494758931266404, -0.0137588161714284, 
-0.0196623961323645, 0.0107728408742762, 0.0133726493037134, 
0.00298883607603528, -0.000805099003568621, 0.000134228188120922, 
0.00785444971143257, 0.0033236976786668, -0.00169370546773528, 
-0.00236269057182703, 0.00212999714436535, 0.000232693427461683, 
-0.000232693427461683, -0.00526667884051868, 0.00240344609471244, 
-0.00907650876598698, 0.00550263098282411, -0.00197611974611434, 
-0.00568212002020996, 0.00497781197372316, -0.0140212104959856, 
-0.019827124891334, 0.0105981832589173, 0.013540683361386), class = c("xts", 
"zoo"), .indexCLASS = "Date", .indexTZ = "UTC", tclass = "Date", tzone = "UTC", index = structure(c(1567728000, 
1567987200, 1568073600, 1568160000, 1568246400, 1568332800, 1568592000, 
1568678400, 1568764800, 1568851200, 1568937600, 1569196800, 1569283200, 
1569369600, 1569456000, 1569542400, 1569801600, 1569888000, 1569974400, 
1570060800, 1570147200), tzone = "UTC", tclass = "Date"), .Dim = c(21L, 
3L), .Dimnames = list(NULL, c("SP00.USA", "SP.1.USA", "SP.2.USA"
)))

Expected output, with the remaining columns dropped (adjRI):

structure(c(0.00295659400967452, -0.000872629691220261, 0.000100726912638294, 
0.00785891512466552, 0.0033236976786668, -0.00169370546773528, 
-0.00236269057182703, 0.00212999714436535, 0.000232693427461683, 
-0.000232693427461683, -0.00613601151530396, 0.00253900513908256, 
-0.00901586386319586, 0.00540587231028766, -0.001944091247152, 
-0.00561884290390235, 0.00494758931266404, -0.0137588161714284, 
-0.0196623961323645, 0.0107728408742762, 0.0133726493037134), class = c("xts", 
"zoo"), .indexCLASS = "Date", .indexTZ = "UTC", tclass = "Date", tzone = "UTC", index = structure(c(1567728000, 
1567987200, 1568073600, 1568160000, 1568246400, 1568332800, 1568592000, 
1568678400, 1568764800, 1568851200, 1568937600, 1569196800, 1569283200, 
1569369600, 1569456000, 1569542400, 1569801600, 1569888000, 1569974400, 
1570060800, 1570147200), tzone = "UTC", tclass = "Date"), .Dim = c(21L, 
1L), .Dimnames = list(NULL, "SP00.USA"))

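Note: the value changes on 2019-09-12.

Update: a quick comment on the desired output, focusing on SP00.USA:

If the value in tempContracts is the same as the value reported on the previous date, adjRI keeps the default return from tempRI.
On dates such as 2019-09-12, where the value differs from the previous date:
1. Flag the column and row in which the new contract is reported (for 2019-09-12, SPZ19-USA is in column SP.2.USA, row 5)
2. Replace the return for that date in SP00.USA with the return found in the flagged column and row (e.g. tempRI["2019-09-12"][,1] = tempRI["2019-09-12"][,3])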
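The two steps above can be sketched in base R on plain matrices. Everything in this example is illustrative: the contract codes, the return values and the column names cont, first and second are made up, and the column search looks at the roll date's own row, as the update describes. For xts inputs, the assumption is that coredata() would supply the matrices.

```r
# Hypothetical toy inputs laid out like tempContracts / tempRI:
# column 1 is the continuous series, columns 2+ are individual expiries.
contracts <- matrix(
  c("U19", "U19", "Z19", "Z19",   # continuous series
    "U19", "U19", "U19", "Z19",   # first to expire
    "Z19", "Z19", "Z19", "H20"),  # second to expire
  ncol = 3, dimnames = list(NULL, c("cont", "first", "second")))
returns <- matrix(
  c(0.010, 0.020, 0.030, 0.040,
    0.010, 0.020, 0.030, 0.040,
    0.011, 0.021, 0.031, 0.041),
  ncol = 3, dimnames = dimnames(contracts))

# Step 0: rows where the continuous series switches contract
n <- nrow(contracts)
roll_rows <- which(c(FALSE, contracts[-1, 1] != contracts[-n, 1]))

adj <- returns[, 1]
for (i in roll_rows) {
  # Step 1: first of columns 2+ that lists the new contract on the roll date
  j <- match(contracts[i, 1], contracts[i, -1]) + 1
  # Step 2: take that column's return; leave the default if nothing matches
  if (!is.na(j)) adj[i] <- returns[i, j]
}
adj
```

Using match() over all remaining columns avoids hard-coding a column position, so the same loop should carry over to datasets with a different number of columns.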

Any help with this problem is much appreciated!

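【Comments】:

Hi Shriv, this is a bit hard to follow because I don't understand what tempContracts is doing. Looking at your expected output, it seems that for every row in tempRI you need the repeated value? And if there is no repeat, you take the last value?
tempContracts is an xts object that lists the details of the contracts traded each day. SP.1.USA holds the details of the first contract (to expire) on the given date, and SP.2.USA those of the second contract (to expire). The first column, SP00.USA, is an artificial series created according to specific criteria. I have an equivalent price dataset that lists the corresponding prices of the specified contracts in three columns for each date. I want to calculate returns for the artificial series SP00.USA, and currently tempRI <- diff(log(price object)).
To get an accurate return index for SP00.USA, I have to replace the return in SP00.USA with the correct "return" on the dates where the contract changes. As a concrete example, the contract changes from "SPU19-USA" to "SPZ19-USA" on 2019-09-12. The return for that date in tempRI should change from 0.0038898265 to 0.0033236977 (SP.2.USA).
My final output (adjRI) is SP00.USA after correcting the returns on the dates where, because of the change in contract name, the return was not calculated from comparable prices in the price object. I hope this helps clarify the different outputs provided above.
Is the uploaded (dput) tempContracts correct? If you look at tempContracts[5,], it is all "SPZ19-USA".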

Tags: r xts lag quantitative-finance

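【Solution 1】:

Yes, you can use lag. Compare the column with its lagged copy, as in tempContracts[,"SP00.USA"]!=lag(tempContracts[,"SP00.USA"]), to identify the rows where the contract switches. You can then use this boolean index to replace the values in adjRI. See below: I store the result as test and compare it with the adjRI you provided.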

library(zoo)
library(xts)

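# Start from the continuous-series returns, keeping the one-column xts structure
test <- tempRI[,"SP00.USA",drop=FALSE]
# TRUE where the contract differs from the previous date; the first row is NA
# because it has no previous observation
toChange <- tempContracts[,"SP00.USA"]!=lag(tempContracts[,"SP00.USA"])
# On those dates, take the return from the SP.2.USA column instead
test[toChange,1] = tempRI[toChange,"SP.2.USA"]
# Check the result against the expected output from the question
identical(test,adjRI)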

【Discussion】:

【Solution 2】:

@StupidWolf - Thanks for your input. Very helpful!

Points to note:

SP.2.USA

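The code checks column by column to determine where this occurs and flags it in the variable flagCol. For those dates, the return in column 1 is replaced with a newly calculated return (the calculation uses tempPI, a price xts object in the same format as tempRI; the calcs are not important, but I have provided them anyway).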

#identify dates in "SP00.USA" (col1) where there is change in value
flagRoll <- tempContracts[,1]!=lag(tempContracts[,1])
flagRoll[1] = FALSE #adjust for value on day 1 from NA

#for the dates in flagRoll, identify the column in which the contract appears       
#method: 
#1. look at one column at a time (k is set to >=3 for my specific case, but can be set to 
#   k>=2 for a more generic sample)
#2. for each col, identify locations where contracts are same (flagCol) within the set of 
#   flagRoll
#3. for those dates, replace return values as difference in log of correct prices from 
#   tempPI (for ref: tempPI is the same as tempRI but contains prices instead of log 
#   returns)
#4. after all incorrect returns are replaced, save tempRI as adjRI 

for (k in seq(3,ncol(tempContracts))){
       #identify reference col for the roll
       flagCol <- tempContracts[flagRoll,1] == lag(tempContracts[,k])[flagRoll] 
       #note that you compare with contracts that are second to expire or later,
       #which start from col 3 onwards

       #replace returns from identified col in continuous time series
       tempRI[flagRoll,1][flagCol] = log(tempPI[flagRoll,1][flagCol]) - log(lag(tempPI[,k])[flagRoll][flagCol])

       rm(flagCol) #to allow it to be reset for next k
}

#save column 1 of tempRI, now holding the revised roll returns, as adjRI
adjRI <- tempRI[,1]

#clear variables  to run for another sample   
rm(k)
rm(tempPI)
rm(tempRI)
rm(tempContracts)
rm(flagRoll)
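For comparison, the same idea can be written without the loop over k. This is a minimal base-R sketch on made-up matrices, not a drop-in replacement for the code above: the prices, contract codes and column names are hypothetical, and it assumes the new contract is located on the previous day's row, as the lagged comparison above does. The roll-date return is then today's continuous price against yesterday's price of the same contract.

```r
# Hypothetical toy inputs laid out like tempContracts / tempPI
contracts <- matrix(
  c("U19", "U19", "Z19", "Z19",   # continuous series
    "U19", "U19", "U19", "Z19",   # first to expire
    "Z19", "Z19", "Z19", "H20"),  # second to expire
  ncol = 3, dimnames = list(NULL, c("cont", "first", "second")))
prices <- matrix(
  c(100, 101, 103, 104,
    100, 101, 102, 104,
    101, 102, 103, 105),
  ncol = 3, dimnames = dimnames(contracts))

# Rows where the continuous series switches contract
n <- nrow(contracts)
roll_rows <- which(c(FALSE, contracts[-1, 1] != contracts[-n, 1]))

# For each roll row, the column that held the new contract on the previous day
prev_col <- vapply(roll_rows,
                   function(i) match(contracts[i, 1], contracts[i - 1, ]),
                   integer(1))

# Naive log returns of the continuous series, then overwrite the roll dates
ret <- c(NA, diff(log(prices[, 1])))
ok <- !is.na(prev_col)
ret[roll_rows[ok]] <- log(prices[roll_rows[ok], 1]) -
  log(prices[cbind(roll_rows[ok] - 1, prev_col[ok])])
ret
```

Indexing prices with a two-column matrix of (row, column) pairs picks one cell per roll date, which is what lets each roll use a different reference column without a separate pass per column.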

【Discussion】: