【Question Title】: Identify changes in time series (character) values and flag location of new value in corresponding dataset
【Posted】: 2020-03-18 11:15:10
【Question】:

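I am working with a time series dataset that holds character values (specifically, futures contract details). I want to identify the dates on which the character value differs from the previous available date and, for those dates, identify which of the subsequent columns holds that value. I have provided a sample dataset below. I considered using lag() on this xts object, but I get an error: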

Error in `[.xts`(x, seq_len(xlen - n)) : subscript out of bounds

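Also, my current approach is rather brute force, which I would like to avoid (particularly because the number of columns differs from one dataset to the next).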
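One way to find the change dates without calling lag() on the character object is to compare the underlying character vector with a copy of itself shifted by one row. The sketch below uses a small hypothetical dataset (shortened contract and column names, four dates) rather than the dput output, and relies only on xts(), coredata() and index().

```r
library(xts)

# Hypothetical toy data, much smaller than the dput output in the question
dates <- as.Date("2019-09-09") + 0:3
toy_contracts <- xts(
  matrix(c("SPU19", "SPU19", "SPU19", "SPZ19",   # SP00: artificial front listing
           "SPU19", "SPU19", "SPU19", "SPU19",   # SP.1: first to expire
           "SPZ19", "SPZ19", "SPZ19", "SPZ19"),  # SP.2: second to expire
         ncol = 3,
         dimnames = list(NULL, c("SP00", "SP.1", "SP.2"))),
  order.by = dates
)

# Work on the plain character vector instead of the xts object
front <- as.character(coredata(toy_contracts)[, "SP00"])

# Compare each row with the previous one; the first row has nothing to compare to
changed <- c(FALSE, front[-1] != front[-length(front)])

# Dates on which the front contract differs from the previous date
format(index(toy_contracts)[changed])
```

The first element of `changed` is set to FALSE by construction, so no NA handling is needed afterwards.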

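Objective: I have a corresponding return time series in the same format as the character time series. By identifying the column and date [new location] at which the new character value (contract detail) appears, I want to replace the existing return in the first column on that date with the return from the new location.

Sample dput output of the character time series tempContracts: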

structure(c("SPU19-USA", "SPU19-USA", "SPU19-USA", "SPU19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPU19-USA", "SPU19-USA", "SPU19-USA", 
"SPU19-USA", "SPU19-USA", "SPU19-USA", "SPU19-USA", "SPU19-USA", 
"SPU19-USA", "SPU19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPZ19-USA", 
"SPZ19-USA", "SPZ19-USA", "SPZ19-USA", "SPH20-USA", "SPH20-USA", 
"SPH20-USA", "SPH20-USA", "SPH20-USA", "SPH20-USA", "SPH20-USA", 
"SPH20-USA", "SPH20-USA", "SPH20-USA", "SPH20-USA"), class = c("xts", 
"zoo"), .indexCLASS = "Date", .indexTZ = "UTC", tclass = "Date", tzone = "UTC", index = structure(c(1567728000, 
1567987200, 1568073600, 1568160000, 1568246400, 1568332800, 1568592000, 
1568678400, 1568764800, 1568851200, 1568937600, 1569196800, 1569283200, 
1569369600, 1569456000, 1569542400, 1569801600, 1569888000, 1569974400, 
1570060800, 1570147200), tzone = "UTC", tclass = "Date"), .Dim = c(21L, 
3L), .Dimnames = list(NULL, c("SP00.USA", "SP.1.USA", "SP.2.USA"
)))

Sample dput output of the return time series tempRI:

structure(c(0.00295659400967452, -0.000872629691220261, 0.000100726912638294, 
0.00785891512466552, 0.00388982653805137, -0.00169370546773528, 
-0.00236269057182703, 0.00212999714436535, 0.000232693427461683, 
-0.000232693427461683, -0.00613601151530396, 0.00253900513908256, 
-0.00901586386319586, 0.00540587231028766, -0.001944091247152, 
-0.00561884290390235, 0.00494758931266404, -0.0137588161714284, 
-0.0196623961323645, 0.0107728408742762, 0.0133726493037134, 
0.00295659400967452, -0.000872629691220261, 0.000100726912638294, 
0.00785891512466552, 0.00325917345931082, -0.00179455699351827, 
-0.00243110601795671, 0.00209842695038276, 0.00029941614076634, 
-9.97954194730255e-05, -0.00550414199196148, 0.00253900513908256, 
-0.00901586386319586, 0.00540587231028766, -0.001944091247152, 
-0.00561884290390235, 0.00494758931266404, -0.0137588161714284, 
-0.0196623961323645, 0.0107728408742762, 0.0133726493037134, 
0.00298883607603528, -0.000805099003568621, 0.000134228188120922, 
0.00785444971143257, 0.0033236976786668, -0.00169370546773528, 
-0.00236269057182703, 0.00212999714436535, 0.000232693427461683, 
-0.000232693427461683, -0.00526667884051868, 0.00240344609471244, 
-0.00907650876598698, 0.00550263098282411, -0.00197611974611434, 
-0.00568212002020996, 0.00497781197372316, -0.0140212104959856, 
-0.019827124891334, 0.0105981832589173, 0.013540683361386), class = c("xts", 
"zoo"), .indexCLASS = "Date", .indexTZ = "UTC", tclass = "Date", tzone = "UTC", index = structure(c(1567728000, 
1567987200, 1568073600, 1568160000, 1568246400, 1568332800, 1568592000, 
1568678400, 1568764800, 1568851200, 1568937600, 1569196800, 1569283200, 
1569369600, 1569456000, 1569542400, 1569801600, 1569888000, 1569974400, 
1570060800, 1570147200), tzone = "UTC", tclass = "Date"), .Dim = c(21L, 
3L), .Dimnames = list(NULL, c("SP00.USA", "SP.1.USA", "SP.2.USA"
)))

Expected output adjRI (the remaining columns are dropped):

structure(c(0.00295659400967452, -0.000872629691220261, 0.000100726912638294, 
0.00785891512466552, 0.0033236976786668, -0.00169370546773528, 
-0.00236269057182703, 0.00212999714436535, 0.000232693427461683, 
-0.000232693427461683, -0.00613601151530396, 0.00253900513908256, 
-0.00901586386319586, 0.00540587231028766, -0.001944091247152, 
-0.00561884290390235, 0.00494758931266404, -0.0137588161714284, 
-0.0196623961323645, 0.0107728408742762, 0.0133726493037134), class = c("xts", 
"zoo"), .indexCLASS = "Date", .indexTZ = "UTC", tclass = "Date", tzone = "UTC", index = structure(c(1567728000, 
1567987200, 1568073600, 1568160000, 1568246400, 1568332800, 1568592000, 
1568678400, 1568764800, 1568851200, 1568937600, 1569196800, 1569283200, 
1569369600, 1569456000, 1569542400, 1569801600, 1569888000, 1569974400, 
1570060800, 1570147200), tzone = "UTC", tclass = "Date"), .Dim = c(21L, 
1L), .Dimnames = list(NULL, "SP00.USA"))

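Note: the value changes on 2019-09-12.

Update: a quick comment on the desired output, focusing on SP00.USA:

  • If the value in tempContracts is the same as the value reported on the previous date, adjRI keeps the default return from tempRI

  • On dates such as 2019-09-12, where the value differs from the previous date:

    1. Flag the column and row where the new contract is reported (on 2019-09-12, SPZ19-USA is in column SP.2.USA, row 5)

    2. Replace the return for SP00.USA on that date with the return found in the flagged column and row (e.g. tempRI["2019-09-12"][,1] = tempRI["2019-09-12"][,3])

Any help with this problem is greatly appreciated!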
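The rule described in this update can be sketched as follows. This is a minimal illustration on hypothetical toy matrices, not the question's data. It assumes the new contract can be found in one of the other columns on the change date (the first match is used) and leaves the return untouched if it is not found.

```r
# Hypothetical toy inputs; for the xts objects in the question,
# coredata(tempContracts) and coredata(tempRI) would give matrices like these
contracts <- matrix(
  c("SPU19", "SPU19", "SPZ19", "SPZ19",   # SP00: artificial front listing
    "SPU19", "SPU19", "SPU19", "SPU19",   # SP.1: first to expire
    "SPZ19", "SPZ19", "SPZ19", "SPZ19"),  # SP.2: second to expire
  ncol = 3, dimnames = list(NULL, c("SP00", "SP.1", "SP.2"))
)
returns <- matrix(
  c(0.010, 0.020, 0.030, 0.040,
    0.010, 0.020, 0.030, 0.040,
    0.011, 0.021, 0.033, 0.041),
  ncol = 3, dimnames = list(NULL, colnames(contracts))
)

# Rows where the front contract differs from the previous row
front   <- contracts[, 1]
changed <- c(FALSE, front[-1] != front[-length(front)])

# Start from the first-column returns and overwrite only the change dates
adj <- returns[, 1]
for (i in which(changed)) {
  # Position of the new contract among the remaining columns of the same row
  k <- match(front[i], contracts[i, -1])
  if (!is.na(k)) adj[i] <- returns[i, k + 1]  # +1 skips the front column
}
adj
```

Because match() searches all remaining columns, the same code applies when a dataset has more than three columns.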

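【Comments】:

  • Hi Shriv, this is a bit hard to follow without understanding what tempContracts is doing. Looking at your expected output, it seems that for each row in tempRI you need the repeated value? And if there is no repeat, do you take the last value?
  • tempContracts is an xts object that lists the contract details traded on each day. SP.1.USA is the first (to expire) contract for the given date, and SP.2.USA is the second (to expire) contract. The first column, SP00.USA, is an artificial listing created according to specific criteria. I have an equivalent price dataset that lists, in three columns, the corresponding prices of the named contracts on each date. I want to compute the returns of the artificial listing SP00.USA, and currently tempRI <- diff(log(price object))
  • To get an accurate return index for SP00.USA, I have to replace the returns in SP00.USA with the correct "returns" on the dates where the contract changes. As a concrete example, the contract changes from "SPU19-USA" to "SPZ19-USA" on 2019-09-12, so the return in tempRI for that date should change from 0.0038898265 to 0.0033236977 (SP.2.USA).
  • My final output (adjRI) is SP00.USA after correcting the returns on the dates where, because of the change in contract name, the return was not computed against a comparable price in the price object. I hope this helps clarify the different outputs provided above
  • Is the uploaded (dput) tempContracts correct? If you look at tempContracts[5,], it is all "SPZ19-USA"

Tags: r xts lag quantitative-finance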


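【Solution 1】:

Yes, you can use lag(). Compare the column with its lag, as in tempContracts[,"SP00.USA"]!=lag(tempContracts[,"SP00.USA"]), to identify the rows where the contract switches. You can then use this boolean index to replace the values in adjRI. See below: I store the result as test and compare it with the adjRI you provided.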

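library(zoo)
library(xts)

# start from the original first-column returns
test <- tempRI[,"SP00.USA",drop=FALSE]
# TRUE where the contract differs from the previous date (NA on the first row)
toChange <- tempContracts[,"SP00.USA"]!=lag(tempContracts[,"SP00.USA"])
# on those dates, take the return from SP.2.USA instead
test[toChange,1] = tempRI[toChange,"SP.2.USA"]
# compare with the expected output from the question
identical(test,adjRI)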
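If lag() on a character object still raises the error from the question in a given setup, one possible workaround is to map the contract names to numeric codes first, so that diff() operates on a numeric xts object. This is an alternative technique rather than the method used in the answer, and the data below are hypothetical.

```r
library(xts)

dates <- as.Date("2019-09-10") + 0:3
front <- c("SPU19", "SPU19", "SPZ19", "SPZ19")  # hypothetical front-column values

# One numeric code per distinct contract name
codes <- xts(as.numeric(factor(front)), order.by = dates)

# diff() pads the first row with NA; a non-zero difference marks a roll
step <- as.vector(coredata(diff(codes)))
roll <- !is.na(step) & step != 0

# Dates on which the contract differs from the previous date
format(index(codes)[roll])
```

Treating the leading NA explicitly as "no roll" gives a plain logical vector that can be reused for row indexing.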

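【Comments】:

    【Solution 2】:

    @StupidWolf - Thanks for your input. Very helpful!

    Notes:

    1. The example only has SP.2.USA, but the actual dataset has several (and a varying number of) columns to check for where the contract appeared on the date before the change.
    2. The code checks column by column to determine where this occurs and flags it in the variable flagCol. For those dates, the return in column 1 is replaced with a newly calculated return (the calculation uses tempPI, a price xts object in the same format as tempRI; the calculations are not important, but I have provided them).
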
    #identify dates in "SP00.USA" (col1) where there is change in value
    flagRoll <- tempContracts[,1]!=lag(tempContracts[,1])
    flagRoll[1] = FALSE #adjust for value on day 1 from NA
    
    #for the dates in flagRoll, identify the column in which the contract appears       
    #method: 
    #1. look at one column at a time (k is set to >=3 for my specific case, but can be set to 
    #   k>=2 for a more generic sample)
    #2. for each col, identify locations where contracts are same (flagCol) within the set of 
    #   flagRoll
    #3. for those dates, replace return values as difference in log of correct prices from 
    #   tempPI (for ref: tempPI is the same as tempRI but contains prices instead of log 
    #   returns)
    #4. after all incorrect returns are replaced, save tempRI as adjRI 
    
    for (k in seq(3,ncol(tempContracts))){
           #identify reference col for the roll
           flagCol <- tempContracts[flagRoll,1] == lag(tempContracts[,k])[flagRoll] 
           #note that you compare with contracts second to expire or later, which start
           #from col 3 onwards 
    
           #replace returns from identified col in continuous time series
           tempRI[flagRoll,1][flagCol] = log(tempPI[flagRoll,1][flagCol]) - log(lag(tempPI[,k])[flagRoll][flagCol])
    
           rm(flagCol) #to allow it to be reset for next k
    }
    
    #save the adjusted first column (with revised roll returns) as adjRI
    adjRI <- tempRI[,1]
    
    #clear variables  to run for another sample   
    rm(k)
    rm(tempPI)
    rm(tempRI)
    rm(tempContracts)
    rm(flagRoll)
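
The column-by-column loop can also be written as a single lookup per roll date. The sketch below is one possible variant under stated assumptions, not the answerer's code. It uses hypothetical toy matrices in place of tempContracts and tempPI, searches every non-front column (rather than starting at column 3), and looks for the new contract in the previous row, as the loop does via lag().

```r
# Hypothetical toy inputs; coredata(tempContracts) and coredata(tempPI)
# would play these roles for the xts objects used in the answer
contracts <- matrix(
  c("SPU19", "SPU19", "SPZ19", "SPZ19",   # artificial front listing
    "SPU19", "SPU19", "SPU19", "SPZ19",   # first to expire
    "SPZ19", "SPZ19", "SPZ19", "SPH20"),  # second to expire
  ncol = 3
)
prices <- matrix(
  c(100, 101, 205, 206,
    100, 101, 102, 206,
    200, 202, 205, 300),
  ncol = 3
)

# Rows where the front contract differs from the previous row
front     <- contracts[, 1]
roll_rows <- which(c(FALSE, front[-1] != front[-length(front)]))

# Default: log return of the front column against its own previous price
ret <- c(NA, diff(log(prices[, 1])))

for (i in roll_rows) {
  # Column that held the new contract on the previous date
  k <- match(front[i], contracts[i - 1, -1]) + 1
  # Measure the roll-date return against that contract's previous price
  if (!is.na(k)) ret[i] <- log(prices[i, 1]) - log(prices[i - 1, k])
}
ret
```

On a roll date the return is measured against the previous price of the contract being rolled into, which mirrors the quantity the loop assigns with log(tempPI[flagRoll,1]) - log(lag(tempPI[,k])[flagRoll]).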
    
    

    【Comments】:
