【问题标题】:Adding Column To Each Quantmod Symbol为每个 Quantmod 符号添加列
【发布时间】:2017-02-26 02:09:35
【问题描述】:

正在学习 R,不知道如何解决。

library(quantmod)
library(xts)

# get market data
Nasdaq100_Symbols <- c("AAPL", "AAL")
getSymbols(Nasdaq100_Symbols)

# merge them together
nasdaq100 <- data.frame(as.xts(merge(AAPL, AAL)))
#tail(nasdaq100[,1:12],2)

#make percent difference column
nasdaq100$PD <- (((nasdaq100$AAPL.High - nasdaq100$AAPL.Open)/nasdaq100$AAPL.Open) * 100)

我正在尝试添加百分比差异列,但上述代码仅适用于 AAPL 符号(或您使用的任何符号),而不会为每个符号创建 PD 列。

在与 xts 合并之前是否必须以某种方式添加该列,或者我可以告诉 R 为新合并帧中的每个符号创建它吗?

编辑:我正在做数据训练,所以我需要所有符号都是标题,例如:

           AAPL.Ope AAPL.High AAPL.Volume AAL.Open AAL.High

1/3/2007    86.29   86.58      309579900    53.89   56.92
1/4/2007    84.05   85.95      211815100    56.3    59.15
1/5/2007    85.77   86.2       208685400    58.83   59.15

【问题讨论】:

    标签: r xts quantmod


    【解决方案1】:

    根据我的经验,将您的财务数据保留为 xts 对象通常更有意义,以便将来使用其他技术指标等进行操作,除非您计划在 caret 中运行预测模型,在这种情况下转换data.frame 可能有意义。

    考虑将数据符号保留为容器的元素,例如

    update_sym_md <- function(sym, env = .GlobalEnv) {
        x <- get(sym, env)
        pd <- setNames((Hi(x) - Op(x)) / Op(x), "PD")
        merge(x, pd)
    }
    
    # Adjust env for location of xts symbol data
    l.syms <- lapply(Nasdaq100_Symbols, update_sym_md, env = .GlobalEnv)
    
    lapply(l.syms, head)
    # [[1]]
    # AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted          PD
    # 2007-01-03     86.29     86.58    81.90      83.80   309579900      10.85709 0.003360760
    # 2007-01-04     84.05     85.95    83.82      85.66   211815100      11.09807 0.022605556
    # 2007-01-05     85.77     86.20    84.40      85.05   208685400      11.01904 0.005013373
    # 2007-01-08     85.96     86.53    85.28      85.47   199276700      11.07345 0.006630991
    # 2007-01-09     86.45     92.98    85.15      92.57   837324600      11.99333 0.075534942
    # 2007-01-10     94.75     97.80    93.45      97.00   738220000      12.56728 0.032190006
    # 
    # [[2]]
    # AAL.Open AAL.High AAL.Low AAL.Close AAL.Volume AAL.Adjusted           PD
    # 2007-01-03    53.89    56.92   53.89     56.30    2955600     54.80361 0.0562256273
    # 2007-01-04    56.30    59.15   53.65     58.84    2614500     57.27610 0.0506217238
    # 2007-01-05    58.83    59.15   57.90     58.29    1656300     56.74072 0.0054394015
    # 2007-01-08    57.30    60.48   57.04     57.93    2163200     56.39028 0.0554974006
    # 2007-01-09    59.44    60.20   57.56     57.90    2098600     56.36108 0.0127860366
    # 2007-01-10    60.03    60.04   57.34     58.93    3892200     57.36371 0.0001666167
    

    此外,如果您确实想在一个 xts 对象中而不是在 data.frame 中比较符号之间的价格回报/原始价格,您可能会发现 qmao 包很有用。

    例如:

    install.packages("qmao", repos="http://R-Forge.R-project.org", type = "source")
    library(qmao)
    
    pf <- makePriceFrame(Nasdaq100_Symbols)
    head(pf, 3)
    #               AAPL      AAL
    # 2007-01-03 10.85709 54.80361
    # 2007-01-04 11.09807 57.27610
    # 2007-01-05 11.01904 56.74072
    rf <- makeReturnFrame(Nasdaq100_Symbols)
    head(rf)
    
    #                  AAPL           AAL
    # 2007-01-03           NA            NA
    # 2007-01-04  0.021952895  0.0441273684
    # 2007-01-05 -0.007146715 -0.0093913155
    # 2007-01-08  0.004926208 -0.0061951917
    # 2007-01-09  0.079799692 -0.0005179716
    # 2007-01-10  0.046745798  0.0176329011
    

    根据 OP 的评论进行更新:

    要将所有数据合并为一行,请尝试以下操作:

    (顺便说一句:如果您要在此 data.frame 上使用非线性预测模型,请确保首先考虑在每一行的证券之间扩展您的数据点。)

    x.cbind <- do.call(cbind, l.syms)
    head(x.cbind)
    # AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted          PD AAL.Open AAL.High AAL.Low AAL.Close AAL.Volume AAL.Adjusted         PD.1
    # 2007-01-03     86.29     86.58    81.90      83.80   309579900      10.85709 0.003360760    53.89    56.92   53.89     56.30    2955600     54.80361 0.0562256273
    # 2007-01-04     84.05     85.95    83.82      85.66   211815100      11.09807 0.022605556    56.30    59.15   53.65     58.84    2614500     57.27610 0.0506217238
    # 2007-01-05     85.77     86.20    84.40      85.05   208685400      11.01904 0.005013373    58.83    59.15   57.90     58.29    1656300     56.74072 0.0054394015
    # 2007-01-08     85.96     86.53    85.28      85.47   199276700      11.07345 0.006630991    57.30    60.48   57.04     57.93    2163200     56.39028 0.0554974006
    # 2007-01-09     86.45     92.98    85.15      92.57   837324600      11.99333 0.075534942    59.44    60.20   57.56     57.90    2098600     56.36108 0.0127860366
    # 2007-01-10     94.75     97.80    93.45      97.00   738220000      12.56728 0.032190006    60.03    60.04   57.34     58.93    3892200     57.36371 0.0001666167
    
    df.cbind <- data.frame("time" = index(x.cbind), coredata(x.cbind))
    head(df.cbind)
    # time AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted          PD AAL.Open AAL.High AAL.Low AAL.Close AAL.Volume AAL.Adjusted         PD.1
    # 1 2007-01-03     86.29     86.58    81.90      83.80   309579900      10.85709 0.003360760    53.89    56.92   53.89     56.30    2955600     54.80361 0.0562256273
    # 2 2007-01-04     84.05     85.95    83.82      85.66   211815100      11.09807 0.022605556    56.30    59.15   53.65     58.84    2614500     57.27610 0.0506217238
    # 3 2007-01-05     85.77     86.20    84.40      85.05   208685400      11.01904 0.005013373    58.83    59.15   57.90     58.29    1656300     56.74072 0.0054394015
    # 4 2007-01-08     85.96     86.53    85.28      85.47   199276700      11.07345 0.006630991    57.30    60.48   57.04     57.93    2163200     56.39028 0.0554974006
    # 5 2007-01-09     86.45     92.98    85.15      92.57   837324600      11.99333 0.075534942    59.44    60.20   57.56     57.90    2098600     56.36108 0.0127860366
    # 6 2007-01-10     94.75     97.80    93.45      97.00   738220000      12.56728 0.032190006    60.03    60.04   57.34     58.93    3892200     57.36371 0.0001666167
    

    为了更好地理解 qmao 函数的工作原理,为什么不查看文档中的示例并从那里开始呢? ?makeReturnFrame查看源代码以真正了解发生了什么(通过学习良好的编码风格同时成为更好的 R 程序员)

    【讨论】:

    • 我正在做数据训练,所以我确实希望所有符号都在顶部。无法真正弄清楚qmao是如何工作的,我只能让它拉出那个数据字段。
    • @Alteredorange 当您希望所有符号位于顶部时,我认为您的意思是您希望数据框每一列中的证券数据,因此每一行都是使用交叉的预测模型中的一个观察值- 部分证券价格数据。我已经按答案进行了编辑,以显示一种方法。
    • 这似乎正是我所需要的!还有一个问题,有没有办法将列命名为“symbol.PD”,所以它们将是 AAPL.PD 和 AAL.PD 而不是 PD 和 PD.1?
    • 不确定这是否正确,但我使用了:actsym &lt;- paste(sym,"pd") pd &lt;- setNames(((Hi(x) - Op(x)) / Op(x))*100, actsym),它似乎不起作用!
    • @Alteredorange 是的,效果很好。或者,更简洁的pd &lt;- setNames((Hi(x) - Op(x)) / Op(x), paste0(sym, ".PD"))
    【解决方案2】:

    我想添加一个使用全新且有趣的包tidyquant 的解决方案,它非常适合此类任务。你可以使用 tidyverse 中的所有东西,最重要的是它利用了 xts、quantmod 和 TTR 的量化能力!看看vignette 有很多例子。

    library(tidyquant)
    c("AAPL", "AAL") %>%
      tq_get(get = "stock.prices") %>%
      group_by(symbol.x) %>%
      tq_mutate(ohlc_fun = OHLCV, mutate_fun = OpCl,col_rename='diff') %>% 
      select(- c(low,volume)) # I deselect low and volume to show the added colum ‘diff'
    
    Source: local data frame [5,110 x 7]
    Groups: symbol.x [2]
    
       symbol.x       date  open  high close adjusted         diff
          <chr>     <date> <dbl> <dbl> <dbl>    <dbl>        <dbl>
    1      AAPL 2007-01-03 86.29 86.58 83.80 10.85709 0.0033607603
    2      AAPL 2007-01-04 84.05 85.95 85.66 11.09807 0.0226055559
    3      AAPL 2007-01-05 85.77 86.20 85.05 11.01904 0.0050133730
    4      AAPL 2007-01-08 85.96 86.53 85.47 11.07345 0.0066309913
    5      AAPL 2007-01-09 86.45 92.98 92.57 11.99333 0.0755349424
    

    更新:询问是否在列名中添加符号

    假设您将上述数据框保存到变量stocks

    lapply(unique(stocks$symbol.x), function(x) stocks[stocks$symbol.x == x,]) %>% 
    lapply( function(x) {
      names(x) <- paste0(x$symbol.x[1],'.',colnames(x))
      x
    }) 
    [[1]]
    Source: local data frame [2,555 x 7]
    Groups: symbol.x [1]
    
       AAPL.symbol.x  AAPL.date AAPL.open AAPL.high AAPL.close
               <chr>     <date>     <dbl>     <dbl>      <dbl>
    1           AAPL 2007-01-03     86.29     86.58      83.80
    2           AAPL 2007-01-04     84.05     85.95      85.66
    3           AAPL 2007-01-05     85.77     86.20      85.05
    4           AAPL 2007-01-08     85.96     86.53      85.47
    5           AAPL 2007-01-09     86.45     92.98      92.57
    6           AAPL 2007-01-10     94.75     97.80      97.00
    7           AAPL 2007-01-11     95.94     96.78      95.80
    8           AAPL 2007-01-12     94.59     95.06      94.62
    9           AAPL 2007-01-16     95.68     97.25      97.10
    10          AAPL 2007-01-17     97.56     97.60      94.95
    # ... with 2,545 more rows, and 2 more variables:
    #   AAPL.adjusted <dbl>, AAPL.diff <dbl>
    
    [[2]]
    Source: local data frame [2,555 x 7]
    Groups: symbol.x [1]
    
       AAL.symbol.x   AAL.date AAL.open AAL.high AAL.close
              <chr>     <date>    <dbl>    <dbl>     <dbl>
    1           AAL 2007-01-03    53.89    56.92     56.30
    2           AAL 2007-01-04    56.30    59.15     58.84
    3           AAL 2007-01-05    58.83    59.15     58.29
    4           AAL 2007-01-08    57.30    60.48     57.93
    5           AAL 2007-01-09    59.44    60.20     57.90
    6           AAL 2007-01-10    60.03    60.04     58.93
    7           AAL 2007-01-11    59.18    61.20     61.20
    8           AAL 2007-01-12    61.20    62.50     60.81
    9           AAL 2007-01-16    60.81    62.10     61.96
    10          AAL 2007-01-17    60.96    61.89     58.65
    # ... with 2,545 more rows, and 2 more variables:
    #   AAL.adjusted <dbl>, AAL.diff <dbl>
    

    【讨论】:

    • 好的,我得到了这个工作,甚至将 tibble 变异为和 xts,但我正在进行数据训练,所以我需要所有符号作为标题。 tidyquant 能做到吗? AAL.Volume AAL.Adjusted 2007-01-03 2955600 54.803612007-01-04 2614500 57.276102007-01-05 1656300 56.740722007-01-08 2163200 56.390282007-01-09 2098600 56.361082007-01-10 3892200 57.36371
    • @Alteredorange,查看更新的解决方案。这就是您所说的“标题中的符号”吗?
    • 请注意tidyquant 包已更新。列名“symbol.x”现在将是“symbol”。这在 v0.4.0 中已更改。
    【解决方案3】:

    当我使用quantmods getSymbols 函数时,我通常会编写自己的小包装函数,类似这样:

    library(quantmod)
    # 1. write the wrapper function
    my_wrapper <- function(tickers, from, to) {
      result_list <- lapply(tickers, function(ticker) {
        tmp <- getSymbols(ticker, from = from, to = to, auto.assign = F)
        tmp_df <- data.frame(date = index(tmp), 
                             ticker = ticker,
                             open = as.numeric(Op(tmp)), 
                             high = as.numeric(Hi(tmp)), 
                             close = as.numeric(Cl(tmp)),
                             adj = as.numeric(Ad(tmp)))
      })
    
      result_df <- do.call(rbind, result_list)
      return(result_df)
    }
    
    # 2. download and inspect the data
    my_df <- my_wrapper(c("AAPL", "AAL"), from = "2010-01-01", to = "2016-12-31")
    summary(my_df)
    # date             ticker          open             high            close             adj         
    # Min.   :2010-01-04   AAPL:1762   Min.   :  3.99   Min.   :  4.06   Min.   :  4.00   Min.   :  3.894  
    # 1st Qu.:2011-09-30   AAL :1762   1st Qu.: 17.16   1st Qu.: 17.46   1st Qu.: 17.23   1st Qu.: 16.770  
    # Median :2013-07-04               Median : 72.94   Median : 73.45   Median : 73.02   Median : 41.920  
    # Mean   :2013-07-02               Mean   :168.48   Mean   :170.10   Mean   :168.40   Mean   : 49.208  
    # 3rd Qu.:2015-04-06               3rd Qu.:318.11   3rd Qu.:320.39   3rd Qu.:318.23   3rd Qu.: 72.969  
    # Max.   :2016-12-30               Max.   :702.41   Max.   :705.07   Max.   :702.10   Max.   :127.966  
    

    然后,为了计算差异,我建议使用dplyrdata.table 或其他一些数据框操作包。这里我使用 dplyr。

    # 3. Calculate the difference using dplyr
    library(dplyr)
    
    my_rets <- my_df %>% group_by(ticker) %>% mutate(pd = (high - open) / open)
    
    my_rets
    # Source: local data frame [3,524 x 7]
    # Groups: ticker [2]
    # 
    #         date ticker   open   high  close      adj           pd
    #        <date> <fctr>  <dbl>  <dbl>  <dbl>    <dbl>        <dbl>
    # 1  2010-01-04   AAPL 213.43 214.50 214.01 27.72704 0.0050133440
    # 2  2010-01-05   AAPL 214.60 215.59 214.38 27.77498 0.0046132153
    # 3  2010-01-06   AAPL 214.38 215.23 210.97 27.33318 0.0039649549
    # 4  2010-01-07   AAPL 211.75 212.00 210.58 27.28265 0.0011806659
    # 5  2010-01-08   AAPL 210.30 212.00 211.98 27.46403 0.0080837473
    # 6  2010-01-11   AAPL 212.80 213.00 210.11 27.22176 0.0009398731
    # 7  2010-01-12   AAPL 209.19 209.77 207.72 26.91211 0.0027725991
    # 8  2010-01-13   AAPL 207.87 210.93 210.65 27.29172 0.0147206905
    # 9  2010-01-14   AAPL 210.11 210.46 209.43 27.13366 0.0016657655
    # 10 2010-01-15   AAPL 210.93 211.60 205.93 26.68020 0.0031764188
    # # ... with 3,514 more rows
    

    附:你在这里找到一个很好的 dplyr 介绍:https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html

    【讨论】:

      猜你喜欢
      • 2012-08-27
      • 1970-01-01
      • 2011-10-08
      • 2013-03-09
      • 2022-01-22
      • 1970-01-01
      • 1970-01-01
      • 2014-02-19
      • 2017-12-17
      相关资源
      最近更新 更多