Look at the values of rol: they are all NaN.
rol = stock.rolling(50).mean()
rol
Out:
value
Date
2002-05-23 NaN
2002-05-24 NaN
2002-05-28 NaN
2002-05-29 NaN
2002-05-30 NaN
2002-05-31 NaN
2002-06-03 NaN
2002-06-04 NaN
2002-06-05 NaN
2002-06-06 NaN
2002-06-07 NaN
2002-06-10 NaN
2002-06-11 NaN
2002-06-12 NaN
2002-06-13 NaN
2002-06-14 NaN
2002-06-17 NaN
2002-06-18 NaN
2002-06-19 NaN
2002-06-20 NaN
2002-06-21 NaN
2002-06-24 NaN
2002-06-25 NaN
2002-06-26 NaN
2002-06-27 NaN
2002-06-28 NaN
2002-07-01 NaN
2002-07-02 NaN
2002-07-03 NaN
2002-07-05 NaN
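When you call rolling, it collects values using a window of size 50. By default, a window that captures fewer values than required (which is the case for the first 49 rows, at the edge of the data) produces NaN. In your case the window size (50) is larger than the DataFrame itself (30 rows), so every value is set to NaN.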
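A minimal sketch of this edge effect on a toy Series (the values are made up, not the question's data): with a window longer than the Series every result is NaN, and with a window of 3 only the first two positions, which have fewer than 3 observations, are NaN.

```python
import pandas as pd

# Toy series with 5 rows (made-up values, not the question's data)
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Window larger than the series: no window is ever complete, so all NaN
too_big = s.rolling(10).mean()

# Window of 3: the first two positions lack 3 observations and are NaN
small = s.rolling(3).mean()

print(too_big.isna().all())  # True
print(small.tolist())        # [nan, nan, 2.0, 3.0, 4.0]
```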
To demonstrate the idea, look at a smaller window size:
rol = stock.rolling(20).mean()
print(rol)
Out:
value
Date
2002-05-23 NaN
2002-05-24 NaN
2002-05-28 NaN
2002-05-29 NaN
2002-05-30 NaN
2002-05-31 NaN
2002-06-03 NaN
2002-06-04 NaN
2002-06-05 NaN
2002-06-06 NaN
2002-06-07 NaN
2002-06-10 NaN
2002-06-11 NaN
2002-06-12 NaN
2002-06-13 NaN
2002-06-14 NaN
2002-06-17 NaN
2002-06-18 NaN
2002-06-19 NaN
2002-06-20 1.086143
2002-06-21 1.075286
2002-06-24 1.063714
2002-06-25 1.054071
2002-06-26 1.048321
2002-06-27 1.041929
2002-06-28 1.038071
2002-07-01 1.033036
2002-07-02 1.035786
2002-07-03 1.039143
2002-07-05 1.043857
- The first non-NaN value is exactly at the 20th row (2002-06-20), the first row whose window contains 20 observations.
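The same pattern can be checked programmatically. This sketch uses 30 made-up values standing in for the 30-row table (an assumption, not the real prices) and a default RangeIndex, so the label returned by first_valid_index() is also the 0-based position.

```python
import numpy as np
import pandas as pd

# 30 made-up values standing in for the 30-row `stock` table
s = pd.Series(np.arange(30, dtype=float))

window = 20
rol = s.rolling(window).mean()

# With a default RangeIndex, the label of the first non-NaN value is its 0-based position
first_label = rol.first_valid_index()
print(first_label + 1)   # 20 -> the 20th row
print(rol.isna().sum())  # 19 leading NaNs (window - 1)
```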
To avoid this behavior, you can pass a value for the min_periods parameter of rolling:
rol = stock.rolling(50, min_periods=1).mean()
print(rol)
Out:
value
Date
2002-05-23 1.196429
2002-05-24 1.203215
2002-05-28 1.187857
2002-05-29 1.166786
2002-05-30 1.147714
2002-05-31 1.135834
2002-06-03 1.134796
2002-06-04 1.132679
2002-06-05 1.134286
2002-06-06 1.139072
2002-06-07 1.137208
2002-06-10 1.138810
2002-06-11 1.139945
2002-06-12 1.136582
2002-06-13 1.133000
2002-06-14 1.123839
2002-06-17 1.111975
2002-06-18 1.100794
2002-06-19 1.092932
2002-06-20 1.086143
2002-06-21 1.081054
2002-06-24 1.076396
2002-06-25 1.071522
2002-06-26 1.068065
2002-06-27 1.063086
2002-06-28 1.060632
2002-07-01 1.059418
2002-07-02 1.063469
2002-07-03 1.068670
2002-07-05 1.075595
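- So, when there are fewer elements than the window size, rolling uses as many as are available. For example, the value for 2002-06-20 (1.086143) is the same as in the window-20 table, because in both cases it is the mean of the first 20 rows.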
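A quick check of this on a short made-up Series: while the Series has fewer rows than the window, rolling(50, min_periods=1).mean() gives the same numbers as an expanding mean. Comparing against expanding() is my own framing for illustration, not something from the question.

```python
import pandas as pd

s = pd.Series([1.0, 3.0, 5.0, 7.0])  # made-up values

partial = s.rolling(50, min_periods=1).mean()
expanding = s.expanding().mean()

# With fewer rows than the window, each result is the mean of all rows so far
print(partial.tolist())           # [1.0, 2.0, 3.0, 4.0]
print(partial.equals(expanding))  # True
```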
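Documentation for min_periods:
min_periods : int, default None
Minimum number of observations in window required to have a value
(otherwise result is NA). For a window that is specified by an offset,
this will default to 1.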
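To illustrate the last sentence of the documentation, this sketch compares an integer window with a time-offset window ("3D") on made-up daily values. The offset window needs a monotonic DatetimeIndex, and because its min_periods defaults to 1 it produces no leading NaNs.

```python
import pandas as pd

# Made-up daily values on a DatetimeIndex
idx = pd.date_range("2002-05-23", periods=5, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

by_count = s.rolling(3).mean()      # integer window: min_periods defaults to 3
by_offset = s.rolling("3D").mean()  # offset window: min_periods defaults to 1

print(by_count.isna().sum())  # 2
print(by_offset.tolist())     # [1.0, 1.5, 2.0, 3.0, 4.0]
```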
In the following line you lose the last three values: shift(-3) has no future rows to bring in for them, so they become NaN:
profitMade = ((stock.shift(-3) - stock)/stock)
profitMade
Out:
...
2002-07-01 1.276429
2002-07-02 NaN
2002-07-03 NaN
2002-07-05 NaN
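- So I think you should drop them (I may be wrong, since I'm not very familiar with this particular task). Then reindex stock and rol to the same index, because you need tables of the same size for the operations that follow.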
profitMade = profitMade.dropna()
stock = stock.loc[profitMade.index]
rol = rol.loc[profitMade.index]
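A small sketch of the shift(-3), dropna and .loc steps on made-up prices. It assumes stock is a one-column DataFrame with a column named value, which is what the outputs above suggest.

```python
import pandas as pd

# Made-up prices in a one-column DataFrame named 'value', shaped like `stock`
idx = pd.date_range("2002-05-23", periods=6, freq="D")
stock = pd.DataFrame({"value": [1.0, 2.0, 4.0, 2.0, 3.0, 8.0]}, index=idx)
rol = stock.rolling(50, min_periods=1).mean()

# Return over the next 3 rows; the last 3 rows have no future price -> NaN
profitMade = (stock.shift(-3) - stock) / stock
print(profitMade["value"].isna().sum())  # 3

# Drop the NaN rows, then trim stock and rol to the same index
profitMade = profitMade.dropna()
stock = stock.loc[profitMade.index]
rol = rol.loc[profitMade.index]
print(len(profitMade), len(stock), len(rol))  # 3 3 3
```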
OK, now there are three tables of the same size. I changed the line that returned a table full of NaN:
profitMade[(stock<stock.shift(-1))&(stock>rol)]
Out:
value
Date
2002-05-23 NaN
2002-05-24 NaN
2002-05-28 NaN
2002-05-29 NaN
2002-05-30 NaN
2002-05-31 NaN
2002-06-03 NaN
2002-06-04 NaN
2002-06-05 0.008095
2002-06-06 NaN
2002-06-07 NaN
2002-06-10 NaN
to:
profitMade[(stock['value'] < stock['value'].shift(-1)) & (stock['value'] > rol['value'])]
Out:
value
Date
2002-06-05 0.008095
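- It compares specific columns, which yields a boolean Series. A boolean Series selects rows, so non-matching rows are dropped instead of filled with NaN; a boolean DataFrame, as in the original line, only masks values in place and keeps every row.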
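The difference between the two kinds of key can be seen on a tiny made-up DataFrame: a boolean DataFrame as the key keeps the shape and fills non-matching cells with NaN, while a boolean Series as the key filters rows.

```python
import pandas as pd

df = pd.DataFrame({"value": [1.0, 5.0, 2.0, 7.0]})  # made-up values

# Boolean DataFrame as key: same shape, NaN where the mask is False
masked = df[df > 3]

# Boolean Series as key: keeps only the rows where the Series is True
filtered = df[df["value"] > 3]

print(masked.shape, masked["value"].isna().sum())  # (4, 1) 2
print(filtered["value"].tolist())                  # [5.0, 7.0]
```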
Also, I don't understand what you are doing here:
profitMade[profitMade.pct_change()].mean()
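- profitMade.pct_change() returns a table of float values (percent changes expressed as fractions), but profitMade[...] expects a boolean object; please clarify what you intend and edit your question.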
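A sketch of why that line fails, on made-up values: pandas rejects a float DataFrame used as an indexing key with a ValueError. The last two lines are purely hypothetical, assuming the intent was to average the rows whose change is positive; they only show that an explicit boolean condition is what [] needs.

```python
import pandas as pd

profitMade = pd.DataFrame({"value": [0.01, 0.03, 0.02]})  # made-up values

changes = profitMade.pct_change()
print(changes.dtypes["value"])  # float64

# A float DataFrame used as the key is rejected: pandas wants boolean values
try:
    profitMade[changes]
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Hypothetical intent: keep rows whose change is positive, then average them
positive = profitMade[changes["value"] > 0]
print(positive["value"].mean())  # 0.03
```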
Full code:
rol = stock.rolling(50, min_periods=1).mean()
profitMade = ((stock.shift(-3)-stock)/stock).dropna()
rol = rol.loc[profitMade.index]
stock = stock.loc[profitMade.index]
profitMade[(stock['value'] < stock['value'].shift(-1)) & (stock['value'] > rol['value'])]
Out:
value
Date
2002-06-05 0.008095
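For reference, a self-contained sketch that wraps the full code above into a function and runs it on made-up prices. The function name signal_profits and its window and horizon parameters are hypothetical, introduced only for illustration; the steps follow the full code line by line.

```python
import pandas as pd


def signal_profits(stock: pd.DataFrame, window: int = 50, horizon: int = 3) -> pd.DataFrame:
    """Hypothetical wrapper around the full code above (name and parameters are illustrative)."""
    rol = stock.rolling(window, min_periods=1).mean()
    # Forward return over `horizon` rows; trailing rows without a future price are dropped
    profit = ((stock.shift(-horizon) - stock) / stock).dropna()
    rol = rol.loc[profit.index]
    stock = stock.loc[profit.index]
    v = stock["value"]
    # Next price is higher AND the price is above its rolling mean
    mask = (v < v.shift(-1)) & (v > rol["value"])
    return profit[mask]


# Made-up daily prices in a one-column DataFrame named 'value'
idx = pd.date_range("2002-05-23", periods=7, freq="D")
stock = pd.DataFrame({"value": [2.0, 1.0, 3.0, 4.0, 5.0, 6.0, 7.0]}, index=idx)

result = signal_profits(stock)
print(result)  # one row: 2002-05-25 with value 1.0
```

Because stock is trimmed before shift(-1) is applied, the last remaining row is always compared against NaN and never selected; this mirrors the full code as written.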