【Question Title】: Why is my code returning 'nan' using pandas while calculating a percentage?
【Posted】: 2019-01-29 19:24:18
【Question】:

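Homework problem: Consider this investment strategy: buy whenever the price goes above the 50-day moving average, then sell after 3 trading days. How much profit would we make on average (as a percentage)? We say the price "goes above" the 50-day moving average on trading day x if (1) the price was below the moving average on trading day x-1 and (2) the price is above the moving average on trading day x.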

rol=stock.rolling(50).mean()

profitMade=((stock.shift(-3)-stock)/stock)

stock>rol

profitMade[(stock<stock.shift(-1))&(stock>rol)]

profitMade.pct_change()

profitMade[profitMade.pct_change()].mean()

The last line returns 'nan', but I need a value.

Sample data:

Date
2002-05-23      1.196429
2002-05-24      1.210000
2002-05-28      1.157143
2002-05-29      1.103571
2002-05-30      1.071429
2002-05-31      1.076429
2002-06-03      1.128571
2002-06-04      1.117857
2002-06-05      1.147143
2002-06-06      1.182143
2002-06-07      1.118571
2002-06-10      1.156429
2002-06-11      1.153571
2002-06-12      1.092857
2002-06-13      1.082857
2002-06-14      0.986429
2002-06-17      0.922143
2002-06-18      0.910714
2002-06-19      0.951429
2002-06-20      0.957143
2002-06-21      0.979286
2002-06-24      0.978571
2002-06-25      0.964286
2002-06-26      0.988571
2002-06-27      0.943571
2002-06-28      0.999286
2002-07-01      1.027857
2002-07-02      1.172857
2002-07-03      1.214286
2002-07-05      1.276429

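【Comments】:

  • Could you provide a sample of the data?
  • Are you looking for the mean of the percentage changes? I think it should be profitMade.pct_change().mean()
  • @kerwei I just tried the suggested solution and got the same answer: 'nan'. That seems strange.
  • @HenryWoody I've just added my sample data as well.
  • It's not clear what you intend to do here: profitMade[profitMade.pct_change()].mean(). Please clarify.

Tags: python-3.x pandas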


【Solution 1】:

Look at the values of rol. They are all NaN:

rol = stock.rolling(50).mean()
rol
Out:
            value
Date
2002-05-23                       NaN
2002-05-24                       NaN
2002-05-28                       NaN
2002-05-29                       NaN
2002-05-30                       NaN
2002-05-31                       NaN
2002-06-03                       NaN
2002-06-04                       NaN
2002-06-05                       NaN
2002-06-06                       NaN
2002-06-07                       NaN
2002-06-10                       NaN
2002-06-11                       NaN
2002-06-12                       NaN
2002-06-13                       NaN
2002-06-14                       NaN
2002-06-17                       NaN
2002-06-18                       NaN
2002-06-19                       NaN
2002-06-20                       NaN
2002-06-21                       NaN
2002-06-24                       NaN
2002-06-25                       NaN
2002-06-26                       NaN
2002-06-27                       NaN
2002-06-28                       NaN
2002-07-01                       NaN
2002-07-02                       NaN
2002-07-03                       NaN
2002-07-05                       NaN

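When you call rolling, it slides a window of size 50 over the values. By default, wherever the window covers fewer values than required (as it does at the start of the data), the result is NaN. In your case the window is far larger than the DataFrame, so every value is set to NaN.

To demonstrate the concept, look at a smaller window size: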

rol = stock.rolling(20).mean()
print(rol)
Out:
            value                   
Date                                
2002-05-23                       NaN
2002-05-24                       NaN
2002-05-28                       NaN
2002-05-29                       NaN
2002-05-30                       NaN
2002-05-31                       NaN
2002-06-03                       NaN
2002-06-04                       NaN
2002-06-05                       NaN
2002-06-06                       NaN
2002-06-07                       NaN
2002-06-10                       NaN
2002-06-11                       NaN
2002-06-12                       NaN
2002-06-13                       NaN
2002-06-14                       NaN
2002-06-17                       NaN
2002-06-18                       NaN
2002-06-19                       NaN
2002-06-20                  1.086143
2002-06-21                  1.075286
2002-06-24                  1.063714
2002-06-25                  1.054071
2002-06-26                  1.048321
2002-06-27                  1.041929
2002-06-28                  1.038071
2002-07-01                  1.033036
2002-07-02                  1.035786
2002-07-03                  1.039143
2002-07-05                  1.043857

- The first non-NaN value is exactly at position 20.
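As a quick check of the same effect on made-up data, a sketch like the one below counts how many rolling means are actually defined for each window size. The 30-row `stock` frame and the `summary` dict are hypothetical stand-ins, not your data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: 30 rows, one 'value' column.
stock = pd.DataFrame({"value": np.linspace(1.0, 2.0, 30)})

summary = {}
for window in (20, 50):
    rol = stock.rolling(window).mean()
    n_valid = int(rol["value"].notna().sum())        # rows with a defined mean
    first_valid = rol["value"].first_valid_index()   # None when every row is NaN
    summary[window] = (n_valid, first_valid)

# window=20 leaves 11 defined rows, the first at zero-based label 19 (the 20th row);
# window=50 leaves none, because the frame never holds 50 observations.
print(summary)
```

With an integer window, the first `window - 1` rows never have enough observations, so a 30-row frame cannot produce any 50-row mean.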

To avoid this behavior, you can pass a value for the min_periods parameter of rolling:

rol = stock.rolling(50, min_periods=1).mean()
print(rol)
Out:
            value                   
Date                                
2002-05-23                  1.196429
2002-05-24                  1.203215
2002-05-28                  1.187857
2002-05-29                  1.166786
2002-05-30                  1.147714
2002-05-31                  1.135834
2002-06-03                  1.134796
2002-06-04                  1.132679
2002-06-05                  1.134286
2002-06-06                  1.139072
2002-06-07                  1.137208
2002-06-10                  1.138810
2002-06-11                  1.139945
2002-06-12                  1.136582
2002-06-13                  1.133000
2002-06-14                  1.123839
2002-06-17                  1.111975
2002-06-18                  1.100794
2002-06-19                  1.092932
2002-06-20                  1.086143
2002-06-21                  1.081054
2002-06-24                  1.076396
2002-06-25                  1.071522
2002-06-26                  1.068065
2002-06-27                  1.063086
2002-06-28                  1.060632
2002-07-01                  1.059418
2002-07-02                  1.063469
2002-07-03                  1.068670
2002-07-05                  1.075595

- So, if the number of elements is smaller than the window size, rolling uses however many are available.
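One way to see what min_periods=1 does on a series shorter than the window: the result should match an expanding (cumulative) mean. This is a minimal sketch using the first five prices from the sample data. The names `partial` and `cumulative` are only illustrative.

```python
import numpy as np
import pandas as pd

# First five prices from the sample data in the question.
prices = pd.Series([1.196429, 1.210000, 1.157143, 1.103571, 1.071429])

# Window of 50, but a mean is emitted as soon as 1 observation is available.
partial = prices.rolling(50, min_periods=1).mean()

# Cumulative mean over everything seen so far.
cumulative = prices.expanding().mean()

same = bool(np.allclose(partial, cumulative))
print(same)
```

Keep in mind that these early values are averages over fewer than 50 days, so they are not true 50-day moving averages. Whether that is acceptable depends on the task.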

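From the documentation on min_periods:

min_periods : int, default None
Minimum number of observations in window required to have a value
(otherwise result is NA). For a window that is specified by an offset,
this will default to 1.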
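The offset case in that last sentence can be illustrated with a small sketch. The five-day series below is made up, and "3D" is just an example offset. It assumes a DatetimeIndex, which offset windows require.

```python
import pandas as pd

# Made-up daily series with a DatetimeIndex (needed for offset windows).
idx = pd.date_range("2002-05-23", periods=5, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# Integer window: min_periods defaults to the window size, so the first 2 rows are NaN.
by_count = s.rolling(3).mean()

# Offset window: min_periods defaults to 1, so every row gets a value.
by_offset = s.rolling("3D").mean()

print(int(by_count.isna().sum()), int(by_offset.isna().sum()))
```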

The following line makes you "lose" the last three values, because they are set to NaN:

profitMade = ((stock.shift(-3) - stock)/stock)
profitMade
Out:
...
2002-07-01  1.276429
2002-07-02  NaN
2002-07-03  NaN
2002-07-05  NaN

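- So I guess you should drop them (I may be wrong, since I'm not very familiar with this particular task). Then reindex stock and rol, because the later operations need all three to be the same size.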

profitMade  = profitMade.dropna()
stock = stock.loc[profitMade.index]
rol = rol.loc[profitMade.index]
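The drop-and-align step can be checked on a tiny made-up frame. In this sketch the six prices and the names `n_missing` and `stock_aligned` are assumptions for illustration only.

```python
import pandas as pd

# Six made-up business-day prices.
idx = pd.date_range("2002-06-24", periods=6, freq="B")
stock = pd.DataFrame({"value": [1.0, 1.1, 1.2, 1.1, 1.3, 1.5]}, index=idx)

# 3-day forward return: shift(-3) pulls the price from 3 rows ahead.
profit = (stock.shift(-3) - stock) / stock

# The last 3 rows have no price 3 rows ahead, so they are NaN.
n_missing = int(profit["value"].isna().sum())

# Drop those rows, then select the same dates from stock so both share one index.
profit = profit.dropna()
stock_aligned = stock.loc[profit.index]

print(n_missing, len(profit), len(stock_aligned))
```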

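OK, now there are three tables of the same size. Next I changed the line that returned a table filled mostly with NaN. The original line and its output come first, followed by the changed line: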

profitMade[(stock<stock.shift(-1))&(stock>rol)]
Out:
               value
Date                
2002-05-23       NaN
2002-05-24       NaN
2002-05-28       NaN
2002-05-29       NaN
2002-05-30       NaN
2002-05-31       NaN
2002-06-03       NaN
2002-06-04       NaN
2002-06-05  0.008095
2002-06-06       NaN
2002-06-07       NaN
2002-06-10       NaN

profitMade[(stock['value'] < stock['value'].shift(-1)) & (stock['value'] > rol['value'])]
Out:
    value
Date    
2002-06-05  0.008095

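- The changed line works on a specific column, so the mask is a Series and the rows that don't match are dropped instead of being filled with NaN.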
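The difference between the two forms comes down to the type of the mask. Here is a minimal sketch on a made-up three-row frame (all names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"value": [0.10, -0.20, 0.30]})

frame_mask = df > 0             # boolean DataFrame
series_mask = df["value"] > 0   # boolean Series

# DataFrame mask: the shape is kept and cells where the mask is False become NaN.
kept_shape = df[frame_mask]

# Series mask: only the rows where the mask is True are returned.
kept_rows = df[series_mask]

print(len(kept_shape), len(kept_rows))
```

Calling .mean() on either result gives the same number here, because mean() skips NaN by default, but the Series form is easier to inspect.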

Also, I don't understand what you are doing here:

profitMade[profitMade.pct_change()].mean()

- profitMade.pct_change() returns a table of float values (percentage changes), but profitMade[...] expects a boolean object. You should clarify this and edit your question.
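To make the dtype point concrete, here is a small sketch on made-up numbers. The condition `change > 0` is purely hypothetical, since the intended filter isn't stated in the question. It only shows how a float result can be turned into a boolean mask.

```python
import pandas as pd

profit = pd.Series([0.10, 0.20, 0.15, 0.30])

# pct_change() yields floats; the first entry is NaN because it has no predecessor.
change = profit.pct_change()

# A comparison turns the floats into booleans; NaN compares as False.
mask = change > 0

# A boolean mask is something profit[...] can use to select rows.
selected = profit[mask]

print(change.dtype, mask.dtype, selected.mean())
```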

Full code:

rol = stock.rolling(50, min_periods=1).mean()
profitMade = ((stock.shift(-3)-stock)/stock).dropna()
rol = rol.loc[profitMade.index]
stock = stock.loc[profitMade.index]

profitMade[(stock['value'] < stock['value'].shift(-1)) & (stock['value'] > rol['value'])]
Out:
               value
Date                
2002-06-05  0.008095
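Note that the condition above compares today's price with tomorrow's, which differs from how the homework statement defines "goes above". If the statement's wording is taken literally (below the average on day x-1, above it on day x), the strategy might be sketched as follows. The function name `mean_profit_after_cross` and its parameters are hypothetical, and this is an assumption about the intent, not a verified solution.

```python
import pandas as pd

def mean_profit_after_cross(prices: pd.Series, window: int = 50, hold: int = 3) -> float:
    """Average fractional profit from buying on a cross above the moving
    average and selling `hold` rows later (hypothetical helper)."""
    ma = prices.rolling(window).mean()

    # Cross above on day x: price below the average on day x-1, above it on day x.
    crossed = (prices.shift(1) < ma.shift(1)) & (prices > ma)

    # Profit from buying on day x and selling `hold` rows later (NaN near the end).
    profit = (prices.shift(-hold) - prices) / prices

    # mean() skips NaN; an empty selection yields NaN.
    return profit[crossed].mean()

# Tiny made-up series: the only cross is at position 3 (price 2), sold at price 4.
demo = pd.Series([3.0, 2.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0])
print(mean_profit_after_cross(demo, window=2, hold=2))
```

With only the 30 sample rows and window=50, this still returns NaN, because no genuine 50-day average exists yet. A meaningful result needs more than 50 rows of prices.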

【Comments】:
