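【Question title】: How to successively update each NaN value in a dataframe column
【Posted】: 2020-06-13 15:00:12
【Question description】:
  • I have the following dataframe a, with 1762 rows × 9 columns. In the ema column, every element except the 13th is NaN. The ind column contains the index of the corresponding row.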
a.head(20)
>>>
       date symbol       open      close        low       high      volume        ema  ind
 2010-01-04   YHOO  16.940001  17.100000  16.879999  17.200001  16587400.0        NaN    0
 2010-01-05   YHOO  17.219999  17.230000  17.000000  17.230000  11718100.0        NaN    1
 2010-01-06   YHOO  17.170000  17.170000  17.070000  17.299999  16422000.0        NaN    2
 2010-01-07   YHOO  16.809999  16.700001  16.570000  16.900000  31816300.0        NaN    3
 2010-01-08   YHOO  16.680000  16.700001  16.620001  16.760000  15470000.0        NaN    4
 2010-01-11   YHOO  16.770000  16.740000  16.480000  16.830000  16181900.0        NaN    5
 2010-01-12   YHOO  16.650000  16.680000  16.600000  16.860001  15672400.0        NaN    6
 2010-01-13   YHOO  16.879999  16.900000  16.650000  16.980000  16955600.0        NaN    7
 2010-01-14   YHOO  16.809999  17.120001  16.799999  17.230000  16715600.0        NaN    8
 2010-01-15   YHOO  17.250000  16.820000  16.750000  17.250000  18415000.0        NaN    9
 2010-01-19   YHOO  16.780001  16.750000  16.639999  16.959999  15182600.0        NaN   10
 2010-01-20   YHOO  16.650000  16.379999  16.250000  16.680000  14419500.0        NaN   11
 2010-01-21   YHOO  16.389999  16.200001  16.100000  16.580000  21858400.0  16.884166   12
 2010-01-22   YHOO  16.080000  15.880000  15.810000  16.209999  25132800.0        NaN   13
 2010-01-25   YHOO  16.070000  15.860000  15.740000  16.110001  19683700.0        NaN   14
 2010-01-26   YHOO  15.820000  15.990000  15.700000  16.170000  43979400.0        NaN   15
 2010-01-27   YHOO  16.459999  15.980000  15.770000  16.490000  41701000.0        NaN   16
 2010-01-28   YHOO  15.930000  15.440000  15.440000  15.960000  30159500.0        NaN   17
 2010-01-29   YHOO  15.510000  15.010000  14.900000  15.670000  39664600.0        NaN   18
 2010-02-01   YHOO  15.140000  15.050000  14.870000  15.300000  29865700.0        NaN   19
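  • For every element in the ema column from the 14th row onward (i.e. where the value in the ind column is 13 or greater), I want to set it to 0.84*(ema value in previous row) + 0.16*(value of 'open' in previous row), using the following apply call: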
a['ema']=a.apply(lambda x: (a.loc[x['ind']-1,'open']*0.16 + a.loc[x['ind']-1, 'ema']*0.84) if x['ind']>12 else x['ema'] ,axis=1)
  • Only the element in the 14th row is updated; the rows after it remain NaN:
a.head(20)
>>>
       date symbol       open      close        low       high      volume        ema  ind
 2010-01-04   YHOO  16.940001  17.100000  16.879999  17.200001  16587400.0        NaN    0
 2010-01-05   YHOO  17.219999  17.230000  17.000000  17.230000  11718100.0        NaN    1
 2010-01-06   YHOO  17.170000  17.170000  17.070000  17.299999  16422000.0        NaN    2
 2010-01-07   YHOO  16.809999  16.700001  16.570000  16.900000  31816300.0        NaN    3
 2010-01-08   YHOO  16.680000  16.700001  16.620001  16.760000  15470000.0        NaN    4
 2010-01-11   YHOO  16.770000  16.740000  16.480000  16.830000  16181900.0        NaN    5
 2010-01-12   YHOO  16.650000  16.680000  16.600000  16.860001  15672400.0        NaN    6
 2010-01-13   YHOO  16.879999  16.900000  16.650000  16.980000  16955600.0        NaN    7
 2010-01-14   YHOO  16.809999  17.120001  16.799999  17.230000  16715600.0        NaN    8
 2010-01-15   YHOO  17.250000  16.820000  16.750000  17.250000  18415000.0        NaN    9
 2010-01-19   YHOO  16.780001  16.750000  16.639999  16.959999  15182600.0        NaN   10
 2010-01-20   YHOO  16.650000  16.379999  16.250000  16.680000  14419500.0        NaN   11
 2010-01-21   YHOO  16.389999  16.200001  16.100000  16.580000  21858400.0  16.884166   12
 2010-01-22   YHOO  16.080000  15.880000  15.810000  16.209999  25132800.0  16.805099   13
 2010-01-25   YHOO  16.070000  15.860000  15.740000  16.110001  19683700.0        NaN   14
 2010-01-26   YHOO  15.820000  15.990000  15.700000  16.170000  43979400.0        NaN   15
 2010-01-27   YHOO  16.459999  15.980000  15.770000  16.490000  41701000.0        NaN   16
 2010-01-28   YHOO  15.930000  15.440000  15.440000  15.960000  30159500.0        NaN   17
 2010-01-29   YHOO  15.510000  15.010000  14.900000  15.670000  39664600.0        NaN   18
 2010-02-01   YHOO  15.140000  15.050000  14.870000  15.300000  29865700.0        NaN   19
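  • Running the command repeatedly produces the correct ema value for the following rows, one row per run.
  • Can anyone tell me what is going wrong here?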

【Discussion】:

    Tags: pandas dataframe lambda apply


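    【Solution 1】:

    Problems with the current script

    • if x['ind']>12 else x['ema']: rows with ind of 12 or below are left unchanged.
    • a.loc[x['ind']-1,'ema']: you are calculating ema from the previous values of open and ema.
      • At the start, ema contains only one value, so only the next row gets filled.
      • The fill does not happen in place, so the remaining values stay unfilled until you run the script again.
    • When you calculate a value with NaN, the result is NaN.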
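The points above can be reproduced on a small frame. This is only a sketch: the five-row frame, its values, and the helper name `rule` are made up for illustration and are not from the question. It shows that one `apply` pass fills exactly one row beyond the seed, and that a second pass fills one more.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the question's frame (values are illustrative only):
# row 2 holds the single seed ema value.
a = pd.DataFrame({
    "open": [10.0, 11.0, 12.0, 13.0, 14.0],
    "ema": [np.nan, np.nan, 10.0, np.nan, np.nan],
})
a["ind"] = range(len(a))

def rule(x, frame, seed_ind):
    # Same logic as the question's lambda. The int() cast is needed here
    # because an all-numeric row is passed to apply as floats.
    i = int(x["ind"])
    if i > seed_ind:
        # Reads the frame as it was before apply started.
        return frame.loc[i - 1, "open"] * 0.16 + frame.loc[i - 1, "ema"] * 0.84
    return x["ema"]

# First pass: only row 3 has a non-NaN previous ema to work with.
first = a.apply(lambda x: rule(x, a, 2), axis=1)
print(first.notna().sum())   # seed row plus one newly filled row

# Assigning the result and running again fills one more row.
a["ema"] = first
second = a.apply(lambda x: rule(x, a, 2), axis=1)
print(second.notna().sum())
```

Row 4 stays NaN in the first pass because `0.16 * open + 0.84 * NaN` is NaN, which is the propagation described in the last bullet.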

    apply

    • Update a global variable
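    import numpy as np
    import pandas as pd
    
    # df is assumed to be the dataframe from the question (called `a` there).
    df = a
    
    # Most recently computed ema, carried over from one row to the next.
    updated_ema = np.nan
    
    def test(x):
        global updated_ema
        if x['ind'] > 12:
            # df is not modified while apply runs, so prev_ema is NaN for
            # every row except the one right after the seed value.
            prev_ema = df.loc[x['ind']-1, 'ema']
            # Previous row's open, already weighted by 0.16.
            prev_open = df.loc[x['ind']-1, 'open'] * 0.16
            if not np.isnan(prev_ema):
                updated_ema = prev_open + prev_ema * 0.84
            else:
                # Fall back to the value computed for the previous row.
                updated_ema = prev_open + updated_ema * 0.84
            return updated_ema
        else:
            return x['ema']
    
    
    df['ema'] = df.apply(test, axis=1)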
    

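    【Discussion】:

    • The apply with a global variable runs in about 168 ms, whereas the loop solution takes 3.36 s! Thank you very much.
    【Solution 2】:

    The problem is that a.apply computes the whole new column first, and the result is assigned only at the very end.

    This means every calculation is based on the original, unchanged data, which explains why only one row gets updated.

    One solution is simply to iterate over the rows and update one cell at a time (incidentally, there is no reason for this approach to be slow).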
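A minimal sketch of such a loop follows. It is one possible reading of the suggestion, not the answerer's own code: the function name `fill_ema`, its parameters, and the toy frame are assumptions made for illustration. It pulls the columns into NumPy arrays first, so each step reads the value written by the previous step without going through `.loc`.

```python
import numpy as np
import pandas as pd

def fill_ema(frame, seed_pos, alpha=0.16):
    """Return a copy of frame with 'ema' filled after position seed_pos.

    Each value is alpha * (previous open) + (1 - alpha) * (previous ema).
    """
    opens = frame["open"].to_numpy(dtype=float)
    ema = frame["ema"].to_numpy(dtype=float).copy()  # copy: do not touch the input
    for i in range(seed_pos + 1, len(ema)):
        # ema[i - 1] was written in the previous iteration, so it is never NaN here.
        ema[i] = alpha * opens[i - 1] + (1 - alpha) * ema[i - 1]
    out = frame.copy()
    out["ema"] = ema
    return out

# Toy data (illustrative values only); position 2 holds the seed ema.
a = pd.DataFrame({
    "open": [10.0, 11.0, 12.0, 13.0, 14.0],
    "ema": [np.nan, np.nan, 10.0, np.nan, np.nan],
})
filled = fill_ema(a, seed_pos=2)
print(filled)
```

For the frame in the question, the equivalent call would be `fill_ema(a, seed_pos=12)`, assuming the default integer index shown in the output.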

    【Discussion】:
