[Title]: Pandas - Update Columns Conditionally with np.where()
[Posted]: 2018-02-21 10:13:12
[Question]:

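I'm trying to build a trading backtester in Pandas, but I'm having trouble using np.where() as an "if" statement to conditionally update other columns.

This is my initial df. The signal column indicates whether to buy, sell, or do nothing (1/-1/0), and based on these signals I want to update the Cash, Hold, Value and Total columns.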

                        open         high        low        close   change  signal  Cash  Hold Value Total 
time                                        
2017-09-09 03:01:00 4255.000000 4256.799805 4233.600098 4252.799805 -0.000065   0   10000.0 0.0 0.0 10000.0
2017-09-09 03:02:00 4251.399902 4258.500000 4247.500000 4258.399902 0.002046    1   10000.0 0.0 0.0 10000.0
2017-09-09 03:03:00 4256.500000 4289.299805 4256.500000 4273.700195 0.001262    1   10000.0 0.0 0.0 10000.0
2017-09-09 03:04:00 4273.100098 4299.899902 4262.580566 4284.100098 0.001905    1   10000.0 0.0 0.0 10000.0
2017-09-09 03:05:00 4291.200195 4299.799805 4284.200195 4289.899902 -0.000854   -1  10000.0 0.0 0.0 10000.0
2017-09-09 03:06:00 4295.000000 4298.799805 4279.500000 4279.500000 -0.000047   0   10000.0 0.0 0.0 10000.0
2017-09-09 03:07:00 4278.000000 4278.299805 4277.000000 4277.799805 -0.000244   0   10000.0 0.0 0.0 10000.0

I can do this by manually calling one of the following functions for each row, depending on the signal:

def buy_update(i=i):
    pf['Cash'].iloc[i] = pf['Cash'].iloc[i-1] - trade_size
    pf['Holdings'].iloc[i] = pf['Holdings'].iloc[i-1] + (trade_size / pf['close'].iloc[i])
    pf['Holdings Value'].iloc[i] = pf['close'].iloc[i] * pf['Holdings'].iloc[i] # Update Values
    pf['Total Holding'].iloc[i] = pf['Cash'].iloc[i] + pf['Holdings Value'].iloc[i] # Update Values

def sell_update(i=i):
    pf['Cash'].iloc[i] = (pf['Cash'].iloc[i-1] + (pf['Holdings'].iloc[i-1] * pf['close'].iloc[i])) # get cash for sale
    pf['Holdings'].iloc[i] = 0 # Sell down all assets
    pf['Holdings Value'].iloc[i] = pf['close'].iloc[i] * pf['Holdings'].iloc[i] # Update Values
    pf['Total Holding'].iloc[i] = pf['Cash'].iloc[i] + pf['Holdings Value'].iloc[i] # Update Value

def no_action(i=i):
    pf['Cash'].iloc[i] = pf['Cash'].iloc[i-1]
    pf['Holdings'].iloc[i] = pf['Holdings'].iloc[i-1]
    pf['Holdings Value'].iloc[i] = pf['close'].iloc[i] * pf['Holdings'].iloc[i] # Update Values
    pf['Total Holding'].iloc[i] = pf['Cash'].iloc[i] + pf['Holdings Value'].iloc[i] # Update Values

That produces this:

                        open         high        low        close   change  signal  Cash        Hold       Value      Total 
time                                                        
2017-09-09 03:01:00 4255.000000 4256.799805 4233.600098 4252.799805 -0.000065   0   10000.00000 0.000000    0.000000    10000.000000
2017-09-09 03:02:00 4251.399902 4258.500000 4247.500000 4258.399902 0.002046    1   9900.00000  0.023483    100.000000  10000.000000
2017-09-09 03:03:00 4256.500000 4289.299805 4256.500000 4273.700195 0.001262    1   9800.00000  0.046882    200.359297  10000.359297
2017-09-09 03:04:00 4273.100098 4299.899902 4262.580566 4284.100098 0.001905    1   9700.00000  0.070224    300.846864  10000.846864
2017-09-09 03:05:00 4291.200195 4299.799805 4284.200195 4289.899902 -0.000854   -1  10001.25415 0.000000    0.000000    10001.254150
2017-09-09 03:06:00 4295.000000 4298.799805 4279.500000 4279.500000 -0.000047   0   10001.25415 0.000000    0.000000    10001.254150
2017-09-09 03:07:00 4278.000000 4278.299805 4277.000000 4277.799805 -0.000244   0   10001.25415 0.000000    0.000000    10001.254150

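I thought a nested np.where() could call the right function based on the signal column, but I haven't had any luck. The code below loops over every row.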

for i in range(len(pf)):
    np.where(pf['signal'].iloc[i] == -1, sell_update(i), np.where(pf['signal'].iloc[i] == 1, buy_update(i), no_action(i)))
    print(i)

I think it currently calls every function (sell, then buy, then no action, each overwriting the previous result), and it also produces a SettingWithCopyWarning.
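
The behaviour described above can be reproduced in isolation. The sketch below is only an illustration: the three handlers are hypothetical stand-ins that record their own name instead of touching a DataFrame. Python evaluates every argument of a call before the call itself runs, so np.where() never gets a chance to skip a branch.

```python
import numpy as np

calls = []  # records which (hypothetical) handlers actually ran


def sell_update():
    calls.append("sell")
    return -1


def buy_update():
    calls.append("buy")
    return 1


def no_action():
    calls.append("none")
    return 0


signal = 1  # a buy signal

# All three handlers run while Python builds the argument lists,
# before either np.where() call looks at its condition.
np.where(signal == -1, sell_update(),
         np.where(signal == 1, buy_update(), no_action()))
calls_with_where = list(calls)

# A plain if/elif chain runs only the matching branch.
calls.clear()
if signal == -1:
    sell_update()
elif signal == 1:
    buy_update()
else:
    no_action()
calls_with_if = list(calls)

print(calls_with_where)
print(calls_with_if)
```

The SettingWithCopyWarning is a separate issue: it comes from chained indexing such as pf['Cash'].iloc[i] = ... A single indexing call, for example pf.loc[pf.index[i], 'Cash'] = ..., avoids it.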

Also, a for loop over every row is obviously very slow. Is there a way to vectorize this?

[Comments]:

    Tags: python pandas vectorization


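    [Solution 1]:

    Once the calculation gets complicated, it is hard to vectorize. Since element-by-element processing in pandas is slow, you can convert the DataFrame to a list of dicts and do the calculation on those. Here is an example using cytoolz: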

    import io
    import pandas as pd
    
    text="""time                        open         high        low        close   change  signal  Cash  Hold Value Total 
    2017-09-09 03:01:00 4255.000000 4256.799805 4233.600098 4252.799805 -0.000065   0   10000.0 0.0 0.0 10000.0
    2017-09-09 03:02:00 4251.399902 4258.500000 4247.500000 4258.399902 0.002046    1   10000.0 0.0 0.0 10000.0
    2017-09-09 03:03:00 4256.500000 4289.299805 4256.500000 4273.700195 0.001262    1   10000.0 0.0 0.0 10000.0
    2017-09-09 03:04:00 4273.100098 4299.899902 4262.580566 4284.100098 0.001905    1   10000.0 0.0 0.0 10000.0
    2017-09-09 03:05:00 4291.200195 4299.799805 4284.200195 4289.899902 -0.000854   -1  10000.0 0.0 0.0 10000.0
    2017-09-09 03:06:00 4295.000000 4298.799805 4279.500000 4279.500000 -0.000047   0   10000.0 0.0 0.0 10000.0
    2017-09-09 03:07:00 4278.000000 4278.299805 4277.000000 4277.799805 -0.000244   0   10000.0 0.0 0.0 10000.0"""
    # sep=r"\s+" splits on runs of whitespace; delim_whitespace=True is deprecated in recent pandas
    df = pd.read_csv(io.StringIO(text), sep=r"\s+")
    trade_size = 100
    
    import cytoolz
    
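    def f(p, c):
        # p: previous row, with Cash/Hold already updated; c: current raw row
        change = c["signal"]   
        if change == 0:
            # no trade: carry the previous position forward
            cash = p["Cash"]
            hold = p["Hold"]        
        elif change == 1:
            # buy: spend trade_size at the current close
            cash = p["Cash"] - trade_size
            hold = p["Hold"] + trade_size / c["close"]
        elif change == -1:
            # sell: liquidate all holdings at the current close
            cash = p["Cash"] + p["Hold"] * c["close"]
            hold = 0
        return cytoolz.merge(c, {"Cash":cash, "Hold":hold})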
    
    pd.DataFrame(list(cytoolz.accumulate(f, df.to_dict("records"))))
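
To see why the function receives a previous and a current row, here is a minimal sketch of the same running-update pattern using the standard library's itertools.accumulate in place of cytoolz (note the swapped argument order: iterable first, function second). The rows are a cut-down, hand-typed subset of the sample data, and the name step is only illustrative. The first row is yielded unchanged; after that, each call gets the dict returned by the previous call as p and the next raw row as c.

```python
from itertools import accumulate

trade_size = 100  # same value as in the answer above

# Cut-down rows: only the columns the update needs.
rows = [
    {"close": 4252.799805, "signal": 0, "Cash": 10000.0, "Hold": 0.0},
    {"close": 4258.399902, "signal": 1, "Cash": 10000.0, "Hold": 0.0},
    {"close": 4273.700195, "signal": 1, "Cash": 10000.0, "Hold": 0.0},
    {"close": 4289.899902, "signal": -1, "Cash": 10000.0, "Hold": 0.0},
    {"close": 4279.500000, "signal": 0, "Cash": 10000.0, "Hold": 0.0},
]


def step(p, c):
    """p is the row returned by the previous step; c is the next raw row."""
    if c["signal"] == 1:        # buy a fixed cash amount
        cash = p["Cash"] - trade_size
        hold = p["Hold"] + trade_size / c["close"]
    elif c["signal"] == -1:     # sell everything at the current close
        cash = p["Cash"] + p["Hold"] * c["close"]
        hold = 0.0
    else:                       # no trade: carry the position forward
        cash, hold = p["Cash"], p["Hold"]
    # Build a new dict so the input rows are left untouched.
    return {**c, "Cash": cash, "Hold": hold}


result = list(accumulate(rows, step))
for r in result:
    print(r["signal"], round(r["Cash"], 2), round(r["Hold"], 6))
```

Because each output row becomes the p of the next call, the state (cash and holdings) flows down the rows without any explicit index arithmetic.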
    

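    [Comments]:

    • Thanks, it's amazingly fast by comparison! I'm still trying to get my head around what is going on. Why does (p, c) work for the previous and current values? I've also added more to it and want to return the other columns, e.g. return cytoolz.merge(c, {"Cash":cash, "Holdings":hold, "Holdings Value":holdings_value, "Total Holding":total_holding}), which results in the error local variable 'holdings_value' referenced before assignment. Any idea why this happens?
    • You need to set the total_holding and holdings_value variables in every if/elif branch.
    • Yes, I did that; snippet below. Should I open a new question? def f(p, c): change = c["signal"] if change == 0: cash = p["Cash"] hold = p["Holdings"] holdings_value = p["Holdings Value"] total_holding = p["Total Holding"]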
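
One way to avoid the "referenced before assignment" error discussed in these comments is to set only cash and hold inside the branches and derive the other two values once, after the branches. The sketch below assumes the commenter's column names ("Holdings", "Holdings Value", "Total Holding") exist in every row; the function name update_row and the trade_size default are illustrative, not taken from the thread.

```python
def update_row(p, c, trade_size=100):
    """Return the current row c with all four portfolio columns filled in.

    p is the previous, already-updated row; c is the current raw row.
    """
    signal = c["signal"]
    if signal == 1:          # buy
        cash = p["Cash"] - trade_size
        hold = p["Holdings"] + trade_size / c["close"]
    elif signal == -1:       # sell everything
        cash = p["Cash"] + p["Holdings"] * c["close"]
        hold = 0.0
    else:                    # 0 or any unexpected value: do nothing
        cash = p["Cash"]
        hold = p["Holdings"]

    # Derived after the branches, so they are assigned on every path.
    holdings_value = hold * c["close"]
    total_holding = cash + holdings_value
    return {**c, "Cash": cash, "Holdings": hold,
            "Holdings Value": holdings_value, "Total Holding": total_holding}


prev = {"close": 50.0, "signal": 0, "Cash": 1000.0, "Holdings": 0.0,
        "Holdings Value": 0.0, "Total Holding": 1000.0}
bought = update_row(prev, {"close": 50.0, "signal": 1})
sold = update_row(bought, {"close": 60.0, "signal": -1})
print(bought)
print(sold)
```

The final else also covers signal values other than 0, 1 and -1, which would otherwise leave cash and hold unassigned in a pure if/elif/elif chain.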