Optimize function python dataframe (answers)

【Question title】: optimize function python dataframe
【Posted】: 2022-01-17 08:20:59
【Question】:

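I have this Python code implementing the SuperTrend indicator on a pandas DataFrame. The code works, but the superTrend function gets slower and slower as the DataFrame grows. Is there anything I can change to optimize it so that it stays fast even when the DataFrame is large?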

def trueRange(df):
    df['prevClose'] = df['close'].shift(1)
    df['high-low'] = df['high'] - df['low']
    df['high-pClose'] = abs(df['high'] - df['prevClose'])
    df['low-pClose'] = abs(df['low'] - df['prevClose'])
    tr = df[['high-low','high-pClose','low-pClose']].max(axis=1)
    
    return tr

def averageTrueRange(df, peroid=12):
    df['trueRange'] = trueRange(df)
    the_atr = df['trueRange'].rolling(peroid).mean()
    
    return the_atr
    

def superTrend(df, peroid=5, multipler=1.5):
    df['averageTrueRange'] = averageTrueRange(df, peroid=peroid)
    h2 = ((df['high'] + df['low']) / 2)
    df['Upperband'] = h2 + (multipler * df['averageTrueRange'])
    df['Lowerband'] = h2 - (multipler * df['averageTrueRange'])
    df['inUptrend'] = None

    for current in range(1,len(df.index)):
        prev = current- 1
        
        if df['close'][current] > df['Upperband'][prev]:
            df['inUptrend'][current] = True
            
        elif df['close'][current] < df['Lowerband'][prev]:
            df['inUptrend'][current] = False
        else:
            df['inUptrend'][current] = df['inUptrend'][prev]
            
            if df['inUptrend'][current] and df['Lowerband'][current] < df['Lowerband'][prev]:
                df['Lowerband'][current] = df['Lowerband'][prev]
                
            if not df['inUptrend'][current] and df['Upperband'][current] > df['Upperband'][prev]:
                df['Upperband'][current] = df['Upperband'][prev]

Vectorized version:

def superTrend(df, peroid=5, multipler=1.5):
    df['averageTrueRange'] = averageTrueRange(df, peroid=peroid)
    h2 = ((df['high'] + df['low']) / 2)
    df['Upperband'] = h2 + (multipler * df['averageTrueRange'])
    df['Lowerband'] = h2 - (multipler * df['averageTrueRange'])
    df['inUptrend'] = None


    cond1 = df['close'].values[1:] > df['Upperband'].values[:-1]
    cond2 = df['close'].values[1:] < df['Lowerband'].values[:-1]

    df.loc[cond1, 'inUptrend'] = True
    df.loc[cond2, 'inUptrend'] = False

    df.loc[(~cond1) & (cond2), 'inUptrend'] = df['inUptrend'][:-1]
    df.loc[(~cond1) & (cond2) & (df['inUptrend'].values[1:] == True) & (df['Lowerband'].values[1:] < df['Lowerband'].values[:-1]), 'Lowerband'] = df['Lowerband'][:-1]
    df.loc[(~cond1) & (cond2) & (df['inUptrend'].values[1:] == False) & (df['Upperband'].values[1:] > df['Upperband'].values[:-1]), 'Upperband'] = df['Upperband'][:-1]

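【Comments】:

Loops like for current in range(1,len(df.index)): almost always get slow on larger DataFrames. Numba in "nopython" mode can sometimes be used to speed up a loop like this: numba.readthedocs.io/en/stable/user/… Otherwise, try to find a "vectorized" version of the operation you are performing so that you can avoid the loop.
I got this error when I tried using jit(nopython=True): TypingError: cannot determine Numba type of
Numba only supports a subset of Python when nopython=True. If you want to optimize this way, you need to structure your code in a way that Numba can compile.
I have edited the question with a vectorized version, but its output is not the same as the loop version's. Can you help me?
Could you provide or point to some sample data to test with?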
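A note on why shifted column comparisons tend not to reproduce the loop: the loop is a recurrence. Each row's trend can be inherited from the previous row, and a band that gets carried forward feeds the next row's comparison. A minimal sketch, assuming the bands have already been computed, keeps the loop but runs it over plain NumPy arrays instead of going through df['col'][i] lookups, which is usually where most of the time goes. The function name supertrend_loop and the integer trend encoding (UNKNOWN, DOWN, UP) are made up for this illustration and are not from the question.

```python
import numpy as np

# Hypothetical encoding: the question stores None/True/False in an object column.
UNKNOWN, DOWN, UP = -1, 0, 1


def supertrend_loop(close, upper, lower):
    """Same branching as the question's loop, but on NumPy arrays."""
    close = np.asarray(close, dtype=np.float64)
    upper = np.array(upper, dtype=np.float64)  # np.array copies, inputs stay untouched
    lower = np.array(lower, dtype=np.float64)
    trend = np.full(close.shape[0], UNKNOWN, dtype=np.int8)

    for cur in range(1, close.shape[0]):
        prev = cur - 1
        if close[cur] > upper[prev]:
            trend[cur] = UP
        elif close[cur] < lower[prev]:
            trend[cur] = DOWN
        else:
            trend[cur] = trend[prev]
            # Carry the previous band forward, as in the original loop.
            if trend[cur] == UP and lower[cur] < lower[prev]:
                lower[cur] = lower[prev]
            # UNKNOWN behaves like "not uptrend", mirroring `not None` in the question.
            if trend[cur] != UP and upper[cur] > upper[prev]:
                upper[cur] = upper[prev]
    return upper, lower, trend


close = [10.0, 12.0, 11.0, 9.0, 9.5]
upper = [11.0, 11.5, 12.5, 12.0, 13.0]
lower = [9.0, 9.5, 9.0, 8.5, 8.0]

new_upper, new_lower, trend = supertrend_loop(close, upper, lower)
# Row 3 flips to DOWN only because row 2's lower band was carried forward
# from 9.0 to 9.5; a comparison against the unmodified column would miss it.
print(trend.tolist())
print(new_lower.tolist())
```

With a DataFrame, the inputs would be df['close'].to_numpy(), df['Upperband'].to_numpy() and df['Lowerband'].to_numpy(), and the returned arrays can be assigned back as whole columns. A loop written this way touches only NumPy arrays and scalars, which is also the kind of code Numba's nopython mode is meant to compile.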

Tags: python pandas dataframe

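【Solution 1】:

Try using Modin instead of import pandas as pd. Modin is a drop-in replacement that aims to speed pandas up by spreading operations across CPU cores. Just do import modin.pandas as pd. Apart from the import, you shouldn't need to change any code.

If you need to use the df.apply() method, there is a package called Swifter. After you pip install swifter, all you need to do is import swifter and then call df.swifter.apply() instead of df.apply(). Conveniently, Swifter also works together with Modin.

【Comments】: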

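【Solution 2】:

Here is a Numba / NumPy version of your code. You have to convert df['close'], df['high'] and df['low'] to NumPy arrays to get the speed benefit. I haven't checked that the output values are correct, but you get the idea.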

import numpy as np
from numba import jit

# UNCOMMENT THIS LINE IF YOU DON'T HAVE THE OPEN PRICES
# (uses the previous bar's close as the open; c_close[:-1] shifts the series by one bar)
# c_open = np.concatenate((np.array([np.nan]), c_close[:-1]))

@jit(nopython=True)
def true_range(c_open, c_high, c_low):
    return np.maximum(np.maximum(c_high - c_low, np.abs(c_high - c_open)), np.abs(c_low - c_open))


@jit(nopython=True)
def average_true_range(c_open, c_high, c_low, period=12):
    true_r = true_range(c_open, c_high, c_low)
    size = len(true_r)
    out = np.array([np.nan] * size)
    for i in range(period - 1, size):
        window = true_r[i - period + 1:i + 1]
        out[i] = np.mean(window)
    return out


@jit(nopython=True)
def super_trend(c_close, c_open, c_high, c_low, period=5, multipler=1.5):
    size = len(c_close)
    avg_true_r = average_true_range(c_open, c_high, c_low, period=period)
    h2 = (c_high + c_low) / 2
    upper_band = h2 + (multipler * avg_true_r)
    lower_band = h2 - (multipler * avg_true_r)
    in_up_trend = np.array([np.nan] * size)
    for current in range(1, size):
        prev = current - 1
        if c_close[current] > upper_band[prev]:
            in_up_trend[current] = True
        elif c_close[current] < lower_band[prev]:
            in_up_trend[current] = False
        else:
            in_up_trend[current] = in_up_trend[prev]
            if in_up_trend[current] and lower_band[current] < lower_band[prev]:
                lower_band[current] = lower_band[prev]
            if not in_up_trend[current] and upper_band[current] > upper_band[prev]:
                upper_band[current] = upper_band[prev]
    return upper_band, lower_band, in_up_trend
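One thing to watch in a float-array port like the one above: the question's loop used None for "trend not yet known", and None is falsy, while NaN is truthy under ordinary float rules. A check like if in_up_trend[current] can therefore take the uptrend branch for rows whose trend is still undetermined, where the pandas loop took the other branch. A small sketch (the helper decode_trend is hypothetical) shows the difference and one way to map the float flags back to True / False / None:

```python
import numpy as np


def decode_trend(flags):
    """Map float flags (nan / 1.0 / 0.0) to a list of None / True / False."""
    return [None if np.isnan(f) else bool(f) for f in flags]


flags = np.array([np.nan, np.nan, 1.0, 0.0])

print(bool(None))           # False: the question's "unknown" value is falsy
print(bool(flags[0]))       # True: NaN is non-zero, so it counts as truthy
print(decode_trend(flags))  # [None, None, True, False]
```

If the early rows matter, comparing in_up_trend[current] == 1.0 instead of relying on truthiness is one way to keep undetermined rows out of the uptrend branch.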

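Edit: if you are not using Heikin Ashi candles, you don't need to shift the close prices to get the previous close, since it is equivalent to the open price ;)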
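The shift that the edit refers to can be sketched as follows. This follows the question's trueRange (previous close via a one-row shift, then the row-wise maximum of three ranges) rather than the open-based version above; the function names prev_close and true_range_from_close are hypothetical. np.fmax is used because it ignores a NaN operand, which matches pandas' default of skipping NaN in max(axis=1), whereas np.maximum would propagate the NaN from the first row.

```python
import numpy as np


def prev_close(close):
    """Shift closes down by one bar; the first bar has no previous close."""
    close = np.asarray(close, dtype=np.float64)
    return np.concatenate((np.array([np.nan]), close[:-1]))


def true_range_from_close(high, low, close):
    """Row-wise max of high-low, |high - prev close| and |low - prev close|."""
    high = np.asarray(high, dtype=np.float64)
    low = np.asarray(low, dtype=np.float64)
    pc = prev_close(close)
    gaps = np.fmax(np.abs(high - pc), np.abs(low - pc))
    # fmax skips the NaN in row 0, so that row falls back to high - low.
    return np.fmax(high - low, gaps)


high = np.array([10.0, 11.0, 14.0])
low = np.array([9.0, 9.5, 13.0])
close = np.array([9.5, 10.5, 13.5])

print(prev_close(close).tolist())
print(true_range_from_close(high, low, close).tolist())
```

The last bar gaps up from a previous close of 10.5, so its true range (3.5) comes from |high - prev close| rather than from high - low (1.0).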

Feel free to check out my lib of fast indicators on GitHub.

【Comments】: