How can I efficiently calculate pct changes between groups of columns? Answers

【Question title】: How can I calculate pct changes between groups of columns efficiently?
【Posted】: 2021-12-03 14:12:33
【Question description】:

I have a group of columns like this:

q1_cash_total, q2_cash_total, q3_cash_total, 
q1_shop_us, q2_shop_us, q3_shop_us,

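and so on; I have about 40 similarly named columns. I want to calculate the pct change within each group of three. For example, I know that for a single group I can do: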

df[['q1_cash_total', 'q2_cash_total', 'q3_cash_total']].pct_change().add_suffix('_PCT_CHG')

and I do this for every group of three:

q1 =  [col for col in df.columns if 'q1' in col ]
q2 =  [col for col in df.columns if 'q2' in col ]
q3 =  [col for col in df.columns if 'q3' in col ]
q_cols = q1+q2+q3
dflist = []
for col in df[q_cols].columns:
    #col[3:] to just get col name without the q1_/q2_ etc 
    print(col[3:])
    cols = [c for c in df.columns if col[3:] in c]
    pct = df[cols].pct_change().add_suffix('_PCT_CHG')
    dflist.append(pct) 

pcts_df = pd.concat(dflist)

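I can't think of a cleaner way to do this. Does anyone have any ideas? Also, how can I get the pct change between q1 and q3 directly, rather than only between consecutive quarters?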

【Question comments】:

Tags: python pandas dataframe percentage

【Solution 1】:

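Create a data frame that holds only the columns you need: use filter to select the column names that start with q, followed by one or more digits and an underscore (^q\d+?_). Strip that prefix and keep only the unique column names with pd.unique. Then, for each unique name, select the columns containing that name and apply the percentage change along the column axis (.pct_change(axis='columns')) to get the changes from q1 to q2 and from q2 to q3.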
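A minimal sketch of why the axis argument matters, using a two-row frame made up from the sample values (the names by_rows and by_cols are only illustrative). Without an axis, pct_change compares each row with the row above it; with axis='columns' it compares each column with the column to its left, which is the quarter-to-quarter change the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    'q1_cash_total': [52, 75],
    'q2_cash_total': [15, 24],
    'q3_cash_total': [61, 22],
})

# Default axis: each row is compared with the previous row,
# so the first row is all NaN.
by_rows = df.pct_change()

# axis='columns': each column is compared with the column to its left,
# so the first column (q1) is all NaN.
by_cols = df.pct_change(axis='columns')

print(by_rows)
print(by_cols)
```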

To get the percentage change between q1 and q3, select those two columns by name from the data frame created earlier (df_q) and apply the same pct_change.
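As a quick check of that idea, the sketch below (a small made-up df_q with two rows from the sample data; skip and by_hand are illustrative names) shows that selecting only q1 and q3 makes them neighbours, so the column-wise pct_change equals (q3 - q1) / q1 computed by hand.

```python
import pandas as pd

df_q = pd.DataFrame({
    'q1_shop_us': [93, 88],
    'q2_shop_us': [72, 3],
    'q3_shop_us': [21, 53],
})

# With only q1 and q3 selected, q3 sits directly to the right of q1,
# so the column-wise pct_change skips q2.
skip = df_q[['q1_shop_us', 'q3_shop_us']].pct_change(axis='columns')

# The same quantity written out explicitly.
by_hand = (df_q['q3_shop_us'] - df_q['q1_shop_us']) / df_q['q1_shop_us']

print(skip['q3_shop_us'])
print(by_hand)
```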

df used as input

   q1_cash_total  q1_shop_us  q2_cash_total  q2_shop_us  q3_cash_total  q3_shop_us  another_col   numCols  dataCols
0             52          93             15          72             61          21           83        87        75
1             75          88             24           3             22          53            2        88        30
2             38           2             64          60             21          33           76        58        22
3             89          49             91          59             42          92           60        80        15
4             62          62             47          62             51          55           64         3        51

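import pandas as pd

# Keep only the quarterly columns: 'q', one or more digits, then an underscore.
df_q = df.filter(regex=r'^q\d+?_')
# Strip the quarter prefix and keep each remaining name once.
unique_cols = pd.unique(df_q.columns.str.replace(r'^q\d+?_', '', regex=True))

dflist = []
for col in unique_cols:
    # Columns q1_<col>, q2_<col>, q3_<col>; note that like= is a substring match.
    q_name = df_q.filter(like=col)
    # Consecutive changes across columns: q1 -> q2 and q2 -> q3.
    df_s = q_name.pct_change(axis='columns').add_suffix('_PCT_CHG')
    dflist.append(df_s)
    # Change from q1 straight to q3.
    df_s = df_q[[f'q1_{col}', f'q3_{col}']].pct_change(axis='columns').add_suffix('_Q1-Q3')
    dflist.append(df_s)

# Place all result columns side by side.
pcts_df = pd.concat(dflist, axis=1)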

Output of pcts_df

   q1_cash_total_PCT_CHG  q2_cash_total_PCT_CHG  q3_cash_total_PCT_CHG  ...  q3_shop_us_PCT_CHG  q1_shop_us_Q1-Q3  q3_shop_us_Q1-Q3
0                    NaN              -0.711538               3.066667  ...           -0.708333               NaN         -0.774194
1                    NaN              -0.680000              -0.083333  ...           16.666667               NaN         -0.397727
2                    NaN               0.684211              -0.671875  ...           -0.450000               NaN         15.500000
3                    NaN               0.022472              -0.538462  ...            0.559322               NaN          0.877551
4                    NaN              -0.241935               0.085106  ...           -0.112903               NaN         -0.112903

[5 rows x 10 columns]
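For around 40 columns, a loop-free variant may be worth considering. The sketch below is not part of the answer above: it assumes every quarterly column is named q<N>_<metric>, splits each name at the first underscore into a two-level column index, and divides whole quarters by each other. The level names (quarter, metric), the result labels (q1_to_q2 and so on) and pcts_wide are hypothetical. Because names are matched exactly rather than by substring, a metric whose name is contained in another metric's name should not get mixed in.

```python
import pandas as pd

df = pd.DataFrame({
    'q1_cash_total': [52, 75, 38, 89, 62],
    'q1_shop_us':    [93, 88, 2, 49, 62],
    'q2_cash_total': [15, 24, 64, 91, 47],
    'q2_shop_us':    [72, 3, 60, 59, 62],
    'q3_cash_total': [61, 22, 21, 42, 51],
    'q3_shop_us':    [21, 53, 33, 92, 55],
    'another_col':   [83, 2, 76, 60, 64],
})

# Keep only the quarterly columns and work on a copy.
df_q = df.filter(regex=r'^q\d+_').copy()

# 'q1_cash_total' -> ('q1', 'cash_total'): split at the first underscore only.
df_q.columns = pd.MultiIndex.from_tuples(
    [tuple(c.split('_', 1)) for c in df_q.columns],
    names=['quarter', 'metric'],
)

# Each of these is a frame with one column per metric.
q1, q2, q3 = df_q['q1'], df_q['q2'], df_q['q3']

# Division aligns on the metric names, so all metrics are handled at once.
pcts_wide = pd.concat(
    {
        'q1_to_q2': q2 / q1 - 1,
        'q2_to_q3': q3 / q2 - 1,
        'q1_to_q3': q3 / q1 - 1,
    },
    axis=1,
)

print(pcts_wide)
```

The result has one block of columns per comparison and none of the all-NaN q1 columns that pct_change produces.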

【Comments】: