Python Pandas aggregation with condition: answer

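【Question title】: Python Pandas aggregation with condition
【Posted】: 2019-01-08 21:29:57
【Question】:

I need to group my dataframe and apply several aggregation functions to different columns. Some of these aggregations are conditional.

Here is an example. The data contains all orders from 2 customers, and I want to compute a few figures per customer: their number of orders, their total spending and their average spending.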

import pandas as pd

data = {'order_id' : range(1,9),
        'cust_id' : [1]*5 + [2]*3,
        'order_amount' : [100,50,70,75,80,105,30,20],
        'cust_days_since_reg' : [0,10,25,37,52,0,17,40]}

orders = pd.DataFrame(data)

aggregation = {'order_id' : 'count',
               'order_amount' : ['sum', 'mean']}

cust = orders.groupby('cust_id').agg(aggregation).reset_index()
cust.columns = ['_'.join(col) for col in cust.columns.values]

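This works fine and gives me one row per customer with the order count, the total spending and the average spending.

But I have to add an aggregation function that takes a parameter and a condition: the amount a customer spent in their first X months (X must be customizable).

Since this aggregation needs a parameter, I tried: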

def spendings_X_month(group, n_months):
    return group.loc[group['cust_days_since_reg'] <= n_months*30, 
                     'order_amount'].sum()

aggregation = {'order_id' : 'count',
               'order_amount' : ['sum',
                                 'mean',
                                 lambda x: spendings_X_month(x, 1)]}

cust = orders.groupby('cust_id').agg(aggregation).reset_index()

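But the last line raises: KeyError: 'cust_days_since_reg'. It looks like a scoping problem: the cust_days_since_reg column does not seem to be visible inside the function in this context.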

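I could compute the last column separately and then join the resulting dataframe to the first one, but there must be a better solution that does everything in a single groupby.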
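For reference, the two-step fallback mentioned above could look roughly like this. It is only a sketch: the names `base`, `early`, `early_spend` and the output column `order_amount_spendings` are made up for illustration, and a month is taken to be 30 days as in the function above.

```python
import pandas as pd

orders = pd.DataFrame({
    'order_id': range(1, 9),
    'cust_id': [1] * 5 + [2] * 3,
    'order_amount': [100, 50, 70, 75, 80, 105, 30, 20],
    'cust_days_since_reg': [0, 10, 25, 37, 52, 0, 17, 40],
})

n_months = 1  # the customizable X

# First groupby: the unconditional aggregations.
base = orders.groupby('cust_id').agg({'order_id': 'count',
                                      'order_amount': ['sum', 'mean']})
base.columns = ['_'.join(col) for col in base.columns]

# Second groupby: only orders placed within the first n_months (30-day months).
early = orders[orders['cust_days_since_reg'] <= n_months * 30]
early_spend = (early.groupby('cust_id')['order_amount']
                    .sum()
                    .rename('order_amount_spendings'))

# Join on the cust_id index; customers without early orders get 0.
cust = (base.join(early_spend)
            .fillna({'order_amount_spendings': 0})
            .reset_index())
```

It works, but it needs two passes over the data and a join, which is what I would like to avoid.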

Can anyone help me with this?

Thanks

【Comments】:

Tags: python pandas grouping conditional-statements aggregation

【Solution 1】:

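You cannot use agg here: each function passed to agg is applied to one column at a time, so filtering one column based on the values of another column is not possible.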
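To see this for yourself, you can record what agg hands to a custom function. This is a small sketch; `record_input` and `seen` are hypothetical names used only for the demonstration.

```python
import pandas as pd

orders = pd.DataFrame({
    'order_id': range(1, 9),
    'cust_id': [1] * 5 + [2] * 3,
    'order_amount': [100, 50, 70, 75, 80, 105, 30, 20],
    'cust_days_since_reg': [0, 10, 25, 37, 52, 0, 17, 40],
})

seen = []

def record_input(values):
    # Remember the type and contents of whatever agg passes in.
    seen.append((type(values).__name__, values.tolist()))
    return values.sum()

orders.groupby('cust_id').agg({'order_amount': record_input})

for kind, vals in seen:
    print(kind, vals)
```

Each call receives a Series holding only the order_amount values of one group, so `group['cust_days_since_reg']` is a label lookup on that Series and fails with the KeyError from the question.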

A solution is to use GroupBy.apply, which passes each group to the function as a whole DataFrame:

def spendings_X_month(group, n_months):
    a = group['order_id'].count()
    b = group['order_amount'].sum()
    c = group['order_amount'].mean()
    d = group.loc[group['cust_days_since_reg'] <= n_months*30, 
                     'order_amount'].sum()
    cols = ['order_id_count','order_amount_sum','order_amount_mean','order_amount_spendings']
    return pd.Series([a,b,c,d], index=cols)

cust = orders.groupby('cust_id').apply(spendings_X_month, 1).reset_index()
print (cust)
   cust_id  order_id_count  order_amount_sum  order_amount_mean  \
0        1             5.0             375.0          75.000000   
1        2             3.0             155.0          51.666667   

   order_amount_spendings  
0                   220.0  
1                   135.0
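If you would rather stay with agg, one possible alternative is to move the condition out of the aggregation: build a helper column that keeps order_amount only for the early orders, then sum it like any other column. The sketch below assumes pandas 0.25 or later for the named aggregation syntax; `customer_summary` and `early_amount` are hypothetical names, and a month is again taken to be 30 days.

```python
import pandas as pd

orders = pd.DataFrame({
    'order_id': range(1, 9),
    'cust_id': [1] * 5 + [2] * 3,
    'order_amount': [100, 50, 70, 75, 80, 105, 30, 20],
    'cust_days_since_reg': [0, 10, 25, 37, 52, 0, 17, 40],
})

def customer_summary(orders, n_months):
    # Keep the amount for orders within the first n_months, replace the rest by 0.
    is_early = orders['cust_days_since_reg'] <= n_months * 30
    early_amount = orders['order_amount'].where(is_early, 0)
    return (orders.assign(early_amount=early_amount)
                  .groupby('cust_id')
                  .agg(order_id_count=('order_id', 'count'),
                       order_amount_sum=('order_amount', 'sum'),
                       order_amount_mean=('order_amount', 'mean'),
                       order_amount_spendings=('early_amount', 'sum'))
                  .reset_index())

cust = customer_summary(orders, 1)
print(cust)
```

This keeps a single groupby and uses only built-in aggregations. Because the columns are aggregated separately, the counts and sums stay integers instead of being converted to float as in the apply output above.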

【Comments】: