Is there a way to make a custom function in a pandas aggregation function?

【Question title】: Is there a way to make a custom function in a pandas aggregation function?
【Posted】: 2020-07-04 19:19:20
【Question】:

I want to apply a custom function to a DataFrame, for example this one:

   index City  Age
0      1    A   50
1      2    A   24
2      3    B   65
3      4    A   40
4      5    B   68
5      6    B   48

The function I want to apply:

def count_people_above_60(age):
    # ??? I don't know whether `age` is passed in as a Series or a list,
    # which decides what operations I can perform on it here
    count = ...  # placeholder for the missing logic
    return count

I'm hoping to do something like this:

df.groupby(['City']).agg({"Age": ["mean", count_people_above_60]})

Expected output:

City   Mean  People_Above_60
A     38.00                0
B     60.33                2

【Question discussion】:

Tags: python python-3.x pandas aggregate pandas-groupby

【Solution 1】:

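If performance matters, create a new column filled with the comparison results converted to integers, so the count becomes a plain `sum` aggregation: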

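import pandas as pd

df = pd.DataFrame({'City': list('AABABB'), 'Age': [50, 24, 65, 40, 68, 48]})

# Age > 60 as 0/1 in a helper column, so summing per group counts the matches
out = (df.assign(new=df['Age'].gt(60).astype(int))
         .groupby(['City'])
         .agg(Mean=("Age", "mean"), People_Above_60=('new', "sum")))
print(out)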
           Mean  People_Above_60
City                            
A     38.000000                0
B     60.333333                2
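A minimal variation on the same vectorized idea, not taken from the original answer: instead of adding a helper column with `assign`, the boolean mask from `gt(60)` can be grouped directly by the `City` column and summed, since `True` counts as 1. The sketch assumes the sample data from the question (the extra `index` column is omitted).

```python
import pandas as pd

# Sample data from the question (the 'index' column is left out)
df = pd.DataFrame({'City': ['A', 'A', 'B', 'A', 'B', 'B'],
                   'Age': [50, 24, 65, 40, 68, 48]})

# Group the boolean mask by City; summing booleans counts the True values
above_60 = df['Age'].gt(60).groupby(df['City']).sum()
mean_age = df.groupby('City')['Age'].mean()

# Both Series share the same City index, so they align column by column
out = pd.DataFrame({'Mean': mean_age, 'People_Above_60': above_60})
print(out)
```

This avoids modifying the frame, at the cost of two separate groupby passes.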

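Your own approach also works if the function compares the values and sums the resulting booleans, but it is slow with many groups or a large DataFrame, because the Python function is called once per group: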

# Receives one group's 'Age' values as a Series; returns how many exceed 60
def count_people_above_60(age):
    return (age > 60).sum()

out = (df.groupby(['City']).agg(Mean=("Age", "mean"),
                                People_Above_60=('Age', count_people_above_60)))
print(out)
           Mean  People_Above_60
City                            
A     38.000000                0
B     60.333333                2
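On the question's uncertainty about what the function receives: a small sketch, assuming the question's sample data, that records the type of the argument pandas passes in (`seen_types` is a hypothetical helper for illustration only) and uses the dict-of-lists syntax the question attempted. Note that the custom function is passed as a callable, not as a string.

```python
import pandas as pd

df = pd.DataFrame({'City': ['A', 'A', 'B', 'A', 'B', 'B'],
                   'Age': [50, 24, 65, 40, 68, 48]})

seen_types = []  # hypothetical helper: records what agg passes in

def count_people_above_60(age):
    # `age` holds one group's 'Age' values
    seen_types.append(type(age))
    return (age > 60).sum()

# Dict-of-lists form, close to the syntax attempted in the question
out = df.groupby('City').agg({'Age': ['mean', count_people_above_60]})
print(out)
print(set(seen_types))
```

With this form the result has two-level column labels such as `('Age', 'mean')` and `('Age', 'count_people_above_60')`, taken from the function's name; the named-aggregation form in the answer avoids that and lets you choose the output column names directly.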

【Discussion】: