【Question title】: Is there a way to use a custom function in a pandas aggregation function?
【Posted】: 2020-07-04 19:19:20
【Question description】:

I want to apply a custom function to a DataFrame, e.g. this DataFrame:

    index City  Age 
0   1    A    50    
1   2    A    24    
2   3    B    65    
3   4    A    40     
4   5    B    68    
5   6    B    48    

The function to apply:

def count_people_above_60(age):
    ...  # I don't know whether age can be passed as a Series or a list to perform any operation later
    return count_people_above_60

I expect to do something like:

df.groupby(['City']).agg({"Age": ["mean", count_people_above_60]})

Expected output:

City  Mean   People_Above_60
A     38     0
B     60.33  2

【Question comments】:

    Tags: python python-3.x pandas aggregate pandas-groupby


    【Solution 1】:

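    If performance matters, create a new column holding the result of the comparison converted to integers, so that the count can be computed with the built-in aggregation sum: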

    df = (df.assign(new = df['Age'].gt(60).astype(int))
            .groupby(['City'])
            .agg(Mean= ("Age" , "mean"), People_Above_60= ('new',"sum")))
    print (df)
               Mean  People_Above_60
    City                            
    A     38.000000                0
    B     60.333333                2
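
A self-contained variation on the same idea, offered as a minimal sketch: instead of adding a helper column, the boolean comparison can be grouped directly by the City column. The DataFrame is rebuilt from the question's sample data, and the variable name `counts` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "City": ["A", "A", "B", "A", "B", "B"],
    "Age": [50, 24, 65, 40, 68, 48],
})

# Compare once for the whole column, then sum the True values per city.
# Grouping by df["City"] works because both Series share the same index.
counts = df["Age"].gt(60).groupby(df["City"]).sum()
print(counts)  # City A -> 0, City B -> 2
```

The comparison runs once on the full column, so no Python function is called per group.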
    

    Your solution should be changed to use a comparison and sum, but it will be slow if there are many groups or the DataFrame is large:

    def count_people_above_60(age):
        return (age > 60).sum()
    
    df = (df.groupby(['City']).agg(Mean=("Age" , "mean"), 
                                   People_Above_60=('Age',count_people_above_60)))
    print (df)
               Mean  People_Above_60
    City                            
    A     38.000000                0
    B     60.333333                2
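
On the question of what the custom function receives: as far as I know, with this kind of named aggregation pandas calls the function once per group and passes that group's column as a Series. The sketch below records the argument type to illustrate this; the list `received_series` is purely a hypothetical helper for the demonstration, and the DataFrame is rebuilt from the question's sample data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "City": ["A", "A", "B", "A", "B", "B"],
    "Age": [50, 24, 65, 40, 68, 48],
})

received_series = []  # hypothetical helper: records whether each call got a Series

def count_people_above_60(age):
    # `age` holds the Age values of a single City group.
    received_series.append(isinstance(age, pd.Series))
    return (age > 60).sum()

result = df.groupby("City").agg(
    Mean=("Age", "mean"),
    People_Above_60=("Age", count_people_above_60),
)
print(result)
print(all(received_series))
```

Because the argument is a Series, vectorized operations such as `(age > 60).sum()` can be used inside the function.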
    

    【Comments】:
