【Question title】: How to apply an aggregate function with a condition on a pivot table in Pandas?
【Posted】: 2020-02-10 11:15:52
【Question】:

My DataFrame looks something like this:

index   name     method     values
0.      A       estimated     4874
1.      A       counted        847
2.      A       estimated     1152
3.      B       estimated      276
4.      B       counted       6542
5.      B       counted       1152
6.      B       estimated     3346
7.      C       counted       7622
8.      C       estimated       26
...

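What I want to do is add up the 'estimated' and 'counted' values for each 'name'. I tried to do this with pivot_table, as in the code below, but I can only do it for one method at a time. Is there a way to do it for both methods in the same code?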

# Attempt: only sums the 'estimated' rows of each name (one method at a time)
count = df.groupby(['name']).apply(
    lambda sub_df: sub_df.pivot_table(
        index=['method'], values=['values'],
        aggfunc={'values': lambda x: x[df.loc[x.index, 'method'] == 'estimated'].sum()},
        margins=True, margins_name='total_estimated'))
count
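For comparison, a single `pivot_table` call already produces both totals when `method` is passed as `columns` rather than `index`, so no per-method condition is needed. A minimal sketch, assuming the sample data above with `name`, `method` and `values` as ordinary columns (the `index` column is replaced by the default integer index):

```python
import pandas as pd

# Sample data from the question (default RangeIndex instead of the "index" column)
df = pd.DataFrame({
    "name":   ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated",
               "counted", "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

# One row per name, one column per method, each cell is the sum of "values"
totals = df.pivot_table(index="name", columns="method",
                        values="values", aggfunc="sum")
# counted: A=847, B=7694, C=7622; estimated: A=6026, B=3622, C=26
print(totals)
```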

What I would like to end up with is something like this:

index   name     method       values
0.      A       estimated       4874
1.      A       counted          847
2.      A       estimated       1152
3.      A    total_counted       847
4.      A   total_estimated     6026
5.      B       estimated        276
6.      B       counted         6542
7.      B       counted         1152
8.      B       estimated       3346
9.      B    total_counted      7694
10.     B   total_estimated     3622
11.     C       counted         7622
12.     C       estimated         26
13.     C    total_counted      7622
14.     C   total_estimated       26
...
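One way to get exactly this layout, sketched here as an alternative to `pivot_table` (a swapped-in technique, not the answer's method), is to sum with `groupby`, relabel each method as `total_<method>`, append those rows with `pd.concat`, and then do a stable sort on `name` only, so each name's original rows keep their order and are followed by its totals. It assumes the same sample data as the question:

```python
import pandas as pd

df = pd.DataFrame({
    "name":   ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated",
               "counted", "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

# Sum each (name, method) pair and relabel the method as "total_<method>"
totals = (df.groupby(["name", "method"], as_index=False)["values"].sum()
            .assign(method=lambda t: "total_" + t["method"]))

# Append the totals after the original rows; a stable sort on "name" keeps
# the original row order within each name and puts its totals last
result = (pd.concat([df, totals], ignore_index=True)
            .sort_values("name", kind="stable")
            .reset_index(drop=True))
print(result)
```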

【Comments】:

    Tags: python pandas indexing pivot-table aggregate


    【Solution 1】:

    Use DataFrame.pivot_table to compute the totals, then attach them to the original DataFrame with either DataFrame.stack + DataFrame.join or DataFrame.melt + DataFrame.merge:

    # if 'index' is a column, make it the index first:
    # df = df.set_index('index')
    new_df = (df.join(df.pivot_table(index = 'name',
                                      columns = 'method',
                                      values = 'values',
                                      aggfunc = 'sum')
                        .add_prefix('total_') 
                        .stack()
                        .rename('new_value'),
                      on = ['name','method'],how = 'outer')
    
                .assign(values = lambda x: x['values'].fillna(x['new_value']))
                .drop(columns = 'new_value')
                .sort_values(['name','method'])
    )
    print(new_df)
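To see what the first variant joins, here is its intermediate Series on the question's sample data (a sketch; the name `totals` is illustrative). `stack()` turns the wide pivot table into a Series indexed by (`name`, `method`), whose two levels line up with the `on=['name', 'method']` columns passed to `join`. Because every method label carries the `total_` prefix, no original row matches, so the outer join effectively appends the totals as new rows with `values` left as NaN, which `fillna(x['new_value'])` then fills:

```python
import pandas as pd

df = pd.DataFrame({
    "name":   ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated",
               "counted", "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

# Wide table of sums -> "total_*" column labels -> long Series keyed by (name, method)
totals = (df.pivot_table(index="name", columns="method",
                         values="values", aggfunc="sum")
            .add_prefix("total_")
            .stack()
            .rename("new_value"))
print(totals)
```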
    

    # if 'index' is a column, make it the index first:
    # df = df.set_index('index')
    new_df = (df.merge(df.pivot_table(index = 'name',
                                      columns = 'method',
                                      values = 'values',
                                      aggfunc = 'sum')
                .add_prefix('total_')         
                .T
                .reset_index()
                .melt('method',value_name = 'values'),
                       on = ['name','method'],how = 'outer')
                .assign(values = lambda x: x['values_x'].fillna(x['values_y']))
                .loc[:,df.columns]
                .sort_values(['name','method'])
    )
    print(new_df)
    

    Output

       name           method  values
    2     A          counted   847.0
    0     A        estimated  4874.0
    1     A        estimated  1152.0
    9     A    total_counted   847.0
    10    A  total_estimated  6026.0
    5     B          counted  6542.0
    6     B          counted  1152.0
    3     B        estimated   276.0
    4     B        estimated  3346.0
    11    B    total_counted  7694.0
    12    B  total_estimated  3622.0
    7     C          counted  7622.0
    8     C        estimated    26.0
    13    C    total_counted  7622.0
    14    C  total_estimated    26.0
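The `merge` variant builds the same totals as a long DataFrame instead. A sketch of just that right-hand frame, assuming the question's sample data (`totals_long` is an illustrative name): transposing puts the `total_*` labels in the index, `reset_index()` turns them into a `method` column, and `melt` unpivots the per-name columns. The answer leaves `var_name` unset, which works because `melt` falls back to the column axis name, here `name`; it is passed explicitly below for clarity:

```python
import pandas as pd

df = pd.DataFrame({
    "name":   ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated",
               "counted", "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

totals_long = (df.pivot_table(index="name", columns="method",
                              values="values", aggfunc="sum")
                 .add_prefix("total_")
                 .T               # rows: total_counted / total_estimated; columns: A, B, C
                 .reset_index()   # the "method" index becomes an ordinary column
                 .melt("method", var_name="name", value_name="values"))
print(totals_long)
```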
    

    But if I were you, I would use DataFrame.add_suffix instead, so that each total sorts directly after the rows of its own method:

    new_df = (df.join(df.pivot_table(index = 'name',
                                      columns = 'method',
                                      values = 'values',
                                      aggfunc = 'sum')
                        .add_suffix('_total') 
                        .stack()
                        .rename('new_value'),
                      on = ['name','method'],how = 'outer')
    
                .assign(values = lambda x: x['values'].fillna(x['new_value']))
                .drop(columns = 'new_value')
                .sort_values(['name','method'])
             )
    print(new_df)
    
          name           method  values
    index                              
    1.0      A          counted   847.0
    8.0      A    counted_total   847.0
    0.0      A        estimated  4874.0
    2.0      A        estimated  1152.0
    8.0      A  estimated_total  6026.0
    4.0      B          counted  6542.0
    5.0      B          counted  1152.0
    8.0      B    counted_total  7694.0
    3.0      B        estimated   276.0
    6.0      B        estimated  3346.0
    8.0      B  estimated_total  3622.0
    7.0      C          counted  7622.0
    8.0      C    counted_total  7622.0
    8.0      C        estimated    26.0
    8.0      C  estimated_total    26.0
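The reason the suffix version reads better is plain lexicographic string ordering, which is what `sort_values(['name', 'method'])` applies to these string labels. A small pure-Python sketch:

```python
methods = ["counted", "estimated"]

# Prefix: both totals sort after every original method ("t" > "c" and "e")
prefixed = sorted(methods + ["total_" + m for m in methods])
# Suffix: each total sorts immediately after its own method
suffixed = sorted(methods + [m + "_total" for m in methods])

print(prefixed)  # ['counted', 'estimated', 'total_counted', 'total_estimated']
print(suffixed)  # ['counted', 'counted_total', 'estimated', 'estimated_total']
```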
    

    【Discussion】:
