【Question title】:How to add a new row to each group of a pandas groupby, where one value in that row is the sum of the group's values
【Posted】:2021-11-07 23:54:43
【Question description】:

Suppose I have a DataFrame like this:

eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
2021-09-22,Comm,Industry,Car,-0.7612
2021-09-22,Comm,Industry,Fund,0.1212
2021-09-22,Comm,Industry,Mmt,-0.1234
2021-09-22,Comm,Metal,Car,0.5123
2021-09-22,Comm,Metal,Fund,-0.7612
2021-09-22,Comm,Metal,Mmt,0.1212
2021-09-23,Equity,Agri,Car,0.6541
2021-09-23,Equity,Agri,Fund,0.5123
2021-09-23,Equity,Agri,Mmt,-0.1874
2021-09-23,Equity,Engy,Car,0.1212
2021-09-23,Equity,Engy,Fund,-0.6234
2021-09-23,Equity,Engy,Mmt,0.5123 
2021-09-23,Equity,Industry,Car,-0.1612
2021-09-23,Equity,Industry,Fund,0.1212
2021-09-23,Equity,Industry,Mmt,-0.1934
2021-09-23,Equity,Metal,Car,0.5123
2021-09-23,Equity,Metal,Fund,0.5412
2021-09-23,Equity,Metal,Mmt,0.1212

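I want to add a new row to each group of groupby(by=['eff_date','mdl_cd','ast_cd']). In the new row, the eff_date, mdl_cd and ast_cd values stay the same as the group's, prop_cd becomes Hlds, and the value column becomes the sum of the group's values. For example, for the first group the value would be (-0.1234 + 0.5123 + -0.7612), i.e. -0.3723.

So the output would look like this: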

eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Agri,Hlds,-0.3723        +row added   (sum of value in that group)

2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
2021-09-22,Comm,Engy,Hlds,0.5101         +row added  (sum of value in that group)

2021-09-22,Comm,Industry,Car,-0.7612
2021-09-22,Comm,Industry,Fund,0.1212
2021-09-22,Comm,Industry,Mmt,-0.1234
2021-09-22,Comm,Industry,Hlds,-0.7634     +row added (sum of value in that group)

2021-09-22,Comm,Metal,Car,0.5123
2021-09-22,Comm,Metal,Fund,-0.7612
2021-09-22,Comm,Metal,Mmt,0.1212
2021-09-22,Comm,Metal,Hlds,-0.1277        +row added (sum of value in that group)

2021-09-23,Equity,Agri,Car,0.6541
2021-09-23,Equity,Agri,Fund,0.5123
2021-09-23,Equity,Agri,Mmt,-0.1874
2021-09-23,Equity,Agri,Hlds,0.979          +row added (sum of value in that group)

2021-09-23,Equity,Engy,Car,0.1212
2021-09-23,Equity,Engy,Fund,-0.6234
2021-09-23,Equity,Engy,Mmt,0.5123 
2021-09-23,Equity,Engy,Hlds,0.0101         +row added (sum of value in that group)

2021-09-23,Equity,Industry,Car,-0.1612
2021-09-23,Equity,Industry,Fund,0.1212
2021-09-23,Equity,Industry,Mmt,-0.1934
2021-09-23,Equity,Industry,Hlds,-0.2334    +row added (sum of value in that group)

2021-09-23,Equity,Metal,Car,0.5123
2021-09-23,Equity,Metal,Fund,0.5412
2021-09-23,Equity,Metal,Mmt,0.1212
2021-09-23,Equity,Metal,Hlds,1.1747        +row added (sum of value in that group)

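How can I do this calculation with pandas?

【Question comments】:

  • Couldn't you do the groupby and sum in one step, concatenate the result back onto the original df, and then sort?

Tags: python pandas pandas-groupby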


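【Solution 1】:

You can build a DataFrame holding each group's sum with .groupby().sum(), and set prop_cd to Hlds with .assign().

Then concatenate it with the original DataFrame via pd.concat() and sort by the grouping columns with .sort_values() so that each sum row rejoins its group. Since the sum rows come after the original rows in the concatenation and a stable sort is requested (kind='stable'), each sum row lands at the end of its group, as follows: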

df_sum = df.groupby(['eff_date','mdl_cd','ast_cd'], as_index=False)['value'].sum().assign(prop_cd='Hlds')

df_out = pd.concat([df, df_sum]).sort_values(['eff_date','mdl_cd','ast_cd'], kind='stable', ignore_index=True)

Result:

print(df_out)

      eff_date  mdl_cd    ast_cd prop_cd   value
0   2021-09-22    Comm      Agri     Car -0.1234
1   2021-09-22    Comm      Agri    Fund  0.5123
2   2021-09-22    Comm      Agri     Mmt -0.7612
3   2021-09-22    Comm      Agri    Hlds -0.3723
4   2021-09-22    Comm      Engy     Car  0.1212
5   2021-09-22    Comm      Engy    Fund -0.1234
6   2021-09-22    Comm      Engy     Mmt  0.5123
7   2021-09-22    Comm      Engy    Hlds  0.5101
8   2021-09-22    Comm  Industry     Car -0.7612
9   2021-09-22    Comm  Industry    Fund  0.1212
10  2021-09-22    Comm  Industry     Mmt -0.1234
11  2021-09-22    Comm  Industry    Hlds -0.7634
12  2021-09-22    Comm     Metal     Car  0.5123
13  2021-09-22    Comm     Metal    Fund -0.7612
14  2021-09-22    Comm     Metal     Mmt  0.1212
15  2021-09-22    Comm     Metal    Hlds -0.1277
16  2021-09-23  Equity      Agri     Car  0.6541
17  2021-09-23  Equity      Agri    Fund  0.5123
18  2021-09-23  Equity      Agri     Mmt -0.1874
19  2021-09-23  Equity      Agri    Hlds  0.9790
20  2021-09-23  Equity      Engy     Car  0.1212
21  2021-09-23  Equity      Engy    Fund -0.6234
22  2021-09-23  Equity      Engy     Mmt  0.5123
23  2021-09-23  Equity      Engy    Hlds  0.0101
24  2021-09-23  Equity  Industry     Car -0.1612
25  2021-09-23  Equity  Industry    Fund  0.1212
26  2021-09-23  Equity  Industry     Mmt -0.1934
27  2021-09-23  Equity  Industry    Hlds -0.2334
28  2021-09-23  Equity     Metal     Car  0.5123
29  2021-09-23  Equity     Metal    Fund  0.5412
30  2021-09-23  Equity     Metal     Mmt  0.1212
31  2021-09-23  Equity     Metal    Hlds  1.1747
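If you would rather not depend on the order of the concatenation and the sort's stability, the position of the sum rows can be made part of the sort key. The following is only a sketch under assumptions: it uses the first two groups of the question's data, and `_pos` is a hypothetical helper column (not part of the original answer) that records each row's original position and puts the sum rows after every original row.

```python
import io

import pandas as pd

# First two groups of the question's data, inlined so the sketch is self-contained.
csv_text = """eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
"""
df = pd.read_csv(io.StringIO(csv_text))

keys = ['eff_date', 'mdl_cd', 'ast_cd']

# One sum row per group, labelled Hlds (same as the answer above).
df_sum = df.groupby(keys, as_index=False)['value'].sum().assign(prop_cd='Hlds')

# _pos (hypothetical helper): original row position for detail rows,
# len(df) for sum rows, so sum rows sort last within each group.
df_out = (
    pd.concat([df.assign(_pos=range(len(df))), df_sum.assign(_pos=len(df))])
      .sort_values(keys + ['_pos'], ignore_index=True)
      .drop(columns='_pos')
)

print(df_out)
```

The extra column is dropped at the end, so the resulting columns are the same as in the original frame.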

Setup

import pandas as pd

# Copy the CSV text from the question to the clipboard first
df = pd.read_clipboard(sep=',')

      eff_date  mdl_cd    ast_cd prop_cd   value
0   2021-09-22    Comm      Agri     Car -0.1234
1   2021-09-22    Comm      Agri    Fund  0.5123
2   2021-09-22    Comm      Agri     Mmt -0.7612
3   2021-09-22    Comm      Engy     Car  0.1212
4   2021-09-22    Comm      Engy    Fund -0.1234
5   2021-09-22    Comm      Engy     Mmt  0.5123
6   2021-09-22    Comm  Industry     Car -0.7612
7   2021-09-22    Comm  Industry    Fund  0.1212
8   2021-09-22    Comm  Industry     Mmt -0.1234
9   2021-09-22    Comm     Metal     Car  0.5123
10  2021-09-22    Comm     Metal    Fund -0.7612
11  2021-09-22    Comm     Metal     Mmt  0.1212
12  2021-09-23  Equity      Agri     Car  0.6541
13  2021-09-23  Equity      Agri    Fund  0.5123
14  2021-09-23  Equity      Agri     Mmt -0.1874
15  2021-09-23  Equity      Engy     Car  0.1212
16  2021-09-23  Equity      Engy    Fund -0.6234
17  2021-09-23  Equity      Engy     Mmt  0.5123
18  2021-09-23  Equity  Industry     Car -0.1612
19  2021-09-23  Equity  Industry    Fund  0.1212
20  2021-09-23  Equity  Industry     Mmt -0.1934
21  2021-09-23  Equity     Metal     Car  0.5123
22  2021-09-23  Equity     Metal    Fund  0.5412
23  2021-09-23  Equity     Metal     Mmt  0.1212
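If the clipboard is not available (for example on a headless machine), the same frame can be built from inline text instead. A minimal sketch, assuming the CSV text from the question is pasted into a string; the name `csv_text` is only illustrative:

```python
import io

import pandas as pd

# The question's data as CSV text (trailing spaces removed).
csv_text = """eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
2021-09-22,Comm,Industry,Car,-0.7612
2021-09-22,Comm,Industry,Fund,0.1212
2021-09-22,Comm,Industry,Mmt,-0.1234
2021-09-22,Comm,Metal,Car,0.5123
2021-09-22,Comm,Metal,Fund,-0.7612
2021-09-22,Comm,Metal,Mmt,0.1212
2021-09-23,Equity,Agri,Car,0.6541
2021-09-23,Equity,Agri,Fund,0.5123
2021-09-23,Equity,Agri,Mmt,-0.1874
2021-09-23,Equity,Engy,Car,0.1212
2021-09-23,Equity,Engy,Fund,-0.6234
2021-09-23,Equity,Engy,Mmt,0.5123
2021-09-23,Equity,Industry,Car,-0.1612
2021-09-23,Equity,Industry,Fund,0.1212
2021-09-23,Equity,Industry,Mmt,-0.1934
2021-09-23,Equity,Metal,Car,0.5123
2021-09-23,Equity,Metal,Fund,0.5412
2021-09-23,Equity,Metal,Mmt,0.1212
"""

# Parse the text exactly as a CSV file would be parsed.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
```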

Intermediate result:

print(df_sum)

     eff_date  mdl_cd    ast_cd   value prop_cd
0  2021-09-22    Comm      Agri -0.3723    Hlds
1  2021-09-22    Comm      Engy  0.5101    Hlds
2  2021-09-22    Comm  Industry -0.7634    Hlds
3  2021-09-22    Comm     Metal -0.1277    Hlds
4  2021-09-23  Equity      Agri  0.9790    Hlds
5  2021-09-23  Equity      Engy  0.0101    Hlds
6  2021-09-23  Equity  Industry -0.2334    Hlds
7  2021-09-23  Equity     Metal  1.1747    Hlds
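The sums print cleanly at pandas' default display precision, but they are binary floating-point values and may differ from the four-decimal figures in their last digits. If exact four-decimal values matter (for instance when writing the result back to CSV), rounding the summed column is one option. A sketch under that assumption, using only the first two groups of the question's data:

```python
import io

import pandas as pd

# First two groups of the question's data, inlined for a self-contained example.
csv_text = """eff_date,mdl_cd,ast_cd,prop_cd,value
2021-09-22,Comm,Agri,Car,-0.1234
2021-09-22,Comm,Agri,Fund,0.5123
2021-09-22,Comm,Agri,Mmt,-0.7612
2021-09-22,Comm,Engy,Car,0.1212
2021-09-22,Comm,Engy,Fund,-0.1234
2021-09-22,Comm,Engy,Mmt,0.5123
"""
df = pd.read_csv(io.StringIO(csv_text))

keys = ['eff_date', 'mdl_cd', 'ast_cd']

# Same aggregation as in the answer, then round the sums to the
# four decimals used by the input data.
df_sum = (
    df.groupby(keys, as_index=False)['value'].sum()
      .assign(prop_cd='Hlds')
      .round({'value': 4})
)

print(df_sum)
```

Passing a dict to `.round()` limits the rounding to the `value` column and leaves the other columns untouched.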

【Comments】:
