【Question title】: How to aggregate one column based on another column in Pandas
【Posted】: 2022-01-20 09:56:25
【Question】:
   year   fruit  sales
0  2010   Apple     10
1  2011   Apple     20
2  2010  Banans  50000
3  2011  Banans     30
What I want is something like this:
    fruit  min_year  sales_2010  max_year  sales_2011
0   Apple      2010          10      2011          20
1  Banans      2010       50000      2011          30
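To make the question reproducible, the input frame can be built as in the sketch below. The values are copied from the table in the question (including the spelling "Banans"); the variable name `df` matches the one the answers use.

```python
import pandas as pd

# Sample data copied from the question; "Banans" is kept exactly as posted.
df = pd.DataFrame({
    "year": [2010, 2011, 2010, 2011],
    "fruit": ["Apple", "Apple", "Banans", "Banans"],
    "sales": [10, 20, 50000, 30],
})
print(df)
```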
【Comments】:
Tags:
python
pandas
dataframe
pandas-groupby
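【Solution 1】:
First aggregate the min and max of year into df1 and label the result with DataFrame.add_suffix, then pivot with DataFrame.pivot and label those columns with DataFrame.add_prefix, and finally join the two frames with concat: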
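# per-fruit min and max year, named min_year / max_year
df1 = df.groupby('fruit')['year'].agg(['min', 'max']).add_suffix('_year')
# one sales column per year; pivot takes keyword arguments only since pandas 2.0
df2 = df.pivot(index='fruit', columns='year', values='sales').add_prefix('sales_')
# both frames are indexed by fruit, so concat aligns them row by row
df = pd.concat([df1, df2], axis=1)
print(df)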
        min_year  max_year  sales_2010  sales_2011
fruit
Apple       2010      2011          10          20
Banans      2010      2011       50000          30
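As a self-contained sketch of the same approach, the snippet below rebuilds the sample data, applies the three steps, and then reorders the columns to match the layout asked for in the question. The name `out` and the final reordering step are additions for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2010, 2011, 2010, 2011],
    "fruit": ["Apple", "Apple", "Banans", "Banans"],
    "sales": [10, 20, 50000, 30],
})

# Step 1: per-fruit min and max year.
df1 = df.groupby("fruit")["year"].agg(["min", "max"]).add_suffix("_year")

# Step 2: one sales column per year.
df2 = df.pivot(index="fruit", columns="year", values="sales").add_prefix("sales_")

# Step 3: join on the shared "fruit" index and turn it back into a column.
out = pd.concat([df1, df2], axis=1).reset_index()

# Illustrative extra step: put the columns in the order shown in the question.
out = out[["fruit", "min_year", "sales_2010", "max_year", "sales_2011"]]
print(out)
```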
【Solution 2】:
One option:
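(df
 # pivot takes keyword arguments only since pandas 2.0
 .pivot(index="fruit", columns="year", values="sales")
 # the column labels are the years, so their min/max are the overall min/max year
 .assign(min_year=lambda d: d.columns.min(),
         # min_year is now the last column, so drop it before taking the max
         max_year=lambda d: d.columns[:-1].max())
 # prefix only the year columns, leaving min_year / max_year untouched
 .rename(columns=lambda col: f"sales_{col}" if isinstance(col, int) else col)
 .rename_axis(columns=None)
 .reset_index()
)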
    fruit  sales_2010  sales_2011  min_year  max_year
0   Apple          10          20      2010      2011
1  Banans       50000          30      2010      2011
Another option, possibly more efficient:
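grouper = df.groupby("fruit")
(df
 # broadcast each fruit's min and max year back onto its rows
 .assign(min_year=grouper.year.transform("min"),
         max_year=grouper.year.transform("max"))
 # pivot takes keyword arguments only since pandas 2.0
 .pivot(index=["fruit", "min_year", "max_year"], columns="year", values="sales")
 .add_prefix("sales_")
 .rename_axis(columns=None)
 .reset_index()
)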
    fruit  min_year  max_year  sales_2010  sales_2011
0   Apple      2010      2011          10          20
1  Banans      2010      2011       50000          30
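The two options agree on the question's data because every fruit covers the same years. They can differ otherwise: the first takes the min and max from the pivoted column labels, which gives one value for all rows, while the second computes them per fruit. The sketch below illustrates this with a hypothetical extra row ("Cherry", sold in 2011 only) that is not in the question.

```python
import pandas as pd

# Hypothetical data, not from the question: "Cherry" has sales in 2011 only.
df = pd.DataFrame({
    "year": [2010, 2011, 2010, 2011, 2011],
    "fruit": ["Apple", "Apple", "Banans", "Banans", "Cherry"],
    "sales": [10, 20, 50000, 30, 7],
})

wide = df.pivot(index="fruit", columns="year", values="sales")

# First option's logic: min over the column labels, the same for every fruit.
global_min = wide.columns.min()

# Second option's logic: min year computed separately for each fruit.
per_fruit_min = df.groupby("fruit")["year"].min()

print(global_min)               # 2010
print(per_fruit_min["Cherry"])  # 2011
```

Which behaviour is wanted depends on whether min_year should describe the whole table or each fruit individually.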