【Posted】:2017-12-25 11:42:10
【Question】:
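I have a DataFrame my_df and want to create a new DataFrame new_df. Each column of new_df is produced by grouping my_df by my_id and taking the max of one column of my_df.
My code is below, and it works fine. However, is there a better way to do this, especially since I will eventually be handling hundreds of columns rather than just 6? Many thanks!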
tmp_df1 = my_df.groupby(['my_id'], as_index=False).col_A.agg({"max_A": "max"})
tmp_df2 = my_df.groupby(['my_id'], as_index=False).col_B.agg({"max_B": "max"})
tmp_df3 = my_df.groupby(['my_id'], as_index=False).col_C.agg({"max_C": "max"})
tmp_df4 = my_df.groupby(['my_id'], as_index=False).col_D.agg({"max_D": "max"})
tmp_df5 = my_df.groupby(['my_id'], as_index=False).col_E.agg({"max_E": "max"})
tmp_df6 = my_df.groupby(['my_id'], as_index=False).col_F.agg({"max_F": "max"})
combine_df1 = pd.merge(tmp_df1,tmp_df2,how="inner",on=['my_id'])
combine_df2 = pd.merge(combine_df1,tmp_df3,how="inner",on=['my_id'])
combine_df3 = pd.merge(combine_df2,tmp_df4,how="inner",on=['my_id'])
combine_df4 = pd.merge(combine_df3,tmp_df5,how="inner",on=['my_id'])
new_df = pd.merge(combine_df4,tmp_df6,how="inner",on=['my_id'])
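For reference, one possible way to avoid the per-column groupby and the chain of merges is to aggregate all value columns in a single groupby call and rename them afterwards. This is only a sketch under assumptions: the sample data is made up, and it assumes every column to aggregate starts with the prefix col_ (only the names my_id, col_A, col_B, col_C and the max_ prefix come from the code above). Note also that the dict-style renaming used in agg above was deprecated in pandas 0.20 and raises an error in pandas 1.0 and later, so the original snippet may not run on a current install.

```python
import pandas as pd

# Hypothetical sample data standing in for my_df (not from the original post)
my_df = pd.DataFrame({
    "my_id": [1, 1, 2, 2],
    "col_A": [1, 5, 3, 2],
    "col_B": [7, 4, 0, 9],
    "col_C": [2, 2, 8, 6],
})

# Assumption: every column to aggregate is named with a "col_" prefix
value_cols = [c for c in my_df.columns if c.startswith("col_")]

new_df = (
    my_df.groupby("my_id")[value_cols]
    .max()                                                     # one pass over all columns
    .rename(columns=lambda c: c.replace("col_", "max_", 1))    # col_A -> max_A
    .reset_index()                                             # my_id back as a column
)

print(new_df)
```

Because the column list is built from my_df.columns, the same few lines should work unchanged whether there are 6 columns or several hundred.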
【Comments】:
Tags: python python-3.x pandas group-by aggregation