[Posted]: 2020-03-28 11:52:15
[Question]:
I have merged 4 files into one.
df1:
ID name location case pass
1 John NY tax Y
2 Jack NJ payment N
3 John CA remote Y
4 Rose MA income Y
df2:
ID name location case pass
1 John NY car N
2 Jack NJ train Y
3 John CA car Y
4 Rose MA bike N
df3:
ID name location case pass
1 John NY spring Y
2 Jack NJ spring Y
3 John CA fall Y
4 Rose MA winter N
df4:
ID name location case pass
1 John NY red N
2 Jack NJ green N
3 John CA yellow Y
4 Rose MA yellow Y
This is how I merged these tables.
from functools import reduce
import pandas as pd
dfs = [df1, df2, df3, df4]
df_final = reduce(lambda left, right: pd.merge(left, right, on=['ID', 'name', 'location']), dfs)
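But the result is a bit hard to read. I need to turn these case_x, case_y, pass_x, pass_y columns into specific column names. Can this be done while merging the tables?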
ID name location case_x pass_x case_y pass_y case_x pass_x case_y pass_y
1 John NY tax Y car N spring Y red N
2 Jack NJ payment N train Y spring Y green N
3 John CA remote Y car Y fall Y yellow Y
4 Rose MA income Y bike N winter N yellow Y
This is my expected output:
ID name location case_money pass_money case_trans pass_trans case_season pass_season case_color pass_color
1 John NY tax Y car N spring Y red N
2 Jack NJ payment N train Y spring Y green N
3 John CA remote Y car Y fall Y yellow Y
4 Rose MA income Y bike N winter N yellow Y
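One possible way to get this output, sketched under assumptions: rename the non-key columns of each table before merging, so that pd.merge never has to fall back on the _x/_y suffixes. The labels (money, trans, season, color) are taken from the expected output above; the helper names make_frame and tag_columns and the dict labeled are hypothetical, introduced only for this illustration.

```python
from functools import reduce

import pandas as pd

KEYS = ["ID", "name", "location"]


def make_frame(cases, passes):
    # Hypothetical helper that rebuilds the sample tables from the question.
    return pd.DataFrame({
        "ID": [1, 2, 3, 4],
        "name": ["John", "Jack", "John", "Rose"],
        "location": ["NY", "NJ", "CA", "MA"],
        "case": cases,
        "pass": passes,
    })


df1 = make_frame(["tax", "payment", "remote", "income"], ["Y", "N", "Y", "Y"])
df2 = make_frame(["car", "train", "car", "bike"], ["N", "Y", "Y", "N"])
df3 = make_frame(["spring", "spring", "fall", "winter"], ["Y", "Y", "Y", "N"])
df4 = make_frame(["red", "green", "yellow", "yellow"], ["N", "N", "Y", "Y"])

# Assumption: each table gets one label, in the order of the expected output.
labeled = {"money": df1, "trans": df2, "season": df3, "color": df4}


def tag_columns(df, label):
    # Append "_<label>" to every non-key column so no names collide in the merge.
    return df.rename(columns=lambda c: c if c in KEYS else f"{c}_{label}")


renamed = [tag_columns(df, label) for label, df in labeled.items()]
df_final = reduce(lambda left, right: pd.merge(left, right, on=KEYS), renamed)
print(df_final.columns.tolist())
```

Because every non-key column is unique before the merge, the suffixes argument is never used, and the same loop works for any number of tables as long as each one has a label.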
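[Comments]:
- The merge function has a suffixes option, but I don't know how it would work with multiple dataframes. Given that you know the desired output, I would call .rename(columns={'case_x':'case_trans','case_y':'case_season','pass_x':'pass_trans','pass_y':'pass_season'}) after the reduce.
- Thanks, @IvanLibedinsky. Renaming the columns is a bit difficult, because my job is to merge 20 tables.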
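For a larger number of tables, such as the 20 mentioned here, a rough alternative is to skip the chained merges: index every table by the shared keys, place the tables side by side with pd.concat, and build the column names from the labels. This is only a sketch. The function name combine, the dict tables, and its labels are hypothetical, and it assumes each (ID, name, location) combination appears at most once per table.

```python
import pandas as pd

KEYS = ["ID", "name", "location"]


def combine(labeled_frames):
    # Index each table by the shared keys, then place the tables side by side.
    # keys=... adds the label as an outer column level: ("money", "case"), ...
    indexed = [df.set_index(KEYS) for df in labeled_frames.values()]
    wide = pd.concat(indexed, axis=1, keys=list(labeled_frames), join="inner")
    # Flatten ("money", "case") into "case_money".
    wide.columns = [f"{col}_{label}" for label, col in wide.columns]
    return wide.reset_index()


# Two small tables shaped like the ones in the question (first two rows only).
tables = {
    "money": pd.DataFrame({
        "ID": [1, 2], "name": ["John", "Jack"], "location": ["NY", "NJ"],
        "case": ["tax", "payment"], "pass": ["Y", "N"],
    }),
    "trans": pd.DataFrame({
        "ID": [1, 2], "name": ["John", "Jack"], "location": ["NY", "NJ"],
        "case": ["car", "train"], "pass": ["N", "Y"],
    }),
}

result = combine(tables)
print(result)
```

Adding a table then only means adding one entry to the dict; no rename mapping has to be maintained by hand.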
Tags: python pandas numpy dataframe merge