【Question title】:Change duplicated column names after merging multiple tables in python
【Posted】:2020-03-28 11:52:15
【Question】:

I have merged 4 files into one.

df1:
ID   name    location   case     pass
1    John      NY       tax       Y
2    Jack      NJ       payment   N
3    John      CA       remote    Y
4    Rose      MA       income    Y
df2:
ID   name    location   case   pass
1    John      NY       car     N
2    Jack      NJ       train   Y
3    John      CA       car     Y
4    Rose      MA       bike    N
df3:
ID   name    location   case     pass
1    John      NY       spring    Y
2    Jack      NJ       spring    Y
3    John      CA       fall      Y
4    Rose      MA       winter    N
df4:
ID   name    location   case    pass
1    John      NY       red      N
2    Jack      NJ       green    N
3    John      CA       yellow   Y
4    Rose      MA       yellow   Y

This is how I merge the tables.

from functools import reduce
import pandas as pd
dfs = [df1,df2,df3,df4]
df_final = reduce(lambda left,right: pd.merge(left,right,on=['ID','name','location']), dfs)

But the result is hard to read. I need to turn case_x, case_y, pass_x, pass_y into specific column names. Can this be done while merging the tables?

 ID   name    location     case_x  pass_x  case_y      pass_y   case_x      pass_x  case_y   pass_y
    1    John      NY       tax       Y      car       N        spring      Y       red      N
    2    Jack      NJ       payment   N      train     Y        spring      Y      green     N
    3    John      CA       remote    Y      car       Y        fall        Y      yellow    Y 
    4    Rose      MA       income    Y      bike      N        winter      N      yellow    Y  

This is my expected output:

ID   name    location   case_money   pass_money   case_trans   pass_trans   case_season   pass_season   case_color   pass_color
1    John    NY         tax          Y            car          N            spring        Y             red          N
2    Jack    NJ         payment      N            train        Y            spring        Y             green        N
3    John    CA         remote       Y            car          Y            fall          Y             yellow       Y
4    Rose    MA         income       Y            bike         N            winter        N             yellow       Y

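【Comments】:

  • merge has a suffixes option, but I don't know how that would work with multiple DataFrames. Given that you know the output, I would use .rename(columns={'case_x':'case_trans','case_y':'case_season','pass_x':'pass_trans','pass_y':'pass_season'}) after the reduce.
  • Thanks @IvanLibedinsky. Renaming the columns is a bit difficult, because my job is to merge 20 tables.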

Tags: python pandas numpy dataframe merge


【Solution 1】:

You can still use reduce, combined with the suffixes option and list.pop:

suff = ['_trans', '_season', '_color']
dfs = [df1,df2,df3,df4]
df_final = reduce(lambda left,right: pd.merge(left,right,on=['ID','name','location'], 
                                          suffixes=('', suff.pop(0))), dfs)

Out[1944]:
   ID  name location     case pass case_trans pass_trans case_season  \
0  1   John  NY       tax      Y    car        N          spring
1  2   Jack  NJ       payment  N    train      Y          spring
2  3   John  CA       remote   Y    car        Y          fall
3  4   Rose  MA       income   Y    bike       N          winter

  pass_season case_color pass_color
0  Y           red        N
1  Y           green      N
2  Y           yellow     Y
3  N           yellow     Y

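Note: be careful with the list suff. Each call to pop(0) removes an element, so the list is empty after one run; re-create it before re-running the code.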
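If re-creating suff before every run is inconvenient, here is a minimal sketch of the same idea that leaves the list untouched. The helper name merge_with_suffixes and the two-row sample frames are made up for illustration; only pd.merge and its suffixes parameter come from the answer above.

```python
import pandas as pd

KEYS = ["ID", "name", "location"]

def merge_with_suffixes(frames, suffixes, keys=KEYS):
    """Merge frames left to right; the i-th right-hand frame gets suffixes[i]."""
    if len(suffixes) != len(frames) - 1:
        raise ValueError("need one suffix per frame after the first")
    merged = frames[0]
    # zip pairs each right-hand frame with its suffix without consuming the list
    for right, suffix in zip(frames[1:], suffixes):
        merged = pd.merge(merged, right, on=keys, suffixes=("", suffix))
    return merged

# Two-row stand-ins for df1..df3 from the question (hypothetical subset)
base = {"ID": [1, 2], "name": ["John", "Jack"], "location": ["NY", "NJ"]}
df1 = pd.DataFrame({**base, "case": ["tax", "payment"], "pass": ["Y", "N"]})
df2 = pd.DataFrame({**base, "case": ["car", "train"], "pass": ["N", "Y"]})
df3 = pd.DataFrame({**base, "case": ["spring", "spring"], "pass": ["Y", "Y"]})

suff = ["_trans", "_season"]
first = merge_with_suffixes([df1, df2, df3], suff)
second = merge_with_suffixes([df1, df2, df3], suff)  # re-running is safe: suff is unchanged
```

Because nothing is popped, the same suff list can be reused across runs, and a mismatch between the number of frames and suffixes fails early instead of raising IndexError in the middle of the reduce.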


If you want the first case and pass columns to carry the _money suffix, just chain an additional rename:

df_final = (reduce(lambda left,right: pd.merge(left,right,on=['ID','name','location'], 
                                          suffixes=('', suff.pop(0))), dfs)
                 .rename({'case': 'case_money', 'pass': 'pass_money'}, axis=1))

Out[1951]:
   ID  name location case_money pass_money case_trans pass_trans case_season  \
0  1   John  NY       tax        Y          car        N          spring
1  2   Jack  NJ       payment    N          train      Y          spring
2  3   John  CA       remote     Y          car        Y          fall
3  4   Rose  MA       income     Y          bike       N          winter

  pass_season case_color pass_color
0  Y           red        N
1  Y           green      N
2  Y           yellow     Y
3  N           yellow     Y

This way you only need to rename the first case, pass group; all the other case, pass groups have already been named by the suffixes option of merge.
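For a larger number of tables, a possible variation is to give every frame its label before combining, so that no rename is needed afterwards. This is only a sketch under two assumptions that are not stated in the question: each (ID, name, location) combination is unique within a frame, and one label per frame is available. It uses set_index, add_suffix and pd.concat(axis=1) instead of merge; note that concat aligns the keys with an outer join by default, whereas merge defaults to an inner join.

```python
import pandas as pd

keys = ["ID", "name", "location"]
names = ["money", "trans"]  # one label per frame, in the same order as the frames

# Two-row stand-ins for df1 and df2 from the question (hypothetical subset)
base = {"ID": [1, 2], "name": ["John", "Jack"], "location": ["NY", "NJ"]}
df1 = pd.DataFrame({**base, "case": ["tax", "payment"], "pass": ["Y", "N"]})
df2 = pd.DataFrame({**base, "case": ["car", "train"], "pass": ["N", "Y"]})

# Move the keys into the index so add_suffix only touches case/pass
labelled = [
    df.set_index(keys).add_suffix(f"_{label}")
    for label, df in zip(names, [df1, df2])
]

# Align the frames side by side on the shared key index, then restore the keys
df_final = pd.concat(labelled, axis=1).reset_index()
print(df_final.columns.tolist())
```

With 20 tables, only the names list grows; the rest of the code stays the same.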

【Comments】:

    【Solution 2】:

    My approach uses concat and pivot_table:

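    names = ['money', 'trans', 'season', 'color']
    dfs = [df1,df2,df3,df4]
    
    # tag each frame in a new 'source' column so the existing 'name' column is not overwritten
    new_df = (pd.concat(d.assign(source=n) for n,d in zip(names, dfs))
                .pivot_table(index=['ID','name','location'],
                             columns='source',
                             values=['case','pass'],
                             aggfunc='first')
             )
    # flatten the (value, source) column MultiIndex into names like case_money
    new_df.columns = [f'{x}_{y}' for x,y in new_df.columns]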
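pivot_table sorts the resulting columns alphabetically and keeps the keys in the index, so the layout differs from the expected output in the question. A small sketch of one way to restore the key columns and the original column order follows; it assumes the frames are tagged in a helper column called source (a name chosen here so that the existing name column is not overwritten) and uses two-row sample frames.

```python
import pandas as pd

keys = ["ID", "name", "location"]
names = ["money", "trans"]

# Two-row stand-ins for df1 and df2 from the question (hypothetical subset)
base = {"ID": [1, 2], "name": ["John", "Jack"], "location": ["NY", "NJ"]}
df1 = pd.DataFrame({**base, "case": ["tax", "payment"], "pass": ["Y", "N"]})
df2 = pd.DataFrame({**base, "case": ["car", "train"], "pass": ["N", "Y"]})

# Stack the frames, tagging each row with the label of the frame it came from
stacked = pd.concat([d.assign(source=n) for n, d in zip(names, [df1, df2])])

wide = stacked.pivot_table(index=keys, columns="source",
                           values=["case", "pass"], aggfunc="first")
# Flatten the (value, source) MultiIndex into names like case_money
wide.columns = [f"{value}_{source}" for value, source in wide.columns]

# Put the keys back as columns and order the pairs frame by frame
ordered = [f"{value}_{label}" for label in names for value in ("case", "pass")]
result = wide.reset_index()[keys + ordered]
print(result.columns.tolist())
```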
    

    【Comments】:
