【Question title】: Joining three dataframes horizontally and merging like columns
【Posted】: 2021-09-01 16:44:03
【Question description】:

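I have three dataframes that contain upwards of 556, 555, and ~1600 columns respectively. I'd like to stack them horizontally while merging like columns. How would I do this with so many columns? I tried reindexing, so that the index runs from 0-252 for the first df, 232-2518 for the second df, and 2519 to ~4000 for the last one, but I still get the following error: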

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

In this case, is it better to use merge or concat?

The data can be found here: https://github.com/eoefelein/sample_data

Thanks so much!

【Question comments】:

    Tags: python pandas join merge concatenation


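    【Solution 1】:

    Do you have a unique identifier across each dataframe to join them on?

    If not, I think you just need a plain pd.concat. It will combine your dataframes, and the total number of columns will be the number of distinct columns across all 3 dataframes.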

    import pandas as pd
    
    df1 = pd.read_csv('sample_data/final_pre_rfe_fiverr.csv')
    df2 = pd.read_csv('sample_data/final_pre_rfe_freelancer.csv')
    df3 = pd.read_csv('sample_data/final_pre_rfe_pph.csv')
    pd.concat((df1,df2,df3))
    
    

    Notice in the output below that new columns are stacked horizontally, while like columns are merged.

    Output:

         Unnamed: 0              title  .net  360 photography  2d animation  \
    0           253             mobile   0.0              0.0             0   
    1           254  quality assurance   0.0              0.0             0   
    2           255     data scientist   0.0              0.0             0   
    3           256     data scientist   0.0              0.0             0   
    4           257  quality assurance   0.0              0.0             0   
    ..          ...                ...   ...              ...           ...   
    248         248     data scientist   NaN              NaN             0   
    249         249          fullstack   NaN              NaN             0   
    250         250          fullstack   NaN              NaN             0   
    251         251          fullstack   NaN              NaN             0   
    252         252          fullstack   NaN              NaN             0   
    
         3d modelling  3d rendering  3d texturing  3ddesign  3dmodeling  ...  \
    0             0.0             0           0.0       0.0         0.0  ...   
    1             0.0             0           0.0       0.0         0.0  ...   
    2             0.0             0           0.0       0.0         0.0  ...   
    3             0.0             0           0.0       0.0         0.0  ...   
    4             0.0             0           0.0       0.0         0.0  ...   
    ..            ...           ...           ...       ...         ...  ...   
    248           NaN             0           NaN       NaN         NaN  ...   
    249           NaN             0           NaN       NaN         NaN  ...   
    250           NaN             0           NaN       NaN         NaN  ...   
    251           NaN             0           NaN       NaN         NaN  ...   
    252           NaN             0           NaN       NaN         NaN  ...   
    
         webui studio 2013 for asp.net  windows administration  \
    0                              NaN                     NaN   
    1                              NaN                     NaN   
    2                              NaN                     NaN   
    3                              NaN                     NaN   
    4                              NaN                     NaN   
    ..                             ...                     ...   
    248                            0.0                     0.0   
    249                            0.0                     0.0   
    250                            0.0                     0.0   
    251                            0.0                     0.0   
    252                            0.0                     0.0   
    
         windows powershell programming language.1  wordpress e-commerce  \
    0                                          NaN                   NaN   
    1                                          NaN                   NaN   
    2                                          NaN                   NaN   
    3                                          NaN                   NaN   
    4                                          NaN                   NaN   
    ..                                         ...                   ...   
    248                                        0.0                   0.0   
    249                                        0.0                   0.0   
    250                                        0.0                   0.0   
    251                                        0.0                   0.0   
    252                                        0.0                   0.0   
    
         wordpress plugin.1  wordpress template  worpress migration  zapier  \
    0                   NaN                 NaN                 NaN     NaN   
    1                   NaN                 NaN                 NaN     NaN   
    2                   NaN                 NaN                 NaN     NaN   
    3                   NaN                 NaN                 NaN     NaN   
    4                   NaN                 NaN                 NaN     NaN   
    ..                  ...                 ...                 ...     ...   
    248                 0.0                 0.0                 0.0     0.0   
    249                 0.0                 0.0                 0.0     0.0   
    250                 0.0                 0.0                 0.0     0.0   
    251                 0.0                 0.0                 0.0     0.0   
    252                 0.0                 0.0                 0.0     0.0   
    
         zend framework  zimbra  
    0               NaN     NaN  
    1               NaN     NaN  
    2               NaN     NaN  
    3               NaN     NaN  
    4               NaN     NaN  
    ..              ...     ...  
    248             0.0     0.0  
    249             0.0     0.0  
    250             0.0     0.0  
    251             0.0     0.0  
    252             0.0     0.0  
    
    [4194 rows x 2194 columns]
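
The same behaviour can be reproduced on a small scale. This is a minimal sketch with two made-up frames (`a` and `b` and their values are illustrative, not taken from the linked CSVs): `pd.concat` with its default `axis=0` appends the rows, keeps one copy of each shared column, and fills columns that a frame lacks with NaN.

```python
import pandas as pd

# Toy stand-ins for the real CSVs; names and values are assumptions.
a = pd.DataFrame({"title": ["mobile", "fullstack"], ".net": [0.0, 1.0]})
b = pd.DataFrame({"title": ["data scientist"], "zapier": [1.0]})

# Default axis=0: rows are appended, columns are aligned by name.
stacked = pd.concat((a, b))

print(stacked.shape)             # (3, 3)
print(stacked.columns.tolist())  # ['title', '.net', 'zapier']
print(stacked)
```

The shared `title` column appears once, and the row count is the sum of the inputs, which matches the shape of the large output above.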
    

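    Hope this helps! If not, would you mind clarifying a bit more what you're looking for?

    【Comments】:

    • That works, but I have to read the CSVs back in from where I output them, rather than using the dataframes where I created them in my notebook? Strange, but glad this works! Thanks!
    • No, you definitely don't have to do that :). I just wanted to show you that I was actually using the CSVs you linked. You should be able to do the same in your notebook. You might have to run reset_index() on each dataframe.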
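
One plausible explanation for the difference (an assumption, not confirmed in the thread): `pd.read_csv` renames repeated headers by appending `.1`, which is visible in output columns such as `wordpress plugin.1`, whereas frames built in a notebook can still carry duplicate column labels. Duplicate labels are a common trigger for `InvalidIndexError` when `pd.concat` has to align differing column sets. The sketch below uses a hypothetical helper, `drop_duplicate_columns`, and toy data to drop repeated labels before concatenating.

```python
import pandas as pd

def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only the first occurrence of each column label."""
    return df.loc[:, ~df.columns.duplicated()]

# Toy frames; 'left' has a repeated 'skill' label, as an in-memory frame might.
left = pd.DataFrame([[1, 2, 3]], columns=["title", "skill", "skill"])
right = pd.DataFrame([[4, 5]], columns=["title", "other"])

# Make every frame's column labels unique, then stack the rows.
frames = [drop_duplicate_columns(f) for f in (left, right)]
combined = pd.concat(frames, ignore_index=True)

print(combined.columns.tolist())  # ['title', 'skill', 'other']
```

Whether keeping the first duplicate is the right choice depends on the data; if the repeated columns hold different values, they may need to be combined or renamed instead.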
    【Solution 2】:

    Use pd.concat() together with axis=1 and ignore_index=True:

    Assuming you have already read the CSV files into dataframes df1, df2, and df3:

    df_out = pd.concat([df1, df2, df3], axis=1, ignore_index=True)
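
For reference, a small sketch of what this call does, using made-up frames (`a`, `b`, and their values are assumptions). With `axis=1`, the frames are placed side by side and matched on the row index; `ignore_index=True` then replaces the column labels with 0, 1, 2, ..., so like-named columns are not merged and the original names are lost.

```python
import pandas as pd

# Toy frames that share a 'title' column; values are illustrative only.
a = pd.DataFrame({"title": ["mobile"], ".net": [0.0]})
b = pd.DataFrame({"title": ["fullstack"], "zapier": [1.0]})

# axis=1 places the frames side by side; ignore_index renumbers the columns.
side_by_side = pd.concat([a, b], axis=1, ignore_index=True)

print(side_by_side.shape)             # (1, 4)
print(side_by_side.columns.tolist())  # [0, 1, 2, 3]
```

Both `title` columns survive as separate, unnamed columns.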
    

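    Edit

    I may have overlooked that you want to stack them horizontally while merging like columns. In that case, just use the default axis=0, which appends the rows and aligns the columns by name: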

    df_out = pd.concat([df1, df2, df3], ignore_index=True)
    

    Keep ignore_index=True so that the row index is renumbered sequentially.
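
A short sketch of the effect of `ignore_index=True` on the row index, again with toy frames whose names and values are assumptions. Without it, each frame keeps its own row labels, so labels can repeat in the result; with it, the rows are numbered 0 to n-1.

```python
import pandas as pd

a = pd.DataFrame({"title": ["mobile", "fullstack"]})
b = pd.DataFrame({"title": ["data scientist"]})

kept = pd.concat([a, b])                           # original row labels kept
renumbered = pd.concat([a, b], ignore_index=True)  # fresh 0..n-1 labels

print(kept.index.tolist())        # [0, 1, 0]
print(renumbered.index.tolist())  # [0, 1, 2]
```
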

    【Comments】:
