【Question title】: How to unify column names to append dataframes using pandas?
【Posted】: 2021-07-27 10:59:41
【Question】:

I have two dataframes, shown below:

df1 = pd.DataFrame({'person_id': [101,101,101,101,202,202,202],
                   'person_type':['A','A','B','C','D','B','A'],
                   'test_id':[1,2,3,3,4,4,5],
                   'login_date':['5/7/2013 09:27:00 AM','09/08/2013 11:21:00 AM','06/06/2014 08:00:00 AM','06/06/2014 05:00:00 AM','12/11/2011 10:00:00 AM','13/10/2012 12:00:00 AM','13/12/2012 11:45:00 AM']})

df2 = pd.DataFrame({'subject_id': [101,101,101,101,202,202,202],
                   'test_date':['5/7/2013 09:27:00 AM','09/08/2013 11:21:00 AM','06/06/2014 08:00:00 AM','06/06/2014 05:00:00 AM','12/11/2011 10:00:00 AM','13/10/2012 12:00:00 AM','13/12/2012 11:45:00 AM']})

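I would like to change the shape of df2 to match df1. By shape, I mean the column names.

For example: I want df2 to look exactly like df1 in terms of column names, while keeping df2's values.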

I tried the following:

df2.rename(columns={'subject_id':'person_id', 'test_date':'login_date'}, inplace=True)
final_columns = df1.columns
previous_columns = df2.columns.tolist()
mapping = {previous_columns[i]: final_columns[i] for i in range(2)}
df2.rename(mapping, inplace=True)
final_df = df1.append(df2)

I expect my output to look like the one shown below

【Comments】:

    Tags: python python-3.x pandas dataframe numpy


    【Solution 1】:

    First, add a source column to both dataframes:

    df1['DATA FROM']='df1'
    df2['DATA FROM']='df2'
    

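    Finally, combine them.

    Via append() + rename() (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this form only works on older versions):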

    df1.append(df2.rename(columns={'subject_id':'person_id','test_date':'login_date'}))
    

    Via concat() + rename():

    pd.concat([df1,df2.rename(columns={'subject_id':'person_id','test_date':'login_date'})])
    

    Output:

       person_id person_type  test_id              login_date DATA FROM
    0        101           A      1.0    5/7/2013 09:27:00 AM       df1
    1        101           A      2.0  09/08/2013 11:21:00 AM       df1
    2        101           B      3.0  06/06/2014 08:00:00 AM       df1
    3        101           C      3.0  06/06/2014 05:00:00 AM       df1
    4        202           D      4.0  12/11/2011 10:00:00 AM       df1
    5        202           B      4.0  13/10/2012 12:00:00 AM       df1
    6        202           A      5.0  13/12/2012 11:45:00 AM       df1
    0        101         NaN      NaN    5/7/2013 09:27:00 AM       df2
    1        101         NaN      NaN  09/08/2013 11:21:00 AM       df2
    2        101         NaN      NaN  06/06/2014 08:00:00 AM       df2
    3        101         NaN      NaN  06/06/2014 05:00:00 AM       df2
    4        202         NaN      NaN  12/11/2011 10:00:00 AM       df2
    5        202         NaN      NaN  13/10/2012 12:00:00 AM       df2
    6        202         NaN      NaN  13/12/2012 11:45:00 AM       df2
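
The combined frame above keeps each input's original row labels, so the labels 0 to 6 appear twice. If a fresh 0..n-1 index is acceptable, one option is to pass `ignore_index=True` to `pd.concat`. The following is only a minimal sketch; the shortened two-row frames are made-up stand-ins for the question's data.

```python
import pandas as pd

# Shortened, illustrative stand-ins for the question's df1 and df2.
df1 = pd.DataFrame({'person_id': [101, 202],
                    'person_type': ['A', 'B'],
                    'test_id': [1, 2],
                    'login_date': ['5/7/2013 09:27:00 AM', '12/11/2011 10:00:00 AM']})
df2 = pd.DataFrame({'subject_id': [101, 202],
                    'test_date': ['06/06/2014 08:00:00 AM', '13/10/2012 12:00:00 AM']})

# Rename df2's columns to df1's names, then stack the frames.
renamed = df2.rename(columns={'subject_id': 'person_id', 'test_date': 'login_date'})

# ignore_index=True discards the old row labels and numbers the rows 0..n-1.
combined = pd.concat([df1, renamed], ignore_index=True)
print(combined)
```

Columns that exist only in df1 (`person_type`, `test_id`) are filled with NaN for the rows that come from df2.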
    

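    【Comments】:

    • A quick question. I followed the solution above, but when I run this map operation afterwards, df_final["person_id"] = df_final['person_id'].map(person_identifier), I get the following error: ValueError: cannot reindex from a duplicate axis
    • This did not happen before the append operation. I also verified that there are no duplicates in person_identifier. But yes, df_final['person_id'] does have duplicates (though that doesn't matter and is expected). Do you know what the problem might be?
    • @TheGreat You are getting this error because of duplicate values in person_identifier
    • @TheGreat Then try merging instead of mapping, or try df_final['person_id'].map(person_identifier.drop_duplicates()). If you hit the same error again, you will have to merge rather than map the values. Merging gives you the same result as mapping (except that duplicate keys produce extra rows, which you can then drop).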
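
One way to check the situation discussed in these comments: when a Series is passed to `Series.map`, the lookup goes through that Series's index, so repeated index labels in the mapper can trigger a reindexing error. The sketch below assumes `person_identifier` is a Series indexed by the old ID, which the thread does not show; its contents and the small `df_final` are made up for illustration.

```python
import pandas as pd

# Hypothetical lookup: index = old person_id, value = new identifier.
# The label 101 appears twice in the index on purpose.
person_identifier = pd.Series(['P1', 'P1', 'P2'], index=[101, 101, 202])

# Illustrative stand-in for the combined frame.
df_final = pd.DataFrame({'person_id': [101, 202, 101]})

# Check whether the mapper's index (not its values) contains duplicates.
print(person_identifier.index.is_unique)

# Keep only the first entry for each index label before mapping.
lookup = person_identifier[~person_identifier.index.duplicated(keep='first')]
df_final['person_id'] = df_final['person_id'].map(lookup)
print(df_final)
```

Note that `drop_duplicates()` removes repeated values, whereas `index.duplicated()` targets repeated index labels; which one applies depends on how the real `person_identifier` is built.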
    【Solution 2】:

    Try using pd.concat:

    import pandas as pd
    
    pd.concat([
        df1.assign(Data_From="df1"),
        df2.assign(Data_From="df2") \
            .rename(columns={"subject_id": "person_id", "test_date": "login_date"})
    ])
    

       person_id person_type  test_id              login_date Data_From
    0        101           A      1.0    5/7/2013 09:27:00 AM       df1
    1        101           A      2.0  09/08/2013 11:21:00 AM       df1
    2        101           B      3.0  06/06/2014 08:00:00 AM       df1
    3        101           C      3.0  06/06/2014 05:00:00 AM       df1
    4        202           D      4.0  12/11/2011 10:00:00 AM       df1
    5        202           B      4.0  13/10/2012 12:00:00 AM       df1
    6        202           A      5.0  13/12/2012 11:45:00 AM       df1
    0        101         NaN      NaN    5/7/2013 09:27:00 AM       df2
    1        101         NaN      NaN  09/08/2013 11:21:00 AM       df2
    2        101         NaN      NaN  06/06/2014 08:00:00 AM       df2
    3        101         NaN      NaN  06/06/2014 05:00:00 AM       df2
    4        202         NaN      NaN  12/11/2011 10:00:00 AM       df2
    5        202         NaN      NaN  13/10/2012 12:00:00 AM       df2
    6        202         NaN      NaN  13/12/2012 11:45:00 AM       df2
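
If the goal is only to give df2 the same column layout as df1 before combining anything, a possible approach is to rename the matching columns and then call `reindex(columns=df1.columns)`, which adds the missing columns filled with NaN and puts them in df1's order. This is a sketch under that assumption, using shortened made-up data rather than the question's full frames.

```python
import pandas as pd

# Shortened, illustrative stand-ins for the question's df1 and df2.
df1 = pd.DataFrame({'person_id': [101, 202],
                    'person_type': ['A', 'B'],
                    'test_id': [1, 2],
                    'login_date': ['5/7/2013 09:27:00 AM', '12/11/2011 10:00:00 AM']})
df2 = pd.DataFrame({'subject_id': [101, 202],
                    'test_date': ['06/06/2014 08:00:00 AM', '13/10/2012 12:00:00 AM']})

# Map df2's column names onto the corresponding df1 names.
renamed = df2.rename(columns={'subject_id': 'person_id', 'test_date': 'login_date'})

# Conform to df1's columns: missing ones are created and filled with NaN.
df2_like_df1 = renamed.reindex(columns=df1.columns)
print(df2_like_df1)
```

The values that df2 already had are kept; only the column labels and order change.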
    

    【Comments】:

      【Solution 3】:

      Use concat together with the keys parameter.

      df3 = pd.concat([df1,df2.rename(columns=
                            {'subject_id' : 'person_id',
                            'test_date' : 'login_date'})],
                   join='outer',
                   keys=['df1','df2'])
      

      Then use .loc to slice your df.

      print(df3.loc['df1'])
      
         person_id person_type  test_id              login_date
      0        101           A      1.0    5/7/2013 09:27:00 AM
      1        101           A      2.0  09/08/2013 11:21:00 AM
      2        101           B      3.0  06/06/2014 08:00:00 AM
      3        101           C      3.0  06/06/2014 05:00:00 AM
      4        202           D      4.0  12/11/2011 10:00:00 AM
      5        202           B      4.0  13/10/2012 12:00:00 AM
      6        202           A      5.0  13/12/2012 11:45:00 AM
      

      print(df3)

             person_id person_type  test_id              login_date
      df1 0        101           A      1.0    5/7/2013 09:27:00 AM
          1        101           A      2.0  09/08/2013 11:21:00 AM
          2        101           B      3.0  06/06/2014 08:00:00 AM
          3        101           C      3.0  06/06/2014 05:00:00 AM
          4        202           D      4.0  12/11/2011 10:00:00 AM
          5        202           B      4.0  13/10/2012 12:00:00 AM
          6        202           A      5.0  13/12/2012 11:45:00 AM
      df2 0        101         NaN      NaN    5/7/2013 09:27:00 AM
          1        101         NaN      NaN  09/08/2013 11:21:00 AM
          2        101         NaN      NaN  06/06/2014 08:00:00 AM
          3        101         NaN      NaN  06/06/2014 05:00:00 AM
          4        202         NaN      NaN  12/11/2011 10:00:00 AM
          5        202         NaN      NaN  13/10/2012 12:00:00 AM
          6        202         NaN      NaN  13/12/2012 11:45:00 AM
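
The keyed result above can also be flattened so that the key becomes an ordinary column, similar to the source column used in the other answers. The sketch below assumes the level names `source` and `row`, which are hypothetical names chosen here, and uses shortened made-up data.

```python
import pandas as pd

# Shortened, illustrative stand-ins for the question's df1 and df2.
df1 = pd.DataFrame({'person_id': [101, 202],
                    'person_type': ['A', 'B'],
                    'test_id': [1, 2],
                    'login_date': ['5/7/2013 09:27:00 AM', '12/11/2011 10:00:00 AM']})
df2 = pd.DataFrame({'subject_id': [101, 202],
                    'test_date': ['06/06/2014 08:00:00 AM', '13/10/2012 12:00:00 AM']})

renamed = df2.rename(columns={'subject_id': 'person_id', 'test_date': 'login_date'})

# keys labels each input frame; names labels the two resulting index levels.
df3 = pd.concat([df1, renamed], keys=['df1', 'df2'], names=['source', 'row'])

# Select one input frame by its key.
print(df3.loc['df2'])

# Move the 'source' level out of the index and into a regular column.
flat = df3.reset_index(level='source')
print(flat)
```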
      

      【Comments】:
