【Question title】:Merging two irregular data frames in Python
【Posted】:2018-06-05 09:14:20
【Question】:

I have two data frames, df1 and df2:

    ID      Range(US)            Count(US)          Mean(US)
0   690      1-3                 266                4.0
1            4-7                 277                NaN
2   354      1-3                 233                2.0
3            4-7                 85                 NaN
4   947      1-3                 156                4.0

    ID   Range(UK)           Count(UK)          Mean(UK)
0   690      1-3                 186                4.0
1            4-7                 25                 NaN
2   354      1-3                 44                 1.0
3   947      1-3                 213                3.0
4            4-7                 33                 NaN

I merge them with this code:
In: df = df1.merge(df2, left_on='deviceid', right_on='deviceid', how='left')
    df

 ID  Range(US)   Count(US)    Mean(US)   Range(UK)  Count(UK)    Mean(UK)       
 0  690    1-3      266         4.0        1-3        186         4.0
 1         4-7      277         NaN        4-7        25          NaN
 2         4-7      277         NaN        4-7        33          NaN
 3  354    1-3      233         2.0        1-3        44          1.0
 4         4-7      85          NaN        4-7        25          NaN
 5         4-7      85          NaN        4-7        33          NaN
 6  947    1-3      156         4.0        1-3        213         3.0

As the output above shows, when a value is missing on one side, the other rows are repeated.
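The repetition can be reproduced with a minimal sketch. It assumes the blank IDs are stored as empty strings (the question does not say how they are stored) and leaves out the Mean columns for brevity. Every blank ID on the left then matches every blank ID on the right, so each 4-7 row appears twice.

```python
import pandas as pd

# Assumption: blank IDs are empty strings, as the printed frames suggest.
df1 = pd.DataFrame({
    "ID": ["690", "", "354", "", "947"],
    "Range(US)": ["1-3", "4-7", "1-3", "4-7", "1-3"],
    "Count(US)": [266, 277, 233, 85, 156],
})
df2 = pd.DataFrame({
    "ID": ["690", "", "354", "947", ""],
    "Range(UK)": ["1-3", "4-7", "1-3", "1-3", "4-7"],
    "Count(UK)": [186, 25, 44, 213, 33],
})

# Merging on ID alone: the key "" occurs twice in each frame,
# so the join produces 2 x 2 = 4 rows for it.
merged = df1.merge(df2, on="ID", how="left")
blank_rows = merged[merged["ID"] == ""]
print(len(merged), len(blank_rows))
```

Three uniquely keyed rows plus four blank-key combinations give the seven rows seen in the question.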

But the expected output is:

   ID  Range(US)   Count(US)  Mean(US)   Range(UK)  Count(UK)    Mean(UK)       
 0  690    1-3      266         4.0        1-3        186         4.0
 1         4-7      277         NaN        4-7        25          NaN
 2  354    1-3      233         2.0        1-3        44          1.0
 3         4-7      85          NaN        NaN        NaN         NaN
 4  947    1-3      156         4.0        1-3        213         3.0
 5         4-7      NaN         NaN        4-7        33          NaN

【Comments】:

    Tags: python python-3.x pandas dataframe merge


    【Solution 1】:

    First, remove the lines that replace duplicated IDs in the DataFrames with empty strings:

    #df1['ID'] = df1['ID'].mask(df1['ID'].duplicated(), '') 
    #df2['ID'] = df2['ID'].mask(df2['ID'].duplicated(), '') 
    
    print (df1)
        ID Range(US)  Count(US)  Mean(US)
    0  690       1-3        266       4.0
    1  690       4-7        277       NaN
    2  354       1-3        233       2.0
    3  354       4-7         85       NaN
    4  947       1-3        156       4.0
    
    print (df2)
        ID Range(UK)  Count(UK)  Mean(UK)
    0  690       1-3        186       4.0
    1  690       4-7         25       NaN
    2  354       1-3         44       1.0
    3  947       1-3        213       3.0
    4  947       4-7         33       NaN
    

    Then merge on both columns with an outer join:

    df = df1.merge(df2, left_on=['ID', 'Range(US)'], right_on=['ID', 'Range(UK)'], how='outer')
    print (df)
        ID Range(US)  Count(US)  Mean(US) Range(UK)  Count(UK)  Mean(UK)
    0  690       1-3      266.0       4.0       1-3      186.0       4.0
    1  690       4-7      277.0       NaN       4-7       25.0       NaN
    2  354       1-3      233.0       2.0       1-3       44.0       1.0
    3  354       4-7       85.0       NaN       NaN        NaN       NaN
    4  947       1-3      156.0       4.0       1-3      213.0       3.0
    5  947       NaN        NaN       NaN       4-7       33.0       NaN
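If the blank IDs are already in the data and the masking step cannot simply be skipped, one possible workaround is to restore them by forward-filling before the merge. This is only a sketch: it assumes blanks are empty strings and that each blank row belongs to the nearest ID above it, neither of which the question confirms.

```python
import pandas as pd

# Assumption: blank IDs are empty strings and belong to the ID above them.
df1 = pd.DataFrame({
    "ID": ["690", "", "354", "", "947"],
    "Range(US)": ["1-3", "4-7", "1-3", "4-7", "1-3"],
    "Count(US)": [266, 277, 233, 85, 156],
})
df2 = pd.DataFrame({
    "ID": ["690", "", "354", "947", ""],
    "Range(UK)": ["1-3", "4-7", "1-3", "1-3", "4-7"],
    "Count(UK)": [186, 25, 44, 213, 33],
})

# Turn empty strings into NaN, then carry the last seen ID downwards.
for frame in (df1, df2):
    frame["ID"] = frame["ID"].mask(frame["ID"].eq("")).ffill()

# Outer join on the (ID, range) pair, as in the answer above.
merged = df1.merge(
    df2,
    left_on=["ID", "Range(US)"],
    right_on=["ID", "Range(UK)"],
    how="outer",
)
print(merged)
```

Row order in the result can differ between pandas versions, so sort explicitly if the order matters.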
    

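    【Discussion】:

    • What is df in df1['ID'] = df1['ID'].mask(df['ID'].duplicated(), '')?
    • @san - That was a typo; it should be #df1['ID'] = df1['ID'].mask(df1['ID'].duplicated(), '') #df2['ID'] = df2['ID'].mask(df2['ID'].duplicated(), '')
    • I'm still not getting the expected output.
    • Does it help if you change outer to left or inner?
    • Can you explain more? Is the problem with your real data? Or do you get output different from mine when you use the sample data?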
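To make the question about join types concrete, the sketch below counts the rows each `how` option returns on the sample data with full IDs. The frames are retyped from the answer with the Mean columns left out, and `indicator=True` is added here only to show which side each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [690, 690, 354, 354, 947],
    "Range(US)": ["1-3", "4-7", "1-3", "4-7", "1-3"],
    "Count(US)": [266, 277, 233, 85, 156],
})
df2 = pd.DataFrame({
    "ID": [690, 690, 354, 947, 947],
    "Range(UK)": ["1-3", "4-7", "1-3", "1-3", "4-7"],
    "Count(UK)": [186, 25, 44, 213, 33],
})

keys = dict(left_on=["ID", "Range(US)"], right_on=["ID", "Range(UK)"])

# Number of rows returned by each join type.
counts = {how: len(df1.merge(df2, how=how, **keys))
          for how in ("inner", "left", "right", "outer")}
print(counts)

# The _merge column marks rows as both, left_only or right_only.
outer = df1.merge(df2, how="outer", indicator=True, **keys)
sides = outer["_merge"].value_counts().to_dict()
print(sides)
```

Only the outer join keeps both the US-only row (354, 4-7) and the UK-only row (947, 4-7), which is what the expected output in the question requires.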