【Question title】: Show differences at row level between columns of 2 dataframes Pandas
【Posted】: 2020-07-30 08:53:23
【Question description】:

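I have 2 DataFrames containing names and some demographic information. Because of monthly changes, the two DataFrames are not identical.

I want to create another df that shows only the people whose COUNTRY, JOBCODE or MANAGERNAME column has changed, along with the type of each change.

So far I have tried the following code, and it detects changes in the COUNTRY column between the 2 DataFrames for the rows they have in common.

However, I'm not sure how to capture the movement itself in a MOVEMENT column. Any help is appreciated.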

import pandas as pd

# Merge first: keep only the people (keyed by EMAIL) present in both DataFrames
dfmerge = pd.merge(df1, df2, how='inner', on='EMAIL')

# Row-wise function that builds the COUNTRYCHANGE column
def change_in(row):
    if row['COUNTRY_x'] != row['COUNTRY_y']:
        return 'YES'
    else:
        return 'NO'

dfmerge['COUNTRYCHANGE'] = dfmerge.apply(change_in, axis=1)
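For comparison, the row-wise `apply` above can be replaced by a vectorized comparison on the merged frame, which also makes the MOVEMENT text easy to build. This is a minimal sketch, assuming the default `_x`/`_y` suffixes from the merge; the two-row `dfmerge` below is a made-up stand-in, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for dfmerge (default _x/_y merge suffixes).
dfmerge = pd.DataFrame({
    'EMAIL': ['jasonkelly@123.com', 'jongilman@123.com'],
    'COUNTRY_x': ['USA', 'CANADA'],
    'COUNTRY_y': ['VIETNAM', 'CANADA'],
})

# Boolean Series: True where the country differs between the two snapshots.
changed = dfmerge['COUNTRY_x'] != dfmerge['COUNTRY_y']

dfmerge['COUNTRYCHANGE'] = np.where(changed, 'YES', 'NO')
# Build the "FROM old TO new" text only for rows that changed.
dfmerge['COUNTRY_MOVEMENT'] = np.where(
    changed,
    'FROM ' + dfmerge['COUNTRY_x'] + ' TO ' + dfmerge['COUNTRY_y'],
    'NO',
)
print(dfmerge[['EMAIL', 'COUNTRYCHANGE', 'COUNTRY_MOVEMENT']])
```

For a numeric column such as JOBCODE, the values would need `.astype(str)` before being concatenated into the MOVEMENT text.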

DataFrame 1

NAME           EMAIL                COUNTRY      JOBCODE      MANAGERNAME
Jason Kelly    jasonkelly@123.com   USA          1221         Jon Gilman  
Jon Gilman     jongilman@123.com    CANADA       1222         Cindy Lee 
Jessica Lang   jessicalang@123.com  AUSTRALIA    1221         Esther Donato
Bob Wilder     bobwilder@123.com    ROMANIA      1355         Mike Lens 
Samir Bala     samirbala@123.com    CANADA       1221         Ricky Easton

DataFrame 2

NAME           EMAIL                COUNTRY      JOBCODE      MANAGERNAME
Jason Kelly    jasonkelly@123.com   VIETNAM      1221         Jon Gilman  
Jon Gilman     jongilman@123.com    CANADA       4464         Sheldon Tracey 
Jessica Lang   jessicalang@123.com  AUSTRALIA    2224         Esther Donato
Bob Wilder     bobwilder@123.com    ROMANIA      1355         Emilia Tanner 
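For a reproducible example, the two tables above can be typed in as DataFrames. This sketch assumes JOBCODE is stored as an integer; in the real data it might just as well be a string.

```python
import pandas as pd

cols = ['NAME', 'EMAIL', 'COUNTRY', 'JOBCODE', 'MANAGERNAME']

# DataFrame 1: the earlier snapshot (rows copied from the table above).
df1 = pd.DataFrame([
    ['Jason Kelly', 'jasonkelly@123.com', 'USA', 1221, 'Jon Gilman'],
    ['Jon Gilman', 'jongilman@123.com', 'CANADA', 1222, 'Cindy Lee'],
    ['Jessica Lang', 'jessicalang@123.com', 'AUSTRALIA', 1221, 'Esther Donato'],
    ['Bob Wilder', 'bobwilder@123.com', 'ROMANIA', 1355, 'Mike Lens'],
    ['Samir Bala', 'samirbala@123.com', 'CANADA', 1221, 'Ricky Easton'],
], columns=cols)

# DataFrame 2: the later snapshot (Samir Bala is no longer present).
df2 = pd.DataFrame([
    ['Jason Kelly', 'jasonkelly@123.com', 'VIETNAM', 1221, 'Jon Gilman'],
    ['Jon Gilman', 'jongilman@123.com', 'CANADA', 4464, 'Sheldon Tracey'],
    ['Jessica Lang', 'jessicalang@123.com', 'AUSTRALIA', 2224, 'Esther Donato'],
    ['Bob Wilder', 'bobwilder@123.com', 'ROMANIA', 1355, 'Emilia Tanner'],
], columns=cols)
```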

Desired output

EMAIL                COUNTRY_CHANGE COUNTRY_MOVEMENT     JOBCODE_CHANGE JOBCODE_MOVEMENT   MGR_CHANGE MGR_MOVEMENT
jasonkelly@123.com   YES            FROM USA TO VIETNAM  NO             NO                 NO         NO
jongilman@123.com    NO             NO                   YES            FROM 1222 TO 4464  YES        FROM Cindy Lee TO Sheldon Tracey
jessicalang@123.com  NO             NO                   YES            FROM 1221 TO 2224  NO         NO
bobwilder@123.com    NO             NO                   NO             NO                 YES        FROM Mike Lens TO Emilia Tanner

【Question discussion】:

    Tags: python-3.x pandas dataframe join merge


    【Solution 1】:

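    There is no built-in pandas function that does this directly, but we can build it on top of merge. We merge the data frames, giving the overlapping columns suffixes, and then report the differences between each suffixed pair with the following code.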

    import pandas as pd

    # Assuming df1 and df2 are the input data frames from the question.
    # NAME and EMAIL are the merge keys; every other shared column gets a suffix.
    df3 = pd.merge(df1, df2, on=['NAME', 'EMAIL'], suffixes=('past', 'present'))

    dfans = pd.DataFrame(index=df3.index)  # this is the final output data frame
    for column in df1.columns:
        past, present = '{}past'.format(column), '{}present'.format(column)
        if past not in df3.columns:
            # Merge keys (NAME, EMAIL) are not suffixed; copy them from the merged frame
            # so that the rows line up with the compared values.
            dfans[column] = df3[column]
        else:
            # Output column names such as COUNTRY_CHANGE and COUNTRY_MOVEMENT
            new_column1 = '{}_CHANGE'.format(column)
            new_column2 = '{}_MOVEMENT'.format(column)

            # 'YES' where the value differs between the two frames, otherwise 'NO'
            changed = df3[past] != df3[present]
            dfans[new_column1] = changed.map({True: 'YES', False: 'NO'})
            dfans[new_column2] = ['FROM {} TO {}'.format(x, y) if x != y else 'NO'
                                  for x, y in zip(df3[past], df3[present])]
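The loop above can be condensed into a reusable, vectorized function. This is a minimal sketch, not part of the original answer: `report_changes` is a hypothetical name, values are cast to `str` so that numeric columns like JOBCODE can be formatted into the MOVEMENT text, and, as in the answer, only keys present in both frames are reported (inner merge).

```python
import numpy as np
import pandas as pd


def report_changes(old, new, key, columns):
    """Hypothetical helper: one CHANGE/MOVEMENT column pair per tracked column."""
    # Inner merge on the key; tracked columns get 'past'/'present' suffixes.
    merged = pd.merge(old, new, on=key, suffixes=('past', 'present'))
    out = merged[[key]].copy()
    for column in columns:
        # Cast to str so numeric columns can be concatenated into the text.
        past = merged[column + 'past'].astype(str)
        present = merged[column + 'present'].astype(str)
        changed = past != present
        out[column + '_CHANGE'] = np.where(changed, 'YES', 'NO')
        out[column + '_MOVEMENT'] = np.where(
            changed, 'FROM ' + past + ' TO ' + present, 'NO')
    return out


# Small made-up subset of the question's data.
old = pd.DataFrame({
    'EMAIL': ['jasonkelly@123.com', 'jongilman@123.com', 'samirbala@123.com'],
    'COUNTRY': ['USA', 'CANADA', 'CANADA'],
    'JOBCODE': [1221, 1222, 1221],
})
new = pd.DataFrame({
    'EMAIL': ['jasonkelly@123.com', 'jongilman@123.com'],
    'COUNTRY': ['VIETNAM', 'CANADA'],
    'JOBCODE': [1221, 4464],
})
result = report_changes(old, new, 'EMAIL', ['COUNTRY', 'JOBCODE'])
print(result)
```

Because the merge is inner, samirbala@123.com (missing from `new`) does not appear in `result`. The output columns follow the source names (e.g. MANAGERNAME_CHANGE rather than MGR_CHANGE) and could be renamed afterwards with `DataFrame.rename`.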
    

    【Discussion】:
