【Question title】: Python - Pandas - finding matches between two data frames
【Posted】: 2021-05-08 05:33:10
【Question description】:

Suppose I have two pandas DataFrames that share the same column names, like this:

    name:       dob:       role:
James Franco   1-1-1980    Actor
Cameron Diaz   4-2-1976    Actor
Jim Carey      12-1-1968   Actor
Miley Cyrus    5-23-1987   Actor


    name:       dob:       role:
50 cent       4-6-1984     Singer
lil baby      12-1-1990    Singer
ghostmane     8-10-1989    Singer
Miley Cyrus   5-23-1987    Singer

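Suppose I want to identify individuals who have the same name and date of birth and who appear in both DataFrames (and who therefore have two different roles).

How can I do that?

If everything were in a single DataFrame, I would do something like df.groupby(["name", "dob"]).count().

I want to be able to identify these people, print them out, and count the number of occurrences.

Thanks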

【Question comments】:

    Tags: python python-3.x pandas


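    【Solution 1】:
    import pandas as pd
    df2 = pd.concat([df, df1], ignore_index=True)  # stack the two frames (DataFrame.append was removed in pandas 2.0)
    dfnew = df2[df2.duplicated(subset=["name:", "dob:"], keep=False)]  # keep every row whose name/dob pair occurs more than once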
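To make the approach above concrete, here is a minimal sketch that rebuilds the two tables from the question and also counts the occurrences the asker wanted. The variable names `actors`, `singers`, `combined`, `dupes`, and `counts` are illustrative and do not come from the original answer; the column names keep the trailing colon shown in the question.

```python
import pandas as pd

# Sample data copied from the question; frame names are hypothetical.
actors = pd.DataFrame({
    "name:": ["James Franco", "Cameron Diaz", "Jim Carey", "Miley Cyrus"],
    "dob:": ["1-1-1980", "4-2-1976", "12-1-1968", "5-23-1987"],
    "role:": ["Actor"] * 4,
})
singers = pd.DataFrame({
    "name:": ["50 cent", "lil baby", "ghostmane", "Miley Cyrus"],
    "dob:": ["4-6-1984", "12-1-1990", "8-10-1989", "5-23-1987"],
    "role:": ["Singer"] * 4,
})

keys = ["name:", "dob:"]

# Stack both frames, then keep every row whose (name, dob) pair repeats.
combined = pd.concat([actors, singers], ignore_index=True)
dupes = combined[combined.duplicated(subset=keys, keep=False)]

# Count how many times each repeated person appears.
counts = dupes.groupby(keys).size().reset_index(name="occurrences")

print(dupes)
print(counts)
```

Note that the filtered result lives in `dupes`, not in `combined`; printing the concatenated frame instead would show every row from both inputs.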
    

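    【Comments】:

    • Hmm, thanks, this is helpful, but when I implemented it as shown in your example, I simply got back a DataFrame containing all the rows from both DataFrames... so just the two DataFrames combined.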
    【Solution 2】:

    Well, this will give you only the matches:

    df1.merge(df2, on=["name:", "dob:"])

    Output:

             name:       dob: role:_x role:_y
    0  Miley Cyrus  5-23-1987   Actor  Singer
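As a runnable sketch of the inner merge shown above, the snippet below rebuilds the question's data and counts the matches. The frame names `actors` and `singers` and the `suffixes` values are assumptions chosen for readability; the original answer relies on the default `_x`/`_y` suffixes.

```python
import pandas as pd

# Sample data copied from the question; frame names are hypothetical.
actors = pd.DataFrame({
    "name:": ["James Franco", "Cameron Diaz", "Jim Carey", "Miley Cyrus"],
    "dob:": ["1-1-1980", "4-2-1976", "12-1-1968", "5-23-1987"],
    "role:": ["Actor"] * 4,
})
singers = pd.DataFrame({
    "name:": ["50 cent", "lil baby", "ghostmane", "Miley Cyrus"],
    "dob:": ["4-6-1984", "12-1-1990", "8-10-1989", "5-23-1987"],
    "role:": ["Singer"] * 4,
})

# The default how="inner" keeps only name/dob pairs present in both frames.
# suffixes renames the clashing "role:" columns so their origin is obvious.
matches = actors.merge(
    singers, on=["name:", "dob:"], suffixes=("_actor", "_singer")
)

print(matches)
print("people in both frames:", len(matches))
```

One caveat: if a person appeared several times in either frame, the merge would emit one row per combination, so `len(matches)` would count pairs rather than distinct people.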
    

    You can use an outer join to get all the results and filter them as needed:

    df1.merge(df2, how="outer", on=["name:", "dob:"])

    Output:

              name:       dob: role:_x role:_y
    0  James Franco   1-1-1980   Actor     NaN
    1  Cameron Diaz   4-2-1976   Actor     NaN
    2     Jim Carey  12-1-1968   Actor     NaN
    3   Miley Cyrus  5-23-1987   Actor  Singer
    4       50 cent   4-6-1984     NaN  Singer
    5      lil baby  12-1-1990     NaN  Singer
    6     ghostmane  8-10-1989     NaN  Singer
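One way to do the filtering mentioned above is the `indicator=True` option of `merge`, which adds a `_merge` column marking each row as `left_only`, `right_only`, or `both`. This is a sketch of that option rather than what the original answer used; the frame names `actors` and `singers` are again hypothetical.

```python
import pandas as pd

# Sample data copied from the question; frame names are hypothetical.
actors = pd.DataFrame({
    "name:": ["James Franco", "Cameron Diaz", "Jim Carey", "Miley Cyrus"],
    "dob:": ["1-1-1980", "4-2-1976", "12-1-1968", "5-23-1987"],
    "role:": ["Actor"] * 4,
})
singers = pd.DataFrame({
    "name:": ["50 cent", "lil baby", "ghostmane", "Miley Cyrus"],
    "dob:": ["4-6-1984", "12-1-1990", "8-10-1989", "5-23-1987"],
    "role:": ["Singer"] * 4,
})

# Outer join keeps every row; indicator=True records where each row came from.
merged = actors.merge(
    singers, how="outer", on=["name:", "dob:"], indicator=True
)

# Split the result by origin.
in_both = merged[merged["_merge"] == "both"]
in_one_only = merged[merged["_merge"] != "both"]

print(in_both)
print(merged["_merge"].value_counts())
```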
    
