How to do a full outer join excluding the intersection between two pandas dataframes? - Answers

【Question title】: How to do a full outer join excluding the intersection between two pandas dataframes?
【Posted】: 2022-01-19 14:13:20
【Question description】:

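I have two datasets with the same column headers. I want to drop every row that is 100% identical in both and keep only the rows they do not have in common. How can I do this?

Thanks for your time!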

【Question discussion】:

Tags: python pandas join duplicates

【Solution 1】:

To get everything except the intersection of two pandas datasets, try the following:

# Everything in the first frame that is not in the second
r1 = df1[~df1.isin(df2)]

# Everything in the second frame that is not in the first
r2 = df2[~df2.isin(df1)]

# Concatenate and drop the rows that contain NaNs
result = pd.concat(
    [r1, r2]
).dropna().reset_index(drop=True)
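To make the intermediate steps visible, here is a minimal sketch that builds two small frames in memory (hypothetical data, chosen to match the example further down) and inspects the mask. Note that `isin` with a DataFrame argument compares cells at the same index and column label, so this assumes the shared rows sit at the same positions in both frames.

```python
import pandas as pd

# Hypothetical in-memory data; rows 0 and 1 are identical in both frames
df1 = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})
df2 = pd.DataFrame({"A": [1, 4, 10], "B": [2, 5, 11], "C": [3, 6, 12]})

# True where the cell at the same index and column label is equal in df2
mask = df1.isin(df2)
print(mask)

# Indexing with the inverted mask keeps unmatched cells and puts NaN elsewhere
r1 = df1[~mask]
print(r1)

# dropna() then removes every row that contains at least one NaN
kept = r1.dropna()
print(kept)
```

Because `dropna()` removes a row as soon as one of its cells is NaN, a row that matches in only some columns would also be dropped under this approach.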

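There is one caveat, though: when you filter with a boolean mask, your int values may turn into floats. By default pandas replaces the unwanted (False) cells with NaN, which is a float, and casts the whole column to float. You can see this happen in the example below.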

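To avoid this, declare the data type explicitly when you create the dataframe, for example with the nullable integer type dtype='Int64'.

Example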
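A small sketch of the dtype fix, assuming the frames are built in memory rather than read from CSV (the data is hypothetical and mirrors the example files). With the nullable `Int64` dtype the masked cells should become `<NA>` instead of NaN, so the columns are expected to stay integer; the exact behavior may vary between pandas versions.

```python
import pandas as pd

# Hypothetical in-memory frames; "Int64" (capital I) is pandas' nullable integer dtype
df1 = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]}, dtype="Int64")
df2 = pd.DataFrame({"A": [1, 4, 10], "B": [2, 5, 11], "C": [3, 6, 12]}, dtype="Int64")

# Same filtering as in the answer; masked cells become <NA> rather than float NaN
r1 = df1[~df1.isin(df2)]
r2 = df2[~df2.isin(df1)]

result = pd.concat([r1, r2]).dropna().reset_index(drop=True)

print(result)
print(result.dtypes)
```

When reading from files, the same idea would be `pd.read_csv(path, dtype="Int64")`, provided every column holds integers.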

import pandas as pd

# Add dtype="Int64" to the read_csv calls to keep the columns as integers
df1 = pd.read_csv("./csv1.csv")  # , dtype="Int64")
print(f"csv1\n{df1}\n")

df2 = pd.read_csv("./csv2.csv")  # , dtype="Int64")
print(f"csv2\n{df2}\n")

# Everything in the first frame that is not in the second
r1 = df1[~df1.isin(df2)]
# Everything in the second frame that is not in the first
r2 = df2[~df2.isin(df1)]

# Concatenate and drop the rows that contain NaNs
result = pd.concat(
    [r1, r2]
).dropna().reset_index(drop=True)

print(f"result\n{result}\n")

Input

csv1
   A   B   C
0  1   2   3
1  4   5   6
2  7   8   9

csv2
    A   B   C
0   1   2   3
1   4   5   6
2  10  11  12

Output

result
      A     B     C
0   7.0   8.0   9.0
1  10.0  11.0  12.0
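The `isin` comparison above works cell by cell on matching index labels, so it depends on identical rows sitting at the same index in both frames. As an alternative that is closer to the "full outer join excluding the intersection" wording of the question, here is a sketch using `DataFrame.merge` with `how="outer"` and `indicator=True`. This is a different technique from the one in the answer, and the data is hypothetical, with the shared rows deliberately in a different order.

```python
import pandas as pd

# Hypothetical data: the two shared rows appear in a different order in df2
df1 = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})
df2 = pd.DataFrame({"A": [4, 1, 10], "B": [5, 2, 11], "C": [6, 3, 12]})

# With no `on` argument, merge joins on all columns the frames share.
# indicator=True adds a "_merge" column: "left_only", "right_only" or "both".
merged = df1.merge(df2, how="outer", indicator=True)

# Keep only the rows that are not present in both frames
result = (
    merged[merged["_merge"] != "both"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)

print(result)
```

Since every column is a join key here, no NaN is introduced and the integer columns keep their dtype. If either frame contains duplicate rows, the merge may repeat them, so deduplicating first might be needed.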

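【Discussion】:

This might work. Essentially, I want to get rid of what the two have in common and keep only what they don't have in common. So maybe a join would be a better answer.
My bad, I misunderstood. You want everything except the intersection. I'll update my answer!
Thank you so much, pbsb! I knew there had to be a way to do this. I'll give it a try, but from your explanation it makes a lot of sense!
PBSB - thanks again! Your suggestion worked great!
Nice! Glad I could help.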