Remove rows from dataframe df1 if their column values exist in another dataframe df2 (answers)

【Question title】: Remove rows from dataframe df1 if their column values exist in another dataframe df2
【Posted】: 2020-10-13 12:22:40
【Question description】:

I tried this:

res = df1[~(getattr(df1, 'A').isin(getattr(df2, 'A')) & getattr(df1, 'C').isin(getattr(df2, 'C')))]

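It works, but the list of columns is variable; in this example it is columns = ['A', 'C']. How can I loop over it so that the expression above is built dynamically from the values in the list "columns"?

Example, df1: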

       A      B  C   D
0     oo    one  0   0
1    bar   one1  1   2
2    foo   two2  2   4
3    bar   one1  3   6
4    foo    two  4   8
5    bar    two  5  10
6    foo    one  6  12
7  fowwo  three  7  14

df2:

       A      B  C   D
0     oo    one  0   0
2    foo   two2  2   4
3    bar   one1  3   6
4    foo    two  4   8
5    bar    two  5  10
6    foo    one  6  12
7  fowwo  three  7  14

res:

     A     B  C  D
1  bar  one1  1  2

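【Comments】:

Can you explain how you got the output bar one1? It seems to appear in both dataframes as well.
Yes, that is correct; I just changed the column list from ['A','B'] to ['A', 'C']. Thanks for mentioning it.

Tags: pandas dataframe duplicates compare multiple-columns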

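【Solution 1】:

Use:

import pandas as pd
column_list = ["A", "C"]
# Keep the rows of df1 where at least one of the listed columns has a value not found in the same column of df2
df1[(~pd.concat((df1[col].isin(df2[col]) for col in column_list), axis=1)).any(axis=1)]

Output: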

    A   B       C   D
1   bar one1    1   2
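
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's df1 and df2 and builds the mask from column_list. The name `present` is only illustrative, and building df2 by dropping row 1 from df1 is a shortcut that happens to match the tables in the question. Note that this check is done per column: a row is dropped when each of its listed values occurs somewhere in the matching column of df2, not necessarily in the same row.

```python
import pandas as pd

# Data copied from the question
df1 = pd.DataFrame({
    "A": ["oo", "bar", "foo", "bar", "foo", "bar", "foo", "fowwo"],
    "B": ["one", "one1", "two2", "one1", "two", "two", "one", "three"],
    "C": range(8),
    "D": range(0, 16, 2),
})
df2 = df1.drop(index=1)  # in the question, df2 is df1 without row 1

column_list = ["A", "C"]

# One boolean column per name in column_list:
# does df1's value occur anywhere in the same column of df2?
present = pd.DataFrame({col: df1[col].isin(df2[col]) for col in column_list})

# Drop a row only when all of its listed columns were found in df2
res = df1[~present.all(axis=1)]
print(res)
```

By De Morgan's law, `~present.all(axis=1)` selects the same rows as `(~present).any(axis=1)` in the one-liner above.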

Edit

The new case you explained in the comments can be solved with merge.

DataFrames:

import numpy as np
import pandas as pd
df3= pd.DataFrame({'A': '1010994595 1017165396 1020896102 1028915753 1028915753 1030811227 1033837508 1047224448 1047559040 1053827106 1094815936 1113339076 1115345471 1121416375 1122392586 1122981502 1132224809 '.split(), 'B': '99203 99232 99233 99231 99291 99291 99232 99232 99242 99232 99244 G0425 99213 99203 99606 99243 99214'.split(), 'C': np.arange(17), 'D': np.arange(17) * 2})
df4= pd.DataFrame({'A': '1115345471 1113339076 1020896102 1047224448 1053827106 1121416375 1122392586 1028915753 1132224809 1030811227 1094815936 1033837508 1047559040 1122981502 1028915753 1030811227 1017165396 '.split(), 'B': '99213 G0425 99291 99232 99291 99243 99606 99291 99214 99291 99244 99233 99242 99243 99291 99291 99232 '.split(), 'C': np.arange(17), 'D': np.arange(17) * 2})

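Code to select the rows of df4 that are not in df3 (comparing only the columns in list_col):

list_col = ["A","B"]
# Merge only on the key columns of df3 so each df4 row yields exactly one merged row;
# .to_numpy() applies the mask by position, so it also works when df4 has a non-default index
df4[df4.merge(df3[list_col].drop_duplicates(), on=list_col, how='left', indicator=True)["_merge"].eq("left_only").to_numpy()]

Output: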

    A           B       C   D
2   1020896102  99291   2   4
4   1053827106  99291   4   8
5   1121416375  99243   5   10
11  1033837508  99233   11  22

If you want to reset the index of the new table, append .reset_index(drop=True) at the end.

【Comments】:
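
If you need this in more than one place, the merge approach can be wrapped in a small function. This is only a sketch: `anti_join` is a hypothetical name, not a pandas function, and the tiny `left`/`right` frames are made up for illustration. It assumes the columns in `on` exist in both frames.

```python
import pandas as pd

def anti_join(left, right, on):
    """Hypothetical helper: rows of `left` whose values in `on` have no matching row in `right`."""
    # Deduplicate the keys so the left merge returns exactly one row per row of `left`
    keys = right[on].drop_duplicates()
    merged = left.merge(keys, on=on, how="left", indicator=True)
    # A left merge keeps the row order of `left`, so the mask can be applied by position
    mask = merged["_merge"].eq("left_only").to_numpy()
    return left[mask]

# Made-up data with a non-default index on the left frame
left = pd.DataFrame({"A": ["x", "y", "z"], "B": ["1", "2", "3"], "C": [10, 20, 30]}, index=[5, 6, 7])
right = pd.DataFrame({"A": ["x", "z", "z"], "B": ["1", "9", "9"], "C": [0, 0, 0]})

result = anti_join(left, right, on=["A", "B"])
print(result)
```

Only ("x", "1") has a matching pair in `right`, so the rows labelled 6 and 7 are kept with their original index.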

I need the comparison between df1 and df2 to be based on specific columns, not on all columns; otherwise it will not give the same result!
import pandas as pd
import numpy as np
df3 = pd.DataFrame({'A': '1010994595 1017165396 1020896102 1028915753 1028915753 1030811227 1033837508 1047224448 1047559040 1053827106 1094815936 1113339076 1115345471 1121416375 1122392586 1122981502 1132224809'.split(), 'B': '99203 99232 99233 99231 99291 99291 99232 99232 99242 99232 99244 G0425 99213 99203 99606 99243 99214'.split(), 'C': np.arange(17), 'D': np.arange(17) * 2})
df4 = pd.DataFrame({'A': '1115345471 1113339076 1020896102 1047224448 1053827106 1121416375 1122392586 1028915753 1132224809 1030811227 1094815936 1033837508 1047559040 1122981502 1028915753 1030811227 1017165396'.split(), 'B': '99213 G0425 99291 99232 99291 99243 99606 99291 99214 99291 99244 99233 99242 99243 99291 99291 99232 '.split(), 'C': np.arange(17), 'D': np.arange(17) * 2})
df4[(~pd.concat((getattr(df4, col).isin(getattr(df3, col)) for col in ['A', 'B']), axis=1 )).any(1)]
For this example it does not give the correct output (sorry about the formatting, I could not format it as code in this comment).
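
A likely reason for the mismatch reported in this comment: the isin expression tests each column independently, so a row of one frame counts as "present" whenever each of its values occurs somewhere in the other frame, even in different rows. The toy frames below are made up to show the difference; comparing whole (A, B) pairs through a MultiIndex is one possible way to do a row-wise check, shown here as a sketch rather than as the method used in the answer.

```python
import pandas as pd

# Made-up frames: same values per column, but paired differently
left = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
right = pd.DataFrame({"A": [1, 2], "B": ["y", "x"]})
cols = ["A", "B"]

# Column-wise: every value of left["A"] and left["B"] occurs somewhere in right
per_column = pd.concat([left[c].isin(right[c]) for c in cols], axis=1).all(axis=1)

# Row-wise: compare whole (A, B) pairs
per_row = pd.MultiIndex.from_frame(left[cols]).isin(pd.MultiIndex.from_frame(right[cols]))

dropped_by_column_check = left[per_column]  # both rows look "present"
kept_by_row_check = left[~per_row]          # neither pair actually exists in right
print(dropped_by_column_check)
print(kept_by_row_check)
```

The column-wise mask would remove both rows, while the row-wise mask keeps both, because neither (1, "x") nor (2, "y") appears as a pair in `right`.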

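【Solution 2】:

The answer is:

columns = ['A', 'B']
# Rows of df1 whose values in `columns` also occur in df2; reset_index() carries df1's
# original index through the merge as a column named 'index' (assumes an unnamed index)
common = df1.reset_index().merge(df2[columns].drop_duplicates(), on=columns)
res = df1[~df1.index.isin(common['index'])]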

【Comments】: