Drop rows with sub-string value given: answers

【Question title】: Drop rows with sub-string value given
【Posted】: 2019-03-27 10:51:17
【Question】:

Drop rows from a DataFrame when any of the given substrings appears in a particular column of that row.

df:

Parent  Child   score
1stqw   Whoert      0.305125
tWowe   Tasert      0.308132
Worert  Picert      0.315145

substrings = [Wor, Tas]

Drop the rows that contain any of the substrings.

Updated df:

Parent  Child   score
1stqw   Whoert      0.305125

Thanks!!

【Question discussion】:

Tags: python python-3.x python-2.7 pandas dataframe

【Solution 1】:

You can concatenate the two columns and then use pd.Series.str.contains:

L = ['Wor', 'Tas']

df = df[~(df['Parent'] + df['Child']).str.contains('|'.join(L))]

print(df)

  Parent   Child     score
0  1stqw  Whoert  0.305125

For efficiency/performance, see "Pandas filtering for multiple substrings in series".
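A self-contained version of the concatenation approach, as a minimal sketch. It rebuilds the sample frame from the question and adds two precautions that are not in the answer above and are my own assumptions about what is wanted: `re.escape` so that the substrings are matched literally rather than as regular expressions, and a newline separator between the two columns so that a match cannot straddle the end of `Parent` and the start of `Child`.

```python
import re
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Parent": ["1stqw", "tWowe", "Worert"],
    "Child": ["Whoert", "Tasert", "Picert"],
    "score": [0.305125, 0.308132, 0.315145],
})
L = ["Wor", "Tas"]

# Assumption: substrings should match literally, so escape regex metacharacters.
pattern = "|".join(re.escape(s) for s in L)

# Assumption: a newline never occurs in the data or the substrings, so it is a
# safe separator that prevents matches spanning both columns.
combined = df["Parent"] + "\n" + df["Child"]

# Keep only the rows whose combined text contains none of the substrings.
result = df[~combined.str.contains(pattern)]
print(result)
```

Without the separator, a `Parent` ending in "W" followed by a `Child` starting with "or" would be treated as containing "Wor", which may or may not be the intended behaviour.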

【Discussion】:

【Solution 2】:

Use str.contains with apply on a subset of the DataFrame, then add any to test whether each row has at least one True:

substrings = ['Wor', 'Tas']  # the substrings from the question
cols = ['Parent', 'Child']
mask = df[cols].apply(lambda x: x.str.contains('|'.join(substrings))).any(axis=1)

Or chain the boolean masks together with | (bitwise OR):

mask = (df['Parent'].str.contains('|'.join(substrings)) | 
        df['Child'].str.contains('|'.join(substrings)))

df = df[~mask]
print(df)
  Parent   Child     score
0  1stqw  Whoert  0.305125
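The apply-based mask can be wrapped into a small function, sketched below. The helper name `drop_rows_with_substrings` is hypothetical, and the fourth row of the sample frame (with a missing `Parent`) is made up for illustration. The sketch also passes `na=False`, an addition of mine, so that missing values count as "no match" and the mask stays strictly boolean.

```python
import re
import pandas as pd

def drop_rows_with_substrings(df, cols, substrings):
    """Hypothetical helper: drop rows where any column in `cols`
    contains any of `substrings` (matched literally)."""
    pattern = "|".join(re.escape(s) for s in substrings)
    # One boolean column per tested column; na=False maps missing values to False.
    per_column = df[cols].apply(lambda col: col.str.contains(pattern, na=False))
    # A row is dropped if at least one tested column matched.
    mask = per_column.any(axis=1)
    return df[~mask]

# First three rows are from the question; the last row is illustrative only.
df = pd.DataFrame({
    "Parent": ["1stqw", "tWowe", "Worert", None],
    "Child": ["Whoert", "Tasert", "Picert", "Tasxyz"],
    "score": [0.305125, 0.308132, 0.315145, 0.4],
})

filtered = drop_rows_with_substrings(df, ["Parent", "Child"], ["Wor", "Tas"])
print(filtered)
```

Testing each column separately, rather than concatenating them, avoids accidental matches across column boundaries and makes it easy to change the list of columns.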

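【Discussion】:

Thanks, dude!! After this step I got some duplicate rows, and I don't know why.
@vijayathithya - There may be some duplicates in the data. Try df = df.drop_duplicates(), or, if necessary, check only some columns for dupes: df = df.drop_duplicates(subset=['Parent', 'Child'])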
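To illustrate the difference between the two drop_duplicates calls in the comment above, here is a small sketch. The frame is invented for the example and is not the asker's data.

```python
import pandas as pd

# Illustrative data: rows 0 and 1 are identical; row 2 differs only in score.
df = pd.DataFrame({
    "Parent": ["1stqw", "1stqw", "1stqw"],
    "Child": ["Whoert", "Whoert", "Whoert"],
    "score": [0.305125, 0.305125, 0.999],
})

# Removes rows that are identical in every column (keeps the first occurrence).
exact = df.drop_duplicates()

# Removes rows that repeat the same Parent/Child pair, regardless of score.
by_key = df.drop_duplicates(subset=["Parent", "Child"])

print(len(exact), len(by_key))
```

Filtering with a boolean mask does not create rows by itself, so duplicates seen afterwards were most likely already present in the original data.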