Remove all rows in a pandas dataframe with mixed data types that contain a specific string in multiple columns: answers

【Question title】: remove all rows in pandas dataframe with mixed data types that contain a specific string in multiple columns
【Posted】: 2018-02-23 18:58:56
【Question description】:

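How can I remove all rows in a dataframe where the row contains "9999-Don't Know" in any column?

I have been able to find solutions that remove rows based on value format (string, number, etc.) across the entire dataframe, that remove rows based on the value in one specific column, or that remove rows from a dataframe with only a few columns by using the column names.

This is the closest thing I found, but that solution does not work for me because I cannot type out all the column names: there are too many of them (76+ columns).

Below is a sample dataset: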

import pandas as pd

# DataFrame.from_items was removed in pandas 1.0; a dict passed to the constructor keeps the column order.
df = pd.DataFrame({
    'RespondentId': ['1ghi3g', '335hduu', '4vlsiu4', '5nnvkkt', '634deds', '7kjng'],
    'Satisfaction - Timing': ['9-Excellent', '9-Excellent', '9999-Don\'t Know', '8-Very Good', '1-Very Unsatisfied', '9999-Don\'t Know'],
    'Response Speed - Time': ['9999-Don\'t Know', '9999-Don\'t Know', '9-Excellent', '9-Excellent', '9-Excellent', '9-Excellent'],
})

After removing the 4 rows that contain '9999-Don't Know', the output should look like the following, so that I can write a new Excel file with the cleaned data.

# Expected result, built with the plain constructor (from_items was removed in pandas 1.0).
pd.DataFrame({
    'RespondentId': ['5nnvkkt', '634deds'],
    'Satisfaction - Timing': ['8-Very Good', '1-Very Unsatisfied'],
    'Response Speed - Time': ['9-Excellent', '9-Excellent'],
})

【Question discussion】:

Tags: python pandas

【Solution 1】:

Use

In [677]: df[~(df == "9999-Don't Know").any(axis=1)]
Out[677]:
  RespondentId Satisfaction - Timing Response Speed - Time
3      5nnvkkt           8-Very Good           9-Excellent
4      634deds    1-Very Unsatisfied           9-Excellent
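
To make the one-liner above easier to follow, here is a minimal self-contained sketch that rebuilds the sample data from the question and splits the filter into steps. The names `SENTINEL`, `cell_hits`, `hit_mask` and `cleaned` are illustrative and not part of the original answer.

```python
import pandas as pd

SENTINEL = "9999-Don't Know"  # illustrative name for the value to filter on

df = pd.DataFrame({
    "RespondentId": ["1ghi3g", "335hduu", "4vlsiu4", "5nnvkkt", "634deds", "7kjng"],
    "Satisfaction - Timing": ["9-Excellent", "9-Excellent", SENTINEL,
                              "8-Very Good", "1-Very Unsatisfied", SENTINEL],
    "Response Speed - Time": [SENTINEL, SENTINEL, "9-Excellent",
                              "9-Excellent", "9-Excellent", "9-Excellent"],
})

# Element-wise comparison gives a boolean frame with the same shape as df.
cell_hits = df == SENTINEL

# any(axis=1) collapses each row to True if at least one cell matched.
hit_mask = cell_hits.any(axis=1)

# Keep only the rows without a match; no column names have to be listed.
cleaned = df[~hit_mask]
print(cleaned)
```

Because the comparison runs over the whole frame, the same code should apply unchanged to a frame with 76 or more columns.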

Or

In [683]: df[(df != "9999-Don't Know").all(axis=1)]
Out[683]:
  RespondentId Satisfaction - Timing Response Speed - Time
3      5nnvkkt           8-Very Good           9-Excellent
4      634deds    1-Very Unsatisfied           9-Excellent

Equivalently

In [686]: df[~df.eq("9999-Don't Know").any(axis=1)]
Out[686]:
  RespondentId Satisfaction - Timing Response Speed - Time
3      5nnvkkt           8-Very Good           9-Excellent
4      634deds    1-Very Unsatisfied           9-Excellent

Or

In [687]: df[df.ne("9999-Don't Know").all(axis=1)]
Out[687]:
  RespondentId Satisfaction - Timing Response Speed - Time
3      5nnvkkt           8-Very Good           9-Excellent
4      634deds    1-Very Unsatisfied           9-Excellent
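
The four variants above are the same filter written in two ways: "no cell in the row equals the value" is logically the same as "every cell in the row differs from the value". The sketch below checks this on a small made-up frame; `toy`, `drop_if_any` and `keep_if_all` are hypothetical names used only for illustration.

```python
import pandas as pd

value = "9999-Don't Know"

# Small made-up frame, not the data from the question.
toy = pd.DataFrame({
    "a": ["x", value, "y", "z"],
    "b": [value, value, "w", "v"],
})

drop_if_any = ~toy.eq(value).any(axis=1)   # no cell equals the value
keep_if_all = toy.ne(value).all(axis=1)    # every cell differs from the value

# Both masks select exactly the same rows.
assert drop_if_any.equals(keep_if_all)
print(toy[keep_if_all])
```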

For mixed column dtypes, see @PiR's comment and cast with df.astype(object) first

In [695]: df[df.astype(object).ne("9999-Don't Know").all(axis=1)]
Out[695]:
  RespondentId Satisfaction - Timing Response Speed - Time
3      5nnvkkt           8-Very Good           9-Excellent
4      634deds    1-Very Unsatisfied           9-Excellent
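
As a sketch of the mixed-dtype case, the example below adds a hypothetical numeric column `Score` (not in the question's data) next to the string columns. Casting with `astype(object)` makes every column hold plain Python objects before the comparison, which is the workaround suggested in the comment. Whether the comparison fails without the cast depends on the pandas version, so no error is reproduced here.

```python
import pandas as pd

value = "9999-Don't Know"

# 'Score' is a hypothetical numeric column added to create mixed dtypes.
df = pd.DataFrame({
    "RespondentId": ["1ghi3g", "5nnvkkt", "634deds"],
    "Score": [7, 8, 1],
    "Satisfaction - Timing": [value, "8-Very Good", "1-Very Unsatisfied"],
})

# Cast to object so numbers and strings are compared uniformly, cell by cell.
as_objects = df.astype(object)
keep = as_objects.ne(value).all(axis=1)

n_dropped = int((~keep).sum())   # number of rows that contain the value
cleaned = df[keep]               # index the original frame so its dtypes are kept

print(n_dropped)
print(cleaned.dtypes)
```

Indexing the original `df` with the mask, rather than the object-cast copy, is a design choice here so that numeric columns stay numeric in the result.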

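【Discussion】:

If you have mixed dtypes... use df.astype(object).ne("9999-Don't Know").all(axis=1)
@John Galt. Thanks for your solution, but it did not work. I got TypeError: Could not compare ["9999-Don't Know"] with block values. I got the same error when I tried to count the rows that contain this string. The whole question can be found here
I was trying a solution using replace, but this is very fast. +1
Maybe add regex in the replace
Ah, I had not seen that earlier question. As @piRSquared mentioned, converting the dtype with df.astype(object) did the trick!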
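
The comments mention a `replace`-based alternative without showing it. One possible reading, offered only as an assumption about what was meant, is to turn the value into `NaN` and then drop the affected rows with `dropna`. This variant also drops rows that already had missing values, so it is not a drop-in replacement for the boolean mask.

```python
import numpy as np
import pandas as pd

value = "9999-Don't Know"

# Shortened version of the question's sample data.
df = pd.DataFrame({
    "RespondentId": ["1ghi3g", "5nnvkkt", "634deds", "7kjng"],
    "Satisfaction - Timing": ["9-Excellent", "8-Very Good", "1-Very Unsatisfied", value],
    "Response Speed - Time": [value, "9-Excellent", "9-Excellent", "9-Excellent"],
})

# Exact-match replacement of the value with NaN, then drop any row holding a NaN.
cleaned = df.replace(value, np.nan).dropna(how="any")
print(cleaned)
```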