How to select rows with NaN in a particular column? Answers

【Question title】: How to select rows with NaN in particular column?
【Posted】: 2022-03-29 04:34:06
【Question description】:

Given this DataFrame, how do I select only the rows where "Col2" is NaN?

df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)], columns=["Col1", "Col2", "Col3"])

It looks like this:

   Col1  Col2  Col3
0     0   1.0   2.0
1     0   NaN   0.0
2     0   0.0   NaN
3     0   1.0   2.0
4     0   1.0   2.0

The result should be:

   Col1  Col2  Col3
1     0   NaN   0.0

【Question comments】:

Tags: python pandas dataframe

【Solution 1】:

Try the following:

df[df['Col2'].isnull()]
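
As a minimal, self-contained sketch of this answer, the snippet below rebuilds the question's frame and applies the mask. It uses np.nan and explicit lists in place of np.NaN and range(3) (the np.NaN alias was removed in NumPy 2.0); the names mask and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question
df = pd.DataFrame(
    [[0, 1, 2], [0, np.nan, 0], [0, 0, np.nan], [0, 1, 2], [0, 1, 2]],
    columns=["Col1", "Col2", "Col3"],
)

# Boolean Series: True where Col2 is NaN
mask = df["Col2"].isnull()

# Boolean indexing keeps only the rows where the mask is True
result = df[mask]
print(result)
```

isnull() is an alias of isna(), so either spelling should give the same mask.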

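【Comments】:

Or df.loc[df['Col2'].isnull()] if you prefer .loc
Q: How do I negate this, i.e. select the rows where the column is not null? A: Use the .notnull() method.
How do I select rows of the df where multiple columns are null? Not when any one of them is null, but only when every column in a given set is null.
@NaveenReddyMarthala Try this: df[df['Col1'].isnull() & df['Col2'].isnull()]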
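
The two follow-ups in the comments above (negating the check, and requiring a whole set of columns to be null) could be sketched as follows. The sample frame and the names cols, not_null and both_null are made up for illustration and are not from the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 2 has NaN in both Col1 and Col2
df = pd.DataFrame(
    [[0, 1, 2], [0, np.nan, 0], [np.nan, np.nan, 5]],
    columns=["Col1", "Col2", "Col3"],
)

# Negation: keep rows where Col2 is present
not_null = df[df["Col2"].notnull()]

# Rows where every column in the chosen set is null
cols = ["Col1", "Col2"]
both_null = df[df[cols].isnull().all(axis=1)]

print(not_null)
print(both_null)
```

Selecting the columns first and calling .all(axis=1) avoids writing one & term per column.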

【Solution 2】:

@qbzenker provided the most idiomatic method, IMO.

Here are a few alternatives:

In [28]: df.query('Col2 != Col2') # Using the fact that: np.nan != np.nan
Out[28]:
   Col1  Col2  Col3
1     0   NaN   0.0

In [29]: df[np.isnan(df.Col2)]
Out[29]:
   Col1  Col2  Col3
1     0   NaN   0.0

【Comments】:

【Solution 3】:

If you want to select rows that have at least one NaN value, you can use isna + any on axis=1:

df[df.isna().any(axis=1)]

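If you want to select rows that have a certain number of NaN values, you can use isna + sum on axis=1 together with gt. For example, the following fetches rows that have at least 2 NaN values: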

df[df.isna().sum(axis=1)>1]
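
To make the counting step visible, here is a small sketch that keeps the per-row NaN count in its own variable before filtering. The sample frame and the names nan_counts, at_least_two and exactly_one are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with 0, 1, 2 and 3 NaN values per row
df = pd.DataFrame(
    [[0, 1, 2], [0, np.nan, 0], [np.nan, np.nan, 5], [np.nan, np.nan, np.nan]],
    columns=["Col1", "Col2", "Col3"],
)

# Number of NaN values in each row
nan_counts = df.isna().sum(axis=1)

# At least two NaN values (same as nan_counts > 1)
at_least_two = df[nan_counts.gt(1)]

# Exactly one NaN value
exactly_one = df[nan_counts.eq(1)]

print(nan_counts.tolist())
```

Swapping gt for eq, ge, lt or le changes the threshold without touching the rest of the expression.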

If you want to restrict the check to specific columns, you can select them first and then check:

df[df[['Col1', 'Col2']].isna().any(axis=1)]

If you want to select rows whose values are all NaN, you can use isna + all on axis=1:

df[df.isna().all(axis=1)]

If you want to select rows that have no NaN values, you can use notna + all on axis=1:

df[df.notna().all(axis=1)]

This is equivalent to:

df[df['Col1'].notna() & df['Col2'].notna() & df['Col3'].notna()]

That can get tedious when there are many columns. Instead, you can chain the & operator with functools.reduce:

import functools, operator
df[functools.reduce(operator.and_, (df[i].notna() for i in df.columns))]

Or with numpy.logical_and.reduce:

import numpy as np
df[np.logical_and.reduce([df[i].notna() for i in df.columns])]
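
As a quick sanity check, the sketch below compares both reduce variants against notna().all(axis=1) on the question's frame. The variable names are illustrative only.

```python
import functools
import operator

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2], [0, np.nan, 0], [0, 0, np.nan], [0, 1, 2], [0, 1, 2]],
    columns=["Col1", "Col2", "Col3"],
)

# Reference result: rows with no NaN in any column
expected = df[df.notna().all(axis=1)]

# Fold the per-column masks together with &
via_functools = df[
    functools.reduce(operator.and_, (df[c].notna() for c in df.columns))
]

# NumPy stacks the masks into a 2-D array and ANDs them along the first axis
via_numpy = df[np.logical_and.reduce([df[c].notna() for c in df.columns])]

print(via_functools.equals(expected), via_numpy.equals(expected))
```

All three expressions should select the same rows here, so notna().all(axis=1) is usually the shortest way to write it.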

If you want to use query to filter for rows that have no NaN in certain columns, you can do so by passing engine='python':

df.query('Col2.notna()', engine='python')

Or use the fact that NaN != NaN, as @MaxU - stop WAR against UA did:

df.query('Col2==Col2')
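
The query variants can be checked against plain boolean indexing. This is a sketch on the question's frame with illustrative variable names, and it assumes a pandas version in which method calls inside query work with engine='python'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2], [0, np.nan, 0], [0, 0, np.nan], [0, 1, 2], [0, 1, 2]],
    columns=["Col1", "Col2", "Col3"],
)

# Reference result using a boolean mask
expected = df[df["Col2"].notna()]

# Method call inside the query string, evaluated by the Python engine
via_method = df.query("Col2.notna()", engine="python")

# NaN is never equal to itself, so Col2 == Col2 is False only for NaN
via_self_eq = df.query("Col2 == Col2")

# The opposite test selects the NaN rows instead
nan_rows = df.query("Col2 != Col2")

print(via_method.equals(expected), via_self_eq.equals(expected))
```

The self-comparison trick needs no engine argument, but it is less explicit about intent than notna().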

【Comments】: