The difference between comparison to np.nan and isnull(): answers

【Question title】: The difference between comparison to np.nan and isnull()
【Posted】: 2017-05-11 14:08:34
【Question description】:

I thought that

data[data.agefm.isnull()]

and

data[data.agefm == numpy.nan]

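were equivalent. They are not: the first really does return the rows where agefm is NaN, but the second returns an empty DataFrame. I thought that missing values are always equal to np.nan, but that seems to be wrong.

The agefm column has dtype float64: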

(Pdb) data.agefm.describe()
count    2079.000000
mean       20.686388
std         5.002383
min        10.000000
25%        17.000000
50%        20.000000
75%        23.000000
max        46.000000
Name: agefm, dtype: float64

Could you please explain what data[data.agefm == np.nan] actually means?

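【Comments】:

I think you should use np.isnan.
They are not the same: stackoverflow.com/questions/20320022/…
@Divakar So, if that is what I should do, is this a gap in pandas or am I making a conceptual mistake?
I am not very familiar with pandas' isnull method, but we use np.isnan to detect NaNs in NumPy arrays.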

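Tags: python pandas numpy

【Solution 1】:

np.nan cannot be compared with np.nan directly: an equality test against NaN is always False, even when both sides are NaN.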

np.nan == np.nan

False

While

np.isnan(np.nan)

True

You could also do

pd.isnull(np.nan)

True

Examples
This filters nothing, because nothing is equal to np.nan (so s != np.nan is True for every element):

s = pd.Series([1., np.nan, 2.])
s[s != np.nan]

0    1.0
1    NaN
2    2.0
dtype: float64

This filters out all the nulls:

s = pd.Series([1., np.nan, 2.])
s[s.notnull()]

0    1.0
2    2.0
dtype: float64

We can also exploit the odd comparison behavior to get what we want. Since np.nan != np.nan is True, s == s is False exactly where s is NaN, so

s = pd.Series([1., np.nan, 2.])
s[s == s]

0    1.0
2    2.0
dtype: float64
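As a small check of the self-comparison trick above, the following sketch compares the mask from s == s with the one from s.notnull() on the same float Series. The names self_equal, not_null and same_mask are made up for illustration, and the check only covers a float64 Series like the one in this answer.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0])

# NaN is never equal to itself, so this mask is False only at NaN positions.
self_equal = s == s

# The explicit, more readable way to build the same mask.
not_null = s.notnull()

# For this float Series the two boolean masks are identical.
same_mask = self_equal.equals(not_null)
print(same_mask)
```

In practice s.notnull() (or s.dropna()) states the intent more clearly than s[s == s].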

Or just use dropna:

s = pd.Series([1., np.nan, 2.])
s.dropna()

0    1.0
2    2.0
dtype: float64
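Applied to the situation in the question, the same behavior can be sketched on a DataFrame. The data below is a made-up stand-in for the asker's data (only the column name agefm comes from the question), and the names eq_mask, empty and missing are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame; the values are invented.
data = pd.DataFrame({"agefm": [20.0, np.nan, 17.0, np.nan]})

# Element-wise == against NaN is False for every row,
# so this boolean mask selects nothing.
eq_mask = data.agefm == np.nan
empty = data[eq_mask]

# isnull() marks the NaN rows, so this selects rows 1 and 3.
missing = data[data.agefm.isnull()]

print(len(empty))
print(len(missing))
```

So data[data.agefm == np.nan] is a valid expression; it simply builds an all-False mask and therefore returns an empty DataFrame.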

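【Discussion】:

Are np.isnan and pd.isnull equivalent?
@sergzach No, but they are close. I believe pd.isnull treats more things as null.
@sergzach np.isnan detects NaN but not None, and you cannot use np.isnan on object arrays.
Your examples are good, but in practice one probably should not use s[s == s]? It looks ambiguous.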
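The difference between np.isnan and pd.isnull mentioned in the comments can be sketched as follows. The array obj and the flag isnan_failed are illustrative names only; the sketch assumes an object-dtype array holding both None and NaN.

```python
import numpy as np
import pandas as pd

# An object array mixing a float, None and NaN.
obj = np.array([1.0, None, np.nan], dtype=object)

# pd.isnull treats both None and NaN as missing.
mask = pd.isnull(obj)
print(mask)

# np.isnan is not defined for object arrays and raises TypeError.
try:
    np.isnan(obj)
    isnan_failed = False
except TypeError:
    isnan_failed = True
print(isnan_failed)
```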