How to disregard NaN data points in a numpy array and generate normalized data in Python? Answers

【Question title】: How to disregard the NaN data point in numpy array and generate the normalized data in Python?
【Posted】: 2016-10-11 12:28:18
【Question description】:

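Suppose I have a numpy array that contains some float('nan') values. I don't want to impute those values right now; I want to normalize the data first and keep the NaN entries at their original positions. Is there any way to do that?

Previously I used the normalize function in sklearn.preprocessing, but that function does not seem to accept any array containing NaN as input.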

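【Question discussion】:

Your question is unclear. Do you want to normalize an array that contains NaN and ignore the NaN values?
I want to ignore the NaN values.

Tags: python numpy scipy scikit-learn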

【Solution 1】:

You can mask your array with the numpy.ma.array function and then apply any numpy operation:

import numpy as np

a = np.random.rand(10)            # Generate random data.
a = np.where(a > 0.8, np.nan, a)  # Set all data larger than 0.8 to NaN

a = np.ma.array(a, mask=np.isnan(a)) # Use a mask to mark the NaNs

a_norm  = a / np.sum(a) # The sum function ignores the masked values.
a_norm2 = a / np.std(a) # The std function ignores the masked values.

You can still access your original data:

print(a.data)

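【Discussion】:

Great, then how can I recover the NaN values?
What do you mean by recover?
I want to put those NaN values back into the array.
They stay there. The numpy operations simply skip the masked NaN values.
Awesome! Thank you very much!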
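
To make the "they stay there" point concrete, here is a minimal sketch of turning a normalized masked array back into a plain ndarray with NaN in the masked slots. The sample values and the variable names (masked, normalized, restored) are assumptions chosen for illustration, not part of the original answer.

```python
import numpy as np

# Illustrative input (assumed values): one NaN among three finite numbers
a = np.array([1.0, np.nan, 3.0, 4.0])
masked = np.ma.array(a, mask=np.isnan(a))  # hide the NaN entry

# The sum skips the masked entry, so this divides by 1 + 3 + 4 = 8
normalized = masked / masked.sum()

# filled() returns a plain ndarray with NaN written into the masked positions
restored = normalized.filled(np.nan)
print(restored)
```

The filled() call is only needed when downstream code expects a regular ndarray; otherwise the masked array can be used as it is.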

【Solution 2】:

You can use numpy.nansum to compute the norm while ignoring nan:

In [54]: x
Out[54]: array([  1.,   2.,  nan,   3.])

This is the norm with nan ignored:

In [55]: np.sqrt(np.nansum(np.square(x)))
Out[55]: 3.7416573867739413

y is the normalized array:

In [56]: y = x / np.sqrt(np.nansum(np.square(x)))

In [57]: y
Out[57]: array([ 0.26726124,  0.53452248,         nan,  0.80178373])

In [58]: np.linalg.norm(y[~np.isnan(y)])
Out[58]: 1.0
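
The same idea extends to 2-D data. The sketch below normalizes each row to unit L2 norm, which is the default behaviour of sklearn.preprocessing.normalize, while leaving NaN entries in place. The helper name nan_l2_normalize and the sample matrix are hypothetical, and the zero-norm guard is an added assumption rather than something from the answer.

```python
import numpy as np

def nan_l2_normalize(X, axis=1):
    """Scale X to unit L2 norm along `axis`, leaving NaN entries where they are.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # nansum treats NaN as zero; keepdims lets the norms broadcast against X
    norms = np.sqrt(np.nansum(np.square(X), axis=axis, keepdims=True))
    norms[norms == 0] = 1.0  # assumed guard: avoid 0/0 for all-zero or all-NaN rows
    return X / norms

X = np.array([[3.0, np.nan, 4.0],
              [np.nan, 2.0, 0.0]])
Y = nan_l2_normalize(X)
print(Y)
```

Passing axis=0 instead would normalize each column, which may be closer to what is wanted when rows are samples and columns are features.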

【Discussion】:

【Solution 3】:

The nansum and np.ma.array answers are good options; however, those functions are not as commonly used or as explicit (IMHO) as the following:

import numpy as np

def rms(arr):
    arr = np.array(arr)  # Sanitize the input
    # Root-mean-square over the finite entries only (NaN and inf are dropped)
    return np.sqrt(np.mean(np.square(arr[np.isfinite(arr)])))

print(rms([np.nan, -1, 0, 1]))
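
For comparison, here is a small sketch that computes the same root-mean-square in two ways: with a boolean np.isfinite mask and with np.nanmean. The variable names are assumptions for illustration. The two agree on this input, but they are not interchangeable in general, because np.isfinite also drops infinities while np.nanmean drops only NaN.

```python
import numpy as np

values = np.array([np.nan, -1.0, 0.0, 1.0])

# Boolean-mask version: drops NaN and +/-inf before averaging
rms_mask = np.sqrt(np.mean(np.square(values[np.isfinite(values)])))

# nanmean version: drops NaN only
rms_nanmean = np.sqrt(np.nanmean(np.square(values)))

# Dividing by the RMS keeps the NaN in its original position
scaled = values / rms_mask

print(rms_mask, rms_nanmean)
print(scaled)
```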

【Discussion】: