How to replace all values in a numpy array with zero except one specific value? Answers

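【Question title】: How to replace all values in a numpy array with zero except one specific value?
【Posted】: 2018-06-26 18:07:12
【Question description】:

I have a 2D numpy array with 'n' unique values. I want to produce a binary matrix in which all values are replaced with zero, except for a value I specify, which is assigned one.

For example, I have the array below, and I want every instance of 35 to be assigned one: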

array([[12, 35, 12, 26],
       [35, 35, 12, 26]])

I am trying to get the following output:

array([[0, 1, 0, 0],
       [1, 1, 0, 0]])

What is the most efficient way to do this in Python?

【Question comments】:

Use numpy.zeros(), but save the indices of the value you want. Then put a one at those indices.
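The suggestion in this comment can be sketched roughly as follows. The variable names (`arr`, `result`, `rows`, `cols`) are illustrative and not from the original post, and `np.nonzero` is just one way of collecting the indices.

```python
import numpy as np

arr = np.array([[12, 35, 12, 26],
                [35, 35, 12, 26]])

# Start from an all-zero array with the same shape as the input
result = np.zeros(arr.shape, dtype=np.uint8)

# Save the indices where the target value occurs, then set them to one
rows, cols = np.nonzero(arr == 35)
result[rows, cols] = 1

print(result)
```

This works, but the answers below reach the same result in a single expression.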

Tags: python numpy matrix multidimensional-array

【Solution 1】:

A more elegant way, compared with all the other solutions, is to use np.isin():

>>> arr
array([[12, 35, 12, 26],
       [35, 35, 12, 26]])

# get the result as binary matrix
>>> np.isin(arr, 35).astype(np.uint8)
array([[0, 1, 0, 0],
       [1, 1, 0, 0]], dtype=uint8)

np.isin() returns a boolean mask that is True where the given element (here 35) is present in the original array and False elsewhere.
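As a small illustration of that mask, the sketch below inspects the intermediate boolean array before any cast. The second call, which tests against several values at once, goes beyond what the question asks and is included only to show why np.isin() can be convenient; the names `mask` and `mask_multi` are made up for this example.

```python
import numpy as np

arr = np.array([[12, 35, 12, 26],
                [35, 35, 12, 26]])

# Boolean mask with the same shape as arr
mask = np.isin(arr, 35)
print(mask.dtype, mask.shape)   # bool (2, 4)

# np.isin() also accepts a list of test values
mask_multi = np.isin(arr, [35, 26])
print(mask_multi.astype(np.uint8))
```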

Another variant is to use np.asarray() with the dtype np.uint8 to convert the boolean result, which is faster:

In [18]: np.asarray(np.isin(x, 35), dtype=np.uint8)
Out[18]: 
array([[0, 1, 0, 0],
       [1, 1, 0, 0]], dtype=uint8)

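Benchmarks

By explicitly casting the boolean result to uint8, we can get more than a 3x performance gain. (Thanks to @Divakar for pointing this out!) See the timings below: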

# setup (large) input array
In [3]: x = np.arange(25000000)
In [4]: x[0] = 35
In [5]: x[1000000] = 35
In [6]: x[2000000] = 35
In [7]: x[-1] = 35
In [8]: x = x.reshape((5000, 5000))

# timings
In [20]: %timeit np.where(x==35, 1, 0)
427 ms ± 25.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [21]: %timeit (x == 35) + 0
450 ms ± 72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [22]: %timeit (x == 35).astype(np.uint8)
126 ms ± 37.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# the fastest choice to go for!    
In [23]: %timeit np.isin(x, 35).astype(np.uint8)
115 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [24]: %timeit np.asarray(np.isin(x, 35), dtype=np.uint8)
117 ms ± 2.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

If you want a real workhorse, use numexpr, as shown below:

In [8]: import numexpr as ne

In [9]: %timeit ne.evaluate("x==35").astype(np.uint8)
23 ms ± 2.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

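This is approx. 20 times faster than the slowest approach using NumPy-based computation.

Finally, if views are okay, we can get similarly dramatic speedups using NumPy methods alone.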

In [13]: %timeit (x == 35).view(np.uint8)
20.1 ms ± 93.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [15]: %timeit np.isin(x, 35).view(np.uint8)
30.2 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

(Thanks again to @Divakar for mentioning these super nice tricks!)
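The difference between `.astype(np.uint8)` and `.view(np.uint8)` can be seen in a minimal sketch like the one below. It assumes the small example array from the question rather than the benchmark array, and the names `as_copy` and `as_view` are illustrative.

```python
import numpy as np

x = np.array([[12, 35, 12, 26],
              [35, 35, 12, 26]])

mask = x == 35
as_copy = mask.astype(np.uint8)   # allocates a new array
as_view = mask.view(np.uint8)     # reinterprets the same bytes, no copy

print(np.shares_memory(mask, as_view))   # True
print(np.shares_memory(mask, as_copy))   # False

# Because the view shares memory, writing through it also changes mask
as_view[0, 0] = 1
print(mask[0, 0])                        # True
```

The view avoids an allocation, which is presumably where the speedup comes from, but it is only safe when nothing else relies on the boolean array staying unchanged.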

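【Comments】:

【Solution 2】:

Another option is to use np.where; this solution is slower than @yuji's solution (see the timings below), but it is more flexible if you want to put in anything other than 0 and 1 (see the example below).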

import numpy as np
x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])
np.where(x==35, 1, 0)

yields

array([[0, 1, 0, 0],
       [1, 1, 0, 0]])

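One can read it like this: wherever x equals 35, put in a 1; everywhere else, put in a 0.

As mentioned, you now have a lot of flexibility; for example, you can also do the following:

np.where(x==35, np.sqrt(x), x - 3)

array([[ 9.        ,  5.91607978,  9.        , 23.        ],
       [ 5.91607978,  5.91607978,  9.        , 23.        ]])

So wherever x equals 35 you get the square root, and 3 is subtracted from all other values.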
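The same flexibility covers a literal reading of the question title, where the specific value is kept as-is and everything else becomes zero. This is a sketch under that interpretation, not something the original answer shows; the name `kept` is illustrative.

```python
import numpy as np

x = np.array([[12, 35, 12, 26],
              [35, 35, 12, 26]])

# Keep 35 where it occurs, replace every other value with 0
kept = np.where(x == 35, x, 0)
print(kept)
# [[ 0 35  0  0]
#  [35 35  0  0]]
```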

Timings:

%timeit np.where(x==35, 1, 0)
100000 loops, best of 3: 5.85 µs per loop

%timeit (x == 35).astype(int)
100000 loops, best of 3: 3.23 µs per loop

%timeit np.isin(x, 35).astype(int)
10000 loops, best of 3: 18.7 µs per loop

%timeit (x == 35) + 0
100000 loops, best of 3: 5.85 µs per loop

【Comments】:

【Solution 3】:

I like @yuji's approach. Very elegant!

Just for variety, here is another answer that takes a lot more work....

>>> import numpy as np
>>> x = np.array([[12, 35, 12, 26],[35, 35, 12, 26]])
>>> x
array([[12, 35, 12, 26],
       [35, 35, 12, 26]])
>>> y=np.zeros(x.shape)
>>> y[np.where(x==35)] = np.ones(len(np.where(x==35)[0]))
>>> y
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  1.,  0.,  0.]])
>>>

【Comments】:

@juanpa.arrivillaga I agree and have edited my answer, though it generally works fine

【Solution 4】:

import numpy as np
x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])
(x == 35) + 0

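array([[0, 1, 0, 0],
       [1, 1, 0, 0]])

【Comments】:

Oh, very clever. I went ahead and made a small edit. The idea is nice; however, I don't think boolean_array + 0 is the most efficient way to convert the dtype.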
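One way to see what this comment is getting at is to compare the dtypes the two conversions produce. The sketch below is only an illustration: the integer width that `+ 0` yields depends on the platform and NumPy version, so no specific width is claimed, and the names `summed` and `cast` are made up here.

```python
import numpy as np

x = np.array([[12, 35, 12, 26],
              [35, 35, 12, 26]])

summed = (x == 35) + 0              # promoted to a default-width signed integer
cast = (x == 35).astype(np.uint8)   # explicit one-byte unsigned integer

print(summed.dtype, summed.itemsize)
print(cast.dtype, cast.itemsize)
```

Both arrays hold the same zeros and ones; the explicit cast simply writes fewer bytes per element.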

【Solution 5】:

If your array is a numpy array, you can use the '==' operator on it to get a boolean array. Then use the astype method to turn it into 0s and 1s.

import numpy as np
my_array = np.array([[12, 35, 12, 26],
                     [35, 35, 12, 26]])

indexed = (my_array == 35).astype(int)

print(indexed)

【Comments】:

【Solution 6】:

import numpy as np
x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])
(x == 35).astype(int)

will give you:

array([[0, 1, 0, 0],
       [1, 1, 0, 0]])

The == operator in numpy performs an element-wise comparison, and when booleans are converted to integers, True is encoded as 1 and False as 0.
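A short sketch of that encoding, using the same array; the counting step is an extra illustration rather than part of the original answer.

```python
import numpy as np

x = np.array([[12, 35, 12, 26],
              [35, 35, 12, 26]])

mask = x == 35                  # element-wise comparison, boolean result
print(int(True), int(False))    # 1 0

# Since True counts as 1, summing the mask counts the matches
print(mask.sum())               # 3
print(np.count_nonzero(mask))   # 3
```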

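【Comments】:

@kmario23 Well, the fastest would be (x == 35).astype(), but with one of the 8-bit dtypes - stackoverflow.com/a/38988035. So it would be something like (x == 35).astype(np.uint8).
@kmario23 And leverage the numexpr module for large arrays - import numexpr as ne; ne.evaluate('x==35').astype(np.uint8) for a further speedup. Also, if views are okay, we can view those - (x == 35).view(np.uint8), and so on.
@Divakar: numexpr is indeed the fastest by far; thanks for this nice alternative.
@Divakar Thanks for all your suggestions! Added comprehensive timings below, including the approaches you suggested :)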