NumPy indexing: set the max value to 1 and all others to zero

【Question title】: Numpy indexing set 1 to max value and zero's to all others
【Posted】: 2017-05-23 19:26:32
【Question】:

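I think I've misunderstood something about indexing in numpy.

I have a 3D numpy array of shape (dim_x, dim_y, dim_z). I would like to find the maximum along the third axis (dim_z), set it to 1, and set all the other values to zero.

The problem is that I end up with several 1s in the same row, even though the values are different.

Here is the code: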

>>> test = np.random.rand(2,3,2)
>>> test
array([[[ 0.13110146,  0.07138861],
        [ 0.84444158,  0.35296986],
        [ 0.97414498,  0.63728852]],

       [[ 0.61301975,  0.02313646],
        [ 0.14251848,  0.91090492],
        [ 0.14217992,  0.41549218]]])

>>> result = np.zeros_like(test)
>>> result[:test.shape[0], np.arange(test.shape[1]), np.argmax(test, axis=2)]=1
>>> result
array([[[ 1.,  0.],
        [ 1.,  1.],
        [ 1.,  1.]],

       [[ 1.,  0.],
        [ 1.,  1.],
        [ 1.,  1.]]])

I would like to end up with:

array([[[ 1., 0.],
        [ 1., 0.],
        [ 1., 0.]],

       [[ 1., 0.],
        [ 0., 1.],
        [ 0., 1.]]])

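I am probably missing something here. As I understand it, 0:dim_x, np.arange(dim_y) gives dim_x sets of dim_y tuples, and np.argmax(test, axis=dim_z) has shape (dim_x, dim_y), so if the indexing is of the form [x, y, z], a pair [x, y] should not appear twice.

Can anyone explain where I went wrong? Thanks in advance.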

【Question discussion】:

Tags: python numpy multidimensional-array

【Solution 1】:

What we are looking for

We get the argmax indices along the last axis -

idx = np.argmax(test, axis=2)

For the given sample data, idx is:

array([[0, 0, 0],
       [0, 1, 1]])

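Now, idx spans the first and second axes and holds the argmax index for each position along them.

To assign into the matching positions of the output, we need range arrays for the first two axes that cover the lengths of those axes and are aligned with the shape of idx. Here idx is a 2D array of shape (m,n), where m = test.shape[0] and n = test.shape[1].

So the range arrays for assigning into the first two axes of the output must be -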

X = np.arange(test.shape[0])[:,None]
Y = np.arange(test.shape[1])

Note that the first range array has to be extended to 2D so that it aligns with the rows of idx, while Y aligns with the columns of idx -

In [239]: X
Out[239]: 
array([[0],
       [1]])

In [240]: Y
Out[240]: array([0, 1, 2])

Schematically -

idx :
    Y array
    --------->
    x x x | X array
    x x x |
          v

Mistake in the original code

Your code was -

result[:test.shape[0], np.arange(test.shape[1]), ..

which is essentially:

result[:, np.arange(test.shape[1]), ...

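So you were selecting all elements along the first axis, instead of only the ones paired with the idx indices. In the process you selected many more elements than the assignment needed, which is why you see many more 1s in the result array than intended.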
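To make the difference concrete, here is a minimal sketch that compares the shapes selected by the two forms of indexing, using the sample array from the question. The names `cols`, `sliced` and `paired` are illustrative and do not appear in the original post.

```python
import numpy as np

# Sample data copied from the question
test = np.array([[[0.13110146, 0.07138861],
                  [0.84444158, 0.35296986],
                  [0.97414498, 0.63728852]],
                 [[0.61301975, 0.02313646],
                  [0.14251848, 0.91090492],
                  [0.14217992, 0.41549218]]])

idx = np.argmax(test, axis=2)        # shape (2, 3)
cols = np.arange(test.shape[1])      # shape (3,)

# Slice on axis 0: every row of idx is applied to every block of test
sliced = test[:, cols, idx]
print(sliced.shape)                  # (2, 2, 3) -> 12 elements

# Range array on axis 0: each (x, y) pair is matched with idx[x, y] only
paired = test[np.arange(test.shape[0])[:, None], cols, idx]
print(paired.shape)                  # (2, 3) -> the 6 maxima
```

With the slice, twelve positions are touched instead of six, which matches the extra 1s seen in the question.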

Correction

So the only correction needed is to index the first axis with a range array as well. A working solution would be -

result[np.arange(test.shape[0])[:,None], np.arange(test.shape[1]), ...

Alternative

Alternatively, use the range arrays X and Y created earlier -

result[X,Y,idx] = 1
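Putting the pieces together, the following is a self-contained sketch of this approach run on the sample array from the question. It only combines the statements already shown above.

```python
import numpy as np

# Sample data copied from the question
test = np.array([[[0.13110146, 0.07138861],
                  [0.84444158, 0.35296986],
                  [0.97414498, 0.63728852]],
                 [[0.61301975, 0.02313646],
                  [0.14251848, 0.91090492],
                  [0.14217992, 0.41549218]]])

idx = np.argmax(test, axis=2)            # shape (2, 3)
X = np.arange(test.shape[0])[:, None]    # shape (2, 1), aligned with rows of idx
Y = np.arange(test.shape[1])             # shape (3,),  aligned with cols of idx

result = np.zeros_like(test)
result[X, Y, idx] = 1                    # X, Y, idx broadcast together to (2, 3)
print(result)
```

Each (x, y) position now receives exactly one 1 along the last axis, which is the output the question asks for.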

Another way to get X,Y is with np.ogrid -

m,n = test.shape[:2]
X,Y = np.ogrid[:m,:n]
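For comparison, this small sketch shows the shapes that np.ogrid produces next to the dense grids from np.mgrid. Both can serve as index arrays here because they broadcast to the same (m, n) shape. The `idx` values are the ones from the sample data, and the names `Xd`, `Yd`, `a` and `b` are illustrative.

```python
import numpy as np

m, n = 2, 3
idx = np.array([[0, 0, 0],
                [0, 1, 1]])          # argmax indices from the sample data

X, Y = np.ogrid[:m, :n]              # open grids: shapes (2, 1) and (1, 3)
Xd, Yd = np.mgrid[:m, :n]            # dense grids: both of shape (2, 3)
print(X.shape, Y.shape)
print(Xd.shape, Yd.shape)

a = np.zeros((m, n, 2))
a[X, Y, idx] = 1                     # open grids broadcast against idx

b = np.zeros((m, n, 2))
b[Xd, Yd, idx] = 1                   # dense grids already match idx's shape

print(np.array_equal(a, b))
```

The open grids hold fewer elements, which is the usual reason to prefer np.ogrid when the arrays are only used for indexing.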

【Discussion】:

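【Solution 2】:

I think the problem lies in mixing basic (slice) and advanced indexing. This is easier to see when selecting values from an array than with this assignment; the mix can also result in transposed axes. For a problem like this it is better to use advanced indexing throughout, as provided by ix_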

In [24]: test = np.random.rand(2,3,2)
In [25]: idx=np.argmax(test,axis=2)
In [26]: idx
Out[26]: 
array([[1, 0, 1],
       [0, 1, 1]], dtype=int32)

Basic and advanced mixed:

In [31]: res1 = np.zeros_like(test)
In [32]: res1[:, np.arange(test.shape[1]), idx]=1
In [33]: res1
Out[33]: 
array([[[ 1.,  1.],
        [ 1.,  1.],
        [ 0.,  1.]],

       [[ 1.,  1.],
        [ 1.,  1.],
        [ 0.,  1.]]])

Advanced only:

In [35]: I,J = np.ix_(range(test.shape[0]), range(test.shape[1]))
In [36]: I
Out[36]: 
array([[0],
       [1]])
In [37]: J
Out[37]: array([[0, 1, 2]])
In [38]: res2 = np.zeros_like(test)
In [40]: res2[I, J , idx]=1
In [41]: res2
Out[41]: 
array([[[ 0.,  1.],
        [ 1.,  0.],
        [ 0.,  1.]],

       [[ 1.,  0.],
        [ 0.,  1.],
        [ 0.,  1.]]])

On further thought, using a slice for the 1st dimension is wrong if the goal is to set or find the 6 argmax values

In [54]: test
Out[54]: 
array([[[ 0.15288242,  0.36013289],
        [ 0.90794601,  0.15265616],
        [ 0.34014976,  0.53804266]],

       [[ 0.97979479,  0.15898605],
        [ 0.04933804,  0.89804999],
        [ 0.10199319,  0.76170911]]])
In [55]: test[I, J, idx]
Out[55]: 
array([[ 0.36013289,  0.90794601,  0.53804266],
       [ 0.97979479,  0.89804999,  0.76170911]])

In [56]: test[:, J, idx]
Out[56]: 
array([[[ 0.36013289,  0.90794601,  0.53804266],
        [ 0.15288242,  0.15265616,  0.53804266]],

       [[ 0.15898605,  0.04933804,  0.76170911],
        [ 0.97979479,  0.89804999,  0.76170911]]])

With the slice, it selects a (2,3,2) set of values from test (or res), not the intended (2,3). There are 2 extra rows.
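As a side note that is not part of the original answer: newer NumPy releases (1.15 and later, as far as I know) provide np.put_along_axis, which expresses the same "one index per position along an axis" assignment without building the range arrays by hand. The sketch below assumes such a version is installed, and uses np.random.default_rng (NumPy 1.17+) only to get reproducible sample data.

```python
import numpy as np

rng = np.random.default_rng(0)            # seeded only for reproducibility
test = rng.random((2, 3, 2))
idx = np.argmax(test, axis=2)             # shape (2, 3)

# Reference: the fully advanced indexing from this answer
I, J = np.ix_(range(test.shape[0]), range(test.shape[1]))
res_ix = np.zeros_like(test)
res_ix[I, J, idx] = 1

# Alternative: idx needs a trailing axis so it has the same ndim as the target
res_put = np.zeros_like(test)
np.put_along_axis(res_put, idx[..., None], 1, axis=2)

print(np.array_equal(res_ix, res_put))
```

Both versions write exactly one 1 per (row, column) pair; which one reads better is largely a matter of taste.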

【Discussion】:

【Solution 3】:

Here is a simpler approach:

>>>  test == test.max(axis=2, keepdims=1)
array([[[ True, False],
        [ True, False],
        [ True, False]],

       [[ True, False],
        [False,  True],
        [False,  True]]], dtype=bool)

...and if you really want it as floating-point 1.0 and 0.0, convert it:

>>> (test==test.max(axis=2, keepdims=1)).astype(float)
array([[[ 1.,  0.],
        [ 1.,  0.],
        [ 1.,  0.]],

       [[ 1.,  0.],
        [ 0.,  1.],
        [ 0.,  1.]]])
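One caveat of the comparison approach is how it behaves on ties. The following sketch uses a small made-up array (`tied` is not from the question) in which one row has two equal maxima, and contrasts the `==` mask with np.argmax.

```python
import numpy as np

# Hypothetical data: the first row has a tie, the second does not
tied = np.array([[[1.0, 1.0],
                  [0.2, 0.5]]])

mask = tied == tied.max(axis=2, keepdims=True)
print(mask.sum(axis=2))          # [[2 1]] -> two winners in the tied row

# argmax returns only the first occurrence of the maximum
print(np.argmax(tied, axis=2))   # [[0 1]]
```

If ties can occur and exactly one 1 per row is required, an argmax-based assignment is the safer choice.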

Here is an approach that gives exactly one winner per row-column combination (i.e. no ties, as discussed in the comments):

rowmesh, colmesh = np.meshgrid(range(test.shape[0]), range(test.shape[1]), indexing='ij')
maxloc = np.argmax(test, axis=2)
flatind = np.ravel_multi_index( [rowmesh, colmesh, maxloc ], test.shape )
result = np.zeros_like(test)
result.flat[flatind] = 1

Update after reading hpaulj's answer:

rowmesh, colmesh = np.ix_(range(test.shape[0]), range(test.shape[1]))

is a more efficient and more numpythonic alternative to my meshgrid call (the rest of the code stays the same)
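As a sanity check, the sketch below runs the flat-index version on the sample array from the question and compares it with direct advanced indexing. It uses the dense meshgrid form and flattens the index array before assigning, which is a conservative choice on my part rather than something the answer requires. The names `via_flat` and `via_index` are illustrative.

```python
import numpy as np

# Sample data copied from the question
test = np.array([[[0.13110146, 0.07138861],
                  [0.84444158, 0.35296986],
                  [0.97414498, 0.63728852]],
                 [[0.61301975, 0.02313646],
                  [0.14251848, 0.91090492],
                  [0.14217992, 0.41549218]]])

rowmesh, colmesh = np.meshgrid(np.arange(test.shape[0]),
                               np.arange(test.shape[1]), indexing='ij')
maxloc = np.argmax(test, axis=2)

# Convert (row, col, layer) triples into positions in the flattened array
flatind = np.ravel_multi_index((rowmesh, colmesh, maxloc), test.shape)
via_flat = np.zeros_like(test)
via_flat.flat[flatind.ravel()] = 1

# The same assignment with direct advanced indexing
via_index = np.zeros_like(test)
via_index[rowmesh, colmesh, maxloc] = 1

print(np.array_equal(via_flat, via_index))
```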

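Why your approach fails is hard to explain, but here is one place where intuition can start: your slicing approach says "all rows, times all columns, times a particular sequence of layers". How many elements does that slice contain in total? By contrast, how many elements do you actually want to set to 1? It can be instructive to look at the test values of the slice you are trying to assign to: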

>>> test[:, :, maxloc].shape
(2, 3, 2, 3)   # oops!  it's because maxloc itself is 2x3

>>> test[:, :, maxloc]
array([[[[ 0.13110146,  0.13110146,  0.13110146],
         [ 0.13110146,  0.07138861,  0.07138861]],

        [[ 0.84444158,  0.84444158,  0.84444158],
         [ 0.84444158,  0.35296986,  0.35296986]],

        [[ 0.97414498,  0.97414498,  0.97414498],
         [ 0.97414498,  0.63728852,  0.63728852]]],


       [[[ 0.61301975,  0.61301975,  0.61301975],
         [ 0.61301975,  0.02313646,  0.02313646]],

        [[ 0.14251848,  0.14251848,  0.14251848],
         [ 0.14251848,  0.91090492,  0.91090492]],

        [[ 0.14217992,  0.14217992,  0.14217992],
         [ 0.14217992,  0.41549218,  0.41549218]]]])  # note the repetition, because in maxloc you're repeatedly asking for layer 0 sometimes, and sometimes repeatedly for layer 1
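The element counts behind that question can be checked directly. This short sketch uses an array of zeros with the same shape as test and the maxloc values from the sample data; the name `filled` is illustrative.

```python
import numpy as np

test = np.zeros((2, 3, 2))            # same shape as the sample data
maxloc = np.array([[0, 0, 0],
                   [0, 1, 1]])        # argmax indices from the sample data

print(test[:, :, maxloc].size)        # 36 elements addressed by the slice form
print(maxloc.size)                    # 6 elements actually wanted

# Assigning through the slice form touches every layer named anywhere in maxloc
filled = np.zeros_like(test)
filled[:, :, maxloc] = 1
print(filled.sum())                   # 12.0 -> the whole array became 1
```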

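【Discussion】:

I think the OP wants one per row, even in case of ties. That is why they were using np.argmax(test, axis=2). With this ==, we could get more than one.
Thanks for replying so quickly. Indeed, this way is easier. But just so I understand, why doesn't my indexing work? Because, as Divakar said and going by the docs, I might get several True values with max()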