Get Max/Min from array based on another array using Numpy.where

【Question title】: Get Max/Min from array based on another array using Numpy.where
【Posted】: 2017-02-09 11:18:33
【Question】:

Starting with this:

import numpy as np
x = np.array([0,   2,  8,  9,  4,    1, 12,  4, 33, 11,    5,  3 ])
y = np.array(['', '', '', '', '', 'yo', '', '', '', '', 'yo', '' ])
i = np.array([0,   1,  2,  3,  4,    5,  6,  7,  8,  9,   10, 11 ])
print(np.amax(x[:3]))
print(np.amin(x[:3]))

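I am trying to get the max or min of the previous three items using numpy.where. Essentially, I am trying to use the array "index" inside np.where. If there is a more efficient way to do this, please show it.

I tried this: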

np.where(y == "yo", np.amax(x[:3] ) ,"")

Result (why does it return strings?):

array(['', '', '', '', '', '8', '', '', '', '', '8', ''], 
      dtype='|S21')

Wanted:

 array(['', '', '', '', '', 9, '', '', '', '', 33, ''], 
      dtype='|S21')

【Comments】:

You can't mix str and int in a numpy.array. See dtype.
@Lucas, yes you can: dtype="object". But that is not the issue here.

Tags: python arrays numpy

【Solution 1】:

First look at the simple version of where, which just finds the indices:

In [266]: np.where(y=='yo')
Out[266]: (array([ 5, 10], dtype=int32),)

Evidently you want all the values of y, but with each yo replaced by some value from x:

In [267]: np.where(y=='yo',x,y)
Out[267]: 
array(['', '', '', '', '', '1', '', '', '', '', '5', ''], 
      dtype='<U11')

y is a string dtype, and since '' cannot be converted to a number, the numbers are converted to strings instead.

Now if y is object dtype:

In [268]: y = np.array(['', '', '', '', '', 'yo', '', '', '', '', 'yo', '' ],object)
In [269]: np.where(y=='yo')
Out[269]: (array([ 5, 10], dtype=int32),)
In [270]: np.where(y=='yo',x,y)
Out[270]: array(['', '', '', '', '', 1, '', '', '', '', 5, ''], dtype=object)

The result is also object dtype, so it can mix numbers and strings.

In this use, all 3 arguments have the same length. In your use, x and y are replaced with scalars:

In [271]: np.max(x[:3])
Out[271]: 8
In [272]: np.where(y=='yo',8, '')
Out[272]: 
array(['', '', '', '', '', '8', '', '', '', '', '8', ''], 
      dtype='<U11')
In [273]: np.where(y=='yo',8, y)
Out[273]: array(['', '', '', '', '', 8, '', '', '', '', 8, ''], dtype=object)

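To insert 9 and 33, you have to find some way of collecting the max of the previous 3 items, i.e. a running or rolling max. where by itself is not going to help.

accumulate approximates this (it is the "maximum" version of cumsum):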

In [276]: xm=np.maximum.accumulate(x)
In [277]: xm
Out[277]: array([ 0,  2,  8,  9,  9,  9, 12, 12, 33, 33, 33, 33], dtype=int32)
In [278]: np.where(y=='yo',xm, y)
Out[278]: array(['', '', '', '', '', 9, '', '', '', '', 33, ''], dtype=object)

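xm is not the maximum of the previous 3 values but the maximum of all values so far. At the two yo positions here the result is the same, but in general it will not be. For this x, the last value is different (33 instead of 11).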
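To make the difference concrete, here is a minimal sketch that computes both versions for the same x and reports where they disagree. The names running, rolling and differ are only illustrative, and the rolling maximum is computed naively (current item plus the two before it) purely for comparison.

```python
import numpy as np

x = np.array([0, 2, 8, 9, 4, 1, 12, 4, 33, 11, 5, 3])

# Running maximum: the max of everything seen so far.
running = np.maximum.accumulate(x)

# Rolling maximum over the current item and the two before it,
# computed naively with a list comprehension for comparison.
rolling = np.array([x[max(i - 2, 0):i + 1].max() for i in range(len(x))])

# Positions where the two results disagree.
differ = np.flatnonzero(running != rolling)
print(running)
print(rolling)
print(differ)
```

For this particular x the two agree everywhere except at the last position, so a yo placed there would receive a different value depending on which version is used.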

Here is one way of getting the max of the previous 3, admittedly a bit crude (with a list comprehension):

In [305]: x1=np.concatenate(([0,0],x))
In [306]: xm = [max(x1[i:i+3]) for i in range(0,len(x1))][:len(x)]
In [307]: xm
Out[307]: [0, 2, 8, 9, 9, 9, 12, 12, 33, 33, 33, 11]
In [308]: np.where(y=='yo',xm, y)
Out[308]: array(['', '', '', '', '', 9, '', '', '', '', 33, ''], dtype=object)

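A sliding window with as_strided (adapted from Numpy: Matrix Array Shift / Insert by Index). Note that the strides of -4 bytes assume the 4-byte int32 items shown here: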

In [317]: xm=np.lib.stride_tricks.as_strided(x1[::-1],shape=(3,12),strides=(-4,-4))
In [318]: xm
Out[318]: 
array([[ 3,  5, 11, 33,  4, 12,  1,  4,  9,  8,  2,  0],
       [ 5, 11, 33,  4, 12,  1,  4,  9,  8,  2,  0,  0],
       [11, 33,  4, 12,  1,  4,  9,  8,  2,  0,  0,  0]])
In [319]: xm.max(axis=0)
Out[319]: array([11, 33, 33, 33, 12, 12,  9,  9,  9,  8,  2,  0])
In [320]: xm = xm.max(axis=0)[::-1]
In [321]: xm
Out[321]: array([ 0,  2,  8,  9,  9,  9, 12, 12, 33, 33, 33, 11])
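As a possible alternative that avoids hard-coding byte strides, newer NumPy releases (1.20 and later) provide sliding_window_view. The sketch below is not part of the original answer; it reuses the same left padding with two zeros, which assumes the values in x are non-negative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.array([0, 2, 8, 9, 4, 1, 12, 4, 33, 11, 5, 3])

# Pad two zeros on the left so every position has a full window of 3
# (the current item and the two before it), like x1 above.
x1 = np.concatenate(([0, 0], x))

# Each row of `windows` is one length-3 window; shape is (len(x), 3).
windows = sliding_window_view(x1, 3)

# Max of each window gives the rolling maximum, already in forward order.
xm = windows.max(axis=1)
print(xm)
```

Because the windows are taken in forward order, no reversing with [::-1] is needed, and the code does not depend on the item size of the array.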

Using Paul Panzer's idea of handling just the few yo positions:

In [29]: idx=np.where(y=='yo')
In [30]: idx
Out[30]: (array([ 5, 10], dtype=int32),)

In [32]: xm = [max(x[i-3:i]) for i in idx[0]]
In [33]: xm
Out[33]: [9, 33]
In [34]: y[idx]=xm
In [35]: y
Out[35]: array(['', '', '', '', '', 9, '', '', '', '', 33, ''], dtype=object)

If yo could occur within the first 3 elements, we need to refine xm with something like:

xm = [max(x[max(i-3,0):i]) if i>0 else x[i] for i in idx[0]]

Otherwise we would get an error from trying to take max([]).
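The index-only approach can be wrapped up roughly as follows. This is a sketch under stated assumptions rather than the answer's own code: the function name fill_markers and its marker and window parameters are hypothetical, and falling back to x[0] when the marker sits at position 0 is just one possible choice.

```python
import numpy as np

def fill_markers(x, y, marker="yo", window=3):
    """Replace each `marker` in y with the max of up to `window` items of x before it."""
    out = y.astype(object)             # object dtype can hold both str and int
    idx = np.flatnonzero(y == marker)  # visit only the marker positions
    for i in idx:
        start = max(i - window, 0)     # clamp so the slice never wraps around
        # At i == 0 there are no previous items, so fall back to x[0]
        # (an assumption; pick whatever suits your data).
        out[i] = x[start:i].max() if i > 0 else x[i]
    return out

x = np.array([0, 2, 8, 9, 4, 1, 12, 4, 33, 11, 5, 3])
y = np.array(['', '', '', '', '', 'yo', '', '', '', '', 'yo', ''])
result = fill_markers(x, y)
print(result)
```

The work done is proportional to the number of markers rather than the length of x, which matters when yo occurs only once or twice in a long array, or not at all.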

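【Comments】:

What is wrong with MSeifert's "adjusted" scipy.ndimage.maximum_filter?
It imports ndimage :) I could also adapt my as_strided answer from stackoverflow.com/questions/42036229/…
Not getting much appreciation, is it? People want simple answers! Anyway, I have given it a pity upvote ;-)
(Upvoted for clarifying some of the issues.) What if "yo" occurs randomly 1 or 2 times in 1500 samples, and in some samples not at all? It seems the proposed solutions use a lot of extra compute cycles.
@Merlin That is a good observation! You can use the one-argument form of where to get an index array of just a few numbers (one per "yo"), and then use those to directly extract the triplets from x and @987654358@ that you want to maximize. Conceptually simple and very efficient.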

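【Solution 2】:

I'm afraid you can't have the "wanted" result, because you can't have numbers in an array of string dtype. where, in the form you are using, "mixes" its last two arguments into one array. To do that it has to choose a single dtype. Because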

>>> np.can_cast(str, int)
False
>>> np.can_cast(int, str)
True

str is the one of the two arguments' dtypes/types that can hold values from both arguments.

Dtypes aside, you may want to have a look at scipy.ndimage.maximum_filter:

>>> import scipy.ndimage
>>> scipy.ndimage.maximum_filter(x, 3)
array([ 2,  8,  9,  9,  9, 12, 12, 33, 33, 33, 11,  5])

You may need to fix up the offset to suit your needs.

【Comments】:

【Solution 3】:

Not sure I understand what you want, but does this help:

x = np.sort(x)
sel = np.where(y=="yo")[0]
y[sel] = x[-len(sel):]

?

【Comments】: