Is there an efficient way to get the position of the max element except for a specific column in a NumPy matrix? (Answers)

【Question title】: Is there an efficient way to get the position of the max element except for a specific column in a NumPy matrix?
【Posted】: 2020-06-20 12:06:04
【Question】:

For example, given a 2D NumPy matrix M:

[[1,10,3],
 [4,15,6]]

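The largest element outside M[:][1] (meaning column 1, i.e. M[:, 1]) is 6, and its position is (1,2). So the answer is (1,2).

Thanks a lot for your help!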

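【Question comments】:

I think this depends entirely on what your data looks like. For small arrays, such as your example, it is probably quicker to copy the matrix without the given column and then take the max. For larger arrays, it may be quicker to take the max on either side of the column and then take the larger of the left and right maxima.
@Dunes Yes, but working out the final position can be a bit of a hassle, because we have to determine whether the max is on the left or the right and then compute its final position.
@Dunes Thanks. I added an answer based on your observation.
What is [:] supposed to do?

Tags: python numpy max numpy-ndarray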

【Solution 1】:

One way:

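import numpy as np

x = np.array([[1, 10, 3], [4, 15, 6]])  # the matrix from the question
col = 1  # column to exclude
skip_col = np.delete(x, col, axis=1)  # copy of x without that column
row, column = np.unravel_index(skip_col.argmax(), skip_col.shape)
if column >= col:
    column += 1  # map back to a column index in x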

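Explanation:

Delete the column.
Find the argmax (argmax returns an index into the flattened array; unravel_index converts it to a position in the 2D array).
If the column is greater than or equal to the skipped column, add one to the column index.

Following Dunes' comment, I like that suggestion. It takes almost the same number of lines but does not need a copy (unlike np.delete), because basic slices are views. So if you are memory-bound (as with really big data):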

col = 1
left, right = x[:, :col], x[:, col+1:]  # views into x, no copy; assumes both sides are non-empty
row, column = np.unravel_index(left.argmax(), left.shape)  # left max, provisionally the global max
right_max = np.unravel_index(right.argmax(), right.shape)
if right[right_max] > x[row, column]:  # right_max indexes the right slice, not x
    row, column = right_max
    column += col + 1  # shift back to a column index in x
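
The two-slice approach needs care when the excluded column is the first or the last one, because one slice is then empty and argmax raises on an empty array. A minimal sketch of a wrapper that handles this, assuming a 2D array and 0 <= col < number of columns; the function name argmax_excluding_column is made up for illustration and is not part of NumPy:

```python
import numpy as np

def argmax_excluding_column(x, col):
    """Return (row, column) of the max of 2D array x, ignoring column `col`."""
    best = None  # (value, row, column) of the best candidate so far
    # (slice, offset) pairs: offset maps a slice column back to a column of x
    for part, offset in ((x[:, :col], 0), (x[:, col + 1:], col + 1)):
        if part.size == 0:  # excluded column is the first or last one
            continue
        r, c = np.unravel_index(part.argmax(), part.shape)
        if best is None or part[r, c] > best[0]:
            best = (part[r, c], int(r), int(c) + offset)
    if best is None:
        raise ValueError("no columns left after excluding `col`")
    return best[1], best[2]

x = np.array([[1, 10, 3],
              [4, 15, 6]])
print(argmax_excluding_column(x, 1))  # (1, 2)
print(argmax_excluding_column(x, 0))  # (1, 1)
```

Both slices are still views, so no 2D copy is made. On ties the left side wins, since the right side replaces it only when strictly greater.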

【Comments】:

【Solution 2】:

Here is a solution that uses the nan family of functions:

In [180]: arr = np.array([[1,10,3],[4,15,6]])                                   
In [181]: arr1 = arr.astype(float)                                              
In [182]: arr1[:,1]=np.nan                                                      
In [183]: arr1                                                                  
Out[183]: 
array([[ 1., nan,  3.],
       [ 4., nan,  6.]])
In [184]: np.nanargmax(arr1)                                                    
Out[184]: 5
In [185]: np.unravel_index(np.nanargmax(arr1),arr.shape)                        
Out[185]: (1, 2)

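It may not be optimal time-wise, but it may be easier to debug than the alternatives.

Looking at np.nanargmax, I see that it simply replaces np.nan with -np.inf. So we can do the same by replacing the excluded column's values with an integer small enough that it cannot be the maximum.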

In [188]: arr1=arr.copy()                                                       
In [189]: arr1[:,1] = np.min(arr1)-1                                            
In [190]: arr1                                                                  
Out[190]: 
array([[1, 0, 3],
       [4, 0, 6]])
In [191]: np.argmax(arr1)                                                       
Out[191]: 5
In [192]: np.unravel_index(np.argmax(arr1),arr.shape)                           
Out[192]: (1, 2)
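
One caveat with np.min(arr1)-1 is that it can overflow when the array already contains the smallest value its dtype can hold (for example 0 in an unsigned array). A hedged variation is to fill with the dtype's own minimum instead. The helper name argmax_with_sentinel is hypothetical, and the sketch assumes an integer or floating-point 2D array:

```python
import numpy as np

def argmax_with_sentinel(arr, col):
    """Position of the max of arr, with column `col` overwritten by a sentinel."""
    filled = arr.copy()
    if np.issubdtype(arr.dtype, np.integer):
        filled[:, col] = np.iinfo(arr.dtype).min  # smallest representable integer
    else:
        filled[:, col] = -np.inf                  # assumes a floating-point dtype
    row, column = np.unravel_index(filled.argmax(), arr.shape)
    return int(row), int(column)

arr = np.array([[1, 10, 3], [4, 15, 6]])
print(argmax_with_sentinel(arr, 1))  # (1, 2)
```

If every remaining element also equals the sentinel, the result is decided by a tie, with argmax returning the first occurrence, so it can point into the excluded column.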

I can also imagine a solution using np.ma.masked_array, but that tends to be more of a convenience than a speed tool.
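
For completeness, a minimal sketch of what the masked-array route might look like. The answer does not spell it out, so this is one possible reading rather than the author's code:

```python
import numpy as np

arr = np.array([[1, 10, 3], [4, 15, 6]])

# mask out every element of the excluded column
mask = np.zeros(arr.shape, dtype=bool)
mask[:, 1] = True
masked = np.ma.masked_array(arr, mask=mask)

# MaskedArray.argmax ignores masked entries and returns a flat index
pos = np.unravel_index(masked.argmax(), arr.shape)
print(tuple(int(i) for i in pos))  # (1, 2)
```

The original array is left untouched; only a boolean mask of the same shape is allocated.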

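【Comments】:

【Solution 3】:

Agreeing with Dunes' comment:

For small arrays, such as your example, it might be quicker to copy the matrix without the given column and then take the max. With larger arrays it might be quicker to take the max on either side of the column, and then take the max of the left and right sides.

Here is an implementation of each case, plus a dispatch function. (The value of THRESHOLD_SIZE needs to be filled in based on experiment.)

Small array case

Create an array with the specified column deleted. Compute the overall maximum, then find where it occurs. If that column is at or to the right of the excluded column, add one to the column index.

Large array case

This creates a temporary 1D array holding the column maxima. It is usually (though not in every case) much smaller than the 2D array. First identify which side of the excluded column contains the maximum, then which column it is in, and finally which row. This avoids the need to examine every element twice. The code also avoids creating any 2D slice of the array at any point.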

import numpy as np

THRESHOLD_SIZE = .....


def get_max_position(m, exclude_column):
    return (get_max_position_largearray if m.size > THRESHOLD_SIZE 
            else get_max_position_smallarray)(m, exclude_column)


def get_max_position_smallarray(m, exclude_column):

    mnew = np.delete(m, exclude_column, axis=1)

    row, col = np.argwhere(mnew == np.max(mnew))[0]

    # uses: int(True)=1 and int(False)=0
    return (row, col + (col >= exclude_column))


def get_max_position_largearray(m, exclude_column):

    column_maxima = np.max(m, axis=0)

    l_col_maxima = column_maxima[:exclude_column]
    r_col_maxima = column_maxima[exclude_column + 1:]

    l_max = np.max(l_col_maxima) if l_col_maxima.size else None
    r_max = np.max(r_col_maxima) if r_col_maxima.size else None

    use_left = (True if r_max is None else
                False if l_max is None else
                (l_max > r_max))

    if use_left:
        themax = l_max
        col = np.argwhere(l_col_maxima == themax)[0][0]
    else:
        themax = r_max
        col = exclude_column + 1 + np.argwhere(r_col_maxima == themax)[0][0]

    row = np.argwhere(m[:,col] == themax)[0][0]

    return (row, col)

Here is the example from the question, run through both methods:

m = np.array([[1,10,3],
              [4,15,6]])

exclude_column = 1

print(get_max_position_largearray(m, exclude_column))
print(get_max_position_smallarray(m, exclude_column))

Output (as printed by NumPy 1.x; NumPy 2 shows the same values wrapped as np.int64(...)):

(1, 2)
(1, 2)
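
Since THRESHOLD_SIZE has to come from experiment, a rough timing harness along the following lines could help pick it. The two functions here are simplified stand-ins for the small-array and large-array strategies, not the answer's exact code, and their names are made up for illustration. The test data is random floats, so ties are not a concern:

```python
import timeit
import numpy as np

def pos_by_delete(m, exclude_column):
    # copy m without the excluded column, then locate the max
    mnew = np.delete(m, exclude_column, axis=1)
    row, col = np.unravel_index(mnew.argmax(), mnew.shape)
    return int(row), int(col) + int(col >= exclude_column)

def pos_by_column_maxima(m, exclude_column):
    # reduce to per-column maxima, then search only the winning column
    column_maxima = m.max(axis=0)
    cols = np.delete(np.arange(m.shape[1]), exclude_column)  # candidate columns (1D)
    col = int(cols[column_maxima[cols].argmax()])
    return int(m[:, col].argmax()), col

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    m = rng.random((n, n))
    t_delete = timeit.timeit(lambda: pos_by_delete(m, 1), number=20)
    t_maxima = timeit.timeit(lambda: pos_by_column_maxima(m, 1), number=20)
    print(f"size={m.size:>8}  delete={t_delete:.5f}s  column-maxima={t_maxima:.5f}s")
```

The crossover point, if there is one, depends on the machine, the dtype and the array shape, so the numbers are only meaningful for data that resembles your own.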

【Comments】:

【Solution 4】:

You can do it like this:

m = [[1,10,3],
     [4,15,6]]

c = 1 # Choose the column to exclude 

a = max([[n,(k,b)] for k,i in enumerate(m) for b,n in enumerate(i) if b!=c])[1]

print(a)

Output:

(1, 2)

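【Comments】:

【Solution 5】:

Another approach, indexing the columns with a list (meant to avoid an explicit copy; see the edit at the end):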

import numpy as np

m = np.array([[1, 10, 3], [4, 15, 6]])
exclude_col = 1

# assign nicer names to the shape
rows, cols = m.shape

# generate indices for slicing
inds = list(range(cols))
inds.remove(exclude_col)

# find the maximum in the sliced array
max_ind = np.unravel_index(np.argmax(m[:, inds]), (rows, cols - 1))
# fix the found column index if we exceeded exclude_col
max_ind = (max_ind[0], max_ind[1] if max_ind[1] < exclude_col else max_ind[1] + 1)

The last line is a good candidate for a Python 3.8 assignment expression, so in Python 3.8+ you could write:

max_ind = (max_ind[0], v if (v := max_ind[1]) < exclude_col else v + 1)

Edit: indexing like this probably creates a copy as well. I haven't tested it, but the selected elements are not contiguous in memory.
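
The suspicion in the edit can be checked directly. NumPy documents that indexing with a list of integers (advanced indexing) returns a copy, whereas basic slicing returns a view. A small check using np.shares_memory:

```python
import numpy as np

m = np.array([[1, 10, 3], [4, 15, 6]])
inds = [0, 2]

picked = m[:, inds]   # integer-list (advanced) indexing
sliced = m[:, 2:]     # basic slicing

print(np.shares_memory(m, picked))  # False: picked owns its own data
print(np.shares_memory(m, sliced))  # True: sliced is a view into m
```

So this approach costs about as much memory as np.delete; only the slice-based variants avoid the copy.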

【Comments】: