How can I check if one two-dimensional NumPy array contains a specific pattern of values inside it? (Answers)

【Question title】: How can I check if one two-dimensional NumPy array contains a specific pattern of values inside it?
【Posted】: 2015-12-08 11:33:51
【Question description】:

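I have a large NumPy.array field_array and a smaller array match_array, both consisting of int values. Using the following example, how can I check whether any match_array-shaped segment of field_array contains values that correspond exactly to the ones in match_array?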

import numpy
raw_field = ( 24,  25,  26,  27,  28,  29,  30,  31,  23, \
              33,  34,  35,  36,  37,  38,  39,  40,  32, \
             -39, -38, -37, -36, -35, -34, -33, -32, -40, \
             -30, -29, -28, -27, -26, -25, -24, -23, -31, \
             -21, -20, -19, -18, -17, -16, -15, -14, -22, \
             -12, -11, -10,  -9,  -8,  -7,  -6,  -5, -13, \
              -3,  -2,  -1,   0,   1,   2,   3,   4,  -4, \
               6,   7,   8,   4,   5,   6,   7,  13,   5, \
              15,  16,  17,   8,   9,  10,  11,  22,  14)
field_array = numpy.array(raw_field, int).reshape(9,9)
match_array = numpy.arange(12).reshape(3,4)

This example should return True, since the pattern described by match_array aligns with field_array[6:9,3:7].
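As a quick sanity check of that claim, the slice can be compared directly against match_array. This is only a sketch that rebuilds the question's arrays and verifies the one known position; it does not search for it. The name `segment` is introduced here for illustration.

```python
import numpy as np

field_array = np.array([
    [ 24,  25,  26,  27,  28,  29,  30,  31,  23],
    [ 33,  34,  35,  36,  37,  38,  39,  40,  32],
    [-39, -38, -37, -36, -35, -34, -33, -32, -40],
    [-30, -29, -28, -27, -26, -25, -24, -23, -31],
    [-21, -20, -19, -18, -17, -16, -15, -14, -22],
    [-12, -11, -10,  -9,  -8,  -7,  -6,  -5, -13],
    [ -3,  -2,  -1,   0,   1,   2,   3,   4,  -4],
    [  6,   7,   8,   4,   5,   6,   7,  13,   5],
    [ 15,  16,  17,   8,   9,  10,  11,  22,  14]])
match_array = np.arange(12).reshape(3, 4)

# Rows 6..8 and columns 3..6 form a 3x4 block (first index = row, second = column).
segment = field_array[6:9, 3:7]
print(np.array_equal(segment, match_array))  # True
```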

【Question discussion】:

It might be worth pointing out that the x and y coordinates in raw_field end up transposed in the aligned slice, as I've noticed...

Tags: python arrays numpy pattern-matching

【Solution 1】:

Here is a solution using the as_strided() function from the stride_tricks module

import numpy as np
from numpy.lib.stride_tricks import as_strided

# field_array (I modified it to have two matching arrays)
A = np.array([[ 24,  25,  26,  27,  28,  29,  30,  31,  23],
              [ 33,   0,   1,   2,   3,  38,  39,  40,  32],
              [-39,   4,   5,   6,   7, -34, -33, -32, -40],
              [-30,   8,   9,  10,  11, -25, -24, -23, -31],
              [-21, -20, -19, -18, -17, -16, -15, -14, -22],
              [-12, -11, -10,  -9,  -8,  -7,  -6,  -5, -13],
              [ -3,  -2,  -1,   0,   1,   2,   3,   4,  -4],
              [  6,   7,   8,   4,   5,   6,   7,  13,   5],
              [ 15,  16,  17,   8,   9,  10,  11,  22,  14]])

# match_array
B = np.arange(12).reshape(3,4)


# Window view of A
A_w = as_strided(A, shape=(A.shape[0] - B.shape[0] + 1,
                           A.shape[1] - B.shape[1] + 1,
                           B.shape[0], B.shape[1]),
                    strides=2*A.strides).reshape(-1, B.shape[0], B.shape[1])

match = (A_w == B).all(axis=(1,2))

We can also find the indices of the first element of each matching block in A

where = np.where(match)[0]
ind_flat = where + (B.shape[1] - 1)*(np.floor(where/(A.shape[1] - B.shape[1] + 1)).astype(int))
ind = [tuple(row) for row in np.array(np.unravel_index(ind_flat, A.shape)).T]

Result

print(match.any())
True

print(ind)
[(1, 1), (6, 3)]
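On newer NumPy versions (1.20 and later), the same window view can be built with sliding_window_view, which works out the shape and strides itself and returns a read-only view. The sketch below is an alternative to the as_strided() call above, not the answerer's code; it keeps the 4-D window layout, so np.argwhere yields the (row, col) corners directly and no flat-index arithmetic is needed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# field_array with two matching blocks, as in the answer above
A = np.array([[ 24,  25,  26,  27,  28,  29,  30,  31,  23],
              [ 33,   0,   1,   2,   3,  38,  39,  40,  32],
              [-39,   4,   5,   6,   7, -34, -33, -32, -40],
              [-30,   8,   9,  10,  11, -25, -24, -23, -31],
              [-21, -20, -19, -18, -17, -16, -15, -14, -22],
              [-12, -11, -10,  -9,  -8,  -7,  -6,  -5, -13],
              [ -3,  -2,  -1,   0,   1,   2,   3,   4,  -4],
              [  6,   7,   8,   4,   5,   6,   7,  13,   5],
              [ 15,  16,  17,   8,   9,  10,  11,  22,  14]])
B = np.arange(12).reshape(3, 4)

windows = sliding_window_view(A, B.shape)    # shape (7, 6, 3, 4)
match = (windows == B).all(axis=(2, 3))      # shape (7, 6), one flag per window

# Top-left corner (row, col) of every matching window
ind = [tuple(int(i) for i in rc) for rc in np.argwhere(match)]
print(match.any())  # True
print(ind)          # [(1, 1), (6, 3)]
```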

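【Discussion】:

【Solution 2】:

To add to the answers already posted, I'd like to offer one that takes into account errors due to floating-point precision, in case the matrices come from, say, image processing, where the numbers are the result of floating-point operations.

You can iterate over the indexes of the larger matrix, searching for the smaller matrix. At each position, you extract a submatrix of the larger matrix that matches the size of the smaller matrix.

You have a match if the contents of that submatrix of 'large' and the 'small' matrix agree.

The following example shows how to return the first indexes of the location in the large matrix where a match is found. If that is the intent, extending this function to return an array of the locations found to match would be trivial.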

import numpy as np

def find_submatrix(a, b):
    """ Searches the first instance at which 'b' is a submatrix of 'a', iterates
        rows first. Returns the indexes of a at which 'b' was found, or None if
        'b' is not contained within 'a'"""
    a_rows=a.shape[0]
    a_cols=a.shape[1]

    b_rows=b.shape[0]
    b_cols=b.shape[1]

    row_diff = a_rows - b_rows
    col_diff = a_cols - b_cols

    for idx_row in range(row_diff + 1):      # + 1 so the last valid row offset is checked too
        for idx_col in range(col_diff + 1):  # likewise for the last valid column offset
            row_indexes = [idx + idx_row for idx in np.arange(b_rows)]
            col_indexes = [idx + idx_col for idx in np.arange(b_cols)]

            submatrix_indexes = np.ix_(row_indexes, col_indexes)
            a_submatrix = a[submatrix_indexes]

            are_equal = np.allclose(a_submatrix, b)  # allclose is used for floating point numbers, if they
                                                     # are close while comparing, they are considered equal.
                                                     # Useful if your matrices come from operations that produce
                                                     # floating point numbers.
                                                     # You might want to fine tune the parameters to allclose()
            if (are_equal):
                return[idx_col, idx_row]

    return None

Using the function above you can run the following example:

large_mtx = np.array([[1,  2, 3, 7, 4, 2, 6],
                      [4,  5, 6, 2, 1, 3, 11],
                      [10, 4, 2, 1, 3, 7, 6],
                      [4,  2, 1, 3, 7, 6, -3],
                      [5,  6, 2, 1, 3, 11, -1],
                      [0,  0, -1, 5, 4, -1, 2],
                      [10, 4, 2, 1, 3, 7, 6],
                      [10, 4, 2, 1, 3, 7, 6] 
                     ])

# Example 1: An intersection at column 1 and row 2 of large_mtx (0-based)
small_mtx_1 = np.array([[4, 2], [2,1]])
intersect = find_submatrix(large_mtx, small_mtx_1)
print("Example 1, intersection (col,row): " + str(intersect))

# Example 2: No intersection
small_mtx_2 = np.array([[-14, 2], [2,1]])
intersect = find_submatrix(large_mtx, small_mtx_2)
print("Example 2, intersection (col,row): " + str(intersect))

Which would print:

Example 1, intersection (col,row): [1, 2]
Example 2, intersection (col,row): None
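The extension mentioned above (returning every match rather than the first) might look like the sketch below. The function name find_all_submatrices and its rtol/atol parameters are hypothetical; they simply forward to np.allclose with its default tolerances. Unlike find_submatrix, this variant returns (row, col) pairs and scans every valid offset, including the last one.

```python
import numpy as np

def find_all_submatrices(a, b, rtol=1e-05, atol=1e-08):
    """Return a list of (row, col) top-left corners where 'b' approximately matches 'a'."""
    b_rows, b_cols = b.shape
    hits = []
    for r in range(a.shape[0] - b_rows + 1):
        for c in range(a.shape[1] - b_cols + 1):
            # Plain slicing gives the same block as the np.ix_ indexing above
            if np.allclose(a[r:r + b_rows, c:c + b_cols], b, rtol=rtol, atol=atol):
                hits.append((r, c))
    return hits

large_mtx = np.array([[1,  2, 3, 7, 4, 2, 6],
                      [4,  5, 6, 2, 1, 3, 11],
                      [10, 4, 2, 1, 3, 7, 6],
                      [4,  2, 1, 3, 7, 6, -3],
                      [5,  6, 2, 1, 3, 11, -1],
                      [0,  0, -1, 5, 4, -1, 2],
                      [10, 4, 2, 1, 3, 7, 6],
                      [10, 4, 2, 1, 3, 7, 6]])
small_mtx = np.array([[4, 2], [2, 1]])

print(find_all_submatrices(large_mtx, small_mtx))          # [(2, 1)]
# A tiny floating-point perturbation still counts as a match
print(find_all_submatrices(large_mtx, small_mtx + 1e-9))   # [(2, 1)]
```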

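【Discussion】:

Thanks for this solution! But is an approximation useful when comparing arrays of relatively small integers? (I'm not saying your answer isn't useful at all, just asking about cases where the numbers are below 200.) Also, how does it fare in terms of processing efficiency? (I'm not asking you to re-test all of the solutions here, just perhaps to compare it against the similar[-looking] but very slow seek_array solution..?) For cases where approximation is useful this would be the obvious choice, but how does it do (speed-wise) for exact comparisons? :D
@Augusta Fair and thorough questions :). If you are working with integers, I would drop allclose() and use np.all(np.equal()) instead, which reduces the work per comparison because the code no longer deals with tolerances. I don't know how it compares with the other solutions proposed here, but this one can certainly be improved in terms of performance, because the algorithm is scan-based and tries to match the whole matrix at every position. We could instead pre-index the potential comparison points and compare the full matrix only at those points.
I just ran a profile after removing the tolerance-based check and changing it to an integer comparison. These are the profiling results for the data provided in your question: Ran 2000 search cycles in 2.9543030262 [sec] = 0.0014771515131 [sec/cycle]
So the cv2_based approach suggested in Divakar's answer would definitely be better.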
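The pre-indexing idea from the comment above can be sketched as follows, assuming exact (integer) comparison: only positions whose value equals the top-left element of the small matrix are treated as candidates, and the full block comparison runs at those positions alone. The function name find_submatrix_prefiltered and the test data are hypothetical, and how much this saves depends on how often that first value occurs.

```python
import numpy as np

def find_submatrix_prefiltered(a, b):
    """Return (row, col) corners of exact matches, checking only pre-indexed candidates."""
    b_rows, b_cols = b.shape
    max_r = a.shape[0] - b_rows + 1
    max_c = a.shape[1] - b_cols + 1
    if max_r <= 0 or max_c <= 0:
        return []
    # Candidate corners: valid top-left positions whose value equals b[0, 0]
    cand_rows, cand_cols = np.nonzero(a[:max_r, :max_c] == b[0, 0])
    return [(int(r), int(c)) for r, c in zip(cand_rows, cand_cols)
            if np.array_equal(a[r:r + b_rows, c:c + b_cols], b)]

small = np.array([[1, 2, 3], [4, 5, 6]])
large = np.zeros((6, 7), dtype=int)
large[2:4, 3:6] = small   # the real occurrence
large[0, 0] = 1           # a decoy: right first value, wrong block

print(find_submatrix_prefiltered(large, small))  # [(2, 3)]
```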

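【Solution 3】:

Approach #1

This approach derives from a solution to "Implement Matlab's im2col 'sliding' in python", which was designed to rearrange sliding blocks from a 2D array into columns. So, to solve our case here, the sliding blocks from field_array can be stacked as columns and compared against a column-vector version of match_array.

Here is a formal definition of the function for the rearrangement/stacking -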

def im2col(A,BLKSZ):   

    # Parameters
    M,N = A.shape
    col_extent = N - BLKSZ[1] + 1
    row_extent = M - BLKSZ[0] + 1

    # Get Starting block indices
    start_idx = np.arange(BLKSZ[0])[:,None]*N + np.arange(BLKSZ[1])

    # Get offsetted indices across the height and width of input array
    offset_idx = np.arange(row_extent)[:,None]*N + np.arange(col_extent)

    # Get all actual indices & index into input array for final output
    return np.take (A,start_idx.ravel()[:,None] + offset_idx.ravel())

To solve our case, here is the implementation based on im2col -

# Get sliding blocks of shape same as match_array from field_array into columns
# Then, compare them with a column vector version of match array.
col_match = im2col(field_array,match_array.shape) == match_array.ravel()[:,None]

# Shape of output array that has field_array compared against a sliding match_array
out_shape = np.asarray(field_array.shape) - np.asarray(match_array.shape) + 1

# Now, see if all elements in a column are ONES and reshape to out_shape. 
# Finally, find the position of TRUE indices
R,C = np.where(col_match.all(0).reshape(out_shape))

The output for the sample given in the question would be -

In [151]: R,C
Out[151]: (array([6]), array([3]))
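To make the column layout concrete, here is im2col applied to a tiny array. This is an illustrative sketch: the 3x4 input `small` and the 2x2 block size are chosen here and are not part of the answer. Each column of the result is one flattened 2x2 block, and the columns are ordered by block position in row-major order.

```python
import numpy as np

def im2col(A, BLKSZ):
    # Same function as defined above, repeated so the example runs on its own
    M, N = A.shape
    col_extent = N - BLKSZ[1] + 1
    row_extent = M - BLKSZ[0] + 1
    start_idx = np.arange(BLKSZ[0])[:, None] * N + np.arange(BLKSZ[1])
    offset_idx = np.arange(row_extent)[:, None] * N + np.arange(col_extent)
    return np.take(A, start_idx.ravel()[:, None] + offset_idx.ravel())

small = np.arange(12).reshape(3, 4)   # values equal their flat index
cols = im2col(small, (2, 2))

print(cols.shape)  # (4, 6): 4 elements per block, 2*3 block positions
print(cols)
# [[ 0  1  2  4  5  6]
#  [ 1  2  3  5  6  7]
#  [ 4  5  6  8  9 10]
#  [ 5  6  7  9 10 11]]
```

The first column, [0, 1, 4, 5], is the block small[0:2, 0:2] flattened; comparing every column against match_array.ravel()[:,None] is what the implementation above does.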

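Approach #2

Given that opencv already has a template-matching function that computes the square of differences, you can use it and look for zero differences, which will be your matching positions. So, if you have access to cv2 (the opencv module), the implementation would look something like this -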

import cv2
from cv2 import matchTemplate as cv2m

M = cv2m(field_array.astype('uint8'),match_array.astype('uint8'),cv2.TM_SQDIFF)
R,C = np.where(M==0)

This gives us -

In [204]: R,C
Out[204]: (array([6]), array([3]))
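One caveat about the astype('uint8') casts is worth illustrating: values outside 0..255 wrap around modulo 256, so two different integer blocks can become identical after the cast and would then produce a zero squared difference. The NumPy-only sketch below shows the wrap-around; the arrays are made up for illustration. The OpenCV documentation lists 32-bit floating point as the other accepted input type for matchTemplate, so casting to float32 may be an option for such data, but that is untested here.

```python
import numpy as np

values = np.array([-3, 0, 255, 256, 300])
print(values.astype('uint8'))  # [253   0 255   0  44]

# Two different blocks that collide once cast to uint8
block_a = np.array([[0, 1], [2, 3]])
block_b = np.array([[256, 1], [2, 259]])
print(np.array_equal(block_a, block_b))                                   # False
print(np.array_equal(block_a.astype('uint8'), block_b.astype('uint8')))   # True
```

For the sample data in the question this does not cause a false match, since none of the negative values wrap into the 0..11 range of match_array.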

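Benchmarking

This section compares the runtimes of all the approaches suggested for solving the problem. Credit for the various methods listed in this section goes to their contributors.

Method definitions -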

def seek_array(search_in, search_for, return_coords = False):
    si_x, si_y = search_in.shape
    sf_x, sf_y = search_for.shape
    for y in range(si_y-sf_y+1):       # range instead of xrange so this also runs on Python 3
        for x in range(si_x-sf_x+1):
            if numpy.array_equal(search_for, search_in[x:x+sf_x, y:y+sf_y]):
                return (x,y) if return_coords else True
    return None if return_coords else False

def skimage_based(field_array,match_array):
    windows = view_as_windows(field_array, match_array.shape)
    return (windows == match_array).all(axis=(2,3)).nonzero()

def im2col_based(field_array,match_array):   
    col_match = im2col(field_array,match_array.shape)==match_array.ravel()[:,None]
    out_shape = np.asarray(field_array.shape) - np.asarray(match_array.shape) + 1  
    return np.where(col_match.all(0).reshape(out_shape))

def cv2_based(field_array,match_array):
    M = cv2m(field_array.astype('uint8'),match_array.astype('uint8'),cv2.TM_SQDIFF)
    return np.where(M==0)

Runtime tests -

Case #1 (sample data from the question):

In [11]: field_array
Out[11]: 
array([[ 24,  25,  26,  27,  28,  29,  30,  31,  23],
       [ 33,  34,  35,  36,  37,  38,  39,  40,  32],
       [-39, -38, -37, -36, -35, -34, -33, -32, -40],
       [-30, -29, -28, -27, -26, -25, -24, -23, -31],
       [-21, -20, -19, -18, -17, -16, -15, -14, -22],
       [-12, -11, -10,  -9,  -8,  -7,  -6,  -5, -13],
       [ -3,  -2,  -1,   0,   1,   2,   3,   4,  -4],
       [  6,   7,   8,   4,   5,   6,   7,  13,   5],
       [ 15,  16,  17,   8,   9,  10,  11,  22,  14]])

In [12]: match_array
Out[12]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [13]: %timeit seek_array(field_array, match_array, return_coords = False)
1000 loops, best of 3: 465 µs per loop

In [14]: %timeit skimage_based(field_array,match_array)
10000 loops, best of 3: 97.9 µs per loop

In [15]: %timeit im2col_based(field_array,match_array)
10000 loops, best of 3: 74.3 µs per loop

In [16]: %timeit cv2_based(field_array,match_array)
10000 loops, best of 3: 30 µs per loop

Case #2 (bigger random data):

In [17]: field_array = np.random.randint(0,4,(256,256))

In [18]: match_array = field_array[100:116,100:116].copy()

In [19]: %timeit seek_array(field_array, match_array, return_coords = False)
1 loops, best of 3: 400 ms per loop

In [20]: %timeit skimage_based(field_array,match_array)
10 loops, best of 3: 54.3 ms per loop

In [21]: %timeit im2col_based(field_array,match_array)
10 loops, best of 3: 125 ms per loop

In [22]: %timeit cv2_based(field_array,match_array)
100 loops, best of 3: 4.08 ms per loop
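The timings above come from IPython's %timeit on the answerer's machine. To get comparable numbers elsewhere without IPython, skimage or cv2, a minimal harness using the standard timeit module might look like the sketch below. It covers only two of the four methods, uses sliding_window_view (NumPy 1.20+) as a stand-in for skimage's view_as_windows, and the function names are hypothetical; absolute timings will differ from those listed.

```python
import timeit
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def seek_array_loop(search_in, search_for):
    """Block-at-a-time search; returns the first match as (row, col) or None."""
    si_x, si_y = search_in.shape
    sf_x, sf_y = search_for.shape
    for y in range(si_y - sf_y + 1):
        for x in range(si_x - sf_x + 1):
            if np.array_equal(search_for, search_in[x:x + sf_x, y:y + sf_y]):
                return (x, y)
    return None

def windows_based(field, match):
    """Vectorised window comparison; returns (rows, cols) of all matches."""
    windows = sliding_window_view(field, match.shape)
    return (windows == match).all(axis=(2, 3)).nonzero()

# Same setup as Case #2, with a fixed seed for repeatability
rng = np.random.default_rng(0)
field = rng.integers(0, 4, (256, 256))
match = field[100:116, 100:116].copy()

for name, func in [("seek_array_loop", seek_array_loop),
                   ("windows_based", windows_based)]:
    best = min(timeit.repeat(lambda: func(field, match), number=1, repeat=3))
    print(f"{name}: {best * 1e3:.2f} ms per call")
```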

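【Discussion】:

Out of curiosity, how do these approaches compare with ajcr's skimage solution or the block-at-a-time approach I posted? (Completely understandable if you don't know or don't want to test it. XD) The cv2 solution looks interesting, but I don't have the module and know nothing about it. I'll put it on my list of things to check out, though!
@Augusta No worries, I added the benchmarking code and results. Hope this helps!
Wow, that's great! I expected my plain check to be slow, but I didn't think any of the alternatives would be that much faster! Thanks for the results! Looks like cv2 gets to cut in line a bit on that list after all. ;)
@Augusta Yes, definitely check out cv2 and install it. I think it was easy to install, not that I remember the steps :) Good luck and keep learning!

【Solution 4】:

NumPy does not have such a search function built in, but it is certainly possible to do this in NumPy

As long as your arrays are not too massive*, you can use a rolling-window approach: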

from skimage.util import view_as_windows

windows = view_as_windows(field_array, match_array.shape)

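The view_as_windows function is written purely in NumPy, so if you don't have skimage you can always copy the code from here.

Then, to see whether the sub-array appears in the larger array, you can write: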

>>> (windows == match_array).all(axis=(2,3)).any()
True

To find the indices of where the top-left corner of the sub-array matches, you can write:

>>> (windows == match_array).all(axis=(2,3)).nonzero()
(array([6]), array([3]))

This approach also works for arrays with more dimensions.
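As an illustration of the higher-dimensional case, the sketch below searches a 3-D array for a 3-D block. It uses NumPy's sliding_window_view (available from NumPy 1.20) in place of skimage's view_as_windows, which is an assumption made so the example needs only NumPy; the arrays are made up. The only change from the 2-D case is that the reduction runs over the last three axes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

field_3d = np.arange(4 * 5 * 6).reshape(4, 5, 6)   # all values distinct
match_3d = field_3d[1:3, 2:4, 3:6].copy()          # a 2x2x3 block

windows = sliding_window_view(field_3d, match_3d.shape)   # shape (3, 4, 4, 2, 2, 3)
hits = (windows == match_3d).all(axis=(3, 4, 5))          # one flag per window position

print(np.argwhere(hits))  # [[1 2 3]]
```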

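*Although the array windows takes up no additional memory (only the strides and shape are altered to create a new view of the data), writing windows == match_array creates a boolean array of shape (7, 6, 3, 4), which is 504 bytes of memory. If you are working with very large arrays, this approach might not be feasible.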
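The 504-byte figure, and how quickly it grows, can be checked with a few lines. This sketch again substitutes NumPy's sliding_window_view for view_as_windows, and comparison_bytes is a hypothetical helper that just multiplies the number of window positions by the window size (booleans take one byte each).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

field_array = np.zeros((9, 9), dtype=int)   # only the shape matters here
match_array = np.arange(12).reshape(3, 4)

windows = sliding_window_view(field_array, match_array.shape)
comparison = windows == match_array         # this is the array that costs memory

print(comparison.shape)   # (7, 6, 3, 4)
print(comparison.nbytes)  # 504

def comparison_bytes(field_shape, match_shape):
    """Bytes needed for the boolean comparison array (1 byte per element)."""
    n_windows = np.prod(np.subtract(field_shape, match_shape) + 1)
    return int(n_windows * np.prod(match_shape))

print(comparison_bytes((9, 9), (3, 4)))        # 504
print(comparison_bytes((256, 256), (16, 16)))  # 14868736
```

So a 256x256 field with a 16x16 pattern already needs roughly 15 MB for the intermediate comparison, which is why the footnote warns about very large arrays.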

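【Discussion】:

【Solution 5】:

One solution is to search the entire search_in array a block at a time (a 'block' being a search_for-shaped slice) until either a matching segment is found or the search_in array is exhausted. I can use it to get the coordinates of the matching block, or just a bool result, by passing True or False for the optional return_coords argument...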

def seek_array(search_in, search_for, return_coords = False):
    """Searches for a contiguous instance of a 2d array `search_for` within a larger `search_in` 2d array.
If the optional argument return_coords is True, the xy coordinates of the zeroeth value of the first matching segment of search_in will be returned, or None if there is no matching segment.
If return_coords is False, a boolean will be returned.
 * Both arrays must be sent as two-dimensional!"""
    si_x, si_y = search_in.shape
    sf_x, sf_y = search_for.shape

    for y in range(si_y-sf_y+1):       # range instead of xrange so this also runs on Python 3
        for x in range(si_x-sf_x+1):
            if numpy.array_equal(search_for, search_in[x:x+sf_x, y:y+sf_y]):
                return (x,y) if return_coords else True  # don't forget that coordinates are transposed when viewing NumPy arrays!
    return None if return_coords else False

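I wonder whether NumPy doesn't already have a function that can do the same thing, though...

【Discussion】:

This is an answer to the question, because the code functionally does what I need it to do, though perhaps there is a more efficient way. As a rule, the policy is against putting the answer in the question itself, isn't it? I mean, that is precisely why we have the option of answering our own questions right away, I thought...
I probably should have chosen a different way to open the answer post, though. I'll edit it now.
I just feel that how-to questions suit SO's format better than better-way questions, because subjective words like "better" tend to invite trouble in the form of differing opinions. "How can I -?" questions ask only for a method; to that end I provided one solution, in the hope that others would provide more and better ones. That's all.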