Compare arrays, find same elements and return index (answers)

【Question title】: Compare arrays, find same elements and return index
【Posted】: 2017-05-26 23:09:56
【Question】:

I have two numpy arrays (of different lengths).

The first one has shape (n), like:

a = [0, 1, 2, 5, 6, 7]

The second one has shape (n, 3), like:

b = [[0, 1, 3],[8, 3, 9],[9, 8, 4],[0, 4, 5],[1, 7, 3],[1, 5, 7],[2, 3, 7],[4, 2, 6],[5, 4, 6],[5, 6, 7]]

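Now I want to check whether each column of the second array contains any number from the first array and, if so, return the index of that column.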

b[0] -> [0, 1, 3] contains 0 and 1 so I need that index (only once)
b[1] -> [8, 3, 9] does not contain any of the numbers from a, so I don't need that index

The result should be an array containing all of these indices; in this example, something like:

indexes = [0, 3, 4, 5....]

Is there a way to check this? Processing speed is not an issue!

【Question discussion】:

How do we get 2 in the output indexes? Looks like a typo; it should be 3.
Sorry, you're right! I'll edit it.

Tags: python numpy

【Solution 1】:

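You can use np.in1d to get a mask of matches. np.in1d flattens its inputs before processing, so afterwards we need to reshape the result back to 2D, check each row for any match, and get the matching row indices with np.flatnonzero.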

So the implementation would be:

np.flatnonzero(np.in1d(b,a).reshape(b.shape).any(1))
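On NumPy 1.13 and later, np.isin is an alternative worth knowing: unlike np.in1d, it returns a mask with the same shape as its first argument, so the reshape step is not needed (NumPy 2.0 also deprecates np.in1d in favor of np.isin). A minimal sketch, assuming a and b are the sample arrays from the question:

```python
import numpy as np

a = np.array([0, 1, 2, 5, 6, 7])
b = np.array([[0, 1, 3], [8, 3, 9], [9, 8, 4], [0, 4, 5], [1, 7, 3],
              [1, 5, 7], [2, 3, 7], [4, 2, 6], [5, 4, 6], [5, 6, 7]])

# np.isin returns a boolean mask with the same shape as b, so no reshape is needed
mask = np.isin(b, a)
# a row qualifies if any of its entries appears in a
rows = np.flatnonzero(mask.any(axis=1))
print(rows)  # [0 3 4 5 6 7 8 9]
```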

Sample run on the given data, showing the intermediate and final outputs:

In [143]: a
Out[143]: array([0, 1, 2, 5, 6, 7])

In [144]: b
Out[144]: 
array([[0, 1, 3],
       [8, 3, 9],
       [9, 8, 4],
       [0, 4, 5],
       [1, 7, 3],
       [1, 5, 7],
       [2, 3, 7],
       [4, 2, 6],
       [5, 4, 6],
       [5, 6, 7]])

In [145]: np.in1d(b,a).reshape(b.shape)
Out[145]: 
array([[ True,  True, False],
       [False, False, False],
       [False, False, False],
       [ True, False,  True],
       [ True,  True, False],
       [ True,  True,  True],
       [ True, False,  True],
       [False,  True,  True],
       [ True, False,  True],
       [ True,  True,  True]], dtype=bool)

In [146]: np.in1d(b,a).reshape(b.shape).any(1)
Out[146]: array([ True, False, False,  True,  True, 
                    True,  True,  True,  True,  True], dtype=bool)

In [147]: np.flatnonzero(np.in1d(b,a).reshape(b.shape).any(1))
Out[147]: array([0, 3, 4, 5, 6, 7, 8, 9])

【Discussion】:

【Solution 2】:

The list b is interpreted as a matrix whose sub-lists are rows, not columns.

That said, given the example you provided, I assume what you actually want is to find matches in the rows of b. We would then proceed as follows:

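Check whether any number in a matches a number contained in a given sub-list of b.
Build an array whose elements are the indices of the sub-lists of b that contain at least one number from a.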

I would use standard Python 3 syntax with lists, and then convert the result to an array with numpy's asarray function:

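import numpy as np

def matches(a, b):
    indexes = []  # renamed from `list` to avoid shadowing the built-in
    for i in range(len(b)):
        for j in range(len(b[i])):
            if b[i][j] in a:
                # row i has at least one number from a: record it once, then stop scanning the row
                indexes.append(i)
                break
    arrayIndexes = np.asarray(indexes)
    return arrayIndexes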

print(matches([0,1,2,5,6,7],
         [[0,1,3],
          [8,3,9],
          [9,8,4],
          [0,4,5],
          [1,7,3],
          [1,5,7],
          [2,3,7],
          [4,2,6],
          [5,4,6],
          [5,6,7]]))

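The function returns a numpy array of indices (named arrayIndexes inside the function) with the following contents; print() displays it as [0 3 4 5 6 7 8 9]: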

array([0,3,4,5,6,7,8,9])

【讨论】：

【Solution 3】:

Using in1d:

import numpy as np

a = [0, 1, 2, 5, 6, 7]
b = [[0, 1, 3],[8, 3, 9],[9, 8, 4],[0, 4, 5],[1, 7, 3],[1, 5, 7],[2, 3, 7],[4, 2, 6],[5, 4, 6],[5, 6, 7]]
a, b = map(np.array, (a, b))
np.where(np.in1d(b, a).reshape(b.shape).any(axis=1))[0]

【Discussion】:

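Isn't this basically the same as the other post using np.in1d? flatnonzero works only on 1D input, and np.where returns a tuple of index arrays (one per dimension), so you take the first one here. :) Also, the OP already has the inputs as arrays, so those mappings don't seem necessary.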
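To illustrate the point in that comment: for a 1-D boolean array, np.where(cond) returns a tuple holding a single index array, so taking [0] gives the same indices as np.flatnonzero. A small check, using the row mask from Solution 1's sample run as hard-coded input:

```python
import numpy as np

# Row mask copied from Solution 1's sample run (Out[146])
row_has_match = np.array([True, False, False, True, True,
                          True, True, True, True, True])

# For 1-D input, np.where(cond) returns a 1-tuple containing one index array
via_where = np.where(row_has_match)
idx_where = via_where[0]
# np.flatnonzero returns the index array directly (it flattens its input first)
idx_flat = np.flatnonzero(row_has_match)
print(len(via_where), idx_where, idx_flat)
```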

【Solution 4】:

With a simple (but potentially large) broadcast comparison:

In [14]: mask=a==b[:,:,None]
In [15]: mask.shape
Out[15]: (10, 3, 6)
In [16]: mask.any(axis=2)    # any match over elements of a
Out[16]: 
array([[ True,  True, False],
       [False, False, False],
       [False, False, False],
       [ True, False,  True],
       [ True,  True, False],
       [ True,  True,  True],
       [ True, False,  True],
       [False,  True,  True],
       [ True, False,  True],
       [ True,  True,  True]], dtype=bool)
In [17]: mask.any(axis=2).any(axis=1)   # and a match in any column
Out[17]: array([ True, False, False,  True,  True,  True,  True,  True,  True,  True], dtype=bool)
In [18]: np.where(mask.any(axis=2).any(axis=1))
Out[18]: (array([0, 3, 4, 5, 6, 7, 8, 9], dtype=int32),)
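Because the broadcast mask has shape (len(b), ncols, len(a)), it can get large for big inputs. A minimal sketch, assuming memory is the concern, that processes b in blocks of rows so the temporary boolean array stays bounded; the function name rows_with_match_chunked and its chunk parameter are hypothetical, not part of NumPy:

```python
import numpy as np

def rows_with_match_chunked(a, b, chunk=1024):
    """Indices of rows of b that contain at least one element of a (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    hits = []
    for start in range(0, len(b), chunk):
        block = b[start:start + chunk]
        # Temporary has shape (rows in block, ncols, len(a)): at most chunk * ncols * len(a) booleans
        row_hit = (block[:, :, None] == a).any(axis=(1, 2))
        # Shift block-local row indices back to positions in the full b
        hits.append(np.flatnonzero(row_hit) + start)
    return np.concatenate(hits) if hits else np.array([], dtype=np.intp)

a = [0, 1, 2, 5, 6, 7]
b = [[0, 1, 3], [8, 3, 9], [9, 8, 4], [0, 4, 5], [1, 7, 3],
     [1, 5, 7], [2, 3, 7], [4, 2, 6], [5, 4, 6], [5, 6, 7]]
print(rows_with_match_chunked(a, b, chunk=3))  # [0 3 4 5 6 7 8 9]
```

The chunk size trades peak memory against the per-block Python overhead; with chunk at least len(b) this reduces to the single broadcast shown above.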

in1d has several modes depending on the relative sizes of the two inputs, but I think one of them does the equivalent thing.

In [26]: timeit np.where(np.any(a==b[:,:,None], axis=(1,2)))[0]
The slowest run took 4.19 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 18.3 µs per loop
In [27]: timeit np.flatnonzero(np.in1d(b,a).reshape(b.shape).any(1))
10000 loops, best of 3: 38.2 µs per loop
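Timings like these depend on the input sizes and the NumPy version, so it is worth confirming that the two approaches agree before comparing them on your own data. A sketch under that assumption, using random integer data (the sizes and seed are arbitrary choices for illustration) and np.isin in place of the reshaped np.in1d:

```python
import timeit
import numpy as np

def via_isin(a, b):
    # membership mask keeps b's shape; keep rows with any hit
    return np.flatnonzero(np.isin(b, a).any(axis=1))

def via_broadcast(a, b):
    # (n, ncols, len(a)) comparison, collapsed over columns and elements of a
    return np.flatnonzero((b[:, :, None] == a).any(axis=(1, 2)))

rng = np.random.default_rng(0)
a = rng.integers(0, 50, size=6)
b = rng.integers(0, 50, size=(1000, 3))

# Both methods must return identical row indices before timing means anything
assert np.array_equal(via_isin(a, b), via_broadcast(a, b))

for fn in (via_isin, via_broadcast):
    seconds = timeit.timeit(lambda: fn(a, b), number=200)
    print(f"{fn.__name__}: {seconds / 200 * 1e6:.1f} µs per call")
```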

【Discussion】:

【Solution 5】:

You can do this simply with a list comprehension and any(), as in the following example:

a = [0, 1, 2, 5, 6, 7]
b = [[0, 1, 3],[8, 3, 9],[9, 8, 4],[0, 4, 5],[1, 7, 3],[1, 5, 7],[2, 3, 7],[4, 2, 6],[5, 4, 6],[5, 6, 7]]

final = [k for k in range(len(b)) if any(j in b[k] for j in a)]
print(final)

Output:

[0, 3, 4, 5, 6, 7, 8, 9]
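The generator `j in b[k] for j in a` scans the row once per element of a. If a is long, one possible variation (not part of the original answer) is to build a set from a once and use set.isdisjoint, which returns as soon as it finds a shared element:

```python
a = [0, 1, 2, 5, 6, 7]
b = [[0, 1, 3], [8, 3, 9], [9, 8, 4], [0, 4, 5], [1, 7, 3],
     [1, 5, 7], [2, 3, 7], [4, 2, 6], [5, 4, 6], [5, 6, 7]]

lookup = set(a)  # built once; average O(1) membership tests
# A row qualifies when it is NOT disjoint from a, i.e. it shares at least one element
final = [k for k, row in enumerate(b) if not lookup.isdisjoint(row)]
print(final)  # [0, 3, 4, 5, 6, 7, 8, 9]
```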

【Discussion】: