Indices in a numpy array where slice in another array

【Question title】: Indices in a numpy array where slice in another array
【Posted】: 2014-03-22 21:37:39
【Question】:

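The actual problem comes up in a machine learning application where the data gets a bit complicated. So here is an MWE that captures the essence of the problem:

I have two arrays, as follows:

import numpy as np
L = np.arange(12).reshape(4,3)
M = np.arange(12).reshape(6,2)

Now I want to find the rows R in L for which some row in M consists of all elements of R except the last one.

With the example code above, L and M look like this: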

array([[ 0,  1,  2],  # L
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

array([[ 0,  1],  # M
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])

These are the rows of L that I want, as a numpy array:

array([[ 0,  1,  2],
       [ 6,  7,  8]])

If I represented L and M as Python lists, I would do this:

L = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
M = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11]]
answer = [R for R in L if R[:-1] in M]

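Now, I know I could use a similar list comprehension with numpy and convert the result to an array, but numpy being as awesome as it is, there is probably a more elegant way to do this that I don't know about.

I tried looking at np.where (to get the required indices, which I could then use to subscript L), but it does not seem to do what I need.

Thanks for your help.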

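【Comments】:

If a row of M contains the first two elements of a row of L, it cannot contain the last element, because it has no room for it. Does that also happen in your actual application?
@user2357112: Absolutely right. That is why I am testing for "some row in M that consists of all elements of R except the last one". The last element of a row in L is an extra dimension added by some further computation.
When I see questions like this, I wish NumPy's set operations were not all one-dimensional. I think you could put something together with a few sort and in1d calls. The idea is still pretty vague, though.
@user2357112: That is beyond my numpy-fu. I would appreciate an example.

Tags: python arrays numpy python-3.3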

【Solution 1】:

OK, I think I've got it. The trick is to add another dimension to M, and then you can use broadcasting:

M.shape += (1,)
E = np.all(L[:,:-1].T == M, 1)

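You get a 6x4 boolean matrix E that holds the result of comparing every row of M with every row of L (last column excluded): E[i, j] is True when row i of M equals the first two entries of row j of L.

From here it is easy to finish: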

result = L[np.any(E,0)]
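
For readers who want to run the whole thing, here is a minimal self-contained sketch of the same broadcasting idea. One deliberate difference from the snippet above, and an assumption on my part about what is convenient: it indexes with `M[:, :, None]` instead of assigning to `M.shape`, so M itself is left unchanged.

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

# M[:, :, None] has shape (6, 2, 1); L[:, :-1].T has shape (2, 4).
# Broadcasting the comparison gives a boolean array of shape (6, 2, 4).
eq = M[:, :, None] == L[:, :-1].T

# Collapse the "column" axis: E[i, j] is True when row i of M
# equals the first two entries of row j of L.
E = eq.all(axis=1)

# Keep each row of L that matches at least one row of M.
result = L[E.any(axis=0)]
print(result)
```

The temporary array holds len(M) * 2 * len(L) booleans, so for very large inputs the memory cost of the broadcast may matter.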

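Solving it this way, you don't need any lambda functions or "implicit loops" (e.g. np.apply_along_axis()).

Yes, numpy vectorization is beautiful (but sometimes you have to think very abstractly)...

【Comments】:

Either I don't understand it or it doesn't work. Could you implement the simple completion to show me how to get the desired values?
OK, I've made it explicit. I usually try not to give answers that people can just copy and paste.
Great answer! I updated mine with a detailed explanation of the array broadcasting magic. Thanks for increasing my numpy-fu! :)

【Solution 2】:

Very similar to Bitwise's answer: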

def fn(a):
    return lambda b: np.all(a==b, axis=1)
matches = np.apply_along_axis(fn(M), 1, L[:,:2])
result = L[np.any(matches, axis=1)]

Here is what happens behind the scenes (I will use Bitwise's example, which is easier to demonstrate):

>>> M
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
>>> M.shape+=(1,)
>>> M
array([[[ 0],
        [ 1]],

       [[ 2],
        [ 3]],

       [[ 4],
        [ 5]],

       [[ 6],
        [ 7]],

       [[ 8],
        [ 9]],

       [[10],
        [11]]])

Here we added another dimension to the M array, which now has shape (6,2,1).

>>> L2 = L[:,:-1].T

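Then we drop the last column and transpose the array, so that its shape is (2,4).

The magic is that M and L2 can now be broadcast to an array of shape (6,2,4).

As numpy's documentation states:

A set of arrays is called "broadcastable" to the same shape if the above rules produce a valid result, i.e., one of the following is true:
1. The arrays all have exactly the same shape.
2. The arrays all have the same number of dimensions and the length of each dimensions is either a common length or 1.
3. The arrays that have too few dimensions can have their shapes prepended with a dimension of length 1 to satisfy property 2.
Example

If a.shape is (5,1), b.shape is (1,6), c.shape is (6,) and d.shape is () so that d is a scalar, then a, b, c, and d are all broadcastable to dimension (5,6); and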
a acts like a (5,6) array where a[:,0] is broadcast to the other columns,
b acts like a (5,6) array where b[0,:] is broadcast to the other rows,
c acts like a (1,6) array and therefore like a (5,6) array where c[:] is broadcast to every row, and finally,
d acts like a (5,6) array where the single value is repeated.
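
The shapes in the quoted example, and the (6,2,1) against (2,4) case from this answer, can be checked without building the broadcast arrays. This is just a small sketch; the zero-filled arrays and the names `M3` and `L2` are placeholders chosen for illustration.

```python
import numpy as np

# The four operands from the documentation example.
a = np.zeros((5, 1))
b = np.zeros((1, 6))
c = np.zeros(6)
d = np.array(0.0)  # zero-dimensional, i.e. shape ()

# np.broadcast only computes the common shape; no data is copied.
doc_shape = np.broadcast(a, b, c, d).shape
print(doc_shape)

# The arrays used in this answer: M with an extra trailing axis,
# and L without its last column, transposed.
M3 = np.arange(12).reshape(6, 2, 1)
L2 = np.arange(12).reshape(4, 3)[:, :-1].T
answer_shape = np.broadcast(M3, L2).shape
print(answer_shape)
```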

M[:,:,0] will be repeated 4 times to fill the third dimension, and L2 will get a new leading dimension and be repeated 6 times to fill it.

>>> B = np.broadcast_arrays(L2,M)
>>> B
[array([[[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]]]),


array([[[ 0,  0,  0,  0],
        [ 1,  1,  1,  1]],

       [[ 2,  2,  2,  2],
        [ 3,  3,  3,  3]],

       [[ 4,  4,  4,  4],
        [ 5,  5,  5,  5]],

       [[ 6,  6,  6,  6],
        [ 7,  7,  7,  7]],

       [[ 8,  8,  8,  8],
        [ 9,  9,  9,  9]],

       [[10, 10, 10, 10],
        [11, 11, 11, 11]]])]

We can now compare them element by element:

>>> np.equal(*B)
array([[[ True, False, False, False],
        [ True, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False]],

       [[False, False,  True, False],
        [False, False,  True, False]],

       [[False, False, False, False],
        [False, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False]]], dtype=bool)

Row against row (axis=1):

>>> np.all(np.equal(*B), axis=1)
array([[ True, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False,  True, False],
       [False, False, False, False],
       [False, False, False, False]], dtype=bool)

Aggregate over the rows of M (axis=0), which leaves one value per row of L:

>>> C = np.any(np.all(np.equal(*B), axis=1), axis=0)
>>> C
array([ True, False,  True, False], dtype=bool)

This gives you the boolean mask to apply to L.

>>> L[C]
array([[0, 1, 2],
       [6, 7, 8]])
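
Since the question title asks for indices rather than rows, it may help to note that the same boolean mask can be turned into integer positions. A minimal sketch, recomputing the mask so it runs on its own (the names `mask` and `idx` are mine):

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

# Same mask as C above, built in one expression.
mask = (M[:, :, None] == L[:, :-1].T).all(axis=1).any(axis=0)

# Positions of the True entries, i.e. the indices of the matching rows of L.
idx = np.flatnonzero(mask)
print(idx)
print(L[idx])
```

Indexing with `idx` or with `mask` selects the same rows; the integer form is useful when the positions themselves are needed later.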

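apply_along_axis takes advantage of the same mechanism, but it reduces the dimension of L instead of increasing the dimension of M (and thereby adds an implicit loop).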
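
To make the "implicit loop" concrete, the sketch below compares the apply_along_axis call with an explicit Python loop over the rows of L. The helper name `rows_of_M_equal_to` is hypothetical and only serves the illustration.

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

def rows_of_M_equal_to(prefix):
    # prefix is a 1-D array of length 2; the result has one boolean
    # per row of M, True where that row equals prefix.
    return np.all(M == prefix, axis=1)

# apply_along_axis calls the function once per row of L[:, :-1] ...
via_apply = np.apply_along_axis(rows_of_M_equal_to, 1, L[:, :-1])

# ... which produces the same values as looping over those rows by hand.
via_loop = np.array([rows_of_M_equal_to(row) for row in L[:, :-1]])

result = L[via_apply.any(axis=1)]
print(result)
```

Here the match matrix has shape (4, 6), one row per row of L, which is why the final reduction uses axis=1 instead of axis=0.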

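【Comments】:

Thanks for such a detailed post. I'm going to go with behzad.nouri's answer, simply because it is simpler. But I really like your detail and that you taught me something more. I'm bookmarking this for future reference.

【Solution 3】: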

>>> import hashlib
>>> fn = lambda xs: hashlib.sha1(xs).hexdigest()
>>> m = np.apply_along_axis(fn, 1, M)
>>> l = np.apply_along_axis(fn, 1, L[:,:-1])
>>> L[np.in1d(l, m)]
array([[0, 1, 2],
       [6, 7, 8]])

【Comments】:

Could you add an explanation? I don't fully understand the solution.
@inspectorG4dget Since in1d only works on 1-D arrays, I hash the rows and then apply in1d.
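
The idea of reducing each row to a single comparable key can also be sketched without hashlib, using the raw bytes of each row as the key and a Python set for the lookup. This is a variation on the answer, not the answerer's code, and it assumes L and M share the same dtype (otherwise equal values would produce different bytes).

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

# One bytes object per row of M; set lookup is then a 1-D membership test.
m_keys = {row.tobytes() for row in M}

# Test the first two entries of each row of L against those keys.
mask = np.array([row.tobytes() in m_keys for row in L[:, :-1]])

result = L[mask]
print(result)
```

Note also that in recent NumPy releases np.isin is the recommended spelling of np.in1d.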

【Solution 4】:

>>> # compare as Python lists so that `in` tests whole rows
>>> print(np.array([row for row in L if row[:-1].tolist() in M.tolist()]))
[[0 1 2]
 [6 7 8]]
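
A caveat worth spelling out: a membership test written directly as `row[:-1] in M` on NumPy arrays does not compare whole rows. As far as I understand it, it behaves like `(M == row[:-1]).any()`, so a single matching element is enough. It gives the expected result on the example data, but the small sketch below (the `probe` row is made up for illustration) shows where it goes wrong and two stricter alternatives.

```python
import numpy as np

M = np.arange(12).reshape(6, 2)

# A made-up prefix that is NOT a row of M, but shares its first element with row [0, 1].
probe = np.array([0, 99])

# Element-wise semantics: True because M[0, 0] == 0.
loose = bool(probe in M)

# Whole-row comparison: every element of some row must match.
strict = bool((M == probe).all(axis=1).any())

# Same test on plain Python lists, where `in` compares whole sublists.
as_lists = probe.tolist() in M.tolist()

print(loose, strict, as_lists)
```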

【Comments】: