Find the first non-zero row in numpy: answers

【Question title】: Find first non-zero row in numpy
【Posted】: 2019-07-07 22:51:17
【Question】:

Suppose we have an array like a and want to find the first non-zero row in it. a can be large, e.g. a single-channel image.

a = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0], [2, 3, 2]])

array([[0, 0, 0],
       [0, 0, 0],
       [0, 1, 0],
       [2, 3, 2]])

What is the fastest and most elegant way to do this in numpy?

Right now I am doing this:

row_idx = np.argmin(np.sum(a, axis=1)==0)
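One thing worth checking about the sum-based test: a row whose entries cancel out sums to zero even though it is not a zero row. The sketch below illustrates this with a made-up array `b` (not from the question) and compares it with an element-wise test.

```python
import numpy as np

# Hypothetical input: row 1 is non-zero, but its entries sum to 0.
b = np.array([[0, 0],
              [-1, 1],
              [2, 0]])

# The question's approach: first row whose sum is not 0.
by_sum = np.argmin(np.sum(b, axis=1) == 0)

# Element-wise alternative: first row containing any non-zero entry.
by_any = (b != 0).any(axis=1).argmax()

print(by_sum, by_any)
```

For non-negative data such as image intensities the two agree, provided the row sums do not overflow the dtype.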

【Question comments】:

a.any(1).argmax(). Note that np.nan counts as non-zero.
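A minimal sketch of the behaviour this comment describes, using small made-up arrays. It also shows that `argmax` returns 0 for an all-zero input, which cannot be told apart from "row 0 is non-zero" without an extra check; the helper name `first_nonzero_row` is only illustrative.

```python
import numpy as np

a = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0], [2, 3, 2]])
idx = a.any(axis=1).argmax()          # first row with any truthy entry

# NaN is truthy, so a row containing only NaN and zeros counts as non-zero.
c = np.array([[0.0, 0.0], [np.nan, 0.0], [1.0, 0.0]])
idx_nan = c.any(axis=1).argmax()

def first_nonzero_row(x):
    """Return the first non-zero row index, or None if every row is zero."""
    mask = x.any(axis=1)
    return int(mask.argmax()) if mask.any() else None

all_zero = np.zeros((3, 2))
ambiguous = all_zero.any(axis=1).argmax()   # 0, although no row is non-zero
guarded = first_nonzero_row(all_zero)       # None
```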

Tags: python numpy

【Solution 1】:

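Here is a method (pp below) that is very fast, but only for contiguous arrays. It view-casts the array to bool and takes advantage of short-circuiting. In the comparison below I took the liberty of fixing the other answers so that they handle all-zero input correctly.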
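To make the view-casting idea concrete, here is a small sketch on the question's array. Viewing the flattened buffer as bool yields one bool per byte without copying, so the position of the first True is a byte offset that has to be divided by the number of bytes per row. The second part is a caveat that, as far as I can tell, follows from working on raw bytes: `-0.0` has its sign bit set and is therefore treated as non-zero. The arrays `a` and `z` are illustrative only.

```python
import numpy as np

a = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0], [2, 3, 2]])

# Reinterpret the raw buffer as one bool per byte (a view, not a copy).
flat_bytes = a.ravel().view(bool)
bytes_per_row = a.shape[1] * a.itemsize

first_byte = flat_bytes.argmax()      # offset of the first non-zero byte
row = first_byte // bytes_per_row     # convert byte offset to row index

# Caveat: negative zero compares equal to 0 but has a non-zero byte.
z = np.array([[0.0, 0.0], [-0.0, 0.0]])
z_row_bytes = z.ravel().view(bool).argmax() // (z.shape[1] * z.itemsize)
z_has_nonzero = bool(z.any())
```

Because only the row index is derived from the byte offset, the result does not depend on the machine's byte order.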

Results (microseconds per call, lower is better):

                                pp    galaxyan  WeNYoBen1  WeNYoBen2
contiguous small sparse   1.863220    1.465050   3.522510   4.861850
           large dense    2.086379  865.158230  68.337360  42.832701
                 medium   2.136710  726.706850  71.640330  43.047541
                 sparse  11.146050  694.993751  71.333189  42.406949
non cont.  small sparse   1.683651    1.516769   3.193740   4.017490
           large dense   55.097940  433.429850  64.628370  72.984670
                 medium  60.434350  397.200490  67.545200  51.276210
                 sparse  61.433990  387.847329  67.141630  45.788040

Code:

import numpy as np

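# 'pp' column: view-cast to bool; returns a.shape[0] for all-zero input.
def first_nz_row(a):
    if a.flags.c_contiguous:
        # Fast path: reinterpret the raw buffer as one bool per byte (no copy).
        b = a.ravel().view(bool)
        res = b.argmax()
        # res is a byte offset, so divide by the number of bytes per row.
        return res // (a.shape[1]*a.itemsize) if res or b[res] else a.shape[0]
    else:
        # Fallback: astype(bool) copies the data, one bool per element.
        b = a.astype(bool).ravel()
        res = b.argmax()
        return res // a.shape[1] if res or b[res] else a.shape[0]

# 'galaxyan' column: np.nonzero, patched to handle all-zero input.
def use_nz(a):
    b = np.nonzero(a)[0]
    return b[0] if b.size else a.shape[0]

# 'WeNYoBen1' column: any/argmax, patched to handle all-zero input.
def any_max(a):
    b = a.any(1)
    res = b.argmax()
    return res if res or b[res] else a.shape[0]

# 'WeNYoBen2' column: max/argmax, patched to handle all-zero input.
def max_max(a):
    b = a.max(1).astype(bool)
    res = b.argmax()
    return res if res or b[res] else a.shape[0]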

from timeit import timeit


A = [np.random.uniform(-R, 1, (N,M)).clip(0,None)
     for R,N,M in [[100,2,2], [10,100,1000], [1000,100,1000], [10000,100,1000]]]
t = 10000*np.array(
    [[timeit(f, number=100) for f in (lambda: first_nz_row(a),
                                      lambda: use_nz(a),
                                      lambda: any_max(a),
                                      lambda: max_max(a))]
     for a in A] +
    [[timeit(f, number=100) for f in (lambda: first_nz_row(a),
                                      lambda: use_nz(a),
                                      lambda: any_max(a),
                                      lambda: max_max(a))]
     for a in [a[:,::2] for a in A]])

import pandas as pd
index = "dense medium sparse".split()
index = pd.MultiIndex([['contiguous', 'non cont.'], ['small', 'large'], index], [np.repeat((0,1),4), np.repeat((0,1,0,1,),(1,3,1,3)), np.r_[2, :3, 2, :3]])
t = pd.DataFrame(t, columns="pp galaxyan WeNYoBen1 WeNYoBen2".split(), index=index)
print(t)
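The benchmark arrays are drawn uniformly from [-R, 1) and clipped at 0, so roughly 1/(R+1) of the entries remain non-zero; that is what distinguishes the dense, medium and sparse cases. The sketch below checks this for the three large shapes. The seeded generator and the helper name `make_test_array` are my own additions for reproducibility, not part of the answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, assumption for repeatability

def make_test_array(R, N, M):
    """Uniform samples on [-R, 1) with all negative values clipped to 0."""
    return rng.uniform(-R, 1, (N, M)).clip(0, None)

fractions = {}
for R in (10, 1000, 10000):
    arr = make_test_array(R, 100, 1000)
    fractions[R] = np.count_nonzero(arr) / arr.size
    print(R, fractions[R], 1 / (R + 1))
```

With fewer non-zero entries the first non-zero row tends to appear later, so a short-circuiting search has to scan more data, which is consistent with the sparse timing for pp being the slowest of its contiguous cases.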

【Discussion】:

【Solution 2】:

nonzero finds all the non-zero elements and returns their row and column indices.

np.nonzero(a)[0][0]

2
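For reference, a short sketch of what `np.nonzero` returns for the question's array, and of what happens on an all-zero input, where indexing `[0][0]` raises `IndexError`. The guarded variant mirrors the fix used in the benchmark of Solution 1; the name `first_row_nonzero` is illustrative.

```python
import numpy as np

a = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0], [2, 3, 2]])

# One index array per axis, ordered row by row.
rows, cols = np.nonzero(a)
first = rows[0]

def first_row_nonzero(x):
    """First non-zero row index, or x.shape[0] if the array is all zero."""
    r = np.nonzero(x)[0]
    return int(r[0]) if r.size else x.shape[0]

try:
    np.nonzero(np.zeros((2, 2)))[0][0]
    raised = False
except IndexError:
    raised = True
```

Note that `np.nonzero` locates every non-zero element before the first one is picked, so it cannot stop early.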

【Discussion】:

【Solution 3】:

What I would do:

a.any(1).argmax()
2

Or

a.max(1).astype(bool).argmax()
2
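The two variants agree on the question's array, but they can differ when negative values are present: a row whose largest entry is 0 looks like a zero row to the `max` variant. A small sketch with a made-up array `d`:

```python
import numpy as np

# Hypothetical input: row 1 is non-zero, but its maximum is 0.
d = np.array([[0, 0],
              [-1, 0],
              [0, 2]])

via_any = d.any(axis=1).argmax()
via_max = d.max(axis=1).astype(bool).argmax()

print(via_any, via_max)
```

For non-negative data such as image intensities both give the same answer.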

【Discussion】: