numpy: accumulate 'greater' operation (answers)

【Question title】: numpy: accumulate 'greater' operation
【Posted】: 2021-08-21 01:50:21
【Question description】:

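I am trying to write a function that detects all rising edges, that is, the indices in a vector where the value crosses above a certain threshold. Something similar is described here: Python rising/falling edge oscilloscope-like trigger, but I want to add hysteresis, so that the trigger does not fire again unless the value has first dropped below another limit.

I came up with the following code: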

import numpy as np

arr = np.linspace(-10, 10, 60)
sample_values = np.sin(arr) + 0.6 * np.sin(arr*3)

above_trigger = sample_values > 0.6
below_deadband = sample_values < 0.0
combined = 1 * above_trigger - 1 * below_deadband

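Now the combined array holds 1 where the original value is above the upper limit, -1 where it is below the lower limit, and 0 where it lies in between: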

>>> combined
array([ 1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  0,  1,  1,  1,  0,  0,
        1,  1,  1,  0, -1, -1, -1, -1, -1, -1, -1, -1, -1,  0,  1,  1,  1,
        0,  0,  1,  1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,  1,
        1,  0,  0,  1,  1,  1,  0, -1, -1])

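My idea was to use some clever function to process this vector sequentially and replace every run of zeros with whatever non-zero value preceded it. The problem then boils down to simply finding the positions where the value changes from -1 to 1.

I thought the greater operation would achieve this if used correctly, with -1 encoded as True and 1 encoded as False: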

(True ("-1") > -1) -> True ("-1")
(True ("-1") > 1) -> False ("1")
(True ("-1") > 0) -> True ("-1")
(False ("1") > -1) -> True ("-1")
(False ("1") > 1) -> False ("1")
(False ("1") > 0) -> False ("1")

But the result is not what I expected:

>>> 1 - 2 * np.greater.accumulate(combined)
array([-1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1])

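In this case the greater function does not seem to compare the boolean with the numeric value correctly, even though it works fine on scalars or when applied pairwise: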

>>> np.greater(False, -1)
True
>>> np.greater.outer(False, combined)
array([False, False,  True,  True,  True,  True,  True,  True,  True,
        True,  True, False, False, False, False, False, False, False,
       False, False, False,  True,  True,  True,  True,  True,  True,
        True,  True,  True, False, False, False, False, False, False,
       False, False, False,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False, False, False, False, False,
       False, False, False, False,  True,  True])

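Is this the expected behavior? What am I doing wrong here, and is there any way to work around it?

Alternatively, maybe you can suggest another approach to this problem?

Thanks.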
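
For reference, the fold described in the question can be written out in plain Python. This is an illustrative sketch only: `fold_greater` and `fold_greater_bool_cast` are hypothetical helper names, and the initial state `False` (meaning "1") is an assumption. The second helper casts every element to bool before comparing. It reproduces the shape of the unexpected output above (a single -1 followed by all 1s), which is at least consistent with the inputs being converted to booleans during accumulation. That is a reading of the output, not a confirmed statement about NumPy internals.

```python
from itertools import accumulate


def fold_greater(seq, initial=False):
    """Intended fold: state True encodes -1, state False encodes 1."""
    states = accumulate(seq, lambda state, x: state > x, initial=initial)
    next(states)  # skip the initial state, which is not part of the output
    return [-1 if s else 1 for s in states]


def fold_greater_bool_cast(seq):
    """Same fold, but each element is cast to bool first (assumption)."""
    as_bool = (bool(x) for x in seq)
    states = accumulate(as_bool, lambda state, x: state > x)
    return [-1 if s else 1 for s in states]


sample = [1, 1, -1, -1, 0, 1, 1, 0, 0, 1, 0, -1]
intended = fold_greater(sample)
observed_like = fold_greater_bool_cast(sample)
print(intended)       # [1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, -1]
print(observed_like)  # [-1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Once the state becomes False in the bool-cast variant it can never become True again, because False is never greater than another boolean.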

【Question comments】:

Tags: python numpy numpy-ufunc

【Solution 1】:

I'm not sure what the problem with np.greater.accumulate is (it does indeed not seem to behave as advertised), but the following should work:

import numpy as np

arr = np.linspace(-10, 10, 60)
sample_values = np.sin(arr) + 0.6 * np.sin(arr*3)

above_trigger = sample_values > 0.6
below_deadband = sample_values < 0.0
combined = 1 * above_trigger - 1 * below_deadband

mask = combined != 0
idx = np.where(mask,np.arange(len(mask)),0)
idx = np.maximum.accumulate(idx)
result = combined[idx]

print(f"combined:\n {combined}\n")
print(f"result:\n {result}")

This gives:

combined:
 [ 1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1  0  1  1  1  0  0  1  1  1  0 -1 -1 -1
 -1 -1 -1 -1 -1 -1  0  1  1  1  0  0  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1  1  1  1  0  0  1  1  1  0 -1 -1]

result:
 [ 1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1  1  1 -1 -1 -1
 -1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1  1  1  1  1  1  1  1  1  1 -1 -1]

The indices where the value jumps from -1 to 1 can then be obtained as follows:

np.nonzero(result[1:] > result[:-1])[0] + 1

This gives:

array([12, 31, 49])
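
The index trick above can be seen on a small hand-checkable input. The sketch below wraps the same steps in two helpers; the names `forward_fill_nonzero` and `rising_edges` and the sample array are made up for illustration. One deliberate difference from the answer: edges are detected with `np.diff(filled) == 2`, so a leading 0 followed by 1 is not counted as a jump from -1 to 1.

```python
import numpy as np


def forward_fill_nonzero(combined):
    """Replace each run of zeros with the last non-zero value before it."""
    combined = np.asarray(combined)
    mask = combined != 0
    # Position of every non-zero entry, 0 for the zero entries.
    idx = np.where(mask, np.arange(len(mask)), 0)
    # The running maximum is the position of the most recent non-zero entry.
    idx = np.maximum.accumulate(idx)
    return combined[idx]


def rising_edges(filled):
    """Indices where the filled signal steps directly from -1 to 1."""
    return np.nonzero(np.diff(filled) == 2)[0] + 1


sample = np.array([0, 1, 0, 0, -1, 0, 1, 1, 0])
filled = forward_fill_nonzero(sample)
edges = rising_edges(filled)
print(filled.tolist())  # [0, 1, 1, 1, -1, -1, 1, 1, 1]
print(edges.tolist())   # [6]
```

Leading zeros stay zero because index 0 is used as the fallback position, so nothing is carried into them.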

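【Comments】:

【Solution 2】:

I've been working on a package called ufunclab that includes a function, hysteresis_relay, that does what you need. I haven't put it on PyPI, so you'll have to get the source code and build it yourself to use it.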

In [122]: import numpy as np

In [123]: from ufunclab import hysteresis_relay

In [124]: arr = np.linspace(-10, 10, 60)

In [125]: sample_values = np.sin(arr) + 0.6 * np.sin(arr*3)

In [126]: hysteresis_relay(sample_values, 0.0, 0.6, -1, 1, 1).astype(int)
Out[126]: 
array([ 1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,  1,  1,  1,  1,
        1,  1,  1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,  1,  1,
        1,  1,  1,  1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,  1,
        1,  1,  1,  1,  1,  1,  1, -1, -1])
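
For readers who cannot build the package, here is a plain-Python reading of what a hysteresis relay computes. It is a sketch, not ufunclab's implementation: the function name `hysteresis_relay_sketch`, the parameter names, and the strict `<` / `>` boundary handling are all assumptions. Only the argument order mirrors the call shown above.

```python
def hysteresis_relay_sketch(values, low_threshold, high_threshold,
                            low_value, high_value, init):
    """Switch to low_value below the low threshold, to high_value above the
    high threshold, and otherwise keep the previous state."""
    out = []
    state = init  # state used until the first threshold crossing
    for v in values:
        if v < low_threshold:
            state = low_value
        elif v > high_threshold:
            state = high_value
        # Between the thresholds the state is left unchanged.
        out.append(state)
    return out


sample = [0.7, 0.3, -0.1, 0.3, 0.5, 0.65, 0.2]
relay = hysteresis_relay_sketch(sample, 0.0, 0.6, -1, 1, 1)
print(relay)  # [1, 1, -1, -1, -1, 1, 1]
```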

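Another option is to use Pandas (though I suspect @bb1's answer is more efficient than this, and it avoids depending on another library).

Convert combined to a Pandas Series.
Replace the 0s in the Series with pd.NA.
Use the fillna() method with method='ffill' to "forward fill" the NA values.
Use the to_numpy() method to convert the Series back to a NumPy array.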

In [107]: combined
Out[107]: 
array([ 1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  0,  1,  1,  1,  0,  0,
        1,  1,  1,  0, -1, -1, -1, -1, -1, -1, -1, -1, -1,  0,  1,  1,  1,
        0,  0,  1,  1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,  1,
        1,  0,  0,  1,  1,  1,  0, -1, -1])

In [108]: import pandas as pd

In [109]: pd.Series(combined).replace(0, pd.NA).fillna(method='ffill').to_numpy()
Out[109]: 
array([ 1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,  1,  1,  1,  1,
        1,  1,  1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,  1,  1,
        1,  1,  1,  1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,  1,
        1,  1,  1,  1,  1,  1,  1, -1, -1])
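
To my knowledge, `fillna(method='ffill')` is deprecated in recent pandas releases (2.1 and later) in favor of `ffill()`. The sketch below shows one way the same steps might be written today. It swaps `replace(0, pd.NA)` for `Series.mask`, and the final `fillna(0).astype(int)` is an added assumption so that leading zeros stay 0 and the result is an integer array. The sample input is made up.

```python
import numpy as np
import pandas as pd

combined = np.array([1, 0, -1, 0, 0, 1, 0])
s = pd.Series(combined)
filled = (
    s.mask(s == 0)   # zeros become NaN (the dtype becomes float)
     .ffill()        # carry the last non-zero value forward
     .fillna(0)      # leading zeros have nothing to carry, so keep them as 0
     .astype(int)
     .to_numpy()
)
print(filled.tolist())  # [1, 1, -1, -1, -1, 1, 1]
```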

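【Comments】:

This is indeed a very useful module, though I ran into problems compiling it with VS2019. It chokes on the complex data types in the vnorm function. If I check out the version from before complex support was added to vnorm, it compiles fine.
Microsoft's Visual Studio is primarily a C++ compiler. Its support for the C standard (C99 in this case) is incomplete.
OK, so is there a recommended way to build it on Windows? I just ran pip install ., and it apparently detected my VS2019 installation and used it.

【Solution 3】:

Here is another simple solution: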

def gen(arr, start=0):
    y = start
    for x in arr:
        if x != 0:
            y = x
        yield y

g = gen(combined)
# set count for performance
np.fromiter(g, dtype=int, count=combined.size)

>>> array([ 1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,  1,  1,  1,  1,
    1,  1,  1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,  1,  1,
    1,  1,  1,  1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,  1,
    1,  1,  1,  1,  1,  1,  1, -1, -1])

You can write a similar generator or loop to detect the jump directly:

p = 0
for i, x in enumerate(combined):
    if x - p == 2:
        print(i)
        break
    if x != 0:
        p = x

combined[i-5:i+1]
>>> 12
>>> array([-1, -1, -1, -1,  0,  1])
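
The loop above stops at the first jump. A generator variant that reports every jump might look like the following sketch; `iter_rising_edges` is a hypothetical name and the sample list is made up for illustration.

```python
def iter_rising_edges(seq):
    """Yield each index where the value becomes 1 after the last
    non-zero value was -1."""
    prev = 0  # last non-zero value seen so far (0 means none yet)
    for i, x in enumerate(seq):
        if x == 1 and prev == -1:
            yield i
        if x != 0:
            prev = x


sample = [1, -1, 0, 1, 1, 0, -1, -1, 0, 0, 1]
jumps = list(iter_rising_edges(sample))
print(jumps)  # [3, 10]
```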

【Comments】:

【Solution 4】:

Thank you all for your answers.

For the record, here are the timing results for the suggested solutions:

import numpy as np
import pandas as pd
import ufunclab

arr = np.linspace(-10, 10, 600)
values = np.sin(arr)+0.6*np.sin(arr*3)


def trigger_using_greater(values):
    # This doesn't give correct results
    combined = 1 * (values > 0.6) - 1 * (values < 0)
    return 1 - 2 * np.greater.accumulate(combined)


def trigger_using_masked_indexes(values):
    combined = 1 * (values > 0.6) - 1 * (values < 0)
    mask = combined != 0
    idx = np.where(mask, np.arange(len(mask)), 0)
    idx = np.maximum.accumulate(idx)
    return combined[idx]


def trigger_using_hysteresis_relay(values):
    result = ufunclab.hysteresis_relay(values, 0.0, 0.6, -1, 1, 1).astype(int)
    return result


def trigger_using_pandas(values):
    combined = 1 * (values > 0.6) - 1 * (values < 0)
    result = pd.Series(combined).replace(0, pd.NA).fillna(method='ffill').to_numpy()
    return result

def gen(arr, start=0):
    y = start
    for x in arr:
        if x != 0:
            y = x
        yield y

def trigger_using_generator(values):
    combined = 1 * (values > 0.6) - 1 * (values < 0)
    g = gen(combined)
    return np.fromiter(g, dtype=int, count=combined.size)

In [8]: %timeit trigger_using_greater(values)
21.9 µs ± 1.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit trigger_using_masked_indexes(values)
26.8 µs ± 563 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [10]: %timeit trigger_using_hysteresis_relay(values)
7.31 µs ± 759 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [11]: %timeit trigger_using_pandas(values)
755 µs ± 63.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [12]: %timeit trigger_using_generator(values)
165 µs ± 3.37 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

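hysteresis_relay is clearly the winner here, at the cost of having to compile the ufunclab package. A very useful package, by the way. Warren, please consider publishing it to PyPI. Ideally, I would like to see at least some of this functionality integrated into SciPy.

The masked-index solution is almost as fast as my original (non-working) solution and requires no external libraries.

The Pandas solution is surprisingly slow, even slower than the plain Python generator.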
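
Outside IPython, a comparison like this can be run with the standard `timeit` module, and it is worth confirming that the implementations agree before timing them. The sketch below covers only the two NumPy-only variants. The function names `fill_masked` and `fill_generator` and the repeat count are arbitrary choices for illustration, and absolute timings will differ from the figures above.

```python
import timeit

import numpy as np

arr = np.linspace(-10, 10, 600)
values = np.sin(arr) + 0.6 * np.sin(arr * 3)


def fill_masked(values):
    combined = 1 * (values > 0.6) - 1 * (values < 0)
    mask = combined != 0
    idx = np.maximum.accumulate(np.where(mask, np.arange(len(mask)), 0))
    return combined[idx]


def fill_generator(values):
    combined = 1 * (values > 0.6) - 1 * (values < 0)

    def gen(seq, start=0):
        y = start
        for x in seq:
            if x != 0:
                y = x
            yield y

    return np.fromiter(gen(combined), dtype=int, count=combined.size)


# Check that both variants produce the same array before timing them.
same = np.array_equal(fill_masked(values), fill_generator(values))
print("results agree:", same)

n = 200  # number of calls per measurement (arbitrary)
for fn in (fill_masked, fill_generator):
    seconds = timeit.timeit(lambda: fn(values), number=n)
    print(f"{fn.__name__}: {seconds / n * 1e6:.1f} µs per call")
```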

【Comments】: