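【Question title】: Efficient way of checking if value within boundaries defined in 2d array
【Posted】: 2017-11-02 09:03:05
【Question】:

I'm trying to write a Python program that takes input data from an eye-tracking device and checks whether it falls within given ranges. The input is a normalized value corresponding to the x position of the gaze. The ranges are always pre-sorted. I need to check whether this position x lies within the bounds of any pair of elements in a 2D array and, if so, run a function. For example: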

x = 0.23 # input variable
boundaries = [[0.0, 0.025], [0.025, 0.1], [0.1, 0.14], [0.15, 0.25]]

for i, pair in enumerate(boundaries):
    if x >= pair[0] and x <= pair[1]:
        print(i) # some function

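Now, the problem is that the input x is real-time data arriving at 60 Hz, and the boundaries can sometimes be a long list (1000 elements), so this approach performs hundreds of thousands of checks per second. What is the most efficient way to do this computation? I thought there might be a nice vectorized version in numpy, but I'm pretty bad at calculus.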

I ran a test to see whether the solution posted in @Meitham's answer makes a noticeable difference for me. With the pure Python approach I got:

import numpy as np

n = 10000
b1 = np.linspace(0.0, 1.0, n)
boundaries = [[b1[i], b1[i] + 0.01] for i in range(n)]
x = 0.23
final = []

for i, pair in enumerate(boundaries):
    if x >= pair[0] and x <= pair[1]:
        final.append(pair)

100000000 loops, best of 3: 0.0121 usec per loop

And with the numpy approach:

import numpy as np

n = 10000
b1 = np.linspace(0.0, 1.0, n)
boundaries = [[b1[i], b1[i] + 0.01] for i in range(n)]
x = 0.23
a = np.array(boundaries)
final = a[(a[...,0] < x) & (a[...,1] > x)]

100000000 loops, best of 3: 0.0122 usec per loop

So I don't see any meaningful difference between the two approaches. Maybe I'm testing it the wrong way?
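A figure of about 0.012 usec per loop is far too small for a scan over 10000 pairs, so it seems plausible that the timed statement was not the loop itself, though the post does not show how the measurement was invoked. The following is a minimal sketch of one way to time both approaches with the standard `timeit` module. The function names `python_loop` and `numpy_mask` and the repeat count are illustrative assumptions, and the comparisons are made inclusive in both versions so that the results match.

```python
import timeit
import numpy as np

n = 10000
b1 = np.linspace(0.0, 1.0, n)
boundaries = [[b1[i], b1[i] + 0.01] for i in range(n)]
a = np.array(boundaries)  # build the array once, outside the timed code
x = 0.23

def python_loop():
    # Pure-Python scan over every pair (hypothetical helper name).
    return [pair for pair in boundaries if pair[0] <= x <= pair[1]]

def numpy_mask():
    # Vectorized comparison over the whole array (hypothetical helper name).
    return a[(a[:, 0] <= x) & (x <= a[:, 1])]

repeats = 20  # arbitrary choice for this sketch
t_loop = timeit.timeit(python_loop, number=repeats) / repeats
t_mask = timeit.timeit(numpy_mask, number=repeats) / repeats
print(f"python loop: {t_loop * 1e6:.1f} usec per call")
print(f"numpy mask:  {t_mask * 1e6:.1f} usec per call")
```

Passing callables to `timeit.timeit` ensures that the work being measured actually runs on every repetition, while the setup (building `boundaries` and `a`) stays outside the timed region.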

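【Comments】:

  • Are these boundary tuples always sorted, i.e. [(x, y)...] where always x
  • @Darien, can you clarify how the boundaries are arranged? Can pairs intersect? From your example, Yn
  • @Meitham, yes, they always satisfy the condition x
  • @IgorKleinerman. The boundary pairs [(x,y), ...] are always start and end points, where x != y is always true, and in fact Yn
  • @Meitham, sorry, I can't edit anymore. They are indeed always sorted, but it is always x != y, so (x

Tags: python arrays numpy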


【Answer 1】:

x = 0.23
boundaries = [[0.0, 0.025], [0.025, 0.1], [0.1, 0.14], [0.15, 0.25]]
filter_list = [item for item in boundaries if x >= item[0] and x <= item[1]]
print(filter_list)

【Comments】:

    【Answer 2】:
    >>> import numpy as np    
    >>> boundaries = [[0.0, 0.025], [0.025, 0.1], [0.1, 0.14], [0.15, 0.25], [1.0, 0.5]]
    >>> x = 0.23
    

    This uses numpy and still works if a boundary pair is reversed (like [1.0, 0.5] above)::

    >>> a = np.array(boundaries)
    >>> (np.min(a, 1) < x) & (x < np.max(a, 1))
    array([False, False, False,  True, False], dtype=bool)
    >>> a[(np.min(a, 1) < x) & (x < np.max(a, 1))]
    array([[ 0.15,  0.25]])
    

    If the boundary points (x, y) are guaranteed to be (x

    >>> a[(a[...,0] < x) & (a[...,1] > x)]
    array([[ 0.15,  0.25]])
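The question runs a function on the index `i` of each matching pair, while the expressions above return the pairs themselves. As a small sketch of one way to recover the indices, `np.flatnonzero` can be applied to the boolean mask. The comparison is made inclusive here to match the question's `x >= pair[0] and x <= pair[1]`, which is an assumption about the intended behavior at the endpoints.

```python
import numpy as np

boundaries = [[0.0, 0.025], [0.025, 0.1], [0.1, 0.14], [0.15, 0.25]]
a = np.array(boundaries)
x = 0.23

# Inclusive mask, mirroring the question's x >= pair[0] and x <= pair[1].
mask = (a[:, 0] <= x) & (x <= a[:, 1])

# Positions of the True entries, i.e. the indices of the matching pairs.
indices = np.flatnonzero(mask)
for i in indices:
    print(i)  # stand-in for "some function"
```

With this input only the last pair matches, so the loop body runs once with index 3.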
    

    A pure Python solution without numpy might look like this::

    >>> [(low, high) for (low, high) in boundaries if low <= x <= high]
    

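    For small sequences the pure Python solution may look faster, as in the example, but numpy shines when the boundary sequence is large, as you describe in the question.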

    >>> %timeit [(low, high) for (low, high) in boundaries if low <= x <= high]
    The slowest run took 26.73 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 419 ns per loop
    
    >>> %timeit a[(a[...,0] < x) & (a[...,1] > x)]
    The slowest run took 26.76 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 3.97 µs per loop
    
    >>> %timeit (np.min(a, 1) < x) & (x < np.max(a, 1))
    The slowest run took 40.57 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 7.22 µs per loop
    

    However, with a larger sequence of just 1000 elements::

    >>> import random
    >>> l = [(random.random(), random.random()) for _ in range(1000)]
    >>> %timeit [(low, high) for (low, high) in l if low <= x <= high]
    The slowest run took 5.41 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000 loops, best of 3: 82 µs per loop
    >>> a = np.array(l)
    >>> %timeit a[(a[...,0] < x) & (a[...,1] > x)]
    The slowest run took 6.01 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 10.6 µs per loop
    

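    【Comments】:

    • While this is a nice solution, there seems to be no performance difference: 100000000 loops, best of 3: 0.012 usec per loop versus 100000000 loops, best of 3: 0.0121 usec per loop for the original approach.
    • @Darien I've updated my answer with performance numbers, since it wasn't clear from your comment which two operations you were measuring. I hope this helps.
    • Thank you. The tests I measured are similar to yours here. I've updated my original post with the details of my tests. I'm not sure whether I'm doing something wrong, but as you can see, I don't seem to get any significant performance difference. What could be the reason?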
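Since the question states that the ranges are always pre-sorted, a binary search is another option that avoids scanning every pair. This is a different technique from the ones posted in the answers. The sketch below uses the standard `bisect` module and assumes the pairs are sorted by their start value and do not overlap except at shared endpoints, as in the question's example. The names `starts` and `find_interval` are hypothetical.

```python
from bisect import bisect_right

boundaries = [[0.0, 0.025], [0.025, 0.1], [0.1, 0.14], [0.15, 0.25]]

# Precompute the sorted start values once, outside the 60 Hz loop.
starts = [low for low, _ in boundaries]

def find_interval(x):
    """Return the index of the pair containing x, or None (hypothetical helper)."""
    # Index of the last pair whose start is <= x.
    i = bisect_right(starts, x) - 1
    if i >= 0 and x <= boundaries[i][1]:
        return i
    return None

print(find_interval(0.23))   # falls inside [0.15, 0.25]
print(find_interval(0.145))  # falls in the gap between 0.14 and 0.15
```

Each lookup takes a number of comparisons that grows logarithmically with the number of pairs. One difference from the original loop: when x sits exactly on a shared endpoint such as 0.025, the loop reports both neighboring pairs, whereas this sketch returns only the later one.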