Fastest way to create strictly increasing lists in Python: answers

【Question title】: Fastest way to create strictly increasing lists in Python
【Posted】: 2017-09-20 12:35:02
【Question description】:

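I would like to know the most efficient way to achieve the following in Python:

Suppose we have two lists a and b of equal length, each containing up to 1e7 elements. For ease of illustration, however, consider the following: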

a = [2, 1, 2, 3, 4, 5, 4, 6, 5, 7, 8, 9, 8,10,11]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15]

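The goal is to create a strictly monotonic list a_new from a, keeping only the first sample point among sample points with identical values. The indices that must be removed from a should also be removed from b, so that the final result is: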

a_new = [2, 3, 4, 5, 6, 7, 8, 9,10,11]
b_new = [1, 4, 5, 6, 8,10,11,12,14,15]

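Of course, this can be done with a computationally expensive for loop, but that is not suitable given the huge volume of data.

Any suggestions are highly appreciated.

【Question comments】:

Thanks a lot for all the contributions, which are all much faster than the classic for-loop approach. I upvoted all the solutions because I really like them and they all work. @piRSquared's solution turned out to be the fastest, so I will accept his answer.

Tags: python python-2.7 pandas numpy scipy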

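【Solution 1】:

You can compute the cumulative maximum of a and then drop the duplicates with np.unique; you can also record the unique indices in order to subset b accordingly: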

import numpy as np

a = np.array([2,1,2,3,4,5,4,6,5,7,8,9,8,10,11])
b = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])

a_cummax = np.maximum.accumulate(a)    
a_new, idx = np.unique(a_cummax, return_index=True)

a_new
# array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

b[idx]
# array([ 1,  4,  5,  6,  8, 10, 11, 12, 14, 15])
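To make the two steps above easier to follow, here is a small sketch that prints the intermediate running maximum for the example data. It only uses the calls already shown in this answer; the variable names are illustrative.

```python
import numpy as np

a = np.array([2, 1, 2, 3, 4, 5, 4, 6, 5, 7, 8, 9, 8, 10, 11])

# Running maximum: non-decreasing, and it repeats a value wherever
# `a` fails to set a new maximum.
a_cummax = np.maximum.accumulate(a)
print(a_cummax.tolist())
# [2, 2, 2, 3, 4, 5, 5, 6, 6, 7, 8, 9, 9, 10, 11]

# return_index=True reports the first position of each distinct value,
# which is exactly the position where a new maximum was reached.
_, idx = np.unique(a_cummax, return_index=True)
print(idx.tolist())
# [0, 3, 4, 5, 7, 9, 10, 11, 13, 14]
```

Indexing both a and b with idx then gives a_new and b_new from the question.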

【Comments】:

【Solution 2】:

Running a version of @juanpa.arrivillaga's function with numba

import numba
import numpy as np

def psi(A):
    a_cummax = np.maximum.accumulate(A)
    a_new, idx = np.unique(a_cummax, return_index=True)
    return idx

def foo(arr):
    aux=np.maximum.accumulate(arr)
    flag = np.concatenate(([True], aux[1:] != aux[:-1]))
    return np.nonzero(flag)[0]

@numba.jit
def f(A):
    m = A[0]
    a_new, idx = [m], [0]
    for i, a in enumerate(A[1:], 1):
        if a > m:
            m = a
            a_new.append(a)
            idx.append(i)
    return idx

Timing

%timeit f(a)
The slowest run took 5.37 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.83 µs per loop

%timeit foo(a)
The slowest run took 9.41 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.35 µs per loop

%timeit psi(a)
The slowest run took 9.66 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.95 µs per loop

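【Comments】:

Wow. That is a significant improvement. I might try it with Cython when I get home.
Could you post the timing results as text rather than as a picture of text? Your answer looks interesting, but I cannot read the results.
@Cornstalks I'm on a plane right now. I can't. However, the results were 1.8 microseconds for numba, 6.3 microseconds for hpaulj's foo, and 9.9 microseconds for Psidom's.
Also, note that this does not take advantage of a single pass the way my original solution does. A[1:] copies practically the whole list. You should iterate with for i, a in enumerate(A, i) and use the float('-inf') trick.
@Cornstalks: Click the image to view it at a more reasonable size.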
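The single-pass variant suggested in the comment above might look roughly like the following. This is a plain-Python sketch with a hypothetical function name (first_record_indices); it is not the numba-compiled code from the answer, and whether numba accepts the float('-inf') sentinel alongside an integer array is not checked here.

```python
def first_record_indices(values):
    """Return the indices at which `values` exceeds every earlier element."""
    running_max = float('-inf')  # below any real number, so index 0 always qualifies
    idx = []
    for i, v in enumerate(values):  # iterate directly; no values[1:] slice
        if v > running_max:
            running_max = v
            idx.append(i)
    return idx


a = [2, 1, 2, 3, 4, 5, 4, 6, 5, 7, 8, 9, 8, 10, 11]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

idx = first_record_indices(a)
a_new = [a[i] for i in idx]
b_new = [b[i] for i in idx]
print(a_new)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(b_new)  # [1, 4, 5, 6, 8, 10, 11, 12, 14, 15]
```

Starting from the sentinel also means an empty input simply returns an empty list instead of failing on values[0].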

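【Solution 3】:

unique with return_index uses argsort [1]. After maximum.accumulate the array is already non-decreasing, so that sort is not needed. So we can cannibalize unique and do this: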

In [313]: a = [2,1,2,3,4,5,4,6,5,7,8,9,8,10,11]
In [314]: arr = np.array(a)
In [315]: aux = np.maximum.accumulate(arr)
In [316]: flag = np.concatenate(([True], aux[1:] != aux[:-1])) # key unique step
In [317]: idx = np.nonzero(flag)
In [318]: idx
Out[318]: (array([ 0,  3,  4,  5,  7,  9, 10, 11, 13, 14], dtype=int32),)
In [319]: arr[idx]
Out[319]: array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
In [320]: np.array(b)[idx]
Out[320]: array([ 1,  4,  5,  6,  8, 10, 11, 12, 14, 15])

In [323]: np.unique(aux, return_index=True)[1]
Out[323]: array([ 0,  3,  4,  5,  7,  9, 10, 11, 13, 14], dtype=int32)

def foo(arr):
    aux=np.maximum.accumulate(arr)
    flag = np.concatenate(([True], aux[1:] != aux[:-1]))
    return np.nonzero(flag)[0]

In [330]: timeit foo(arr)
....
100000 loops, best of 3: 12.5 µs per loop
In [331]: timeit np.unique(np.maximum.accumulate(arr), return_index=True)[1]
....
10000 loops, best of 3: 21.5 µs per loop

With the (10000,)-shaped medium array, this sort-free unique has a significant speed advantage:

In [334]: timeit np.unique(np.maximum.accumulate(medium[0]), return_index=True)[1]
1000 loops, best of 3: 351 µs per loop
In [335]: timeit foo(medium[0])
The slowest run took 4.14 times longer ....
10000 loops, best of 3: 48.9 µs per loop

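[1]: Use np.source(np.unique) to view the code, or ?? in IPython

【Comments】:

You are timing these differently. In the first case there is function call overhead.
@kmario23 A single function call does not account for these differences.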
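All that is left to do is slice the arrays. np.nonzero returns the indices of the entries to keep (the first occurrence of each value)... just like np.unique with return_index=True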
For fun, I profiled the np.unique solution with a large amount of random data. cumtime: np_monotonic: 493 ms, cummax: 47 ms, unique: 433 ms, of which the pointless argsort: 292 ms.
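Before relying on the faster sort-free version, it may be worth confirming that it selects the same indices as np.unique. The sketch below does that on seeded random data; the seed, the array size, and the names idx_flag and idx_unique are arbitrary choices for illustration.

```python
import numpy as np


def foo(arr):
    # Same function as in the answer: flag positions where the running max changes.
    aux = np.maximum.accumulate(arr)
    flag = np.concatenate(([True], aux[1:] != aux[:-1]))
    return np.nonzero(flag)[0]


rng = np.random.RandomState(0)  # fixed seed so the check is repeatable
data = rng.randint(1, 10000, size=10000)

idx_flag = foo(data)
idx_unique = np.unique(np.maximum.accumulate(data), return_index=True)[1]

# Both approaches should pick identical positions ...
same = np.array_equal(idx_flag, idx_unique)
# ... and the selected values should be strictly increasing.
strictly_increasing = bool(np.all(np.diff(data[idx_flag]) > 0))
print(same, strictly_increasing)
```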

【Solution 4】:

Here is a simple pure-Python solution that makes only a single pass:

>>> a = [2,1,2,3,4,5,4,6,5,7,8,9,8,10,11]
>>> b = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
>>> a_new, b_new = [], []
>>> last = float('-inf')
>>> for x, y in zip(a, b):
...     if x > last:
...         last = x
...         a_new.append(x)
...         b_new.append(y)
...
>>> a_new
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> b_new
[1, 4, 5, 6, 8, 10, 11, 12, 14, 15]

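I would be curious to see how it compares with the numpy solutions, which have similar time complexity but make several passes over the data.

Here are some timings. First, the setup:

>>> import timeit
>>> import numpy as np
>>> small = ([2,1,2,3,4,5,4,6,5,7,8,9,8,10,11], [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])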
>>> medium = (np.random.randint(1, 10000, (10000,)), np.random.randint(1, 10000, (10000,)))
>>> large = (np.random.randint(1, 10000000, (10000000,)), np.random.randint(1, 10000000, (10000000,)))

Now the two approaches:

>>> def monotonic(a, b):
...     a_new, b_new = [], []
...     last = float('-inf')
...     for x,y in zip(a,b):
...         if x > last:
...             last = x
...             a_new.append(x)
...             b_new.append(y)
...     return a_new, b_new
...
>>> def np_monotonic(a, b):
...     a_new, idx = np.unique(np.maximum.accumulate(a), return_index=True)
...     return a_new, b[idx]
...

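Note that these approaches are not strictly equivalent: one stays in vanilla Python land, the other in numpy array land. We will compare performance assuming you start with the corresponding data structure (numpy.array or list):

First, a small list, the same as the OP's example. We see that numpy is actually not faster, which is unsurprising for small data structures: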

>>> timeit.timeit("monotonic(a,b)", "from __main__ import monotonic, small; a, b = small", number=10000)
0.039130652003223076
>>> timeit.timeit("np_monotonic(a,b)", "from __main__ import np_monotonic, small, np; a, b = np.array(small[0]), np.array(small[1])", number=10000)
0.10779813499539159

Now a "medium" list/array of 10,000 elements, where we start to see numpy's advantage:

>>> timeit.timeit("monotonic(a,b)", "from __main__ import monotonic, medium; a, b = medium[0].tolist(), medium[1].tolist()", number=10000)
4.642718859016895
>>> timeit.timeit("np_monotonic(a,b)", "from __main__ import np_monotonic, medium; a, b = medium", number=10000)
1.3776302759943064

Now, interestingly, the advantage seems to narrow for the "large" arrays, on the order of 1e7 elements:

>>> timeit.timeit("monotonic(a,b)", "from __main__ import monotonic, large; a, b = large[0].tolist(), large[1].tolist()", number=10)
4.400254560023313
>>> timeit.timeit("np_monotonic(a,b)", "from __main__ import np_monotonic, large; a, b = large", number=10)
3.593393853981979

Note that in the last pair of timings I ran each only 10 times, but if someone has a better machine or more patience, feel free to increase number.
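For anyone repeating these measurements, one possible way to reduce noise is to run several rounds with timeit.repeat and keep the fastest one. The sketch below assumes a hypothetical helper name (best_time_per_call) and uses a small synthetic input so that it runs quickly; it does not reproduce the numbers above.

```python
import timeit


def monotonic(a, b):
    # Same single-pass function as in the answer.
    a_new, b_new = [], []
    last = float('-inf')
    for x, y in zip(a, b):
        if x > last:
            last = x
            a_new.append(x)
            b_new.append(y)
    return a_new, b_new


def best_time_per_call(func, number=100, repeat=3):
    """Hypothetical helper: seconds per call, taken from the fastest round."""
    rounds = timeit.repeat(func, number=number, repeat=repeat)
    return min(rounds) / number


# Small illustrative input: values cycle through 0..49, so most are dropped.
a = [i % 50 for i in range(1000)]
b = list(range(1000))

per_call = best_time_per_call(lambda: monotonic(a, b))
print("%.2e seconds per call" % per_call)
```

Taking the minimum rather than the mean tends to be less affected by other processes running on the machine.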

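【Comments】:

In my overall experience, C-compiled modules are 10 to 100 times faster than pure Python, so I would bet the numpy solution will be faster... I am curious whether I am right. Hopefully the OP provides timing results.
@Claudio In my experience, the speed advantage of numpy arrays over plain Python lists for simple operations like this is not always so clear-cut, and it depends on the size of the data structure. See the edit I just added, which shows timings for data structures of different sizes.

【Solution 5】: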

a = [2, 1, 2, 3, 4, 5, 4, 6, 5, 7, 8, 9, 8,10,11]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15]
print(sorted(set(a)))
print(sorted(set(b)))
#[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
#[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

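【Comments】:

Your solution does not meet the requirements listed in the question. When you answer a three-year-old question that already has an accepted answer, you may want to consider whether your answer adds value.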
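A short check illustrates the point of this comment, using the example data and the expected result from the question. The variable names are illustrative only.

```python
a = [2, 1, 2, 3, 4, 5, 4, 6, 5, 7, 8, 9, 8, 10, 11]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

# Result required by the question.
expected_a = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
expected_b = [1, 4, 5, 6, 8, 10, 11, 12, 14, 15]

via_set_a = sorted(set(a))
via_set_b = sorted(set(b))

# sorted(set(a)) keeps the value 1 even though it appears after a larger value.
print(via_set_a == expected_a)  # False
# b is deduplicated independently of a, so nothing is removed from it.
print(len(via_set_b), len(expected_b))  # 15 10
```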