这是一个简单的 Python 解决方案,只通过一次:
>>> a = [2,1,2,3,4,5,4,6,5,7,8,9,8,10,11]
>>> b = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
>>> a_new, b_new = [], []
>>> last = float('-inf')
>>> for x, y in zip(a, b):
... if x > last:
... last = x
... a_new.append(x)
... b_new.append(y)
...
>>> a_new
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> b_new
[1, 4, 5, 6, 8, 10, 11, 12, 14, 15]
我很想知道它与numpy 解决方案相比如何,后者具有相似的时间复杂度,但对数据进行了几次传递。
这里有一些时间。首先,设置:
>>> small = ([2,1,2,3,4,5,4,6,5,7,8,9,8,10,11], [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
>>> medium = (np.random.randint(1, 10000, (10000,)), np.random.randint(1, 10000, (10000,)))
>>> large = (np.random.randint(1, 10000000, (10000000,)), np.random.randint(1, 10000000, (10000000,)))
现在有两种方法:
>>> def monotonic(a, b):
... a_new, b_new = [], []
... last = float('-inf')
... for x,y in zip(a,b):
... if x > last:
... last = x
... a_new.append(x)
... b_new.append(y)
... return a_new, b_new
...
>>> def np_monotonic(a, b):
... a_new, idx = np.unique(np.maximum.accumulate(a), return_index=True)
... return a_new, b[idx]
...
请注意,这些方法并不是严格等效的,一种保留在原版 Python 领域,另一种保留在 numpy 数组领域。假设您从相应的数据结构(numpy.array 或 list)开始,我们将比较性能:
首先是一个小列表,与 OP 的示例相同,我们看到 numpy 实际上并不快,这对于小型数据结构来说并不奇怪:
>>> timeit.timeit("monotonic(a,b)", "from __main__ import monotonic, small; a, b = small", number=10000)
0.039130652003223076
>>> timeit.timeit("np_monotonic(a,b)", "from __main__ import np_monotonic, small, np; a, b = np.array(small[0]), np.array(small[1])", number=10000)
0.10779813499539159
现在是 10,000 个元素的“中等”列表/数组,我们开始看到 numpy 的优势:
>>> timeit.timeit("monotonic(a,b)", "from __main__ import monotonic, medium; a, b = medium[0].tolist(), medium[1].tolist()", number=10000)
4.642718859016895
>>> timeit.timeit("np_monotonic(a,b)", "from __main__ import np_monotonic, medium; a, b = medium", number=10000)
1.3776302759943064
现在,有趣的是,“大”数组的优势似乎缩小了,大约 1e7 个元素:
>>> timeit.timeit("monotonic(a,b)", "from __main__ import monotonic, large; a, b = large[0].tolist(), large[1].tolist()", number=10)
4.400254560023313
>>> timeit.timeit("np_monotonic(a,b)", "from __main__ import np_monotonic, large; a, b = large", number=10)
3.593393853981979
注意,在最后一对计时中,我每个只做了10次,但如果有人有更好的机器或更多的耐心,请随时增加number