Better way for concatenating two sorted lists of integers: answers

【Question title】: Better way for concatenating two sorted list of integers
【Posted】: 2016-01-07 13:48:21
【Question description】:

Suppose I have a list and a tuple, both of which are already sorted:

A = [10, 20, 30, 40]
B = (20, 60, 81, 90)

I need to add all the elements of B to A in such a way that A stays sorted.

The solution I could come up with is:

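for item in B:
    for i in range(len(A)):
        if item <= A[i]:
            # Insert before the first element that is not smaller than item.
            A.insert(i, item)
            break
    else:
        # item is larger than everything in A.
        A.append(item)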

Assuming A has size m and B has size n, this solution takes O(m×n) in the worst case. How can I make it perform better?

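【Question discussion】:

Look at the merge algorithm used in merge sort. It takes O(m+n) time.
I suggest you compare against the naive (but readable) version, sorted(A+B). You may also want to search for an existing library, which may be faster than a hand-written implementation: grantjenks.com/docs/sortedcontainers
Hint: don't modify either list; create a new one. The code becomes clean and fast.
I got downvoted very quickly... but you said "A stays sorted". It is already sorted, so my solution simply appended B to A. Not sure why I got hit for that...
@Holmes: the correct result should be A == [10, 20, 20, 30, 40, 60, 81, 90], not B appended to A.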

Tags: python algorithm optimization

【Solution 1】:

A simple way is heapq.merge:

A = [10, 20, 30, 40]

B = (20, 60, 81, 90)

from heapq import merge

for ele in merge(A,B):
    print(ele)

Output:

10
20
20
30
40
60
81
90

Some timings against another O(n) solution:

In [53]: A = list(range(10000))

In [54]: B = list(range(1,20000,10))

In [55]: timeit list(merge(A,B))
100 loops, best of 3: 2.52 ms per loop

In [56]: %%timeit
C = []
i = j = 0
while i < len(A) and j < len(B):
    if A[i] < B[j]:
        C.append(A[i])
        i += 1
    else:
        C.append(B[j])
        j += 1
C += A[i:] + B[j:]
   ....: 
100 loops, best of 3: 4.29 ms per loop
In [58]: m =list(merge(A,B))
In [59]: m == C
Out[59]: True

If you want to roll your own, this is a bit faster than merge:

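from itertools import chain

def merger_try(a, b):
    if not a or not b:
        # One input is empty, so the other is already the merged result.
        for ele in chain(a, b):
            yield ele
        return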
    iter_a, iter_b = iter(a), iter(b)
    prev_a, prev_b = next(iter_a), next(iter_b)
    while True:
        if prev_a >= prev_b:
            yield prev_b
            try:
                prev_b = next(iter_b)
            except StopIteration:
                yield prev_a
                break
        else:
            yield prev_a
            try:
                prev_a = next(iter_a)
            except StopIteration:
                yield prev_b
                break
    for ele in chain(iter_b, iter_a):
        yield ele

Some timings:

In [128]: timeit list(merge(A,B))
1 loops, best of 3: 771 ms per loop

In [129]: timeit list(merger_try(A,B))
1 loops, best of 3: 581 ms per loop

In [130]: list(merger_try(A,B))  == list(merge(A,B))
Out[130]: True

In [131]: %%timeit                                 
C = []
i = j = 0
while i < len(A) and j < len(B):
    if A[i] < B[j]:
        C.append(A[i])
        i += 1
    else:
        C.append(B[j])
        j += 1
C += A[i:] + B[j:]
   .....: 
1 loops, best of 3: 919 ms per loop

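【Discussion】:

What is the order (time complexity) of this algorithm?
@Arman The documentation doesn't say, but it is reasonable to assume that the merge is O(n), where n is the combined size of the inputs. To get the result back into A, use A = list(merge(A, B)).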
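
A minimal sketch of the write-back mentioned in the comment above. Plain assignment rebinds the name A to a new list. If other code holds a reference to the original list, slice assignment (a general Python idiom, not something the answer itself uses) overwrites the contents of the same list object instead. The name alias is only there for illustration.

```python
from heapq import merge

A = [10, 20, 30, 40]
B = (20, 60, 81, 90)
alias = A  # a second reference to the same list object

# Materialise the merged result first, then overwrite A's contents in place.
A[:] = list(merge(A, B))

print(A)           # [10, 20, 20, 30, 40, 60, 81, 90]
print(alias is A)  # True
```

This still builds a temporary list of size m+n; it only changes which object ends up holding the result.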

【Solution 2】:

The bisect module "provides support for maintaining a list in sorted order without having to sort the list after each insertion":

import bisect
for b in B:
    bisect.insort(A, b)

This solution does not create a new list.

Note that bisect.insort(A, b) is equivalent to

A.insert(bisect.bisect_right(A, b), b)

Even though the search is fast (O(log n)), the insertion is slow (O(n)).
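
A small sketch to check the equivalence stated above on the question's data. A_copy is a name introduced here only for the comparison.

```python
import bisect

A = [10, 20, 30, 40]
A_copy = list(A)  # independent copy for the spelled-out variant
B = (20, 60, 81, 90)

for b in B:
    # Library call: binary search plus insert in one step.
    bisect.insort(A, b)
    # Spelled-out equivalent: find the slot to the right of any equal items, then insert.
    A_copy.insert(bisect.bisect_right(A_copy, b), b)

print(A)            # [10, 20, 20, 30, 40, 60, 81, 90]
print(A == A_copy)  # True
```

Each insert still shifts every element after the insertion point, which is where the O(n) cost per inserted element comes from.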

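【Discussion】:

This is O(n lg m). It can be done in O(n+m).
This is still an O(n^2) solution, and it can be done in O(n). Even the most readable solution, sorted(A+B), is O(n log n) and is therefore faster for large inputs. bisect.insort is useful when you insert elements one at a time and need the result to stay sorted after every step, which is not the case here.
Insertion isn't really O(n), is it? The list isn't grown one element at a time. Python "over-allocates".
@JayanthKoushik Insertion is O(n) because all the elements after the insertion point have to be shifted. Over-allocation doesn't help.

【Solution 3】:

There is a lot of good discussion in this thread! Arguing about timings is hard, so I wrote a timing script. It is fairly rudimentary, but I think it will do for now. I have attached the results as well.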

import timeit
import math
import matplotlib.pyplot as plt
from collections import defaultdict


setup = """
import bisect
import heapq
from random import randint


A = sorted((randint(1, 10000) for _ in range({})))
B = sorted((randint(1, 10000) for _ in range({})))


def bisect_sol(A, B):
    for b in B:
        bisect.insort(A, b)


def merge_sol(A, B):
    ia = ib = 0
    while ib < len(B):
        if ia < len(A) and A[ia] < B[ib]:
            ia += 1
        else:
            # Insert before A[ia], which is >= B[ib] (or at the end of A).
            A.insert(ia, B[ib])
            ib += 1


def heap_sol(A, B):
    # NOTE: heapq.merge is lazy, so this times only the creation of the iterator.
    return heapq.merge(A, B)


def sorted_sol(A, B):
    return sorted(A + B)
"""


sols = ["bisect", "merge", "heap", "sorted"]
times = defaultdict(list)
iters = [100, 1000, 2000, 5000, 10000, 20000, 50000, 100000]

for n in iters:
    for sol in sols:
        t = min(timeit.repeat(stmt="{}_sol(A, B)".format(sol), setup=setup.format(n, n), number=1, repeat=5))
        print("({}, {}) done".format(n, sol))
        times[sol].append(math.log(t))

for sol in sols:
    plt.plot(iters, times[sol])
plt.xlabel("iterations")
plt.ylabel("log time")
plt.legend(sols)
plt.show()

Here are the results:

Clearly, modifying the list is the main bottleneck, so creating a new list is the way to go.
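
One thing worth checking when reading timings for heap_sol: heapq.merge returns a lazy iterator, so a call that never consumes the result does none of the merging. The sketch below illustrates this with a hypothetical helper, tracked, which records each item as it is pulled from an input. It is not part of the timing script.

```python
import heapq

consumed = []

def tracked(values):
    """Yield values one by one, recording each item as it is pulled."""
    for v in values:
        consumed.append(v)
        yield v

merged = heapq.merge(tracked([10, 20, 30, 40]), tracked([20, 60, 81, 90]))
items_before = len(consumed)   # nothing has been pulled from the inputs yet

result = list(merged)          # consuming the iterator does the real work
items_after = len(consumed)

print(items_before, items_after)  # 0 8
print(result)                     # [10, 20, 20, 30, 40, 60, 81, 90]
```

Under that reading, timing list(heapq.merge(A, B)) would be the like-for-like comparison with the other solutions, which all build a complete result.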

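【Discussion】:

Could you add my answer too? I'd like to see its order, since the documentation doesn't say.
@Arman: Added. Meanwhile, please take a look at the code. I wouldn't want any mistakes in the test code!
Thanks, man. But does a lower slope mean worse timing?
Yes. The y-axis is time, so the heap solution is the best and merge is the worst.
I am running the test for more iterations; it should give the same result, though.

【Solution 4】:

Here is a solution in O(n):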

A = [10, 20, 30, 40]
B = [20, 60, 81, 90]
C = []
i = j = 0
while i < len(A) and j < len(B):
    if A[i] < B[j]:
        C.append(A[i])
        i += 1
    else:
        C.append(B[j])
        j += 1
C += A[i:] + B[j:]
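
Note that this answer turns B into a list. With the question's original tuple, A[i:] + B[j:] would raise a TypeError, because a list and a tuple cannot be concatenated with +. A possible way to package the same two-pointer merge so that it accepts any pair of sorted sequences is sketched below. The function name merge_sorted is made up for this example.

```python
def merge_sorted(a, b):
    """Return a new sorted list with every element of the sorted sequences a and b."""
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:   # <= puts elements of a first on ties
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # At most one of these slices is non-empty; extend accepts lists and tuples alike.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


A = [10, 20, 30, 40]
B = (20, 60, 81, 90)
print(merge_sorted(A, B))  # [10, 20, 20, 30, 40, 60, 81, 90]
```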

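【Discussion】:

【Solution 5】:

You need to perform a merge. But the "traditional" merge produces a new list, so you need a few modifications to extend one list in place.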

ia = ib = 0
while ib < len(B):
    if ia < len(A) and A[ia] < B[ib]:
        ia += 1
    else:
        # Insert before A[ia], which is >= B[ib] (or at the end of A).
        A.insert(ia, B[ib])
        ib += 1
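
Every A.insert call above shifts the tail of the list, so the loop is still quadratic in the worst case. A different technique, not the one this answer uses, is to grow A once and merge from the back, which avoids the shifting. The following is a rough sketch under that assumption; merge_into is a name invented for it.

```python
def merge_into(a, b):
    """Merge the sorted sequence b into the sorted list a in place, in O(len(a) + len(b)) time."""
    i = len(a) - 1   # index of the last original element of a
    j = len(b) - 1   # index of the last element of b
    a.extend(b)      # grow a once; the new tail slots are overwritten below
    k = len(a) - 1   # next slot to fill, moving right to left
    while j >= 0:
        if i >= 0 and a[i] > b[j]:
            a[k] = a[i]
            i -= 1
        else:
            a[k] = b[j]
            j -= 1
        k -= 1
    # When b is exhausted, the remaining elements of a are already in place.


A = [10, 20, 30, 40]
B = (20, 60, 81, 90)
merge_into(A, B)
print(A)  # [10, 20, 20, 30, 40, 60, 81, 90]
```

The write position k always equals i + j + 1, so it never overtakes an element of a that has not been read yet.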

【Discussion】:

【Solution 6】:

Edited

l1 = [10,20,30,40]
l2 = (10,20,30,40)
l2 = list(l2)
l1 = sorted(l1+l2)
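
As a rough check of how sorted copes with input made of two already sorted runs, one can count comparisons. CPython's sort is Timsort, which detects existing runs and merges them, so on CPython the count for this kind of input should come out at a small multiple of n rather than near n*log2(n). The counter and the counting_cmp helper are illustrative names, and the input sizes are arbitrary.

```python
from functools import cmp_to_key

calls = [0]  # mutable counter updated by the comparison function

def counting_cmp(x, y):
    """Three-way comparison that also counts how often it is called."""
    calls[0] += 1
    return (x > y) - (x < y)

l1 = list(range(0, 2000, 2))   # 1000 sorted even numbers
l2 = list(range(1, 2001, 2))   # 1000 sorted odd numbers
merged = sorted(l1 + l2, key=cmp_to_key(counting_cmp))

n = len(merged)
print(n, calls[0])  # compare the count against n and against n * log2(n)
```

This says nothing about constant factors or other Python implementations; it only suggests that the O(n log n) worst-case bound is pessimistic for this particular input shape.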

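【Discussion】:

Finding each index can be done in O(log n), but each insertion takes O(n), so the total is O(n^2).
Why are you doing both extend and l1 + l2? You add the elements of l2 twice.
@JayanthKoushik, you're right, I removed the extend line.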