Why is a Python list slower when sorted? Answers

【Question title】: Why is Python list slower when sorted?
【Posted】: 2021-12-25 05:17:39
【Question description】:

In the code below, I create two lists with the same values: one unsorted (s_not) and one sorted (s_yes). The values are created by randint(). I run a loop over each list and time it.

import random
import time

for x in range(1,9):

    r = 10**x # do different val for the bound in randint()
    m = int(r/2)

    print("For rand", r)

    # s_not is non sorted list
    s_not = [random.randint(1,r) for i in range(10**7)]

    # s_yes is sorted
    s_yes = sorted(s_not)

    # do some loop over the sorted list
    start = time.time()
    for i in s_yes:
        if i > m:
            _ = 1
        else:
            _ = 1
    end = time.time()
    print("yes", end-start)

    # do the same to the unsorted list
    start = time.time()
    for i in s_not:
        if i > m:
            _ = 1
        else:
            _ = 1
    end = time.time()
    print("not", end-start)

    print()

With output:

For rand 10
yes 1.0437555313110352
not 1.1074268817901611

For rand 100
yes 1.0802974700927734
not 1.1524150371551514

For rand 1000
yes 2.5082249641418457
not 1.129960298538208

For rand 10000
yes 3.145440101623535
not 1.1366300582885742

For rand 100000
yes 3.313387393951416
not 1.1393756866455078

For rand 1000000
yes 3.3180911540985107
not 1.1336982250213623

For rand 10000000
yes 3.3231537342071533
not 1.13503098487854

For rand 100000000
yes 3.311596393585205
not 1.1345293521881104

So, when the bound passed to randint() increases, the loop over the sorted list gets slower. Why?

【Question discussion】:

n=10^7 is probably overkill. Going as low as n=10^5 gave me comparable results and takes only about 2 seconds to run.
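For those attributing this to cache misses: the list size is the same for every r, yet there is no difference in runtime until the numbers exceed 10**2.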
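Basically the same problem as this Java question.
@Davislor: That makes no difference; Python's list is already contiguous, and sorting is done by swapping data around. sorted doesn't work in place (sort of; it creates a new list and then sorts that in place), but that is largely irrelevant; a list stores pointers to the objects in it, not the raw data, so both lists alias the same objects.
@nocomment: We're both right, we're just using different contexts. Yes, internally CPython uses TimSort (a modified merge sort), which doesn't sort in place and doesn't proceed by swapping. I'm talking about the observable behavior at the Python layer (for list.sort anyway; you can't tell any of this for sorted, because it builds the new list internally where you can't inspect it), where the original list is modified in place and still contains the same objects (nothing is recreated). If Davislor was talking about true arrays (the array module or numpy), then I'm off topic.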
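The aliasing point in the comments above can be checked directly. This is a minimal sketch, assuming CPython (where id() is the object's address); the helper name same_objects is made up for illustration and is not from the thread.

```python
import random


def same_objects(a, b):
    """Return True if both lists reference exactly the same objects (order ignored)."""
    # Compare the multisets of object identities, not the values.
    return sorted(map(id, a)) == sorted(map(id, b))


if __name__ == "__main__":
    s_not = [random.randint(1, 10**6) for _ in range(1000)]
    s_yes = sorted(s_not)
    # sorted() returns a new list object ...
    print("same list object:", s_yes is s_not)
    # ... but that list points at the very same int objects.
    print("same int objects:", same_objects(s_yes, s_not))
```

So sorting changes only the order of the pointers in the list, not where the int objects live in memory.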

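Tags: python list caching

【Solution 1】:

Cache misses. When N int objects are allocated back to back, the memory reserved to hold them tends to be in a contiguous chunk. So crawling over the list in allocation order tends to access the memory holding the ints' values in sequential, contiguous, increasing order too.

Shuffle it, and the access pattern when crawling over the list is randomized too. Cache misses abound, provided there are enough distinct int objects that they don't all fit in cache.

At x==1 and x==2 (that is, r==10 and r==100), CPython happens to treat such small ints as singletons, so, for example, although the list has 10 million elements, at r==100 it contains (at most) only 100 distinct int objects. All the data for those fits in cache simultaneously.

Beyond that, though, you're likely to get more, and more, and more distinct int objects. Hardware caches become increasingly useless when the access pattern is random.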

Illustrating:

>>> from random import randint, seed
>>> seed(987987987)
>>> for x in range(1, 9):
...     r = 10 ** x
...     js = [randint(1, r) for _ in range(10_000_000)]
...     unique = set(map(id, js))
...     print(f"{r:12,} {len(unique):12,}")
...     
          10           10
         100          100
       1,000    7,440,909
      10,000    9,744,400
     100,000    9,974,838
   1,000,000    9,997,739
  10,000,000    9,999,908
 100,000,000    9,999,998
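The jump between r == 100 and r == 1,000 in the table above lines up with CPython's cache of small int singletons. The following is only a sketch for probing where that cache ends on a given interpreter; the helper names is_cached and largest_cached are hypothetical, the result is a CPython implementation detail, and it may differ between versions.

```python
def is_cached(n):
    """Return True if two separately built ints with value n are one shared object."""
    # int(str(n)) builds the value at run time, which avoids compile-time
    # constant sharing that would make the identity check meaningless.
    return int(str(n)) is int(str(n))


def largest_cached(limit=10_000):
    """Return the largest n below limit such that every int in 0..n is a singleton."""
    n = 0
    while n < limit and is_cached(n + 1):
        n += 1
    return n


if __name__ == "__main__":
    print("100 is a singleton:", is_cached(100))
    print("10**9 is a singleton:", is_cached(10**9))
    print("small-int cache appears to end at:", largest_cached())
```

Values inside the cached range never produce new objects, which is why the lists for r == 10 and r == 100 hold so few distinct objects.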

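【Discussion】:

@arne sorted sorts immediately, so before the loop and the timing start. You seem to be a C++ person. Python ints are objects, and the list only stores their addresses. It is not a vector<int> but a vector<int*>. Reading the pointers in order is cache friendly, but the cache friendliness of the ints they point to depends on where those ints are in memory.
@arne Keeping that difference in mind is essential for a deep understanding of how Python works.
@ThomasWeller, the end effect is the same: the original list is accessed in allocation order, and after shuffling it is accessed in a "random" order of memory blocks.
I'd add that "sorting" is not really the fundamental driver here: any way of effecting a random permutation will have the same end effect. In the original, because the values themselves are random, sorting produces a random permutation.
@arne My comment was not only about performance. I find it useful to know how things work under the hood. You may disagree and find Python perfectly usable without that knowledge, but I think you're missing out.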

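【Solution 2】:

As the others said, cache misses. Not the values or their sortedness. The same sorted values, but as freshly created objects allocated in sequence, are fast again (actually even a bit faster than the not case):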

s_new = [--x for x in s_yes]

Picking just one size:

For rand 1000000
yes 3.6270992755889893
not 1.198620080947876
new 1.02010178565979
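For readers puzzled by the --x trick: double negation leaves the value unchanged but, in CPython, builds a new int object for each element, so the new objects are allocated in list order. A minimal sketch follows, assuming CPython and values outside the small-int cache; the helper names rebuild_in_order and count_shared are made up for illustration.

```python
import random


def rebuild_in_order(values):
    """Return equal values as new int objects created in list order."""
    # -x creates a temporary int and -(-x) creates the final one, so each
    # result is a fresh object (small cached ints excepted).
    return [--x for x in values]


def count_shared(a, b):
    """Count positions where both lists hold the very same object."""
    return sum(x is y for x, y in zip(a, b))


if __name__ == "__main__":
    # Values start at 10**6 so that none of them is a cached small int.
    s_not = [random.randint(10**6, 2 * 10**6) for _ in range(1000)]
    s_yes = sorted(s_not)
    s_new = rebuild_in_order(s_yes)
    print("equal values:", s_new == s_yes)
    print("shared objects:", count_shared(s_yes, s_new))
```

The values and their order match s_yes, so any timing difference between s_yes and s_new can only come from where the objects sit in memory.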

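Looking at the address difference from one element to the next (for only 10⁶ elements) shows that, especially for s_new, the elements are nicely arranged sequentially in memory (99.2% of the time the next element is 32 bytes later), while for s_yes they are not at all (only 0.01% of the time is the next element 32 bytes later):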

s_yes:
    741022 different address differences occurred. Top 5:
    Address difference 32 occurred 102 times.
    Address difference 0 occurred 90 times.
    Address difference 64 occurred 37 times.
    Address difference 96 occurred 17 times.
    Address difference 128 occurred 9 times.

s_not:
    1048 different address differences occurred. Top 5:
    Address difference 32 occurred 906649 times.
    Address difference 96 occurred 8931 times.
    Address difference 64 occurred 1845 times.
    Address difference -32 occurred 1816 times.
    Address difference -64 occurred 1812 times.

s_new:
    19 different address differences occurred. Top 5:
    Address difference 32 occurred 991911 times.
    Address difference 96 occurred 7825 times.
    Address difference -524192 occurred 117 times.
    Address difference 0 occurred 90 times.
    Address difference 64 occurred 37 times.

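Code:

from collections import Counter

# Assumes s_yes, s_not and s_new already exist (see the code above).
for name, lst in ('s_yes', s_yes), ('s_not', s_not), ('s_new', s_new):
    print(name + ':')
    # In CPython, id() returns the object's memory address.
    ids = list(map(id, lst))
    # Count the address gaps between neighboring list elements.
    ctr = Counter(j - i for i, j in zip(ids, ids[1:]))
    print('   ', len(ctr), 'different address differences occurred. Top 5:')
    for delta, count in ctr.most_common(5):
        print(f'    Address difference {delta} occurred {count} times.')
    print()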

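【Discussion】:

A better illustration of why this is about locality would be to create s_yes = list(range(10**7)), s_not = s_yes[:], random.shuffle(s_not). Now the sorted list is the one allocated sequentially and the unsorted one is scattered, so the timings should flip.
@ShadowRanger Hmm, I don't think that's "better". Maybe equally good. But it is further away from their data.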
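The reversed experiment suggested in the comment above can be sketched as follows. This is an illustration rather than a benchmark: the function names build_lists and time_loop are made up here, the loop body mirrors the one in the question, and the actual timings depend on the machine and on n.

```python
import random
import time


def build_lists(n):
    """Return (ordered, shuffled): the same int objects in two different orders."""
    ordered = list(range(n))   # ints are created in increasing order
    shuffled = ordered[:]      # shallow copy: same objects, new list
    random.shuffle(shuffled)   # randomize the access order only
    return ordered, shuffled


def time_loop(values, threshold):
    """Time the same branchy loop as in the question; return elapsed seconds."""
    start = time.perf_counter()
    for v in values:
        if v > threshold:
            _ = 1
        else:
            _ = 1
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 10**6
    s_yes, s_not = build_lists(n)
    print("yes", time_loop(s_yes, n // 2))
    print("not", time_loop(s_not, n // 2))
```

If locality is the cause, the sorted list should be the faster one in this setup, because here it is the list whose order matches the allocation order.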

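【Solution 3】:

The answer is probably locality of data. Integers above a certain size limit are allocated dynamically. When you create the list, the integer objects are allocated from (mostly) nearby memory. So when you loop through the list, the objects tend to be in cache already, and the hardware prefetcher can put them there.

In the sorted case, the objects get shuffled around, which leads to more cache misses.

【Discussion】:

Definitely. Randomly accessing data after the memory was allocated in a given order is slower than accessing it sequentially in the original order. Doing it the other way around shows that sorting has nothing to do with it (other than changing the access order). That is, starting with s_yes as a range and s_not as a shuffled copy of s_yes makes s_not the slower one.
That "certain size" is 256.
@user17242583 Sure, though that may change. I think it has already changed once in the past.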