Improving my implementation of the sieve of Eratosthenes

【Question title】: Improving my implementation of the sieve of Eratosthenes
【Posted】: 2020-04-15 13:22:07
【Question】:

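I am writing a sieve of Eratosthenes so that I can sum the primes between 1 and a large number n more efficiently. What I want to do is create a list from 2 to n, then remove the multiples of 2, then the multiples of 3, then the multiples of the next number left in the list, and so on. The code I wrote seems very slow; it performs almost as if I were building the list by checking each entry for primality. I guess the number of operations is of the order: square root of n (for the first while loop) times (slightly less than) square root of n (for the second while loop). So I'm not sure whether the remove method or something else is slowing it down.

My code is this: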

def sieve_of_Eratosthenes(n):
    L = list(range(2, n + 1))  # creates a list from 2 to n

    i = 2
    while i * i <= n:  # going to remove multiples of i, starting at i^2
        k = i          # if k < i then i*k was already checked
        while i * k <= max(L):
            try:
                L.remove(i * k)  # raises ValueError if the element is not in
                                 # the list, so these two lines are needed
            except ValueError:
                pass  # do nothing!
            k = k + 1
        i = L[L.index(i) + 1]  # want i to be the next element remaining in the list
    # print(L)
    return L

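【Comments on the question】:

There are many implementations posted here
Removal is the culprit; it destroys direct indexability. Don't remove; mark instead. remove deletes by value, so it has to scan the list, which makes each removal O(n); O(1) marking by index is the goal, and the prerequisite for overall speed.
@WillNess That's the answer I was looking for! Thanks
Paradoxically, some removals aren't O(n), so those removals are fine. Some are O(1) (hash table), some are O(log n) (from a tree, or when merging the sorted lists of increasing multiples), which is tolerable.
It says here that removing an element from a list is O(n): wiki.python.org/moin/TimeComplexity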
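
To illustrate the point made in the comments above that removal is only a problem when each removal is O(n), here is a minimal sketch that removes composites from a hash-based set instead of a list. The name sieve_with_set is made up for this example, and math.isqrt assumes Python 3.8 or later.

```python
from math import isqrt


def sieve_with_set(n):
    """Return the primes up to n by removing composites from a set."""
    if n < 2:
        return []
    candidates = set(range(2, n + 1))
    for i in range(2, isqrt(n) + 1):
        if i in candidates:  # i was never removed, so it is prime
            # Removing from a set is O(1) on average per element,
            # unlike list.remove, which scans the list.
            candidates.difference_update(range(i * i, n + 1, i))
    return sorted(candidates)  # sets are unordered, so sort at the end


print(sieve_with_set(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The final sort adds some cost, and a set uses more memory than a list of flags, so marking by index is still likely to be the simpler choice here.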

Tags: python performance primes

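【Solution 1】:

Assumption

The question is how to improve the running time of the code, since it is very slow.

Make the following two changes to speed up your code

Instead of keeping a list of primes, flag each number as prime (True) or non-prime (False)
Only check odd numbers > 2 for primality

Code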

def sieve_of_Eratosthenes2(n):
    if n < 2:
        return []
    if n < 3:
        return [2]

    L = [True] * (n + 1)    # all numbers flagged as prime initially

    # clears the prime flag for multiples of each odd prime
    for i in range(3, n + 1, 2):  # check odd numbers only (no need to check even numbers)
        if L[i]:  # a prime
            L[i*i::i] = [False] * len(L[i*i::i])  # from i^2 in increments of i

    # report prime 2 + odd primes (n itself is included when it is odd)
    return [2] + [i for i in range(3, n + 1, 2) if L[i]]  # odd numbers whose flag is
                                                          # still True

New code

%timeit sieve_of_Eratosthenes2(1000)
188 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit sieve_of_Eratosthenes2(100000)
16 ms ± 1.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Going from 1,000 to 100,000
(i.e. 100X), the time increased by a factor of ~85,
so almost linear.

Old code

 %timeit sieve_of_Eratosthenes(1000)
 25.2 ms ± 1.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
 sieve_of_Eratosthenes(100000)
 261.45 seconds  (using time module)

Going from 1,000 to 100,000 (100X),
the time increased by a factor of ~10,000,
so a quadratic increase in time (i.e. 100^2).
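
Outside IPython, measurements like the ones above can be approximated with the standard timeit module. The following is only a sketch: growth_table and flag_sieve are hypothetical names introduced for this example, and flag_sieve is a stand-in flag-based sieve, not the exact code timed above. Measuring several sizes and looking at the ratios gives a rough empirical feel for the growth rate.

```python
import timeit


def flag_sieve(n):
    """Stand-in sieve for the demo: boolean flags, marking from i*i."""
    if n < 2:
        return []
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    i = 2
    while i * i <= n:
        if flags[i]:
            for m in range(i * i, n + 1, i):
                flags[m] = False
        i += 1
    return [i for i, is_prime in enumerate(flags) if is_prime]


def growth_table(func, sizes, repeat=3):
    """Return [(n, best_seconds), ...] for one call of func(n) per size."""
    rows = []
    for n in sizes:
        # Take the best of several single runs to reduce timing noise.
        best = min(timeit.repeat(lambda: func(n), number=1, repeat=repeat))
        rows.append((n, best))
    return rows


rows = growth_table(flag_sieve, [1_000, 10_000, 100_000])
for (n0, t0), (n1, t1) in zip(rows, rows[1:]):
    print(f"{n0} -> {n1}: time x{t1 / t0:.1f}")
```

A ratio near 10 for each 10X step in n suggests roughly linear growth; a ratio near 100 suggests quadratic growth. The exact numbers depend on the machine.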

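Explanation

The complexity of the Sieve of Eratosthenes is

O(N log (log N))

This is almost linear, since marking a number in the array as True (prime) or False (non-prime) is generally an O(1) operation.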
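
As a rough sketch of where the log (log N) factor comes from, assuming each marking operation costs O(1): every prime p marks at most about N/p multiples, and the sum of the reciprocals of the primes up to N grows like ln ln N (Mertens' second theorem).

```latex
% Upper bound on the marking work of the flag-based sieve.
% Requires amsmath for \substack.
\[
  T(N) \;\le\; \sum_{\substack{p \le N \\ p\ \text{prime}}} \frac{N}{p}
  \;=\; N \sum_{\substack{p \le N \\ p\ \text{prime}}} \frac{1}{p}
  \;=\; N \bigl( \ln\ln N + M + o(1) \bigr)
  \;=\; O(N \log\log N),
\]
% where M \approx 0.2615 is the Meissel-Mertens constant.
```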

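In the original algorithm, non-primes are removed rather than flagged, which costs:

O(N) per removal.

This adds an extra factor of N to the complexity of the Sieve of Eratosthenes, giving the original algorithm a complexity of:

O(N*N*log (log N))

Hence almost quadratic, as the running times confirm.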

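【Discussion】:

Thanks for your answer, just a couple of small questions: in your original definition of the list, L = [True] * (n+1), I guess you want a list of True values, and its exact length doesn't matter as long as it is at least n+1 (since list indices start at 0). That is because, in the end, you only care about the positions (up to n) that still store True (these will be the list of primes). Second question: shouldn't it be range(3, n+1, 2) (so that it includes n, in case n is odd)?
@inquisitor--The answer to all your questions is yes. By the way, you'll notice we only use the odd indices in the array (since we know the even numbers are non-prime). This lets you cut the array size in half, but requires slightly more complicated array indexing.
A comparison at a single size point is mostly meaningless. en.wikipedia.org/wiki/…, please!
@WillNess--Added extra explanation and data points. Does this make more sense?
Yes, of course. Cheers! (By the way, there was no notification, because there was no space before the "--")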
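
The comment above about using only odd indices can be sketched as follows. This is one possible layout, not necessarily the one the answerer had in mind: it assumes index k stands for the odd number 2*k + 1, and the name odd_only_sieve is made up for this example.

```python
def odd_only_sieve(n):
    """Return the primes up to n, storing flags for odd numbers only."""
    if n < 2:
        return []
    size = (n + 1) // 2      # index k stands for the odd number 2*k + 1
    flags = [True] * size
    flags[0] = False         # index 0 is the number 1, which is not prime
    k = 1
    while (2 * k + 1) ** 2 <= n:
        if flags[k]:
            p = 2 * k + 1
            start = (p * p) // 2  # index of p*p, since p*p is odd
            # A step of p in index space is a step of 2*p in value,
            # so even multiples of p are skipped automatically.
            flags[start::p] = [False] * len(range(start, size, p))
        k += 1
    return [2] + [2 * k + 1 for k in range(1, size) if flags[k]]


print(odd_only_sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The flag list is about half the size of the n+1 list used in the answer, at the price of the index arithmetic shown in the comments.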