How to solve this memory error - python

【Question title】: How to solve this memory error - python
【Posted】: 2011-05-16 02:04:27
【Question description】:

I have the following code:

    def gen_primes():

        D = {}  

        q = 2  

        while True:
            if q not in D:         
                yield q        
                D[q * q] = [q]
            else:           
                for p in D[q]:
                    D.setdefault(p + q, []).append(p)
                del D[q]

            q += 1


    f = open("primes1.txt","w")

    filen = 1
    ran1 = 1
    ran2 = 10000000

    k = 1
    for i in gen_primes():

        if (k >= ran1) and (k <= ran2):

            f.write(str(i) + "\n")
            if k%1000000 == 0:
                print k
            k = k + 1
        else:
            ran1 = ran2 + 1
            ran2 = ran2 + 10000000
            f.close()
            filen = filen + 1;
            f = open("primes" + str(filen) + ".txt","w")

        if k > 100000000:           
            break
    f.close()

The prime generation algorithm is taken from "Simple Prime Generator in Python".

This program fails with a memory error:

Traceback (most recent call last):
  File "C:\Python25\Projects\test.py", line 43, in <module>
    for i in gen_primes():
  File "C:\Python25\Projects\test.py", line 30, in gen_primes
    D.setdefault(p + q, []).append(p)
MemoryError

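I am trying to store 10,000,000 consecutive primes in each file.

【Question comments】:

See also this question: stackoverflow.com/questions/2211990

Tags: python memory primes

【Solution 1】:

This prime generator does not use much memory. It is also not very fast.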

def gcd(a, b):
    rem = a % b
    while rem != 0:
        a = b
        b = rem
        rem = a % b
    return b

def primegen():
    yield 2
    yield 3
    yield 5
    yield 7
    yield 11
    accum = 2*3*5*7
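    out = open('tmp_primes.txt', 'w')   # primes found so far, one hex value per line
    inp = open('tmp_primes.txt', 'r+')  # second handle on the same file, used to read them back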
    out.write('0x2\n0x3\n0x5\n0x7\n0xb\n')
    inp.read(20)
    inpos = inp.tell()
    next_accum = 11
    next_square = 121
    testprime = 13
    while True:
        if gcd(accum, testprime) == 1:
            accum *= testprime # It's actually prime!
            out.writelines((hex(testprime), '\n'))
            yield testprime
        testprime += 2
        if testprime >= next_square:
            accum *= next_accum
            nextline = inp.readline()
            if (len(nextline) < 1) or (nextline[-1] != '\n'):
                out.flush()
                inp.seek(inpos)
                nextline = inp.readline()
            inpos = inp.tell()
            next_accum = int(nextline, 16)
            next_square = next_accum * next_accum

def next_n(iterator, n):
    """Returns the next n elements from an iterator.

    >>> list(next_n(iter([1,2,3,4,5,6]), 3))
    [1, 2, 3]
    >>> list(next_n(primegen(), 10))
    [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    """
    while n > 0:
        yield iterator.next()
        n -= 1

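【Comments】:

This could be improved a lot by not multiplying primes into the accumulator until absolutely necessary, and by keeping the list of primes found so far in a temporary file, so that the next prime to multiply into the accumulator can be read in easily without storing them all in memory.
You are using the Euclidean algorithm for gcd. Can you tell me how to use this code to generate primes? I cannot understand how next_n works.
@Vinod: The primegen function is a generator that will produce an unbounded number of primes, like gen_primes in your example code. The next_n function is a generator that simply returns the next n elements from an iterator. This is useful if you have an infinitely long iterator but only want 100 elements from it.
@Vinod: I modified it a bit to try to make it clearer what exactly is going on. It may help to know that next_n is purely a convenient utility function and is not needed at all for generating primes.
@Vinod: I have modified primegen as I described in my comments. It should now be faster and use less memory. It does this by storing the list of generated primes in a file and reading them back as needed to update the "accumulator", which holds the primes to check against.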
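
For anyone with the same question about how to drive such a generator and what next_n is for, here is a minimal Python 3 sketch. It is not the answer's code: gcd_primes is an illustrative in-memory stand-in for primegen (same gcd-against-an-accumulator idea, but without the temporary file), and itertools.islice from the standard library plays the role of next_n.

```python
from itertools import islice
from math import gcd


def gcd_primes():
    """Yield primes forever (illustrative stand-in for primegen).

    accum is the product of every prime found so far, so a candidate n
    is prime exactly when it shares no factor with accum.
    """
    accum = 1
    n = 2
    while True:
        if gcd(accum, n) == 1:
            accum *= n
            yield n
        n += 1


# islice(iterator, n) yields the next n items, just as next_n does.
first_ten = list(islice(gcd_primes(), 10))
print(first_ten)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Because the generator is infinite, it always has to be cut off somehow: with islice as here, or with a break inside a for loop as in the question.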

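【Solution 2】:

Try using this generator: http://code.activestate.com/recipes/366178-a-fast-prime-number-list-generator/

It is very fast (10000000 primes in a few seconds) and not memory hungry.

To save the result to a file, it may be easier to do something like this: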

# primes() is the function from the recipe linked above
interval_start = 100
interval_length = 10000000
f = open("primes1.txt","w")

for prime in primes(interval_start + interval_length)[interval_start:]:
    f.write(str(prime) + "\n")

f.close()
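
The snippet above depends on primes() from the linked recipe, which is not reproduced here. For readers who want something runnable on its own, the following is a minimal Python 3 stand-in, assuming primes(n) is meant to return the list of all primes up to n. The name primes_upto is illustrative, and this is a plain sieve of Eratosthenes, not the recipe's code.

```python
def primes_upto(n):
    """Return a list of all primes <= n (plain sieve of Eratosthenes)."""
    if n < 2:
        return []
    # One byte per number: 1 means "still possibly prime".
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    i = 2
    while i * i <= n:
        if sieve[i]:
            # Cross out i*i, i*i + i, ... in a single slice assignment.
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
        i += 1
    return [k for k, flag in enumerate(sieve) if flag]


print(primes_upto(30))
```

Note that the sieve itself costs one byte per number up to n, and the returned list costs considerably more per prime, so memory use grows with the upper bound rather than staying constant.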

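【Comments】:

I need the first 100 million primes. This causes memory problems.
Just add an extra zero to interval_length. I just tested it, and it created a 48.7 MB file of 0.1e^9 lines...
@Vinod, how much memory do you have? It works fine on my computer with 2 GB of RAM.
@xiao My laptop has 2.5 GB of RAM.
@Vinod Are you using a 64-bit version of Python? I tried the code on a 32-bit Python installation and got a MemoryError, although it works fine under 64-bit Python.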
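
To answer the 32-bit versus 64-bit question for a given installation, the pointer size can be checked from inside the interpreter. A small sketch using only the standard library; pointer_bits is an illustrative helper name.

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter, in bits."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


print("Python %s, %d-bit build" % (sys.version.split()[0], pointer_bits()))
```

A 32-bit process cannot address more than 4 GB no matter how much RAM is installed, which is why the same code can raise MemoryError on one build and run on the other.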

【Solution 3】:

I ran slightly modified code for 10^6 primes on my machine in about 30 seconds (I am running Python 3.2).

Here is the code:

def gen_primes():  
    D = {}  
    # The running integer that's checked for primeness
    q = 2  

    while True:
        if q not in D:
            yield q        
            D[q * q] = [q]
        else:
            for p in D[q]:
                D.setdefault(p + q, []).append(p)
            del D[q]

        q += 1


def main():
    j = 0
    # The with block closes the file even if an error occurs
    with open("primes1.txt", "w") as f:
        for i in gen_primes():
            j += 1
            #print(j, i)
            f.write(str(i) + "\n")
            if j >= 10000000:  # stop after exactly 10,000,000 primes
                break

if __name__ == "__main__":
    main()

【Comments】:

I cannot generate 100 million primes with this. The memory problem persists.
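
The problem persists because gen_primes puts every prime into D as soon as it is found (D[q * q] = [q]), so the dictionary ends up holding as many primes as have been generated. A commonly used variant postpones adding a prime until the candidate actually reaches that prime's square, so the dictionary only ever holds primes up to the square root of the current candidate. The following is a Python 3 sketch of that variant, not the code from this answer; postponed_primes, sieve and base are illustrative names.

```python
from itertools import count, islice


def postponed_primes():
    """Yield primes forever, storing only primes <= sqrt(current candidate)."""
    yield 2
    yield 3
    yield 5
    yield 7
    sieve = {}                  # upcoming odd composite -> step (2 * prime)
    base = postponed_primes()   # separate, much slower supply of base primes
    next(base)                  # discard 2: only odd candidates are tested
    p = next(base)              # 3
    square = p * p              # 9
    for candidate in count(9, 2):
        if candidate in sieve:
            # Known composite: take its step and move it forward below.
            step = sieve.pop(candidate)
        elif candidate < square:
            # Not marked and below the next prime square: it is prime.
            yield candidate
            continue
        else:
            # candidate == square: start sieving with p only now.
            step = 2 * p
            p = next(base)
            square = p * p
        nxt = candidate + step
        while nxt in sieve:     # keep exactly one dictionary key per prime
            nxt += step
        sieve[nxt] = step


print(list(islice(postponed_primes(), 10)))
```

It is a drop-in replacement for gen_primes in the loop above: the yielded values are the same, only the bookkeeping differs.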

【Solution 4】:

Install the gmpy package and you can write the file without needing much memory at all.

import gmpy
p=2
with open("primes.txt","w") as f:
    for n in xrange(100000000):
        print >> f, p
        p = gmpy.next_prime(p)
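
The loop above writes everything into a single file, whereas the question asks for 10,000,000 primes per file. A minimal Python 3 sketch of that splitting, which works with any iterator of numbers (one of the generators in this thread, for example). The function name write_in_chunks and its parameters are illustrative, not part of gmpy or of the original code.

```python
from itertools import islice


def write_in_chunks(numbers, per_file, total, prefix="primes"):
    """Write the first `total` items of `numbers`, `per_file` items per file.

    Files are named prefix1.txt, prefix2.txt, ... and the list of names
    is returned. Only one item is held in memory at a time.
    """
    numbers = iter(numbers)
    names = []
    written = 0
    while written < total:
        chunk_len = min(per_file, total - written)
        name = "%s%d.txt" % (prefix, len(names) + 1)
        count = 0
        with open(name, "w") as f:
            for value in islice(numbers, chunk_len):
                f.write("%d\n" % value)
                count += 1
        names.append(name)
        written += count
        if count < chunk_len:   # the source ran out early
            break
    return names
```

For the question's numbers the call would be write_in_chunks(some_prime_generator(), 10000000, 100000000), where some_prime_generator is a placeholder for whichever generator is used.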

【Comments】:

I am having trouble installing gmpy on Windows. I have installed gmp as well.