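【Posted】: 2013-08-08 20:02:08
【Question】:
I'm trying to use IPython.parallel's map. The inputs to the function I want to parallelize are generators, and because of their size (memory limits) I can't convert them to lists. See the code below: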
from itertools import product
from IPython.parallel import Client
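c = Client()   # connect to the running IPython cluster
v = c[:]       # DirectView on all engines
print(c.ids)   # ids of the available engines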
def stringcount(longstring, substrings):
    scount = [longstring.count(s) for s in substrings]
    return scount
substrings = product('abc', repeat=2)
longstring = product('abc', repeat=3)
# This is what I want to do in parallel.
# It should be 'for longs in longstring'; I use range() because the generator can get very long.
# substrings only yields 3**2 = 9 items, so a 10th next() would raise StopIteration.
for num in range(9):
    longs = next(longstring)
    subs = next(substrings)
    print(subs, longs)
    count = stringcount(longs, subs)
    print(count)
# This does not work, and I understand why.
# I don't know how to fix it while keeping longstring and substrings as
# generators.
amr = v.map(stringcount, longstring, substrings)  # non-blocking map returns an AsyncMapResult
for r in amr:  # iterate over the map result, not over the view
    print(r)
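One way to get "pull from the generator only as needed" behaviour is to feed the map fixed-size batches drawn with `itertools.islice`, so only one batch of inputs exists at a time. The sketch below is Python 3 (where `zip` is lazy) and uses the built-in `map` so it runs anywhere; `batched_map` and `count_pair` are hypothetical helpers, and passing a parallel map (for example a view's `map_sync`) as `mapper` is an assumption that has not been tested against IPython.parallel.

```python
from itertools import islice, product


def stringcount(longstring, substrings):
    # Same counting function as in the question.
    return [longstring.count(s) for s in substrings]


def count_pair(pair):
    # Hypothetical helper: unpack one (longstring, substrings) pair.
    longs, subs = pair
    return stringcount(longs, subs)


def batched_map(func, items, batch_size, mapper=map):
    """Yield func(item) for every item, pulling only batch_size inputs at a time.

    `mapper` is any map-like callable taking (func, list); the built-in map is
    the default, and a parallel map could be passed instead (assumption).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))  # draw the next chunk only now
        if not batch:
            return
        # list() makes the whole batch finish before the next chunk is drawn
        yield from list(mapper(func, batch))


if __name__ == "__main__":
    # zip is lazy in Python 3, so neither product() is ever materialised
    pairs = zip(product('abc', repeat=3), product('abc', repeat=2))
    for result in batched_map(count_pair, pairs, batch_size=4):
        print(result)
```

The batch size is the knob that trades memory for parallelism: each round holds at most `batch_size` inputs and their results.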
【Discussion】:
-
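Could you be more specific about how many items you can afford to hold in memory? Because execution is asynchronous, if you simply iterate over a generator and submit tasks, you may end up with almost all of the inputs in memory anyway, unless you start waiting for results before submitting new tasks.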
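The throttling described in the comment (wait for results before submitting more) can be sketched with the standard library's `concurrent.futures` as a stand-in for IPython.parallel; this is not the question's API. `bounded_imap` is a hypothetical helper that keeps at most `max_pending` tasks outstanding and returns results in input order. Porting it would presumably mean replacing `executor.submit(func, item)` with an asynchronous apply on a view and `.result()` with `.get()`, an untested assumption.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def bounded_imap(executor, func, items, max_pending):
    """Yield func(item) in input order with at most max_pending tasks in flight.

    Once max_pending tasks are outstanding, the next input is submitted only
    after the oldest task's result has been collected.
    """
    if max_pending < 1:
        raise ValueError("max_pending must be at least 1")
    pending = deque()
    for item in items:
        if len(pending) >= max_pending:
            # Block on the oldest task before submitting another one.
            yield pending.popleft().result()
        pending.append(executor.submit(func, item))
    while pending:  # drain whatever is still running
        yield pending.popleft().result()


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        squares = bounded_imap(pool, lambda x: x * x, range(10), max_pending=3)
        print(list(squares))
```

Threads are used only so the sketch runs anywhere; for CPU-bound counting like `stringcount`, a `ProcessPoolExecutor` with a module-level (picklable) function is the more likely choice.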
-
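Since I'm running 64-bit, I think my limit is the system memory of 8 GB, or I could use a machine with 32 GB. For example, product('abcd', repeat=10) gets very large; basically, as soon as I find a result whose counts meet my requirement, I can stop. I assumed (hoped) that map() would pull from the generator as needed. Waiting for results is fine.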
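For the "stop once a result meets my requirement" case, a plain lazy loop stops pulling from the generator at the first hit, so the rest of `product('abcd', repeat=10)` is never generated. A minimal sequential sketch: `first_match` is a hypothetical helper, and the requirement ('c' and 'd' each occur at least three times) is made up to stand in for the real one. The same early exit applies when results come back from a parallel map in batches.

```python
from itertools import product


def stringcount(longstring, substrings):
    # Same counting function as in the question.
    return [longstring.count(s) for s in substrings]


def first_match(candidates, substrings, predicate):
    """Return (candidate, counts) for the first candidate whose counts satisfy
    predicate, or None. Nothing after the match is pulled from candidates."""
    for cand in candidates:
        counts = stringcount(cand, substrings)
        if predicate(counts):
            return cand, counts
    return None


if __name__ == "__main__":
    gen = product('abcd', repeat=10)
    # Hypothetical requirement: 'c' and 'd' each occur at least 3 times.
    hit = first_match(gen, ('c', 'd'), lambda counts: min(counts) >= 3)
    print(hit)
```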
Tags: python multiprocessing ipython ipython-parallel