Asynchronously generating permutations using a pool of workers and a generator object in Python: answers

【Question title】: Asynchronously generating permutations using a pool of workers and a generator object python
【Posted】: 2018-12-05 01:15:54
【Question】:

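I need to run some permutations asynchronously to cut the time it takes to produce a file of every possible permutation of a list. I have tried several times to multiprocess this, without success.

Required result:

A file containing a list of strings in the following format: PRE + JOINEDPERMUTATION

where PRE comes from the list 'prefix'

where JOINEDPERMUTATION is found from "".join(x)

where x is found from permutations(items, repetitions)

ITEMS is my list of values whose permutations I need to retrieve

REPETITIONS: I want every permutation of this list for each repetition count in range(8)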

items=['a','b','c']
prefix=['one','two','three']
from itertools import permutations
from multiprocessing import Pool
pool=Pool(14)

def permutations(pre, repetitions, items):
    PERMS = [ pre + "".join(x) for x in permutations(items, repetitions) ]
    return PERMS

def result_collection(result):
    results.extend(result)
    return results

results=[]

args = ((pre, repetitions, items) for pre in prefix for repetitions in range(5))

for pre, repetitions, items in args:
    pool.apply_async(permutations, (pre, repetitions, items), callback=result_collection)
pool.close()
pool.join()

with open('file.txt','a',encoding='utf-8') as file:
    file.writelines(results)

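I'm not getting an error per se, but after running this program with lists where ITEMS has 50 elements and PREFIXES has 5, it still had not finished after 8 hours, and I don't know how to investigate further.

A quick side query as well: am I right to think that 'pool.map' in the multiprocessing module is basically useless, since it will only ever utilise one worker? Why is it there?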

【Question comments】:

Tags: python multiprocessing generator itertools pool

【Solution 1】:

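It's hard to believe you aren't getting an error per se; this thing should be raising RuntimeError like crazy.

In a newly spawned process, the module it was spawned from is loaded, i.e. executed. That means your code tries to create 14 processes, each of which tries to create 14 processes, each of which tries to create 14... you may see the pattern here :)

You have to put everything that should only be executed from the main process inside an if __name__ == '__main__' block. This prevents those parts of the code from being executed in the workers, because for them __name__ is __mp_main__.

Doing this will fix the multiprocessing part, but there is another issue. You import permutations from itertools and then create a function with the same name in your namespace, effectively overriding the function from itertools. When your processes call your function permutations, the line PERMS = [ pre + "".join(x) for x in permutations(items, repetitions) ] will raise a TypeError, because there you are calling your own permutations function with two arguments instead of the three your function definition requires.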
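The name clash can be reproduced without any multiprocessing at all. The following is a minimal sketch, not the asker's full script: the sample arguments are made up for illustration, and the try/except is only there so the failure can be inspected.

```python
from itertools import permutations


# Defining a function with the same name rebinds `permutations`,
# so the itertools version is no longer reachable under that name.
def permutations(pre, repetitions, items):
    # This inner call now resolves to the function being defined here,
    # which is called with two arguments although it requires three.
    return [pre + "".join(x) for x in permutations(items, repetitions)]


try:
    permutations("one", 2, ["a", "b", "c"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Importing under an alias, as in the corrected code in this answer, keeps both names usable.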

This should do what you want:

from itertools import permutations as it_perms
from multiprocessing import Pool
items=['a','b','c']
prefix=['one','two','three']


def permutations(pre, repetitions, items):
    PERMS = [ pre + "".join(x) for x in it_perms(items, repetitions) ]
    return PERMS

def result_collection(result):
    results.extend(result)
    return results


if __name__ == '__main__':
    pool = Pool(14)
    results = []

    args = ((pre, repetitions, items) for pre in prefix for repetitions in range(5))

    for pre, repetitions, items in args:
        pool.apply_async(permutations, (pre, repetitions, items), callback=result_collection)
    pool.close()
    pool.join()

    with open('file.txt','a',encoding='utf-8') as file:
        file.writelines(results)
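One possible variation on the code above, offered as a sketch rather than as part of the original answer: instead of collecting everything in results through a callback, each batch can be written out as soon as a worker returns it, with one string per line. The names prefixed_permutations and write_permutations, and the default of 4 processes, are assumptions made for this illustration.

```python
from itertools import permutations as it_perms
from multiprocessing import Pool


def prefixed_permutations(task):
    """Return every prefix + joined permutation for one (pre, repetitions, items) task."""
    pre, repetitions, items = task
    return [pre + "".join(x) for x in it_perms(items, repetitions)]


def write_permutations(path, prefixes, items, max_repetitions, processes=4):
    """Write the strings for all tasks to `path`, one per line."""
    tasks = ((pre, r, items) for pre in prefixes for r in range(max_repetitions))
    with Pool(processes) as pool, open(path, "w", encoding="utf-8") as out:
        # imap_unordered hands back batches in completion order, so they are
        # written as they arrive instead of being collected into one big list.
        for batch in pool.imap_unordered(prefixed_permutations, tasks):
            out.writelines(line + "\n" for line in batch)


if __name__ == "__main__":
    write_permutations("file.txt", ["one", "two", "three"], ["a", "b", "c"], 5)
```

Each task still builds its whole batch inside a worker. With 50 items, the permutations of length 7 alone number 50 × 49 × 48 × 47 × 46 × 45 × 44, roughly 5 × 10^11 per prefix, so for inputs of that size a single batch would not fit in memory and the work would have to be split into much smaller pieces.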

As for your side query: where did you get the idea that pool.map() would only utilise one worker? You might want to check the answers on this question, especially this one.
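A quick way to check this yourself is to have each task report the process ID of the worker that ran it. This is a minimal sketch; the function name work, the pool size of 4 and the short sleep are assumptions chosen only to make the distribution visible.

```python
import os
import time
from multiprocessing import Pool


def work(n):
    """Simulate a short task and report which process handled it."""
    time.sleep(0.1)
    return n * n, os.getpid()


if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(work, range(8))
    squares = [square for square, _ in results]
    worker_pids = {pid for _, pid in results}
    print(squares)
    print(f"{len(worker_pids)} distinct worker processes handled the tasks")
```

If the reported count is greater than one, pool.map() is spreading the work across workers. When a real workload appears to use only one CPU, the cause is more likely in how the tasks are divided than in pool.map() itself.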

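【Comments】:

Wow! That explains how it got all 16 processors running at 100%!! I'm impressed by the Ryzen 7. It didn't even slow Chrome down!
Regarding permutations: I actually changed the name when copying the code into my question (by accident; hopefully it shows the weariness of trying to learn Python in a week).
I'll try your solution now, and thank you for being so thorough. Once all is well I'll mark it as accepted.
As for the pool.map idea, that was my first solution: I'll put it in, in case you can give me a hint as to why it only ran on one CPU.
You've gone above and beyond to help. Many thanks.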