How to consume a Python generator in parallel using multiprocessing? Answers

[Question title]: How to consume a Python generator in parallel using multiprocessing?
[Posted]: 2021-02-20 11:49:55
[Question description]:

How can I improve the performance of the networkx function local_bridges? https://networkx.org/documentation/stable//reference/algorithms/generated/networkx.algorithms.bridges.local_bridges.html#networkx.algorithms.bridges.local_bridges

I have already tried pypy, but so far I am still stuck consuming the generator on a single core. My graph has 300k edges. An example:

# construct the nx Graph:
import networkx as nx
# construct an undirected graph here - this is just a dummy graph
G = nx.cycle_graph(300000)

# fast - as it only returns a generator/iterator
lb = nx.local_bridges(G)

# individual item is also fast
%%time
next(lb)
CPU times: user 1.01 s, sys: 11 ms, total: 1.02 s
Wall time: 1.02 s

# computing all the values is very slow.
lb_list = list(lb)

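How can I consume this iterator in parallel so that all processor cores are used? The current naive implementation uses only a single core!

My naive first attempt with multiprocessing was: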

import multiprocessing as mp
lb = nx.local_bridges(G)
pool = mp.Pool()
lb_list = list(pool.map((), lb))

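However, I do not want to apply a specific function (the () placeholder above); I just want to fetch the next elements from the iterator in parallel.

Related: python or dask parallel generator?

Edit

I guess it boils down to how to parallelize this loop: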

lb_res = []
lb = nx.local_bridges(G)
for node in range(1, len(G) +1):
    lb_res.append(next(lb))
    
lb_res

Naively using multiprocessing obviously fails:

# from multiprocessing import Pool
# https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror
from multiprocess import Pool
lb_res = []
lb = nx.local_bridges(G)

def my_function(thing):
    return next(thing)

with Pool(5) as p:
    parallel_result = p.map(my_function, range(1, len(G) +1))
    
parallel_result

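But it is not clear to me how to pass the generator as an argument to the map function and consume it completely.

Edit 2

For this particular problem, the bottleneck turns out to be the shortest-path computation triggered by the with_span=True parameter. With it disabled, the function is reasonably fast.

When the span has to be computed, I would suggest cugraph, which has a fast SSSP implementation on the GPU. Even so, the iteration over the set of edges does not happen in parallel and could be improved further.

However, to learn more, I am still interested in how to parallelize the consumption of a generator in Python.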

[Question comments]:

Re: parallelizing things, maybe read the items from the generator, post them to a Queue, and then have your workers dequeue the items for processing?
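The Queue idea from this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `slow_square`, `worker`, `consume_in_parallel` and the `SENTINEL` marker are hypothetical names, and the squaring function stands in for whatever expensive per-item work you actually have. Note that the generator itself is still advanced sequentially in the parent process; only the per-item work runs in parallel.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical end-of-stream marker; real items must never be None


def slow_square(x):
    # Stand-in for the expensive per-item work.
    return x * x


def worker(tasks, results):
    # Pull items until the sentinel arrives; push one (item, result) pair per item.
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break
        results.put((item, slow_square(item)))


def consume_in_parallel(generator, n_workers=2):
    tasks, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(tasks, results)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    n_items = 0
    for item in generator:  # the generator is advanced here, one item at a time
        tasks.put(item)
        n_items += 1
    for _ in procs:
        tasks.put(SENTINEL)  # one sentinel per worker so every worker stops
    pairs = [results.get() for _ in range(n_items)]  # drain results before joining
    for p in procs:
        p.join()
    return dict(pairs)


if __name__ == "__main__":
    print(consume_in_parallel(i for i in range(10)))
```

This only pays off when the work done per item is much more expensive than pickling the item and sending it through the queue.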

Tags: python multithreading generator networkx

[Solution 1]:

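You cannot consume a generator in parallel: the next state of every non-trivial generator is determined by its current state, so you have to call next() sequentially.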
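A small illustration of that point, using a made-up `countdown` generator (not part of networkx): each value depends on where the previous next() call left off, and the standard pickle module, which multiprocessing relies on to ship arguments to workers, refuses to serialize a generator object at all.

```python
import pickle


def countdown(n):
    # Each yielded value depends on the state left behind by the previous step.
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)
first = next(gen)  # advances the single shared state

try:
    pickle.dumps(gen)  # what would be needed to send `gen` to a worker process
    picklable = True
except TypeError:
    picklable = False

print(first, picklable)
```

So what can be handed to worker processes is the individual items, or the inputs the generator iterates over, not the generator itself.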

The function at https://github.com/networkx/networkx/blob/master/networkx/algorithms/bridges.py#L162 is implemented like this:

for u, v in G.edges:
    if not (set(G[u]) & set(G[v])):
        yield u, v

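So you could parallelize it with something like the code below, but you will have to pay the penalty of merging the separate per-task lists, for example with something like multiprocessing.Manager. I suspect this will only make the whole thing slower, but you can time it yourself.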

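import os
from multiprocessing import Pool

def process_edge(e):
    # Return [(u, v)] if the edge is a local bridge, else an empty list.
    u, v = e
    lb_list = []
    if not (set(G[u]) & set(G[v])):
        lb_list.append((u, v))
    return lb_list

with Pool(os.cpu_count()) as pool:
    results = pool.map(process_edge, G.edges)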
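One way to cut the merging cost of the per-edge approach is to have each task handle a batch of edges and return its local bridges, so the parent only concatenates a few lists. The sketch below is not from the answer and makes several assumptions: it uses a small dict-of-sets adjacency `ADJ` as a stand-in for the networkx graph so that it runs without networkx, and `edges`, `chunked` and `local_bridges_in_chunk` are illustrative names. The edge generator is still consumed sequentially by the pool's feeder in the parent; only the set intersections run in the workers.

```python
import itertools
import os
from multiprocessing import Pool

# Hypothetical stand-in for G: a 6-node cycle plus one chord (0-2).
ADJ = {0: {1, 5, 2}, 1: {0, 2}, 2: {1, 3, 0}, 3: {2, 4}, 4: {3, 5}, 5: {4, 0}}


def edges(adj):
    # Yield each undirected edge once, as (u, v) with u < v.
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:
                yield u, v


def chunked(iterable, size):
    # Lazily cut any iterator (including a generator) into lists of at most `size` items.
    it = iter(iterable)
    while chunk := list(itertools.islice(it, size)):
        yield chunk


def local_bridges_in_chunk(chunk):
    # Same test as in bridges.py: the endpoints share no common neighbor.
    return [(u, v) for u, v in chunk if not (ADJ[u] & ADJ[v])]


if __name__ == "__main__":
    with Pool(os.cpu_count()) as pool:
        parts = pool.imap_unordered(local_bridges_in_chunk, chunked(edges(ADJ), 2))
        lb_list = sorted(itertools.chain.from_iterable(parts))
    print(lb_list)
```

With a real graph the batch size would need to be much larger than 2, and whether this beats the single-core version still has to be measured.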

Another approach is to split the graph into ranges of vertices and process those ranges concurrently.

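import os
import numpy as np
from multiprocessing import Pool

def process_nodes(nodes):
    # Collect local bridges incident to the given nodes; in an undirected
    # graph each edge is visited twice, as (u, v) and as (v, u).
    lb_list = []
    for u in nodes:
        for v in G[u]:
            if not (set(G[u]) & set(G[v])):
                lb_list.append((u, v))
    return lb_list

with Pool(os.cpu_count()) as pool:
    # Assumes the nodes are labeled 0..n-1, as in nx.cycle_graph.
    node_chunks = np.array_split(list(range(G.number_of_nodes())), os.cpu_count())
    results = pool.map(process_nodes, node_chunks)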

You could also check whether a better algorithm exists for this problem, or look for a faster library implemented in C.

[Comments]: