Python multiprocessing is slower than regular processing. How can I improve it?

【Question title】: Python multiprocessing is slower than regular. How can I improve?
【Posted】: 2018-11-21 21:44:11
【Question】:

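I have a script that combs through a dataset of nodes/points and removes the ones that overlap. The actual script is more complex, but for demonstration I pared it down to a simple overlap check that does nothing with the result.

I tried several variants with locks, queues and pools, adding one job at a time versus adding them in bulk. Some of the worst offenders were slower by a couple of orders of magnitude. The version below is the fastest I managed to get.

The overlap-checking algorithm that is sent to the individual processes: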

def check_overlap(args):
    # args is a dict holding the tolerance and the two [x, y] points to compare
    tolerance = args['tolerance']
    this_coords = args['this_coords']
    that_coords = args['that_coords']

    overlaps = False
    # cheap per-axis rejection first; abs() so negative offsets are rejected too
    distance_x = abs(this_coords[0] - that_coords[0])
    if distance_x <= tolerance:
        distance_y = abs(this_coords[1] - that_coords[1])
        if distance_y <= tolerance:
            distance = pow(pow(distance_x, 2) + pow(distance_y, 2), 0.5)
            if distance <= tolerance:
                overlaps = True

    return overlaps

The processing function:

def process_coords(coords, num_processors=1, tolerance=1):
    import multiprocessing as mp
    import time

    if num_processors > 1:
        pool = mp.Pool(num_processors)
        start = time.time()
        print("Start script w/ multiprocessing")

    else:
        num_processors = 0
        start = time.time()
        print("Start script w/ standard processing")

    total_overlap_count = 0

    # outer loop through nodes
    start_index = 0
    last_index = len(coords) - 1
    while start_index <= last_index:

        # nature of the original problem means we can process all pairs of a single node at once, but not multiple, so batch jobs by outer loop
        batch_jobs = []

        # inner loop against all later nodes, so a node is never paired with itself
        this_coords = coords[start_index]
        count_overlapping = 0
        for i in range(start_index + 1, last_index + 1):

            if num_processors:
                # add job
                batch_jobs.append({
                    'tolerance': tolerance,
                    'this_coords': this_coords,
                    'that_coords': coords[i]
                })

            else:
                # synchronous processing
                that_coords = coords[i]
                distance_x = abs(this_coords[0] - that_coords[0])
                if distance_x <= tolerance:
                    distance_y = abs(this_coords[1] - that_coords[1])
                    if distance_y <= tolerance:
                        distance = pow(pow(distance_x, 2) + pow(distance_y, 2), 0.5)
                        if distance <= tolerance:
                            count_overlapping += 1

        if num_processors:
            res = pool.map_async(check_overlap, batch_jobs)
            results = res.get()
            for r in results:
                if r:
                    count_overlapping += 1

        # stuff normally happens here to process nodes connected to this node
        total_overlap_count += count_overlapping
        start_index += 1

    print(total_overlap_count)
    print("  time: {0}".format(time.time() - start))

    if num_processors:
        # release the worker processes once timing has been reported
        pool.close()
        pool.join()

And the test code:

from random import random

coords = []
num_coords = 1000
spread = 100.0
half_spread = 0.5*spread
for i in range(num_coords):
    coords.append([
        random()*spread-half_spread,
        random()*spread-half_spread
    ])

process_coords(coords, 1)
process_coords(coords, 4)

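Despite all this, the non-multiprocessing version consistently runs in under 0.4 seconds, while the best I can get with multiprocessing is just under 3.0 seconds. I understand the algorithm here may be too simple to really reap the benefits, but given that the case above has nearly half a million iterations, and the real case has significantly more, it seems odd to me that multiprocessing is an order of magnitude slower.

What am I missing, and what can I do to improve this?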

【Comments】:

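So have you tried actually running it on real data to determine whether multiprocessing is worth it? E.g. instead of half a million iterations, use 5 million and see what happens? Premature optimisation can cost you a lot of time.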
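@IanQuah, I tried it once and it was slower, but I was using a since-abandoned approach that turned out to be inefficient, hence the isolation and experimenting. That said, at this point it is more a matter of personal curiosity. If I can't get such a seemingly simple use case to benefit, when would I ever use Python multiprocessing?
Your computation isn't heavy enough to earn back the IPC overhead. Don't start thinking about multiprocessing unless your sequential code takes at least a few seconds to complete. Reading this and this might improve your understanding.
@Darkonaut Yeah, the more I read, the more it seems that Python has a lot of multiprocessing overhead compared with what I'm otherwise familiar with (mostly Java). That said, the numpy suggestions in those links are a good workaround for the basic math operations.

Tags: python multiprocessing python-multiprocessing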

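【Solution 1】:

Building O(N**2) 3-element dicts that the serial code never uses, and transmitting them across inter-process pipes, is a pretty good way to guarantee that multiprocessing can't help ;-) Nothing comes for free: everything costs.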
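To make that cost concrete, here is a rough sketch (not part of the original answer) comparing the pickled size of one per-pair job dict, as built in the question, with the pickled size of a bare integer index. multiprocessing uses pickle to move task arguments between processes. The helper name payload_bytes and the sample coordinates are made up for illustration, and the totals are only approximate because they ignore task chunking and pipe framing.

```python
import pickle

def payload_bytes(obj):
    """Return the number of bytes pickle produces for obj."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))

# One job as built in the question: a dict per pair of nodes.
dict_job = {
    'tolerance': 1,
    'this_coords': [12.5, -3.25],
    'that_coords': [-40.0, 7.75],
}
# One job in an index-based scheme: just an integer.
index_job = 17

n = 1000                      # number of nodes, as in the question's test
pairs = n * (n - 1) // 2      # 499500 pairwise checks

# Rough totals: one dict per pair versus one integer per node.
dict_total = payload_bytes(dict_job) * pairs
index_total = payload_bytes(index_job) * n

print("bytes per dict job:  {0}".format(payload_bytes(dict_job)))
print("bytes per index job: {0}".format(payload_bytes(index_job)))
print("approx. total for dict jobs:  {0}".format(dict_total))
print("approx. total for index jobs: {0}".format(index_total))
```

The exact byte counts depend on the Python version and pickle protocol, but the gap between the two totals is what matters here.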

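Below is a rewrite that executes much the same code whether it runs in serial or multiprocessing mode. No new dicts, etc. In general, the larger len(coords) is, the more it benefits from multiprocessing. On my box, at 20000 points the multiprocessing run takes about a third of the wall-clock time of the serial run.

The key is that every process has its own copy of coords. This is done by transmitting it just once, when the pool is created. That should work on all platforms. On Linux-y systems it could instead happen "by magic", via forked process inheritance. Cutting the amount of data sent across processes from O(N**2) to O(N) is a huge improvement.

Getting more out of multiprocessing would require better load balancing. As is, a call to check_overlap(i) compares coords[i] with each value in coords[i+1:]. The larger i is, the less work there is to do, and for the largest values of i the cost of merely transmitting i between processes, and transmitting the result back, swamps the time spent inside check_overlap(i).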

def init(*args):
    global _coords, _tolerance
    _coords, _tolerance = args

def check_overlap(start_index):
    coords, tolerance = _coords, _tolerance
    tsq = tolerance ** 2
    overlaps = 0
    start0, start1 = coords[start_index]
    for i in range(start_index + 1, len(coords)):
        that0, that1 = coords[i]
        dx = abs(that0 - start0)
        if dx <= tolerance:
            dy = abs(that1 - start1)
            if dy <= tolerance:
                if dx**2 + dy**2 <= tsq:
                    overlaps += 1
    return overlaps

def process_coords(coords, num_processors=1, tolerance=1):
    global _coords, _tolerance
    import multiprocessing as mp
    _coords, _tolerance = coords, tolerance
    import time

    if num_processors > 1:
        pool = mp.Pool(num_processors, initializer=init, initargs=(coords, tolerance))
        start = time.time()
        print("Start script w/ multiprocessing")
    else:
        num_processors = 0
        start = time.time()
        print("Start script w/ standard processing")

    N = len(coords)
    if num_processors:
        total_overlap_count = sum(pool.imap_unordered(check_overlap, range(N))) 
    else:
        total_overlap_count = sum(check_overlap(i) for i in range(N))

    print(total_overlap_count)
    print("  time: {0}".format(time.time() - start))

if __name__ == "__main__":
    from random import random

    coords = []
    num_coords = 20000
    spread = 100.0
    half_spread = 0.5*spread
    for i in range(num_coords):
        coords.append([
            random()*spread-half_spread,
            random()*spread-half_spread
        ])

    process_coords(coords, 1)
    process_coords(coords, 4)
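The load-balancing remark above can be sketched as follows. This is one possible approach, not something from the original answer: instead of sending one index per task, each task covers a strided set of indices (offset, offset+step, offset+2*step, ...), so every task mixes expensive small indices with cheap large ones and only a handful of messages cross the process boundary. The names check_stride, count_overlaps_strided and tasks_per_processor are hypothetical; init and check_overlap follow the rewrite above.

```python
import multiprocessing as mp
from random import random

def init(*args):
    # Runs once in every worker: stash the shared inputs in module globals.
    global _coords, _tolerance
    _coords, _tolerance = args

def check_overlap(start_index):
    # Same pairwise test as the rewrite above: count later nodes within tolerance.
    coords, tolerance = _coords, _tolerance
    tsq = tolerance ** 2
    start0, start1 = coords[start_index]
    overlaps = 0
    for i in range(start_index + 1, len(coords)):
        that0, that1 = coords[i]
        dx = abs(that0 - start0)
        if dx <= tolerance:
            dy = abs(that1 - start1)
            if dy <= tolerance and dx**2 + dy**2 <= tsq:
                overlaps += 1
    return overlaps

def check_stride(job):
    # Hypothetical helper: handle indices offset, offset+step, offset+2*step, ...
    offset, step = job
    return sum(check_overlap(i) for i in range(offset, len(_coords), step))

def count_overlaps_strided(coords, num_processors=4, tolerance=1,
                           tasks_per_processor=4):
    # A few tasks per worker, each with a similar mix of heavy and light indices.
    num_tasks = num_processors * tasks_per_processor
    jobs = [(offset, num_tasks) for offset in range(num_tasks)]
    pool = mp.Pool(num_processors, initializer=init,
                   initargs=(coords, tolerance))
    try:
        return sum(pool.imap_unordered(check_stride, jobs))
    finally:
        pool.close()
        pool.join()

if __name__ == "__main__":
    coords = [[random() * 100.0 - 50.0, random() * 100.0 - 50.0]
              for _ in range(2000)]
    print(count_overlaps_strided(coords, num_processors=4))
```

Passing a chunksize to imap_unordered would also cut the number of messages, but contiguous chunks of indices still carry unequal amounts of work, which is why this sketch uses strides. Whether it actually helps on a given machine would need to be measured.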

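【Discussion】:

Oh, interesting. I didn't know about initializer/initargs. I was looking for a way to simply share the data without the overhead of re-creating it for each process. Looking into this, I wonder whether a manager instance would help as well. But thanks, this opens up a lot of new ideas!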