python multiprocessing vs threading for cpu bound work on windows and linux

[Posted]: 2010-11-20 08:40:54
[Question]:

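So I knocked up some test code to see how the multiprocessing module would scale on CPU-bound work compared to threading. On Linux I get the performance increase that I'd expect:

linux (dual quad-core Xeon):
serialrun took 1192.319 ms
parallelrun took 346.727 ms
threadedrun took 2108.172 ms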

My dual-core MacBook Pro shows the same behavior:

osx (dual-core MacBook Pro)
serialrun took 2026.995 ms
parallelrun took 1288.723 ms
threadedrun took 5314.822 ms

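I then went and tried it on a Windows machine and got some very different results.

windows (i7 920):
serialrun took 1043.000 ms
parallelrun took 3237.000 ms
threadedrun took 2343.000 ms

Why, oh why, is the multiprocessing approach so much slower on Windows?

Here's the test code: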

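#!/usr/bin/env python
# Python 2 code (print statement, xrange, func_name).

import multiprocessing
import threading
import time

def print_timing(func):
    # Decorator: report how long each call to func takes, in milliseconds.
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print '%s took %0.3f ms' % (func.func_name, (t2-t1)*1000.0)
        return res
    return wrapper


def counter():
    # Pure CPU-bound busy loop.
    for i in xrange(1000000):
        pass

@print_timing
def serialrun(x):
    # Run counter() x times, one after another, in this process.
    for i in xrange(x):
        counter()

@print_timing
def parallelrun(x):
    # Start x processes, each running counter(), then wait for all of them.
    proclist = []
    for i in xrange(x):
        p = multiprocessing.Process(target=counter)
        proclist.append(p)
        p.start()

    for i in proclist:
        i.join()

@print_timing
def threadedrun(x):
    # Same as parallelrun, but with threads (in CPython the GIL lets only
    # one thread run Python bytecode at a time).
    threadlist = []
    for i in xrange(x):
        t = threading.Thread(target=counter)
        threadlist.append(t)
        t.start()

    for i in threadlist:
        i.join()

def main():
    serialrun(50)
    parallelrun(50)
    threadedrun(50)

# The guard is required on Windows, where child processes re-import this module.
if __name__ == '__main__':
    main()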

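[Question discussion]:

I ran your test code on a quad-core Dell PowerEdge 840 running Win2K3. The results weren't as dramatic as yours, but your point still stands: serialrun took 1266.000 ms, parallelrun took 1906.000 ms, threadedrun took 4359.000 ms. I'd be curious to see what answers you get; I don't know myself.

Tags: python multiprocessing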

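[Solution 1]:

Just starting the pool takes a long time. I have found in "real world" programs that if I can keep a pool open and reuse it for many different processes, passing the reference down through method calls (usually using map_async), then on Linux I can save a few percent, but on Windows I can often halve the time taken. Linux is always quicker for my particular problems, but even on Windows I get a net benefit from multiprocessing.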
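A minimal Python 3 sketch of the pool-reuse pattern described in this answer, assuming the work splits into independent batches: one `multiprocessing.Pool` is created once and passed by reference to a helper that submits each batch with `map_async`, so the worker start-up cost is paid only once. The names `work` and `process_batches` are hypothetical.

```python
import multiprocessing


def work(n):
    # Hypothetical CPU-bound task: sum of squares of 0..n-1.
    return sum(i * i for i in range(n))


def process_batches(pool, batches):
    # Reuse the pool passed in by the caller; no new processes are started here.
    pending = [pool.map_async(work, batch) for batch in batches]
    # .get() blocks until each batch is done and re-raises worker exceptions.
    return [p.get() for p in pending]


if __name__ == "__main__":
    batches = [[10, 100, 1000], [5, 50], [2000]]
    # Pay the process start-up cost once, for every batch.
    with multiprocessing.Pool(processes=4) as pool:
        print(process_batches(pool, batches))
```

In a real program the same `pool` object would be handed to every function that needs parallel work, instead of each one creating its own.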

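[Discussion]:

[Solution 2]:

Processes are much more lightweight under UNIX variants. Windows processes are heavy and take much more time to start up. Threads are the recommended way of doing multiprocessing on Windows.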
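One way to check this start-up claim on a given machine is to time workers that do nothing, so that creation and join cost dominates. A rough Python 3 sketch; `noop` and `startup_ms` are hypothetical helpers, and the absolute numbers will depend on the OS, the Python version and the multiprocessing start method in use:

```python
import multiprocessing
import threading
import time


def noop():
    # Does no work, so the timing below is almost pure start-up/tear-down cost.
    pass


def startup_ms(worker_cls, n=20):
    # Start n workers of the given class (Thread or Process), wait for all,
    # and return the elapsed wall-clock time in milliseconds.
    t0 = time.perf_counter()
    workers = [worker_cls(target=noop) for _ in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return (time.perf_counter() - t0) * 1000.0


if __name__ == "__main__":
    print("threads:   %8.1f ms" % startup_ms(threading.Thread))
    print("processes: %8.1f ms" % startup_ms(multiprocessing.Process))
```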

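[Discussion]:

Ahh, interesting. So would that mean that changing the balance of the test, say counting higher but fewer times, would let Windows recover some of its multiprocessing performance? I'll give it a try.
Tried recalibrating to count to 10,000,000 with 8 iterations, and the results are more in favour of Windows: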
```
serialrun took 1651.000 ms
parallelrun took 696.000 ms
threadedrun took 3665.000 ms
```
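The rebalancing tried in this comment can be expressed by parameterizing the question's benchmark: the total amount of counting stays fixed while the number of processes varies, so each process's start-up cost is amortized over more work. A Python 3 sketch; `count_to`, `parallel_ms` and the chosen process counts are assumptions, not the original code:

```python
import multiprocessing
import time


def count_to(n):
    # The question's busy loop, with the loop length made a parameter.
    for _ in range(n):
        pass


def parallel_ms(n_procs, total_count):
    # Split a fixed total amount of counting across n_procs processes
    # and return the wall-clock time in milliseconds.
    per_proc = total_count // n_procs
    t0 = time.perf_counter()
    procs = [multiprocessing.Process(target=count_to, args=(per_proc,))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return (time.perf_counter() - t0) * 1000.0


if __name__ == "__main__":
    total = 50 * 1000000  # same total work as the question's 50 x 1,000,000
    # Many short-lived processes vs. a few long-lived ones.
    for n_procs in (50, 8):
        print("%2d processes: %8.1f ms" % (n_procs, parallel_ms(n_procs, total)))
```

If process creation is the expensive part, the 8-process run should close most of the gap on Windows while changing little on Linux.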

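[Solution 3]:

At the moment your counter() function isn't modifying much state. Try changing counter() so that it modifies many pages of memory, then run a CPU-bound loop. See whether there is still a large disparity between Linux and Windows.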
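A sketch of the suggested variant, assuming Python 3: `touch_pages` (a hypothetical name) writes one byte into every page of a freshly allocated buffer, so each worker dirties many memory pages rather than only spinning in a loop. `mmap.PAGESIZE` supplies the OS page size; the buffer size, round count and task count are arbitrary choices. It could stand in for `counter()` in the question's `serialrun`/`parallelrun` harness.

```python
import mmap
import multiprocessing
import time


def touch_pages(n_pages, rounds=10, page_size=mmap.PAGESIZE):
    # Allocate n_pages pages and write one byte in every page, `rounds` times,
    # so the worker dirties many memory pages instead of just spinning.
    buf = bytearray(n_pages * page_size)
    for _ in range(rounds):
        for offset in range(0, len(buf), page_size):
            buf[offset] = (buf[offset] + 1) % 256
    # Sum of the first byte of each page: n_pages * rounds while rounds < 256.
    return sum(buf[::page_size])


if __name__ == "__main__":
    t0 = time.perf_counter()
    with multiprocessing.Pool(processes=4) as pool:
        # 8 tasks of 2,500 pages each (about 10 MB per task with 4 KiB pages).
        results = pool.map(touch_pages, [2500] * 8)
    elapsed_ms = (time.perf_counter() - t0) * 1000.0
    print("touch_pages x%d took %0.3f ms" % (len(results), elapsed_ms))
```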

I'm not running Python 2.6 right now, so I can't try it myself.

[Discussion]:

[Solution 4]:

The python documentation for multiprocessing blames the lack of os.fork() for the problems on Windows. It may be applicable here.
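On Python 3.4 and later the effect of a missing `os.fork()` can be reproduced on Linux: `multiprocessing.get_context("spawn")` starts each child as a fresh interpreter that re-imports the main module, and "spawn" is the only start method available on Windows. A hedged sketch comparing whichever start methods the current platform offers; `counter` mirrors the question's loop and `run_with_start_method` is a hypothetical helper:

```python
import multiprocessing
import os
import time


def counter():
    # Same busy loop as in the question.
    for _ in range(1000000):
        pass


def run_with_start_method(method, n=8):
    # Start n counter() processes using the given start method
    # ("fork", "spawn" or "forkserver") and return the elapsed time in ms.
    ctx = multiprocessing.get_context(method)
    t0 = time.perf_counter()
    procs = [ctx.Process(target=counter) for _ in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return (time.perf_counter() - t0) * 1000.0


if __name__ == "__main__":
    print("os.fork available:", hasattr(os, "fork"))
    for method in multiprocessing.get_all_start_methods():
        print("%-10s %8.1f ms" % (method, run_with_start_method(method)))
```

On Linux, a large gap between "fork" and "spawn" would support the explanation that the Windows slowdown comes from how processes are created.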

Check out what happens when you import psyco. First, easy_install it:

C:\Users\hughdbrown>\Python26\scripts\easy_install.exe psyco
Searching for psyco
Best match: psyco 1.6
Adding psyco 1.6 to easy-install.pth file

Using c:\python26\lib\site-packages
Processing dependencies for psyco
Finished processing dependencies for psyco

Add this to the top of your Python script:

import psyco
psyco.full()

I get these results without it:

serialrun took 1191.000 ms
parallelrun took 3738.000 ms
threadedrun took 2728.000 ms

I get these results with it:

serialrun took 43.000 ms
parallelrun took 3650.000 ms
threadedrun took 265.000 ms

Parallel is still slow, but the others burn rubber.

Edit: also, try it with a multiprocessing pool. (This is my first time trying this, and it's so fast I think I must be missing something.)

@print_timing
def parallelpoolrun(reps):
    pool = multiprocessing.Pool(processes=4)
    result = pool.apply_async(counter, (reps,))

Results:

C:\Users\hughdbrown\Documents\python\StackOverflow>python  1289813.py
serialrun took 57.000 ms
parallelrun took 3716.000 ms
parallelpoolrun took 128.000 ms
threadedrun took 58.000 ms

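[Discussion]:

Very neat! Lowering the number of iterations (processes) while raising the count value shows that, as Byron said, the slowness of the parallel run comes from the extra set-up time of Windows processes.
The pool doesn't seem to wait for itself to finish. Pool has a join() method, but it doesn't seem to do what I thought it should :P.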
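In the `parallelpoolrun` above, `apply_async` returns immediately and its result is never read, so the timing likely does not include the work at all. In addition, the question's `counter()` takes no arguments, so `counter(reps)` raises `TypeError` inside the worker, an error that only surfaces when the result's `.get()` is called. `Pool.join()` may only be called after `close()` or `terminate()`. A corrected Python 3 sketch under those assumptions; the `count` parameter added to `counter` and the `processes`/`count` arguments of `parallelpoolrun` are hypothetical:

```python
import multiprocessing
import time


def counter(count=1000000):
    # The question's busy loop; `count` is a hypothetical parameter.
    for _ in range(count):
        pass
    return count


def parallelpoolrun(reps, processes=4, count=1000000):
    # Run counter() `reps` times on a pool and wait until every task is done.
    t0 = time.perf_counter()
    pool = multiprocessing.Pool(processes=processes)
    pending = [pool.apply_async(counter, (count,)) for _ in range(reps)]
    pool.close()  # no more tasks will be submitted; required before join()
    pool.join()   # block until all worker processes have exited
    results = [p.get() for p in pending]  # re-raises any exception from a worker
    elapsed_ms = (time.perf_counter() - t0) * 1000.0
    print("parallelpoolrun took %0.3f ms" % elapsed_ms)
    return results


if __name__ == "__main__":
    parallelpoolrun(50)
```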

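[Solution 5]:

Process creation on Windows is said to be more expensive than on Linux. If you search around the site you'll find some information. Here's one I found easily.

[Discussion]: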