Python multiprocessing - How can I split workload to get speed improvement?

[Question title]: Python multiprocessing - How can I split workload to get speed improvement?
[Posted]: 2017-11-28 14:34:50
[Question]:

I am writing a simple script that crops images and saves them.
The problem is that there are around 150,000+ images, and I want to improve the speed.

So, at first I wrote the code with a simple for loop, as follows:

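import sys

import cv2

textfile = sys.argv[1]
with open(textfile) as file_list:
    files = file_list.read().split('\n')

idx = 0
for eachfile in files:
    image = cv2.imread(eachfile)
    idx += 1
    if image is None:
        continue  # unreadable or empty path: skip it instead of failing on image.shape
    outName = eachfile.replace('/data', '/changed_data')
    if image.shape[0] == 256:
        image1 = image[120:170, 120:170]  # crop a 50x50 patch
    elif image.shape[0] == 50:
        image1 = image  # already 50 pixels high: keep as is
    else:
        continue  # any other height: nothing to write
    cv2.imwrite(outName, image1)
    print('%d %s' % (idx, outName))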

This code took about 38 seconds for 90,000 images. However, using two cores took longer than the single process: about 48 seconds for the same 90,000 images.

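import sys
from multiprocessing import Pool

import cv2


def crop(eachfile):
    image = cv2.imread(eachfile)
    if image is None:
        return  # unreadable or empty path: skip it
    outName = eachfile.replace('/data', '/changed_data')
    if image.shape[0] == 256:
        image1 = image[120:170, 120:170]
    elif image.shape[0] == 50:
        image1 = image
    else:
        return  # any other height: nothing to write
    cv2.imwrite(outName, image1)
    # The idx counter is dropped here: it was never defined inside crop()
    # and would not be shared between worker processes anyway.
    print(outName)


if __name__ == '__main__':
    textfile = sys.argv[1]
    with open(textfile) as file_list:
        files = file_list.read().split('\n')
    pool = Pool(2)
    pool.map(crop, files)
    pool.close()
    pool.join()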

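Am I going about speeding up the process the right way? Or should I split the list and send each sub-list to a process?

Any comments on my code would be great!!!

Thanks in advance!!!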

[Comments]:

By the way, the program reads a text file in which the file names are separated by \n characters.

Tags: python multiprocessing python-multiprocessing opencv3.0

[Solution 1]:

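You should indeed split the task across the two cores. Play around with this "slightly modified" example code. The original post can be found here. Wherever you see data, that is the hook where you supply your images. When using multiprocessing, defs inside a class do not work... if you try to use pathos... you get errors from cPickle... some annoying issue with the latest 2.7 version. It does not occur in 3.5 or so. Enjoy!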

import multiprocessing
import sys
import time


def mp_worker(args):
    inputs, the_time = args                 # each task is a [name, seconds] pair
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time))
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs)
    sys.stdout.flush()


def mp_handler():                           # Non tandem pair processing
    p = multiprocessing.Pool(2)
    p.map(mp_worker, data)
    p.close()
    p.join()


def mp_handler_tandem():
    subdata = zip(data[0::2], data[1::2])
#    print(list(subdata))
    for task1, task2 in subdata:
        p = multiprocessing.Pool(2)
        p.map(mp_worker, (task1, task2))
        p.close()                           # release this pair's pool before the next one
        p.join()

#data = (['a', '1'], ['b', '2'], ['c', '3'], ['d', '4'])
data = (['a', '2'], ['b', '3'], ['c', '1'], ['d', '4'],
        ['e', '1'], ['f', '2'], ['g', '3'], ['h', '4'])

if __name__ == '__main__':
    sys.stdout.flush()
#    print('mp_handler():')
#    mp_handler()
#    print('---')
#    time.sleep(2)

#    print('\nmp_handler_tandem():')
#    mp_handler_tandem()
    print('---')
#    time.sleep(2)

    # The original called Multiprocess().qmp_handler(), which is not defined
    # in this snippet; mp_handler() is the module-level equivalent.
    mp_handler()

When working in an editor: use sys.stdout.flush() to flush output to the screen as it happens.
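The flushing advice can be sketched as below. This is a minimal Python 3 illustration, not code from the original post: `report` and `worker` are hypothetical names, and the sleep durations are placeholder values.

```python
import sys
import time
from multiprocessing import Pool


def report(message):
    # Print one line, then flush so it is not held back in a buffer.
    print(message)
    sys.stdout.flush()


def worker(task):
    # task is a (name, seconds) pair, mirroring the data list in the answer.
    name, seconds = task
    report(' Process %s\tWaiting %s seconds' % (name, seconds))
    time.sleep(seconds)
    report(' Process %s\tDONE' % name)
    return name


if __name__ == '__main__':
    # Placeholder tasks; the durations are illustrative only.
    with Pool(2) as pool:
        pool.map(worker, [('a', 0.2), ('b', 0.1)])
```

In Python 3, `print(message, flush=True)` has the same effect in a single call.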

But also check here for using cores and splitting jobs.
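As a rough illustration of splitting the job, the Python 3 sketch below cuts the file list into one slice per process, so that each worker receives a whole batch rather than one file name at a time. The names `out_name`, `split_into_chunks`, `process_chunk` and `run` are hypothetical, and the worker only computes the output path as a stand-in for the read, crop and write steps from the question. Whether this is actually faster depends on how much of the run time is disk I/O, which is not measured here.

```python
from multiprocessing import Pool


def out_name(path):
    # Same renaming rule as in the question: /data -> /changed_data.
    return path.replace('/data', '/changed_data')


def split_into_chunks(items, n_chunks):
    # Split items into n_chunks contiguous slices of near-equal length.
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(paths):
    # Stand-in worker (assumption): a real version would read, crop and
    # write each image here instead of only computing the output name.
    return [out_name(p) for p in paths]


def run(paths, processes=2):
    # One chunk per process, so only `processes` tasks are sent to the pool.
    with Pool(processes) as pool:
        results = pool.map(process_chunk, split_into_chunks(paths, processes))
    # Flatten the per-chunk lists back into one list, in input order.
    return [name for chunk in results for name in chunk]


if __name__ == '__main__':
    # Hypothetical file names, for demonstration only.
    demo = ['/data/img%d.png' % i for i in range(6)]
    print(run(demo))
```

`Pool.map` also accepts a `chunksize` argument, which batches items per task without slicing the list by hand.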

[Discussion]: