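【Posted】: 2019-07-13 11:14:03
【Question】:
I wrote a simple Python multiprocessing program that reads a batch of rows from a CSV file, calls an API for each row, and writes the results to a new CSV file. However, the program performs no better than sequential execution, and changing the pool size has no effect. What is going wrong?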
from multiprocessing import Pool
from random import randint
from time import sleep
import csv
import requests
import json

def orders_v4(order_number):
    response = requests.request("GET", url, headers=headers, params=querystring, verify=False)
    return response.json()

newcsvFile=open('gom_acr_status.csv', 'w')
writer = csv.writer(newcsvFile)

def process_line(row):
    ol_key = row['\ufeffORDER_LINE_KEY']
    order_number=row['ORDER_NUMBER']
    orders_json = orders_v4(order_number)
    oms_order_key = orders_json['oms_order_key']
    order_lines = orders_json["order_lines"]
    for order_line in order_lines:
        if ol_key==order_line['order_line_key']:
            print(order_number)
            print(ol_key)
            ftype = order_line['fulfillment_spec']['fulfillment_type']
            status_desc = order_line['statuses'][0]['status_description']
            print(ftype)
            print(status_desc)
            listrow = [ol_key, order_number, ftype, status_desc]
            #(writer)
            writer.writerow(listrow)
            newcsvFile.flush()

def get_next_line():
    with open("gom_acr.csv", 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            yield row

f = get_next_line()
t = Pool(processes=50)
for i in f:
    t.map(process_line, (i,))
t.join()
t.close()
【Discussion】:
-
I think you need to restructure your code so that you can call t.map(process_line, reader), where reader is the same object as in get_next_line.
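To illustrate the suggestion: in the question's loop, each t.map(process_line, (i,)) call receives a one-element iterable and blocks until that single row is done, so only one worker is ever busy. A minimal sketch of calling map() once over the whole reader follows. It assumes a stand-in process_line that sleeps instead of calling the API and reads CSV text from memory rather than gom_acr.csv; run_all, pool_factory and the column values are hypothetical names introduced only for this example.

```python
import csv
import io
import time
from multiprocessing import Pool


def process_line(row):
    # Stand-in for the real API call (assumption): sleep briefly, then
    # return the values that would go into the output CSV.
    time.sleep(0.05)
    return [row["ORDER_LINE_KEY"], row["ORDER_NUMBER"]]


def run_all(csv_text, processes=4, pool_factory=Pool):
    # A single map() call over the whole reader lets the pool spread the
    # rows across all workers. map() returns results in input order.
    reader = csv.DictReader(io.StringIO(csv_text))
    with pool_factory(processes) as pool:
        return pool.map(process_line, reader)
```

On platforms that start workers with spawn (Windows, and macOS by default), the call to run_all(...) should sit under an if __name__ == "__main__": guard. Passing multiprocessing.pool.ThreadPool as pool_factory runs the same sketch with threads, which is often sufficient when the work is mostly waiting on HTTP responses.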
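-
You should monitor your system resources before inspecting your program. If you have 4 cores already running at close to 100%, adding more processes will make almost no difference. Of course, this is only an example.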
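A rough way to check this from inside Python, not a substitute for a proper system monitor: compare the CPU time a single call consumes with the wall-clock time it takes. A ratio near 1 suggests the call is CPU-bound, while a ratio near 0 suggests it mostly waits (for example on a network response). The helper cpu_fraction and both sample functions are hypothetical and exist only for this sketch.

```python
import time


def cpu_fraction(func, *args):
    """Return CPU time divided by wall-clock time for one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def waits_like_an_api_call():
    # Stand-in for a network request (assumption): sleeping uses
    # almost no CPU time.
    time.sleep(0.2)


def burns_cpu():
    # Pure computation: keeps one core busy for the whole call.
    return sum(i * i for i in range(2_000_000))
```

Calling cpu_fraction(waits_like_an_api_call) and cpu_fraction(burns_cpu) side by side shows the contrast. If the real process_line behaves like the first function, the cores are unlikely to be the bottleneck.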
-
Even after you fix the position of the map() call (moving it outside the loop), there is still a lot of room for optimization.
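The comment does not say which optimizations it has in mind. One possible direction, shown only as a sketch: have each worker return its output row instead of writing through a file handle opened at module level, and let the parent process write rows as they arrive via imap_unordered. The process_line body here is a stand-in that sleeps instead of calling the API, and write_results, pool_factory and "stub-status" are hypothetical names.

```python
import csv
import time
from multiprocessing import Pool


def process_line(row):
    # Stand-in for the API lookup (assumption). It returns the output row
    # rather than writing to a file handle shared by all workers.
    time.sleep(0.05)
    return [row["ORDER_LINE_KEY"], row["ORDER_NUMBER"], "stub-status"]


def write_results(rows, out_file, processes=4, pool_factory=Pool):
    # Only the parent process touches the output file.
    writer = csv.writer(out_file)
    count = 0
    with pool_factory(processes) as pool:
        # imap_unordered yields each result as soon as a worker finishes,
        # so rows are written while other rows are still being processed.
        # The output order is therefore not guaranteed.
        for result in pool.imap_unordered(process_line, rows):
            writer.writerow(result)
            count += 1
    return count
```

Keeping the writer in the parent avoids several processes writing to the same file at once. If the output must match the input order, imap can be used in place of imap_unordered.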
-
I used results = t.map_async(process_line, (i,)) and results.get() to wait for completion. The program is now very fast.
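A sketch of how this approach might be arranged, assuming get() is called only after every row has been submitted (the comment does not show where it is placed). map_async returns an AsyncResult immediately, so the loop can queue all rows before waiting on any of them. Note also that close() has to be called before join(); the reverse order raises an error. The stub process_line, run_async and pool_factory are hypothetical names for this example.

```python
import time
from multiprocessing import Pool


def process_line(row):
    # Stand-in for the real API call (assumption).
    time.sleep(0.05)
    return row["ORDER_NUMBER"] + "-done"


def run_async(rows, processes=4, pool_factory=Pool):
    pool = pool_factory(processes)
    # Submit every row without blocking. Each call returns an AsyncResult
    # for a one-element batch.
    pending = [pool.map_async(process_line, (row,)) for row in rows]
    pool.close()  # no more tasks will be submitted; must precede join()
    # get() blocks until that batch is finished and re-raises any exception
    # from the worker. Each batch result is a one-element list.
    results = [p.get()[0] for p in pending]
    pool.join()  # wait for the worker processes to exit
    return results
```

Calling get() inside the submission loop would wait for each row before queuing the next and bring back the sequential behaviour described in the question.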
Tags: python python-3.x python-multiprocessing