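【Posted】: 2021-03-08 13:43:32
【Question】:
Environment: Python 3.6.8 on CentOS 8 and Windows 7. I am trying to speed up my code with multiprocessing, and I am trying to figure out when to use ThreadPoolExecutor, when to use ProcessPoolExecutor, and what the difference between them is.
The following sample code works correctly: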
# source code here: https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example
from concurrent import futures
import urllib.request

URLS = ["http://www.foxnews.com/",
        "http://www.cnn.com/",
        "http://europe.wsj.com/",
        "http://www.bbc.co.uk/",
        "http://some-made-up-domain.com/"]

def load_url(url, timeout):
    return urllib.request.urlopen(url, timeout=timeout).read()

def main1():
    with futures.ThreadPoolExecutor(max_workers=5) as executor:
        future_to_url = dict((executor.submit(load_url, url, 60), url) for url in URLS)
        for future in futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                print("%r page is %d bytes" % (url, len(future.result())))
            except Exception as e:
                print("%r generated an exception: %s" % (url, e))

if __name__ == "__main__":
    main1()
Output 1:
'http://some-made-up-domain.com/' page is 64668 bytes
'http://europe.wsj.com/' generated an exception: HTTP Error 403: Forbidden
'http://www.cnn.com/' page is 1146005 bytes
'http://www.bbc.co.uk/' page is 308991 bytes
'http://www.foxnews.com/' page is 328413 bytes
But when I replace ThreadPoolExecutor with ProcessPoolExecutor, the following sample code fails:
from concurrent import futures
import urllib.request

URLS = ["http://www.foxnews.com/",
        "http://www.cnn.com/",
        "http://europe.wsj.com/",
        "http://www.bbc.co.uk/",
        "http://some-made-up-domain.com/"]

def load_url(url, timeout):
    return urllib.request.urlopen(url, timeout=timeout).read()

def main2():
    with futures.ProcessPoolExecutor(max_workers=5) as executor:
        future_to_url = dict((executor.submit(load_url, url, 60), url) for url in URLS)
        for future in futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                print("%r page is %d bytes" % (url, len(future.result())))
            except Exception as e:
                print("%r generated an exception: %s" % (url, e))

if __name__ == "__main__":
    main2()
Output 2:
'http://some-made-up-domain.com/' page is 64668 bytes
Process Process-4:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "process_pool_executor_so_2.py", line 12, in load_url
    return urllib.request.urlopen(url, timeout=timeout).read()
  File "/usr/lib64/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib64/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python3.6/urllib/request.py", line 564, in error
    result = self._call_chain(*args)
  File "/usr/lib64/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.6/urllib/request.py", line 756, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib64/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib64/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python3.6/urllib/request.py", line 564, in error
    result = self._call_chain(*args)
  File "/usr/lib64/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.6/urllib/request.py", line 756, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib64/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib64/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib64/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.6/concurrent/futures/process.py", line 178, in _process_worker
    result_queue.put(_ResultItem(call_item.work_id, exception=exc))
  File "/usr/lib64/python3.6/multiprocessing/queues.py", line 341, in put
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object
'http://www.foxnews.com/' generated an exception: A process in the process pool was terminated abruptly while the future was running or pending.
'http://www.cnn.com/' generated an exception: A process in the process pool was terminated abruptly while the future was running or pending.
'http://europe.wsj.com/' generated an exception: A process in the process pool was terminated abruptly while the future was running or pending.
'http://www.bbc.co.uk/' generated an exception: A process in the process pool was terminated abruptly while the future was running or pending.
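Why does ThreadPoolExecutor handle the exception gracefully (if one thread fails, the other threads still return good values), whereas with ProcessPoolExecutor, if one process fails, all the other futures fail with the error "A process in the process pool was terminated abruptly while the future was running or pending."?
How can I stop the other processes from being terminated when one of them crashes?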
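Reading the second traceback in Output 2, the worker does not seem to die from the HTTP 403 itself. It dies while pickling the exception to send it back to the parent: `result_queue.put(...)` fails with `TypeError: cannot serialize '_io.BufferedReader' object`. Threads share memory, so ThreadPoolExecutor never has to pickle a result or an exception. The sketch below reproduces only that pickling step, with no pool and no network. `ErrorWithHandle` and `can_cross_process_boundary` are hypothetical names, and `ErrorWithHandle` is merely a stand-in for an exception that carries an open stream, not a real library class.

```python
import io
import pickle


class ErrorWithHandle(Exception):
    """Hypothetical exception that keeps a reference to an open binary stream."""

    def __init__(self, message, fp):
        super().__init__(message)
        self.fp = fp  # buffered readers cannot be pickled


def can_cross_process_boundary(exc):
    """Return True if exc survives the pickling a process pool applies to results."""
    try:
        pickle.dumps(exc)
        return True
    except Exception:
        return False


stream = io.BufferedReader(io.BytesIO(b"response body"))
unpicklable = ErrorWithHandle("HTTP Error 403: Forbidden", stream)
# A plain exception that carries only the message text.
picklable = RuntimeError(str(unpicklable))

print(can_cross_process_boundary(unpicklable))
print(can_cross_process_boundary(picklable))
```

The first check fails and the second succeeds, which is consistent with the idea that the attached stream, rather than the HTTP error, is what the process pool cannot transport.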
【Comments】:
-
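Unfortunately, I could not reproduce the problem with 3.8.6 on Ubuntu, but if you want to speed up code that makes network calls, you should definitely choose ThreadPoolExecutor, because it is the better choice for I/O-bound tasks, whereas ProcessPoolExecutor performs better on CPU-bound tasks. More on the topic: stackoverflow.com/questions/868568/…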
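As a rough illustration of the I/O-bound case, the sketch below uses `time.sleep` as a stand-in for a network call, an assumption made so that it runs offline. `fake_download` and `run_threaded` are hypothetical names. A blocking wait releases the GIL, much as waiting on a socket does, so the five waits overlap in a thread pool.

```python
import time
from concurrent import futures


def fake_download(name, delay=0.2):
    """Hypothetical stand-in for a network request: waits, then returns."""
    time.sleep(delay)  # a blocking wait releases the GIL, like socket I/O
    return "%s done" % name


def run_threaded(names):
    """Run fake_download for every name in a thread pool.

    Returns the results (in input order) and the elapsed wall-clock seconds.
    """
    start = time.perf_counter()
    with futures.ThreadPoolExecutor(max_workers=len(names)) as executor:
        results = list(executor.map(fake_download, names))  # map keeps input order
    return results, time.perf_counter() - start


results, elapsed = run_threaded(["a", "b", "c", "d", "e"])
print(results)
print("elapsed: %.2fs (five sequential waits would take about 1.00s)" % elapsed)
```

For CPU-bound functions the threads would instead contend for the GIL, which is the situation where ProcessPoolExecutor tends to pay off.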
-
Thanks, I know that, but how do I fix the error?
-
Very interesting, your comment helped me find the problem. With Python 3.7.7 on Windows 7, the ProcessPoolExecutor code works fine. Result:
'http://some-made-up-domain.com/' page is 64668 bytes
'http://europe.wsj.com/' generated an exception: cannot serialize '_io.BufferedReader' object
'http://www.bbc.co.uk/' page is 308803 bytes
'http://www.cnn.com/' page is 1146005 bytes
'http://www.foxnews.com/' page is 332240 bytes
-
It seems that "europe.wsj.com" is what raises the exception. What if you move the for future in futures.as_completed(future_to_url) block outside the scope of the context manager?
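The restructuring suggested here could look like the sketch below. Leaving the `with` block calls `executor.shutdown(wait=True)`, so every future is already finished when the loop starts. The thread does not establish whether this changes the behaviour on 3.6. The sketch only shows the shape of the change, using a hypothetical `work` function and a thread pool so that it runs offline.

```python
from concurrent import futures


def work(n):
    """Hypothetical task: fails for one input so both outcomes are visible."""
    if n == 2:
        raise ValueError("bad input %d" % n)
    return n * 10


def run(executor_cls, items):
    """Submit everything inside the with-block, collect results after it exits."""
    with executor_cls(max_workers=3) as executor:
        future_to_item = {executor.submit(work, item): item for item in items}
    # The pool has been shut down at this point, so all futures are done.
    outcome = {}
    for future in futures.as_completed(future_to_item):
        item = future_to_item[future]
        try:
            outcome[item] = future.result()
        except Exception as e:
            outcome[item] = "generated an exception: %s" % e
    return outcome


print(run(futures.ThreadPoolExecutor, [1, 2, 3]))
```

Passing the executor class as a parameter is only a convenience for trying the same loop with either pool type.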
-
See my answer for a possible workaround for 3.6.x.
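The answer referred to is not reproduced in this thread. Independently of it, one workaround that seems to follow from the traceback is to catch exceptions inside the worker and re-raise one that carries only text, so that sending it back to the parent cannot fail. This is an assumption, not the commenter's method. `WorkerError`, `call_safely` and `risky` are hypothetical names, and `risky` stands in for `load_url` so the sketch runs without network access.

```python
import io
from concurrent import futures


class WorkerError(Exception):
    """Hypothetical text-only exception, safe to pickle back to the parent."""


def call_safely(fn, *args):
    """Run fn(*args) in the worker and turn any failure into a WorkerError."""
    try:
        return fn(*args)
    except Exception as exc:
        # Keep only the type name and message; drop attributes such as open streams.
        raise WorkerError("%s: %s" % (type(exc).__name__, exc)) from None


def risky(n):
    """Stand-in for load_url: fails for n == 3 with an unpicklable attribute attached."""
    if n == 3:
        err = ValueError("bad input %d" % n)
        err.fp = io.BufferedReader(io.BytesIO(b""))  # mimics an attached response stream
        raise err
    return n * n


def main():
    with futures.ProcessPoolExecutor(max_workers=2) as executor:
        future_to_n = {executor.submit(call_safely, risky, n): n for n in range(5)}
        for future in futures.as_completed(future_to_n):
            n = future_to_n[future]
            try:
                print("%d -> %d" % (n, future.result()))
            except Exception as e:
                print("%d generated an exception: %s" % (n, e))


if __name__ == "__main__":
    main()
```

With the original code this would mean submitting `call_safely, load_url, url, 60` instead of `load_url, url, 60`. The cost is that the parent no longer receives the original exception object, only its type name and message.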
Tags: python multithreading exception process