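【Posted】: 2017-08-22 17:40:07
【Question】:
I have two Python scripts that should essentially do the same thing: grab a large object into memory, then fork a bunch of children. The first script uses bare os.fork: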
import time
import signal
import os
import gc

gc.set_debug(gc.DEBUG_STATS)


class GracefulExit(Exception):
    pass


def child(i):
    def exit(sig, frame):
        raise GracefulExit("{} out".format(i))

    signal.signal(signal.SIGTERM, exit)
    while True:
        time.sleep(1)


if __name__ == '__main__':
    workers = []

    d = {}
    for i in xrange(30000000):
        d[i] = i

    for i in range(5):
        pid = os.fork()
        if pid == 0:
            child(i)
        else:
            print pid
            workers.append(pid)

    while True:
        wpid, status = os.waitpid(-1, os.WNOHANG)
        if wpid:
            print wpid, status
        time.sleep(1)
The second script uses the multiprocessing module. I am running both on Linux (Ubuntu 14.04), so it should also use os.fork under the hood, as the documentation states:
import multiprocessing
import time
import signal
import gc

gc.set_debug(gc.DEBUG_STATS)


class GracefulExit(Exception):
    pass


def child(i):
    def exit(sig, frame):
        raise GracefulExit("{} out".format(i))

    signal.signal(signal.SIGTERM, exit)
    while True:
        time.sleep(1)


if __name__ == '__main__':
    workers = []

    d = {}
    for i in xrange(30000000):
        d[i] = i

    for i in range(5):
        p = multiprocessing.Process(target=child, args=(i,))
        p.start()
        print p.pid
        workers.append(p)

    while True:
        for worker in workers:
            if not worker.is_alive():
                worker.join()
        time.sleep(1)
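The difference between the two scripts is the following: when I kill a child (by sending SIGTERM), the bare-fork script tries to garbage-collect the shared dictionary, even though the dictionary is still referenced by the parent process and has not actually been copied into the child's memory (because of copy-on-write):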
kill <pid>
Traceback (most recent call last):
  File "test_mp_fork.py", line 33, in <module>
    child(i)
  File "test_mp_fork.py", line 19, in child
    time.sleep(1)
  File "test_mp_fork.py", line 15, in exit
    raise GracefulExit("{} out".format(i))
__main__.GracefulExit: 3 out
gc: collecting generation 2...
gc: objects in each generation: 521 3156 0
gc: done, 0.0024s elapsed.
(Output of perf record -e page-faults -g -p <pid>:)
+ 99,64% python python2.7 [.] PyInt_ClearFreeList
+ 0,15% python libc-2.19.so [.] vfprintf
+ 0,09% python python2.7 [.] 0x0000000000144e90
+ 0,06% python libc-2.19.so [.] strlen
+ 0,05% python python2.7 [.] PyArg_ParseTupleAndKeywords
+ 0,00% python python2.7 [.] PyEval_EvalFrameEx
+ 0,00% python python2.7 [.] Py_AddPendingCall
+ 0,00% python libpthread-2.19.so [.] sem_trywait
+ 0,00% python libpthread-2.19.so [.] __errno_location
The multiprocessing-based script, on the other hand, does no such thing:
kill <pid>
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "test_mp.py", line 19, in child
    time.sleep(1)
  File "test_mp.py", line 15, in exit
    raise GracefulExit("{} out".format(i))
GracefulExit: 2 out
(Output of perf record -e page-faults -g -p <pid>:)
+ 62,96% python python2.7 [.] 0x0000000000047a5b
+ 32,28% python python2.7 [.] PyString_Format
+ 2,65% python python2.7 [.] Py_BuildValue
+ 1,06% python python2.7 [.] PyEval_GetFrame
+ 0,53% python python2.7 [.] Py_AddPendingCall
+ 0,53% python libpthread-2.19.so [.] sem_trywait
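I can also force the same behaviour in the multiprocessing-based script by explicitly calling gc.collect() before raising GracefulExit. Curiously, the reverse does not hold: calling gc.disable(); gc.set_threshold(0) in the bare-fork script does not get rid of the PyInt_ClearFreeList calls.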
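For reference, the gc.collect() variant mentioned above could look roughly like the sketch below. The factory name make_sigterm_handler is hypothetical (the scripts define the handler inline inside child); the only change relative to the original handler is the explicit gc.collect() call.

```python
import gc
import signal


class GracefulExit(Exception):
    pass


def make_sigterm_handler(i):
    """Hypothetical factory: build a SIGTERM handler for worker number i."""
    def on_sigterm(sig, frame):
        # Explicit full collection before unwinding; this is the extra
        # call that is added to the multiprocessing-based script.
        gc.collect()
        raise GracefulExit("{} out".format(i))
    return on_sigterm
```

Inside child(i), it would be installed with signal.signal(signal.SIGTERM, make_sigterm_handler(i)).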
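On to the actual questions:
- Why does this happen? I can sort of understand why Python would want to free all allocated memory on process exit, ignoring the fact that the child process does not actually own it, but why doesn't the multiprocessing module do the same?
- I would like to get the second script's behaviour (i.e. not trying to free memory that was allocated by the parent process) with the bare-fork solution, mainly because I use a third-party process manager library that does not use multiprocessing. How could I do that?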
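One direction that might be worth testing for the second point: on Unix, multiprocessing leaves the forked child through os._exit() once the target returns or raises, which skips normal interpreter finalization. The sketch below does the same for a bare os.fork child. The helper names run_child and spawn are hypothetical, the exit codes are an arbitrary choice, and whether this actually removes the PyInt_ClearFreeList page faults in the scripts above has not been measured here.

```python
import os
import sys


class GracefulExit(Exception):
    pass


def run_child(target, *args):
    """Run target(*args), then leave the process via os._exit().

    os._exit() terminates immediately without running interpreter
    finalization, so this function never returns to the caller.
    """
    code = 0
    try:
        target(*args)
    except GracefulExit as exc:
        # Assumption: a graceful exit is reported and counts as success.
        sys.stderr.write("{}\n".format(exc))
    except BaseException:
        code = 1
    finally:
        # os._exit() does not flush stdio buffers, so flush by hand.
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(code)


def spawn(target, *args):
    """Hypothetical helper: fork, run target in the child, return its pid."""
    pid = os.fork()
    if pid == 0:
        run_child(target, *args)  # never returns in the child
    return pid
```

In the bare-fork script, the loop body would then become pid = spawn(child, i) in place of the explicit os.fork() branch. If the third-party process manager performs the fork itself, only the run_child part would apply, wrapped around whatever the library runs in the child.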
【Discussion】:
Tags: python unix garbage-collection multiprocessing fork