【问题标题】:multiprocess initializer and pickling多进程初始化器和酸洗
【发布时间】:2016-01-06 03:04:21
【问题描述】:

我一直在玩multiprocessing.Pool 并试图了解initializer 参数的工作原理。据我了解,每个进程都会调用初始化函数,所以我假设它的参数(即initargs)必须跨越进程边界进行腌制。我知道池的map 方法也使用酸洗作为它的参数,所以我假设任何作为初始化器参数的东西也应该作为映射参数。

但是,当我运行以下代码时,initialize 被调用得很好,但随后map 抛出了一个关于无法腌制模块的异常。 (使用当前模块作为参数没有什么特别之处;它只是想到的第一个不可腌制的对象。)有谁知道这种差异背后的原因是什么?

from __future__ import print_function
import multiprocessing
import sys


def get_pid():
    return multiprocessing.current_process().pid


def initialize(module):
    print('Got module {} in PID {}'.format(module, get_pid()))


def worker(module):
    print('Got module {} in PID {}'.format(module, get_pid()))


current_module = sys.modules[__name__]
work = [current_module]

print('Main process has PID {}'.format(get_pid()))
pool = multiprocessing.Pool(None, initialize, work)
pool.map(worker, work)

【问题讨论】:

    标签: python pickle python-multiprocessing


    【解决方案1】:

    初始化不需要酸洗,但map 调用需要。也许这会有所启发……(这里我使用multiprocess 而不是multiprocessing 来提供更好的酸洗和交互性)。

    >>> from __future__ import print_function
    >>> import multiprocess as multiprocessing
    >>> import sys
    >>> 
    >>> def get_pid():
    ...     return multiprocessing.current_process().pid
    ... 
    >>> 
    >>> def initialize(module):
    ...     print('Got module {} in PID {}'.format(module, get_pid()))
    ... 
    >>> 
    >>> def worker(module):
    ...     print('Got module {} in PID {}'.format(module, get_pid()))
    ... 
    >>> 
    >>> current_module = sys.modules[__name__]
    >>> work = [current_module]
    >>> 
    >>> print('Main process has PID {}'.format(get_pid()))
    Main process has PID 34866
    >>> pool = multiprocessing.dummy.Pool(None, initialize, work)
    Got module <module '__main__' (built-in)> in PID 34866
    Got module <module '__main__' (built-in)> in PID 34866
    Got module <module '__main__' (built-in)> in PID 34866
    Got module <module '__main__' (built-in)> in PID 34866
    Got module <module '__main__' (built-in)> in PID 34866
    Got module <module '__main__' (built-in)> in PID 34866
    Got module <module '__main__' (built-in)> in PID 34866
    Got module <module '__main__' (built-in)> in PID 34866
    >>> pool.map(worker, work)
    Got module <module '__main__' (built-in)> in PID 34866
    [None]
    

    酷。线程pool 工作......(因为它不需要腌制任何东西)。当我们使用序列化同时发送workerwork 时怎么样?

    >>> pool = multiprocessing.Pool(None, initialize, work)
    Got module <module '__main__' (built-in)> in PID 34875
    Got module <module '__main__' (built-in)> in PID 34876
    Got module <module '__main__' (built-in)> in PID 34877
    Got module <module '__main__' (built-in)> in PID 34878
    Got module <module '__main__' (built-in)> in PID 34879
    Got module <module '__main__' (built-in)> in PID 34880
    Got module <module '__main__' (built-in)> in PID 34881
    Got module <module '__main__' (built-in)> in PID 34882
    >>> pool.map(worker, work)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.4.dev0-py2.7-macosx-10.8-x86_64.egg/multiprocess/pool.py", line 251, in map
        return self.map_async(func, iterable, chunksize).get()
      File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.4.dev0-py2.7-macosx-10.8-x86_64.egg/multiprocess/pool.py", line 567, in get
        raise self._value
    NotImplementedError: pool objects cannot be passed between processes or pickled
    >>> 
    

    那么我们来看看酸洗work

    >>> import pickle
    >>> import sys            
    >>> pickle.dumps(sys.modules[__name__])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1374, in dumps
        Pickler(file, protocol).dump(obj)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
        self.save(obj)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 306, in save
        rv = reduce(self.proto)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
        raise TypeError, "can't pickle %s objects" % base.__name__
    TypeError: can't pickle module objects
    >>> 
    

    所以,你不能腌制一个模块......好吧,我们可以使用dill做得更好吗?

    >>> import dill
    >>> dill.detect.trace(True)
    >>> dill.pickles(work)
    M1: <module '__main__' (built-in)>
    F2: <function _import_module at 0x10c017cf8>
    # F2
    D2: <dict object at 0x10d9a8168>
    M2: <module 'dill' from '/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/__init__.pyc'>
    # M2
    F1: <function worker at 0x10c07fed8>
    F2: <function _create_function at 0x10c017488>
    # F2
    Co: <code object worker at 0x10b053cb0, file "<stdin>", line 1>
    F2: <function _unmarshal at 0x10c017320>
    # F2
    # Co
    D1: <dict object at 0x10af68168>
    # D1
    D2: <dict object at 0x10c0e4a28>
    # D2
    # F1
    M2: <module 'sys' (built-in)>
    # M2
    F1: <function initialize at 0x10c07fe60>
    Co: <code object initialize at 0x10b241f30, file "<stdin>", line 1>
    # Co
    D1: <dict object at 0x10af68168>
    # D1
    D2: <dict object at 0x10c0ea398>
    # D2
    # F1
    M2: <module 'pathos' from '/Users/mmckerns/lib/python2.7/site-packages/pathos-0.2a1.dev0-py2.7.egg/pathos/__init__.pyc'>
    # M2
    C2: __future__._Feature
    # C2
    D2: <dict object at 0x10b05b7f8>
    # D2
    M2: <module 'multiprocess' from '/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.4.dev0-py2.7-macosx-10.8-x86_64.egg/multiprocess/__init__.pyc'>
    # M2
    T4: <class 'pathos.threading.ThreadPool'>
    # T4
    D2: <dict object at 0x10c0ea5c8>
    # D2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/dill.py", line 1209, in pickles
        pik = copy(obj, **kwds)
      File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/dill.py", line 161, in copy
        return loads(dumps(obj, *args, **kwds))
      File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/dill.py", line 197, in dumps
        dump(obj, file, protocol, byref, fmode, recurse)#, strictio)
      File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/dill.py", line 190, in dump
        pik.dump(obj)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
        self.save(obj)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 600, in save_list
        self._batch_appends(iter(obj))
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 636, in _batch_appends
        save(tmp[0])
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/dill.py", line 1116, in save_module
        state=_main_dict)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 419, in save_reduce
        save(state)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
        f(self, obj) # Call unbound method with explicit self
      File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.5.dev0-py2.7.egg/dill/dill.py", line 768, in save_module_dict
        StockPickler.save_dict(pickler, obj)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
        self._batch_setitems(obj.iteritems())
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 681, in _batch_setitems
        save(v)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 306, in save
        rv = reduce(self.proto)
      File "/Users/mmckerns/lib/python2.7/site-packages/multiprocess-0.70.4.dev0-py2.7-macosx-10.8-x86_64.egg/multiprocess/pool.py", line 452, in __reduce__
        'pool objects cannot be passed between processes or pickled'
    NotImplementedError: pool objects cannot be passed between processes or pickled
    >>> 
    

    答案是是的——模块开始腌制,但是,由于模块中的内容而失败......所以它看起来适用于__main__ 中的所有内容,除非有实例pool 中的 __main__ —— 那么它将失败。

    所以,如果您的最后两行代码被替换为这一行,它将起作用:

    >>> multiprocessing.Pool(None, initialize, work).map(worker, work)
    Got module <module '__main__' (built-in)> in PID 34922
    Got module <module '__main__' (built-in)> in PID 34923
    Got module <module '__main__' (built-in)> in PID 34924
    Got module <module '__main__' (built-in)> in PID 34925
    Got module <module '__main__' (built-in)> in PID 34926
    Got module <module '__main__' (built-in)> in PID 34927
    Got module <module '__main__' (built-in)> in PID 34928
    Got module <module '__main__' (built-in)> in PID 34929
    Got module <module '__main__' (built-in)> in PID 34922
    [None]
    >>> 
    

    这是使用multiprocess,因为它在封面下使用dillpickle 仍然无法在此处腌制,因为 pickle 无法序列化模块。需要序列化,因为必须将对象发送到另一个进程上的另一个 python 实例。

    【讨论】:

    • 感谢您的详细回复,以及向我介绍multiprocess 模块:) initializer 参数是否不需要腌制,因为它可以直接访问原始内存(因为所有进程将通过fork) 产生?
    • 刚刚通过将start method 设置为spawn 来确认这一点,这确实需要初始化函数的参数是可腌制的。再次感谢!
    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 2020-12-13
    • 2014-09-15
    • 2011-11-22
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多