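【Question title】: Return Counter objects from a multiprocessing / map function
【Posted】: 2015-09-30 12:36:48
【Question】:

I have a Python script that launches the same function in multiple processes. Each call creates and fills 2 counters (c1 and c2). The c1 counters returned by all forked processes should be merged into one result, and likewise the c2 counters returned by the different forks.

My (pseudo) code looks like this: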

from collections import Counter
import multiprocessing

def countIt(cfg):
    c1 = Counter()
    c2 = Counter()
    # do some things and fill the counters by counting words in a text, like
    # c1 = Counter({'apple': 3, 'banana': 0})
    # c2 = Counter({'blue': 3, 'green': 0})

    return c1, c2

if __name__ == '__main__':
    cP1 = Counter()
    cP2 = Counter()
    cfg = "myConfig"
    p = multiprocessing.Pool(4)  # creating 4 forks
    c1, c2 = p.map(countIt, cfg)[:2]
    # 1.) This only works with [:2], which seems to be no good idea
    # 2.) at this point c1 and c2 are lists, not counters anymore,
    #     so the following will not work:
    cP1 + c1
    cP2 + c2

Following the example above, I need a result like this:

    cP1 = Counter({'apple': 25, 'banana': 247, 'orange': 24})
    cP2 = Counter({'red': 11, 'blue': 56, 'green': 3})

So my question is: how can I count things inside the forked processes so that each counter (all c1 and all c2) is aggregated in the parent process?

【Comments】:

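  • @mattm That doesn't work, because sum() doesn't return a Counter?! I get the following error: TypeError: unsupported operand type(s) for +: 'int' and 'Counter'
  • This line, at least, is a bug: c1, c2 = p.map(countIt,cfg)[:2]. See swenzel's answer for how to handle the results.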
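
On the sum() error quoted in the first comment: sum() starts from the integer 0 unless it is given a start value, so its first step is 0 + Counter, which fails. A minimal sketch, assuming the per-process counters have already been collected into a list (`partial_counts` is a made-up name and the data is invented for illustration):

```python
from collections import Counter

# Hypothetical per-process results, standing in for what the workers return.
partial_counts = [
    Counter({'apple': 3, 'banana': 1}),
    Counter({'apple': 2, 'orange': 4}),
]

# sum() starts from the integer 0 by default, so the first step is
# 0 + Counter(...), which raises the TypeError quoted in the comment.
try:
    sum(partial_counts)
except TypeError as exc:
    print("without a start value:", exc)

# Passing an empty Counter as the start value keeps every step
# Counter + Counter, so the result is a Counter again.
total = sum(partial_counts, Counter())
print(total)
```

Note that Counter addition drops entries whose count is zero or negative, so a key such as 'banana': 0 would not appear in the total.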

Tags: python fork counter python-multiprocessing python-collections


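【Solution 1】:

You need to "unpack" your results, for example with a for-each loop. What you receive is a list of tuples, where each tuple is a pair of counters: (c1, c2).
With your current solution you actually mix them up. Assigning [(c1a, c2a), (c1b, c2b)] to c1, c2 means that c1 contains (c1a, c2a) and c2 contains (c1b, c2b).

Try this: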
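
To make the mix-up concrete, here is a small sketch that needs no process pool. `results` is a hand-written stand-in for the list that p.map would return, and the counter contents are invented for illustration:

```python
from collections import Counter

# Hypothetical stand-in for the list returned by p.map(countIt, configs):
# one (c1, c2) tuple per input item.
results = [
    (Counter({'apple': 1}), Counter({'blue': 2})),   # from the first item
    (Counter({'apple': 4}), Counter({'green': 1})),  # from the second item
]

# Unpacking the list directly yields two result tuples, not two counters.
first, second = results[:2]
print(type(first).__name__)  # tuple

# zip(*results) transposes the pairs into "all c1" and "all c2".
all_c1, all_c2 = zip(*results)

cP1 = Counter()
for c in all_c1:
    cP1 += c

cP2 = Counter()
for c in all_c2:
    cP2 += c

print(cP1, cP2)
```

Looping over the pairs directly, as the answer does, avoids the intermediate tuples that zip(*results) builds.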

if __name__ == '__main__':
    from contextlib import closing

    cP1 = Counter()
    cP2 = Counter()

    # I hope you have an actual list of configs here; otherwise map will
    # call `countIt` with the single characters of the string 'myConfig'.
    cfg = "myConfig"

    # `contextlib.closing` makes sure the pool is closed after we're done.
    # In Python 3, Pool is itself a context manager, so you don't need to
    # wrap it in `closing` to use it in a `with` statement.
    # This approach, however, is compatible with both Python 2 and Python 3.
    with closing(multiprocessing.Pool(4)) as p:
        # Just counting, no need to order the results.
        # This might actually be a bit faster.
        for c1, c2 in p.imap_unordered(countIt, cfg):
            cP1 += c1
            cP2 += c2
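
For reference, the same idea as a self-contained sketch. This is an illustration under assumptions, not the asker's actual counting logic: the worker `count_words`, the helpers `merge_pairs` and `count_in_parallel`, the word sets and the sample texts are all made up for the example.

```python
import multiprocessing
from collections import Counter
from contextlib import closing

# Assumed word lists, for illustration only.
FRUITS = {'apple', 'banana', 'orange'}
COLOURS = {'red', 'blue', 'green'}


def count_words(text):
    """Worker: count fruit words and colour words in one text."""
    words = text.lower().split()
    c1 = Counter(w for w in words if w in FRUITS)
    c2 = Counter(w for w in words if w in COLOURS)
    return c1, c2


def merge_pairs(pairs):
    """Fold an iterable of (c1, c2) pairs into two running totals."""
    total1, total2 = Counter(), Counter()
    for c1, c2 in pairs:
        total1 += c1
        total2 += c2
    return total1, total2


def count_in_parallel(texts, processes=2):
    """Run count_words over texts in a pool and merge the results."""
    with closing(multiprocessing.Pool(processes)) as pool:
        # The iterator is fully consumed by merge_pairs before the pool closes.
        return merge_pairs(pool.imap_unordered(count_words, texts))


if __name__ == '__main__':
    # One list entry per unit of work, not a single string.
    texts = ["red apple green apple", "blue banana", "orange orange red"]
    fruits, colours = count_in_parallel(texts)
    print(fruits)
    print(colours)
```

Keeping the merge step in its own function means it can be checked without a pool, for example with merge_pairs(map(count_words, texts)).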

【Comments】:

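  • Not the OP, but thanks for improving the code with the closing context manager. I hadn't seen it before, probably because I haven't used mp in py3 yet.
  • Thanks, this works. Just a few minutes earlier I had found out that Python builds a list of all results from all forks, e.g. [(counter(),counter()), (counter(),counter()),....]. So your answer fits exactly. Thank you. Using closing is definitely new to me, but interesting! :-)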