Grouping Python tuple list: answers

【Question title】: Grouping Python tuple list
【Posted】: 2011-01-15 23:34:51
【Question】:

I have a list of (label, count) tuples like this:

[('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10), ('apple', 4), ('banana', 3)]

From that I want to sum all values with the same label (identical labels are always adjacent) and return a list in the same label order:

[('grape', 103), ('apple', 29), ('banana', 3)]

I know I could solve it with something like this:

def group(l):
    result = []
    if l:
        this_label = l[0][0]
        this_count = 0
        for label, count in l:
            if label != this_label:
                result.append((this_label, this_count))
                this_label = label
                this_count = 0
            this_count += count
        result.append((this_label, this_count))
    return result

But is there a more Pythonic, elegant, or efficient way to do this?

【Question comments】:

Tags: grouping python

【Solution 1】:

Using itertools and a list comprehension:

import itertools

[(key, sum(num for _, num in value))
    for key, value in itertools.groupby(l, lambda x: x[0])]

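Edit: as gnibbler pointed out, if l is not already sorted, replace it with sorted(l). Note that sorting returns the labels in sorted order rather than in their original order.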
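A small sketch of why the sorting caveat matters, using a made-up unsorted list named `unsorted` and a helper named `sum_adjacent` (neither is from the original post). groupby only merges adjacent items with equal keys, so a label that appears in two separate runs yields two groups. Sorting by label merges them, at the cost of changing the label order.

```python
import itertools
from operator import itemgetter

def sum_adjacent(pairs):
    """Sum counts for each run of adjacent equal labels."""
    return [(key, sum(count for _, count in group))
            for key, group in itertools.groupby(pairs, key=itemgetter(0))]

# Hypothetical input in which each label appears in two separate runs.
unsorted = [('grape', 100), ('apple', 15), ('grape', 3), ('apple', 10)]

# Without sorting, nothing is merged because no equal labels are adjacent.
print(sum_adjacent(unsorted))
# [('grape', 100), ('apple', 15), ('grape', 3), ('apple', 10)]

# Sorting by label first makes equal labels adjacent.
print(sum_adjacent(sorted(unsorted, key=itemgetter(0))))
# [('apple', 25), ('grape', 103)]
```

If the original label order has to be kept for unsorted input, a dict-based approach is probably a better fit than sorting.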

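【Comments】:

To use groupby, you must first make sure the sequence is already grouped (all 'grape' entries adjacent, and so on). One way to do that is to sort the sequence first.
@Thomas Wouters, yes, you are right (the question says "identical labels are always adjacent").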

【Solution 2】:

itertools.groupby can do what you want:

import itertools
import operator

L = [('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10),
     ('apple', 4), ('banana', 3)]

def accumulate(l):
    it = itertools.groupby(l, operator.itemgetter(0))
    for key, subiter in it:
        yield key, sum(item[1] for item in subiter)

print(list(accumulate(L)))
# [('grape', 103), ('apple', 29), ('banana', 3)]

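【Comments】:

I like the use of operator.itemgetter in place of a lambda.
This requires the list to be sorted by the first key. If it is not already sorted, ghostdog74's defaultdict approach is a better solution.
Why use operator instead of a lambda?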
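On the itemgetter question: a brief sketch, not a benchmark, suggesting that operator.itemgetter(0) and lambda x: x[0] hand the same keys to groupby, so the choice is largely a matter of style. The sample list `rows` and the variable names are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data, already grouped by label.
rows = [('grape', 100), ('grape', 3), ('apple', 15)]

by_lambda = [(k, sum(c for _, c in g)) for k, g in groupby(rows, lambda x: x[0])]
by_getter = [(k, sum(c for _, c in g)) for k, g in groupby(rows, itemgetter(0))]
print(by_lambda == by_getter)  # True

# itemgetter with several indices returns a tuple of those fields.
print(itemgetter(1, 0)(('grape', 100)))  # (100, 'grape')
```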

【Solution 3】:

import collections

d = collections.defaultdict(int)  # running total per label
a = []                            # labels in first-seen order
alist = [('grape', 100), ('banana', 3), ('apple', 10), ('apple', 4), ('grape', 3), ('apple', 15)]
for fruit, number in alist:
    if fruit not in a:
        a.append(fruit)
    d[fruit] += number
for f in a:
    print((f, d[f]))

Output

$ ./python.py
('grape', 103)
('banana', 3)
('apple', 29)

【Comments】:

This searches the list a for every item in alist, which makes your algorithm O(n^2) in the worst case. Not a good thing.
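One way to avoid the repeated list search, sketched under the assumption of Python 3.7 or later, where a plain dict preserves insertion order: accumulate directly into a dict, so each lookup is a hash lookup rather than a scan. The function name `sum_by_label` is made up here.

```python
def sum_by_label(pairs):
    """Sum counts per label, keeping labels in first-seen order."""
    totals = {}  # insertion-ordered in Python 3.7+ (assumption of this sketch)
    for label, count in pairs:
        totals[label] = totals.get(label, 0) + count
    return list(totals.items())

alist = [('grape', 100), ('banana', 3), ('apple', 10),
         ('apple', 4), ('grape', 3), ('apple', 15)]
print(sum_by_label(alist))
# [('grape', 103), ('banana', 3), ('apple', 29)]
```

Unlike the groupby answers, this does not need equal labels to be adjacent.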

【Solution 4】:

>>> from itertools import groupby
>>> from operator import itemgetter
>>> L=[('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10), ('apple', 4), ('banana', 3)]
>>> [(x,sum(map(itemgetter(1),y))) for x,y in groupby(L, itemgetter(0))]
[('grape', 103), ('apple', 29), ('banana', 3)]

【Comments】:

【Solution 5】:

Or a simpler, more readable answer (without itertools):

pairs = [('foo',1),('bar',2),('foo',2),('bar',3)]

def sum_pairs(pairs):
  sums = {}
  for pair in pairs:
    sums.setdefault(pair[0], 0)
    sums[pair[0]] += pair[1]
  return sums.items()

print(list(sum_pairs(pairs)))

【Comments】:

【Solution 6】:

My version without itertools:
[(k, sum([y for (x,y) in l if x == k])) for k in dict(l).keys()]

【Comments】:

【Solution 7】:

Method

def group_by(my_list):
    result = {}
    for k, v in my_list:
        result[k] = v if k not in result else result[k] + v
    return result

Usage

my_list = [
    ('grape', 100), ('grape', 3), ('apple', 15),
    ('apple', 10), ('apple', 4), ('banana', 3)
]

group_by(my_list) 

# Output: {'grape': 103, 'apple': 29, 'banana': 3}

You can convert the result to a list of tuples with list(group_by(my_list).items()).

【Comments】: