How do I use the Counter class with pandas groupby and apply

【Question title】: How do I use the Counter class with pandas groupby and apply
【Posted】: 2021-08-28 18:45:14
【Question】:

Given this DataFrame:

import pandas as pd
df = pd.DataFrame([[1,1],[2,2],[2,3],[2,3],[2,4]], columns = ['A','B'])
df
    A   B
0   1   1
1   2   2
2   2   3
3   2   3
4   2   4

I want to try aggregating the values in B into different kinds of collections, using groupby on column A and apply on column B.

If I collect B as a list, this works as expected:

df.groupby('A')['B'].apply(list).reset_index(name='list')
    A   list
0   1   [1]
1   2   [2, 3, 3, 4]

If I collect B as a set, this works as expected:

df.groupby('A')['B'].apply(set).reset_index(name='set')
    A   set
0   1   {1}
1   2   {2, 3, 4}

I (naively) expected the Counter class to work the same way:

from collections import Counter
Counter([2, 3, 3, 4])
Counter({2: 1, 3: 2, 4: 1})

But when I try to use Counter the same way I used set or list, it behaves rather unexpectedly:

df.groupby('A')['B'].apply(Counter).reset_index(name='counter')
    A   level_1 counter
0   1   1   1.0
1   1   2   NaN
2   1   3   NaN
3   1   4   NaN
4   2   1   NaN
5   2   2   1.0
6   2   3   2.0
7   2   4   1.0

I expected:

    A   counter
0   1   Counter({1: 1})
1   2   Counter({2: 1, 3: 2, 4: 1})

An interesting clue is this:

df.groupby('A')['B'].apply(type).reset_index(name='type')
    A   type
0   1   <class 'pandas.core.series.Series'>
1   2   <class 'pandas.core.series.Series'>

But this works as I expect:

Counter(pd.core.series.Series([2, 3, 3, 4]))
Counter({2: 1, 3: 2, 4: 1})

This does not work either:

def mycounter(series):
    return Counter(list(series))
mycounter
df.groupby('A')['B'].apply(mycounter).reset_index(name='type')
    A   level_1 type
0   1   1   1.0
1   1   2   NaN
2   1   3   NaN
3   1   4   NaN
4   2   1   NaN
5   2   2   1.0
6   2   3   2.0
7   2   4   1.0

I half suspect this is a bug in pandas?

(Added): I just tried this, and it works. So I don't know why apply fails where agg succeeds:

df.groupby('A')['B'].agg([Counter]).reset_index()
    A   Counter
0   1   {1: 1}
1   2   {2: 1, 3: 2, 4: 1}

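【Comments on the question】:

You want df.groupby('A')['B'].agg(Counter).reset_index(name='counter'). apply is an interesting function because it can produce both aggregated and non-aggregated results.
Essentially, a Series is almost a dictionary, and a Counter is a dictionary. When you call .agg(), it expects a single value in return, so it does not try to expand the result again. In any case, to do the same thing as Counter in pandas you can use value_counts().
Because Counter objects are dicts.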
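
The value_counts() suggestion in the comment above can be sketched as follows. This is only a minimal illustration using the question's DataFrame; the variable names counts, count_2_3 and wide are made up for the example.

```python
import pandas as pd

df = pd.DataFrame([[1, 1], [2, 2], [2, 3], [2, 3], [2, 4]], columns=['A', 'B'])

# Count how often each B value occurs within each A group.
# The result is a Series indexed by (A, B) pairs.
counts = df.groupby('A')['B'].value_counts()

# Look up one count: how often B == 3 appears in the group A == 2.
count_2_3 = counts.loc[(2, 3)]

# Optionally pivot to a wide table with one column per B value, 0 where absent.
wide = counts.unstack(fill_value=0)
```

Unlike a column of Counter objects, this keeps the counts in ordinary pandas structures, which may be easier to filter or join later.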

Tags: python pandas pandas-groupby apply

【Solution 1】:

See groupby agg:

df.groupby('A')['B'].agg(Counter).reset_index(name='counter')

   A             counter
0  1              {1: 1}
1  2  {2: 1, 3: 2, 4: 1}

apply is an interesting function because it can produce both aggregated and non-aggregated results.

Run:

df.groupby('A')['B'].apply(lambda x: {0: 1, 1: 2, 2: 3})

A   
1  0    1
   1    2
   2    3
2  0    1
   1    2
   2    3
Name: B, dtype: int64

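When a dict is returned from apply, its keys are interpreted as index labels of the result, rather than the dict being treated as a single aggregated value (as agg does).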
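
As a quick check, not a statement about pandas internals: Counter is a subclass of dict, so it is plausible that apply handles a returned Counter the same way as a plain dict. The names c, is_dict and items below are illustrative only.

```python
from collections import Counter

c = Counter([2, 3, 3, 4])

# Counter subclasses dict, so any isinstance(..., dict) check also matches it.
is_dict = isinstance(c, dict)

# Its keys are the distinct values and its values are their counts.
items = sorted(c.items())
```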

That is why the Counter results are interpreted as:

A   
1  1    1.0  # {1: 1} (index 1 value 1)
   2    NaN
   3    NaN
   4    NaN
2  1    NaN
   2    1.0  # {2: 1, 3: 2, 4: 1} (index 2 value 1)
   3    2.0  # (index 3 value 2)
   4    1.0  # (index 4 value 1)
Name: B, dtype: float64

agg, however, expects a single value to be returned, so the dict is treated as a single unit:

Run:

df.groupby('A')['B'].agg(lambda x: {0: 1, 1: 2, 2: 3})

A
1    {0: 1, 1: 2, 2: 3}
2    {0: 1, 1: 2, 2: 3}
Name: B, dtype: object
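
Applied to the question's DataFrame, the agg approach can be sketched like this. It is a minimal example; the names result, cell, top_value and top_count are made up for illustration. It also checks that the stored cell is still a Counter object even though pandas displays it like a plain dict.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame([[1, 1], [2, 2], [2, 3], [2, 3], [2, 4]], columns=['A', 'B'])

# agg treats each group's return value as one cell, so the Counter is stored whole.
result = df.groupby('A')['B'].agg(Counter).reset_index(name='counter')

# Pull out the cell for the group A == 2.
cell = result.loc[result['A'] == 2, 'counter'].iloc[0]

# Counter methods remain available on the stored object.
top_value, top_count = cell.most_common(1)[0]
```

Here most_common(1) picks the most frequent B value in the group, which would not be available if the cell had been converted to a plain dict.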

【Comments】: