Count the frequency of one column's values in a pandas DataFrame and tag each row with its occurrence number

【Question title】: Count freq of one column values in pandas dataframe and tag each row with its frequency occurrence number
【Posted】: 2019-09-06 08:25:33
【Question description】:

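I want to count the frequency of each element in a particular column of a pandas DataFrame, and then tag each row with its running occurrence number within that frequency.

Most of the common solutions only show how to count the frequency of each element in a column, as in: count the frequency that a value occurs in a dataframe column

I have some basic code, for example: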

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'g2g', 'g2g', 'g2g',
                         'bar', 'bar', 'foo', 'bar'],
                   'B': ['a', 'b', 'a', 'b', 'b', 'b', 'a', 'a', 'b']})

print(df)

which outputs:

     A  B
0  foo  a
1  bar  b
2  g2g  a
3  g2g  b
4  g2g  b
5  bar  b
6  bar  a
7  foo  a
8  bar  b

Going further, df['freq'] = df.groupby('B')['B'].transform('count') tags each row with the total count of its 'B' value and outputs:

     A  B  freq
0  foo  a     4
1  bar  b     5
2  g2g  a     4
3  g2g  b     5
4  g2g  b     5
5  bar  b     5
6  bar  a     4
7  foo  a     4
8  bar  b     5

However, after grouping by column 'B', what I want is the following:

     A  B  freq_occurance
0  foo  a               1
1  bar  b               1
2  g2g  a               2
3  g2g  b               2
4  g2g  b               3
5  bar  b               4
6  bar  a               3
7  foo  a               4
8  bar  b               5

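That is, if the value 'a' in column 'B' has a frequency of 4, then the first row where 'a' appears is tagged 1, the second row where 'a' appears is tagged 2, and so on. The same logic applies to every unique value in column 'B'.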

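【Question discussion】:

Try df.groupby('B')['B'].cumcount().add(1)

Tags: python-3.x pandas dataframe

【Solution 1】:

You can use transform and take the index (after reset_index) as the value, then add one (because the new index starts at 0).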

df['freq2'] = df.groupby('B')['B'].transform(lambda x: x.reset_index().index).add(1)

     A  B  freq  freq2
0  foo  a     4      1
1  bar  b     5      1
2  g2g  a     4      2
3  g2g  b     5      2
4  g2g  b     5      3
5  bar  b     5      4
6  bar  a     4      3
7  foo  a     4      4
8  bar  b     5      5
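
To see why the lambda in the transform call above yields a running position, it may help to look at what each group contains before and after reset_index. The following is only an illustrative sketch; the names original_labels and new_positions are made up for this example and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'B': ['a', 'b', 'a', 'b', 'b', 'b', 'a', 'a', 'b']})

original_labels = {}  # hypothetical name: row labels of each group in df
new_positions = {}    # hypothetical name: labels after reset_index()

# Iterating a SeriesGroupBy yields (group key, sub-Series) pairs.
for key, group in df.groupby('B')['B']:
    original_labels[key] = list(group.index)
    # reset_index() replaces the labels with 0, 1, 2, ... within the group
    new_positions[key] = list(group.reset_index().index)

print(original_labels)
print(new_positions)
```

Each group keeps its rows in their original order, so the fresh 0-based index is the position of the row within its group; adding 1 turns that into the 1-based occurrence number.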

【Discussion】:

【Solution 2】:

cumcount is what you need:

df['freq_occurance'] = df.groupby('B').cumcount() + 1
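
For completeness, here is a self-contained sketch that applies the cumcount line above to the DataFrame from the question, next to the total count from the question's transform('count') call. It assumes only the sample data shown in the question.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'g2g', 'g2g', 'g2g',
                         'bar', 'bar', 'foo', 'bar'],
                   'B': ['a', 'b', 'a', 'b', 'b', 'b', 'a', 'a', 'b']})

# Total number of rows sharing the same 'B' value (same on every row of a group)
df['freq'] = df.groupby('B')['B'].transform('count')

# cumcount numbers the rows of each group 0, 1, 2, ... in their original order;
# adding 1 makes the numbering start at 1
df['freq_occurance'] = df.groupby('B').cumcount() + 1

print(df)
```

With this data, the last row of each group has freq_occurance equal to freq, which is a quick way to check the result.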

【Discussion】: