Count values using groupby function and using apply function at the same time (answers)

【Question Title】: Count values using groupby function and using apply function at the same time
【Posted】: 2021-11-20 23:39:14
【Question】:

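I am trying to use the apply and groupby functions on a DataFrame to count the occurrences of the grouped values and write those values into a column. I have the following DataFrame: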

df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})

I have two statements that each give a correct result, but I need to combine the results. One is:

df_new = df.groupby("colA").count()

which gives

colA
name1    1
name2    3
name4    1
name5    2

The other is

df_new = df.groupby("colA")["colB"].apply(lambda lists: ','.join(lists)).reset_index(name='Color')

which gives

    colA                Color
0  name1                   red
1  name2  yellow,yellow,yellow
2  name4                 black
3  name5            green,blue

What I need is a combination that looks like this:

    colA                 Color  Count grouped A
0  name1                   red                1
1  name2  yellow,yellow,yellow                3
2  name4                 black                1
3  name5            green,blue                2

I have tried combining them in many ways, and of course researched it, but I can't get it to work.

【Discussion】:

Tags: python dataframe pandas-groupby pandas-apply

【Solution 1】:

You can join the first result onto the second as a new column, using colA so that each value lands in the right row.

df_new = df_2.join(df_1, on='colA')

You also need df_1.rename(columns={'colB': 'Count grouped A'}) so that the count column gets the name you want.

import pandas as pd

df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})

df_1 = df.groupby("colA").count().rename(columns={'colB': 'Count grouped A'})

df_2 = df.groupby("colA")["colB"].apply(lambda lists: ','.join(lists)).reset_index(name='Color')

df_new = df_2.join(df_1, on='colA')

print(df_new)
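DataFrame.join(other, on='colA') looks up the values of the caller's colA column in the index of other. That is why df_1 stays indexed by colA while df_2 gets reset_index(). A minimal sketch, reusing the question's data, that writes the same lookup with DataFrame.merge(left_on=..., right_index=True); the names via_join and via_merge are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'],
                   'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})

# Counts, indexed by colA (join(..., on=...) matches against this index)
df_1 = df.groupby("colA").count().rename(columns={'colB': 'Count grouped A'})
# Joined colors, with colA as an ordinary column
df_2 = df.groupby("colA")["colB"].apply(','.join).reset_index(name='Color')

# join: each value in df_2['colA'] is looked up in df_1's index
via_join = df_2.join(df_1, on='colA')

# merge: the same left-column-to-right-index lookup, spelled out explicitly
via_merge = df_2.merge(df_1, left_on='colA', right_index=True, how='left')

print(via_join.equals(via_merge))  # expected: True
```

Both forms keep every row of df_2 (a left join), so a colA value missing from df_1 would get NaN rather than being dropped.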

EDIT:

The same thing with small changes:

First create groups = df.groupby("colA") once, then use groups... twice.
Use .apply(','.join) instead of .apply(lambda lists: ','.join(lists)).

import pandas as pd

df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})

groups = df.groupby("colA")

df_1 = groups.count().rename(columns={'colB': 'Count grouped A'})
df_2 = groups["colB"].apply(','.join).reset_index(name='Color')

df_new = df_2.join(df_1, on='colA')

print(df_new)
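Because both results come from the same groups, they can also be produced in a single call with pandas named aggregation. This is a different technique from the join used above, sketched here assuming pandas 0.25 or newer (where named aggregation was introduced). The dictionary unpacking is only needed because Count grouped A contains spaces and so cannot be written as a plain keyword argument.

```python
import pandas as pd

df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'],
                   'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})

# Each keyword becomes an output column: name -> aggregation applied to colB per group
df_new = (
    df.groupby("colA")["colB"]
      .agg(**{'Color': ','.join, 'Count grouped A': 'count'})
      .reset_index()  # turn the colA index back into a column
)

print(df_new)
```

This avoids building two intermediate frames and aligning them afterwards.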

EDIT:

If you keep Color as a list, it may be simpler.

You can use .str.len() to count the elements in each list.

The name .str suggests string methods only, but some of them also work on a list (e.g. .str[1:4]) or even a dictionary (e.g. .str[key]).

import pandas as pd

df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})

df_new = df.groupby("colA")["colB"].apply(list).reset_index(name='Color')
df_new['Count grouped A'] = df_new['Color'].str.len()

print(df_new)
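A small sketch to check the .str behaviour described above on a Series of lists and on a Series of dicts. The list data mirrors the Color lists built in this answer; the dict data (keys color and n) is made up purely for illustration.

```python
import pandas as pd

# Series of lists, like the Color column when it is kept as a list
lists = pd.Series([['red'], ['yellow', 'yellow', 'yellow'], ['green', 'blue']])
print(lists.str.len().tolist())  # element count per list: [1, 3, 2]
print(lists.str[0:2].tolist())   # slice each list: [['red'], ['yellow', 'yellow'], ['green', 'blue']]

# Series of dicts (hypothetical data): .str[key] looks the key up in each dict
dicts = pd.Series([{'color': 'red', 'n': 1}, {'color': 'blue', 'n': 2}])
print(dicts.str['color'].tolist())  # ['red', 'blue']
```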
