【Question title】: Python Pandas calculate value_counts of two columns and use groupby
【Posted】: 2021-12-13 03:38:53
【Question】:

I have a dataframe:

import pandas as pd

data = {'label': ['cat','dog','dog','cat','cat'],
        'breeds': ['bengal','shar pei','pug','maine coon','maine coon'],
        'nicknames': [['Loki','Loki'],['Max'],['Toby','Zeus ','Toby'],['Marty'],['Erin ','Erin']],
        'eye color': [['blue','green'],['green'],['brown','brown','brown'],['blue'],['green','brown']]}
frame = pd.DataFrame(data)  # the frame referenced in the question below

Output:

    label  breeds      nicknames         eye color
0   cat    bengal      [Loki,Loki]       [blue, green]
1   dog    shar pei    [Max]             [green]
2   dog    pug         [Toby,Zeus,Toby]  [brown, brown, brown]
3   cat    maine coon  [Marty]           [blue]
4   cat    maine coon  [Erin,Erin]       [green, brown]

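I want to group frame by ['label', 'breeds'], compute the value_counts of the nicknames and eye colors (a count for each unique value), and write the results to separate columns: 'nickname_count' and 'eye_count'. The code below outputs only one column. How can I get the two outputs separately?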

 frame2 = frame.groupby(['label','breeds'])[['nicknames','eye color']].apply(lambda x: x.astype('str').value_counts().to_dict())

【Comments】:

    Tags: python pandas group-by count apply


    【Solution 1】:

    First, we use groupby with sum on the list columns, because sum concatenates the lists within each group:

    >>> df_grouped = df.groupby(['label', 'breeds']).agg({'nicknames': sum, 'eye color': sum}).reset_index()
    >>> df_grouped
        label   breeds      nicknames               eye color
    0   cat     bengal      [Loki, Loki]            [blue, green]
    1   cat     maine coon  [Marty, Erin , Erin]    [blue, green, brown]
    2   dog     pug         [Toby, Zeus , Toby]     [brown, brown, brown]
    3   dog     shar pei    [Max]                   [green]
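
To see why sum joins the lists, here is a minimal plain-Python sketch (no pandas involved; the variable names are illustrative and not part of the original answer). It uses the two 'maine coon' nickname cells from the question:

```python
from itertools import chain

# Two list cells that fall into the same (label, breeds) group in the question.
cells = [['Marty'], ['Erin ', 'Erin']]

# sum() combines items with +, and + on lists concatenates them. In plain
# Python the [] start value is required because the default start is 0.
joined = sum(cells, [])

# chain.from_iterable produces the same flattened list in a single pass.
flattened = list(chain.from_iterable(cells))

print(joined)     # ['Marty', 'Erin ', 'Erin']
print(flattened)  # ['Marty', 'Erin ', 'Erin']
```

Note that 'Erin ' (with a trailing space) and 'Erin' remain separate strings after concatenation.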
    

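    Then we can count the unique values in each list by converting the list to a set and taking its len, saving the output in two new columns to get the expected result: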

    >>> df_grouped['nickname_count'] = df_grouped['nicknames'].apply(lambda x: list(set(x))).str.len()
    >>> df_grouped['eye_count'] = df_grouped['eye color'].apply(lambda x: list(set(x))).str.len()
    >>> df_grouped
        label   breeds      nicknames               eye color               nickname_count  eye_count
    0   cat     bengal      [Loki, Loki]            [blue, green]           1               2
    1   cat     maine coon  [Marty, Erin , Erin]    [blue, green, brown]    3               3
    2   dog     pug         [Toby, Zeus , Toby]     [brown, brown, brown]   2               1
    3   dog     shar pei    [Max]                   [green]                 1               1
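
If a 'value': count mapping per cell is wanted rather than the number of distinct values, one possible approach is collections.Counter from the standard library. The sketch below is an assumption about the desired output format, not the answer's method; count_values is a hypothetical helper name, and the sample lists are copied from the grouped output above:

```python
from collections import Counter

def count_values(items):
    """Return a {'value': count} dict for one list cell (hypothetical helper)."""
    return dict(Counter(items))

# List cells as they appear in df_grouped after the groupby/sum step.
nicknames = ['Marty', 'Erin ', 'Erin']
eye_color = ['brown', 'brown', 'brown']

print(count_values(nicknames))  # {'Marty': 1, 'Erin ': 1, 'Erin': 1}
print(count_values(eye_color))  # {'brown': 3}

# Stripping whitespace first merges 'Erin ' and 'Erin' into one key.
print(count_values(name.strip() for name in nicknames))  # {'Marty': 1, 'Erin': 2}
```

With pandas, the helper could be applied per column, for example `df_grouped['nickname_count'] = df_grouped['nicknames'].apply(count_values)` and likewise `df_grouped['eye color'].apply(count_values)` for `eye_count`. The trailing spaces in the data also explain why the set-based count above reports 3 nicknames for 'maine coon'.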
    

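    【Discussion】:

    • But I want to group by label and breeds first, and then count
    • Indeed, I updated the answer to group by first and then count the number of elements in the lists. Does that answer your question?
    • Thanks for your answer, but that is not quite what I want. I would prefer 'word': count, which is why I want to use value_counts
    • Do you want to count the unique values?
    • Yes! For example eye_count: ('blue': 2) ... and then groupby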