【Question title】: Groupby aggregate and transpose in pandas
【Posted】: 2021-04-15 23:10:07
【Question】:

df=

Genre Song          Singer               Playlist           Album
Rock  Evil Walks      AC/DC                 Music            For Those About To Rock We Salute You
Rock  Snowballed      AC/DC                 Music            For Those About To Rock We Salute You
Rock  C.O.D           AC/DC                 Music            For Those About To Rock We Salute You         
Rock  Perfect         Alanis Morissette     Music            Jagged Little Pill
Rock  Forgiven        Alanis Morissette     Music            Jagged Little Pill
Metal Sad But True    Apocalyptica          Music            Plays Metallica By Four Cellos
Metal All For You     Black Label Society   Music            Alcohol Fueled Brewtality Live! [Disc 1]
Blues Layla           Eric Clapton          Music            The Cream Of Clapton
Blues Crossroads      Eric Clapton          Music            The Cream Of Clapton
.......
......
....
Latin Etnia           Chico Science         Music            Afrociberdelia

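Of all the genres in the Genre column, I only need to consider 'Rock', 'Latin', 'Metal', and 'Blues'. From these, I want to build a new DataFrame that meets the following requirements: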

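a. How many songs the singer has in each genre (each genre's count must be in its own column).

b. How many albums the singer has in the data.

c. How many tracks the singer has in the data.

d. How many playlists contain at least one of the singer's songs.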
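Since only four genres matter, one way to start is to drop every other genre with Series.isin before counting anything. A minimal sketch: the GENRES list is a hypothetical helper, and the 'Jazz' row is hypothetical data added only so that something gets filtered out. The question does not say whether parts b to d should also be limited to these four genres; filtering first assumes they should.

```python
import pandas as pd

# hypothetical helper: the four genres the question asks for
GENRES = ['Rock', 'Latin', 'Metal', 'Blues']

# two rows from the sample data plus one hypothetical non-matching row
df = pd.DataFrame({
    'Genre': ['Rock', 'Latin', 'Jazz'],
    'Song': ['Evil Walks', 'Etnia', 'Hypothetical Song'],
    'Singer': ['AC/DC', 'Chico Science', 'Hypothetical Singer'],
})

# keep only rows whose Genre is one of the four; .copy() gives an independent frame
df = df[df['Genre'].isin(GENRES)].copy()
print(df)
```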

Desired output:

Singer        Rock  Latin  Metal  Blues  CountofAlbums  CountofSongs  CountofPlaylists
AC/DC            5      7      8      2              4            22                 2
Metallica        8      0     22      0              6            30                 6
Iron Maiden     21      0     27     13             10            61                12

I plan to create one DataFrame for part a and another for parts b, c, and d, and then merge them.
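That plan can be done without any loop. A minimal sketch, assuming the column names of the sample df and that parts b to d are computed only over the four genres: pd.crosstab builds part a, a named-aggregation groupby builds parts b, c, and d, and because both results are indexed by Singer, DataFrame.join lines them up row by row. GENRES, genre_counts, totals, and result are hypothetical names chosen for illustration.

```python
import pandas as pd

# hypothetical helper: the four genres the question asks for
GENRES = ['Rock', 'Latin', 'Metal', 'Blues']

# sample rows taken from the question
df = pd.DataFrame({
    'Genre': ['Rock'] * 5 + ['Metal'] * 2 + ['Blues'] * 2 + ['Latin'],
    'Song': ['Evil Walks', 'Snowballed', 'C.O.D', 'Perfect', 'Forgiven',
             'Sad But True', 'All For You', 'Layla', 'Crossroads', 'Etnia'],
    'Singer': ['AC/DC'] * 3 + ['Alanis Morissette'] * 2
              + ['Apocalyptica', 'Black Label Society'] + ['Eric Clapton'] * 2 + ['Chico Science'],
    'Playlist': ['Music'] * 10,
    'Album': ['For Those About To Rock We Salute You'] * 3 + ['Jagged Little Pill'] * 2
             + ['Plays Metallica By Four Cellos', 'Alcohol Fueled Brewtality Live! [Disc 1]']
             + ['The Cream Of Clapton'] * 2 + ['Afrociberdelia'],
})
df = df[df['Genre'].isin(GENRES)]

# part a: one column per genre with the singer's song count (0 where absent),
# reindexed so the columns follow the order of GENRES
genre_counts = pd.crosstab(df['Singer'], df['Genre']).reindex(columns=GENRES, fill_value=0)

# parts b, c, d: one row per singer; albums and playlists repeat across rows,
# so they use nunique, while every row is one track, so songs use count
totals = df.groupby('Singer').agg(
    CountofAlbums=('Album', 'nunique'),
    CountofSongs=('Song', 'count'),
    CountofPlaylists=('Playlist', 'nunique'),
)

# both frames share the Singer index, so join aligns them
result = genre_counts.join(totals)
print(result)
```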

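For parts b, c, and d, I thought of looping over the singer names and using nunique to get the distinct counts, but I didn't realize that the loop would also return the column headers on every iteration.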

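import pandas as pd

rows = []
for singer in df['Singer'].unique():
    temp = df[df['Singer'] == singer]
    # nunique() returns a Series indexed by column name; to_frame(singer).T turns it into one row labelled by the singer
    rows.append(temp.nunique().to_frame(singer).T)
df2 = pd.concat(rows)  # stack the one-row frames instead of overwriting df2 on each pass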
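The per-singer loop can be replaced by a single groupby: each group becomes one row of the result, so the header appears only once. A minimal sketch on a few rows from the sample data; selecting the columns explicitly keeps the grouping key out of the output.

```python
import pandas as pd

# a few rows from the sample data
df = pd.DataFrame({
    'Song': ['Evil Walks', 'Snowballed', 'Perfect', 'Forgiven', 'Layla'],
    'Singer': ['AC/DC', 'AC/DC', 'Alanis Morissette', 'Alanis Morissette', 'Eric Clapton'],
    'Playlist': ['Music'] * 5,
    'Album': ['For Those About To Rock We Salute You'] * 2 + ['Jagged Little Pill'] * 2
             + ['The Cream Of Clapton'],
})

# one row per singer with the number of distinct values in each selected column
distinct = df.groupby('Singer')[['Song', 'Album', 'Playlist']].nunique()
print(distinct)
```

Note that nunique on Song counts distinct titles; if a singer can have two tracks with the same title, count (or size) is the safer choice for part c.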
    

For part a, I planned to group each singer's songs by genre, count them, and then transpose:

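import pandas as pd

rows = []
for singer in df['Singer'].unique():
    # count this singer's songs per genre; .T turns the genres into columns
    group = df[df['Singer'] == singer].groupby('Genre').agg(count=('Song', 'count'))
    rows.append(group.T.rename(index={'count': singer}))
newdf = pd.concat(rows).fillna(0).astype(int)  # genres a singer lacks become 0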
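The loop plus transpose can also be replaced by counting (Singer, Genre) pairs and pivoting Genre into columns with unstack, which plays the role of the transpose. A sketch on a few sample rows; GENRES is a hypothetical helper list, and reindex fixes the column order to the four requested genres and adds a zero column for any genre missing from the data.

```python
import pandas as pd

# hypothetical helper: the four genres the question asks for
GENRES = ['Rock', 'Latin', 'Metal', 'Blues']

# a few rows from the sample data
df = pd.DataFrame({
    'Genre': ['Rock', 'Rock', 'Metal', 'Blues', 'Blues', 'Latin'],
    'Song': ['Evil Walks', 'Perfect', 'Sad But True', 'Layla', 'Crossroads', 'Etnia'],
    'Singer': ['AC/DC', 'Alanis Morissette', 'Apocalyptica',
               'Eric Clapton', 'Eric Clapton', 'Chico Science'],
})

# count songs per (Singer, Genre), then move Genre into the columns;
# fill_value=0 covers singer/genre pairs that never occur
per_genre = (
    df.groupby(['Singer', 'Genre'])['Song']
      .count()
      .unstack(fill_value=0)
      .reindex(columns=GENRES, fill_value=0)  # fixed order; absent genres become 0
)
print(per_genre)
```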

Any help would be appreciated!

【Question discussion】:

    Tags: python python-3.x pandas dataframe pandas-groupby


    【Solution 1】:

    It can be done in a single line, but that gets unwieldy, so here it is split into a few steps:

    import pandas as pd

    # sample data rebuilt from the question
    df = pd.DataFrame({
        'Genre': ['Rock']*5 + ['Metal']*2 + ['Blues']*2 + ['Latin'],
        'Song': ['Evil Walks', 'Snowballed', 'C.O.D', 'Perfect', 'Forgiven', 'Sad But True',
                 'All For You', 'Layla', 'Crossroads', 'Etnia'],
        'Singer': ['AC/DC']*3 + ['Alanis Morissette']*2 + ['Apocalyptica'] + ['Black Label Society']
                  + ['Eric Clapton']*2 + ['Chico Science'],
        'Playlist': ['Music']*10,
        'Album': ['For Those About To Rock We Salute You']*3 + ['Jagged Little Pill']*2
                 + ['Plays Metallica By Four Cellos'] + ['Alcohol Fueled Brewtality Live! [Disc 1]']
                 + ['The Cream Of Clapton']*2 + ['Afrociberdelia'],
        })

    # c. number of songs per singer
    agg_df = df.groupby('Singer').agg({'Song': 'count'})
    # b. number of distinct albums per singer
    agg_df = agg_df.join(df[['Singer', 'Album']].drop_duplicates().groupby('Singer').count())
    # d. number of distinct playlists per singer
    agg_df = agg_df.join(df[['Singer', 'Playlist']].drop_duplicates().groupby('Singer').count())
    # a. one column per genre with the song count; size().unstack keeps single-level columns,
    #    so the join does not mix column levels
    agg_df = agg_df.join(df.groupby(['Singer', 'Genre']).size().unstack(fill_value=0))
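To match the column names and order of the desired output, agg_df can be renamed and reordered afterwards. A sketch on a small frame shaped like the answer's agg_df (Song, Album, Playlist, then the genre columns in sorted order); the values are the AC/DC and Eric Clapton counts from the sample data.

```python
import pandas as pd

# a small frame shaped like the answer's agg_df, indexed by Singer
agg_df = pd.DataFrame(
    {'Song': [3, 2], 'Album': [1, 1], 'Playlist': [1, 1],
     'Blues': [0, 2], 'Latin': [0, 0], 'Metal': [0, 0], 'Rock': [3, 0]},
    index=pd.Index(['AC/DC', 'Eric Clapton'], name='Singer'),
)

# rename the count columns, reorder to the desired layout,
# and turn the Singer index back into an ordinary column
out = (
    agg_df.rename(columns={'Song': 'CountofSongs', 'Album': 'CountofAlbums',
                           'Playlist': 'CountofPlaylists'})
          [['Rock', 'Latin', 'Metal', 'Blues', 'CountofAlbums', 'CountofSongs', 'CountofPlaylists']]
          .reset_index()
)
print(out)
```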
    

    【Discussion】:
