【Title】: pandas: Grouping by two columns and then sorting by the values of a third column
【Posted】: 2019-01-13 01:50:56
【Question】:

I have the following line:

genre_df.groupby(['release_year', 'genres']).vote_average.mean()

This gives me the following:

release_year  genres         
1960          Action             6.950000
              Adventure          7.150000
              Comedy             7.900000
              Drama              7.600000
              Fantasy            7.300000
              History            6.900000
              Horror             8.000000
              Romance            7.600000
              Science Fiction    7.300000
              Thriller           7.650000
              Western            7.000000
1961          Action             7.000000
              Adventure          6.800000
              Animation          6.600000
              Comedy             7.000000
              Crime              6.600000
              Drama              7.000000
              Family             6.600000
              History            6.700000
              Music              6.600000
              Romance            7.400000
              War                7.000000
...

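What I would like to see is the df grouped by release year and genre, but with the genres inside each year sorted by vote average, highest first.

That is: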

release_year  genres         
1960          Horror             8.000000
              Comedy             7.900000
              Thriller           7.650000
              Drama              7.600000
              Romance            7.600000
              Fantasy            7.300000
              Science Fiction    7.300000
              Adventure          7.150000
              Western            7.000000
              Action             6.950000
              History            6.900000

How can I do this?

【Comments】:

    Tags: python pandas sorting dataframe pandas-groupby


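    【Solution 1】:

    Solution for pandas 0.23.0+ (from that version on, sort_values can sort on index level names together with column names): first create a one-column DataFrame with to_frame, then call sort_values. Here df is assumed to be the Series returned by the groupby in the question.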

    df = df.to_frame().sort_values(['release_year','vote_average'], ascending=[True, False])
    print (df)
                                  vote_average
    release_year genres                       
    1960         Horror                   8.00
                 Comedy                   7.90
                 Thriller                 7.65
                 Drama                    7.60
                 Romance                  7.60
                 Fantasy                  7.30
                 Science Fiction          7.30
                 Adventure                7.15
                 Western                  7.00
                 Action                   6.95
                 History                  6.90
    1961         Romance                  7.40
                 Action                   7.00
                 Comedy                   7.00
                 Drama                    7.00
                 War                      7.00
                 Adventure                6.80
                 History                  6.70
                 Animation                6.60
                 Crime                    6.60
                 Family                   6.60
                 Music                    6.60
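
A minimal, self-contained sketch of the same approach, for readers who want to run it end to end. The toy rows below are made up and only stand in for the question's genre_df; the names means and result are illustrative.

```python
import pandas as pd

# Hypothetical toy data standing in for genre_df (values are invented).
genre_df = pd.DataFrame({
    "release_year": [1960, 1960, 1960, 1960, 1961, 1961, 1961],
    "genres": ["Action", "Action", "Horror", "Comedy", "Romance", "Action", "Music"],
    "vote_average": [6.9, 7.0, 8.0, 7.9, 7.4, 7.0, 6.6],
})

# Series with a (release_year, genres) MultiIndex, as in the question.
means = genre_df.groupby(["release_year", "genres"]).vote_average.mean()

# release_year is an index level and vote_average is a column;
# mixing the two in sort_values needs pandas 0.23.0 or later.
result = means.to_frame().sort_values(
    ["release_year", "vote_average"], ascending=[True, False]
)
print(result)
```

The years stay in ascending order while the genres inside each year are ordered by their mean vote, highest first.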
    

    For older versions of pandas, reset_index and set_index are needed:

    df = (df.reset_index()
           .sort_values(['release_year','vote_average'], ascending=[True, False])
           .set_index(['release_year','genres']))
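
The reset_index / set_index round trip can be sketched step by step as follows. The Series means is a hand-built stand-in for the groupby result, with invented values; it is an assumption for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the grouped Series from the question.
means = pd.Series(
    [6.95, 8.0, 7.9, 7.0, 7.4],
    index=pd.MultiIndex.from_tuples(
        [(1960, "Action"), (1960, "Horror"), (1960, "Comedy"),
         (1961, "Action"), (1961, "Romance")],
        names=["release_year", "genres"],
    ),
    name="vote_average",
)

# 1. Turn both index levels into ordinary columns.
flat = means.reset_index()
# 2. Sort on plain columns only, which also works before pandas 0.23.0.
flat = flat.sort_values(["release_year", "vote_average"], ascending=[True, False])
# 3. Restore the two-level index.
result = flat.set_index(["release_year", "genres"])
print(result)
```

Because the sort runs on ordinary columns, this variant does not depend on sort_values understanding index level names.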
    

    【Comments】:

      【Solution 2】:

      Try this:

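         # Assumes genre_df already holds the grouped result from the question.
         genre_df = genre_df.reset_index()
         # sort_values returns a new DataFrame, so keep the result; sorting by
         # release_year first keeps each year's rows together.
         genre_df = genre_df.sort_values(['release_year', 'vote_average'], ascending=[True, False])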
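
Two details matter when sorting a flattened frame like this: sort_values returns a new DataFrame rather than modifying the original, and sorting on vote_average alone mixes the years together. The sketch below illustrates both on an invented frame named flat, which is a hypothetical stand-in for the grouped and reset data.

```python
import pandas as pd

# Hypothetical stand-in for the grouped-and-reset frame (values invented).
flat = pd.DataFrame({
    "release_year": [1960, 1960, 1961, 1961],
    "genres": ["Action", "Horror", "Action", "Romance"],
    "vote_average": [6.95, 8.0, 7.0, 7.4],
})

# The return value is discarded here, so flat keeps its original order.
flat.sort_values(["vote_average"], ascending=False)
unchanged = flat["genres"].tolist()

# Sorting on vote_average alone interleaves rows from different years.
by_vote = flat.sort_values(["vote_average"], ascending=False)
years_by_vote = by_vote["release_year"].tolist()

# Sorting on release_year first keeps each year's rows together.
by_year_then_vote = flat.sort_values(
    ["release_year", "vote_average"], ascending=[True, False]
)
print(by_year_then_vote)
```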
      

      【Comments】:
