【Question title】:Pivoting without numerical aggregation / a numerical column
【Posted】:2023-04-08 18:57:01
【Question】:

I have a dataframe that looks like this:

import pandas as pd

d = {'Name': ['Sally', 'Sally', 'Sally', 'James', 'James', 'James'], 'Sports': ['Tennis', 'Track & field', 'Dance', 'Dance', 'MMA', 'Crosscountry']}
df = pd.DataFrame(data=d)
Name Sports
Sally Tennis
Sally Track & field
Sally Dance
James Dance
James MMA
James Crosscountry

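It seems that pandas' pivot_table only allows reshaping with numerical aggregation, but I want to reshape the data to wide format so that the strings end up in the "values":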

Name First_sport Second_sport Third_sport
Sally Tennis Track & field Dance
James Dance MMA Crosscountry

Is there a method in pandas that can help me do this? Thanks!

【Comments】:

    Tags: python pandas pivot-table reshape melt


    【Solution 1】:

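    You can do this with .pivot() (provided each index/column combination is unique) or with .pivot_table() by supplying an aggregation function that also works on strings, such as 'first'. Either way, first number each person's sports with groupby + cumcount so there is a column to pivot on.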

    >>> df['Sport_num'] = 'Sport ' + df.groupby('Name').cumcount().astype(str)
    >>> df
        Name         Sports Sport_num
    0  Sally         Tennis   Sport 0
    1  Sally  Track & field   Sport 1
    2  Sally          Dance   Sport 2
    3  James          Dance   Sport 0
    4  James            MMA   Sport 1
    5  James   Crosscountry   Sport 2
    >>> df.pivot(index='Name', values='Sports', columns='Sport_num')
    Sport_num Sport 0        Sport 1       Sport 2
    Name                                          
    James       Dance            MMA  Crosscountry
    Sally      Tennis  Track & field         Dance
    >>> df.pivot_table(index='Name', values='Sports', columns='Sport_num', aggfunc='first')
    Sport_num Sport 0        Sport 1       Sport 2
    Name                                          
    James       Dance            MMA  Crosscountry
    Sally      Tennis  Track & field         Dance
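
To illustrate the uniqueness condition mentioned above, here is a minimal sketch. The `dup` frame is hypothetical data (not from the question) in which Sally has two rows for the same `Sport_num`. Under that assumption, `.pivot()` raises a `ValueError`, while `.pivot_table()` with `aggfunc='first'` keeps the first value it sees.

```python
import pandas as pd

# Hypothetical data: Sally has two rows for the same (Name, Sport_num) pair.
dup = pd.DataFrame({
    "Name": ["Sally", "Sally", "James"],
    "Sport_num": ["Sport 0", "Sport 0", "Sport 0"],
    "Sports": ["Tennis", "Dance", "MMA"],
})

# .pivot() cannot reshape when an index/column pair occurs more than once.
try:
    dup.pivot(index="Name", columns="Sport_num", values="Sports")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# .pivot_table() resolves the duplicate with the aggregation function;
# 'first' keeps the first value in each group.
wide = dup.pivot_table(
    index="Name", columns="Sport_num", values="Sports", aggfunc="first"
)
```

With the question's data every pair is unique after the cumcount step, so both calls give the same result.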
    

    【Comments】:

      【Solution 2】:

      Another solution:

      print(
          df.groupby("Name")
          .agg(list)["Sports"]
          .apply(pd.Series)
          .rename(columns={0: "First", 1: "Second", 2: "Third"})
          .add_suffix("_sport")
          .reset_index()
      )
      

      Prints:

          Name First_sport   Second_sport   Third_sport
      0  James       Dance            MMA  Crosscountry
      1  Sally      Tennis  Track & field         Dance
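
A possible variation on the approach above: `.apply(pd.Series)` builds one Series per row, which may be slow on larger frames. The sketch below instead constructs the wide frame directly from the per-name lists. It assumes the same `df` as in the question; the variable names `lists` and `wide` are illustrative only.

```python
import pandas as pd

d = {'Name': ['Sally', 'Sally', 'Sally', 'James', 'James', 'James'],
     'Sports': ['Tennis', 'Track & field', 'Dance', 'Dance', 'MMA', 'Crosscountry']}
df = pd.DataFrame(data=d)

# Collect each person's sports into a list (one list per Name).
lists = df.groupby("Name")["Sports"].agg(list)

# Build the wide frame from the lists in one step, then rename the
# positional columns 0, 1, 2 to the names asked for in the question.
wide = (
    pd.DataFrame(lists.tolist(), index=lists.index)
    .rename(columns={0: "First", 1: "Second", 2: "Third"})
    .add_suffix("_sport")
    .reset_index()
)
```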
      

      【Comments】:

        【Solution 3】:

        We can also combine groupby cumcount with set_index + unstack:

        new_df = df.set_index(['Name', df.groupby('Name').cumcount()]).unstack()
        

        new_df:

               Sports                             
                    0              1             2
        Name                                      
        James   Dance            MMA  Crosscountry
        Sally  Tennis  Track & field         Dance
        

        We can do some additional cleanup by renaming and collapsing the MultiIndex:

        new_df = (
            df.set_index(['Name', df.groupby('Name').cumcount()])
                .unstack()
                .rename(columns={0: "First", 1: "Second", 2: "Third",
                                 'Sports': 'Sport'})
        )
        new_df.columns = new_df.columns.swaplevel().map('_'.join)
        new_df = new_df.reset_index()
        

        new_df:

            Name First_Sport   Second_Sport   Third_Sport
        0  James       Dance            MMA  Crosscountry
        1  Sally      Tennis  Track & field         Dance
        

        If you want a programmatic conversion from integers to ordinal words, you can use something like inflect:

        import inflect
        
        new_df = df.set_index([
            'Name', df.groupby('Name').cumcount().add(1)
        ]).unstack()
        # Collapse MultiIndex
        p = inflect.engine()
        new_df.columns = new_df.columns.map(
            # Convert to Ordinal Word and Column to singular noun
            lambda c: f'{p.number_to_words(p.ordinal(c[1])).capitalize()}_'
                      f'{p.singular_noun(c[0])}'
        )
        new_df = new_df.reset_index()
        

        new_df:

            Name First_Sport   Second_Sport   Third_Sport
        0  James       Dance            MMA  Crosscountry
        1  Sally      Tennis  Track & field         Dance
        

        【Comments】:
