【问题标题】:Make dataframe header as rows and row as header将数据框标题设为行,将行设为标题
【发布时间】:2021-09-28 15:44:22
【问题描述】:

我有一个数据框,我想将列标题转换为聚合行,并将列转换为标题:

Athlete  Variable    28.0   28.0  28.0  29.0   29.0   29.0

John     Minutes     23     32    43    21     43     35
John     Distance    23.2   32.7  54.2  24.2   47.8   32.1
John     Sprints     5      3     8     3      9      2
Bob      Minutes     25     23    43    31     38     23
Bob      Distance    32.4   35.6  43.7  21.8   26.8   12.5
Bob      Sprints     8      4     8     6      9      2

所需的数据框:

Athlete   week    Minutes   Distance   Sprints

John      28      98        110.1      16
John      29      99        104.1      14
Bob       28      91        111.7      20
Bob       29      92        61.1       17

【问题讨论】:

    标签: python pandas dataframe


    【解决方案1】:

    您正在寻找pivot_table,但您需要先将数据帧melt 转换为一维格式:

    >>> data = df.melt(id_vars=['Athlete', 'Variable'], var_name='week')
    >>> data
       Athlete  Variable  week  value
    0     John   Minutes  28.0   23.0
    1     John  Distance  28.0   23.2
    2     John   Sprints  28.0    5.0
    3      Bob   Minutes  28.0   25.0
    4      Bob  Distance  28.0   32.4
    5      Bob   Sprints  28.0    8.0
    6     John   Minutes  28.0   32.0
    7     John  Distance  28.0   32.7
    8     John   Sprints  28.0    3.0
    9      Bob   Minutes  28.0   23.0
    10     Bob  Distance  28.0   35.6
    11     Bob   Sprints  28.0    4.0
    12    John   Minutes  28.0   43.0
    13    John  Distance  28.0   54.2
    14    John   Sprints  28.0    8.0
    15     Bob   Minutes  28.0   43.0
    16     Bob  Distance  28.0   43.7
    17     Bob   Sprints  28.0    8.0
    18    John   Minutes  29.0   21.0
    19    John  Distance  29.0   24.2
    20    John   Sprints  29.0    3.0
    21     Bob   Minutes  29.0   31.0
    22     Bob  Distance  29.0   21.8
    23     Bob   Sprints  29.0    6.0
    24    John   Minutes  29.0   43.0
    25    John  Distance  29.0   47.8
    26    John   Sprints  29.0    9.0
    27     Bob   Minutes  29.0   38.0
    28     Bob  Distance  29.0   26.8
    29     Bob   Sprints  29.0    9.0
    30    John   Minutes  29.0   35.0
    31    John  Distance  29.0   32.1
    32    John   Sprints  29.0    2.0
    33     Bob   Minutes  29.0   23.0
    34     Bob  Distance  29.0   12.5
    35     Bob   Sprints  29.0    2.0
    

    所以使用pivot_table,你必须指定聚合函数,这里我们使用sum,默认为mean

    >>> data.pivot_table(index=['Athlete', 'week'], columns='Variable', values='value', aggfunc='sum')
    Variable      Distance  Minutes  Sprints
    Athlete week                            
    Bob     28.0     111.7     91.0     20.0
            29.0      61.1     92.0     17.0
    John    28.0     110.1     98.0     16.0
            29.0     104.1     99.0     14.0
    

    【讨论】:

      【解决方案2】:

      先尝试groupby,然后再使用pivotstack

      out = df.groupby(level=0,axis=1).sum().pivot('Athlete','Variable').\
                rename_axis(['week',None],axis=1).stack(level=0).reset_index()
      Out[466]: 
        Athlete  week  Distance  Minutes  Sprints
      0     Bob  28.0     111.7     91.0     20.0
      1     Bob  29.0      61.1     92.0     17.0
      2    John  28.0     110.1     98.0     16.0
      3    John  29.0     104.1     99.0     14.0
      

      【讨论】:

      • 似乎对我不起作用。 AthleteVariable 已经在索引中,然后 pivot 失败,或者它们不是,然后 groupby 返回一个空数据框。
      • @Cimbali 你可以做 df = df.reset_index(),然后应用上面的方法
      • .reset_index() 必须介于.sum().pivot() 之间,或者您可以使用df.groupby(level=0,axis=1).sum().stack().unstack('Variable')
      猜你喜欢
      • 2019-10-14
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2022-07-26
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2012-09-18
      相关资源
      最近更新 更多