【Question title】:How to use Group by, Pivot_table, Stack & Unstack to reshape Pandas Dataframe
【Posted】:2018-04-24 13:20:05
【Question description】:

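I have a dataframe that looks like this:

I want to change it to:

I know this is easy to do with groupby and/or pivot_table and/or stack, but I can't seem to get past the basics. My notes don't show clearly how to do it. I thought I was close with pivot_table from the pandas documentation, but I couldn't get even one level working, let alone two, because I don't want to aggregate anything, and all my notes use aggregation...

Any suggestions are welcome.

Code to create the first dataframe: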

import pandas as pd

df2 = pd.DataFrame({'CPC_qtr_root': {13: 0.13493790567404607,
  14: 0.14353736611331172,
  15: 0.10359919568913414,
  16: 0.077153346715340618,
  17: 0.066759430932458397,
  39: 0.12067193385680651,
  40: 0.049033000970486448,
  41: 0.047640864406214359,
  42: 0.040086869604689483,
  43: 0.038795815932666726,
  100: 0.11017683494905577,
  101: 0.15510499735697988,
  102: 0.16478351543691827,
  103: 0.091894700285988867,
  104: 0.0359603120618152},
 'Country': {13: u'Afghanistan',
  14: u'Afghanistan',
  15: u'Afghanistan',
  16: u'Afghanistan',
  17: u'Afghanistan',
  39: u'Albania',
  40: u'Albania',
  41: u'Albania',
  42: u'Albania',
  43: u'Albania',
  100: u'Angola',
  101: u'Angola',
  102: u'Angola',
  103: u'Angola',
  104: u'Angola'},
 'IncomeLevel': {13: 'Lower Income',
  14: 'Lower Income',
  15: 'Lower Income',
  16: 'Lower Income',
  17: 'Lower Income',
  39: 'Upper Middle Income',
  40: 'Upper Middle Income',
  41: 'Upper Middle Income',
  42: 'Upper Middle Income',
  43: 'Upper Middle Income',
  100: 'Lower Middle Income',
  101: 'Lower Middle Income',
  102: 'Lower Middle Income',
  103: 'Lower Middle Income',
  104: 'Lower Middle Income'},
 'Rate': {13: 27.0,
  14: 37.0,
  15: 35.0,
  16: 39.0,
  17: 48.0,
  39: 95.0,
  40: 95.0,
  41: 96.0,
  42: 93.0,
  43: 96.0,
  100: 36.0,
  101: 65.0,
  102: 66.0,
  103: 52.0,
  104: 52.0},
 'Year': {13: 2000,
  14: 2001,
  15: 2002,
  16: 2003,
  17: 2004,
  39: 2000,
  40: 2001,
  41: 2002,
  42: 2003,
  43: 2004,
  100: 2000,
  101: 2001,
  102: 2002,
  103: 2003,
  104: 2004}})

【Question discussion】:

    Tags: python pandas dataframe pivot-table reshape


    【Solution 1】:

    Use set_index together with stack and unstack:

    df3 = df2.set_index(['Year','Country']).stack().unstack(1)
    print (df3)
    Country             Afghanistan              Albania               Angola
    Year                                                                     
    2000 CPC_qtr_root      0.134938             0.120672             0.110177
         IncomeLevel   Lower Income  Upper Middle Income  Lower Middle Income
         Rate                    27                   95                   36
    2001 CPC_qtr_root      0.143537             0.049033             0.155105
         IncomeLevel   Lower Income  Upper Middle Income  Lower Middle Income
         Rate                    37                   95                   65
    2002 CPC_qtr_root      0.103599            0.0476409             0.164784
         IncomeLevel   Lower Income  Upper Middle Income  Lower Middle Income
         Rate                    35                   96                   66
    2003 CPC_qtr_root     0.0771533            0.0400869            0.0918947
         IncomeLevel   Lower Income  Upper Middle Income  Lower Middle Income
         Rate                    39                   93                   52
    2004 CPC_qtr_root     0.0667594            0.0387958            0.0359603
         IncomeLevel   Lower Income  Upper Middle Income  Lower Middle Income
         Rate                    48                   96                   52
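
To make the three steps easier to follow, here is a minimal sketch on a tiny made-up frame. The name `toy` and its values are hypothetical and only stand in for `df2`; the calls are the same ones used above.

```python
import pandas as pd

# Hypothetical toy data (not the asker's df2): two countries, two years
toy = pd.DataFrame({
    "Year": [2000, 2001, 2000, 2001],
    "Country": ["A", "A", "B", "B"],
    "Rate": [27.0, 37.0, 95.0, 95.0],
    "IncomeLevel": ["Low", "Low", "High", "High"],
})

# Step 1: rows are now keyed by (Year, Country)
indexed = toy.set_index(["Year", "Country"])

# Step 2: the remaining column names become a third index level,
# giving one long Series of object dtype (floats and strings mixed)
long = indexed.stack()

# Step 3: move the Country level from the index into the columns
wide = long.unstack("Country")
print(wide)
```

No aggregation function is involved at any point, which seems to match what the question asks for.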
    

    The result holds mixed types:

    print (df3.head().applymap(type))
    Country                Afghanistan          Albania           Angola
    Year                                                                
    2000 CPC_qtr_root  <class 'float'>  <class 'float'>  <class 'float'>
         IncomeLevel     <class 'str'>    <class 'str'>    <class 'str'>
         Rate          <class 'float'>  <class 'float'>  <class 'float'>
    2001 CPC_qtr_root  <class 'float'>  <class 'float'>  <class 'float'>
         IncomeLevel     <class 'str'>    <class 'str'>    <class 'str'>
    

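    【Discussion】:

    • Is there a simple way to convert those strings that look like numbers back into numbers?
    • It seems to handle the values as mixed types, but Year is doing something strange. I'm using it as the x-axis in a plot, and instead of showing the years it converts them and adds some kind of exponent to the axis labels. Anyway, you gave me a great answer, thank you!
    【Solution 2】:

    You can first melt the dataframe from wide to long, using Year and Country as the IDs and IncomeLevel, CPC_qtr_root and Rate as the values: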
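
On the first comment, one possible approach is to pull out the rows of a single variable and pass them through `pd.to_numeric`. This is only a sketch: `mixed` is a hypothetical stand-in for the reshaped frame, and the inner index level is given the assumed name `variable` so it can be selected by name.

```python
import pandas as pd

# Hypothetical stand-in for the reshaped frame: object dtype, mixed values
idx = pd.MultiIndex.from_product(
    [[2000, 2001], ["IncomeLevel", "Rate"]],
    names=["Year", "variable"],
)
mixed = pd.DataFrame(
    {
        "Afghanistan": ["Lower Income", 27.0, "Lower Income", 37.0],
        "Albania": ["Upper Middle Income", 95.0, "Upper Middle Income", 95.0],
    },
    index=idx,
    dtype=object,
)

# Select one variable across all years, then convert each column
# from object dtype to a numeric dtype
rate = mixed.xs("Rate", level="variable").apply(pd.to_numeric)
print(rate.dtypes)
```

The resulting `rate` frame is indexed by Year only, with one numeric column per country.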

    df3 = pd.melt(df2, id_vars=['Year', 'Country'], value_vars=['IncomeLevel', 'CPC_qtr_root', 'Rate'])
    

    Then you can pivot your table:

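    import numpy as np

    # each (Year, variable, Country) combination occurs once in this data,
    # so the aggregation only ever sees one value per cell
    pd.pivot_table(df3, index = ['Year', 'variable'], 
                    columns = 'Country',
                    values = 'value',
                    aggfunc = np.sum,
                    fill_value = 0)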
    

    This returns:

    Country             Afghanistan              Albania               Angola
    Year variable                                                            
    2000 CPC_qtr_root      0.134938             0.120672             0.110177
         IncomeLevel   Lower Income  Upper Middle Income  Lower Middle Income
         Rate                    27                   95                   36
    2001 CPC_qtr_root      0.143537             0.049033             0.155105
         IncomeLevel   Lower Income  Upper Middle Income  Lower Middle Income
         Rate                    37                   95                   65
    2002 CPC_qtr_root      0.103599            0.0476409             0.164784
         IncomeLevel   Lower Income  Upper Middle Income  Lower Middle Income
         Rate                    35                   96                   66
    2003 CPC_qtr_root     0.0771533            0.0400869            0.0918947
         IncomeLevel   Lower Income  Upper Middle Income  Lower Middle Income
         Rate                    39                   93                   52
    2004 CPC_qtr_root     0.0667594            0.0387958            0.0359603
         IncomeLevel   Lower Income  Upper Middle Income  Lower Middle Income
         Rate                    48                   96                   52
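
Since the question asks for a reshape without aggregation, `DataFrame.pivot` may be a closer fit than `pivot_table`, because it takes no `aggfunc` at all. The sketch below is an alternative, not part of the original answer. It uses a made-up `toy` frame and assumes pandas 1.1 or later, where `pivot` accepts a list for `index`.

```python
import pandas as pd

# Hypothetical toy data (not the asker's df2)
toy = pd.DataFrame({
    "Year": [2000, 2001, 2000, 2001],
    "Country": ["A", "A", "B", "B"],
    "Rate": [27.0, 37.0, 95.0, 95.0],
    "IncomeLevel": ["Low", "Low", "High", "High"],
})

# Wide to long: one row per (Year, Country, variable)
long = toy.melt(id_vars=["Year", "Country"],
                value_vars=["IncomeLevel", "Rate"])

# Long to wide again, with countries as columns and no aggregation
wide = long.pivot(index=["Year", "variable"],
                  columns="Country",
                  values="value")
print(wide)
```

If any (Year, variable, Country) combination appeared more than once, `pivot` would raise a `ValueError` instead of silently combining the values, which can be a useful check when no aggregation is intended.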
    

    【Discussion】:
