【Question title】: groupby with multi level column index python
【Posted】: 2022-01-18 18:19:11
【Question】:

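I have a dataframe with a multi-level column index. I want a crosstab of the columns g1 and g2 (for each level-1 label, 1 and 2), grouped by the group column (a, b). I thought I could get away with just selecting the top-level column, but I'm a bit stuck. The dataframe I would like to end up with is d2 below. All comments welcome, many thanks.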

import pandas as pd

# the dataframe that I have
d1 = pd.DataFrame((['i1', 'a', 'dog', 'mouse', 'cat', 'mouse'],
                   ['i2', 'a', 'cat', 'mouse', 'dog', 'dog'],
                   ['i3', 'a', 'dog', 'dog', 'cat', 'dog'],
                   ['i4', 'b', 'cat', 'dog', 'dog', 'cat']),
                  columns=pd.MultiIndex.from_tuples(list(zip(*[['id', 'group', 'g1', 'g1', 'g2', 'g2'],
                                                               ['-', '-', '1', '2', '1', '2']]))))

# what I thought would work...
d1 = d1.set_index('id')
d1.groupby(['group'])['g1'].value_counts()
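Here is a different route, not taken from the original thread. Instead of calling value_counts on the whole 'g1' block, it counts each (g*, n) column separately within each group using SeriesGroupBy.value_counts, then places the per-column counts side by side. The sketch rebuilds d1 without the set_index('id') step. The names value_cols and counts are illustrative and do not come from the question.

```python
import pandas as pd

# Rebuild the question's wide frame (no set_index('id') step).
cols = pd.MultiIndex.from_tuples(
    [('id', '-'), ('group', '-'), ('g1', '1'), ('g1', '2'), ('g2', '1'), ('g2', '2')]
)
d1 = pd.DataFrame(
    [['i1', 'a', 'dog', 'mouse', 'cat', 'mouse'],
     ['i2', 'a', 'cat', 'mouse', 'dog', 'dog'],
     ['i3', 'a', 'dog', 'dog', 'cat', 'dog'],
     ['i4', 'b', 'cat', 'dog', 'dog', 'cat']],
    columns=cols,
)

groups = d1[('group', '-')]  # Series holding the group label ('a' / 'b') of each row
# Illustrative helper: every (g*, n) column, e.g. ('g1', '1')
value_cols = [c for c in d1.columns if c[0] in ('g1', 'g2')]

# Count the categories of each column within each group. Each result is a
# Series indexed by (group, category); the tuple keys become the columns.
counts = {col: d1[col].groupby(groups).value_counts() for col in value_cols}

# Align on the union of (group, category) pairs; a missing pair means zero.
d2 = pd.DataFrame(counts).fillna(0).astype(int).sort_index()
d2.index.names = ['group', 'category']
print(d2)
```

When the columns are aligned, any (group, category) pair that never occurs in a column comes out as NaN. That is why the counts pass through fillna(0) before being cast back to int.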


# the dataframe that I would like to have 
d2 = pd.DataFrame((['a', 'dog', 2,1,1,2],['a','mouse',0,2,0,1],['a','cat',1,0,2,0],['b','cat',1,0,0,1],['b','dog',0,1,1,0]), columns = pd.MultiIndex.from_tuples(list(zip(*[['group','category','g1','g1','g2','g2'], ['-','-','1','2','1','2']]))))

【Discussion】:

    Tags: python pandas dataframe group-by


    【Answer 1】:

    I'd suggest restructuring d1 first...

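    # d1 here is the original wide frame (without the d1.set_index('id') step):
    # move id/group into the row index, then stack both column levels into rows
    d1 = d1.set_index([('id','-'),('group','-')]).stack([0,1]).reset_index()
    # reset_index leaves generic column names; give every column a readable name
    d1.columns = ['id','group','level_1','level_2','category']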
    
        id group level_1 level_2 category
    0   i1     a      g1       1      dog
    1   i1     a      g1       2    mouse
    2   i1     a      g2       1      cat
    3   i1     a      g2       2    mouse
    4   i2     a      g1       1      cat
    5   i2     a      g1       2    mouse
    6   i2     a      g2       1      dog
    7   i2     a      g2       2      dog
    8   i3     a      g1       1      dog
    9   i3     a      g1       2      dog
    10  i3     a      g2       1      cat
    11  i3     a      g2       2      dog
    12  i4     b      g1       1      cat
    13  i4     b      g1       2      dog
    14  i4     b      g2       1      dog
    15  i4     b      g2       2      cat
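    The behaviour of DataFrame.stack has changed between pandas releases: pandas 2.1 added a new implementation behind the future_stack option. The sketch below therefore builds the same long table without stack, one (g*, n) column at a time. It assumes the original wide d1 (before any set_index), and value_cols and long are illustrative names.

```python
import pandas as pd

# The question's wide frame, rebuilt so the sketch runs on its own.
cols = pd.MultiIndex.from_tuples(
    [('id', '-'), ('group', '-'), ('g1', '1'), ('g1', '2'), ('g2', '1'), ('g2', '2')]
)
d1 = pd.DataFrame(
    [['i1', 'a', 'dog', 'mouse', 'cat', 'mouse'],
     ['i2', 'a', 'cat', 'mouse', 'dog', 'dog'],
     ['i3', 'a', 'dog', 'dog', 'cat', 'dog'],
     ['i4', 'b', 'cat', 'dog', 'dog', 'cat']],
    columns=cols,
)

# Illustrative helper: the (g*, n) columns that should become rows.
value_cols = [c for c in d1.columns if c[0] in ('g1', 'g2')]

# One small long-format frame per (g*, n) column; the scalar level labels are
# broadcast to every row. Concatenating them gives 16 rows, ordered column by
# column instead of id by id.
long = pd.concat(
    [
        pd.DataFrame({
            'id': d1[('id', '-')].to_numpy(),
            'group': d1[('group', '-')].to_numpy(),
            'level_1': top,
            'level_2': sub,
            'category': d1[(top, sub)].to_numpy(),
        })
        for top, sub in value_cols
    ],
    ignore_index=True,
)
print(long)
```

    Only the row order differs from the table above, which has no effect on the counts computed afterwards.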
    

    ...then use either pivot_table or groupby (both give the same result)...

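    # pivot_table: count the ids in each (group, category) x (level_1, level_2) cell
    d2 = (pd.pivot_table(d1, index=['group', 'category'], columns=['level_1', 'level_2'],
                         aggfunc='count', fill_value=0)
          .droplevel(0, axis=1)                # drop the 'id' level added by the aggregation
          .rename_axis([None, None], axis=1))  # clear the column level names
    
    # groupby: the same counts, built with count() + unstack()
    d2 = (d1.groupby(['group', 'category', 'level_1', 'level_2'])['id'].count()
            .unstack(['level_1', 'level_2'], fill_value=0)
            .rename_axis([None, None], axis=1).sort_index(axis=1))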
    
                   g1    g2   
                    1  2  1  2
    group category            
    a     cat       1  0  2  0
          dog       2  1  1  2
          mouse     0  2  0  1
    b     cat       1  0  0  1
          dog       0  1  1  0
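    A third option, not part of the original answer, is pd.crosstab, which builds the same frequency table directly from a long-format frame. In this sketch the long table comes from melt, after the column labels are flattened into strings such as 'g1_1'. The '_' separator and the names flat, long, parts and out are illustrative choices. The last step, reset_index(col_fill='-'), restores the question's layout, where group and category are ordinary columns labelled ('group', '-') and ('category', '-').

```python
import pandas as pd

# The question's wide frame, rebuilt so the sketch runs on its own.
cols = pd.MultiIndex.from_tuples(
    [('id', '-'), ('group', '-'), ('g1', '1'), ('g1', '2'), ('g2', '1'), ('g2', '2')]
)
d1 = pd.DataFrame(
    [['i1', 'a', 'dog', 'mouse', 'cat', 'mouse'],
     ['i2', 'a', 'cat', 'mouse', 'dog', 'dog'],
     ['i3', 'a', 'dog', 'dog', 'cat', 'dog'],
     ['i4', 'b', 'cat', 'dog', 'dog', 'cat']],
    columns=cols,
)

# Flatten the column labels: ('g1', '1') -> 'g1_1', ('id', '-') -> 'id'.
flat = d1.copy()
flat.columns = [top if sub == '-' else f'{top}_{sub}' for top, sub in d1.columns]

# Wide -> long with melt, then split 'g1_1' back into its two levels.
long = flat.melt(id_vars=['id', 'group'], var_name='col', value_name='category')
parts = long['col'].str.split('_', expand=True)
long['level_1'] = parts[0]
long['level_2'] = parts[1]

# Count rows per (group, category) x (level_1, level_2); crosstab fills
# combinations that never occur with 0.
d2 = pd.crosstab([long['group'], long['category']], [long['level_1'], long['level_2']])
d2 = d2.rename_axis([None, None], axis=1)  # drop the column level names
print(d2)

# Back to the question's layout: group/category as ordinary columns.
out = d2.reset_index(col_fill='-')
print(out)
```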
    

    【Discussion】:
