How to move pandas data from index to column after multiple groupby

【Question title】: How to move pandas data from index to column after multiple groupby
【Posted】: 2014-03-13 03:18:18
【Question】:

I have the following pandas DataFrame:

       token  year  uses  books
386  xanthos  1830     3      3
387  xanthos  1840     1      1
388  xanthos  1840     2      2
389  xanthos  1868     2      2
390  xanthos  1875     1      1

I aggregate the rows that share the same token and year like this:

dfalph = dfalph[['token','year','uses','books']].groupby(['token', 'year']).agg([np.sum])
dfalph.columns = dfalph.columns.droplevel(1)

               uses  books
token    year       
xanthos  1830    3     3
         1840    3     3
         1867    2     2
         1868    2     2
         1875    1     1

I don't want the 'token' and 'year' fields in the index. Instead, I would like to move them back into columns and have a plain integer index.

【Comments】:

Tags: python pandas pandas-groupby multi-index

【Solution 1】:

Method #1: reset_index()

>>> g
              uses  books
               sum    sum
token   year             
xanthos 1830     3      3
        1840     3      3
        1868     2      2
        1875     1      1

[4 rows x 2 columns]
>>> g = g.reset_index()
>>> g
     token  year  uses  books
                   sum    sum
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1

[4 rows x 4 columns]
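
The reset_index() output above still carries the inner column level (sum), because the aggregation was given a list of functions. Below is a minimal sketch of flattening the columns first and then moving the index levels back, using the same droplevel(1) step as the question. The frame name df and the hand-built sample rows are assumptions for illustration.

```python
import pandas as pd

# Rebuild the question's sample rows (an assumption for illustration).
df = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    },
    index=[386, 387, 388, 389, 390],
)

# Passing a list to agg() yields two-level columns: (uses, sum), (books, sum).
g = df.groupby(["token", "year"]).agg(["sum"])

# Drop the inner "sum" level so the column labels are plain strings again.
g.columns = g.columns.droplevel(1)

# Move token and year out of the index; a fresh integer index is created.
flat = g.reset_index()
print(flat)
```

Calling .sum() directly instead of .agg(["sum"]) avoids the extra column level in the first place, so the droplevel step is only needed when a list of aggregations is passed.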

Method #2: don't build the index in the first place; use as_index=False

>>> g = dfalph[['token', 'year', 'uses', 'books']].groupby(['token', 'year'], as_index=False).sum()
>>> g
     token  year  uses  books
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1

[4 rows x 4 columns]
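
With plain column names as grouping keys, the two methods are expected to produce the same columns and values. The following sketch puts them side by side on hand-built sample data; the names df, via_reset and via_flag are illustrative and not from the original answer.

```python
import pandas as pd

# Sample rows modelled on the question (an assumption for illustration).
df = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    }
)

# Method 1: group into a MultiIndex, then move the levels back to columns.
via_reset = df.groupby(["token", "year"]).sum().reset_index()

# Method 2: ask groupby not to build the index at all.
via_flag = df.groupby(["token", "year"], as_index=False).sum()

print(via_reset)
print(via_flag)
```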

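【Discussion】:

【Solution 2】:

I differ from the accepted answer. Although there are two ways to do this, they do not necessarily produce the same output, particularly when you use Grouper in groupby:

as_index=False
reset_index()

Example df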

+---------+---------+-------------+------------+
| column1 | column2 | column_date | column_sum |
+---------+---------+-------------+------------+
| A       | M       | 26-10-2018  |          2 |
| B       | M       | 28-10-2018  |          3 |
| A       | M       | 30-10-2018  |          6 |
| B       | M       | 01-11-2018  |          3 |
| C       | N       | 03-11-2018  |          4 |
+---------+---------+-------------+------------+

The two approaches behave differently.

df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ],
    as_index=False
).sum()

The above gives

+---------+---------+------------+
| column1 | column2 | column_sum |
+---------+---------+------------+
| A       | M       |          8 |
| B       | M       |          3 |
| B       | M       |          3 |
| C       | N       |          4 |
+---------+---------+------------+

whereas

df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ]
).sum().reset_index()

gives

+---------+---------+-------------+------------+
| column1 | column2 | column_date | column_sum |
+---------+---------+-------------+------------+
| A       | M       | 31-10-2018  |          8 |
| B       | M       | 31-10-2018  |          3 |
| B       | M       | 30-11-2018  |          3 |
| C       | N       | 30-11-2018  |          4 |
+---------+---------+-------------+------------+
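
The reset_index() variant can be reproduced with the sketch below, which rebuilds the example frame by hand. Two assumptions differ from the answer: the date strings are parsed explicitly with pd.to_datetime (Grouper with a freq needs a datetime column), and the frequency alias "MS" (month start) replaces "M", because the month-end alias has been renamed in recent pandas releases. The group labels are therefore the first day of each month rather than the last.

```python
import pandas as pd

# Hand-built copy of the example table (an assumption for illustration).
df = pd.DataFrame(
    {
        "column1": ["A", "B", "A", "B", "C"],
        "column2": ["M", "M", "M", "M", "N"],
        "column_date": [
            "26-10-2018", "28-10-2018", "30-10-2018", "01-11-2018", "03-11-2018",
        ],
        "column_sum": [2, 3, 6, 3, 4],
    }
)

# Grouper(freq=...) needs real datetimes, so parse the day-first strings.
df["column_date"] = pd.to_datetime(df["column_date"], format="%d-%m-%Y")

# Group by both key columns plus a monthly bucket of column_date, then move
# all three index levels back into columns.
out = (
    df.groupby(["column1", "column2", pd.Grouper(key="column_date", freq="MS")])
    .sum()
    .reset_index()
)
print(out)
```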

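【Discussion】:

【Solution 3】:

You need to add drop=True. Note that this discards the index levels instead of moving them into columns: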

df.reset_index(drop=True)

df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ]
).sum().reset_index(drop=True)
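
To make the effect of drop=True concrete, here is a small sketch on hand-built data; the names df, grouped, kept and dropped are illustrative. It contrasts the default reset_index(), which turns the grouping keys into columns, with reset_index(drop=True), which throws them away and keeps only the aggregated values.

```python
import pandas as pd

# Small sample frame (an assumption for illustration).
df = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    }
)

grouped = df.groupby(["token", "year"]).sum()

# Default: the index levels come back as ordinary columns.
kept = grouped.reset_index()

# drop=True: the index levels are discarded; only uses and books remain.
dropped = grouped.reset_index(drop=True)

print(kept.columns.tolist())
print(dropped.columns.tolist())
```

Since the question asks for token and year to end up as columns, drop=True only fits when the grouping keys are no longer needed.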

【Discussion】:

【Solution 4】:

If you have a MultiIndex and only want to reset specific index levels, you can use the level parameter of reset_index. For example:

import numpy as np
import pandas as pd
index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), ('two', 'a'), ('two', 'b')], names=['A', 'B'])
df = pd.DataFrame(np.arange(1.0, 5.0), index=index, columns=['C'])

        C
A   B     
one a  1.0
    b  2.0
two a  3.0
    b  4.0

Reset the first level:

df.reset_index(level=0)

Output:

     A    C
B          
a  one  1.0
b  one  2.0
a  two  3.0
b  two  4.0

Reset the second level:

df.reset_index(level=1)

Output:

     B    C
A          
one  a  1.0
one  b  2.0
two  a  3.0
two  b  4.0
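
The level parameter also accepts level names and lists of levels, which can read more clearly than positions. A minimal sketch, rebuilding the same two-level frame (the variable names by_name and both are illustrative):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("one", "a"), ("one", "b"), ("two", "a"), ("two", "b")],
    names=["A", "B"],
)
df = pd.DataFrame(np.arange(1.0, 5.0), index=index, columns=["C"])

# Reset a single level by name: B becomes a column, A stays as the index.
by_name = df.reset_index(level="B")

# Reset several levels at once: both become columns, leaving an integer index.
both = df.reset_index(level=["A", "B"])

print(by_name)
print(both)
```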

【Discussion】: