[Question title]: how to iterate a Series with multiindex in pandas
[Posted]: 2016-03-08 10:34:20
[Question description]:

I'm a beginner with pandas. I want to implement a decision tree algorithm with pandas. First, I read the test data into a pandas.DataFrame, as shown below:

In [4]: df = pd.read_csv('test.txt', sep = '\t')

In [5]: df
Out[5]:
  Chocolate Vanilla Strawberry Peanut
0         Y       N          Y      Y
1         N       Y          Y      N
2         N       N          N      N
3         Y       Y          Y      Y
4         Y       Y          N      Y
5         N       N          N      N
6         Y       Y          Y      Y
7         N       Y          N      N
8         Y       N          Y      N
9         Y       N          Y      Y
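test.txt itself is not part of the question, so here is a minimal sketch for reproducing the session without it: the ten rows shown above are fed to pd.read_csv through io.StringIO. The tab-separated layout with a header row is an assumption inferred from sep = '\t' and the printed frame, and TEST_TXT is a hypothetical name.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.txt: tab-separated, with a header row,
# containing the same ten rows as the DataFrame printed in the question.
TEST_TXT = """Chocolate\tVanilla\tStrawberry\tPeanut
Y\tN\tY\tY
N\tY\tY\tN
N\tN\tN\tN
Y\tY\tY\tY
Y\tY\tN\tY
N\tN\tN\tN
Y\tY\tY\tY
N\tY\tN\tN
Y\tN\tY\tN
Y\tN\tY\tY
"""

# StringIO makes the string behave like an open file for read_csv.
df = pd.read_csv(io.StringIO(TEST_TXT), sep="\t")
print(df)
```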

Then I group by 'Peanut' and 'Chocolate' and count the rows in each group, which gives:

In [15]: df2 = df.groupby(['Peanut', 'Chocolate'])

In [16]: serie1 = df2.size()

In [17]: serie1
Out[17]:
Peanut  Chocolate
N       N            4
        Y            1
Y       Y            5
dtype: int64

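Now serie1 is a Series. I can access its values (the counts), but I can't get the corresponding 'Peanut' and 'Chocolate' values. How can I get each count in serie1 together with its 'Peanut' and 'Chocolate' values?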
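One way to get each count together with its 'Peanut' and 'Chocolate' values, sketched under the assumption of a recent pandas and with the data rebuilt inline (only the two grouping columns) instead of read from test.txt: when the index is a MultiIndex, Series.items() yields each key as a tuple with one entry per level, so it can be unpacked directly.

```python
import pandas as pd

# Inline copy of the question's data (only the two grouping columns are needed).
df = pd.DataFrame({
    "Chocolate": list("YNNYYNYNYY"),
    "Peanut": list("YNNYYNYNNY"),
})
serie1 = df.groupby(["Peanut", "Chocolate"]).size()

# For a MultiIndex, each key yielded by items() is a tuple holding one value
# per index level, in the order of serie1.index.names.
for (peanut, chocolate), count in serie1.items():
    print("Peanut", peanut, "Chocolate", chocolate, "count", count)
```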

[Question discussion]:

Why not just reset the index? serie1.reset_index()?
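A sketch of the reset_index() suggestion, again using inline data as an assumption in place of test.txt: the index levels become ordinary columns, and the name argument labels the column that holds the counts ("count" here is an arbitrary, hypothetical choice). The rows can then be read with itertuples.

```python
import pandas as pd

# Inline copy of the question's data (only the two grouping columns are needed).
df = pd.DataFrame({
    "Chocolate": list("YNNYYNYNYY"),
    "Peanut": list("YNNYYNYNNY"),
})
serie1 = df.groupby(["Peanut", "Chocolate"]).size()

# reset_index() turns the index levels into ordinary columns; name= sets the
# column name for the counts (the name "count" is an arbitrary choice).
counts = serie1.reset_index(name="count")
print(counts)

# name=None makes itertuples yield plain tuples: (Peanut, Chocolate, count).
for peanut, chocolate, count in counts.itertuples(index=False, name=None):
    print(peanut, chocolate, count)
```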

Tags: pandas series multi-index

[Solution 1]:

You can use the Series' index:

>>> serie1.index
MultiIndex(levels=[[u'N', u'Y'], [u'N', u'Y']],
           labels=[[0, 0, 1], [0, 1, 1]],
           names=[u'Peanut', u'Chocolate'])

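From the index you can get the level names (Peanut, Chocolate) and the values stored in each level. Note that labels holds, for each level, positions into the matching array in levels. For example, the first 'Peanut' label is levels[0][labels[0][0]], i.e. 'N', and the last 'Chocolate' label is levels[1][labels[1][2]], i.e. 'Y'. (In pandas 0.24 and later this attribute is called codes; labels was removed in pandas 1.0.)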
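To check the levels/labels relationship described above, the following sketch decodes each level by hand and compares the result with MultiIndex.get_level_values, which performs the same lookup for a whole level. It assumes pandas 0.24 or later (where the attribute is codes rather than labels) and inline data in place of test.txt; decoded_by_level is just an illustrative name.

```python
import pandas as pd

# Inline copy of the question's data (only the two grouping columns are needed).
df = pd.DataFrame({
    "Chocolate": list("YNNYYNYNYY"),
    "Peanut": list("YNNYYNYNNY"),
})
idx = df.groupby(["Peanut", "Chocolate"]).size().index

decoded_by_level = {}
for j, name in enumerate(idx.names):
    # codes[j][i] is the position of row i's value inside levels[j]
    # (older pandas, like the repr above, exposed codes as idx.labels).
    decoded = [idx.levels[j][code] for code in idx.codes[j]]
    # get_level_values(j) performs the same lookup for every row at once.
    assert decoded == idx.get_level_values(j).tolist()
    decoded_by_level[name] = decoded

print(decoded_by_level)
```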

I wrote a small example that loops over the index and prints all the data:

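# loop over the rows
for i in range(len(serie1)):
    print("Row", i, "Value", serie1.iloc[i], end=" ")
    # loop over the index levels (the "columns")
    for j in range(len(serie1.index.names)):
        # codes[j][i] is the position of row i's value in levels[j]
        # (pandas < 0.24, as in the repr above, called this attribute labels)
        print("Column", serie1.index.names[j], "Value",
              serie1.index.levels[j][serie1.index.codes[j][i]], end=" ")
    print()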

Output:

Row 0 Value 4 Column Peanut Value N Column Chocolate Value N
Row 1 Value 1 Column Peanut Value N Column Chocolate Value Y
Row 2 Value 5 Column Peanut Value Y Column Chocolate Value Y
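If you only need the values, the same output can be produced without touching levels and codes at all. A sketch assuming a recent pandas and the inline data used in place of test.txt: items() supplies each row's key tuple and zip pairs it with index.names, so the loop also works unchanged for more than two levels.

```python
import pandas as pd

# Inline copy of the question's data (only the two grouping columns are needed).
df = pd.DataFrame({
    "Chocolate": list("YNNYYNYNYY"),
    "Peanut": list("YNNYYNYNNY"),
})
serie1 = df.groupby(["Peanut", "Chocolate"]).size()

lines = []
for i, (key, value) in enumerate(serie1.items()):
    # key holds one value per index level, in the order of serie1.index.names.
    parts = [f"Row {i} Value {value}"]
    parts += [f"Column {name} Value {level_value}"
              for name, level_value in zip(serie1.index.names, key)]
    lines.append(" ".join(parts))

print("\n".join(lines))
```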

[Discussion]:

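Thanks. So I can use serie1.index.levels and serie1.index.labels to get the values of the index columns. It's a bit confusing; I think I'll need a few minutes to fully understand it. Thanks again.
Yes, you can see that the values in index.labels are actually positions into the arrays in index.levels. I hope the example I gave makes this clear.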