How to apply condition on level of pandas.multiindex? (answers)

[Question title]: How to apply condition on level of pandas.multiindex?
[Posted]: 2012-10-18 15:25:34
[Question]:

My data looks like this (ch = channel, det = detector):

ch  det  time  counts
1   1    0     123
    2    0     121
    3    0     125
2   1    0     212
    2    0     210
    3    0     210
1   1    1     124
    2    1     125
    3    1     123
2   1    1     210
    2    1     209
    3    1     213
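
For anyone who wants to reproduce this, here is a minimal sketch that builds the sample above as a DataFrame with a (ch, det, time) MultiIndex. The integer times stand in for the real float timestamps, and the names `records` and `df` are only illustrative:

```python
import pandas as pd

# Sample data from the question: 2 channels x 3 detectors x 2 time points.
# Integer times are a stand-in for the real float timestamps.
records = [
    (1, 1, 0, 123), (1, 2, 0, 121), (1, 3, 0, 125),
    (2, 1, 0, 212), (2, 2, 0, 210), (2, 3, 0, 210),
    (1, 1, 1, 124), (1, 2, 1, 125), (1, 3, 1, 123),
    (2, 1, 1, 210), (2, 2, 1, 209), (2, 3, 1, 213),
]
df = pd.DataFrame(records, columns=["ch", "det", "time", "counts"])

# Move the three key columns into a MultiIndex; "counts" stays as the only column.
df = df.set_index(["ch", "det", "time"])
print(df)
```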

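Note that in reality the time column is a float with about 12 significant digits. It is the same for all detectors within one measurement, but its value is unpredictable and not in sequential order.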

I need to create a DataFrame that looks like this:

ch  time  mean_counts_over_detectors
1   0     xxx
2   0     yyy
1   1     zzz
2   1     www

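That is, for each channel and separately at each time, I want to apply np.mean to the counts of all its detectors. I could write clumsy loops, but I feel pandas must have something built in for this. I'm still a pandas beginner, MultiIndex in particular involves a lot of concepts, and I'm not sure what I should be looking for in the docs.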

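The title contains "condition" because I thought that what I want (the mean over all detectors of one channel, taken over the counts that share the same time) could perhaps be expressed as a slicing condition.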
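
A sketch of that "slicing condition" idea, assuming the sample data above with integer times: `xs` takes a cross-section at one value of the `time` level, and a groupby on the `ch` level then averages over detectors. Repeating that for every time is exactly the loop the question wants to avoid; grouping on both levels at once covers all times in a single call.

```python
import pandas as pd

records = [
    (1, 1, 0, 123), (1, 2, 0, 121), (1, 3, 0, 125),
    (2, 1, 0, 212), (2, 2, 0, 210), (2, 3, 0, 210),
    (1, 1, 1, 124), (1, 2, 1, 125), (1, 3, 1, 123),
    (2, 1, 1, 210), (2, 2, 1, 209), (2, 3, 1, 213),
]
df = (
    pd.DataFrame(records, columns=["ch", "det", "time", "counts"])
    .set_index(["ch", "det", "time"])
    .sort_index()  # a sorted MultiIndex avoids lexsort performance warnings
)

# Slice: keep only rows where time == 0; the "time" level is dropped,
# leaving an index of (ch, det).
at_t0 = df.xs(0, level="time")
mean_t0 = at_t0.groupby(level="ch")["counts"].mean()
print(mean_t0)

# Same computation for every time value at once, without a loop.
mean_all = df.groupby(level=["ch", "time"])["counts"].mean()
print(mean_all)
```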

[Question comments]:

Tags: python pandas dataframe multi-index

[Answer 1]:

Same as @meteore's answer, but with a MultiIndex.

In [55]: df
Out[55]:
             counts
ch det time
1  1   0        123
   2   0        121
   3   0        125
2  1   0        212
   2   0        210
   3   0        210
1  1   1        124
   2   1        125
   3   1        123
2  1   1        210
   2   1        209
   3   1        213

In [56]: df.index
Out[56]:
MultiIndex
[(1L, 1L, 0L) (1L, 2L, 0L) (1L, 3L, 0L) (2L, 1L, 0L) (2L, 2L, 0L)
 (2L, 3L, 0L) (1L, 1L, 1L) (1L, 2L, 1L) (1L, 3L, 1L) (2L, 1L, 1L)
 (2L, 2L, 1L) (2L, 3L, 1L)]

In [57]: df.index.names
Out[57]: ['ch', 'det', 'time']

In [58]: df.groupby(level=['ch', 'time']).mean()
Out[58]:
             counts
ch time
1  0     123.000000
   1     124.000000
2  0     210.666667
   1     210.666667

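Be careful when grouping on floats (this has nothing to do with MultiIndex): because of the representation and precision limits of floating-point numbers, values that should be identical may differ in the last bits and end up in separate groups.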
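
A minimal illustration of that pitfall and one common workaround. The float timestamps here are hypothetical, and rounding to 9 decimals is an assumption: the tolerance must be well below the real spacing between measurements for this to be safe.

```python
import pandas as pd

# Hypothetical float timestamps: the first three rows belong to one measurement,
# but 0.1 + 0.2 is 0.30000000000000004, not exactly 0.3.
df = pd.DataFrame({
    "ch":     [1, 1, 1, 1, 1, 1],
    "det":    [1, 2, 3, 1, 2, 3],
    "time":   [0.3, 0.1 + 0.2, 0.3, 1.7, 1.7, 1.7],
    "counts": [123, 121, 125, 124, 125, 123],
})

# Grouping on the raw floats splits the first measurement into two groups.
raw = df.groupby(["ch", "time"])["counts"].mean()

# Rounding first merges values that are meant to be equal.
df["time"] = df["time"].round(9)
fixed = df.groupby(["ch", "time"])["counts"].mean()

print(raw, fixed, sep="\n\n")
```

If the data has some other per-measurement identifier, grouping on that avoids the float issue entirely.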

[Comments]:

Why does it help to have a MultiIndex before using groupby?
A MultiIndex was used only because the example DataFrame in the question uses one.
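
To check that claim, a small sketch on the question's sample data: grouping on index levels and grouping on ordinary columns (after `reset_index()`) should produce the same Series, so the MultiIndex is a convenience rather than a requirement. The names `by_level` and `by_column` are illustrative.

```python
import pandas as pd

records = [
    (1, 1, 0, 123), (1, 2, 0, 121), (1, 3, 0, 125),
    (2, 1, 0, 212), (2, 2, 0, 210), (2, 3, 0, 210),
    (1, 1, 1, 124), (1, 2, 1, 125), (1, 3, 1, 123),
    (2, 1, 1, 210), (2, 2, 1, 209), (2, 3, 1, 213),
]
indexed = pd.DataFrame(records, columns=["ch", "det", "time", "counts"]).set_index(
    ["ch", "det", "time"]
)

# Group on MultiIndex levels...
by_level = indexed.groupby(level=["ch", "time"])["counts"].mean()

# ...or turn the levels back into columns and group on those.
by_column = indexed.reset_index().groupby(["ch", "time"])["counts"].mean()

print(by_level.equals(by_column))
```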

[Answer 2]:

Without a MultiIndex (if you have one, you can get rid of it with df.reset_index()):

import numpy as np
import pandas as pd
chans = [1,1,1,2,2,2,1,1,1,2,2,2]
df = pd.DataFrame(dict(ch=chans, det=[1,2,3,1,2,3,1,2,3,1,2,3], time=6*[0]+6*[1], counts=np.random.randint(0,500,12)))

Use groupby with mean as the aggregation function:

>>> df.groupby(['time', 'ch'])['counts'].mean()
time  ch
0     1     315.000000
      2     296.666667
1     1     178.333333
      2     221.666667
Name: counts
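
To get exactly the flat layout the question asks for (columns ch, time, mean_counts_over_detectors, ordered by time and then channel), one possible follow-up is to name the Series and call `reset_index()`. This sketch uses the question's sample counts instead of the random ones above:

```python
import pandas as pd

records = [
    (1, 1, 0, 123), (1, 2, 0, 121), (1, 3, 0, 125),
    (2, 1, 0, 212), (2, 2, 0, 210), (2, 3, 0, 210),
    (1, 1, 1, 124), (1, 2, 1, 125), (1, 3, 1, 123),
    (2, 1, 1, 210), (2, 2, 1, 209), (2, 3, 1, 213),
]
df = pd.DataFrame(records, columns=["ch", "det", "time", "counts"])

result = (
    df.groupby(["time", "ch"])["counts"]
    .mean()                                  # average over detectors
    .rename("mean_counts_over_detectors")    # column name from the question
    .reset_index()                           # group keys become columns
)
# Put the columns in the order shown in the question.
result = result[["ch", "time", "mean_counts_over_detectors"]]
print(result)
```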

Other aggregation functions can be passed via agg:

>>> df.groupby(['time', 'ch'])['counts'].agg(np.ptp)
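
Several aggregations can also be computed in one pass with named aggregation (available on `SeriesGroupBy.agg` since pandas 0.25). In this sketch the output names `mean_counts` and `spread` are arbitrary, and the lambda computes the same max minus min range that `np.ptp` does:

```python
import pandas as pd

records = [
    (1, 1, 0, 123), (1, 2, 0, 121), (1, 3, 0, 125),
    (2, 1, 0, 212), (2, 2, 0, 210), (2, 3, 0, 210),
    (1, 1, 1, 124), (1, 2, 1, 125), (1, 3, 1, 123),
    (2, 1, 1, 210), (2, 2, 1, 209), (2, 3, 1, 213),
]
df = pd.DataFrame(records, columns=["ch", "det", "time", "counts"])

# Each keyword becomes an output column; the value is the aggregation to apply.
summary = df.groupby(["time", "ch"])["counts"].agg(
    mean_counts="mean",
    spread=lambda s: s.max() - s.min(),  # peak-to-peak, like np.ptp
)
print(summary)
```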

[Comments]:

Sorry, I asked about MultiIndex, so this was a tough choice and I'm accepting Wouter's answer, OK? Of course I upvoted yours.