Select columns that a Pandas dataframe was grouped by: answers

【Question title】: Select columns that a Pandas dataframe was grouped by
【Posted】: 2021-05-17 16:05:38
【Question description】:

I have a pandas dataframe flsa:

flsa[:10]

        auc  topics       ww  top-n  fold
0  0.668729      11  entropy     10     1
1  0.609736      11  entropy     10     2
2  0.654445      11  entropy     10     3
3  0.612886      11  entropy     10     4
4  0.596460      11  entropy     10     5
5  0.654208      11  entropy     15     1
6  0.620610      11  entropy     15     2
7  0.637275      11  entropy     15     3
8  0.603725      11  entropy     15     4
9  0.596100      11  entropy     15     5

Now I group it as follows:

mean_flsa_auc = flsa.groupby(['topics','ww']).mean('auc').drop('fold', axis =  1).drop('top-n', axis=1)

This results in:

mean_flsa_auc[:10]

                     auc
topics ww               
3      entropy  0.610580
       idf      0.593962
       normal   0.623830
       probidf  0.598362
5      entropy  0.623360
       idf      0.619105
       normal   0.644371
       probidf  0.617489
7      entropy  0.631131
       idf      0.624773

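Now I want to make the following line plot: x-axis: topics, y-axis: auc, and 4 lines: entropy, idf, normal, probidf.

However, whenever I try to select all the 'entropy' values: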

mean_flsa_auc[mean_flsa_auc['ww'] == 'entropy']

I get the following error:

Traceback (most recent call last):

  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)

  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'ww'


    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
    
      File "<ipython-input-490-0dacb2bb9cf3>", line 1, in <module>
        mean_flsa_auc[mean_flsa_auc['ww'] == 'entropy']
    
      File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2902, in __getitem__
        indexer = self.columns.get_loc(key)
    
      File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
        raise KeyError(key) from err
    
    KeyError: 'ww'

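I suspect that I am treating mean_flsa_auc as a dataframe object, while it is now a groupby object. But I don't know how to change my code so that I get a list of all the entropy values in the groupby object.

Can anyone help me with this?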

【Question comments】:

Tags: python python-3.x pandas dataframe pandas-groupby

【Solution 1】:

You can pass as_index=False to groupby() to keep the grouping fields as columns, as follows:

mean_flsa_auc = flsa.groupby(['topics','ww'], as_index=False).mean('auc').drop('fold', axis =  1).drop('top-n', axis=1)

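By default, groupby() sets the grouping fields as the index of the result, so you cannot access them the way you did before, as ordinary data columns. With the parameter as_index=False, these fields are not set as the index and remain data columns.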
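The difference is easiest to see on a small frame. The following is a minimal sketch, not the asker's data: flsa_small and its values are invented for illustration, and it selects the auc column before aggregating rather than dropping the other columns afterwards.

```python
import pandas as pd

# Hypothetical miniature of the flsa frame (values invented for illustration).
flsa_small = pd.DataFrame({
    "auc": [0.60, 0.62, 0.58, 0.64],
    "topics": [3, 3, 3, 3],
    "ww": ["entropy", "entropy", "idf", "idf"],
    "fold": [1, 2, 1, 2],
})

# Default behaviour: the group keys become a MultiIndex, so 'ww' is not a column.
indexed = flsa_small.groupby(["topics", "ww"])["auc"].mean().to_frame()
print(list(indexed.columns))      # only the aggregated column
print(list(indexed.index.names))  # the group keys live in the index

# With as_index=False the group keys stay as ordinary columns.
flat = flsa_small.groupby(["topics", "ww"], as_index=False)["auc"].mean()
print(list(flat.columns))
```

In the first result, indexed['ww'] raises the same KeyError as in the question, because ww is an index level there; in the second, flat['ww'] works.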

Alternatively, you can keep your existing code and call .reset_index() afterwards to move the index fields back into the data columns, as follows:

mean_flsa_auc = mean_flsa_auc.reset_index()

Then you can access the ww column.
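As a rough sketch of how the selection could then look, the snippet below builds a hypothetical stand-in for mean_flsa_auc (mean_small, with invented values) and filters it after reset_index(). It also shows two options that are not part of the answer above: taking a cross-section of the MultiIndex with xs(), and reshaping with unstack() so each ww value becomes its own column, which is a convenient shape for the line plot described in the question.

```python
import pandas as pd

# Hypothetical stand-in for mean_flsa_auc: a frame indexed by (topics, ww).
# The values are invented for illustration.
mean_small = pd.DataFrame(
    {"auc": [0.61, 0.59, 0.62, 0.62]},
    index=pd.MultiIndex.from_tuples(
        [(3, "entropy"), (3, "idf"), (5, "entropy"), (5, "idf")],
        names=["topics", "ww"],
    ),
)

# Option 1 (from the answer): flatten the index, then filter with a boolean mask.
flat = mean_small.reset_index()
entropy_rows = flat[flat["ww"] == "entropy"]
print(entropy_rows)

# Option 2 (an alternative): keep the MultiIndex and take a cross-section on 'ww'.
entropy_xs = mean_small.xs("entropy", level="ww")
print(entropy_xs)

# For plotting: one column per ww value, indexed by topics.
wide = mean_small["auc"].unstack("ww")
print(wide)
# wide.plot() would draw one line per column; this assumes matplotlib is installed.
```

The boolean mask keeps topics and ww as regular columns, while xs() returns a frame indexed by topics only, so which one fits better depends on what the next step expects.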

【Comments】: