【Question title】: Select columns that a Pandas dataframe was grouped by
【Posted】: 2021-05-17 16:05:38
【Question】:

I have a pandas DataFrame, flsa:

flsa[:10]

        auc  topics       ww  top-n  fold
0  0.668729      11  entropy     10     1
1  0.609736      11  entropy     10     2
2  0.654445      11  entropy     10     3
3  0.612886      11  entropy     10     4
4  0.596460      11  entropy     10     5
5  0.654208      11  entropy     15     1
6  0.620610      11  entropy     15     2
7  0.637275      11  entropy     15     3
8  0.603725      11  entropy     15     4
9  0.596100      11  entropy     15     5

Now I group it as follows:

mean_flsa_auc = flsa.groupby(['topics','ww']).mean('auc').drop('fold', axis =  1).drop('top-n', axis=1)

This results in:

mean_flsa_auc[:10]

                     auc
topics ww               
3      entropy  0.610580
       idf      0.593962
       normal   0.623830
       probidf  0.598362
5      entropy  0.623360
       idf      0.619105
       normal   0.644371
       probidf  0.617489
7      entropy  0.631131
       idf      0.624773

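Now I want to draw a line plot with topics on the x-axis, auc on the y-axis, and four lines: entropy, idf, normal, and probidf.

However, whenever I try to select all the 'entropy' values: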

mean_flsa_auc[mean_flsa_auc['ww'] == 'entropy']

I get the following error:

Traceback (most recent call last):
  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ww'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<ipython-input-490-0dacb2bb9cf3>", line 1, in <module>
    mean_flsa_auc[mean_flsa_auc['ww'] == 'entropy']
  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2902, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'ww'

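I suspect that I am treating mean_flsa_auc as a DataFrame object while it is now a groupby object. But I don't know how to change my code so that I get a list of all the entropy values in the groupby object.

Can anyone help me solve this?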

【Comments】:

    Tags: python python-3.x pandas dataframe pandas-groupby


    【Answer 1】:

    You can pass as_index=False to groupby() to keep the grouping fields as columns, like this:

    mean_flsa_auc = flsa.groupby(['topics','ww'], as_index=False).mean('auc').drop('fold', axis =  1).drop('top-n', axis=1)
    

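    By default, groupby() sets the grouping fields as the index of the result, so you cannot access them the way you access ordinary data columns. That is why mean_flsa_auc['ww'] raises KeyError: the result of .mean() is still a regular DataFrame, not a groupby object, but topics and ww live in its index rather than in its columns. With the parameter as_index=False, these fields are not moved into the index and stay in the data columns.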
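To make the difference concrete, here is a minimal sketch. It assumes a small toy frame that reuses the column names from the question; the values are invented for illustration. It also selects the auc column with [["auc"]] before calling .mean(), which is a substitution for the .mean('auc') plus .drop(...) chain above, not the original code.

```python
import pandas as pd

# Toy stand-in for flsa; the numbers are made up for illustration only.
flsa = pd.DataFrame({
    "auc":    [0.60, 0.62, 0.58, 0.64, 0.61, 0.63, 0.59, 0.65],
    "topics": [3, 3, 3, 3, 5, 5, 5, 5],
    "ww":     ["entropy", "entropy", "idf", "idf"] * 2,
    "top-n":  [10, 15, 10, 15, 10, 15, 10, 15],
    "fold":   [1, 2, 1, 2, 1, 2, 1, 2],
})

# Default behaviour: the grouping keys end up in a MultiIndex,
# so 'ww' is not among the columns.
indexed = flsa.groupby(["topics", "ww"])[["auc"]].mean()
print(indexed.columns.tolist())      # only the aggregated column
print(list(indexed.index.names))     # the grouping keys

# as_index=False: the grouping keys stay as ordinary columns.
flat = flsa.groupby(["topics", "ww"], as_index=False)[["auc"]].mean()
print(flat.columns.tolist())

# The boolean selection from the question now works.
entropy = flat[flat["ww"] == "entropy"]
print(entropy)
```

Selecting the column before aggregating avoids computing means for fold and top-n only to drop them afterwards.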

    Alternatively, you can keep your existing code and call .reset_index() afterwards to move the index levels back into regular data columns, like this:

    mean_flsa_auc = mean_flsa_auc.reset_index()
    

    After that, you can access the ww column.
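As a rough sketch of how this could feed the line plot described in the question, the example below again uses an invented toy frame with the question's column names. Besides the reset_index() route, it shows two alternatives that are not part of the answer above: taking a cross-section of the MultiIndex with .xs(), and reshaping with .unstack() so that each ww value becomes its own column.

```python
import pandas as pd

# Toy stand-in for flsa; the numbers are made up for illustration only.
flsa = pd.DataFrame({
    "auc":    [0.60, 0.62, 0.58, 0.64, 0.61, 0.63, 0.59, 0.65],
    "topics": [3, 3, 3, 3, 5, 5, 5, 5],
    "ww":     ["entropy", "entropy", "idf", "idf"] * 2,
})

# Grouped means with the default MultiIndex (topics, ww).
mean_flsa_auc = flsa.groupby(["topics", "ww"])[["auc"]].mean()

# Route 1: flatten the index, then filter like any other column.
flat = mean_flsa_auc.reset_index()
entropy = flat[flat["ww"] == "entropy"]

# Route 2 (alternative): keep the MultiIndex and take a cross-section
# on the 'ww' level; the result is indexed by topics only.
entropy_xs = mean_flsa_auc.xs("entropy", level="ww")

# For plotting: one row per topics value, one column per ww value.
wide = mean_flsa_auc["auc"].unstack("ww")
print(wide)

# With matplotlib installed, wide.plot() would draw one line per ww value,
# with topics on the x-axis and auc on the y-axis.
```

The wide layout is convenient here because DataFrame.plot() draws one line per column against the index, which matches the four-lines-over-topics chart the question asks for.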
