【Question title】: "unfair" pandas categorical.from_codes
【Posted】: 2018-05-31 04:18:45
【Question description】:

I have to assign labels to categorical data. Let's take iris as an example:

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

print "targets: ", np.unique(iris.target)
print "targets: ", iris.target.shape
print "target_names: ", np.unique(iris.target_names)
print "target_names: ", iris.target_names.shape

This prints:

targets:  [0 1 2]
targets:  (150L,)
target_names:  ['setosa' 'versicolor' 'virginica']
target_names:  (3L,)

To generate the desired labels, I use pandas.Categorical.from_codes:

print pd.Categorical.from_codes(iris.target, iris.target_names)

[setosa, setosa, setosa, setosa, setosa, ..., virginica, virginica, virginica, virginica, virginica]
Length: 150
Categories (3, object): [setosa, versicolor, virginica]

Let's try it on another example:

# I define new targets
target = np.array([123,123,54,123,123,54,2,54,2])
target = np.array([1,1,3,1,1,3,2,3,2])
target_names = np.array(['paglia','gioele','papa'])
#---
print "targets: ", np.unique(target)
print "targets: ", target.shape
print "target_names: ", np.unique(target_names)
print "target_names: ", target_names.shape

If I again try to convert the categorical values into labels:

print pd.Categorical.from_codes(target, target_names) 

I get this error message:

C:\Users\ianni\Anaconda2\lib\site-packages\pandas\core\categorical.pyc in from_codes(cls, codes, categories, ordered)
    459
    460         if len(codes) and (codes.max() >= len(categories) or codes.min() < -1):
--> 461             raise ValueError("codes need to be between -1 and "
    462                              "len(categories)-1")
    463

ValueError: codes need to be between -1 and len(categories)-1

Do you know why?

【Question discussion】:

    Tags: python pandas categorical-data python-iris


    【Solution 1】:

    Do you know why?

    If you look closely at the error traceback:

    In [128]: pd.Categorical.from_codes(target, target_names)
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-128-c2b4f6ac2369> in <module>()
    ----> 1 pd.Categorical.from_codes(target, target_names)
    
    ~\Anaconda3_5.0\envs\py36\lib\site-packages\pandas\core\categorical.py in from_codes(cls, codes, categories, ordered)
        619
        620         if len(codes) and (codes.max() >= len(categories) or codes.min() < -1):
    --> 621             raise ValueError("codes need to be between -1 and "
        622                              "len(categories)-1")
        623
    
    ValueError: codes need to be between -1 and len(categories)-1
    

    you will see that the following condition is met:

    codes.max() >= len(categories)
    

    In your case:

    In [133]: target.max() >= len(target_names)
    Out[133]: True
    

    In other words, pd.Categorical.from_codes() expects codes to be consecutive integers from 0 to len(categories) - 1, with -1 reserved for missing values.
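
The rule can be checked directly. The following is a minimal sketch, assuming a recent pandas and NumPy; the variable names `names`, `ok`, and `raised` are illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

names = np.array(['paglia', 'gioele', 'papa'])

# Valid: every code is a position in `names`; -1 marks a missing value.
ok = pd.Categorical.from_codes(np.array([0, 2, 1, -1]), names)
print(ok)

# Invalid: 3 is not a position in a 3-element category list,
# so from_codes raises ValueError.
try:
    pd.Categorical.from_codes(np.array([1, 1, 3]), names)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

This also shows why the second `target` in the question, `[1,1,3,1,1,3,2,3,2]`, fails: its maximum is 3, which is not less than `len(target_names)`.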

    Workaround:

    In [173]: target
    Out[173]: array([123, 123,  54, 123, 123,  54,   2,  54,   2])
    

    Helper dictionaries:

    In [174]: mapping = dict(zip(np.unique(target), np.arange(len(target_names))))
    
    In [175]: mapping
    Out[175]: {2: 0, 54: 1, 123: 2}
    
    In [176]: reverse_mapping = {v:k for k,v in mapping.items()}
    
    In [177]: reverse_mapping
    Out[177]: {0: 2, 1: 54, 2: 123}
    

    Building the categorical:

    In [178]: ser = pd.Categorical.from_codes(pd.Series(target).map(mapping), target_names)
    
    In [179]: ser
    Out[179]:
    [papa, papa, gioele, papa, papa, gioele, paglia, gioele, paglia]
    Categories (3, object): [paglia, gioele, papa]
    

    Reverse mapping:

    In [180]: pd.Series(ser.codes).map(reverse_mapping)
    Out[180]:
    0    123
    1    123
    2     54
    3    123
    4    123
    5     54
    6      2
    7     54
    8      2
    dtype: int64
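
As a possible alternative to the helper dictionaries, `np.unique` with `return_inverse=True` returns both the sorted distinct values and, for each element, its position among them, which is the form `from_codes` accepts. This is only a sketch, not the method used in the answer; the names `uniques`, `codes`, and `restored` are illustrative. It makes the same assumption as the mapping above: the names in `target_names` are paired with the distinct target values in sorted order.

```python
import numpy as np
import pandas as pd

target = np.array([123, 123, 54, 123, 123, 54, 2, 54, 2])
target_names = np.array(['paglia', 'gioele', 'papa'])

# uniques holds the sorted distinct values (2, 54, 123);
# codes[i] is the position of target[i] within uniques.
uniques, codes = np.unique(target, return_inverse=True)

# The pairing only makes sense with one name per distinct raw value.
if len(uniques) != len(target_names):
    raise ValueError("expected one name per distinct target value")

cat = pd.Categorical.from_codes(codes, target_names)
print(cat)

# Reverse mapping: index the sorted distinct values with the codes.
restored = uniques[cat.codes]
print(restored)
```

Indexing `uniques` with `cat.codes` plays the role of `reverse_mapping`, so no dictionary needs to be kept around.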
    

    【Discussion】:
