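【Question title】: ValueError on inverse transform using OrdinalEncoder with dictionary
【Posted】: 2022-01-02 20:16:09
【Question】:

I can convert the target column to the desired ordered numeric values using category encoding and ordinal encoding. But I cannot perform inverse_transform, because it fails with the error shown below.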

import pandas as pd
import category_encoders as ce
from sklearn.preprocessing import OrdinalEncoder

lst = [ 'BRANCHING/ELONGATION', 'EARLY', 'EARLY', 'EARLY', 'EARLY', 'MID', 'MID',  'ADVANCED/TILLERING',
        'FLOWERING', 'FLOWERING', 'FLOWERING', 'SEEDLING/EMERGED']
  
filtered_df = pd.DataFrame(lst, columns =['growth_state'])

filtered_df['growth_state'].value_counts()

EARLY                   4
FLOWERING               3
MID                     2
ADVANCED/TILLERING      1
SEEDLING/EMERGED        1
BRANCHING/ELONGATION    1
Name: growth_state, dtype: int64

dictionary = [{'col': 'growth_state',
               'mapping':{'SEEDLING/EMERGED':0, 'EARLY':1, 'MID':2,
                          'ADVANCED/TILLERING':3, 'BRANCHING/ELONGATION':4, 'FLOWERING':5 }}]

# instantiating encoder
encoder = ce.OrdinalEncoder(cols = 'growth_state', mapping= dictionary)

filtered_df['growth_state'] = encoder.fit_transform(filtered_df['growth_state'])
filtered_df

    growth_state
0   4
1   1
2   1
3   1
4   1
5   2
6   2
7   3
8   5
9   5
10  5
11  0

But when I run inverse_transform:

newCol = encoder.inverse_transform(filtered_df['growth_state'])
AttributeError                            Traceback (most recent call last)
<ipython-input-26-b6505b4be1e1> in <module>
----> 1 newCol = encoder.inverse_transform(filtered_df['growth_state'])

d:\users\tiwariam\appdata\local\programs\python\python36\lib\site-packages\category_encoders\ordinal.py in inverse_transform(self, X_in)
    266         for switch in self.mapping:
    267             column_mapping = switch.get('mapping')
--> 268             inverse = pd.Series(data=column_mapping.index, index=column_mapping.values)
    269             X[switch.get('col')] = X[switch.get('col')].map(inverse).astype(switch.get('data_type'))
    270 

AttributeError: 'dict' object has no attribute 'index'

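Note: the column above is the target column. I could apply a label encoder, since this is a classification problem, but I used the combination of category encoding and ordinal encoding above because the variable is ordinal in nature.

【Comments】:

    Tags: python pandas machine-learning scikit-learn data-preprocessing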


    【Solution 1】:

    The error comes from this line in the inverse_transform source code:

    inverse = pd.Series(data=column_mapping.index, index=column_mapping.values)
    

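    It seems that, although the category_encoders documentation says mapping should be supplied as a dictionary, their inverse_transform code actually expects a pd.Series. Passing the mapping as a pd.Series makes both directions work: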

    import pandas as pd
    from category_encoders import OrdinalEncoder
    
    df = pd.DataFrame({
        'growth_state': ['BRANCHING/ELONGATION', 'EARLY', 'EARLY', 'EARLY', 'EARLY', 'MID', 'MID', 'ADVANCED/TILLERING', 'FLOWERING', 'FLOWERING', 'FLOWERING', 'SEEDLING/EMERGED']
    })
    
    mapping = [{
        'col': 'growth_state',
        'mapping': pd.Series(data={'SEEDLING/EMERGED': 0, 'EARLY': 1, 'MID': 2, 'ADVANCED/TILLERING': 3, 'BRANCHING/ELONGATION': 4, 'FLOWERING': 5}),
        'data_type': object
    }]
    
    enc = OrdinalEncoder(cols=['growth_state'], mapping=mapping)
    
    df_transformed = enc.fit_transform(df)
    df_transformed.head()
    #    growth_state
    # 0             4
    # 1             1
    # 2             1
    # 3             1
    # 4             1
    
    df_inverse = enc.inverse_transform(df_transformed)
    df_inverse.head()
    #            growth_state
    # 0  BRANCHING/ELONGATION
    # 1                 EARLY
    # 2                 EARLY
    # 3                 EARLY
    # 4                 EARLY
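
The failure can also be reproduced without category_encoders. The following is a minimal sketch using pandas only; `build_inverse` is a hypothetical helper that simply mirrors the line quoted from the traceback, so it illustrates the cause rather than the library's full behaviour.

```python
import pandas as pd

# Same ordinal mapping as in the question, as a plain dict.
mapping_dict = {'SEEDLING/EMERGED': 0, 'EARLY': 1, 'MID': 2,
                'ADVANCED/TILLERING': 3, 'BRANCHING/ELONGATION': 4, 'FLOWERING': 5}


def build_inverse(column_mapping):
    # Hypothetical helper: mirrors the line shown in the traceback.
    return pd.Series(data=column_mapping.index, index=column_mapping.values)


# A dict has no .index attribute, so this raises AttributeError.
try:
    build_inverse(mapping_dict)
    dict_error = None
except AttributeError as exc:
    dict_error = str(exc)

# A pd.Series has both .index (the labels) and .values (the codes).
inverse = build_inverse(pd.Series(mapping_dict))

# Decode a few encoded values by mapping codes back to labels.
encoded = pd.Series([4, 1, 2, 5, 0])
decoded = encoded.map(inverse)
print(dict_error)
print(decoded.tolist())
```

Wrapping an existing dict in `pd.Series(...)` is therefore enough to give the quoted line the attributes it reads.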
    

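    【Comments】:

    • How can we inverse-transform a single value (row) instead of all the values of the column at once? For example, say I only want to inverse-transform 2 back to 'EARLY'. Is there a way to do that? Thanks!
    • Try enc.inverse_transform(df_transformed.iloc[i: i + 1, :]), where i is the index of the row you want to inverse-transform (e.g. i = 2).
    • This also works => enc.inverse_transform(df_transformed.iloc[i: i + 1])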
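
If the goal is to decode a single encoded value rather than a row of the DataFrame, one option is to skip the encoder and invert the mapping directly. This is a minimal sketch in plain Python, assuming the mapping from the question is one-to-one; `inverse_mapping` and `decode` are hypothetical names, not part of category_encoders.

```python
# Same ordinal mapping as in the question.
mapping_dict = {'SEEDLING/EMERGED': 0, 'EARLY': 1, 'MID': 2,
                'ADVANCED/TILLERING': 3, 'BRANCHING/ELONGATION': 4, 'FLOWERING': 5}

# Swap keys and values; this is only safe because every code is unique.
inverse_mapping = {code: label for label, code in mapping_dict.items()}


def decode(code):
    """Return the label for one encoded value (raises KeyError if unknown)."""
    return inverse_mapping[code]


print(decode(1))  # EARLY
print(decode(2))  # MID
```

With this mapping the encoded value 2 decodes to 'MID' and 'EARLY' is encoded as 1, whereas the row at index 2 of the example DataFrame holds 'EARLY'.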