【Question title】: Trying to use LabelEncoder and OneHotEncoder on a dataset with multiple columns
【Posted】: 2021-09-13 02:15:05
【Question description】:

I'm trying to transform several columns that contain categorical data, but I get an error when I use OneHotEncoder.

My DataFrame:

1) Split the columns into X_census and Y_census (X_census holds the categorical values):

X_census  = df[['workclass',
               'education',
               'marital-status',
               'occupation',
               'relationship',
               'race',
               'sex',
               'native-country']]

Y_census = df['income']

2) Use LabelEncoder on the categorical values in X_census:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
X_1 = X_census.apply(le.fit_transform)
X_2 = X_1.to_numpy()
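What step 2 produces can be checked on a small hypothetical frame standing in for X_census (the real census data isn't shown). DataFrame.apply calls le.fit_transform once per column, so each column gets its own 0..k-1 integer codes with classes sorted alphabetically, but the single le object ends up fitted only on the last column. scikit-learn's OrdinalEncoder is the feature-oriented alternative that remembers the categories of every column in categories_; this sketch compares the two.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical toy frame standing in for X_census
X_toy = pd.DataFrame({
    "workclass": ["Private", "State-gov", "Private", "Self-emp"],
    "sex": ["Male", "Female", "Female", "Male"],
})

le = LabelEncoder()
# apply() refits le on each column in turn; codes follow sorted class order
X_codes = X_toy.apply(le.fit_transform)
print(X_codes)
print(le.classes_)  # only the classes of the LAST column ("sex") survive

# OrdinalEncoder encodes all columns in one fit and keeps every column's categories
oe = OrdinalEncoder()
X_ord = oe.fit_transform(X_toy)  # float array with the same codes
print(oe.categories_)
```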

3) Now use OneHotEncoder on my X_2 to turn the categorical values into numeric (one-hot) columns:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

oh = OneHotEncoder()
onehotencoder_census = ColumnTransformer(transformers=[('OneHot', oh, X_2[:])],remainder='passthrough')
X_census = onehotencoder_census.fit_transform(X_census) # Error appears here!

The Error

【Question discussion】:

  • Can you show your X_2?

Tags: python machine-learning data-science one-hot-encoding


【Solution 1】:

You can use pandas.get_dummies:

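import pandas as pd

df = pd.DataFrame({"marital_status": ['S','M','D','S','M','D','S','M','D'], "gender": ["male","female","male","female","male","female","male","female","male"], "education": ['grad','post-grad','grad','post-grad','grad','post-grad','grad','post-grad','grad'], "income": [125,135,120,110,90,150,180,130,110]})

pd.get_dummies(df)  # one-hot encodes the string columns; the numeric "income" column is kept as-is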
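Applied to the question's own setup, a minimal sketch (on a hypothetical stand-in for the census frame) would keep the target out of the encoding and restrict it with the columns parameter; dtype=int asks for 0/1 integers, since pandas 2.0 and later return booleans by default.

```python
import pandas as pd

# Hypothetical toy stand-in for the census frame in the question
df = pd.DataFrame({
    "workclass": ["Private", "State-gov", "Private"],
    "sex": ["Male", "Female", "Female"],
    "income": ["<=50K", ">50K", "<=50K"],
})

Y_census = df["income"]                  # keep the target out of the encoding
X_census = pd.get_dummies(
    df.drop(columns="income"),
    columns=["workclass", "sex"],        # encode only these columns
    dtype=int,                           # 0/1 integers instead of booleans
)
print(X_census.columns.tolist())
print(X_census)
```

One trade-off: get_dummies only sees the categories present in the frame it is given, so encoding train and test splits separately can produce different columns, whereas a fitted OneHotEncoder reuses the categories it learned during fit.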

【Discussion】:
