【发布时间】:2022-01-10 02:12:12
【问题描述】:
我在我的数据库中进行预测变量和类之间的划分,所以我意识到我必须先进行 LabelEncoder 转换,然后是 OneHotEncoder,在第一个数据库中我是这样做的:
label_encoder_workclass = LabelEncoder()
label_encoder_education = LabelEncoder()
label_encoder_marital = LabelEncoder()
label_encoder_occupation = LabelEncoder()
label_encoder_relationship = LabelEncoder()
label_encoder_race = LabelEncoder()
label_encoder_sex = LabelEncoder()
label_encoder_country = LabelEncoder()
X_census[:,1] = label_encoder_workclass.fit_transform(X_census[:,1])
X_census[:,3] = label_encoder_education.fit_transform(X_census[:,3])
X_census[:,5] = label_encoder_marital.fit_transform(X_census[:,5])
X_census[:,6] = label_encoder_occupation.fit_transform(X_census[:,6])
X_census[:,7] = label_encoder_relationship.fit_transform(X_census[:,7])
X_census[:,8] = label_encoder_race.fit_transform(X_census[:,8])
X_census[:,9] = label_encoder_sex.fit_transform(X_census[:,9])
X_census[:,13] = label_encoder_country.fit_transform(X_census[:, 13])
onehotenconder_census = ColumnTransformer(transformers=[('OneHot', OneHotEncoder(), [1,3,5,6,7,8,9,13])], remainder='passthrough')
X_census = onehotenconder_census.fit_transform(X_census).toarray()
在第二个数据库中是这样的:
label_encoder_personHomeOwnership = LabelEncoder()
label_encoder_loanIntent = LabelEncoder()
label_encoder_loanGrade = LabelEncoder()
label_encoder_cbPersonDefaultOnFile = LabelEncoder()
X_credit[:,2] = label_encoder_personHomeOwnership.fit_transform(X_credit[:,2])
X_credit[:,4] = label_encoder_loanIntent.fit_transform(X_credit[:,4])
X_credit[:,5] = label_encoder_loanGrade.fit_transform(X_credit[:,5])
X_credit[:,9] = label_encoder_personHomeOwnership.fit_transform(X_credit[:,9])
oneHotEncoder_credit = ColumnTransformer(transformers=[('OneHot', OneHotEncoder(), [2,4,5,9])], remainder='passthrough')
X_credit = oneHotEncoder_credit.fit_transform(X_credit)
让我感兴趣的是为什么在第一个中我必须使用 toarray() 方法将其转换为 numpy.ndarray 类型的对象,而在第二个中我没有,它会自动转换。
请有人问我这个问题。我是不是做错了什么?
非常感谢您
【问题讨论】:
标签: python scikit-learn data-science