[Posted]: 2020-10-07 07:28:35
[Question]:
I am trying to build a one-hot encoded matrix that also has columns for categories that do not appear in my sample.
With the following code:
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
s = np.array(['man', 'man', 'woman', 'woman', 'son', 'son', 'son', 'son', 'son'])
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(s)
onehot_encoder = OneHotEncoder(sparse=False)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
Y = onehot_encoder.fit_transform(integer_encoded)
print(Y)
The result looks like this:
[[1. 0. 0.]
[1. 0. 0.]
[0. 0. 1.]
[0. 0. 1.]
[0. 1. 0.]
[0. 1. 0.]
[0. 1. 0.]
[0. 1. 0.]
[0. 1. 0.]]
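For readers wondering why 'son' lands in the middle column: LabelEncoder assigns integer codes in sorted order of the labels, not in order of first appearance. A minimal sketch that inspects the mapping (the variable names `classes` and `codes` are illustrative and not part of the original code):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

s = np.array(['man', 'man', 'woman', 'woman', 'son', 'son', 'son', 'son', 'son'])

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(s)

# classes_ holds the labels in the order of their integer codes (sorted).
classes = list(label_encoder.classes_)
print(classes)
print(codes)
```

Here 'man' is coded 0, 'son' 1 and 'woman' 2, which matches the column positions in the matrix above.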
But I actually have the following categories. Some of them do not occur in my dataset, but I still need to account for them:
categories = np.array(['man', 'woman', 'son', 'daughter', 'boy', 'girl', 'king', 'queen', 'baby', 'child'])
So what I need is something like this:
[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
So I am trying to figure out how to use OneHotEncoder(sparse=False, categories=categories) in this code:
categories = np.array(['man', 'woman', 'son', 'daughter', 'boy', 'girl', 'king', 'queen', 'baby', 'child'])
s = np.array(['man', 'man', 'woman', 'woman', 'son', 'son', 'son', 'son', 'son'])
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(s)
onehot_encoder = OneHotEncoder(sparse=False, categories=categories)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
Y = onehot_encoder.fit_transform(integer_encoded)
print(Y)
But it raises the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
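A likely cause, going by the scikit-learn documentation for the `categories` parameter: it expects either 'auto' or a list with one array-like per input column, so a flat NumPy array of strings is not a valid value. The categories must also be of the same kind as the data being encoded, so string categories do not pair with the integer codes produced by LabelEncoder. A minimal sketch under those assumptions, which wraps the category list in an outer list and encodes the strings directly (dropping LabelEncoder is a choice made for this sketch, not something stated in the question):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

categories = ['man', 'woman', 'son', 'daughter', 'boy',
              'girl', 'king', 'queen', 'baby', 'child']
s = np.array(['man', 'man', 'woman', 'woman', 'son', 'son', 'son', 'son', 'son'])

# One inner list per input column; there is a single column here.
onehot_encoder = OneHotEncoder(categories=[categories])

# reshape(-1, 1) turns the 1D sample into a single-feature 2D column.
# The default output is a sparse matrix, so convert it to a dense array.
Y = onehot_encoder.fit_transform(s.reshape(-1, 1)).toarray()
print(Y.shape)
print(Y)
```

The columns follow the order given in `categories`, so 'woman' maps to column 1 and 'son' to column 2, which differs from the alphabetical order LabelEncoder uses. The sketch calls `.toarray()` rather than passing `sparse=False` because newer scikit-learn releases renamed that argument to `sparse_output`.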
If I change:
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
to
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1).all()
I get the following error:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
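This second error is probably unrelated to the `categories` argument. Calling `.all()` on an array reduces it to a single boolean, so the encoder would receive a scalar instead of a 2D column. A small NumPy-only sketch of that effect (the names `column` and `reduced` are illustrative):

```python
import numpy as np

integer_encoded = np.array([0, 0, 2, 2, 1, 1, 1, 1, 1])

# The reshape alone yields the 2D column the encoder expects.
column = integer_encoded.reshape(len(integer_encoded), 1)
print(column.shape)

# .all() collapses the whole column into one boolean:
# True only if every element is nonzero.
reduced = column.all()
print(type(reduced), np.ndim(reduced))
```

Because the codes include 0, the reduced value is False, and it has zero dimensions, so it no longer carries any of the sample data.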
Can anyone help me solve this?
[Discussion]:
Tags: python scikit-learn dataset data-science one-hot-encoding