【Question title】: Scikit-learn expanded custom one-hot encoded matrix - not constructed from dataset
【Posted】: 2020-10-07 07:28:35
【Question】:

I am trying to build a one-hot encoded matrix that also represents categories which do not occur in my sample.

With the following code:

s = np.array(['man', 'man', 'woman', 'woman', 'son', 'son', 'son', 'son', 'son'])
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(s)
onehot_encoder = OneHotEncoder(sparse=False)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
Y = onehot_encoder.fit_transform(integer_encoded)
print(Y)

the result is:

[[1. 0. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]]

In reality, though, I have the following categories. Some of them do not occur in my dataset, but I still need to account for them:

categories = np.array(['man', 'woman', 'son', 'daughter', 'boy', 'girl', 'king', 'queen', 'baby', 'child'])

So what I need is something like this:

[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]

So I am trying to work out how to use OneHotEncoder(sparse=False, categories=categories) in this code:

categories = np.array(['man', 'woman', 'son', 'daughter', 'boy', 'girl', 'king', 'queen', 'baby', 'child'])
s = np.array(['man', 'man', 'woman', 'woman', 'son', 'son', 'son', 'son', 'son'])
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(s)
onehot_encoder = OneHotEncoder(sparse=False, categories=categories)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
Y = onehot_encoder.fit_transform(integer_encoded)
print(Y)

But it raises the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

If I change:

integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)

to

integer_encoded = integer_encoded.reshape(len(integer_encoded), 1).all()

I get the following error instead:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Can anyone help me solve this?

【Comments】:

    Tags: python scikit-learn dataset data-science one-hot-encoding


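    【Solution 1】:

    The problem here is the categories parameter of OneHotEncoder.
    Your categories variable is a one-dimensional ndarray, and that is what raises the ValueError: categories is expected to be a list holding one array-like of category values per feature.
    Try passing a regular sorted list, wrapped in an outer list.
    Also, you do not need LabelEncoder in your case: OneHotEncoder can encode the string values directly.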
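To illustrate where the ValueError most likely comes from, here is a minimal NumPy-only sketch. It assumes that the encoder internally compares `categories` with the default value `'auto'`; the exact internal check differs between scikit-learn versions, so treat this as an approximation of the failure and not as the library's actual code.

```python
import numpy as np

categories = np.array(['man', 'woman', 'son'])

# Comparing an ndarray with a scalar is element-wise,
# so the result is a boolean array, not a single bool.
comparison = categories != 'auto'
print(comparison)

# Using that array where one True/False is expected raises ValueError.
try:
    if comparison:
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# A list with one inner list per feature avoids the ambiguous comparison.
fixed = [categories.tolist()]
print(fixed)
```

This also suggests why adding `.all()` to the input did not help: the ambiguous array was `categories`, not the data, and `.all()` reduces the data to a single boolean, which the encoder then rejects because it is no longer two-dimensional.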

    from sklearn.preprocessing import OneHotEncoder
    import numpy as np
    
    # categories must be a list with one list of category values per feature
    categories = [sorted(['man', 'woman', 'son',
                          'daughter', 'boy', 'girl',
                          'king', 'queen', 'baby', 'child'])]
    print(f'sorted categories: {categories}')
    
    # one column (feature) with nine samples
    s = np.array(['man', 'man', 'woman', 'woman',
                  'son', 'son', 'son', 'son', 'son']).reshape(-1, 1)
    # note: scikit-learn 1.2 renamed sparse to sparse_output (sparse was removed in 1.4)
    onehot_encoder = OneHotEncoder(sparse=False, categories=categories)
    Y = onehot_encoder.fit_transform(s)
    print(Y)
    
    sorted categories: [['baby', 'boy', 'child', 'daughter', 'girl', 'king', 'man', 'queen', 'son', 'woman']]
    [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
     [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
     [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]]
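If the columns should follow the order of the original `categories` list instead of alphabetical order, a sketch along these lines may work. It assumes a reasonably recent scikit-learn, where string categories do not have to be sorted (only numeric ones do). The input is cast to `dtype=object`, and the dense matrix is obtained with `.toarray()` so that neither the `sparse` nor the `sparse_output` keyword is needed. The variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Full category list in the order the columns should appear (assumed unsorted on purpose).
categories = ['man', 'woman', 'son', 'daughter', 'boy',
              'girl', 'king', 'queen', 'baby', 'child']

# One feature column; object dtype keeps the string handling consistent.
s = np.array(['man', 'man', 'woman', 'woman',
              'son', 'son', 'son', 'son', 'son'], dtype=object).reshape(-1, 1)

# One inner list per feature; the default output is sparse, so convert it to dense.
encoder = OneHotEncoder(categories=[categories])
Y = encoder.fit_transform(s).toarray()

print(Y.shape)
print(Y)
```

With this ordering, 'man' maps to column 0, 'woman' to column 1 and 'son' to column 2; the remaining seven columns stay all zero for this sample.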
    
