【Question title】: One Hot Encoding list of strings
【Posted】: 2020-08-12 01:50:57
【Question】:

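I have a list of strings that serve as labels for my classification problem (image recognition with a convolutional neural network). Each label consists of 5 to 8 characters (digits 0 to 9 and letters A to Z). To train my network, I want to one-hot encode the labels. I wrote code that encodes a single label, but I'm still struggling to apply it to a list.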

Here is my code for a single label, which works fine:

from numpy import argmax
# define input string
data = '7C24698'
print(data)
# define universe of possible input values
characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ '
# define a mapping of chars to integers
char_to_int = dict((c, i) for i, c in enumerate(characters))
int_to_char = dict((i, c) for i, c in enumerate(characters))
# integer encode input data
integer_encoded = [char_to_int[char] for char in data]
print(integer_encoded)
# one hot encode
onehot_encoded = list()
for value in integer_encoded:
    character = [0 for _ in range(len(characters))]
    character[value] = 1
    onehot_encoded.append(character)
print(onehot_encoded)
# invert encoding
inverted = int_to_char[argmax(onehot_encoded[0])]
print(inverted)

I now want to get the same output for a list of labels and store the result in a new list:

list_of_labels = ['7C24698', 'NDK745']
encoded_labels = []

How can I do that?

    Tags: python list conv-neural-network one-hot-encoding


    【Solution 1】:

    You can use LabelBinarizer from scikit-learn:

    >>> from sklearn.preprocessing import LabelBinarizer
    
    >>> labels = ["first", "second", "third"]
    >>> lb = LabelBinarizer()
    >>> lb.fit(labels)
    >>> lb.transform(labels)
    array([[1, 0, 0],
           [0, 1, 0],
           [0, 0, 1]])
    
    

    To convert the one-hot encoded labels back to string values:

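    >>> import numpy as np
    >>> # pass a NumPy array: inverse_transform calls array methods on its input
    >>> encoded_labels = np.array([
    ...     [1, 0, 0],
    ...     [0, 1, 0],
    ...     [0, 0, 1],
    ... ])
    >>> lb.inverse_transform(encoded_labels)
    array(['first', 'second', 'third'], dtype='<U6')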
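
Note that LabelBinarizer, used this way, treats each whole string as a single class. For labels made of several characters, as in the question, one option is to fit it on the individual characters and transform each label character by character. The following is a minimal sketch under that assumption and is not part of the original answer; the names `characters` and `list_of_labels` are taken from the question, while `decoded_labels` is introduced here for illustration.

```python
from sklearn.preprocessing import LabelBinarizer

# Alphabet from the question (digits, uppercase letters, trailing space).
characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ '

# Fit on single characters so that every character becomes one class.
lb = LabelBinarizer()
lb.fit(list(characters))

list_of_labels = ['7C24698', 'NDK745']

# Each label becomes a 2-D array with one row per character
# and one column per class (37 columns here).
encoded_labels = [lb.transform(list(label)) for label in list_of_labels]

# Invert: map every row back to its character and join them.
decoded_labels = [''.join(lb.inverse_transform(matrix)) for matrix in encoded_labels]
```

LabelBinarizer sorts its classes, so the column order follows `lb.classes_` (the space comes first) rather than the order of the `characters` string.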
    


      【Solution 2】:

      You can wrap your working code in a function, then use the built-in map to apply that one-hot encoding function to each element of list_of_labels:

      from numpy import argmax  # only needed later, to invert the encoding
      
      def my_onehot_encoded(data):
          # define universe of possible input values
          characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ '
          # define a mapping of chars to integers
          char_to_int = dict((c, i) for i, c in enumerate(characters))
          int_to_char = dict((i, c) for i, c in enumerate(characters))
          # integer encode input data
          integer_encoded = [char_to_int[char] for char in data]
          # one hot encode
          onehot_encoded = list()
          for value in integer_encoded:
              character = [0 for _ in range(len(characters))]
              character[value] = 1
              onehot_encoded.append(character)
      
          return onehot_encoded
      
      
      list_of_labels = ['7C24698', 'NDK745']
      encoded_labels = list(map(my_onehot_encoded, list_of_labels))
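
Because the labels have different lengths (5 to 8 characters), the result above is a ragged list: each label yields a different number of rows. If a fixed-size target is needed, one option is to pad every label first. The sketch below is not part of the original answer. It assumes that the trailing space in `characters` is meant as a padding symbol and that no label is longer than 8 characters; `encode_label`, `decode_label`, `MAX_LEN` and `PAD` are hypothetical names introduced for illustration.

```python
CHARACTERS = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ '
CHAR_TO_INT = {c: i for i, c in enumerate(CHARACTERS)}
MAX_LEN = 8  # assumption: labels never exceed 8 characters
PAD = ' '    # assumption: the trailing space is the padding symbol


def encode_label(label, max_len=MAX_LEN):
    """Pad a label to max_len and one-hot encode each character."""
    if len(label) > max_len:
        raise ValueError(f"label {label!r} is longer than {max_len}")
    padded = label.ljust(max_len, PAD)
    onehot = []
    for char in padded:
        row = [0] * len(CHARACTERS)
        row[CHAR_TO_INT[char]] = 1
        onehot.append(row)
    return onehot


def decode_label(onehot):
    """Invert encode_label: take the position of the 1 in each row, strip padding."""
    chars = [CHARACTERS[row.index(max(row))] for row in onehot]
    return ''.join(chars).rstrip(PAD)


list_of_labels = ['7C24698', 'NDK745']
encoded_labels = [encode_label(label) for label in list_of_labels]
decoded_labels = [decode_label(onehot) for onehot in encoded_labels]
```

With this padding every label has the same shape (8 rows of 37 values), so the nested list can be converted to a single array, for example with `numpy.array(encoded_labels)`.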
      

