【Question title】:Encode all categorical elements in array to binary ones
【Posted】:2018-05-28 09:18:13
【Question】:

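I'm taking a machine learning course, and my task is to convert categorical values into binary values so that they are compatible with an algorithm I wrote earlier.

This is where I get the data: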

https://archive.ics.uci.edu/ml/datasets/Mushroom

from numpy import genfromtxt
mushroom = genfromtxt('dane/mushroom.csv', delimiter=',' ,dtype = str)

features = mushroom[:,range(1,23)]
classes = mushroom[:,0]


#7. Attribute Information: (classes: edible=e, poisonous=p)
#     1. cap-shape:                bell=b,conical=c,convex=x,flat=f,
#                                  knobbed=k,sunken=s
#     2. cap-surface:              fibrous=f,grooves=g,scaly=y,smooth=s
#     3. cap-color:                brown=n,buff=b,cinnamon=c,gray=g,green=r,
#                                  pink=p,purple=u,red=e,white=w,yellow=y
#     4. bruises?:                 bruises=t,no=f
#     5. odor:                     almond=a,anise=l,creosote=c,fishy=y,foul=f,
#                                  musty=m,none=n,pungent=p,spicy=s
#     6. gill-attachment:          attached=a,descending=d,free=f,notched=n
#     7. gill-spacing:             close=c,crowded=w,distant=d
#     8. gill-size:                broad=b,narrow=n
#     9. gill-color:               black=k,brown=n,buff=b,chocolate=h,gray=g,
#                                  green=r,orange=o,pink=p,purple=u,red=e,
#                                  white=w,yellow=y
#    10. stalk-shape:              enlarging=e,tapering=t
#    11. stalk-root:               bulbous=b,club=c,cup=u,equal=e,
#                                  rhizomorphs=z,rooted=r,missing=?
#    12. stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
#    13. stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
#    14. stalk-color-above-ring:   brown=n,buff=b,cinnamon=c,gray=g,orange=o,
#                                  pink=p,red=e,white=w,yellow=y
#    15. stalk-color-below-ring:   brown=n,buff=b,cinnamon=c,gray=g,orange=o,
#                                  pink=p,red=e,white=w,yellow=y
#    16. veil-type:                partial=p,universal=u
#    17. veil-color:               brown=n,orange=o,white=w,yellow=y
#    18. ring-number:              none=n,one=o,two=t
#    19. ring-type:                cobwebby=c,evanescent=e,flaring=f,large=l,
#                                  none=n,pendant=p,sheathing=s,zone=z
#    20. spore-print-color:        black=k,brown=n,buff=b,chocolate=h,green=r,
#                                  orange=o,purple=u,white=w,yellow=y
#    21. population:               abundant=a,clustered=c,numerous=n,
#                                  scattered=s,several=v,solitary=y
#    22. habitat:                  grasses=g,leaves=l,meadows=m,paths=p,
#                                  urban=u,waste=w,woods=d

I have an array like this:

x   s   n 
x   s   y

I want to convert the features like this:

x

0, 0, 1

s

0, 1, 0

n

0, 1, 1

y

1, 0, 0

Result:

 0, 0, 1,   0, 1, 0,   0, 1, 1
 0, 0, 1,   0, 1, 0,   1, 0, 0

This is for a course, so it is probably fairly easy, but I would appreciate some help.

Thanks in advance.

【Comments】:

  • I really can't understand your question... do you want to convert numeric values to binary?

Tags: python arrays python-3.x machine-learning


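【Solution 1】:
from functools import reduce
import numpy as np
from numpy import genfromtxt  # needed: the original snippet called genfromtxt without importing it
mushroom = genfromtxt('dane/mushroom.csv', delimiter=',', dtype=str)

features = mushroom[:,range(1,23)]
classes = mushroom[:,0]

def toBinaryFeatures(features):
    COLUMNS = features.shape[1]
    # tag each value with its column index so the same letter in different columns stays distinct
    v = [x + str(i % COLUMNS) for i, x in enumerate(features.flatten())]
    l = features.tolist()
    uv = sorted(set(v)) # unique values of all features; sorted so the bit order is reproducible

    mv = {} # mapping to unique powers of 2
    for i,x in enumerate(uv):
        mv[x] = 2**i

    # OR the powers of 2 of each row together: exactly one bit is set per column
    as_numbers = [reduce((lambda x, y: x | y), [mv[x + str(i)] for i, x in enumerate(row)]) for row in l]
    # write each number as a zero-padded bit string with one position per unique value
    TO_BIN = "{0:0" + str(len(mv)) +"b}"
    flattened_features = [[int(char) for char in TO_BIN.format(number)] for number in as_numbers]
    return np.array(flattened_features)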
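
Note that the function above sets one bit per (value, column) pair, which is a one-hot style encoding rather than the compact 3-bit codes shown in the question. If the mapping from the question is what is needed, a minimal sketch could look like the one below. It assumes that codes are handed out starting at 1 in order of first appearance and are shared across columns, which is inferred from the example in the question and not stated there. The name `to_binary_codes` is hypothetical.

```python
def to_binary_codes(rows):
    """Hypothetical helper: encode each value as a fixed-width binary code.

    Assumption: codes start at 1, follow the order of first appearance,
    and are shared by all columns (as in the x/s/n/y example).
    """
    codes = {}
    for row in rows:
        for value in row:
            # setdefault keeps the existing code if the value was seen before
            codes.setdefault(str(value), len(codes) + 1)

    # number of bits needed to write the largest code
    width = max(codes.values()).bit_length()

    encoded = []
    for row in rows:
        bits = []
        for value in row:
            bits.extend(int(b) for b in format(codes[str(value)], "0{}b".format(width)))
        encoded.append(bits)
    return encoded


sample = [["x", "s", "n"],
          ["x", "s", "y"]]
encoded = to_binary_codes(sample)
print(encoded)
# [[0, 0, 1, 0, 1, 0, 0, 1, 1], [0, 0, 1, 0, 1, 0, 1, 0, 0]]
```

The sketch iterates over rows, so a 2-D NumPy array such as `features` can be passed in directly, and the result can be wrapped in `np.array(...)` if an array is required. Keep in mind that such codes impose an arbitrary numeric ordering on the categories, which some algorithms may pick up on; the one-hot style output of `toBinaryFeatures` avoids that at the cost of more columns.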

【Comments】:
