[Posted]: 2017-12-13 06:54:07
[Question]:
I have a list of colors:
initialData = [u'black', u'black', u'white', u'powderblue',
               u'whitesmoke', u'black', u'cornflowerblue', u'powderblue', u'powderblue',
               u'goldenrod']
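Each color represents a user who chose that color. I also have a label for each color, which denotes a gender:
labels_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
0 means the color was chosen by a female, 1 by a male. I need to use this knowledge to predict gender, and this is how I do it: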
from sklearn import preprocessing
from sklearn.naive_bayes import GaussianNB

# Encode each color name as an integer class index
le = preprocessing.LabelEncoder()
le.fit(initialData)
features_train = le.transform(initialData)
# scikit-learn expects a 2-D array of shape (n_samples, n_features)
features_train = features_train.reshape(-1, 1)

clf = GaussianNB()
clf.fit(features_train, labels_train)
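For reference, a minimal sketch of how a classifier fitted this way might be used for prediction. It repeats the training data so that it runs on its own. `new_colors` is a hypothetical input, and the sketch assumes every color passed to `le.transform` was already seen during `fit`, since `LabelEncoder` raises a `ValueError` on unseen labels.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

# Training data copied from the question
colors = [u'black', u'black', u'white', u'powderblue', u'whitesmoke',
          u'black', u'cornflowerblue', u'powderblue', u'powderblue',
          u'goldenrod']
genders = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

le = LabelEncoder()
X = le.fit_transform(colors).reshape(-1, 1)
clf = GaussianNB().fit(X, genders)

# Hypothetical new users; both colors appear in the training data
new_colors = [u'white', u'goldenrod']
predicted = clf.predict(le.transform(new_colors).reshape(-1, 1))
```

Note that the integer codes are assigned in sorted order of the color names, so `GaussianNB` treats them as points on a numeric scale even though the colors have no natural ordering.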
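But now I want to add more data to my initialData. What if I want to predict gender using not only the color but also the bio? How should I change/normalize my features_train in that case? For example, if I have another list like this:
initialData2 = [u'Hello, my name is Bob and I love to cook', u'happy mother', ...]
And labels for each element:
labels_train2 = [1, 0]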
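One possible way to combine the two inputs, sketched under several assumptions: each user has one color and one bio, the color is one-hot encoded, the bio is turned into TF-IDF features, and the two sparse matrices are stacked side by side. `MultinomialNB` is swapped in for `GaussianNB` here because it accepts sparse, non-negative features. The pairing of colors with bios below is made up for illustration, and `predict_gender` is a hypothetical helper name.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import OneHotEncoder

# Bios and labels come from the question; pairing them with these two
# colors is an assumption made only for this illustration.
colors = [u'black', u'white']
bios = [u'Hello, my name is Bob and I love to cook', u'happy mother']
labels = [1, 0]

color_enc = OneHotEncoder(handle_unknown='ignore')  # unseen colors -> all zeros
text_vec = TfidfVectorizer()

# One row per user: one-hot color columns followed by TF-IDF word columns
X_color = color_enc.fit_transform([[c] for c in colors])
X_text = text_vec.fit_transform(bios)
X_train = hstack([X_color, X_text]).tocsr()

clf = MultinomialNB()
clf.fit(X_train, labels)


def predict_gender(new_colors, new_bios):
    """Encode new users with the already fitted encoders and predict."""
    X_new = hstack([
        color_enc.transform([[c] for c in new_colors]),
        text_vec.transform(new_bios),
    ]).tocsr()
    return clf.predict(X_new)


predicted = predict_gender(colors, bios)
```

The key point is that both encoders are fitted once on the training data and reused for new samples, so the column layout of the stacked matrix stays the same between `fit` and `predict`.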
UPD
A sample of my data. It is a CSV, and I use the columns link_color and description:
_unit_id,_golden,_unit_state,_trusted_judgments,_last_judgment_at,gender,gender:confidence,profile_yn,profile_yn:confidence,created,description,fav_number,gender_gold,link_color,name,profile_yn_gold,profileimage,retweet_count,sidebar_color,text,tweet_coord,tweet_count,tweet_created,tweet_id,tweet_location,user_timezone
815719226,FALSE,finalized,3,10/26/15 23:24,male,1,yes,1,12/5/13 1:48,i sing my own rhythm.,0,,08C2C2,sheezy0,,pbs.twimg.com/profile_images/414342229096808449/fYvzqXN7_normal.png,0,FFFFFF,Robbie E Responds To Critics After Win Against Eddie Edwards In The #WorldTitleSeries t.co/NSybBmVjKZ,,110964,10/26/15 12:40,6.5873E+17,main; @Kan1shk3,Chennai
815719227,FALSE,finalized,3,10/26/15 23:30,male,1,yes,1,10/1/12 13:51,I'm the author of novels filled with family drama and romance.,68,,0084B4,DavdBurnett,,pbs.twimg.com/profile_images/539604221532700673/WW16tBbU_normal.jpeg,0,C0DEED,It felt like they were my friends and I was living the story with them t.co/arngE0YHNO #retired #IAN1 t.co/CIzCANPQFz,,7471,10/26/15 12:40,6.5873E+17,,Eastern Time (US & Canada)
815719228,FALSE,finalized,3,10/26/15 23:33,male,0.6625,yes,1,11/28/14 11:30,louis whining and squealing and all,7696,,ABB8C2,lwtprettylaugh,,pbs.twimg.com/profile_images/657330418249658368/SBLCXdF7_normal.png,1,C0DEED,i absolutely adore when louis starts the songs it hits me hard but it feels good,,5617,10/26/15 12:40,6.5873E+17,clcncl,Belgrade
815719229,FALSE,finalized,3,10/26/15 23:10,male,1,yes,1,6/11/09 22:39,"Mobile guy. 49ers, Shazam, Google, Kleiner Perkins, Yahoo!, Sprint PCS, AirTouch, Air Force. Stanford GSB, UVa. Dad, Husband, Brother. Golfer.",202,,0084B4,douggarland,,pbs.twimg.com/profile_images/259703936/IMG_8444_normal.JPG,0,C0DEED,Hi @JordanSpieth - Looking at the url - do you use @IFTTT?! Don't typically see an advanced user on the @PGATOUR! t.co/H68ou5PE9L,,1693,10/26/15 12:40,6.5873E+17,"Palo Alto, CA",Pacific Time (US & Canada)
815719230,FALSE,finalized,3,10/27/15 1:15,female,1,yes,1,4/16/14 13:23,Ricky Wilson The Best FRONTMAN/Kaiser Chiefs The Best BAND Xxxx Thank you Kaiser Chiefs for an incredible year of gigs and memories to cherish always :) Xxxxxxx,37318,,3B94D9,WilfordGemma,,pbs.twimg.com/profile_images/564094871032446976/AOfpk-mr_normal.jpeg,0,0,Watching Neighbours on Sky+ catching up with the Neighbs!! Xxx Xxx,,31462,10/26/15 12:40,6.5873E+17,,
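A minimal sketch of how the two columns and the label could be pulled out of such a CSV using only the standard library. `read_color_bio_gender` is a hypothetical helper name, and the in-memory sample is trimmed to three columns from two of the rows above so the example is self-contained. It is written for Python 3; for a real file one would pass something like `open(path, newline='')`, where `path` is wherever the CSV is stored.

```python
import csv
import io


def read_color_bio_gender(csv_file):
    """Return (link_colors, descriptions, labels) from an open CSV file.

    Labels follow the question's convention: 0 = female, 1 = male.
    Rows with any other gender value are skipped.
    """
    colors, bios, labels = [], [], []
    for row in csv.DictReader(csv_file):
        gender = row['gender']
        if gender not in ('male', 'female'):
            continue  # no usable label for this row
        colors.append(row['link_color'])
        bios.append(row['description'])
        labels.append(1 if gender == 'male' else 0)
    return colors, bios, labels


# Trimmed sample: only the three columns used here, taken from two rows above
sample = io.StringIO(
    u'gender,description,link_color\n'
    u'male,i sing my own rhythm.,08C2C2\n'
    u'female,Ricky Wilson The Best FRONTMAN/Kaiser Chiefs The Best BAND Xxxx '
    u'Thank you Kaiser Chiefs for an incredible year of gigs and memories '
    u'to cherish always :) Xxxxxxx,3B94D9\n'
)

colors, bios, labels = read_color_bio_gender(sample)
```

Since `link_color` holds hex codes rather than color names, it would still be treated as a categorical value in any encoder applied afterwards.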
[Discussion]:
Tags: python-2.7 machine-learning scikit-learn