【Posted】: 2016-07-03 02:42:52
【Question】:
I have a dataset of job titles that I want to cluster.
The job titles include:
Automotive Service Worker
Community Police Services Aide
DEPUTY SHERIFF
COUNSELOR, JUVENILE HALL
Swimming Instructor
FIREFIGHTER
Porter
Account Clerk
Deputy Sheriff
Assistant Retirement Analyst
POLICE OFFICER III
Patient Care Assistant
Public Service Trainee
PUBLIC RELATIONS OFFICER
SPECIAL NURSE
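I will clean the titles (remove unwanted characters, convert every title to uppercase, and so on) to make things a bit easier. Once I vectorize the corpus, the dimensionality will be very large. Which clustering algorithms would you recommend for a problem like this? Does KMeans perform well on high-dimensional problems?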
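To make the cleaning and vectorization step concrete, here is a minimal sketch using only the Python standard library. The helper names `clean_title` and `bag_of_words` are hypothetical and are not part of any library; the cleaning rule (uppercase, keep only the letters A-Z) is just one assumed interpretation of "remove unwanted characters". It shows that each title becomes a sparse vector whose dimension equals the vocabulary size, which is why the dimensionality grows quickly on a full dataset.

```python
import re
from collections import Counter

# The sample titles listed in the question.
titles = [
    "Automotive Service Worker",
    "Community Police Services Aide",
    "DEPUTY SHERIFF",
    "COUNSELOR, JUVENILE HALL",
    "Swimming Instructor",
    "FIREFIGHTER",
    "Porter",
    "Account Clerk",
    "Deputy Sheriff",
    "Assistant Retirement Analyst",
    "POLICE OFFICER III",
    "Patient Care Assistant",
    "Public Service Trainee",
    "PUBLIC RELATIONS OFFICER",
    "SPECIAL NURSE",
]


def clean_title(title):
    """Hypothetical helper: uppercase the title and keep only A-Z words."""
    upper = title.upper()
    # Replace every run of non-letter characters (commas, digits, ...) with a space.
    letters_only = re.sub(r"[^A-Z]+", " ", upper)
    # Collapse repeated whitespace and trim the ends.
    return " ".join(letters_only.split())


def bag_of_words(cleaned_titles):
    """Hypothetical helper: return the sorted vocabulary and one sparse
    word-count mapping (a Counter) per title."""
    vocabulary = sorted({word for t in cleaned_titles for word in t.split()})
    vectors = [Counter(t.split()) for t in cleaned_titles]
    return vocabulary, vectors


# Cleaning merges case variants such as "DEPUTY SHERIFF" and "Deputy Sheriff".
cleaned = sorted(set(clean_title(t) for t in titles))
vocabulary, vectors = bag_of_words(cleaned)

print(len(titles), "raw titles ->", len(cleaned), "distinct cleaned titles")
print("vector dimension (vocabulary size):", len(vocabulary))
print("non-zero entries per title:", [len(v) for v in vectors])
```

In practice, a library vectorizer such as scikit-learn's `CountVectorizer` or `TfidfVectorizer` would probably replace the hand-written helper; the point here is only that every title has a handful of non-zero entries in a space as wide as the whole vocabulary.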
【Discussion】:
Tags: machine-learning nlp scikit-learn