【Posted】: 2015-08-22 15:12:43
【Question】:
My dataset looks like this:
['', 'ABCDH', '', '', 'H', 'HHIH', '', '', '', '', '', '', '', '', ' ','','','FECABDAI','','','','','','','','','','','','','' ,'','','','','FABHJJFFFFEEFGEE','FFFF','','','','','','','','','','FF ','F','FF','F','F','FFFFFFIFF','','FFFFFFF','F','','','F','','',' ','','','','','F','','','ABB','','','','','','','',' ', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '' ,'','FF','','','','','','','','','','','','','','', '','F','FFEIE','FF','ABABCDIIJCCFG','','FABACFFF','FEGGIHJCABAGGFEFGGFEECA','','FF','FFGEFGGFFG','F','FFF',' ', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'I' , '', '', 'ABIIII', '', '', '', '', 'I', '', '', '', '', '', '', '', '' , '', '', '', '', '', '', '', '', '', '', 'AAAAA', 'AFGFE', 'FGFEEFGFEFGFEFGJJGFEACHJ', '', '', ' ', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ' ','JFEFFFFFFF','','AAIIJFFGEFGCABAGG','','','','','','','','','','F','JFJFJFJ','' ,'','','','','','','','','','','','','','','','',' ','','','F','','','','','','','','','F','','','',' ', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '' ,'','','','','','','','','','','','','','','','',' ', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '' ,'','','','','','','','','F','FGFEFGFE','','','','','','' ,'','','','','','','','','','','','','','','','',' ', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '' ,'','','','','','','','','','','','','','','','',' ', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
This is just a sample; I will have many more strings. How can I cluster them so that each cluster captures some pattern?
【Comments】:
-
By the way: is it really mandatory to store and pass around all those empty values?
-
No, empty strings can be ignored for now.
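-
A minimal sketch of the filtering step mentioned in the reply above, assuming the data is a plain Python list. The helper name `drop_blank` is hypothetical, and treating whitespace-only entries such as `' '` and trailing spaces such as `'FF '` as noise is an assumption, not something the question states.

```python
def drop_blank(strings):
    """Strip surrounding whitespace and skip entries that are empty afterwards."""
    return [s.strip() for s in strings if s.strip()]


# A short excerpt from the start of the dataset in the question.
sample = ['', 'ABCDH', '', '', 'H', 'HHIH', ' ', 'FF ', 'F']
cleaned = drop_blank(sample)
print(cleaned)  # ['ABCDH', 'H', 'HHIH', 'FF', 'F']
```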
-
Is this bioinformatics-like data?
-
Is the alphabet limited? Will it only ever be 7-8 characters, or could it be all 26 English letters or an even larger vocabulary?
-
Only A-J for now; a few more characters may be added later.
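-
One possible starting point, given the hierarchical-clustering tag: a rough sketch of single-linkage agglomerative clustering over a length-normalized edit distance, in pure Python. Everything here is an assumption rather than a known-good recipe for this data: the function names, the choice of Levenshtein distance, the 0.4 threshold, and the small sample of strings picked from the question. The pairwise loop is quadratic or worse, so it only suits small inputs.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance via the usual row-by-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1]; both strings must be non-empty."""
    return levenshtein(a, b) / max(len(a), len(b))


def single_linkage(strings, threshold):
    """Repeatedly merge the two closest clusters until the closest pair
    is farther apart than `threshold` (hypothetical cut-off)."""
    clusters = [[s] for s in strings]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(normalized_distance(x, y)
                    for x in clusters[i] for y in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hand-picked values that occur in the question's list, blanks included.
raw = ['', 'FF ', 'FFF', 'FFFF', ' ', 'ABB', 'ABCDH',
       'FGFEFGFE', 'AFGFE', 'JFJFJFJ', 'FF']
strings = sorted({s.strip() for s in raw if s.strip()})  # drop blanks, dedupe
clusters = single_linkage(strings, threshold=0.4)
for cluster in clusters:
    print(cluster)
```

Single linkage tends to chain similar strings together (`FF`, `FFF`, `FFFF`), which may or may not be the kind of "pattern" wanted; the threshold and the distance function are the two knobs to experiment with.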
Tags: machine-learning cluster-analysis k-means hierarchical-clustering