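【Posted】: 2018-03-04 10:06:22
【Question】:
I am trying to perform document clustering. The input is a JSON string containing various keys and values of string and numeric types. Based on which keys are present and on their types and values, I should be able to cluster each document together with others of a similar kind.
For example, the JSON documents: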
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Jeans"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Shirt"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Jeans"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Jeans"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Top"},
{"title":0, "Bname":"Brand1", "weight":"100", "type":"Top"},
{"title":0, "Bname":"Lee", "height":"2864", "type":"refrigerator"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Top"},
{"title":0, "Time":"Casio", "Price":"2000", "type":"watch"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Top"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Shirt"}
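I want to cluster the documents based on their matching parameters.
I would like to know which approaches can do this and which Java machine learning libraries could be used.
So far I know of clustering algorithms such as KMeans and DBSCAN, but I am not sure how to reduce a JSON string to a vector, or how to run clustering on the result.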
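To make the "JSON to vector" step concrete, here is a minimal sketch of one possible encoding, not a definitive answer. It assumes each document has already been parsed into a flat `Map<String, String>` (the parsing itself is left out, and numeric values such as `"title":0` are treated as the string `"0"`). Every distinct `key=value` pair becomes one binary feature, and two documents are compared with Jaccard distance, which a density-based algorithm such as DBSCAN could use. The class name `JsonVectorizer` and the helpers `doc`, `buildVocabulary`, `vectorize` and `jaccardDistance` are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JsonVectorizer {

    // Hypothetical helper: builds a flat document from alternating key/value strings.
    // Stands in for a real JSON parser, which is assumed to exist elsewhere.
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    // Assigns a column index to every distinct "key=value" token seen in the documents.
    static Map<String, Integer> buildVocabulary(List<Map<String, String>> docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (Map<String, String> d : docs) {
            for (Map.Entry<String, String> e : d.entrySet()) {
                vocab.putIfAbsent(e.getKey() + "=" + e.getValue(), vocab.size());
            }
        }
        return vocab;
    }

    // One-hot encodes a document: 1.0 where the document contains the token, else 0.0.
    static double[] vectorize(Map<String, String> d, Map<String, Integer> vocab) {
        double[] v = new double[vocab.size()];
        for (Map.Entry<String, String> e : d.entrySet()) {
            Integer idx = vocab.get(e.getKey() + "=" + e.getValue());
            if (idx != null) {
                v[idx] = 1.0;
            }
        }
        return v;
    }

    // Jaccard distance between two binary vectors: 1 - |A and B| / |A or B|.
    static double jaccardDistance(double[] a, double[] b) {
        int both = 0;
        int either = 0;
        for (int i = 0; i < a.length; i++) {
            boolean inA = a[i] > 0;
            boolean inB = b[i] > 0;
            if (inA && inB) both++;
            if (inA || inB) either++;
        }
        return either == 0 ? 0.0 : 1.0 - (double) both / either;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("title", "0", "brand", "Levis", "length", "28,30,32,34,36", "type", "Jeans"));
        docs.add(doc("title", "0", "brand", "Levis", "length", "28,30,32,34,36", "type", "Shirt"));
        docs.add(doc("title", "0", "Time", "Casio", "Price", "2000", "type", "watch"));

        Map<String, Integer> vocab = buildVocabulary(docs);
        double[] jeans = vectorize(docs.get(0), vocab);
        double[] shirt = vectorize(docs.get(1), vocab);
        double[] watch = vectorize(docs.get(2), vocab);

        System.out.println("jeans vs shirt: " + jaccardDistance(jeans, shirt));
        System.out.println("jeans vs watch: " + jaccardDistance(jeans, watch));
    }
}
```

With this encoding, documents that share most of their keys and values end up close together, while documents with different schemas are far apart. Variations worth considering are adding a feature per key alone (so that documents with the same fields but different values still look related) and splitting comma-separated values such as `length` into one token per element.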
【Discussion】:
Tags: java json machine-learning cluster-analysis data-science