【Posted】: 2012-02-19 07:43:24
【Question】:
For example, I have billions of phrases, and I want to cluster the ones that are similar.
strings.to.cluster <- c(
  "Best Toyota dealer in bay area. Drive out with a new car today",
  "Largest Selection of Furniture. Stock updated everyday",
  " Unique selection of Handcrafted Jewelry",
  "Free Shipping for orders above $60. Offer Expires soon",
  "XXXX is where smart men buy anniversary gifts",
  "2012 Camrys on Sale. 0% APR for select customers",
  "Closing Sale on office desks. All Items must go"
)
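Assume this vector has hundreds of thousands of rows. Is there an R package that can cluster these phrases by meaning? Or can someone suggest a way to rank phrases by how similar in meaning they are to a given phrase?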
【Comments】:
- How would you define "meaning"? Which of your example phrases should cluster together?
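One way to make that question concrete is to start from a working definition such as "two phrases are similar if they share many words". This is only a rough stand-in for meaning. The sketch below uses base R only: it builds a binary phrase-by-term matrix, computes cosine similarity, and clusters hierarchically. The `tokenize` helper, the four short phrases, and the choice of `k = 2` are assumptions made for illustration, not part of the question.

```r
# Hypothetical helper: lower-case, replace punctuation with spaces, split on whitespace.
tokenize <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9 ]+", " ", x)
  strsplit(trimws(x), " +")
}

# Illustrative phrases (made up for this sketch, loosely modeled on the question's examples).
phrases <- c(
  "Closing Sale on office desks",
  "Sale on office desks and chairs",
  "2012 Camrys on Sale",
  "New Camrys on sale today"
)

tokens <- tokenize(phrases)
vocab  <- sort(unique(unlist(tokens)))

# Binary phrase-by-term matrix: one row per phrase, one column per vocabulary word.
dtm <- t(vapply(tokens,
                function(tk) as.numeric(vocab %in% tk),
                numeric(length(vocab))))
colnames(dtm) <- vocab

# Cosine similarity between every pair of phrases.
norms   <- sqrt(rowSums(dtm^2))
cos_sim <- (dtm %*% t(dtm)) / (norms %o% norms)

# Hierarchical clustering on cosine distance; k = 2 is an arbitrary choice here.
hc       <- hclust(as.dist(1 - cos_sim), method = "average")
clusters <- cutree(hc, k = 2)

# Rank all phrases by similarity to the first one (the first entry is the phrase itself).
ranking <- order(cos_sim[1, ], decreasing = TRUE)
```

Word overlap treats "car" and "Camry" as unrelated, so it does not capture meaning in the sense the question asks about. The full pairwise similarity matrix also grows quadratically with the number of phrases, so this form would not scale to hundreds of thousands of rows without a sparse representation or an approximate method.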
Tags: r statistics nlp