Classifying text into different classes depending on similarity

【Question title】: Classifying text into different classes depending on similarity
【Posted】: 2015-11-23 08:09:28
【Question description】:

I am working with very large documents {NEWS + Articles} and want to model natural-language sentences as classes. Consider the following example:

1- The System enables a user to shut down the server remotely ==> class 1

2- The Application allows a customer to close the machine online ==> (must also be) class 1. Why?

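Because the two sentences share many synonymous terms {System ~ Application, enables ~ allows, user ~ customer, shut down ~ close, server ~ machine, remotely ~ online}. So I am training a classifier on some data using word-similarity rules, or synonyms + stemming + possibly lemmatization, using whichever combination of rules gives the best results.

So the question is: what is the best strategy for configuring/tuning a classifier to fit this idea? Thanks in advance.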
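To make the idea concrete, here is a minimal sketch of the normalization step described above: each word is mapped to a canonical representative so that both example sentences end up with the same tokens. The `SYNONYMS` table, the `STOPWORDS` set, and the `normalize` helper are hypothetical names invented for this illustration; a real pipeline would more likely derive synonyms from a resource such as WordNet and use a proper stemmer or lemmatizer instead of plain lowercasing.

```python
import re

# Hypothetical, hand-written synonym table (an assumption for this sketch):
# each word maps to one canonical representative of its synonym group.
SYNONYMS = {
    "application": "system",
    "allows": "enables",
    "customer": "user",
    "close": "shut_down",
    "machine": "server",
    "online": "remotely",
}

# Tiny illustrative stopword list, not a standard one.
STOPWORDS = {"the", "a", "to"}


def normalize(sentence):
    """Lowercase, tokenize, drop stopwords, and map synonyms to one form."""
    # Treat the phrasal verb "shut down" as a single token.
    text = sentence.lower().replace("shut down", "shut_down")
    tokens = re.findall(r"[a-z_]+", text)
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOPWORDS]


s1 = "The System enables a user to shut down the server remotely"
s2 = "The Application allows a customer to close the machine online"

print(normalize(s1))
print(normalize(s2))
print(normalize(s1) == normalize(s2))
```

After this step the two sentences are indistinguishable to any downstream classifier, which is the behavior the question asks for.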

【Question comments】:

Take a look at: radimrehurek.com/gensim/models/doc2vec.html

Tags: nlp nltk computer-science weka svm

【Solution 1】:

Have you seen this?

Is there an algorithm that tells the semantic similarity of two phrases

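The most important part is deciding what similarity means, that is, which similarity measure to use. Once you have done that, choosing the classifier is the easy part of the task (ID3, C4.5, bag of words, Naive Bayes, etc.).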
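As a rough illustration of "similarity measure first, classifier second", the sketch below uses Jaccard similarity over token sets and a plain nearest-neighbour rule. Both are choices made for this example only, not something the answer prescribes, and the `jaccard` and `classify` helpers, the training data, and the "class 2" example are all hypothetical. The tokens are assumed to be already normalized (lowercased, synonyms merged).

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def classify(tokens, labeled_examples):
    """Return (label, score) of the most similar labeled example."""
    best_label, best_score = None, -1.0
    for example_tokens, label in labeled_examples:
        score = jaccard(tokens, example_tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical training data, invented for this sketch.
training = [
    (["system", "enables", "user", "shut_down", "server", "remotely"], "class 1"),
    (["report", "lists", "monthly", "sales", "figures"], "class 2"),
]

# "online" was deliberately left unmapped to show a partial match.
query = ["system", "enables", "user", "shut_down", "server", "online"]

label, score = classify(query, training)
print(label, round(score, 3))
```

Swapping `jaccard` for another measure (for example cosine similarity over document vectors) changes the results far more than swapping the classification rule, which is the point the answer makes.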

【Comments】: