NLP: Classification giving wrong result. How to find out that the result from NLP Classification is wrong?

【Question title】: NLP: Classification giving wrong result. How to find out that the result from NLP Classification is wrong?
【Posted】: 2017-05-12 09:57:02
【Question】:

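I have started learning natural language processing and have already started stumbling.

I am using Node.js to build my application with the help of the NaturalNode library (Natural Node GitHub project).

Problem

I am training my classifier with a few scenarios, as shown below: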

/// importing package
var natural = require('natural');
var classifier = new natural.BayesClassifier();



/// training documents
classifier.addDocument("h", "greetings");
classifier.addDocument("hi", "greetings");
classifier.addDocument("hello", "greetings");
classifier.addDocument("data not working", "internet_problem");
classifier.addDocument("browser not working", "internet_problem");
classifier.addDocument("google not working", "internet_problem");
classifier.addDocument("facebook not working", "internet_problem");
classifier.addDocument("internet not working", "internet_problem");
classifier.addDocument("websites not opening", "internet_problem");
classifier.addDocument("apps not working", "internet_problem");
classifier.addDocument("call drops", "voice_problem");
classifier.addDocument("voice not clear", "voice_problem");
classifier.addDocument("call not connecting", "voice_problem");
classifier.addDocument("calls not going through", "voice_problem");
classifier.addDocument("disturbance", "voice_problem");
classifier.addDocument("bye", "close");
classifier.addDocument("thank you", "feedback_positive");
classifier.addDocument("thanks", "voice_problem");
classifier.addDocument("shit", "feedback_negeive");
classifier.addDocument("shit", "feedback_negeive");
classifier.addDocument("useless", "feedback_negetive");
classifier.addDocument("siebel testing", "siebel_testing");


classifier.train();


/// running classification
console.log('result for hi');
console.log(classifier.classify('hi'));
console.log('result for hii');
console.log(classifier.classify('hii'));
console.log('result for h');
console.log(classifier.classify('h'));

Output

result for hi:
greetings


result for hii:
internet_problem

result for h:
internet_problem

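As you can see, the result for the keyword hi is correct, but if I misspell hi as hii or ih, the classifier gives the wrong result. I don't understand how the classification works, how I should train the classifier, or whether there is a way to find out that the classification result is wrong so that I can ask the user to enter the input again.

Any help, explanation, or anything at all is highly appreciated. Thanks a lot.

Please consider me a noob and forgive any mistakes.

【Question comments】:

Tags: node.js nlp classification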

【Solution 1】:

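Your classifier has never seen hii or ih before, so unless natural.BayesClassifier does some preprocessing of the input, it has no information about them. It therefore classifies them using the prior probability derived from the frequency of each class label: internet_problem is the most common label among your 22 training examples.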
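
To make the prior concrete, here is a minimal sketch that computes class priors from the label counts in the question's training data. It does not use natural at all: computePriors and mostLikelyLabel are hypothetical helper names, and natural's actual implementation may apply smoothing or other details not shown here.

```javascript
// Label counts tallied from the 22 addDocument calls in the question.
// Labels are spelled exactly as in the question, typos included.
const labelCounts = {
  greetings: 3,
  internet_problem: 7,
  voice_problem: 6, // includes "thanks", which the question labels voice_problem
  close: 1,
  feedback_positive: 1,
  feedback_negeive: 2,
  feedback_negetive: 1,
  siebel_testing: 1,
};

// Hypothetical helper: prior of a label = its count / total number of documents.
function computePriors(counts) {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  const priors = {};
  for (const [label, n] of Object.entries(counts)) {
    priors[label] = n / total;
  }
  return priors;
}

// Hypothetical helper: the label a classifier would pick with no other evidence.
function mostLikelyLabel(priors) {
  return Object.entries(priors).reduce(
    (best, entry) => (entry[1] > best[1] ? entry : best)
  )[0];
}

const priors = computePriors(labelCounts);
console.log(mostLikelyLabel(priors));             // internet_problem
console.log(priors.internet_problem.toFixed(3));  // 0.318
```

With no usable evidence from the input, the label with the largest prior wins, which matches the internet_problem result reported for hii.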

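Edit 29/12/2016: As mentioned in the comments, you can handle this by prompting the user to re-enter any input whose classification confidence falls below a given minimum threshold: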

const MIN_CONFIDENCE = 0.2; // Tune this

let classLabel = null;
do {
    const userInput = getUserInput(); // Get user input somehow (placeholder, not defined here)
    // getClassifications returns one { label, value } entry per class
    const classifications = classifier.getClassifications(userInput);
    const bestClassification = classifications[0]; // assumes the highest score comes first
    if (bestClassification.value < MIN_CONFIDENCE) {
        // Re-prompt user in the next iteration
    } else {
        classLabel = bestClassification.label;
    }
} while (classLabel === null);
// Do something with the label
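
A fixed threshold is only meaningful if the scores are on a known scale. The sketch below assumes the scores may not sum to 1, so it normalizes them first and also requires a margin between the best and second-best label. It works on any array of { label, value } objects like the one used in the loop above; assessConfidence, minConfidence, and minMargin are hypothetical names, and the example scores are made up for illustration.

```javascript
// Hypothetical helper: decide whether the top classification is trustworthy.
// Input shape assumed: [{ label: string, value: number }, ...]
function assessConfidence(classifications, minConfidence = 0.5, minMargin = 0.1) {
  const total = classifications.reduce((sum, c) => sum + c.value, 0);
  if (total <= 0) {
    // Nothing to normalize (empty list or all-zero scores): reject.
    return { label: null, confidence: 0, accepted: false };
  }
  // Normalize so the scores sum to 1, then rank from highest to lowest.
  const ranked = classifications
    .map((c) => ({ label: c.label, confidence: c.value / total }))
    .sort((a, b) => b.confidence - a.confidence);
  const best = ranked[0];
  const runnerUp = ranked[1] || { confidence: 0 };
  const accepted =
    best.confidence >= minConfidence &&
    best.confidence - runnerUp.confidence >= minMargin;
  return { label: best.label, confidence: best.confidence, accepted };
}

// Made-up scores: one clear winner, one near-tie.
const clearCase = assessConfidence([
  { label: 'greetings', value: 0.09 },
  { label: 'internet_problem', value: 0.005 },
  { label: 'voice_problem', value: 0.005 },
]);
const unclearCase = assessConfidence([
  { label: 'internet_problem', value: 0.035 },
  { label: 'voice_problem', value: 0.03 },
  { label: 'greetings', value: 0.015 },
]);
console.log(clearCase.accepted, unclearCase.accepted); // true false
```

When accepted is false, the caller can re-prompt the user instead of acting on the label. Both thresholds need tuning on real inputs.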

【Comments】:

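Is there any way to find out whether the classification gave a wrong result, so that I can ask the user to re-enter the statement? Thank you very much for your insight.
According to the natural node documentation, you can access the classifier's confidence with console.log(classifier.getClassifications('i am long copper'));. If your prediction relies only on the prior probabilities of your classes, its confidence level should be relatively low.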
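
A complementary check, sketched here as an assumption rather than anything natural provides, is to detect when the input shares no words with the training data, since those are the cases that fall back on the prior. The tokenizer below is deliberately naive and differs from natural's own tokenizing and stemming; tokenize, buildVocabulary, and knownTokenRatio are hypothetical helper names.

```javascript
// Naive tokenizer: lowercase and split on anything that is not a-z.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
}

// Collect every token that appears in the training texts.
function buildVocabulary(trainingTexts) {
  const vocabulary = new Set();
  for (const text of trainingTexts) {
    for (const token of tokenize(text)) {
      vocabulary.add(token);
    }
  }
  return vocabulary;
}

// Fraction of input tokens that were seen during training (0 for empty input).
function knownTokenRatio(input, vocabulary) {
  const tokens = tokenize(input);
  if (tokens.length === 0) return 0;
  const known = tokens.filter((t) => vocabulary.has(t)).length;
  return known / tokens.length;
}

// A few of the training texts from the question.
const vocabulary = buildVocabulary(['hi', 'hello', 'internet not working', 'call drops']);
console.log(knownTokenRatio('hi', vocabulary));            // 1
console.log(knownTokenRatio('hii', vocabulary));           // 0
console.log(knownTokenRatio('internet down', vocabulary)); // 0.5
```

A ratio of 0 means the classifier had no known words to work with, so asking the user to rephrase is probably safer than trusting the returned label.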