Is Stanford Sentiment Analysis biased towards negative? Answers

【Question title】: Stanford Sentiment Analysis is biased towards negative?
【Posted】: 2017-07-21 03:01:49
【Question description】:

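I am doing some research on existing sentiment analysis applications. I am currently looking at Stanford CoreNLP / Sentiment Analysis 3.8.0, and the predictions on my test data appear to be biased towards negative. Here are some examples that return Negative: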

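New York is where I ultimately hope to spend my teaching career, and this opportunity is too good to pass up. - Negative
I understand that being an effective and impactful teacher is a responsibility, but I am eager to make time before, during, and after class to ensure that I am an available resource for my students. - Negative
From my personal experience, I learned many necessary life skills in the classroom, and my most influential teachers were my motivators and supporters. - Negative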

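I checked, and there is only one model available to use (so I don't think there is any lever to pull there; I do not want to train a model). I could use a different (perhaps better?) POS tagger, which might give me a different prediction, but I am a little puzzled: all the blogs and comments I have read about the Stanford library are positive, yet my results are poor. Am I missing something?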

Code:

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // process() already creates and annotates the document, so a second annotate() call is redundant
    Annotation document = pipeline.process(text);

    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    int mainSentiment=0; int longest = 0;
    SimpleMatrix matrix = null;
    for (CoreMap sentence : sentences) {
        String s_sentiment = sentence.get(SentimentCoreAnnotations.SentimentClass.class);

        Tree tree = sentence
                .get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
        int sentiment = RNNCoreAnnotations.getPredictedClass(tree);
        matrix = RNNCoreAnnotations.getPredictions(tree);

        System.out.println(sentence);
        System.out.println(sentiment + "-" +s_sentiment + "\t" + matrix.elementMaxAbs());
    }

Possible score values: 0 = Very negative, 1 = Negative, 2 = Neutral, 3 = Positive, 4 = Very positive
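The loop in the question prints only the largest value of the prediction matrix, which hides how close the runner-up class was. As a minimal sketch, assuming the five class probabilities have already been copied into a plain `double[]` (the class `SentimentLabels` and both of its methods are hypothetical helpers, not part of CoreNLP), the index-to-label mapping listed above and an arg-max over the distribution could look like this:

```java
/** Hypothetical helper: maps class indices 0..4 to the labels listed in the question. */
final class SentimentLabels {
    // Labels in index order: 0 = Very negative ... 4 = Very positive.
    private static final String[] LABELS = {
        "Very negative", "Negative", "Neutral", "Positive", "Very positive"
    };

    private SentimentLabels() {}

    /** Returns the readable label for a predicted class index. */
    static String labelFor(int predictedClass) {
        if (predictedClass < 0 || predictedClass >= LABELS.length) {
            throw new IllegalArgumentException("Unexpected class index: " + predictedClass);
        }
        return LABELS[predictedClass];
    }

    /** Returns the index of the largest entry in a probability vector. */
    static int argMax(double[] probabilities) {
        if (probabilities.length == 0) {
            throw new IllegalArgumentException("Empty probability vector");
        }
        int best = 0;
        for (int i = 1; i < probabilities.length; i++) {
            if (probabilities[i] > probabilities[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

Printing the whole vector next to `SentimentLabels.labelFor(SentimentLabels.argMax(probabilities))` makes it easier to see whether a "Negative" result narrowly beat "Neutral" or was a confident prediction.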

If you are using this library in a production application, have you found the results reliable enough to drive actions?

【Question comments】:

Tags: java nlp stanford-nlp sentiment-analysis

【Solution 1】:

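First of all, as of version 3.3.1 there is not just one model that can be passed as an argument to the sentiment.model option, but two (sadly, this does not seem to be mentioned on the website):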

A five-class model (Very negative, Negative, Neutral, Positive, Very positive): edu/stanford/nlp/models/sentiment/sentiment.ser.gz
A binary model (Negative, Neutral, Positive): edu/stanford/nlp/models/sentiment/sentiment.binary.ser.gz

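The second model is not part of the standard model set but of the additional models-english artifact; to use it you need to fetch that artifact, which could be better documented. The appropriate Maven dependency is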

    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>${stanford-corenlp.version}</version>
        <classifier>models-english</classifier>
        <scope>runtime</scope>
    </dependency>
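Once that artifact is on the classpath, the model is selected through the sentiment.model property. The following is a minimal sketch that only builds the `Properties` object; the class name `SentimentPipelineConfig`, the constant names, and the `forModel` helper are made up for illustration, while the two paths and the annotator list are the ones quoted in this thread.

```java
import java.util.Properties;

/** Hypothetical helper that builds pipeline properties for a chosen sentiment model. */
final class SentimentPipelineConfig {
    // Model paths as quoted in the answer.
    static final String FINE_GRAINED_MODEL =
            "edu/stanford/nlp/models/sentiment/sentiment.ser.gz";
    static final String BINARY_MODEL =
            "edu/stanford/nlp/models/sentiment/sentiment.binary.ser.gz";

    private SentimentPipelineConfig() {}

    /** Same annotators as in the question, plus an explicit sentiment.model path. */
    static Properties forModel(String modelPath) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        props.setProperty("sentiment.model", modelPath);
        return props;
    }
}
```

The result would then be handed to the pipeline exactly as in the question, for example `new StanfordCoreNLP(SentimentPipelineConfig.forModel(SentimentPipelineConfig.BINARY_MODEL))`. If the models-english artifact is missing, loading the binary model is expected to fail because the path cannot be resolved.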

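As described in their 2013 paper, they used a corpus of movie reviews to create their models, and that data is quite possibly not the best fit for analyzing the kind of language you are working with: for example, […] even though it is a relatively common term.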

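I have also tried using their pre-trained models to analyze conversational language, and the results were okay but not amazing: simply creating lists of positive and negative patterns and looking for them in my texts gave an accuracy that was not significantly different from that of the sentiment analyzer.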
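For comparison, a pattern-list baseline of the kind described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class `LexiconBaseline` is hypothetical, the two word lists are toy examples rather than the lists used in the answer, and single lowercase words stand in for the more general "patterns".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical word-list baseline: counts positive and negative words in a text. */
final class LexiconBaseline {
    // Toy lists for illustration only; a usable list would be larger and domain-specific.
    private static final Set<String> POSITIVE = new HashSet<>(
            Arrays.asList("good", "great", "eager", "supporter", "opportunity"));
    private static final Set<String> NEGATIVE = new HashSet<>(
            Arrays.asList("bad", "poor", "terrible", "refuse", "failure"));

    private LexiconBaseline() {}

    /** Positive hits minus negative hits over the lowercased word tokens. */
    static int score(String text) {
        int score = 0;
        // Split on anything that is not a lowercase letter or apostrophe.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (POSITIVE.contains(token)) {
                score++;
            } else if (NEGATIVE.contains(token)) {
                score--;
            }
        }
        return score;
    }

    /** Maps the score to a three-way label. */
    static String classify(String text) {
        int s = score(text);
        if (s > 0) {
            return "Positive";
        }
        if (s < 0) {
            return "Negative";
        }
        return "Neutral";
    }
}
```

Running such a baseline and the CoreNLP annotator over the same labeled sample is a cheap way to check whether the pre-trained model actually adds accuracy for a given kind of text.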

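【Comments】:

Thanks for the reply! I am using version 3.8, and it does not recognize the path to the second model. I also cloned their repository and can no longer see the second model there. Maybe they decided to get rid of it.
That is because it is not part of the standard model set; please see my edit.
@Poc Did this answer help you? If so, I would really appreciate you accepting and/or upvoting it.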