Error: annotator "sentiment" requires annotator "binarized_trees"

【Question title】: Error: Annotator "sentiment" requires annotator "binarized_trees"
【Posted】: 2015-08-08 16:27:39
【Question】:

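Can anyone help me understand when this error occurs? Any ideas are much appreciated. Do I need to add anything, such as another annotator? Is it a problem with the data I am passing in, or with using a model other than the default one?

I am using Stanford NLP 3.4.1 to compute sentiment for social media data. When I run it as a Spark/Scala job, I get the following error for some of the data.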

java.lang.IllegalArgumentException: annotator "sentiment" requires annotator "binarized_trees"
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:300)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:129)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:125)
    at com.pipeline.sentiment.NonTwitterSentimentAndThemeProcessorAction$.create(NonTwitterTextEnrichmentComponent.scala:142)
    at com.pipeline.sentiment.NonTwitterTextEnrichmentInitialized.action$lzycompute(NonTwitterTextEnrichmentComponent.scala:52)
    at com.pipeline.sentiment.NonTwitterTextEnrichmentInitialized.action(NonTwitterTextEnrichmentComponent.scala:50)
    at com.pipeline.sentiment.NonTwitterTextEnrichmentInitialized.action(NonTwitterTextEnrichmentComponent.scala:49)

Here is my Scala code:

 def create(features: Seq[String] = Seq("tokenize", "ssplit", "pos","parse","sentiment")): TwitterSentimentAndThemeAction = {
      println("comes inside the TwitterSentimentAndThemeProcessorAction create method")
      val props = new Properties()
      props.put("annotators", features.mkString(", "))
      props.put("pos.model", "tagger/gate-EN-twitter.model");
      props.put("parse.model", "tagger/englishSR.ser.gz");
      val pipeline = new StanfordCoreNLP(props)

Any help is much appreciated. Thank you.

【Comments】:

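Are you running this code on one machine and in one thread?
No, I am running it on Hadoop/Spark with 200 partitions.
Huh; I thought sentiment only required the parse annotator. What happens if you add the BinarizerAnnotator explicitly? That is, add binarizer to the annotators and add the following to the properties: props.setProperty("customAnnotatorClass.binarizer", "edu.stanford.nlp.pipeline.BinarizerAnnotator")
Thanks, Gabor. I added it like this: props.put("pos.model", "tagger/gate-EN-twitter.model") props.put("parse.model", "tagger/englishSR.ser.gz"); props.setProperty("customAnnotatorClass.binarizer", "edu.stanford.nlp.pipeline.BinarizerAnnotator") but no luck, it gives the same error. I was using the default PCFG parser and then switched to the shift-reduce parser because of this question: stackoverflow.com/questions/30413885/…. Thanks for your help.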
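
For reference, the suggestion in the comments could be sketched as below. This is only a sketch of the properties, not a confirmed fix (the asker reports the same error after trying it). The object name `BinarizerProps` is hypothetical, and placing `binarizer` between `parse` and `sentiment` is an assumption; the comment only says to add it to the annotators.

```scala
import java.util.Properties

// Hypothetical helper object, used here only to illustrate the comment's suggestion.
object BinarizerProps {
  def build(): Properties = {
    // Assumption: the binarizer runs after "parse" and before "sentiment".
    val annotators = Seq("tokenize", "ssplit", "pos", "parse", "binarizer", "sentiment")
    val props = new Properties()
    props.setProperty("annotators", annotators.mkString(", "))
    // Registers the custom annotator name "binarizer", exactly as given in the comment.
    props.setProperty("customAnnotatorClass.binarizer",
      "edu.stanford.nlp.pipeline.BinarizerAnnotator")
    props
  }
}
```

The resulting `Properties` would then be passed to `new StanfordCoreNLP(props)`, which requires CoreNLP and its models on the classpath.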

Tags: nlp stanford-nlp sentiment-analysis pos-tagger

【Solution 1】:

...Are you sure this is the error you are getting? With your code, I get this error instead:

Loading parser from serialized file tagger/englishSR.ser.gz ...edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Unable to resolve "tagger/englishSR.ser.gz" as either class path, filename or URL

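which makes more sense. The shift-reduce parser model lives at edu/stanford/nlp/models/srparser/englishSR.ser.gz. If I don't use the shift-reduce model, the code as written works fine for me; likewise, if I use the model path above, it works fine.

The exact code I tried is: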

#!/bin/bash
exec scala -J-mx4g "$0" "$@"
!#

import scala.collection.JavaConversions._
import edu.stanford.nlp.pipeline._
import java.util._

val props = new Properties()
props.put("annotators", Seq("tokenize", "ssplit", "pos","parse","sentiment").mkString(", "))
props.put("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
val pipeline = new StanfordCoreNLP(props)
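
Since the error message above lists three ways a model path can be resolved (class path, filename, or URL), it may help to check a path before building the pipeline, especially on cluster workers where a relative path such as `tagger/englishSR.ser.gz` might not exist. The following is a minimal sketch under that assumption; `ModelPathCheck` and `resolveModel` are hypothetical names, and this is not CoreNLP's own lookup code.

```scala
import java.io.File
import java.net.URI
import scala.util.Try

// Hypothetical helper, not part of CoreNLP.
object ModelPathCheck {
  // Returns how the path could be resolved: "classpath", "file", "url", or None.
  def resolveModel(path: String): Option[String] = {
    val loader = Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(ClassLoader.getSystemClassLoader)
    // 1. A resource on the class path (for example, inside a models jar).
    def onClasspath = Option(loader.getResource(path)).map(_ => "classpath")
    // 2. A regular file relative to the working directory, or an absolute path.
    def onDisk = if (new File(path).isFile) Some("file") else None
    // 3. A syntactically valid absolute URL (no network access is attempted).
    def asUrl = Try(new URI(path).toURL).toOption.map(_ => "url")
    onClasspath.orElse(onDisk).orElse(asUrl)
  }
}
```

Calling `ModelPathCheck.resolveModel("tagger/englishSR.ser.gz")` on each worker and logging a `None` result would show whether the model is visible there at all.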

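【Comments】:

Gabor, thanks for looking into it. I downloaded the compatible 3.4.1 model englishSR.ser.gz and put it in the tagger directory. I don't see the error in the normal case. I get this error when I run on Spark/Hadoop with more than 200 partitions.
I suspect this is a misconfiguration of your Spark cluster. I'm not qualified to help there...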