Executing and testing the Stanford CoreNLP example: answers

[Question title]: Executing and testing stanford core nlp example
[Posted]: 2013-12-03 18:55:43
[Question description]:

I downloaded the Stanford CoreNLP package and tried to test it on my machine.

Using the command: java -cp "*" -mx1g edu.stanford.nlp.sentiment.SentimentPipeline -file input.txt

I got sentiment results in the form of positive or negative. input.txt contains the sentences to test.

With a further command, java -cp stanford-corenlp-3.3.0.jar;stanford-corenlp-3.3.0-models.jar;xom.jar;joda-time.jar -Xmx600m edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,parse -file input.txt, execution prints the following:

H:\Drive E\Stanford\stanfor-corenlp-full-2013~>java -cp stanford-corenlp-3.3.0.jar;stanford-corenlp-3.3.0-models.jar;xom.jar;joda-time.jar -Xmx600m edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,parse -file input.txt
Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [36.6 sec].
Adding annotator lemma
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [13.7 sec].

Ready to process: 1 files, skipped 0, total 1
Processing file H:\Drive E\Stanford\stanfor-corenlp-full-2013~\input.txt ... writing to H:\Drive E\Stanford\stanfor-corenlp-full-2013~\input.txt.xml {
  Annotating file H:\Drive E\Stanford\stanfor-corenlp-full-2013~\input.txt [13.681 seconds]
} [20.280 seconds]
Processed 1 documents
Skipped 0 documents, error annotating 0 documents
Annotation pipeline timing information:
PTBTokenizerAnnotator: 0.4 sec.
WordsToSentencesAnnotator: 0.0 sec.
POSTaggerAnnotator: 1.8 sec.
MorphaAnnotator: 2.2 sec.
ParserAnnotator: 9.1 sec.
TOTAL: 13.6 sec. for 10 tokens at 0.7 tokens/sec.
Pipeline setup: 58.2 sec.
Total time for StanfordCoreNLP pipeline: 79.6 sec.

H:\Drive E\Stanford\stanfor-corenlp-full-2013~>

That much is understandable, but it gives no useful result.

I have an example: stanford core nlp java output

import java.io.*;
import java.util.*;

import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

public class StanfordCoreNlpDemo {

  public static void main(String[] args) throws IOException {
    PrintWriter out;
    if (args.length > 1) {
      out = new PrintWriter(args[1]);
    } else {
      out = new PrintWriter(System.out);
    }
    PrintWriter xmlOut = null;
    if (args.length > 2) {
      xmlOut = new PrintWriter(args[2]);
    }

    StanfordCoreNLP pipeline = new StanfordCoreNLP();
    Annotation annotation;
    if (args.length > 0) {
      annotation = new Annotation(IOUtils.slurpFileNoExceptions(args[0]));
    } else {
      annotation = new Annotation("Kosgi Santosh sent an email to Stanford University. He didn't get a reply.");
    }

    pipeline.annotate(annotation);
    pipeline.prettyPrint(annotation, out);
    if (xmlOut != null) {
      pipeline.xmlPrint(annotation, xmlOut);
    }
    // An Annotation is a Map and you can get and use the various analyses individually.
    // For instance, this gets the parse tree of the first sentence in the text.
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    if (sentences != null && sentences.size() > 0) {
      CoreMap sentence = sentences.get(0);
      Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      out.println();
      out.println("The first sentence parsed is:");
      tree.pennPrint(out);
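    }
    // PrintWriter buffers its output: flush/close it so nothing is lost when main returns.
    out.flush();
    if (xmlOut != null) {
      xmlOut.close();
    }
  }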

}

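I tried to execute it in NetBeans with the necessary libraries included, but it always gets stuck partway through or throws the exception Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

I have set the memory to be allocated in the property/run/VM box.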
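A quick way to check whether the heap setting from the NetBeans VM box (or `-Xmx`/`-mx` on the command line) actually reached the JVM is to print the maximum heap size at startup. This is a minimal sketch using only the standard `Runtime` API; the class name `HeapCheck` is arbitrary, and the printed value can be slightly below the configured limit. It only confirms the setting took effect, not how much heap CoreNLP needs.

```java
public class HeapCheck {
  // Returns the maximum heap the JVM will attempt to use, in whole megabytes.
  static long maxHeapMb() {
    return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
  }

  public static void main(String[] args) {
    // Run with e.g. `java -Xmx2g HeapCheck`; the value should be close to the -Xmx setting.
    System.out.println("Max heap: " + maxHeapMb() + " MB");
  }
}
```

Placing the same `println` at the top of the demo's `main` shows whether the IDE's VM options are being applied to that run configuration.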

Any idea how to run the above Java example from the command line?

I want to get the sentiment score for the example.

UPDATE

Output of: java -cp "*" -mx1g edu.stanford.nlp.sentiment.SentimentPipeline -file input.txt

Output of: java -cp stanford-corenlp-3.3.0.jar;stanford-corenlp-3.3.0-models.jar;xom.jar;joda-time.jar -Xmx600m edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,parse -file input.txt

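[Question discussion]:

What is produced by one of your examples? (i.e. "H:\Drive E\Stanford\stanfor-corenlp-full-2013~\input.txt.xml")
@home: OutOfMemoryError. I have already searched the web and looked into solutions, but the same error remains.
@ElliottFrisch: Please see, I have updated the question.
If you run out of memory, try the -mx?g setting in the example command, where ? is the number of GB of RAM allocated to run the code. Increase it until it succeeds.

Tags: java nlp stanford-nlp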

[Solution 1]:

You can do the following in your code:

String text = "I am feeling very sad and frustrated.";
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse, sentiment");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
<...>
Annotation annotation = pipeline.process(text);
List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
  String sentiment = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
  System.out.println(sentiment + "\t" + sentence);
}

It will print the sentiment of each sentence followed by the sentence itself, e.g. for "I am feeling very sad and frustrated.":

Negative    I am feeling very sad and frustrated.
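Since the question asks for a sentiment *score* rather than a label, one option is to map the label returned by `SentimentClass` onto a number. This sketch assumes the five labels of the CoreNLP sentiment model ("Very negative", "Negative", "Neutral", "Positive", "Very positive") correspond to 0 through 4 in that order, and averages them over sentences. The class and helper names (`SentimentScore`, `labelToScore`, `averageScore`) are hypothetical, not part of CoreNLP, and the code uses only the JDK.

```java
import java.util.Arrays;
import java.util.List;

public class SentimentScore {
  // Ordered from most negative (0) to most positive (4); assumed to match CoreNLP's labels.
  private static final List<String> LABELS =
      Arrays.asList("Very negative", "Negative", "Neutral", "Positive", "Very positive");

  // Maps a sentence-level label to 0..4; throws for labels outside the assumed set.
  static int labelToScore(String label) {
    int idx = LABELS.indexOf(label);
    if (idx < 0) {
      throw new IllegalArgumentException("Unknown sentiment label: " + label);
    }
    return idx;
  }

  // Averages per-sentence scores; returns 2.0 (neutral) when there are no sentences.
  static double averageScore(List<String> sentenceLabels) {
    if (sentenceLabels.isEmpty()) {
      return 2.0;
    }
    int sum = 0;
    for (String label : sentenceLabels) {
      sum += labelToScore(label);
    }
    return (double) sum / sentenceLabels.size();
  }

  public static void main(String[] args) {
    List<String> labels = Arrays.asList("Negative", "Very positive");
    System.out.println(averageScore(labels)); // prints 2.5
  }
}
```

To use it with the loop above, collect each `sentiment` string into a list and pass it to `averageScore`. Treating an empty document as neutral is a design choice of this sketch; throwing an exception instead would be equally reasonable.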

[Comments]:

Where in the program do you pass in the input sentence?
I added it to the example: Annotation annotation = pipeline.process(text);

[Solution 2]:

You need to add the "sentiment" annotator to the list of annotators:

-annotators tokenize,ssplit,pos,lemma,parse,sentiment

This will add a "sentiment" attribute to every sentence node in your XML.
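With the command-line pipeline, the results are written to `input.txt.xml` rather than the console, which is why the run looked as if it produced nothing useful. Below is a minimal sketch that reads such a file with the JDK's DOM parser and collects the `sentiment` attribute of each sentence. It assumes the attribute name given in this answer and that sentences sit under a `<sentences>` element; the trimmed sample XML in `main` is illustrative, not real CoreNLP output, and the class name `XmlSentiment` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlSentiment {
  // Collects the "sentiment" attribute of each <sentence> that is a direct child of <sentences>.
  // Restricting to direct children skips other <sentence> elements elsewhere in the file
  // (for example inside a coreference section).
  static List<String> sentenceSentiments(InputStream xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
    List<String> result = new ArrayList<>();
    NodeList containers = doc.getElementsByTagName("sentences");
    for (int c = 0; c < containers.getLength(); c++) {
      NodeList children = containers.item(c).getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        Node n = children.item(i);
        if (n instanceof Element && "sentence".equals(n.getNodeName())) {
          // getAttribute returns "" when the attribute is absent (sentiment annotator not run).
          result.add(((Element) n).getAttribute("sentiment"));
        }
      }
    }
    return result;
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical trimmed sample; a real run would open input.txt.xml with a FileInputStream.
    String sample = "<root><document><sentences>"
        + "<sentence id=\"1\" sentiment=\"Negative\"/>"
        + "<sentence id=\"2\"/>"
        + "</sentences></document></root>";
    InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
    System.out.println(sentenceSentiments(in)); // prints [Negative, ]
  }
}
```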

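[Comments]:

I found this example, where sentiment analysis is done with only 4 of the 6 annotators: POS and lemma are left out. How does that affect the results? Example: blog.openshift.com/…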

[Solution 3]:

According to the example here, you need to run the sentiment analysis like this:

java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -file input.txt

Apparently this is a memory-intensive operation, and 1 GB alone may not be enough to complete it. After that, you can use the "evaluation tool":

java -cp "*" edu.stanford.nlp.sentiment.Evaluate edu/stanford/nlp/models/sentiment/sentiment.ser.gz input.txt

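[Comments]:

Elliott: You are right, but I chose -mx1g based on my system configuration, and it also works on the command line.
Do you have any experience testing the sentiment part of Stanford CoreNLP?
No, my experience is with commercial sentiment engines, and you did get sentiment scores in that image.
I'm sorry! I'm new to NLP. I couldn't find the sentiment score in the image; maybe I'm missing a convention. I would appreciate it if you could explain it.
Yes. This seems to work about as well as I would expect random code found on "the web" to work.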

[Solution 4]:

This works fine for me:

Maven dependencies:

        <dependency>
            <groupId>edu.stanford.nlp</groupId>
            <artifactId>stanford-corenlp</artifactId>
            <version>3.5.2</version>
            <classifier>models</classifier>
        </dependency>
        <dependency>
            <groupId>edu.stanford.nlp</groupId>
            <artifactId>stanford-corenlp</artifactId>
            <version>3.5.2</version>
        </dependency>
        <dependency>
            <groupId>edu.stanford.nlp</groupId>
            <artifactId>stanford-parser</artifactId>
            <version>3.5.2</version>
        </dependency>

Java code:

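import java.io.IOException;
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class SentimentDemo { // wrapper class name is arbitrary
  public static void main(String[] args) throws IOException {
    String text = "This World is an amazing place";
    // The sentiment annotator works on parse trees, so "parse" must precede "sentiment".
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse, sentiment");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation annotation = pipeline.process(text);
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
      String sentiment = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
      System.out.println(sentiment + "\t" + sentence);
    }
  }
}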

Result:

Very positive    This World is an amazing place

[Comments]: