【Posted】: 2015-02-23 16:46:18
【Question】:
Following on from this question, I am trying to perform lemmatization with Stanford CoreNLP. My environment is:
- Java 1.7
- Eclipse 3.4.0
- Stanford CoreNLP version 3.4.1 (downloaded from here).
My code snippet is:
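// Imports needed by this snippet (at the top of the file):
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
// The rest goes inside a method:
//...........lemmatization starts........................
Properties props = new Properties();
// The lemma annotator runs after tokenization, sentence splitting and POS tagging
props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
String text = "painting";
Annotation document = pipeline.process(text);
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        String word = token.get(TextAnnotation.class);   // surface form of the token
        String lemma = token.get(LemmaAnnotation.class); // lemma assigned by the pipeline
        System.out.println("lemmatized version :" + lemma);
    }
}
//...........lemmatization ends.........................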
The output I get is:
lemmatized version :painting
whereas I was expecting:
lemmatized version :paint
Please advise.
【Comments】:
Tags: java-7 stanford-nlp eclipse-3.4 lemmatization