Number named entity recognition in Stanford: answers

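【Question title】: Number name entity recognition in Stanford
【Posted】: 2017-05-06 15:03:24
【Question description】:

I am trying to recognize number named entities in text using Stanford. When the text contains, for example, "20 million", the result comes back as "Number":["20-5","million-6"]. How can I improve the result so that "20 million" is returned as a single entity? And how can I ignore the index numbers (5 and 6 in the example above)? I am using Java.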

    // pipeline, number and n are fields declared on the enclosing class.
    public void extractNumbers(String text) throws IOException {
        number = new HashMap<String, ArrayList<String>>();
        n = new ArrayList<String>();
        edu.stanford.nlp.pipeline.Annotation document = new edu.stanford.nlp.pipeline.Annotation(text);
        pipeline.annotate(document);
        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                // Skip tokens outside any entity, then keep only those tagged NUMBER.
                if (!token.get(CoreAnnotations.NamedEntityTagAnnotation.class).equals("O")) {
                    if (token.get(CoreAnnotations.NamedEntityTagAnnotation.class).equals("NUMBER")) {
                        // toString() is what produces values such as "20-5" in the output above.
                        n.add(token.toString());
                        number.put("Number", n);
                    }
                }
            }
        }
    }

【Question comments】:

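You might want to expand on this a bit. Which NER model are you using? Which language? A code snippet showing what you have done would also help.
@entrophy I edited the question :)
Which class is the pipeline object here? That is, which Stanford pipeline are you using?
@entrophy ' StanfordCoreNLP pipeline; Annotation annotation; Properties props = new Properties(); props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner"); pipeline = new StanfordCoreNLP(props); '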

Tags: nlp stanford-nlp opennlp

【Solution 1】:

To get the exact text from any CoreLabel object, use token.originalText() instead of token.toString().

If you need anything else from these tokens, see the CoreLabel javadoc.
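The answer above covers the index suffix. It does not cover the other half of the question, returning "20 million" as one entity. One possible approach, sketched here under assumptions, is to join adjacent tokens that carry the NUMBER tag. The `Tagged` class and the `groupNumbers` method are hypothetical names introduced for illustration and are not part of CoreNLP. In a real pipeline the two values would presumably come from `token.originalText()` and `token.ner()` on each CoreLabel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NumberMentionGrouper {

    /** Hypothetical stand-in for a CoreLabel: surface text plus NER tag. */
    static final class Tagged {
        final String word;
        final String ner;

        Tagged(String word, String ner) {
            this.word = word;
            this.ner = ner;
        }
    }

    /** Joins each run of consecutive NUMBER tokens into one space-separated string. */
    static List<String> groupNumbers(List<Tagged> tokens) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (Tagged t : tokens) {
            if ("NUMBER".equals(t.ner)) {
                // Extend the run that is currently open.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(t.word);
            } else if (current.length() > 0) {
                // A non-NUMBER token closes the open run.
                mentions.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush a run that reaches the end of the input.
        if (current.length() > 0) {
            mentions.add(current.toString());
        }
        return mentions;
    }

    public static void main(String[] args) {
        List<Tagged> tokens = Arrays.asList(
                new Tagged("It", "O"), new Tagged("cost", "O"),
                new Tagged("20", "NUMBER"), new Tagged("million", "NUMBER"),
                new Tagged("in", "O"), new Tagged("2017", "DATE"));
        System.out.println(groupNumbers(tokens)); // prints [20 million]
    }
}
```

This sketch treats any unbroken run of NUMBER tokens as one mention, so two separate numbers that happen to sit next to each other would also be merged. Writing `"NUMBER".equals(t.ner)` with the constant first keeps the comparison safe when a token has no tag.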

【Comments】:

This worked for my second question, thank you very much.