Title: Simple CoreNLP - how to get all the nouns into an array?
Posted: 2017-06-10 05:14:29
Question:

I'm using Stanford Simple CoreNLP. I need to get all the nouns into the nounPhrases array. My me() method gives the following output:

The parse of the sentence 'I like java and python' is (ROOT (S (NP (PRP I)) (VP (VBP like) (NP (NN java) (CC and) (NN python)))))

Here is my method:

public String s = "I like java and python";

public static Set<String> nounPhrases = new HashSet<>();

public void me() {

    Document doc = new Document(" " + s);
    for (Sentence sent : doc.sentences()) {

        System.out.println("The parse of the sentence '" + sent + "' is " + sent.parse());

        if (sent.parse().equals("NN") || sent.parse().equals("NNS") || sent.parse().equals("NNP")
                || sent.parse().equals("NNPS")) {

            // I need to assign all nouns to the array nounPhrases

        }

    }
}

I'm not sure whether my if condition is right, since I'm new to Stanford NLP. Please help me put the nouns into this array.

I took the sample code from the URL below and customized it a little.

Simple CoreNLP


    Tags: java arrays parsing stanford-nlp


    Answer 1:

    If anyone needs a complete, up-to-date version of this solution, here it is:

    import java.util.HashSet;
    import java.util.Properties;
    import java.util.Set;
    
    import edu.stanford.nlp.pipeline.CoreDocument;
    import edu.stanford.nlp.pipeline.CoreSentence;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    
    
    public class BasicPipelineExample4 {
    
      public static String text = "Joe Smith was born in California. "+
      "Study studying studied. " +
      "In 2017, he went to Paris, France in the summer. " +
      "His flight left at 3:00pm on July 10th, 2017. " +
      "After eating some escargot for the first time, Joe said, \"That was delicious!\" " +
      "He sent a postcard to his sister Jane Smith. " +
      "He is ok. " +
      "Simple, right? Remove removed removing was were is are element at given gave give index, insert it at desired index. Let's see if it works for the second test case."+
      "He is ok to go now. " +
      "After hearing about Joe's trip, Jane decided she might go to France one day.";
    
      public static void main(String[] args) {
          Properties props = new Properties();
          // set the list of annotators to run ("parse" is needed for constituencyParse())
          props.setProperty("annotators", "tokenize,ssplit,pos,parse");
          // build pipeline
          StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
          // create a document object
          CoreDocument doc = new CoreDocument(text);
          // annotate the document
          pipeline.annotate(doc);
      
          Set<String> nounPhrases = new HashSet<>();
      
          for (CoreSentence sent : doc.sentences()) {
      
              System.out.println("The parse of the sentence '" + sent + "' is " + sent.constituencyParse());
              // Iterate over every word in the sentence
              for (int i = 0; i < sent.tokens().size(); i++) {
                  // Condition: if the word is a noun (POS tag starts with "NN")
                  if (sent.posTags() != null && sent.posTags().get(i) != null && sent.posTags().get(i).startsWith("NN")) {
                      // Put the word into the Set
                      nounPhrases.add(sent.tokens().get(i).originalText());
                  }
              }
          }
      
          System.out.println("Nouns: " + nounPhrases);
      }
    
    }
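    The noun test itself can be pulled out and checked without loading any CoreNLP models. Below is a minimal sketch, assuming the words and their tags are already available as two parallel lists (as `sent.tokens()` and `sent.posTags()` provide in the answer above). The class and method names `NounFilterSketch`, `isNounTag` and `collectNouns` are hypothetical, not part of the CoreNLP API. It uses a `LinkedHashSet` so the nouns come out in sentence order, which a plain `HashSet` does not guarantee.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of CoreNLP): filters words by Penn Treebank noun tags.
class NounFilterSketch {

    // The Penn Treebank noun tags are NN, NNS, NNP and NNPS; all of them start with "NN".
    static boolean isNounTag(String tag) {
        return tag != null && tag.startsWith("NN");
    }

    // Assumes words.get(i) carries the tag tags.get(i), i.e. the lists are aligned.
    static Set<String> collectNouns(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        // LinkedHashSet drops duplicates but, unlike HashSet, keeps first-seen order.
        Set<String> nouns = new LinkedHashSet<>();
        for (int i = 0; i < words.size(); i++) {
            if (isNounTag(tags.get(i))) {
                nouns.add(words.get(i));
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Tags written by hand for "I like java and python", matching the parse in the question.
        List<String> words = List.of("I", "like", "java", "and", "python");
        List<String> tags = List.of("PRP", "VBP", "NN", "CC", "NN");
        System.out.println(collectNouns(words, tags)); // prints [java, python]
    }
}
```

    For the Penn Treebank tag set, `contains("NN")` and `startsWith("NN")` select exactly the same tags; `startsWith` just states the intent more directly.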
    


      Answer 2:

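      Your condition is almost right. You want every word whose POS tag contains "NN", i.e. every noun. However, `sent.parse()` returns the parse tree of the whole sentence, not a single tag, so comparing it with "NN" never succeeds. To check each word's POS tag, you have to iterate over every word in the sentence. Based on your code, it could look like this: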

      public String s = "I like java and python";
      
      public static Set<String> nounPhrases = new HashSet<>();
      
      public void me() {
      
          Document doc = new Document(" " + s);
          for (Sentence sent : doc.sentences()) {
      
              System.out.println("The parse of the sentence '" + sent + "' is " + sent.parse());
              //Iterate over every word in the sentence
              for(int i = 0; i < sent.words().size(); i++) {
                  //Condition: if the word is a noun (its POS tag contains "NN": NN, NNS, NNP or NNPS)
                  if (sent.posTag(i) != null && sent.posTag(i).contains("NN")) {
                      //Put the word into the Set
                      nounPhrases.add(sent.word(i));
                  }
              }
          }
      }
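      Alternatively, the nouns can be read straight from the bracketed parse that `me()` prints, where each noun appears as a pre-terminal such as `(NN java)`. The sketch below is plain Java with no CoreNLP dependency, assuming the tree is available as that Penn Treebank-style string (for example `sent.parse().toString()`, which is what the `println` in the question shows). The class `ParseStringNouns` and its method are hypothetical. Walking the tree object itself is the sturdier option; the regex is only meant to make the structure of the output visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of CoreNLP): reads nouns out of a bracketed parse string.
class ParseStringNouns {

    // A pre-terminal with a noun tag, e.g. "(NN java)" or "(NNP Joe)".
    // NNP?S? matches exactly NN, NNS, NNP and NNPS; group 2 captures the word.
    private static final Pattern NOUN_LEAF =
            Pattern.compile("\\((NNP?S?)\\s+([^()\\s]+)\\)");

    // Returns the nouns in the order they appear in the parse string.
    static List<String> nounsFromParse(String bracketedParse) {
        List<String> nouns = new ArrayList<>();
        Matcher m = NOUN_LEAF.matcher(bracketedParse);
        while (m.find()) {
            nouns.add(m.group(2));
        }
        return nouns;
    }

    public static void main(String[] args) {
        // The parse printed by me() in the question.
        String parse = "(ROOT (S (NP (PRP I)) (VP (VBP like) (NP (NN java) (CC and) (NN python)))))";
        System.out.println(nounsFromParse(parse)); // prints [java, python]
    }
}
```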
      
