【Question title】: How to parse tagged words using Stanford NLP
【Posted】: 2013-11-09 04:07:40
【Question】:

I have a list of tagged sentences stored in a txt file in the following format:

We_PRP 've_VBP just_RB wrapped_VBN up_RP with_IN the_DT boys_NNS of_IN Block_NNP B_NNP
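Each token joins a word and its Penn Treebank tag with an underscore. Below is a minimal sketch for splitting one such line into (word, tag) pairs. It assumes tokens are separated by whitespace and that the tag follows the *last* underscore, so a word that itself contains `_` stays intact. The class and method names `TaggedLineSplitter` and `split` are hypothetical and are not part of Stanford NLP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Stanford NLP class): splits one line of
// "word_TAG" tokens into {word, tag} pairs.
class TaggedLineSplitter {

    // Returns one two-element array {word, tag} per token in the line.
    static List<String[]> split(String line) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // a blank line produces a single empty token
            }
            // Assumption: the tag is everything after the last underscore.
            int sep = token.lastIndexOf('_');
            if (sep <= 0 || sep == token.length() - 1) {
                throw new IllegalArgumentException("Token is not in word_TAG form: " + token);
            }
            pairs.add(new String[] { token.substring(0, sep), token.substring(sep + 1) });
        }
        return pairs;
    }
}
```

For example, `TaggedLineSplitter.split("We_PRP 've_VBP just_RB")` yields the pairs `{"We", "PRP"}`, `{"'ve", "VBP"}` and `{"just", "RB"}`, with no tokenizer involved.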

Now I want to parse these sentences, and I found the following code:

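import java.util.Collection;
import java.util.List;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.trees.*;

String filename = "tt.txt";
// Load the default English PCFG model (lp was undefined in the original snippet)
LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
// DocumentPreprocessor loads the file, splits it into sentences and tokenizes it.
// You could also create a tokenizer and pass it to DocumentPreprocessor.
for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
    Tree parse = lp.apply(sentence);
    parse.pennPrint();
    System.out.println();

    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    System.out.println(tdl);
    System.out.println();
}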

The parse output is very long. I suspect the problem is the line new DocumentPreprocessor(filename), which actually re-tags my sentences. Is there any way to skip the tagging step?

【Question comments】:

    Tags: nlp stanford-nlp


    【Solution 1】:

    You can find the answer in the Parser FAQ. I tried it and it works for me:

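    import java.util.ArrayList;
    import java.util.List;

    import edu.stanford.nlp.ling.TaggedWord;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.trees.Tree;

    // Set up grammar and options as appropriate (here: default English PCFG, no extra flags)
    String grammar = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
    String[] options = {};
    LexicalizedParser lp = LexicalizedParser.loadModel(grammar, options);
    String[] sent3 = { "It", "can", "can", "it", "." };
    // Without help, the parser gets the tag of the second "can" wrong
    String[] tag3 = { "PRP", "MD", "VB", "PRP", "." };
    // Pair each word with its tag so the parser can use the supplied tags
    List<TaggedWord> sentence3 = new ArrayList<>();
    for (int i = 0; i < sent3.length; i++) {
      sentence3.add(new TaggedWord(sent3[i], tag3[i]));
    }
    Tree parse = lp.parse(sentence3);
    parse.pennPrint();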
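The FAQ snippet expects parallel `String[]` arrays of words and tags. A minimal sketch for filling such arrays from a file in the question's format, assuming one sentence per line and the tag after the last underscore; the class `TaggedFileReader` and its nested `Sentence` holder are hypothetical names, not Stanford NLP API. It uses only the JDK and never calls `DocumentPreprocessor`, so no re-tokenization happens.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader: turns a "word_TAG" file (one sentence per line) into
// parallel word/tag arrays shaped like sent3/tag3 in the FAQ example.
class TaggedFileReader {

    // Hypothetical holder for one sentence's parallel arrays.
    static final class Sentence {
        final String[] words;
        final String[] tags;

        Sentence(String[] words, String[] tags) {
            this.words = words;
            this.tags = tags;
        }
    }

    static List<Sentence> read(Reader in) throws IOException {
        List<Sentence> sentences = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines between sentences
            }
            String[] tokens = line.split("\\s+");
            String[] words = new String[tokens.length];
            String[] tags = new String[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                // Assumption: the tag is everything after the last underscore.
                int sep = tokens[i].lastIndexOf('_');
                if (sep <= 0 || sep == tokens[i].length() - 1) {
                    throw new IOException("Malformed token: " + tokens[i]);
                }
                words[i] = tokens[i].substring(0, sep);
                tags[i] = tokens[i].substring(sep + 1);
            }
            sentences.add(new Sentence(words, tags));
        }
        return sentences;
    }
}
```

Usage might look like `TaggedFileReader.read(new java.io.FileReader("tt.txt"))`; for each returned `Sentence s`, the loop in the answer can then build `new TaggedWord(s.words[i], s.tags[i])` and pass the list to `lp.parse(...)`.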
    
