【Question title】: Stanford POSTagger with UIMA
【Posted】: 2015-07-27 06:51:02
【Question】:

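I am trying to build a POS (part-of-speech) tagger as a component in a UIMA pipeline. I downloaded the Stanford POSTagger JAR, added it to the project, and copied the English model over, but it throws an exception.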

My code:

package com.gauge.ie.uimaproject;

import java.io.IOException;

import org.apache.uima.UIMAException;
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CASException;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class POSTagger extends JCasAnnotator_ImplBase
{
    public static String SOFA_NAME="";
    MaxentTagger tagger = new MaxentTagger("tagger/bidirectional-distsim-wsj-0-18.tagger");

    @Override
    public void process(JCas jcas)throws AnalysisEngineProcessException
    {

        try
        {
            String text="";
            JCas newJCas=jcas.createView(SOFA_NAME);

            System.out.println("getting doc text.......");

            String docText = jcas.getDocumentText();
            String tagged=tagger.tagString(docText);
            System.out.println(tagged);
            newJCas.setDocumentText(tagged);
        }
        catch(CASException cae)
        {
            System.out.println(cae);
        }
    }
}

Exception:

Reading POS tagger model from tagger/bidirectional-distsim-wsj-0-18.tagger ... org.apache.uima.resource.ResourceInitializationException: Could not instantiate Annotator class "com.gauge.ie.uimaproject.POSTagger". Check that your annotator class is not abstract and has a zero-argument constructor.  (Descriptor: <unknown>)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:250)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
    at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
    at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:430)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:374)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:187)
    at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
    at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
    at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:140)
    at com.gauge.ie.uimaproject.pipeline.main(pipeline.java:27)
Caused by: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:869)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:767)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:298)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:263)
    at com.gauge.ie.uimaproject.POSTagger.<init>(POSTagger.java:20)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at java.lang.Class.newInstance(Class.java:442)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:240)
    ... 16 more
Caused by: java.io.InvalidClassException: edu.stanford.nlp.tagger.maxent.ExtractorDistsim; local class incompatible: stream classdesc serialVersionUID = 1, local class serialVersionUID = 2
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at java.util.HashMap.readObject(HashMap.java:1396)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1896)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readExtractors(MaxentTagger.java:595)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:820)
    ... 26 more
org.apache.uima.resource.ResourceInitializationException: Could not instantiate Annotator class "com.gauge.ie.uimaproject.POSTagger". Check that your annotator class is not abstract and has a zero-argument constructor.  (Descriptor: <unknown>)

【Question comments】:

    Tags: stanford-nlp uima part-of-speech


    【Answer 1】:

    Before writing your own integration code, I suggest taking a look at DKPro and its integration of the Stanford PoS tagger. It may save you some time:

    https://code.google.com/p/dkpro-core-asl/wiki/ComponentList_1_6_2#POS_Tagging

    http://dkpro-core-gpl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-gpl/tags/de.tudarmstadt.ukp.dkpro.core-gpl-1.6.2/apidocs/index.html?de/tudarmstadt/ukp/dkpro/core/stanfordnlp/StanfordPosTagger.html

    If you really want to write the code yourself, you can look at their source:

    https://github.com/dkpro/dkpro-core/blob/master/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl/src/main/java/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/StanfordPosTagger.java

    As far as I can tell, they instantiate the tagger with a different constructor:

    String modelFile = aUrl.toString();
    MaxentTagger tagger = new MaxentTagger(modelFile,
            StringUtils.argsToProperties(new String[] { "-model", modelFile }), false);
    

    【Comments】:

    • I can't use DKPro in my project for now, but the link looks useful. Thanks anyway.
    【Answer 2】:

    You are trying to load a POS tagger model that is incompatible with the version of the POS tagger you are using:

    Caused by: java.io.InvalidClassException: edu.stanford.nlp.tagger.maxent.ExtractorDistsim; 
      local class incompatible: 
        stream classdesc serialVersionUID = 1, 
        local class serialVersionUID = 2
    

    As a result, the POS tagger cannot deserialize the model. Make sure you are using a model that matches your tagger version.
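
To make the failure mode concrete, here is a minimal sketch that reproduces the same kind of `InvalidClassException` using only the JDK. The `Extractor` class is a hypothetical stand-in for `ExtractorDistsim`, and the "old model" is simulated by patching the `serialVersionUID` bytes in the serialized stream rather than by loading a real tagger model. It illustrates the mechanism only; it is not Stanford's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class SerialVersionMismatchDemo {

    // Hypothetical stand-in for a serialized model component.
    // The locally compiled class declares serialVersionUID = 2.
    static class Extractor implements Serializable {
        private static final long serialVersionUID = 2L;
        String name = "distsim";
    }

    // Serializes an object to a byte array with standard Java serialization.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Returns a copy of the stream whose class descriptor carries a different
    // serialVersionUID, simulating a file written by another library version.
    // Layout: magic(2) + version(2) + TC_OBJECT(1) + TC_CLASSDESC(1)
    //         + name length(2) + class name + serialVersionUID(8).
    static byte[] withStreamUid(byte[] stream, Class<?> type, long uid) {
        byte[] copy = stream.clone();
        int nameLength = type.getName().getBytes(StandardCharsets.UTF_8).length;
        int offset = 4 + 1 + 1 + 2 + nameLength;
        for (int i = 0; i < 8; i++) {
            copy[offset + i] = (byte) (uid >>> (56 - 8 * i));
        }
        return copy;
    }

    // Attempts to deserialize and reports the outcome as a short string.
    static String tryLoad(byte[] stream) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            in.readObject();
            return "loaded";
        } catch (InvalidClassException e) {
            return "InvalidClassException";
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] current = serialize(new Extractor());
        byte[] old = withStreamUid(current, Extractor.class, 1L);
        System.out.println(tryLoad(current)); // prints "loaded"
        System.out.println(tryLoad(old));     // prints "InvalidClassException"
    }
}
```

The second load fails because the stream says `serialVersionUID = 1` while the class on the classpath says `2`, which is the situation the stack trace above reports for `ExtractorDistsim`.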

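    【Comments】:

    • But this model works when I use it without the UIMA pipeline.
    • Check your classpath again. Most likely you have several JARs on the classpath that contain the same classes, and when running under UIMA the incompatible one happens to take precedence over the compatible one. In any case, I strongly recommend using CoreNLP rather than the standalone POSTagger. CoreNLP bundles most of the Stanford tools, including the POSTagger.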
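
One way to follow the classpath advice above is to ask the class loader for every location that provides a given class. The sketch below uses only the JDK; the class name `ClasspathDuplicates` and the helper `locationsOf` are made up for illustration, and the default class name is taken from the stack trace in the question. Run it with the same classpath as the UIMA pipeline.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ClasspathDuplicates {

    // Returns every location visible to the class loader that provides
    // the .class file for the given fully qualified class name.
    static List<URL> locationsOf(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        // Default taken from the exception in the question; pass another
        // class name as the first argument to check something else.
        String className = args.length > 0
                ? args[0]
                : "edu.stanford.nlp.tagger.maxent.ExtractorDistsim";
        List<URL> locations = locationsOf(className);
        System.out.println(className + " found in " + locations.size() + " location(s):");
        for (URL url : locations) {
            System.out.println("  " + url);
        }
    }
}
```

If more than one location is listed, two JARs probably ship the same class, and which one wins depends on classpath order. Removing the stale JAR, or aligning the model with the JAR that is actually loaded, would be the next thing to try.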