【Posted】: 2015-12-02 07:28:35
【Question】:
The NLTK documentation on this integration is rather poor. These are the steps I followed:
Downloaded http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip to /home/me/stanford
Downloaded http://nlp.stanford.edu/software/stanford-spanish-corenlp-2015-01-08-models.jar to /home/me/stanford
Then, in the IPython console:
In [11]: import nltk
In [12]: nltk.__version__
Out[12]: '3.1'
In [13]: from nltk.tag import StanfordNERTagger
Then:
st = StanfordNERTagger('/home/me/stanford/stanford-postagger-full-2015-04-20.zip', '/home/me/stanford/stanford-spanish-corenlp-2015-01-08-models.jar')
But when I try to run it:
st.tag('Adolfo se la pasa corriendo'.split())
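Error: no se ha encontrado o cargado la clase principal edu.stanford.nlp.ie.crf.CRFClassifier
(Java's Spanish-locale message for "Error: Could not find or load main class edu.stanford.nlp.ie.crf.CRFClassifier")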
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-14-0c1a96b480a6> in <module>()
----> 1 st.tag('Adolfo se la pasa corriendo'.split())
/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/nltk/tag/stanford.py in tag(self, tokens)
64 def tag(self, tokens):
65 # This function should return list of tuple rather than list of list
---> 66 return sum(self.tag_sents([tokens]), [])
67
68 def tag_sents(self, sentences):
/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/nltk/tag/stanford.py in tag_sents(self, sentences)
87 # Run the tagger and get the output
88 stanpos_output, _stderr = java(cmd, classpath=self._stanford_jar,
---> 89 stdout=PIPE, stderr=PIPE)
90 stanpos_output = stanpos_output.decode(encoding)
91
/home/nanounanue/.pyenv/versions/3.4.3/lib/python3.4/site-packages/nltk/__init__.py in java(cmd, classpath, stdin, stdout, stderr, blocking)
132 if p.returncode != 0:
133 print(_decode_stdoutdata(stderr))
--> 134 raise OSError('Java command failed : ' + str(cmd))
135
136 return (stdout, stderr)
OSError: Java command failed : ['/usr/bin/java', '-mx1000m', '-cp', '/home/nanounanue/Descargas/stanford-spanish-corenlp-2015-01-08-models.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier', '-loadClassifier', '/home/nanounanue/Descargas/stanford-postagger-full-2015-04-20.zip', '-textFile', '/tmp/tmp6y169div', '-outputFormat', 'slashTags', '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer', '-tokenizerOptions', '"tokenizeNLs=false"', '-encoding', 'utf8']
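The last line of the traceback shows where the two `StanfordNERTagger` arguments ended up in the Java command. The hypothetical helper below is not NLTK's code; it only rebuilds the logged command so the mapping is easy to see: the first constructor argument lands after `-loadClassifier`, the second after `-cp`. With the arguments used above, that means the classpath is the Spanish models jar and the "classifier" is the POS tagger zip, which may explain why `CRFClassifier` cannot be found.

```python
def build_ner_command(model_filename, path_to_jar, input_file, java_options="-mx1000m"):
    """Hypothetical helper: mirrors the Java command logged in the traceback above.

    model_filename -> value after -loadClassifier
    path_to_jar    -> value after -cp (the classpath Java searches for CRFClassifier)
    """
    return [
        "java", java_options,
        "-cp", path_to_jar,
        "edu.stanford.nlp.ie.crf.CRFClassifier",
        "-loadClassifier", model_filename,
        "-textFile", input_file,
        "-outputFormat", "slashTags",
        "-tokenizerFactory", "edu.stanford.nlp.process.WhitespaceTokenizer",
        "-tokenizerOptions", '"tokenizeNLs=false"',
        "-encoding", "utf8",
    ]


cmd = build_ner_command(
    "/home/me/stanford/stanford-postagger-full-2015-04-20.zip",
    "/home/me/stanford/stanford-spanish-corenlp-2015-01-08-models.jar",
    "/tmp/input.txt",  # placeholder; NLTK writes the tokens to a temp file
)
print("classpath: ", cmd[cmd.index("-cp") + 1])
print("classifier:", cmd[cmd.index("-loadClassifier") + 1])
```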
The same happens with StanfordPOSTagger.
Note: I need this to work with the Spanish version.
Note: I'm on Python 3.4.3.
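Since a `.jar` is a zip archive, one quick way to check the "could not find or load main class" error is to look inside the jar passed as the classpath and see whether the class file is there at all. This is a minimal diagnostic sketch, assuming the error simply means the class is absent from that jar; `jar_contains_class` is a hypothetical helper, not part of NLTK or the Stanford tools.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Hypothetical helper: True if the jar/zip contains the compiled class.

    A class such as edu.stanford.nlp.ie.crf.CRFClassifier is stored in a jar
    as the entry edu/stanford/nlp/ie/crf/CRFClassifier.class.
    jar_path may be a filesystem path or a binary file-like object.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Usage: `jar_contains_class('/home/me/stanford/stanford-spanish-corenlp-2015-01-08-models.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier')`. If it returns `False`, Java has no way to load that main class from the given classpath.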
【Discussion】:
-
It should be "Download & extract the Stanford NER package"; you forgot the "extract" part ;P
-
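Quick question: where did you find the documentation for the instructions you posted? Is it github.com/nltk/nltk/wiki/…? If not, would you mind posting the link and/or opening an issue on NLTK so the developers can make the appropriate changes?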
-
I'll update the question based on your suggestions, thanks.
Tags: python python-3.x nlp nltk stanford-nlp