【Posted】: 2017-09-13 16:00:05
【Question】:
I have been trying to get the Stanford POS Tagger working for a while. From an old SO post I found the following (slightly modified) code:
stanford_dir = 'C:/Users/.../stanford-postagger-2017-06-09/'
from nltk.tag import StanfordPOSTagger
#from nltk.tag.stanford import StanfordPOSTagger # I tried it both ways
from nltk import word_tokenize
# Add the jar and model via their path (instead of setting environment variables):
jar = stanford_dir + 'stanford-postagger.jar'
model = stanford_dir + 'models/english-left3words-distsim.tagger'
pos_tagger = StanfordPOSTagger(model, jar, encoding='utf8')
text = pos_tagger.tag(word_tokenize("What's the airspeed of an unladen swallow ?"))
print(text)
However, I get the following error:
LookupError:
===========================================================================
NLTK was unable to find the java file!
Use software specific configuration paramaters or set the JAVAHOME environment variable.
===========================================================================
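I don't know which java file it is talking about. I am sure it finds the right files, because if I change the path to something incorrect I get a different error: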
LookupError: Could not find stanford-postagger.jar jar file at C:/Users/.../stanford-postagger-2017-06-09/sstanford-postagger.jar
Which java file is missing? How do I get the Stanford POS tagger to work?
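For context, the "java file" in that message is most likely the Java executable itself rather than the Stanford jar: the jar is found, but NLTK then needs `java` to run it. Below is a minimal sketch of one way to check for it, using only the standard library. `locate_java` and `extra_dirs` are hypothetical names introduced here, and the idea that setting `JAVAHOME` is sufficient is taken from the wording of the error message, so treat it as an assumption rather than a confirmed fix.

```python
import os
import shutil


def locate_java(extra_dirs=()):
    """Return a path to a java executable, or None if none is found.

    Hypothetical helper: looks on PATH first, then in any extra folders.
    """
    found = shutil.which("java")
    if found:
        return found
    exe_name = "java.exe" if os.name == "nt" else "java"
    for folder in extra_dirs:
        candidate = os.path.join(folder, exe_name)
        if os.path.isfile(candidate):
            return candidate
    return None


java_path = locate_java()
if java_path is None:
    print("No java executable found; install a JRE or pass its bin folder in extra_dirs.")
else:
    # Assumption: NLTK reads JAVAHOME, as its error message suggests.
    # This only affects the current Python process.
    os.environ["JAVAHOME"] = java_path
    print("Using java at", java_path)
```

If nothing is found on PATH, the folder holding the Java executable can be passed in `extra_dirs`; that location depends on the machine and is not something the error message reveals.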
Edit:
I went to this link for Stanford NLP on Windows and tried:
(Second edit - added the installation steps):
import urllib.request
import zipfile
urllib.request.urlretrieve(r'http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip', r'C:/Users/HMISYS/Downloads/stanford-postagger-full-2015-04-20.zip')
zfile = zipfile.ZipFile(r'C:/Users/HMISYS/Downloads/stanford-postagger-full-2015-04-20.zip')
zfile.extractall(r'C:/Users/HMISYS/Downloads/')
# End second edit
import nltk  # needed for nltk.word_tokenize below
from nltk.tag.stanford import StanfordPOSTagger
# Trying on an older version
_model_filename = r'C:/Users/HMISYS/Downloads/stanford-postagger-full-2015-04-20/models/english-bidirectional-distsim.tagger'
_path_to_jar = r'C:/Users/HMISYS/Downloads/stanford-postagger-full-2015-04-20/stanford-postagger.jar'
st = StanfordPOSTagger(model_filename=_model_filename, path_to_jar=_path_to_jar)
text = st.tag(nltk.word_tokenize("What's the airspeed of an unladen swallow ?"))
print(text)
But I get the same error. Based on this post, I set the path variables to:
set STANFORDTOOLSDIR=$HOME
set CLASSPATH=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/stanford-postagger.jar
set export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/models
But I get this error:
NLTK was unable to find stanford-postagger.jar! Set the CLASSPATH environment variable.
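One way to avoid shell-specific variable syntax altogether is to set the two variables from Python before creating the tagger. This is only a sketch: `stanford_env` is a hypothetical helper, and it assumes the release was unpacked directly under the user's home directory, which may not match the actual layout.

```python
import os


def stanford_env(tools_dir, release="stanford-postagger-full-2015-04-20"):
    """Build the two environment values from a base folder (hypothetical helper)."""
    base = os.path.join(tools_dir, release)
    return {
        "CLASSPATH": os.path.join(base, "stanford-postagger.jar"),
        "STANFORD_MODELS": os.path.join(base, "models"),
    }


# Assumption: the zip was extracted into the home directory.
env = stanford_env(os.path.expanduser("~"))

# Note: this replaces any existing CLASSPATH, for the current process only.
os.environ.update(env)

for name, value in env.items():
    status = "exists" if os.path.exists(value) else "missing"
    print(f"{name} = {value} ({status})")
```

Because `os.path.expanduser` resolves the home directory inside Python, no `$HOME` or `%USERPROFILE%` expansion is left to the shell.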
【Comments】:
-
Did you follow the installation instructions in the GitHub gist?
-
Yes, I edited my question to include those steps.
-
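I don't believe Windows recognizes variables of the form $HOME. Try a little harder, and check that CLASSPATH has exactly the expected content. Look closely at the output of ECHO %CLASSPATH%.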
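The same check can be done from inside Python, which shows what the interpreter actually sees. A rough sketch, where `check_classpath` is a hypothetical helper and a `$` in an entry is treated as a hint that a Unix-style variable was never expanded:

```python
import os


def check_classpath(value):
    """Split a CLASSPATH-style string and report on each entry.

    Returns a list of (entry, exists_on_disk, looks_unexpanded) tuples.
    """
    report = []
    for entry in value.split(os.pathsep):
        if not entry:
            continue
        # A literal "$" suggests something like $HOME was stored verbatim.
        looks_unexpanded = "$" in entry
        report.append((entry, os.path.exists(entry), looks_unexpanded))
    return report


for entry, exists, looks_unexpanded in check_classpath(os.environ.get("CLASSPATH", "")):
    note = " (contains an unexpanded $VARIABLE?)" if looks_unexpanded else ""
    print(f"{entry}: {'found' if exists else 'NOT found'}{note}")
```

An empty output means CLASSPATH is not set at all in the process that runs Python.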
Tags: python nlp nltk stanford-nlp