NP-chunker value error (Python nltk): answer

[Question title]: NP-chunker value error (Python nltk)
[Posted]: 2017-07-31 22:00:53
[Question description]:

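I'm building an NLP pipeline based on the Python NLTK book (chapter 7). The first part of the code preprocesses the data correctly, but I can't run its output through my NP-chunker: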

import nltk, re, pprint

# Import data

data = 'This is a test sentence to check if preprocessing works'

# Preprocessing

def preprocess(document):
    sentences = nltk.sent_tokenize(document)
    sentences = [nltk.word_tokenize(sent) for sent in sentences]
    sentences = [nltk.pos_tag(sent) for sent in sentences]
    # Returns a list of sentences; each sentence is a list of (word, tag) tuples.
    return sentences

tagged = preprocess(data)
print(tagged)

# Regular-expression-based NP chunker

grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)  # chunk parser
chunked = []
for s in tagged:
    chunked.append(cp.parse(tagged))
print(chunked)

This is the traceback I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/u0084411/Box Sync/Procesmanager DH/Text Mining/Tools/NLP_pipeline.py", line 24, in <module>
    chunked.append(cp.parse(tagged))
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 1202, in parse
    chunk_struct = parser.parse(chunk_struct, trace=trace)
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 1017, in parse
    chunkstr = ChunkString(chunk_struct)
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 95, in __init__
    tags = [self._tag(tok) for tok in self._pieces]
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 95, in <listcomp>
    tags = [self._tag(tok) for tok in self._pieces]
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 105, in _tag
    raise ValueError('chunk structures must contain tagged '
ValueError: chunk structures must contain tagged tokens or trees

What am I doing wrong? 'tagged' is already tagged, so why doesn't the program recognize it?

Many thanks! Tom

[Discussion]:

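See: Why am I getting error? ValueError: chunk structures must contain tagged tokens or trees.
I've already implemented that, but I get the same traceback.
Your tags must be tuples or trees. See nltk.org/_modules/nltk/chunk/regexp.html.
Sorry for the basic question, but is there a way to convert the tags into tuples or trees?
When I convert the tags to a tuple (using tuple(tagged)), the problem doesn't go away.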
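A minimal sketch of why tuple(tagged) does not help, assuming NLTK is installed. The sentence is tagged by hand as a stand-in for preprocess(data), which would otherwise need NLTK's tokenizer and tagger models downloaded. RegexpParser.parse() requires every element of its input to be a (word, tag) tuple or a Tree; preprocess() returns one list per sentence, so tuple() only changes the outermost container and each element is still a whole sentence.

```python
import nltk

# Hand-tagged stand-in for preprocess(data): a list with one entry per
# sentence, where each sentence is a list of (word, tag) tuples.
tagged = [[("This", "DT"), ("is", "VBZ"), ("a", "DT"),
           ("test", "NN"), ("sentence", "NN")]]

cp = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")

print(type(tagged[0]))  # one sentence: a list, not a (word, tag) tuple
print(tagged[0][0])     # one token: a (word, tag) tuple

# tuple() only changes the outermost container. Its elements are still whole
# sentences (lists), so parse() raises the same ValueError as in the question.
error_message = None
try:
    cp.parse(tuple(tagged))
except ValueError as err:
    error_message = str(err)
print("ValueError:", error_message)

# Passing a single sentence works: every element is a (word, tag) tuple.
tree = cp.parse(tagged[0])
print(tree)
```

In other words, the fix is about nesting depth, not about list versus tuple: parse() must receive one sentence at a time.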

Tags: python-3.x nlp nltk chunking

[Solution 1]:

You're going to slap your forehead when you see this. Instead of this:

for s in tagged:
    chunked.append(cp.parse(tagged))

it should be this:

for s in tagged:
    chunked.append(cp.parse(s))

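You get the error because you are not passing cp.parse() a tagged sentence but the whole list of them. cp.parse() expects a single sentence, i.e. a list of (word, tag) tuples, whereas tagged is a list of such sentences.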
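The corrected loop, sketched end to end under one assumption: the sentences are tagged by hand so the example runs without downloading NLTK's tokenizer and tagger models. The list comprehension is an equivalent, slightly shorter form of the loop in the answer; each call to cp.parse() receives one sentence and returns one Tree.

```python
import nltk

# Hand-tagged stand-in for the output of preprocess(): a list of sentences,
# each a list of (word, tag) tuples. The tags were assigned by hand.
tagged = [
    [("This", "DT"), ("sentence", "NN"), ("contains", "VBZ"),
     ("one", "CD"), ("noun", "NN"), ("phrase", "NN")],
    [("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ")],
]

grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)  # chunk parser

# One call per sentence; each call returns one nltk.Tree rooted at 'S'.
chunked = [cp.parse(sent) for sent in tagged]

for tree in chunked:
    print(tree)
```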

[Discussion]:

OK, thanks a lot. Now I get some results, but I'm not sure how to interpret them: [Tree('S', [('THis', 'NNP'), Tree('NP', [('sentence', 'NN')]), ('contains', 'VBZ'), ('one', 'CD'), Tree('NP', [('noun', 'NN')]), Tree('NP', [('phrase', 'NN')])])]
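One way to read that output: the list holds one Tree per sentence. The root Tree labelled 'S' is the whole sentence; tuples sitting directly under it, such as ('contains', 'VBZ'), are tokens outside any chunk, and each nested Tree('NP', ...) is one chunk matched by the grammar. 'THis' was tagged NNP, which the pattern <NN> does not match, so it stays outside; 'noun' and 'phrase' become two separate NP chunks because the pattern ends in a single <NN>. The sketch below rebuilds that tree by hand and extracts the chunk text with Tree.subtrees(); the helper noun_phrases is hypothetical, written only for this example.

```python
import nltk

# The tree from the output above, rebuilt by hand.
tree = nltk.Tree("S", [
    ("THis", "NNP"),
    nltk.Tree("NP", [("sentence", "NN")]),
    ("contains", "VBZ"),
    ("one", "CD"),
    nltk.Tree("NP", [("noun", "NN")]),
    nltk.Tree("NP", [("phrase", "NN")]),
])

def noun_phrases(sentence_tree):
    """Hypothetical helper: return the words of each NP chunk as one string."""
    return [" ".join(word for word, tag in chunk.leaves())
            for chunk in sentence_tree.subtrees(filter=lambda t: t.label() == "NP")]

# Tokens directly under the root are the ones left outside every chunk.
unchunked = [child for child in tree if not isinstance(child, nltk.Tree)]

print(noun_phrases(tree))  # ['sentence', 'noun', 'phrase']
print(unchunked)
```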
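I can't call chunked.draw to get a visual representation; the traceback gives me: AttributeError: 'list' object has no attribute 'draw'
Are you typing chunked.draw()? chunked is a list (a list that you defined yourself), and the trees are inside it. Try chunked[0].draw().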
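A sketch of the fix for that AttributeError, assuming NLTK is installed and using a hand-tagged sentence: chunked is an ordinary Python list, so draw() has to be called on the Tree objects inside it. Tree.draw() opens a Tk window, which needs tkinter and a display, so it is left commented out here; printing each tree gives a text rendering that works in any console.

```python
import nltk

cp = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")

# Hand-tagged example sentence (a stand-in for the output of preprocess()).
tagged = [
    [("This", "DT"), ("sentence", "NN"), ("contains", "VBZ"),
     ("one", "CD"), ("noun", "NN"), ("phrase", "NN")],
]
chunked = [cp.parse(sent) for sent in tagged]

# The list itself has no draw() method; each element is an nltk.Tree that does.
print(type(chunked).__name__, hasattr(chunked, "draw"))
for tree in chunked:
    print(tree)      # bracketed text rendering with word/tag leaves
    # tree.draw()    # graphical view in a Tk window (needs tkinter and a display)
```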
@Tom If this answer solved your problem, please "accept" it by clicking the big check mark on the left.