【Posted】: 2018-02-13 21:54:11
【Question】:
I am getting started with NLTK and following the instructions in the NLTK book. In chapter 5 (N-Gram Tagging), the following code can be found:
>>> import nltk
>>> from nltk.corpus import brown
>>> brown_tagged_sents = brown.tagged_sents(categories='news')
>>> brown_sents = brown.sents(categories='news')
>>> unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
>>> unigram_tagger.tag(brown_sents[2007])
[('Various', 'JJ'), ('of', 'IN'), ('the', 'AT'), ('apartments', 'NNS'),
('are', 'BER'), ('of', 'IN'), ('the', 'AT'), ('terrace', 'NN'), ('type', 'NN'),
(',', ','), ('being', 'BEG'), ('on', 'IN'), ('the', 'AT'), ('ground', 'NN'),
('floor', 'NN'), ('so', 'QL'), ('that', 'CS'), ('entrance', 'NN'), ('is', 'BEZ'),
('direct', 'JJ'), ('.', '.')]
>>> unigram_tagger.evaluate(brown_tagged_sents)
0.9349006503968017
I am trying to do the same thing, but I want to train the unigram tagger on the entire Brown corpus. To do that, I am trying:
brown_tagged_sents = brown.tagged_sents()
brown_sents = brown.sents()
unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
unigram_tagger.tag(brown_sents)
unigram_tagger.evaluate(brown_tagged_sents)
But for some reason I get this error:
Traceback (most recent call last):
  File "/Users/missogra/PycharmProjects/try/POS-Tagger-nltk.py", line 9, in <module>
    unigram_tagger.tag(brown_sents)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/tag/sequential.py", line 63, in tag
    tags.append(self.tag_one(tokens, i, tags))
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/tag/sequential.py", line 83, in tag_one
    tag = tagger.choose_tag(tokens, index, history)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/tag/sequential.py", line 142, in choose_tag
    return self._context_to_tag.get(context)
TypeError: unhashable type: 'list'
Process finished with exit code 1
I would really appreciate any hint as to why this happens.
Python version 3.5
Thank you in advance.
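【Comments】:
- You are supplying mutable list data where immutable data is required. Tuples are immutable, lists are mutable. If you look closely, you will see that unigram_tagger.tag(brown_sents[2007]) is a list of tuples, whereas your data is probably a list of lists. I haven't used NLTK, so I can't help with the code. Debug your data, find where the lists are, and convert them to tuples before processing: data = [tuple(x) for x in ListOfLists]
- Thank you very much for the explanation! I managed to get it working :-)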
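To make the cause of the error concrete: the last traceback frame is a dictionary lookup (self._context_to_tag.get(context)), and for a unigram tagger the context is the token itself. In NLTK, tag() expects one sentence (a list of strings), while brown.sents() is a list of sentences, so each "token" handed to the lookup is itself a list, which cannot be hashed. The sketch below reproduces this with a deliberately simplified stand-in for a unigram tagger. It is not NLTK's implementation; the names train_unigram and tag and the tiny training set are made up for illustration.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag_ in sent:
            counts[word][tag_] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, tokens):
    """Tag ONE sentence: every token is used as a dictionary key."""
    return [(token, model.get(token)) for token in tokens]


# Tiny hand-made training set (illustrative, not real Brown corpus data).
tagged_sents = [
    [("the", "AT"), ("apartments", "NNS"), ("are", "BER"), ("direct", "JJ")],
    [("the", "AT"), ("entrance", "NN"), ("is", "BEZ"), ("direct", "JJ")],
]
model = train_unigram(tagged_sents)

sents = [
    ["the", "entrance", "is", "direct"],
    ["the", "apartments", "are", "new"],
]

# One sentence works: each token is a hashable string.
single = tag(model, sents[0])

# The whole list of sentences fails: each "token" is now a list.
try:
    tag(model, sents)
    error = None
except TypeError as exc:
    error = str(exc)

# Tagging sentence by sentence avoids the problem; unseen words get None.
all_tagged = [tag(model, sent) for sent in sents]
```

With NLTK itself, the equivalent of the last line would presumably be either a loop over brown_sents calling unigram_tagger.tag(sent) for each sentence, or a single call to unigram_tagger.tag_sents(brown_sents). Note that converting each sentence to a tuple would silence the TypeError but would likely not give useful results, since each whole sentence would then be looked up as if it were a single word.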
Tags: python-3.x nltk