【Posted】:2023-03-19 11:11:01
【Question】:
I have code that does POS tagging with NLTK's averaged perceptron tagger:
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag
from nltk.tokenize import word_tokenize
string = 'dogs runs fast'
tokens = word_tokenize(string)
tokensPOS = pos_tag(tokens)
print(tokensPOS)
Result:
[('dogs', 'NNS'), ('runs', 'VBZ'), ('fast', 'RB')]
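The printed result shows that `pos_tag` returns a list of `(token, tag)` tuples, not a list of strings. The minimal sketch below copies that output literally (so it runs without NLTK) and shows how a loop can unpack each tuple into its word and its Penn Treebank tag.

```python
# Output of pos_tag from the question: a list of (token, Penn Treebank tag) tuples.
tokensPOS = [('dogs', 'NNS'), ('runs', 'VBZ'), ('fast', 'RB')]

# Each element is a 2-tuple, so the loop variable can be unpacked directly.
words = []
tags = []
for word, tag in tokensPOS:
    words.append(word)
    tags.append(tag)

print(words)  # ['dogs', 'runs', 'fast']
print(tags)   # ['NNS', 'VBZ', 'RB']
```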
I then tried code that loops over the tagged tokens and lemmatizes each one with the WordNet lemmatizer:
lemmatizedWords = []
for w in tokensPOS:
    lemmatizedWords.append(WordNetLemmatizer().lemmatize(w))
print(lemmatizedWords)
This produced the error:
Traceback (most recent call last):
  File "<ipython-input-30-462d7c3bdbb7>", line 15, in <module>
    lemmatizedWords = WordNetLemmatizer().lemmatize(w)
  File "C:\Users\taca\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\stem\wordnet.py", line 40, in lemmatize
    lemmas = wordnet._morphy(word, pos)
  File "C:\Users\taca\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1712, in _morphy
    forms = apply_rules([form])
  File "C:\Users\taca\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1692, in apply_rules
    for form in forms
  File "C:\Users\taca\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1694, in <listcomp>
    if form.endswith(old)]
AttributeError: 'tuple' object has no attribute 'endswith'
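The last frames of the traceback show `apply_rules` calling `form.endswith(old)` on whatever was passed to `lemmatize`. In the loop, `w` is a whole `(token, tag)` tuple, and tuples have no `endswith` method. The pure-Python sketch below (no NLTK required; the tuple is copied from the tagger output) reproduces that failure and shows that an unpacked word string does not have the problem.

```python
# One element of the tagged list, as passed to lemmatize() in the loop above.
pair = ('dogs', 'NNS')

# The WordNet morphy step calls .endswith() on its input; a tuple has no such method.
try:
    pair.endswith('s')
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'tuple' object has no attribute 'endswith'

# Unpacking first gives a plain string, which does support endswith().
word, tag = pair
print(word.endswith('s'))  # True
```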
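I think there are two problems here:
- The POS tags are not converted into tags that WordNet understands (I tried to implement something similar to this answer, wordnet lemmatization and pos tagging in python, but without success)
- The data structure is not in the right format for looping over each tuple (apart from os-related code, I couldn't find anything more about this error)
How can I follow POS tagging with lemmatization so that these errors are avoided?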
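Both problems can be handled in one loop: unpack each `(token, tag)` tuple, translate the Penn Treebank tag into a WordNet POS letter, and pass the word and that letter to the lemmatizer. The sketch below assumes the common "first letter of the tag" mapping (J→adjective, V→verb, N→noun, R→adverb); the helper names `penn_to_wordnet` and `lemmatize_tagged` are hypothetical, chosen for illustration. The letters `'a'`, `'v'`, `'n'`, `'r'` are the values of NLTK's `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN` and `wordnet.ADV`. The lemmatizer is injectable so the logic can be exercised without the WordNet corpus; by default it uses `WordNetLemmatizer().lemmatize(word, pos)`.

```python
from typing import Callable, List, Optional, Tuple


def penn_to_wordnet(tag: str) -> Optional[str]:
    """Hypothetical helper: map a Penn Treebank tag to a WordNet POS letter.

    Returns 'a', 'v', 'n' or 'r' (NLTK's wordnet.ADJ/VERB/NOUN/ADV values),
    or None for tags WordNet has no category for (e.g. DT, IN, '.').
    """
    if tag.startswith('J'):
        return 'a'
    if tag.startswith('V'):
        return 'v'
    if tag.startswith('N'):
        return 'n'
    if tag.startswith('R'):
        return 'r'
    return None


def lemmatize_tagged(
    tagged: List[Tuple[str, str]],
    lemmatize: Optional[Callable[[str, str], str]] = None,
) -> List[str]:
    """Hypothetical helper: lemmatize the output of nltk.pos_tag."""
    if lemmatize is None:
        # Imported lazily; requires NLTK and the 'wordnet' corpus to be installed.
        from nltk.stem import WordNetLemmatizer
        lemmatize = WordNetLemmatizer().lemmatize

    lemmas = []
    for word, tag in tagged:  # unpack each (token, tag) tuple
        wn_pos = penn_to_wordnet(tag)
        if wn_pos is None:
            lemmas.append(word)  # no WordNet category: keep the token as-is
        else:
            lemmas.append(lemmatize(word, wn_pos))
    return lemmas
```

With NLTK and its WordNet data installed, `lemmatize_tagged(tokensPOS)` on the tagged list from the question should give `['dog', 'run', 'fast']`. Keeping tokens with unmapped tags unchanged is a design choice; another option is to fall back to the lemmatizer's default noun POS.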
【Discussion】:
Tags: python python-3.x nlp nltk pos-tagger