【Posted】: 2013-11-14 10:31:07
【Question】:
I have been trying to find the frequency distribution of nouns in a given text. If I do this:
text = "This ball is blue, small and extraordinary. Like no other ball."
text = text.lower()
token_text = nltk.word_tokenize(text)
tagged_sent = nltk.pos_tag(token_text)
nouns = []
for word, pos in tagged_sent:
    if pos in ['NN', "NNP", "NNS"]:
        nouns.append(word)
freq_nouns = nltk.FreqDist(nouns)
print freq_nouns
It treats "ball" and "ball." as separate words. So I went ahead and tokenized the text into sentences before tokenizing the words:
text = "This ball is blue, small and extraordinary. Like no other ball."
text = text.lower()
sentences = nltk.sent_tokenize(text)
words = [nltk.word_tokenize(sent) for sent in sentences]
tagged_sent = [nltk.pos_tag(sent) for sent in words]
nouns = []
for word, pos in tagged_sent:
    if pos in ['NN', "NNP", "NNS"]:
        nouns.append(word)
freq_nouns = nltk.FreqDist(nouns)
print freq_nouns
It gives the following error:
Traceback (most recent call last):
  File "C:\beautifulsoup4-4.3.2\Trial.py", line 19, in <module>
    for word,pos in tagged_sent:
ValueError: too many values to unpack
What am I doing wrong? Please help.
【Discussion】:
Tags: python-2.7 nltk frequency-distribution
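A likely cause of the error: in the second snippet, tagged_sent holds one list per sentence, so each iteration of the loop yields a whole list of (word, tag) pairs instead of a single pair, and unpacking it into word, pos fails. The sketch below reproduces that shape without calling NLTK. The nested list is a hand-written stand-in for the tagger output, its tags are illustrative assumptions, and collections.Counter is used here in place of nltk.FreqDist so the example runs on its own.

```python
from collections import Counter
from itertools import chain

# Hand-written stand-in for [nltk.pos_tag(sent) for sent in words].
# The tags are illustrative assumptions, not real tagger output.
tagged_sents = [
    [("this", "DT"), ("ball", "NN"), ("is", "VBZ"), ("blue", "JJ"),
     (",", ","), ("small", "JJ"), ("and", "CC"),
     ("extraordinary", "JJ"), (".", ".")],
    [("like", "IN"), ("no", "DT"), ("other", "JJ"), ("ball", "NN"),
     (".", ".")],
]

# The original loop unpacks each element as (word, pos), but each
# element here is a whole sentence (a list of pairs), so it fails.
unpack_failed = False
try:
    for word, pos in tagged_sents:
        pass
except ValueError:
    unpack_failed = True

NOUN_TAGS = {"NN", "NNP", "NNS"}

# Flatten the per-sentence lists into one stream of (word, tag) pairs
# before filtering for nouns.
nouns = [word for word, pos in chain.from_iterable(tagged_sents)
         if pos in NOUN_TAGS]

freq_nouns = Counter(nouns)
print(freq_nouns)
```

The flat nouns list can be passed to nltk.FreqDist exactly as in the first snippet. A nested for loop over sentences and then over pairs would work just as well as chain.from_iterable.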