【Posted】: 2017-11-27 12:06:49
【Question】:
I'm trying to solve an exercise, but I'm lost.
This is what I'm supposed to do:
INPUT: file
OUTPUT: dictionary
Return a dictionary whose keys are all the words in the file (broken by
whitespace). The value for each word is a dictionary containing each word
that can follow the key and a count for the number of times it follows it.
You should lowercase everything.
Use strip and string.punctuation to strip the punctuation from the words.
Example:
>>> #example.txt is a file containing: "The cat chased the dog."
>>> with open('../data/example.txt') as f:
...     word_counts(f)
{'the': {'dog': 1, 'cat': 1}, 'chased': {'the': 1}, 'cat': {'chased': 1}}
This is all I have so far. It only tries to pick out the right pairs of words:
def word_counts(f):
    i = 0
    orgwordlist = f.split()
    for word in orgwordlist:
        if i<len(orgwordlist)-1:
            print orgwordlist[i]
            print orgwordlist[i+1]

with open('../data/example.txt') as f:
    word_counts(f)
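I think I need to use the .count method somehow and eventually zip some dictionaries together, but I'm not sure how to count the second words for each first word.
I know I'm still a long way from a solution, but I'm trying to take it one step at a time. Any help is appreciated, even just a hint pointing in the right direction.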
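As a hint for the counting step: one possible approach, sketched here under assumptions, pairs each word with its successor and increments a nested dictionary, so neither .count nor zipping dictionaries together is needed. The function name follower_counts is hypothetical, the input is assumed to be a list of words that are already lowercased and stripped, and the sketch is written for Python 3 rather than the Python 2 used in the attempt above.

```python
from collections import defaultdict


def follower_counts(words):
    """Map each word to a dict of the words that follow it and their counts.

    Hypothetical helper: `words` is assumed to be an already cleaned list.
    """
    counts = defaultdict(dict)
    # zip(words, words[1:]) yields every adjacent (current, following) pair.
    for current, following in zip(words, words[1:]):
        counts[current][following] = counts[current].get(following, 0) + 1
    # Convert back to a plain dict so the result prints like the example.
    return dict(counts)


print(follower_counts(["the", "cat", "chased", "the", "dog"]))
```

The last word gets no key of its own unless it also appears earlier in the list, which agrees with the example output in the exercise, where 'dog' is not a key.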
【Comments】:
- f.split(): f is a file handle, not a string.
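To illustrate the comment above: a file object has no split method, but the string returned by f.read() does. The following is a minimal sketch, assuming Python 3. The helper name read_words is hypothetical, and io.StringIO stands in for the opened example.txt so that the snippet runs without the file.

```python
import io
import string


def read_words(f):
    """Return the lowercased, punctuation-stripped words of an open text file.

    Hypothetical helper, not part of the original exercise.
    """
    words = []
    # f.read() returns the whole file as a str, which can be split on whitespace.
    for raw in f.read().split():
        word = raw.strip(string.punctuation).lower()
        if word:  # skip tokens that consisted only of punctuation
            words.append(word)
    return words


# io.StringIO is used here as a stand-in for open('../data/example.txt').
sample = io.StringIO("The cat chased the dog.")
print(read_words(sample))
```

Note that strip only removes punctuation from the two ends of each token, so an apostrophe inside a word such as "don't" is kept.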
Tags: python dictionary nltk counter n-gram