【Posted】: 2018-11-07 20:25:01
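【Question】:
How can I compare word frequencies between two text files in Python? For example, if a word appears in both file1 and file2, it should be written only once, and its two frequencies should not be added together; the result should look like {'The': 3,5}, where 3 is the frequency in file1 and 5 is the frequency in file2. If a word exists in only one of the files, its count for the other file should be 0. Please help. Here is what I have done so far: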
import operator  # imported in the original attempt but not used yet

wordlist = []
wordlist2 = []

# Collect every whitespace-separated word from each file.
with open('file1.txt', 'r') as f1:  # file 1
    for line in f1:
        for word in line.split():
            wordlist.append(word)
with open('file2.txt', 'r') as f2:  # file 2
    for line in f2:
        for word in line.split():
            wordlist2.append(word)

# Count occurrences per word, one dictionary per file.
worddictionary = {}
for word in wordlist:
    if word in worddictionary:
        worddictionary[word] += 1
    else:
        worddictionary[word] = 1
worddictionary2 = {}
for word in wordlist2:
    if word in worddictionary2:
        worddictionary2[word] += 1
    else:
        worddictionary2[word] = 1

print(worddictionary)
print(worddictionary2)
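The step the question is missing is combining the two per-file counts into one mapping. A minimal sketch of one way to do that follows; it is not the only approach. The function names `count_words` and `compare_counts` and the inline sample strings are assumptions made for illustration, and the result uses a tuple `(count1, count2)` per word because `{'The': 3,5}` is not a valid Python literal.

```python
from collections import Counter


def count_words(text):
    """Return a Counter mapping each whitespace-separated word to its count."""
    return Counter(text.split())


def compare_counts(counts1, counts2):
    """Map every word seen in either input to a (count1, count2) tuple."""
    # The union of the keys guarantees that each word appears exactly once.
    all_words = set(counts1) | set(counts2)
    # .get(word, 0) supplies 0 for a word that is missing from one input,
    # and works for plain dicts as well as Counters.
    return {
        word: (counts1.get(word, 0), counts2.get(word, 0))
        for word in sorted(all_words)
    }


# Inline sample strings stand in for the contents of file1.txt and file2.txt.
text1 = "The cat saw The dog and The bird"
text2 = "The dog chased the ball"

result = compare_counts(count_words(text1), count_words(text2))
print(result)
```

Because `compare_counts` only relies on `.get`, it should also accept the plain dictionaries built in the question, as in `compare_counts(worddictionary, worddictionary2)`. Note that the comparison is case-sensitive, so `The` and `the` are counted as different words; lowercasing the text before splitting would merge them.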
【Comments】:
-
What exactly is your question?
-
How would this generalize to more than two files?
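On the generalization raised in the last comment, one possible sketch is to keep a list of counters and emit one count per input for every word. The name `compare_many` and the sample strings are hypothetical; with real files, each string would be replaced by the text read from that file.

```python
from collections import Counter


def compare_many(texts):
    """Map each word to a tuple of counts, one entry per input text."""
    counters = [Counter(text.split()) for text in texts]
    # Iterating a Counter yields its keys, so this is the union of all words.
    all_words = set().union(*counters)
    # Indexing a Counter with a missing word yields 0 without raising KeyError.
    return {
        word: tuple(counter[word] for counter in counters)
        for word in sorted(all_words)
    }


# Three short strings stand in for the contents of three files.
samples = ["a b a", "b c", "a c c c"]
table = compare_many(samples)
print(table)
```

The position of each number in the tuple matches the position of the corresponding text in the input list, so the two-file case is simply a list of length two.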
Tags: python python-3.x dictionary frequency word-frequency