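【Posted】: 2016-02-22 02:02:11
【Question】:
I am currently writing a Python program to count anglicisms in a German text. I would like to know how many times anglicisms occur in the whole text. To do this, I made a list of all anglicisms used in German, which looks like this: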
abchecken
abchillen
abdancen
abdimmen
abfall-container
abflug-terminal
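The list goes on...
I then checked the intersection between this list and the text to be analysed, but that only gives me the set of words that occur in both, e.g.: Anglicisms : 4:{'abdancen', 'abchecken', 'terminal'}
What I would really like is for the program to output how many times each of these words occurs (ideally sorted by frequency), e.g.: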
Anglicisms: abdancen(5), abchecken(2), terminal(1)
Here is my code so far:
#counters to zero
lines, blanklines, sentences, words = 0, 0, 0, 0
print ('-' * 50)
while True:
    try:
        #def text file
        filename = input("Please enter filename: ")
        textf = open(filename, 'r')
        break
    except IOError:
        print( 'Cannot open file "%s" ' % filename )
#reads one line at a time
for line in textf:
    print( line, ) # test
    lines += 1
    if line.startswith('\n'):
        blanklines += 1
    else:
        #sentence ends with . or ! or ?
        #count these characters
        sentences += line.count('.') + line.count('!') + line.count('?')
        #create a list of words
        #use None to split at any whitespace regardless of length
        tempwords = line.split(None)
        print(tempwords)
        #total words
        words += len(tempwords)
#anglicisms
words1 = set(open(filename).read().split())
words2 = set(open("anglicisms.txt").read().split())
duplicates = words1.intersection(words2)
textf.close()
print( '-' * 50)
print( "Lines : ", lines)
print( "Blank lines : ", blanklines)
print( "Sentences : ", sentences)
print( "Words : ", words)
print( "Anglicisms : %d:%s"%(len(duplicates),duplicates))
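The set intersection keeps only which words match and throws away how often each one occurs. One possible way to get per-word counts is `collections.Counter` from the standard library. The sketch below is only an illustration: the helper name `count_anglicisms`, the sample sentence, and the lowercasing and punctuation stripping are assumptions, not part of the program above.

```python
from collections import Counter
import string

def count_anglicisms(text, anglicisms):
    """Count exact (whole-word) occurrences of each anglicism in text."""
    # Lowercase and strip surrounding punctuation so "Terminal," matches
    # "terminal". Assumption: the anglicism list itself is lowercase.
    tokens = (w.strip(string.punctuation).lower() for w in text.split())
    return Counter(t for t in tokens if t in anglicisms)

# Hypothetical sample text, for illustration only.
sample = "Wir abdancen heute, dann abchecken wir das Terminal und abdancen weiter."
counts = count_anglicisms(sample, {"abdancen", "abchecken", "terminal"})

# most_common() yields (word, count) pairs from most to least frequent.
print("Anglicisms: " + ", ".join(f"{w}({n})" for w, n in counts.most_common()))
```

In the real program, `text` would be the contents of the analysed file and `anglicisms` the set read from `anglicisms.txt`.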
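The second problem I have is that it does not count anglicisms that appear inside other words. For example, if "big" is in the anglicism list and "bigfoot" appears in the text, this occurrence is ignored. How can I fix that?
Kind regards from Switzerland!
【Comments】: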
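For the second problem, exact set membership can be swapped for a substring test on each token. This is a minimal sketch under stated assumptions: `count_embedded` is a hypothetical helper and the sample sentence is made up. Substring matching can also produce false positives when a short list entry happens to occur inside an unrelated German word, and it compares every token against every list entry, so it may be slow for long texts.

```python
from collections import Counter

def count_embedded(text, anglicisms):
    """Count tokens that contain an anglicism anywhere inside them."""
    counts = Counter()
    for token in text.lower().split():
        for a in anglicisms:
            if a in token:  # substring test, so "big" matches "bigfoot"
                counts[a] += 1
    return counts

# Hypothetical sample text, for illustration only.
counts = count_embedded("Bigfoot ist big, sehr big.", {"big", "terminal"})
print(counts.most_common())
```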
- Are you looking for something like: sorted([{w:text.count(w)} for w in words])?
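As written, the expression in this comment builds a list of one-item dicts, and Python 3 cannot order dicts, so `sorted` raises a `TypeError` once the list has more than one entry. A sketch of what the comment seems to be aiming at, using (word, count) pairs instead; `text` and `words` are the comment's names, assumed here to be the full text as a string and the anglicism list, with made-up sample values.

```python
# Hypothetical sample values, for illustration only.
text = "abdancen abchecken abdancen terminal abdancen"
words = ["abchecken", "abdancen", "terminal"]

# str.count counts non-overlapping substring occurrences, so it also
# finds a word embedded in a longer one.
pairs = [(w, text.count(w)) for w in words]

# Sort by count, highest first.
ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
print(ranked)
```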
Tags: python python-3.x text text-files intersection