【问题标题】:Counting the occurrences of all letters in a txtfile [duplicate]计算文本文件中所有字母的出现次数[重复]
【发布时间】:2017-01-08 04:20:37
【问题描述】:
我正在尝试打开一个文件并计算字母的出现次数。
到目前为止,这是我所在的位置:
def frequencies(filename):
infile=open(filename, 'r')
wordcount={}
content = infile.read()
infile.close()
counter = {}
invalid = "‘'`,.?!:;-_\n—' '"
for word in content:
word = content.lower()
for letter in word:
if letter not in invalid:
if letter not in counter:
counter[letter] = content.count(letter)
print('{:8} appears {} times.'.format(letter, counter[letter]))
任何帮助将不胜感激。
【问题讨论】:
标签:
python-3.x
dictionary
text-files
counting
【解决方案1】:
最好的方法是使用 numpy 包,例子是这样的
import numpy
text = "xvasdavawdazczxfawaczxcaweac"
text = list(text)
a,b = numpy.unique(text, return_counts=True)
x = sorted(zip(b,a), reverse=True)
print(x)
在您的情况下,您可以将所有单词组合成单个字符串,然后将字符串转换为字符列表
如果要删除除字符之外的所有内容,可以使用正则表达式来清理它
#clean all except character
content = re.sub(r'[^a-zA-Z]', r'', content)
#convert to list of char
content = list(content)
a,b = numpy.unique(content, return_counts=True)
x = sorted(zip(b,a), reverse=True)
print(x)
【解决方案2】:
如果您正在寻找不使用numpy 的解决方案:
invalid = set([ch for ch in "‘'`,.?!:;-_\n—' '"])
def frequencies(filename):
counter = {}
with open(filename, 'r') as f:
for ch in (char.lower() for char in f.read()):
if ch not in invalid:
if ch not in counter:
counter[ch] = 0
counter[ch] += 1
results = [(counter[ch], ch) for ch in counter]
return sorted(results)
for result in reversed(frequencies(filename)):
print result
【解决方案3】:
我建议改用collections.Counter。
紧凑型解决方案
from collections import Counter
from string import ascii_lowercase # a-z string
VALID = set(ascii_lowercase)
with open('in.txt', 'r') as fin:
counter = Counter(char.lower() for line in fin for char in line if char.lower() in VALID)
print(counter.most_common()) # print values in order of most common to least.
更具可读性的解决方案。
from collections import Counter
from string import ascii_lowercase # a-z string
VALID = set(ascii_lowercase)
with open('in.txt', 'r') as fin:
counter = Counter()
for char in (char.lower() for line in fin for char in line):
if char in VALID:
counter[char] += 1
print(counter)
如果您不想使用Counter,那么您可以使用dict。
from string import ascii_lowercase # a-z string
VALID = set(ascii_lowercase)
with open('test.txt', 'r') as fin:
counter = {}
for char in (char.lower() for line in fin for char in line):
if char in VALID:
# add the letter to dict
# dict.get used to either get the current count value
# or default to 0. Saves checking if it is in the dict already
counter[char] = counter.get(char, 0) + 1
# sort the values by occurrence in descending order
data = sorted(counter.items(), key = lambda t: t[1], reverse = True)
print(data)