Title: Counting the occurrences of all letters in a txt file [duplicate]
Posted: 2017-01-08 04:20:37
Question:

I'm trying to open a file and count the occurrences of each letter.

Here's what I have so far:

def frequencies(filename):
    infile=open(filename, 'r')
    wordcount={}
    content = infile.read()
    infile.close()
    counter = {}
    invalid = "‘'`,.?!:;-_\n—' '"

    for word in content:
        word = content.lower()
        for letter in word:
            if letter not in invalid:
                if letter not in counter:
                    counter[letter] = content.count(letter)
                    print('{:8} appears {} times.'.format(letter, counter[letter]))

Any help would be greatly appreciated.

Tags: python-3.x dictionary text-files counting


    Answer 1:

    One way is to use numpy's unique() with return_counts=True, for example:

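    import numpy

    text = "xvasdavawdazczxfawaczxcaweac"
    text = list(text)  # split the string into single characters
    # a: the distinct characters (sorted), b: how often each one occurs
    a, b = numpy.unique(text, return_counts=True)
    # pair each count with its character, most frequent first
    x = sorted(zip(b, a), reverse=True)
    print(x)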
    

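    In your case, you can join all the text into a single string and then convert that string into a list of characters. If you want to remove everything except letters, you can clean it up with a regular expression: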

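    import re
    import numpy

    # content holds the file's text, e.g. content = open(filename).read()
    # remove everything except ASCII letters (lowercased so 'A' and 'a' count together)
    content = re.sub(r'[^a-z]', '', content.lower())
    # convert to a list of characters
    content = list(content)
    a, b = numpy.unique(content, return_counts=True)
    x = sorted(zip(b, a), reverse=True)
    print(x)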
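A self-contained sketch that combines the steps above into one function reading from a file. It assumes the file is UTF-8 encoded, and the name `letter_frequencies_numpy` is hypothetical. It lowercases before filtering so upper- and lowercase letters share one count, and converts the NumPy arrays into a plain dict:

```python
import re

import numpy


def letter_frequencies_numpy(filename):
    """Count a-z letters in a file using numpy.unique (hypothetical helper)."""
    # assumption: the file is UTF-8 encoded
    with open(filename, 'r', encoding='utf-8') as f:
        content = f.read()
    # keep only ASCII letters, lowercased so 'A' and 'a' share one count
    letters = list(re.sub(r'[^a-z]', '', content.lower()))
    if not letters:
        return {}  # nothing to count in an empty or letter-free file
    chars, counts = numpy.unique(letters, return_counts=True)
    # convert NumPy scalars to plain Python str/int for a regular dict
    return {str(c): int(n) for c, n in zip(chars, counts)}
```

The explicit empty check just makes the letter-free case return `{}` directly instead of relying on how `numpy.unique` handles an empty list.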
    

      Answer 2:

      If you're looking for a solution that doesn't use numpy:

      invalid = set("‘'`,.?!:;-_\n—' '")

      def frequencies(filename):
          counter = {}
          with open(filename, 'r') as f:
              for ch in (char.lower() for char in f.read()):
                  if ch not in invalid:
                      if ch not in counter:
                          counter[ch] = 0
                      counter[ch] += 1

          # (count, character) pairs, sorted from least to most frequent
          results = [(counter[ch], ch) for ch in counter]
          return sorted(results)

      # filename: path to your text file
      for result in reversed(frequencies(filename)):
          print(result)
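The `invalid` set above is a blacklist, so any character it does not list (digits, double quotes, parentheses, tabs) is still counted. A sketch of the same single-pass dict approach using a whitelist via `str.isalpha()` instead; this filter is swapped in and is not part of the answer above. Note that `isalpha()` also accepts non-ASCII letters such as `é`. The function name `letter_frequencies` is hypothetical, and it takes the text itself (for example the result of `f.read()`):

```python
def letter_frequencies(text):
    """Count letters in a string with a plain dict, ignoring case."""
    counter = {}
    for ch in text.lower():
        # whitelist: only alphabetic characters are counted
        if ch.isalpha():
            counter[ch] = counter.get(ch, 0) + 1
    # (count, letter) pairs, most frequent first; ties ordered by letter, descending
    return sorted(((n, ch) for ch, n in counter.items()), reverse=True)


print(letter_frequencies('Tic-tac-toe (2017)!'))
# → [(3, 't'), (2, 'c'), (1, 'o'), (1, 'i'), (1, 'e'), (1, 'a')]
```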
      

        Answer 3:

        I'd suggest using collections.Counter instead.

        A compact solution:

        from collections import Counter
        from string import ascii_lowercase # a-z string
        
        VALID = set(ascii_lowercase)
        
        with open('in.txt', 'r') as fin:
            counter = Counter(char.lower() for line in fin for char in line if char.lower() in VALID)
            print(counter.most_common()) # print values in order of most common to least.
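`Counter.most_common` also accepts an integer `n` to return only the `n` most frequent letters. A short sketch on an in-memory string (the sample text is made up for illustration):

```python
from collections import Counter
from string import ascii_lowercase  # a-z string

VALID = set(ascii_lowercase)

text = "Mississippi River"
counter = Counter(ch for ch in text.lower() if ch in VALID)
print(counter.most_common(2))  # only the two most frequent letters
# → [('i', 5), ('s', 4)]
```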
        

        A more readable solution:

        from collections import Counter
        from string import ascii_lowercase # a-z string
        
        VALID = set(ascii_lowercase)
        
        with open('in.txt', 'r') as fin:
            counter = Counter()
            for char in (char.lower() for line in fin for char in line):
                if char in VALID:
                    counter[char] += 1
            print(counter)
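To print the results in the same `'{:8} appears {} times.'` format the question uses, one option is to loop over `most_common()`. This sketch uses a made-up string instead of a file, and the helper name `format_counts` is hypothetical:

```python
from collections import Counter
from string import ascii_lowercase  # a-z string

VALID = set(ascii_lowercase)


def format_counts(counter):
    """Return one line per letter, most common first, in the question's format."""
    return ['{:8} appears {} times.'.format(letter, count)
            for letter, count in counter.most_common()]


counter = Counter(ch for ch in "Banana".lower() if ch in VALID)
for line in format_counts(counter):
    print(line)
```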
        

        If you don't want to use Counter, you can use a plain dict:

        from string import ascii_lowercase # a-z string
        
        VALID = set(ascii_lowercase)
        
        with open('test.txt', 'r') as fin:
            counter = {}
            for char in (char.lower() for line in fin for char in line):
                if char in VALID:
                    # add the letter to dict
                    # dict.get used to either get the current count value
                    # or default to 0. Saves checking if it is in the dict already
                    counter[char] = counter.get(char, 0) + 1
            # sort the values by occurrence in descending order
            data = sorted(counter.items(), key = lambda t: t[1], reverse = True)
            print(data)
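A close alternative to `dict.get` is `collections.defaultdict(int)`, which supplies 0 for a missing key automatically. This sketch reads from an in-memory `io.StringIO` standing in for the file so it runs as-is; for real use, replace it with `open('test.txt', 'r')`:

```python
import io
from collections import defaultdict
from string import ascii_lowercase  # a-z string

VALID = set(ascii_lowercase)

# io.StringIO stands in for an open text file here
fin = io.StringIO("abc\nABA\n")
counter = defaultdict(int)
for char in (char.lower() for line in fin for char in line):
    if char in VALID:
        counter[char] += 1  # a missing key starts at int() == 0
# sort the values by occurrence in descending order
data = sorted(counter.items(), key=lambda t: t[1], reverse=True)
print(data)
# → [('a', 3), ('b', 2), ('c', 1)]
```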
        
