【Question title】: Finding the most frequent character in a string
【Posted】: 2011-05-07 01:52:15
【Question】:

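I found this programming problem while looking at a job posting on SO. I thought it was pretty interesting and, as a beginner Python programmer, I attempted to tackle it. However, I feel my solution is quite... messy... Can anyone make any suggestions to optimize it or make it cleaner? I know it's pretty trivial, but I had fun writing it. Note: Python 2.6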

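The problem:

Write pseudo-code (or actual code) for a function that takes in a string and returns the letter that appears the most in that string.

My attempt: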

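import string

def find_max_letter_count(word):

    # One counter per lowercase ASCII letter, all starting at 0.
    # Any other character (uppercase, space, digit) raises KeyError below.
    alphabet = string.ascii_lowercase
    dictionary = {}

    for letters in alphabet:
        dictionary[letters] = 0

    for letters in word:
        dictionary[letters] += 1

    # Turn the dict into (letter, count) pairs, highest count first.
    dictionary = sorted(dictionary.items(),
                        reverse=True,
                        key=lambda x: x[1])

    # Print the top pair, then keep printing while the next pair has the
    # same count, so that tied letters are all shown.
    for position in range(0, 26):
        print(dictionary[position])
        if position != len(dictionary) - 1:
            if dictionary[position + 1][1] < dictionary[position][1]:
                break

find_max_letter_count("helloworld")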

Output:

>>> 
('l', 3)

Updated example:

find_max_letter_count("balloon") 
>>>
('l', 2)
('o', 2)
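
A minimal sketch (Python 3.7+; the name find_max_letters is hypothetical) that keeps the 26-counter idea from the attempt above but returns every tied (letter, count) pair instead of printing them. It assumes case should be ignored and non-letters skipped, whereas the original raises KeyError on them:

```python
import string

def find_max_letters(word):
    """Return every (letter, count) pair tied for the highest count.

    Assumption: case is ignored and non-letters (spaces, digits,
    punctuation) are skipped instead of raising KeyError.
    """
    # One counter per lowercase ASCII letter, as in the original attempt.
    counts = dict.fromkeys(string.ascii_lowercase, 0)
    for ch in word.lower():
        if ch in counts:
            counts[ch] += 1

    highest = max(counts.values())
    if highest == 0:
        return []  # the input contained no letters at all
    # Pairs come out alphabetically, because counts was built from ascii_lowercase.
    return [(letter, n) for letter, n in counts.items() if n == highest]

print(find_max_letters("balloon"))  # [('l', 2), ('o', 2)]
```

Finding the maximum first and then filtering avoids the index juggling of the print loop.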

【Question discussion】:

Tags: python algorithm optimization time-complexity


【Answer 1】:

There are many ways to do this more concisely. For example, you can use the Counter class (in Python 2.7 or later):

import collections
s = "helloworld"
print(collections.Counter(s).most_common(1)[0])

If you don't have that, you can do the tally manually (2.5 or later has defaultdict):

d = collections.defaultdict(int)
for c in s:
    d[c] += 1
print(sorted(d.items(), key=lambda x: x[1], reverse=True)[0])

That said, there's nothing too terribly wrong with your implementation.
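
In the manual version, sorting every (char, count) pair only to read element [0] does more work than needed. A minimal sketch using max() with the same key instead (the name most_common_char is hypothetical); like the original, it returns a single pair when there is a tie, and it raises ValueError on an empty string:

```python
import collections

def most_common_char(s):
    """Return one (char, count) pair with the highest count.

    max() makes a single pass over the k distinct characters, instead of
    sorting all of them (O(k log k)) just to take the first one.
    """
    d = collections.defaultdict(int)
    for c in s:
        d[c] += 1
    return max(d.items(), key=lambda x: x[1])

print(most_common_char("helloworld"))  # ('l', 3)
```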

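【Discussion】:

• Thanks for the answer (you too, Chris Morgan), but I guess I forgot to mention that if several characters are tied for most frequent, they should all be output (e.g. 'abcdefg' outputs a = 1, b = 1, etc.). I thought this was the trickiest part, hence the mess at the end. I've edited the question.

【Answer 2】: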

If you're using Python 2.7, you can do this quickly with the collections module, which provides high-performance container data types. Read more at http://docs.python.org/library/collections.html#counter-objects

>>> from collections import Counter
>>> x = Counter("balloon")
>>> x
Counter({'o': 2, 'a': 1, 'b': 1, 'l': 2, 'n': 1})
>>> x['o']
2
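
Two more Counter behaviours that matter for this problem, shown as a small sketch (it assumes Python 3.7+, where elements with equal counts keep the order in which they were first seen):

```python
from collections import Counter

x = Counter("balloon")

# Unlike a plain dict, a missing key reads as 0 instead of raising KeyError
# (and reading it does not add the key).
print(x['z'])              # 0

# most_common(n) returns the n highest (element, count) pairs, highest first.
print(x.most_common(2))    # [('l', 2), ('o', 2)]
```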

【Discussion】:

【Answer 3】:

Here's a way to find the most common character using a dictionary:

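message = "hello world"
d = {}
letters = set(message)
for l in letters:
    # Map each count to a letter. Letters with equal counts overwrite
    # each other, so only one of them survives per count.
    d[message.count(l)] = l

highest = max(d)  # the largest count seen
print(d[highest], highest)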
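
Because the dictionary above is keyed by count, two letters with the same count overwrite each other. A sketch (the name letters_with_max_count is hypothetical) that keeps a list of letters per count instead, so ties survive; like the original, it fails on an empty string:

```python
def letters_with_max_count(message):
    """Return (sorted letters, count) for every letter tied for the top count."""
    by_count = {}
    for letter in set(message):
        # Append instead of overwriting, so tied letters are all kept.
        by_count.setdefault(message.count(letter), []).append(letter)
    highest = max(by_count)  # largest count key
    return sorted(by_count[highest]), highest

print(letters_with_max_count("hello world"))  # (['l'], 3)
print(letters_with_max_count("balloon"))      # (['l', 'o'], 2)
```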
    

【Discussion】:

【Answer 4】:

The way I did it doesn't use Python's counting helpers (such as Counter or str.count), only for loops and if statements.

def most_common_letter():
    string = str(input())
    letters = set(string)
    if " " in letters:         # If you want to count spaces too, remove this if-statement
        letters.remove(" ")
    max_count = 0
    freq_letter = []
    for letter in letters:
        # Count this letter by scanning the whole string.
        count = 0
        for char in string:
            if char == letter:
                count += 1
        if count == max_count:
            # Tied with the current best: keep both.
            freq_letter.append(letter)
        elif count > max_count:
            # New best: forget the previously collected letters.
            max_count = count
            freq_letter = [letter]
    return freq_letter, max_count
      

This ensures you get every letter/character that is most common, not just one. It also returns how often it occurs. Hope this helps :)
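
The nested loop above rescans the whole string once per distinct letter, so it does O(n·k) work for n characters and k distinct letters. A sketch in the same loops-and-ifs style that tallies in one pass instead (the name most_common_letters is hypothetical, and it takes the string as an argument rather than calling input()):

```python
def most_common_letters(text):
    """Return (letters, count) for every character tied for the top count.

    Tallies the string once into a dict instead of rescanning it for every
    distinct letter; spaces are skipped, as in the answer above.
    """
    counts = {}
    for char in text:
        if char == " ":
            continue
        if char in counts:
            counts[char] += 1
        else:
            counts[char] = 1

    max_count = 0
    freq_letter = []
    for letter in counts:  # dict order = first appearance (Python 3.7+)
        if counts[letter] > max_count:
            max_count = counts[letter]
            freq_letter = [letter]
        elif counts[letter] == max_count:
            freq_letter.append(letter)
    return freq_letter, max_count

print(most_common_letters("hello world"))  # (['l'], 3)
print(most_common_letters("balloon"))      # (['l', 'o'], 2)
```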

【Discussion】:

【Answer 5】:

If you want to get all the characters with the maximum count, you can do a variation on one of the two ideas proposed so far:

import heapq  # Helps finding the n largest counts
import collections

def find_max_counts(sequence):
    """
    Returns an iterator that produces the (element, count)s with the
    highest number of occurrences in the given sequence.

    In addition, the elements are sorted.
    """

    if len(sequence) == 0:
        return  # nothing to yield for an empty sequence

    counter = collections.defaultdict(int)
    for elmt in sequence:
        counter[elmt] += 1

    counts_heap = [
        (-count, elmt)  # The largest elmt counts are the smallest elmts
        for (elmt, count) in counter.items()]

    heapq.heapify(counts_heap)

    highest_count = counts_heap[0][0]

    # Pop in order of decreasing count (ties ordered by element) and stop at
    # the first lower count. `return` ends the generator; raising
    # StopIteration inside a generator is a RuntimeError on Python 3.7+ (PEP 479).
    while counts_heap:
        (opp_count, elmt) = heapq.heappop(counts_heap)
        if opp_count != highest_count:
            return
        yield (elmt, -opp_count)

for (letter, count) in find_max_counts('balloon'):
    print((letter, count))

for (word, count) in find_max_counts(['he', 'lkj', 'he', 'll', 'll']):
    print((word, count))
        

This produces, for example:

lebigot@weinberg /tmp % python count.py
('l', 2)
('o', 2)
('he', 2)
('ll', 2)
        

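This works with any sequence: words, but also ['hello', 'hello', 'bonjour'], for instance.

The heapq structure is very efficient at finding the smallest elements of a sequence without sorting it completely. On the other hand, since there are not that many letters in the alphabet, you can probably also run through the sorted list of counts until the maximum count is no longer found, without incurring any serious speed penalty.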
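
The simpler alternative from the last paragraph, walking a sorted list of counts, can be sketched with Counter.most_common(), which returns all pairs sorted by count when called without an argument. The name find_max_counts_sorted is hypothetical, and Python 3.7+ is assumed; note that ties come out in first-seen order rather than sorted, unlike the heap version:

```python
from collections import Counter

def find_max_counts_sorted(sequence):
    """Return all (element, count) pairs that share the highest count."""
    ranked = Counter(sequence).most_common()  # sorted by count, highest first
    result = []
    for elmt, count in ranked:
        if count != ranked[0][1]:
            break  # counts only decrease from here on
        result.append((elmt, count))
    return result

print(find_max_counts_sorted('balloon'))                        # [('l', 2), ('o', 2)]
print(find_max_counts_sorted(['he', 'lkj', 'he', 'll', 'll']))  # [('he', 2), ('ll', 2)]
```

For an empty sequence, ranked is empty, the loop never runs, and the result is [].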

【Discussion】:

【Answer 6】:

def most_frequent(text):
    frequencies = [(c, text.count(c)) for c in set(text)]
    return max(frequencies, key=lambda x: x[1])[0]

s = 'ABBCCCDDDD'
print(most_frequent(s))
          

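frequencies is a list of (character, count) tuples. We apply max to the tuples, using the count as the key, and return that tuple's character. If there is a tie, this solution picks only one of the tied characters.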
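
Which character wins a tie depends on the iteration order of set(text), and because string hashing is randomized per process in Python 3, that can change between runs. A sketch (the name most_frequent_first is hypothetical) that makes the tie-break deterministic by iterating in order of first appearance:

```python
def most_frequent_first(text):
    """Return the most frequent character; ties go to the one seen first.

    dict.fromkeys(text) keeps the distinct characters in first-appearance
    order (Python 3.7+), and max() returns the first maximal item it meets.
    """
    frequencies = [(c, text.count(c)) for c in dict.fromkeys(text)]
    return max(frequencies, key=lambda x: x[1])[0]

print(most_frequent_first('ABBCCCDDDD'))  # D
print(most_frequent_first('aabb'))        # a
```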

【Discussion】:

【Answer 7】:

Problem: find the character that occurs most often in the input string, and how many times it occurs.

Method 1:

a = "GiniGinaProtijayi"

d = {}
for ch in a:
    d[ch] = d.get(ch, 0) + 1  # tally each character

# Sort (char, count) pairs by count, highest first, and take the first one.
chh, max_count = sorted(d.items(), reverse=True, key=lambda item: item[1])[0]

print(chh)
print(max_count)
            

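Method 2:

a = "GiniGinaProtijayi"

max_count = 0
chh = ''
# One slot per character code; assumes every character has ord() < 256.
count = [0] * 256
for ch in a:
    count[ord(ch)] += 1
# Scan in string order, so a tie goes to the character that appears first.
for ch in a:
    if count[ord(ch)] > max_count:
        max_count = count[ord(ch)]
        chh = ch

print(chh)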
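
Method 2 indexes a 256-slot list by ord(ch), so a character such as 'Ω' (ord 937) raises IndexError. A sketch of the same two-pass idea with a dict in place of the list (the name most_frequent_char is hypothetical), which works for any character:

```python
def most_frequent_char(a):
    """Method 2 with a dict instead of a 256-slot list, so characters with
    ord() >= 256 do not raise IndexError."""
    count = {}
    for ch in a:
        count[ch] = count.get(ch, 0) + 1
    max_count = 0
    chh = ''
    for ch in a:  # string order: a tie goes to the first-seen character
        if count[ch] > max_count:
            max_count = count[ch]
            chh = ch
    return chh, max_count

print(most_frequent_char("GiniGinaProtijayi"))  # ('i', 5)
print(most_frequent_char("ΩΩa"))                # ('Ω', 2)
```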
            

Method 3:

import collections

a = "GiniGinaProtijayi"

aa = collections.Counter(a).most_common(1)[0]
print(aa)
            

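【Discussion】:

【Answer 8】:

I noticed that most of the answers return only one item even when several characters are tied for most common. For example, in "iii 444 yyy 999" there are equal numbers of spaces, i's, 4's, y's and 9's. The solution should return all of them, not just the letter i: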

from collections import Counter

sentence = "iii 444 yyy 999"
counts = Counter(sentence)

# most_common() is sorted by count, so its first tuple holds the largest count
largest_count: int = counts.most_common()[0][1]

# Keep every (character, count) tuple whose count equals the largest count
most_common_list: list = [(x, y)
                          for x, y in counts.items() if y == largest_count]

print(most_common_list)

# RETURNS
[('i', 3), (' ', 3), ('4', 3), ('y', 3), ('9', 3)]
              

【Discussion】:

【Answer 9】:

Here's a way to do it using a for loop and count():

w = input()
r = 0      # highest count seen so far
s = None   # character with that count (None for empty input)
for i in w:
    p = w.count(i)
    if p > r:
        r = p
        s = i
print(s)
                

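【Discussion】:

【Answer 10】:

Here are a few things I'd do:

• Use collections.defaultdict instead of the dict you initialize manually.
• Use built-in sorting and maximum functions like max instead of working it out yourself; it's easier.

Here's my final result: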

from collections import defaultdict

def find_max_letter_count(word):
    matches = defaultdict(int)  # makes the default value 0

    for char in word:
        matches[char] += 1

    # (char, count) pair with the highest count; a tie returns just one pair
    return max(matches.items(), key=lambda x: x[1])

find_max_letter_count('helloworld') == ('l', 3)
                  

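【Discussion】:

• Nitpick: letter would be more correct than letters, since the variable contains exactly one letter.
• @EOL: True; I didn't rename that variable from his. I think I would name it char myself, since it's not just a letter...

【Answer 11】:

If you can't use collections for any reason, I'd suggest the following implementation: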

s = input()
d = {}

# Iterate through the string: if a character is already in the dict,
# just increment its counter; otherwise start it at 1.
for ch in s:
    if ch in d:
        d[ch] += 1
    else:
        d[ch] = 1

# If the string is empty, the dict is empty too, so print a message saying
# so instead (the default= argument of max() needs Python 3.4+).
print(max(d, key=d.get, default='Empty string was given.'))
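
max(d, key=d.get) iterates over the dict's keys and ranks each key by d.get(key), that is, by its count; default= is only used when the iterable is empty. A sketch (the name most_frequent_or_none is hypothetical) that returns None in that case rather than a message string, which is easier for calling code to test:

```python
def most_frequent_or_none(s):
    """Return the most frequent character of s, or None if s is empty."""
    d = {}
    for ch in s:
        d[ch] = d.get(ch, 0) + 1
    # Without default=, max() on an empty dict raises ValueError.
    return max(d, key=d.get, default=None)

print(most_frequent_or_none("helloworld"))  # l
print(most_frequent_or_none(""))            # None

try:
    max({}, key={}.get)
except ValueError:
    print("max() of an empty iterable raises ValueError")
```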
                  

【Discussion】:

【Answer 12】:

sentence = "This is a great question made me wanna watch matrix again!"

char_frequency = {}

for char in sentence:
    if char == " ": #to skip spaces
        continue
    elif char in char_frequency:
        char_frequency[char] += 1
    else:
        char_frequency[char] = 1


char_frequency_sorted = sorted(
    char_frequency.items(), key=lambda ky: ky[1], reverse=True
)
print(char_frequency_sorted[0]) #output -->('a', 9)
                    

【Discussion】:

【Answer 13】:

from collections import Counter

# file: path of the file to read
# quant: number of most frequent characters you want
def frequent_letters(file, quant):
    with open(file) as f:   # closes the file when done
        text = f.read()
    return Counter(text).most_common(quant)
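
frequent_letters reads the whole file into memory before counting. For very large files, a sketch that reads fixed-size chunks and feeds each one to Counter.update() instead; the name frequent_letters_chunked, the chunk_size parameter and the UTF-8 encoding are assumptions for illustration:

```python
import os
import tempfile
from collections import Counter

def frequent_letters_chunked(path, quant, chunk_size=64 * 1024):
    """Like frequent_letters, but reads the file in fixed-size chunks so a
    very large file never has to fit in memory at once."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        # iter(callable, sentinel) calls f.read(chunk_size) until it returns ''
        for chunk in iter(lambda: f.read(chunk_size), ""):
            counts.update(chunk)  # adds the chunk's character counts
    return counts.most_common(quant)

# Usage with a throwaway file:
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("hello world")
print(frequent_letters_chunked(tmp.name, 2, chunk_size=4))  # [('l', 3), ('o', 2)]
os.remove(tmp.name)
```

In text mode, read(chunk_size) returns characters rather than bytes, so a multi-byte character is never split between chunks.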
                      

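【Discussion】:

• Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you've made. Specifically, where does Counter come from?
• Counter has to be imported with the command 'from collections import Counter'.
• Please edit your answer to show the additional information, rather than writing it as a comment. Comments can disappear without a trace, so it really needs to be part of your answer. Thanks.

【Answer 14】: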
# This code returns all characters in a string that have the highest frequency

def find(text):
    # Build a [count, character] list for each unique character, then sort
    # the lists in ascending order (by count, then by character).
    # e.g. for "pradeep": y = [[1, 'a'], [1, 'd'], [1, 'r'], [2, 'e'], [2, 'p']]
    y = sorted([[text.count(i), i] for i in set(text)])

    # The last list holds the highest count, i.e. most_freq = 2 for "pradeep"
    most_freq = y[-1][0]

    x = []

    for j in range(len(y)):
        # If the count in this list equals the highest count, append it,
        # so every character with the highest frequency ends up in x.
        # e.g. for "pradeep": x = [[2, 'e'], [2, 'p']]
        if y[j][0] == most_freq:
            x.append(y[j])
    return x

find("pradeep")
                      

【Discussion】:

• Could you add some explanation to this code, and explain where it is better or worse than the other solutions?