How to write a method that finds the most common letter in a string?

【Question title】: How to write a method that finds the most common letter in a string?
【Posted】: 2015-10-13 03:26:32
【Question】:

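Here is the problem prompt:

Write a method that takes in a string. Your method should return the most common letter in the array, and a count of how many times it appears.

So far, I'm not entirely sure where to go from here.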

def most_common_letter(string)
  arr1 = string.chars
  arr2 = arr1.max_by(&:count)
end

【Question comments】:

Tags: ruby string methods

【Solution 1】:

A simple way to do it, without worrying about checking for empty letters:

letter, count = ('a'..'z')
                .map {|letter| [letter, string.count(letter)] }
                .max_by(&:last)
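
For illustration, here is a minimal sketch of the same idea wrapped in a method. The method name `most_common_letter_by_count` is hypothetical, and the `downcase` call is an added assumption so that "A" and "a" are counted together; the snippet above does not do that.

```ruby
# Hypothetical wrapper around the snippet above.
# Assumption: upper- and lowercase letters should be counted together.
def most_common_letter_by_count(string)
  lowered = string.downcase
  ('a'..'z')
    .map { |letter| [letter, lowered.count(letter)] } # one String#count pass per letter
    .max_by(&:last)                                   # the pair with the largest count
end

letter, count = most_common_letter_by_count("Hello, World")
puts "#{letter}: #{count}" # prints "l: 3"
```

Because `String#count` is called once for each of the 26 letters, this scans the string 26 times. That keeps the code short but is not the most economical approach for long inputs.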

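【Comments】:

It works, but how can I get the number of times the letter appears?
Oh, that was fast!
@Kirill, no, it is slow. It passes over the whole string 26 times when only one pass is needed (surely much slower than your solution).
Yes, it is a slower but straightforward solution.
That's odd. I did a simple benchmark, and your solution is faster than mine.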

【Solution 2】:

def most_common_letter(string)
  string.downcase.split('').group_by(&:itself).map { |k, v| [k, v.size] }.max_by(&:last)
end

Edit:
Using a hash:

def most_common_letter(string)
  chars             = {}
  most_common       = nil
  most_common_count = 0
  string.downcase.gsub(/[^a-z]/, '').each_char do |c|
    count = (chars[c] = (chars[c] || 0) + 1)
    if count > most_common_count
      most_common       = c
      most_common_count = count
    end
  end
  [most_common, most_common_count]
end

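【Comments】:

Nice answer, once you fix one small problem: most_common_letter(" a ") #=> [" ", 21] (most of my spaces were squeezed out; as you can see, there are 21 of them). And of course it is not just spaces: most_common_letter("She--Ann--is very cool") #=> ["-", 4].
@CarySwoveland, that's odd. I get most_common_letter(" a ") #=> [" ", 2]. And what is wrong with most_common_letter("She--Ann--is very cool") #=> ["-", 4]?
Also, how can I get rid of the res variable?
The characters " " and "-" are not letters. Letters are "a"-"z" and "A"-"Z". See my answer for an idea. You can eliminate res with str.split('').group_by(&:itself).map { |k,v| [k,v.size] }.max_by(&:last).
You get most_common_letter(" a ") #=> [" ", 2] because you are copying what I had in my comment. My string had 10 spaces, followed by an "a", then 11 spaces. However, when formatting text in comments, SO removes the extra spaces, so it looks as if my string has only two spaces.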

【Solution 3】:

I suggest you use a counting hash:

str = "The quick brown dog jumped over the lazy fox."

str.downcase.gsub(/[^a-z]/,'').
             each_char.
             with_object(Hash.new(0)) { |c,h| h[c] += 1 }.
             max_by(&:last)
   #=> ["e",4]

Hash::new with an argument of 0 creates an empty hash whose default value is 0.
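
A small sketch of how that default behaves, added for illustration (the variable names `h` and `plain` are arbitrary):

```ruby
h = Hash.new(0)      # empty hash whose default value is 0

h["t"]               #=> 0 (missing key, so the default is returned)
h.key?("t")          #=> false (reading the default does not store the key)

h["t"] += 1          # same as h["t"] = h["t"] + 1, i.e. 0 + 1
h                    #=> {"t"=>1}

plain = {}           # for contrast: a hash with no default returns nil
plain["t"]           #=> nil, so plain["t"] += 1 would raise NoMethodError
```

This is why the block can increment a count without first checking whether the letter has been seen.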

The steps:

s = str.downcase.gsub(/[^a-z]/,'')
  #=> "thequickbrowndogjumpedoverthelazyfox"
enum0 = s.each_char
  #=> #<Enumerator: "thequickbrowndogjumpedoverthelazyfox":each_char>  
enum1 = enum0.with_object(Hash.new(0))
  #=> #<Enumerator: #<Enumerator:
  #    "thequickbrowndogjumpedoverthelazyfox":each_char>:with_object({})>

You can think of enum1 as a "compound" enumerator. (Study the return value above.)

Let's look at the elements of enum1:

enum1.to_a
  #=> [["t", {}], ["h", {}], ["e", {}], ["q", {}],..., ["x", {}]]

The first element of enum1 (["t", {}]) is passed to the block by String#each_char and assigned to the block variables:

c,h = enum1.next
  #=> ["t", {}] 
c #=> "t" 
h #=> {}

The block calculation is then performed:

h[c] += 1
  #=> h[c] = h[c] + 1
  #=> h["t"] = h["t"] + 1
  #=> h["t"] = 0 + 1 #=> 1
h #=> {"t"=>1}

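Ruby expands h[c] += 1 to h[c] = h[c] + 1, which is h["t"] = h["t"] + 1. Since h #=> {}, h has no key "t", so h["t"] on the right side of the equals sign is replaced by the hash's default value, 0. The next time c #=> "t", h["t"] = h["t"] + 1 will reduce to h["t"] = 1 + 1 #=> 2 (i.e., the default value will not be used, as h now has a key "t").

The next value of enum1 is then passed to the block and the block calculation is performed: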

c,h = enum1.next
  #=> ["h", {"t"=>1}] 
h[c] += 1
  #=> 1 
h #=> {"t"=>1, "h"=>1}

The remaining elements of enum1 are processed similarly.
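
Putting the steps together, a minimal sketch as a method might look like the following. Wrapping the chain in a method named `most_common_letter` and returning nil for input with no letters are assumptions made for illustration; the nil simply falls out of calling max_by on an empty hash.

```ruby
# Sketch only: counts letters case-insensitively with a counting hash.
def most_common_letter(str)
  counts = str.downcase.gsub(/[^a-z]/, '')   # keep only the letters a-z
              .each_char
              .with_object(Hash.new(0)) { |c, h| h[c] += 1 }
  counts.max_by(&:last)                      # nil when str contains no letters
end

most_common_letter("Hello, World") #=> ["l", 3]
most_common_letter("  --  ")       #=> nil
```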

【Comments】:

Very detailed answer, +1 :-)

【Solution 4】:

Here is another way of doing what you want:

str = 'aaaabbbbcd'
h = str.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 }
max = h.values.max
output_hash = h.select { |_k, v| v == max }
puts "most_frequent_value: #{max}"
puts "most frequent character(s): #{output_hash.keys}"
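
As a sketch, the same approach can be packaged as a method that returns every character tied for the highest count. The name `most_frequent_chars` and the `[characters, count]` return shape are hypothetical choices for illustration.

```ruby
# Sketch only: returns all characters that share the maximum count.
def most_frequent_chars(str)
  counts = str.each_char.with_object(Hash.new(0)) { |c, h| h[c] += 1 }
  max = counts.values.max                          # highest count seen
  [counts.select { |_, v| v == max }.keys, max]    # every char with that count
end

most_frequent_chars('aaaabbbbcd') #=> [["a", "b"], 4]
```

Like the code above, this counts every character, not only letters.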

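【Comments】:

Sorting is an extremely inefficient way to find the maximum or minimum of a function defined on a collection.
@CarySwoveland I have updated my answer. What do you think now? Please let me know!
Much better. Also, it is the only solution that returns all values having the maximum count when there is a tie (though that was not asked for). There are a few things you could do to tighten it up and make it more Ruby-like. For example, you could remove the second line and write str.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 } (which returns h). Note that I used each_char (which returns an enumerator) rather than chars to avoid building a temporary array.
Thanks for the suggestions. I have updated my answer again.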

【Solution 5】:

char, count = string.split('').                  
                     group_by(&:downcase).
                     map { |k, v| [k, v.size] }.
                     max_by { |_, v| v }

【Comments】:

【Solution 6】:

I'd like to mention a solution using Enumerable#tally, which was introduced in Ruby 2.7.0:

str =<<-END
Tallies the collection, i.e., counts the occurrences of each element. Returns a hash with the elements of the collection as keys and the corresponding counts as values.
END

str.scan(/[a-z]/).tally.max_by(&:last)
#=> ["e", 22]

where:

str.scan(/[a-z]/).tally
#=> {"a"=>8, "l"=>9, "i"=>6, "e"=>22, "s"=>12, "t"=>13, "h"=>9, "c"=>11, "o"=>11, "n"=>11, "u"=>5, "r"=>5, "f"=>2, "m"=>2, "w"=>1, "k"=>1, "y"=>1, "d"=>2, "p"=>1, "g"=>1, "v"=>1}
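
Note that /[a-z]/ matches only lowercase letters, so capitals such as the "T" in "Tallies" are not counted above. A minimal sketch of a case-insensitive variant, assuming Ruby 2.7 or later (the method name `most_common_letter_tally` is hypothetical):

```ruby
# Sketch only: requires Ruby 2.7+ for Enumerable#tally.
def most_common_letter_tally(str)
  str.downcase          # fold case so "T" and "t" count together
     .scan(/[a-z]/)     # array of the individual letters
     .tally             # hash of letter => count
     .max_by(&:last)    # pair with the largest count; nil if there are no letters
end

most_common_letter_tally("Tallies the total") #=> ["t", 4]
```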

【Comments】: