【Question title】: Ruby - sort array of hashes values (string) based on array order
【Posted】: 2018-10-07 01:11:15
【Question】:

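I have an array of hashes in the format shown below, and I am trying to sort it by the :book key of each hash, following the order given in a separate array. The order is not alphabetical and, for my use case, it cannot be alphabetical.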

I need to sort according to the following array:

array = ['Matthew', 'Mark', 'Acts', '1John']

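Note that I have seen several solutions that use Array#index to perform a custom sort (for example, Sorting an Array of hashes based on an Array of sorted values), but they do not seem to work with strings.

I have tried various combinations of Array#sort and Array#sort_by, but they do not seem to accept a custom order. What am I missing? Thanks in advance for your help!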

Array of hashes:

[{:book=>"Matthew",
  :chapter=>"4",
  :section=>"new_testament"},
 {:book=>"Matthew",
  :chapter=>"22",
  :section=>"new_testament"},
 {:book=>"Mark",
  :chapter=>"6",
  :section=>"new_testament"},
 {:book=>"1John",
  :chapter=>"1",
  :section=>"new_testament"},
 {:book=>"1John",
  :chapter=>"1",
  :section=>"new_testament"},
 {:book=>"Acts",
  :chapter=>"9",
  :section=>"new_testament"},
 {:book=>"Acts",
  :chapter=>"17",
  :section=>"new_testament"}]

【Comments】:

    Tags: arrays ruby


    【Solution 1】:

    Here is an example:

    arr = [{a: 1}, {a: 3}, {a: 2}] 
    
    order = [2,1,3]  
    
    arr.sort { |a,b| order.index(a[:a]) <=> order.index(b[:a]) }                                           
    # => [{:a=>2}, {:a=>1}, {:a=>3}]  
    

    In your case it would be:

    order = ['Matthew', 'Mark', 'Acts', '1John']
    result = list_of_hashes.sort do |a,b|
      order.index(a[:book]) <=> order.index(b[:book])
    end
    

    There are two important concepts here:

    1. Array#index finds the position of an element in an array.
    2. The "spaceship operator" <=> is what Array#sort relies on to compare elements - see What is the Ruby <=> (spaceship) operator?
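
As a small illustration of the two concepts above, here is a sketch that uses the `order` array from this answer. The variable names `acts_pos`, `missing` and `cmp` are only for demonstration, and 'Luke' is a made-up lookup to show what happens for a value that is not in the list.

```ruby
order = ['Matthew', 'Mark', 'Acts', '1John']

# Array#index returns the position of the first element equal to its
# argument, so it works for strings just as it does for numbers.
acts_pos = order.index('Acts')   # => 2
missing  = order.index('Luke')   # => nil (not in the list)

# <=> on two integers returns -1, 0 or 1, which is the value the block
# given to Array#sort is expected to return.
cmp = order.index('Mark') <=> order.index('1John')   # => -1
```

Note that a nil position cannot be compared with an integer, so every value being sorted has to appear in `order` for this approach to work.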

    You can speed this up a little by indexing the list of elements that defines the order:

    order_with_index = order.each.with_index.with_object({}) do |(elem, idx), memo|
      memo[elem] = idx
    end
    

    Then use order_with_index[<name>] instead of order.index(<name>).
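
Put together, the lookup-hash variant might look like the sketch below. It builds the hash with `each_with_index.to_h` (one of several equivalent ways), compares on the :book key from the question, and uses a shortened `list_of_hashes` that is assumed purely for illustration.

```ruby
order = ['Matthew', 'Mark', 'Acts', '1John']

# Map each book name to its position once, up front.
order_with_index = order.each_with_index.to_h
# => {"Matthew"=>0, "Mark"=>1, "Acts"=>2, "1John"=>3}

# Shortened sample data (assumed for illustration).
list_of_hashes = [
  { book: '1John',   chapter: '1' },
  { book: 'Acts',    chapter: '9' },
  { book: 'Matthew', chapter: '4' },
  { book: 'Mark',    chapter: '6' }
]

# Each comparison is now two hash lookups instead of two array scans.
result = list_of_hashes.sort do |a, b|
  order_with_index[a[:book]] <=> order_with_index[b[:book]]
end

sorted_books = result.map { |h| h[:book] }
# => ["Matthew", "Mark", "Acts", "1John"]
```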

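    【Discussion】:

    • Ah, this is better than my solution, because sort_by is expensive.
    • Max, thank you so much for the quick reply! I will try this. I did not think index could be used to compare the order of strings. This is very elegant; thank you for your time.
    • @ChaitanyaKale, sort_by is cheap, not expensive. That is because the construction of the hash that maps elements to the sort criterion is done only once, before the sort operation. By contrast, <=> has to compute two indices for every pairwise comparison, which is much slower than performing two hash lookups.
    • Ah, thanks @CarySwoveland! I read that ruby-doc.org/core-2.2.0/Enumerable.html#method-i-sort_by mentions it being expensive: "The current implementation of sort_by generates an array of tuples containing the original collection element and the mapped value. This makes sort_by fairly expensive when the keysets are simple."
    • @ChaitanyaKale, I am surprised that sort_by uses two-element arrays. I just assumed it would be a hash, for fast lookup. Even so, sort_by is much faster than sort in many situations.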
    【Solution 2】:

    As the documentation shows, Array#index does work with strings (strings are even used in the example given there), so this will work:

    books.sort_by { |b| array.index(b[:book]) }
    

    But instead of searching array repeatedly, you can determine the order once and then look it up:

    order = array.each.with_index.to_h
    #=> { "Matthew" => 0, "Mark" => 1, "Acts" => 2, "1John" => 3 }
    books.sort_by { |b| order[b[:book]] }
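
One thing the hash lookup does not cover by itself is a book that is missing from `array`, or the relative order of entries that share a book. The sketch below is one possible way to handle both, not part of the original answer: the 'Luke' entry and the shortened `books` data are assumptions made for illustration.

```ruby
array = ['Matthew', 'Mark', 'Acts', '1John']
order = array.each.with_index.to_h

# Shortened sample data (assumed for illustration).
books = [
  { book: 'Acts',    chapter: '9'  },
  { book: 'Luke',    chapter: '2'  },  # hypothetical entry, not present in `array`
  { book: 'Matthew', chapter: '4'  },
  { book: 'Acts',    chapter: '17' },
  { book: 'Matthew', chapter: '22' }
]

# order[...] would be nil for 'Luke', and sort_by cannot compare nil with
# an Integer, so fall back to a rank past the end of the list.
# The original position i acts as a tie-breaker that keeps entries of the
# same book in input order (sort_by is not guaranteed to be stable).
sorted = books.each_with_index.sort_by do |b, i|
  [order.fetch(b[:book], array.size), i]
end.map(&:first)

summary = sorted.map { |b| [b[:book], b[:chapter]] }
# => [["Matthew", "4"], ["Matthew", "22"], ["Acts", "9"], ["Acts", "17"], ["Luke", "2"]]
```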
    

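    【Discussion】:

    • I somehow missed your answer earlier. I expected the hash to speed up sort_by considerably, but my benchmark shows that it does not.
    • The array is probably too small to make a noticeable difference. For larger arrays, though, I would certainly use this approach.
    • Ah, I missed the update with all the BM results above. It seems that even for large arrays the difference between my solution and yours is negligible. Perhaps sort_by's implementation with two-element arrays cancels out the benefit of building an external order hash, so I would say one should use whichever code makes the most sense semantically. Thanks for the benchmark!
    • Sorry Michael, I am only seeing this now. Thank you very much for taking the time to put this together. This is definitely a more readable solution, and I agree with it. It was good to learn how sorting works under the hood, but I will definitely go this route in the future.
    • @KurtW No worries, it seems you learned a lot from this answer, and that is what SO is all about.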
    【Solution 3】:

    Since you know the desired order, there is no need to sort the array. Here is one way to do it. (I have called your array of hashes bible.)

    bible.group_by { |h| h[:book] }.values_at(*array).flatten
      #=> [{:book=>"Matthew", :chapter=>"4", :section=>"new_testament"},
      #    {:book=>"Matthew", :chapter=>"22", :section=>"new_testament"},
      #    {:book=>"Mark", :chapter=>"6", :section=>"new_testament"},
      #    {:book=>"Acts", :chapter=>"9", :section=>"new_testament"},
      #    {:book=>"Acts", :chapter=>"17", :section=>"new_testament"},
      #    {:book=>"1John", :chapter=>"1", :section=>"new_testament"},
      #    {:book=>"1John", :chapter=>"1", :section=>"new_testament"}] 
    

    Since Enumerable#group_by, Hash#values_at and Array#flatten each need only one pass through the array bible, this may be faster than sorting when bible is large.
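
One edge case may be worth keeping in mind with this approach: Hash#values_at returns nil for a key that has no group, and flatten leaves that nil in the result. The sketch below shows the effect and one possible workaround using flat_map with fetch. The sample data, which deliberately has no 'Mark' entries, is assumed for illustration.

```ruby
array = ['Matthew', 'Mark', 'Acts', '1John']

# Sample data with no 'Mark' entries (assumed for illustration).
bible = [
  { book: '1John',   chapter: '1'  },
  { book: 'Acts',    chapter: '9'  },
  { book: 'Matthew', chapter: '4'  },
  { book: 'Acts',    chapter: '17' }
]

groups = bible.group_by { |h| h[:book] }

# values_at yields nil for the absent 'Mark' key, and flatten keeps it.
with_nil = groups.values_at(*array).flatten

# Fetching each group with an empty-array default avoids the nil.
ordered = array.flat_map { |name| groups.fetch(name, []) }

chapters = ordered.map { |h| [h[:book], h[:chapter]] }
# => [["Matthew", "4"], ["Acts", "9"], ["Acts", "17"], ["1John", "1"]]
```

Calling compact on the flattened array would be another way to drop the nil.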

    Here are the steps.

    h = bible.group_by { |h| h[:book] }
      #=> {"Matthew"=>[{:book=>"Matthew", :chapter=>"4", :section=>"new_testament"},
      #                {:book=>"Matthew", :chapter=>"22", :section=>"new_testament"}],
      #    "Mark"   =>[{:book=>"Mark", :chapter=>"6", :section=>"new_testament"}],
      #    "1John"  =>[{:book=>"1John", :chapter=>"1", :section=>"new_testament"},
      #                {:book=>"1John", :chapter=>"1", :section=>"new_testament"}],
      #    "Acts"   =>[{:book=>"Acts", :chapter=>"9", :section=>"new_testament"}, 
      #                {:book=>"Acts", :chapter=>"17", :section=>"new_testament"}]
      #   } 
    
    a = h.values_at(*array)
      #=> h.values_at('Matthew', 'Mark', 'Acts', '1John')
      #=> [[{:book=>"Matthew", :chapter=>"4", :section=>"new_testament"},
      #     {:book=>"Matthew", :chapter=>"22", :section=>"new_testament"}],
      #    [{:book=>"Mark", :chapter=>"6", :section=>"new_testament"}],
      #    [{:book=>"Acts", :chapter=>"9", :section=>"new_testament"},
      #     {:book=>"Acts", :chapter=>"17", :section=>"new_testament"}],
      #    [{:book=>"1John", :chapter=>"1", :section=>"new_testament"},
      #     {:book=>"1John", :chapter=>"1", :section=>"new_testament"}]] 
    

    Finally, a.flatten returns the array shown earlier.

    Let's do a benchmark.

    require 'fruity'
    
    @bible = [
      {:book=>"Matthew",
       :chapter=>"4",
       :section=>"new_testament"},
      {:book=>"Matthew",
       :chapter=>"22",
       :section=>"new_testament"},
      {:book=>"Mark",
       :chapter=>"6",
       :section=>"new_testament"},
      {:book=>"1John",
       :chapter=>"1",
       :section=>"new_testament"},
      {:book=>"1John",
       :chapter=>"1",
       :section=>"new_testament"},
      {:book=>"Acts",
       :chapter=>"9",
       :section=>"new_testament"},
      {:book=>"Acts",
       :chapter=>"17",
       :section=>"new_testament"}]
    
    @order = ['Matthew', 'Mark', 'Acts', '1John']
    

    def bench_em(n)
      arr = (@bible*((n/@bible.size.to_f).ceil))[0,n].shuffle
      puts "arr contains #{n} elements"
      compare do 
        _sort       { arr.sort { |h1,h2| @order.index(h1[:book]) <=>
                      @order.index(h2[:book]) }.size }
        _sort_by    { arr.sort_by { |h| @order.find_index(h[:book]) }.size }
        _sort_by_with_hash {ord=@order.each.with_index.to_h;
                            arr.sort_by {|b| ord[b[:book]]}.size}    
        _values_at  { arr.group_by { |h| h[:book] }.values_at(*@order).flatten.size }
      end
    end
    

    @maxpleaner, @ChaitanyaKale and @Michael Kohl contributed _sort, _sort_by and _sort_by_with_hash, respectively.
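
For readers who do not have the fruity gem installed, a rough comparison can also be sketched with nothing but core Ruby. This is not the benchmark used in this answer: the synthetic data, the helper `time_it` and the size `n` are assumptions made for illustration, and a single timed run is far less reliable than fruity's repeated measurements.

```ruby
order = ['Matthew', 'Mark', 'Acts', '1John']

# Synthetic data: n hashes cycling through the four books, then shuffled.
n = 10_000
arr = Array.new(n) { |i| { book: order[i % order.size], chapter: i.to_s } }.shuffle

rank = order.each_with_index.to_h

candidates = {
  sort:              -> { arr.sort { |a, b| order.index(a[:book]) <=> order.index(b[:book]) } },
  sort_by:           -> { arr.sort_by { |h| order.index(h[:book]) } },
  sort_by_with_hash: -> { arr.sort_by { |h| rank[h[:book]] } },
  values_at:         -> { arr.group_by { |h| h[:book] }.values_at(*order).flatten }
}

# Sanity check: all four approaches should yield the same sequence of books.
book_sequences = candidates.transform_values { |f| f.call.map { |h| h[:book] } }

# Hypothetical helper: wall-clock seconds taken by the given block.
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

timings = candidates.transform_values { |f| time_it { f.call } }
timings.each { |name, secs| puts format('%-18s %.4fs', name, secs) }
```

The printed timings will vary by machine and Ruby version, so they are best read as a rough ordering rather than precise figures.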

    bench_em    100
    arr contains 100 elements
    Running each test 128 times. Test will take about 1 second.
    _sort_by is similar to _sort_by_with_hash
    _sort_by_with_hash is similar to _values_at
    _values_at is faster than _sort by 2x ± 1.0
    
    bench_em  1_000
    arr contains 1000 elements
    Running each test 16 times. Test will take about 1 second.
    _sort_by_with_hash is similar to _values_at
    _values_at is similar to _sort_by
    _sort_by is faster than _sort by 2x ± 0.1
    
    bench_em 10_000
    arr contains 10000 elements
    Running each test once. Test will take about 1 second.
    _values_at is faster than _sort_by_with_hash by 10.000000000000009% ± 10.0%
    _sort_by_with_hash is faster than _sort_by by 10.000000000000009% ± 10.0%
    _sort_by is faster than _sort by 2x ± 0.1
    
    bench_em 100_000
    arr contains 100000 elements
    Running each test once. Test will take about 3 seconds.
    _values_at is similar to _sort_by_with_hash
    _sort_by_with_hash is similar to _sort_by
    _sort_by is faster than _sort by 2x ± 0.1
    

    Here is a second run.

    bench_em    100
    arr contains 100 elements
    Running each test 128 times. Test will take about 1 second.
    _sort_by_with_hash is similar to _values_at
    _values_at is similar to _sort_by
    _sort_by is faster than _sort by 2x ± 0.1
    
    bench_em  1_000
    arr contains 1000 elements
    Running each test 8 times. Test will take about 1 second.
    _values_at is faster than _sort_by_with_hash by 10.000000000000009% ± 10.0%
    _sort_by_with_hash is similar to _sort_by
    _sort_by is faster than _sort by 2.2x ± 0.1
    
    bench_em 10_000
    arr contains 10000 elements
    Running each test once. Test will take about 1 second.
    _values_at is similar to _sort_by_with_hash
    _sort_by_with_hash is similar to _sort_by
    _sort_by is faster than _sort by 2x ± 1.0
    
    bench_em 100_000
    arr contains 100000 elements
    Running each test once. Test will take about 3 seconds.
    _sort_by_with_hash is similar to _values_at
    _values_at is similar to _sort_by
    _sort_by is faster than _sort by 2x ± 0.1
    

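    【Discussion】:

    • Cary, sorry I did not catch this yesterday. I am glad to have a solution I had not considered. This is really good analysis, and my @bible array (or whatever it ends up being called) is very large. Once I get this working, I will most likely compare performance and use your solution in my final code. Thanks!!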
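    【Solution 4】:

    As its description says, Array#sort_by accepts a block. The block should return the key to sort by for each element (it is the block of Array#sort that returns -1, 0 or +1 depending on the comparison between a and b). You can use find_index on array to produce that key.

    array_of_hashes.sort_by {|a| array.find_index(a[:book]) } should do the trick.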
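
To make the two block contracts concrete, here is a short sketch that runs the same ordering through sort_by and sort. In sort_by the block receives one element and returns its sort key, while in sort the block receives two elements and returns -1, 0 or +1. The shortened `array_of_hashes` is assumed for illustration.

```ruby
array = ['Matthew', 'Mark', 'Acts', '1John']

# Shortened sample data (assumed for illustration).
array_of_hashes = [
  { book: 'Acts',    chapter: '9' },
  { book: '1John',   chapter: '1' },
  { book: 'Matthew', chapter: '4' },
  { book: 'Mark',    chapter: '6' }
]

# sort_by: the block takes ONE element and returns its sort key.
by_key = array_of_hashes.sort_by { |a| array.find_index(a[:book]) }

# sort: the block takes TWO elements and returns -1, 0 or +1.
by_cmp = array_of_hashes.sort do |a, b|
  array.find_index(a[:book]) <=> array.find_index(b[:book])
end

same  = (by_key == by_cmp)            # => true
names = by_key.map { |h| h[:book] }   # => ["Matthew", "Mark", "Acts", "1John"]
```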

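    【Discussion】:

    • Thanks for taking the time, Chaitanya!
    【Solution 5】:

    Your mistake is thinking that you are sorting. In fact you are not: you already have the order, and you only need to place the elements. I am not proposing a compact or optimal solution, just a simple one. First convert your big array into a hash indexed by the :book key (which should have been your data structure in the first place), then simply use map: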

    array = ['Matthew', 'Mark', 'Acts', '1John']
    elements = [{:book=>"Matthew",
      :chapter=>"4",
      :section=>"new_testament"},
     {:book=>"Matthew",
      :chapter=>"22",
      :section=>"new_testament"},
     {:book=>"Mark",
      :chapter=>"6",
      :section=>"new_testament"},
     {:book=>"1John",
      :chapter=>"1",
      :section=>"new_testament"},
     {:book=>"1John",
      :chapter=>"1",
      :section=>"new_testament"},
     {:book=>"Acts",
      :chapter=>"9",
      :section=>"new_testament"},
     {:book=>"Acts",
      :chapter=>"17",
      :section=>"new_testament"}]
    by_name = {}
    for e in elements
      by_name[e[:book]] = e
    end
    final = array.map { |x| by_name[x] }
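
Because by_name holds a single value per key, a later entry for the same book replaces the earlier one. A possible variant that keeps every entry is sketched below. It stores an array per book and is an adaptation for illustration, not the answerer's code, and the sample data is shortened.

```ruby
array = ['Matthew', 'Mark', 'Acts', '1John']

# Shortened sample data (assumed for illustration).
elements = [
  { book: 'Matthew', chapter: '4'  },
  { book: 'Matthew', chapter: '22' },
  { book: '1John',   chapter: '1'  },
  { book: 'Acts',    chapter: '9'  },
  { book: 'Mark',    chapter: '6'  },
  { book: 'Acts',    chapter: '17' }
]

# Keep a list per book so a later entry does not overwrite an earlier one.
by_name = Hash.new { |hash, key| hash[key] = [] }
elements.each { |e| by_name[e[:book]] << e }

# Walk the desired order and concatenate each book's entries.
# fetch with a default avoids creating keys for books that have no entries.
final = array.flat_map { |x| by_name.fetch(x, []) }

chapters = final.map { |e| [e[:book], e[:chapter]] }
# => [["Matthew", "4"], ["Matthew", "22"], ["Mark", "6"],
#     ["Acts", "9"], ["Acts", "17"], ["1John", "1"]]
```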
    

    【Discussion】:

    • Hmm. I did not notice that there is more than one entry with the same name; forget this.