【Question title】: ruby convert array to hash preserve duplicate key
【Posted】: 2014-07-02 19:42:18
【Question】:

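I need to pull the output of git ls-remote into an array and then convert that array into a hash of the form {commit_hash => reference}. Sometimes two commit hashes are identical (but may have different references). So I get something like this: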

["19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/auto",
 "8f6f47c6e8023540b022586e368c68e1e814ce6d","refs/heads/callout_hooks",  
 "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8","refs/heads/elab",
 "d38a9a26ef887c08b306bdab210b39882f58e587","refs/heads/elab_6.1",
 "19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/master",
 "906dfe6eebff832baf0f92683d751432fcc98ab7","refs/heads/regression"]

which I want to convert to:

{"19d97e408ee3f993745b053e281ac9dc69519e06" => "refs/heads/auto"...}

But master and auto have the same commit hash, so one of them gets dropped in the conversion.
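To make the problem concrete, here is a minimal sketch of a naive conversion. The short keys "aaa" and "bbb" are hypothetical stand-ins for the commit hashes above, and Hash[] is only one of several ways to build the hash; the point is simply that a later pair with the same key silently replaces the earlier one.

```ruby
# Illustrative data only: "aaa" and "bbb" stand in for real commit hashes.
arr = ["aaa", "refs/heads/auto",
       "bbb", "refs/heads/elab",
       "aaa", "refs/heads/master"]

# Pair up the flat array and build a hash from the pairs. When a key repeats,
# the later value overwrites the earlier one without any warning.
naive = Hash[arr.each_slice(2).to_a]

naive       # => {"aaa"=>"refs/heads/master", "bbb"=>"refs/heads/elab"}
naive.size  # => 2, although the array held three pairs
```

Note that with this approach it is the first ref (auto) that is lost, not the last.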

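How can I either 1.) get a list of the values dropped in the conversion, or 2.) make the keys unique by appending a special character (such as *) to the key?

【Comments】:

  • Your array elements are not valid Ruby... are they an array of strings?

Tags: ruby arrays hash type-conversion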


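【Solution 1】:

You offered two options for what you want to do:

  • get a list of the values dropped in the conversion
  • make the keys unique by adding a special character to the key

I think the second is a bad idea, for two reasons: a) you would need a way of modifying keys that allows for more than one duplicate of the same key; and b) relating a modified key back to the original would be awkward. Besides, it would be ugly.

I see others have proposed a third possibility: changing the form of the resulting hash so that its values are arrays of strings. That may well suit you, but it is not what you asked for, so I have chosen to build a list of the values that were dropped; that is, every value for a key except the first.

Code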

def create_hash_and_save_extras(arr)
  arr.each_slice(2).with_object([{},[]]) { |(k,v),(h,ex)|
    h.update({k=>v}) { |k, ov, nv| ex << {k=>nv}; ov } }
end

Example

create_hash_and_save_extras(arr)
  #=> [{"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto",
  #     "8f6f47c6e8023540b022586e368c68e1e814ce6d"=>"refs/heads/callout_hooks",
  #     "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>"refs/heads/elab",
  #     "d38a9a26ef887c08b306bdab210b39882f58e587"=>"refs/heads/elab_6.1",
  #     "906dfe6eebff832baf0f92683d751432fcc98ab7"=>"refs/heads/regression"},
  #   [{"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/master"}]]

Explanation

Enumerable#each_slice, sent to arr, returns an enumerator:

enum1 = arr.each_slice(2)
  #=> #<Enumerator: [
  #      "19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto",
  #      "8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks",
  #      ...
  #      "906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"
  #   ]:each_slice(2)>

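Enumerator#with_object takes as its argument an array consisting of an initially empty hash (represented by the block variable h) and an initially empty array (represented by the block variable ex, for "extras"). It is sent to enum1 to create another enumerator, which you can think of as a "compound enumerator" (note the each_slice(2)>:with_object([{}, []]) in the output below).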

enum2 = enum1.with_object([{},[]])
  #=> #<Enumerator: #<Enumerator: [
  #      "19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto",
  #      "8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks",
  #      ...
  #      "906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"
  #   ]:each_slice(2)>:with_object([{}, []])>

We can convert enum2 to an array to see what it will pass to its block:

enum2.to_a
#=> [[["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto"],
#       [{}, []]],
#    [["8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks"],
#       [{}, []]],
#    [["3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8", "refs/heads/elab"],
#       [{}, []]],
#    [["d38a9a26ef887c08b306bdab210b39882f58e587", "refs/heads/elab_6.1"],
#       [{}, []]],
#    [["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/master"],
#       [{}, []]],
#    [["906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"],
#       [{}, []]]]

The first element enum2 passes into its block is

[["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto"], [{}, []]]

so the block variables are assigned as follows:

k  => "19d97e408ee3f993745b053e281ac9dc69519e06"
v  => "refs/heads/auto"
h  => {}
ex => []

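We now use Hash#update (a.k.a. Hash#merge!) to merge {k=>v} into h (h is initially empty). Thus

h.update({k=>v}) { |k, ov, nv| ex << {k=>nv}; ov }

becomes

h.update({"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto"})

followed by

{ |k, ov, nv| ex << {k=>nv}; ov }

But the block is only invoked when the hash being merged into (h) and the hash being merged (update's argument) share a key k, in which case ov and nv are the values associated with that key in h and in the hash being merged, respectively. The merged value for key k is whatever the block returns. The block does not run here, because h is empty, but it will when we encounter a duplicate.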
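The behaviour of the block given to Hash#update can be seen in isolation in the small sketch below. The keys, values, and the name dropped are made up for illustration and are not part of the answer's code.

```ruby
h = { "a" => 1 }
dropped = []

# "b" is not yet a key of h, so the block is not called; the pair is just added.
h.update({ "b" => 2 }) { |key, old_val, new_val| dropped << [key, new_val]; old_val }

# "a" is already a key of h, so the block runs: it records the incoming value
# and returns the old one, which becomes the merged value.
h.update({ "a" => 99 }) { |key, old_val, new_val| dropped << [key, new_val]; old_val }

h        # => {"a"=>1, "b"=>2}
dropped  # => [["a", 99]]
```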

So now

h #=> {"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto"}

We continue in this fashion for each of the other elements of enum2. When we reach

k = "19d97e408ee3f993745b053e281ac9dc69519e06"
v = "refs/heads/master"
h = {"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto",
      "8f6f47c6e8023540b022586e368c68e1e814ce6d"=>"refs/heads/callout_hooks",
      "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>"refs/heads/elab",
      "d38a9a26ef887c08b306bdab210b39882f58e587"=>"refs/heads/elab_6.1"}

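we find that k is already a key of the hash h being merged into, so the block is evaluated to determine the value for h[k]. We want to keep the current value h[k], which is ov, so that is what the block returns. First, however, we append the duplicate, expressed as a hash, to the (still empty) array ex: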

ex << {"19d97e408ee3f993745b053e281ac9dc69519e06" => "refs/heads/master"}
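Putting the pieces together, the following is a self-contained sketch of how the method's return value might be consumed. The method body follows the answer's code (reformatted with do...end); the shortened keys and the names kept and extras are assumptions made for illustration.

```ruby
def create_hash_and_save_extras(arr)
  arr.each_slice(2).with_object([{}, []]) do |(k, v), (h, ex)|
    # On a duplicate key, remember the incoming pair and keep the old value.
    h.update(k => v) { |key, ov, nv| ex << { key => nv }; ov }
  end
end

# Illustrative data only: "aaa" and "bbb" stand in for real commit hashes.
arr = ["aaa", "refs/heads/auto",
       "bbb", "refs/heads/elab",
       "aaa", "refs/heads/master"]

# with_object returns the two-element array, which can be destructured.
kept, extras = create_hash_and_save_extras(arr)

kept    # => {"aaa"=>"refs/heads/auto", "bbb"=>"refs/heads/elab"}
extras  # => [{"aaa"=>"refs/heads/master"}]

# One possible follow-up: list just the refs that were dropped.
extras.flat_map(&:values)  # => ["refs/heads/master"]
```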

【Comments】:

    【Solution 2】:

    I hope you will like this:

    ary = [
           "19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/auto",
           "8f6f47c6e8023540b022586e368c68e1e814ce6d","refs/heads/callout_hooks",  
           "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8","refs/heads/elab",
           "d38a9a26ef887c08b306bdab210b39882f58e587","refs/heads/elab_6.1",
           "19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/master",
           "906dfe6eebff832baf0f92683d751432fcc98ab7","refs/heads/regression"
         ]
    
    array_hash = ary.each_slice(2).with_object(Hash.new { |h,k| h[k] = []}) do |(k,v),hash|
      hash[k] << v 
    end
    
    # The main advantage here is that you don't lose any data; every ref is kept, and you can
    # use the result however you need. I think this is a better way to handle your situation.
    array_hash
    # => {"19d97e408ee3f993745b053e281ac9dc69519e06"=>
    #      ["refs/heads/auto", "refs/heads/master"],
    #     "8f6f47c6e8023540b022586e368c68e1e814ce6d"=>["refs/heads/callout_hooks"],
    #     "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>["refs/heads/elab"],
    #     "d38a9a26ef887c08b306bdab210b39882f58e587"=>["refs/heads/elab_6.1"],
    #     "906dfe6eebff832baf0f92683d751432fcc98ab7"=>["refs/heads/regression"]}
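The code above relies on Hash.new being given a block. As a hedged aside, the sketch below contrasts that form with Hash.new([]), which looks similar but behaves differently; the variable names and keys are invented for the comparison.

```ruby
# With a default block, each missing key gets its own fresh array, and the
# assignment h[k] = [] stores that array in the hash.
per_key = Hash.new { |h, k| h[k] = [] }
per_key["x"] << 1
per_key["y"] << 2
per_key  # => {"x"=>[1], "y"=>[2]}

# With a default object, every missing key shares one and the same array,
# and nothing is ever stored under the key.
shared = Hash.new([])
shared["x"] << 1
shared["y"] << 2
shared       # => {}
shared["z"]  # => [1, 2]
```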
    

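    【Comments】:

    • Arup, I believe with_object would suffice.
    • @CarySwoveland Yes, you are right. Since #each_slice returns an Enumerator, #with_object is enough. Thanks for the comment.
    【Solution 3】:

    If you build a hash of commit_hash => array of refs, you keep everything: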

    array = ["19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/auto",
     "8f6f47c6e8023540b022586e368c68e1e814ce6d","refs/heads/callout_hooks",  
     "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8","refs/heads/elab",
     "d38a9a26ef887c08b306bdab210b39882f58e587","refs/heads/elab_6.1",
     "19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/master",
     "906dfe6eebff832baf0f92683d751432fcc98ab7","refs/heads/regression"
    ]
    
    array.each_slice(2).reduce({}) { |h, (k, v)| (h[k] ||= []) << v; h }
    

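    Looks like Arup and I had the same idea...

    【Comments】:

    • Nick, perhaps use each_with_object rather than reduce, to get rid of that ugly ; h. (Recall that the order of the block variables is different.) Nice solution.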
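The suggestion in the comment might look like the sketch below, shown next to the reduce version for comparison. The shortened keys and the names via_reduce and via_each_with_object are illustrative only.

```ruby
# Illustrative data only: "aaa" and "bbb" stand in for real commit hashes.
array = ["aaa", "refs/heads/auto",
         "bbb", "refs/heads/elab",
         "aaa", "refs/heads/master"]

# reduce: the accumulator comes first, and the block must return it (hence "; h").
via_reduce = array.each_slice(2).reduce({}) { |h, (k, v)| (h[k] ||= []) << v; h }

# each_with_object: the element comes first and the accumulator second;
# the block's return value is ignored, so no trailing "; h" is needed.
via_each_with_object = array.each_slice(2).each_with_object({}) { |(k, v), h| (h[k] ||= []) << v }

via_each_with_object
# => {"aaa"=>["refs/heads/auto", "refs/heads/master"], "bbb"=>["refs/heads/elab"]}
via_reduce == via_each_with_object  # => true
```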