Delete duplicated elements in an array that's a value in a hash and its corresponding ids

【Question title】: Delete duplicated elements in an array that's a value in a hash and its corresponding ids
【Posted】: 2019-09-20 08:56:21
【Question description】:

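I have a hash in which one of the values is an array. How can I delete the duplicated elements in that array, together with their corresponding ids, in the most performant way?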

Here is an example of my hash:

hash = { 
  "id" => "sjfdkjfd",
  "name" => "Field Name",
  "type" => "field",
  "options" => ["Language", "Question", "Question", "Answer", "Answer"],
  "option_ids" => ["12345", "23456", "34567", "45678", "56789"]
}

My idea is something like this:

hash["options"].each_with_index { |value, index |
  h = {}
  if h.key?(value)
    delete(value)
    delete hash["option_ids"].delete_at(index)
  else 
    h[value] = index
  end
}

The result should be:

hash = { 
  "id" => "sjfdkjfd",
  "name" => "Field Name",
  "type" => "field",
  "options" => ["Language", "Question", "Answer"],
  "option_ids" => ["12345", "23456", "45678"]
}

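I know I have to take into account that when I delete values from options and option_ids, the indexes of the remaining values change. But I don't know how to handle that.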
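One way to sidestep the shifting indexes is to avoid deleting in place and instead collect the elements to keep into new arrays. The following is a minimal sketch of that idea, not the asker's code; the names seen, kept_options, kept_ids and deduped are introduced here purely for illustration, and it assumes that "options" and "option_ids" have the same length.

```ruby
require "set"

hash = {
  "id" => "sjfdkjfd",
  "name" => "Field Name",
  "type" => "field",
  "options" => ["Language", "Question", "Question", "Answer", "Answer"],
  "option_ids" => ["12345", "23456", "34567", "45678", "56789"]
}

seen = Set.new      # option values encountered so far (hypothetical helper)
kept_options = []   # first occurrence of each option
kept_ids = []       # id found at the same index as each kept option

hash["options"].each_with_index do |value, index|
  # Set#add? returns nil when the value is already in the set,
  # so later occurrences are skipped and nothing is deleted mid-iteration.
  next unless seen.add?(value)

  kept_options << value
  kept_ids << hash["option_ids"][index]
end

# Build a new hash and leave the original untouched.
deduped = hash.merge("options" => kept_options, "option_ids" => kept_ids)
```

Because nothing is removed from the arrays being iterated, the index passed to the block always refers to the original positions.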

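【Question discussion】:

Is there a reason why {"Language" => "12345", "Question" => "23456", "Answer" => "45678"} would not be desirable?
Yes, that would make more sense, but this is the problem as it was given to me.
What do you mean by "repeated elements"? Is 2 (as well as 1) a repeated element in [1,2,2,3,1]?
C., technically, in answering @engineersmnky's question I think you actually meant "no" (there is no reason). :-) You say, "The result should be...hash = {...". That's a bit confusing. If you had written hash #=> {..., that would mean you want to modify the existing hash hash. If you had written just {..., that would mean (unless you say otherwise) that you want to create a new hash and leave the existing hash unchanged. In questions, the general rule is that input objects are not modified (also called mutated) unless the asker explicitly states that they are to be modified.
@CarySwoveland Yes, "duplicated" would be better wording :) Thanks for your comments and help!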

Tags: ruby algorithm hash

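【Solution 1】:

My first thought was to zip the values and call uniq, then find a way to get back to the initial form (h below is the hash from the question):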

h['options'].zip(h['option_ids']).uniq(&:first).transpose
#=> [["Language", "Question", "Answer"], ["12345", "23456", "45678"]]

Then, using parallel assignment:

h['options'], h['option_ids'] = h['options'].zip(h['option_ids']).uniq(&:first).transpose

h #=> {"id"=>"sjfdkjfd", "name"=>"Field Name", "type"=>"field", "options"=>["Language", "Question", "Answer"], "option_ids"=>["12345", "23456", "45678"]}

These are the steps:

h['options'].zip(h['option_ids'])
#=> [["Language", "12345"], ["Question", "23456"], ["Question", "34567"], ["Answer", "45678"], ["Answer", "56789"]]

h['options'].zip(h['option_ids']).uniq(&:first)
#=> [["Language", "12345"], ["Question", "23456"], ["Answer", "45678"]]
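The steps above stop before the final transpose, and the parallel assignment shown earlier modifies h in place. As a hedged sketch, the same zip / uniq / transpose chain can be finished without mutating the input by merging the result into a new hash. The local names pairs, options, option_ids and result are introduced here for illustration, and the sketch assumes both arrays are non-empty and of equal length.

```ruby
h = {
  "id" => "sjfdkjfd",
  "name" => "Field Name",
  "type" => "field",
  "options" => ["Language", "Question", "Question", "Answer", "Answer"],
  "option_ids" => ["12345", "23456", "34567", "45678", "56789"]
}

# Pair each option with its id, keeping only the first pair per option.
pairs = h["options"].zip(h["option_ids"]).uniq(&:first)

# transpose turns the list of [option, id] pairs back into two parallel arrays.
options, option_ids = pairs.transpose

# Hash#merge returns a new hash, so h itself is left unchanged.
result = h.merge("options" => options, "option_ids" => option_ids)
```

Whether to mutate h or build a new hash depends on what the caller expects; merge is used here only because it keeps the original available for comparison.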

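【Discussion】:

Wow, from now on I really love zip :) Also, the transpose afterwards is really nice! Great explanation.
This is awesome. Thank you!
@CarySwoveland, thanks! I took the wise advice.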

【Solution 2】:

hash = { 
  "id" => "sjfdkjfd",
  "name" => "Field Name",
  "type" => "field",
  "options" => ["L", "Q", "Q", "Q", "A", "A", "Q"],
  "option_ids" => ["12345", "23456", "34567", "dog", "45678", "56789", "cat"]
}

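I have assumed that "duplicated elements" means consecutive equal elements (only 2 in [1,2,2,1]) rather than "repeated elements" (both 1 and 2 in that example). I also show how the code would change (it becomes simpler, in fact) if the second interpretation applies.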

idx = hash["options"].
  each_with_index.
  chunk_while { |(a,_),(b,_)| a==b }.
  map { |(_,i),*| i }
  #=> [0, 1, 4, 6]

hash.merge(
  ["options", "option_ids"].each_with_object({}) { |k,h| h[k] = hash[k].values_at(*idx) }
)
  #=> {"id"=>"sjfdkjfd",
  #    "name"=>"Field Name",
  #    "type"=>"field",
  #    "options"=>["L", "Q", "A", "Q"],
  #    "option_ids"=>["12345", "23456", "45678", "cat"]}

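If "duplicated elements" is interpreted to mean that the values of "options" and "option_ids" should contain only the first three elements shown above, compute idx as follows: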

idx = hash["options"].
  each_with_index.
  uniq { |s,_| s }.
  map(&:last)
  #=> [0, 1, 4]

See Enumerable#chunk_while (Enumerable#slice_when could be used instead) and Array#values_at. The steps are as follows.

a = hash["options"]
  #=> ["L", "Q", "Q", "Q", "A", "A", "Q"] 
e0 = a.each_with_index
  #=> #<Enumerator: ["L", "Q", "Q", "Q", "A", "A", "Q"]:each_with_index> 
e1 = e0.chunk_while { |(a,_),(b,_)| a==b }
  #=> #<Enumerator: #<Enumerator::Generator:0x000055e4bcf17740>:each>

We can see the values that the enumerator e1 will generate and pass to map by converting it to an array:

e1.to_a
  #=> [[["L", 0]],
  #    [["Q", 1], ["Q", 2], ["Q", 3]],
  #    [["A", 4], ["A", 5]], [["Q", 6]]]

Continuing,

idx = e1.map { |(_,i),*| i }
  #=> [0, 1, 4, 6] 

c = ["options", "option_ids"].
      each_with_object({}) { |k,h| h[k] = hash[k].values_at(*idx) } 
  #=> {"options"=>["L", "Q", "A", "Q"],
  #    "option_ids"=>["12345", "23456", "45678", "cat"]} 
hash.merge(c)
  #=> {"id"=>"sjfdkjfd",
  #    "name"=>"Field Name",
  #    "type"=>"field",
  #    "options"=>["L", "Q", "A", "Q"],
  #    "option_ids"=>["12345", "23456", "45678", "cat"]}
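The answer mentions that Enumerable#slice_when could be used in place of chunk_while. A possible sketch of that variant follows; it is an illustration rather than code from the answer. slice_when starts a new slice wherever its block returns true, so the condition is the negation of the one given to chunk_while. The name result is introduced here for illustration.

```ruby
hash = {
  "id" => "sjfdkjfd",
  "name" => "Field Name",
  "type" => "field",
  "options" => ["L", "Q", "Q", "Q", "A", "A", "Q"],
  "option_ids" => ["12345", "23456", "34567", "dog", "45678", "56789", "cat"]
}

idx = hash["options"].
  each_with_index.
  slice_when { |(a, _), (b, _)| a != b }.  # cut between unequal neighbours
  map { |(_, i), *| i }                    # index of the first pair in each run
  #=> [0, 1, 4, 6]

# Pick the same positions from both arrays and merge into a new hash.
result = hash.merge(
  ["options", "option_ids"].each_with_object({}) { |k, h| h[k] = hash[k].values_at(*idx) }
)
```

Both forms group runs of consecutive equal options; which one reads better is largely a matter of taste.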

【Discussion】:

【Solution 3】:

Using Array#transpose

hash = {
  "options" => ["Language", "Question", "Question", "Answer", "Answer"],
  "option_ids" => ["12345", "23456", "34567", "45678", "56789"]
}

hash.values.transpose.uniq(&:first).transpose.map.with_index {|v,i| [hash.keys[i], v]}.to_h
#=> {"options"=>["Language", "Question", "Answer"], "option_ids"=>["12345", "23456", "45678"]}

After the OP's edit:

hash = {
  "id" => "sjfdkjfd",
  "name" => "Field Name",
  "type" => "field",
  "options" => ["Language", "Question", "Question", "Answer", "Answer"],
  "option_ids" => ["12345", "23456", "34567", "45678", "56789"]
}

hash_array = hash.to_a.select {|v| v.last.is_a?(Array)}.transpose
hash.merge([hash_array.first].push(hash_array.last.transpose.uniq(&:first).transpose).transpose.to_h)
#=> {"id"=>"sjfdkjfd", "name"=>"Field Name", "type"=>"field", "options"=>["Language", "Question", "Answer"], "option_ids"=>["12345", "23456", "45678"]}
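The one-liner above packs several transposes together, so it may help to see the same idea with each intermediate value named. This is only a sketch under stated assumptions: every array-valued entry in the hash has the same length, and the first such entry ("options" here) is the one that decides which positions are duplicates. The names array_keys, columns and result are introduced here for illustration.

```ruby
hash = {
  "id" => "sjfdkjfd",
  "name" => "Field Name",
  "type" => "field",
  "options" => ["Language", "Question", "Question", "Answer", "Answer"],
  "option_ids" => ["12345", "23456", "34567", "45678", "56789"]
}

# Keys whose values are arrays, in insertion order.
array_keys = hash.select { |_, v| v.is_a?(Array) }.keys

# Rows of [option, id], deduplicated by option, then turned back into columns.
columns = hash.values_at(*array_keys).transpose.uniq(&:first).transpose

# Re-attach each column to its key and merge into a copy of the hash.
result = hash.merge(array_keys.zip(columns).to_h)
```

Naming the steps costs a few extra lines but makes it clearer which transpose undoes which.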

【Discussion】: