How to get a 'fair combination' from an array of n elements? Answers

【Question title】: How to get 'fair combination' from an array of n elements?
【Posted】: 2016-06-21 10:58:37
【Question】:

Using the combination method in Ruby,

[1, 2, 3, 4, 5, 6].combination(2).to_a
#=> [[1, 2], [1, 3], [1, 4], [1, 5], [1, 6], [2, 3],
#    [2, 4], [2, 5], [2, 6], [3, 4], [3, 5], [3, 6],
#    [4, 5], [4, 6], [5, 6]]

we can get a two-dimensional array with 15 (6C2) elements.

I would like to create a fair_combination method that returns an array like this:

arr = [[1, 2], [3, 5], [4, 6],
       [3, 4], [5, 1], [6, 2],
       [5, 6], [1, 3], [2, 4],
       [2, 3], [4, 5], [6, 1],
       [1, 4], [2, 5], [3, 6]]

so that every group of three sub-arrays (three being half of 6) contains all the given elements:

arr.each_slice(3).map { |a| a.flatten.sort }
#=> [[1, 2, 3, 4, 5, 6],
#    [1, 2, 3, 4, 5, 6],
#    [1, 2, 3, 4, 5, 6],
#    [1, 2, 3, 4, 5, 6],
#    [1, 2, 3, 4, 5, 6]]

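This makes it kind of "fair": as the array goes on, it keeps using elements that are as different as possible.

To be more general, it needs to satisfy the following:

(1) When you walk through the array from the beginning and count how many times each number has appeared, the counts should be as flat as possible at every point;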

(1..7).to_a.fair_combination(3)
#=> [[1, 2, 3], [4, 5, 6], [7, 1, 4], [2, 5, 3], [6, 7, 2], ...]

The first 7 numbers make up [1,2,...,7], and so do the following 7 numbers.
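Requirement (1) can be made checkable with a small helper. The sketch below is only one possible reading: it assumes "as flat as possible" means the gap between the most-used and least-used element never exceeds 1 after any sub-array. The name `spread_after_each_step` is hypothetical and not part of the original post.

```ruby
# Hypothetical helper (not from the original post): after each sub-array,
# report the gap between the most-used and the least-used element of the pool.
def spread_after_each_step(pool, combos)
  counts = pool.map { |item| [item, 0] }.to_h
  combos.map do |comb|
    comb.each { |item| counts[item] += 1 }
    counts.values.max - counts.values.min
  end
end

# The 6-element example array from the question.
arr = [[1, 2], [3, 5], [4, 6],
       [3, 4], [5, 1], [6, 2],
       [5, 6], [1, 3], [2, 4],
       [2, 3], [4, 5], [6, 1],
       [1, 4], [2, 5], [3, 6]]

p spread_after_each_step((1..6).to_a, arr)
# prints [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
```

Under that reading, a sequence is "flat" when every value in the returned array is 0 or 1.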

(2) Once number A has been in the same array as B, A should not be put in the same array as B again, if possible.

(1..10).to_a.fair_combination(4)
#=> [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 1, 5], [2, 6, 9, 3], [4, 7, 10, 8], ...]
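Requirement (2) can be measured in a similar way by counting how often each pair of elements shares a sub-array. This is a minimal sketch; `repeated_pairs` is a hypothetical name, and treating a pair as unordered is an assumption.

```ruby
# Hypothetical helper (not from the original post): list the unordered pairs
# that appear together in more than one sub-array.
def repeated_pairs(combos)
  seen = Hash.new(0)
  combos.each do |comb|
    comb.sort.combination(2) { |pair| seen[pair] += 1 }
  end
  seen.select { |_pair, times| times > 1 }
end

# The five sub-arrays shown for (1..10).to_a.fair_combination(4).
prefix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 1, 5], [2, 6, 9, 3], [4, 7, 10, 8]]
p repeated_pairs(prefix).keys
# prints [[2, 3], [7, 8]]
```

An empty result means no pair has met twice so far; a candidate algorithm could try to keep this hash as small as possible while still satisfying requirement (1).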

Is there any good algorithm to create such a "fair combination"?

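【Comments】:

I hate to be that guy, but what have you tried?
@NickZuber Thank you for the comment. I managed to create something equivalent to fair_combination(2): gist.github.com/honake/b685811d7644c563cd26a620274a75e6 but it does not work very well, and I could not make it more general.

Tags: ruby algorithm math combinations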

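【Solution 1】:

This is not guaranteed to give the best solution, but it gives a good enough one.

At each step, it picks a minimal sub-pool: the set of items with the lowest height for which there is still an unused combination to choose (height is the number of times an item has been used before).

For example, let the enumerator be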

my_enum = FairPermuter.new('abcdef'.chars, 4).each

The first iteration may return

my_enum.next  # => ['a', 'b', 'c', 'd']

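At this point those letters have height 1, but there are not enough letters of height 0 to build a combination, so all the letters are considered for the next one: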

my_enum.next  # => ['a', 'b', 'c', 'e'] for instance

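Now the height is 2 for a, b and c, 1 for d and e, and 0 for f, and the best pool is still the full initial set.

So this is not really optimized for large combination sizes. On the other hand, if the combination size is at most half the size of the initial set, the algorithm is pretty decent.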
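The sub-pool selection walked through above can be isolated into a standalone function for experimenting. This is a sketch under stated assumptions: `minimal_sub_pool` is a hypothetical name, heights are recomputed from the list of already used combinations on every call, and combinations are compared in pool order.

```ruby
# Hypothetical helper (not from the original answer): return the lowest height
# and the matching candidate items for which an unused combination still exists.
def minimal_sub_pool(pool, used, size)
  heights = Hash.new(0)
  used.each { |comb| comb.each { |item| heights[item] += 1 } }
  max_height = heights.values.max || 0

  (0..max_height).each do |height|
    candidates = pool.select { |item| heights[item] <= height }
    next if candidates.size < size
    # Keep this height only if at least one combination has not been used yet.
    unused = candidates.combination(size).any? { |c| !used.include?(c) }
    return [height, candidates] if unused
  end
  nil
end

pool = 'abcdef'.chars
used = [%w[a b c d], %w[a b c e]]
height, candidates = minimal_sub_pool(pool, used, 4)
puts "height #{height}: #{candidates.join}"
# prints "height 2: abcdef"
```

With only three items at height 1 or below (d, e and f), no combination of four fits, so the search has to climb to height 2 and ends up with the whole set again.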

class FairPermuter
  def initialize(pool, size)
    @pool = Array(pool)
    @size = size
    @all = @pool.combination(size)   # enumerator over every possible combination
    @used = []                       # combinations already yielded
    @counts = Hash.new(0)            # item => height (times used so far)
    @max_count = 0
  end

  # Picks a random unused combination from the lowest-height sub-pool
  # that still has one; returns nil when nothing is left.
  def find_valid_combination
    [*0..@max_count].each do |height|
      candidates = @pool.select { |item| @counts[item] <= height }
      next if candidates.size < @size
      cand_comb = [*candidates.combination(@size)] - @used
      comb = cand_comb.sample
      return comb if comb
    end
    nil
  end

  def each
    return enum_for(:each) unless block_given?
    while combination = find_valid_combination
      @used << combination
      combination.each { |k| @counts[k] += 1 }
      @max_count = @counts.values.max
      yield combination
      # Stop once every combination has been produced.
      return if @used.size >= @all.size
    end
  end
end

Result of the fair combination of 4 out of 6:

[[1, 2, 4, 6], [3, 4, 5, 6], [1, 2, 3, 5],
 [2, 4, 5, 6], [2, 3, 5, 6], [1, 3, 5, 6],
 [1, 2, 3, 4], [1, 3, 4, 6], [1, 2, 4, 5],
 [1, 2, 3, 6], [2, 3, 4, 6], [1, 2, 5, 6],
 [1, 3, 4, 5], [1, 4, 5, 6], [2, 3, 4, 5]]

Result of the fair combination of 2 out of 6:

[[4, 6], [1, 3], [2, 5],
 [3, 5], [1, 4], [2, 6],
 [4, 5], [3, 6], [1, 2],
 [2, 3], [5, 6], [1, 6],
 [3, 4], [1, 5], [2, 4]]
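As a quick sanity check, the slice test from the question can be applied to this particular run. The snippet only restates the output listed above; since the class picks combinations with `sample`, another run may give different results.

```ruby
# The "2 out of 6" output listed above, copied verbatim.
result = [[4, 6], [1, 3], [2, 5],
          [3, 5], [1, 4], [2, 6],
          [4, 5], [3, 6], [1, 2],
          [2, 3], [5, 6], [1, 6],
          [3, 4], [1, 5], [2, 4]]

# For each group of three pairs, does it cover 1..6 exactly once?
complete = result.each_slice(3).map { |slice| slice.flatten.sort == (1..6).to_a }
p complete
# prints [true, true, true, false, false]
```

The first three groups cover every element, while the last two do not, which matches the caveat that the approach is good enough rather than optimal.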

Result of the fair combination of 2 out of 5:

[[4, 5], [2, 3], [3, 5],
 [1, 2], [1, 4], [1, 5],
 [2, 4], [3, 4], [1, 3],
 [2, 5]]

Time taken to get the combinations of 5 out of 12:

        1.19 real         1.15 user         0.03 sys

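【Discussion】:

Thank you for the code. This is definitely a nice algorithm. I think I will modify it to handle the case where the array is not large enough.
By the way, I am pretty sure no algorithm can do this in a simple way without full backtracking, even in the restricted case where the combinations split exactly (such as 2 out of 6), whenever the split is not half of the whole array.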

【Solution 2】:

A naive implementation would be:

class Integer
  # naïve factorial implementation; no checks
  # (initial value 1 so that 0.! is 1 instead of nil)
  def !
    (1..self).inject(1, :*)
  end
end

class Range
  # constant Proc instance for tests; not needed
  C_N_R = -> (n, r) { n.! / ( r.! * (n - r).! ) }

  def fair_combination(n)
    to_a.permutation
        .map { |a| a.each_slice(n).to_a }
        .each_with_object([]) do |e, memo|
          e.map!(&:sort)
          memo << e if memo.all? { |me| (me & e).empty? }
        end
  end
end

▶ (1..6).fair_combination(2)
#⇒ [
#    [[1, 2], [3, 4], [5, 6]],
#    [[1, 3], [2, 5], [4, 6]],
#    [[1, 4], [2, 6], [3, 5]],
#    [[1, 5], [2, 4], [3, 6]],
#    [[1, 6], [2, 3], [4, 5]]]
▶ (1..6).fair_combination(3)
#⇒ [
#    [[1, 2, 3], [4, 5, 6]],
#    [[1, 2, 4], [3, 5, 6]],
#    [[1, 2, 5], [3, 4, 6]],
#    [[1, 2, 6], [3, 4, 5]],
#    [[1, 3, 4], [2, 5, 6]],
#    [[1, 3, 5], [2, 4, 6]],
#    [[1, 3, 6], [2, 4, 5]],
#    [[1, 4, 5], [2, 3, 6]],
#    [[1, 4, 6], [2, 3, 5]],
#    [[1, 5, 6], [2, 3, 4]]]
▶ Range::C_N_R[6, 3]
#⇒ 20

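Frankly, I do not understand how this function should behave for 10 and 4, but in any case this implementation is too memory-consuming to work properly on large ranges (on my machine it gets stuck on ranges of size > 8).

To turn this into a more robust solution, one needs to get rid of permutation in favor of "smart concatenation of permuted arrays".

Hope this is good for starters.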
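For the special case of pairs from an even-sized array, one known way to avoid enumerating permutations is the round-robin "circle method" used for tournament scheduling. It is a different technique from the code above and is not what the answer's author describes; `round_robin_pairs` is a hypothetical name, and the sketch does not cover odd sizes or groups larger than 2.

```ruby
# Circle method sketch (a swapped-in technique, not the answer's algorithm):
# keep the first item fixed, rotate the rest, and pair position i with the
# mirrored position. Each round is one group that covers every item once.
def round_robin_pairs(items)
  raise ArgumentError, 'need an even number of items' if items.size.odd?

  fixed, *rest = items
  half = items.size / 2
  (items.size - 1).times.flat_map do |round|
    ring = [fixed] + rest.rotate(round)
    ring.first(half).zip(ring.last(half).reverse)
  end
end

pairs = round_robin_pairs((1..6).to_a)
p pairs.first(3)
# prints [[1, 6], [2, 5], [3, 4]]
p pairs.each_slice(3).all? { |round| round.flatten.sort == (1..6).to_a }
# prints true
```

This produces all 15 pairs of 6 elements in five complete rounds without building the list of permutations, so memory use stays proportional to the output.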

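【Discussion】:

Thank you for the comment. I am very sorry, I should have specified the conditions; I have now described in detail what the method should satisfy. Still, it seems your algorithm works well as long as the size of the input array (e.g. 12) is divisible by the argument (e.g. 2, 3, 4, 6). I think I can start rethinking from your code. Cheers.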