Generate all 4-tuple pairs from n elements (answers)

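【Question title】: Generate all 4-tuple pairs from n elements
【Posted】: 2017-09-06 11:31:25
【Question description】:

Given an array of size n, I want to generate a list of all possible pairs of 4-tuples. n is at least 8, so at least one pair can always be found.

As an example to help understand the problem, I use a smaller version: pairs of 2-tuples from an array of size 5. The expected result for pairs of 2-tuples contains 15 items (the tuples are ordered, with no duplicates):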

[(1,2), (3,4)], [(1,2), (3,5)], [(1,2), (4,5)], [(1,3), (2,4)], [(1,3), (2,5)], [(1,3), (4,5)], [(1,4), (2,3)], [(1,4), (2,5)], [(1,4), (3,5)], [(1,5), (2,3)], [(1,5), (2,4)], [(1,5), (3,4)], [(2,3), (4,5)], [(2,4), (3,5)], [(2,5), (3,4)]
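
The count of 15 can be checked with a little arithmetic: choose the 2m elements that take part, split them into two halves of size m, and divide by two because the order of the two halves does not matter. The following is a minimal sketch of that calculation, assuming Python 3.8+ for math.comb; the function name count_tuple_pairs is made up for illustration and does not come from the question.

```python
from math import comb  # available from Python 3.8


def count_tuple_pairs(n, m):
    """Number of unordered pairs of disjoint m-element subsets of n elements."""
    # comb(n, 2*m): pick the 2*m elements involved.
    # comb(2*m, m): pick which of them form the first half.
    # // 2: each split is counted once per ordering of the two halves.
    return comb(n, 2 * m) * comb(2 * m, m) // 2


print(count_tuple_pairs(5, 2))  # the small example from the question
print(count_tuple_pairs(8, 4))  # the smallest 4-tuple case
```

For the 2-tuple example this gives 15, and for n = 8 with 4-tuples it gives 35. For n = 30, the size used in the snippet further down, the same formula gives 204,852,375 pairs, which suggests why any approach that also inspects overlapping combinations becomes slow.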

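My current approach uses itertools in Python: I go through all elements returned by itertools.combinations in two nested loops, find two combinations that do not share a single element, and then work with that pair.

To express this in Python code, I prepared a small snippet:

import itertools

arr = list(range(30))  # example list
comb = list(itertools.combinations(range(0, len(arr)), 4))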

for c1 in comb:
    for c2 in comb:  # go through all possible pairs
        if len([val for val in c1 if val in c2]) == 0:  # intersection of both sets results in 0, so they don't share an element
            ... # do something and check for duplicates

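This approach works, but it is inefficient because of the two nested loops and is only feasible for small n within a given time frame. Can this be done more efficiently?

Update: after some answers, I evaluated the suggestions. For my specific case, the best was the (extended) algorithm from MSeifert's (now deleted) answer, which ran fastest: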

import itertools

def generate_four_pairs(n):
    valids = range(0, n)
    for x00, x01, x02, x03, x10, x11, x12, x13 in itertools.combinations(valids, 8):
      yield [x00, x01, x02, x03], [x10, x11, x12, x13]
      yield [x00, x01, x02, x10], [x03, x11, x12, x13]
      yield [x00, x01, x02, x11], [x03, x10, x12, x13]
      yield [x00, x01, x02, x12], [x03, x10, x11, x13]
      yield [x00, x01, x02, x13], [x03, x10, x11, x12]
      yield [x00, x01, x03, x10], [x02, x11, x12, x13]
      yield [x00, x01, x03, x11], [x02, x10, x12, x13]
      yield [x00, x01, x03, x12], [x02, x10, x11, x13]
      yield [x00, x01, x03, x13], [x02, x10, x11, x12]
      yield [x00, x01, x10, x11], [x02, x03, x12, x13]
      yield [x00, x01, x10, x12], [x02, x03, x11, x13]
      yield [x00, x01, x10, x13], [x02, x03, x11, x12]
      yield [x00, x01, x11, x12], [x02, x03, x10, x13]
      yield [x00, x01, x11, x13], [x02, x03, x10, x12]
      yield [x00, x01, x12, x13], [x02, x03, x10, x11]
      yield [x00, x02, x03, x10], [x01, x11, x12, x13]
      yield [x00, x02, x03, x11], [x01, x10, x12, x13]
      yield [x00, x02, x03, x12], [x01, x10, x11, x13]
      yield [x00, x02, x03, x13], [x01, x10, x11, x12]
      yield [x00, x02, x10, x11], [x01, x03, x12, x13]
      yield [x00, x02, x10, x12], [x01, x03, x11, x13]
      yield [x00, x02, x10, x13], [x01, x03, x11, x12]
      yield [x00, x02, x11, x12], [x01, x03, x10, x13]
      yield [x00, x02, x11, x13], [x01, x03, x10, x12]
      yield [x00, x02, x12, x13], [x01, x03, x10, x11]
      yield [x00, x03, x10, x11], [x01, x02, x12, x13]
      yield [x00, x03, x10, x12], [x01, x02, x11, x13]
      yield [x00, x03, x10, x13], [x01, x02, x11, x12]
      yield [x00, x03, x11, x12], [x01, x02, x10, x13]
      yield [x00, x03, x11, x13], [x01, x02, x10, x12]
      yield [x00, x03, x12, x13], [x01, x02, x10, x11]
      yield [x00, x10, x11, x12], [x01, x02, x03, x13]
      yield [x00, x10, x11, x13], [x01, x02, x03, x12]
      yield [x00, x10, x12, x13], [x01, x02, x03, x11]
      yield [x00, x11, x12, x13], [x01, x02, x03, x10]
      yield [x01, x02, x03, x00], [x10, x11, x12, x13]
      yield [x01, x02, x03, x10], [x00, x11, x12, x13]
      yield [x01, x02, x03, x11], [x00, x10, x12, x13]
      yield [x01, x02, x03, x12], [x00, x10, x11, x13]
      yield [x01, x02, x03, x13], [x00, x10, x11, x12]
      yield [x01, x02, x10, x00], [x03, x11, x12, x13]
      yield [x01, x02, x10, x11], [x00, x03, x12, x13]
      yield [x01, x02, x10, x12], [x00, x03, x11, x13]
      yield [x01, x02, x10, x13], [x00, x03, x11, x12]
      yield [x01, x02, x11, x00], [x03, x10, x12, x13]
      yield [x01, x02, x11, x12], [x00, x03, x10, x13]
      yield [x01, x02, x11, x13], [x00, x03, x10, x12]
      yield [x01, x02, x12, x00], [x03, x10, x11, x13]
      yield [x01, x02, x12, x13], [x00, x03, x10, x11]
      yield [x01, x02, x13, x00], [x03, x10, x11, x12]
      yield [x01, x03, x10, x00], [x02, x11, x12, x13]
      yield [x01, x03, x10, x11], [x00, x02, x12, x13]
      yield [x01, x03, x10, x12], [x00, x02, x11, x13]
      yield [x01, x03, x10, x13], [x00, x02, x11, x12]
      yield [x01, x03, x11, x00], [x02, x10, x12, x13]
      yield [x01, x03, x11, x12], [x00, x02, x10, x13]
      yield [x01, x03, x11, x13], [x00, x02, x10, x12]
      yield [x01, x03, x12, x00], [x02, x10, x11, x13]
      yield [x01, x03, x12, x13], [x00, x02, x10, x11]
      yield [x01, x03, x13, x00], [x02, x10, x11, x12]
      yield [x01, x10, x11, x00], [x02, x03, x12, x13]
      yield [x01, x10, x11, x12], [x00, x02, x03, x13]
      yield [x01, x10, x11, x13], [x00, x02, x03, x12]
      yield [x01, x10, x12, x00], [x02, x03, x11, x13]
      yield [x01, x10, x12, x13], [x00, x02, x03, x11]
      yield [x01, x10, x13, x00], [x02, x03, x11, x12]
      yield [x01, x11, x12, x00], [x02, x03, x10, x13]
      yield [x01, x11, x12, x13], [x00, x02, x03, x10]
      yield [x01, x11, x13, x00], [x02, x03, x10, x12]
      yield [x01, x12, x13, x00], [x02, x03, x10, x11]
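
The unrolled generator above can also be written with a nested itertools.combinations call. The sketch below is not MSeifert's code; it is a compact variant under the assumption that the two lists of a pair are interchangeable, and the name generate_four_pairs_compact is hypothetical. It always keeps the smallest of the eight elements in the first list, which is the pattern of the first 35 yield lines above. As far as I can tell, the remaining 35 lines produce the same splits again (with the lists swapped or the elements reordered), so this variant yields 35 splits per 8-element combination instead of 70.

```python
import itertools


def generate_four_pairs_compact(n):
    """Yield every split of 8 of the n elements into two lists of 4, once."""
    for eight in itertools.combinations(range(n), 8):
        anchor, rest = eight[0], eight[1:]
        # Pinning the smallest element to the first list means no split
        # can appear a second time with its two lists swapped.
        for three in itertools.combinations(rest, 3):
            first = [anchor, *three]
            second = [x for x in rest if x not in three]
            yield first, second


for first, second in itertools.islice(generate_four_pairs_compact(8), 3):
    print(first, second)
```

Whether this runs as fast as the unrolled version is something that would need to be measured; the unrolled form avoids the inner loop and the list comprehension.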

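For a general approach, I recommend the answer provided by NPE, since it is the shortest and most readable answer to this problem.

【Question discussion】:

What is arr, and are you sure the itertools solution really works? Also, why 15 elements rather than 30? For example, what is wrong with [(4, 5), (2, 3)]?
@MSeifert, that is equivalent to [(2, 3), (4, 5)], which is already in the list. "The tuples are ordered, with no duplicates".
But if itertools.combinations(range(0, len(arr)), 4) is replaced with itertools.combinations(range(1, 6), 2), the proposed itertools solution does in fact yield [(4, 5), (2, 3)], which is why I wondered whether the solution works or is correct. Assuming len(arr) is 5, it currently certainly isn't.
Yes, the solution above also produces duplicates. They get filtered at the end, because I don't know a simpler solution than filtering them at the end. Sorry for the misunderstanding.
There is still the problem that arr is undefined, and the 4 in combinations, since you want two 2-tuples. Your approach gives two 4-tuples instead.

Tags: python algorithm combinatorics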

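【Solution 1】:

By generating all pairs of combinations and then throwing almost all of them away because they contain common elements, you are doing a lot of unnecessary work.

The following approach first takes all subsets of four numbers (for your 2-tuple example) and then splits each subset into all possible pairs: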

import itertools

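def gen_pairs(n, m):
  # Pick the 2*m elements that take part, then choose which m form the first half.
  for both_halves in itertools.combinations(range(1, n + 1), 2 * m):
    for first_half in itertools.combinations(both_halves, m):
      # The second half is whatever is left of the 2*m elements.
      second_half = tuple(sorted(set(both_halves) - set(first_half)))
      yield [first_half, second_half]

print(sorted(gen_pairs(5, 2)))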

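Note that this does not eliminate duplicates (for example, [(4, 5), (2, 3)] versus [(2, 3), (4, 5)]), so it produces twice as many elements as you expect.

However, removing the duplicates is easy. This is left as an exercise for the reader.

【Discussion】:

Nice solution! Hint for readers: call yield only when the first element of first_half is smaller than the first element of second_half.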
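
A minimal sketch of that hint, assuming Python 3. The name gen_unique_pairs is hypothetical and not part of the original answer; the only change from gen_pairs is the comparison before the yield.

```python
import itertools


def gen_unique_pairs(n, m):
    """Like gen_pairs, but yields each pair of halves only once."""
    for both_halves in itertools.combinations(range(1, n + 1), 2 * m):
        for first_half in itertools.combinations(both_halves, m):
            # both_halves is sorted, so filtering keeps second_half sorted too.
            second_half = tuple(x for x in both_halves if x not in first_half)
            # Of the two mirrored splits, keep the one whose first half
            # starts with the smaller element.
            if first_half[0] < second_half[0]:
                yield [first_half, second_half]


print(sorted(gen_unique_pairs(5, 2)))
```

For n = 5 and m = 2 this produces the 15 pairs listed in the question.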

【Solution 2】:

I would do:

from itertools import combinations

sample = range(1,6)
x1 = [subset for subset in combinations(sample,2)] #getting the set of tuples
x2 = [list(subset) for subset in combinations(x1,2)] #getting the pair of tuples
x3 = [x for x in x2 if (set(x[0]) & set(x[1]) == set())] #finally filtering the tuples with no intersection

Output (the value of x3):

[[(1, 2), (3, 4)],
 [(1, 2), (3, 5)],
 [(1, 2), (4, 5)],
 [(1, 3), (2, 4)],
 [(1, 3), (2, 5)],
 [(1, 3), (4, 5)],
 [(1, 4), (2, 3)],
 [(1, 4), (2, 5)],
 [(1, 4), (3, 5)],
 [(1, 5), (2, 3)],
 [(1, 5), (2, 4)],
 [(1, 5), (3, 4)],
 [(2, 3), (4, 5)],
 [(2, 4), (3, 5)],
 [(2, 5), (3, 4)]]

【Discussion】:

【Solution 3】:

You can use permutations and splitting, which may be faster:

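import itertools

def split_permutations(array, size=4):
    # Walk every permutation and cut it into consecutive chunks of `size`.
    for t in itertools.permutations(array):
        a = []
        for i in range(0, len(t), size):
            if i + size <= len(t):  # skip a trailing chunk shorter than `size`
                a.append(t[i:i+size])
        yield a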

Note: this solution also works when the length of the array is not a multiple of size, but it produces duplicates.
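
One way to drop those duplicates is to bring every result into a canonical form and remember which forms have already been seen. This is a rough sketch under the assumption that neither the order inside a chunk nor the order of the chunks matters; unique_splits is a made-up name. It still walks all n! permutations, so it is only practical for very small inputs.

```python
import itertools


def unique_splits(array, size):
    """Yield each distinct split into chunks of `size` once (small inputs only)."""
    seen = set()
    for perm in itertools.permutations(array):
        # Cut the permutation into full chunks and sort inside each chunk.
        chunks = [tuple(sorted(perm[i:i + size]))
                  for i in range(0, len(perm) - size + 1, size)]
        # Sorting the chunks as well makes mirrored results compare equal.
        key = tuple(sorted(chunks))
        if key not in seen:
            seen.add(key)
            yield list(key)


print(len(list(unique_splits(range(1, 6), 2))))
```

For the 5-element example with size 2, the 120 permutations collapse to the 15 pairs from the question.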

【Discussion】:

【Solution 4】:

Here is code that generates MSeifert's yield statements :) (it produces only 35 of them, which means there are no duplicates :)

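def g(L, n, k, A, B):
    # L[0..n] are still unassigned; A and B are the two halves built so far.
    if len(A) == k:
        # A is full: everything left goes to B.
        return [[tuple(A), tuple([L[i] for i in range(0, n + 1)] + B)]]

    elif len(B) == k:
        # B is full: everything left goes to A.
        return [[tuple([L[i] for i in range(0, n + 1)] + A), tuple(B)]]

    # Otherwise put L[n] into B or into A and recurse on the rest.
    return g(L, n - 1, k, A, [L[n]] + B[0:]) + g(L, n - 1, k, [L[n]] + A[0:], B)

def f(L):
    assert(len(L) > 3 and len(L) % 2 == 0)
    # The last element is fixed in B, so each split is produced only once.
    return g(L, len(L) - 2, len(L) // 2, [], [L[-1]])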

for i in f(['x00','x01','x02','x03','x10','x11','x12','x13']):
    print(i)

【Discussion】: