How to efficiently analyze all possible combinations of 2 large lists in Python?

【Question title】: How to efficiently analyze all possible combinations of 2 large lists in python?
【Posted】: 2020-07-13 23:05:14
【Question description】:

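Suppose we have 2 lists, each with about 18 completely unique numbers under 1,000.

Now, we want to calculate all possible combinations of the numbers within each list (r from 1 to 18).

Then, we want to calculate all possible pairs of those combinations (a pair is made of one combination from list A and one combination from list B).

Finally, let's say we want to score each pair by summing all the numbers on each side of the pair and then dividing the first sum by the second. We then look at the result of each division and pick the pair that produced the largest number.

I tried loading all the pairs into one large list and then just doing for pair in list:, but there are simply too many possible pairs to load into a single list. So the pairs have to be generated and analyzed in chunks. However, I'm not sure what the most time- and resource-efficient way to do that would be.

Here is an example of the code I tried: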

from itertools import combinations, product
import random

list_A = random.sample(range(100, 250), 18)
list_B = random.sample(range(300, 450), 18)

# All possible combinations of items in list A
i = 1
all_list_A_combos = []
while i <= 18:
    all_list_A_combos_temp = combinations(list_A, i)
    all_list_A_combos.extend(all_list_A_combos_temp)
    i += 1

# All possible combinations of items in list B
i = 1
all_list_B_combos = []
while i <= 18:
    all_list_B_combos_temp = combinations(list_B, i)
    all_list_B_combos.extend(all_list_B_combos_temp)
    i += 1

# Executing this line crashes the program due to too many possible pairs
all_possible_pairs = list(product(all_list_A_combos, all_list_B_combos))

# Calculating products of division for each pair
list_of_all_differences = []
for pair in all_possible_pairs:

    side_A_sum = 0
    for number in pair[0]:
        side_A_sum += number
    side_B_sum = 0
    for number in pair[1]:
        side_B_sum += number

    difference = side_A_sum / side_B_sum
    list_of_all_differences.append(difference)

# Finding best pair
best_pair = all_possible_pairs[list_of_all_differences.index(max(list_of_all_differences))]
print(best_pair)

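I'm aware that you can "cheat" here, because the sum of all items in list A divided by the smallest number in list B is the right answer, but I used the division only as an example of a task. In my real case the analysis is a bit more complicated, and you need to scan every possible pair to know for sure.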
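
The shortcut mentioned above can be checked against a brute-force scan on small inputs. The following is a minimal sketch, assuming all numbers are positive; the function names and the sample lists `small_a` and `small_b` are illustrative and not part of the original code.

```python
from itertools import combinations

def nonempty_subsets(items):
    """Yield every non-empty combination of items, from size 1 up to len(items)."""
    for r in range(1, len(items) + 1):
        yield from combinations(items, r)

def brute_force_best_ratio(list_a, list_b):
    """Largest sum(s) / sum(t) over all pairs of non-empty subsets."""
    return max(
        sum(s) / sum(t)
        for s in nonempty_subsets(list_a)
        for t in nonempty_subsets(list_b)
    )

def shortcut_best_ratio(list_a, list_b):
    """The shortcut: everything in A divided by the smallest item in B.

    Assumption: all numbers are positive, so adding items to the numerator
    can only raise the ratio and adding items to the denominator can only lower it.
    """
    return sum(list_a) / min(list_b)

# Illustrative small inputs (hypothetical values in the same ranges as the question).
small_a = [120, 135, 210, 248]
small_b = [305, 331, 402]

brute = brute_force_best_ratio(small_a, small_b)
shortcut = shortcut_best_ratio(small_a, small_b)
print(brute == shortcut)
```

This only confirms the shortcut for the division example; it says nothing about a more complicated analysis, which is why the full scan is still the question.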

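【Comments】:

Stack Overflow can't enable you to do something that is infeasible. A 100-element list has 2^100 combinations (subsets), and there are 4^100 pairs of such subsets. Processing those pairs at a rate of 1 billion per second would take longer than the age of the universe, by many, many orders of magnitude. Unless I misunderstand what you are trying to do (it isn't very clear), this is a classic case of combinatorial explosion.
Well, what if the number of pairs is more than one list can hold, but still low enough to process in a reasonable time (around 10 billion)? I can modify the question to fit that.
What you are asking is infeasible. If you want to ask something else that is feasible, you can ask another question, or at least edit this one.
@JohnColeman There, 18 items per list should give about 68 billion pairs, which is much more realistic.
@JohnColeman It should make sense now.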
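
The figures in this exchange can be reproduced with a few lines of Python. This is a small sketch, assuming Python 3.8 or later for `math.comb`; `count_nonempty_subsets` is an illustrative helper name.

```python
import math

def count_nonempty_subsets(n):
    """Number of combinations of n distinct items taken r at a time, for r = 1..n."""
    return sum(math.comb(n, r) for r in range(1, n + 1))

# 18 items per list, as in the edited question.
subsets_per_list = count_nonempty_subsets(18)   # equals 2**18 - 1
total_pairs = subsets_per_list ** 2             # one subset from A paired with one from B

print(subsets_per_list)   # 262143
print(total_pairs)        # 68718952449, i.e. about 68.7 billion

# 100 items per list, as in the first comment: far beyond anything that can be enumerated.
print(count_nonempty_subsets(100) ** 2 > 10 ** 59)   # True
```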

Tags: python optimization combinations combinatorics large-data

【Solution 1】:

itertools is generator-based. You seldom need to collect its results into lists. Just write your own generator:

import itertools

def subset_pairs(list_a,list_b):
    """iterator over all pairs of subsets, (s,t), with s drawn from list_a and t drawn from list_b"""
    for i in range(1+len(list_a)):
        for j in range(1+len(list_b)):
            for s in itertools.combinations(list_a,i):
                for t in itertools.combinations(list_b,j):
                    yield s,t

Here is a simple test (with print as a stand-in for your processing):

for s,t in subset_pairs(['a','b','c'],[1,2]):
    print(s,"and",t)

Output:

() and ()
() and (1,)
() and (2,)
() and (1, 2)
('a',) and ()
('b',) and ()
('c',) and ()
('a',) and (1,)
('a',) and (2,)
('b',) and (1,)
('b',) and (2,)
('c',) and (1,)
('c',) and (2,)
('a',) and (1, 2)
('b',) and (1, 2)
('c',) and (1, 2)
('a', 'b') and ()
('a', 'c') and ()
('b', 'c') and ()
('a', 'b') and (1,)
('a', 'b') and (2,)
('a', 'c') and (1,)
('a', 'c') and (2,)
('b', 'c') and (1,)
('b', 'c') and (2,)
('a', 'b') and (1, 2)
('a', 'c') and (1, 2)
('b', 'c') and (1, 2)
('a', 'b', 'c') and ()
('a', 'b', 'c') and (1,)
('a', 'b', 'c') and (2,)
('a', 'b', 'c') and (1, 2)
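
Applied to the task in the question, the same lazy approach means keeping only the best pair seen so far instead of storing every result. The generator above also yields empty subsets, and dividing by the sum of an empty subset would raise ZeroDivisionError, so the sketch below starts at size 1, matching the question's r from 1 to 18. It is one possible sketch, not the answer's own code: `subset_sums`, `best_pair` and the `score` parameter are illustrative names, and it assumes the analysis depends only on the two sums, which may not hold for the real, more complicated case.

```python
from itertools import combinations
import random

def subset_sums(items):
    """Return a list of (sum, subset) for every non-empty subset of items."""
    return [
        (sum(combo), combo)
        for r in range(1, len(items) + 1)
        for combo in combinations(items, r)
    ]

def best_pair(list_a, list_b, score=lambda a_sum, b_sum: a_sum / b_sum):
    """Scan all pairs of non-empty subsets, remembering only the best one so far.

    Each subset's sum is computed once up front (2**n - 1 entries per list),
    so the pairs themselves are never stored.
    """
    sums_a = subset_sums(list_a)
    sums_b = subset_sums(list_b)
    best_score, best = None, None
    for a_sum, s in sums_a:
        for b_sum, t in sums_b:
            value = score(a_sum, b_sum)
            if best_score is None or value > best_score:
                best_score, best = value, (s, t)
    return best_score, best

# Smaller lists than in the question so the demo finishes quickly.
random.seed(0)
list_a = random.sample(range(100, 250), 8)
list_b = random.sample(range(300, 450), 8)

top_score, (s, t) = best_pair(list_a, list_b)
print(top_score, s, t)
```

Memory stays modest even at 18 items per list, since only about 262,000 sums are held per side. The roughly 68.7 billion score evaluations still have to run, though, and in pure Python that is likely to take hours.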

【Comments】:

Just tested it, and it works great. Thank you very much for your help!