Create combinations from two lists based on a variable number of items from the first list (answers)

【Question title】: Create combinations from two lists based on variable number of items from first list
【Posted】: 2022-08-16 09:02:42
【Question】:

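I have been struggling with this for a while, so I thought I would reach out!

I have two lists of index positions that I need to generate combinations from. (Originally I had a single list and tried itertools.product and itertools.combinations, but the real data is large enough to cause memory errors.)

So originally: (think x, y coordinates)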

coords = [[0, 0], [0, 1], [1, 0], [1, 1], [1, 2], [1, 3], [1, 4], [1, 5], [1, 6], [1, 7], [1, 8], [1, 9], [2, 0], [2, 1], [3, 0], [3, 1], [3, 2], [4, 0], [4, 1], [4, 2], [4, 3], [4, 4], [4, 5], [5, 0], [5, 1], [5, 2], [5, 3], [5, 4], [5, 5], [5, 6], [5, 7], [6, 0], [6, 1], [6, 2], [6, 3], [6, 4], [6, 5], [6, 6], [6, 7], [6, 8], [6, 9], [6, 10], [6, 11], [6, 12], [6, 13], [6, 14], [6, 15], [6, 16], [6, 17], [6, 18], [6, 19], [6, 20], [6, 21], [6, 22], [6, 23], [6, 24], [6, 25], [6, 26], [6,
27], [6, 28], [6, 29], [7, 0], [7, 1], [7, 2], [7, 3]]

# the coords get transformed into this:
# each "x" element contains the "y" sub elements

coord_list = [[0, 1], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1], [0, 1, 2], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [0, 1, 2, 3]]

output = list(itertools.product(*coord_list))

This worked until my index went beyond 20 levels (I only show 7 levels of index in the example).
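To make the size problem concrete, here is a minimal sketch of the grouping step and of counting the product without building it. The helper name `group_by_x` and the small sample data are assumptions for illustration, not part of the original post.

```python
from collections import defaultdict
from itertools import islice, product
from math import prod


def group_by_x(coords):
    """Group [x, y] pairs into one list of y values per x, ordered by x."""
    grouped = defaultdict(list)
    for x, y in coords:
        grouped[x].append(y)
    return [grouped[x] for x in sorted(grouped)]


# Small illustrative input (hypothetical, much shorter than the real data).
coord_list = group_by_x([[0, 0], [0, 1], [1, 0], [1, 1], [1, 2], [2, 0]])

# The number of combinations is the product of the level sizes,
# so it can be computed without materialising any combination.
total = prod(len(ys) for ys in coord_list)

# product() is lazy; islice takes a few items without building the full list.
first_three = list(islice(product(*coord_list), 3))

print(coord_list)   # [[0, 1], [0, 1, 2], [0]]
print(total)        # 6
print(first_three)  # [(0, 0, 0), (0, 1, 0), (0, 2, 0)]
```

For the `coord_list` shown in the question, the level sizes are 2, 10, 2, 3, 6, 8, 30 and 4, which gives 691,200 combinations. Wrapping the product in `list(...)` is what forces them all into memory at once.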

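So I figured I could limit the number of combinations generated by splitting the list according to the important features I care about, and limiting how many are used at a time.

I have a variable (cutoff) that defines how many items to pull from the first list (neg_list). The new list needs to be populated with those items from neg_list, and then filled up with elements from the other list (pos_list).

The catch is that you can only use one item from each index level, and I need the resulting lists to reuse items from the first list only when absolutely necessary. (Maybe by adding a counter to the elements?) The goal is to use every element at least once, while spreading out as evenly as possible how often the elements at a given index level are reused. ... Maybe itertools.takewhile() would come in handy?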

cutoff = 2
depth = 7  #The number of unique items in the first index position

pos_list = [[0, 1], [1, 1], [1, 7], [1, 8], [2, 0], [3, 1], [4, 1], [5, 1], [6, 1], [6, 2], [7, 1]]
neg_list = [[0, 0], [1, 0], [1, 2], [1, 3], [1, 4], [1, 5], [1, 6], [1, 9], [2, 1], [3, 0], [3, 2], [4, 0], [4, 2], [4, 3], [4, 4], [4, 5], [5, 0], [5, 2], [5, 3], [5, 4], [5, 5], [5, 6], [5, 7], [6, 0], [6, 3], [6, 4], [6, 5], [6, 6], [6, 7], [6, 8], [6, 9], [6, 10], [6, 11], [6, 12], [6, 13], [6, 14], [6, 15], [6, 16], [6, 17], [6, 18], [6, 19], [6, 20], [6, 21], [6, 22], [6, 23], [6, 24], [6, 25], [6, 26], [6, 27], [6, 28], [6, 29], [7, 0], [7, 2], [7, 3]]

pseudo code:
add use_count to each element of neg_list and pos_list
get cutoff number of elements randomly from neg_list with unique first index number by choosing lowest use_count until all items have a use_count > 0
populate remaining elements up to depth number with elements from pos_list with unique first index number and lowest use_count
increment use_count on used elements in neg_list and pos_list

pseudo output:
an array or list of lists with all the combinations generated
cutoff 2 partial example: (the ^^^ indicate where the neg_list "seeds" are)

[[0, 0], [1, 1], [2, 0], [3, 2], [4, 1], [5, 1], [6, 1], [7, 1]]
  ^^^^                    ^^^^
[[0, 1], [1, 2], [2, 0], [3, 1], [4, 1], [5, 1], [6, 18], [7, 1]]
          ^^^^                                    ^^^^^


pos_list would then maybe look like:
[[[0, 1],1], [[1, 1],1], [1, 7], [1, 8], [[2, 0],2], [[3, 1],1], [[4, 1],2], [[5, 1],2], [[6, 1],1], [[6, 2],0], [[7, 1],2]]

neg list would look similar, with counts next to the elements that have been used
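One possible reading of the pseudo code above is a greedy loop driven by usage counters. The following is only a sketch under that assumption: the function name `build_rows`, the `seed` parameter, and the stopping rule (stop once every neg_list element has been used at least once) are illustrative choices, not something the question specifies. Tuples are used as dictionary keys for `use_count`.

```python
import random
from collections import defaultdict


def build_rows(pos_list, neg_list, cutoff, seed=0):
    """Greedy sketch: one element per x level, `cutoff` seeds from neg_list."""
    if cutoff < 1:
        raise ValueError("cutoff must be at least 1")
    rng = random.Random(seed)
    pos_by_x, neg_by_x = defaultdict(list), defaultdict(list)
    for x, y in pos_list:
        pos_by_x[x].append((x, y))
    for x, y in neg_list:
        neg_by_x[x].append((x, y))
    levels = sorted(set(pos_by_x) | set(neg_by_x))
    use_count = {(x, y): 0 for x, y in pos_list + neg_list}

    def least_used(candidates):
        # Pick randomly among the candidates with the lowest use count.
        low = min(use_count[c] for c in candidates)
        return rng.choice([c for c in candidates if use_count[c] == low])

    def level_rank(x):
        # Levels holding the least-used neg elements first; ties are random.
        return (min(use_count[c] for c in neg_by_x[x]), rng.random())

    rows = []
    while any(use_count[c] == 0 for cs in neg_by_x.values() for c in cs):
        seed_levels = set(sorted(neg_by_x, key=level_rank)[:cutoff])
        row = []
        for x in levels:
            # Use neg elements at seed levels, or where no pos element exists.
            if x in seed_levels or x not in pos_by_x:
                pool = neg_by_x[x]
            else:
                pool = pos_by_x[x]
            choice = least_used(pool)
            use_count[choice] += 1
            row.append(choice)
        rows.append(row)
    return rows, use_count


rows, counts = build_rows(
    pos_list=[[0, 1], [1, 1], [2, 0]],
    neg_list=[[0, 0], [1, 0], [1, 2], [2, 1]],
    cutoff=1,
)
for row in rows:
    print(row)
```

With cutoff = 1 each row consumes exactly one previously unused neg_list element, so the loop produces one row per neg_list element; the neg_list in the question has 54 entries, which matches the estimate of 54 sets. This sketch does not guarantee that every pos_list element ends up used, and it never enumerates the full product.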

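cutoff is the only variable that can change. With a cutoff of 1, I think it would produce 54 sets. A cutoff of 2 would produce a bunch of combinations while maximizing the variability of the elements used.

Thoughts? I am not sure where to go with this.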

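Where are you getting use_count from?
Tuples might be better than lists here, since they use less memory.
You say "get cutoff number of elements randomly from neg_list", but above you say "I have a variable that defines how many items to pull from the first list". So which one is it? Please add more details. Your question is hard to understand. Have you tried writing Python code? Please show what you have tried.
My idea with use_count was that it would be a way to track how many times each element has been used (to avoid using the same elements over and over when possible).
Either I don't understand, or the first sublist in your example is wrong, because it shows two elements from the same level of pos_list ([6,1] and [6,2]).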
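The remark about tuples can be checked with a quick measurement. This is a small sketch; `sys.getsizeof` reports only the container's own size, and the exact byte counts depend on the Python implementation and version.

```python
import sys

as_lists = [[0, 0], [0, 1], [1, 0]]
as_tuples = [tuple(pair) for pair in as_lists]

# Sum the size of each pair object; the integers inside are shared either way.
list_bytes = sum(sys.getsizeof(pair) for pair in as_lists)
tuple_bytes = sum(sys.getsizeof(pair) for pair in as_tuples)

print(list_bytes, tuple_bytes)
```

Tuples are also hashable, so they can serve directly as dictionary keys, for example for a per-element use_count.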

Tags: python combinations itertools

【Solution 1】:

Parameters:

good_coords = [(0,0), (1,0)] 
bad_coords = [(0,1), (1,1), (1,2)] 
cutoff = 2

I assume the x values appear in SORTED order.

from collections import defaultdict
from itertools import combinations, cycle

# 'xs' stands for 'ex-es', plural of 'x'
xs = sorted(list(set(pair[0] for pair in good_coords)
                 .union(set(pair[0] for pair in bad_coords))))


pairs_good, pairs_bad = defaultdict(list), defaultdict(list)
for x, y in good_coords:
    pairs_good[x].append(y)
for x, y in bad_coords:
    pairs_bad[x].append(y)


sequences_for_each_n_bad = {}  # contains one list for each `n_bad`, and
                               # they contain other lists - sequences of ys.
for n_bads in range(1, cutoff+1):
    sequences = []
    for chosen_x_bads in combinations(xs, n_bads):
        chosen_y_bads = [pairs_bad[x] for x in chosen_x_bads]
        maxlen = max(len(bad_ys) for bad_ys in chosen_y_bads)

        chosen_ys = [pairs_bad[x] if x in chosen_x_bads 
                        else pairs_good[x] 
                        for x in xs]
        # iterate over all elements of all rows in parallel,
        # until the last element of the longest bad row is met
        for sequence in zip(*[cycle(ys) for ys in chosen_ys]):
            if maxlen <= 0: break
            sequences.append(sequence)
            maxlen -= 1
        
    sequences_for_each_n_bad[n_bads] = sequences

Result

sequences_for_each_n_bad

{1: [(1, 0), (0, 1), (0, 2)], 2: [(1, 1), (1, 2)]}

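Note that x always has the same value at the same position, so I just saved those unique values to xs.

The keys of the dictionary here are n_bads (the number of bad elements in each sequence).

If you want to receive the output in your format, you can use this:

aslist = [[list(zip(xs, sequence)) for sequence in sequences]
          for n_bad, sequences in sequences_for_each_n_bad.items()]
[sublist for n_bad_list in aslist for sublist in n_bad_list]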

[[(0, 1), (1, 0)],
 [(0, 0), (1, 1)],
 [(0, 0), (1, 2)],
 [(0, 1), (1, 1)],
 [(0, 1), (1, 2)]]
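Since the original problem was memory, the same idea can also be sketched as a generator that yields one row at a time instead of storing every sequence in a dictionary. The function name `iter_rows` is an assumption for illustration; the logic follows the code above, with `itertools.islice` replacing the manual `maxlen` counter.

```python
from collections import defaultdict
from itertools import combinations, cycle, islice


def iter_rows(good_coords, bad_coords, cutoff):
    """Lazily yield rows of (x, y) pairs with 1..cutoff bad elements each."""
    good, bad = defaultdict(list), defaultdict(list)
    for x, y in good_coords:
        good[x].append(y)
    for x, y in bad_coords:
        bad[x].append(y)
    xs = sorted(set(good) | set(bad))

    for n_bads in range(1, cutoff + 1):
        for chosen in combinations(xs, n_bads):
            # Bad ys at the chosen x levels, good ys everywhere else.
            chosen_ys = [bad.get(x, []) if x in chosen else good.get(x, [])
                         for x in xs]
            # Stop once the longest chosen bad row has been walked through.
            maxlen = max(len(bad.get(x, [])) for x in chosen)
            parallel = zip(*(cycle(ys) for ys in chosen_ys))
            for ys in islice(parallel, maxlen):
                yield list(zip(xs, ys))


for row in iter_rows([(0, 0), (1, 0)], [(0, 1), (1, 1), (1, 2)], cutoff=2):
    print(row)
```

For the parameters used above this yields the same five rows in the same order. As in the original code, a level that has no good y values (or a chosen level with no bad ones) produces no rows for that choice.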

Another test example:

good_coords = [(0,0), (1,0), (2,0)] 
bad_coords = [(0,1), (1,1), (2,1), (0,2), (1,2), (2,2)] 
cutoff = 2

{1: [(1, 0, 0), (2, 0, 0), (0, 1, 0), (0, 2, 0), (0, 0, 1), (0, 0, 2)],
 2: [(1, 1, 0), (2, 2, 0), (1, 0, 1), (2, 0, 2), (0, 1, 1), (0, 2, 2)]}
