All combinations of values from two lists representing a certain feature: answers

【Question title】: All combinations of values from two lists representing a certain feature
【Posted】: 2018-01-05 20:57:16
【Question description】:

I have three lists:

a = [0,1,2]
b = [3,4,5]
c = [aab, abb, aaa]

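How do I create all three-element combinations? Each sequence in list c tells you which list to draw from for each position in the output sequence.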

For example (pseudocode):

for i=0 in range(len(c)):
    print: [0,1,3]
           [0,1,4]
             ...
           [0,2,5]
             ...
           [1,2,4]
           [1,2,5]

The same applies to the remaining indices i. Values within a single sublist must not repeat. I would be grateful for any hints.

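【Question discussion】:

What do you mean by "How do I create all three-element combinations?" Can you explain further?
This doesn't look like Python. More importantly, have you tried anything yourself?
I deliberately used pseudo-language to illustrate the problem.
I used the standard it.combinations followed by a filter condition. But I'd like to find a way that doesn't have to go through all the combinations for every index.
Can you explain the line "Values within a single sublist must not repeat"?

Tags: python python-3.x list combinations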

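【Solution 1】:

This generator function handles 'ab' template strings with the a's and b's in any order, and the output lists will contain no repeated items provided the a and b lists are disjoint. We use itertools.combinations to generate combinations of the required size, and itertools.product to pair up the a and b combinations. We get the items in the correct order by turning each a and b combination into an iterator and picking from the correct iterator via a dictionary.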

from itertools import combinations, product

def groups(a, b, c):
    for pat in c:
        acombo = combinations(a, pat.count('a'))
        bcombo = combinations(b, pat.count('b'))
        for ta, tb in product(acombo, bcombo):
            d = {'a': iter(ta), 'b': iter(tb)}
            yield [next(d[k]) for k in pat]

# tests

a = [0,1,2]
b = [3,4,5]

templates = ['aab', 'abb', 'aaa'], ['aba'], ['bab']

for c in templates:
    print('c', c)
    for i, t in enumerate(groups(a, b, c), 1):
        print(i, t)
    print()

Output

c ['aab', 'abb', 'aaa']
1 [0, 1, 3]
2 [0, 1, 4]
3 [0, 1, 5]
4 [0, 2, 3]
5 [0, 2, 4]
6 [0, 2, 5]
7 [1, 2, 3]
8 [1, 2, 4]
9 [1, 2, 5]
10 [0, 3, 4]
11 [0, 3, 5]
12 [0, 4, 5]
13 [1, 3, 4]
14 [1, 3, 5]
15 [1, 4, 5]
16 [2, 3, 4]
17 [2, 3, 5]
18 [2, 4, 5]
19 [0, 1, 2]

c ['aba']
1 [0, 3, 1]
2 [0, 4, 1]
3 [0, 5, 1]
4 [0, 3, 2]
5 [0, 4, 2]
6 [0, 5, 2]
7 [1, 3, 2]
8 [1, 4, 2]
9 [1, 5, 2]

c ['bab']
1 [3, 0, 4]
2 [3, 0, 5]
3 [4, 0, 5]
4 [3, 1, 4]
5 [3, 1, 5]
6 [4, 1, 5]
7 [3, 2, 4]
8 [3, 2, 5]
9 [4, 2, 5]

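I should mention that even though combinations returns an iterator, and product happily accepts iterators as arguments, product has to store the full contents of each iterator in memory, because it needs to run over those contents multiple times. So if the number of combinations is large, this can consume a fair amount of RAM.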
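If that memory use is a concern, one possible workaround is to replace product with nested loops that rebuild the inner combinations on every pass, trading extra computation for a smaller footprint. The following is only a minimal sketch under that assumption; the name groups_lazy is hypothetical and, unlike groups above, it takes a single pattern string per call.

```python
from itertools import combinations

def groups_lazy(a, b, pattern):
    """Yield lists following `pattern` without caching the b-combinations."""
    na, nb = pattern.count('a'), pattern.count('b')
    for ta in combinations(a, na):
        # Rebuild the b-combinations for every a-combination instead of
        # letting product() store them all up front.
        for tb in combinations(b, nb):
            d = {'a': iter(ta), 'b': iter(tb)}
            yield [next(d[k]) for k in pattern]

result = list(groups_lazy([0, 1, 2], [3, 4, 5], 'aab'))
print(result[:3])   # [[0, 1, 3], [0, 1, 4], [0, 1, 5]]
print(len(result))  # 9
```

The output order matches the product-based version for this pattern, since product also varies its last argument fastest.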

If you want permutations rather than combinations, that's easy. We just call itertools.permutations instead of itertools.combinations.

from itertools import permutations, product

def groups(a, b, c):
    for pat in c:
        acombo = permutations(a, pat.count('a'))
        bcombo = permutations(b, pat.count('b'))
        for ta, tb in product(acombo, bcombo):
            d = {'a': iter(ta), 'b': iter(tb)}
            yield [next(d[k]) for k in pat]

# tests

a = [0,1,2]
b = [3,4,5]

templates = ['aaa'], ['abb'] 

for c in templates:
    print('c', c)
    for i, t in enumerate(groups(a, b, c), 1):
        print(i, t)
    print()

Output

c ['aaa']
1 [0, 1, 2]
2 [0, 2, 1]
3 [1, 0, 2]
4 [1, 2, 0]
5 [2, 0, 1]
6 [2, 1, 0]

c ['abb']
1 [0, 3, 4]
2 [0, 3, 5]
3 [0, 4, 3]
4 [0, 4, 5]
5 [0, 5, 3]
6 [0, 5, 4]
7 [1, 3, 4]
8 [1, 3, 5]
9 [1, 4, 3]
10 [1, 4, 5]
11 [1, 5, 3]
12 [1, 5, 4]
13 [2, 3, 4]
14 [2, 3, 5]
15 [2, 4, 3]
16 [2, 4, 5]
17 [2, 5, 3]
18 [2, 5, 4]

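Finally, here is a version that handles any number of lists and template strings of any length. It accepts only one template string per call, but that shouldn't be a problem. You can also choose whether to generate permutations or combinations via an optional keyword argument.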

from itertools import permutations, combinations, product

def groups(sources, template, mode='P'):
    func = permutations if mode == 'P' else combinations
    keys = sources.keys()
    combos = [func(sources[k], template.count(k)) for k in keys]
    for t in product(*combos):
        d = {k: iter(v) for k, v in zip(keys, t)}
        yield [next(d[k]) for k in template]

# tests

sources = {
    'a': [0, 1, 2],
    'b': [3, 4, 5],
    'c': [6, 7, 8],
}

templates = 'aa', 'abc', 'abba', 'cab'

for template in templates:
    print('\ntemplate', template)
    for i, t in enumerate(groups(sources, template, mode='C'), 1):
        print(i, t)

Output

template aa
1 [0, 1]
2 [0, 2]
3 [1, 2]

template abc
1 [0, 3, 6]
2 [0, 3, 7]
3 [0, 3, 8]
4 [0, 4, 6]
5 [0, 4, 7]
6 [0, 4, 8]
7 [0, 5, 6]
8 [0, 5, 7]
9 [0, 5, 8]
10 [1, 3, 6]
11 [1, 3, 7]
12 [1, 3, 8]
13 [1, 4, 6]
14 [1, 4, 7]
15 [1, 4, 8]
16 [1, 5, 6]
17 [1, 5, 7]
18 [1, 5, 8]
19 [2, 3, 6]
20 [2, 3, 7]
21 [2, 3, 8]
22 [2, 4, 6]
23 [2, 4, 7]
24 [2, 4, 8]
25 [2, 5, 6]
26 [2, 5, 7]
27 [2, 5, 8]

template abba
1 [0, 3, 4, 1]
2 [0, 3, 5, 1]
3 [0, 4, 5, 1]
4 [0, 3, 4, 2]
5 [0, 3, 5, 2]
6 [0, 4, 5, 2]
7 [1, 3, 4, 2]
8 [1, 3, 5, 2]
9 [1, 4, 5, 2]

template cab
1 [6, 0, 3]
2 [7, 0, 3]
3 [8, 0, 3]
4 [6, 0, 4]
5 [7, 0, 4]
6 [8, 0, 4]
7 [6, 0, 5]
8 [7, 0, 5]
9 [8, 0, 5]
10 [6, 1, 3]
11 [7, 1, 3]
12 [8, 1, 3]
13 [6, 1, 4]
14 [7, 1, 4]
15 [8, 1, 4]
16 [6, 1, 5]
17 [7, 1, 5]
18 [8, 1, 5]
19 [6, 2, 3]
20 [7, 2, 3]
21 [8, 2, 3]
22 [6, 2, 4]
23 [7, 2, 4]
24 [8, 2, 4]
25 [6, 2, 5]
26 [7, 2, 5]
27 [8, 2, 5]
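One detail that makes the general version work for templates such as 'aa', which never mention 'b' or 'c': asking for zero items from a list yields exactly one empty selection, so the unused sources each contribute a single empty tuple to product and do not multiply the results. Conversely, asking for more items than a list holds yields nothing at all. A small check of both edge cases (the variable names are only illustrative):

```python
from itertools import combinations, permutations

# Choosing zero items gives one empty tuple, for both functions.
empty_choice = list(combinations([6, 7, 8], 0))
empty_perm = list(permutations([6, 7, 8], 0))
print(empty_choice)  # [()]
print(empty_perm)    # [()]

# Choosing more items than are available gives no results, so a template
# like 'aaaa' over a 3-item list would produce no output.
too_many = list(combinations([0, 1, 2], 4))
print(too_many)      # []
```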

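【Discussion】:

Ah, this silly memory ;) That's why I've been trying to write something that doesn't store everything in memory. But it's hard :) Thanks for your solution. Right now I'm testing @wwii's solution. Strangely, though, when timed it performs worse than my old solution...
Why is it that in your solution, if I want just one "template" value, I have to enter it with a trailing comma? For example, templates = ['bba'],. Otherwise it shows the wrong result.
@TomaszPrzemski The for c in templates: loop iterates over a group of templates. The line templates = ['bba'], creates a 1-item tuple containing the 1-item list ['bba']. (Parentheses don't create a tuple, the comma does.) You could just do a = [0,1,2]; b = [3,4,5]; c = ['bba'] followed by for i, t in enumerate(groups(a, b, c), 1): print(i, t)
@TomaszPrzemski As you noticed, wwii's solution produces a high proportion of combinations containing repeated elements. You then have to filter those out, so it's bound to be slower than a solution like mine, which simply avoids making combinations with repeated elements.
@TomaszPrzemski Seriously? Next time you should show/describe the expected output better! Then why does 'aaa' give only these 3 permutations and not also [1, 2, 0], [2, 0, 1], [2, 1, 0]?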
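The trailing-comma point raised in the comments above can be checked directly. This is just an illustrative sketch; the names one_tuple and plain_list are not from the answer.

```python
one_tuple = ['bba'],      # trailing comma: a 1-item tuple holding the list
plain_list = ['bba']      # no comma: just the list

print(type(one_tuple).__name__, one_tuple)    # tuple (['bba'],)
print(type(plain_list).__name__, plain_list)  # list ['bba']

# Looping over the plain list hands the string 'bba' to groups() as `c`,
# and iterating over a string yields single characters as the patterns.
patterns_seen = [pat for c in plain_list for pat in c]
print(patterns_seen)  # ['b', 'b', 'a']
```

So without the comma, the test loop ends up treating 'b', 'b' and 'a' as three separate one-character templates, which explains the unexpected results.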

【Solution 2】:

from itertools import product, chain

setups = ['aab', 'abb', 'aaa']
sources = {
    'a': [0,1,2],
    'b': [3,4,5]
}

combinations = (product(*map(sources.get, setup)) for setup in setups)

combinations is a nested lazy iterator (i.e., nothing has been computed or stored in memory yet). If you want an iterator of lists:

combinations = map(list, (product(*map(sources.get, setup)) for setup in setups))

Or you may want to flatten the result:

combinations = chain.from_iterable(product(*map(sources.get, setup)) for setup in setups)

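【Discussion】:

This code generates a lot of lists containing repeated items, which the OP doesn't want. You could filter them out, but that's inefficient if the source lists and setup strings are large.
@PM2Ring Then I guess I misunderstood the task.
But it's close :) I'm interested in the most economical way because I work with large lists. Thanks for your time.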
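To get a feel for how many unwanted tuples the plain product approach produces, here is a small count for the 'aab' setup from the question. It is only a sketch; the len(set(t)) == len(t) test is one simple way to detect repeats and is not taken from the answer.

```python
from itertools import product

sources = {'a': [0, 1, 2], 'b': [3, 4, 5]}
setup = 'aab'

# Every tuple product() builds for this setup, repeats included.
all_tuples = list(product(*map(sources.get, setup)))
# Keep only the tuples whose values are all distinct.
no_repeats = [t for t in all_tuples if len(set(t)) == len(t)]

print(len(all_tuples))   # 27
print(len(no_repeats))   # 18
```

A third of the tuples are discarded here, and the remaining 18 still count both orderings of each pair from a, so they correspond to permutations rather than the 9 combinations shown in Solution 1.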

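【Solution 3】:

If I understand correctly, you can achieve this by using a dictionary to record the correspondence between characters such as "a" and the variable named a.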

from collections import defaultdict

a = [0,1,2]
b = [3,4,5]
c = ["aab", "abb", "aaa"]
d = {"a": a, "b": b}
d2 = defaultdict(list)
for seq in c:
    l = []
    for idx, v in enumerate(seq):
        l.append(d[v][idx]) 
    print(l)
    d2[seq].append(l)
# Out:
#[0, 1, 5]
#[0, 4, 5]
#[0, 1, 2]
print(d2)
# defaultdict(<class 'list'>, {'aab': [[0, 1, 5]], 'abb': [[0, 4, 5]], 'aaa': [[0, 1, 2]]})

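【Discussion】:

Close. And how do I get all the combinations with a given feature?
What is a given feature? @TomaszPrzemski
Everything with feature 'aab', everything with feature 'abb', everything with feature 'aaa'
@TomaszPrzemski Updated. Is this what you mean? Keep another dictionary that stores the values each time a sequence is formed.
It's still not quite what I was after, but I already have a solution. Not a perfect one though :) Thanks for your time!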

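【Solution 4】:

Put the lists in a dictionary so you can access them with strings.
Use the characters in each sequence to determine which list to use.
Use itertools.product to get the combinations.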

import itertools, collections
from pprint import pprint
d = {'a':[0,1,2], 'b':[3,4,5]}
c = ['aab', 'abb', 'aaa']

def f(t):
    t = collections.Counter(t)
    return max(t.values()) < 2

for seq in c:
    data = (d[char] for char in seq)
    print(f'sequence: {seq}')
    pprint(list(filter(f, itertools.product(*data))))
    print('***************************')

Result for the sequence 'abb':

sequence: abb
[(0, 3, 4),
 (0, 3, 5),
 (0, 4, 3),
 (0, 4, 5),
 (0, 5, 3),
 (0, 5, 4),
 (1, 3, 4),
 (1, 3, 5),
 (1, 4, 3),
 (1, 4, 5),
 (1, 5, 3),
 (1, 5, 4),
 (2, 3, 4),
 (2, 3, 5),
 (2, 4, 3),
 (2, 4, 5),
 (2, 5, 3),
 (2, 5, 4)]

Edited to filter out tuples with duplicates.

I like the idea of a callable dictionary that can be used with map. It could be used here.

class CallDict(dict):
    def __call__(self, key):
        return self[key]    #self.get(key)

e = CallDict([('a',[0,1,2]), ('b',[3,4,5])])

for seq in c:
    data = map(e, seq)
    print(f'sequence: {seq}')
    for thing in filter(f, itertools.product(*data)):
        print(thing)
    print('***************************')

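I couldn't help myself: here is a generalized version of @PM2Ring's solution/answer. Rather than filtering out the unwanted items, it never produces them in the first place.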

d = {'a':[0,1,2], 'b':[3,4,5]}
c = ['aab', 'abb', 'aaa', 'aba']
def g(d, c):
    for seq in c:
        print(f'sequence: {seq}')
        counts = collections.Counter(seq)
##        data = (itertools.combinations(d[key],r) for key, r in counts.items())
        data = (itertools.permutations(d[key],r) for key, r in counts.items())
        for thing in itertools.product(*data):
            q = {key:iter(other) for key, other in zip(counts, thing)}
            yield [next(q[k]) for k in seq]

for t in g(d, c):
    print(t)
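As a sanity check, the filtering approach and the direct approach should agree on what they produce. The sketch below compares them for the single sequence 'abb', assuming (as in the question) that the two source lists share no values; the names filtered and direct are only illustrative.

```python
from collections import Counter
from itertools import permutations, product

d = {'a': [0, 1, 2], 'b': [3, 4, 5]}
seq = 'abb'

# Filtering approach: build every tuple, then drop those with repeats.
filtered = [t for t in product(*(d[ch] for ch in seq))
            if max(Counter(t).values()) < 2]

# Direct approach: only ever build tuples without repeats.
counts = Counter(seq)
direct = []
for thing in product(*(permutations(d[key], r) for key, r in counts.items())):
    q = {key: iter(part) for key, part in zip(counts, thing)}
    direct.append(tuple(next(q[k]) for k in seq))

print(sorted(filtered) == sorted(direct))  # True
print(len(direct))                         # 18
```

The filtering version builds 27 tuples to keep 18, while the direct version builds exactly 18; the gap grows quickly with longer sequences.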

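【Discussion】:

Looks good, it just needs a filter so that there are no repeated values in a given tuple.
@TomaszPrzemski - Should tuples with repeated values be discarded?
Yes, I put it rather badly :) I've now used your solution, but I'm a bit surprised, because it turned out to take longer. Honestly, looking at the code itself, I expected the time to be shorter. Sigh...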

【Solution 5】:

It looks like you're after a way to call itertools.product programmatically:

from itertools import product

d = {'a': [0,1,2],
     'b': [3,4,5]}
c = ['aab', 'abb', 'aaa']

for s in c:
    print(list(product(*[d[x] for x in s])))

【Discussion】: