python itertools permutations with tied values: answers

【Question title】: python itertools permutations with tied values
【Posted】: 2016-01-11 14:29:39
【Question】:

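I want to efficiently find the permutations of a vector that contains tied (repeated) values.

For example, if perm_vector = [0,0,1,2], I would like to get all the orderings [0,0,1,2], [0,0,2,1], [0,1,2,0], etc. as output, but I don't want to get [0,0,1,2] twice, which is what the standard itertools.permutations(perm_vector) would give.

I tried the following, but it becomes really slow as len(perm_vector) grows: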

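import itertools
import pandas as pd

# Materialize every permutation (duplicates included), one row each.
vectors_list = [list(p) for p in itertools.permutations(perm_vector)]
df_vectors_list = pd.DataFrame(vectors_list)
# Group on all columns so that identical rows collapse into one group.
df_gb = df_vectors_list.groupby(list(df_vectors_list.columns))
# One column per distinct permutation.
vectors_list = pd.DataFrame(list(df_gb.groups.keys())).T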

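In fact, the question is of a more general "speed-up" nature. Most of the time is spent creating the permutations of long vectors: even without any repeated values, creating the permutations of a vector of 12 unique values takes "forever". Is it possible to call itertools iteratively, working through one batch of data at a time, without holding the full set of permutations?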
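To make the "one batch at a time" idea concrete, here is a minimal sketch. itertools.permutations already returns a lazy iterator, so it can be consumed in fixed-size chunks with itertools.islice. The helper name batches and the chunk size are assumptions made for illustration and are not part of the question.

```python
import itertools

def batches(iterable, size):
    """Yield lists of up to `size` items, pulling from `iterable` lazily.

    `batches` is a hypothetical helper name used only for this sketch.
    """
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk

perm_vector = [0, 1, 2, 3]  # illustrative input
for chunk in batches(itertools.permutations(perm_vector), 10):
    # Only `chunk` is held in memory here, never the full list of permutations.
    print(len(chunk), chunk[0])
```

This bounds memory use, not running time: a vector of 12 unique values still has 12! = 479001600 permutations to walk through.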

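【Comments】:

Possible duplicate of Why does Python's itertools.permutations contain duplicates? (When the original list has duplicates)
Here is an external link, taken from a comment in the thread referenced in the comment above, that may help.
There is a recipe for this in the itertools module documentation; look at the unique_everseen recipe: docs.python.org/3/library/itertools.html#itertools-recipes
Something based on the idea of C++'s std::next_permutation might be suitable; std::next_permutation handles duplicates the way you want. I would recommend implementing it yourself at least once as a learning experience, but there are also existing implementations available.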
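As an illustration of the std::next_permutation idea mentioned in the last comment, here is a minimal Python sketch of the usual lexicographic "next permutation" step. The names next_permutation and unique_permutations are chosen for this example; this is not the code of any particular existing implementation.

```python
def next_permutation(a):
    """Rearrange list `a` in place into its next lexicographic permutation.

    Returns False (leaving `a` in ascending order) if `a` was already the
    last permutation, True otherwise.
    """
    # 1. Find the rightmost index i with a[i] < a[i + 1].
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:
        a.reverse()  # wrap around to the first permutation
        return False
    # 2. Find the rightmost element strictly greater than a[i] and swap.
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    a[i], a[j] = a[j], a[i]
    # 3. Reverse the suffix so that it becomes the smallest possible one.
    a[i + 1:] = reversed(a[i + 1:])
    return True

def unique_permutations(values):
    """Lazily yield each distinct permutation of `values` once, in sorted order."""
    a = sorted(values)
    while True:
        yield tuple(a)
        if not next_permutation(a):
            return

perms = list(unique_permutations([0, 0, 1, 2]))
print(len(perms))  # 12
```

Because the comparisons in steps 1 and 2 treat equal elements as already ordered, repeated values never produce the same arrangement twice, so no filtering step is needed. The values must be mutually comparable for the initial sort.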

Tags: python performance pandas itertools

【Solution 1】:

If perm_vector is small, try this:

import itertools  # not aliased to `iter`, which would shadow the built-in iter()
set(itertools.permutations(perm_vector))

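This should give you the unique values, because the result is now a set, which removes duplicates by default.

If perm_vector is large, you may want to try backtracking: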

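def permu(L, left, right, cache):
    # Try every element of L[left:right] at position `left`.
    for i in range(left, right):
        L[left], L[i] = L[i], L[left]
        L_tuple = tuple(L)
        # Recurse only if this arrangement has not been recorded yet.
        if L_tuple not in cache:
            permu(L, left + 1, right, cache)
            cache.add(L_tuple)
        # Always undo the swap so that L is restored for the next iteration.
        L[left], L[i] = L[i], L[left]

cache = set()
permu(perm_vector, 0, len(perm_vector), cache)
cache  # the distinct permutations, as tuples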

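【Discussion】:

While this technically works, it still generates all the duplicate permutations before filtering, so it is very inefficient when there are many duplicates.
@user2357112 Yes. If the list is large, backtracking with memoization may be needed to improve efficiency. I have posted my code above (it would be nice if there were a way to avoid the "if" inside the "for" loop)..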
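To put a number on the inefficiency raised in the first comment: a vector of length n whose distinct values occur c_1, ..., c_k times has n! / (c_1! * ... * c_k!) distinct permutations, whereas itertools.permutations always yields n! tuples. The sketch below computes both counts; the function name count_distinct_permutations is chosen for this example.

```python
from collections import Counter
from math import factorial

def count_distinct_permutations(values):
    """Return n! / (c_1! * ... * c_k!), where c_j are the value multiplicities."""
    total = factorial(len(values))
    for multiplicity in Counter(values).values():
        total //= factorial(multiplicity)  # exact: the quotient is always an integer
    return total

perm_vector = [0, 0, 1, 2]
print(factorial(len(perm_vector)))               # 24 tuples generated before filtering
print(count_distinct_permutations(perm_vector))  # 12 distinct permutations
```

For a vector of six 0s and six 1s the gap is much larger: 12! = 479001600 tuples are generated in order to keep only 924 of them.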

【Solution 2】:

How about this:

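from collections import Counter

def starter(l):
    # Count how many copies of each distinct value are available.
    cnt = Counter(l)
    res = [None] * len(l)
    return worker(cnt, res, len(l) - 1)

def worker(cnt, res, n):
    # Fill positions n, n - 1, ..., 0 of res; yield once every position is filled.
    if n < 0:
        yield tuple(res)
    else:
        # Loop over distinct values only, so no duplicate permutation is generated.
        for k in cnt.keys():
            if cnt[k] != 0:
                cnt[k] = cnt[k] - 1  # use one copy of k
                res[n] = k
                for r in worker(cnt, res, n - 1):
                    yield r
                cnt[k] = cnt[k] + 1  # put the copy back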

【Discussion】: