Finding a permutation of a set of 0s and 1s, given an index, in O(N): answers

[Question title]: Finding permutation of a set of 0 and 1, given index with O(N)
[Posted]: 2014-09-05 06:48:09
[Question]:

I am trying to find the most efficient way to find a permutation of a set of '0' and '1', given an index.

For example: given l = [0, 0, 1, 1], all permutations in ascending order are {0011, 0101, 0110, 1001, 1010, 1100}. These elements are indexed from 0 to 5. Given index = 2, the result is 0110.
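For small inputs, the indexing described above can be checked by brute force. The following is only a reference sketch for testing, not a solution to the question: it enumerates every permutation, so it is unusable at len(l) = 1024. The function name `nth_permutation_bruteforce` is made up for this illustration.

```python
from itertools import permutations

def nth_permutation_bruteforce(bits, index):
    """Return the index-th (0-based) distinct permutation of bits, in ascending order."""
    # set() drops the duplicates produced by repeated symbols;
    # sorted() puts the remaining tuples in ascending order.
    ordered = sorted(set(permutations(bits)))
    return "".join(ordered[index])

print(nth_permutation_bruteforce("0011", 2))  # 0110
```

A faster implementation can be compared against this function on short inputs before it is trusted on long ones.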

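I found an algorithm here that takes a multiset of integers as input (e.g. l = [1, 2, 2]). His algorithm is efficient (O(N^2)). However, my multiset consists only of '0' and '1', and I need O(N) or less, where N is the length of the list.

Could you please help me? Note that my real test is big (len(l) is 1024), so the itertools library is not suitable. I am trying to speed this up as much as possible (e.g. using gmpy2...).

Based on 1, here is my attempt, but it is O(N^2):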

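from collections import Counter
import gmpy2

def permutation(l, index):
    # l must be sorted ascending; index 0 is the sorted list itself.
    if not index:
        return l

    counter = Counter(l)
    # Number of distinct permutations of l (it holds only '0' and '1').
    total_count = gmpy2.comb(len(l), counter['1'])
    acc = 0
    for i, v in enumerate(l):
        # Skip repeated symbols: each distinct first symbol is tried once.
        if i > 0 and v == l[i-1]:
            continue
        # Permutations that start with v; the integer division is exact.
        count = total_count * counter[v] // len(l)

        if acc + count > index:
            return [l[i]] + permutation(l[:i] + l[i + 1:], index - acc)
        acc += count

    raise ValueError("Not enough permutations")

l = ['0', '0', '1', '1']
index = 2
print(permutation(l, index))
# --> result = ['0', '1', '1', '0']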

Thanks in advance.

[Comments]:

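So basically, you are looking for the Nth binary number with a given number of 1 bits?
@AaronDigulla: I mean finding the index-th permutation in the set of all possible permutations. Each permutation has length n and consists of a given number of '1' bits.
@santa But does the order matter? I mean, do you want the index-th permutation in the ordered set of possible permutations? Or do you just want a consistent and unique way to index the permutations?
@freakish: Yes, all possible permutations are in ascending order. I described this in my example above: given l = [0, 0, 1, 1], all permutations in ascending order are {0011, 0101, 0110, 1001, 1010, 1100}. These elements are indexed from 0 to 5. Given index = 2, the result is 0110.
@santa: That is the same as what I said. Convert the input set into one big binary number; a limited set of shift operations will then give you what you want in O(index) time (i.e. depending only on the index you want), though I realize that does not help you much. Still, look at the binary patterns. Maybe you can use precomputed results to speed up the process.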

Tags: python algorithm permutation

[Solution 1]:

Let's think it through:

For n bits with k ones there are n choose k anagrams.

For each position p (counted from the right, starting at 0) that the i-th 
left-most set-bit can occupy, there are p choose (k-i) anagrams, for example:

n = 4, k = 2, i = 1 (left-most set-bit), position 1 => 001x => 1 choose 1 = 1
n = 4, k = 2, i = 1 (left-most set-bit), position 2 => 01xx => 2 choose 1 = 2

Given index 3 (one-based), we calculate the position of the 
left-most set-bit:

position 1, 1 choose (2-1) = 1 anagram, index 1
position 2, 2 choose (2-1) = 2 anagrams, index 2-3

We now know the left-most set-bit must be on position 2 and we know there 
are 2 anagrams possible. 

We look at the next set-bit (i = 2):
position 0, 0 choose (2-2) = 1 anagram, index 2
position 1, 1 choose (2-2) = 1 anagram, index 3

Therefore the second set-bit is in position 1 => 0110

I think this might be O(n*k) - I hope someone can understand/explain the
complexity better and perhaps improve/optimize this algorithm idea.
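The walk-through above can be sketched in Python as follows. This is one reading of the idea, not the answerer's own code: it uses a 0-based index (the text above uses a 1-based one), positions are counted from the right starting at 0, and `math.comb` (Python 3.8+) stands in for "choose". The name `nth_by_set_bit_positions` is made up for this illustration.

```python
from math import comb

def nth_by_set_bit_positions(n, k, index):
    """Return the index-th (0-based) n-bit string with k ones, in ascending order."""
    if not 0 <= index < comb(n, k):
        raise ValueError("index out of range")
    bits = ["0"] * n
    # Place the set bits from the left-most one to the right-most one.
    for remaining in range(k, 0, -1):
        # The lowest position this bit can take leaves room for the
        # (remaining - 1) set bits that still have to go to its right.
        p = remaining - 1
        while True:
            # Anagrams with this bit at position p: p choose (remaining - 1).
            count = comb(p, remaining - 1)
            if index < count:
                break
            index -= count  # skip every anagram with the bit at position p
            p += 1
        bits[n - 1 - p] = "1"  # position p counted from the right
    return "".join(bits)

print(nth_by_set_bit_positions(4, 2, 2))  # 0110
```

Each of the k set bits may scan up to n positions, so this sketch evaluates O(n*k) binomial coefficients, which agrees with the estimate above.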

[Comments]:

[Solution 2]:

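Given a permutation of N 0s and M 1s, we need to find the permutation with index K.

We know that the number of permutations starting with 0 equals the number of permutations of N-1 0s and M 1s. Let's call it K0.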

if K > K0 =>  The permutation starts with 1, remove K0 from K
if K <= K0 => The permutation starts with 0, K remains the same

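Fix the first bit, then start again with the updated K and the correct counts of 0s and 1s.

This algorithm runs in O(n) steps, where n is the number of bits (and not the length of the list), not counting the cost of computing each K0.

To simplify the computation, we assume a 1-based index (starting at 1).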

Example:

n = xxxx
l = [0, 0, 1, 1]
K = 2 => 3
Number of permutations starting with 0: K0 = 3! / (2! * 1!) = 3
K <= K0 => first bit is a 0

n = 0xxx
l = [0, 1, 1]
K = 3 (unchanged)
Number of permutations starting with 0: K0 = 2! / (2! * 0!) = 1
K > K0 => next bit is a 1

n = 01xx
l = [0, 1]
K = K - K0 = 2
Number of permutations starting with 0: K0 = 1! / (1! * 0!) = 1
K > K0 => next bit is a 1

n = 011x
l = [0]
K = K - K0 = 1
Number of permutations starting with 0: K0 = 0! / (0! * 0!) = 1
K <= K0 => next bit is a 0

n = 0110, which matches your example.

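Implementing this algorithm can be tricky; make sure you correctly handle the cases where the whole list consists only of 0s or only of 1s. Computing the factorials may also take some time (and would cause overflow in other languages), but they can be precomputed.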
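A minimal Python sketch of the procedure, following the worked example (K is reduced by K0 whenever the chosen bit is 1, and left unchanged when it is 0). It assumes a 1-based K as in the example and uses `math.comb` (Python 3.8+) instead of explicit factorials. The function name `kth_permutation_one_based` and its parameters are made up for this illustration.

```python
from math import comb

def kth_permutation_one_based(zeros, ones, k):
    """Return the k-th (1-based) ascending permutation of `zeros` 0s and `ones` 1s."""
    if not 1 <= k <= comb(zeros + ones, ones):
        raise ValueError("k out of range")
    out = []
    while zeros + ones > 0:
        # K0: permutations of the remaining symbols that start with 0.
        # With no 0 left, none can start with 0 (the all-ones case).
        k0 = comb(zeros + ones - 1, ones) if zeros > 0 else 0
        if k <= k0:
            out.append("0")
            zeros -= 1
        else:
            out.append("1")
            ones -= 1
            k -= k0
    return "".join(out)

print(kth_permutation_one_based(2, 2, 3))  # 0110
```

The explicit `zeros > 0` guard covers the all-ones case; the all-zeros case needs no guard, because K0 is then always 1 and K must be 1.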

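[Comments]:

Thanks, I just noticed that we have a very similar approach :)
Yes, but you helped me find a cleaner and more efficient way of thinking about it.
@SamyArous: Thank you. In the first loop: why 'K = 2 => 3'?
@SamyArous: Also, in the second loop: K = 3 > K0 = 1, so according to your algorithm K should be 3, but why is K = K - K0 = 2 in the third loop?
@santa, as I said, to simplify the computation we use a 1-based index, whereas you use a 0-based index. So index 2 in your example becomes index 3 in the algorithm.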

[Solution 3]:

Some thoughts on how you could try to solve this.

Here is a simple program that prints all permutations:

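import sys

oneBits = int(sys.argv[1])
totalLen = int(sys.argv[2])

# Smallest value with oneBits set bits, and the exclusive upper bound.
low = 2**oneBits - 1
end = 2**totalLen

print('oneBits:', oneBits)
print('totalLen:', totalLen)
print('Range:', low, '-', end)
print()
# Zero-padded binary format of width totalLen
# (named fmt to avoid shadowing the built-in format).
fmt = '{0:0%db}' % totalLen
index = 0
print('Index Pattern Value')
for i in range(low, end):
    val = fmt.format(i)
    # Keep only values with exactly oneBits set bits.
    if val.count('1') == oneBits:
        print('%5d %s %5d' % (index, val, i))
        index += 1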

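As you can see, it works purely with bit operations (well, I cheat a bit when counting the 1 bits :-)

When you run it with various inputs, you will see that the results show patterns: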

oneBits: 2
totalLen: 5
Range: 3 - 32

Index Pattern Value
    0 00011     3
    1 00101     5
    2 00110     6  <-- pure shift
    3 01001     9
    4 01010    10
    5 01100    12  <-- pure shift
    6 10001    17
    7 10010    18
    8 10100    20
    9 11000    24  <-- pure shift

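So my first approach would be to find the indexes at which these pure shifts happen. The distances depend only on the number of 0 and 1 bits. Since the sum is always 1024, this means you should be able to precompute these points and store the results in a table with 1024 entries. That will get you closer to where you want to go.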
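One way to read this observation: the pattern made of all the 1 bits followed by s zeros is the largest value with that many 1 bits that fits in oneBits + s bits, so its 0-based index is C(oneBits + s, oneBits) - 1. The sketch below tabulates those indexes under that assumption; the function name `pure_shift_indices` is made up, and `math.comb` needs Python 3.8+.

```python
from math import comb

def pure_shift_indices(one_bits, total_len):
    """0-based indexes of the patterns that are the start pattern shifted left by s >= 1 bits."""
    # All patterns with one_bits ones that fit in (one_bits + s) bits come
    # first in ascending order, and the shifted pattern is the last of them.
    return [comb(one_bits + s, one_bits) - 1
            for s in range(1, total_len - one_bits + 1)]

print(pure_shift_indices(2, 5))  # [2, 5, 9]
```

For oneBits = 2 and totalLen = 5 this reproduces the three rows marked "pure shift" in the listing above.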

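[Comments]:

Thank you, Aaron. I have not understood your approach yet. Do you mean I need to use the code to list all permutations first? If so, when totalLen = 1024 and oneBits = 100, I cannot use 'for', because of 'OverflowError: range() result has too many items'; besides, listing all the patterns may take time? Could you please explain it to me? Thanks.
The program is meant to help you identify patterns in the permutations. That should give you ideas for shortcuts. One such shortcut is to identify the nearest index at which the start pattern has just been shifted by X (i.e. a couple of 0s removed from the start and appended at the end). These can then serve as starting points for finding the real value you are looking for.
To see these patterns, you probably should not try anything with totalLen > 10 and oneBits > 4, because you get too many values.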

[Solution 4]:

Based on Samy Arous's idea, I changed his algorithm slightly:

if K >= K0 => The permutation starts with 1, K = K - K0
if K < K0  => The permutation starts with 0, K remains the same

Here is my code:

import gmpy2

def find_permutation(lst, K, numberbit1, numberbit0):
    l = list(lst)  # work on a copy so the caller's list is not modified
    N = numberbit0
    M = numberbit1

    # Only one permutation exists when the list is all zeros or all ones.
    if N == len(l):
        return '0' * N
    if M == len(l):
        return '1' * M

    result = ''
    for i in range(len(lst) - 1):
        # K0: number of remaining permutations that start with '0'.
        K0 = gmpy2.comb(len(l) - 1, M)
        if K < K0:
            result += '0'
            l.remove('0')
        else:
            result += '1'
            l.remove('1')
            M -= 1
            K = K - K0
    result += l[0]  # the last symbol is forced
    return result

lst = ['0', '1', '1', '1']
K = 1
numberbit1 = 3
numberbit0 = 1
print(find_permutation(lst, K, numberbit1, numberbit0))
# --> result = '1011'

Thanks. Although it is O(n) x (the complexity of gmpy2.comb), it is better than the algorithm in my question.
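One possible way to avoid calling a binomial function at every step is to update the count incrementally, using C(n-1, m) = C(n, m) * (n-m) / n and Pascal's rule C(n-1, m-1) = C(n, m) - C(n-1, m). The sketch below is a variant under those identities, not the code above: it takes the counts of 0s and 1s instead of a list, uses a 0-based index, and uses `math.comb` (Python 3.8+) once in place of gmpy2.comb. The function name `nth_permutation_incremental` is made up for this illustration.

```python
from math import comb

def nth_permutation_incremental(zeros, ones, index):
    """Return the index-th (0-based) ascending permutation of `zeros` 0s and `ones` 1s."""
    n = zeros + ones
    total = comb(n, ones)  # the only full binomial that is computed
    if not 0 <= index < total:
        raise ValueError("index out of range")
    out = []
    while n > 0:
        # Arrangements of the remaining symbols that start with 0:
        # C(n-1, ones) = C(n, ones) * zeros / n, and the division is exact.
        k0 = total * zeros // n
        if index < k0:
            out.append("0")
            zeros -= 1
            total = k0
        else:
            out.append("1")
            index -= k0
            ones -= 1
            total -= k0  # Pascal's rule: C(n-1, ones-1) = C(n, ones) - C(n-1, ones)
        n -= 1
    return "".join(out)

print(nth_permutation_incremental(1, 3, 1))  # 1011
```

Each step then costs one multiplication, one division and at most two subtractions on a big integer. These are not constant-time operations for 1024-bit values, so this is O(n) big-integer operations rather than strict O(n) time.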

[Comments]: