Find a unique bit in a collection of numbers: answers

【Question title】: Find a unique bit in a collection of numbers
【Posted】: 2011-11-27 13:40:06
【Question】:

The best way to explain this is with a demonstration.

There is a collection of numbers. They may be repeated, so:

1110, 0100, 0100, 0010, 0110 ...

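The number I am looking for is the one that has a bit set that does not appear in any of the others. The result is the number (in this case number 1, the first number) and the bit position (or the mask is fine), so 1000 (the 4th bit). There may be more than one solution, but for this purpose the search can be greedy.

I can do it by iteration... For each number N, it is:

N & ~(other numbers OR'd together)

But the nature of bits is that there is always a better method if you think outside the box. For instance, numbers that appear more than once will never have a unique bit, and have no effect on the ORing.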
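For reference, the iterative approach described in the question can be sketched as follows. This is only an illustration: the function name `find_unique_bit_naive` is hypothetical, and it assumes the numbers are non-negative integers held in a list.

```python
def find_unique_bit_naive(numbers):
    """Return (number, mask_of_its_unique_bits) for the first match, else None."""
    for i, n in enumerate(numbers):
        # OR together every number except the one at index i.
        others = 0
        for j, m in enumerate(numbers):
            if j != i:
                others |= m
        unique = n & ~others  # bits set in n and in no other number
        if unique:
            return n, unique
    return None


# The sample data from the question.
print(find_unique_bit_naive([0b1110, 0b0100, 0b0100, 0b0010, 0b0110]))
```

This does O(n) work for each of the n numbers, which is the O(n^2) cost the answers set out to avoid.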

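【Question discussion】:

Is there a question here? If you are asking whether it can be done without iterating, it cannot; some kind of iteration is required, because you have to look at every number to get the answer.
Well, you can make it O(n*wordsize) instead of O(n^2): count how many times each bit is 1, then find the first count that equals 1 (that is your bit index), then find the first number that has that bit set. See also: stackoverflow.com/questions/7793997
@Artefacto, but my method above involves two iterations. Is there a way to make it better?
@Thomas, yes, 9 bits. Not a nice number, but fine.
OK, then O(n*wordsize) is O(n) with a reasonable constant.

Tags: c bit-manipulation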

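【Solution 1】:

You only need to record whether each bit has been seen once or more, and whether it has been seen twice or more. The unique bits are those that have been seen once or more but not twice or more. This can be done efficiently with bitwise operations.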

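def find_number_with_unique_bit(numbers):
    count1 = 0  # bits seen once or more
    count2 = 0  # bits seen twice or more

    for n in numbers:
        count2 |= count1 & n
        count1 |= n

    # Second pass: return the first number that owns a unique bit.
    for n in numbers:
        if n & count1 & ~count2:
            return n
    return None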
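The question asks for the bit as well as the number, so here is a sketch of the same two-mask idea that returns both. The name `find_unique_bit` and the `hit & -hit` trick for isolating the lowest set bit are additions for illustration, not part of the answer; the input is assumed to be a list of non-negative integers, since it is iterated twice.

```python
def find_unique_bit(numbers):
    """Return (number, single_bit_mask) for the first number with a unique bit, else None."""
    seen_once = 0   # bits seen once or more
    seen_twice = 0  # bits seen twice or more
    for n in numbers:
        seen_twice |= seen_once & n
        seen_once |= n

    unique = seen_once & ~seen_twice  # bits seen exactly once
    for n in numbers:
        hit = n & unique
        if hit:
            # hit & -hit keeps only the lowest set bit of hit.
            return n, hit & -hit
    return None


# The sample data from the question.
print(find_unique_bit([0b1110, 0b0100, 0b0100, 0b0010, 0b0110]))
```

If a number owns several unique bits, this sketch greedily reports the lowest one, which matches the question's remark that a greedy result is acceptable.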

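If you don't want to iterate over the numbers twice, you can keep track of some number you have seen that contains each bit. This may be a good optimization if the numbers are stored on disk and streaming them requires disk access, but of course it makes the code somewhat more complicated.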

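def find_number_with_unique_bit_one_pass(numbers, wordsize):
    examples = [-1] * wordsize  # examples[i]: a number seen with bit i set
    count1 = 0  # bits seen once or more
    count2 = 0  # bits seen twice or more

    for n in numbers:
        if n & ~count1:  # n introduces at least one new bit
            for i in range(wordsize):
                if n & (1 << i):
                    examples[i] = n
        count2 |= count1 & n
        count1 |= n

    for i in range(wordsize):
        if (count1 & ~count2) & (1 << i):
            return examples[i]
    return None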

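You could use tricks to extract the bit indexes more efficiently in the loop that sets the examples, but since that code is executed at most 'wordsize' times, it is probably not worth it.

This code translates easily to C... I wrote it in Python only for clarity.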

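【Discussion】:

If I could mark two replies as the answer, I would definitely mark this one. However, Harold got there first, and this is a very good improvement on his answer. +1 anyway, so thanks :)

【Solution 2】:

(The long version of what I wrote in the comments)

By counting, for every k, the number of times the bit at index k is one (there is a trick to do this faster than naively, but it is still O(n)), you get a list of bitlength counters in which a count of 1 means that bit was set only once. The index of that counter (found in O(1), because you have a fixed number of bits) is therefore the bit position you want. To find the number with that bit set, just iterate over all the numbers again and check whether each has that bit set (O(n) again); if it does, that is the number you want.

In total: O(n), versus O(n²) for checking every number against all the others.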
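The counting approach can be sketched like this. It is the naive counting loop rather than the faster trick the answer alludes to; the function name is hypothetical, and the default `wordsize=9` is an assumption taken from the "9 bits" remark in the question discussion.

```python
def find_unique_bit_by_counting(numbers, wordsize=9):
    """Return (number, bit_index) for the lowest bit set in exactly one number, else None."""
    # counts[k] is how many numbers have bit k set.
    counts = [0] * wordsize
    for n in numbers:
        for k in range(wordsize):
            if n & (1 << k):
                counts[k] += 1

    # A fixed number of counters, so this scan is O(wordsize).
    for k, count in enumerate(counts):
        if count == 1:
            # Second pass over the data: find the one number with bit k set.
            for n in numbers:
                if n & (1 << k):
                    return n, k
    return None


# The sample data from the question.
print(find_unique_bit_by_counting([0b1110, 0b0100, 0b0100, 0b0010, 0b0110]))
```

Unlike the two-mask version, this keeps the full count per bit, which costs more work per number but would also support queries such as "bits set in exactly two numbers".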

【讨论】：

【Solution 3】:

This method uses fewer than 2 passes (but changes the input array)

    #include <stdio.h>

    unsigned array[] = { 0,1,2,3,4,5,6,7,8,16,17 };
    #define COUNTOF(a) (sizeof(a)/sizeof(a)[0])
    void swap(unsigned *a, unsigned *b)
    {
        unsigned tmp;
        tmp = *a;
        *a = *b;
        *b = tmp;
    }

    int main(void)
    {
    unsigned idx,bot,totmask,dupmask;

    /* First pass: move all elements that introduce new bits to the left of the array.
    ** totmask is a mask of bits that occur once or more
    ** dupmask is a mask of bits that occur twice or more
    */
    totmask=dupmask=0;
     for (idx=bot=0; idx < COUNTOF(array); idx++) {
         dupmask |= array[idx] & totmask;
         if (array[idx] & ~totmask) goto add;
         continue;

    add:
        totmask |= array[idx];
        if (bot != idx) swap(array+bot,array+idx);
        bot++;
        }
    fprintf(stderr, "Bot=%u, totmask=%u, dupmask=%u\n", bot, totmask, dupmask );

    /* Second pass: reduce list of candidates by checking if
    ** they consist of *only* duplicate bits */
    for (idx=bot; idx-- > 0 ; ) {
        if ((array[idx] & dupmask) == array[idx]) goto del;
        continue;
    del:
        if (--bot != idx) swap(array+bot,array+idx);

    }

    fprintf(stdout, "Results[%u]:\n", bot );
    for (idx=0; idx < bot; idx++) {
        fprintf(stdout, "[%u]: %x\n" ,idx, array[idx] );
        }
    return 0;
    }

Update 2011-11-28: another version, which does not change the original array. The (temporary) results are kept in a separate array.

#include <stdio.h>
#include <limits.h>
#include <assert.h>

unsigned array[] = { 0,1,2,3,4,5,6,7,8,16,17,32,33,64,96,128,130 };
#define COUNTOF(a) (sizeof(a)/sizeof(a)[0])
void swap(unsigned *a, unsigned *b)
{
    unsigned tmp;
    tmp = *a, *a = *b, *b = tmp;
}


int main(void)
{
unsigned idx,nfound,totmask,dupmask;
unsigned found[sizeof array[0] *CHAR_BIT ];

/* First pass: save all elements that introduce new bits in the found[] array.
** totmask is a mask of bits that occur once or more
** dupmask is a mask of bits that occur twice or more
*/
totmask=dupmask=0;
 for (idx=nfound=0; idx < COUNTOF(array); idx++) {
     dupmask |= array[idx] & totmask;
     if (array[idx] & ~totmask) goto add;
     continue;

add:
    totmask |= array[idx];
    found[nfound++] = array[idx];
    assert(nfound <= COUNTOF(found) );
    }
fprintf(stderr, "Bot=%u, totmask=%u, dupmask=%u\n", nfound, totmask, dupmask );

/* Second pass: reduce list of candidates by checking if
** they consist of *only* duplicate bits */
for (idx=nfound; idx-- > 0 ; ) {
    if ((found[idx] & dupmask) == found[idx]) goto del;
    continue;
del:
    if (--nfound != idx) swap(found+nfound,found+idx);

}

fprintf(stdout, "Results[%u]:\n", nfound );
for (idx=0; idx < nfound; idx++) {
    fprintf(stdout, "[%u]: %x\n" ,idx, found[idx] );
    }
return 0;
}
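For readers following the Python in the earlier answers, the non-mutating version can be sketched as below. This is an illustrative port, not the author's code: the function name is hypothetical, and unlike the C version, which deletes by swapping, it keeps the surviving candidates in their original order.

```python
def unique_bit_candidates(numbers):
    """Return the numbers that own at least one bit no other number has."""
    totmask = 0  # bits that occur once or more
    dupmask = 0  # bits that occur twice or more
    found = []   # numbers that introduced at least one new bit

    # First pass over the input: at most one candidate is kept per bit.
    for n in numbers:
        dupmask |= n & totmask
        if n & ~totmask:
            totmask |= n
            found.append(n)

    # Second pass runs over the short candidate list only:
    # drop candidates made up entirely of duplicated bits.
    return [n for n in found if n & ~dupmask]


# The test array from the second C listing.
print(unique_bit_candidates([0, 1, 2, 3, 4, 5, 6, 7, 8, 16, 17, 32, 33, 64, 96, 128, 130]))
```

The candidate list can never hold more entries than there are bits in a word, because each entry must introduce a bit not seen before.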

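【Discussion】:

The idea of storing the elements as they introduce new bits is clever. It would make my solution shorter.
Thanks. The driving force behind all this was that I wanted to avoid the second pass. It seems I cannot; but at least I managed to reduce it to 32 steps. BTW, the gotos are only there to annoy the idiots ;-)

【Solution 4】:

As has been pointed out, this does not work:

You can XOR the numbers together, and the result gives you the mask. Then you have to find the first number for which the expression N & mask is not 0.

【Discussion】:

But bits that are 1 more than once, but an odd number of times, would (incorrectly) show up as 1 in the mask, right?
@harold Unfortunately you are right, sorry I overlooked that.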
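The objection is easy to check on a small input. The data below is hand-picked for illustration (it is not from the thread): bit 0 is set in three numbers, an odd count greater than one, so XOR leaves it set even though it is not unique.

```python
from functools import reduce
from operator import xor

numbers = [0b0001, 0b0001, 0b0001, 0b0100]

# XOR of everything: bit 0 survives because it occurs an odd number of times.
xor_mask = reduce(xor, numbers, 0)

# The "seen once" / "seen twice" masks give the bits set in exactly one number.
seen_once = 0
seen_twice = 0
for n in numbers:
    seen_twice |= seen_once & n
    seen_once |= n
unique_mask = seen_once & ~seen_twice

print(bin(xor_mask), bin(unique_mask))
```

Here the XOR mask wrongly includes bit 0, while the two-mask result contains only bit 2, the one bit that really occurs once.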