Most efficient way of extracting the index of bits set to 1: answers

【Question title】: Most efficient way of extracting the index of bits set to 1
【Posted】: 2014-06-30 21:28:43
【Question】:

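I am writing a chess program, and I use 64-bit bitmasks to represent whether each square of the board holds a piece. Whenever I need to traverse the board and do something for every piece, I look at the bitmask, find the "index" (shift amount) of each bit set to 1, and then look at the board to see which piece it is.

This process may or may not be the best one, but I found that the function that extracts the bits (on_bits) takes up 5% of the program's running time! Even considering how many times it is called, that is still slow. So I am looking for a good solution. I am posting my two attempts.

Original: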

int on_bits(u64 x, u8 *arr) {
    int ret = 0;
    int i = 0;

    while (x) {
        while (!(x&0xffffffff)) {
            x >>= 32;
            i += 32;
        }

        while (!(x&0xff)) {
            x >>= 8;
            i += 8;
        }

        while (!(x&1)) {
            x >>= 1;
            i++;
        }

        arr[ret++] = i;
        x >>= 1;
        i++;
    }

    return ret;
}

The new version, which runs faster thanks to compiler optimization and unrolling. It is about 2x faster than the previous one.

#define B(n)    (((u64)0xff)<<((8*n)))
#define b(n)    (((u64)1<<(n)))

int on_bits(u64 x, u8 *arr) {
    int ret = 0;

    if (x & (B(0) | B(1) | B(2) | B(3))) {
        if (x & B(0)) {
            if (x & b(0)) arr[ret++] = 0;
            if (x & b(1)) arr[ret++] = 1;
            if (x & b(2)) arr[ret++] = 2;
            if (x & b(3)) arr[ret++] = 3;
            if (x & b(4)) arr[ret++] = 4;
            if (x & b(5)) arr[ret++] = 5;
            if (x & b(6)) arr[ret++] = 6;
            if (x & b(7)) arr[ret++] = 7;
        }
        if (x & B(1)) {
            if (x & b(8)) arr[ret++] = 8;
            if (x & b(9)) arr[ret++] = 9;
            if (x & b(10)) arr[ret++] = 10;
            if (x & b(11)) arr[ret++] = 11;
            if (x & b(12)) arr[ret++] = 12;
            if (x & b(13)) arr[ret++] = 13;
            if (x & b(14)) arr[ret++] = 14;
            if (x & b(15)) arr[ret++] = 15;
        }
        if (x & B(2)) {
            if (x & b(16)) arr[ret++] = 16;
            if (x & b(17)) arr[ret++] = 17;
            if (x & b(18)) arr[ret++] = 18;
            if (x & b(19)) arr[ret++] = 19;
            if (x & b(20)) arr[ret++] = 20;
            if (x & b(21)) arr[ret++] = 21;
            if (x & b(22)) arr[ret++] = 22;
            if (x & b(23)) arr[ret++] = 23;
        }
        if (x & B(3)) {
            if (x & b(24)) arr[ret++] = 24;
            if (x & b(25)) arr[ret++] = 25;
            if (x & b(26)) arr[ret++] = 26;
            if (x & b(27)) arr[ret++] = 27;
            if (x & b(28)) arr[ret++] = 28;
            if (x & b(29)) arr[ret++] = 29;
            if (x & b(30)) arr[ret++] = 30;
            if (x & b(31)) arr[ret++] = 31;
        }
    }
    if (x & (B(4) | B(5) | B(6) | B(7))) {
        if (x & B(4)) {
            if (x & b(32)) arr[ret++] = 32;
            if (x & b(33)) arr[ret++] = 33;
            if (x & b(34)) arr[ret++] = 34;
            if (x & b(35)) arr[ret++] = 35;
            if (x & b(36)) arr[ret++] = 36;
            if (x & b(37)) arr[ret++] = 37;
            if (x & b(38)) arr[ret++] = 38;
            if (x & b(39)) arr[ret++] = 39;
        }
        if (x & B(5)) {
            if (x & b(40)) arr[ret++] = 40;
            if (x & b(41)) arr[ret++] = 41;
            if (x & b(42)) arr[ret++] = 42;
            if (x & b(43)) arr[ret++] = 43;
            if (x & b(44)) arr[ret++] = 44;
            if (x & b(45)) arr[ret++] = 45;
            if (x & b(46)) arr[ret++] = 46;
            if (x & b(47)) arr[ret++] = 47;
        }
        if (x & B(6)) {
            if (x & b(48)) arr[ret++] = 48;
            if (x & b(49)) arr[ret++] = 49;
            if (x & b(50)) arr[ret++] = 50;
            if (x & b(51)) arr[ret++] = 51;
            if (x & b(52)) arr[ret++] = 52;
            if (x & b(53)) arr[ret++] = 53;
            if (x & b(54)) arr[ret++] = 54;
            if (x & b(55)) arr[ret++] = 55;
        }
        if (x & B(7)) {
            if (x & b(56)) arr[ret++] = 56;
            if (x & b(57)) arr[ret++] = 57;
            if (x & b(58)) arr[ret++] = 58;
            if (x & b(59)) arr[ret++] = 59;
            if (x & b(60)) arr[ret++] = 60;
            if (x & b(61)) arr[ret++] = 61;
            if (x & b(62)) arr[ret++] = 62;
            if (x & b(63)) arr[ret++] = 63;
        }
    }

    return ret;
}

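(No question as to which one is simpler :))

So, is there any way to improve on this? Or is this a dead end? For reference, the function is called 30 million times in a very short benchmark.

Thanks

Edit: The output array does not need to be sorted. Also, a super-fast "this is the first bit set" function would be fine, but my attempts with that were very slow compared to this (I used the fls function from the Linux kernel).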

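【Question comments】:

See: chessprogramming.wikispaces.com/BitScan. There is hardware support for this.
Possible duplicate of "What is the fastest way to return the positions of all set bits in a 64-bit integer?"
You should replace the macros with inline functions to improve code quality, with no other downsides.
@usr Sure, this is really just a quick mock-up. Thanks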
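
The comment about replacing the macros with inline functions could look roughly like the sketch below. The names byte_mask and bit_mask and the use of uint64_t in place of u64 are assumptions for illustration, not part of the original code.

```c
#include <stdint.h>

/* Mask covering byte n (0..7) of a 64-bit word; stands in for the B(n) macro. */
static inline uint64_t byte_mask(unsigned n)
{
    return (uint64_t)0xff << (8 * n);
}

/* Mask with only bit n (0..63) set; stands in for the b(n) macro. */
static inline uint64_t bit_mask(unsigned n)
{
    return (uint64_t)1 << n;
}
```

One practical difference: B(n) expands its argument as 8*n without parentheses, so an argument such as i+1 would be misparsed, while a function evaluates its argument before using it.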

Tags: c performance memory bits chess

【Solution 1】:

If you use gcc, it has useful built-in functions that do what you want:

— Built-in Function: int __builtin_ffs (int x)
    Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero.
— Built-in Function: int __builtin_ffsl (long)
    Similar to __builtin_ffs, except the argument type is long.
— Built-in Function: int __builtin_ffsll (long long)
    Similar to __builtin_ffs, except the argument type is long long.
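
A minimal sketch of how a builtin of this family can drive the extraction loop, assuming GCC or Clang. It uses __builtin_ctzll (count trailing zeros, a sibling of __builtin_ffsll that returns the zero-based index directly and is undefined for a zero argument) and clears the lowest set bit with x &= x - 1. The name on_bits_ctz and the <stdint.h> types standing in for u64 and u8 are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical variant of on_bits: writes the index of every set bit of x
 * into arr (which must have room for 64 entries) and returns the count.
 * Assumes GCC or Clang for __builtin_ctzll. */
int on_bits_ctz(uint64_t x, uint8_t *arr)
{
    int ret = 0;

    while (x) {                                    /* ctzll(0) is undefined, so guard on x */
        arr[ret++] = (uint8_t)__builtin_ctzll(x);  /* index of the lowest set bit */
        x &= x - 1;                                /* clear that bit */
    }

    return ret;
}
```

The loop runs once per set bit, so its cost depends on how populated the mask is and on whether the compiler emits a hardware bit-scan instruction for the target. Whether it beats the unrolled version has to be measured.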

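【Comments】:

Yes, but if I iterate using these functions, everything actually gets slower. I just tested with them as suggested on chessprogramming.com. I think it is better to get all the indices at once. I am running some better tests now and will get back to you.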

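【Solution 2】:

To answer your question, there are reasonable answers here...

But...

5% is nothing. If you cut that in half, what have you gained? Peanuts. There are other things you can do (I'll bet) that will save a lot more time, and then other things after those. (Are there malloc and free calls in there?) Cut out enough of those and you will speed up the whole program until the 5% in on_bits grows into something worth worrying about, because you have trimmed away the other fat.

You don't say what profiling method you used to get that 5% figure, but most profilers (especially gprof) happily fail to tell you where your biggest speedup opportunities are, leading you to think your code is so tight that something taking 5% is worth focusing on. This explains it in more detail.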

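【Comments】:

Yes, exactly. See also Amdahl's law. Even if you made it take 0 time, you are talking about a 5% speedup. No big deal.
@Patrick: Right. The speedup ratio would then be 100/95. If it originally took one minute 40 seconds, it would shrink to one minute 35 seconds. BFD.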
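
The numbers in these comments follow from Amdahl's law. As a worked sketch, let $p$ be the fraction of run time spent in on_bits and $s$ the factor by which that function alone is sped up (both symbols are introduced here for illustration):

```latex
\[
S(s) = \frac{1}{(1 - p) + \dfrac{p}{s}}, \qquad p = 0.05
\]
\[
s \to \infty:\quad S = \frac{1}{0.95} = \frac{100}{95} \approx 1.053,
\qquad 100\ \text{s} \times 0.95 = 95\ \text{s}
\]
\[
s = 2:\quad S = \frac{1}{0.95 + 0.025} = \frac{1}{0.975} \approx 1.026,
\qquad 100\ \text{s} \times 0.975 = 97.5\ \text{s}
\]
```

So eliminating on_bits entirely saves 5 seconds out of 100, and halving its cost saves 2.5.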

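【Solution 3】:

If you consider turning the problem around, so that instead of constructing an array of which bits are set you simply ask whether bit X is set, you may be able to reduce the time. If you do need to create the array, looping over all the bits with something like the following may be faster: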

#include <limits.h>   /* CHAR_BIT */

/* Returns 1 if bit n of bf is set, 0 if it is clear, -1 if n is out of range. */
static inline int bit_isset (unsigned long bf, int n)
{
    if ((unsigned long) n > sizeof (unsigned long) * CHAR_BIT - 1)
        return -1;

    return ((bf >> n) & 0x1) ? 1 : 0;
}

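Here you simply pass x to bit_isset along with the bit you are interested in; for example, bit_isset(x, 49) tests whether bit 49 is set. You could try building the ret array with this function and compare timings by iterating over 0 <= n <= 63 in a for loop or similar.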
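
A minimal sketch of that loop, for timing comparisons. It uses a uint64_t variant of the test (named bit_isset64 here) so that all 64 squares fit even on platforms where unsigned long is 32 bits wide. Both bit_isset64 and on_bits_scan are hypothetical names introduced for this example.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if bit n of bf is set, 0 if it is clear, -1 if n is out of range.
 * Same idea as bit_isset, but on a fixed 64-bit type. */
static inline int bit_isset64(uint64_t bf, int n)
{
    if ((unsigned)n > sizeof(uint64_t) * CHAR_BIT - 1)
        return -1;
    return (int)((bf >> n) & 1u);
}

/* Hypothetical helper: collects the indices of all set bits of x into arr
 * (room for 64 entries) by testing every bit position in turn. */
int on_bits_scan(uint64_t x, uint8_t *arr)
{
    int ret = 0;

    for (int n = 0; n < 64; n++) {
        if (bit_isset64(x, n) == 1)
            arr[ret++] = (uint8_t)n;
    }

    return ret;
}
```

This always performs 64 tests no matter how many bits are set, so it is best treated as a baseline to measure against rather than an expected improvement.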

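【Comments】:

Hi, this is similar to the first option, so it is 2x slower than my second approach.
How does dropping the bounds check and just assigning ret[n]=((x >> n) & 0x1) ? 1 : 0; compare? (Change the names to match your variables.)
I did drop the bounds check, since my code guarantees everything is in range.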