Fast way of finding most and least significant bit set in a 64-bit integer: answers

【Question title】: Fast way of finding most and least significant bit set in a 64-bit integer
【Posted】: 2015-07-13 02:24:16
【Question】:

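There are many questions about this on StackOverflow. Many. However, I cannot find an answer that:

- works in C#,
- works for 64-bit integers (as opposed to 32-bit), and
- is faster than: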

private static int Obvious(ulong v)
{
    int r = 0;
    while ((v >>= 1) != 0) 
    {
        r++;
    }
    return r;
}

or even

int r = (int)(Math.Log(v,2));

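I'm assuming a 64-bit Intel CPU here.

One useful reference is the Bit Hacks page, another is fxtbook.pdf. However, while these give useful directions for solving the problem, they do not give a ready-made answer.

I'm looking for a reusable function that does something similar to _BitScanForward64 and _BitScanReverse64, only for C#.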

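【Question comments】:

Isn't this essentially the same as stackoverflow.com/questions/10439242/…? Obviously you'd have to adapt it to 64 bits, and it gives you the inverse of the number you're looking for, but it conveys the same information.
@Taekahn Adapting it to 64 bits is non-trivial. Try it. As I acknowledged in the question, 32-bit answers do exist on SO.

Tags: c# x86 bit-manipulation leading-zero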

【Solution 1】:

.NET Core 3.0 added BitOperations.LeadingZeroCount and BitOperations.TrailingZeroCount, so you can use them directly. They are mapped to the x86 LZCNT/BSR and TZCNT/BSF instructions, which makes them very efficient.

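// Requires `using System.Numerics;` (.NET Core 3.0+).
// LeadingZeroCount has no overload taking long, so the literal must be ulong (UL suffix).
int mostSignificantPosition = 63 - BitOperations.LeadingZeroCount(0x1234UL);   // 12; -1 for an input of 0
int leastSignificantPosition = BitOperations.TrailingZeroCount(0x1234UL);      // 2; 64 for an input of 0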

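Alternatively, the position of the most significant bit can be computed directly with BitOperations.Log2, which returns floor(log2(x)). By convention it returns 0 for x = 0, so check for zero separately if it can occur:

int mostSignificantPosition = BitOperations.Log2(x);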
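The question asks for something shaped like _BitScanForward64 and _BitScanReverse64. A minimal sketch of how the two BitOperations calls could be wrapped to mirror that contract (a success flag plus an out index, false for a zero mask), assuming .NET Core 3.0 or later; the BitScan64 class and its method names are hypothetical:

```csharp
using System.Numerics;

// Hypothetical wrappers mimicking the contract of the MSVC intrinsics
// _BitScanForward64 / _BitScanReverse64: return false when mask == 0,
// otherwise true with the index of the lowest/highest set bit in `index`.
// Assumes .NET Core 3.0+ (System.Numerics.BitOperations).
public static class BitScan64
{
    public static bool BitScanForward64(out int index, ulong mask)
    {
        if (mask == 0)
        {
            index = 0; // MSVC leaves the index undefined here; 0 is an arbitrary choice
            return false;
        }
        index = BitOperations.TrailingZeroCount(mask); // lowest set bit
        return true;
    }

    public static bool BitScanReverse64(out int index, ulong mask)
    {
        if (mask == 0)
        {
            index = 0;
            return false;
        }
        index = 63 - BitOperations.LeadingZeroCount(mask); // highest set bit
        return true;
    }
}
```

For 0x1234UL, BitScanForward64 reports bit 2 and BitScanReverse64 reports bit 12. Returning a flag keeps the zero case explicit, which the raw counts (64 for an input of 0) otherwise leave to the caller.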

【Comments】:

Nice! Thanks for sharing this!
Added a link to your answer. +1

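【Solution 2】:

One of the methods described on the Bit Hacks page linked in the question uses a De Bruijn sequence. Unfortunately, that page does not give a 64-bit version of the sequence. This useful page explains how to construct De Bruijn sequences, and this one gives an example of a sequence generator written in C++. By adapting that code we can generate several sequences, one of which is used in the C# code below: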

public static class BitScanner
{
    private const ulong Magic = 0x37E84A99DAE458F;

    private static readonly int[] MagicTable =
    {
        0, 1, 17, 2, 18, 50, 3, 57,
        47, 19, 22, 51, 29, 4, 33, 58,
        15, 48, 20, 27, 25, 23, 52, 41,
        54, 30, 38, 5, 43, 34, 59, 8,
        63, 16, 49, 56, 46, 21, 28, 32,
        14, 26, 24, 40, 53, 37, 42, 7,
        62, 55, 45, 31, 13, 39, 36, 6,
        61, 44, 12, 35, 60, 11, 10, 9,
    };

    public static int BitScanForward(ulong b)
    {
        return MagicTable[((ulong) ((long) b & -(long) b)*Magic) >> 58];
    }

    public static int BitScanReverse(ulong b)
    {
        b |= b >> 1;
        b |= b >> 2;
        b |= b >> 4;
        b |= b >> 8;
        b |= b >> 16;
        b |= b >> 32;
        b = b & ~(b >> 1);
        return MagicTable[b*Magic >> 58];
    }
}
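MagicTable does not have to be typed in by hand. Because Magic is a De Bruijn sequence, the top 6 bits of Magic << i differ for every i from 0 to 63, so the table is simply the inverse of that mapping and can be built at start-up. A sketch, assuming the same Magic constant; the DeBruijnTable class is hypothetical. Built this way, it yields the same 64 entries as the hard-coded MagicTable above:

```csharp
// Hypothetical sketch: rebuild the De Bruijn lookup table from the Magic
// constant instead of hard-coding it. Assumes the same Magic as BitScanner.
public static class DeBruijnTable
{
    private const ulong Magic = 0x37E84A99DAE458F;

    // Table[window] = i, where window = top 6 bits of (Magic << i).
    public static readonly int[] Table = Build();

    private static int[] Build()
    {
        var table = new int[64];
        for (int i = 0; i < 64; i++)
        {
            // Multiplying by 2^i shifts Magic left by i; because Magic is a
            // De Bruijn sequence, each of the 64 six-bit windows is distinct.
            table[unchecked((1UL << i) * Magic) >> 58] = i;
        }
        return table;
    }

    // Index of the lowest set bit; returns 0 when b == 0, like BitScanForward.
    public static int BitScanForward(ulong b)
    {
        ulong lowest = b & unchecked(~b + 1); // two's-complement trick isolates the lowest set bit
        return Table[unchecked(lowest * Magic) >> 58];
    }
}
```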

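I have also posted a C# port of the sequence generator to GitHub.

Another related article, not mentioned in the question, with good coverage of De Bruijn sequences can be found here.

【Comments】:

【Solution 3】:

Following up on my comment, here is a C# function that counts leading zero bits, adapted for 64-bit integers.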

public static UInt64 CountLeadingZeros(UInt64 input)
{
    if (input == 0) return 64;

    UInt64 n = 1;

    if ((input >> 32) == 0) { n = n + 32; input = input << 32; }
    if ((input >> 48) == 0) { n = n + 16; input = input << 16; }
    if ((input >> 56) == 0) { n = n + 8; input = input << 8; }
    if ((input >> 60) == 0) { n = n + 4; input = input << 4; }
    if ((input >> 62) == 0) { n = n + 2; input = input << 2; }
    n = n - (input >> 63);

    return n;
}
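The question also asks for the least significant bit. The same halving idea can be mirrored to count trailing zeros by testing the low half of the remaining bits instead of the high half. A sketch, not part of the original answer; the TrailingZeros class name is hypothetical:

```csharp
// Hypothetical mirror of CountLeadingZeros: binary search from the low end.
// The result is also the index of the lowest set bit (64 for an input of 0).
public static class TrailingZeros
{
    public static int CountTrailingZeros(ulong input)
    {
        if (input == 0) return 64;

        int n = 0;
        if ((input & 0xFFFFFFFFUL) == 0) { n += 32; input >>= 32; } // low 32 bits empty
        if ((input & 0xFFFFUL) == 0)     { n += 16; input >>= 16; }
        if ((input & 0xFFUL) == 0)       { n += 8;  input >>= 8; }
        if ((input & 0xFUL) == 0)        { n += 4;  input >>= 4; }
        if ((input & 0x3UL) == 0)        { n += 2;  input >>= 2; }
        if ((input & 0x1UL) == 0)        { n += 1; }
        return n;
    }
}
```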

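Update:
If you are on a newer version of .NET (Core 3.0 or later), check whether this is built in, per the answer below. https://stackoverflow.com/a/61141435/1587755

【Comments】:

According to my performance tests, this is faster than my own attempt. Nicely done, thanks!
If you don't mind me asking, I'm curious what you're using this for? Try as I might, I can't think of a practical application.
I'm running certain mathematical modelling simulations. Each batch processes billions of samples, so shaving off a few milliseconds here and there lets them finish faster.
Right now the whole thing runs about 6 times faster than when I started (40 minutes per simulation, versus 4 hours when I started), and the hot spots in the profiler are currently … and Dictionary.TryGetValue. That suggests to me that the only further optimization left is probably better data pruning to make the samples smaller.
Interesting. Your work sounds more interesting than mine... but I digress. Thanks for sharing :)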

【Solution 4】:

In IL code, the fastest way to get the most significant bit should be to convert to float and read the exponent bits.

Safe code:

int myint = 7;
int msb = (BitConverter.SingleToInt32Bits(myint) >> 23) - 0x7f;

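Note that a float has only a 24-bit significand: values with more than 24 significant bits can round up to the next power of two, which makes the result one too high (for example, int.MaxValue converts to 2^31 and gives 31 instead of 30). An input of 0 gives -127.

An even faster way is to use the CPU's bit-scan instructions. As phuclv mentioned, these are available in .NET Core 3.0, so I added a test; unfortunately, it is not much faster.

Here, as requested, are the BenchmarkDotNet results for 10000 conversions of uint and ulong. The speedup is about 2x, so the BitScanner solution is fast but cannot beat the native float conversion.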

           Method |     Mean |    Error |   StdDev | Ratio
BitScannerForward | 34.37 us | 0.420 us | 0.372 us |  1.00
BitConverterULong | 18.59 us | 0.238 us | 0.223 us |  0.54
 BitConverterUInt | 18.58 us | 0.129 us | 0.121 us |  0.54
     NtdllMsbCall | 31.34 us | 0.204 us | 0.170 us |  0.91       
 LeadingZeroCount | 15.85 us | 0.169 us | 0.150 us |  0.48
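If the exponent trick has to stay exact for every 32- and 64-bit input, one option (swapped in here and not part of the benchmark above) is to convert to double, whose 53-bit significand holds any uint exactly, and to split a ulong into two 32-bit halves. A sketch; the ExponentMsb class is hypothetical, and it returns -1 for 0:

```csharp
using System;

// Hypothetical exact variant of the exponent trick, using double instead of float.
public static class ExponentMsb
{
    // Index of the highest set bit; -1 for 0.
    public static int Msb(uint v)
    {
        if (v == 0) return -1;
        long bits = BitConverter.DoubleToInt64Bits(v); // uint -> double is exact
        return (int)(bits >> 52) - 1023;               // sign bit is 0, so this is the unbiased exponent
    }

    // Split into 32-bit halves so each conversion stays exact.
    public static int Msb(ulong v)
    {
        uint hi = (uint)(v >> 32);
        return hi != 0 ? Msb(hi) + 32 : Msb((uint)v);
    }
}
```

With this version, 0x7FFFFFFFu gives 30 as expected, because the double conversion cannot round it up to 2^31.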

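【Comments】:

Since there is an accepted answer that differs from yours, you should run some speed tests and post the results in your answer to show that yours is faster. @ me if you do and I'll upvote your answer. ... You also need to take into account that the question specifies 64-bit integers and specifically excludes 32-bit ones. ... So you may just need to delete this answer. In general, speed Q&As always post speed tests and results, ideally with the data set.
I doubt your results are correct. BitOperations.LeadingZeroCount should be much faster than converting to float and then doing some manipulation.
I added a test for BitOperations.LeadingZeroCount; it is faster, but disappointingly only slightly. So if you are lucky enough to have a .NET Core 3.0-compatible target platform, you should use it; if not, float conversion is the fastest method.

【Solution 5】:

@Taekahn gave a good answer. I would improve it slightly: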

[System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.AggressiveInlining)]
public static int CountLeadingZeros(this ulong input)
{
    const int bits = 64;
    // if (input == 0L) return bits; // Needed if input can be 0: without it, 0 returns 63 instead of 64.
    int n = 1;
    if ((input >> (bits - 32)) == 0) { n += 32; input <<= 32; }
    if ((input >> (bits - 16)) == 0) { n += 16; input <<= 16; }
    if ((input >> (bits - 8)) == 0) { n += 8; input <<= 8; }
    if ((input >> (bits - 4)) == 0) { n += 4; input <<= 4; }
    if ((input >> (bits - 2)) == 0) { n += 2; input <<= 2; }
    return n - (int)(input >> (bits - 1));
}

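- It avoids the somewhat magic numbers, using (bits - x) instead to make their intent more obvious.
- Adapting it to other word sizes should now be obvious and trivial.
- Treating (input == 0) as a special case is unnecessary when the input is known to be nonzero, and removing it speeds up all other inputs; without it, however, an input of 0 returns 63 rather than 64.
- Using int as the counter makes more sense than UInt64. (It could even be a byte, but int is the default integer type and is supposedly the fastest on every platform.)
- Added the aggressive-inlining attribute to encourage the JIT to inline it for best performance.

None of the (bits - x) expressions has to be computed at run time: bits is a const, so the compiler folds them into constants. The improved readability is therefore free.

Edit: As @Peter Cordes pointed out, you should probably just use System.Numerics.BitOperations.LeadingZeroCount if the BitOperations class is available to you. I, for one, often don't have it.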
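To illustrate the point about other word sizes, a 32-bit version might look like the following sketch (the LeadingZeros32 class is hypothetical). Besides setting bits to 32, the 32-bit step has to be dropped, since shifting by bits - 32 = 0 would no longer halve the search; the zero check is kept so that 0 returns 32:

```csharp
using System.Runtime.CompilerServices;

// Hypothetical 32-bit adaptation of the CountLeadingZeros extension above.
public static class LeadingZeros32
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int CountLeadingZeros(this uint input)
    {
        const int bits = 32;
        if (input == 0) return bits; // without this, 0 would yield 31
        int n = 1;
        if ((input >> (bits - 16)) == 0) { n += 16; input <<= 16; }
        if ((input >> (bits - 8)) == 0)  { n += 8;  input <<= 8; }
        if ((input >> (bits - 4)) == 0)  { n += 4;  input <<= 4; }
        if ((input >> (bits - 2)) == 0)  { n += 2;  input <<= 2; }
        return n - (int)(input >> (bits - 1));
    }
}
```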

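【Comments】:

Is there any point to this in 2020? BitOperations.LeadingZeroCount should be faster if the JIT does its job, and equal when compiling for a target architecture without hardware bit-scan. I could imagine this being faster for compile-time-constant inputs if C# doesn't constant-propagate through the BitOperations version, but hopefully it does.
@Peter Cordes: There are many variants of the .NET platform, and not all of them have access to the BitOperations class. At our company we still use legacy "portable" projects for some products, and System.Numerics.BitOperations simply does not exist there.
Your answer does not give the correct result for zero. You end up with 1+32+16+8+4+2-0=63 instead of 64.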

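【Solution 6】:

Since we're talking about .NET here, it is usually best not to resort to external native calls. But if you can tolerate the managed/unmanaged round-trip overhead on every operation, the following two calls provide very direct and pure access to the native CPU instructions.

The (minimalist) disassembly of each complete function in ntdll.dll is also shown. This library is present on every Windows machine and will always be found if referenced as shown. As the cmovne in the disassembly shows, both functions return -1 when the input is 0.

Least significant bit (LSB):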

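using System.Runtime.InteropServices;  // DllImport
using System.Security;                 // SuppressUnmanagedCodeSecurity

[DllImport("ntdll"), SuppressUnmanagedCodeSecurity]
public static extern int RtlFindLeastSignificantBit(ulong ul);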

// X64:
//      bsf rdx, rcx
//      mov eax, 0FFFFFFFFh
//      movzx ecx, dl
//      cmovne eax,ecx
//      ret

Most significant bit (MSB):

[DllImport("ntdll"), SuppressUnmanagedCodeSecurity]
public static extern int RtlFindMostSignificantBit(ulong ul);

// X64:
//      bsr rdx, rcx
//      mov eax, 0FFFFFFFFh
//      movzx ecx, dl
//      cmovne eax,ecx
//      ret

Usage:
Here is a usage example; it requires the declarations above to be accessible. It couldn't be simpler.

int ix;

ix = RtlFindLeastSignificantBit(0x00103F0A042C1D80UL);  // ix --> 7

ix = RtlFindMostSignificantBit(0x00103F0A042C1D80UL);   // ix --> 52
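ntdll is Windows-only. Where the same code must also run elsewhere, a managed stand-in with the same convention (bit index, or -1 for 0) could be built on BitOperations, assuming .NET Core 3.0 or later; the ManagedBitFind class and method names are hypothetical:

```csharp
using System.Numerics;

// Hypothetical managed stand-ins for RtlFindLeastSignificantBit and
// RtlFindMostSignificantBit: bit index, or -1 when the input is 0.
// Assumes .NET Core 3.0+ (System.Numerics.BitOperations).
public static class ManagedBitFind
{
    public static int FindLeastSignificantBit(ulong ul) =>
        ul == 0 ? -1 : BitOperations.TrailingZeroCount(ul);

    public static int FindMostSignificantBit(ulong ul) =>
        ul == 0 ? -1 : BitOperations.Log2(ul);
}
```

Called with 0x00103F0A042C1D80UL, these return 7 and 52, matching the ntdll results above.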

【Comments】: