Get the positions of the 'ones' digits in a base-2 representation of a C float

【Question title】: Get the positions of the 'ones' digits in a base-2 representation of a C float
【Posted】: 2012-08-03 21:38:08
【Question】:

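Suppose I have a floating-point number. I want to extract the positions of all the 1 digits in its base-2 representation.

For example, 10.25 = 2^-2 + 2^1 + 2^3, so the positions of its base-2 ones are {-2, 1, 3}.

Once I have the list of base-2 powers for a number n, the following should always return true (in pseudocode).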

sum = 0
for power in powers:
    sum += 2.0 ** power
return n == sum
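The pseudocode check above could be written in C roughly as follows. This is only a sketch: the name `sums_to` and its parameters are made up for illustration, and it assumes the powers were taken from the set bits of a single finite `double`, in which case every partial sum should be exactly representable.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the pseudocode: returns true when the
   powers of two listed in `powers` add up to exactly n. */
bool sums_to(double n, const int *powers, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; ++i)
        sum += ldexp(1.0, powers[i]);   /* ldexp(1.0, p) computes 2^p */
    return n == sum;
}
```

For the example in the question, calling `sums_to(10.25, powers, 3)` with `powers` holding {-2, 1, 3} should return true.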

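However, doing bit logic on floating-point numbers is somewhat difficult in C and C++, and even harder to do portably.

How can this be done in either language with a small number of CPU instructions?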

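【Comments】:

This is more or less impossible to do portably, since the standard doesn't guarantee IEEE floating point. Also, what if the 'ones' digits are out of range?
I actually don't mind non-portability, as long as it works on Linux x86_64 with gcc, where IEEE floats are guaranteed. Any other architecture can use tweaked code or a slow naive approach.
What's the point of avoiding low-level bitwise operations?
The only logical solution is to use a union to convert the float to an integer. Then extract the exponent and apply the bias. If it's negative or > 23/52, it's out of range.
No, you can use frexp, and then it becomes fully portable.

Tags: c++ c floating-point bit-manipulation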

【Solution 1】:

Giving up on portability, assume IEEE float and 32-bit int.

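#include <stdio.h>

// Doesn't check for NaN or denormalized.
// Left as an exercise for the reader.
void pbits(float x)
{
    union {
        float f;
        unsigned i;
    } u;
    unsigned mantissa;
    int sign, exponent, i;
    u.f = x;
    sign = u.i >> 31;                                 // sign bit; extracted but not printed
    exponent = (int)((u.i >> 23) & 255) - 127;        // remove the exponent bias
    mantissa = (u.i & ((1u << 23) - 1)) | (1u << 23); // restore the implicit leading 1
    for (i = 0; i < 24; ++i) {
        // Bit 23 is the leading 1 (2^exponent); each lower bit halves the weight.
        if (mantissa & (1u << (23 - i)))
            printf("2^%d\n", exponent - i);
    }
}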

This prints the powers of 2 that sum to the given floating-point number. For example,

$ ./a.out 156
2^7
2^4
2^3
2^2
$ ./a.out 0.3333333333333333333333333
2^-2
2^-4
2^-6
2^-8
2^-10
2^-12
2^-14
2^-16
2^-18
2^-20
2^-22
2^-24
2^-25

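You can see that 1/3 is rounded up (the last two bits, 2^-24 and 2^-25, are adjacent). This is not intuitive, because in decimal we would always round it down, no matter how many decimal places we use.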
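One way to see the rounding direction without reading the bits is to compare the float against a more precise value. The sketch below is not part of the answer; the function name `third_rounds_up` is made up, and it assumes `double` carries more precision than `float`, as it does with IEEE types.

```c
/* Returns 1 if the float nearest to 1/3 is larger than the more precise
   double nearest to 1/3, which means the float was rounded up. */
int third_rounds_up(void)
{
    /* volatile forces the quotient to be rounded to float precision. */
    volatile float third = 1.0f / 3.0f;
    double reference = 1.0 / 3.0;
    return (double)third > reference;
}
```

With IEEE float and double this should return 1, consistent with the trailing 2^-24 and 2^-25 terms in the output above.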

Footnote: don't do the following:

float x = ...;
unsigned i = *(unsigned *) &x; // no

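The union trick is much less likely to generate warnings or confuse the compiler. The pointer cast reads a float object through an unsigned lvalue, which breaks C's aliasing rules, so an optimizing compiler is free to miscompile it.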
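Another commonly used way to get at the bits, which the answer does not mention, is to copy them with `memcpy`. The following is a minimal sketch under the same assumption of a 32-bit IEEE float; the helper name `float_bits` is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the raw bit pattern of a float.
   Assumes float is 32 bits wide (IEEE-754 binary32). */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* copies the object representation */
    return bits;
}
```

Compilers typically turn a fixed-size `memcpy` like this into a single register move, so it should cost no more than the union.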

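【Comments】:

At the top of the code, you'll see the comment // Left as an exercise for the reader
This works great. It can also be generalized with a bunch of defines and a couple of typedefs. Thanks!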
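As an illustration of the exercise and of the generalization mentioned above, here is one possible sketch for `double` that also handles zero and denormalized values. It is not the answerer's code: the name `bit_powers` and its return convention are assumptions, and it requires `double` to be IEEE-754 binary64.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the exponents of the set bits of |x| into
   out (highest first) and returns how many were written, or -1 for NaN
   or infinity. Assumes double is IEEE-754 binary64. */
int bit_powers(double x, int out[53])
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);

    int biased = (int)((bits >> 52) & 0x7FF);          /* 11-bit exponent field */
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);  /* 52-bit fraction field */

    if (biased == 0x7FF)
        return -1;                      /* NaN or infinity */

    int exponent;
    if (biased == 0) {
        exponent = -1022;               /* zero or denormalized: no implicit 1 */
    } else {
        exponent = biased - 1023;       /* remove the exponent bias */
        frac |= UINT64_C(1) << 52;      /* restore the implicit leading 1 */
    }

    int count = 0;
    for (int i = 0; i <= 52; ++i)
        if (frac & (UINT64_C(1) << (52 - i)))
            out[count++] = exponent - i;
    return count;
}
```

For 10.25 this should fill `out` with 3, 1, -2 and return 3. The sign bit is ignored, so -10.25 gives the same list.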

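【Solution 2】:

There is no need to deal with the encoding of floating-point numbers. C provides routines for working with floating-point values in a portable way. The following works.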

#include <math.h>
#include <stdio.h>
#include <stdlib.h>


int main(int argc, char *argv[])
{
    /*  This should be replaced with proper allocation for the floating-point
        type.
    */
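    int powers[53];

    if (argc < 2)
    {
        fprintf(stderr, "Usage: %s number\n", argv[0]);
        return 1;
    }
    double x = atof(argv[1]);

    // Reject zero, negatives, NaN, and infinity; the bit loop below would
    // never terminate for non-finite input.
    if (!(x > 0) || isinf(x))
    {
        fprintf(stderr, "Error, input must be positive and finite.\n");
        return 1;
    }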

    // Find value of highest bit.
    int e;
    double f = frexp(x, &e) - .5;
    powers[0] = --e;
    int p = 1;

    // Find remaining bits.
    for (; 0 != f; --e)
    {
        printf("e = %d, f = %g.\n", e, f);
        if (.5 <= f)
        {
            powers[p++] = e;
            f -= .5;
        }
        f *= 2;
    }

    // Display.
    printf("%.19g =", x);
    for (int i = 0; i < p; ++i)
        printf(" + 2**%d", powers[i]);
    printf(".\n");

    // Test.
    double y = 0;
    for (int i = 0; i < p; ++i)
        y += ldexp(1, powers[i]);

    if (x == y)
        printf("Reconstructed number equals original.\n");
    else
        printf("Reconstructed number is %.19g, but original is %.19g.\n", y, x);

    return 0;
}

【Comments】: