Generating uniform random integers with a certain maximum

【Question title】: Generating uniform random integers with a certain maximum
【Posted】: 2012-03-18 22:27:50
【Question description】:

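I want to generate uniform integers that satisfy 0 <= result <= maxValue.

I already have a generator that returns uniform values over the full range of the built-in unsigned integer types. Let's call the methods for this byte Byte(), ushort UInt16(), uint UInt32() and ulong UInt64(). Assume that the results of these methods are perfectly uniform.

The signatures of the methods I want are uint UniformUInt(uint maxValue) and ulong UniformUInt(ulong maxValue).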

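What I'm looking for:

Correctness
I'd prefer the return values to be uniformly distributed over the given interval.
But a very small bias is acceptable if it improves performance significantly. By that I mean a bias small enough that distinguishing the output from a truly uniform source with probability 2/3 takes on the order of 2^64 values.
It must work correctly for any maxValue.
Performance
The method should be fast.
Efficiency
The method should consume little raw randomness, since, depending on the underlying generator, producing the raw bytes might be costly. Wasting a few bits is fine, but consuming, say, 128 bits to generate a single number is probably excessive.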

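It is also possible to cache leftover randomness from the previous call in some member variables.

Be careful with integer overflow and wrapping behavior.

I already have a solution (I'll post it as an answer), but it's a bit ugly for my taste. So I'd like to get ideas for better solutions.

Suggestions on how to unit test with large maxValues would be welcome too, since I can't generate a histogram with 2^64 buckets and 2^74 random values. Another complication is that with certain bugs only some maxValue distributions are heavily biased, while others are biased only very slightly.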

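【Question discussion】:

What is the rationale for not using floor(maxValue/maxfordatatype*existingrandom())?
@EugenRieck It violates the correctness property unless maxfordatatype is significantly larger than maxValue, in which case it has efficiency problems.
I understood from your OQ that a small bias is acceptable, so am I to understand that a bias of 1/maxValue is not acceptably small?
@EugenRieck Since each return value only has a probability of 1/maxValue, a bias of the same magnitude means that some values might not be reachable at all, while others occur twice as often as they should. System.Random uses a similar algorithm in Next(maxValue), and I could distinguish it from true random numbers using just a few samples.
Note that I quantified the small bias such that even a perfect distinguisher needs 2^64 samples to be right in 2/3 of cases when asked: is this sample set truly random, or was it produced by a generator with a known bias? So my requirement is very strict, and I'd prefer no bias at all.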
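
The effect described in these comments can be reproduced at small scale. The following is only an illustrative sketch, assuming an 8-bit raw value scaled with integer arithmetic as floor(r * count / 256); the class and method names are made up for this example and are not part of the question's code.

```csharp
using System;

static class ScalingBiasDemo
{
    // Counts how many of the 256 possible byte inputs land on each output
    // when a byte r is scaled into [0, count) via floor(r * count / 256).
    public static int[] ScaledHits(int count)
    {
        var hits = new int[count];
        for (int r = 0; r < 256; r++)
            hits[r * count / 256]++;
        return hits;
    }

    // Number of outputs that are produced by exactly 'times' raw inputs.
    public static int OutputsHit(int count, int times)
    {
        int result = 0;
        foreach (int h in ScaledHits(count))
            if (h == times) result++;
        return result;
    }
}
```

With count = 192, OutputsHit(192, 2) reports that a third of the outputs are produced by two raw values and the rest by one, so some results are twice as likely as others. With count = 300, larger than the raw range, OutputsHit(300, 0) shows outputs that can never occur.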

Tags: c# algorithm random uniform

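【Solution 1】:

How about something like this as a general-purpose solution? The algorithm is based on the one used by Java's nextInt method, rejecting any values that would cause a non-uniform distribution. As long as the output of your UInt32 method is perfectly uniform, this should be too.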

uint UniformUInt(uint inclusiveMaxValue)
{
    unchecked
    {
        uint exclusiveMaxValue = inclusiveMaxValue + 1;

        // if exclusiveMaxValue is a power of two then we can just use a mask
        // also handles the edge case where inclusiveMaxValue is uint.MaxValue
        if ((exclusiveMaxValue & (~exclusiveMaxValue + 1)) == exclusiveMaxValue)
            return UInt32() & inclusiveMaxValue;

        uint bits, val;
        do
        {
            bits = UInt32();
            val = bits % exclusiveMaxValue;

            // if (bits - val + inclusiveMaxValue) overflows then val has been
            // taken from an incomplete chunk at the end of the range of bits
            // in that case we reject it and loop again
        } while (bits - val + inclusiveMaxValue < inclusiveMaxValue);

        return val;
    }
}

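The rejection process could, in theory, keep looping forever; in practice the performance should be pretty good. It's difficult to suggest any generally applicable optimisations without knowing (a) the expected usage patterns and (b) the performance characteristics of your underlying RNG.

For example, if most callers will be specifying a max value [...] (And, of course, once you do have specific information, you can keep optimising and testing until your results are good enough.)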
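
To put a number on "pretty good", the cost of the rejection loop can be estimated directly. This is a rough sketch under the assumption that raw draws are independent, uniform 32-bit values and that a draw is rejected exactly when it falls in the incomplete chunk at the top of the range; the helper names are hypothetical.

```csharp
using System;

static class RejectionCost
{
    // Probability that a single 32-bit draw is accepted by a modulo-rejection
    // sampler targeting 0..inclusiveMaxValue.
    public static double AcceptProbability(uint inclusiveMaxValue)
    {
        ulong range = 1UL << 32;
        ulong n = (ulong)inclusiveMaxValue + 1;
        ulong accepted = range - range % n; // largest multiple of n that is <= 2^32
        return (double)accepted / range;
    }

    // Independent draws give a geometric distribution: mean = 1 / p.
    public static double ExpectedDraws(uint inclusiveMaxValue)
    {
        return 1.0 / AcceptProbability(inclusiveMaxValue);
    }
}
```

Because the incomplete chunk is always smaller than half the raw range, the acceptance probability stays above 1/2 and the expected number of draws stays below 2. The worst case is a maximum of 2^31, where nearly half of all draws are rejected.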

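【Discussion】:

(I should also point out that I haven't tested this particular code for correctness or bias. I just adapted it from some old code I had knocking around. That other code has been tested time and again, so this should be OK too.)
My solution uses a similar rejection-based algorithm, but I calculate the rejection bound explicitly. So your rejection condition might be slightly faster.
I'll probably add the same special case to my own code to reduce entropy consumption. My profiling shows that the special-casing is worth it, especially when using the slower random providers.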
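
One way to gain confidence in the wrap-around rejection test without 2^64 buckets is to check a scaled-down version exhaustively. The sketch below is an 8-bit analogue written for illustration (it is not the answer's code, and the names are invented): it enumerates every raw byte for every possible maximum and verifies that each result is produced by the same number of accepted inputs.

```csharp
using System;

static class ByteRejectionCheck
{
    // 8-bit analogue of the accept test: true when 'bits' lies in a complete
    // chunk of size inclusiveMax + 1, detected via wrap-around of the sum.
    public static bool Accepts(byte bits, byte inclusiveMax)
    {
        int n = inclusiveMax + 1;
        byte val = (byte)(bits % n);
        // Last element of the chunk containing 'bits', truncated to 8 bits.
        byte lastOfChunk = unchecked((byte)(bits - val + inclusiveMax));
        return lastOfChunk >= inclusiveMax;
    }

    // Enumerates all 256 raw values and checks that every result in
    // 0..inclusiveMax is produced by the same (non-zero) number of inputs.
    public static bool IsExactlyUniform(byte inclusiveMax)
    {
        int n = inclusiveMax + 1;
        var hits = new int[n];
        for (int bits = 0; bits < 256; bits++)
            if (Accepts((byte)bits, inclusiveMax))
                hits[bits % n]++;
        foreach (int h in hits)
            if (h == 0 || h != hits[0]) return false;
        return true;
    }
}
```

This only shows that the 8-bit analogue is exact; it suggests, but does not prove, that the same condition behaves identically at 32 bits.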

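【Solution 2】:

I am not sure this is an answer. It definitely needs more space than a comment, so I have to write it here, but I am willing to delete it if others think this is stupid.

From the OQ I gather that

Entropy bits are very expensive
Everything else should be considered expensive, but less so than entropy.

My idea is to use binary digits to halve, quarter, ... the maxValue space until it is reduced to a single number. Something like this:

I'll use maxValue=333 (decimal) as an example and assume a function getBit() that randomly returns 0 or 1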

offset:=0
space:=maxValue

while (space>0) {

  //Right-shift the value, keeping the rightmost bit; this should be
  //efficient on x86 and x64 if coded in real code, not pseudocode
  remains:=space & 1
  part:=floor(space/2)
  space:=part

  //In the 333 example, part is now 166, but 2*166=332. If we were to simply choose one
  //half of the space, we would be heavily biased towards the upper half, so in case
  //we have a remainder, we consume a bit of entropy to decide which half is bigger

  if (remains)
    if(getBit())
      part++;

  //Now we decide which half to choose, consuming a bit of entropy
  if (getBit())
    offset+=part;

  //Exit condition: the remaining number space=0 is guaranteed to be met
  //In the 333 example, offset will be 0, 166 or 167, remaining space will be 166
}

randomResult:=offset

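getBit() can either come directly from your entropy source, if it is bit-based, or consume n bits of entropy at once on the first call (n obviously being whatever is optimal for your entropy source) and shift them out until the buffer is empty.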
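
The buffered variant of getBit() could look roughly like the following. This is a minimal sketch that assumes n = 32 and a word source shaped like the question's UInt32(); the BitBuffer class is a hypothetical helper, not something from the answer.

```csharp
using System;

// Hypothetical helper: hands out one bit at a time from buffered 32-bit words.
sealed class BitBuffer
{
    private readonly Func<uint> nextWord; // assumed source, e.g. the question's UInt32()
    private uint buffer;
    private int bitsLeft;

    public BitBuffer(Func<uint> nextWord)
    {
        this.nextWord = nextWord;
    }

    public bool GetBit()
    {
        if (bitsLeft == 0)
        {
            // Refill only when the previous word has been fully consumed.
            buffer = nextWord();
            bitsLeft = 32;
        }
        bool bit = (buffer & 1) != 0; // take the lowest bit
        buffer >>= 1;                 // shift it out
        bitsLeft--;
        return bit;
    }
}
```

The source is touched once per 32 calls, so no raw bits are discarded between calls; the trade-off is a little state that has to live in a member variable.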

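【Discussion】:

Very expensive is a bit of an overstatement. Up to 40 clocks per byte, depending on the chosen provider. But with other providers it's much faster.
If it is much faster, you could use the simple division approach for 8-bit and 16-bit results, starting from a 32-bit or 64-bit source respectively, but for an example like 333333 I stand by my concept.
Certainly an interesting implementation. No idea how fast it is. I'll need to benchmark it.
As I said in the preface, I think avoiding excessive entropy use is more important, but I suspect that a shift with carry, 2-3 ifs and 0-2 additions per @987654325@ are quite fast. A lot depends on the implementation of getBit()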

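【Solution 3】:

My current solution. It's a bit ugly for my taste. It also performs two divisions per generated number, which might hurt performance (I haven't profiled this part yet).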

uint UniformUInt(uint maxResult)
{
    uint rand;
    uint count = maxResult + 1;

    if (maxResult < 0x100)
    {
        uint usefulCount = (0x100 / count) * count;
        do
        {
            rand = Byte();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else if (maxResult < 0x10000)
    {
        uint usefulCount = (0x10000 / count) * count;
        do
        {
            rand = UInt16();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else if (maxResult != uint.MaxValue)
    {
        uint usefulCount = (uint.MaxValue / count) * count;//reduces upper bound by 1, to avoid long division
        do
        {
            rand = UInt32();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else
    {
        return UInt32();
    }
}

ulong UniformUInt(ulong maxResult)
{
    if (maxResult < 0x100000000)
        return UniformUInt((uint)maxResult);
    else if (maxResult < ulong.MaxValue)
    {
        ulong rand;
        ulong count = maxResult + 1;
        ulong usefulCount = (ulong.MaxValue / count) * count;//reduces upper bound by 1, since ulong can't represent any more
        do
        {
            rand = UInt64();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else
        return UInt64();
}
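
The 32-bit branch above gives up one raw value to avoid a 64-bit division. As a possible alternative (a different technique from the one posted, shown only as a sketch with invented names), the exact remainder 2^32 mod count can be computed in 32-bit arithmetic and the rejected region moved to the bottom of the range. The raw source is passed in as a delegate purely to keep the example self-contained.

```csharp
using System;

static class ExactBound
{
    // 2^32 mod count, computed in 32-bit arithmetic as (2^32 - count) % count.
    // Requires count != 0.
    public static uint RejectBelow(uint count)
    {
        return unchecked(0u - count) % count;
    }

    // Uniform value in 0..maxResult, for maxResult < uint.MaxValue.
    // 'next' is an assumed source of uniform 32-bit values.
    public static uint Uniform(uint maxResult, Func<uint> next)
    {
        uint count = maxResult + 1;
        uint threshold = RejectBelow(count);
        uint rand;
        do
        {
            rand = next();
        } while (rand < threshold); // accepted range size is a multiple of count
        return rand % count;
    }
}
```

For count = 256 the threshold is 0, so nothing is rejected, whereas the bound based on uint.MaxValue rejects the top 256 raw values. It still costs two divisions per call, so it addresses the lost value, not the division count.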

【Discussion】: