从低均值的泊松分布中提取数字的性能答案

【问题标题】：Performance for drawing numbers from Poisson distribution with low mean从低均值的泊松分布中提取数字的性能
【发布时间】：2020-08-20 04:35:44
【问题描述】：

为了在 C++ 中从泊松分布中抽取随机数，一般建议使用

RNG_type rng;
std::poisson_distribution<size_t> d(1e-6);
auto r = d(rng);

在每次调用 std::poisson_distribution 对象时，都会消耗整个随机位序列（例如，std::mt19937 为 32 位，std::mt19937_64 为 64 位）。令我震惊的是，在如此低的平均值 (mean = 1e-6) 下，绝大多数情况下，只有少数位足以确定要返回的值为 0。然后可以缓存其他位以供以后使用。

假设设置为真的位序列与泊松分布的高返回值相关联，当使用1e-6 的平均值时，任何不以 19 个真开始的序列都必然返回零！确实，

1 - 1/2^19 < P(0, 1e-6) < 1 - 1/2^20

，其中P(n, r) 表示从均值为r 的泊松分布中抽取n 的概率。不浪费比特的算法会使用一半的时间，四分之一的时间使用两个比特，八分之一的时间使用三个比特，......

有没有一种算法可以通过在绘制泊松数时消耗尽可能少的位来提高性能？当我们考虑低均值时，与std::poisson_distribution 相比，还有其他方法可以提高性能吗？

回应@Jarod42 的评论是谁说的

想知道如果使用更少的位不会破坏等概率...

我认为它不会破坏等概率性。在一次模糊的测试中，我用一个简单的伯努利分布考虑了同样的问题。我以1/2^4 的概率采样真，以1 - 1/2^4 的概率采样假。函数 drawWithoutWastingBits 在缓存中看到 true 后立即停止，函数 drawWastingBits 消耗 4 位，无论这些位是什么。

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <random>

bool drawWithoutWastingBits(std::vector<bool>& cache, size_t& cache_index)
{
    /* 
        Get a true with probability 1/2^4 (=1/16=0.0625) and a false otherwise
    */

    size_t nbTrues = 0;
    while (cache[cache_index])
    {
        ++nbTrues;
        ++cache_index;
        if (nbTrues == 4)
        {
            return true;
        }
    }
    ++cache_index;
    return false;
}


bool drawWastingBits(std::vector<bool>& cache, size_t& cache_index)
{
    /* 
        Get a true with probability 1/2^4 (=1/16=0.0625) and a false otherwise
    */

    bool isAnyTrue = false;
    for (size_t i = 0 ; i < 4; ++i)
    {
        if (cache[cache_index])
        {
            isAnyTrue = true;
        }
        ++cache_index;
    }
    return !isAnyTrue;
}

int main()
{
    /*
        Just cache a lot of bits in advance in `cache`. The same sequence of bits will be used by both function.
        I am just caching way enough bits to make sure they don't run out of bits below
        I made sure to have the same number of zeros and ones so that any deviation is caused by the methodology and not by the RNG
    */

    // Produce cache
    std::vector<bool> cache;
    size_t nbBitsToCache = 1e7;
    cache.reserve(nbBitsToCache);
    for (size_t i = 0 ; i < nbBitsToCache/2 ; ++i)
    {
        cache.push_back(false);
        cache.push_back(true);
    }
    // Shuffle cache
    {
        std::mt19937 mt(std::random_device{}());
        std::shuffle(cache.begin(), cache.end(), mt);
    }


    // Draw without wasting bits
    {
        size_t nbDraws = 1e6;
        size_t cache_index = 0;
        std::pair<size_t, size_t> outcomes = {0,0};
        for (size_t r = 0 ; r < nbDraws ; ++r)
        {
            drawWithoutWastingBits(cache, cache_index) ? ++outcomes.first : ++outcomes.second;
            assert(cache_index <= cache.size());
        }   

        assert(outcomes.first + outcomes.second == nbDraws);
        std::cout << "Draw Without Wasting Bits: prob true = " << (double)outcomes.first / nbDraws << "\n";
    }


    // Draw wasting bits
    {
        size_t nbDraws = 1e6;
        size_t cache_index = 0;
        std::pair<size_t, size_t> outcomes = {0,0};
        for (size_t r = 0 ; r < nbDraws ; ++r)
        {
            drawWastingBits(cache, cache_index) ? ++outcomes.first : ++outcomes.second;
            assert(cache_index <= cache.size());
        }   

        assert(outcomes.first + outcomes.second == nbDraws);
        std::cout << "Draw Wit Wasting Bits: prob true = " << (double)outcomes.first / nbDraws << "\n";
    }
}

可能的输出

Draw Without Wasting Bits: prob true = 0.062832
Draw Wit Wasting Bits: prob true = 0.062363

【问题讨论】：

想知道如果使用更少的位不会破坏等概率......
@rustyx 这篇文章表明 Mersenne Twister 的表现优于 LCG（Talk wikipedia page 也是如此）。我正在使用std::mt19937_64（我已经在缓存用于采样等概率布尔值的位）并且还没有真正尝试过任何 LCG 或 xorshift 或任何其他可能更快的 RNG。在所有情况下，虽然随机数的产生很慢，但std::poisson_disribution 本身也很慢。我希望一旦它知道平均值非常低，这也可以改善。
平均值在应用程序中是固定值吗？
在一般意义上谈论std::poisson_disribution 毫无意义，因为它是由实现定义的。据我们所知，可以有一个按照您建议的方式执行的实现。我会检查它是如何在不同的工具链中实现的（boost 也有）。
@PeterO。是的，平均值是一个固定值。

标签： c++ performance random probability poisson

【解决方案1】：

Devroye 的 Non-Uniform Random Variate Generation，第 505 和 86 页提到了通过顺序搜索算法进行的反转。

基于该算法，如果您知道mean 远小于1，那么如果您在[0, 1] 中生成均匀随机变量u，则泊松变量将为0，如果u <= exp(-mean)，否则大于 0。

如果均值较低并且您可以容忍近似分布，则可以使用以下方法（参见“The Discrete Gaussian for Differential Privacy”的附录 A）：

以有理数的形式表示mean，形式为numer/denom。例如，如果mean 是一个固定值，则可以相应地预先计算numer 和denom，例如在编译时。
随机生成一个伯努利(numer / denom) 数（以numer / denom 的概率生成 1，否则生成 0）。如果以这种方式生成 1，则对 Bernoulli(numer / (denom * 2))、Bernoulli(numer / (denom * 3)) 等重复此步骤，直到以这种方式生成 0。使用最小化位浪费的算法生成这些数字，例如 Lumbroso 的 Fast Dice Roller 论文 (2013) 的附录 B 中提到的算法，或者从那里修改并在我的 Boolean conditions 部分中给出的“ZeroToOne”方法。另见this question。
如果第 2 步产生偶数个 1，则泊松变量正好为 0。
如果第 2 步产生了奇数个 1，则泊松变量大于 0，并且需要一个“较慢”的算法来仅对大于 0 的泊松变量进行采样。

例如，假设平均值为 1e-6 (1/1000000)，生成一个伯努利 (1/1000000) 数，然后是伯努利 (1/2000000) 等，直到您以这种方式生成 0。如果生成了偶数个 1，则 Poisson 变量正好为 0。否则，Poisson 变量为 1 或更大，需要“较慢”的算法。

一个示例是下面的算法，它基于第 505 页和第 86 页中的算法，但仅对泊松变量 1 或更大的样本进行采样：

METHOD Poisson1OrGreater(mean)
 sum=Math.exp(-mean)
 prod=sum
 u=RNDRANGE(sum, 1)
 i=0
 while i==0 or u>sum
   prod*=mean/(i+1)
   sum+=prod
   i=i+1
 end
 return i
END METHOD

不过，这种方法不是很健壮，尤其是因为它使用接近 1 的数字（浮点空间更稀疏）而不是接近 0 的数字。

请注意，n 独立 Poisson(mean) 随机变量的总和是 Poisson(mean*n) 分布的（第 501 页）。因此，此答案中的上述讨论适用于n泊松随机变量的总和，只要n乘以它们的平均值仍然很小。例如，要生成 1000 个均值为 1e-6 的泊松随机变量的总和，只需生成一个均值为 0.001 的泊松随机变量。这将大大节省对伪随机数生成器的调用。

还有另一种方法可以生成均值较低（1 或更少）的泊松变量。 Duchon 和 Duvignau 在“在不断增长的均匀排列中保留长度为 k 的循环数”，Electronic Journal of Combinatorics 23(4)，2016 中对此进行了描述。

首先，使用下面给出的算法生成 Poisson(1) 随机变量 x = Poisson1()，该算法仅使用整数算术（其中 RNDINT(a) 在 [0, a] 中生成统一随机整数）：

METHOD Poisson1()
  ret=1; a=1; b=0
  while true // until this method returns
    j=RNDINT(a)
    if j<a and j<b: return ret
    if j==a: ret=ret+1
    else
      ret=ret-1; b=a+1
    end
    a=a+1
  end
END METHOD

现在让mean 成为所需的均值。掷硬币x 次，硬币正面朝上的概率等于mean。（换句话说，生成一个二项式（x,mean）随机变量。）正面的数量是一个泊松（mean）随机变量。

【讨论】：

【解决方案2】：

您可以使用等式(-ln U) / λ 生成到下一个事件的时间，其中0 < U ≤ 1 是一个统一的随机数，λ 是事件发生率（又名1e-6） .

https://preshing.com/20111007/how-to-generate-random-timings-for-a-poisson-process/

【讨论】：

这是一个指数分布，而不是泊松分布。前者是连续分布，后者是计数分布。它们相关但不同——泊松分布描述了在固定长度间隔内发生了多少泊松过程事件。
@pjs 指数分布是相同底层过程的不同问题。指数分布仅测量1 - Pr(X = 0)，因为λ^k e^-λ / k! = e^-λ 时k = 0。
@pjs 因为在这种情况下，我们直接在连续分布上生成事件，所以您只需要计算每个间隔中有多少下降。所以如果(-ln U) / λ 返回 1248.6，然后是 1.6，你知道接下来的 1248 个泊松样本为零，然后是 1，然后是另一个 0，然后是 1+。这比其他方法效率高得多。
是的，所有这些都包含在 Peter O 的回答中。而且他从 Devroye 发布的代码效率更高，因为它预先计算了双方的 exp() 以将日志总和转换为制服的乘积，从而避免了每次迭代的先验评估。无论如何，据我所知，您的回答并没有真正回答问题。
@pjs 不，Peter O 的回答需要一个新样本每次迭代。我的答案需要一个新样本每个事件，它的频率要低一百万倍。