Efficient random number generation with C++11 <random> (answers)

【Question title】: Efficient random number generation with C++11 <random>
【Posted】: 2016-06-25 05:03:30
【Question】:

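I am trying to understand how the C++11 random number generation facilities are meant to be used. My concern is performance.

Suppose we need to generate a sequence of random integers in the range 0..k, but k changes at every step. What is the best way to do this?

Example: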

for (int i=0; i < n; ++i) {
    int k = i; // of course this is more complicated in practice
    std::uniform_int_distribution<> dist(0, k);
    int random_number = dist(engine);
    // do something with random number
}

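The distributions provided by the <random> header are very convenient, but they are opaque to the user, so I cannot easily predict their performance. For example, it is not clear how much runtime overhead, if any, the construction of dist above incurs.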

Alternatively, I could use something like

std::uniform_real_distribution<> dist(0.0, 1.0);
for (int i=0; i < n; ++i) {
    int k = i; // of course this is more complicated in practice
    int random_number = std::floor( (k+1)*dist(engine) );
    // do something with random number
}

to avoid constructing a new object on every iteration.
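A sketch of that float-scaling loop packaged as a helper, assuming a double-valued distribution on [0.0, 1.0) that is constructed once and reused; the name scaled_randint is hypothetical. The clamp is a precaution that is not in the question's code: LWG issue 2524 reports that std::generate_canonical, on which common implementations build uniform_real_distribution, can occasionally return exactly 1.0 because of rounding, and floor((k+1)*u) would then give k+1.

```cpp
#include <algorithm>
#include <cmath>
#include <random>

// Hypothetical helper wrapping the float-scaling approach from the question.
// `unit` should be a uniform_real_distribution<double>(0.0, 1.0) that is
// constructed once outside the loop and reused. Returns an integer in [0, k].
template <typename Engine>
int scaled_randint(Engine& engine,
                   std::uniform_real_distribution<double>& unit, int k)
{
    double u = unit(engine);                            // nominally in [0, 1)
    int r = static_cast<int>(std::floor((k + 1) * u));  // u == 1.0 would give k + 1
    return std::min(r, k);                              // clamp that rare case
}
```

The clamp costs one comparison per draw; the per-iteration work is otherwise a single call to the reused distribution plus a multiply and a floor.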

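Random numbers are often used in numerical simulations where performance matters. What is the best way to use <random> in these situations?

Please don't answer "profile it". Profiling is part of effective optimization, but so is a good understanding of how a library is meant to be used and what its performance characteristics are. If the answer is that it depends on the standard library implementation, or that the only way to know is to profile, then I would rather not use the distributions from <random> at all. Instead, I can use my own implementation, which is transparent to me and easier to optimize if necessary.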

【Question comments】:

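Another consideration: one advantage of generators such as std::mt19937 is that they are portable and their implementation is mandated by the standard. A generator with a given seed must produce the same sequence of 32-bit values on any conforming implementation. Distribution adapters such as std::uniform_int_distribution have no such guarantee, however, so if you use them you may get a different sequence of integers from the same seed after switching compilers or standard libraries. This can matter for numerical simulations.
@ChrisBeck I didn't know that, thanks for pointing it out!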
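Following up on the portability point: the standard fixes the engine's output exactly (for instance, the 10000th value produced by a default-constructed std::mt19937 must be 4123659995), so one way to get integers that are reproducible across compilers is to do the range reduction yourself. A minimal sketch using rejection sampling; portable_randint is a hypothetical helper, not part of <random>.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: returns an integer in [0, k] using only the raw output
// of std::mt19937, whose sequence the standard specifies exactly. The range
// reduction (rejection sampling) is spelled out here instead of being left to
// std::uniform_int_distribution, so a given seed produces the same integers on
// every conforming implementation.
inline std::uint32_t portable_randint(std::mt19937& eng, std::uint32_t k)
{
    const std::uint64_t range = std::uint64_t{k} + 1;    // outcomes: 1 .. 2^32
    const std::uint64_t span  = std::uint64_t{1} << 32;  // raw values: 2^32
    const std::uint64_t limit = span - span % range;     // largest multiple of range <= 2^32
    std::uint64_t x;
    do {
        x = static_cast<std::uint32_t>(eng());  // mt19937 yields 32-bit values
    } while (x >= limit);                       // reject the uneven top slice
    return static_cast<std::uint32_t>(x % range);
}
```

Rejecting the top 2^32 mod (k+1) raw values makes every result in [0, k] equally likely; since fewer than half of the raw values are ever rejected, the expected number of extra engine calls is below one per draw.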

Tags: c++ performance c++11 random

【Solution 1】:

One thing you can do is keep a persistent distribution object, so that each call only creates a param_type object, like this:

#include <random>

template<typename Integral>
Integral randint(Integral min, Integral max)
{
    using param_type =
        typename std::uniform_int_distribution<Integral>::param_type;

    // only create these once (per thread)
    thread_local static std::mt19937 eng {std::random_device{}()};
    thread_local static std::uniform_int_distribution<Integral> dist;

    // presumably a param_type is cheaper than a uniform_int_distribution
    return dist(eng, param_type{min, max});
}
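The same idea also works without function-local statics, which moves the choice of engine and seed (and with it reproducibility and thread ownership) to the caller. A minimal sketch, assuming the caller owns the engine; fill_with_randints is a hypothetical name, and k = i mirrors the question's loop.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fills out[i] with a value in [0, i]. One distribution
// object is constructed per call, and only a param_type is built per draw.
// The engine belongs to the caller, so seeding and threading stay explicit.
template <typename Engine>
void fill_with_randints(Engine& engine, std::vector<int>& out)
{
    using param_type = std::uniform_int_distribution<int>::param_type;
    std::uniform_int_distribution<int> dist;  // constructed once, reused below
    for (std::size_t i = 0; i < out.size(); ++i) {
        int k = static_cast<int>(i);          // stands in for the real, varying k
        out[i] = dist(engine, param_type{0, k});
    }
}
```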

【Comments】:

I don't see where it says that construction has compile-time complexity: D::param_type does not construct a param_type, it just names the type.
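To illustrate the distinction the comment draws: naming D::param_type involves no object at all, while writing D::param_type{lo, hi} is an ordinary constructor call. A small sketch; make_params is a hypothetical name, and a() and b() are the parameter accessors that param_type mirrors from uniform_int_distribution.

```cpp
#include <random>
#include <type_traits>

using D = std::uniform_int_distribution<int>;

// Naming D::param_type only refers to a type; no object is created here.
static_assert(std::is_class<D::param_type>::value, "param_type is a class type");

// Creating a param_type is an ordinary constructor call. It takes the same
// parameters as D and exposes the same accessors, here a() and b().
inline D::param_type make_params(int lo, int hi)
{
    return D::param_type{lo, hi};
}
```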

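【Solution 2】:

To maximize performance, first consider a different PRNG, such as xorshift128+. It has been reported to be more than twice as fast as mt19937 for 64-bit random numbers; see http://xorshift.di.unimi.it/. It can also be implemented in a few lines of code.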

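Furthermore, if you don't need a "perfectly balanced" uniform distribution and your k is much smaller than 2^64 (which is very likely), I suggest simply writing: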

uint64_t temp = engine_64(); // generates 0 <= temp < 2^64
int random_number = temp % (k + 1); // crop temp to 0,...,k
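To quantify "not perfectly balanced": write N = 2^64 and m = k + 1. The N raw values do not split evenly into m residues, so the first s = N mod m outcomes each receive one extra preimage:

```latex
\begin{aligned}
&N = 2^{64},\quad m = k + 1,\quad q = \lfloor N/m \rfloor,\quad s = N \bmod m \\
&P(\mathrm{temp} \bmod m = r) =
  \begin{cases}
    (q+1)/N, & 0 \le r < s \\
    q/N,     & s \le r < m
  \end{cases} \\
&\frac{(q+1)/N}{q/N} - 1 = \frac{1}{q} \approx \frac{m}{N} = \frac{k+1}{2^{64}}
\end{aligned}
```

For k = 10^6 the relative difference between the most and least likely outcomes is about 5 × 10^-14, which is negligible for most simulations.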

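Note, however, that integer division and modulo are not cheap. On Intel Haswell processors, for example, they take 39-103 processor cycles for 64-bit operands, which can be much longer than a call to an MT19937 or xorshift+ engine.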
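If that division latency matters, a widely used alternative (described by Daniel Lemire as a fast alternative to the modulo reduction) replaces the modulo with one multiplication and a shift. A minimal sketch for 32-bit inputs; multiply_shift_range is a hypothetical name. Like temp % (k + 1), it is slightly biased unless k + 1 divides 2^32, and it uses the high bits of x rather than the low ones.

```cpp
#include <cstdint>

// Hypothetical helper: maps a 32-bit random value x to [0, k] as
// floor(x * (k + 1) / 2^32), using a 64-bit multiply and a shift instead of
// a division. Valid for every 32-bit k, since x * (k + 1) < 2^64.
inline std::uint32_t multiply_shift_range(std::uint32_t x, std::uint32_t k)
{
    const std::uint64_t range = std::uint64_t{k} + 1;  // 1 .. 2^32
    return static_cast<std::uint32_t>((std::uint64_t{x} * range) >> 32);
}
```

With a 64-bit engine, one could pass static_cast<std::uint32_t>(engine_64() >> 32) as x, assuming k + 1 fits in 32 bits.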

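【Comments】:

"It has been reported being more than twice as fast as mt19937 for 64-bit random numbers." Well, if you don't need the MT period, you can use xorshift128. If you don't need xorshift's quality and period, you can even choose an LCG, which will be the fastest. There is something called trade-offs, you know...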
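On the trade-off point: because every <random> distribution is templated on the engine, switching to a faster, shorter-period generator is a one-line change. The standard library already ships LCGs such as std::minstd_rand (multiplier 48271, modulus 2^31 - 1), and the standard pins down its output too: the 10000th value of a default-constructed std::minstd_rand must be 399268537. A small sketch; draw is a hypothetical helper.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper. Every <random> distribution is templated on the engine
// type, so trading period and statistical quality for speed means choosing
// another engine. Periods: std::minstd_rand 2^31 - 2, std::mt19937 2^19937 - 1.
template <typename Engine>
std::vector<int> draw(Engine& engine, std::size_t n, int k)
{
    std::uniform_int_distribution<int> dist(0, k);  // values in [0, k]
    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) out.push_back(dist(engine));
    return out;
}
```

For example, draw(lcg, n, k) with std::minstd_rand lcg{seed} and draw(mt, n, k) with std::mt19937 mt{seed} share all of the code above.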