Flawed random number generator? Answers

【Question title】: Flawed random number generator?
【Posted】: 2012-02-26 16:21:47
【Question】:

I used this weighted random number generator:

import random

def weighted_choice(weights):
    totals = []
    running_total = 0

    for w in weights:
        running_total += w
        totals.append(running_total)

    rnd = random.random() * running_total
    for i, total in enumerate(totals):
        if rnd < total:
            return i

I call it like this:

# The meaning of this dict is a little confusing, so here's the explanation:
# The keys are numbers, each value is the weight of that number's occurrence,
# and 1 - value is the weight of its non-occurrence. You can imagine each entry
# as a biased coin (except for 2, which is a fair coin).
probabilities = {0: 1.0, 1: 1.0, 2: 0.5, 3: 0.45, 4: 0.4, 5: 0.35,
                 6: 0.3, 7: 0.25, 8: 0.2, 9: 0.15, 10: 0.1}
numberOfDeactivations = []
for number in probabilities:
    x = weighted_choice([probabilities[number], 1 - probabilities[number]])
    if x == 0:  # index 0 means the number "occurred"
        numberOfDeactivations.append(number)
print("chance for %r" % numberOfDeactivations)

I often see 7, 8, 9 and 10 in the results.

Is there some proof or guarantee that this is correct with respect to probability theory?
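Before suspecting the generator, it helps to compute how often 7, 8, 9 or 10 *should* show up. Assuming, as the comment in the code describes, that each number is an independent biased coin, the chance that none of them appears in one run is 0.75 · 0.8 · 0.85 · 0.9 = 0.459, so at least one of them appears in about 54% of runs. A small sketch (Python 3; the helper names are hypothetical, and the simulation uses the direct `random() < p` form rather than `weighted_choice`) that computes this and checks it by simulation:

```python
import random

probabilities = {0: 1.0, 1: 1.0, 2: 0.5, 3: 0.45, 4: 0.4, 5: 0.35,
                 6: 0.3, 7: 0.25, 8: 0.2, 9: 0.15, 10: 0.1}
late = [7, 8, 9, 10]


def p_any_late():
    """Exact probability that at least one of 7..10 appears in one run."""
    p_none = 1.0
    for k in late:
        p_none *= 1 - probabilities[k]  # each number is an independent coin
    return 1 - p_none


def simulate_any_late(runs, rng):
    """Fraction of simulated runs in which at least one of 7..10 appears."""
    hits = 0
    for _ in range(runs):
        drawn = [k for k in probabilities if rng.random() < probabilities[k]]
        if any(k in late for k in drawn):
            hits += 1
    return hits / runs


if __name__ == "__main__":
    print("exact:     %.4f" % p_any_late())
    print("simulated: %.4f" % simulate_any_late(100000, random.Random(0)))
```

So seeing one of the "light" numbers in more than half of the runs is what the requested probabilities predict, not a sign of a flaw.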

【Question discussion】:

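What does "often" mean? Do you have a histogram you can show us?
Obligatory: xkcd.com/221
20 iterations is nothing. Increase the number (to millions...) to start seeing statistically significant data. For more serious purposes, you should use a goodness-of-fit test for uniformity :).
@xralf: With only 20 iterations, that is not surprising at all...
@xralf: As I suggested, try running it a million times and then report the results back to us.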
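As a sketch of the goodness-of-fit test suggested in the comments (Python 3): draw many samples from `weighted_choice` (copied unchanged from the question), count each outcome, and compute Pearson's chi-square statistic against the counts implied by the weights. With three outcomes there are 2 degrees of freedom, for which the chi-square upper tail is exactly exp(-x/2), so the p-value needs no statistics library; that shortcut is only valid for 2 degrees of freedom. The weights [0.5, 0.3, 0.2], the seed and the sample size are arbitrary choices for illustration, and `chi_square_statistic` is a hypothetical helper name.

```python
import math
import random


def weighted_choice(weights):
    # the generator from the question, unchanged
    totals = []
    running_total = 0
    for w in weights:
        running_total += w
        totals.append(running_total)
    rnd = random.random() * running_total
    for i, total in enumerate(totals):
        if rnd < total:
            return i


def chi_square_statistic(weights, n):
    """Pearson chi-square of n weighted_choice draws against the weights."""
    counts = [0] * len(weights)
    for _ in range(n):
        counts[weighted_choice(weights)] += 1
    total_weight = sum(weights)
    stat = 0.0
    for observed, w in zip(counts, weights):
        expected = n * w / total_weight
        stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    random.seed(2012)  # fixed seed so the run can be repeated
    stat = chi_square_statistic([0.5, 0.3, 0.2], 100000)
    # df = 3 - 1 = 2, and for df = 2 the tail probability is exp(-x / 2)
    p_value = math.exp(-stat / 2)
    print("chi-square = %.3f, p-value = %.3f" % (stat, p_value))
```

A tiny p-value (say below 0.001, i.e. a statistic above 2·ln(1000) ≈ 13.8) would suggest the draws do not follow the weights; a correct generator produces such a value only about 0.1% of the time.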

Tags: python random probability proof correctness

【Answer 1】:

Edit: as a side note, I think your code is equivalent to:

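import random
probabilities = {0: 1.0, 1: 1.0, 2: 0.5, 3: 0.45, 4: 0.4, 5: 0.35,
                 6: 0.3, 7: 0.25, 8: 0.2, 9: 0.15, 10: 0.1}
# keep each number k with probability probabilities[k]
# (a list comprehension, so the result is a list in Python 2 and 3 alike)
numberOfDeactivations = [k for k in probabilities
                         if random.random() < probabilities[k]]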
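One way to check that equivalence empirically, as a sketch (Python 3): for a weight list [p, 1 - p] the running total is 1 (up to floating-point rounding), so `weighted_choice` returns 0 exactly when `random.random() < p`. Both versions consume one `random.random()` call per number, so with the same seed they should produce the same lists, barring a draw that lands within a rounding error of p. `deactivations_original`, `deactivations_direct` and `same_results` are hypothetical names wrapping the two snippets.

```python
import random

probabilities = {0: 1.0, 1: 1.0, 2: 0.5, 3: 0.45, 4: 0.4, 5: 0.35,
                 6: 0.3, 7: 0.25, 8: 0.2, 9: 0.15, 10: 0.1}


def weighted_choice(weights):
    # the generator from the question, unchanged
    totals = []
    running_total = 0
    for w in weights:
        running_total += w
        totals.append(running_total)
    rnd = random.random() * running_total
    for i, total in enumerate(totals):
        if rnd < total:
            return i


def deactivations_original(probs):
    # the loop from the question
    return [number for number in probs
            if weighted_choice([probs[number], 1 - probs[number]]) == 0]


def deactivations_direct(probs):
    # one uniform draw per number, compared directly with its probability
    return [k for k in probs if random.random() < probs[k]]


def same_results(probs, runs, seed):
    """Run both versions from the same seed and compare every run."""
    random.seed(seed)
    a = [deactivations_original(probs) for _ in range(runs)]
    random.seed(seed)
    b = [deactivations_direct(probs) for _ in range(runs)]
    return a == b


if __name__ == "__main__":
    print(same_results(probabilities, 1000, 42))
```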

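Original answer:

The approach is correct. Below is a complete example that builds a frequency table and compares it with the requested weights.

Over 100000 iterations there is no sign that you are not getting what you asked for. "Expected" is the probability you requested (each weight divided by the sum of all weights), and "got" is the fraction of all generated numbers that actually had that value. The ratio (expected / got) should be close to 1, and it is: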

  0, expected: 0.2128 got: 0.2107 ratio: 1.0100
  1, expected: 0.2128 got: 0.2145 ratio: 0.9921
  2, expected: 0.1064 got: 0.1083 ratio: 0.9825
  3, expected: 0.0957 got: 0.0949 ratio: 1.0091
  4, expected: 0.0851 got: 0.0860 ratio: 0.9900
  5, expected: 0.0745 got: 0.0753 ratio: 0.9884
  6, expected: 0.0638 got: 0.0635 ratio: 1.0050
  7, expected: 0.0532 got: 0.0518 ratio: 1.0262
  8, expected: 0.0426 got: 0.0418 ratio: 1.0179
  9, expected: 0.0319 got: 0.0323 ratio: 0.9881
 10, expected: 0.0213 got: 0.0209 ratio: 1.0162

A total of 469633 numbers were generated for this table.
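The "expected" column is each weight divided by the sum of all weights, which is 4.7 (for example 1.0 / 4.7 ≈ 0.2128). By linearity of expectation one run generates 4.7 numbers on average, so 100000 runs should generate about 470000; treating each number as an independent coin, the variance per run is the sum of p(1 - p) = 1.74, which gives a standard deviation of roughly 417 over 100000 runs. The reported 469633 is within one standard deviation of 470000. A sketch of that arithmetic (Python 3):

```python
import math

probabilities = {0: 1.0, 1: 1.0, 2: 0.5, 3: 0.45, 4: 0.4, 5: 0.35,
                 6: 0.3, 7: 0.25, 8: 0.2, 9: 0.15, 10: 0.1}
runs = 100000

# "expected" column: each weight as a share of the sum of all weights
weight_sum = sum(probabilities.values())
expected_share = {k: p / weight_sum for k, p in probabilities.items()}

# mean and standard deviation of how many numbers `runs` runs generate,
# treating every number as an independent coin with probability p
expected_total = runs * weight_sum
sd_total = math.sqrt(runs * sum(p * (1 - p) for p in probabilities.values()))

for k in sorted(expected_share):
    print("%3d, expected: %6.4f" % (k, expected_share[k]))
print("numbers generated: %.0f +/- %.0f" % (expected_total, sd_total))
```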

Here is the code:

import random

def weighted_choice(weights):
    totals = []
    running_total = 0
    for w in weights:
        running_total += w
        totals.append(running_total)
    rnd = random.random() * running_total
    for i, total in enumerate(totals):
        if rnd < total:
            return i


counts = {k: 0 for k in range(11)}
probabilities = {0: 1.0, 1: 1.0, 2: 0.5, 3: 0.45, 4: 0.4, 5: 0.35,
                 6: 0.3, 7: 0.25, 8: 0.2, 9: 0.15, 10: 0.1}

for _ in range(100000):
    numberOfDeactivations = []
    for number in probabilities:
        x = weighted_choice([probabilities[number], 1 - probabilities[number]])
        if x == 0:
            numberOfDeactivations.append(number)
    for k in numberOfDeactivations:
        counts[k] += 1

# observed share of each number among all generated numbers, keyed by number
total = float(sum(counts.values()))
counts2 = {k: counts[k] / total for k in counts}

print("ratio of expected to observed frequency:")

# make the probabilities real probabilities instead of weights:
psum = sum(probabilities.values())
for k in probabilities:
    probabilities[k] = probabilities[k] / psum

for k in sorted(probabilities):
    print("%3d, expected: %6.4f got: %6.4f ratio: %6.4f"
          % (k, probabilities[k], counts2[k], probabilities[k] / counts2[k]))
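Because the table normalizes by the total number of generated values, it compares shares rather than the coin probabilities themselves. A more direct check, sketched here in Python 3 under the same independence assumption, divides each count by the number of runs; that fraction should approach the value in the dictionary (1.0 for 0 and 1, 0.5 for 2, ..., 0.1 for 10). It reuses the question's `weighted_choice`; `per_run_frequencies` is a hypothetical helper name.

```python
import random

probabilities = {0: 1.0, 1: 1.0, 2: 0.5, 3: 0.45, 4: 0.4, 5: 0.35,
                 6: 0.3, 7: 0.25, 8: 0.2, 9: 0.15, 10: 0.1}


def weighted_choice(weights):
    # the generator from the question, unchanged
    totals = []
    running_total = 0
    for w in weights:
        running_total += w
        totals.append(running_total)
    rnd = random.random() * running_total
    for i, total in enumerate(totals):
        if rnd < total:
            return i


def per_run_frequencies(probs, runs):
    """Fraction of runs in which each number was deactivated."""
    counts = {k: 0 for k in probs}
    for _ in range(runs):
        for number, p in probs.items():
            if weighted_choice([p, 1 - p]) == 0:
                counts[number] += 1
    return {k: counts[k] / runs for k in counts}


if __name__ == "__main__":
    freqs = per_run_frequencies(probabilities, 100000)
    for k in sorted(freqs):
        print("%3d, requested: %4.2f got: %6.4f" % (k, probabilities[k], freqs[k]))
```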

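【Discussion】:

I wrote a comment in the question to describe the dictionary. The values in the dictionary are the probabilities. So 0 has probability 1, 1 has probability 1, 2 has probability 0.5 (a fair coin), and so on. The dictionary items are independent. I only wanted to show the broader context, although writing out a single dictionary item would have been enough.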

【Answer 2】:

This is mathematically correct. It is an application of inverse transform sampling (although the reason it works in this case should be fairly intuitive).
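Spelled out for the `weighted_choice` code in the question (a sketch, assuming non-negative weights w_0, ..., w_{n-1} with a positive sum): `totals` holds the running sums T_i, `running_total` is W, and `rnd` is U·W for a uniform U. The function returns the first i with rnd < T_i, so every earlier total is at most rnd, and the probability of returning i is the length of that interval divided by W:

```latex
T_i = \sum_{j=0}^{i} w_j, \qquad T_{-1} = 0, \qquad W = T_{n-1} = \sum_{j=0}^{n-1} w_j

\texttt{weighted\_choice} \text{ returns } i \iff T_{i-1} \le U W < T_i, \qquad U \sim \mathrm{Uniform}[0, 1)

P(I = i) = P\left(\frac{T_{i-1}}{W} \le U < \frac{T_i}{W}\right) = \frac{T_i - T_{i-1}}{W} = \frac{w_i}{W}
```

For the question's two-element lists [p, 1 - p], W = 1, so index 0 is returned with probability p.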

I don't know Python, so I can't say whether there are any subtleties that invalidate this particular implementation.
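On the Python side, the linear scan in `weighted_choice` can also be written with a binary search. This is a sketch of an alternative formulation (Python 3), not code from either post: `itertools.accumulate` builds the running totals and `bisect.bisect_right` finds the first total greater than `rnd`, which is the same rule as `if rnd < total`. The name `weighted_choice_bisect` and the optional `rng` parameter are hypothetical.

```python
import bisect
import itertools
import random


def weighted_choice_bisect(weights, rng=random):
    """Return index i with probability weights[i] / sum(weights).

    Same rule as weighted_choice: pick the first i with rnd < totals[i].
    """
    totals = list(itertools.accumulate(weights))  # running totals
    rnd = rng.random() * totals[-1]               # uniform in [0, total)
    # bisect_right gives the first position whose total is > rnd;
    # min() guards against rounding pushing rnd up to the last total
    return min(bisect.bisect_right(totals, rnd), len(totals) - 1)


if __name__ == "__main__":
    rng = random.Random(12345)  # seeded only to make the run repeatable
    n = 30000
    counts = [0, 0, 0]
    for _ in range(n):
        counts[weighted_choice_bisect([0.5, 0.3, 0.2], rng)] += 1
    print([c / n for c in counts])
```

The binary search only pays off when the same weight list is reused for many draws; for the two-element lists in the question the original loop is just as good.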

【Discussion】:

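How do you know that Python's random uses this?
@xralf: Uses what? Python's random is a uniform RNG. The code above is inverse transform sampling.
How does Python manage this uniformity? With a uniform distribution a flaw is not easy to spot, but when you use weights it is easy to see that the "light" numbers here behave like "heavy" ones (at least heavier than I expected). Does it depend on how often this application is run? Is there something that breaks the randomness? Or does this inverse transform sampling corrupt Python's uniform RNG?
@xralf: Python manages uniformity by using an algorithm that generates a uniform distribution. I don't know what you mean by "corrupting Python's uniform RNG"; you simply have some code that operates on the output of random.
@xralf: An example of the algorithm used by random is the Mersenne Twister (but I'm not sure). But why not just plot a histogram?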
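Following the histogram suggestion, a minimal text histogram needs only the standard library. This sketch (Python 3; `text_histogram` is a hypothetical helper) draws from the question's `weighted_choice` with the arbitrarily chosen weights [0.5, 0.3, 0.2] and prints one bar per outcome next to the share the weights request. For reference, the documentation of Python's `random` module states that it uses the Mersenne Twister as its core generator.

```python
import random
from collections import Counter


def weighted_choice(weights):
    # the generator from the question, unchanged
    totals = []
    running_total = 0
    for w in weights:
        running_total += w
        totals.append(running_total)
    rnd = random.random() * running_total
    for i, total in enumerate(totals):
        if rnd < total:
            return i


def text_histogram(weights, n, width=50):
    """Return one text bar per outcome of n weighted_choice draws."""
    counts = Counter(weighted_choice(weights) for _ in range(n))
    total_weight = sum(weights)
    lines = []
    for i, w in enumerate(weights):
        share = counts[i] / n
        bar = "#" * int(round(share * width))  # bar length proportional to share
        lines.append("%2d | %-*s %.4f (requested %.4f)"
                     % (i, width, bar, share, w / total_weight))
    return lines


if __name__ == "__main__":
    for line in text_histogram([0.5, 0.3, 0.2], 100000):
        print(line)
```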