Generate random numbers summing to a predefined value (answers)

【Question title】: Generate random numbers summing to a predefined value
【Posted】: 2011-04-05 02:09:59
【Question description】:

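So here's the deal: I want to (for example) generate 4 pseudo-random numbers that add up to 40. How can this be done in Python? I could generate a random number from 1 to 40, then generate another between 1 and the remainder, and so on, but then the first number would have a greater chance of "grabbing" more.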

【Question discussion】:

Tags: python random

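【Solution 1】:

Generate 4 random numbers, compute their sum, divide each one by the sum, and multiply by 40.

If you want integers, this will require a little non-randomness (some rule for rounding so that the parts still add up to 40).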
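A minimal sketch of this normalize-and-scale idea, using only Python's standard `random` and `math` modules. The name `normalized_sum_sample` is hypothetical, and the rounding rule is an assumption: it is one possible way to supply the "bit of non-randomness" for integers (floor every scaled value, then give the leftover units to the entries with the largest fractional parts). As the comment below points out, the result is not uniformly distributed over all possible outcomes, and the integer version can contain zeros.

```python
import math
import random


def normalized_sum_sample(n, total, as_int=True):
    """Scale n random weights so they sum to total (hypothetical helper).

    Not uniform over all possible outcomes; integer results may include 0.
    """
    weights = [random.random() for _ in range(n)]
    s = sum(weights)
    scaled = [w / s * total for w in weights]
    if not as_int:
        return scaled  # floats; the sum equals total up to rounding error
    # Largest-remainder rounding (an assumed rule): floor everything, then
    # hand the missing units to the entries with the biggest fractional parts.
    result = [math.floor(x) for x in scaled]
    shortfall = total - sum(result)
    order = sorted(range(n), key=lambda i: scaled[i] - result[i], reverse=True)
    for i in order[:shortfall]:
        result[i] += 1
    return result


print(normalized_sum_sample(4, 40))
```

The rounding step only redistributes whole units, so the integer output always sums exactly to `total`.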

【Discussion】:

This produces a non-uniform distribution: stackoverflow.com/a/8068956/2075003

【Solution 2】:

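import random

def split_40():
    """Split 40 into four positive integers using nested random cuts."""
    b = random.randint(2, 38)      # cut the whole range in two at b
    a = random.randint(1, b - 1)   # cut [0, b] at a, leaving >= 1 on each side
    c = random.randint(b + 1, 39)  # cut [b, 40] at c, leaving >= 1 on each side
    return [a, b - a, c - b, 40 - c]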

(I'm assuming you want integers since you said "1-40", but this can easily be generalized to floats.)

Here's how it works:

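Randomly cut the total range into two pieces at b. The odd-looking range is because there must be at least 2 below the cut and at least 2 above it (this comes from the minimum of 1 for each value).
Randomly cut each of those two pieces in two. Again, the bounds account for the minimum of 1.
Return the size of each slice. They add up to 40.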
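Since the answer says the method generalizes easily to floats, here is a minimal sketch of that generalization, assuming real-valued parts with no minimum (so the bounds of 1 become 0). The name `split_float` is hypothetical. It keeps the same two-level cutting, so it inherits the non-uniformity discussed in the comments below.

```python
import random


def split_float(total=40.0):
    """Split total into 4 non-negative floats using the same nested cuts."""
    b = random.uniform(0, total)   # cut the whole range in two at b
    a = random.uniform(0, b)       # cut the lower piece [0, b] at a
    c = random.uniform(b, total)   # cut the upper piece [b, total] at c
    return [a, b - a, c - b, total - c]


print(split_float())
```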

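【Discussion】:

I think you need a = random.randint(1, b-1) and c = random.randint(b+1, 39) to make sure you don't get zeros in the output list. Also, this has a slightly peculiar distribution: results of the form [1, 1, x, 38-x] are more likely than they would be under a uniform distribution.
@Mark: I believe you're right. I made a couple of mistakes there.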
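To make the comment's last point concrete, this sketch (assuming the corrected bounds shown in the answer; the name `answer2_distribution` is hypothetical) enumerates every (b, a, c) choice with exact `fractions.Fraction` probabilities. Under a uniform distribution each of the C(39, 3) = 9139 possible outcomes would have probability 1/9139, but every outcome of the form [1, 1, x, 38 - x] comes out at 1/1369, roughly 6.7 times more likely, while an evenly split outcome such as [10, 10, 10, 10] is less likely than uniform.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def answer2_distribution(total=40):
    """Exact outcome probabilities of the nested-cut method (4 parts)."""
    probs = defaultdict(Fraction)
    b_choices = range(2, total - 1)  # randint(2, 38) when total=40
    for b in b_choices:
        for a in range(1, b):  # randint(1, b - 1): b - 1 choices
            for c in range(b + 1, total):  # randint(b + 1, 39): 39 - b choices
                p = Fraction(1, len(b_choices)) * Fraction(1, b - 1) * Fraction(1, total - 1 - b)
                probs[(a, b - a, c - b, total - c)] += p
    return probs


dist = answer2_distribution()
print(len(dist), comb(39, 3))   # 9139 9139: every composition is reachable
print(dist[(1, 1, 10, 28)])     # 1/1369
print(dist[(10, 10, 10, 10)])   # 1/13357
```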

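【Solution 3】:

There are only 37^4 = 1,874,161 arrangements of four integers in the range [1, 37] (repetition allowed). Enumerate them, and save and count the ones that add up to 40. (This will be a much smaller number, N.)

Draw a uniformly distributed random integer K in the interval [0, N-1] and return the K-th arrangement. It's easy to see that this guarantees a uniform distribution over the space of possible outcomes, with every position in the sequence identically distributed. (Many of the answers I've seen will bias the final choice lower than the first three!)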
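A minimal sketch of this enumerate-and-index approach, using `itertools.product` for the enumeration and `random.randrange` to draw K. The function names are hypothetical. For 4 numbers summing to 40 the filtered list has N = 9139 entries; the full product grows as 37^n, which is why the comment below says it does not generalize well.

```python
import itertools
import random


def all_compositions(n=4, total=40):
    """Every n-tuple of integers in [1, total - n + 1] that sums to total."""
    hi = total - n + 1  # 37 when n=4, total=40
    return [t for t in itertools.product(range(1, hi + 1), repeat=n) if sum(t) == total]


def pick_uniform(options):
    """Draw K uniformly from [0, N - 1] and return the K-th arrangement."""
    k = random.randrange(len(options))
    return list(options[k])


options = all_compositions()  # enumerate once, then reuse for every draw
print(len(options))           # 9139
print(pick_uniform(options))
```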

【Discussion】:

This answer doesn't generalize well.

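【Solution 4】:

Here's the standard solution. It's similar to Laurence Gonsalves' answer, but has two advantages over that answer:

It's uniform: every combination of 4 positive integers adding up to 40 is equally likely to be produced by this scheme.

and

It's easy to adapt to other totals (7 numbers adding up to 100, etc.)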

import random

def constrained_sum_sample_pos(n, total):
    """Return a randomly chosen list of n positive integers summing to total.
    Each such list is equally likely to occur."""

    dividers = sorted(random.sample(range(1, total), n - 1))
    return [a - b for a, b in zip(dividers + [total], [0] + dividers)]

Sample output:

>>> constrained_sum_sample_pos(4, 40)
[4, 4, 25, 7]
>>> constrained_sum_sample_pos(4, 40)
[9, 6, 5, 20]
>>> constrained_sum_sample_pos(4, 40)
[11, 2, 15, 12]
>>> constrained_sum_sample_pos(4, 40)
[24, 8, 3, 5]

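Explanation: there's a one-to-one correspondence between (1) 4-tuples (a, b, c, d) of positive integers such that a + b + c + d == 40, and (2) triples of integers (e, f, g) with 0 < e < f < g < 40, and it's easy to generate the latter using random.sample. The correspondence is given by (e, f, g) = (a, a + b, a + b + c) in one direction, and (a, b, c, d) = (e, f - e, g - f, 40 - g) in the reverse direction.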
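The correspondence can be checked by brute force on a small case. This sketch (helper names `to_parts` and `to_cuts` are hypothetical) maps every 3-element subset e < f < g of {1, ..., 9} to a 4-tuple of positive integers summing to 10, maps it back with running sums, and confirms that both sides have C(9, 3) = 84 elements.

```python
import itertools
from math import comb


def to_parts(cuts, total):
    """(e, f, g) -> (e, f - e, g - f, total - g): gaps between dividers."""
    bounds = [0, *cuts, total]
    return tuple(hi - lo for lo, hi in zip(bounds, bounds[1:]))


def to_cuts(parts):
    """(a, b, c, d) -> (a, a + b, a + b + c): running sums minus the last."""
    return tuple(itertools.accumulate(parts))[:-1]


total, n = 10, 4
cut_sets = list(itertools.combinations(range(1, total), n - 1))
part_tuples = {to_parts(c, total) for c in cut_sets}

assert all(to_cuts(to_parts(c, total)) == c for c in cut_sets)  # round trip
assert all(sum(p) == total and min(p) > 0 for p in part_tuples)
print(len(cut_sets), len(part_tuples), comb(total - 1, n - 1))  # 84 84 84
```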

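If you want nonnegative integers (i.e., allowing 0) instead of positive integers, there's an easy transformation: if (a, b, c, d) are nonnegative integers summing to 40, then (a+1, b+1, c+1, d+1) are positive integers summing to 44, and vice versa. Using this idea, we have: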

def constrained_sum_sample_nonneg(n, total):
    """Return a randomly chosen list of n nonnegative integers summing to total.
    Each such list is equally likely to occur."""

    return [x - 1 for x in constrained_sum_sample_pos(n, total + n)]

Illustration of constrained_sum_sample_pos(4, 10), courtesy of @FM (slightly modified):

0 1 2 3 4 5 6 7 8 9 10  # The universe.
|                    |  # Place fixed dividers at 0, 10.
|   |     |       |  |  # Add 4 - 1 randomly chosen dividers in [1, 9]
  a    b      c    d    # Compute the 4 differences: 2 3 4 1
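The illustration can be reproduced by fixing the dividers instead of sampling them. This sketch pins the three "randomly chosen" dividers to 2, 5 and 9 (read off the picture, for illustration only) and applies the same difference step used in `constrained_sum_sample_pos`.

```python
total = 10
dividers = [2, 5, 9]  # the 4 - 1 dividers from the picture, fixed by hand
parts = [a - b for a, b in zip(dividers + [total], [0] + dividers)]
print(parts)  # [2, 3, 4, 1]
```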

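【Discussion】:

+1 This is very informative, thanks. I edited your answer to add the graphical illustration that helped me figure out the algorithm. Normally I'm reluctant to do that, but I thought others might find it useful. Feel free to change or revert my edit.
Captain, I'm detecting massive amounts of win in this sector! +1
@FM: Thanks; nice addition. I did edit it slightly to fit my zero-based view of the universe; I hope that doesn't hurt clarity.
If you need the generated integers to be at least some value low, you can do this by replacing a - b with a - b + (low-1), and compensating for the resulting n*(low-1) increase in the sum by replacing both instances of total with total - (low-1)*n. I haven't found a way to add a high threshold.
Any luck with the high threshold?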
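A sketch of the lower-bound adjustment described in the comments, written as a wrapper instead of by editing the original function. It repeats `constrained_sum_sample_pos` from the answer so the block runs on its own; `constrained_sum_sample_min` is a hypothetical name, and it assumes `total >= n * low`. Because the shift by `low - 1` is a bijection, uniformity is preserved; with `low=0` it behaves like `constrained_sum_sample_nonneg`.

```python
import random


def constrained_sum_sample_pos(n, total):
    """Uniformly random list of n positive integers summing to total."""
    dividers = sorted(random.sample(range(1, total), n - 1))
    return [a - b for a, b in zip(dividers + [total], [0] + dividers)]


def constrained_sum_sample_min(n, total, low):
    """Uniformly random list of n integers >= low summing to total.

    Shift every value down by (low - 1) so each only needs to be >= 1,
    sample with the reduced total, then shift back up.
    """
    shift = low - 1
    reduced = total - n * shift
    if reduced < n:
        raise ValueError("need total >= n * low")
    return [x + shift for x in constrained_sum_sample_pos(n, reduced)]


print(constrained_sum_sample_min(4, 40, 5))
```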

【Solution 5】:

Use the multinomial distribution:

from numpy.random import multinomial
multinomial(40, [1/4.] * 4)

In this example, each variable will be binomially distributed with mean n * p = 40 * 1/4 = 10.
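One way to see why the values cluster around 10: a multinomial(40, [1/4] * 4) draw is the same as dropping 40 balls independently into 4 equally likely bins and counting each bin. This sketch does exactly that with the standard library instead of NumPy (the name `multinomial_counts` is hypothetical), so each count is Binomial(40, 1/4), with mean 10 and variance 40 · 1/4 · 3/4 = 7.5, i.e. a standard deviation of about 2.7.

```python
import random
from collections import Counter


def multinomial_counts(total, n):
    """Count how many of `total` uniform draws land in each of n bins."""
    hits = Counter(random.randrange(n) for _ in range(total))
    return [hits[i] for i in range(n)]


samples = [multinomial_counts(40, 4) for _ in range(10_000)]
assert all(sum(s) == 40 for s in samples)
mean_first = sum(s[0] for s in samples) / len(samples)
print(round(mean_first, 1))  # close to 10
```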

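【Discussion】:

Clearly the cleanest and most robust solution, but a bit more explanation in the answer would help the OP understand why this is the best answer.
This seems to produce nearly equal values rather than arbitrary values in the desired range: multinomial(2**16, [1/3] * 3)/2**16 -> array([0.33073425, 0.33273315, 0.33653259]) (multiple runs give similar results). That doesn't look uniform to me.
That's not my complaint. The sums are indeed correct. The problem is the uniformity of the samples: they hover tightly around the evenly divided intervals instead of sometimes giving larger or smaller intervals. The top-voted answer does do that.
@kram1032, they're not uniform, they're binomial, with mean n * p, which in your case is 1/3 * 2**16 ≈ 21k. The OP didn't ask for uniform.
Ah, fair enough, uniformity wasn't actually asked for.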

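【Solution 6】:

Building on @markdickonson's answer, this provides some control over the spacing between the dividers. I introduced a variance ("jiggle") expressed as a percentage of the uniform distance between dividers.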

import random

def constrained_sum_sample(n, total, variance=50):
    """Return a random-ish list of n positive integers summing to total.

    variance: int; percentage of the gap between the uniform spacing to vary the result.
    """
    divisor = total / n
    jiggle = divisor * variance / 100 / 2
    # random.random() is in [0, 1), so each divider only ever moves to the right.
    dividers = [int((x + 1) * divisor + random.random() * jiggle) for x in range(n - 1)]
    result = [a - b for a, b in zip(dividers + [total], [0] + dividers)]
    return result

Sample output:

[12, 8, 10, 10]
[10, 11, 10, 9]
[11, 9, 11, 9]
[11, 9, 12, 8]

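The idea is still to divide the total evenly, then nudge each divider by a random amount within a given range. Since each value stays tied to its evenly spaced point, we don't have to worry about it drifting.

This is good enough for my purposes, but not perfect. For example, the first number is skewed high and the last number is skewed low, because the dividers only ever move to the right.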
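A minimal sketch of a symmetric variant, assuming the intent is to move each divider left *or* right: draw the offset from `random.uniform(-jiggle, jiggle)` and round, so the first and last values are no longer pushed in opposite directions. The name `constrained_sum_sample_sym` is hypothetical. With `variance < 100`, neighbouring dividers stay at least `divisor * (1 - variance / 100)` apart before rounding, but rounding can still collapse very small gaps, so the function checks for non-positive parts.

```python
import random


def constrained_sum_sample_sym(n, total, variance=50):
    """Like constrained_sum_sample, but each divider moves left or right."""
    divisor = total / n
    jiggle = divisor * variance / 100 / 2
    dividers = [round((x + 1) * divisor + random.uniform(-jiggle, jiggle)) for x in range(n - 1)]
    result = [a - b for a, b in zip(dividers + [total], [0] + dividers)]
    if min(result) <= 0:
        raise ValueError("variance too large for this n and total")
    return result


print(constrained_sum_sample_sym(4, 40))
```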

【Discussion】:

【Solution 7】:

If you want true randomness, use:

import numpy as np
def randofsum_unbalanced(s, n):
    # Where s = sum (e.g. 40 in your case) and n is the output array length (e.g. 4 in your case)
    r = np.random.rand(n)
    a = np.array(np.round((r/np.sum(r))*s,0),dtype=int)
    while np.sum(a) > s:
        # Only decrement entries that are still positive, so no value goes below 0.
        a[np.random.choice(np.flatnonzero(a > 0))] -= 1
    while np.sum(a) < s:
        a[np.random.choice(n)] += 1
    return a

If you want the values to be more evenly balanced, use the multinomial distribution:

def randofsum_balanced(s, n):
    return np.random.multinomial(s,np.ones(n)/n,size=1)[0]

【Discussion】: