根据权重分配的算法，项目总数未知，保证良好的分配？答案

【问题标题】：Algorithm to distribute according to weights, with unknown total of items, guaranteeing good distribution?根据权重分配的算法，项目总数未知，保证良好的分配？
【发布时间】：2012-08-06 16:24:15
【问题描述】：

我想将代币分配到 3 个插槽中。

每个插槽都有一定的权重：可能 50% 的代币应该进入第一个插槽，30% 应该进入第二个插槽，20% 应该进入第三个插槽。

我不知道代币的总数——它们不断涌现。我可能会在中午获得 1000 个代币来分发，我在下午 1 点再获得 300 个。等等，不可预测。在任何时候，我到目前为止收到的代币都应该根据权重尽可能好地分配。

一种解决方案是按概率分布。我为每个令牌掷一个 100 面骰子。如果结果是 1-50，则令牌进入插槽 1。51-80 的结果平均插槽 2，81-100 的结果平均插槽 3。

但这意味着，例如，每个令牌最终都出现在插槽 3 中并非不可能（只是不太可能）。

我想保证，当我总共收到 100 个令牌时，其中的 50 个将在插槽 1 中。当我收到 1000 个令牌时，恰好 500 应该在插槽 1 中。

对此有什么好的算法？

【问题讨论】：

标签： algorithm distribution probability

【解决方案1】：

根据理想分布计算每个槽的误差。始终将令牌插入错误最多的插槽中。如果两个或多个插槽并列，则随机插入其中一个。

误差是预期的tokens数量（tokens added * ratio）与实际tokens的数量之差。

这样你总能最大限度地减少错误，如果令牌能够准确分配，就不会出现错误。

演示代码（如果错误数量相等，则插入第一个槽，而不是随机分布）：

import random

tokens_in_slots = [0, 0, 0]
slot_distributions = [0.5, 0.3, 0.2]

def add_token():
    num_tokens = sum(tokens_in_slots)
    if not num_tokens:
        #first token can go anywhere
        tokens_in_slots[random.randint(0,2)] += 1
        return
    expected_tokens = [num_tokens*distr for distr in slot_distributions]
    errors = [expected - actual
              for expected, actual in zip(expected_tokens, tokens_in_slots)]
    most_error = max(enumerate(errors), key=lambda (i,e): e)
    tokens_in_slots[most_error[0]] += 1

def add_and_print(n):
    for i in xrange(n):
        add_token()
        print sum(tokens_in_slots), tokens_in_slots

结果：

>>> add_and_print(100)
1 [0, 0, 1]
2 [1, 0, 1]
3 [1, 1, 1]
4 [2, 1, 1]
5 [2, 2, 1]
6 [3, 2, 1]
7 [3, 2, 2]
8 [4, 2, 2]
9 [4, 3, 2]
10 [5, 3, 2]
11 [6, 3, 2]
12 [6, 4, 2]
13 [6, 4, 3]
14 [7, 4, 3]
15 [7, 5, 3]
16 [8, 5, 3]
17 [8, 5, 4]
18 [9, 5, 4]
19 [9, 6, 4]
20 [10, 6, 4]
21 [11, 6, 4]
22 [11, 7, 4]
23 [11, 7, 5]
24 [12, 7, 5]
25 [12, 8, 5]
26 [13, 8, 5]
27 [13, 8, 6]
28 [14, 8, 6]
29 [14, 9, 6]
30 [15, 9, 6]
31 [16, 9, 6]
32 [16, 10, 6]
33 [16, 10, 7]
34 [17, 10, 7]
35 [17, 11, 7]
36 [18, 11, 7]
37 [18, 11, 8]
38 [19, 11, 8]
39 [19, 12, 8]
40 [20, 12, 8]
41 [21, 12, 8]
42 [21, 13, 8]
43 [21, 13, 9]
44 [22, 13, 9]
45 [22, 14, 9]
46 [23, 14, 9]
47 [23, 14, 10]
48 [24, 14, 10]
49 [24, 15, 10]
50 [25, 15, 10]
51 [26, 15, 10]
52 [26, 16, 10]
53 [26, 16, 11]
54 [27, 16, 11]
55 [27, 17, 11]
56 [28, 17, 11]
57 [28, 17, 12]
58 [29, 17, 12]
59 [29, 18, 12]
60 [30, 18, 12]
61 [31, 18, 12]
62 [31, 19, 12]
63 [31, 19, 13]
64 [32, 19, 13]
65 [32, 20, 13]
66 [33, 20, 13]
67 [33, 20, 14]
68 [34, 20, 14]
69 [34, 21, 14]
70 [35, 21, 14]
71 [36, 21, 14]
72 [36, 22, 14]
73 [36, 22, 15]
74 [37, 22, 15]
75 [37, 23, 15]
76 [38, 23, 15]
77 [38, 23, 16]
78 [39, 23, 16]
79 [39, 24, 16]
80 [40, 24, 16]
81 [41, 24, 16]
82 [41, 25, 16]
83 [41, 25, 17]
84 [42, 25, 17]
85 [42, 26, 17]
86 [43, 26, 17]
87 [43, 26, 18]
88 [44, 26, 18]
89 [44, 27, 18]
90 [45, 27, 18]
91 [46, 27, 18]
92 [46, 28, 18]
93 [46, 28, 19]
94 [47, 28, 19]
95 [47, 29, 19]
96 [48, 29, 19]
97 [48, 29, 20]
98 [49, 29, 20]
99 [49, 30, 20]
100 [50, 30, 20]

结果

tokens_in_slots = [0, 0, 0, 0]
slot_distributions = [0.8, 0.1, 0.05, 0.05]

：

>>> add_and_print(100)
1 [0, 0, 1, 0]
2 [1, 0, 1, 0]
3 [2, 0, 1, 0]
4 [3, 0, 1, 0]
5 [3, 1, 1, 0]
6 [4, 1, 1, 0]
7 [5, 1, 1, 0]
8 [6, 1, 1, 0]
9 [7, 1, 1, 0]
10 [7, 1, 1, 1]
11 [8, 1, 1, 1]
12 [9, 1, 1, 1]
13 [10, 1, 1, 1]
14 [11, 1, 1, 1]
15 [11, 2, 1, 1]
16 [12, 2, 1, 1]
17 [13, 2, 1, 1]
18 [14, 2, 1, 1]
19 [15, 2, 1, 1]
20 [16, 2, 1, 1]
21 [17, 2, 1, 1]
22 [17, 3, 1, 1]
23 [18, 3, 1, 1]
24 [19, 3, 1, 1]
25 [20, 3, 1, 1]
26 [20, 3, 2, 1]
27 [21, 3, 2, 1]
28 [22, 3, 2, 1]
29 [23, 3, 2, 1]
30 [23, 3, 2, 2]
31 [24, 3, 2, 2]
32 [25, 3, 2, 2]
33 [26, 3, 2, 2]
34 [27, 3, 2, 2]
35 [27, 4, 2, 2]
36 [28, 4, 2, 2]
37 [29, 4, 2, 2]
38 [30, 4, 2, 2]
39 [31, 4, 2, 2]
40 [32, 4, 2, 2]
41 [33, 4, 2, 2]
42 [33, 5, 2, 2]
43 [34, 5, 2, 2]
44 [35, 5, 2, 2]
45 [36, 5, 2, 2]
46 [36, 5, 3, 2]
47 [37, 5, 3, 2]
48 [38, 5, 3, 2]
49 [39, 5, 3, 2]
50 [39, 5, 3, 3]
51 [40, 5, 3, 3]
52 [41, 5, 3, 3]
53 [42, 5, 3, 3]
54 [43, 5, 3, 3]
55 [43, 6, 3, 3]
56 [44, 6, 3, 3]
57 [45, 6, 3, 3]
58 [46, 6, 3, 3]
59 [47, 6, 3, 3]
60 [48, 6, 3, 3]
61 [49, 6, 3, 3]
62 [49, 7, 3, 3]
63 [50, 7, 3, 3]
64 [51, 7, 3, 3]
65 [52, 7, 3, 3]
66 [52, 7, 4, 3]
67 [53, 7, 4, 3]
68 [54, 7, 4, 3]
69 [55, 7, 4, 3]
70 [55, 7, 4, 4]
71 [56, 7, 4, 4]
72 [57, 7, 4, 4]
73 [58, 7, 4, 4]
74 [59, 7, 4, 4]
75 [59, 8, 4, 4]
76 [60, 8, 4, 4]
77 [61, 8, 4, 4]
78 [62, 8, 4, 4]
79 [63, 8, 4, 4]
80 [64, 8, 4, 4]
81 [65, 8, 4, 4]
82 [65, 9, 4, 4]
83 [66, 9, 4, 4]
84 [67, 9, 4, 4]
85 [68, 9, 4, 4]
86 [68, 9, 5, 4]
87 [69, 9, 5, 4]
88 [70, 9, 5, 4]
89 [71, 9, 5, 4]
90 [71, 9, 5, 5]
91 [72, 9, 5, 5]
92 [73, 9, 5, 5]
93 [74, 9, 5, 5]
94 [75, 9, 5, 5]
95 [75, 10, 5, 5]
96 [76, 10, 5, 5]
97 [77, 10, 5, 5]
98 [78, 10, 5, 5]
99 [79, 10, 5, 5]
100 [80, 10, 5, 5]

【讨论】：

这是描述问题的好方法。谢谢！我想我会尝试这些方面的东西。
在 Ruby 中实现了它，所以我可以玩弄它：gist.github.com/3276573#file_claudiu.rb

【解决方案2】：

我想到的解决方案：

给每个插槽一个计算分数。将令牌放入得分最高的插槽中。如果不止一个共享该分数，我不在乎我们是选择第一个还是随机的。

计算出的分数将类似于以下 Ruby/伪代码：

# Example values
# Floats to avoid integer division
slot_1_weight = 50.0
total_weight  = 100.0
slot_1_tokens = 2.0
total_tokens  = 3.0

if total_tokens == 0 || total_weight == 0 || slot_1_tokens
  # Avoid division by zero.
  slot_1_score = slot_1_weight
else
  expected_distribution = slot_1_weight/total_weight
  actual_distribution = slot_1_tokens/total_tokens
  slot_1_score  = slot_1_weight * (expected_distribution/actual_distribution)
end

所以当预期和实际匹配时，分数就是原始权重。如果期望值太高，则按比例缩小权重。如果期望值太低，则按比例增加权重。

【讨论】：