【Question title】: Algorithm to distribute according to weights, with unknown total of items, guaranteeing good distribution?
【Posted】: 2012-08-06 16:24:15
【Question】:

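I want to distribute tokens into 3 slots.

Each slot has a weight: say 50% of the tokens should go to the first slot, 30% to the second, and 20% to the third.

I don't know the total number of tokens; they keep coming in. I might get 1000 tokens to distribute at noon and another 300 at 1 pm, and so on, unpredictably. At any point, the tokens received so far should be distributed as well as possible according to the weights.

One solution is to distribute by probability. I roll a 100-sided die for each token. A result of 1-50 puts the token in slot 1, 51-80 means slot 2, and 81-100 means slot 3.

But this means that it is not impossible (merely unlikely) for, say, every token to end up in slot 3.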
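
For reference, the die-roll approach described above can be sketched in Python as follows. The names `slot_for_roll` and `roll_slot` are hypothetical, and the fixed seed is there only to make a run repeatable.

```python
import random
from collections import Counter

def slot_for_roll(roll):
    """Map a roll of a 100-sided die (1-100) to a slot number (1-3)."""
    if roll <= 50:
        return 1
    if roll <= 80:
        return 2
    return 3

def roll_slot(rng):
    """Roll the die once and return the slot the token lands in."""
    return slot_for_roll(rng.randint(1, 100))  # randint is inclusive on both ends

rng = random.Random(0)  # assumption: seeded only for repeatability
counts = Counter(roll_slot(rng) for _ in range(100))
print(counts)  # nothing forces these counts to be exactly 50/30/20
```

Each roll is independent of the ones before it, which is why the counts are only right on average.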

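I want a guarantee that when I have received 100 tokens in total, exactly 50 of them are in slot 1. When I have received 1000 tokens, exactly 500 should be in slot 1.

What is a good algorithm for this?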

【Question comments】:

    Tags: algorithm distribution probability


    【Solution 1】:

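    Calculate each slot's error relative to the ideal distribution. Always insert the token into the slot with the largest error. If two or more slots are tied, insert into one of them at random.

    The error is the difference between the expected number of tokens (tokens added * ratio) and the actual number of tokens in the slot.

    This way you always minimize the error, and when the tokens can be divided exactly, there is no error at all.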
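
As a worked check of that definition, here is a small sketch using the question's 50/30/20 weights and a hypothetical state in which 4 tokens have been placed as [2, 1, 1]. `Fraction` is used only to keep the arithmetic exact; the variable names are illustrative.

```python
from fractions import Fraction

ratios = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
counts = [2, 1, 1]   # hypothetical state: 4 tokens placed so far
added = sum(counts)

expected = [added * r for r in ratios]              # 2, 6/5, 4/5
errors = [e - c for e, c in zip(expected, counts)]  # 0, 1/5, -1/5
target = errors.index(max(errors))                  # slot with the largest error

print(target)  # 1, i.e. the second slot receives the next token
```

The second slot is the only one below its expected count, so it gets the fifth token.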

    Demo code (on a tie it inserts into the first tied slot instead of choosing at random):

    import random
    
    tokens_in_slots = [0, 0, 0]
    slot_distributions = [0.5, 0.3, 0.2]
    
    def add_token():
        num_tokens = sum(tokens_in_slots)
        if not num_tokens:
            # The first token can go anywhere: every slot has zero error.
            tokens_in_slots[random.randrange(len(tokens_in_slots))] += 1
            return
        # Expected count per slot for the tokens placed so far.
        expected_tokens = [num_tokens * distr for distr in slot_distributions]
        errors = [expected - actual
                  for expected, actual in zip(expected_tokens, tokens_in_slots)]
        # Index of the slot with the largest error; max() keeps the first on ties.
        most_error = max(range(len(errors)), key=lambda i: errors[i])
        tokens_in_slots[most_error] += 1
    
    def add_and_print(n):
        for _ in range(n):
            add_token()
            print(sum(tokens_in_slots), tokens_in_slots)
    

    Result:

    >>> add_and_print(100)
    1 [0, 0, 1]
    2 [1, 0, 1]
    3 [1, 1, 1]
    4 [2, 1, 1]
    5 [2, 2, 1]
    6 [3, 2, 1]
    7 [3, 2, 2]
    8 [4, 2, 2]
    9 [4, 3, 2]
    10 [5, 3, 2]
    11 [6, 3, 2]
    12 [6, 4, 2]
    13 [6, 4, 3]
    14 [7, 4, 3]
    15 [7, 5, 3]
    16 [8, 5, 3]
    17 [8, 5, 4]
    18 [9, 5, 4]
    19 [9, 6, 4]
    20 [10, 6, 4]
    21 [11, 6, 4]
    22 [11, 7, 4]
    23 [11, 7, 5]
    24 [12, 7, 5]
    25 [12, 8, 5]
    26 [13, 8, 5]
    27 [13, 8, 6]
    28 [14, 8, 6]
    29 [14, 9, 6]
    30 [15, 9, 6]
    31 [16, 9, 6]
    32 [16, 10, 6]
    33 [16, 10, 7]
    34 [17, 10, 7]
    35 [17, 11, 7]
    36 [18, 11, 7]
    37 [18, 11, 8]
    38 [19, 11, 8]
    39 [19, 12, 8]
    40 [20, 12, 8]
    41 [21, 12, 8]
    42 [21, 13, 8]
    43 [21, 13, 9]
    44 [22, 13, 9]
    45 [22, 14, 9]
    46 [23, 14, 9]
    47 [23, 14, 10]
    48 [24, 14, 10]
    49 [24, 15, 10]
    50 [25, 15, 10]
    51 [26, 15, 10]
    52 [26, 16, 10]
    53 [26, 16, 11]
    54 [27, 16, 11]
    55 [27, 17, 11]
    56 [28, 17, 11]
    57 [28, 17, 12]
    58 [29, 17, 12]
    59 [29, 18, 12]
    60 [30, 18, 12]
    61 [31, 18, 12]
    62 [31, 19, 12]
    63 [31, 19, 13]
    64 [32, 19, 13]
    65 [32, 20, 13]
    66 [33, 20, 13]
    67 [33, 20, 14]
    68 [34, 20, 14]
    69 [34, 21, 14]
    70 [35, 21, 14]
    71 [36, 21, 14]
    72 [36, 22, 14]
    73 [36, 22, 15]
    74 [37, 22, 15]
    75 [37, 23, 15]
    76 [38, 23, 15]
    77 [38, 23, 16]
    78 [39, 23, 16]
    79 [39, 24, 16]
    80 [40, 24, 16]
    81 [41, 24, 16]
    82 [41, 25, 16]
    83 [41, 25, 17]
    84 [42, 25, 17]
    85 [42, 26, 17]
    86 [43, 26, 17]
    87 [43, 26, 18]
    88 [44, 26, 18]
    89 [44, 27, 18]
    90 [45, 27, 18]
    91 [46, 27, 18]
    92 [46, 28, 18]
    93 [46, 28, 19]
    94 [47, 28, 19]
    95 [47, 29, 19]
    96 [48, 29, 19]
    97 [48, 29, 20]
    98 [49, 29, 20]
    99 [49, 30, 20]
    100 [50, 30, 20]
    

    Result with a different configuration:

    tokens_in_slots = [0, 0, 0, 0]
    slot_distributions = [0.8, 0.1, 0.05, 0.05]
    

    >>> add_and_print(100)
    1 [0, 0, 1, 0]
    2 [1, 0, 1, 0]
    3 [2, 0, 1, 0]
    4 [3, 0, 1, 0]
    5 [3, 1, 1, 0]
    6 [4, 1, 1, 0]
    7 [5, 1, 1, 0]
    8 [6, 1, 1, 0]
    9 [7, 1, 1, 0]
    10 [7, 1, 1, 1]
    11 [8, 1, 1, 1]
    12 [9, 1, 1, 1]
    13 [10, 1, 1, 1]
    14 [11, 1, 1, 1]
    15 [11, 2, 1, 1]
    16 [12, 2, 1, 1]
    17 [13, 2, 1, 1]
    18 [14, 2, 1, 1]
    19 [15, 2, 1, 1]
    20 [16, 2, 1, 1]
    21 [17, 2, 1, 1]
    22 [17, 3, 1, 1]
    23 [18, 3, 1, 1]
    24 [19, 3, 1, 1]
    25 [20, 3, 1, 1]
    26 [20, 3, 2, 1]
    27 [21, 3, 2, 1]
    28 [22, 3, 2, 1]
    29 [23, 3, 2, 1]
    30 [23, 3, 2, 2]
    31 [24, 3, 2, 2]
    32 [25, 3, 2, 2]
    33 [26, 3, 2, 2]
    34 [27, 3, 2, 2]
    35 [27, 4, 2, 2]
    36 [28, 4, 2, 2]
    37 [29, 4, 2, 2]
    38 [30, 4, 2, 2]
    39 [31, 4, 2, 2]
    40 [32, 4, 2, 2]
    41 [33, 4, 2, 2]
    42 [33, 5, 2, 2]
    43 [34, 5, 2, 2]
    44 [35, 5, 2, 2]
    45 [36, 5, 2, 2]
    46 [36, 5, 3, 2]
    47 [37, 5, 3, 2]
    48 [38, 5, 3, 2]
    49 [39, 5, 3, 2]
    50 [39, 5, 3, 3]
    51 [40, 5, 3, 3]
    52 [41, 5, 3, 3]
    53 [42, 5, 3, 3]
    54 [43, 5, 3, 3]
    55 [43, 6, 3, 3]
    56 [44, 6, 3, 3]
    57 [45, 6, 3, 3]
    58 [46, 6, 3, 3]
    59 [47, 6, 3, 3]
    60 [48, 6, 3, 3]
    61 [49, 6, 3, 3]
    62 [49, 7, 3, 3]
    63 [50, 7, 3, 3]
    64 [51, 7, 3, 3]
    65 [52, 7, 3, 3]
    66 [52, 7, 4, 3]
    67 [53, 7, 4, 3]
    68 [54, 7, 4, 3]
    69 [55, 7, 4, 3]
    70 [55, 7, 4, 4]
    71 [56, 7, 4, 4]
    72 [57, 7, 4, 4]
    73 [58, 7, 4, 4]
    74 [59, 7, 4, 4]
    75 [59, 8, 4, 4]
    76 [60, 8, 4, 4]
    77 [61, 8, 4, 4]
    78 [62, 8, 4, 4]
    79 [63, 8, 4, 4]
    80 [64, 8, 4, 4]
    81 [65, 8, 4, 4]
    82 [65, 9, 4, 4]
    83 [66, 9, 4, 4]
    84 [67, 9, 4, 4]
    85 [68, 9, 4, 4]
    86 [68, 9, 5, 4]
    87 [69, 9, 5, 4]
    88 [70, 9, 5, 4]
    89 [71, 9, 5, 4]
    90 [71, 9, 5, 5]
    91 [72, 9, 5, 5]
    92 [73, 9, 5, 5]
    93 [74, 9, 5, 5]
    94 [75, 9, 5, 5]
    95 [75, 10, 5, 5]
    96 [76, 10, 5, 5]
    97 [77, 10, 5, 5]
    98 [78, 10, 5, 5]
    99 [79, 10, 5, 5]
    100 [80, 10, 5, 5]
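
The same largest-error rule can also be written without floating-point ratios. The sketch below is a variation, not the answer's own code: it assumes integer weights, and the class name `WeightedDistributor` is hypothetical. It compares errors multiplied by the total weight, so every value stays an integer and ties are detected exactly. Unlike the demo above, the first token goes to the first slot rather than a random one.

```python
class WeightedDistributor:
    """Assign each new token to the slot that is furthest below its share."""

    def __init__(self, weights):
        # Assumption: weights are positive integers, e.g. (50, 30, 20).
        self.weights = list(weights)
        self.total_weight = sum(self.weights)
        self.counts = [0] * len(self.weights)

    def add_token(self):
        placed = sum(self.counts)
        # Error scaled by total_weight so it stays an integer:
        # placed * w / total_weight - count
        #   == (placed * w - count * total_weight) / total_weight
        errors = [placed * w - c * self.total_weight
                  for w, c in zip(self.weights, self.counts)]
        slot = errors.index(max(errors))  # the first slot wins a tie
        self.counts[slot] += 1
        return slot


dist = WeightedDistributor((50, 30, 20))
for _ in range(10):
    dist.add_token()
print(dist.counts)  # [5, 3, 2]
```

Because state lives in the object, tokens can be added in batches of any size as they arrive.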
    

    【Comments】:

    【Solution 2】:

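    The solution I came up with:

    Give each slot a calculated score. Put the token in the slot with the highest score. If more than one slot shares that score, I don't care whether we pick the first one or a random one.

    The calculated score would be something like this Ruby/pseudocode: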

    # Example values
    # Floats to avoid integer division
    slot_1_weight = 50.0
    total_weight  = 100.0
    slot_1_tokens = 2.0
    total_tokens  = 3.0
    
    if total_tokens == 0 || total_weight == 0 || slot_1_tokens == 0
      # Avoid division by zero.
      slot_1_score = slot_1_weight
    else
      expected_distribution = slot_1_weight/total_weight
      actual_distribution = slot_1_tokens/total_tokens
      slot_1_score  = slot_1_weight * (expected_distribution/actual_distribution)
    end
    

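    So when the expected and actual shares match, the score is just the original weight. If the slot's actual share is too high, the weight is scaled down proportionally. If the actual share is too low, the weight is scaled up proportionally.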
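
A minimal Python rendering of that score for a single slot, to make the scaling behavior concrete. The function name `slot_score` and its parameter names are assumptions for illustration. The sketch covers only the score calculation, not the surrounding loop that picks the highest-scoring slot.

```python
def slot_score(weight, total_weight, slot_tokens, total_tokens):
    """Score for one slot: its weight scaled by expected share / actual share."""
    if total_tokens == 0 or total_weight == 0 or slot_tokens == 0:
        # Avoid division by zero: fall back to the raw weight.
        return float(weight)
    expected_share = weight / total_weight
    actual_share = slot_tokens / total_tokens
    return weight * (expected_share / actual_share)


print(slot_score(50, 100, 5, 10))  # actual share equals the expected share
print(slot_score(50, 100, 2, 3))   # slot holds more than its share
print(slot_score(50, 100, 1, 3))   # slot holds less than its share
```

The first call returns the raw weight, the second a value below it, and the third a value above it.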

    【Comments】:
