Weighted choice short and simple [duplicate]: answers

【Question title】: Weighted choice short and simple [duplicate]
【Posted】: 2012-06-03 21:53:16
【Question】:

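I have a collection of items in a list, and I want to choose from that list according to another list of weights.

For example, my collection is ['one', 'two', 'three'] and the weights are [0.2, 0.3, 0.5]; I would expect the method to give me 'three' in about half of all draws.

What is the easiest way to do this?

【Question comments】: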

【Answer 1】:

Since version 1.7 of numpy you can use numpy.random.choice():

elements = ['one', 'two', 'three'] 
weights = [0.2, 0.3, 0.5]

from numpy.random import choice
print(choice(elements, p=weights))

【Comments】:

This answer should be the accepted one.
Perfect solution. l = [choice(elements, p=weights) for _ in range(1000)] followed by from collections import Counter; Counter(l) gives: Counter({'three': 498, 'two': 281, 'one': 221}).
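
The check in the comment above can also be done without a Python-level loop, since numpy.random.choice() accepts a size argument. The following is a minimal sketch; the seed value and the variable names draws and frequencies are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from numpy.random import choice

elements = ['one', 'two', 'three']
weights = [0.2, 0.3, 0.5]

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

# Draw 1000 samples in a single call instead of a list comprehension.
draws = choice(elements, size=1000, p=weights)

# Tally how often each element was drawn.
values, counts = np.unique(draws, return_counts=True)
frequencies = dict(zip(values.tolist(), counts.tolist()))
print(frequencies)
```

The exact counts depend on the seed, but 'three' should come out near 500.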

【Answer 2】:

Since Python 3.6, you can do weighted random choice (with replacement) using random.choices.

random.choices(population, weights=None, *, cum_weights=None, k=1)

Example usage:

import random
random.choices(['one', 'two', 'three'], [0.2, 0.3, 0.5], k=10)
# ['three', 'two', 'three', 'three', 'three',
#  'three', 'three', 'two', 'two', 'one']
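
The signature above also accepts cum_weights, which may be worth using when the same weights are reused for many calls, as in a simulation. A minimal sketch, assuming the running totals are built once with itertools.accumulate; the seeded random.Random instance and the variable names are illustrative only.

```python
import random
from collections import Counter
from itertools import accumulate

elements = ['one', 'two', 'three']
weights = [0.2, 0.3, 0.5]

# Precompute the running totals once so they can be reused across calls.
cum_weights = list(accumulate(weights))

rng = random.Random(0)  # arbitrary seed, only to make the run repeatable
draws = rng.choices(elements, cum_weights=cum_weights, k=1000)

# Count how often each element came up.
counts = Counter(draws)
print(counts)
```

The weights passed to random.choices are relative, so they do not have to sum to 1.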

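【Comments】:

【Answer 3】:

How about initializing a list so that your choices match the expected weights? Here I build a list of 100 values representing the desired "pull" percentages.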

>>> import random
>>> elements = ['one', 'two', 'three'] 
>>> weights = [0.2, 0.3, 0.5]
>>>
>>> # get "sum" of result list of lists (flattens list)
>>> # round() rather than int(): float truncation gives int(0.29 * 100) == 28
>>> choices = sum([[element] * round(weight * 100) for element, weight in zip(elements, weights)], [])
>>> random.choice(choices)
'three'

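It's not cumulative, but it looks like it might be what you're looking for.

【Comments】:

It looks like it has the same effect, but allocating a 3*100 vector just to make one choice seems like overkill. Especially since the context in which the problem first came up is a Monte Carlo simulation, where you want things to be as fast as possible...
You should add that information to the question. However, you only allocate the list once, and the calls to random.choice() will be fast.
Yes, but I'd say that if there is a cheap way and an expensive way to reach the same result, it goes without saying that one would pick the cheap one. Judges' ruling? :)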

【Answer 4】:

You can use the multinomial distribution (from numpy) to do what you want. For example:

elements = ['one', 'two', 'three'] 
weights = [0.2, 0.3, 0.5]


import numpy as np

indices = np.random.multinomial( 100, weights, 1)
#=> array([[20, 32, 48]]), YMMV

results = [] #A list of the original items, repeated the correct number of times.
for i, count in enumerate(indices[0]):
    results.extend( [elements[i]]*count )

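So the element in the first position came up 20 times, the element in the second position came up 32 times, and the element in the third position came up 48 times, which is roughly what you would expect given the weights.

If you're having a hard time wrapping your head around the multinomial distribution, I found the documentation really helpful.

【Comments】:

Note that you could reduce the construction of results to itertools.chain.from_iterable([elements[i]] * count for i, count in enumerate(indices[0])), which would be faster.
In fact, you could improve it further by replacing the list multiplication with itertools.repeat(elements[i], count).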
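
The two suggestions in these comments can be combined roughly as follows. This is a sketch only: counts is a hard-coded stand-in for indices[0] from the answer (a real run would produce different numbers), and the final list() call is there just so the result can be inspected.

```python
from itertools import chain, repeat

elements = ['one', 'two', 'three']
counts = [20, 32, 48]  # stand-in for indices[0] returned by np.random.multinomial

# repeat() yields each element lazily `count` times; chain.from_iterable
# concatenates those streams without building intermediate lists.
results = list(chain.from_iterable(
    repeat(element, count) for element, count in zip(elements, counts)
))

print(len(results))
```

If the repeated items are only iterated over once, the list() wrapper can be dropped and the chain consumed directly.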

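【Answer 5】:

Maus' answer is great if you want to get weighted random values repeatedly. Building on it, if you only want a single value, you can do this very simply by combining numpy.random.multinomial() and itertools.compress():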

from itertools import compress
from numpy.random import multinomial

def weightedChoice(weights, objects):
    """Return a random item from objects, with the weighting defined by weights 
    (which must sum to 1)."""
    return next(compress(objects, multinomial(1, weights, 1)[0]))

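【Comments】:

@aix I accidentally clobbered your edit with my own; rolled back to your (better) links.

【Answer 6】:

If you don't want to use numpy, you can follow the same method with something like this: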

from random import random
from itertools import takewhile

def accumulate(iterator):
    """Returns a cumulative sum of the elements.
    accumulate([1, 2, 3, 4, 5]) --> 1 3 6 10 15"""
    current = 0
    for value in iterator:
        current += value
        yield current

def weightedChoice(weights, objects):
    """Return a random item from objects, with the weighting defined by weights 
    (which must sum to 1)."""
    limit = random()
    return objects[sum(takewhile(bool, (value < limit for value in accumulate(weights))))]

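We use itertools.takewhile() to avoid checking values once we reach the point where we want to stop; otherwise, this is essentially the same idea as Mischa Obrecht's answer, just without numpy.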
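
As a variation on the same idea, the early-stopping scan could be replaced with a binary search over the cumulative weights, using only the standard library. This is a sketch under assumptions: it relies on itertools.accumulate (Python 3.2+) instead of the hand-written helper, the name weighted_choice and the rng parameter are illustrative, and scaling by the total is an addition so that the weights need not sum to exactly 1.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def weighted_choice(weights, objects, rng=random):
    """Return a random item from objects, weighted by weights (non-negative)."""
    cumulative = list(accumulate(weights))
    # Scale by the total so the weights do not have to sum to exactly 1.
    limit = rng.random() * cumulative[-1]
    # Index of the first cumulative weight strictly greater than limit.
    idx = bisect_right(cumulative, limit)
    # Guard against floating-point rounding pushing idx past the end.
    return objects[min(idx, len(objects) - 1)]

print(weighted_choice([0.2, 0.3, 0.5], ['one', 'two', 'three']))
```

For three items the difference is negligible; the binary search only starts to matter for long weight lists, and in that case the cumulative list should be computed once outside the function.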

【Comments】:

【Answer 7】:

This function takes two arguments: a list of weights and a list containing the objects to choose from:

from numpy import cumsum
from numpy.random import rand
def weightedChoice(weights, objects):
    """Return a random item from objects, with the weighting defined by weights 
    (which must sum to 1)."""
    cs = cumsum(weights) #An array of the weights, cumulatively summed.
    idx = sum(cs < rand()) #Find the index of the first weight over a random value.
    return objects[idx]

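It does not use any Python loops.

【Comments】:

The comments seem misleading. cumsum() gives the cumulative values, not boolean values. To be clear, this does work, but the comments don't match what is actually happening.
I've edited to fix that, and also put the docstring on one line, as recommended by PEP 257.
Assuming the weights are positive, cs is a sorted list. Using numpy.searchsorted would significantly speed up finding the index.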
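
The numpy.searchsorted suggestion in the last comment might look like the sketch below. It is not the answer author's code: the name weighted_choice, the scaling by cs[-1], and the final bounds guard are assumptions added for illustration.

```python
import numpy as np

def weighted_choice(weights, objects):
    """Return a random item from objects, weighted by weights (non-negative)."""
    cs = np.cumsum(weights)  # sorted, because the weights are non-negative
    # Scale the random value by the total so weights need not sum to exactly 1.
    r = np.random.rand() * cs[-1]
    # Binary search for the first cumulative weight strictly greater than r.
    idx = np.searchsorted(cs, r, side='right')
    # Guard against floating-point rounding pushing idx past the end.
    return objects[min(idx, len(objects) - 1)]

print(weighted_choice([0.2, 0.3, 0.5], ['one', 'two', 'three']))
```

Using side='right' means an item with zero weight is never selected. The speed-up over sum(cs < rand()) should only be noticeable for long weight arrays, and more so if cs is computed once and reused.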