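Selection based on percentage weighting: answers

【Question title】: Selection based on percentage weighting
【Posted】: 2011-04-08 23:58:21
【Question】:

I have a set of values, and an associated percentage for each:

a: 70% chance
b: 20% chance
c: 10% chance

I want to select a value (a, b, c) based on the percentage chance given.

How do I approach this?

My attempt so far looks like this: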

import random

def pick(a, b, c):
    r = random.random()
    if r <= .7:
        return a
    elif r <= .9:
        return b
    else:
        return c

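I'm struggling to come up with an algorithm to handle this. How should I approach it so that it can handle larger sets of values without just chaining if-else branches together?

(Any explanation or answer in pseudo-code is fine. A Python or C# implementation would be especially helpful.)

【Question comments】:

I ran into this problem and ended up building a library: github.com/kinetiq/Ether.WeightedSelector
A very nice and simple C# implementation is here: vcskicks.com/random-element.php

Tags: c# python algorithm random

【Solution 1】:

Take the list and find the cumulative totals of the weights: 70, 70+20, 70+20+10. Pick a random number greater than or equal to zero and less than the total. Iterate over the items and return the first value for which the cumulative sum of the weights is greater than this random number: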

import random

def select(values):
    # values maps each item to its (non-negative) weight
    variate = random.random() * sum(values.values())
    cumulative = 0.0
    for item, weight in values.items():
        cumulative += weight
        if variate < cumulative:
            return item
    return item  # Shouldn't get here, but just in case of rounding...

print(select({"a": 70, "b": 20, "c": 10}))

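As implemented, this solution should also handle fractional weights and weights that add up to any number, as long as they are all non-negative.

【Comments】:

When I first saw this answer it didn't have any code in it. It looks like we were busy coming up with essentially the same code at the same time.

【Solution 2】:

I think you can have an array of small objects (I implemented it in Java; I know a bit of C#, but I'm afraid of writing wrong code), so you may need to port it yourself. The code in C# would be much smaller with struct and var, but I hope you get the idea.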

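import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class PercentString {
  final double percent;
  final String value;

  // Constructor for 2 values
  PercentString(double percent, String value) {
    this.percent = percent;
    this.value = value;
  }
}

class WeightedPicker {
  static String pick(List<PercentString> list, Random rng) {
    // Uniform value in [0, 100); assumes the percentages add up to 100
    double random = rng.nextDouble() * 100;
    double percent = 0;
    for (PercentString p : list) {
      percent += p.percent;
      if (random < percent) {
        return p.value;
      }
    }
    return list.get(list.size() - 1).value; // only reached through rounding
  }

  public static void main(String[] args) {
    List<PercentString> list = new ArrayList<>();
    list.add(new PercentString(70, "a"));
    list.add(new PercentString(20, "b"));
    list.add(new PercentString(10, "c"));
    System.out.println(pick(list, new Random()));
  }
}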

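【Comments】:

Sorry, I misunderstood your requirement; I have changed my code.
Where does your random come from?

【Solution 3】:

Let T = the sum of all item weights
Let R = a random number between 0 and T
Iterate over the item list, subtracting each item's weight from R, and return the item that causes the result to become <= 0.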
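
The three steps above can be sketched in Python roughly as follows. This is only an illustration, not the answerer's own code: the name pick_by_subtraction and the injectable rng parameter are made up for this example, and it uses a strict "below zero" test so that items with zero weight are never returned.

```python
import random

def pick_by_subtraction(items, rng=random):
    """items: list of (value, weight) pairs with non-negative weights."""
    total = sum(weight for _, weight in items)   # T
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    r = rng.random() * total                     # R, uniform in [0, T)
    for value, weight in items:
        r -= weight
        if r < 0:                                # this item used up the rest of R
            return value
    # Only reachable through floating-point rounding: fall back to the
    # last item that has a positive weight.
    return next(v for v, w in reversed(items) if w > 0)

print(pick_by_subtraction([("a", 70), ("b", 20), ("c", 10)]))
```

Passing a seeded random.Random instance as rng makes the draws repeatable, which can help when testing.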

【Comments】:

+1, because in my version I sorted the list first and then iterated; you made me realize that isn't necessary.

【Solution 4】:

For Python:

>>> import random
>>> dst = 70, 20, 10
>>> vls = 'a', 'b', 'c'
>>> picks = [v for v, d in zip(vls, dst) for _ in range(d)]
>>> print(*(random.choice(picks) for _ in range(12)))
a c c b a a a a a a a a
>>> print(*(random.choice(picks) for _ in range(12)))
a c a c a b b b a a a a
>>> print(*(random.choice(picks) for _ in range(12)))
a a a a c c a c a a c a
>>>

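The general idea: make a list in which each item is repeated a number of times proportional to the probability it should have, then use random.choice to pick one at random (uniformly); this will match your required probability distribution. It can be a bit wasteful of memory if your probabilities are expressed in peculiar ways (e.g., 70, 20, 10 makes a 100-item list, whereas 7, 2, 1 would make a list of just 10 items with exactly the same behavior), but you could divide all the counts in the probabilities list by their greatest common factor if you think that is likely to matter in your specific application scenario.

Apart from memory consumption issues, this should be the fastest solution: just one random number is generated per required output result, and the lookup from that random number is the fastest possible, with no comparisons etc. If your likely probabilities are very weird (e.g., floating-point numbers that need to be matched to many, many significant digits), other approaches may be preferable ;-).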
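
The "divide by the greatest common factor" idea mentioned above might look something like this. It is a sketch that assumes the counts are positive integers and that there is at least one of them; the helper name build_picks is made up for this example.

```python
import random
from functools import reduce
from math import gcd

def build_picks(values, counts):
    """Expand values into a list whose repeats are proportional to counts."""
    divisor = reduce(gcd, counts)        # e.g. gcd of 70, 20, 10 is 10
    return [v for v, c in zip(values, counts) for _ in range(c // divisor)]

picks = build_picks(("a", "b", "c"), (70, 20, 10))
print(len(picks))             # 10 entries instead of 100
print(random.choice(picks))   # uniform choice over the expanded list
```

The list only needs rebuilding when the desired probabilities change, so the reduction is a one-off cost.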

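【Comments】:

Hmm, I'm not sure about the performance characteristics of creating a list of hundreds of entries when only three are needed.
This works OK (but not optimally) when the percentages are all integers, but what if they're arbitrary real numbers? There are better solutions.
@Timwi, have you measured? The list is created once, then many random numbers are generated from it, and you might be surprised at how it performs. @Mark, I did say that other approaches may be preferable if the floats are given so precisely that you need to match many of their digits in the expected probability distribution (that isn't a sensible spec, mind you, but whoever specifies and pays for the code is not always a sensible person, especially when they're paying with other people's money... ;-). The OP said "percentages", and those are usually rounded to the nearest percent, you know?
@Alex, you're right, this does meet the spec. It's also easy to understand once you decipher the picks generator. I find it hard to recommend a limited solution when a more general one is almost as simple.
@Mark, when turned into a function my code is actually simpler than yours, and when the conditions are met, performance can be much better. The "picks generator" (it isn't one; it's a list comprehension) can of course easily be refactored into a loop. Either way it's a preliminary step (executed not on every call, only when the desired probabilities change), so in any normal, useful, sensible case the performance of the listcomp or loop can be amortized away.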

【Solution 5】:

Here is a complete C# solution:

using System;
using System.Collections.Generic;

public class ProportionValue<T>
{
    public double Proportion { get; set; }
    public T Value { get; set; }
}

public static class ProportionValue
{
    public static ProportionValue<T> Create<T>(double proportion, T value)
    {
        return new ProportionValue<T> { Proportion = proportion, Value = value };
    }

    static Random random = new Random();
    public static T ChooseByRandom<T>(
        this IEnumerable<ProportionValue<T>> collection)
    {
        var rnd = random.NextDouble();
        foreach (var item in collection)
        {
            if (rnd < item.Proportion)
                return item.Value;
            rnd -= item.Proportion;
        }
        throw new InvalidOperationException(
            "The proportions in the collection do not add up to 1.");
    }
}

Usage:

var list = new[] {
    ProportionValue.Create(0.7, "a"),
    ProportionValue.Create(0.2, "b"),
    ProportionValue.Create(0.1, "c")
};

// Outputs "a" with probability 0.7, etc.
Console.WriteLine(list.ChooseByRandom());

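【Comments】:

I got an error and had to change the ChooseByRandom definition to: public static T ChooseByRandom<T>(this System.Collections.Generic.IEnumerable<ProportionValue<T>> collection)
Also, it would be neat if it could take any values, not just 0.3 etc. It should add up all the values and work out the percentages itself, so the user doesn't have to care about that. Values like 400 and 1600 would end up as 0.2 and 0.8, and so on.
@Jonny Your second suggestion is (very) easy to do: 1) Make a version of the function that receives a map of values, letting the keys of the map be the chances. 2) Sum the values of all the keys (chances). In your example, 2000. 3) Divide each key (chance) by the total; the result is that key's proportion of the total, between 0 and 1. In this case, as in your example, 0.2 and 0.8.
@Timwi Can you tell me what this algorithm is called?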
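
The normalization suggested in the comments above could be sketched like this, in Python rather than C# for brevity. The helper name normalize and the dict-of-weights input are assumptions made for illustration; they are not part of the answer's API.

```python
def normalize(weights):
    """Map each key's raw weight to its share of the total."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive total")
    return {key: w / total for key, w in weights.items()}

print(normalize({"x": 400, "y": 1600}))   # {'x': 0.2, 'y': 0.8}
```

The resulting proportions add up to 1 (up to rounding), so they can be fed to any selector that expects proportions, such as the one in this answer.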

【Solution 6】:

import random

def weighted_choice(probabilities):
    random_position = random.random() * sum(probabilities)
    current_position = 0.0
    for i, p in enumerate(probabilities):
        current_position += p
        if random_position < current_position:
            return i
    return None

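The final return None should not be reached in practice, because random.random() always returns a value less than 1.

【Comments】:

Note to readers: sum(probabilities) is not needed if your distribution is normalized. This code also does not correctly return options with probability 0.

【Solution 7】: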

import random

def selector(weights):
    # weights is a sequence of (weight, value) pairs
    i = random.random() * sum(w for w, v in weights)
    for w, v in weights:
        if w >= i:
            break
        i -= w
    return v

weights = ((70, 'a'), (20, 'b'), (10, 'c'))
print([selector(weights) for x in range(10)])

It works equally well with fractional weights

weights = ((0.7, 'a'), (0.2, 'b'), (0.1, 'c'))
print([selector(weights) for x in range(10)])

If you have a lot of weights, you can use bisect to reduce the number of iterations required

import random
import bisect

def make_acc_weights(weights):
    # Build the (cumulative weight, value) pairs once, up front
    acc = 0
    acc_weights = []
    for w, v in weights:
        acc += w
        acc_weights.append((acc, v))
    return acc_weights

def selector(acc_weights):
    # The last cumulative weight is the total of all weights
    i = random.random() * acc_weights[-1][0]
    return acc_weights[bisect.bisect(acc_weights, (i,))][1]

weights = ((70, 'a'), (20, 'b'), (10, 'c'))
acc_weights = make_acc_weights(weights)
print([selector(acc_weights) for x in range(100)])

This also works with fractional weights

weights = ((0.7, 'a'), (0.2, 'b'), (0.1, 'c'))
acc_weights = make_acc_weights(weights)
print([selector(acc_weights) for x in range(100)])

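【Comments】:

【Solution 8】:

Knuth references Walker's alias method. Searching on this, I find http://code.activestate.com/recipes/576564-walkers-alias-method-for-random-objects-with-diffe/ and http://prxq.wordpress.com/2006/04/17/the-alias-method/. This gives exactly the required probabilities in constant time per number generated, with linear time for setup (curiously, n log n setup time if you use the method exactly as Knuth describes it, which does a preparatory sort that you can avoid).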
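
For readers who want to see the shape of the alias method without following the links, here is a rough Python sketch of the linear-time table construction usually attributed to Vose. It is not taken from either linked page; the function names and the injectable rng parameter are made up for this example, and it assumes non-negative weights with a positive total.

```python
import random

def build_alias_table(weights):
    """Return (prob, alias) lists; setup takes O(n) time."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]    # average of scaled is 1.0
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        lo = small.pop()
        hi = large.pop()
        prob[lo] = scaled[lo]                # keep lo with this probability...
        alias[lo] = hi                       # ...otherwise redirect to hi
        scaled[hi] -= 1.0 - scaled[lo]       # hi donated the rest of lo's column
        (small if scaled[hi] < 1.0 else large).append(hi)
    # Columns left over keep prob 1.0; they differ from 1 only by rounding.
    return prob, alias

def alias_sample(prob, alias, rng=random):
    """Draw one index in O(1): pick a column, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

values = ["a", "b", "c"]
prob, alias = build_alias_table([70, 20, 10])
print(values[alias_sample(prob, alias)])
```

The table is built once; every later draw costs one randrange call and one random call regardless of how many values there are.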

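【Comments】:

See also stackoverflow.com/questions/5027757/… - this is also called Vose's alias method, after an improvement to the method's (startup) time.

【Solution 9】:

Today's update of the Python documentation gives an example of doing a random.choice() with weighted probabilities:

If the weights are small integer ratios, a simple technique is to build a sample population with repeats: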

>>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
>>> population = [val for val, cnt in weighted_choices for i in range(cnt)]
>>> random.choice(population)
'Green'

A more general approach is to arrange the weights in a cumulative distribution with itertools.accumulate(), and then locate the random value with bisect.bisect():

>>> choices, weights = zip(*weighted_choices)
>>> cumdist = list(itertools.accumulate(weights))
>>> x = random.random() * cumdist[-1]
>>> choices[bisect.bisect(cumdist, x)]
'Blue'

Note: itertools.accumulate() needs Python 3.2 or later; on older versions, define it yourself using the equivalent code given in the documentation.
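
As a side note for readers on Python 3.6 or later (an assumption about your environment, not something this answer covers): the standard library's random.choices() accepts weights directly, so the cumulative-distribution bookkeeping can be left to it. A small sketch using the question's values:

```python
import random

values = ["a", "b", "c"]
weights = [70, 20, 10]

# Draw ten independent samples (with replacement) in one call.
print(random.choices(values, weights=weights, k=10))

# Precomputed cumulative weights avoid re-accumulating on every call.
print(random.choices(values, cum_weights=[70, 90, 100], k=1)[0])
```

The weights do not have to add up to 100 or to 1; they are treated as relative weights.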

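【Comments】:

【Solution 10】:

If you really want to generate random values quickly, Walker's algorithm, which mcdowella mentioned in https://stackoverflow.com/a/3655773/1212517, is pretty much the best way to go (O(1) time for random(), and O(N) time for preprocess()).

For anyone interested, here is my own PHP implementation of the algorithm: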

/**
 * Pre-process the samples (Walker's alias method).
 * Note: relies on each(), which was deprecated in PHP 7.2 and removed in PHP 8.0.
 * @param array $weights key represents the sample, value is the weight
 */
protected function preprocess($weights){

    $N = count($weights);
    $sum = array_sum($weights);
    $avg = $sum / (double)$N;

    //divide the array of weights to values smaller and geq than sum/N 
    $smaller = array_filter($weights, function($itm) use ($avg){ return $avg > $itm;}); $sN = count($smaller); 
    $greater_eq = array_filter($weights, function($itm) use ($avg){ return $avg <= $itm;}); $gN = count($greater_eq);

    $bin = array(); //bins

    //we want to fill N bins
    for($i = 0;$i<$N;$i++){
        //At first, decide for a first value in this bin
        //if there are small intervals left, we choose one
        if($sN > 0){  
            $choice1 = each($smaller); 
            unset($smaller[$choice1['key']]);
            $sN--;
        } else{  //otherwise, we split a large interval
            $choice1 = each($greater_eq); 
            unset($greater_eq[$choice1['key']]);
        }

        //splitting happens here - the unused part of interval is thrown back to the array
        if($choice1['value'] >= $avg){
            if($choice1['value'] - $avg >= $avg){
                $greater_eq[$choice1['key']] = $choice1['value'] - $avg;
            }else if($choice1['value'] - $avg > 0){
                $smaller[$choice1['key']] = $choice1['value'] - $avg;
                $sN++;
            }
            //this bin comprises of only one value
            $bin[] = array(1=>$choice1['key'], 2=>null, 'p1'=>1, 'p2'=>0);
        }else{
            //make the second choice for the current bin
            $choice2 = each($greater_eq);
            unset($greater_eq[$choice2['key']]);

            //splitting on the second interval
            if($choice2['value'] - $avg + $choice1['value'] >= $avg){
                $greater_eq[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
            }else{
                $smaller[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
                $sN++;
            }

            //this bin comprises of two values
            $choice2['value'] = $avg - $choice1['value'];
            $bin[] = array(1=>$choice1['key'], 2=>$choice2['key'],
                           'p1'=>$choice1['value'] / $avg, 
                           'p2'=>$choice2['value'] / $avg);
        }
    }

    $this->bins = $bin;
}

/**
 * Choose a random sample according to the weights.
 */
public function random(){
    $bin = $this->bins[array_rand($this->bins)];
    // Return the bin's first sample with probability p1, otherwise its second
    return (lcg_value() < $bin['p1'])?$bin[1]:$bin[2];
}

【Comments】:

【Solution 11】:

Here is my version, which can be applied to any IList and normalizes the weights. It is based on Timwi's solution: selection based on percentage weighting

/// <summary>
/// return a random element of the list or default if list is empty
/// </summary>
/// <param name="e"></param>
/// <param name="weightSelector">
/// return chances to be picked for the element. A weight of 0 or less means 0 chance to be picked.
/// If all elements have weight of 0 or less they all have equal chances to be picked.
/// </param>
/// <returns></returns>
public static T AnyOrDefault<T>(this IList<T> e, Func<T, double> weightSelector)
{
    if (e.Count < 1)
        return default(T);
    if (e.Count == 1)
        return e[0];
    var weights = e.Select(o => Math.Max(weightSelector(o), 0)).ToArray();
    var sum = weights.Sum(d => d);

    var rnd = new Random().NextDouble();
    for (int i = 0; i < weights.Length; i++)
    {
        //Normalize weight
        var w = sum == 0
            ? 1 / (double)e.Count
            : weights[i] / sum;
        if (rnd < w)
            return e[i];
        rnd -= w;
    }
    throw new Exception("Should not happen");
}

【Comments】:

【Solution 12】:

I have my own solution for this:

public class Randomizator3000 
{    
public class Item<T>
{
    public T value;
    public float weight;

    public static float GetTotalWeight<T>(Item<T>[] p_itens)
    {
        float __toReturn = 0;
        foreach(var item in p_itens)
        {
            __toReturn += item.weight;
        }

        return __toReturn;
    }
}

private static System.Random _randHolder;
private static System.Random _random
{
    get 
    {
        if(_randHolder == null)
            _randHolder = new System.Random();

        return _randHolder;
    }
}

public static T PickOne<T>(Item<T>[] p_itens)
{   
    if(p_itens == null || p_itens.Length == 0)
    {
        return default(T);
    }

    float __randomizedValue = (float)_random.NextDouble() * (Item<T>.GetTotalWeight(p_itens));
    float __adding = 0;
    for(int i = 0; i < p_itens.Length; i ++)
    {
        float __cacheValue = p_itens[i].weight + __adding;
        if(__randomizedValue <= __cacheValue)
        {
            return p_itens[i].value;
        }

        __adding = __cacheValue;
    }

    return p_itens[p_itens.Length - 1].value;

}
}

Using it should look like this (in Unity3d):

using UnityEngine;
using System.Collections;

public class teste : MonoBehaviour 
{
Randomizator3000.Item<string>[] lista;

void Start()
{
    lista = new Randomizator3000.Item<string>[10];
    lista[0] = new Randomizator3000.Item<string>();
    lista[0].weight = 10;
    lista[0].value = "a";

    lista[1] = new Randomizator3000.Item<string>();
    lista[1].weight = 10;
    lista[1].value = "b";

    lista[2] = new Randomizator3000.Item<string>();
    lista[2].weight = 10;
    lista[2].value = "c";

    lista[3] = new Randomizator3000.Item<string>();
    lista[3].weight = 10;
    lista[3].value = "d";

    lista[4] = new Randomizator3000.Item<string>();
    lista[4].weight = 10;
    lista[4].value = "e";

    lista[5] = new Randomizator3000.Item<string>();
    lista[5].weight = 10;
    lista[5].value = "f";

    lista[6] = new Randomizator3000.Item<string>();
    lista[6].weight = 10;
    lista[6].value = "g";

    lista[7] = new Randomizator3000.Item<string>();
    lista[7].weight = 10;
    lista[7].value = "h";

    lista[8] = new Randomizator3000.Item<string>();
    lista[8].weight = 10;
    lista[8].value = "i";

    lista[9] = new Randomizator3000.Item<string>();
    lista[9].weight = 10;
    lista[9].value = "j";
}


void Update () 
{
    Debug.Log(Randomizator3000.PickOne<string>(lista));
}
}

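In this example, each value has a 10% chance of being displayed in the debug log =3

【Comments】:

【Solution 13】:

Loosely based on Python's numpy.random.choice(a=items, p=probs), which takes an array and a probability array of the same size.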

    public T RandomChoice<T>(IEnumerable<T> a, IEnumerable<double> p)
    {
        IEnumerator<T> ae = a.GetEnumerator();
        Random random = new Random();
        double target = random.NextDouble();
        double accumulator = 0;
        foreach (var prob in p)
        {
            ae.MoveNext();
            accumulator += prob;
            if (accumulator > target)
            {
                break;
            }
        }
        return ae.Current;
    }

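The probability array p must sum to (approximately) 1. This is to keep it consistent with the numpy interface (and the mathematics), but you could easily change that if needed.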
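
If you would rather have the "must sum to 1" requirement checked than assumed, one option is to validate up front. The sketch below does that in Python rather than C#; the function name random_choice, the tolerance, and the injectable rng parameter are all choices made for this example, not part of numpy's interface.

```python
import math
import random

def random_choice(a, p, rng=random):
    """Pick one element of a using the matching probabilities in p."""
    a, p = list(a), list(p)
    if len(a) != len(p):
        raise ValueError("a and p must have the same length")
    if not math.isclose(sum(p), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    target = rng.random()
    accumulator = 0.0
    for item, prob in zip(a, p):
        accumulator += prob
        if accumulator > target:
            return item
    return a[-1]   # only reached if rounding leaves the sum just below target

print(random_choice(["a", "b", "c"], [0.7, 0.2, 0.1]))
```

Rejecting bad input early avoids silently skewing the distribution toward the last element when p falls short of 1.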

【Comments】: