Coding pattern for random percentage branching? Answers

【Question title】: Coding pattern for random percentage branching?
【Posted】: 2018-01-31 20:20:33
【Question】:

Let's say we have one code block that we want to execute 70% of the time and another one 30% of the time.

if(Math.random() < 0.7)
    70percentmethod();
else
    30percentmethod();

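Simple enough. But what if we want it to be easily expandable to, say, 30%/60%/10%? Here every change would mean adding and editing if statements throughout, which is awkward to use, slow, and error-prone.

So far I've found large switches to be decently useful for this use case, for example: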

switch(rand(0, 10)){
    case 0:
    case 1:
    case 2:
    case 3:
    case 4:
    case 5:
    case 6:
    case 7:70percentmethod();break;
    case 8:
    case 9:
    case 10:30percentmethod();break;
}

Which can very easily be changed to:

switch(rand(0, 10)){
    case 0:10percentmethod();break;
    case 1:
    case 2:
    case 3:
    case 4:
    case 5:
    case 6:
    case 7:60percentmethod();break;
    case 8:
    case 9:
    case 10:30percentmethod();break;
}

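But these have their drawbacks as well: they are cumbersome, and they split the range into a predetermined number of divisions.

Something ideal would be based on a "frequency number" system, I guess, like so: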

(1,a),(1,b),(2,c) -> 25% a, 25% b, 50% c

And if you added another one:

(1,a),(1,b),(2,c),(6,d) -> 10% a, 10% b, 20% c, 60% d

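So simply add up the numbers, treat the sum as 100%, and then split it proportionally.

I suppose it wouldn't be that much trouble to make a handler for it with a customized hashmap or something, but I'm wondering whether there's an established way/pattern or lambda for it before I go all spaghetti on this.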
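The arithmetic behind the "frequency number" idea can be sketched as follows. This is only an illustration: the class and method names (`FrequencyShares`, `toPercentages`) are made up for this example and are not part of any library.

```java
import java.util.Arrays;

// Hypothetical helper; the names are illustrative and not taken from the question.
final class FrequencyShares {
    /** Converts integer frequency numbers into percentages of their sum. */
    static double[] toPercentages(int... frequencies) {
        int total = Arrays.stream(frequencies).sum();
        if (total <= 0) {
            throw new IllegalArgumentException("frequencies must sum to a positive value");
        }
        double[] percentages = new double[frequencies.length];
        for (int i = 0; i < frequencies.length; i++) {
            percentages[i] = 100.0 * frequencies[i] / total;
        }
        return percentages;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toPercentages(1, 1, 2)));    // frequencies of a, b, c
        System.out.println(Arrays.toString(toPercentages(1, 1, 2, 6))); // frequencies of a, b, c, d
    }
}
```

Adding the entry with frequency 6 changes no existing entry; only the total, and with it every share, shifts.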

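【Comments】:

Not sure if you can do this with random numbers; it could be below or above 0.7 100% of the time, I guess.
@Viezevingertjes It depends on whether you want to guarantee that in 10 runs the two are run exactly 7 and 3 times, or just want them to have that probability of running.
Note that rand(0,10) gives 11 possible values, and your "60%" is actually 70%, for a total of 110%.
Next question: how do we code 60percent100percentMethod()?
This question is a duplicate of many others... people should really search before posting!

Tags: java design-patterns random

【Solution 1】:

EDIT: See the edit at the end for a more elegant solution. I'll leave this in, though.

You can use a NavigableMap to store these methods mapped to their percentages.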

NavigableMap<Double, Runnable> runnables = new TreeMap<>();

runnables.put(0.3, this::30PercentMethod);
runnables.put(1.0, this::70PercentMethod);

public static void runRandomly(Map<Double, Runnable> runnables) {
    // the map must iterate in ascending key order (e.g. a TreeMap)
    double percentage = Math.random();
    for (Map.Entry<Double, Runnable> entry : runnables.entrySet()) {
        // keys are cumulative upper bounds: run the first entry whose key exceeds the roll
        if (percentage < entry.getKey()) {
            entry.getValue().run();
            return; // make sure you only call one method
        }
    }
    throw new RuntimeException("map not filled properly for " + percentage);
}

// or, because I'm still practicing streams by using them for everything
public static void runRandomly(Map<Double, Runnable> runnables) {
    double percentage = Math.random();
    runnables.entrySet().stream()
        .filter(e -> percentage < e.getKey())
        .findFirst().orElseThrow(() -> 
                new RuntimeException("map not filled properly for " + percentage))
        .getValue()
        .run();
}

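A NavigableMap is sorted by its keys (a HashMap, for example, gives no guarantees about entry order), so you get the entries ordered by their percentages. This matters because two items (3,r1),(7,r2) result in the entries r1 = 0.3 and r2 = 1.0, and they need to be evaluated in that order (if they were evaluated in reverse order, the result would always be r2).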
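The ordering point can be illustrated with a small sketch. The helper below (`firstAbove`, a made-up name) takes the roll as a parameter instead of calling `Math.random()`, purely so the behaviour is reproducible; it is not the answer's own code.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: shows why cumulative keys must be scanned in ascending order.
final class CumulativeOrderDemo {
    /** Returns the value of the first entry, in the map's iteration order, whose key exceeds roll. */
    static String firstAbove(Map<Double, String> entries, double roll) {
        for (Map.Entry<Double, String> e : entries.entrySet()) {
            if (roll < e.getKey()) {
                return e.getValue();
            }
        }
        throw new IllegalStateException("no key above " + roll);
    }

    public static void main(String[] args) {
        NavigableMap<Double, String> map = new TreeMap<>();
        map.put(1.0, "r2"); // insertion order does not matter for a TreeMap
        map.put(0.3, "r1");
        System.out.println(firstAbove(map, 0.1));                 // ascending keys: r1
        System.out.println(firstAbove(map.descendingMap(), 0.1)); // descending keys: r2
    }
}
```

With descending keys the 1.0 entry is checked first and matches every roll in [0, 1), which is the "always r2" case described above.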

As for the splitting, it should go something like this. With a tuple class like this

static class Pair<X, Y>
{
    public Pair(X f, Y s)
    {
        first = f;
        second = s;
    }

    public final X first;
    public final Y second;
}

you can create a map like this

// the parameter contains the (1,m1), (1,m2), (3,m3) pairs
private static Map<Double,Runnable> splitToPercentageMap(Collection<Pair<Integer,Runnable>> runnables)
{

    // this adds all Runnables to lists of same int value,
    // overall those lists are sorted by that int (so least probable first)
    double total = 0;
    Map<Integer,List<Runnable>> byNumber = new TreeMap<>();
    for (Pair<Integer,Runnable> e : runnables)
    {
        total += e.first;
        List<Runnable> list = byNumber.getOrDefault(e.first, new ArrayList<>());
        list.add(e.second);
        byNumber.put(e.first, list);
    }

    Map<Double,Runnable> targetList = new TreeMap<>();
    double current = 0;
    for (Map.Entry<Integer,List<Runnable>> e : byNumber.entrySet())
    {
        for (Runnable r : e.getValue())
        {
            double percentage = (double) e.getKey() / total;
            current += percentage;
            targetList.put(current, r);
        }
    }

    return targetList;
}

All of this added to a class

class RandomRunner {
    private List<Pair<Integer, Runnable>> runnables = new ArrayList<>();
    public void add(int value, Runnable toRun) {
        runnables.add(new Pair<>(value, toRun));
    }
    public void remove(Runnable toRemove) {
        for (Iterator<Pair<Integer, Runnable>> r = runnables.iterator();
            r.hasNext(); ) {
            if (toRemove == r.next().second) {
               r.remove();
               break;
            }
        }
    }
    public void runRandomly() {
        // split list, use code from above
    }
}

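EDIT:
Actually, the above is what you get when an idea is stuck in your head and you don't question it properly. Keeping the RandomRunner class interface, this is much easier: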

class RandomRunner {
    List<Runnable> runnables = new ArrayList<>();
    public void add(int value, Runnable toRun) {
        // add the methods as often as their weight indicates.
        // this should be fine for smaller numbers;
        // if you get lists with millions of entries, optimize
        for (int i = 0; i < value; i++) {
            runnables.add(toRun);
        }
    }
    public void remove(Runnable r) {
        Iterator<Runnable> myRunnables = runnables.iterator();
        while (myRunnables.hasNext()) {
            if (myRunnables.next() == r) {
                myRunnables.remove();
            }
        }
    }
    public void runRandomly() {
        if (runnables.isEmpty()) return;
        // roll n-sided die
        int runIndex = ThreadLocalRandom.current().nextInt(0, runnables.size());
        runnables.get(runIndex).run();
    }
}
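The comment in `add` suggests optimizing if the list grows very large. One possible way to do that, not part of the answer itself, is to divide all weights by their greatest common divisor before expanding them, assuming every weight is known up front. The class name `WeightReducer` is hypothetical.

```java
// A sketch of one possible optimization: shrink the weights before expanding them into a list.
final class WeightReducer {
    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Divides all weights by their greatest common divisor; the ratios stay the same. */
    static int[] reduce(int... weights) {
        int divisor = 0;
        for (int w : weights) {
            if (w <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            divisor = gcd(divisor, w); // gcd(0, w) is w, so the first weight seeds the divisor
        }
        int[] reduced = new int[weights.length];
        for (int i = 0; i < weights.length; i++) {
            reduced[i] = weights[i] / divisor;
        }
        return reduced;
    }
}
```

For example, weights 300, 600 and 100 reduce to 3, 6 and 1, so the expanded list needs 10 entries instead of 1000. This does not help when the weights share no common factor.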

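【Comments】:

30PercentMethod and 70PercentMethod are not valid Java method names
@Michael You're right. I just reused the names the OP gave in his question.
I'm surprised this answer is so popular (no offence). If you have more methods to add to the map, the likelihood of each one occurring isn't actually obvious: you need to subtract from each value the one immediately preceding it. It also doesn't allow two methods to have the same weight.
It does allow multiple equally weighted outcomes, because the keys are cumulative probabilities. So if you want outcomes A, B, C with probabilities 0.25, 0.25, 0.5, you get (0.25,A), (0.5,B) and (1.0,C).
@Michael I'm a bit surprised myself. I've now added a simpler solution to the answer, which should address your concerns.

【Solution 2】:

All these answers seem quite complicated, so I'll just post the keep-it-simple alternative: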

double rnd = Math.random();
if((rnd -= 0.6) < 0)
    60percentmethod();
else if ((rnd -= 0.3) < 0)
    30percentmethod();
else
    10percentmethod();

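Changing one weight doesn't require touching the other lines, and you can easily see what happens without digging into auxiliary classes. A small downside is that it doesn't enforce that the percentages sum to 100%.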
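If the number of branches grows, the same subtraction trick can be written as a loop. The following is a minimal sketch under the assumption that the weights are passed as an array and the caller switches on the returned index; `SubtractingPicker` and `pick` are illustrative names.

```java
// Loop form of the "subtract each weight in turn" chain; names are hypothetical.
final class SubtractingPicker {
    /**
     * Returns the index of the branch selected by rnd, which is expected in [0, 1).
     * The last index acts like the final else: it catches whatever is left over.
     */
    static int pick(double rnd, double... weights) {
        if (weights.length == 0) {
            throw new IllegalArgumentException("at least one weight is required");
        }
        for (int i = 0; i < weights.length - 1; i++) {
            rnd -= weights[i];
            if (rnd < 0) {
                return i;
            }
        }
        return weights.length - 1;
    }
}
```

A call such as `SubtractingPicker.pick(Math.random(), 0.6, 0.3, 0.1)` would then return 0, 1 or 2 for the 60%, 30% and 10% branches. As with the if chain, nothing checks that the weights sum to 1.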

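【Comments】:

Why not if(rnd < 0.6)?
Having if(rnd < 0.6) would mean making the next if be if(rnd < 0.9), i.e. keeping track of the sum of the percentages of the earlier ifs. With only 3 or 4 options that's not a problem, but imagine you had 30 options and then changed the weight of the first one: you would have to change the value in every subsequent if statement. This way each weight is tied only to its own if statement, except of course for the final else.

【Solution 3】:

I'm not sure whether there is a common name for this, but I think I learned it as the wheel of fortune back in university.

It basically works just as you described: it receives a list of values and "frequency numbers", and one value is chosen according to the weighted probabilities.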

list = (1,a),(1,b),(2,c),(6,d)

total = list.sum()
rnd = random(0, total)
sum = 0
for i from 0 to list.size():
    sum += list[i]
    if sum >= rnd:
        return list[i]
return list.last()

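The list can be a function parameter if you want to generalize this.

This also works with floating point numbers, and the numbers don't have to be normalized. If you do normalize them (so that they sum to 1, for example), you can skip the list.sum() part.

EDIT:

Due to demand, here is an actual compiling Java implementation and usage example: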

import java.util.ArrayList;
import java.util.Random;

public class RandomWheel<T>
{
  private static final class RandomWheelSection<T>
  {
    public double weight;
    public T value;

    public RandomWheelSection(double weight, T value)
    {
      this.weight = weight;
      this.value = value;
    }
  }

  private ArrayList<RandomWheelSection<T>> sections = new ArrayList<>();
  private double totalWeight = 0;
  private Random random = new Random();

  public void addWheelSection(double weight, T value)
  {
    sections.add(new RandomWheelSection<T>(weight, value));
    totalWeight += weight;
  }

  public T draw()
  {
    double rnd = totalWeight * random.nextDouble();

    double sum = 0;
    for (int i = 0; i < sections.size(); i++)
    {
      sum += sections.get(i).weight;
      if (sum >= rnd)
        return sections.get(i).value;
    }
    return sections.get(sections.size() - 1).value;
  }

  public static void main(String[] args)
  {
    RandomWheel<String> wheel = new RandomWheel<String>();
    wheel.addWheelSection(1, "a");
    wheel.addWheelSection(1, "b");
    wheel.addWheelSection(2, "c");
    wheel.addWheelSection(6, "d");

    for (int i = 0; i < 100; i++)
        System.out.print(wheel.draw());
  }
}

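【Comments】:

True, but this is a more general problem. I'm sure you know how to implement it in Java...
Cool, good point about floats. It would be nice to have the option of setting values below 1 for super-low-chance branches. It's not quite what I was after, though; the backend is simple. I'm more interested in how to link it to the part where you actually build the list in an efficient way.
I'm sure Java programmers can read pseudocode and translate it into the required SingletonRunnerFactory calls.
@Michael: Then let me tell you one: the person writing the answer may not be entirely sure of the idiom, but has a good solution that the OP (or anyone else) can use. And of course: a programmer who cannot understand pseudocode, or a piece of code in a language different from but very similar to the one they currently use, is not a programmer. Really.
@HongOoi Care to elaborate on what you think is missing?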

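【Solution 4】:

While the selected answer works, it is unfortunately asymptotically slow for your use case. Instead, you could use something called Alias Sampling. Alias sampling (or the alias method) is a technique for selecting elements from a weighted distribution. If the weights of those elements don't change, you can do each selection in O(1) time! If they do change, you can still get amortized O(1) time as long as the ratio between the number of selections you make and the number of changes you make to the alias table (changing the weights) is high. The currently selected answer suggests an O(N) algorithm; the next best thing is O(log(N)) given sorted probabilities and binary search, but nothing is going to beat the O(1) time I suggested.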
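For comparison, the O(log(N)) alternative mentioned above could look roughly like this. It is a sketch under the assumption that all weights are strictly positive and that the roll lies in [0, total); the class and method names are invented for the example.

```java
import java.util.Arrays;

// Sketch of the O(log N) variant: binary search over running totals.
final class CumulativeSearch {
    /** Builds running totals, e.g. {1, 1, 2, 6} becomes {1, 2, 4, 10}. */
    static double[] cumulative(double... weights) {
        double[] sums = new double[weights.length];
        double running = 0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            sums[i] = running;
        }
        return sums;
    }

    /** Finds the first index whose running total is strictly greater than roll. */
    static int indexFor(double[] sums, double roll) {
        int pos = Arrays.binarySearch(sums, roll);
        // Exact hit: roll sits on a boundary and belongs to the next interval.
        // Miss: binarySearch returns -(insertionPoint) - 1.
        int index = pos >= 0 ? pos + 1 : -pos - 1;
        return Math.min(index, sums.length - 1); // guard against a roll at or above the total
    }
}
```

The running totals are computed once; each draw then costs one `Arrays.binarySearch` call with a roll such as `Math.random() * sums[sums.length - 1]`.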

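This site provides a good overview of the alias method that is mostly language agnostic. Essentially, you create a table in which each entry represents the outcome of two probabilities. Each entry has a single threshold: below the threshold you get one value, above it you get the other. You spread the larger probabilities across multiple table entries so that all probabilities combined form a probability graph with an area of one.

Say you have the probabilities A, B, C and D, with the values 0.1, 0.1, 0.1 and 0.7 respectively. The alias method would spread the probability of 0.7 over all the others. One index corresponds to each probability, so the indices of A, B and C each hold 0.1 of their own plus 0.15 of D, and D's index holds 0.25. You then normalize within each index, so that A's index gives a 0.4 chance of getting A and a 0.6 chance of getting D (0.1/(0.1 + 0.15) and 0.15/(0.1 + 0.15) respectively), the same holds for B's and C's indices, and D's index gives a 100% chance of getting D (0.25/0.25 is 1).

Given an unbiased uniform PRNG (Math.random()) for indexing, every index is equally likely to be chosen, and a coin flip within the chosen index then provides the weighted probability. You have a 25% chance of landing on A's index, but within it only a 40% chance of picking A and a 60% chance of picking D. 0.40 * 0.25 = 0.1, our original probability, and if you add up all of D's probability spread across the other indices, you get 0.70 again.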
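The arithmetic of this example can be checked mechanically. The sketch below hard-codes the table described above (thresholds 0.4, 0.4, 0.4, 1.0, with D as the alias everywhere) and recomputes the overall probability of each outcome; `AliasTableCheck` is a made-up name and this is not the sampler itself.

```java
// Verifies that an alias table reproduces the intended probabilities.
final class AliasTableCheck {
    /**
     * Every slot is hit with probability 1/n, keeps its own outcome with
     * probability threshold[slot], and otherwise yields alias[slot].
     */
    static double[] impliedProbabilities(double[] threshold, int[] alias) {
        int n = threshold.length;
        double[] result = new double[n];
        for (int slot = 0; slot < n; slot++) {
            result[slot] += threshold[slot] / n;
            result[alias[slot]] += (1.0 - threshold[slot]) / n;
        }
        return result;
    }

    public static void main(String[] args) {
        // Slots A, B, C, D: the first three keep themselves 40% of the time
        // and fall through to D (index 3) otherwise; D's slot always yields D.
        double[] threshold = {0.4, 0.4, 0.4, 1.0};
        int[] alias = {3, 3, 3, 3};
        double[] p = impliedProbabilities(threshold, alias);
        System.out.println(java.util.Arrays.toString(p));
    }
}
```

Up to floating-point rounding, the result is 0.1 for A, B and C, and 0.25 + 3 * 0.15 = 0.7 for D.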

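So to make a random selection, you only need to generate a random index from 0 to N (exclusive) and then do a coin flip. No matter how many items you add, this is very fast and has constant cost. Building an alias table doesn't take that many lines of code either: my python version takes 80 lines including import statements and line breaks, and the version presented in the Pandas article is similarly sized (and it's C++).

For your java implementation, you could map from the sampled array list index to the function you have to execute, creating an array of functions that are executed as you index into it. Alternatively, you could use function objects (functors) that have a method through which you pass in the parameters for execution.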

ArrayList<(YourFunctionObject)> function_list;
// add functions
AliasSampler aliassampler = new AliasSampler(listOfProbabilities);
// somewhere later with some type T and some parameter values. 
int index = aliassampler.sampleIndex();
T result = function_list.get(index).apply(parameters);
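A compilable version of the "list of functions" idea might look like the sketch below. The three lambdas are placeholder branches, and the index is passed in as a parameter where the snippet above would use the value returned by `aliassampler.sampleIndex()`; all names here are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Dispatches to one of several functions by a sampled index.
final class FunctionDispatch {
    // Hypothetical branches; each takes a parameter and returns a result.
    static final List<Function<Integer, String>> FUNCTIONS = Arrays.asList(
            n -> "a" + n,
            n -> "b" + n,
            n -> "c" + n);

    /** Runs the function stored at the sampled index with the given parameter. */
    static String dispatch(int sampledIndex, int parameter) {
        return FUNCTIONS.get(sampledIndex).apply(parameter);
    }
}
```

The list order has to match the order of the probabilities handed to the sampler, since the index is the only link between the two.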

EDIT:

I've created a java version of the AliasSampler as a class; it exposes the sampleIndex method and should be usable as shown above.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Random;

public class AliasSampler {
    private ArrayList<Double> binaryProbabilityArray;
    private ArrayList<Integer> aliasIndexList;
    AliasSampler(ArrayList<Double> probabilities){
        // java 8 needed here; compare with a tolerance, since a floating-point sum is rarely exactly 1.0
        assert(Math.abs(probabilities.stream().mapToDouble(Double::doubleValue).sum() - 1.0) < 1e-9);
        int n = probabilities.size();
        // probabilityArray is the list of probabilities, this is the incoming probabilities scaled
        // by the number of probabilities.  This allows us to figure out which probabilities need to be spread 
        // to others since they are too large, ie [0.1 0.1 0.1 0.7] = [0.4 0.4 0.4 2.80]
        ArrayList<Double> probabilityArray = new ArrayList<Double>();
        for(Double probability : probabilities){
            probabilityArray.add(probability * n);
        }
        binaryProbabilityArray = new ArrayList<Double>(Collections.nCopies(n, 0.0));
        aliasIndexList = new ArrayList<Integer>(Collections.nCopies(n, 0));
        ArrayList<Integer> lessThanOneIndexList = new ArrayList<Integer>();
        ArrayList<Integer> greaterThanOneIndexList = new ArrayList<Integer>();
        for(int index = 0; index < probabilityArray.size(); index++){
            double probability = probabilityArray.get(index);
            if(probability < 1.0){
                lessThanOneIndexList.add(index);
            }
            else{
                greaterThanOneIndexList.add(index);
            }
        }

        // while we still have indices to check for in each list, we attempt to spread the probability of those larger
        // what this ends up doing in our first example is taking greater than one elements (2.80) and removing 0.6, 
        // and spreading it to different indices, so (((2.80 - 0.6) - 0.6) - 0.6) will equal 1.0, and the rest will
        // be 0.4 + 0.6 = 1.0 as well. 
        while(lessThanOneIndexList.size() != 0 && greaterThanOneIndexList.size() != 0){
            //https://stackoverflow.com/questions/16987727/removing-last-object-of-arraylist-in-java
            // last element removal is equivalent to pop, java does this in constant time
            int lessThanOneIndex = lessThanOneIndexList.remove(lessThanOneIndexList.size() - 1);
            int greaterThanOneIndex = greaterThanOneIndexList.remove(greaterThanOneIndexList.size() - 1);
            double probabilityLessThanOne = probabilityArray.get(lessThanOneIndex);
            binaryProbabilityArray.set(lessThanOneIndex, probabilityLessThanOne);
            aliasIndexList.set(lessThanOneIndex, greaterThanOneIndex);
            probabilityArray.set(greaterThanOneIndex, probabilityArray.get(greaterThanOneIndex) + probabilityLessThanOne - 1);
            if(probabilityArray.get(greaterThanOneIndex) < 1){
                lessThanOneIndexList.add(greaterThanOneIndex);
            }
            else{
                greaterThanOneIndexList.add(greaterThanOneIndex);
            }
        }
        //if there are any probabilities left in either index list, they can't be spread across the other 
        //indices, so they are set with probability 1.0. They still have the probabilities they should at this step, it works out mathematically.
        while(greaterThanOneIndexList.size() != 0){
            int greaterThanOneIndex = greaterThanOneIndexList.remove(greaterThanOneIndexList.size() - 1);
            binaryProbabilityArray.set(greaterThanOneIndex, 1.0);
        }
        while(lessThanOneIndexList.size() != 0){
            int lessThanOneIndex = lessThanOneIndexList.remove(lessThanOneIndexList.size() - 1);
            binaryProbabilityArray.set(lessThanOneIndex, 1.0);
        }
    }
    public int sampleIndex(){
        int index = new Random().nextInt(binaryProbabilityArray.size());
        double r = Math.random();
        if( r < binaryProbabilityArray.get(index)){
            return index;
        }
        else{
            return aliasIndexList.get(index);
        }
    }

}

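【Comments】:

@Michael While I would normally agree with that principle, in this case I think actually implementing the alias table is simple and no big deal, and its usage is actually much simpler than that of the marked answer. Additionally, the question asks for what he calls the "de facto coding pattern", and I think the alias method is that coding pattern. So while the top answer gives a simple solution, it is not the standard for this kind of problem. I'd liken it to using an array when you should be using a linked list or a hash table.
@Michael To avoid sounding defensive, I should reiterate that I agree in principle. All in all, the biggest reason I don't think this counts as premature optimization is that I consider the alias method to be the standard coding pattern the OP appears to be asking for.
@Michael Additionally, my method is guaranteed to be at least as fast as the example you provided, even in its very rare best case: Math.random() for the index, compare the value, return the index, execute based on the index. That's it. In any case where you would have to do multiple iterations for the search, it will always be faster. This isn't some theoretical Fibonacci heap that only gets better for a large enough N because its constant cost is high, and you can prove this to yourself by understanding how it works.
@Micheal, the need to explain something doesn't invalidate it on any level. Hash tables are hard to explain, but I doubt you'd make the same judgement about their usage. Also, I explained the algorithm here, so you don't even need to look at the article, and the python code I listed isn't hard to understand; I simply can't understand your confusion in this regard. Like I said, 23 versus 35? Not that much smaller. Likewise, the performance gain isn't trivial: it is several orders of magnitude, since it's O(1) with a roughly constant cost, and the OP is undoubtedly sampling in a loop as well.
@Michael I disagree with the idea that needing to explain something makes it less simple, and I also disagree that we don't know this is a performance problem. Also, you declared yourself "done" before I had even replied to you.

【Solution 5】:

You could compute the cumulative probability for each class, pick a random number from [0; 1) and see where that number falls.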

class WeightedRandomPicker {

    private static Random random = new Random();

    public static int choose(double[] probabilties) {
        double randomVal = random.nextDouble();
        double cumulativeProbability = 0;
        for (int i = 0; i < probabilties.length; ++i) {
            cumulativeProbability += probabilties[i];
            if (randomVal < cumulativeProbability) {
                return i;
            }
        }
        return probabilties.length - 1; // to account for numerical errors
    }

    public static void main (String[] args) {
        double[] probabilties = new double[]{0.1, 0.1, 0.2, 0.6}; // the final value is optional
        for (int i = 0; i < 20; ++i) {
            System.out.printf("%d\n", choose(probabilties));
        }
    }
}

【Comments】:

【Solution 6】:

The following is a bit like @daniu's answer, but makes use of the methods provided by TreeMap:

private final NavigableMap<Double, Runnable> map = new TreeMap<>();
{
    map.put(0.3d, this::branch30Percent);
    map.put(1.0d, this::branch70Percent);
}
private final SecureRandom random = new SecureRandom();

private void branch30Percent() {}

private void branch70Percent() {}

public void runRandomly() {
    final Runnable value = map.tailMap(random.nextDouble(), true).firstEntry().getValue();
    value.run();
}

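This way there is no need to iterate over the map until the matching entry is found; instead, TreeMap's ability to find an entry by comparing its key with a given key is used. This only makes a difference, however, if the number of entries in the map is large. It does save a few lines of code, though.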
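As a hedged variation on the same idea, the cumulative keys can be derived from integer weights, and `higherEntry` can replace `tailMap(...).firstEntry()`. The sketch assumes strictly positive weights and a roll in [0, 1); the roll is a parameter here only to keep the example reproducible, and the class and method names are illustrative.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Builds a cumulative TreeMap from weights and looks up a branch by key comparison.
final class TreeMapLookup {
    /** Weights {3, 7} with values {r1, r2} produce the keys 0.3 -> r1 and 1.0 -> r2. */
    static <T> NavigableMap<Double, T> cumulativeMap(int[] weights, List<T> values) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        NavigableMap<Double, T> map = new TreeMap<>();
        int running = 0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            // Dividing the integer running sum keeps the last key at exactly 1.0.
            map.put((double) running / total, values.get(i));
        }
        return map;
    }

    /** Picks the entry with the smallest key strictly greater than roll. */
    static <T> T pick(NavigableMap<Double, T> map, double roll) {
        return map.higherEntry(roll).getValue();
    }
}
```

`higherEntry` treats each key as an exclusive upper bound, so a roll of exactly 0.3 already falls into the second branch, whereas `tailMap(roll, true)` would still select the first. For random doubles the difference is negligible in practice.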

【Comments】:

【Solution 7】:

I'd do it something like this:

class RandomMethod {
    private final Runnable method;
    private final int probability;

    RandomMethod(Runnable method, int probability){
        this.method = method;
        this.probability = probability;
    }

    public int getProbability() { return probability; }
    public void run()      { method.run(); }
}

class MethodChooser {
    private final List<RandomMethod> methods;
    private final int total;

    MethodChooser(final List<RandomMethod> methods) {
        this.methods = methods;
        this.total = methods.stream().collect(
            Collectors.summingInt(RandomMethod::getProbability)
        );
    }

    public void chooseMethod() {
        final Random random = new Random();
        final int choice = random.nextInt(total);

        int count = 0;
        for (final RandomMethod method : methods)
        {
            count += method.getProbability();
            if (choice < count) {
                method.run();
                return;
            }
        }
    }
}

Example usage:

MethodChooser chooser = new MethodChooser(Arrays.asList(
    new RandomMethod(Blah::aaa, 1),
    new RandomMethod(Blah::bbb, 3),
    new RandomMethod(Blah::ccc, 1)
));

IntStream.range(0, 100).forEach(
    i -> chooser.chooseMethod()
);

Run it here.
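To see why the cumulative scan in chooseMethod honours the weights, one can enumerate every value that `random.nextInt(total)` could return and count which entry each value selects. The sketch below does that with plain integer weights; `SlotEnumeration` is a hypothetical name and the loop mirrors, rather than reuses, the answer's code.

```java
// Enumerates all possible draws and counts which index the cumulative scan selects.
final class SlotEnumeration {
    static int[] hitsPerIndex(int... weights) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        int[] hits = new int[weights.length];
        for (int choice = 0; choice < total; choice++) {
            int count = 0;
            for (int i = 0; i < weights.length; i++) {
                count += weights[i];
                if (choice < count) { // same comparison as in chooseMethod
                    hits[i]++;
                    break;
                }
            }
        }
        return hits;
    }
}
```

For the weights 1, 3, 1 from the usage example, the five possible draws map to the three entries 1, 3 and 1 times respectively, so a uniform draw selects them with probabilities 20%, 60% and 20%.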

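【Comments】:

It's kind of funny to see all your complaining on the other answers, like "not Java" and such, all in a passive-aggressive tone, and then to see your answer here, which doesn't even include human language. At least I don't see any explanation of your code, the algorithm you use, its pros and cons, and so on. Instead of wasting more time criticising other answers, which are actually pretty good, why not improve your own first? (PS: differencebetween.com/…)
@SebastianMach Nothing was said in a passive-aggressive tone, but if you interpret it that way, I can't help that. I tend to keep my comments as succinct as possible. Everything I posted was constructive criticism about how I think their answers could be improved. Nobody is immune from criticism - there's no need to get upset and take it personally.
@SebastianMach Now, I think your last link slightly undermined itself, but yes, you are correct, and I've fixed it
Personally, I'm always thankful to those who undermine my own links. English is not my native language, and there is a non-zero probability that "chance" would have been my first choice, too.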