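【Posted】:2015-09-10 06:07:51
【Question】:
Very important edit: all Ai are unique.
The problem
I have a list A of n unique objects. Each object Ai has a variable percentage Pi.
I want to create an algorithm that generates a new list B of k objects (k < n/2, and in most cases k is significantly smaller than n/2, e.g. n=231, k=21). List B should contain no duplicates and will be filled with objects from list A, subject to the following restriction:
The probability that object Ai appears in B is Pi.
What I have tried
(This code is in PHP for testing purposes only.) I first defined list A: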
$list = [
"A" => 2.5,
"B" => 2.5,
"C" => 2.5,
"D" => 2.5,
"E" => 2.5,
"F" => 2.5,
"G" => 2.5,
"H" => 2.5,
"I" => 5,
"J" => 5,
"K" => 2.5,
"L" => 2.5,
"M" => 2.5,
"N" => 2.5,
"O" => 2.5,
"P" => 2.5,
"Q" => 2.5,
"R" => 2.5,
"S" => 2.5,
"T" => 2.5,
"U" => 5,
"V" => 5,
"W" => 5,
"X" => 5,
"Y" => 5,
"Z" => 20
];
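At first I tried the following two algorithms (these are in PHP for testing purposes only):
$result = [];
while (count($result) < 10) {
    // Random percentage in [0, 100] with five decimal places
    $rnd = rand(0, 10000000) / 100000;
    $sum = 0;
    foreach ($list as $key => $value) {
        $sum += $value;
        if ($rnd <= $sum) {
            // Keep the pick only if it is new; a duplicate triggers a fresh draw
            if (!in_array($key, $result)) {
                $result[] = $key;
            }
            break;
        }
    }
}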
and
$result = [];
while (count($result) < 10) {
    // Total weight of the objects that are still available
    $sum = array_sum($list);
    $rnd = rand(0, (int) round($sum * 100000)) / 100000;
    $sum = 0;
    foreach ($list as $key => $value) {
        $sum += $value;
        if ($rnd <= $sum) {
            $result[] = $key;
            // Remove the picked object so it cannot be drawn again
            unset($list[$key]);
            break;
        }
    }
}
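The only difference between the two algorithms is that the first one tries again when it hits a duplicate, while the second one removes an object from list A as soon as it is picked. As it turns out, both algorithms produce the same probability output.
I ran the second algorithm 100,000 times and recorded how many times each letter was picked. The following array contains the probability, in percent, that a letter is picked in any list B, based on those 100,000 tests.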
[A] => 30.213
[B] => 29.865
[C] => 30.357
[D] => 30.198
[E] => 30.152
[F] => 30.472
[G] => 30.343
[H] => 30.011
[I] => 51.367
[J] => 51.683
[K] => 30.271
[L] => 30.197
[M] => 30.341
[N] => 30.15
[O] => 30.225
[P] => 30.135
[Q] => 30.406
[R] => 30.083
[S] => 30.251
[T] => 30.369
[U] => 51.671
[V] => 52.098
[W] => 51.772
[X] => 51.739
[Y] => 51.891
[Z] => 93.74
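The tally behind these numbers can be reproduced with a small harness. The following is only a sketch under stated assumptions: the function names `pickWithoutReplacement` and `inclusionPercentages` are hypothetical, and `mt_rand()` is used in place of `rand()`; the drawing logic otherwise follows the second algorithm.

```php
<?php
// Hypothetical helper: one run of the second algorithm
// (weighted draw, then removal of the picked object).
function pickWithoutReplacement(array $list, int $k): array
{
    $result = [];
    while (count($result) < $k && count($list) > 0) {
        // Uniform float in [0, total remaining weight]
        $rnd = mt_rand() / mt_getrandmax() * array_sum($list);
        $sum = 0;
        $picked = null;
        foreach ($list as $key => $value) {
            $sum += $value;
            $picked = $key; // falls back to the last key on rounding error
            if ($rnd <= $sum) {
                break;
            }
        }
        $result[] = $picked;
        unset($list[$picked]);
    }
    return $result;
}

// Hypothetical helper: percentage of runs in which each key ends up in B.
function inclusionPercentages(array $list, int $k, int $runs): array
{
    $counts = array_fill_keys(array_keys($list), 0);
    for ($i = 0; $i < $runs; $i++) {
        foreach (pickWithoutReplacement($list, $k) as $key) {
            $counts[$key]++;
        }
    }
    $percentages = [];
    foreach ($counts as $key => $count) {
        $percentages[$key] = 100 * $count / $runs;
    }
    return $percentages;
}
```

Calling `inclusionPercentages($list, 10, 100000)` with the list above should give figures of the same shape as the array shown, up to random variation.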
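Looking back at the algorithm, this makes sense. The algorithm incorrectly interprets the original percentage as the probability of picking an object at any given position, rather than the probability of the object appearing anywhere in list B. For example, in practice the chance of Z being picked in list B is 93%, whereas the chance of Z being picked at index Bn is 20%. That is not what I want. I want the chance of Z being picked in list B to be 20%.
Is this even possible? How can it be done?
Edit 1
I tried simply making the sum of all Pi equal to k. This works if all Pi are equal, but after I modified their values the results became more and more wrong.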
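The constraint tried in this edit follows from linearity of expectation. Write \pi_i for the probability that A_i ends up in B (a symbol introduced here only for illustration). Because B always holds exactly k objects:

```latex
\sum_{i=1}^{n} \pi_i
  \;=\; \sum_{i=1}^{n} \mathbb{E}\big[\mathbf{1}\{A_i \in B\}\big]
  \;=\; \mathbb{E}\big[\,|B|\,\big]
  \;=\; k
```

With k = 10, the targets expressed in percent therefore have to total about 1000. The first list totals only 100, so its percentages cannot all be met by a list of 10 objects.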
Initial probabilities
$list= [
"A" => 8.4615,
"B" => 68.4615,
"C" => 13.4615,
"D" => 63.4615,
"E" => 18.4615,
"F" => 58.4615,
"G" => 23.4615,
"H" => 53.4615,
"I" => 28.4615,
"J" => 48.4615,
"K" => 33.4615,
"L" => 43.4615,
"M" => 38.4615,
"N" => 38.4615,
"O" => 38.4615,
"P" => 38.4615,
"Q" => 38.4615,
"R" => 38.4615,
"S" => 38.4615,
"T" => 38.4615,
"U" => 38.4615,
"V" => 38.4615,
"W" => 38.4615,
"X" => 38.4615,
"Y" => 38.4615,
"Z" => 38.4615
];
Results after 10,000 runs
Array
(
[A] => 10.324
[B] => 59.298
[C] => 15.902
[D] => 56.299
[E] => 21.16
[F] => 53.621
[G] => 25.907
[H] => 50.163
[I] => 30.932
[J] => 47.114
[K] => 35.344
[L] => 43.175
[M] => 39.141
[N] => 39.127
[O] => 39.346
[P] => 39.364
[Q] => 39.501
[R] => 39.05
[S] => 39.555
[T] => 39.239
[U] => 39.283
[V] => 39.408
[W] => 39.317
[X] => 39.339
[Y] => 39.569
[Z] => 39.522
)
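For comparison, one standard technique from survey sampling that does meet prescribed per-object inclusion probabilities is systematic sampling. It is not the approach used in this question. The sketch below assumes the probabilities are given as fractions in [0, 1] and sum to the integer k; the function name `systematicSample` is hypothetical.

```php
<?php
// Sketch of systematic sampling (not the question's method).
// Assumptions: $probs maps key => inclusion probability in [0, 1],
// and the probabilities sum to the desired list size k.
function systematicSample(array $probs): array
{
    // Randomise the order so the same neighbours are not always paired.
    $keys = array_keys($probs);
    shuffle($keys);

    // One uniform offset in [0, 1); sampling points are u, u + 1, u + 2, ...
    $next = mt_rand() / (mt_getrandmax() + 1);

    $result = [];
    $cumulative = 0.0;
    foreach ($keys as $key) {
        $cumulative += $probs[$key];
        // An object is selected when a sampling point lands in its interval.
        // Since each probability is at most 1, no interval holds two points,
        // so no object can be selected twice.
        if ($next < $cumulative) {
            $result[] = $key;
            $next += 1.0;
        }
    }
    return $result;
}
```

Each object's interval has length equal to its probability, so the chance that a sampling point lands in it equals that probability. Only the individual inclusion probabilities are controlled this way; which objects appear together is not independent. Rounded inputs such as 0.384615 may sum to slightly less than k, in which case a run can occasionally return k - 1 objects.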
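【Comments】:
-
The probability that an object An appears in B is Pn. This is tricky, and I believe it is not what you want. Specifically, if k=n/2, at least half of the elements should have B_i>=1/2. -
@amit I'm pretty sure that is what I want, but I'm open to the possibility that I haven't described my goal correctly.
K != n/2 but rather K < n/2, and usually a lot less than n/2; look at the example numbers I gave above. I also don't understand what B_i means. -
The probability of generating "A" in the example is 2.5? In that case it is not a probability; a probability must be in the range
[0,1] -
@amit These are percentages, so 2.5% -> .025
-
OK, now I follow you, and yes, your terminology is correct. Thanks for the clarification, I will try to come up with an answer.
Tags: php algorithm probability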