Python Random List Comprehension: Answers

【Question title】: Python Random List Comprehension
【Posted】: 2013-10-16 03:38:19
【Question description】:

I have a list like this:

[1 2 1 4 5 2 3 2 4 5 3 1 4 2]

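I would like to create a list of x random elements from this list where none of the chosen elements are the same. The difficult part is that I would like to do this with a list comprehension... So, if x = 3, possible results would be: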

[1 2 3]
[2 4 5]
[3 1 4]
[4 5 1]

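etc...

Thanks!

I should have specified that I cannot convert the list to a set. Sorry! I need the randomly selected numbers to be weighted. So if 1 appears 4 times in the list and 3 appears 2 times in the list, then 1 is twice as likely to be selected...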

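【Question discussion】:

Have you come up with a way to do it without a list comprehension?
Have you considered using a set?
You need to clarify the question: is a result like [1, 2, 1] OK? In other words, a sublist with two identical values (1 in this case).
...none of the chosen elements are the same...
@FMc What else could that particular clause mean?

Tags: python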

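【Solution 1】:

Disclaimer: the "use a list comprehension" requirement is absurd.

Moreover, if you want to use the weights, Eli Bendersky's page on weighted random sampling lists many excellent approaches.

The following is inefficient, doesn't scale, and so on.

That said, it has not one but two (two!) list comprehensions, returns a list, never duplicates elements, and respects the weights in a certain sense: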

>>> import itertools, random
>>> s = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[3, 1, 2]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[5, 3, 4]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[1, 5, 2]

.. or, as simplified by FMc:

>>> [x for x in random.choice([p for p in itertools.permutations(s, 3) if len(set(p)) == 3])]
[3, 5, 2]

(I'll leave the x for x in there, even though it hurts not to simply write list(random.choice(..)) or just leave the result as a tuple..)
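One way to see in what sense the one-liner respects the weights is to count how often each value leads the candidate tuples that random.choice draws from. The following is a minimal sketch written for Python 3, not part of the original answer; the helper name first_value_counts is made up for illustration.

```python
import collections
import itertools

s = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]


def first_value_counts(values, size):
    """Count, per value, the candidate tuples that start with it.

    Candidates are the same ones the one-liner builds: ordered picks of
    `size` list positions whose values are all distinct.
    """
    counts = collections.Counter()
    for p in itertools.permutations(values, size):
        if len(set(p)) == size:
            counts[p[0]] += 1
    return counts


counts = first_value_counts(s, 3)
for value in sorted(counts):
    print(value, counts[value])
```

Since random.choice picks uniformly among the candidates, a value that occurs more often in s leads more candidates and is therefore more likely to come first. The unfiltered candidate list already has 14 * 13 * 12 = 2184 tuples for this short input, which is why the approach does not scale.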

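【Discussion】:

Clever answer to a crazy question. Maybe you were just trying to pad your list comprehension stats, but wouldn't itertools.permutations(s, 3) be sufficient? There is no need for both combinations and permutations.
@FMc: Nah, I just realized at the last second that combinations wasn't enough and threw permutations in without thinking about whether I needed both. Good call. :^)
This is a nice answer, but I still wonder whether it answers OP's question. What if, instead of a list of integers, we had a list of custom objects whose equality is based on object ID? You couldn't guarantee that you wouldn't pull the same object twice in a row. I picture this as a bag of marbles, each with a value. You randomly pull out 3 marbles at a time until none are left, possibly with a restriction that certain marbles can't be pulled together... What you're doing is putting the 3 marbles you just pulled back into the bag and pulling again.
@mgilson: When I first read OP's question I didn't think it was ambiguous, but after all the points you and others have raised, I no longer know what's going on. :^) You're right that I treated it as drawing coins from a jar until you get three different ones.
I don't think any of us know what's going on. We're all just guessing as best we can...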

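【Solution 2】:

In general, you don't want to do this sort of thing in a list comprehension, because it makes the code much harder to read. However, if you really must, we can write a completely horrible one-liner: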

>>> import random
>>> values = [random.randint(0,10) for _ in xrange(12)]
>>> values
[1, 10, 6, 6, 3, 9, 0, 1, 8, 9, 1, 2]
>>> # This is the 1 liner -- The other line was just getting us a list to work with.
>>> [(lambda x=random.sample(values,3):any(values.remove(z) for z in x) or x)() for _ in xrange(4)]
[[6, 1, 8], [1, 6, 10], [1, 0, 2], [9, 3, 9]]

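Please never use this code. I'm only posting it for fun/academic reasons.

Here's how it works:

I create a function inside the list comprehension whose default argument is 3 randomly selected elements from the input list. Inside the function, I remove those elements from values so that they can't be picked again. Since list.remove returns None, I can use any(lst.remove(x) for x in ...) to remove the values and return False. Since any returns False, we hit the or clause, which just returns x (the default value holding the 3 randomly selected items). All that's left is to call the function and let the magic happen.

The one catch here is that you need to make sure the number of groups you request (here I chose 4) multiplied by the number of items per group (here I chose 3) is less than or equal to the number of values in the input list. This may seem obvious, but it's probably worth mentioning anyway...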
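Unrolled into ordinary statements, the one-liner does roughly the following. This is a Python 3 sketch for illustration only; the function name draw_groups is hypothetical, and unlike the one-liner it works on a copy so the caller's list is left intact.

```python
import random


def draw_groups(values, group_size, num_groups):
    """Draw `num_groups` groups of `group_size` items without reusing an item."""
    pool = list(values)  # copy, so the input list is not consumed
    if group_size * num_groups > len(pool):
        raise ValueError("not enough items for the requested groups")
    groups = []
    for _ in range(num_groups):
        group = random.sample(pool, group_size)  # the lambda's default argument
        for item in group:
            pool.remove(item)  # the any(values.remove(z) ...) side effect
        groups.append(group)
    return groups


values = [1, 10, 6, 6, 3, 9, 0, 1, 8, 9, 1, 2]
print(draw_groups(values, 3, 4))
```

The explicit size check is the caveat about groups times items per group, turned into an error instead of a silent failure.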

Here's another version where I pull shuffle into the list comprehension:

>>> import random
>>> from random import shuffle
>>> lst = [random.randint(0,10) for _ in xrange(12)]
>>> lst
[3, 5, 10, 9, 10, 1, 6, 10, 4, 3, 6, 5]
>>> [lst[i*3:i*3+3] for i in xrange(shuffle(lst) or 4)]
[[6, 10, 6], [3, 4, 10], [1, 3, 5], [9, 10, 5]]

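This is significantly better than my first attempt. However, most people would still need to stop and scratch their heads before figuring out what this code is doing. I still maintain that it would be much better to do this in multiple lines.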
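For comparison, a multi-line version of the same shuffle-and-slice idea might look like this. It is a sketch in Python 3, and chunk_shuffled is a made-up name rather than anything from the original answer.

```python
import random


def chunk_shuffled(items, size, count):
    """Shuffle a copy of `items`, then cut off `count` consecutive chunks."""
    pool = list(items)
    random.shuffle(pool)  # shuffle returns None and reorders pool in place
    return [pool[i * size:(i + 1) * size] for i in range(count)]


lst = [3, 5, 10, 9, 10, 1, 6, 10, 4, 3, 6, 5]
print(chunk_shuffled(lst, 3, 4))
```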

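【Discussion】:

Could you please also post the readable version that should be used?
@Asad -- There is no readable version (that I know of) consisting only of a list comprehension, as OP requested. If you drop the "list comprehension only" clause, the other answers provide perfectly good readable versions that I would consider using to solve this problem, so I don't really feel the need to duplicate that work..
@mgilson, it's pretty easy to fold the shuffle into the list comprehension if you really want to.
If Asad's interpretation is right, which is also how I read it at first, then the list comprehension requirement is simply absurd. That was my thought in my first comment on the question.
shuffle returns None, so xrange(shuffle(...) or 4) would work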

【Solution 3】:

If I'm understanding your question properly, this should work:

def weighted_sample(L, x):
    # might consider raising some kind of exception if len(set(L)) < x

    while True:
        ans = random.sample(L, x)
        if len(set(ans)) == x:
            return ans

If you want many such samples, you can do:

[weighted_sample(L, x) for _ in range(num_samples)]
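
A guarded variant of the rejection loop could fail fast when there are too few distinct values and give up after a fixed number of attempts. This is only a sketch in Python 3: the name bounded_weighted_sample and the max_attempts parameter are assumptions, not part of the answer.

```python
import random


def bounded_weighted_sample(L, x, max_attempts=1000):
    """Retry random.sample until all x picks differ, within a fixed budget."""
    if len(set(L)) < x:
        raise ValueError("fewer than %d distinct values" % x)
    for _ in range(max_attempts):
        ans = random.sample(L, x)
        if len(set(ans)) == x:
            return ans
    raise RuntimeError("no duplicate-free sample in %d attempts" % max_attempts)


L = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
print(bounded_weighted_sample(L, 3))
```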

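I have a hard time imagining a comprehension for the sampling logic that isn't simply obfuscated. The logic is a bit too complicated for that. To me this sounds like a requirement arbitrarily tacked on to a homework assignment.

If you don't like infinite loops, I haven't tried it, but I think this will work: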

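import collections
import random

def weighted_sample(L, x):

    ans = []
    c = collections.Counter(L)  # value -> remaining weight

    while len(ans) < x:
        # randrange excludes the upper bound, so 0 <= r < total weight
        r = random.randrange(sum(c.values()))
        for k in c:
            if r < c[k]:
                ans.append(k)
                del c[k]  # safe: we break before the iteration continues
                break
            else:
                r -= c[k]
        else:
            # this should never happen on valid input
            raise RuntimeError("no value was selected")

    return ans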

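【Discussion】:

I'm not sure whether this introduces bias, but when there are duplicates, a more efficient approach than discarding the whole random selection would be to simply pick a random value from those not already in the set. So far this seems to be the only answer here that actually meets OP's requirements.
Yeah, the problem is preserving the original weights. I'm working on a better one. :)
+1 I guess o.o I didn't realize the frequencies had to be weighted. I've deleted my answer.
Yeah. I would at least put an upper bound on the loop and throw an exception if you spend too long trying to get a non-duplicate selection. The best approach would be to build a probability density function for the values based on their frequency in the array and use that to pick a value, which would make it linear time. However, I couldn't find anything built in for that.
The infinite loop is definitely dangerous if there is no valid solution (i.e. the original list has fewer than x distinct values)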

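【Solution 4】:

First of all, I hope your list is actually something like

[1,2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]

So, if you want to print the permutations of size 3 from the given list, you can do the following.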

import itertools

l = [1,2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]

for permutation in itertools.permutations(list(set(l)),3):
    print permutation,

Output:

(1, 2, 3) (1, 2, 4) (1, 2, 5) (1, 3, 2) (1, 3, 4) (1, 3, 5) (1, 4, 2) (1, 4, 3) (1, 4, 5) (1, 5, 2) (1, 5, 3) (1, 5, 4) (2, 1, 3) (2, 1, 4) (2, 1, 5) (2, 3, 1) (2, 3, 4) (2, 3, 5) (2, 4, 1) (2, 4, 3) (2, 4, 5) (2, 5, 1) (2, 5, 3) (2, 5, 4) (3, 1, 2) (3, 1, 4) (3, 1, 5) (3, 2, 1) (3, 2, 4) (3, 2, 5) (3, 4, 1) (3, 4, 2) (3, 4, 5) (3, 5, 1) (3, 5, 2) (3, 5, 4) (4, 1, 2) (4, 1, 3) (4, 1, 5) (4, 2, 1) (4, 2, 3) (4, 2, 5) (4, 3, 1) (4, 3, 2) (4, 3, 5) (4, 5, 1) (4, 5, 2) (4, 5, 3) (5, 1, 2) (5, 1, 3) (5, 1, 4) (5, 2, 1) (5, 2, 3) (5, 2, 4) (5, 3, 1) (5, 3, 2) (5, 3, 4) (5, 4, 1) (5, 4, 2) (5, 4, 3)

Hope this helps. :)

【Discussion】:

【Solution 5】:

>>> from random import shuffle
>>> L = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
>>> x=3
>>> shuffle(L)
>>> zip(*[L[i::x] for i in range(x)])
[(1, 3, 2), (2, 2, 1), (4, 5, 3), (1, 4, 4)]

You can also use a generator expression instead of the list comprehension

>>> zip(*(L[i::x] for i in range(x)))
[(1, 3, 2), (2, 2, 1), (4, 5, 3), (1, 4, 4)]
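
To see what the zip expression does, it helps to run it on an unshuffled list. The sketch below is written for Python 3, where zip returns an iterator and needs to be wrapped in list; the demo list is made up for illustration.

```python
demo = list(range(14))  # stand-in for the shuffled list, same length as L
x = 3

# demo[i::x] takes every x-th item starting at offset i; zipping the x
# strided slices back together yields consecutive groups of x items.
groups = list(zip(*[demo[i::x] for i in range(x)]))
print(groups)
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)]
```

zip stops at the shortest slice, so the two leftover items (12 and 13) are dropped, which is why the 14-item list above yields only four groups.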

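【Discussion】:

Your output has duplicate values, e.g. (2, 2, 1), (1, 4, 4).
@Asad -- That's OK. The constraint is that if the input list has 20 items, then after you've picked all 20 of them you have the same distribution of numbers as in the input, just grouped randomly now.
@Asad. The question is unclear. Most of the answers seem to interpret the requirement as no duplicates relative to the original list
@mgilson No, it isn't. Quoting OP: "I would like to create a list of x random elements from this list where none of the chosen elements are the same".
@Asad -- You misunderstand. Notice that even OP's example results contain multiple 5s and 2s. Of course, they're spread across different sublists. The point is that each element (thinking of element IDs, assuming all elements are unique, rather than element values) is picked from the list once.

【Solution 6】:

Starting with a way to do it without list comprehensions: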

import random
import itertools


alphabet = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]


def alphas():
    while True:
        yield random.choice(alphabet)


def filter_unique(iter):
    found = set()
    for a in iter:
        if a not in found:
            found.add(a)
            yield a


def dice(x):
    while True:
        yield itertools.islice(
            filter_unique(alphas()),
            x
        )

for i, output in enumerate(dice(3)):
    print list(output)
    if i > 10:
        break

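The part that is a problem for list comprehensions is filter_unique(), because a list comprehension has no "memory" of what it has already output. A possible solution is to keep generating outputs until one of good quality is found, as @DSM suggested.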
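
That generate-and-reject idea could be sketched as follows, in Python 3. The function name and the attempts limit are assumptions for illustration, and rejecting whole draws does not necessarily give the same distribution as filter_unique().

```python
import random

alphabet = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]


def first_distinct_draw(values, x, attempts=1000):
    """Return the first random draw of x values with no repeats, else None."""
    # Lazily build up to `attempts` candidate draws, each a list comprehension.
    draws = ([random.choice(values) for _ in range(x)] for _ in range(attempts))
    return next((d for d in draws if len(set(d)) == x), None)


print(first_distinct_draw(alphabet, 3))
```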

【Discussion】:

【Solution 7】:

The slow, naive approach is:

import random
def pick_n_unique(l, n):
    res = set()
    while len(res) < n:
        res.add(random.choice(l))
    return list(res)

This picks elements and only exits once it has n unique elements:

>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[2, 3, 4]
>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[3, 4, 5]

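However, if you have, for example, a list with thirty 1s and one 2, it could get slow, because once it has a 1 it will keep spinning until it finally hits the 2. A better approach is to count the occurrences of each unique element, choose a random element weighted by its occurrence count, remove that element from the counts, and repeat until you have the desired number of elements: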

import collections
import random

def weighted_choice(item__counts):
    # Pick a key with probability proportional to its count.
    total_counts = sum(count for item, count in item__counts.items())
    which_count = random.random() * total_counts
    for item, count in item__counts.items():
        which_count -= count
        if which_count < 0:
            return item
    raise ValueError("Should never get here")

def pick_n_unique(items, n):
    item__counts = collections.Counter(items)
    if len(item__counts) < n:
        raise ValueError(
            "Can't pick %d values with only %d unique values" % (
                n, len(item__counts)))

    res = []
    for i in range(n):
        choice = weighted_choice(item__counts)
        res.append(choice)
        # Drop the chosen value so it cannot be picked again.
        del item__counts[choice]
    return tuple(res)

Either way, this is a problem that is not well suited to list comprehensions.
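
On Python 3.6 and later, the standard library's random.choices accepts a weights argument, so a hand-written weighted choice could be swapped for it. The sketch below is one possible variant, not the answer's own code; pick_n_unique_weighted is a hypothetical name.

```python
import collections
import random


def pick_n_unique_weighted(items, n):
    """Pick n distinct values, each draw weighted by remaining counts."""
    counts = collections.Counter(items)
    if len(counts) < n:
        raise ValueError("only %d unique values available" % len(counts))
    picked = []
    for _ in range(n):
        values = list(counts)
        weights = [counts[v] for v in values]
        # random.choices returns a list of k items; take the single pick.
        choice = random.choices(values, weights=weights, k=1)[0]
        picked.append(choice)
        del counts[choice]  # a picked value cannot be drawn again
    return picked


items = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
print(pick_n_unique_weighted(items, 3))
```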

【Discussion】:

【Solution 8】:

def sample(self, population, k):
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    result = [None] * k
    try:
        selected = set()
        selected_add = selected.add
        for i in xrange(k):
            j = int(random.random() * n)
            while j in selected:
                j = int(random.random() * n)
            selected_add(j)
            result[i] = population[j]
    except (TypeError, KeyError):   # handle (at least) sets
        if isinstance(population, list):
            raise
        return self.sample(tuple(population), k)
    return result

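The above is a simplified version of the sample function in Lib/random.py. I only removed some optimization code for small data sets. The code tells us straightforwardly how to implement a customized sample function:

get a random number
if the number has appeared before, discard it and get a new one
repeat the steps above until you have all the sample numbers you want.

Then the real question is how to get a random value from the list by weight. That could be done with the original random.sample(population, 1) from the Python standard library (a bit of overkill here, but simple).

Below is an implementation. Since duplicates in the given list represent the weights, we can use int(random.random() * array_length) to get a random index into the array.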

import random
arr = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]

def sample_by_weight( population, k):
    n = len(population)
    if not 0 <= k <= len(set(population)):
        raise ValueError("sample larger than population")
    result = [None] * k
    try:
        selected = set()
        selected_add = selected.add
        for i in xrange(k):
            j = population[int(random.random() * n)]
            while j in selected:
                j = population[int(random.random() * n)]
            selected_add(j)
            result[i] = j
    except (TypeError, KeyError):   # handle (at least) sets
        if isinstance(population, list):
            raise
        return sample_by_weight(tuple(population), k)
    return result

[sample_by_weight(arr,3) for i in range(10)]
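
As a quick sanity check of the weighting claim, one can draw many uniform random indexes and count the values they land on. This is a Python 3 sketch, not part of the answer; the seed and the number of draws are arbitrary choices.

```python
import collections
import random

arr = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]

random.seed(0)  # arbitrary seed, only to make runs repeatable
draws = 14000
tally = collections.Counter(
    arr[int(random.random() * len(arr))] for _ in range(draws)
)

# 2 fills 4 of the 14 slots and 3 fills 2 of them, so 2 should come up
# roughly twice as often as 3.
for value in sorted(tally):
    print(value, tally[value])
```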

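【Discussion】:

What's the purpose of the comprehension here? random.sample(arr,3) already returns a sample of 3 elements from the array.
Doing it that way doesn't guarantee that the elements I select will be different. It's close to what I need... If random.sample would return 3 different elements, then it would be perfect!
@Asad If I'm not mistaken, OP said he/she wants a list whose elements are lists of 3 integers.
@JoranBeasley The data in arr was provided by OP. I use range(10) here because OP didn't mention the size of the list he wants.
@braden.groom I think my updated answer keeps the sample results unique while still selecting them by weight.

【Solution 9】:

With this setup: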

from random import shuffle
from collections import deque

l = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]

this code:

def getSubLists(l,n):
    shuffle(l) #shuffle l so the elements are in 'random' order
    l = deque(l,len(l)) #create a structure with O(1) insert/pop at both ends
    while l: #while there are still elements to choose
        sample = set() #use a set O(1) to check for duplicates
        skipped = 0 #duplicates rotated to the back while building this sample
        #until the sample is n long or every remaining element is a duplicate
        while len(sample) < n and skipped < len(l):
            top = l.pop() #get the top value in l
            if top in sample:
                l.appendleft(top) #add it to the back of l for a later sample
                skipped += 1 #without this, leftovers that are all duplicates loop forever
            else:
                sample.add(top) #it isn't in sample already so use it
        yield sample #yield the sample

you end up with:

for s in getSubLists(l,3):
    print s
>>> 
set([1, 2, 5])
set([1, 2, 3])
set([2, 4, 5])
set([2, 3, 4])
set([1, 4])

【Discussion】: