Reusing generator expressions: answers

【Question title】: Reusing generator expressions
【Posted】: 2018-09-01 23:59:48
【Question】:

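Generator expressions are an extremely useful tool, and they have one huge advantage over list comprehensions: they do not allocate memory for a new list.

The problem I face with generator expressions, which eventually makes me end up writing list comprehensions anyway, is that I can only use such a generator once: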

>>> names = ['John', 'George', 'Paul', 'Ringo']
>>> has_o = (name for name in names if 'o' in name)
>>> for name in has_o:
...   print(name.upper())
...
JOHN
GEORGE
RINGO
>>> for name in has_o:
...   print(name.lower())
...
>>>

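The code above illustrates how a generator expression can only be used once. That is to be expected, because a generator expression returns a single generator instance rather than defining a generator function that could be instantiated again and again.

Is there a way to clone the generator each time it is used, in order to make it reusable, or to make the generator expression syntax return a generator function rather than a single instance?

【Comments】:

There is itertools.tee, but you can't have a memory-optimized generator and also have it be reusable. If you need it for memory efficiency, you will have to re-create the generator; otherwise a list comprehension is probably what you want.

Tags: python generator generator-expression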

【Solution 1】:

Make it a lambda:

has_o = lambda names: (name for name in names if 'o' in name)
for name in has_o(["hello","rrrrr"]):
   print(name.upper())
for name in has_o(["hello","rrrrr"]):
   print(name.upper())

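The lambda is a one-liner that returns a new generator each time it is called. Here I chose to be able to pass the input list, but if the list is fixed, you don't even need a parameter: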

names = ["hello","rrrrr"]
has_o = lambda: (name for name in names if 'o' in name)
for name in has_o():
   print(name.upper())
for name in has_o():
   print(name.upper())

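In that last case, be careful: if names is modified or reassigned, the lambda will use the new names object. You can guard against reassignment of the name by using the default-value trick: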

has_o = lambda lst=names: (name for name in lst if 'o' in name)

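and you can guard against later modification of names by using the default-value-and-copy trick (not super useful when you consider that your first goal was to avoid creating a list :)):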

has_o = lambda lst=names[:]: (name for name in lst if 'o' in name)

(now take your pick :))
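
To make the difference between the three variants concrete, here is a small sketch (the names `late`, `bound`, and `snapshot` are only illustrative and do not come from the answer) showing what each lambda yields after `names` has been modified in place and then reassigned:

```python
names = ["hello", "rrrrr"]

# Looks up `names` every time the lambda is called.
late = lambda: (name for name in names if 'o' in name)
# The default value keeps a reference to the original list object.
bound = lambda lst=names: (name for name in lst if 'o' in name)
# The default value is a copy taken when the lambda is defined.
snapshot = lambda lst=names[:]: (name for name in lst if 'o' in name)

names.append("world")   # in-place modification of the original list
names = ["foo"]         # rebinding the name to a brand-new list

print(list(late()))      # ['foo']
print(list(bound()))     # ['hello', 'world']
print(list(snapshot()))  # ['hello']
```

Only the copy is immune to both kinds of change, and it pays for that by holding a second list in memory.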

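【Discussion】:

I really like the second code example. It is really robust and lets you "forget" how the generator is created.
Thanks. You can't forget it completely, because it still depends on names. But using a default value in the lambda should take care of that. Edited to show the potential problems and how to fix them if needed.
I agree, but isn't the dependence on names just as true if you use the generator expression as is? Does each instantiation use the names list that is in scope at that moment, rather than the original names it was created with?
Yes, and sometimes you do want to follow names. It depends on what you want to achieve. I have given the different solutions.
Thanks, this is a great answer. I would change the order and put the last two examples first, because I think that is the behaviour most people want, so they can pick your solution right away.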

【Solution 2】:

itertools.tee lets you create several iterators from a single iterable:

from itertools import tee

names = ['John', 'George', 'Paul', 'Ringo']
has_o_1, has_o_2 = tee((name for name in names if 'o' in name), 2)
print('iterable 1')
for name in has_o_1:
    print(name.upper())
print('iterable 2')
for name in has_o_2:
    print(name.upper())

Output:

iterable 1
JOHN
GEORGE
RINGO
iterable 2
JOHN
GEORGE
RINGO
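
One caveat worth keeping in mind (a sketch, not part of the answer above): the iterators returned by `tee` are themselves single-use, and `tee` has to buffer every item that one iterator has consumed and the other has not. So the number of passes must be known in advance, and when one copy is exhausted before the other starts, the whole filtered sequence ends up held in memory, much as a list would be.

```python
from itertools import tee

names = ['John', 'George', 'Paul', 'Ringo']
first, second = tee(name for name in names if 'o' in name)

first_pass = [name.upper() for name in first]
again = list(first)   # `first` is exhausted now, like any other iterator
second_pass = [name.lower() for name in second]   # served from tee's internal buffer

print(first_pass)    # ['JOHN', 'GEORGE', 'RINGO']
print(again)         # []
print(second_pass)   # ['john', 'george', 'ringo']
```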

【Discussion】:

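【Solution 3】:

OK, here is some code that makes your iterator reusable. It resets itself automatically after each iteration, so you don't have to worry about anything. As for efficiency: it costs two method calls (one next() on the tee() object, which in turn calls next() on the iterator itself) plus one extra try-except block on top of the original iterator. You have to decide whether that tiny speed loss is acceptable, or use a lambda to rebuild the iterator as shown in the other answer.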

from itertools import tee

class _ReusableIter:
    """
    This class creates a generator object that wraps another generator and makes it reusable
    again after each iteration is finished.
    It makes two "copies" (using tee()) of an original iterator and iterates over the first one.
    The second "copy" is saved for later use.
    After first iteration reaches its end, it makes two "copies" of the saved "copy", and
    the previous iterator is swapped with the new first "copy" which is iterated over while the second "copy" (a "copy" of the old "copy") waits for the
    end of a new iteration, and so on.
    After each iteration, the _ReusableIter() will be ready to be iterated over again.

    If you layer a _ReusableIter() over another _ReusableIter(), the result can lead you into an indefinite loop,
    or provoke some other unpredictable behaviours.
    This is caused by later explained problem with copying instances of _ReusableIter() with tee().
    Use the ReusableIter() factory function to create the object.
    It will prevent you from making a new layer over an already _ReusableIter()
    and return that object instead.

    If you use the _ReusableIter() inside nested loops the first loop
    will get the first element, the second the second, and the last nested loop will
    loop over the rest, then as the last loop is done, the iterator will be reset and
    you will enter the infinite loop. So avoid doing that if the mentioned behaviour is not desired.

    It makes no real sense to copy the _ReusableIter() using tee(), but if you think of doing it for some reason, don't.
    tee() will not do a good job and the original iterator will not really be copied.
    What you will get instead is an extra layer over THE SAME _ReusableIter() for every copy returned.

    TODO: A little speed improvement can be achieved here by implementing tee()'s algorithm directly into _ReusableIter()
    and dump the tee() completely.
    """
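    def __init__(self, iterator):
        self.iterator, self.copy = tee(iterator)
        # Cache the bound method; in Python 3 iterators expose __next__
        self._next = self.iterator.__next__

    def reset(self):
        # Split the saved copy: iterate over one half, keep the other for the next reset
        self.iterator, self.copy = tee(self.copy)
        self._next = self.iterator.__next__

    def __next__(self):
        try:
            return self._next()
        except StopIteration:
            # Re-arm for the next loop, then let the current loop end normally
            self.reset()
            raise

    def __iter__(self):
        return self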

def ReusableIter (iterator):
    if isinstance(iterator, _ReusableIter):
        return iterator
    return _ReusableIter(iterator)

Usage:
>>> names = ['John', 'George', 'Paul', 'Ringo']
>>> has_o = ReusableIter(name for name in names if 'o' in name)
>>> for name in has_o:
...     print(name)
...
John
George
Ringo
>>> # And just use it again:
>>> for name in has_o:
...     print(name)
...
John
George
Ringo
>>>
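
For comparison, the "rebuild the iterator" alternative mentioned in this answer can also be wrapped so that it works with plain `for` syntax. The following is only a minimal sketch, and the class name `Reiterable` is made up for illustration. It stores a zero-argument callable and asks it for a fresh generator on every `__iter__` call, so nothing is buffered and nested loops each get an independent iterator.

```python
class Reiterable:
    """Iterable that builds a fresh iterator from a factory on each loop (hypothetical helper)."""

    def __init__(self, factory):
        self._factory = factory   # zero-argument callable returning an iterator

    def __iter__(self):
        return iter(self._factory())


names = ['John', 'George', 'Paul', 'Ringo']
has_o = Reiterable(lambda: (name for name in names if 'o' in name))

print([name.upper() for name in has_o])   # ['JOHN', 'GEORGE', 'RINGO']
print([name.lower() for name in has_o])   # ['john', 'george', 'ringo']

# Nested loops each get their own iterator:
pairs = [(a, b) for a in has_o for b in has_o]
print(len(pairs))                         # 9
```

The trade-off is that the filter is re-evaluated on every pass, and, as with the lambda answer, the result follows later changes to `names`.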

【Discussion】: