Python FTP "chunk" iterator (without loading entire file into memory): Answers

【Question title】: Python FTP "chunk" iterator (without loading entire file into memory)
【Posted】: 2016-08-24 20:55:22
【Question description】:

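There are several answers on Stack Overflow about retrieving an FTP file and writing it to a stream, such as a string buffer or a file, which can then be iterated over.

For example: Read a file in buffer from FTP python

However, these solutions involve loading the entire file into memory, or downloading it to disk, before beginning to process the contents.

I do not have enough memory to buffer the whole file, and I do not have access to the disk. This can be done by processing the data in a callback function, but I want to know whether it is possible to wrap the FTP code in some magic that returns an iterator, rather than peppering my code with callbacks.

I.e., instead of: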

def get_ftp_data(handle_chunk):
    ...
    ftp.login('user', 'password')  # authentication required
    ftp.retrbinary('RETR etc', handle_chunk)
    ...

get_ftp_data(do_stuff_to_chunk)

I would like:

for chunk in get_ftp_data():
    do_stuff_to_chunk(chunk)

And (unlike the existing answers) I want to do this without writing the whole FTP file to disk or memory before iterating over it.

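【Question comments】:

There is a similar question: Turn functions with a callback into Python generators?

Tags: python ftp

【Solution 1】:

You will have to put the retrbinary call in another thread and have the callback feed chunks to an iterator: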

import queue  # this module is named Queue in Python 2
import threading

def ftp_chunk_iterator(FTP, command, maxsize):
    # maxsize limits the number of chunks kept in memory at once.
    chunk_queue = queue.Queue(maxsize=maxsize)

    def ftp_thread_target():
        try:
            FTP.retrbinary(command, callback=chunk_queue.put)
        finally:
            # Always signal the end, even if the transfer fails,
            # so the consumer does not block forever.
            chunk_queue.put(None)

    ftp_thread = threading.Thread(target=ftp_thread_target)
    ftp_thread.start()

    # Yield chunks until the None sentinel arrives.
    while True:
        chunk = chunk_queue.get()
        if chunk is None:
            return
        yield chunk
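
To try the threaded approach without an FTP server, here is a minimal, self-contained sketch in Python 3. `FakeFTP` is a hypothetical stand-in that mimics only the `retrbinary(command, callback)` call of `ftplib.FTP`. The `iter_chunks` name, the `_DONE` sentinel, the `maxsize=4` default and the error forwarding are assumptions made for illustration, not part of the original answer.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the transfer


class FakeFTP:
    """Hypothetical stand-in for ftplib.FTP; only mimics retrbinary()."""

    def __init__(self, payload, blocksize):
        self.payload = payload
        self.blocksize = blocksize

    def retrbinary(self, command, callback):
        # Hand the payload to the callback one block at a time.
        for start in range(0, len(self.payload), self.blocksize):
            callback(self.payload[start:start + self.blocksize])


def iter_chunks(ftp, command, maxsize=4):
    """Yield chunks as a background thread receives them."""
    chunks = queue.Queue(maxsize=maxsize)  # bounded: producer blocks when full
    errors = []

    def worker():
        try:
            ftp.retrbinary(command, callback=chunks.put)
        except Exception as exc:  # remember the failure for the consumer
            errors.append(exc)
        finally:
            chunks.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()

    while True:
        chunk = chunks.get()
        if chunk is _DONE:
            break
        yield chunk
    if errors:
        raise errors[0]


ftp = FakeFTP(b"abcdefghij", blocksize=4)
received = list(iter_chunks(ftp, "RETR example.bin"))
print(received)  # [b'abcd', b'efgh', b'ij']
```

One caveat with this sketch: if the consumer stops iterating early, the worker can stay blocked on `put`. Marking the thread as a daemon only keeps it from holding up interpreter exit; it does not cancel the transfer.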

If you cannot use threads, the best you can do is write your callback as a coroutine:

from contextlib import closing


def process_chunks():
    while True:
        try:
            # Receive the next chunk passed in through send().
            chunk = yield
        except GeneratorExit:
            # close() was called: the transfer is over.
            finish_up()
            return
        else:
            do_whatever_with(chunk)


with closing(process_chunks()) as coroutine:
    # Advance the coroutine to its first yield so it can accept send().
    next(coroutine)

    FTP.retrbinary(command, callback=coroutine.send)
# coroutine.close() is called on exiting the with block
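
As an illustration of what a coroutine offers over a plain callback, the following sketch keeps state between chunks: it reassembles lines that are split across chunk boundaries. `fake_retrbinary` is a hypothetical stand-in for `FTP.retrbinary`, and `collect_lines` is an illustrative name, so treat this as one possible shape rather than a definitive implementation.

```python
from contextlib import closing


def fake_retrbinary(payload, callback, blocksize=5):
    """Hypothetical stand-in for FTP.retrbinary: feeds payload in blocks."""
    for start in range(0, len(payload), blocksize):
        callback(payload[start:start + blocksize])


def collect_lines(lines):
    """Coroutine that reassembles lines split across chunk boundaries."""
    pending = b""  # state carried over from one chunk to the next
    try:
        while True:
            chunk = yield
            pending += chunk
            # Everything before the last newline is complete; keep the rest.
            *complete, pending = pending.split(b"\n")
            lines.extend(complete)
    except GeneratorExit:
        # Transfer finished: flush a final line that had no trailing newline.
        if pending:
            lines.append(pending)


lines = []
with closing(collect_lines(lines)) as coroutine:
    next(coroutine)  # advance to the first yield so send() can be used
    fake_retrbinary(b"alpha\nbeta\ngamma", coroutine.send)

print(lines)  # [b'alpha', b'beta', b'gamma']
```

The `pending` buffer is the kind of per-transfer state that a bare callback function would have to keep in a global or an object attribute.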

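【Comments】:

I was afraid of that. Intuitively, though, it seems like it shouldn't strictly require threads. Also, although I didn't state this explicitly in the original question, my execution environment does not have threads. I was hoping there was a better way.
@natb1: Unfortunately, it really does require threads. If you can't use threads, the best you can do is write your callback as a coroutine, which is less flexible and messier.
Thanks for introducing me to coroutines. Unfortunately, that example looks to me like a long-winded way of saying FTP.retrbinary(command, callback=do_whatever_with)
@natb1: It is, if do_whatever_with is a simple function, but you can put arbitrary blocks of code in there that depend on the coroutine's state. In cases where it really does reduce to FTP.retrbinary(command, callback=do_whatever_with), an iterator would be unnecessary bloat too.
@user2357112 I like the threaded version. At first glance the coroutine looks like a simple callback solution, but there is a significant difference: with the process_chunks generator, all the processing (for all chunks) is written in one piece of code, up until close(). Very nice. A suggestion: how about putting the coroutine creation and closing into a with block?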