How to process a file being downloaded chunk by chunk using GridFSBucket

【Question title】: How to process the file being downloaded chunk by chunk using GridFSBucket
【Posted】: 2019-01-17 19:00:03
【Question】:

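My goal is to write a Python script that reads a text file using gridfs and iterates over it line by line. When I use gridfs.get(), I noticed that each iteration returns a large block of bytes rather than a single line. Please advise how to iterate line by line using "get".

I can work around this by using GridFSBucket, unnecessarily storing the data in a temporary file, and reopening that file in read mode to iterate line by line. I am looking for a better way to handle this.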

    file_store = GridFSBucket(db)
    file = open('test.txt', 'wb')
    file_store.download_to_stream(raw_file[0].get('ObjectId'),file)
    if not file:
        return None
    file.close()
    file=open('test.txt','rb')
    for line in file:
        .....

【Comments】:

Tags: python pymongo gridfs

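【Solution 1】:

This can be done with GridFSBucket and open_download_stream, which returns a file-like object that can be read one line at a time without a temporary file.

Sample code: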

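    from gridfs import GridFSBucket

    # "<fs CollectionName>" is a placeholder for your bucket name (the default is "fs")
    file_store = GridFSBucket(mongo.db, bucket_name="<fs CollectionName>")

    # open_download_stream returns a file-like GridOut object
    file_handler = file_store.open_download_stream(object_id)

    eachline = file_handler.readline()
    while eachline:
        ...  # process eachline here (it is a bytes object)
        eachline = file_handler.readline()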
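If you would rather control how much data is pulled per read, the same idea can be written as a small generator that reads fixed-size chunks and splits them into lines. The following is only a sketch under stated assumptions: `iter_lines` is a hypothetical helper name, not part of PyMongo, and it relies solely on the stream offering `read(size)`, which the object returned by `open_download_stream` does. The default `chunk_size` mirrors the usual 255 KiB GridFS chunk size, but any positive value should work. The demo uses `io.BytesIO` as a stand-in so that it runs without a database.

```python
import io


def iter_lines(stream, chunk_size=255 * 1024):
    """Yield lines as bytes (newline included) from a binary file-like object.

    Hypothetical helper: only stream.read(size) is required, so it should
    work with the object returned by GridFSBucket.open_download_stream.
    """
    pending = b""  # bytes read so far that do not yet end in a newline
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.split(b"\n")
        pending = lines.pop()  # last piece may be an incomplete line
        for line in lines:
            yield line + b"\n"
    if pending:
        # final line without a trailing newline
        yield pending


if __name__ == "__main__":
    # io.BytesIO stands in for the GridFS download stream in this demo
    demo = io.BytesIO(b"first\nsecond\nthird")
    for raw in iter_lines(demo, chunk_size=4):
        print(raw.decode("utf-8").rstrip("\n"))
```

With a real bucket, the loop would presumably look like `for raw in iter_lines(file_store.open_download_stream(object_id)):`, decoding each `raw` line with whatever encoding the stored file uses.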

【Comments】: