【Question title】: Python: how to stream/pipe data out of gzip compression?
【Posted】: 2012-02-08 00:39:51
【Question】:

I need to do something like this, but in Python:

dd if=/dev/sdb | gzip -c | curl ftp upload

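I can't run the whole command through Popen, because:

  1. I need non-blocking operation
  2. I need progress information (I tried looping over proc.stderr, to no avail)

The other big constraint is that I can't create the compressed gzip file in memory or on disk before uploading it.

This is what I'm trying to work out how to do, where gzip_stream_of_strings(input) is the unknown: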

import os, pycurl
filename = '/path/to/super/large/file.img'
filesize = os.path.getsize(filename)

def progress(dl_left, dl_completed, ul_left, ul_completed):
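    # Print the percentage and return None: a non-zero return value makes libcurl abort the transfer.
    print('%.1f%%' % (ul_completed / float(filesize) * 100))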

def main():
    c = pycurl.Curl()
    c.setopt(c.URL, 'ftp://IP/save_as.img.gz')
    c.setopt(pycurl.NOPROGRESS, 0)
    c.setopt(pycurl.PROGRESSFUNCTION, progress)
    c.setopt(pycurl.UPLOAD, 1)
    c.setopt(pycurl.INFILESIZE, filesize)
    c.setopt(pycurl.USERPWD, 'user:passwd')
    with open(filename, 'rb') as input:
        c.setopt(pycurl.READFUNCTION, gzip_stream_of_strings(input))
        c.perform()
        c.close()

Any help is greatly appreciated!

Edit: here is the solution:

from gzip import GzipFile
from StringIO import StringIO

CHUNCK_SIZE = 1024

class GZipPipe(StringIO):
    """This class implements a compression pipe suitable for asynchronous 
    process.
    Credit to cdvddt @ http://snippets.dzone.com/posts/show/5644

    @param source: this is the input file to compress
    @param name: this is stored as the name in the gzip header
    @function read: call this to read(size) chunks from the gzip stream        
    """
    def __init__(self, source = None, name = "data"):
        StringIO.__init__(self)

        self.source = source
        self.source_eof = False
        self.buffer = ""
        self.zipfile = GzipFile(name, 'wb', 9, self)

    def write(self, data):
        self.buffer += data

    def read(self, size = -1):
        while ((len(self.buffer) < size) or (size == -1)) and not self.source_eof:
            if self.source is None:
                break
            chunk = self.source.read(CHUNCK_SIZE)
            self.zipfile.write(chunk)
            if (len(chunk) < CHUNCK_SIZE) :
                self.source_eof = True
                self.zipfile.flush()
                self.zipfile.close()
                break

        if size == 0:
            result = ""
        elif size >= 1:
            result = self.buffer[0:size]
            self.buffer = self.buffer[size:]
        else:
            result = self.buffer
            self.buffer = ""

        return result

Use it like this:

with open(filename, 'rb') as input:
    c.setopt(pycurl.READFUNCTION, GZipPipe(input).read)
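
The class above targets Python 2 (the StringIO module and str buffers). For Python 3, a minimal sketch of the same idea might look like the following. The names GzipReadPipe and CHUNK_SIZE are illustrative, not part of the original post, and the chunk size is an assumed value. As in the original, GzipFile is pointed at the pipe object itself, so compressed output lands in an internal buffer that read(size) drains.

```python
import gzip

CHUNK_SIZE = 64 * 1024  # assumed read size; tune as needed


class GzipReadPipe:
    """Compress `source` lazily; read(size) returns at most `size` gzip bytes."""

    def __init__(self, source, name="data"):
        self.source = source
        self.source_eof = False
        # Compressed bytes not yet handed out. This must exist before GzipFile
        # is created, because GzipFile writes the gzip header immediately.
        self.buffer = bytearray()
        # GzipFile sends its compressed output to fileobj, i.e. to self.write().
        self.zipfile = gzip.GzipFile(name, "wb", 9, fileobj=self)

    def write(self, data):
        # Called by GzipFile with compressed output.
        self.buffer += data
        return len(data)

    def flush(self):
        # GzipFile.flush() calls fileobj.flush(); there is nothing to flush here.
        pass

    def read(self, size=-1):
        # Feed the compressor until enough output is buffered or input runs out.
        while (size < 0 or len(self.buffer) < size) and not self.source_eof:
            chunk = self.source.read(CHUNK_SIZE)
            if chunk:
                self.zipfile.write(chunk)
            else:
                self.source_eof = True
                self.zipfile.close()  # writes the gzip trailer into self.buffer
        if size < 0:
            size = len(self.buffer)
        result = bytes(self.buffer[:size])
        del self.buffer[:size]
        return result
```

With the source opened in binary mode, GzipReadPipe(input).read can be passed as the read callback in the same way as GZipPipe(input).read above. An empty bytes result signals end of data.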

【Comments】:

  • You should post the solution as your own answer and accept it.

Tags: python stream gzip


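【Answer 1】:

The built-in zlib library works on byte strings, so it can compress data read from any file-like object, including text streams.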

import os, pycurl, zlib
from cStringIO import StringIO
filename = '/path/to/super/large/file.img'
filesize = os.path.getsize(filename)

def progress(dl_left, dl_completed, ul_left, ul_completed):
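    # Print the percentage and return None: a non-zero return value makes libcurl abort the transfer.
    print('%.1f%%' % (ul_completed / float(filesize) * 100))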

def main():
    c = pycurl.Curl()
    c.setopt(c.URL, 'ftp://IP/save_as.img.gz')
    c.setopt(pycurl.NOPROGRESS, 0)
    c.setopt(pycurl.PROGRESSFUNCTION, progress)
    c.setopt(pycurl.UPLOAD, 1)
    c.setopt(pycurl.INFILESIZE, filesize)
    c.setopt(pycurl.USERPWD, 'user:passwd')
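    with open(filename, 'rb') as input:
        # READFUNCTION needs a callable, so wrap the compressed bytes in a
        # StringIO and pass its read method. Caveats: this holds the whole
        # compressed file in memory and emits a zlib stream, not gzip.
        s = StringIO(zlib.compress(input.read()))
        c.setopt(pycurl.READFUNCTION, s.read)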
        c.perform()
        c.close()

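I haven't tested this. See this SO question for more information.

【Comments】:

  • Thanks for your answer! The linked SO question, combined with the hint to use StringIO, led me to here, which solved my problem :)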
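
Since the question rules out building the compressed file in memory, zlib can also be used incrementally. The following is a minimal Python 3 sketch, not the answerer's code: gzip_chunks and make_read_callback are hypothetical helper names, and the chunk size is an assumed value. Passing 16 + zlib.MAX_WBITS as wbits makes zlib.compressobj emit gzip framing rather than a bare zlib stream.

```python
import zlib


def gzip_chunks(source, chunk_size=64 * 1024, level=9):
    """Yield gzip-format bytes for `source` one chunk at a time."""
    # wbits = 16 + MAX_WBITS asks zlib for a gzip header and trailer.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        data = compressor.compress(chunk)
        if data:  # the compressor may buffer input and return nothing yet
            yield data
    yield compressor.flush()  # remaining output plus the gzip trailer


def make_read_callback(chunks):
    """Adapt an iterator of byte chunks to a read(size) callable."""
    pending = bytearray()
    iterator = iter(chunks)

    def read(size):
        # Pull chunks until `size` bytes are available or the iterator ends.
        while len(pending) < size:
            try:
                pending.extend(next(iterator))
            except StopIteration:
                break
        result = bytes(pending[:size])
        del pending[:size]
        return result

    return read
```

Under these assumptions, make_read_callback(gzip_chunks(input)) could be passed to pycurl.READFUNCTION in place of the StringIO-based callback. Only about one chunk of input and its compressed output are held in memory at any time.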