Python: How to download a zip file

【Question title】: Python: How to download a zip file
【Posted】: 2010-08-20 16:43:51
【Question】:

I'm trying to download a zip file with this code:

o = urllib2.build_opener( urllib2.HTTPCookieProcessor() )

#login
p = urllib.urlencode( { usernameField: usernameVal, passField: passVal } )
f = o.open(authUrl,  p )
data = f.read()
print data
f.close()

#download file
f = o.open(remoteFileUrl)
localFile = open(localFile, "wb")
localFile.write(f.read())
f.close()

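I'm getting some binary data, but the file I "downloaded" is too small and is not a valid zip file. Am I not retrieving the zip file correctly? The HTTP response headers for f = o.open(remoteFileUrl) are shown below. I don't know whether this response needs any special handling: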

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Pragma: private
Cache-Control: must-revalidate
Expires: Tue, 31 Dec 1997 23:59:59 GMT
Content-Disposition: inline; filename="file.zip";
Content-Type: application/zip
Transfer-Encoding: chunked

【Comments】:

Tags: python

【Solution 1】:

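f.read() does not necessarily read the whole file; it may read only one packet of it (which could be the whole file if the file is small, but will not be for a large file).

You need to loop over the packets like this: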

while True:
    packet = f.read()
    if not packet:
        # an empty read means the stream is exhausted
        break
    localFile.write(packet)
f.close()
localFile.close()

f.read() returns an empty packet to signal that you have read the whole file.
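
As a rough sketch of the same loop in Python 3 (the thread itself uses Python 2 and urllib2): the helper name `copy_in_chunks` and the 8192-byte chunk size are assumptions made for illustration, and in-memory streams stand in for the HTTP response and the local file. Any file-like object, such as the response returned by `urllib.request.urlopen`, could be passed as `src`.

```python
import io

def copy_in_chunks(src, dst, chunk_size=8192):
    """Copy src to dst one chunk at a time; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            # an empty bytes object signals the end of the stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total

# In-memory stand-ins for the HTTP response and the local file (hypothetical data).
source = io.BytesIO(b"x" * 20000)
target = io.BytesIO()
copied = copy_in_chunks(source, target)
print(copied)
```

Passing an explicit size to read() bounds how much is held in memory per iteration, whichever way the underlying socket delivers the data.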

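【Discussion】:

I'm curious where in the documentation you found this.
docs.python.org/library/urllib.html#urllib.urlopen : "returns a file-like object", and then docs.python.org/library/stdtypes.html#file.read
Is it really just one packet? I checked the docs at the links shown and did not see anywhere that says read() won't read until EOF. Can you explain more?
@Corey: the urlopen documentation says "the read() method, if the size argument is omitted or negative, may not read until the end of the data stream".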

【Solution 2】:

If you don't mind reading the whole zip file into memory, the quickest way to read and write it is as follows:

data  = f.readlines()
with open(localFile,'wb') as output:
    output.writelines(data)

Otherwise, to read and write in chunks as they arrive over the network, do

with open(localFile, "wb") as output:
    chunk = f.read()
    while chunk:
        output.write(chunk)
        chunk = f.read()

This is a little less tidy, but it avoids holding the whole file in memory at once. Hope this helps.
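
For comparison, the standard library's `shutil.copyfileobj` performs this kind of chunked copy between two file-like objects. The sketch below is Python 3; the helper name `save_stream`, the 64 KiB buffer size, and the temporary file path are assumptions for illustration, and a `BytesIO` object stands in for the network response.

```python
import io
import os
import shutil
import tempfile

def save_stream(response, path, chunk_size=64 * 1024):
    """Stream a file-like response to path without loading it all into memory."""
    with open(path, "wb") as output:
        shutil.copyfileobj(response, output, chunk_size)

# Hypothetical payload standing in for the downloaded bytes.
fake_response = io.BytesIO(b"PK" + b"\0" * 100000)
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
save_stream(fake_response, path)
saved_size = os.path.getsize(path)
print(saved_size)
```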

【Discussion】:

【Solution 3】:

Here is a more robust solution that uses urllib2 to download the file in chunks and print the download status:

import os
import urllib2
import math

def downloadChunks(url):
    """Helper to download large files
        the only arg is a url
       this file will go to a temp directory
       the file will also be downloaded
       in chunks and print out how much remains
    """

    baseFile = os.path.basename(url)

    #move the file to a more uniq path
    os.umask(0002)
    temp_path = "/tmp/"
    try:
        file = os.path.join(temp_path,baseFile)

        req = urllib2.urlopen(url)
        total_size = int(req.info().getheader('Content-Length').strip())
        downloaded = 0
        CHUNK = 256 * 10240
        with open(file, 'wb') as fp:
            while True:
                chunk = req.read(CHUNK)
                downloaded += len(chunk)
                # float() avoids Python 2 integer division, which would print 0 until the download completes
                print math.floor((float(downloaded) / total_size) * 100)
                if not chunk: break
                fp.write(chunk)
    except urllib2.HTTPError, e:
        print "HTTP Error:",e.code , url
        return False
    except urllib2.URLError, e:
        print "URL Error:",e.reason , url
        return False

    return file

【Discussion】:

IMO this is only robust if you also handle the case where no "Content-Length" header is sent.
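
One possible way to handle a missing header, sketched in Python 3: fall back to reporting a byte count when the total size is unknown, as it would be for the `Transfer-Encoding: chunked` response in the question. The names `parse_content_length` and `copy_with_progress`, and the `report` callback, are hypothetical helpers introduced here, not part of urllib2.

```python
import io

def parse_content_length(value):
    """Return the header value as an int, or None if it is absent or malformed."""
    try:
        return int(value.strip())
    except (AttributeError, ValueError):
        return None

def copy_with_progress(src, dst, total_size=None, chunk_size=8192, report=print):
    """Copy src to dst in chunks, reporting a percentage or a running byte count."""
    downloaded = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        downloaded += len(chunk)
        if total_size:
            report("%d%%" % (downloaded * 100 // total_size))
        else:
            # no usable Content-Length: report bytes received so far
            report("%d bytes" % downloaded)
    return downloaded

# Demo with an in-memory payload and no Content-Length header.
payload = b"z" * 10000
messages = []
copy_with_progress(io.BytesIO(payload), io.BytesIO(),
                   total_size=parse_content_length(None),
                   chunk_size=4096, report=messages.append)
print(messages)
```

With a real response, the header value could be obtained with something like `response.headers.get("Content-Length")`, which returns None when the header is absent.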

【Solution 4】:

Try this:

#download file
f = o.open(remoteFileUrl)

response = ""
while 1:
    data = f.read()
    if not data:
        break
    response += data

with open(localFile, "wb") as local_file:
    local_file.write(response)
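
A Python 3 variation on this idea, offered only as a sketch: collect the chunks in a list and join them once instead of concatenating repeatedly, then check the result with `zipfile.is_zipfile` before writing it out. The helper name `read_all` is an assumption, and a small zip built in memory stands in for the remote file.

```python
import io
import zipfile

def read_all(src, chunk_size=8192):
    """Read src until an empty chunk is returned and join the pieces."""
    parts = []
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)

# Build a tiny zip archive in memory to stand in for the remote file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
buffer.seek(0)

data = read_all(buffer, chunk_size=16)
is_valid = zipfile.is_zipfile(io.BytesIO(data))
# A truncated copy, like the too-small file described in the question.
truncated_is_valid = zipfile.is_zipfile(io.BytesIO(data[:30]))
print(is_valid, truncated_is_valid)
```

A check like this could help distinguish a truncated download from a server that returned something other than the archive, such as a login page.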

【Discussion】: