Python urllib2 resume download doesn't work when network reconnects

【Question title】: Python urllib2 resume download doesn't work when network reconnects
【Posted】: 2011-10-21 06:05:51
【Question description】:

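I am building a resumable downloader with urllib2, loosely based on this method. I can kill the program and restart it, and it resumes downloading from where it left off; the finished file is the same size as one downloaded in a single pass.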

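However, when I tested it by disabling and re-enabling the network, the download did not complete correctly. The file ends up larger than it should be and does not work. Am I missing something, or could this be a urllib2 bug?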

    import os
    import time
    import urllib2

    self.opener = urllib2.build_opener()

    self.count = 0 # Counts downloaded size.
    self.downloading = True
    while (not(self.success) and self.downloading):
        try:
            self.Err = ""
            self._netfile = self.opener.open(self.url)
            self.filesize = float(self._netfile.info()['Content-Length'])

            if (os.path.exists(self.localfile) and os.path.isfile(self.localfile)):
                self.count = os.path.getsize(self.localfile)
            print self.count,"of",self.filesize,"downloaded."
            if self.count >= self.filesize:
                #already downloaded
                self.downloading = False
                self.success = True
                self._netfile.close()
                return

            if (os.path.exists(self.localfile) and os.path.isfile(self.localfile)):
                #File already exists, start where it left off:
                #This seems to corrupt the file sometimes?
                self._netfile.close()
                req = urllib2.Request(self.url)
                print "file downloading at byte: ",self.count
                req.add_header("Range","bytes=%s-" % (self.count))
                self._netfile = self.opener.open(req)
            if (self.downloading): #Don't do it if cancelled, downloading=false.
                next = self._netfile.read(1024)
                self._outfile = open(self.localfile,"ab") #to append binary
                self._outfile.write(next)
                self.readsize = desc(self.filesize) # get size mb/kb
                self.count += len(next) # the first read may return fewer than 1024 bytes
                while (len(next)>0 and self.downloading):
                    next = self._netfile.read(1024)
                    self._outfile.write(next)
                    self.count += len(next)
                self.success = True
        except IOError, e:
            print e
            self.Err=("Download error, retrying in a few seconds: "+str(e))
            try:
                self._netfile.close()
            except Exception:
                pass
            time.sleep(8) #Then repeat
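
One thing the loop above takes on trust is that the server honours the Range header. A server that ignores it replies with the whole file (status 200 rather than 206), and appending that body to a partial file would also give an oversized result. Nothing in the question confirms that this happened here. The following is only a minimal sketch, written in Python 3, of checking the Content-Range response header before appending. `parse_content_range` and `resumes_at` are hypothetical helper names, not part of urllib2.

```python
import re


def parse_content_range(value):
    """Parse a header value such as 'bytes 3072-5119/5120'.

    Returns (start, end, total); total is None when the server sends '*'.
    Raises ValueError for any other form.
    """
    match = re.match(r"^bytes (\d+)-(\d+)/(\d+|\*)$", value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range: %r" % value)
    start, end, total = match.groups()
    return int(start), int(end), None if total == "*" else int(total)


def resumes_at(value, offset):
    """True when the response body starts exactly at the requested offset."""
    try:
        return parse_content_range(value)[0] == offset
    except ValueError:
        return False
```

The header value would come from the response headers, read the same way the question reads Content-Length. If `resumes_at` returns False, the safer choice is probably to restart the file rather than append to it.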

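【Question discussion】:

There is already an (almost) drop-in urllib replacement that can resume downloads: urlgrabber.baseurl.org
Have you tried it with the network disabled and re-enabled? Does it resume the download correctly on its own?
I believe some Linux package managers use it internally, so it should be well tested. I used it successfully myself a long time ago. It even has settings for the number of retries and the like.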

Tags: python urllib2 resume-download

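【Solution 1】:

I added self._outfile.close() alongside self._netfile.close() in the IOError handler, and that seems to have fixed it. My guess is that the error came from opening the file again without closing it first.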
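
The same idea, closing the output file on every path including the error path, can be sketched with a `with` block. This is not the asker's code. It is a minimal Python 3 sketch (`urllib.request` is the Python 3 successor to urllib2), the function names and the `open_url` parameter are hypothetical, and it assumes the server honours the Range header.

```python
import os
import time
import urllib.request


def fetch_remainder(url, path, open_url=urllib.request.urlopen, chunk_size=1024):
    """Append the missing tail of url to path; return the size on disk."""
    # The size is trustworthy here because no unflushed handle on path is open.
    offset = os.path.getsize(path) if os.path.isfile(path) else 0
    request = urllib.request.Request(url)
    if offset:
        request.add_header("Range", "bytes=%d-" % offset)
    response = open_url(request)
    try:
        # "with" closes, and therefore flushes, the file even if read() raises.
        with open(path, "ab") as outfile:
            while True:
                block = response.read(chunk_size)
                if not block:
                    break
                outfile.write(block)
    finally:
        response.close()
    return os.path.getsize(path)


def download_with_retries(url, path, attempts=5, delay=8,
                          open_url=urllib.request.urlopen):
    """Call fetch_remainder until it succeeds or attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch_remainder(url, path, open_url)
        except OSError as error:  # URLError and most socket errors derive from OSError
            last_error = error
            time.sleep(delay)
    raise last_error
```

`open_url` exists only so the loop can be exercised without a network; in normal use the default `urlopen` would be left in place. A request for a file that is already complete may be rejected by the server, which this sketch does not handle.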

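【Discussion】:

This may be because _outfile.close() never got the chance to flush the buffer. Many file operations are buffered for speed, and .close() flushes what is in memory to disk, so that the file ends up with the correct length and the correct content.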
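
The buffering effect can be observed directly. The following is a small Python 3 sketch using a temporary file; the file name and the 100-byte write are arbitrary choices. Data written through a handle that is still open need not be on disk yet, so os.path.getsize can report less than what was received. In the question's loop that stale size would feed the Range header, and the bytes left in the old buffer would be written out later, which is one plausible way to end up with an oversized file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "partial.bin")

outfile = open(path, "ab")
outfile.write(b"x" * 100)  # a small write normally stays in the in-memory buffer
size_before_close = os.path.getsize(path)

outfile.close()  # close() flushes the buffer to disk
size_after_close = os.path.getsize(path)

print(size_before_close, size_after_close)
```

With default buffering, the first number is usually smaller than the second. Calling close() (or flush()) before measuring the file is what makes the size usable as a resume offset.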