【Posted】:2011-10-21 06:05:51
【Question】:
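I'm writing a resumable downloader with urllib2, loosely based on this method. I can kill the program and restart it, and it picks up the download where it left off. The resulting file is the same size as one downloaded in a single pass.
However, when I tested it by disabling and re-enabling the network mid-download, it did not download correctly. The file ends up larger than it should be and does not work. Am I missing something, or could this be a urllib2 bug?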
import os
import time
import urllib2

opener = urllib2.build_opener()

self.count = 0 # Counts downloaded size.
self.downloading = True
while (not(self.success) and self.downloading):
    try:
        self.Err = ""
        self._netfile = self.opener.open(self.url)
        self.filesize = float(self._netfile.info()['Content-Length'])
        if (os.path.exists(self.localfile) and os.path.isfile(self.localfile)):
            self.count = os.path.getsize(self.localfile)
        print self.count,"of",self.filesize,"downloaded."
        if self.count >= self.filesize:
            #already downloaded
            self.downloading = False
            self.success = True
            self._netfile.close()
            return
        if (os.path.exists(self.localfile) and os.path.isfile(self.localfile)):
            #File already exists, start where it left off:
            #This seems to corrupt the file sometimes?
            self._netfile.close()
            req = urllib2.Request(self.url)
            print "file downloading at byte: ",self.count
            req.add_header("Range","bytes=%s-" % (self.count))
            self._netfile = self.opener.open(req)
        if (self.downloading): #Don't do it if cancelled, downloading=false.
            next = self._netfile.read(1024)
            self._outfile = open(self.localfile,"ab") #to append binary
            self._outfile.write(next)
            self.readsize = desc(self.filesize) # get size mb/kb
            self.count += 1024
            while (len(next)>0 and self.downloading):
                next = self._netfile.read(1024)
                self._outfile.write(next)
                self.count += len(next)
            self.success = True
    except IOError, e:
        print e
        self.Err=("Download error, retrying in a few seconds: "+str(e))
        try:
            self._netfile.close()
        except Exception:
            pass
        time.sleep(8) #Then repeat
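For comparison, here is a minimal sketch of the same retry loop, written for Python 3 (where urllib2 became urllib.request). It rests on two assumptions about what may be going wrong above, neither confirmed: the output file is never closed before the retry, so os.path.getsize can miss bytes still sitting in the write buffer, and the server's reply is never checked for 206 Partial Content before appending. The names open_from_offset and download_with_resume, and the opener parameter, are hypothetical and only here for illustration.

```python
import os
import time
import urllib.error
import urllib.request


def open_from_offset(url, offset):
    """Open url, asking the server to start at byte `offset` (hypothetical helper)."""
    req = urllib.request.Request(url)
    if offset > 0:
        req.add_header("Range", "bytes=%d-" % offset)
    return urllib.request.urlopen(req)


def download_with_resume(url, localfile, opener=open_from_offset,
                         chunk_size=1024, max_retries=5, retry_delay=8):
    """Download url to localfile, resuming from the bytes already on disk."""
    for _attempt in range(max_retries):
        # Measure the file only while no handle to it is open, so the size
        # includes everything written by the previous attempt.
        offset = os.path.getsize(localfile) if os.path.isfile(localfile) else 0
        try:
            response = opener(url, offset)
            try:
                # 206 means the Range header was honoured. Any other status
                # is treated as a full resend, so the file is rewritten.
                resumed = offset > 0 and response.status == 206
                with open(localfile, "ab" if resumed else "wb") as outfile:
                    while True:
                        chunk = response.read(chunk_size)
                        if not chunk:
                            # Assumption: end of stream means the download is done.
                            return True
                        outfile.write(chunk)
            finally:
                response.close()
        except urllib.error.HTTPError as err:
            if err.code == 416:
                # Assumption: "range not satisfiable" means nothing is left to fetch.
                return True
            time.sleep(retry_delay)
        except OSError:
            # URLError and socket errors are OSError subclasses in Python 3.
            time.sleep(retry_delay)
    return False
```

The with block closes (and therefore flushes) the file before the next attempt measures its size, which is the part the loop in the question does not do. A fuller version would probably also compare the final size against Content-Length rather than trusting end of stream.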
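【Comments】:
-
There is already an (almost) drop-in urllib replacement that can resume downloads: urlgrabber.baseurl.org
-
Have you tried disabling and re-enabling the network with it? Does it automatically resume the download correctly?
-
I believe it is used internally by some Linux package managers, so it should be well tested. I used it successfully myself a long time ago. It even has settings for things like the number of retries.
Tags: python urllib2 resume-download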