Checking file exists before download using head: answers

【Question Title】: Checking file exists before download using head
【Posted】: 2013-12-21 15:22:42
【Question Description】:

I have a Python script that searches a page's source and downloads any files it finds in it.

However, the script also "downloads" files that don't actually exist (dead links).

I did some research and found that a HEAD request can get around this: it returns the status code without downloading the file itself.

Basically, I want to check whether the server returns 404. If it does, the file doesn't exist and I don't want to download it.
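A minimal sketch of the HEAD check described above, assuming Python 3, where the `httplib` module used in this thread is called `http.client`. The function name `head_status` is hypothetical. It sends a HEAD request, which asks for the status line and headers only, and returns the numeric status code so the caller can skip a 404.

```python
import http.client


def head_status(host, path, port=None, timeout=10):
    """Send a HEAD request for `path` on `host` and return the HTTP status code.

    HEAD asks the server for the same status line and headers a GET would
    produce, but without the response body, so the file is not downloaded.
    `head_status` is a hypothetical helper name, not a library function.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


# Hypothetical usage: skip the download when the server answers 404.
# status = head_status("google.com", "/file1.pdf")
# if status == 404:
#     print("File does not exist, not downloading")
```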

I found the following code, which looks like it should work, but it needs some changes before it fits into my script:

c = httplib.HTTPConnection(<hostname>)
c.request("HEAD", <url>)
print c.getresponse().status 

urllib.urlretrieve(test, get)

<hostname> should be the website (http://google.com), and <url> should be the file (/file1.pdf).

I need this code to work from just the full URL, i.e. http://google.com/file1.pdf.

Is there a way I can do this?
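One way to work from the full URL alone, sketched in Python 3 with the standard library's `urllib.parse.urlsplit`: split the URL into the host to connect to and the path to request. The helper name `split_for_head` is hypothetical. Note that `HTTPConnection` expects a bare host name such as `google.com`, not `http://google.com`.

```python
from urllib.parse import urlsplit


def split_for_head(url):
    """Split a full URL into (host, path) for http.client.HTTPConnection.

    `host` is the network location; it may include a port such as
    "example.com:8080", which HTTPConnection accepts. This sketch assumes
    the URL has no "user:password@" part. `path` keeps any query string
    and falls back to "/" when the URL has no path.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.netloc, path


print(split_for_head("http://google.com/file1.pdf"))  # → ('google.com', '/file1.pdf')
```

The two values can then be passed straight to `HTTPConnection(host)` and `request("HEAD", path)`.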

The code is taken from here: How do I check the HTTP status code of an object without downloading it?

【Question Discussion】:

Tags: python file-io download head

【Solution 1】:

The above didn't seem to work for me :(

I managed to solve it!

import urllib

# Fetch the URL and store its HTTP status code.
# Note: urlopen sends a full GET (not a HEAD), so the body is requested here too.
status = urllib.urlopen(test).getcode()
print status  # print the status, for testing purposes

# 200 (OK): the file exists, so download it
if status == 200:
    urllib.urlretrieve(test, get)  # download the file
    print 'The file:', doc, 'has been saved to:', get  # success message
elif status == 404:  # 404 (Not Found): the file does not exist
    print 'The file:', doc, 'could not be saved. Does not exist!!'
else:  # any other status: report it along with the code
    print 'Unknown Error:', status
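The answer above relies on Python 2's `urllib.urlopen` handing back a response object even for a 404. A hedged Python 3 sketch of the same status check follows: in Python 3, `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses, so the code has to be read from the exception. Passing `method="HEAD"` to `urllib.request.Request` avoids fetching the body. The function name `check_status` is hypothetical.

```python
import urllib.error
import urllib.request


def check_status(url, timeout=10):
    """Return the HTTP status code for `url` using a HEAD request.

    Python 3's urlopen raises HTTPError for error responses such as 404,
    so the status is taken from the exception in that case.
    `check_status` is a hypothetical helper name.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Used in place of `urllib.urlopen(test).getcode()`, the 200/404/else branches from the answer can stay as they are.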

【Discussion】:

Please tell me what error you got when running my code; I'd like to see why it didn't work :)

【Solution 2】:

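import httplib
import urlparse

url = "http://google.com/file1.pdf"
parts = urlparse.urlsplit(url)  # netloc = "google.com", path = "/file1.pdf"

c = httplib.HTTPConnection(parts.netloc)  # host only, without "http://"
c.request("HEAD", parts.path or "/")  # HEAD: status and headers, no body
if c.getresponse().status == 200:
    download(url)  # placeholder for your own download routine, e.g. urllib.urlretrieve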
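Putting the pieces together, here is a self-contained Python 3 sketch (not the answerer's code) that takes only the full URL, sends a HEAD request, and downloads with `urllib.request.urlretrieve` only when the server answers 200. The function name `download_if_exists` and its `dest` parameter are hypothetical.

```python
import http.client
import urllib.request
from urllib.parse import urlsplit


def download_if_exists(url, dest, timeout=10):
    """HEAD-check `url` and download it to `dest` only if the status is 200.

    Returns the status code from the HEAD request. `download_if_exists`
    is a hypothetical helper name. Assumes an http:// or https:// URL.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    # Pick the connection class that matches the URL scheme.
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection

    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", path)  # headers only, no body transferred
        status = conn.getresponse().status
    finally:
        conn.close()

    if status == 200:
        urllib.request.urlretrieve(url, dest)  # the actual GET download
    return status
```

For example, `download_if_exists("http://google.com/file1.pdf", "file1.pdf")` fetches the file only when the HEAD request returns 200. Some servers answer HEAD with a redirect (301/302) or reject the method (405), so a non-200 status does not always mean the file is missing.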

【讨论】：