Catching error 60 (timeout) with urllib2

【Question title】: Catching error 60 (timeout) with urllib2
【Posted】: 2012-10-28 09:13:48
【Question】:

I am trying to catch error 60 and carry on with my script. This is what I am doing at the moment:

import urllib2
import csv
from bs4 import BeautifulSoup


matcher = csv.reader(open('matcher.csv', "rb" ))

for i in matcher:
    url = i[1]
    if len(list(url)) > 0:
        print url
        try:
            soup = BeautifulSoup(urllib2.urlopen(url,timeout=10))   

        except urllib2.URLError, e:
            print ("There was an error: %r" % e)

It returns this:

Traceback (most recent call last):
  File "debug.py", line 13, in <module>
    soup = BeautifulSoup(urllib2.urlopen(url,timeout=10))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1207, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1180, in do_open
    r = h.getresponse(buffering=True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1030, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 447, in readline
    data = self._sock.recv(self._rbufsize)
socket.timeout: timed out

How can I catch this error and "continue"?

【Comments】:

Take a look at this

Tags: python

【Answer 1】:

You can import the socket module, which defines the timeout exception, and extend your except block:

import socket

try:
    soup = BeautifulSoup(urllib2.urlopen(url, timeout=10))
except urllib2.URLError as e:
    print("There was an error: %r" % e)
except socket.timeout as e:  # <-------- this block here
    print("We timed out")
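
As a minimal sketch of how these two except clauses let a loop carry on past a failing URL, the version below is written for Python 3, where urllib2 was split into urllib.request and urllib.error. The function name fetch_all and its injectable opener parameter are hypothetical, added only so the example can be exercised without a network connection.

```python
import socket
import urllib.error
import urllib.request


def fetch_all(urls, opener=urllib.request.urlopen, timeout=10):
    """Fetch each URL, skipping any that time out or fail.

    Returns (pages, skipped): pages maps url -> response body,
    skipped is a list of (url, reason) pairs.
    `opener` is a hypothetical hook so a fake can stand in for urlopen.
    """
    pages = {}
    skipped = []
    for url in urls:
        if not url:
            continue  # empty cell in the CSV, nothing to fetch
        try:
            with opener(url, timeout=timeout) as response:
                pages[url] = response.read()
        except socket.timeout:
            # Raised directly when the read stalls, as in the question's traceback.
            skipped.append((url, "timed out"))
        except urllib.error.URLError as e:
            # DNS failures, refused connections, HTTP errors, and so on.
            skipped.append((url, "error: %r" % (e.reason,)))
    return pages, skipped
```

Because both handlers only record the failure, the for loop moves on to the next URL without an explicit continue statement.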

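Update: Well, I learned something new. I just found a reference to the .reason attribute. When a timeout occurs while the connection is being set up, urllib2 wraps it in a URLError and keeps the original socket.timeout in e.reason. A timeout while reading the response, as in the traceback in the question, is not wrapped, so the socket.timeout clause is still needed for that case. The wrapped case can be tested like this: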

except urllib2.URLError as e:
    if isinstance(e.reason, socket.timeout):
        pass # ignore this one
    else:
        # do stuff re other errors if you can...
        raise # otherwise propagate the error
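
The two places where a timeout can surface (a bare socket.timeout, or one wrapped in the .reason of a URLError) can be folded into a single check. The helper below is a sketch for Python 3, where urllib2.URLError lives in urllib.error; the name is_timeout is hypothetical.

```python
import socket
import urllib.error


def is_timeout(exc):
    """Return True if exc is a timeout, raised directly or wrapped in URLError."""
    if isinstance(exc, socket.timeout):
        return True
    if isinstance(exc, urllib.error.URLError):
        # urllib wraps connection-phase socket errors and keeps the
        # original exception in .reason.
        return isinstance(exc.reason, socket.timeout)
    return False
```

Inside an except (socket.timeout, urllib.error.URLError) as e: block, calling is_timeout(e) and re-raising when it returns False gives the same "ignore timeouts, propagate everything else" behaviour for both cases.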

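【Answer 2】:

You can try except Exception as e: to catch every error. Keep in mind, though, that this really does catch everything; if you only want to handle specific errors, you should avoid it.

Edit: You can check the exception type as follows: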

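# Needs "import os" and "import sys" at the top of the script.
except Exception as e:
    # sys.exc_info() returns (type, value, traceback) for the active exception.
    exc_type, exc_obj, exc_tb = sys.exc_info()
    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
    print(exc_type, fname, exc_tb.tb_lineno)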
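
For comparison, here is a sketch of the same idea in Python 3, where the traceback is available directly on the exception as __traceback__ and the standard traceback module does the frame walking. The function name describe_exception is hypothetical.

```python
import os
import traceback


def describe_exception(exc):
    """Return (type name, file name, line number) for where exc was raised."""
    frames = traceback.extract_tb(exc.__traceback__)
    if not frames:
        # The exception was never raised, so there is no traceback.
        return type(exc).__name__, None, None
    last = frames[-1]  # innermost frame: where the exception originated
    return type(exc).__name__, os.path.basename(last.filename), last.lineno
```

Note that frames[-1] points at the innermost frame, which for a timeout would be inside the socket module. Using frames[0] instead gives the line in the handling frame, which is what exc_tb.tb_lineno reports in the snippet above.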