Skip URL if timeout: answers

【Question title】: Skip URL if timeout
【Posted】: 2011-12-25 17:37:33
【Question】:

I have a list of URLs.

I am using the following to retrieve their content:

for url in url_list:
    req = urllib2.Request(url)
    resp = urllib2.urlopen(req, timeout=5)
    resp_page = resp.read()
    print resp_page

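When a request times out, the program crashes. If there is a socket.timeout: timed out, I just want to move on and read the next URL. How can I do this?

Thanks

【Question discussion】:

For a similar question, see: stackoverflow.com/questions/2712524/…

Tags: python sockets exception-handling timeout urllib2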

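【Solution 1】:

Although this has already been answered, I would like to point out that urllib2 may not be the only thing responsible for this behavior.

As pointed out here (and as the question itself seems to suggest), the exception may belong to the socket library.

In that case, just add another except clause: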

import socket
import urllib2

try:
    resp = urllib2.urlopen(req, timeout=5)
except urllib2.URLError:
    print "Bad URL or timeout"
except socket.timeout:
    print "socket timeout"
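For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the two exception types above can be told apart with a small helper. This is only a sketch: is_timeout is a hypothetical name, not part of any library, and it relies on a timeout wrapped by URLError being exposed through its reason attribute.

```python
import socket
import urllib.error


def is_timeout(exc):
    """Return True if `exc` looks like a timeout from either layer.

    `is_timeout` is a hypothetical helper written for illustration.
    """
    # Timeout raised directly by the socket layer
    if isinstance(exc, socket.timeout):
        return True
    # Timeout that urllib wrapped in a URLError (kept in `.reason`)
    if isinstance(exc, urllib.error.URLError):
        return isinstance(exc.reason, socket.timeout)
    return False
```

A caller could catch both exception types in one except clause and use this helper only to choose the log message.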

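【Discussion】:

Downvoting without an explanation really isn't helpful, is it?

【Solution 2】:

I'll go ahead and assume that by "crashes" you mean "raises a URLError", as described in the urllib2.urlopen docs. See the Errors and Exceptions section of the Python tutorial.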

for url in url_list:
    req = urllib2.Request(url)
    try:
        resp = urllib2.urlopen(req, timeout=5)
    except urllib2.URLError:
        print "Bad URL or timeout"
        continue # skips to the next iteration of the loop
    resp_page = resp.read()
    print resp_page
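One thing the loop above leaves open is that a timeout can also fire while the body is being read, not only while the connection is opened. The following Python 3 sketch keeps both steps inside the try block and catches both exception types. The names fetch_all and opener are hypothetical; opener exists only so the function can be exercised without network access and defaults to urllib.request.urlopen.

```python
import socket
import urllib.error
import urllib.request


def fetch_all(url_list, timeout=5, opener=urllib.request.urlopen):
    """Return {url: body} for every URL that could be read in time.

    `fetch_all` and `opener` are hypothetical names for illustration.
    """
    pages = {}
    for url in url_list:
        try:
            resp = opener(url, timeout=timeout)
            try:
                pages[url] = resp.read()  # may also time out mid-body
            finally:
                resp.close()
        except (urllib.error.URLError, socket.timeout) as exc:
            print("Skipping %s: %s" % (url, exc))
            continue  # skips to the next iteration of the loop
    return pages
```

Returning a dictionary rather than printing each page is a design choice of this sketch, so that the caller can see which URLs were skipped.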

【Discussion】:

【Solution 3】:

It sounds like you just need to catch the timeout exception. I don't get the socket.timeout message that you do.

import urllib2

req = urllib2.Request("http://127.0.0.2")
try:
    resp = urllib2.urlopen(req, timeout=5)
except urllib2.URLError:
    print "Timeout!"

Obviously, you need a URL that will actually time out (127.0.0.2 may not do so on your box).

【Discussion】:
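If no conveniently slow URL is at hand, one way to reproduce a timeout is a local socket that listens but never answers. The Python 3 sketch below assumes the operating system completes the TCP handshake for a queued connection, so the client connects and then waits for a response that never arrives. make_silent_server and try_fetch are hypothetical names, and the empty ProxyHandler is there only so that proxy settings in the environment are ignored.

```python
import socket
import urllib.error
import urllib.request


def make_silent_server():
    """Listen on a free local port but never accept or reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    return srv, "http://127.0.0.1:%d/" % srv.getsockname()[1]


def try_fetch(url, timeout):
    """Return the body, or None if the URL fails or times out."""
    # Empty ProxyHandler: do not route the request through a proxy
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        resp = opener.open(url, timeout=timeout)
        try:
            return resp.read()
        finally:
            resp.close()
    except (urllib.error.URLError, socket.timeout):
        return None


if __name__ == "__main__":
    srv, url = make_silent_server()
    print(try_fetch(url, timeout=1))
    srv.close()
```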