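Are you importing the exception classes you want to catch?
You also need to set a timeout on session.get(), since requests does not apply one by default.
Which exception you get depends on what goes wrong, but if the URL is bad, session.get() raises an error before the page is ever rendered.
For example, here are the different errors you can catch: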
from requests_html import HTMLSession
from requests.exceptions import ConnectionError, InvalidSchema, ReadTimeout
from pyppeteer.errors import TimeoutError
session = HTMLSession()
links = [
    'https://www.google.com/',
    'h**ps://www.google.com/',
    'https://deelay.me/4000/https://www.google.com/',  # 4 s of delay to get the page
    'https://www.baaaadurl.com/',
    'https://www.youtube.com/',
    'https://www.google.com/',
]
for url in links:
    try:
        r = session.get(url, timeout=3)
        # 1 s render timeout: long enough for Google, too short for YouTube
        r.html.render(timeout=1)
        print(r.html.find('title', first=True).text, '\n')
    except InvalidSchema as e:
        # raised for 'h**ps://www.google.com/'
        print(f'For the url "{url}" the error is: {e} \n')
    except ReadTimeout as e:
        # raised because the 4 s delay exceeds the 3 s timeout for
        # 'https://deelay.me/4000/https://www.google.com/'
        print(f'For the url "{url}" the error is: {e} \n')
    except ConnectionError as e:
        # raised for 'https://www.baaaadurl.com/'
        print(f'For the url "{url}" the error is: {e} \n')
    except TimeoutError as e:
        # raised when rendering times out,
        # here for 'https://www.youtube.com/'
        print(f'For the url "{url}" the error is: {e} \n')
Output:
Google
For the url "h**ps://www.google.com/" the error is: No connection adapters were found for 'h**ps://www.google.com/'
For the url "https://deelay.me/4000/https://www.google.com/" the error is: HTTPSConnectionPool(host='deelay.me', port=443): Read timed out. (read timeout=3)
For the url "https://www.baaaadurl.com/" the error is: HTTPSConnectionPool(host='www.baaaadurl.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f2596ba6460>: Failed to establish a new connection: [Errno -2] Name or service not known'))
For the url "https://www.youtube.com/" the error is: Navigation Timeout Exceeded: 1000 ms exceeded.
Google
This way you can catch each error and carry on with the loop.
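
If you would rather collect the failures than print them inline, the same catch-and-continue pattern can be wrapped in a small helper. This is only a sketch under assumptions: fetch_all and fake_fetch are hypothetical names, and fake_fetch is a stand-in for session.get() plus r.html.render() so that the example runs without network access or Chromium. It also raises Python's built-in ValueError and ConnectionError, not the requests classes of similar names.

```python
from typing import Callable, Iterable, List, Tuple, Type


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    expected_errors: Tuple[Type[BaseException], ...],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, BaseException]]]:
    """Call fetch(url) for every URL; record failures instead of stopping."""
    results: List[Tuple[str, str]] = []
    failures: List[Tuple[str, BaseException]] = []
    for url in urls:
        try:
            results.append((url, fetch(url)))
        except expected_errors as e:
            # Remember the error and move on to the next URL.
            failures.append((url, e))
    return results, failures


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for the real request and render step."""
    if not url.startswith(('http://', 'https://')):
        raise ValueError(f'No connection adapters were found for {url!r}')
    if 'baaaadurl' in url:
        raise ConnectionError('Failed to establish a new connection')
    return 'Google'


results, failures = fetch_all(
    [
        'https://www.google.com/',
        'h**ps://www.google.com/',
        'https://www.baaaadurl.com/',
    ],
    fake_fetch,
    (ValueError, ConnectionError),
)
for url, error in failures:
    print(f'For the url "{url}" the error is: {error}')
```

With requests_html you would pass a function that does the real session.get() and render() calls, together with the tuple (InvalidSchema, ReadTimeout, ConnectionError, TimeoutError) built from the imports in the code above. Any exception not listed in the tuple still propagates, so unexpected bugs are not silently swallowed.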