Check if a Website Exists With Requests Isn't Working: Answers

【Question Title】: Check if a Website Exists With Requests Isn't Working
【Posted】: 2018-07-17 20:43:42
【Question Description】:

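So, I learned how web scraping works a few days ago, and today I was playing around with it. I wanted to know how to test whether a page exists or not. I looked it up and found Python check if website exists. I'm using the requests module, and I got this code from the answer: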

import requests
request = requests.get('http://www.example.com')
if request.status_code == 200:
    print('Web site exists')
else:
    print('Web site does not exist')

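I tried it, and since example.com exists, it prints "Web site exists". But when I tried a site I'm sure doesn't exist, such as examplewwwwwww.com, I got this error. Why does this happen, and how can I stop it from raising an error and have it print that the website doesn't exist instead?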

【Question Discussion】:

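As that page shows, it raises a ConnectionError: stackoverflow.com/questions/16778435/…
There is no server there to give you a status. Read the comments on the page you linked, and use something like try... except ConnectionError instead.
Some websites block you because they think it's a scraping attempt: your user agent and other characteristics reveal that you are not a real browser. This explains why some URLs that are rejected with a 404 actually do work in a browser.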
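The last comment suggests that some sites answer scripts differently from real browsers. A minimal sketch, assuming the site only looks at the User-Agent header (many sites check other signals too, so this is not guaranteed to help): send a browser-like User-Agent on every request through a `requests.Session`. The `BROWSER_UA` string and the `make_session`/`status_of` helpers are hypothetical names chosen for illustration.

```python
import requests

# Hypothetical browser-like User-Agent string; any realistic value could be used.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def make_session() -> requests.Session:
    """Return a Session that sends the browser-like User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": BROWSER_UA})
    return session


def status_of(url: str, timeout: float = 10.0) -> int:
    """Fetch url with the browser-like headers and return the HTTP status code."""
    with make_session() as session:
        return session.get(url, timeout=timeout).status_code
```

For example, `status_of('http://www.example.com')` returns whatever status the server sends when it believes it is talking to a browser. Setting the header on the session rather than on each call keeps every later request consistent.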

Tags: python web-scraping

【Answer 1】:

You have to wrap the requests.get call in try/except and handle the various exceptions that can occur, one of which is ConnectionError.

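You get this because a response whose status_code isn't 200 and a failure to connect to the HTTP address at all are two different things. In the first case a server answered, so you get a response object with a status code. In the second, no server answered, so requests raises an exception before there is any status code to check.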
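To make the distinction concrete, here is a minimal sketch that reports the two cases separately: `(False, None)` when no response arrived, and `(True, status_code)` when a server answered, whether with 200, 404 or anything else. The `probe` name and its return shape are illustrative assumptions, not part of requests.

```python
from typing import Optional, Tuple

import requests
from requests.exceptions import ConnectionError as RequestsConnectionError


def probe(url: str, timeout: float = 10.0) -> Tuple[bool, Optional[int]]:
    """Return (reachable, status_code) for url.

    reachable is False (and status_code None) when no HTTP response came back
    at all; otherwise status_code is whatever the server sent, 200 or not.
    """
    try:
        response = requests.get(url, timeout=timeout)
    except RequestsConnectionError:
        # Unknown host, refused connection, etc.: there is no status code to read.
        # Other errors (e.g. MissingSchema, ReadTimeout) are deliberately not caught here.
        return False, None
    return True, response.status_code
```

With this shape, `examplewwwwwww.com` (given a scheme) comes back as unreachable, while a real site's missing page comes back as reachable with 404.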

Here is the list of exceptions you may run into when making requests with the requests library.
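A minimal sketch of handling several of those exceptions in one place. The `check_site` helper and its messages are hypothetical; the exception classes are real members of `requests.exceptions`. Order matters: `ConnectTimeout` subclasses both `ConnectionError` and `Timeout`, and `RequestException` is the base of all of them, so it goes last.

```python
import requests
from requests.exceptions import (
    ConnectionError as RequestsConnectionError,  # alias avoids shadowing the builtin
    HTTPError,
    InvalidSchema,
    MissingSchema,
    RequestException,
    Timeout,
)


def check_site(url: str, timeout: float = 10.0) -> str:
    """Describe what happened when requesting url (a sketch, not an exhaustive list)."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turns 4xx/5xx responses into HTTPError
    except MissingSchema:
        return "invalid URL: no scheme such as http://"
    except InvalidSchema:
        return "invalid URL: unsupported scheme"
    except Timeout:
        # Checked before ConnectionError because ConnectTimeout subclasses both.
        return "no answer within the timeout"
    except RequestsConnectionError:
        return "could not connect (unknown host, refused connection, ...)"
    except HTTPError as exc:
        return f"server answered with status {exc.response.status_code}"
    except RequestException as exc:
        # Base class for everything requests raises, e.g. TooManyRedirects.
        return f"other requests error: {type(exc).__name__}"
    return "site exists"
```

For instance, `check_site('examplewwwwwww.com')` fails before any network traffic because the URL has no scheme, while `check_site('http://examplewwwwwww.com')` reaches the DNS lookup and reports a connection failure.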

【Discussion】:

【Answer 2】:

You can use try/except like this:

import requests
from requests.exceptions import ConnectionError

try:
    request = requests.get('http://www.example.com')
except ConnectionError:
    print('Web site does not exist')
else:
    print('Web site exists')

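【Discussion】:

【Answer 3】:

Well, you get the error because the URL you are requesting is invalid (it has no scheme such as http://), but you can easily check for that with a try/except block like this: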

import requests
from requests.exceptions import MissingSchema

try:
    request = requests.get('examplewwwwwww.com')
except MissingSchema:
    print('The provided URL is invalid.')
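The block above only reports the problem. A minimal sketch of the other option, assuming plain `http://` is an acceptable default: prepend a scheme when the user omitted one. `ensure_scheme` and `site_exists` are hypothetical helper names. Note that once the scheme is added, a domain that does not exist, such as `http://examplewwwwwww.com`, no longer raises `MissingSchema`; it raises `ConnectionError`, which the broader `RequestException` handler also catches.

```python
import requests


def ensure_scheme(url: str, default_scheme: str = "http") -> str:
    """Prefix url with a scheme when it has none, e.g. 'example.com' -> 'http://example.com'."""
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


def site_exists(url: str, timeout: float = 10.0) -> bool:
    """True if the scheme-fixed URL answers with HTTP 200."""
    try:
        response = requests.get(ensure_scheme(url), timeout=timeout)
    except requests.exceptions.RequestException:
        # Covers URL errors as well as ConnectionError for unknown hosts.
        return False
    return response.status_code == 200
```

The `"://"` check is deliberately simple; it will not catch every malformed URL, but it turns the common `examplewwwwwww.com` case into a real request instead of an immediate `MissingSchema`.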

【Discussion】:

【Answer 4】:

Just listing how I do it; maybe it's useful to someone:

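import requests

ready = 0
for attempt in range(3):  # the original snippet ran inside a loop; 3 attempts is an arbitrary choice
    try:
        response = requests.get('https://github.com', timeout=10)  # timeout keeps a dead host from hanging
        if response.ok:  # True for any status code below 400
            ready = 1
            break
    except requests.exceptions.RequestException:
        print("Website not available...")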
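The loop above retries by hand. As an alternative (a swapped-in technique, not what this answer uses), requests can delegate retries to urllib3's `Retry` class through `HTTPAdapter(max_retries=...)`. A minimal sketch; the helper names, the retry count, the backoff factor and the list of retried status codes are illustrative assumptions.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total: int = 3, backoff: float = 0.5) -> requests.Session:
    """Return a Session that retries failed connections and some 5xx answers."""
    retry = Retry(
        total=total,  # overall cap on retries
        backoff_factor=backoff,  # sleep grows exponentially between attempts
        status_forcelist=[500, 502, 503, 504],  # also retry on these statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def is_available(url: str, timeout: float = 10.0) -> bool:
    """True if url eventually answers with a status code below 400."""
    with make_retrying_session() as session:
        try:
            return session.get(url, timeout=timeout).ok
        except requests.exceptions.RequestException:
            # Includes ConnectionError and RetryError once retries are exhausted.
            return False
```

Keeping the retry policy in the adapter means every request made through the session gets the same behavior, instead of repeating the loop at each call site.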

【Discussion】: