Unable to scrape a website using BeautifulSoup (answers)

【Question title】: Unable to scrape a Website using beautifulSoup
【Posted】: 2018-05-24 04:50:06
【Question】:

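I am trying to scrape a page with Beautiful Soup (bs4), but the request fails before I get any data. I even set the headers suggested in an answer to another Stack Overflow question. Here is my code: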

from bs4 import BeautifulSoup
import requests

headers = {
    'Referer': 'hello',
}
# The call and its arguments must be on the same statement
r = requests.get('https://www.doamin.com/bangalore/restaurants', headers=headers)
print(r.status_code)

This is the error I get:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

And also this:

 raise RemoteDisconnected("Remote end closed connection without"
 http.client.RemoteDisconnected: Remote end closed connection without response

I even tried setting a User-Agent:

import requests

url = 'https://www.example.com/bangalore/restaurants'
# Adjacent string literals are joined, so the User-Agent stays one string
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'
}
response = requests.get(url, headers=headers)
print(response.content)

But I still get the same error!

Can anyone help me?

【Comments】:

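It looks like the server is aborting your request. You may need to add some extra headers, such as User-Agent. Also, please don't include the domain you are trying to scrape.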

Tags: python web-scraping beautifulsoup python-requests

【Solution 1】:

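My guess is that the server checks the User-Agent string more thoroughly, by comparing it against a list of valid Chrome versions (if you claim to be Chrome in the user agent). The version you specified (41.0.2228) is not listed in the Chrome version history. Try 41.0.2272 instead, for example: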

import requests

url = 'https://www.example.com/bangalore/restaurants'
# Same request as in the question, with a Chrome version that appears in the version history
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/41.0.2272.0 Safari/537.36'
}
response = requests.get(url, headers=headers)
print(response.content)
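
To experiment with this idea, the Chrome version can be made a parameter so that several User-Agent strings are easy to compare. This is only a sketch: `chrome_user_agent` and `try_versions` are hypothetical helper names that are not part of the answer, and whether a given server really validates Chrome versions is this answer's guess rather than something confirmed here.

```python
def chrome_user_agent(version):
    """Build a Chrome-style User-Agent string for the given version."""
    return (
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
        f"(KHTML, like Gecko) Chrome/{version} Safari/537.36"
    )


def try_versions(url, versions, timeout=10):
    """Request url once per Chrome version.

    Returns a dict mapping each version to the HTTP status code,
    or to the exception class name if the request failed.
    """
    # Third-party dependency, imported lazily so that
    # chrome_user_agent can be used without it.
    import requests

    results = {}
    for version in versions:
        headers = {"User-Agent": chrome_user_agent(version)}
        try:
            response = requests.get(url, headers=headers, timeout=timeout)
            results[version] = response.status_code
        except requests.exceptions.RequestException as exc:
            # Covers ConnectionError (as in the question's traceback)
            # and other request failures such as timeouts.
            results[version] = type(exc).__name__
    return results


# Example (performs real network requests):
# print(try_versions("https://www.example.com/bangalore/restaurants",
#                    ["41.0.2228.0", "41.0.2272.0"]))
```

If every version fails in the same way, the User-Agent version is probably not what the server is checking.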

【Comments】:

【Solution 2】:

Zomato (like many other data-aggregation sites) has most likely taken measures to block scrapers and data miners. Just use their API: https://developers.zomato.com/api

【Comments】:

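Can't I do this by scraping instead?
As @itzmeontv said, you need the right User-Agent and other header information to navigate any page, so that your script mimics a real human visitor. After loading the page in question, you can find your own values in Chrome DevTools or Firefox DevTools.
I tried using both the headers and the User-Agent, but I still get the same error.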
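
The suggestion in the comments above, copying the request headers your own browser sends, can be sketched as follows. `parse_raw_headers` is a hypothetical helper, and the sample values in `RAW` are placeholders rather than headers known to work for any particular site. It assumes the headers were pasted from DevTools as `Name: value` lines.

```python
def parse_raw_headers(raw):
    """Turn 'Name: value' lines copied from browser DevTools into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority'.
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only; values (e.g. URLs) may contain colons.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


# Placeholder values; replace with the headers shown for your own browser.
RAW = """
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.0 Safari/537.36
Accept: text/html,application/xhtml+xml
Accept-Language: en-US,en;q=0.9
"""

headers = parse_raw_headers(RAW)

# The dict can then be passed to requests, for example:
# requests.get(url, headers=headers, timeout=10)
```

As the last comment shows, matching headers may still not be enough if the site blocks automated clients by other means.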