
【Question Title】: 403 Forbidden error: can't access this site
【Posted】: 2020-01-07 03:22:32
【Question】:

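I want to scrape https://health.usnews.com/doctors/specialists-index, but when I send a request to the site from a Scrapy spider, the response comes back with status code 403. I added a user_agent to my request, but that didn't help either.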

I looked at these two answers, "Python Doesn't Have Permission To Access On This Server / Return City/State from ZIP" and "403: You don't have permission to access /index.php on this server", but neither worked for me.

My user_agent is Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36. Can someone help me scrape this site?

【Question Discussion】:

Tags: python python-3.x scrapy

【Solution 1】:

Try adding an "authority" header to the request headers. The following worked for me in scrapy shell:

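# Run inside `scrapy shell`: fetch() is a shell helper, not an importable function.
from scrapy import Request

headers = {
    'authority': 'health.usnews.com',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
}
url = "https://health.usnews.com/doctors/specialists-index"
req = Request(url, headers=headers)
fetch(req)  # afterwards the shell's `response` variable holds the result; check response.status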

【Discussion】: