How to bypass captcha when extracting data from a website. I am extracting from https://jp.indeed.com/

【Question title】: How to bypass captcha when extracting data from a website. I am extracting from https://jp.indeed.com/
【Posted】: 2021-04-30 06:31:37
【Question description】:

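When I extract locally there is no problem, but when I run the same code on my production site, a captcha screen appears before the data can be extracted. I am using Heroku and Django.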

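 import requests
 from bs4 import BeautifulSoup

 # url holds the page address to fetch (its value is not shown in the question)
 header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0"}
 r = requests.get(url, headers=header)
 soup = BeautifulSoup(r.content, 'html.parser')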

But when I print the soup variable, I can see that it contains a form asking me to solve a captcha. How can I bypass the captcha?

【Comments】:

Uh... no? Quoting Indeed's ToS: "Use of any automated system or software, whether operated by a third party or otherwise, to extract data from the Site (such as screen scraping or crawling) is prohibited." If you want to violate it, why should we help you?

Tags: python heroku beautifulsoup python-requests user-agent

【Solution 1】:

There is no way to bypass a captcha. That is exactly what a captcha is for: to stop you from automating anything on the web page.
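
Since the captcha cannot be bypassed, the most a script can reasonably do is notice that it was served the captcha page and stop, rather than parse that page as if it were search results. The following is only a minimal sketch of such a check. It assumes the captcha page contains a `<form>` whose tags mention "captcha" in an attribute; the real markup is not shown in the question, so that marker is a guess. The names `CaptchaFormFinder` and `looks_like_captcha` are hypothetical, and the standard-library parser is used only to keep the sketch free of dependencies.

```python
from html.parser import HTMLParser


class CaptchaFormFinder(HTMLParser):
    """Sets `found` if a <form>, or a tag inside one, mentions 'captcha'.

    The 'captcha' marker is an assumption; the actual page markup
    is not shown in the question.
    """

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        if self.in_form:
            for name, value in attrs:
                # Check both the attribute name and its value.
                if "captcha" in name.lower() or "captcha" in (value or "").lower():
                    self.found = True

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def looks_like_captcha(html: str) -> bool:
    """Heuristic check: does this HTML look like a captcha challenge page?"""
    finder = CaptchaFormFinder()
    finder.feed(html)
    return finder.found
```

With the code from the question, this could be called as `looks_like_captcha(r.text)` before building the soup, so that the script stops and reports the challenge instead of trying to extract data from it.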

【Discussion】: