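【Posted】:2018-11-07 11:10:28
【Question】:
I tried to access a site with Selenium (using geckodriver), and it says I am blocked, even though I can access it manually in the Firefox browser. So I compared my fingerprint components, and the only difference is that when I use Selenium, "webdriver" is set to "true" in the Navigator object. I tried running this code: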
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
firefox_binary = '/usr/bin/firefox'
options = Options()
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
caps = DesiredCapabilities().FIREFOX
# caps["pageLoadStrategy"] = "normal" # complete
caps["pageLoadStrategy"] = "eager" # interactive
injected_javascript=("Object.defineProperty(navigator, 'webdriver', { value: 'false' })")
driver = webdriver.Firefox(executable_path=r'/home/kkkk/ggecko/geckodriver', firefox_binary=firefox_binary)
driver.get('https://auth.citromail.hu/regisztracio/')
driver.execute_async_script(injected_javascript)
But it just loaded the page with "webdriver" still set to "true", and then returned this message:
Traceback (most recent call last):
  File "/home/kkkk/driverr.py", line 14, in <module>
    driver.execute_async_script(injected_javascript)
  File "/home/kkkk/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 652, in execute_async_script
    'args': converted_args})['value']
  File "/home/kkkk/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 314, in execute
    self.error_handler.check_response(response)
  File "/home/kkkk/.local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Timed out
Am I doing something wrong, or is there another way to do this?
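A likely cause of the TimeoutException in the traceback: `execute_async_script` waits until the injected script calls the callback that Selenium passes as its last argument, and the script in the question never calls it. The sketch below shows a synchronous variant and an asynchronous variant that does call the callback. The helper names are hypothetical, and `driver` is assumed to be an already-created WebDriver. Note also that `'false'` in the question's snippet is a non-empty string, which is truthy in JavaScript, so the sketch uses the boolean `false`. Whether an override applied after the page has loaded changes what the site sees is not guaranteed, because the page's own scripts may already have read the property.

```python
# Hypothetical helpers for illustration; `driver` is any object exposing
# Selenium's execute_script / execute_async_script methods.

OVERRIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', "
    "{ get: () => false, configurable: true });"
)


def override_webdriver_sync(driver):
    """Apply the override with execute_script, which returns as soon as the script ends."""
    driver.execute_script(OVERRIDE_WEBDRIVER_JS)
    # Read the value back so the caller can check whether the override took effect.
    return driver.execute_script("return navigator.webdriver;")


def override_webdriver_async(driver):
    """Apply the override with execute_async_script and signal completion explicitly."""
    script = (
        OVERRIDE_WEBDRIVER_JS
        # Selenium appends a completion callback as the last argument;
        # the call only returns once this callback has been invoked.
        + " var done = arguments[arguments.length - 1];"
        + " done(navigator.webdriver);"
    )
    return driver.execute_async_script(script)
```

With the driver from the question, this would be called as `override_webdriver_sync(driver)` right after `driver.get(...)`.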
【Comments】:
- The script needs to be injected before the page loads. Try this: intoli.com/blog/making-chrome-headless-undetectable
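The linked article targets headless Chrome. For Firefox, a different technique from script injection is to switch the flag off through a browser preference before the session starts, so the page never sees it set. The sketch below assumes the `dom.webdriver.enabled` preference, which Firefox releases from around the time of this question honoured; newer releases may ignore it. The helper names are hypothetical, and `build_driver` assumes geckodriver is on the PATH.

```python
def disable_webdriver_flag(options):
    """Set the Firefox preference that is assumed to control navigator.webdriver."""
    # Assumption: the running Firefox version still honours this preference.
    options.set_preference("dom.webdriver.enabled", False)
    return options


def build_driver():
    """Create a Firefox session with the preference applied before any page loads."""
    # Imported here so disable_webdriver_flag can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = disable_webdriver_flag(Options())
    return webdriver.Firefox(options=options)
```

After `driver = build_driver()`, running `driver.execute_script("return navigator.webdriver;")` is a quick way to check whether the preference had any effect on the installed Firefox.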
Tags: javascript python selenium web-scraping