Javascript not loading for Selenium tests

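【Question title】: Javascript not loading for Selenium tests
【Posted】: 2020-10-16 18:20:54
【Question description】:

I'm trying to use Selenium to extract some data for one of my pet projects. I've successfully loaded a few pages and retrieved their data, but this site stops loading every time I test it. Things I've tried: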

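Using geckodriver with headless and non-headless (headed??) versions of Firefox
Using chromedriver with headless and non-headless versions of Chrome
Checking that pip3 and Selenium are both on the latest stable versions
Opening Chrome with a user agent profile
Opening Chrome with a random user agent profile (from the random_user_agent library)
Hardcoding a wait of up to 30 seconds (time.sleep)
Loading the page with requests (in hindsight this was silly if I'm looking for javascript - didn't work)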

The URL

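My theory is that they are blocking Selenium somehow, maybe with this? But I have no way to test it. The problem does not occur when the page is loaded without a Selenium browser instance (i.e. in a regular browser). My code is below: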

from selenium import webdriver

# requirements to wait until specific part of page is open
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--lang=en_US")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
options.add_argument("disable-infobars")

browser = webdriver.Chrome(options=options)        
delay = 5
browser.get("https://shop.coles.com.au/a/alexandria/product/nutella-spread-chocolate-hazelnut-2620684p")
# this is where the page is not loading & therefore throwing ElementNotFound exception

try:
    price_dollars = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.CLASS_NAME, "price-dollars")))
    price_cents = browser.find_element(By.CLASS_NAME, "price-cents")
    
    # converts strings into floats with decimals (to one place only)
    fl_price_dollars = float(price_dollars.text)
    fl_price_cents = float(price_cents.text)
    fl_price_concat = fl_price_dollars + fl_price_cents*10**-2
    print(type(fl_price_concat)) # check this is a float type not string
    print(fl_price_concat)
except TimeoutException:
    print("Timeout1")
except NoSuchElementException:
    print("Element not found")
finally:
    # quit in all cases, including unexpected exceptions, or browser processes will continue to run
    browser.quit()
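
The dollars-and-cents conversion inside the try block can be pulled out into a pure function, which makes it checkable without a browser. This is only a minimal sketch: it assumes the two elements hold plain digit strings such as "3" and "50", the name `combine_price` is hypothetical, and `Decimal` is swapped in for float to avoid binary rounding surprises.

```python
from decimal import Decimal


def combine_price(dollars_text: str, cents_text: str) -> Decimal:
    """Combine separate dollars and cents strings into one price.

    Assumes both strings contain only digits plus optional surrounding
    whitespace (an assumption about the page, not something confirmed here).
    Hypothetical helper, not part of Selenium.
    """
    dollars = Decimal(dollars_text.strip())
    cents = Decimal(cents_text.strip())
    # cents are hundredths of a dollar
    return dollars + cents / Decimal(100)
```

In the script above it would be called as `combine_price(price_dollars.text, price_cents.text)`.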

Page source that loads when I open the page with the Selenium browser instance:


<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <link rel="shortcut icon" href="about:blank">
</head>
<body>
<script src="/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/j.js"></script>
<script src="/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/f.js"></script>
<script src="/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint/script/kpf.js?url=/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint&token=c2e6cd9a-e76e-cd51-288d-f604aea52023"></script>
</body>
</html>
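
The source above has no visible content, only script tags, one of which points at a fingerprint path. A script could use that shape to tell a blocked response apart from a slow page load. The following is a rough heuristic sketch using only the standard library; the function name `looks_like_block_page` and the "/fingerprint/" substring check are assumptions based on this one sample, not a documented signature of the site.

```python
from html.parser import HTMLParser


class _PageSummary(HTMLParser):
    """Collect <script src=...> values and any visible text inside <body>."""

    def __init__(self):
        super().__init__()
        self.script_srcs = []
        self.body_text = []
        self._in_body = False
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag == "script":
            self._in_script = True
            src = dict(attrs).get("src")
            if src:
                self.script_srcs.append(src)

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Ignore script bodies and whitespace between tags.
        if self._in_body and not self._in_script and data.strip():
            self.body_text.append(data.strip())


def looks_like_block_page(html: str) -> bool:
    """Heuristic: a fingerprint script is present and the body has no visible text."""
    summary = _PageSummary()
    summary.feed(html)
    has_fingerprint = any("/fingerprint/" in src for src in summary.script_srcs)
    return has_fingerprint and not summary.body_text
```

With a live driver, the check would be `looks_like_block_page(browser.page_source)` right after `browser.get(...)`, before waiting on the price elements.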

Edit: the following answer worked for me (June 2020)

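【Comments】:

It looks like they are using FingerPrint2 to block you, and there seem to be other WAF mechanisms in place even with JS disabled. They are actively blocking you from scraping their site.
@Lucan Thank you. How did you determine that it is FingerPrint2? And do you know how to get around it?
It's in their source, which is easier to spot when you look at the blocked page. I tried the basic ways to get around it, as you did yourself (UA, proxy, options), but had no success.
Looks like this road is longer than expected... thanks for the heads-up, much appreciated!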

Tags: python python-3.x selenium selenium-chromedriver

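【Solution 1】:

Try adding this argument: options.add_argument("--disable-blink-features=AutomationControlled")

The key is to make 'navigator.webdriver' return undefined. If Chrome is controlled by WebDriver (which Selenium uses), it returns 'true'.

If you add this argument, the javascript call navigator.webdriver (you can test it in the dev tools console) will return 'undefined', the same as when you run regular Chrome.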
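
To check the effect from Python rather than from the dev tools console, the flag can be applied and `navigator.webdriver` read back through `execute_script`. This is a minimal sketch, not a guaranteed bypass: the helper names `chrome_arguments` and `read_navigator_webdriver` are hypothetical, a working chromedriver is assumed, and the value reported may differ between Chrome versions.

```python
AUTOMATION_FLAG = "--disable-blink-features=AutomationControlled"


def chrome_arguments():
    """Return the Chrome command-line arguments used in this thread."""
    return ["--lang=en_US", AUTOMATION_FLAG]


def read_navigator_webdriver(url="about:blank"):
    """Launch Chrome with the arguments above and return navigator.webdriver.

    Requires the selenium package and a matching chromedriver. They are
    imported here so that chrome_arguments() stays usable without them.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in chrome_arguments():
        options.add_argument(argument)

    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)
        # A JavaScript undefined or null comes back to Python as None.
        return browser.execute_script("return navigator.webdriver")
    finally:
        browser.quit()  # always release the browser process
```

Calling `read_navigator_webdriver()` once with the flag and once with it removed from `chrome_arguments()` shows whether the flag changes what the page can see.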

【Comments】:

This worked perfectly in my case. Thank you very much @matteo84!