Web scraping using Python and Selenium [duplicate]

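【Question title】: Web scraping using Python and Selenium [duplicate]
【Posted】: 2017-11-17 08:51:50
【Question description】:

I am using the following code to submit a form with Python. When the entered value is correct, the site redirects to a new page at http://localhost/a/my.php. How can I check from Python whether the page was redirected, so that I know the entered value was correct?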

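from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

webpage = "http://localhost/a/"
# Selenium 4 style: the driver path goes through Service (raw string keeps the backslashes literal)
driver = webdriver.Chrome(service=Service(r"C:\chromedriver_win32\chromedriver.exe"))
for i in range(10):
    searchterm = str(i)  # edit me
    driver.get(webpage)
    # find_element(By.CLASS_NAME, ...) replaces the older find_element_by_class_name
    sbox = driver.find_element(By.CLASS_NAME, "txtSearch")
    sbox.send_keys(searchterm)

    submit = driver.find_element(By.CLASS_NAME, "sbtSearch")
    submit.click()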

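【Question discussion】:

You can use driver.current_url to confirm that it is the address you are looking for. If that does not work, you may have to wait for the page to load; just add a time.sleep(x) or something similar.

Tags: python selenium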

【Solution 1】:

Look for an element that exists only after the new DOM has loaded. If you can find it, you are on the new page.

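from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

try:
    # "txtSearch" is the search box on the form page; use a class that exists only on the new page
    driver.find_element(By.CLASS_NAME, "txtSearch")
    print("redirected to new page")
except NoSuchElementException:
    print("oops, no redirect happened")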

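【Discussion】:

If navigating to the new page fails, el = driver.find_element_by_class_name("txtSearch") should give you a NoSuchElementException.
Downvoted?? May I know why?
I have clarified it. Your first approach was not efficient, because the script would break with an exception if the new page was not opened.
Just edited to catch NoSuchElementException instead of using is_displayed().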

【Solution 2】:

Try using current_url:

driver.current_url
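
As a rough sketch of how this check could fit the loop in the question, the URL can be read after the click and compared with the redirect target. The helper name is_redirected and the EXPECTED_URL constant are illustrative and not part of the original post; the address is the one given in the question. The URL is read only once here, so if the browser has not finished navigating yet the result may be False.

```python
EXPECTED_URL = "http://localhost/a/my.php"  # redirect target named in the question


def is_redirected(driver, expected_url=EXPECTED_URL):
    """Return True if the browser is currently on expected_url.

    Hypothetical helper: it works with any object that exposes a
    `current_url` attribute, such as a Selenium WebDriver instance.
    """
    return driver.current_url == expected_url
```

Calling is_redirected(driver) right after submit.click() inside the loop would indicate whether the value i was accepted.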

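【Discussion】:

【Solution 3】:

To check whether the page redirected correctly, use WebDriverWait (also known as an "explicit wait") with an appropriate expected_conditions clause, set to one of the following:

Python:

url_to_be: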

WebDriverWait(driver, 10).until(EC.url_to_be("https://www.google.co.in/"))

url_matches:

WebDriverWait(driver, 10).until(EC.url_matches("https://www.google.co.in/"))

url_contains:

WebDriverWait(driver, 10).until(EC.url_contains("google"))

url_changes:

WebDriverWait(driver, 10).until(EC.url_changes("https://www.google.co.in/"))

【Discussion】: