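webdriver failing to update web address

【Question title】: webdriver failing to update web address
【Posted】: 2021-01-03 15:55:34
【Question description】:

I am trying to collect the addresses of several links shown on a website, using the following code: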

from selenium import webdriver
import time
from bs4 import BeautifulSoup
import urllib.request

driver = webdriver.Chrome(executable_path='C:/Users/seongwoo/Desktop/USHL data scraping/chromedriver.exe')

url = ("https://www.ushl.com/view#/schedule")
driver.get(url)

driver.find_element_by_xpath("//select[@ng-model ='selectedSeason']/option[@label='2018-19']").click()
time.sleep(3)
driver.find_element_by_xpath("//select[@ng-model ='selectedTeam']/option[@label='Youngstown Phantoms']").click()
time.sleep(3)
driver.find_element_by_xpath("//select[@ng-model ='selectedMonth']/option[@Value='12']").click()
time.sleep(3)
driver.find_element_by_xpath("//a[@ng-click=\"location='home';\"]").click()
time.sleep(3)
driver.find_element_by_xpath('//a[@class="ht-btn-submit ng-binding"]').click()
time.sleep(10)

window_before = driver.window_handles[0]    # store the main (parent) window's handle

buttons = driver.find_elements_by_class_name('ht-table-game-report')   #use this instead of 'by_xpath'

for button_index in range(len(buttons)):
    time.sleep(3)
    
    buttons[button_index].click()    ##this is where you decide which of the reports to click on 

    #after clicking the link store the window handle of newly opened window as
    window_after = driver.window_handles[1] 

    #then execute the switch to window method to move to newly opened window

    driver.switch_to.window(window_after)     
    current_URL = driver.current_url            # this does not seem to update the address

    print(current_URL)

    webUrl  = urllib.request.urlopen(current_URL)

    driver.switch_to.window(window_before)

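I would expect

driver.switch_to.window(window_after)
current_URL = driver.current_url

to update the address after each link is clicked.

I would appreciate it if anyone could point out why current_URL stays stuck on the first address it picks up and never updates after that.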

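【Question discussion】:

Tags: python selenium web-scraping webdriver

【Solution 1】:

Because window_after = driver.window_handles[1] always refers to the first page you opened. To update the window and move to the most recently opened page, use button_index and define window_after as window_after = driver.window_handles[button_index+1]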

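【Discussion】:

That is incorrect: once you open a window, driver.window_handles[0] is the first page, not [1]. Also, button_index does not represent a window index.
If you want the URL of the page opened by clicking a button, you can use the button's index in the list to move to the page that the button opened. With window_after = driver.window_handles[button_index+1], the output will be the list of URLs of the pages the driver opened.
You are assuming the OP does not close the previous window and simply clicks each link. If the window index keeps increasing, you are better off using [-1].
I will remove the downvote, but the statement that the first page opened has index 1 is incorrect. The first page opened with driver.get() is always index 0. Thanks for fixing the index in your answer.
The order of window handles is not guaranteed, though, right? github.com/w3c/webdriver/issues/386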

【Solution 2】:

Every iteration opens a new window, which increases the number of open windows, so driver.window_handles[1] will always be the second window that was opened.
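
The effect can be reproduced without a browser. The sketch below is only a simulation: a plain Python list stands in for driver.window_handles, the handle names are made up, and it assumes new handles are appended in the order the windows open (which a real driver may not guarantee).

```python
# Simulated handle list; "main" and "report-N" are hypothetical names.
handles = ["main"]
seen_by_index_1 = []
seen_by_last = []

for i in range(3):
    # Each click opens one more window and nothing is closed.
    handles.append(f"report-{i}")
    seen_by_index_1.append(handles[1])   # what the question's code selects
    seen_by_last.append(handles[-1])     # the most recently added handle

print(seen_by_index_1)  # ['report-0', 'report-0', 'report-0']
print(seen_by_last)     # ['report-0', 'report-1', 'report-2']
```

Index 1 keeps pointing at the first report window, which matches the symptom described in the question.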

A quick fix is to use [-1], which always picks the most recently opened window:

driver.window_handles[-1]

To return to the main window:

driver.switch_to.window(driver.window_handles[0])

You can also close the last opened window before switching back to the original one:

driver.close()
driver.switch_to.window(driver.window_handles[0])
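
Put together, the question's loop might look roughly like the sketch below. The function name collect_report_urls is made up for illustration, and the code assumes that the new window is already open when click() returns and that it is the last entry in window_handles; neither is guaranteed in general. It stores the main window's handle up front instead of relying on index 0.

```python
def collect_report_urls(driver, buttons):
    """Click each button, record the URL of the window it opens, then close it.

    `driver` is expected to behave like a Selenium WebDriver and `buttons`
    like a list of clickable elements (hypothetical helper, not a Selenium API).
    """
    main_window = driver.current_window_handle
    urls = []
    for button in buttons:
        button.click()
        # Assumption: the newly opened window is the last handle in the list.
        driver.switch_to.window(driver.window_handles[-1])
        urls.append(driver.current_url)
        driver.close()                         # close the report window
        driver.switch_to.window(main_window)   # go back to the schedule page
    return urls
```

Because every report window is closed before the next click, at most two windows are open at a time, so the list never grows beyond the main window plus one.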

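【Discussion】:

I don't think the window order is guaranteed: github.com/w3c/webdriver/issues/386
Wow, very interesting. That also means [0] cannot be relied on as the default either. The thread is old, but we should assume this is still the case.
Right. I'm not sure whether this applies only to Java or to Python as well, but the order should not be assumed, so you need to compare the lists.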

【Solution 3】:

The order of window handles cannot be assumed. A good code snippet is provided here: https://www.selenium.dev/documentation/en/webdriver/browser_manipulation/

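from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Start a browser session (Chrome here; any supported browser works)
driver = webdriver.Chrome()

# Open URL
driver.get("https://seleniumhq.github.io")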

# Setup wait for later
wait = WebDriverWait(driver, 10)

# Store the ID of the original window
original_window = driver.current_window_handle

# Check we don't have other windows open already
assert len(driver.window_handles) == 1

# Click the link which opens in a new window
driver.find_element(By.LINK_TEXT, "new window").click()

# Wait for the new window or tab
wait.until(EC.number_of_windows_to_be(2))

# Loop through until we find a new window handle
for window_handle in driver.window_handles:
    if window_handle != original_window:
        driver.switch_to.window(window_handle)
        break

# Wait for the new tab to finish loading content
wait.until(EC.title_is("SeleniumHQ Browser Automation"))

Basically, you need to make sure you have selected the right window rather than just assuming it.
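
When more than two windows may be open, the same idea can be expressed by comparing the handles before and after the click. This is a minimal sketch, not part of the Selenium documentation: new_window_handle and click_and_switch are hypothetical helper names, and the polling loop is a plain-Python stand-in for an explicit wait.

```python
import time


def new_window_handle(handles_before, handles_after):
    """Return the one handle present after the click but not before it."""
    new_handles = set(handles_after) - set(handles_before)
    if len(new_handles) != 1:
        raise RuntimeError(
            f"expected exactly one new window, found {len(new_handles)}"
        )
    return new_handles.pop()


def click_and_switch(driver, element, timeout=10.0, poll=0.2):
    """Click `element`, wait for a new window, switch to it, return its handle."""
    before = list(driver.window_handles)   # snapshot before the click
    element.click()
    deadline = time.monotonic() + timeout
    while True:
        after = driver.window_handles
        if len(after) > len(before):
            break
        if time.monotonic() > deadline:
            raise TimeoutError("no new window appeared")
        time.sleep(poll)
    handle = new_window_handle(before, after)
    driver.switch_to.window(handle)
    return handle
```

Because the comparison uses sets, the result does not depend on where the driver places the new handle in the list. With real Selenium, the polling loop could likely be replaced by WebDriverWait together with EC.new_window_is_opened(before).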

【Discussion】: