Iterating with Selenium through pages: answers

【Question title】: Iterating with selenium through pages
【Posted】: 2021-08-19 19:33:40
【Question description】:

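This is my first web scrape, and the other solutions I have found have not helped much. As you will see, the "Next" button is still visible on the last page, but its CSS changes once you get there.

A few notes: I am using Python, Selenium, and Google Chrome.

I am trying to iterate through every part of the table on this page: https://caearlyvoting.sos.ca.gov/

I have figured out how to iterate through each county and grab the information I need (I think). However, when a table has more than the 10 records shown by default, I am stuck on how to move to the next page.

I have tried variations of this: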

from selenium.common.exceptions import NoSuchElementException

try:
    next_page = driver.find_element_by_class_name('paginate_button')
    next_page.click()
except NoSuchElementException:
    pass

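But no luck. I have tried locating the element in other ways and ran into the same problem.

Can anyone help me figure out how to click through each page, grab what I need, and then move on to the next county? I do not need help getting the information out of the table, only with clicking through the pages and then moving to the next county.

Edit: here is the rest of the code, based on the follow-up. I am having trouble structuring it.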

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
import pandas as pd
import time # not for production

# County names: a CSV with a single 'County' column
county_df = pd.read_csv('Counties.csv')

# Path to driver on this computer
chrome_driver_path = r'C:\Windows\chromedriver'

# url to scrape
url = 'https://caearlyvoting.sos.ca.gov/'

with webdriver.Chrome(executable_path=chrome_driver_path) as driver:
    # Open window, maximize and set an implicit wait
    driver.get(url)
    driver.maximize_window()
    driver.implicitly_wait(10)
    actions = ActionChains(driver) #* New line here from stackoverflow
    # find the county selection
    county_selector = driver.find_element_by_id('CountyID')
    # loop through the counties
    for county in county_df['County'][:5]:
        # Enter the county name
        county_selector.send_keys(county)
        ### Code to grab data goes here
        
        ########* Code from stackoverflow ########
        while True:
            next_page = driver.find_element_by_css_selector(".paginate_button.next")
            next_bnt_classes = next_page.get_attribute("class")
            if "disabled" in next_bnt_classes:
                break  #last page reached, no more next pages, break the loop
            else:
                actions.move_to_element(next_page).perform()
                time.sleep(0.5)
                #get the actual next page button and click it
                driver.find_element_by_css_selector(".paginate_button.next a").click()

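【Question discussion】:

Tags: python web-scraping selenium-chromedriver

【Solution 1】:

You are using the wrong locator: the class paginate_button on its own does not single out the next-page button.
Also, the next-page button may sit below the visible part of the page, so you have to scroll to the element before you can click it.
On the last page, the next-page button is disabled.
In that case, it carries the disabled class name.
So your code can be: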

import time

from selenium.webdriver.common.action_chains import ActionChains

actions = ActionChains(driver)

while True:
    #grab the data from current page, after that:
    next_page = driver.find_element_by_css_selector(".paginate_button.next")
    next_bnt_classes = next_page.get_attribute("class")
    if "disabled" in next_bnt_classes:
        break  #last page reached, no more next pages, break the loop
    else:
        next_page = driver.find_element_by_css_selector(".paginate_button.next")
        actions.move_to_element(next_page).perform()
        time.sleep(0.5)
        #get the actual next page button and click it
        driver.find_element_by_css_selector(".paginate_button.next a").click()
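
To see why the locator matters, here is a minimal sketch that uses only the standard library. The markup in SAMPLE_PAGER is an assumption, modelled on a typical DataTables pager, and the real page may differ; ClassCollector is a hypothetical helper written for this illustration. A lookup by class name alone returns the first match in document order, which in markup like this is the "Previous" button, while requiring both classes (as ".paginate_button.next" does) leaves only the "Next" button.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on a typical DataTables pager.
# This is an assumption; the real page may be structured differently.
SAMPLE_PAGER = """
<ul class="pagination">
  <li class="paginate_button previous disabled"><a href="#">Previous</a></li>
  <li class="paginate_button active"><a href="#">1</a></li>
  <li class="paginate_button"><a href="#">2</a></li>
  <li class="paginate_button next"><a href="#">Next</a></li>
</ul>
"""


class ClassCollector(HTMLParser):
    """Record the class tokens of every element that carries a given class."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.matches = []

    def handle_starttag(self, tag, attrs):
        tokens = (dict(attrs).get("class") or "").split()
        if self.wanted in tokens:
            self.matches.append(tokens)


collector = ClassCollector("paginate_button")
collector.feed(SAMPLE_PAGER)

# A class-name lookup returns the first match in document order.
first_match = collector.matches[0]
# Requiring both classes, like ".paginate_button.next", leaves one element.
next_only = [m for m in collector.matches if "next" in m]

print(first_match)  # ['paginate_button', 'previous', 'disabled']
print(next_only)    # [['paginate_button', 'next']]
```

If the page does look like this, the original attempt would have found an element (so no NoSuchElementException), clicked a disabled "Previous" button, and silently done nothing.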

UPD
The working code turned out slightly different:

from selenium.webdriver.common.action_chains import ActionChains

actions = ActionChains(driver)

while True:
    #grab the data from current page, after that:
    next_page = driver.find_element_by_css_selector(".paginate_button.next")
    next_bnt_classes = next_page.get_attribute("class")
    if next_bnt_classes == 'paginate_button next disabled':
        break  #last page reached, no more next pages, break the loop
    else:
        # Move to the next page for the county and append the data
        next_page.click()
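
The two snippets above test for the disabled state differently: the first with a substring check, the second with an exact string comparison. Both can work, but the substring check would also match a class name that merely contains "disabled", and the exact comparison breaks if the classes appear in a different order. A possible middle ground is to compare whitespace-separated class tokens. The helper has_class below is hypothetical (it is not part of Selenium), and the class strings in the examples are made up for illustration.

```python
def has_class(class_attr, name):
    """Return True if `name` is one of the whitespace-separated class tokens.

    `class_attr` is the string returned by element.get_attribute("class");
    None (attribute missing) is treated as "no classes".
    """
    return name in (class_attr or "").split()


# Last page: the token is present.
print(has_class("paginate_button next disabled", "disabled"))  # True
# Any other page: the token is absent.
print(has_class("paginate_button next", "disabled"))           # False
# Token order does not matter, unlike an exact string comparison.
print(has_class("paginate_button disabled next", "disabled"))  # True
# A made-up class that only contains the word is not a match,
# although a plain substring test would report one.
print(has_class("paginate_button next not-disabled", "disabled"))  # False
```

In the loop, this would be used as: if has_class(next_page.get_attribute("class"), "disabled"): break.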

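【Discussion】:

Thanks for the help. I am having a bit of trouble with how this code should be structured. I have pasted more of my code above. Could you help me make sure I have things in the right order? Sorry, Python is not my primary language yet.
As far as I can see, it looks fine. You should actually run it to see whether it works correctly; I cannot do that myself. By the way, I code in Java, not Python. I only use Python here, answering on Stack Overflow :)
I am getting this error: selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document (Session info: chrome=92.0.4515.159). Thanks for your help!
I believe the problem is in if "disabled" in next_bnt_classes: because in the debugger next_page is
I do not think that line would cause a stale element exception...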
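
The thread does not establish which line raised the exception. In general, a stale element reference means that an element object was located before the page (or the table) was re-rendered and then used afterwards. One common way to guard against that is to re-locate the button on every attempt and retry a few times. The sketch below is one possible shape for that, not a confirmed fix: walk_pages, find_next_button, and scrape_page are hypothetical names, and the try/except around the import is only there so the sketch can run without Selenium installed.

```python
import time

try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Fallback so the sketch can be run without Selenium installed.
    class StaleElementReferenceException(Exception):
        pass


def walk_pages(find_next_button, scrape_page, retries=3, pause=0.5):
    """Scrape every page of a paginated table and return the page count.

    find_next_button: hypothetical callable that locates the "next" button
        afresh each time it is called, for example
        lambda: driver.find_element_by_css_selector(".paginate_button.next")
    scrape_page: hypothetical callable that reads the rows currently shown.
    """
    pages = 0
    while True:
        scrape_page()
        pages += 1
        for _ in range(retries):
            try:
                button = find_next_button()  # re-locate on every attempt
                classes = (button.get_attribute("class") or "").split()
                if "disabled" in classes:
                    return pages  # last page reached
                button.click()
                break  # moved on to the next page
            except StaleElementReferenceException:
                time.sleep(pause)  # give the page time to settle, then retry
        else:
            raise RuntimeError("next button stayed stale after all retries")
```

The same idea may apply to the county selector in the question: if choosing a county re-renders the page, looking up 'CountyID' inside the for loop, rather than once before it, avoids holding an old reference.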