【Question title】: I am getting this error while extracting text from elements. Message: stale element reference: element is not attached to the page document
【Posted】: 2021-02-19 03:14:51
【Question description】:

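I am trying to use Selenium to scrape Amazon product prices across multiple result pages. I can get all the elements for the product names and product prices, but Selenium raises an error when I extract the text from them.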

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

Here is my code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from openpyxl import Workbook
import time

driver = webdriver.Chrome(r'C:\Users\varun\OneDrive\Documents\python projects\chromedriver.exe')
url = 'https://www.amazon.in/'
driver.get(url)
driver.find_element(By.XPATH, "//input[@id='twotabsearchtextbox']").send_keys("oppo mobile")
driver.find_element(By.XPATH, "//input[@value='Go']").click()
brand = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//span[text() = 'Oppo']")))
brand.click()
ele = driver.find_element(By.XPATH, "//ul[@class='a-pagination']/li[6]")

url_list = []
products_list = []
prices_list = []


for page in range(int(ele.text)):
    page_ = page+1
    url_list.append(driver.current_url)
    prod_name_list = driver.find_elements(By.XPATH, "//span[@class='a-size-medium a-color-base a-text-normal']")
    prod_prices_list = driver.find_elements(By.XPATH, "//span[@class='a-price-whole']")
    driver.implicitly_wait(4)
    products_list = products_list + prod_name_list
    prices_list = prices_list + prod_prices_list
    try:
        driver.find_element(By.XPATH, "//li[@class='a-last']").click()
        print("page " + str(page_) + " is grabbed.")
        print(driver.current_url)
    except NoSuchElementException:
        print("All pages are collected!")
    time.sleep(5)

print("---------------------------------------------------")
print(products_list)
print("---------------------------------------------------")
print(prices_list)

product_name = []
prices = []

for product in products_list:
    product_name.append(product.text)
for price in prices_list:
    prices.append(price.text)


print(product_name)
print(prices)

The error is raised on these lines:

for product in products_list:
    product_name.append(product.text)
for price in prices_list:
    prices.append(price.text)

I tried slowing the scraping down by adding an implicit wait, but the error still appears. Please help me resolve this error. Thanks!

【Comments】:

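  • You should append the text in the earlier for loop. If you indent your two for loops so that they run inside the previous loop, it will work.
  • Yes, it worked! Thanks! @ArundeepChohan
  • driver.implicitly_wait(4) is something you only set once.
  • Sorry? I didn't follow you. @ArundeepChohan
  • It is something you don't need to set inside the loop; you can move it out.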
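
The fix suggested in the comments can be sketched as follows: read `.text` while the elements still belong to the page that is currently loaded, and keep plain strings rather than WebElement objects, since element references go stale once the browser moves to the next results page. This is only a sketch under assumptions: the function name `scrape_pages` and the `pause` parameter are hypothetical, the XPaths are copied from the question, and `driver` is expected to be a Selenium WebDriver already sitting on the first results page.

```python
import time

XPATH = "xpath"  # same string value as selenium's By.XPATH

# Locators copied from the question.
NAME_XPATH = "//span[@class='a-size-medium a-color-base a-text-normal']"
PRICE_XPATH = "//span[@class='a-price-whole']"
NEXT_XPATH = "//li[@class='a-last']"


def scrape_pages(driver, max_pages, pause=5.0):
    """Collect product names and prices as plain strings, page by page.

    Hypothetical helper: `driver` is any object exposing Selenium's
    find_elements(by, value) method.
    """
    names, prices = [], []
    for _ in range(max_pages):
        # Read .text now, while the elements are attached to this page.
        names += [el.text for el in driver.find_elements(XPATH, NAME_XPATH)]
        prices += [el.text for el in driver.find_elements(XPATH, PRICE_XPATH)]

        # find_elements returns an empty list instead of raising.
        next_buttons = driver.find_elements(XPATH, NEXT_XPATH)
        if not next_buttons:
            break  # no "next" link, so this was the last page
        next_buttons[0].click()
        time.sleep(pause)  # crude wait for the next page, as in the question
    return names, prices
```

With the question's setup this would be called as `names, prices = scrape_pages(driver, int(ele.text))`. An implicit wait, if wanted, would be set once before the call rather than inside the loop.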

Tags: python selenium selenium-webdriver web-scraping xpath


【Solution 1】:

I suggest you don't rely on Selenium's waits. Convert this Java method to Python:

public static Boolean isVisibleInViewport(WebElement element) {
          WebDriver driver = ((RemoteWebElement)element).getWrappedDriver();

          return (Boolean)((JavascriptExecutor)driver).executeScript(
              "var elem = arguments[0],                 " +
              "  box = elem.getBoundingClientRect(),    " +
              "  cx = box.left + box.width / 2,         " +
              "  cy = box.top + box.height / 2,         " +
              "  e = document.elementFromPoint(cx, cy); " +
              "for (; e; e = e.parentElement) {         " +
              "  if (e === elem)                        " +
              "    return true;                         " +
              "}                                        " +
              "return false;                            "
              , element);
        }

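(Only three lines of code need to be edited.) This JS function simply checks whether the element is actually visible to the user in the viewport. I use it all the time, and it is one of the best functions to use inside loops. No more tedious waits.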
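
A possible Python version of the Java method above might look like the sketch below. The JavaScript body is unchanged; only the wrapper differs. The function name `is_visible_in_viewport` and the choice to pass `driver` explicitly (instead of recovering it from the element, as the Java code does) are assumptions made for this sketch, and `driver` is expected to be a Selenium WebDriver, whose `execute_script(script, *args)` passes the element as `arguments[0]`.

```python
# JavaScript copied from the Java answer: find the element at the centre of
# the target's bounding box, then walk up its ancestors looking for the target.
VISIBLE_IN_VIEWPORT_JS = """
var elem = arguments[0],
    box = elem.getBoundingClientRect(),
    cx = box.left + box.width / 2,
    cy = box.top + box.height / 2,
    e = document.elementFromPoint(cx, cy);
for (; e; e = e.parentElement) {
  if (e === elem)
    return true;
}
return false;
"""


def is_visible_in_viewport(driver, element):
    """Return True if `element` is the topmost element at its own centre point.

    Hypothetical helper: `driver` is any object exposing Selenium's
    execute_script(script, *args) method.
    """
    return bool(driver.execute_script(VISIBLE_IN_VIEWPORT_JS, element))
```

A call would look like `is_visible_in_viewport(driver, driver.find_element(By.XPATH, "//li[@class='a-last']"))`. The check only reports visibility; the element reference passed in still has to come from the page that is currently loaded.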

【Discussion】:
