Answers: empty element when using selenium webdriver

【Question title】: Empty element when using selenium webdriver
【Posted】: 2021-04-05 09:07:13
【Question description】:

I am trying to locate the following element using selenium webdriver:

<div class="lv-product__details"><div class="lv-product__details-head"><span class="lv-product__details-sku">
            M40712
          </span> <div class="lv-product-add-to-wishlist"><button aria-label="Add to Wishlist" aria-disabled="false" tabindex="0" class="lv-icon-button lv-product-add-to-wishlist__button"><svg focusable="false" aria-hidden="true" class="lv-icon"><use xlink:href="/_nuxt/icons.svg#sprite-navigation-wishlist-off"></use></svg></button></div></div> <h1 class="lv-product__title">
          Pochette Accessoires
        </h1> <div class="lv-product-variations"><button class="lv-product-variation-selector list-label-l lv-product-variations__selector" aria-expanded="false"><span class="lv-product-variation-selector__title -text-is-medium">
    Material

I tried:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = "https://en.louisvuitton.com/eng-nl/products/pochette-accessoires-monogram-005656"

options = Options()
options.headless = True
driver = webdriver.Chrome('path/to/chromedriver', chrome_options=options)
driver.get(url)

elem = driver.find_element_by_class_name("lv-product__details")

or via XPath:

elem = driver.find_element_by_xpath('//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div[1]/div[2]')

But elem is returned as an empty list. What am I doing wrong, or what could I do differently to access the site's content?

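【Comments】:

Does it work without headless? If it does, check page_source: the site may be detecting your bot, in which case you would have to change the options to fix it, for example by adding a user agent.
@ArundeepChohan You are right: as soon as I set headless to True, elem comes back as an empty string. With False, however, the element can be retrieved.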
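
The comments above point at headless detection. One quick diagnostic, sketched here, is to inspect the user agent the browser reports: headless Chrome's default user agent carries the token `HeadlessChrome`, which a server can filter on. The helper name `looks_headless` and the sample strings are illustrative only and are not part of Selenium.

```python
def looks_headless(user_agent: str) -> bool:
    """Return True if the user agent string advertises headless Chrome."""
    return "headlesschrome" in user_agent.lower()


# Sample strings; the version numbers are illustrative.
headless_ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) HeadlessChrome/89.0.4389.114 Safari/537.36")
regular_ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36")

print(looks_headless(headless_ua))  # True
print(looks_headless(regular_ua))   # False
```

With a live driver, the string to test can be read with `driver.execute_script("return navigator.userAgent")`. If the check returns True, overriding the user agent is worth trying.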

Tags: python html selenium web-scraping

【Solution 1】:

I think your XPath is malformed.

Try this:

driver.find_element_by_xpath('/div/div[2]/div[2]/div[1]/div[1]/div[2]/div[3]')

Or this:

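# A class-name locator accepts a single class only, so the two classes are combined in a CSS selector
driver.find_element_by_css_selector(".lv-icon-button.lv-product-add-to-wishlist__button")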

And try importing time:

import time

time.sleep(3) # To make sure everything loads before selenium starts to locate the element

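【Comments】:

I usually also use driver.implicitly_wait(10), so try that too.
Thanks for the answer. The problem turned out to be headless mode: when headless is set to False, the element can be retrieved.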
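
A fixed `time.sleep(3)` either waits too long or not long enough. As a rough, library-agnostic sketch of the alternative, the helper below polls a lookup until it yields something or a timeout expires. The name `wait_for` and the fake `responses` iterator are assumptions made for illustration; they are not Selenium APIs.

```python
import time


def wait_for(find, timeout=10.0, poll=0.5):
    """Call find() until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} seconds")
        time.sleep(poll)


# Stand-in for a page that renders late: two empty lookups, then a hit.
responses = iter([[], [], ["<lv-product__details element>"]])
found = wait_for(lambda: next(responses), timeout=5, poll=0.01)
print(found)
```

With the Selenium 3 API used in this thread, the lookup could be `lambda: driver.find_elements_by_class_name("lv-product__details")`, since `find_elements_*` returns an empty list instead of raising. Selenium also ships this pattern as `WebDriverWait` combined with `expected_conditions.presence_of_element_located`, which is probably the better choice in real code.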

【Solution 2】:

from fake_useragent import UserAgent
ua = UserAgent()
user_agent = ua.random
print(user_agent)
options.add_argument(f'user-agent={user_agent}')

All you have to do is add a user agent.

M40712
POCHETTE ACCESSOIRES
Material
Monogram Canvas
currently selected
600,00€
Always the epitome of iconic style, this interpretation of the Pochette Accessoires can accommodate a Zippy Wallet. In Monogram canvas, it easily carries all the daily necessities.
Detailed features
23.5 x 13.5 x 4 cm
(Length x Height x Width)
Natural cowhide leather trimmings
Zipper closure
Golden color metallic pieces
See More
PRODUCT CARE

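【Comments】:

Hi, the fake user agent helped, but an empty element was still returned. I found that it finally worked once I added the following options in addition to the fake user agent: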
options.add_argument("--window-size=1920,1080")
options.add_argument('--ignore-certificate-errors')
options.add_argument('--allow-running-insecure-content')
options.add_argument("--disable-extensions")
options.add_argument("--proxy-server='direct://'")
options.add_argument("--proxy-bypass-list=*")
options.add_argument("--start-maximized")
options.add_argument('--disable-gpu')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--no-sandbox')
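
To keep this configuration in one place, the flags from the comment above can be gathered into a small helper, sketched below. The function names `headless_chrome_args` and `build_options` are hypothetical. The thread does not establish which of these flags is actually responsible for the fix, so the list is copied as reported. The `--headless` flag stands in for `options.headless = True`, and the user agent is passed as `--user-agent=...`.

```python
def headless_chrome_args(user_agent=None):
    """Return the Chrome flags reported to work in the thread, as a list."""
    args = [
        "--headless",
        "--window-size=1920,1080",
        "--ignore-certificate-errors",
        "--allow-running-insecure-content",
        "--disable-extensions",
        "--proxy-server='direct://'",
        "--proxy-bypass-list=*",
        "--start-maximized",
        "--disable-gpu",
        "--disable-dev-shm-usage",
        "--no-sandbox",
    ]
    if user_agent:
        # Overrides the default headless user agent.
        args.append(f"--user-agent={user_agent}")
    return args


def build_options(user_agent=None):
    """Build a Selenium Options object from the flag list (requires selenium)."""
    # Imported lazily so the flag list can be inspected without selenium installed.
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in headless_chrome_args(user_agent):
        options.add_argument(arg)
    return options


args = headless_chrome_args("Mozilla/5.0 (example)")
print(len(args))  # 12
```

The object returned by `build_options(ua.random)` could then be handed to `webdriver.Chrome` in place of the `options` built by hand in the question.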