Why does Selenium return duplicate elements? Answers

【Question title】: Why does Selenium give duplicate elements?
【Posted】: 2021-07-12 14:12:49
【Question description】:

URL = https://www.amazon.com/gp/bestsellers/beauty/ref=zg_bs_nav_0

products=driver.find_elements_by_xpath('//div[@class="a-section a-spacing-none aok-relative"]')

for pro in products:
    _rank=str(pro.find_element_by_xpath('//span[@class="zg-badge-text"]').text).replace("#", "")
    _link=pro.find_element_by_xpath('//div[@class="a-section a-spacing-none aok-relative"]/span/a[@class="a-link-normal"]').get_attribute('href')

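I get the same product values 50 times, and I don't understand how that is possible. I need the data for every product on the page. Why does Selenium return only the first element, 50 times over?

I am using the latest versions of ChromeDriver and Selenium.

My output: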

[['https://www.amazon.com/essence-Princess-Effect-Mascara-Cruelty/dp/B00T0C9XRK/ref=zg_bs_beauty_1/137-1053715-3426412?_encoding=UTF8&psc=1&refRID=GVS76499NHPKKRPTDZTR', '1', 'Beauty & Personal Care'],
['https://www.amazon.com/essence-Princess-Effect-Mascara-Cruelty/dp/B00T0C9XRK/ref=zg_bs_beauty_1/137-1053715-3426412?_encoding=UTF8&psc=1&refRID=GVS76499NHPKKRPTDZTR', '1', 'Beauty & Personal Care'],
['https://www.amazon.com/essence-Princess-Effect-Mascara-Cruelty/dp/B00T0C9XRK/ref=zg_bs_beauty_1/137-1053715-3426412?_encoding=UTF8&psc=1&refRID=GVS76499NHPKKRPTDZTR', '1', 'Beauty & Personal Care']]

【Comments】:

Please provide minimal code that reproduces the output.
I need the product link and the rank, but I am getting duplicate values.

Tags: python python-3.x selenium selenium-webdriver web-scraping

【Solution 1】:

You should use this:

products = driver.find_elements_by_xpath('//div[@class="a-section a-spacing-none aok-relative"]')

for pro in products:
    # The leading "." makes each XPath relative to the current product element.
    _rank = str(pro.find_element_by_xpath('.//span[@class="zg-badge-text"]').text).replace("#", "")
    _link = pro.find_element_by_xpath(".//span[@class='aok-inline-block zg-item']/a[@class='a-link-normal']").get_attribute('href')

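When you search with the locator //span[@class="zg-badge-text"], the leading // anchors the search at the document root, so it returns the first element on the whole page that matches, even when called on pro.
If you use .//span[@class="zg-badge-text"] instead, the leading dot makes the search relative, so it returns the first matching element inside the parent element pro.
See here for a more detailed explanation.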
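
The difference can be reproduced without a browser. The sketch below is only an analogy: it uses Python's standard xml.etree.ElementTree on a made-up, heavily simplified fragment (the "item" class and the /dp/... links are hypothetical, not Amazon's real markup). Searching from the document root inside the loop stands in for a Selenium locator that starts with //, and searching from the current item stands in for one that starts with .//.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the best-sellers markup.
PAGE = """<html><body>
  <div class="item"><span class="zg-badge-text">#1</span><a href="/dp/A">A</a></div>
  <div class="item"><span class="zg-badge-text">#2</span><a href="/dp/B">B</a></div>
  <div class="item"><span class="zg-badge-text">#3</span><a href="/dp/C">C</a></div>
</body></html>"""

root = ET.fromstring(PAGE)
items = root.findall(".//div[@class='item']")

# Searching from the document root on every iteration mimics a leading "//":
# the first match on the whole page comes back each time.
from_root = [root.find(".//span[@class='zg-badge-text']").text for _ in items]

# Searching from the current item mimics a leading ".//":
# only that item's descendants are considered.
from_item = [item.find(".//span[@class='zg-badge-text']").text for item in items]

print(from_root)  # ['#1', '#1', '#1']
print(from_item)  # ['#1', '#2', '#3']
```

The first list shows the symptom from the question (the same rank repeated for every product), and the second shows the result once the search is scoped to each product.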

【Comments】:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//div[@class="a-section a-spacing-none aok-relative"]/span/a[@class="a-link-normal"]"}
Well, initially I just reused your locators... but I have now updated them, so it should work.

【Solution 2】:

If I had to get the product names, say, I would use the XPath below:

//li[@class='zg-item-immersion']/descendant::div[contains(@class, 'sc-truncated')]

In code it would look like this:

from selenium.webdriver.common.by import By

for names in driver.find_elements(By.XPATH, "//li[@class='zg-item-immersion']/descendant::div[contains(@class, 'sc-truncated')]"):
    print(names.get_attribute('innerHTML'))

PS: your find_elements call is also missing a ].

Update 1:

You can use the XPath below to get the anchor tag.

//li[@class='zg-item-immersion']/descendant::div[contains(@class, 'sc-truncated')]/..
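
The trailing /.. steps from each matched div up to its parent, which is where the href lives. As a rough, browser-free illustration, the sketch below applies the same idea with xml.etree.ElementTree to a hypothetical fragment (the markup and /dp/... links are invented). ElementTree does not support contains(), so an exact class match is assumed here instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each product name <div> sits directly inside its <a>.
LISTING = """<ol>
  <li class="zg-item-immersion"><a href="/dp/A"><div class="sc-truncated">Name A</div></a></li>
  <li class="zg-item-immersion"><a href="/dp/B"><div class="sc-truncated">Name B</div></a></li>
</ol>"""

root = ET.fromstring(LISTING)

# ".." selects the parent of every matched <div>, i.e. the enclosing <a>.
anchors = root.findall(".//div[@class='sc-truncated']/..")
hrefs = [a.get("href") for a in anchors]

print(hrefs)  # ['/dp/A', '/dp/B']
```

In Selenium the equivalent last step would be get_attribute('href') on each element returned for the XPath above.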

【Comments】:

I have to return the link and the rank of each product.
[[url1,Rank1],[url2,rank2]] - example
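
One way to build that [[url, rank], ...] list is sketched below, reusing the relative locators from Solution 1. This is an untested sketch against the live page: the class names are taken from the answers above and may no longer match Amazon's markup, and collect_links_and_ranks is a hypothetical helper name. It uses the find_element(by, value) form seen in Solution 2, since newer Selenium releases dropped the find_element_by_* helpers; the string "xpath" is the value of By.XPATH.

```python
XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH

# Locators copied from the answers above; they may be outdated for the live site.
ITEM = '//div[@class="a-section a-spacing-none aok-relative"]'
RANK = './/span[@class="zg-badge-text"]'
LINK = ".//span[@class='aok-inline-block zg-item']/a[@class='a-link-normal']"


def collect_links_and_ranks(driver):
    """Return [[url, rank], ...] for every product container found by ITEM."""
    rows = []
    for pro in driver.find_elements(XPATH, ITEM):
        # Both lookups start with ".//", so they stay inside this product.
        rank = pro.find_element(XPATH, RANK).text.replace("#", "")
        link = pro.find_element(XPATH, LINK).get_attribute("href")
        rows.append([link, rank])
    return rows
```

With a real driver it would be called as collect_links_and_ranks(driver) after driver.get(...) has loaded the best-sellers page.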