【Question title】: Cannot click on SVG element within loop and grab all content
【Posted】: 2022-01-05 08:44:25
【Question description】:

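I am trying to scrape a job website for information about its job titles. I also want to go to the next page, extract the data, and keep going until no more pages are available. However, when I try to click the next-page control, which is an svg tag, I get the following error: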

ElementClickInterceptedException: Message: element click intercepted: Element <path d="M5.408.153a.588.588 0 00-.098.755l.059.076L13.566 10 5.37 19.016a.588.588 0 00-.025.761l.065.07c.216.197.54.202.761.026l.07-.066 8.27-9.096c.337-.372.363-.925.077-1.324l-.078-.097L6.24.193a.588.588 0 00-.832-.04z" fill-rule="evenodd"></path> is not clickable at point (1357, 686). Other element would receive the click: <section id="explicit_consent" class="prompt-container">...</section>
  (Session info: chrome=96.0.4664.110)
Stacktrace:
0   chromedriver                        0x000000010edfa269 __gxx_personality_v0 + 582729
1   chromedriver                        0x000000010ed85c33 __gxx_personality_v0 + 106003
2   chromedriver                        0x000000010e942e28 chromedriver + 171560
3   chromedriver                        0x000000010e97f681 chromedriver + 419457
4   chromedriver                        0x000000010e97d33e chromedriver + 410430
....
....

This is the script I am using:

from selenium import webdriver
import time
import pandas as pd
from collections import defaultdict
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

url1 = {'Accounting_and_Finance': ['https://www.jobsite.co.uk/jobs/Degree-Accounting-and-Finance'],
             'Aeronautical_Engineering': ['https://www.jobsite.co.uk/jobs/Degree-Aeronautical-Engineering']}


driver = webdriver.Chrome()
driver.implicitly_wait(10)
wait = WebDriverWait(driver, 10)
driver.maximize_window()
test_data = defaultdict(list)
for k, html in url1.items():
    for stuff_in in html:
        time.sleep(5)
        driver.get(stuff_in)
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.row.job-results-row")))
        soup = BeautifulSoup(driver.page_source, 'lxml')
        
        for match in soup.find('div', {'class':'ResultsSectionContainer-sc-gdhf14-0 kteggz'}):
            data = match.select('article h2[class="sc-fzoJMP gRGXcO"]')
            #test_data['job_title'].append(data.text.strip())
            print(data)
            points = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//*[name()='path' and contains(@d,'M5.408.153')]")))
            for point in points:
                point.click()
            time.sleep(1)

Update: after adding the code from @Arundeep, I have the following:

driver = webdriver.Chrome()
driver.implicitly_wait(5)
wait = WebDriverWait(driver, 5)
driver.maximize_window()
test_data = defaultdict(list)
for k, html in url1.items():
    for stuff_in in html:
        time.sleep(5)
        driver.get(stuff_in)
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.row.job-results-row")))
        soup = BeautifulSoup(driver.page_source, 'lxml')
        
        while True:
            for match in soup.find('div', {'class':'ResultsSectionContainer-sc-gdhf14-0 kteggz'}):
                for m in range(1, 26):
                    data = match.select(f'body > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child({m}) > article:nth-child(1) > div:nth-child(3) > dl:nth-child(5) > span:nth-child(1)')
                    #test_data['job_title'].append(data.text.strip())
                    print(data)
                    wait=WebDriverWait(driver,60)
                    try:
                        wait.until(EC.element_to_be_clickable((By.XPATH,"//a[@data-at='pagination-next'][not(@disabled)]"))).click()
                    except:
                        break

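I had to change the CSS selector to make it more generic, because the class names change on every page. However, I cannot seem to get the text, because doing so keeps raising an error. Without grabbing the text, I get the following output: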

[<span class="sc-AxjAm hCtIkK sc-fznAgC fSfCSi" fill="#3a434f"><svg viewbox="0 0 16 16"><path d="M15.52 5.06a.48.48 0 01.472.394L16 5.54v7.04a1.12 1.12 0 01-.998 1.113l-.122.007H3.36a.48.48 0 01-.086-.952l.086-.008h11.52a.16.16 0 00.152-.11l.008-.05V5.54a.48.48 0 01.48-.48zm-1.28-1.28a.48.48 0 01.472.394l.008.086v7.04a1.12 1.12 0 01-.998 1.113l-.122.007H2.08a.48.48 0 01-.086-.952l.086-.008H13.6a.16.16 0 00.152-.11l.008-.05V4.26a.48.48 0 01.48-.48zM11.683 2.5c.795 0 1.44.645 1.44 1.44v5.484a1.44 1.44 0 01-1.44 1.44H1.44A1.44 1.44 0 010 9.424V3.94C0 3.145.645 2.5 1.44 2.5zM.96 8.634v.79c0 .265.215.48.48.48l.789-.001L.96 8.634zm8.575-5.175H3.588L.96 6.087v1.189l2.627 2.627h5.949l2.627-2.628V6.088L9.535 3.459zm2.628 5.174l-1.269 1.27h.79a.48.48 0 00.471-.393l.008-.086v-.791zM6.562 4.351a2.33 2.33 0 110 4.662 2.33 2.33 0 010-4.662zm0 .96a1.37 1.37 0 100 2.742 1.37 1.37 0 000-2.742zm-3.438.89a.49.49 0 01.48.5c0 .246-.17.45-.394.493l-.086.008h-.529a.49.49 0 01-.48-.5c0-.246.17-.45.394-.492l.086-.008h.53zm7.404 0a.49.49 0 01.48.5c0 .246-.17.45-.394.493l-.086.008h-.529a.49.49 0 01-.48-.5c0-.246.17-.45.394-.492l.086-.008h.529zm1.155-2.741l-.79-.001 1.27 1.271v-.79a.48.48 0 00-.394-.472l-.086-.008zM2.23 3.459l-.79.001a.48.48 0 00-.48.48v.789l1.27-1.27z" fill-rule="evenodd"></path></svg></span>]
[]
[]
[]
[]
...
...

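It clicks through to the next page, but, as the output shows, I cannot get the data for each page. Is this a problem with the loop and the selector?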

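【Comments】:

  • Could you share the complete code so we can copy, paste, and run it to troubleshoot? Otherwise, try a different XPath or use a CSS selector.
  • @jaykishan I have added the library dependencies.
  • What kind of error are you getting?

Tags: python selenium selenium-webdriver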


【Solution 1】:
url1 = {'Accounting_and_Finance': ['https://www.jobsite.co.uk/jobs/Degree-Accounting-and-Finance'],
             'Aeronautical_Engineering': ['https://www.jobsite.co.uk/jobs/Degree-Aeronautical-Engineering']}


driver = webdriver.Chrome()
driver.implicitly_wait(10)
wait = WebDriverWait(driver, 10)
driver.maximize_window()
test_data = defaultdict(list)
for k, html in url1.items():
    for stuff_in in html:
        time.sleep(5)
        driver.get(stuff_in)
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.row.job-results-row")))
        soup = BeautifulSoup(driver.page_source, 'lxml')
        
        for match in soup.find('div', {'class':'ResultsSectionContainer-sc-gdhf14-0 kteggz'}):
            data = match.select('article h2[class="sc-fzoJMP gRGXcO"]')
            #test_data['job_title'].append(data.text.strip())
            print(data)
            points = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//*[name()='path' and contains(@d,'M5.408.153')]")))
            for point in points:
                driver.find_element_by_xpath(point).click()
            time.sleep(1)

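This may work. Try it and leave a comment if it does not. Another approach is to change the wait condition to presence_of_element_located or visibility_of_element_located, or to click the element through JavaScript with "arguments[0].click();".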
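
The list returned by presence_of_all_elements_located already holds WebElement objects, so each one can be handed to the click call directly; find_element_by_xpath expects an XPath string instead. A minimal sketch of the JavaScript click follows. `js_click` is a hypothetical helper name, not part of Selenium, and the sketch assumes the target is an HTML element such as the pagination link, since `click()` may not be available on an SVG `<path>` node.

```python
def js_click(driver, element):
    """Click an already-located WebElement through JavaScript.

    `driver` is a Selenium WebDriver and `element` a WebElement.
    The element itself is passed as arguments[0]; it is not converted
    to a string or looked up again.
    """
    # A scripted click is not subject to WebDriver's check that the
    # element is the top-most one at the click point.
    driver.execute_script("arguments[0].click();", element)
```

With a live session this could be called on the element returned by a wait, for example the result of `wait.until(EC.element_to_be_clickable(...))`.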

【Comments】:

  • I get the following error: Message: invalid argument: 'value' must be a string
  • Pass the XPath as a string, str(point), or get the index and pass the index.
  • I get the following error: DeprecationWarning: find_element_by_* commands are deprecated. Please use find_element()
  • Plus this error: invalid selector: Unable to locate an element with the xpath expression <selenium.webdriver.remote.webelement.WebElement (session="6c5110e618573f2cd4479a5844718ab7", element="abf0692f-ee04-4475-9d44-640443cec3da")> because of the following error: SyntaxError: Failed to execute 'evaluate' on 'Document': The string '<selenium.webdriver.remote.webelement.WebElement (session="6c5110e618573f2cd4479a5844718ab7", element="abf0692f-ee04-4475-9d44-640443cec3da")>' is not a valid XPath expression.
  • Send the complete code ....

【Solution 2】:
wait=WebDriverWait(driver,60)     
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,"#ccmgt_explicit_accept > span"))).click()

You have an overlapping element; dismiss it first.

<section id="explicit_consent" class="prompt-container">...</section>

To find the pagination-next link, I suggest using the following, so that you can tell when it is disabled.

while True:
    try:
        wait.until(EC.element_to_be_clickable((By.XPATH,"//a[@data-at='pagination-next'][not(@disabled)]"))).click()
    except:
        break

Imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
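
One possible reason for the empty lists in the question's update is that the page HTML is parsed once, before the loop, so later iterations still look at the first page. A minimal sketch of a loop that re-reads `driver.page_source` on every page is shown below. `click_next`, `scrape_all_pages`, `extract`, and `max_pages` are hypothetical names introduced for illustration, and the `TimeoutError` fallback is only an assumption so the sketch can be exercised without Selenium installed.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Assumption: fallback so the sketch can run without Selenium.
    TimeoutException = TimeoutError


def click_next(wait, next_condition):
    """Click the next-page link; return False if none becomes clickable."""
    try:
        wait.until(next_condition).click()
    except TimeoutException:
        return False
    return True


def scrape_all_pages(driver, wait, next_condition, extract, max_pages=200):
    """Parse each page in turn and return the combined list of items.

    extract(html) is a caller-supplied function that returns a list of
    items parsed from one page's HTML.
    """
    items = []
    for _ in range(max_pages):
        # Read page_source inside the loop: HTML captured before a click
        # still describes the previous page.
        items.extend(extract(driver.page_source))
        if not click_next(wait, next_condition):
            break
    return items
```

With a live session, `next_condition` could be `EC.element_to_be_clickable((By.XPATH, "//a[@data-at='pagination-next'][not(@disabled)]"))`, and `extract` a function that builds a BeautifulSoup object from the HTML and selects the job titles. Depending on how the site loads results, a wait for the results container may also be needed after each click before the page source is read.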

【Comments】:

  • I have updated my post, if you are interested!
  • For some reason, the chat box in the bottom-right corner intercepts the click on the next page and stops everything: mya-widget-bubble-wrapper__15npR
  • elem = wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@data-at='pagination-next'][not(@disabled)]"))); driver.execute_script("arguments[0].click();", elem)