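【Posted】: 2022-01-05 08:44:25
【Question】:
I am trying to scrape a job site for information about its job titles. I also want to go to the next page, extract the data, and keep going until no more pages are available. However, when I try to click the next-page control, which is an svg element, I get the following error: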
ElementClickInterceptedException: Message: element click intercepted: Element <path d="M5.408.153a.588.588 0 00-.098.755l.059.076L13.566 10 5.37 19.016a.588.588 0 00-.025.761l.065.07c.216.197.54.202.761.026l.07-.066 8.27-9.096c.337-.372.363-.925.077-1.324l-.078-.097L6.24.193a.588.588 0 00-.832-.04z" fill-rule="evenodd"></path> is not clickable at point (1357, 686). Other element would receive the click: <section id="explicit_consent" class="prompt-container">...</section>
(Session info: chrome=96.0.4664.110)
Stacktrace:
0 chromedriver 0x000000010edfa269 __gxx_personality_v0 + 582729
1 chromedriver 0x000000010ed85c33 __gxx_personality_v0 + 106003
2 chromedriver 0x000000010e942e28 chromedriver + 171560
3 chromedriver 0x000000010e97f681 chromedriver + 419457
4 chromedriver 0x000000010e97d33e chromedriver + 410430
....
....
This is the script I am using:
from selenium import webdriver
import time
import pandas as pd
from collections import defaultdict
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

url1 = {'Accounting_and_Finance': ['https://www.jobsite.co.uk/jobs/Degree-Accounting-and-Finance'],
        'Aeronautical_Engineering': ['https://www.jobsite.co.uk/jobs/Degree-Aeronautical-Engineering']}

driver = webdriver.Chrome()
driver.implicitly_wait(10)
wait = WebDriverWait(driver, 10)
driver.maximize_window()
test_data = defaultdict(list)

for k, html in url1.items():
    for stuff_in in html:
        time.sleep(5)
        driver.get(stuff_in)
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.row.job-results-row")))
        soup = BeautifulSoup(driver.page_source, 'lxml')
        for match in soup.find('div', {'class':'ResultsSectionContainer-sc-gdhf14-0 kteggz'}):
            data = match.select('article h2[class="sc-fzoJMP gRGXcO"]')
            #test_data['job_title'].append(data.text.strip())
            print(data)
        points = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//*[name()='path' and contains(@d,'M5.408.153')]")))
        for point in points:
            point.click()
            time.sleep(1)
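For reference, the error message names the element that swallows the click: `<section id="explicit_consent">`. Below is a minimal sketch of one way around that, assuming it is acceptable to simply remove that section from the DOM before clicking. Clicking the site's own "accept" button would be cleaner, but its selector is not shown here. The function name `click_next_past_overlay` and its parameters are hypothetical. It only uses `driver.execute_script(...)` and `element.click()`, so any Selenium driver and element should fit.

```python
def click_next_past_overlay(driver, next_link, overlay_id="explicit_consent"):
    """Remove a blocking overlay (if present), then click next_link.

    Returns True if an overlay was found and removed before the click.
    The default overlay_id is taken from the error message; treat it as an
    assumption that may not hold on every page.
    """
    # execute_script hands overlay_id to the page as arguments[0].
    removed = driver.execute_script(
        "var el = document.getElementById(arguments[0]);"
        "if (el) { el.remove(); return true; }"
        "return false;",
        overlay_id,
    )
    # With the overlay gone, a normal WebDriver click can reach the link.
    next_link.click()
    return bool(removed)
```

It is probably safer to pass the enclosing pagination link as `next_link` rather than the svg `path` itself, since the `path` is only the icon drawn inside the control.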
Update: after using the additional code from @Arundeep, I have the following:
driver = webdriver.Chrome()
driver.implicitly_wait(5)
wait = WebDriverWait(driver, 5)
driver.maximize_window()
test_data = defaultdict(list)

for k, html in url1.items():
    for stuff_in in html:
        time.sleep(5)
        driver.get(stuff_in)
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.row.job-results-row")))
        soup = BeautifulSoup(driver.page_source, 'lxml')
        while True:
            for match in soup.find('div', {'class':'ResultsSectionContainer-sc-gdhf14-0 kteggz'}):
                for m in range(1, 26):
                    data = match.select(f'body > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(1) > div:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child({m}) > article:nth-child(1) > div:nth-child(3) > dl:nth-child(5) > span:nth-child(1)')
                    #test_data['job_title'].append(data.text.strip())
                    print(data)
            wait=WebDriverWait(driver,60)
            try:
                wait.until(EC.element_to_be_clickable((By.XPATH,"//a[@data-at='pagination-next'][not(@disabled)]"))).click()
            except:
                break
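I had to change the CSS selector to something more generic, because the class names change on every page. However, I can't seem to get the text, since doing so keeps raising an error. Without grabbing the text, I get the following output: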
[<span class="sc-AxjAm hCtIkK sc-fznAgC fSfCSi" fill="#3a434f"><svg viewbox="0 0 16 16"><path d="M15.52 5.06a.48.48 0 01.472.394L16 5.54v7.04a1.12 1.12 0 01-.998 1.113l-.122.007H3.36a.48.48 0 01-.086-.952l.086-.008h11.52a.16.16 0 00.152-.11l.008-.05V5.54a.48.48 0 01.48-.48zm-1.28-1.28a.48.48 0 01.472.394l.008.086v7.04a1.12 1.12 0 01-.998 1.113l-.122.007H2.08a.48.48 0 01-.086-.952l.086-.008H13.6a.16.16 0 00.152-.11l.008-.05V4.26a.48.48 0 01.48-.48zM11.683 2.5c.795 0 1.44.645 1.44 1.44v5.484a1.44 1.44 0 01-1.44 1.44H1.44A1.44 1.44 0 010 9.424V3.94C0 3.145.645 2.5 1.44 2.5zM.96 8.634v.79c0 .265.215.48.48.48l.789-.001L.96 8.634zm8.575-5.175H3.588L.96 6.087v1.189l2.627 2.627h5.949l2.627-2.628V6.088L9.535 3.459zm2.628 5.174l-1.269 1.27h.79a.48.48 0 00.471-.393l.008-.086v-.791zM6.562 4.351a2.33 2.33 0 110 4.662 2.33 2.33 0 010-4.662zm0 .96a1.37 1.37 0 100 2.742 1.37 1.37 0 000-2.742zm-3.438.89a.49.49 0 01.48.5c0 .246-.17.45-.394.493l-.086.008h-.529a.49.49 0 01-.48-.5c0-.246.17-.45.394-.492l.086-.008h.53zm7.404 0a.49.49 0 01.48.5c0 .246-.17.45-.394.493l-.086.008h-.529a.49.49 0 01-.48-.5c0-.246.17-.45.394-.492l.086-.008h.529zm1.155-2.741l-.79-.001 1.27 1.271v-.79a.48.48 0 00-.394-.472l-.086-.008zM2.23 3.459l-.79.001a.48.48 0 00-.48.48v.789l1.27-1.27z" fill-rule="evenodd"></path></svg></span>]
[]
[]
[]
[]
...
...
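It clicks through to the next page, but I can't get the data for each page, as the output shows. Is this a problem with the loop and the selector?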
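One likely contributor is that `soup` is built once, before `while True`, so later iterations keep parsing the first page's HTML even after the browser has moved on. Separately, `select()` returns a list, so `.text` has to be read from each item rather than from the list. The sketch below shows only the loop shape, with the page source re-read on every iteration. `scrape_pages` and its three callables are hypothetical names, kept free of Selenium so the control flow is easy to check on its own.

```python
def scrape_pages(get_html, parse, go_next, max_pages=100):
    """Collect parsed rows from every page until go_next() reports no more.

    get_html: returns the HTML of the page currently shown.
    parse:    turns one page of HTML into a list of rows.
    go_next:  tries to move to the next page; returns False when it cannot.
    """
    rows = []
    for _ in range(max_pages):  # hard cap so a stuck site cannot loop forever
        # Re-read the source here, inside the loop, so each pass sees the
        # page that is loaded now rather than the one loaded first.
        rows.extend(parse(get_html()))
        if not go_next():
            break
    return rows


# Hypothetical wiring with the names from the question (not run here):
#   get_html = lambda: driver.page_source
#   parse    = lambda html: [h2.get_text(strip=True)
#                            for h2 in BeautifulSoup(html, 'lxml').select('article h2')]
#   go_next  = a function that clicks the pagination-next link and returns
#              False when the wait for that link times out
```

The selector `article h2` in the comment is an assumption based on the first script. Because new results may render after the click, waiting for the results container again before re-reading `driver.page_source` is probably also needed.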
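【Comments】:
-
Can you share the complete code so we can copy, paste, and run it to work on the problem? Otherwise, try a different xpath or use a css selector.
-
@jaykishan I have added the library dependencies
-
What kind of error are you getting?
Tags: python selenium selenium-webdriver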