Scraping data with Selenium: answers

【Question title】: Web scraping data with Selenium
【Posted】: 2020-09-09 19:36:15
【Question description】:

Hello, I am scraping this page: https://www.betexplorer.com/soccer/china/super-league-2016/beijing-guoan-henan-jianye/S49KzkvO/ and I need to extract the following data:

Country = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[1]/li[3]/a").text
leagueseason = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/header/h1/a").text
Home = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[2]/li[1]/h2/a").text
Away = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[2]/li[3]/h2/a").text

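I tried these absolute XPaths, but I would like to switch to more specific XPaths, because the page layout could change and break them. Any suggestions? Thanks.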

【Question discussion】:

Tags: python selenium xpath css-selectors webdriverwait

【Solution 1】:

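To print the innerText of the elements, wait for each element to become visible by using WebDriverWait with visibility_of_element_located(). You can use either of the following locator strategies: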

Using css-selectors and get_attribute("innerHTML"):

China:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-breadcrumb li:nth-child(3) a"))).get_attribute("innerHTML"))

Super League 2016:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1.wrap-section__header__title>a"))).get_attribute("innerHTML"))

Beijing Guoan:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-details>li:first-child h2.list-details__item__title>a"))).get_attribute("innerHTML"))

Henan Jianye:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-details>li:nth-child(3) h2.list-details__item__title>a"))).get_attribute("innerHTML"))

Using xpath and the text attribute:

China:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-breadcrumb']//following::li[3]//a"))).text)

Super League 2016:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[@class='wrap-section__header__title']/a"))).text)

Beijing Guoan:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-details']//following::li[1]//h2/a"))).text)

Henan Jianye:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-details']//following::li[2]//h2/a"))).text)
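
The same four elements can probably also be addressed with plain child steps instead of the following:: axis. The sketch below checks that idea offline with Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath. The SAMPLE markup is a hypothetical reconstruction based on the class names and absolute paths quoted on this page, not a copy of the real page, and the names SAMPLE, PATHS and text_of are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the class names and nesting come from this page.
SAMPLE = """
<section>
  <ul class="list-breadcrumb">
    <li><a href="#">Home</a></li>
    <li><a href="#">Soccer</a></li>
    <li><a href="#">China</a></li>
  </ul>
  <header>
    <h1 class="wrap-section__header__title"><a href="#">Super League 2016</a></h1>
  </header>
  <ul class="list-details">
    <li><h2 class="list-details__item__title"><a href="#">Beijing Guoan</a></h2></li>
    <li><p>score placeholder</p></li>
    <li><h2 class="list-details__item__title"><a href="#">Henan Jianye</a></h2></li>
  </ul>
</section>
""".strip()

# Relative paths anchored on a class attribute, then direct child steps.
PATHS = {
    "country": ".//ul[@class='list-breadcrumb']/li[3]/a",
    "league_season": ".//h1[@class='wrap-section__header__title']/a",
    "home": ".//ul[@class='list-details']/li[1]/h2/a",
    "away": ".//ul[@class='list-details']/li[3]/h2/a",
}


def text_of(root, path):
    """Return the text of the first element matching path, or None."""
    element = root.find(path)
    return element.text if element is not None else None


root = ET.fromstring(SAMPLE)
found = {name: text_of(root, path) for name, path in PATHS.items()}
print(found)
```

With Selenium, the equivalent expressions would start with // (for example //ul[@class='list-details']/li[3]/h2/a) and be passed with By.XPATH. Whether they match the live page has not been tested here.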

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
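
Putting the imports and the waits together, the four lookups can be collected in one helper. This is a minimal sketch, not the answerer's code: the CSS selectors are the ones given above, while LOCATORS, scrape_match and the dictionary keys are names assumed for illustration. It expects an already created driver that has loaded the match page.

```python
# CSS selectors copied from the answer; the keys are illustrative names.
LOCATORS = {
    "country": "ul.list-breadcrumb li:nth-child(3) a",
    "league_season": "h1.wrap-section__header__title>a",
    "home": "ul.list-details>li:first-child h2.list-details__item__title>a",
    "away": "ul.list-details>li:nth-child(3) h2.list-details__item__title>a",
}


def scrape_match(driver, timeout=20):
    """Return a dict with the visible text of each element in LOCATORS."""
    # Imported inside the function so LOCATORS can be inspected
    # even where Selenium is not installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    result = {}
    for name, css in LOCATORS.items():
        # Blocks until the element is present and displayed, or times out.
        element = wait.until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, css))
        )
        result[name] = element.text
    return result
```

A typical use would be to create a driver (for example with webdriver.Chrome()), call driver.get() with the match URL, pass the driver to scrape_match(), and call driver.quit() afterwards. If an element never becomes visible, wait.until() raises a TimeoutException after timeout seconds.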

You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".

Outro

Links to useful documentation:

The get_attribute() method: Gets the given attribute or property of the element.
The text attribute returns: The text of the element.
Difference between text and innerHTML using Selenium
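
To make the difference between the two concrete, here is a small offline stand-in that uses the standard-library xml.etree.ElementTree rather than Selenium. The sample link with a nested <b> tag is hypothetical, and the helpers inner_html and text_content are illustrative names. They only approximate what a browser does: Selenium's text attribute additionally returns visible text only.

```python
import xml.etree.ElementTree as ET


def inner_html(element):
    """Approximate innerHTML: the element's content with child markup kept."""
    parts = [element.text or ""]
    # tostring() serialises each child together with its trailing text.
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts)


def text_content(element):
    """Approximate text: all nested text with the markup stripped."""
    return "".join(element.itertext())


# Hypothetical link whose content contains a nested tag.
link = ET.fromstring('<a href="#">Beijing <b>Guoan</b></a>')

print(inner_html(link))    # Beijing <b>Guoan</b>
print(text_content(link))  # Beijing Guoan
```

When an element contains only plain text, the two values coincide in this stand-in, which is why either approach can be used for simple links.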

【Discussion】: