【Question title】: Web scraping data with Selenium
【Posted】: 2020-09-09 19:36:15
【Question】:

Hello, I am scraping this page: https://www.betexplorer.com/soccer/china/super-league-2016/beijing-guoan-henan-jianye/S49KzkvO/ and I need to extract the following data:

Country = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[1]/li[3]/a").text
leagueseason = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/header/h1/a").text
Home = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[2]/li[1]/h2/a").text
Away = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[2]/li[3]/h2/a").text

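I tried these XPaths, but I would prefer more specific ones, because absolute paths like these are likely to break when the page changes. Any suggestions? Thanks.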

【Comments】:

    Tags: python selenium xpath css-selectors webdriverwait


    【Solution 1】:

    To print the text of each element, induce WebDriverWait for visibility_of_element_located(), using either of the following locator strategies:

    • Using get_attribute("innerHTML"):

      • China

        print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-breadcrumb li:nth-child(3) a"))).get_attribute("innerHTML"))
        
      • Super League 2016

        print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1.wrap-section__header__title>a"))).get_attribute("innerHTML"))
        
      • Beijing Guoan

        print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-details>li:first-child h2.list-details__item__title>a"))).get_attribute("innerHTML"))
        
      • Henan Jianye

        print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-details>li:nth-child(3) h2.list-details__item__title>a"))).get_attribute("innerHTML"))
        
    • Using the text attribute:

      • China

        print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-breadcrumb']/li[3]/a"))).text)
        
      • Super League 2016

        print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[@class='wrap-section__header__title']/a"))).text)
        
      • Beijing Guoan

        print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-details']/li[1]//h2/a"))).text)
        
      • Henan Jianye

        print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-details']/li[3]//h2/a"))).text)
        
      • Note: you have to add the following imports:

        from selenium.webdriver.support.ui import WebDriverWait
        from selenium.webdriver.common.by import By
        from selenium.webdriver.support import expected_conditions as EC
        

    A relevant discussion can be found in "How to retrieve the text of a WebElement using Selenium - Python".
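
To illustrate why locators anchored on a class name tend to survive layout changes better than purely positional paths, here is a small offline sketch using only the standard library. The markup is hypothetical and heavily simplified (the breadcrumb entries and the extra wrapper `div` are invented for illustration); only the class names and team names come from the thread. Note that `xml.etree.ElementTree` supports just a subset of XPath, so the paths are written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the match header section.
SECTION = """
<section>
  <ul class="list-breadcrumb">
    <li><a>Home</a></li>
    <li><a>Soccer</a></li>
    <li><a>China</a></li>
  </ul>
  <ul class="list-details">
    <li><h2><a>Beijing Guoan</a></h2></li>
    <li><p>score</p></li>
    <li><h2><a>Henan Jianye</a></h2></li>
  </ul>
</section>
"""

# Same section, before and after an imagined redesign adds one wrapper div.
original = ET.fromstring(f"<body><div>{SECTION}</div></body>")
redesigned = ET.fromstring(f"<body><div><div>{SECTION}</div></div></body>")

POSITIONAL = "./div/section/ul[2]/li[3]/h2/a"            # depends on exact nesting
ANCHORED = ".//ul[@class='list-details']/li[3]/h2/a"     # depends on a class name


def text_or_none(root, path):
    """Return the text of the first match, or None if nothing matches."""
    node = root.find(path)
    return None if node is None else node.text


for label, root in (("original", original), ("redesigned", redesigned)):
    print(label, text_or_none(root, POSITIONAL), text_or_none(root, ANCHORED))
```

In this sketch the positional path stops matching as soon as the wrapper is added, while the class-anchored path still finds the away team. The anchored form still breaks if the site renames the class, so it reduces fragility rather than removing it.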

