【Question title】: Iterating over selenium webdriver's driver.find_elements
【Posted】: 2019-06-05 23:06:51
【Question】:

As part of a web scraping task, I have to scrape all event details from the AXS.com website. I am using the Chrome web driver with Python + Selenium.

I can get a single value with driver.find_element_by_class_name(), for example driver.find_element_by_class_name("headliner").text

But that only returns the first item. I am stuck when I try to iterate over the result of driver.find_elements(By.XPATH,"//div[@class='results-table results-table--events']").

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome('/home/.../chromedriver_linux64/chromedriver')
driver.get("https://www.axs.com/browse/music/alternative-punk")
driver.implicitly_wait(10)
allevent_details = driver.find_elements(By.XPATH,"//div[@class='results-table results-table--events']")     
for i in allevent_details:
    print(i.find_element_by_class_name("headliner").text)

Error:

NoSuchElementException: no such element: Unable to locate element: {"method":"class name","selector":"headliner"}
(Session info: chrome=74.0.3729.169)
(Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Linux 4.15.0-50-generic x86_64)

Expected:

  • Inner Wave
  • BLOXX ... etc.

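【Comments】:

    Tags: python selenium-webdriver xpath css-selectors webdriverwait


    【Solution 1】:

    Change the logic as follows: select the headliner divs inside the results table with a single XPath, then print the text of each match.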

    from selenium import webdriver
    from selenium.webdriver.common.by import By  # needed for By.XPATH

    driver = webdriver.Chrome('/home/.../chromedriver_linux64/chromedriver')
    driver.get("https://www.axs.com/browse/music/alternative-punk")
    driver.implicitly_wait(10)  # applies to the find_elements call below
    # One XPath that matches every headliner div inside the events table
    allevent_details = driver.find_elements(By.XPATH, "//div[@class='results-table results-table--events']//div[@class='headliner']")
    for i in allevent_details:
        print(i.text)
    

    【Comments】:

      【Solution 2】:

      Try either of the following locators.

      Using XPath

      allevent_details = driver.find_elements(By.XPATH, "//div[@class='results-table results-table--events']")
      for i in allevent_details:
          # The leading "." makes the XPath relative to the current container
          print(i.find_element(By.XPATH, ".//div[@class='headliner']").text)
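
The scoping effect of the relative `.//` lookup can be seen without launching a browser. The following is only a minimal offline sketch: it swaps Selenium for Python's standard-library xml.etree.ElementTree (which supports a subset of XPath), and the markup in SAMPLE is made up for illustration, not the real AXS page. Note that findall returns an empty list for a container with no headliner, much as Selenium's find_elements (plural) does, whereas find_element (singular) raises NoSuchElementException.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; NOT the real AXS page structure.
SAMPLE = """
<html><body>
  <div class="results-table results-table--events">
    <div class="headliner">Inner Wave</div>
    <div class="headliner">LANY</div>
  </div>
  <div class="results-table results-table--events">
    <div class="supporting">no headliner in this container</div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Step 1: find every container, anywhere below the root.
containers = root.findall(".//div[@class='results-table results-table--events']")

# Step 2: search relative to each container; an empty match list is simply skipped.
headliners = []
for container in containers:
    matches = container.findall(".//div[@class='headliner']")
    headliners.extend(div.text for div in matches)

print(headliners)
```

With Selenium the same shape would use find_elements on each container, so that a container without a headliner contributes nothing instead of raising.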
      

      Using a CSS selector

      for item in driver.find_elements(By.CSS_SELECTOR, '.headliner'):
          print(item.text)
      

      【Comments】:

        【Solution 3】:

        To extract all the event titles from the webpage, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:

        • Using CSS_SELECTOR:

          print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.headliner")))])
          
        • Using XPATH:

          print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='headliner']")))])
          
        • Console output:

          ['Inner Wave', 'BLOXX, Hembree and Warbly Jets', 'Frenship', 'LANY', 'together PANGEA & Vundabar', 'Night Beats', 'New Politics', 'The Technicolors', 'Davila 666', 'Vansire + BOYO', 'The Starting Line', 'Katzù Oso', 'The Raconteurs', 'Cayucas', 'ALT 98.7 Summer Camp']
          
        • Note: you have to add the following imports:

          from selenium.webdriver.support.ui import WebDriverWait
          from selenium.webdriver.common.by import By
          from selenium.webdriver.support import expected_conditions as EC
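
Conceptually, an explicit wait like the one above polls a condition until it returns something truthy or a timeout expires. The following is a rough pure-Python sketch of that idea only; it is not Selenium's implementation, and wait_until, fake_find_headliners and the attempts counter are hypothetical names introduced for illustration (the fake function stands in for a page that finishes rendering after a few polls).

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value,
    or raise TimeoutError once `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Hypothetical stand-in for a lookup that finds nothing until the third poll.
attempts = {"count": 0}


def fake_find_headliners():
    attempts["count"] += 1
    return ["Inner Wave", "LANY"] if attempts["count"] >= 3 else []


titles = wait_until(fake_find_headliners, timeout=2.0, poll_interval=0.01)
print(titles)
```

In the real answer, WebDriverWait(driver, 5).until(...) plays the role of wait_until, and the expected condition plays the role of the callable being polled.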
          

        【Comments】:
