【Question title】:How to get a table with dynamic id using Selenium with Python
【Posted】:2020-10-18 16:40:14
【Question description】:

I'm trying to get the table from this page https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm and from other similar pages.

The table in question has a dynamic id, table-XXXX, where XXXX is a number that changes on every page load.

The table has the following attributes:

class="tablesaw tablesaw-stack table-bordered table-centered rates-availability-table"

data-tablesaw-mode="stack"

I have tried the following variations to locate this table (after consulting the post How to find element by part of its id name in selenium with python), but none of them works.

find_elements_by_css_selector("[id*='tab']")

find_elements_by_css_selector("[class*='tablesaw']")

find_elements_by_css_selector("[data-tablesaw-mode*='stack']")
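To check whether the selectors themselves are at fault, their matching rules can be modelled in plain Python. This is a simplified, hypothetical model of CSS attribute selectors (`*=` is a substring test, `^=` a prefix test, and a `.a.b` class chain a whitespace-separated token test), applied to the attributes quoted above plus an assumed sample id `table-4821`. It is not how the browser or Selenium actually evaluates selectors.

```python
# Hypothetical attributes of the target <table>: "class" and "data-tablesaw-mode"
# are copied from the question; the id "table-4821" is an assumed example of table-XXXX.
TABLE_ATTRS = {
    "id": "table-4821",
    "class": "tablesaw tablesaw-stack table-bordered table-centered rates-availability-table",
    "data-tablesaw-mode": "stack",
}


def attr_contains(attrs, name, value):
    """Simplified model of CSS [name*='value']: substring match on the attribute."""
    return value in attrs.get(name, "")


def attr_starts_with(attrs, name, value):
    """Simplified model of CSS [name^='value']: prefix match on the attribute."""
    return attrs.get(name, "").startswith(value)


def has_classes(attrs, *classes):
    """Simplified model of a class chain like .a.b: every token present, in any order."""
    tokens = set(attrs.get("class", "").split())
    return all(cls in tokens for cls in classes)


if __name__ == "__main__":
    # The three selectors tried in the question all match these attributes...
    print(attr_contains(TABLE_ATTRS, "id", "tab"))                    # True
    print(attr_contains(TABLE_ATTRS, "class", "tablesaw"))            # True
    print(attr_contains(TABLE_ATTRS, "data-tablesaw-mode", "stack"))  # True
    # ...as would a prefix selector such as table[id^='table-'] or a class chain.
    print(attr_starts_with(TABLE_ATTRS, "id", "table-"))              # True
    print(has_classes(TABLE_ATTRS, "rates-availability-table", "tablesaw"))  # True
```

If every check passes yet the real lookup returns an empty list, that suggests the table is simply not in the DOM at the moment of the call, which is the point both answers address.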

【Question discussion】:

    Tags: python selenium xpath web-scraping css-selectors


    【Solution 1】:

    The WebElement is an AJAX element, so to print its text you need WebDriverWait to wait for visibility_of_element_located(). You can use either of the following Locator Strategies:

    • Using CSS_SELECTOR:

      driver.get('https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm')
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.tablesaw.tablesaw-stack.table-bordered.table-centered.rates-availability-table"))).text)
      
    • Using XPATH:

      driver.get('https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm')
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='tablesaw tablesaw-stack table-bordered table-centered rates-availability-table']"))).text)
      
    • Console output:

      Start Date End Date 3 Nights 4 Nights 5 Nights 6 Nights 7 Nights
      28 Mar 2020 1 May 2020 £225 £300 £350 £410 £470
      2 May 2020 26 Jun 2020 £250 £330 £400 £460 £530
      27 Jun 2020 3 Jul 2020 - - - - £675
      4 Jul 2020 10 Jul 2020 - - - - £920
      11 Jul 2020 14 Aug 2020 - - - - £985
      15 Aug 2020 21 Aug 2020 - - - - £920
      22 Aug 2020 28 Aug 2020 - - - - £675
      29 Aug 2020 31 Oct 2020 - - - - £470
      
    • Note: you have to add the following imports:

      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
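`WebDriverWait(driver, 20).until(...)` keeps re-evaluating the expected condition until it returns a truthy value (here, the located element) or the 20-second timeout expires; Selenium polls every 0.5 s by default and raises `TimeoutException` on expiry. The loop below is a simplified, stdlib-only illustration of that idea. `wait_until`, `WaitTimeout` and the fake condition are hypothetical names for this sketch, not Selenium's implementation.

```python
import time


class WaitTimeout(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException in this sketch."""


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() until it returns something truthy, then return that value.

    Simplified model of WebDriverWait.until(): raises WaitTimeout once `timeout`
    seconds have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


if __name__ == "__main__":
    # Hypothetical condition: the "table" only appears on the third check,
    # mimicking markup that JavaScript inserts after the initial page load.
    checks = {"count": 0}

    def table_is_visible():
        checks["count"] += 1
        return "<table ...>" if checks["count"] >= 3 else False

    print(wait_until(table_is_visible, timeout=5, poll=0.01))  # <table ...>
    print(checks["count"])                                     # 3
```

A plain `find_elements_by_css_selector` call is a single check at one instant, which is why it can come back empty while the waiting version succeeds.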
      

    【Discussion】:

      【Solution 2】:

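      The data is loaded dynamically via JavaScript, so it is not in the initial HTML. However, you can call the site's API directly to load the table.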

      For example:

      import requests
      from bs4 import BeautifulSoup
      
      
      url = 'https://www.holidayfrancedirect.co.uk/holiday-rentals/RG007075/index.htm'
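      # API endpoint the page calls from JavaScript; the trailing path segment appears to be the year
      rates_url = 'https://www.holidayfrancedirect.co.uk/api/property-rates/{property_id}/2020'
      property_id = url.split('/')[-2]  # 'RG007075'
      
      response = requests.get(rates_url.format(property_id=property_id))
      response.raise_for_status()  # fail loudly on HTTP errors instead of inside .json()
      data = response.json()
      # the rates table comes back as an HTML string under the 'ratesHtml' key
      soup = BeautifulSoup(data['ratesHtml'], 'html.parser')
      
      # print table to screen, padding each cell to 15 characters (works for any column count):
      for tr in soup.select('tr'):
          tds = [td.get_text(strip=True) for td in tr.select('td, th')]
          print(''.join(f'{td:<15}' for td in tds))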
      

      Prints:

      Start Date     End Date       3 Nights       4 Nights       5 Nights       6 Nights       7 Nights       
      28 Mar 2020    1 May 2020     £225           £300           £350           £410           £470           
      2 May 2020     26 Jun 2020    £250           £330           £400           £460           £530           
      27 Jun 2020    3 Jul 2020     -              -              -              -              £675           
      4 Jul 2020     10 Jul 2020    -              -              -              -              £920           
      11 Jul 2020    14 Aug 2020    -              -              -              -              £985           
      15 Aug 2020    21 Aug 2020    -              -              -              -              £920           
      22 Aug 2020    28 Aug 2020    -              -              -              -              £675           
      29 Aug 2020    31 Oct 2020    -              -              -              -              £470           
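If BeautifulSoup isn't available, the row-extraction step can be sketched with the standard library's `html.parser`. `RowCollector` is a hypothetical helper, and `SAMPLE_RATES_HTML` is a trimmed, hand-written stand-in for `data['ratesHtml']` (three columns taken from the output above); the real API markup may contain extra elements or attributes.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # list of rows, each a list of cell strings
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:          # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:     # ignore whitespace between tags
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Roughly mirrors get_text(strip=True): join the text and strip the ends.
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


# Assumed stand-in for data['ratesHtml']; the real markup may differ.
SAMPLE_RATES_HTML = """
<table class="rates-availability-table">
  <tr><th>Start Date</th><th>End Date</th><th>7 Nights</th></tr>
  <tr><td>28 Mar 2020</td><td>1 May 2020</td><td>£470</td></tr>
</table>
"""

if __name__ == "__main__":
    parser = RowCollector()
    parser.feed(SAMPLE_RATES_HTML)
    parser.close()
    for row in parser.rows:
        print("".join(f"{cell:<15}" for cell in row))
```

Against the real endpoint you would feed `data['ratesHtml']` instead of the sample string; the BeautifulSoup version above stays the more robust choice for messy markup.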
      

      【Discussion】:
