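【Question title】: Selenium - Cannot locate elements in page source
【Posted】: 2019-09-18 14:36:55
【Question】:

I'm trying to scrape a web page with Selenium, but for some reason the elements I need don't show up in the page source.

I tried using WebDriverWait to wait until the page has loaded. I also checked whether the data sits in a different frame that I would need to switch to.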

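from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # assumed; the original post did not show how the driver was created
delay = 30  # seconds; assumed value, not given in the original post
driver.get('https://foreclosures.cabarruscounty.us/')

try:
    WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH,'//*[@id="app"]/div[5]/div/div')))
    print("Page is ready!")

    web_url = driver.page_source  # despite the name, this holds the page's HTML
    print(web_url)

except TimeoutException:
    print("Loading took too much time!")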

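I expect to see all the records, one per property card, so that I can extract them. However, the page source does not show any such data.

If I load the page manually and view its source, the data simply isn't there: view-source:https://foreclosures.cabarruscounty.us/

【Comments】:

  • The obvious answer is that the element really doesn't exist. Are you sure the XPath is correct?
  • Yes, the XPath is correct.
  • What data are you looking for on this page?
  • For each record, I need the Real ID, Case Number, and Owner fields.

Tags: python selenium selenium-webdriver xpath webdriverwait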


【Solution 1】:

Try the code below. It returns all of the matching elements by waiting with visibility_of_all_elements_located().

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://foreclosures.cabarruscounty.us/")
elements = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@id='app']//div[@class='card-body']/div[1]")))
allrecord = [ele.text for ele in elements]
print(allrecord)  # prints the text of every record

If you print only the first element's value:

print(allrecord[0].splitlines())

you will get the following output:

['Real ID: 04-086 -0040.00', 'Status: SALE SCHEDULED', 'Case Number: 18-CVD-2804', 'Tax Value: $29,660', 'Min Bid: $10,067', 'Sale Date: 10/03/2019', 'Sale Time: 12:00 PM', 'Owner: DOUGLAS JAMES W', 'Attorney: ZACCHAEUS LEGAL SVCS']
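
Since the question only needs the Real ID, Case Number, and Owner fields, the lines above could be split into label/value pairs. The following is a minimal sketch, not part of the original answer: `parse_record` is a hypothetical helper name, and `sample` simply reuses the output shown above in place of a live `ele.text` value.

```python
def parse_record(text):
    """Split one card's text into a {label: value} dict.

    Each line is expected to look like "Label: value"; lines without a
    colon are skipped.
    """
    fields = {}
    for line in text.splitlines():
        # partition splits on the FIRST colon only, so "Sale Time: 12:00 PM"
        # keeps its value intact.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


# Stand-in for one element's text, copied from the output above.
sample = "\n".join([
    "Real ID: 04-086 -0040.00",
    "Status: SALE SCHEDULED",
    "Case Number: 18-CVD-2804",
    "Tax Value: $29,660",
    "Min Bid: $10,067",
    "Sale Date: 10/03/2019",
    "Sale Time: 12:00 PM",
    "Owner: DOUGLAS JAMES W",
    "Attorney: ZACCHAEUS LEGAL SVCS",
])

record = parse_record(sample)
wanted = {key: record[key] for key in ("Real ID", "Case Number", "Owner")}
print(wanted)
```

With the Selenium code above, the same helper could be applied to every card, for example `[parse_record(text) for text in allrecord]`.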

【Comments】:

    【Solution 2】:

    To extract the first Real ID, Case Number, and Owner fields, you have to use WebDriverWait with visibility_of_element_located(). You can use the following Locator Strategies:

    • Code block:

      from selenium import webdriver
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.support import expected_conditions as EC
      
      chrome_options = webdriver.ChromeOptions()
      chrome_options.add_argument("start-maximized")
      chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
      chrome_options.add_experimental_option('useAutomationExtension', False)
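      # Note: executable_path is deprecated in Selenium 4, where the driver path is passed via a Service object instead.
      driver = webdriver.Chrome(options=chrome_options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
      driver.get("https://foreclosures.cabarruscounty.us/")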
      Real_ID = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='row']//div[@class='card cardClass']/img//following::div[@class='card-body']//div/b"))).text
      Case_Number = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='row']//div[@class='card cardClass']/img//following::div[@class='card-body']//div//following-sibling::b[2]"))).text
      Owner = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='row']//div[@class='card cardClass']/img//following::div[@class='card-body']//div//following-sibling::b[7]"))).text
      print("{} is {} owned by {}".format(Real_ID,Case_Number,Owner))
      driver.quit()
      
    • Console output:

      Real ID: 04-086 -0040.00 is Case Number: 18-CVD-2804 owned by Owner: DOUGLAS JAMES W
      

    【Comments】:

      【Solution 3】:

      You can use ImplicitWait and PageLoad to wait for the element:

      // Wait up to 30 seconds
      driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(30);
      driver.Manage().Timeouts().PageLoad = TimeSpan.FromSeconds(30);
      

      This code is for C# with Selenium.
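
Because the question is tagged python, a rough Python counterpart of the C# settings above might look like the sketch below. `apply_timeouts` is a hypothetical helper name introduced here for illustration. `implicitly_wait` and `set_page_load_timeout` are methods of the WebDriver object in Selenium's Python bindings.

```python
def apply_timeouts(driver, seconds=30):
    """Set the implicit-wait and page-load timeouts on a WebDriver.

    `driver` is assumed to be a Selenium WebDriver instance, e.g. the
    result of webdriver.Chrome().
    """
    # Poll for up to `seconds` whenever an element lookup does not match yet.
    driver.implicitly_wait(seconds)
    # Give up on driver.get() if the page has not loaded within `seconds`.
    driver.set_page_load_timeout(seconds)
    return driver


# Usage (requires Selenium and a browser driver, so it is left commented out):
# from selenium import webdriver
# driver = apply_timeouts(webdriver.Chrome(), 30)
# driver.get("https://foreclosures.cabarruscounty.us/")
```

These timeouts only affect how long Selenium waits. They do not change what the server sends, so they may not help if the data is absent from the page source for another reason.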

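      【Comments】:

      • I'm not sure that waiting or page load is the problem. The data I need doesn't exist in the page source, even when I load the page manually and inspect it.
      • OK, I tried going to foreclosures.cabarruscounty.us, but I couldn't access the site.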