【Question title】: Webscraping using Selenium - Element not found
【Posted】: 2017-09-20 19:08:56
【Question】:

I am trying to scrape this website:

https://www.novanthealth.org/home/patients--visitors/locations/clinics.aspx?behavioral-health=yes

I want to get the clinic names and addresses. This is the Python code I am using:

from selenium import webdriver
import pandas as pd
import time

#driver = webdriver.Chrome()
specialty = ["behavioral-health","dermatology","colon","ear-nose-and-throat","endocrine","express","family-practice","foot-and-ankle",
           "gastroenterology","heart-%26-vascular","hepatobiliary-and-pancreas","infectious-disease","inpatient","internal-medicine",
           "neurology","nutrition","ob%2Fgyn","occupational-medicine","oncology","orthopedics","osteoporosis","pain-management",
           "pediatrics","plastic-surgery","pulmonary","rehabilitation","rheumatology","sleep","spine","sports-medicine","surgical","urgent-care",
           "urology","weight-loss","wound-care","pharmacy"]
name = []
address = []

for q in specialty: 
    driver = webdriver.Chrome()
    driver.get("https://www.novanthealth.org/home/patients--visitors/locations/clinics.aspx?"+q+"=yes")
    x = driver.find_element_by_class_name("loc-link-right")
    num_page = str(x.text).split(" ")
    x.click() 

    for i in num_page:
        btn = driver.find_element_by_xpath('//*[@id="searchResults"]/div[2]/div[2]/button['+i+']')
        btn.click() 
        time.sleep(8)  # TODO: use an explicit wait instead of a fixed sleep
        temp = driver.find_element_by_class_name("gray-background").text
        temp0 = temp.replace("Get directions Website View providers\n","")

        x_temp = temp0.split("\n\n\n")

        for j in range(0,len(x_temp)-1):
            temp1 = x_temp[j].split("Phone:")
            name.append(temp1[0].split("\n")[1])
            temp3 = temp1[1].split("Office hours:")
            temp4 = temp3[0].split("\n")
            temp5 = temp4[1:len(temp4)]
            address.append(" ".join(temp5))
    driver.close()

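This code works fine when I run it for one specialty at a time, but when I pass the specialties in a loop as above, it fails on the second iteration with this error: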

Traceback (most recent call last):
  File "<stdin>", line 10, in <module>
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\webelement.py", line 77, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\webelement.py", line 493, in _execute
    return self._parent.execute(command, params)
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 249, in execute
    self.error_handler.check_response(response)
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 193, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotVisibleException: Message: element not visible
  (Session info: chrome=46.0.2490.80)
  (Driver info: chromedriver=2.19.346078 (6f1f0cde889532d48ce8242342d0b84f94b114a1),platform=Windows NT 6.1 SP1 x86_64)

I don't have much experience with Python, so any help would be greatly appreciated.

【Comments】:

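  • You have to make your webdriver wait a few seconds until the element in question appears on the page. Have a look at the webdriver_wait function (WebDriverWait).
  • I have gone through the documentation on this but ran into some problems implementing it. Could you give some sample code? Thanks!
  • @AvinashRaj I added wait = WebDriverWait(driver, 10) and wait.until(EC.presence_of_element_located((By.ID, "searchResults"))) above btn = driver.find_element_by_xpath('//*[@id="searchResults"]/div[2]/div[2]/button['+i+']'). This time it ran for 2 iterations but gave the same error on the third.
  • @Vaibhav: It is worth avoiding direct requests for "sample code" here. They are often read as "will you do my work for me", even when that is not the actual intent.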
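
The explicit wait discussed in the comments above could look roughly like the sketch below. `presence_of_element_located` only checks that an element exists in the DOM, whereas `element_to_be_clickable` also waits until the element is displayed and enabled, which is closer to what `click()` needs. The helper names `pagination_button_xpath` and `click_page_button` and the 10-second timeout are illustrative assumptions, not part of the original code, and the sketch has not been verified against the live site.

```python
def pagination_button_xpath(index):
    """Build the XPath of the index-th pagination button (XPath taken from the question)."""
    return '//*[@id="searchResults"]/div[2]/div[2]/button[{}]'.format(index)


def click_page_button(driver, index, timeout=10):
    """Wait until the pagination button is visible and enabled, then click it.

    Hypothetical helper: raises selenium's TimeoutException if the button
    does not become clickable within `timeout` seconds.
    """
    # Imported here so the XPath helper above stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, pagination_button_xpath(index))
    button = WebDriverWait(driver, timeout).until(EC.element_to_be_clickable(locator))
    button.click()
    return button
```

In the question's inner loop, a call such as `click_page_button(driver, i)` would stand in for the `find_element_by_xpath(...)`, `btn.click()` pair. Whether that alone removes the exception depends on why the button is not visible.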

Tags: python python-2.7 selenium web-scraping


【Solution 1】:

The error message tells you why it is not working.

ElementNotVisibleException: Message: element not visible

The element is not visible unless you scroll down to bring it into view.

You have to scroll down the list, by an amount that depends on the size of the browser window.
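
One way to do the scrolling without working out pixel offsets is to ask the browser to scroll the element itself into view. The sketch below assumes `driver` is a Selenium WebDriver and `element` is a WebElement that has already been located; the function name `scroll_into_view_and_click` is made up for illustration.

```python
def scroll_into_view_and_click(driver, element):
    """Scroll `element` into the viewport via JavaScript, then click it.

    `arguments[0]` in the script refers to the first extra argument passed
    to execute_script, i.e. the element itself.
    """
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    element.click()
```

This is independent of the window size, because the browser decides how far to scroll.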

Alternatively, just extract the data from the page source, which is easier.
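
As a rough illustration of working from the page source, the sketch below pulls the text out of every `<div>` carrying a given CSS class, using only the Python 3 standard library. The class name `loc-heading` is borrowed from the XPath in Solution 2 and is an assumption about the page's markup; `HeadingCollector` and `clinic_names_from_source` are hypothetical names.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every <div> whose class list contains `css_class`."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.headings = []
        self._depth = 0    # > 0 while inside a matching <div>
        self._chunks = []  # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Track nested <div>s so the matching close tag can be recognised.
        if self._depth or self.css_class in classes:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.headings.append(" ".join("".join(self._chunks).split()))
                self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def clinic_names_from_source(page_source, css_class="loc-heading"):
    """Return the heading texts found in an HTML string."""
    parser = HeadingCollector(css_class)
    parser.feed(page_source)
    return parser.headings
```

With Selenium this could be called as `clinic_names_from_source(driver.page_source)`. If the site only renders one page of results at a time, the source will contain only that page.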

【Comments】:

【Solution 2】:

I usually use Selenium Basic, an Excel add-in. You can use the same logic in Python. This was tried in VBA and worked fine for me.

    Private assert As New assert
    Private driver As New Selenium.ChromeDriver
    
    Sub sel_novanHealth()
        Set ObjWB = ThisWorkbook
        Set ObjExl_Sheet1 = ObjWB.Worksheets("Sheet1")
        Dim Name As Variant

        'Open the website
        driver.get "https://www.novanthealth.org/home/patients--visitors/locations.aspx"

        driver.Window.Maximize

        driver.Wait (1000)

        'Find out the total number of pages to be scraped
        lnth = driver.FindElementsByXPath("//button[@class='paginate_button']").Count

        'Running the Loop for the Pages
        For y = 2 To lnth
            'Running the Loop for the Elements
            For x = 1 To 10
                Name = driver.FindElementsByXPath("//div[@class='span12 loc-heading']")(x).Text
                'Element 2
                'Element 3
            Next x
            driver.FindElementsByXPath("//button[@class='paginate_button']")(y).Click
        Next y

        driver.Wait (1000)
    End Sub
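
A loose Python rendering of the same idea is sketched below. It is not a line-by-line port: instead of assuming 10 results per page it reads however many headings are present, and it re-locates the pagination buttons after each click. It assumes the Selenium 4 style `driver.find_elements(by, value)` call, and that the buttons matched by the XPath are one per page, in page order, with the first one being the page already shown. `scrape_clinic_names` and the `pause` parameter are illustrative names. The locator strategy is passed as the plain string `"xpath"` so the function can be exercised with a stub driver.

```python
import time

XPATH = "xpath"  # same string value as selenium.webdriver.common.by.By.XPATH

# XPaths taken from the VBA code above.
HEADING_XPATH = "//div[@class='span12 loc-heading']"
PAGE_BUTTON_XPATH = "//button[@class='paginate_button']"


def scrape_clinic_names(driver, pause=1.0):
    """Collect heading texts from the current page and every following page."""
    names = [el.text for el in driver.find_elements(XPATH, HEADING_XPATH)]
    page_count = len(driver.find_elements(XPATH, PAGE_BUTTON_XPATH))

    for index in range(1, page_count):
        # Look the buttons up again: references found before a click
        # may no longer be valid once the result list has been redrawn.
        buttons = driver.find_elements(XPATH, PAGE_BUTTON_XPATH)
        if index >= len(buttons):
            break
        buttons[index].click()
        time.sleep(pause)  # crude pause; an explicit wait would be more robust
        names.extend(el.text for el in driver.find_elements(XPATH, HEADING_XPATH))

    return names
```

After `driver.get(...)` on the locations page, `scrape_clinic_names(driver)` would return the names as a list, which could then be written to a file or a DataFrame.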
    

【Comments】:
