【Question title】: Selenium webdriver will not fully load page (python)
【Posted】: 2018-01-02 03:58:42
【Question】:

I have been using Selenium WebDriver with Python to try to log in to this website: Login Page Here

To do this, I did the following in Python:

from selenium import webdriver 
import bs4 as bs


driver = webdriver.Chrome()
driver.get('https://app.chatra.io/')

I then tried to parse the page with Beautiful Soup:

html = driver.execute_script('return document.documentElement.outerHTML')
soup = bs.BeautifulSoup(html, 'html.parser')
print(soup.prettify)

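The main problem is that the page never fully loads. When I load the page myself in a browser, everything works fine. However, when Selenium WebDriver loads it, it seems to stop halfway.

Any idea why? Any ideas on how to fix it, or where I could learn more?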

【Question comments】:

    Tags: python python-3.x selenium-webdriver selenium-chromedriver


    【Answer 1】:

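    First of all, I can reproduce this in the latest Chrome as well (with chromedriver 2.34, also the current latest); I'm not sure yet what is going on. Workaround: Firefox works fine for me.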


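    Also, I would add an extra step between driver.get() and the HTML parsing: an explicit wait, so that the page has time to load and the desired condition is true before you read the HTML: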

    import bs4 as bs
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    
    driver = webdriver.Firefox()
    driver.get('https://app.chatra.io/')
    
    wait = WebDriverWait(driver, 10)
    wait.until(EC.visibility_of_element_located((By.ID, "signin-email")))
    
    html = driver.execute_script('return document.documentElement.outerHTML')
    soup = bs.BeautifulSoup(html, 'html.parser')
    print(soup.prettify())
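WebDriverWait.until() accepts any callable that takes the driver; it calls it repeatedly until it returns a truthy value or the timeout expires (then it raises TimeoutException). So besides the expected_conditions helpers, you can wait on conditions evaluated inside the page with execute_script. A minimal sketch, assuming the email field keeps the id `signin-email` used above; the function names are hypothetical:

```python
def document_complete(driver):
    """Wait condition: true once the browser reports the document as fully loaded."""
    return driver.execute_script("return document.readyState") == "complete"


def signin_form_rendered(driver):
    """Wait condition: true once the (assumed) #signin-email field exists in the DOM.

    readyState can already be "complete" before a JavaScript app has rendered
    its views, so this also checks for the element itself.
    """
    return document_complete(driver) and bool(
        driver.execute_script("return document.getElementById('signin-email') !== null")
    )


# Usage with a real driver (requires Selenium and a browser):
#     from selenium.webdriver.support.wait import WebDriverWait
#     WebDriverWait(driver, 10).until(signin_form_rendered)
```

Checking for a concrete element is usually more reliable than readyState alone for single-page apps like this one, because their content is rendered by scripts after the initial load.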
    

    Note that you also need to call prettify(), since it is a method.
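The difference is plain Python attribute access versus a call. A minimal sketch with a stand-in class (`FakeTag` is hypothetical and only imitates BeautifulSoup's `Tag`) shows why printing `soup.prettify` without parentheses prints a description of the bound method instead of formatted HTML:

```python
class FakeTag:
    """Hypothetical stand-in for bs4's Tag, only to illustrate method access vs. call."""

    def prettify(self):
        return "<html>\n <body>\n </body>\n</html>"


tag = FakeTag()

print(tag.prettify)    # the method object itself: <bound method FakeTag.prettify of ...>
print(tag.prettify())  # the string the method returns
```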

    【Comments】:

      【Answer 2】:

      The problem you are facing has several aspects:

      • If you try to fetch the page for BeautifulSoup with urlopen from urllib.request, you get the following error:

        urllib.error.HTTPError: HTTP Error 403: Forbidden
        

        This means the request made by urllib.request is detected and rejected with HTTP Error 403: Forbidden. So using webdriver from Selenium makes sense.

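      • Next, when you use ChromeDriver and Chrome, the Website opens and renders initially. But the ChromeDriver-controlled WebDriver session is soon detected, and ChromeDriver cannot parse the <head> and <body> tags. You only get the minimal header: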

        <!DOCTYPE html>
        <html xmlns="http://www.w3.org/1999/xhtml" class="supports cssfilters flexwrap chrome webkit win hover web"></html>
        
      • Finally, when you use GeckoDriver and Firefox Quantum, the Website opens and renders correctly, as shown below:

        Code block:

        from selenium import webdriver
        from bs4 import BeautifulSoup as soup
        
        driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
        driver.get('https://app.chatra.io/')
        html = driver.execute_script('return document.documentElement.outerHTML')
        pagesoup = soup(html, "html.parser")
        print(pagesoup)
        

        Console output:

        <html class="supports cssfilters flexwrap firefox gecko win hover web"><head>
        <link class="" href="https://app.chatra.io/b281cc6b75916e26b334b5a05913e3eb18fd3a4d.css?meteor_css_resource=true&amp;_g_app_v_=51" rel="stylesheet" type="text/css"/>
        <meta charset="utf-8"/>
        <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
        <meta content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1, user-scalable=no, viewport-fit=cover" name="viewport"/>
        .
        .
        .
        <em>··· Chatra</em>
        .
        .
        .
        </div></body></html>
        
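      • Adding prettify to the soup extraction. Note that prettify is referenced below without parentheses, so Python prints the bound method (<bound method Tag.prettify of ...>) wrapped around the markup instead of indented HTML; call pagesoup.prettify() to get the prettified output: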

        Code block:

        from selenium import webdriver
        from bs4 import BeautifulSoup as soup
        
        driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
        driver.get('https://app.chatra.io/')
        html = driver.execute_script('return document.documentElement.outerHTML')
        pagesoup = soup(html, "html.parser")
        print(pagesoup.prettify)
        

        Console output:

        <bound method Tag.prettify of <html class="supports cssfilters flexwrap firefox gecko win hover web"><head>
        <link class="" href="https://app.chatra.io/b281cc6b75916e26b334b5a05913e3eb18fd3a4d.css?meteor_css_resource=true&amp;_g_app_v_=51" rel="stylesheet" type="text/css"/>
        <meta charset="utf-8"/>
        <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
        <meta content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1, user-scalable=no, viewport-fit=cover" name="viewport"/>
        .
        .
        .
        <em>··· Chatra</em>
        .
        .
        .
        </div></body></html>>
        
      • You can also use Selenium's page_source property as follows:

        Code block:

        from selenium import webdriver
        
        driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
        driver.get('https://app.chatra.io/')
        print(driver.page_source)
        

        Console output:

      <html class="supports cssfilters flexwrap firefox gecko win hover web">
      
      <head>
        <link rel="stylesheet" type="text/css" class="" href="https://app.chatra.io/b281cc6b75916e26b334b5a05913e3eb18fd3a4d.css?meteor_css_resource=true&amp;_g_app_v_=51">
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1, user-scalable=no, viewport-fit=cover">
      
        <!-- platform specific stuff -->
        <meta name="msapplication-tap-highlight" content="no">
        <meta name="apple-mobile-web-app-capable" content="yes">
      
        <!-- favicon -->
        <link rel="shortcut icon" href="/static/favicon.ico">
      
        <!-- win8 tile -->
        <meta name="msapplication-TileImage" content="/static/win-tile.png">
        <meta name="msapplication-TileColor" content="#ffffff">
        <meta name="application-name" content="Chatra">
      
        <!-- apple touch icon -->
        <!--<link rel="apple-touch-icon" sizes="256x256" href="/static/?????.png">-->
      
        <title>··· Chatra</title>
      
        <style>
          body {
            background: #f6f5f7
          }
        </style>
      
        <style type="text/css"></style>
      </head>
      
      <body>
      
      
      
        <script async="" src="https://www.google-analytics.com/analytics.js"></script>
        <script type="text/javascript" src="/meteor_runtime_config.js"></script>
      
        <script type="text/javascript" src="https://app.chatra.io/9153feecdc706adbf2c71253473a6aa62c803e45.js?meteor_js_resource=true&amp;_g_app_v_=51"></script>
      
      
      
        <div class="body body-layout">
          <div class="body-layout__main main-layout">
            <aside class="main-layout__left-sidebar">
              <div class="left-sidebar-layout">
              </div>
            </aside>
            <div class="main-layout__content">
              <div class="content-layout">
      
      
                <main class="content-layout__main is-no-fades js-popover-boundry js-main">
      
                  <div class="center loading loading--light">
                    <div class="content-padding nothing">
      
      
                      <em>··· Chatra</em>
      
      
                    </div>
                  </div>
      
                </main>
              </div>
            </div>
          </div>
        </div>
      </body>
      </html>
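Both execute_script('return document.documentElement.outerHTML') and page_source return the DOM as it is at that instant. In the Chrome case the snapshot has no <head> or <body> at all, and even the Firefox snapshot above still shows the `loading` placeholder rather than a login form. A minimal sketch that uses only the standard library's html.parser to classify a captured snapshot before parsing it further; the helper name is hypothetical, and the `signin-email` id is borrowed from the explicit-wait answer as an assumption about the page:

```python
from html.parser import HTMLParser


class _SnapshotScanner(HTMLParser):
    """Collects tag names and id attributes from an HTML snapshot."""

    def __init__(self):
        super().__init__()
        self.tags = set()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)  # html.parser reports tag names in lower case
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def snapshot_status(html, required_id="signin-email"):
    """Classify a DOM snapshot (hypothetical helper).

    Returns "empty" if <head>/<body> are missing (the Chrome case),
    "loading" if the document exists but `required_id` is not rendered yet,
    and "ready" otherwise.
    """
    scanner = _SnapshotScanner()
    scanner.feed(html)
    scanner.close()
    if not {"head", "body"} <= scanner.tags:
        return "empty"
    if required_id not in scanner.ids:
        return "loading"
    return "ready"


chrome_snapshot = '<!DOCTYPE html><html class="supports chrome webkit"></html>'
firefox_snapshot = (
    '<html><head><title>··· Chatra</title></head><body>'
    '<div class="center loading loading--light"><em>··· Chatra</em></div>'
    '</body></html>'
)
print(snapshot_status(chrome_snapshot))   # empty
print(snapshot_status(firefox_snapshot))  # loading
```

With a live driver you would pass it driver.page_source and, if the result is not "ready", wait longer (for example with an explicit wait) or retry in another browser.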

      【Comments】:
