【Question title】: Not rendering entire page with Selenium
【Posted】: 2021-01-30 19:48:05
【Question】:

I need the full source of the web page for scraping, but I only get part of it.

Code trials:

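import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def get_soup(url):  # function name is illustrative; the original omitted the def line
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--disable-gpu')

    driver = webdriver.Chrome(options=options)
    driver.get(url)

    time.sleep(10)  # fixed wait for client-side JavaScript to render

    page = driver.page_source
    driver.quit()
    soup = BeautifulSoup(page, 'html5lib')

    return soup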

The URL is: https://superbet.ro/pariuri-sportive/fotbal/live

【Question discussion】:

    Tags: python selenium-webdriver xpath css-selectors webdriverwait


    【Solution 1】:

    Due to

    This might help: Switching into second iframe in Selenium Python3

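    【Discussion】:

    • Now, when I scroll down with the page source open, some events and code get added as I scroll. Is that because of the iframes?
    • No, that would be client-side JavaScript (Ajax).
    【Solution 2】: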
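The second comment attributes the content that appears while scrolling to client-side JavaScript (Ajax). If that is what is happening, one common workaround is to scroll before reading `driver.page_source`. The following is a minimal sketch, not something taken from the answers: `scroll_until_stable` is a hypothetical helper name, and it assumes the page increases `document.body.scrollHeight` as new items load, which may not hold for this site if it scrolls an inner container instead.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is any Selenium WebDriver; only execute_script() is used.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give pending Ajax requests time to add new nodes
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so assume the page is complete
        last_height = new_height
    return rounds
```

Calling `scroll_until_stable(driver)` after `driver.get(url)` and before `page = driver.page_source` would be the natural place for it. The `max_rounds` cap is there so that an endlessly growing feed cannot loop forever.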

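    To extract the page source, you need to:

    • Click the OK button to accept cookies.

    • Induce WebDriverWait for visibility_of_element_located() so that the target WebElement is visible.

    • You can use either of the following Locator Strategies:

      • Using CSS_SELECTOR: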

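        from selenium.webdriver.common.by import By
        from selenium.webdriver.support.ui import WebDriverWait
        from selenium.webdriver.support import expected_conditions as EC
        driver.get("https://superbet.ro/pariuri-sportive/fotbal/live")
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a#CybotCookiebotDialogBodyLevelButtonAccept[href]"))).click()
        WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.section-header__title")))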
        
      • Using XPATH:

        driver.get("https://superbet.ro/pariuri-sportive/fotbal/live")
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='CybotCookiebotDialogBodyLevelButtonAccept' and @href]"))).click()
        WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='section-header__title']")))
        
    • Console output:

    <html lang="en" style="--vh:6.13px;">
    
    <head>
      <meta charset="utf-8">
      <meta name="description" content="">
      <meta http-equiv="X-UA-Compatible" content="IE=edge">
      <meta name="viewport" content="width=device-width,initial-scale=1,user-scalable=0">
      <title>Superbet | Pariuri Sportive Online, Live, Casino, Loto, Virtuale</title>
      <script type="text/javascript" charset="UTF-8" async="" src="https://consentcdn.cookiebot.com/consentconfig/a438e411-35ff-432b-863f-3d25bed37901/state.js"></script>
      <script type="text/javascript" charset="UTF-8" async="" src="https://consent.cookiebot.com/logconsent.ashx?action=accept&amp;nocache=1612037844156&amp;referer=https%3A%2F%2Fsuperbet.ro%2Fpariuri-sportive%2Ffotbal%2Flive&amp;dnt=false&amp;method=strict&amp;clp=true&amp;cls=true&amp;clm=true&amp;cbid=a438e411-35ff-432b-863f-3d25bed37901&amp;cbt=leveloptin&amp;hasdata=true"></script>
      <script type="text/javascript" charset="UTF-8" async="" src="https://consent.cookiebot.com/a438e411-35ff-432b-863f-3d25bed37901/cc.js?renew=false&amp;referer=superbet.ro&amp;dnt=false&amp;forceshow=false&amp;cbid=a438e411-35ff-432b-863f-3d25bed37901&amp;whitelabel=false&amp;brandid=CookieConsent&amp;framework="></script>
      <script type="text/javascript" async="" src="https://consent.cookiebot.com/uc.js?cbid=a438e411-35ff-432b-863f-3d25bed37901"></script>
      <script async="" src="https://www.googletagmanager.com/gtm.js?id=GTM-MN5RWMH"></script>
      <script>
        if (!window.location.hostname.includes('local')) {
          window.dataLayer = window.dataLayer || [];
          window.dataLayer.push({
            originalLocation: document.location.protocol + '//' +
              document.location.hostname +
              document.location.pathname +
              document.location.search
          });
          (function(w, d, s, l, i) {
            w[l] = w[l] || [];
            w[l].push({
              'gtm.start': new Date().getTime(),
              event: 'gtm.js'
            });
            var f = d.getElementsByTagName(s)[0],
              j = d.createElement(s),
              dl = l != 'dataLayer' ? '&l=' + l : '';
            j.async = true;
            j.src = 'https://www.googletagmanager.com/gtm.js?id=' + i + dl;
            f.parentNode.insertBefore(j, f);
          })(window, document, 'script', 'dataLayer', 'GTM-MN5RWMH');
        }
      </script>
      . 
      . 
      .
      <iframe data-product="web_widget" title="No content" tabindex="-1" aria-hidden="true" src="about:blank" style="width: 0px; height: 0px; border: 0px; position: absolute; top: -9999px;"></iframe><iframe name="__uspapiLocator" tabindex="-1" role="presentation"
        aria-hidden="true" title="Blank" style="display: none; position: absolute; width: 1px; height: 1px; top: -9999px;"></iframe><iframe tabindex="-1" role="presentation" aria-hidden="true" title="Blank" src="https://consentcdn.cookiebot.com/sdk/bc-v2.min.html"
        style="position: absolute; width: 1px; height: 1px; top: -9999px;"></iframe>
      <div><iframe title="Deschide o miniaplicație widget unde puteți găsi mai multe informații" id="launcher" tabindex="-1" style="width: 142px; height: 50px; padding: 0px; margin: 10px 20px; position: fixed; bottom: 30px; overflow: visible; opacity: 0; border: 0px; z-index: 999998; transition-duration: 250ms; transition-timing-function: cubic-bezier(0.645, 0.045, 0.355, 1); transition-property: opacity, top, bottom; top: -9999px; visibility: hidden;"></iframe>
        <iframe
          title="Găsiți mai multe informații aici" id="webWidget" tabindex="-1" style="width: 374px; max-height: calc(100vh - 32px); height: 572px; position: fixed; opacity: 0; border: 0px; transition-duration: 250ms; transition-timing-function: cubic-bezier(0.645, 0.045, 0.355, 1); transition-property: opacity, top, bottom; top: -9999px; visibility: hidden; z-index: 999999;"></iframe>
      </div>
    
      </body>
    
    </html>
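The excerpt above contains several `iframe` elements, which ties back to the iframe suggestion in Solution 1. To check which frames a captured page source actually contains before deciding whether switching into one is worthwhile, the string can be inspected with Python's standard `html.parser`. This is only a sketch: `IframeLister` and `list_iframes` are illustrative names, and the sample string is a shortened stand-in rather than the real page.

```python
from html.parser import HTMLParser


class IframeLister(HTMLParser):
    """Collect the attributes of every <iframe> start tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "iframe" also matches <IFRAME>.
        if tag == "iframe":
            self.iframes.append(dict(attrs))


def list_iframes(page_source):
    """Return one attribute dict per iframe found in page_source."""
    parser = IframeLister()
    parser.feed(page_source)
    parser.close()
    return parser.iframes


# Shortened, made-up sample; in practice pass driver.page_source instead.
sample = (
    '<div><iframe id="launcher" tabindex="-1"></iframe>'
    '<iframe name="__uspapiLocator" src="about:blank"></iframe></div>'
)
for frame in list_iframes(sample):
    print(frame.get("id") or frame.get("name"), frame.get("src"))
```

A frame with no `src`, or with `about:blank`, is unlikely to hold the content being scraped, in which case waiting for the JavaScript-rendered elements in the main document is probably the more useful direction.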


    【Discussion】:
