【Question title】: Web scraping LinkedIn: pagination does not work for searches
【Posted】: 2019-03-03 15:48:56
【Question】:

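I can't get LinkedIn pagination to work when I try the following:

Search URL: https://www.linkedin.com/search/results/people/?keywords=Business%20Development&origin=SWITCH_SEARCH_VERTICAL

I can go to the first page, scroll down (infinite scroll), and click "Next", which works fine, but on page 2 the script does not scroll. I've realised that the URL does not get updated with "&page=2", so the scroll variables don't get updated either. I have found another way to make this work; I'm just wondering where I'm going wrong, and whether a pro out there can fix this script?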
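
Since the URL is not updated by the click, one possible workaround is to build each results-page URL explicitly and load it with `driver.get`. The sketch below assumes the search endpoint honours a `page` query parameter (suggested by the "&page=2" mentioned above, not verified against LinkedIn); `page_url` and `SEARCH_URL` are hypothetical names introduced for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode, quote

SEARCH_URL = ('https://www.linkedin.com/search/results/people/'
              '?keywords=Business%20Development&origin=SWITCH_SEARCH_VERTICAL')

def page_url(base_url, page):
    """Return base_url with its `page` query parameter set to `page`.

    Assumption: the site reads the page number from a `page` parameter.
    """
    parts = urlsplit(base_url)
    # Drop any existing `page` value so repeated calls do not stack parameters.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != 'page']
    query.append(('page', str(page)))
    # quote_via=quote keeps spaces encoded as %20, as in the original URL.
    return urlunsplit(parts._replace(query=urlencode(query, quote_via=quote)))
```

In the script, this would be used as `driver.get(page_url(SEARCH_URL, 2))` in place of clicking "Next", so that `driver.current_url` always reflects the page being scrolled.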

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.common.exceptions import NoSuchElementException
from time import sleep

userid = 'myemail@mail.com'
password = 'secret'

driver = webdriver.Chrome()

driver.get('https://www.linkedin.com')

driver.find_element_by_xpath("""//*[@id="login-email"]""").send_keys(userid)
driver.find_element_by_xpath("""//*[@id="login-password"]""").send_keys(password)
driver.find_element_by_xpath("""//*[@id="login-submit"]""").click()

driver.get('https://www.linkedin.com/search/results/people/?keywords=Business%20Development&origin=SWITCH_SEARCH_VERTICAL')

while True:

  SCROLL_PAUSE_TIME = 0.5

  # Get scroll height
  last_height = driver.execute_script("return document.body.scrollHeight")
  print('current url' + driver.current_url)

  while True:
      # Scroll down to bottom
      driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

      # Wait to load page
      sleep(SCROLL_PAUSE_TIME)

      # Calculate new scroll height and compare with last scroll height
      new_height = driver.execute_script("return document.body.scrollHeight")
      print('new height ' + str(new_height))
      if new_height == last_height:
          break
      last_height = new_height

  driver.find_element_by_xpath("""//button[@class='artdeco-pagination__button artdeco-pagination__button--next artdeco-button artdeco-button--muted artdeco-button--icon-right artdeco-button--1 artdeco-button--tertiary ember-view' and contains(.,'Next')]""").click()

【Comments】:

    Tags: python-3.x selenium selenium-chromedriver


    【Solution 1】:

    Try this:

    driver.execute_script("$('.artdeco-pagination__button--next').click()")
    

    You don't need to scroll.

    【Discussion】:
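
The one-liner above depends on `$` (jQuery) being available on the page. A minimal sketch of the same idea with the plain DOM API follows, assuming the Next button still carries the `artdeco-pagination__button--next` class taken from the question; `click_next` is a hypothetical helper, and `driver` is any Selenium WebDriver (only its `execute_script` method is used).

```python
# JavaScript run in the page: click the Next button if it exists and is enabled.
NEXT_BUTTON_JS = """
var btn = document.querySelector('.artdeco-pagination__button--next');
if (btn && !btn.disabled) { btn.click(); return true; }
return false;
"""

def click_next(driver):
    """Click the Next button via JavaScript.

    Returns False when no enabled Next button is found, which can serve
    as the stop condition for a pagination loop.
    """
    return bool(driver.execute_script(NEXT_BUTTON_JS))
```

A loop such as `while click_next(driver): sleep(2)` would then walk the result pages without scrolling; the two-second pause is an arbitrary placeholder for whatever wait the page actually needs.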
