How can I scrape from websites that have infinite scrolling? Answers

【Question title】: How can I scrape from websites that have infinite scrolling?
【Posted】: 2021-07-23 23:20:45
【Question description】:

I have managed to create a web scraper that collects item descriptions, but the page loads more items as you scroll.

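from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import requests
from bs4 import BeautifulSoup
from numpy import mean

namelist = []
driver = webdriver.Chrome()
driver.get("https://waxpeer.com/")
time.sleep(15)  # crude fixed wait for the initial items to render

# Selenium 4 removed find_elements_by_xpath; use find_elements(By.XPATH, ...)
links = driver.find_elements(By.XPATH, "//div[@class='lpd_div']/a")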

I also need to format each item description as:

★ Karambit| Gamma Doppler (Factory new)

instead of:

★ Karambit

Gamma Doppler

Factory new

from selenium.webdriver.common.by import By

desc = driver.find_elements(By.XPATH, "//div[@class='lpd_div']/div[2]/p")
for item in desc:
    print(item.text)

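【Question comments】:

You have to scroll and load everything before you can do what you need to do.
But how do I do that with Selenium or some other way?
Duplicate?
Duplicate the whole code?
What? By duplicate I mean this question has been asked before, and you can use that question to solve yours.

Tags: python selenium web-scraping beautifulsoup bots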

【Solution 1】:

This is what I use for an infinite-scrolling page I am currently scraping.

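from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions as ec

def scroll(self):
  # Method of a scraper class: self.w is a WebDriverWait, self.driver the WebDriver.
  items = self.w.until(ec.presence_of_all_elements_located(self.item_locator))
  # Moving to the last item scrolls it into view, which triggers the next load.
  ActionChains(self.driver).move_to_element(items[-1]).perform()
  # Report whether a loader element is present on the page.
  loader = self.driver.find_elements(*self.loader_locator)
  return bool(loader)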

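The ActionChains part finds the last item and scrolls to it, which causes the page to send a request for more content. This was a test I wrote just to verify how the infinite scroll works, but if you want to do anything with the elements found, you can append the items to a master list.

self.w is a WebDriverWait, by the way.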
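
A common alternative to moving to the last element is to scroll the window with JavaScript until the page height stops growing. The following is only a minimal sketch of that idea, not part of the answer above: the function name scroll_to_bottom and the pause and max_rounds parameters are illustrative, and the right pause length depends on how quickly the site loads new items.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll down repeatedly until the document height stops changing.

    `driver` is any object with Selenium's execute_script() method.
    Returns the final document height.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # Jump to the bottom so the page requests the next batch of items.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the new items time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was added, so assume the end was reached
        last_height = new_height
    return last_height
```

After scroll_to_bottom(driver) returns, the find_elements call from the question can be run once to collect everything that was loaded. The max_rounds cap is there so that a feed that never ends does not loop forever.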

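【Comments】:

This will be helpful for other projects, but in this case the earlier answer is simpler. Thanks, though.

【Solution 2】:

There is no need to use Selenium. The data is available by sending a GET request to the website's API, in the following format: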

https://waxpeer.com/api/data/index/?skip={offset}&sort=best_deals&game=csgo&all=0

The offset increases by 50 for each page.
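
The offset-based paging can be separated from the HTTP call, which makes the stop condition explicit and easy to try out without touching the network. This is a minimal sketch under assumptions: iter_items, fetch_page, page_size and max_pages are illustrative names, and it assumes the API eventually returns an empty page once the offset runs past the last item.

```python
def iter_items(fetch_page, page_size=50, max_pages=None):
    """Yield items one by one, requesting a new page whenever needed.

    `fetch_page(offset)` must return the list of items starting at `offset`.
    Iteration stops at the first empty page or after `max_pages` pages.
    """
    offset = 0
    pages = 0
    while max_pages is None or pages < max_pages:
        items = fetch_page(offset)
        if not items:
            break  # an empty page means there is nothing left to fetch
        yield from items
        offset += page_size
        pages += 1


# Stand-in for the real API: 120 fake items served 50 at a time.
fake_items = [{"name": f"item {i}"} for i in range(120)]
names = [item["name"] for item in iter_items(lambda off: fake_items[off:off + 50])]
print(len(names))  # 120
```

With requests, fetch_page could be a small function that calls requests.get on the URL above with the given offset and returns the "items" list from the JSON response.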

For example, to print the names:

import requests

URL = (
    "https://waxpeer.com/api/data/index/?skip={offset}&sort=best_deals&game=csgo&all=0"
)

offset = 0

while True:
    response = requests.get(URL.format(offset=offset), timeout=10).json()
    # Stop when the response has no "items" key or the list is empty.
    items = response.get("items")
    if not items:
        break
    for data in items:
        print(data["name"])
    print("-" * 80)
    offset += 50  # move on to the next page of 50 items

Output:

★ Karambit | Gamma Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
★ Butterfly Knife | Doppler (Factory New)
★ Butterfly Knife | Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
★ Karambit | Gamma Doppler (Factory New)
...
...
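
The API names already come in the single-line format the question asks for. If the text is scraped with Selenium instead and arrives as three separate pieces, they can be joined by hand. A minimal sketch, assuming the pieces are always the item name, the finish and the wear, in that order; format_description is an illustrative helper, not something from the question or the answers.

```python
def format_description(parts):
    """Join [name, finish, wear] into 'name | finish (wear)'."""
    cleaned = [p.strip() for p in parts if p and p.strip()]
    if len(cleaned) != 3:
        # Unexpected shape: fall back to a plain join rather than guessing.
        return " | ".join(cleaned)
    name, finish, wear = cleaned
    return f"{name} | {finish} ({wear})"


print(format_description(["★ Karambit", "Gamma Doppler", "Factory new"]))
# ★ Karambit | Gamma Doppler (Factory new)
```

If one element's text holds all three lines, format_description(element.text.splitlines()) would do the same job; blank lines between the pieces are skipped.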

【Comments】:

Thank you so much, I don't know what I was thinking.