Soup.find_all is only returning some of the results in Python 3.5.1

【Question title】: Soup.find_all is only returning some of the results in Python 3.5.1
【Posted】: 2016-06-05 14:55:36
【Question】:

I am trying to get the URLs of all the thumbnails with class="thumb" from my web page, but soup.find_all only prints the most recent 22 or so.

Here is the code:

import requests
from bs4 import BeautifulSoup
r = requests.get("http://rayleighev.deviantart.com/gallery/44021661/Reddit")
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("a", {'class' : "thumb"})
for link in links:
    print(link.get("href"))

【Comments】:

As far as I can see, there are 24 such links on the linked page, so I guess the code is working correctly.

Tags: python python-3.x beautifulsoup python-requests

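【Solution 1】:

I think you meant to ask how to follow the pagination and grab every link in the gallery. Here is an implementation of that idea: pass the offset parameter and keep grabbing links until a page returns none, increasing offset by 24 (the number of links per page) each time: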

import requests
from bs4 import BeautifulSoup


offset = 0
links = []
with requests.Session() as session:
    while True:
        r = session.get("http://rayleighev.deviantart.com/gallery/44021661/Reddit?offset=%d" % offset)
        soup = BeautifulSoup(r.content, "html.parser")
        new_links = [link["href"] for link in soup.find_all("a", {'class': "thumb"})]

        # no more links - break the loop
        if not new_links:
            break

        links.extend(new_links)
        print(len(links))
        offset += 24

print(links)
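
The paging logic in the loop above can also be tried without touching the network. The following is only a minimal sketch under stated assumptions: `collect_all_links`, `fake_fetch`, `FAKE_GALLERY`, and the `max_pages` cap are hypothetical names introduced for illustration and are not part of the original answer. The fetching step is passed in as a function, so the "advance offset by 24 until a page is empty" rule can be checked against a fake gallery.

```python
from typing import Callable, List


def collect_all_links(fetch_links: Callable[[int], List[str]],
                      page_size: int = 24,
                      max_pages: int = 1000) -> List[str]:
    """Call fetch_links(offset) page by page until a page comes back empty."""
    links: List[str] = []
    offset = 0
    # max_pages is a hypothetical safety cap (not in the original answer) so
    # the loop still ends if a server ignores the offset parameter.
    for _ in range(max_pages):
        new_links = fetch_links(offset)
        if not new_links:  # no more links: stop paging
            break
        links.extend(new_links)
        offset += page_size
    return links


# Fake gallery of 50 items standing in for the real site (illustration only).
FAKE_GALLERY = ["/art/item-%d" % i for i in range(50)]


def fake_fetch(offset: int) -> List[str]:
    # Mimics a server that returns at most 24 links starting at `offset`.
    return FAKE_GALLERY[offset:offset + 24]


all_links = collect_all_links(fake_fetch)
print(len(all_links))  # 50
```

With the real site, `fetch_links` would presumably wrap the `session.get(...)` and `soup.find_all(...)` calls from the answer's loop.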

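【Comments】:

Nice, thanks, that works. Not to be a bother, but what modifications would be needed to grab all 24 links from just one page of the gallery and put them into a list?
@Rayleighev You can get rid of the loop and keep offset at 0, or just use the http://rayleighev.deviantart.com/gallery/44021661/Reddit URL without the offset. Hope that helps.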
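
A minimal sketch of that single-page variant, assuming the page markup has already been downloaded: the helper name `thumb_links` and the `sample_html` string are made up for illustration, and the real input would be `r.content` from the `requests.get(...)` call in the question. The `class_="thumb"` filter is Beautiful Soup's keyword form of the `{'class': "thumb"}` dictionary used above, and it also matches anchors that carry additional classes.

```python
from bs4 import BeautifulSoup


def thumb_links(html):
    """Return the href of every <a> whose class list contains "thumb"."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"]
            for a in soup.find_all("a", class_="thumb")
            if a.has_attr("href")]  # skip anchors without an href


# Stand-in markup for illustration only; not taken from the real gallery page.
sample_html = """
<div>
  <a class="thumb" href="/art/one">one</a>
  <a class="thumb lit" href="/art/two">two</a>
  <a class="other" href="/art/three">three</a>
</div>
"""

links = thumb_links(sample_html)
print(links)  # ['/art/one', '/art/two']
```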