BeautifulSoup4 can't go deep enough to find articles (answers)

【Question title】: BeautifulSoup4 can't go deep enough to find articles
【Posted】: 2020-09-23 17:13:12
【Question description】:

I've just started experimenting with Python and BeautifulSoup.

I want to get the links to articles related to a specific city.

This is the current code:

import requests
from bs4 import BeautifulSoup

city = "london"
result = requests.get('https://www.origo.hu/kereses/index.html?q=' + city)


def main_loop():
    soup = BeautifulSoup(result.content, features="lxml")
    articles = soup.find("div", "oc-articleList")

    print(articles)


if result.status_code == 200:
    main_loop()
else:
    print('error:', result.status_code)

The result is:

<div class="oc-articleList"></div>

The first thing I tried was fetching the articles directly:

articles = soup.find_all("article")

But it can't find anything.

If you inspect the site's source in the browser, it looks like this:

<div class="oc-articleList">
    <article>...</article>
    <article>...</article>
    <article>...</article>
    <article>...</article>
    .
    .
    .
</div>

How can I get BS to parse deeper into the DOM?

【Comments】:

Without the actual URL it may be hard to get an answer.
Can you share the URL?
I've edited it in.

Tags: python web web-scraping beautifulsoup python-requests

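【Solution 1】:

Answer 1)
TLDR: to find nested elements, just add another .find() or .find_all() after the search for the first element.

Once you have found the div element with soup.find() (in your case, the variable articles), you can run another query on it with .find() or .find_all().

To illustrate, based on the code you provided: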

...

def main_loop():
    soup = BeautifulSoup(result.content, features="lxml")

    ### ADDED .find_all() after the first search ###
    articles = soup.find("div", "oc-articleList").find_all("article")

    print(articles)


...

Remember that find_all() returns a list, so articles is now a list of elements rather than a single element.
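The chained lookup can be tried offline on a small piece of markup. The sketch below is only an illustration: SAMPLE_HTML and the helper name article_links are made up for this example, the markup is an assumption about what the rendered page might contain, and the built-in "html.parser" is used instead of lxml so that nothing beyond bs4 is needed. It also guards against find() returning None, which would otherwise raise an AttributeError on the chained call.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup mirroring the structure shown in the question.
SAMPLE_HTML = """
<div class="oc-articleList">
    <article><a href="/a/1">First</a></article>
    <article><a href="/a/2">Second</a></article>
</div>
"""


def article_links(html):
    """Return the href of the first link inside each <article> of the list div."""
    soup = BeautifulSoup(html, "html.parser")

    # find() returns a single element, or None when nothing matches.
    container = soup.find("div", "oc-articleList")
    if container is None:
        return []

    links = []
    # find_all() returns a list, so it has to be iterated.
    for article in container.find_all("article"):
        anchor = article.find("a", href=True)
        if anchor is not None:
            links.append(anchor["href"])
    return links


print(article_links(SAMPLE_HTML))  # ['/a/1', '/a/2']
```

If the container is present but empty, as in the question's output, the function simply returns an empty list.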

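Answer 2)
requests only downloads the HTML (and CSS) that the server sends back. It does not execute JavaScript, so anything the page inserts with JavaScript after loading is missing from the response, which would explain the empty div.

Solution: use a prerendering service, e.g.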

result = requests.get("http://service.prerender.io/https://www.sample.com/")
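Before switching to a prerendering service, it may be worth confirming that the content really is filled in by JavaScript. A rough check, sketched below, is to see whether the container exists in the raw response but has no child tags. The helper names looks_js_rendered and prerender_url are hypothetical, the URL prefix is the one from the answer above, and whether the service needs an account or token is not covered here.

```python
from bs4 import BeautifulSoup

# Prefix taken from the answer above; treated here as a plain string.
PRERENDER_PREFIX = "http://service.prerender.io/"


def looks_js_rendered(html, container_class="oc-articleList"):
    """Return True if the container div exists but contains no child tags."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", container_class)
    # find(True) matches any tag, so None means there are no nested elements.
    return container is not None and container.find(True) is None


def prerender_url(target_url):
    """Build the prerender request URL by prefixing the target URL."""
    return PRERENDER_PREFIX + target_url


raw = '<div class="oc-articleList"></div>'
if looks_js_rendered(raw):
    print("Container is empty; try:", prerender_url("https://www.sample.com/"))
```

A True result is only a hint: an empty container could also mean the search genuinely returned no articles.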

【Discussion】:

I tried this, but it just returns an empty array.
Added a second answer.