Unable to retrieve links using Beautiful Soup and Python (answers)

【Question Title】: Unable to retrieve links using beautiful soup and python
【Posted】: 2021-08-04 09:34:03
【Question】:

I am trying to extract all the URLs linked from this page: https://www.scotts.com/en-us/library/lawn-food

I noticed that my code does not return several of them, for example https://www.scotts.com/en-us/library/lawn-food/when-feed-greener-lawn, and many more.

Here is my code snippet:

```python
import time
from random import randint

import requests
from bs4 import BeautifulSoup, SoupStrainer


def scrape_google_summaries(url):
    time.sleep(randint(0, 2))  # relax and don't let google be angry
    r = requests.get(url)
    r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    content = r.text

    # Parse only <a> tags that have an href attribute.
    soup = BeautifulSoup(content, "html.parser", parse_only=SoupStrainer("a", href=True))
    summary = []
    for link in soup.find_all("a", href=True):
        summary.append(link.get("href"))

    # Only links present in the static HTML returned by the server end up here.
    return summary


output = scrape_google_summaries("https://www.scotts.com/en-us/library/lawn-food")
```
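As a sketch (not the asker's code), the parsing step can be separated from the download so it can be checked against a saved page, and relative hrefs such as `/en-us/library/...` can be resolved with `urljoin` so they compare equal to full URLs like the one quoted above. The function name `extract_links` is hypothetical. Like the original, it only sees links that already exist in the HTML it is given, so it does not get around the JavaScript issue raised in the comments and answers.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup, SoupStrainer


def extract_links(html: str, base_url: str) -> list[str]:
    """Return every <a href> in `html` as an absolute URL, deduplicated in page order."""
    # Parse only <a> tags that carry an href, as in the question's code.
    soup = BeautifulSoup(html, "html.parser", parse_only=SoupStrainer("a", href=True))
    seen = set()
    links = []
    for a in soup.find_all("a", href=True):
        # Resolve relative hrefs (e.g. "/en-us/library/...") against the page URL.
        url = urljoin(base_url, a["href"])
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

Calling `extract_links(requests.get(url).text, url)` gives the same links as the question's function, only normalized, which makes it easy to test whether a given full URL is really absent.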

【Question Discussion】:

The website loads its data with JavaScript. I believe that is why you are not getting the expected results.
The site's content is loaded by JavaScript. Use Selenium.

Tags: python python-3.x web-scraping beautifulsoup

【Solution 1】:

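I checked by saving `r.text` (i.e. `content`) to a local file and opening it in a browser. As expected, none of the article links you are trying to scrape are in it! That means all of those links are generated dynamically. BeautifulSoup only parses the HTML it is given and does not execute JavaScript, so it cannot scrape dynamically generated content on its own. You will need another tool, such as Selenium or requests_html.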
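The check described above can be sketched as follows. This is an illustrative helper, not the answerer's code: the name `check_static_html`, the output file name, and the idea of searching for path fragments (rather than full URLs, so relative hrefs also match) are assumptions.

```python
from pathlib import Path


def check_static_html(html: str, out_path: str, fragments: list[str]) -> dict[str, bool]:
    """Save raw HTML to `out_path` and report which link fragments occur in it.

    If a link is missing from the raw HTML, no HTML parser (BeautifulSoup
    included) can extract it; it must be added later, e.g. by JavaScript.
    """
    # Save the page so it can be opened in a browser and inspected by hand.
    Path(out_path).write_text(html, encoding="utf-8")
    # A plain substring search is enough: we only ask whether the text is there at all.
    return {fragment: fragment in html for fragment in fragments}
```

For example, pass it `requests.get("https://www.scotts.com/en-us/library/lawn-food").text`, a file name such as `lawn-food.html`, and `["/en-us/library/lawn-food/when-feed-greener-lawn"]`; a `False` result means the link is not in the static page.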

【Discussion】:

【Solution 2】:

I suggest using Selenium, which can scroll down the page so that content loaded on scroll gets rendered.

More information here: https://stackoverflow.com/a/27760083/8623540
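A minimal sketch of the common "scroll until the page stops growing" loop, assuming a Selenium WebDriver is passed in (only its `execute_script` method is used, so any object with that method works). The function name `scroll_to_bottom` and the `pause`/`max_rounds` defaults are hypothetical choices, not taken from the linked answer; whether this particular site loads more links on scroll is also an assumption.

```python
import time


def scroll_to_bottom(driver, pause: float = 2.0, max_rounds: int = 20) -> int:
    """Scroll until document height stops changing or `max_rounds` is reached.

    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the bottom so lazy-loading JavaScript fetches more content.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the page time to load the new content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return rounds
```

In practice you would create a driver with `webdriver.Chrome()`, call `driver.get(url)`, run `scroll_to_bottom(driver)`, and then feed `driver.page_source` to the BeautifulSoup code from the question instead of `r.text`.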

【Discussion】: