Can't find a specific link using Beautiful Soup in Python: answers

【Question title】: Can't find a specific link using Beautiful Soup in Python
【Posted】: 2017-04-21 00:34:06
【Question】:

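I can't extract specific links from a web page using BeautifulSoup. The page is http://punchdrink.com/recipe-archives/?filter-spirit__term=Gin

When I inspect the source I can see the links I want to scrape, specifically the recipe links (for example: http://punchdrink.com/recipes/breakfast-martini/), but when I use BeautifulSoup these links do not appear in the HTML at all.

Here is my code for fetching the HTML: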

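import bs4
import requests

def drinkScraper(url, searchTerm):
  res = requests.get(url)
  res.raise_for_status()
  soup = bs4.BeautifulSoup(res.text, "html.parser")  # explicit parser avoids the bs4 guessing warning
  return soup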

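Printing soup gives HTML that does not contain any of the recipe links on that page.

I am trying to scrape the links to every recipe in the site's archive, but I can't seem to figure this out.

Thanks for your help.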

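【Comments】:

Because it is a dynamic site, you have to inspect the Ajax requests to get the URLs.
@amigcamel Thanks! I ended up using selenium to find the links. I'll keep your suggestion in mind for the future, though.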

Tags: python html beautifulsoup

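【Solution 1】:

Although you can use selenium, as mentioned above, you can also learn a lot by watching the XHR requests and simulating them with requests. If you open Firebug or the Chrome developer tools, you will notice that searching for a term makes the page call an API (via XHR) that returns the results as JSON. You can simply send the same request with your own parameters and parse the result.

Like this: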

import requests

jsonRequestData = '{"requests":[{"indexName":"wp_posts_recipe","params":"query=&hitsPerPage=1000&maxValuesPerFacet=100&page=0&distinct=false&facetingAfterDistinct=true&filters=record_index%3D0&facets=%5B%22spirit%22%2C%22style%22%2C%22season%22%2C%22flavor_profile%22%2C%22family%22%5D&tagFilters=&facetFilters=%5B%22spirit%3AGin%22%5D"}]}'
headers = {'Content-type': 'application/x-www-form-urlencoded', 'Accept': 'application/json'}

response = requests.post('http://h0iee3ergc-2.algolianet.com/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%20(lite)%203.21.1%3Binstantsearch.js%201.11.6%3BJS%20Helper%202.19.0&x-algolia-application-id=H0IEE3ERGC&x-algolia-api-key=9a128c4989675ec375c59a2de9ef3fc1', headers=headers, data=jsonRequestData)

for hit in response.json()["results"][0]["hits"]:
    print ("%s (%s)" % (hit["post_title"], hit["permalink"]))

Here jsonRequestData is the POST body, in which you can change the search term, and headers are the headers you want to send.
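Editing the search term inside the long URL-encoded string by hand is error-prone, so the body can also be assembled from its parts. The following is only a sketch: the helper name build_request_data and its arguments are made up for illustration, the parameter names and values are copied from the jsonRequestData string above, and "Rum" is just a placeholder that is not confirmed to be a value the site accepts.

```python
import json
from urllib.parse import urlencode, parse_qs


def build_request_data(spirit, hits_per_page=1000, page=0):
    """Return a JSON body shaped like jsonRequestData for the given spirit.

    Hypothetical helper: the keys and values mirror the string in the answer.
    """
    facets = ["spirit", "style", "season", "flavor_profile", "family"]
    params = urlencode({
        "query": "",
        "hitsPerPage": hits_per_page,
        "maxValuesPerFacet": 100,
        "page": page,
        "distinct": "false",
        "facetingAfterDistinct": "true",
        "filters": "record_index=0",
        # Compact separators give the ["a","b"] form used in the original string.
        "facets": json.dumps(facets, separators=(",", ":")),
        "tagFilters": "",
        "facetFilters": json.dumps(["spirit:%s" % spirit], separators=(",", ":")),
    })
    return json.dumps({"requests": [{"indexName": "wp_posts_recipe", "params": params}]})


# "Rum" is an illustrative value only.
body = build_request_data("Rum")
decoded = parse_qs(json.loads(body)["requests"][0]["params"])
print(decoded["facetFilters"][0])
```

The returned string can be passed as data= to the requests.post call in place of the hard-coded jsonRequestData.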

It outputs:

State Street Bloody Mary (http://punchdrink.com/recipes/state-street-bloody-mary/)
I'm Ya Huckleberry (http://punchdrink.com/recipes/im-ya-huckleberry/)
Girl From Cadiz (http://punchdrink.com/recipes/girl-from-cadiz/)
Breakfast Martini (http://punchdrink.com/recipes/breakfast-martini/)
Juniperotivo (http://punchdrink.com/recipes/juniperotivo/)
....
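Since the goal in the question is to collect the links rather than print them, the loop can be wrapped in a small function. This is a sketch under assumptions: extract_recipe_links is a made-up name, and the sample dictionary is hand-written to match the shape the loop above reads from response.json() (results, hits, post_title, permalink), using two entries from the output above.

```python
def extract_recipe_links(payload):
    """Return (title, permalink) pairs from every result set in the decoded JSON."""
    links = []
    for result in payload.get("results", []):
        for hit in result.get("hits", []):
            links.append((hit["post_title"], hit["permalink"]))
    return links


# Hand-written sample standing in for response.json(); not a real API response.
sample = {"results": [{"hits": [
    {"post_title": "Breakfast Martini",
     "permalink": "http://punchdrink.com/recipes/breakfast-martini/"},
    {"post_title": "Juniperotivo",
     "permalink": "http://punchdrink.com/recipes/juniperotivo/"},
]}]}

for title, url in extract_recipe_links(sample):
    print("%s (%s)" % (title, url))
```

With a live response, extract_recipe_links(response.json()) would take the place of the sample.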

【Comments】: