BeautifulSoup isn't getting the full image address: answers

【Question title】: BeautifulSoup isn't getting the full image address
【Posted】: 2021-07-03 06:39:04
【Question】:

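I'm using BeautifulSoup to scrape images from a website, but my code doesn't return the full image address that I can see when inspecting the page.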

for b in soup.select(".thumb_div.clear a"):
    imagelink = a["href"].replace("/mushrooms/", "http://www.foragingguide.com/mushrooms/")
    print(imagelink)

It should return http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg, because the page source is:


<a href="http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg" rel="lightbox[photos]" title="Amethyst Deceiver (Laccaria amethystina)">

Instead it returns only http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/, without the .jpg file name at the end, which I need for this to work.

Does anyone know why this happens? Thanks.

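【Comments】:

Why do you need the replace? Isn't the link already absolute?
It doesn't return an absolute link, only a relative path, which is why I added the replace.

Tags: python html web web-scraping beautifulsoup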

【Solution 1】:

You don't actually need the replace; just select the image links directly.

For example:

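import requests
from bs4 import BeautifulSoup


end_point = "http://www.foragingguide.com/mushrooms/sp/amethyst_deceiver"
response = requests.get(end_point).text
# Select every <a> inside a .thumb_div element (the "lxml" parser must be installed).
links = BeautifulSoup(response, "lxml").select(".thumb_div a")
# Print each href exactly as it appears in the page source, one per line.
print("\n".join(i["href"] for i in links))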

Output:

http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/88.jpg
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/90.jpg
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/91.jpg
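
If a page does mix relative and absolute links, a sketch like the one below may be sturdier than a string replace. It relies on `urllib.parse.urljoin` from the standard library, which leaves absolute URLs untouched and resolves relative ones against a base. The helper name `absolutize` and the relative path in the sample list are made up for illustration; they do not come from the site.

```python
from urllib.parse import urljoin

# Page URL taken from the answer above; used as the base for relative links.
BASE_URL = "http://www.foragingguide.com/mushrooms/sp/amethyst_deceiver"


def absolutize(href, base=BASE_URL):
    """Return href as an absolute URL (hypothetical helper)."""
    # Absolute hrefs pass through unchanged; relative ones are resolved against base.
    return urljoin(base, href)


hrefs = [
    "http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg",
    "/mushrooms/sp/fly_agaric",  # made-up relative path, for illustration only
]
for href in hrefs:
    print(absolutize(href))
```

With this approach the same loop handles both kinds of link, so there is no need to know in advance which form the site uses.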

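【Comments】:

Nice solution, but I'm trying to get the paths of the regular images rather than the thumbnails. Can I follow the same process to get the regular images linked in the a tags?

【Solution 2】:

A simple solution: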

for b in soup.select(".thumb_div a"):
    imagelink = b["href"]
    print(imagelink)

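It turns out the "a" in a["href"] had nothing to do with the a tag in the selector. I was using "a" as if it were the loop variable, but the loop variable is "b", so "a" never referred to the element being iterated. Changing the code to b["href"] works.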
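
The mix-up can be reproduced offline. The sketch below parses a small inline snippet modelled on the markup quoted in the question and compares the two loop bodies. The second link and the leftover variable `a` are assumptions added for illustration; the thread does not say what `a` actually held in the original script.

```python
from bs4 import BeautifulSoup

# Inline HTML modelled on the question's markup; the second link is assumed.
html = """
<div class="thumb_div clear">
  <a href="http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg" rel="lightbox[photos]">one</a>
  <a href="http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/88.jpg" rel="lightbox[photos]">two</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Assumption: a variable named `a` is left over from earlier code.
a = {"href": "/stale/value/"}

buggy, fixed = [], []
for b in soup.select(".thumb_div a"):
    buggy.append(a["href"])  # bug: reads the leftover `a`, not the loop variable
    fixed.append(b["href"])  # fix: reads the tag matched in this iteration

print(buggy)
print(fixed)
```

The buggy list repeats the same leftover value for every match, while the fixed list holds the href of each matched tag.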

【Comments】: