Python get anchor text links and href values but ignore image links: answers

【Question title】: Python get anchor text links and href values but ignore image links
【Posted】: 2021-03-11 22:53:44
【Question description】:

I have the following Python code to scrape anchor text links and their corresponding href values from a page path:

from bs4 import BeautifulSoup
import requests

url = "https://www.mydomain.co.uk/contact-us"

# Fetch the page and parse it (the "lxml" parser requires the lxml package)
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")

# Print the anchor text and href of every <a> tag on the page
for link in soup.find_all('a'):
    print(link.text, '-', link.get('href'))

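It works fine, but it also picks up image links. When the link is an image, the anchor text is empty, so only "-" and the href are printed. For example: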

Contact Us - /contact-us
About Us - /about
- /locations

I would like it to ignore any image href links, so that the output is:

Contact Us - /contact-us
About Us - /about

Is this possible?

Thanks

【Question discussion】:

Tags: python web-scraping beautifulsoup

【Solution 1】:

# Skip any <a> tag that contains an <img> element
for link in soup.find_all('a'):
    if not link.find('img'):
        print(link.text, '-', link.get('href'))
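The same filter can probably also be written as a CSS selector. This is a minimal sketch, assuming a BeautifulSoup version whose `select()` is backed by soupsieve (which supports `:not()` and `:has()`). The sample markup is hypothetical and stands in for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original page.
sample_html = (
    '<a href="/contact-us">Contact Us</a>'
    '<a href="/about">About Us</a>'
    '<a href="/locations"><img src="map.png"></a>'
)

soup = BeautifulSoup(sample_html, "html.parser")

# Select every <a> that has no <img> descendant.
links = [
    (a.get_text(strip=True), a.get("href"))
    for a in soup.select("a:not(:has(img))")
]

for text, href in links:
    print(text, "-", href)
```

Here `html.parser` is used only so the sketch runs without lxml; the selector itself does not depend on the parser.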

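【Discussion】:

Thanks. It sort of works, but it still picks up some image links. Is there a way to set it up so that a link is ignored if it contains src="xxx"?
OK, I've updated the answer. This time it only prints when the a tag does not contain an img tag.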
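If some image links still slip through, one possible cause (an assumption, since the page itself is not shown) is that the image is not an `img` child at all, for example an inline `svg` or a CSS background. A stricter sketch skips any anchor that contains an `img` or has no visible text. The helper name `text_links` and the sample markup are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original page.
SAMPLE_HTML = """
<a href="/contact-us">Contact Us</a>
<a href="/about">About Us</a>
<a href="/locations"><img src="map.png" alt="Map"></a>
<a href="/home"><svg></svg></a>
<a href="/news"><img src="news.png"> News</a>
"""


def text_links(html):
    """Return (text, href) pairs for anchors that look like plain text links."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # href=True skips anchors that have no href attribute at all.
    for link in soup.find_all("a", href=True):
        text = link.get_text(strip=True)
        # Skip anchors wrapping an <img>, and anchors with no visible text.
        if link.find("img") or not text:
            continue
        results.append((text, link["href"]))
    return results


for text, href in text_links(SAMPLE_HTML):
    print(text, "-", href)
```

Whether an anchor that mixes an image with text (like the `/news` example) should be kept is a judgment call. This sketch drops it, which matches the behavior of the answer above.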