Get NextSibling in BeautifulSoup in Python

【Question title】: Get NextSibling in BeautifulSoup in Python
【Posted】: 2021-04-13 22:32:22
【Question】:

I am trying to get links from a web page. I have managed to select the image that sits next to each link I need, but when I try next_sibling on it I get None. Here is my attempt:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}

response = requests.get('http://index-of.es/Python/', headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
#print(soup.select('a img'))
for item in soup.select("a img"):
    print(item.next_sibling)

The code works if I use print(item), but it fails as soon as I try to get the next sibling. Any ideas?

【Comments】:

Tags: python beautifulsoup python-requests

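【Solution 1】:

You are navigating in the wrong direction. The link belongs to the img's parent node, not to a sibling of the img. I would also use a more selective CSS selector to pick out the right img nodes: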

for item in soup.select("[alt='[   ]']"):
    print('http://index-of.es/Python/' + item.parent['href'])
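To see why next_sibling comes back as None, here is a minimal sketch on a hand-written fragment. The markup is an assumption made up for illustration (the real listing may be structured differently), and html.parser is used only so the snippet runs without lxml installed:

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating one row of a directory listing;
# the real page's structure may differ.
html = """
<table>
<tr>
  <td><a href="book.pdf"><img src="/icons/unknown.gif" alt="[   ]"></a></td>
  <td><a href="book.pdf">book.pdf</a></td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
img = soup.select_one("a img")

# The img is the only child of its <a>, so nothing follows it at that level.
sibling = img.next_sibling

# The href lives on the enclosing <a>, one level up.
href = img.parent["href"]

print(sibling)
print(href)
```

Under that assumed structure, next_sibling only looks at nodes that share the img's parent, which is why going up to the parent is the step that reaches the link.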

Of course, if you do not care about the img itself, you can use :has (bs4 4.7.1+) to select each parent a that has a child with a specific alt value:

print(['http://index-of.es/Python/' + i['href'] for i in soup.select("a:has([alt='[   ]'])")])
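As a variation on the same idea, the absolute URLs could be built with urllib.parse.urljoin instead of string concatenation. This is only a sketch: the HTML fragment, the file names, and the BASE_URL constant are illustrative assumptions, not taken from the real page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed base URL, copied from the question's request.
BASE_URL = "http://index-of.es/Python/"

# Hypothetical listing fragment; file names are invented for the example.
html = """
<a href="/"><img src="/icons/back.gif" alt="[PARENTDIR]"></a>
<a href="intro.pdf"><img src="/icons/unknown.gif" alt="[   ]"></a>
<a href="intro.pdf">intro.pdf</a>
<a href="notes.pdf"><img src="/icons/unknown.gif" alt="[   ]"></a>
<a href="notes.pdf">notes.pdf</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only the <a> elements that wrap an img with the file-icon alt text.
links = [
    urljoin(BASE_URL, a["href"])
    for a in soup.select("a:has([alt='[   ]'])")
]

for link in links:
    print(link)
```

urljoin also resolves root-relative and already-absolute href values correctly, which plain concatenation does not.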

【Comments】:

Could you take a look at this question: stackoverflow.com/questions/65628561?

【Solution 2】:

I searched a lot before I finally figured it out:

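for item in soup.select("a img"):
    # find_next returns the next <a> in document order, or None if there is none
    link = item.find_next('a')
    href = link.get('href') if link is not None else None
    # Skip missing hrefs and root-relative links such as the parent directory
    if href and not href.startswith('/'):
        print('http://index-of.es/Python/' + href)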
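The loop above works because find_next walks forward through the whole document, while next_sibling stays inside the current parent. A small sketch of that difference, on invented markup that is assumed only for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the first icon is followed by a text link,
# the second icon is the last link in the document.
html = """
<p><a href="a.pdf"><img alt="[   ]"></a> <a href="a.pdf">a.pdf</a></p>
<p><a href="b.pdf"><img alt="[   ]"></a></p>
"""

soup = BeautifulSoup(html, "html.parser")
first_img, last_img = soup.select("a img")

# find_next leaves the enclosing <a> and reaches the text link after it.
next_link = first_img.find_next("a")

# No <a> follows the last icon, so find_next returns None here.
no_link = last_img.find_next("a")

print(next_link["href"], next_link.get_text())
print(no_link)
```

The None case is why the result needs a check before it is subscripted with ['href'].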

【Comments】: