Answers: Finding independent tags right next to each other in Python scraping

【Question title】: Finding independent tags right next to each other in Python scraping
【Posted】: 2021-02-12 23:37:07
【Question description】:

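I am trying to scrape a Wikipedia page and have run into a simple problem that I cannot find a solution to. There are two tags, th and td, right next to each other, and they are independent of one another (neither is nested inside the other). I want to get the text of one tag based on the value of the other.

Here is an example: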

<th scope="row" style="white-space:nowrap;padding-right:0.65em;">Budget</th>
<td style="line-height:1.3em;">$200 million<sup id="cite_ref-3" class="reference"><a href="#cite_note-3">[3]</a></sup></td>

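If the text of the "th" tag is "Budget", I want to get the value of the "td" tag (i.e. $200 million). Keep in mind that the only relationship between the two is that they are right next to each other.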

【Question comments】:

Posting the URL would help.

Tags: python html python-3.x

【Solution 1】:

from bs4 import BeautifulSoup

html = '''<th scope="row" style="white-space:nowrap;padding-right:0.65em;">Budget</th>
<td style="line-height:1.3em;">$200 million<sup id="cite_ref-3" class="reference"><a href="#cite_note-3">[3]</a></sup></td>'''

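# The 'lxml' parser requires the lxml package to be installed.
soup = BeautifulSoup(html, 'lxml')
# Find the first <td> whose parent element's text contains 'Budget'.
td_text = soup.find(lambda tag: tag.name == 'td' and 'Budget' in tag.parent.text).text
print(td_text)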

Output:

$200 million[3]
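
The lookup above matches on the parent's text, so on a full page it may pick up a different td that merely shares a parent with the word "Budget", and it keeps the "[3]" citation marker. A minimal alternative sketch, assuming the th text is exactly "Budget" and the td is its next sibling element: find the th first, then step to the adjacent td. The helper name value_next_to is hypothetical, and the built-in 'html.parser' is used here only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

html = '''<th scope="row" style="white-space:nowrap;padding-right:0.65em;">Budget</th>
<td style="line-height:1.3em;">$200 million<sup id="cite_ref-3" class="reference"><a href="#cite_note-3">[3]</a></sup></td>'''


def value_next_to(soup, label):
    """Return the text of the <td> that follows the <th> whose text is `label`.

    Hypothetical helper for illustration; returns None if nothing matches.
    """
    # Match a <th> whose only content is exactly the label string.
    th = soup.find('th', string=label)
    if th is None:
        return None
    # Skip whitespace text nodes and take the next sibling <td> element.
    td = th.find_next_sibling('td')
    if td is None:
        return None
    # Remove citation markers such as [3]; note this modifies the soup in place.
    for sup in td.find_all('sup', class_='reference'):
        sup.decompose()
    return td.get_text(strip=True)


soup = BeautifulSoup(html, 'html.parser')
budget = value_next_to(soup, 'Budget')
print(budget)
```

If the th text on the real page carries extra whitespace or nested markup, the exact string match will not find it, and a looser test on th.get_text(strip=True) would be needed instead.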

【Comments】: