How to scrape a specific HTML line that follows another HTML line

[Question title]: How to scrape a specific HTML line that follows another HTML line
[Posted]: 2019-11-07 08:25:27
[Question description]:

I want to scrape some data from an HTML page that looks like this:

<tr>
 <td> Some information </td>
 <td> 123 </td>
</tr>
<tr>
 <td> some other information </td>
 <td> 456 </td>
</tr>
<tr>
 <td> and the info continues </td>
 <td> 789 </td>
</tr>

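What I want is to get the HTML line that follows a given HTML line. That is, if I see "some other information", I want to output "456". I thought about combining a regex with .find_next from BeautifulSoup, but I haven't had any luck with it (I'm also not very familiar with regex). Does anyone know how to do this? Thanks a lot in advance.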

[Comments]:

Tags: python regex beautifulsoup

[Solution 1]:

In fact, you can get what you want by combining a regex with find_next in BeautifulSoup:

from bs4 import BeautifulSoup
import re

html = """
<tr>
 <td> Some information </td>
 <td> 123 </td>
</tr>
<tr>
 <td> some other information </td>
 <td> 456 </td>
</tr>
<tr>
 <td> and the info continues </td>
 <td> 789 </td>
</tr>
"""

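# Name the parser explicitly; html.parser keeps the bare <tr>/<td> tags as written
soup = BeautifulSoup(html, 'html.parser')
# Find the <td> whose text matches the regex (string= is the current name of the text= argument)
x = soup.find('td', string = re.compile('some other information'))
# find_next('td') returns the next <td> in document order; strip the surrounding spaces
print(x.find_next('td').text.strip())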

Output:

456
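
If the label might not be on the page, soup.find returns None and the .find_next call raises an AttributeError. A minimal sketch of a guarded lookup, assuming the same table layout as in the question; value_after is a hypothetical helper name, not part of BeautifulSoup, and re.escape is used so the label is matched as literal text rather than as a regex pattern:

```python
import re
from bs4 import BeautifulSoup

HTML = """
<tr>
 <td> Some information </td>
 <td> 123 </td>
</tr>
<tr>
 <td> some other information </td>
 <td> 456 </td>
</tr>
<tr>
 <td> and the info continues </td>
 <td> 789 </td>
</tr>
"""


def value_after(soup, label):
    """Hypothetical helper: stripped text of the <td> after the one containing `label`, or None."""
    # re.escape makes the label literal, so characters like "." or "(" are safe
    cell = soup.find("td", string=re.compile(re.escape(label)))
    if cell is None:
        return None  # label not found on the page
    nxt = cell.find_next("td")
    return nxt.get_text(strip=True) if nxt is not None else None


soup = BeautifulSoup(HTML, "html.parser")
print(value_after(soup, "some other information"))
print(value_after(soup, "no such label"))
```

The match is case-sensitive, so "Some information" and "some other information" are looked up separately; passing re.IGNORECASE to re.compile would be one way to relax that.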

EDIT: replaced x.find_next('td').contents[0] with x.find_next('td').text, which is shorter.
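
The two spellings are not always interchangeable, so here is a small sketch of the difference. The second cell, with a nested <b> tag, is an assumption made up for illustration and does not appear in the question:

```python
from bs4 import BeautifulSoup

# Two cells: one with plain text, one (hypothetical) with nested markup
soup = BeautifulSoup("<tr><td> 456 </td><td><b>4</b>56</td></tr>", "html.parser")
plain, nested = soup.find_all("td")

# .contents[0] is the first child node, whatever its type
print(repr(plain.contents[0]))
print(repr(nested.contents[0]))

# .text joins all the text inside the tag, including nested tags
print(repr(plain.text))
print(repr(nested.text))
```

For a cell that holds only text, both give the same string. Once the cell contains other tags, .contents[0] may be a tag rather than text, while .text still returns the full text.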

[Comments]: