Find <td> tag value with specific text (Beautiful Soup): answers

【Question title】: Find <td> tag value with specific text (Beautiful Soup)
【Posted】: 2016-08-29 17:50:45
【Question description】:

I need to extract data from an HTML page where it is laid out in a table. The data is always structured like this:

<td class="def">
            <div><b>First Name:</b></div>
        </td>
        <td class="def">Jhon
        </td>

<td class="def">
            <div><b>Last Name:</b></div>
        </td>
        <td class="def">Smith
        </td>

I need to extract each value separately. For example:

print first_name
>> Jhon
print last_name
>> Smith

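A simple soup.find('td', {'class':'def'}) won't work, because that selector matches every cell (First Name:, Jhon, Last Name:, Smith).

Any idea how to find one specific value? The same question was posted here, but the solution given there doesn't work at all...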

【Comments】:

Look at the second answer at that link.

Tags: html regex beautifulsoup

【Solution 1】:

How about this:

>>> tds = soup.find_all('td', {'class':'def'})
>>> [td.find_next_sibling('td', {'class':'def'}).text.strip() \
...     for td in tds if "First Name:" in td.text]
... 
[u'Jhon']
>>> [td.find_next_sibling('td', {'class':'def'}).text.strip() \
...     for td in tds if "Last Name:" in td.text]
... 
[u'Smith']
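
The same idea can be wrapped in a small helper. This is only a sketch: the function name value_for_label is made up for illustration, the sample markup is the question's snippet wrapped in an assumed <table>/<tr> structure, and it uses Python 3 with the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the <table>/<tr> wrapper is an assumption.
HTML = """
<table><tr>
  <td class="def"><div><b>First Name:</b></div></td>
  <td class="def">Jhon
  </td>
</tr><tr>
  <td class="def"><div><b>Last Name:</b></div></td>
  <td class="def">Smith
  </td>
</tr></table>
"""

def value_for_label(soup, label):
    """Return the text of the cell right after the cell containing `label`, or None.

    Hypothetical helper, not part of Beautiful Soup.
    """
    for td in soup.find_all("td", class_="def"):
        if label in td.get_text():
            # find_next_sibling skips the whitespace text nodes between cells
            value_td = td.find_next_sibling("td", class_="def")
            if value_td is not None:
                return value_td.get_text(strip=True)
    return None

soup = BeautifulSoup(HTML, "html.parser")
first_name = value_for_label(soup, "First Name:")
last_name = value_for_label(soup, "Last Name:")
print(first_name)
print(last_name)
```

Returning None for a missing label avoids the AttributeError that the one-liner would raise if a label cell had no following sibling.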

【Discussion】:

【Solution 2】:

Try this:

First Name:.*?<td class="def">([^\n]+).*?Last Name:.*?<td class="def">([^\n]+)


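Explanation:
.: any character except a newline (unless dot-all mode is enabled)
*: zero or more times
?: once or not at all; directly after * (as in .*?) it makes the match lazy
( … ): capturing group
[^x]: one character that is not x
\: escapes a special character
+: one or more times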
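
For anyone who wants to try the pattern from Python, here is a minimal sketch using the standard re module. It assumes the markup looks exactly like the question's snippet (each value followed by a line break), and it passes re.DOTALL because the label and its value cell sit on different lines, so . has to be able to cross newlines.

```python
import re

# The question's snippet, reproduced as-is.
HTML = """<td class="def">
            <div><b>First Name:</b></div>
        </td>
        <td class="def">Jhon
        </td>

<td class="def">
            <div><b>Last Name:</b></div>
        </td>
        <td class="def">Smith
        </td>
"""

# re.DOTALL lets "." match newlines, so the lazy ".*?" can span several lines.
PATTERN = re.compile(
    r'First Name:.*?<td class="def">([^\n]+).*?Last Name:.*?<td class="def">([^\n]+)',
    re.DOTALL,
)

match = PATTERN.search(HTML)
# Each group captures up to the end of the line; strip() drops trailing whitespace.
first_name, last_name = (g.strip() for g in match.groups()) if match else (None, None)
print(first_name)
print(last_name)
```

Note that each group captures everything up to the next line break, so if a value and its closing </td> were on the same line, the tag would end up in the captured text. For markup that varies, the parser-based approach in Solution 1 is likely more robust.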

【Discussion】: