Python Beautiful Soup select text: answers

【Question title】: Python Beautiful Soup select text
【Posted】: 2014-03-03 11:18:31
【Question】:

Here is an example of the HTML I want to parse:

<html>
<body>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>

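I'm using Beautiful Soup to parse the HTML by selecting the style8 class, as follows (where html holds the result of my HTTP request):

from bs4 import BeautifulSoup

html = result.read()  # result: the response object of the HTTP request (defined elsewhere)
soup = BeautifulSoup(html, "html.parser")

content = soup.select('.style8')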

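In this example, content is a list of 4 tags. I want to check each tag's .text (the text of each style8 cell): if an item contains Example, append Example to a variable. If the loop goes through the whole list without finding Example, append Not Present to the variable instead.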
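Each tag's .text is the whole cell text, including the surrounding spaces and the other words, so comparing it to 'Example' with == can never match. A quick sketch, assuming a recent bs4 and Python's built-in "html.parser", that shows the raw and stripped text of the selected tags for the sample HTML above:

```python
from bs4 import BeautifulSoup

# Sample markup from the question, kept as-is (including the missing space before bgcolor)
html = """<html>
<body>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>"""

soup = BeautifulSoup(html, "html.parser")
content = soup.select('.style8')  # a list of 4 Tag objects

# .text returns the whole cell, leading/trailing spaces included
raw = [tag.text for tag in content]
# get_text(strip=True) trims that whitespace but keeps the other words
stripped = [tag.get_text(strip=True) for tag in content]

print(repr(raw[0]))         # ' Example BLAB BLAB BLAB '
print(stripped[0])          # Example BLAB BLAB BLAB
print(raw[0] == 'Example')  # False: an exact comparison never matches this cell
```

A substring test such as 'Example' in tag.text is what actually detects the word here.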

So far I have the following:

foo = []

for i, tag in enumerate(content):
    if content[i].text == 'Example':
        foo.append('Example')
        break
    else:
        continue

This only appends Example to foo when it is present; if Example does not appear anywhere in the list, Not Present is never appended.
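One way to get the Not Present case without a separate flag is Python's for ... else: the else clause runs only when the loop finishes without hitting break. A minimal sketch, assuming content is a list of tags such as the one returned by soup.select('.style8'); check_example is a hypothetical helper name, and the HTML is a shortened stand-in for the question's markup:

```python
from bs4 import BeautifulSoup


def check_example(content):
    """Hypothetical helper: ['Example'] if any tag's text contains 'Example', else ['Not Present']."""
    foo = []
    for tag in content:
        if 'Example' in tag.text:  # substring test, not equality
            foo.append('Example')
            break
    else:
        # Runs only if the loop completed without break, i.e. no tag matched
        foo.append('Not Present')
    return foo


# Shortened versions of the question's markup (inline styles omitted)
with_match = BeautifulSoup(
    '<td class="style8"> Example BLAB </td><td class="style8"> BLAB </td>',
    "html.parser")
without_match = BeautifulSoup(
    '<td class="style8"> BLAB </td><td class="style8"> BLAB </td>',
    "html.parser")

print(check_example(with_match.select('.style8')))     # ['Example']
print(check_example(without_match.select('.style8')))  # ['Not Present']
```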

Any way of doing this is appreciated, and a better way to search the whole result for a string would be great.
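For a plain yes/no answer, the loop can be replaced by any(), or the search can cover the whole document. A sketch of three options, assuming a shortened version of the question's HTML; note that options 2 and 3 also match text outside the style8 cells:

```python
import re

from bs4 import BeautifulSoup

# Shortened version of the question's markup (inline styles omitted)
html = """<html>
<body>
<td class="style8"> Example BLAB BLAB BLAB </td>
<td class="style8"> BLAB BLAB BLAB </td>
</body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: only look inside the style8 cells; any() stops at the first match
in_cells = any('Example' in tag.text for tag in soup.select('.style8'))

# Option 2: search every text node with a regular expression;
# find() returns the first matching string, or None if nothing matches
match = soup.find(string=re.compile('Example'))

# Option 3: crude whole-document check on the concatenated text
in_document = 'Example' in soup.get_text()

foo = ['Example' if in_cells else 'Not Present']
print(foo)  # ['Example']
```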

Tags: python html-parsing beautifulsoup

【Answer 1】:

You can use find_all() to find all td elements with class='style8' and build the foo list with a list comprehension:

from bs4 import BeautifulSoup


html = """<html>
<body>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

foo = ["Example" if "Example" in node.text else "Not Present"
       for node in soup.find_all('td', {'class': 'style8'})]
print(foo)

Prints:

['Example', 'Not Present', 'Not Present', 'Not Present']
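This labels every cell separately. If you want the single overall result the question describes, the per-cell list can be collapsed afterwards. A sketch assuming a shortened version of the markup; it also uses bs4's class_ keyword, which is equivalent to passing {'class': 'style8'}:

```python
from bs4 import BeautifulSoup

# Shortened version of the question's markup (inline styles omitted)
html = """<td class="style8"> Example BLAB BLAB BLAB </td>
<td class="style8"> BLAB BLAB BLAB </td>
<td class="style8"> BLAB BLAB BLAB </td>
<td class="style8"> BLAB BLAB BLAB </td>"""

soup = BeautifulSoup(html, "html.parser")

# class_ has a trailing underscore because class is a Python keyword
cells = soup.find_all('td', class_='style8')

# Per-cell labels, as in the answer above
foo = ["Example" if "Example" in node.text else "Not Present" for node in cells]
print(foo)  # ['Example', 'Not Present', 'Not Present', 'Not Present']

# One overall label for the whole list
summary = "Example" if "Example" in foo else "Not Present"
print(summary)  # Example
```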

【Answer 2】:

If you only want to check whether it was found, you can use a simple boolean flag, like this:

foo = []
found = False
for tag in content:
    # .text holds the whole cell (spaces and other words), so test for a substring
    if 'Example' in tag.text:
        found = True
        foo.append('Example')
        break
# The flag tells us whether the loop ever matched
if not found:
    foo.append('Not Present')

If I've understood what you want, this is a simple way to do it, although alecxe's solution looks great.
