Extracting data directly from HTML with BeautifulSoup

【Question title】: Extracting data directly from HTML with BeautifulSoup
【Posted】: 2015-09-29 17:21:42
【Question】:

I have the following HTML data. I need to use BeautifulSoup4 to get the "2" from it:

<td rowspan="2" style="text-align: center; vertical-align: middle;">
    <small>3</small>
</td>

I tried:

k.find('rowspan')['style']

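This raises the following exception:

Traceback (most recent call last):
  File "", line 1
TypeError: list indices must be integers, not str

Is it possible to do this with BS4? Or should I use a different library to extract the CSS directly?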

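【Question comments】:

Please let me know if I changed the meaning of your question when I edited its first part. If what you are interested in is the value of the rowspan attribute, why are you using k.find('rowspan')['style']? What is k? And what does this have to do with CSS?
k is my HTML code parsed into a soup. That was the first idea that came to mind. I am just starting my adventure with Python and bs4. So what should I use to get the rowspan attribute?
Are you absolutely sure you really used the find method and not the findall method? Because the find method never returns lists, only None if it doesn't find any matching tag.

Tags: python html css web-scraping beautifulsoup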

【Solution 1】:

Try this:

from bs4 import BeautifulSoup
soup = BeautifulSoup('<td rowspan="2" style="text-align: center; vertical-align: middle;"><small>3</small></td>', 'html.parser')
print(soup.td['rowspan'])
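
As a minimal sketch of the same idea written more defensively: soup.td is shorthand for soup.find('td'), and a tag's .get() method returns None instead of raising KeyError when an attribute is absent. The colspan lookup and the fallback value of 1 are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = '<td rowspan="2" style="text-align: center; vertical-align: middle;"><small>3</small></td>'
soup = BeautifulSoup(html, 'html.parser')

td = soup.find('td')             # the same tag that soup.td returns
rowspan = td.get('rowspan')      # attribute values are strings
missing = td.get('colspan')      # None, because this tag has no colspan attribute

# Convert to an integer only when the attribute is present (assumed default: 1).
rowspan_int = int(rowspan) if rowspan is not None else 1
print(rowspan, missing, rowspan_int)
```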

【Comments】:

【Solution 2】:

Why are you using find("rowspan")? You are not searching for a <rowspan> tag.

When passed a single string argument, the find method searches for tags by tag name.
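
The behaviour described above can be checked with a short sketch, assuming k is a soup built from the snippet in the question. It also shows why a "list indices" error points towards a list-like result such as the one find_all returns, rather than the single tag (or None) that find returns.

```python
from bs4 import BeautifulSoup

html = '<td rowspan="2" style="text-align: center; vertical-align: middle;"><small>3</small></td>'
k = BeautifulSoup(html, 'html.parser')

no_tag = k.find('rowspan')    # None: the document contains no <rowspan> tag
td = k.find('td')             # a single Tag object
matches = k.find_all('td')    # a list-like collection of tags

# Indexing the list-like result with a string fails with a TypeError.
try:
    matches['style']
    error = None
except TypeError as exc:
    error = exc

print(no_tag, td.name, len(matches), type(error).__name__)
```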

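What you should use is something like the following, which means "find the first <td> tag whose rowspan attribute has the value "2", and return the value of its style attribute":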

k.find('td', rowspan="2")['style']

See the "Kinds of filters" section of the documentation for the various ways to specify which tags to search for.
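
A rough illustration of a few such filters, assuming a slightly extended version of the HTML from the question (the surrounding table and the second cell are hypothetical additions so that the filters have something to exclude). Each call below selects the same first <td>.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample: the question's cell plus a plain second cell.
html = ('<table><tr>'
        '<td rowspan="2" style="text-align: center;"><small>3</small></td>'
        '<td>x</td>'
        '</tr></table>')
k = BeautifulSoup(html, 'html.parser')

by_value = k.find('td', rowspan='2')                  # exact attribute value
by_presence = k.find('td', rowspan=True)              # attribute present, any value
by_dict = k.find('td', attrs={'rowspan': '2'})        # same filter written as a dict
by_regex = k.find('td', style=re.compile('center'))   # regex searched in the attribute value
by_func = k.find(lambda tag: tag.name == 'td' and tag.has_attr('rowspan'))  # custom function

print(by_value['rowspan'])
```

To read the rowspan value itself rather than the style, index the matched tag with 'rowspan', as in the last line.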

【Comments】: