How to use Beautiful Soup to get elements that contain/do not contain specific classes - Answers

[Question title]: How to use Beautiful Soup to get elements that contain/do not contain specific classes
[Posted]: 2022-01-12 10:09:06
[Question]:

I want to scrape a table and save it to Excel with a Python script. The response looks like this:

<body>
<table id="need">
<tr height="30" align="center">
<td>need</td>
<td id="td1">need</td>
<td id="td2" type="wholeLast">not need</td>
<td id="td3" type="whole">need</td>
...
</tr>
<tr height="30" align="center" cid="2" class="txt">
<td>not need</td>
<td id="td1">not need</td>
<td id="td2" type="wholeLast">not need</td>
<td id="td3" type="whole">not need</td>
...
</tr>
...
</table>
<table>
...
</table>
</body>

I need the contents of the <td> elements inside each <tr>, except for any <tr> with class="txt" and any <td> with type="wholeLast". In short, I need every "need" in the response above.

I tried this: trs = soup.find_all("tr", attrs={"height": "30", "align": "center"}). But I don't know how to exclude the <td> elements with type="wholeLast". Maybe I need a different approach.

Any suggestions are appreciated.
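For comparison, the find_all() approach from the question can be finished by filtering in plain Python: skip rows whose class list contains "txt", then skip cells whose type attribute is "wholeLast". This is a minimal sketch, assuming the html.parser backend and a trimmed copy of the sample response with the "..." placeholders removed; wanted_cells is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample response (the "..." placeholders are removed).
HTML = """
<table id="need">
<tr height="30" align="center">
<td>need</td>
<td id="td1">need</td>
<td id="td2" type="wholeLast">not need</td>
<td id="td3" type="whole">need</td>
</tr>
<tr height="30" align="center" cid="2" class="txt">
<td>not need</td>
<td id="td1">not need</td>
<td id="td2" type="wholeLast">not need</td>
<td id="td3" type="whole">not need</td>
</tr>
</table>
"""

def wanted_cells(html):
    """Hypothetical helper: collect cell texts, skipping class="txt" rows and type="wholeLast" cells."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for tr in soup.find_all("tr", attrs={"height": "30", "align": "center"}):
        # "class" is multi-valued in Beautiful Soup, so tr.get("class") is a list or None.
        if "txt" in (tr.get("class") or []):
            continue
        for td in tr.find_all("td"):
            # Plain attributes such as "type" come back as strings (or None if absent).
            if td.get("type") == "wholeLast":
                continue
            texts.append(td.get_text(strip=True))
    return texts

print(wanted_cells(HTML))  # ['need', 'need', 'need']
```

Note that the height/align filter alone matches both rows, which is why the explicit class check is still needed.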

[Question comments]:

Tags: python html beautifulsoup

[Answer 1]:

You can do this with a CSS selector and the :not() pseudo-class:

tds=soup.select('tr:not(.txt) td:not([type="wholeLast"])')
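A self-contained version of the selector above, run against a trimmed copy of the question's sample (the "..." placeholders removed, html.parser backend assumed). soup.select() passes the selector to soupsieve, which Beautiful Soup 4.7+ uses for CSS selector support, including :not() with attribute selectors.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample response from the question.
HTML = """
<table id="need">
<tr height="30" align="center">
<td>need</td>
<td id="td1">need</td>
<td id="td2" type="wholeLast">not need</td>
<td id="td3" type="whole">need</td>
</tr>
<tr height="30" align="center" cid="2" class="txt">
<td>not need</td>
<td id="td1">not need</td>
<td id="td2" type="wholeLast">not need</td>
<td id="td3" type="whole">not need</td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
# Cells in rows without class "txt", excluding cells whose type attribute is "wholeLast".
tds = soup.select('tr:not(.txt) td:not([type="wholeLast"])')
needed = [td.get_text(strip=True) for td in tds]
print(needed)  # ['need', 'need', 'need']
```

On the full page this selector also matches rows in the second, unnamed <table>; prefixing it with table#need would restrict it to the table the question is about.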

[Comments]:

Generally agree; I think you can drop the td inside the :not() pseudo-class: soup.select('tr:not(.txt) td:not([type="wholeLast"])')
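Since the goal is a spreadsheet, it may help to keep one list per table row instead of a flat list of cells. This is a sketch under assumptions: it scopes the selector to table#need, uses html.parser, and rows_for_excel is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample response from the question.
HTML = """
<table id="need">
<tr height="30" align="center">
<td>need</td>
<td id="td1">need</td>
<td id="td2" type="wholeLast">not need</td>
<td id="td3" type="whole">need</td>
</tr>
<tr height="30" align="center" cid="2" class="txt">
<td>not need</td>
<td id="td1">not need</td>
<td id="td2" type="wholeLast">not need</td>
<td id="td3" type="whole">not need</td>
</tr>
</table>
"""

def rows_for_excel(html):
    """Hypothetical helper: one list of cell texts per kept row of table#need."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Only rows of the table with id="need" that lack class "txt".
    for tr in soup.select("table#need tr:not(.txt)"):
        # Within each row, drop the cells with type="wholeLast".
        cells = tr.select('td:not([type="wholeLast"])')
        rows.append([td.get_text(strip=True) for td in cells])
    return rows

print(rows_for_excel(HTML))  # [['need', 'need', 'need']]
```

From there, something like pandas.DataFrame(rows).to_excel("out.xlsx", index=False, header=False) could write the rows to a workbook; that step assumes pandas and an Excel writer engine such as openpyxl are installed, and the file name is only an example.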