Beautifulsoup Match Empty Class Answer

【Question Title】: Beautifulsoup Match Empty Class
【Posted】: 2020-08-19 21:01:01
【Question】:

I'm scraping a table on a website, and I'm trying to return only the rows whose class is blank (rows 1 and 4):

<tr class>Row 1</tr>
<tr class="is-oos ">Row 2</tr>
<tr class="somethingelse">Row 3</tr>
<tr class>Row 4</tr>

(Note that the is-oos class has a trailing space at the end.)

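When I do soup.findAll('tr', class_=None), it matches all the rows. This is because, due to the trailing space, row 2 has the class ['is-oos', '']. Is there a simple way to match only rows 1 and 4?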
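To see what Beautiful Soup actually stores for each row, it can help to print the parsed class value. A minimal diagnostic sketch, assuming bs4 with the built-in "html.parser" (the `rows` name is just for illustration). Depending on the Beautiful Soup version, and possibly the parser, the trailing space in `"is-oos "` may or may not leave an extra `''` token, which would explain why the list for row 2 can differ between environments.

```python
from bs4 import BeautifulSoup

html_doc = """<tr class>Row 1</tr>
<tr class="is-oos ">Row 2</tr>
<tr class="somethingelse">Row 3</tr>
<tr class>Row 4</tr>"""

soup = BeautifulSoup(html_doc, "html.parser")
rows = soup.find_all("tr")

# "class" is a multi-valued attribute, so bs4 normally stores it as a list of tokens
for tr in rows:
    print(tr.get_text(), repr(tr.get("class")))
```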

【Question Discussion】:

Tags: python python-3.x web-scraping beautifulsoup

【Solution 1】:

Try class_="":

from bs4 import BeautifulSoup

html_doc = """<tr class>Row 1</tr>
<tr class="is-oos ">Row 2</tr>
<tr class="somethingelse">Row 3</tr>
<tr class>Row 4</tr>"""

soup = BeautifulSoup(html_doc, "html.parser")

print(*soup.find_all('tr', class_=""))

# Or to only get the text
print('\n'.join(t.text for t in soup.find_all('tr', class_="")))

Output:

<tr class="">Row 1</tr> <tr class="">Row 4</tr>
Row 1
Row 4
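`class_=""` works as long as every row you want to exclude has only non-empty class tokens. As the comments below point out, on some setups row 2 is parsed as `['is-oos', '']`, and that empty token also matches `""`. One way around this is to pass a function to `find_all` and ignore empty tokens yourself. A sketch, assuming bs4; `has_blank_class` is a hypothetical helper name, not part of the library:

```python
from bs4 import BeautifulSoup

html_doc = """<tr class>Row 1</tr>
<tr class="is-oos ">Row 2</tr>
<tr class="somethingelse">Row 3</tr>
<tr class>Row 4</tr>"""


def has_blank_class(tag):
    """Hypothetical helper: True for <tr> tags whose class attribute exists but holds no real class names."""
    if tag.name != "tr" or not tag.has_attr("class"):
        return False
    classes = tag["class"]
    # Normally a list; split it ourselves in case multi-valued parsing is turned off
    if isinstance(classes, str):
        classes = classes.split()
    # Empty tokens (such as a '' left behind by a trailing space) do not count as class names
    return not any(c.strip() for c in classes)


soup = BeautifulSoup(html_doc, "html.parser")
blank_rows = soup.find_all(has_blank_class)
print([tr.get_text() for tr in blank_rows])
```

Because the check looks at individual tokens instead of comparing against `""`, it should give the same result whether or not the parser keeps the empty token.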

Edit: To get only the in-stock rows, we can check each tag's attributes:

import requests
from bs4 import BeautifulSoup

URL = "https://gun.deals/search/apachesolr_search/736676037018"

soup = BeautifulSoup(requests.get(URL).text, "html.parser")

for tag in soup.find_all('tr'):
    # The breaker row separates in-stock rows from the out-of-stock rows after it
    if tag.attrs.get('class') == ['price-compare-table__oos-breaker', 'js-oos-breaker']:
        break
    print(tag.text.strip())
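The loop above depends on the in-stock rows coming before the breaker row, and on the breaker's class list being exactly `['price-compare-table__oos-breaker', 'js-oos-breaker']`; an extra `''` token would make that equality fail. A sketch that tests class membership instead keeps every row that carries neither `is-oos` nor `js-oos-breaker`. The sample HTML is made up for illustration (only the class names come from this thread), and the real gun.deals markup may differ; `class_tokens` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the class names are taken from the thread
sample_html = """<table>
<tr class><td>Store A</td></tr>
<tr class><td>Store B</td></tr>
<tr class="price-compare-table__oos-breaker js-oos-breaker"><td>Out of stock</td></tr>
<tr class="is-oos "><td>Store C</td></tr>
</table>"""


def class_tokens(tag):
    """Hypothetical helper: the tag's class names as a set, with empty tokens dropped."""
    classes = tag.get("class") or []
    if isinstance(classes, str):
        classes = classes.split()
    return {c for c in classes if c.strip()}


soup = BeautifulSoup(sample_html, "html.parser")
in_stock = [
    tr.get_text(strip=True)
    for tr in soup.find_all("tr")
    # Membership tests still work if the parser adds an empty '' token
    if class_tokens(tr).isdisjoint({"is-oos", "js-oos-breaker"})
]
print(in_stock)
```

To try it on the live page, `sample_html` would be replaced with `requests.get(URL).text` as in the edit above.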

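【Discussion】:

When I run BeautifulSoup on your html_doc, it says row 2 only has the class is-oos, but when I run it on response.text from the URL, it says row 2 has the classes ['is-oos', ''], so your code doesn't work because it also captures row 2.
@Bijan Can you share the URL?
gun.deals/search/apachesolr_search/736676037018 is one of them. I'm trying to match whether an item is in stock (a tr whose class is not is-oos).
Ah, I see. Breaking when it reaches js-oos-breaker is clever.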