Find a certain tag in HTML with web scraping

【Question title】: Find a certain tag in HTML with web scraping
【Posted】: 2022-10-30 03:50:07
【Question description】:

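I am web scraping different HTML pages in a for loop, and I need to find a specific tag on each page (I am using BeautifulSoup and the find_all method). However, the tag is not present on every page, so I need a simple way to check whether it exists. I tried to write this code to check whether the tag is missing, but it does not work.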

    ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [92], in <cell line: 5>()
     36 sal_play = salary.find_all('tr')[1:]
     37 print(sal_play)
---> 38 if sal_play.find_all('tr', class_='thead') is None :
     39     print('1')
     40 else:

AttributeError: 'list' object has no attribute 'find'

【Comments】:

goal = soup.select("tr.thead"); if goal: print(goal)
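The first find_all gives you a list, so you have to use a for-loop to run the second find_all on each element separately.
Thanks @furas, I finally understand the problem! As you said, I was calling find_all on the wrong element. In any case, I used the select method in my code because it is more readable (I am writing a thesis on web scraping). You saved my day!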

Tags: python html beautifulsoup

【Solution 1】:

As the error message says, you cannot run find directly on a list; you have to run it on each item.

If you only want to print '1' when none of them has a header row, use:

if not [s for s in sal_play if s.find('tr', class_='thead')]: 
    print('1')

Or, if you want to print '1' when some of them do not have a header row, use:

if [s for s in sal_play if s.find('tr', class_='thead') is None]: 
    print('1')
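
The two checks above can be tried on a small sample. This is only a sketch: the markup, the table ids and the variable names (tables, has_header, header_rows) are made up for illustration and are not from the question. It also shows one thing to watch for: find only searches the descendants of a tag, so if the items in your list are the tr rows themselves, you would test each row's own class instead.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tables, only the first has a <tr class="thead"> row.
html = """
<table id="a"><tr class="thead"><th>Player</th></tr><tr><td>Smith</td></tr></table>
<table id="b"><tr><td>Jones</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")  # list-like result, not a single Tag

# find() returns None when nothing matches, so test each item separately.
has_header = [t.find("tr", class_="thead") is not None for t in tables]

none_have_header = not any(has_header)  # same condition as the first check
some_lack_header = not all(has_header)  # same condition as the second check

# If the items are the rows themselves, find() would only look inside each row,
# so inspect the row's own class attribute instead.
rows = soup.find_all("tr")
header_rows = [r for r in rows if "thead" in r.get("class", [])]

print(has_header, none_have_header, some_lack_header, len(header_rows))
```

any and all give the same results as the list comprehensions in the answer and make it easier to see which condition is being tested.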

By the way, if the tag is not present, find_all returns an empty list ([]) and find returns None, so if ...find_all(....) is None: do x all but guarantees that x never happens......
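
A short sketch of those return values, using a made-up one-row table with no thead row (the variable names are illustrative only). It also includes the select call suggested in the comments, which returns an empty result in the same way as find_all:

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment that has no <tr class="thead"> row.
soup = BeautifulSoup("<table><tr><td>Smith</td></tr></table>", "html.parser")

missing_all = soup.find_all("tr", class_="thead")  # empty result, never None
missing_one = soup.find("tr", class_="thead")      # None
missing_css = soup.select("tr.thead")              # empty result as well

# An empty result and None are both falsy, so "if not ..." covers either case.
if not missing_all:
    print("no header row (find_all)")
if missing_one is None:
    print("no header row (find)")
if not missing_css:
    print("no header row (select)")
```

Testing truthiness with if not ... works for all three calls, whereas comparing a find_all result against None is always false.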
