BS4 findAll HTML tags with same part of tag name: answers

【Question title】: BS4 findAll html tags with same part of tag name
【Posted】: 2021-11-16 04:50:06
【Question description】:

I am using bs4 to get the HTML tags of a web page:

html = BeautifulSoup(requests.get(temp_cat_link).text, 'html.parser')
items = html.findAll('h4', {'class': 'item-title font-weight-normal '})  # the class string here ends with a trailing space

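But when I checked, it does not actually return all of the tags, because some of the class names have no space at the end. It only returns the tags with 'item-title font-weight-normal ' (with the trailing space). So I changed my code to this: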

html = BeautifulSoup(requests.get(temp_cat_link).text, 'html.parser')
items = html.findAll('h4', {'class': 'item-title font-weight-normal'})  # the class string here has no trailing space

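But this only returns the tags with 'item-title font-weight-normal' (no trailing space). My question is: how can I get all tags whose name shares the same string part, that is, both 'item-title font-weight-normal ' and 'item-title font-weight-normal', with a single html.findAll call?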

【Question comments】:

Tags: python web-scraping beautifulsoup

【Solution 1】:

You can use a regex to match the string with or without trailing whitespace:

import re
import requests
from bs4 import BeautifulSoup

html = BeautifulSoup(requests.get(temp_cat_link).text, 'html.parser')
# \s* allows zero or more whitespace characters after the class string
items = html.findAll('h4', {'class': re.compile(r'item-title font-weight-normal\s*')})
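
To try the regex approach without a network request, here is a minimal sketch that runs it against a small inline document. The sample markup, the SAMPLE_HTML name, and the h4 texts are assumptions made up for illustration; they are not taken from the page in the question. The sketch also shows a CSS selector as a possible alternative, which is not part of the answer above: it matches on the individual classes, so whitespace and class order should not matter.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the page from the question.
SAMPLE_HTML = """
<h4 class="item-title font-weight-normal ">First</h4>
<h4 class="item-title font-weight-normal">Second</h4>
<h4 class="other-title">Third</h4>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Regex approach from the answer: trailing whitespace is optional.
pattern = re.compile(r"item-title font-weight-normal\s*")
regex_items = soup.findAll("h4", {"class": pattern})

# Alternative (an assumption, not from the answer): select every h4 that
# carries both classes, regardless of spacing or order in the attribute.
css_items = soup.select("h4.item-title.font-weight-normal")

regex_titles = [h4.get_text() for h4 in regex_items]
css_titles = [h4.get_text() for h4 in css_items]
print(regex_titles)
print(css_titles)
```

Note that the compiled pattern is applied as a search, so it would also accept a longer class string that merely contains 'item-title font-weight-normal', and it depends on the two classes appearing in that order. The selector form only requires that both classes are present.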

【Comments】: