How to scrape text in the title attribute using Beautiful Soup? Answers

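[Question title]: How to scrape text in title attribute using beautiful soup?
[Posted]: 2020-04-18 11:46:21
[Question description]:

I am using Beautiful Soup to scrape this website, and I need the complete product names. When I use the h2 tag, I end up with names such as "NIVEA Soft Light Moisturizing Cream Berry Blossom Fragrance ...".

I don't want the trailing dots, just the full name. This is the code snippet I use to scrape the data: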

div_soup=data_soup.findAll('div',{'class':'product-list-box card desktop-cart'})

table_rows=[]
for div in div_soup:
   current_row=[]
   product_name=div.findAll('h2',{})
   product_price=div.findAll('span',{'class':'post-card__content-price-offer'})
   for idx,data in enumerate(product_name):
       current_row.append(data.text)
   for idx,data in enumerate(product_price):
       current_row.append(data.text)
   table_rows.append(current_row)

I don't know which tag is appropriate to use, or whether I should pass something in the dictionary.

URL of the website I am scraping: https://www.nykaa.com/skin/moisturizers/face-moisturizer-day-cream/c/8394?root=nav_3

Tags: python web-scraping data-science

[Solution 1]:

for idx,data in enumerate(product_name):
    if data.get('title') is not None:
        current_row.append(data['title'])

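should do what you want: it reads the full name from each h2 tag's title attribute instead of its truncated visible text.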
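As a minimal sketch of why this works, the snippet below parses a small, made-up fragment of HTML (the markup and product name are invented for illustration and are not taken from the actual site). It assumes, as the question suggests, that the visible h2 text is truncated while the title attribute carries the full name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the visible text ends in "..." but the
# title attribute holds the complete (invented) product name.
html = """
<div class="product-list-box card desktop-cart">
  <h2 title="Example Brand Light Moisturizing Day Cream 50ml">
    Example Brand Light Moisturizing ...
  </h2>
  <h2>Heading without a title attribute</h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

full_names = []
for h2 in soup.find_all("h2"):
    title = h2.get("title")  # returns None when the attribute is missing
    if title is not None:
        full_names.append(title)

print(full_names)
```

Using `.get("title")` rather than `h2["title"]` for the check avoids a `KeyError` on headings that have no title attribute.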

It would probably also be better to refactor your code to

product_name=div.find('h2', {'title': True}).get('title')

so that you only look for an h2 tag that has a title attribute, which avoids the for loop.
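One caveat with the one-liner: `find` returns `None` when no matching tag exists, so chaining `.get('title')` directly would raise an `AttributeError` for a product card without a titled h2. The sketch below shows one way to guard against that while building the rows. The HTML is invented for illustration, and the fallback to the visible text is an assumption about what you might want, not something the answer prescribes.

```python
from bs4 import BeautifulSoup

# Hypothetical product cards reusing the class names from the question.
html = """
<div class="product-list-box card desktop-cart">
  <h2 title="Example Day Cream With Berry Fragrance 50ml">Example Day Cream With ...</h2>
  <span class="post-card__content-price-offer">199</span>
</div>
<div class="product-list-box card desktop-cart">
  <h2>Untitled Product</h2>
  <span class="post-card__content-price-offer">99</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

table_rows = []
for div in soup.find_all("div", class_="product-list-box"):
    # Match only an h2 that actually has a title attribute.
    titled_h2 = div.find("h2", attrs={"title": True})
    if titled_h2 is not None:
        name = titled_h2["title"]
    else:
        # Assumed fallback: use the visible text if there is no title.
        plain_h2 = div.find("h2")
        name = plain_h2.get_text(strip=True) if plain_h2 else None

    price_tag = div.find("span", class_="post-card__content-price-offer")
    price = price_tag.get_text(strip=True) if price_tag else None

    table_rows.append([name, price])

print(table_rows)
```

Each row ends up as a `[name, price]` pair, matching the shape of `table_rows` in the question.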
