How to scrape descriptions from a website: answers

【Question title】: How to scrape a description from a website
【Posted】: 2021-09-14 11:37:47
【Question】:

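I'm trying to extract some information from this website: https://www.vogue.it/moda/gallery/met-gala-2021-red-carpet-look-ospiti-celeb

What I want to do is scrape all the descriptions under the photos, for example "Billie Eilish in Oscar de la Renta custom-made", "Timothée Chalamet in Haider Ackermann e Converse", and so on. I think the class of the description is ".gallery-slide-caption__dek-container", but nothing gets scraped. My code is: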

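import pprint

detail_looks = []
for look in list_looks:  # list_looks: the slide elements collected earlier (not shown)
    title = ""
    # The leading "." makes this a class selector; without it the selector would
    # look for a tag literally named <gallery-slide-caption__dek-container>.
    # (Selenium 4.3+ removed find_elements_by_*; there use find_elements(By.CSS_SELECTOR, ...).)
    captions = look.find_elements_by_css_selector(".gallery-slide-caption__dek-container")
    if len(captions) > 0:
        title = captions[0].text

    detail_looks.append({'title': title})

print(len(detail_looks))
pprint.pprint(detail_looks[0:5])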

But the output is empty: [{'title': ''}, {'title': ''}, {'title': ''}, {'title': ''}, {'title': ''}]

Can you help me? Thanks.

【Comments on the question】:

You need to look one level deeper. When I inspect the item in Chrome, its structure is div.gallery-slide-caption__dek-container > div.gallery-slide-caption__dek > div > p, so maybe something like look.find_elements_by_css_selector(".gallery-slide-caption__dek-container .gallery-slide-caption__dek")[0].text
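Both points (the missing "." and the nesting) can be checked offline. The sketch below uses a small, hypothetical HTML fragment modeled on the structure described in that comment (the real Vogue markup may differ) and BeautifulSoup's `select`, which takes CSS selectors just as Selenium's CSS-selector lookups do.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mirroring the structure from the comment:
# div.gallery-slide-caption__dek-container > div.gallery-slide-caption__dek > div > p
HTML = """
<div class="gallery-slide-caption__dek-container">
  <div class="gallery-slide-caption__dek">
    <div><p>Billie Eilish in Oscar de la Renta custom-made</p></div>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# No leading dot: a *type* (tag) selector looking for an element literally
# named <gallery-slide-caption__dek-container>, so nothing matches.
tag_matches = soup.select("gallery-slide-caption__dek-container")

# Leading dot: a *class* selector, which matches the container div.
class_matches = soup.select(".gallery-slide-caption__dek-container")

# Space-separated descendant selector: the caption div inside the container.
caption_matches = soup.select(
    ".gallery-slide-caption__dek-container .gallery-slide-caption__dek"
)

print(len(tag_matches))    # 0
print(len(class_matches))  # 1
print(caption_matches[0].get_text(strip=True))  # Billie Eilish in Oscar de la Renta custom-made
```

The same selector strings behave the same way when passed to Selenium, since the type/class distinction is standard CSS.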

Tags: web-scraping

【Answer 1】:

To get all the captions, you can select every tag with class="gallery-slide-caption__dek":

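import requests
from bs4 import BeautifulSoup

url = "https://www.vogue.it/moda/gallery/met-gala-2021-red-carpet-look-ospiti-celeb"
# Download the page and parse its HTML directly, without a browser
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Each caption is an element with class "gallery-slide-caption__dek"
for i, caption in enumerate(soup.select(".gallery-slide-caption__dek"), 1):
    print(i, caption.get_text(strip=True))  # strip=True trims surrounding whitespace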

Prints:

1 Billie Eilish in Oscar de la Renta custom-made
2 Timothée Chalamet in Haider Ackermann e Converse
3 Amanda Gorman in Vera Wang
4 Keke Palmer
5 Bee Carrozzini in Valentino Haute Couture

...and so on.
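If you want the same list-of-dicts shape the question was building, a minimal sketch is to wrap the parsing step in a function. `parse_looks` and the two-slide sample fragment are hypothetical, for illustration only; on the real page you would pass it `requests.get(url).content` as in the answer above.

```python
import pprint

from bs4 import BeautifulSoup


def parse_looks(html):
    """Return one {'title': caption} dict per .gallery-slide-caption__dek element."""
    soup = BeautifulSoup(html, "html.parser")
    # Class selectors match whole class tokens, so this does not also pick up
    # the surrounding "gallery-slide-caption__dek-container" div.
    return [
        {"title": caption.get_text(strip=True)}
        for caption in soup.select(".gallery-slide-caption__dek")
    ]


# Hypothetical two-slide fragment standing in for the downloaded page.
SAMPLE = """
<div class="gallery-slide-caption__dek"><div><p>Billie Eilish in Oscar de la Renta custom-made</p></div></div>
<div class="gallery-slide-caption__dek"><div><p>Timothée Chalamet in Haider Ackermann e Converse</p></div></div>
"""

detail_looks = parse_looks(SAMPLE)
print(len(detail_looks))  # 2
pprint.pprint(detail_looks[0:5])
```

Keeping the parsing separate from the download makes it easy to test against saved HTML without hitting the site on every run.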

【Comments on the answer】: