【Question Title】: How to scrape a description from a website
【Posted】: 2021-09-14 11:37:47
【Question Description】:

I'm trying to get some information from this website: https://www.vogue.it/moda/gallery/met-gala-2021-red-carpet-look-ospiti-celeb

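What I want to do is scrape all the descriptions under the photos, e.g. "Billie Eilish in Oscar de la Renta custom-made", "Timothée Chalamet in Haider Ackermann e Converse", and so on. I thought the class name of the descriptions was ".gallery-slide-caption__dek-container", but nothing gets scraped. My code is: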

import pprint
from selenium.webdriver.common.by import By

# list_looks: the gallery slide WebElements collected earlier with Selenium
detail_looks = []
for look in list_looks:
    title = ""
    # A CSS class selector needs the leading "."; query once and reuse the result
    captions = look.find_elements(By.CSS_SELECTOR, ".gallery-slide-caption__dek-container")
    if len(captions) > 0:
        title = captions[0].text
    detail_looks.append({'title': title})

print(len(detail_looks))
pprint.pprint(detail_looks[0:5])

But the output is empty: [{'title': ''}, {'title': ''}, {'title': ''}, {'title': ''}, {'title': ''}]

Can you help me? Thanks.

【Question Comments】:

  • You need to look one level deeper. When I inspect the item in Chrome, its structure is div.gallery-slide-caption__dek-container > div.gallery-slide-caption__dek > div > p, so maybe something like look.find_elements_by_css_selector(".gallery-slide-caption__dek-container .gallery-slide-caption__dek")[0].text
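The comment's point is that the caption text sits in a p nested two levels below the div that carries the class, so whatever extracts it has to collect text from descendants, not just from the matched element itself. Below is a minimal, dependency-free sketch using Python's standard html.parser; the CaptionCollector class and SAMPLE_HTML are hypothetical and only mimic the structure described in the comment, not the real Vogue markup.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag; they must not change the nesting depth
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "source", "wbr"}


class CaptionCollector(HTMLParser):
    """Collect the text of every element whose class list contains target_class,
    including text that lives in nested descendants (e.g. div > p)."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0       # > 0 while we are inside a matching element
        self.current = []    # text fragments of the caption being read
        self.captions = []   # finished, whitespace-normalized captions

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1  # a nested tag inside a caption
        elif self.target_class in classes:
            self.depth = 1   # entering a matching caption element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:  # just closed the matching element itself
            text = " ".join("".join(self.current).split())
            self.captions.append(text)
            self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


# Hypothetical markup following div.gallery-slide-caption__dek-container > div.gallery-slide-caption__dek > div > p
SAMPLE_HTML = """
<div class="gallery-slide-caption__dek-container">
  <div class="gallery-slide-caption__dek"><div><p>Billie Eilish in Oscar de la Renta custom-made</p></div></div>
</div>
<div class="gallery-slide-caption__dek-container">
  <div class="gallery-slide-caption__dek"><div><p>Timothée Chalamet in Haider Ackermann e Converse</p></div></div>
</div>
"""

parser = CaptionCollector("gallery-slide-caption__dek")
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.captions)
```

Class matching here is done on whole class tokens, the same way a CSS class selector works, so "gallery-slide-caption__dek-container" does not count as a match for "gallery-slide-caption__dek".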

Tags: web-scraping


【Solution 1】:

To get all the titles, you can select every tag with class="gallery-slide-caption__dek":

import requests
from bs4 import BeautifulSoup

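url = "https://www.vogue.it/moda/gallery/met-gala-2021-red-carpet-look-ospiti-celeb"
# Download the page and parse it with Python's built-in HTML parser
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Each caption is an element with class "gallery-slide-caption__dek"; number them from 1
for i, caption in enumerate(soup.select(".gallery-slide-caption__dek"), 1):
    print(i, caption.get_text(strip=True))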

Prints:

1 Billie Eilish in Oscar de la Renta custom-made
2 Timothée Chalamet in Haider Ackermann e Converse
3 Amanda Gorman in Vera Wang
4 Keke Palmer
5 Bee Carrozzini in Valentino Haute Couture

...and so on.
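To get the exact [{'title': ...}] list that the question builds, the answer's selector can be wrapped in a small function. This is a sketch assuming BeautifulSoup 4 is installed; extract_looks and SAMPLE_HTML are hypothetical names used only for illustration. Because it takes the HTML as a string, the same function could be given requests.get(url).content, as in the answer, or Selenium's driver.page_source if the page is already open in a browser.

```python
import pprint

from bs4 import BeautifulSoup


def extract_looks(html):
    """Return one {'title': ...} dict per caption, the shape used in the question."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"title": caption.get_text(strip=True)}
        for caption in soup.select(".gallery-slide-caption__dek")
    ]


# Hypothetical markup standing in for the real page
SAMPLE_HTML = """
<div class="gallery-slide-caption__dek-container">
  <div class="gallery-slide-caption__dek"><div><p>Billie Eilish in Oscar de la Renta custom-made</p></div></div>
</div>
<div class="gallery-slide-caption__dek-container">
  <div class="gallery-slide-caption__dek"><div><p>Amanda Gorman in Vera Wang</p></div></div>
</div>
"""

detail_looks = extract_looks(SAMPLE_HTML)
print(len(detail_looks))
pprint.pprint(detail_looks[0:5])
```

Parsing a string instead of fetching inside the function also makes it easy to check the selector against a saved copy of the page.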

【Discussion】:
