【Question title】: Get data-product element with Beautiful Soup
【Posted】: 2018-03-30 15:29:13
【Question】:

I am trying to get data from a website using Beautiful Soup. I have the code below, and I want to get the JSON part inside the data-product element. How can I do that?

This code:

soup_catalog.find('a',class_="product-li")

returns this:

<a class="product-li" data-product='{"product":"0431772", "basketId":"043177500", "type":"product", "category":"ga", "subCategory":"gpes", "webVideoUrl": "None", "brand":"konami", "title_url": "pes-2018-para-ps3-konami", "title": "PES 2018 para PS3", "reference": "Konami", "stockTypes": {"043177500": "F"}, "price": "89.9"}' href="https://www.magazineluiza.com.br/pes-2018-para-ps3-konami/p/0431772/ga/gpes/" itemprop="url" title="PES 2018 para PS3">\n<span class="js-wishlist-action wishlist__simple-text">\n<i class="wishlist__favorite-icon js-add-wishlist"></i>\n</span>\n<div class="alignment-image">\n<img alt="PES 2018 para PS3 - Konami" class="product-image" data-original="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" height="210" src="https://d25zlb44gqlazw.cloudfront.net/static/img/default/white1x1-e0a7e4ed.gif" title="PES 2018 para PS3 - Konami" width="210"/>\n</div>\n<noscript>\n<img alt="PES 2018 para PS3 - Konami" height="210" itemprop="image" src="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" title="PES 2018 para PS3 - Konami" width="210"/>\n</noscript>\n<span class="product-content-other-informations">\n<span class="rating-container">\n<span class="rateing sprite-stars star-medium" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">\n<em class="sprite-stars" style="width:90.0%"></em>\n<meta content="4.5" itemprop="ratingValue">\n<meta content="78" itemprop="reviewCount">\n</meta></meta></span>\n</span>\n</span>\n<h3 class="productTitle" itemprop="name">PES 2018 para PS3 - Konami</h3>\n<meta content="0431772" itemprop="productID">\n<meta content="None" itemprop="description">\n<p itemscope="" itemtype="http://schema.org/Brand"><meta content="konami" itemprop="name"/></p>\n<span class="productPrice" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">\n<span class="priceContent color-green none-product-showcase">desconto de R$ 79,10</span>\n<meta 
content="BRL" itemprop="priceCurrency">\n<meta content="89,90" itemprop="price">\n<span class="originalPrice">de R$ 169,00</span>\n<span class="price">\n                        por R$ 89,90\n                    </span>\n<meta content="InStock" itemprop="availability"/>\n</meta></meta></span>\n</meta></meta></a>

Then I tried:

soup_catalog.find('a',class_="product-li").find('data-product')

But this does not return the data-product value. How can I get it?

【Comments】:

Tags: python beautifulsoup


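【Solution 1】:

This should help. data-product is an attribute of the <a> tag, not a child tag, so find('data-product') looks for a tag with that name and finds nothing. Read the attribute by indexing the tag instead: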

from bs4 import BeautifulSoup

s = """<a class="product-li" data-product='{"product":"0431772", "basketId":"043177500", "type":"product", "category":"ga", "subCategory":"gpes", "webVideoUrl": "None", "brand":"konami", "title_url": "pes-2018-para-ps3-konami", "title": "PES 2018 para PS3", "reference": "Konami", "stockTypes": {"043177500": "F"}, "price": "89.9"}' href="https://www.magazineluiza.com.br/pes-2018-para-ps3-konami/p/0431772/ga/gpes/" itemprop="url" title="PES 2018 para PS3">\n<span class="js-wishlist-action wishlist__simple-text">\n<i class="wishlist__favorite-icon js-add-wishlist"></i>\n</span>\n<div class="alignment-image">\n<img alt="PES 2018 para PS3 - Konami" class="product-image" data-original="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" height="210" src="https://d25zlb44gqlazw.cloudfront.net/static/img/default/white1x1-e0a7e4ed.gif" title="PES 2018 para PS3 - Konami" width="210"/>\n</div>\n<noscript>\n<img alt="PES 2018 para PS3 - Konami" height="210" itemprop="image" src="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" title="PES 2018 para PS3 - Konami" width="210"/>\n</noscript>\n<span class="product-content-other-informations">\n<span class="rating-container">\n<span class="rateing sprite-stars star-medium" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">\n<em class="sprite-stars" style="width:90.0%"></em>\n<meta content="4.5" itemprop="ratingValue">\n<meta content="78" itemprop="reviewCount">\n</meta></meta></span>\n</span>\n</span>\n<h3 class="productTitle" itemprop="name">PES 2018 para PS3 - Konami</h3>\n<meta content="0431772" itemprop="productID">\n<meta content="None" itemprop="description">\n<p itemscope="" itemtype="http://schema.org/Brand"><meta content="konami" itemprop="name"/></p>\n<span class="productPrice" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">\n<span class="priceContent color-green none-product-showcase">desconto de R$ 79,10</span>\n<meta 
content="BRL" itemprop="priceCurrency">\n<meta content="89,90" itemprop="price">\n<span class="originalPrice">de R$ 169,00</span>\n<span class="price">\n                        por R$ 89,90\n                    </span>\n<meta content="InStock" itemprop="availability"/>\n</meta></meta></span>\n</meta></meta></a>"""
soup = BeautifulSoup(s, "html.parser")
i = soup.find("a",class_="product-li")
print(i["data-product"])

Output:

{"product":"0431772", "basketId":"043177500", "type":"product", "category":"ga", "subCategory":"gpes", "webVideoUrl": "None", "brand":"konami", "title_url": "pes-2018-para-ps3-konami", "title": "PES 2018 para PS3", "reference": "Konami", "stockTypes": {"043177500": "F"}, "price": "89.9"}
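The value printed above is still a plain string. If the individual fields are needed, it can be parsed with the standard json module. The following is a minimal sketch, assuming the attribute always holds valid JSON; the shortened `html` string and the names `tag` and `product` are illustrative and not part of the original answer.

```python
import json

from bs4 import BeautifulSoup

# Shortened stand-in for the <a> tag from the question (illustrative only).
html = """<a class="product-li" data-product='{"product":"0431772", "title": "PES 2018 para PS3", "stockTypes": {"043177500": "F"}, "price": "89.9"}' href="#">PES 2018</a>"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("a", class_="product-li")

# The attribute value is a str; json.loads converts it into a dict.
product = json.loads(tag["data-product"])

print(product["title"])
# The price is stored as a JSON string, so convert it explicitly if a number is needed.
print(float(product["price"]))
```

If the attribute might contain malformed JSON, wrapping the json.loads call in a try/except for json.JSONDecodeError would be a reasonable precaution.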

【Comments】:

    【Solution 2】:

    You can read the data from the tag's attributes like this:

    soup_catalog.find('a',class_='product-li').get('data-product')
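The practical difference between .get() and indexing shows up when the attribute is absent: .get() returns None, while tag["data-product"] raises KeyError. The sketch below illustrates this on a small made-up snippet (the two-link `html` string and all variable names are assumptions for demonstration), and also shows one way to select only the tags that carry the attribute.

```python
from bs4 import BeautifulSoup

# Two hypothetical product links; only the first has a data-product attribute.
html = """
<a class="product-li" data-product='{"product": "1"}' href="#a">A</a>
<a class="product-li" href="#b">B</a>
"""

soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("a", class_="product-li")

with_attr = first.get("data-product")   # the attribute value as a string
missing = second.get("data-product")    # None, because the attribute is absent

# Indexing the same tag raises KeyError instead of returning None.
try:
    second["data-product"]
    raised = False
except KeyError:
    raised = True

# attrs={"data-product": True} matches only tags that have the attribute at all.
tagged = soup.find_all("a", attrs={"data-product": True})

print(with_attr, missing, raised, len(tagged))
```

When scraping a whole catalog page, .get() is usually the more forgiving choice, since not every matched tag is guaranteed to carry the attribute.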
    

    【Comments】:
