【Posted】: 2016-11-23 03:52:01
【Question】:
I'm trying to scrape https://store.fabspy.com/collections/new-arrivals-beauty for the Sapphire eyeliner product and return the information associated with its product ID. So far I have:
from bs4 import BeautifulSoup
import urllib2

url = 'https://store.fabspy.com/collections/new-arrivals-beauty'
# Download the page first; a URL string has no read() method
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page, 'html.parser')
# Note: this is not a valid tag name, so findAll() matches nothing
tag = 'div class="product-content"'
for row in soup.html.body.findAll(tag):
    data = row.findAll('id')
    if data and 'sapphire' in data[0].text:
        print data[4].text
The information I'm trying to retrieve is as follows:
<div class="product-content">
  <div class="pc-inner">
    <div data-handle="clematis-dewdrop-sparkling-eye-pencil-g7454c-sapphire"
         data-target="#quick-shop-popup"
         class="quick_shop quick-shop-button"
         data-toggle="modal"
         title="Quick View">
      <span>+ Quick View</span>
      <span class="json hide">
        {
          "id":8779050374,
          "title":"Clematis - Dewdrop Sparkling Gel Eye Liner Pencil # G7454C**Sapphire**",
          "handle":"clematis-dewdrop-sparkling-eye-pencil-g7454c-sapphire",
          "description":"\u003cdiv\u003e\r\n\r\nGel Formula, Rich Colour, Matte Finish, Long-Wearing, Safe for Waterline\r\n\r\n\u003cbr\u003e\n\u003c\/div\u003e\u003cdiv\u003e\u003cbr\u003e\u003c\/div\u003e \u003cimg alt=\"\" src=\"\/\/i.imgur.com\/adW5MKl.jpg\"\u003e",
          "published_at":"2016-10-17T20:15:40+08:00",
          "created_at":"2016-10-17T20:15:40+08:00",
          "vendor":"Clematis",
          "type":"Latest,Beauty,New,Makeup,Best, Clematis, Eyes",
          "tags":["Beauty","Best","Clematis","Eyes","Latest","Makeup","New"],
          "price":4900,
          "price_min":4900,
          "price_max":4900,
          "available":true,
          "price_varies":false,
          "compare_at_price":7900,
          "compare_at_price_min":7900,
          "compare_at_price_max":7900,
          "compare_at_price_varies":false,
          "variants":[{"id":31447937030, "title":"N\/A"}]
        }
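Specifically, the id at the end. Please tell me which tag my script should target to retrieve this information, and how I can keyword-search for the sapphire color and its id in my script. Thanks!
【Comments】: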
- You need to target the text inside the span with class="json hide" and parse that text as JSON.
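
The suggestion in the comment above could be sketched roughly as follows. This is only an illustration, not a tested scraper for the live site: the function names extract_product_json and find_ids and the SAMPLE_HTML string are made up for this example, the sample is a trimmed-down version of the markup in the question, and it assumes each span holds exactly one JSON object and that "the id at the end" means the id inside "variants".

```python
import json


def extract_product_json(html):
    """Parse the JSON text of every <span> whose class list contains "json"."""
    # Imported here so find_ids() below stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    products = []
    # class_="json" matches on a single class, so class="json hide" matches too.
    for span in soup.find_all("span", class_="json"):
        try:
            products.append(json.loads(span.get_text()))
        except ValueError:
            # Skip spans whose text is not valid JSON.
            continue
    return products


def find_ids(products, keyword):
    """Return (product_id, [variant_ids]) for each product matching keyword.

    The keyword is matched case-insensitively against "handle" and "title".
    """
    keyword = keyword.lower()
    matches = []
    for product in products:
        haystack = (product.get("handle", "") + " " + product.get("title", "")).lower()
        if keyword in haystack:
            variant_ids = [v["id"] for v in product.get("variants", []) if "id" in v]
            matches.append((product.get("id"), variant_ids))
    return matches


# Hypothetical, trimmed-down sample modelled on the markup in the question.
SAMPLE_HTML = """
<div class="product-content">
  <div class="quick_shop quick-shop-button">
    <span>+ Quick View</span>
    <span class="json hide">
      {"id": 8779050374,
       "title": "Clematis - Dewdrop Sparkling Gel Eye Liner Pencil # G7454C Sapphire",
       "handle": "clematis-dewdrop-sparkling-eye-pencil-g7454c-sapphire",
       "variants": [{"id": 31447937030, "title": "N/A"}]}
    </span>
  </div>
</div>
"""

if __name__ == "__main__":
    print(find_ids(extract_product_json(SAMPLE_HTML), "sapphire"))
```

For the real page, the downloaded HTML would be passed in place of SAMPLE_HTML, for example the result of urllib2.urlopen(url).read() on Python 2. Keeping the keyword search in its own function means it works on the parsed dictionaries rather than on raw HTML, so it does not depend on how the page was fetched.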
Tags: python html css web-scraping