[Posted]: 2018-04-27 16:32:54
[Question]:
<ul class="products-grid">
<li class="item">
<div class="product-block">
<div class="product-block-inner">
<a href="#" title="Product A" class="product-image"><img src="#/producta.jpg"></a>
<h2 class="product-name"><a href="#">Product A</a></h2>
<div class="price-box">
<span class="regular-price" id="#">
<span class="price">Rs 1,849</span>
</span>
</div>
</div>
</div>
</li>
<li class="item">
<div class="product-block">
<div class="product-block-inner">
<a href="#" title="Product B" class="product-image"><img src="#/productb.jpg"></a>
<h2 class="product-name"><a href="#">Product B</a></h2>
<div class="price-box">
<span class="regular-price" id="#">
<span class="price">Rs 1,849</span>
</span>
</div>
</div>
</div>
</li>
</ul>
At the moment I am scraping the items in a loop.
# Keep the SelectorList (no .extract()) so each node can be queried further in the loop
products = response.xpath('//ul[@class="products-grid"]//li//div[@class="product-block"]//div[@class="product-block-inner"]')
After getting the product-block-inner nodes, I save them to products, and then I have to loop over them like this:
for product in products:
    # parse the div.product-block-inner further down
    # to get the name, price, image, etc.
    # then save it to a dict and yield it
    pass
Is it possible to get the text and href of every div.product-block-inner in one final list, without looping?
[Comments]:
-
Why was this downvoted?
Tags: python-2.7 xpath scrapy