【Question title】: Extract information from all matching nodes without looping (XPath)
【Posted】: 2018-04-27 16:32:54
【Question】:

<ul class="products-grid">
    <li class="item">
        <div class="product-block">
            <div class="product-block-inner">
                <a href="#" title="Product A" class="product-image"><img src="#/producta.jpg"></a>
                <h2 class="product-name"><a href="#">Product A</a></h2>
                <div class="price-box">
                    <span class="regular-price" id="#">
                        <span class="price">Rs 1,849</span>
                    </span>
                </div>
            </div>
        </div>
    </li>
    <li class="item">
        <div class="product-block">
            <div class="product-block-inner">
                <a href="#" title="Product B" class="product-image"><img src="#/productb.jpg"></a>
                <h2 class="product-name"><a href="#">Product B</a></h2>
                <div class="price-box">
                    <span class="regular-price" id="#">
                        <span class="price">Rs 1,849</span>
                    </span>
                </div>
            </div>
        </div>
    </li>
</ul>

At the moment I am scraping the items in a loop.

products = response.xpath('//ul[@class="products-grid"]//li//div[@class="product-block"]//div[@class="product-block-inner"]').extract()

After getting the product-block-inner nodes, I store them in products, and then I have to loop over them like this:

for product in products:
    # parse the div.product-block-inner further down
    # to get the name, price, image, etc.,
    # save them to a dict and yield it
    pass

Is it possible to get the text and href of every div.product-block-inner into one final list without looping?

【Question discussion】:

Why was this downvoted?

Tags: python-2.7 xpath scrapy

【Solution 1】:

Yes, but it gets messy. For example, you could try this:

# Returns one flat list of strings for all products, not one record per product
products = response.xpath(
    '//ul[@class="products-grid"]//li//div[@class="product-block"]//div[@class="product-block-inner"]'
).css(
    '.product-name a::attr(href), .product-name a::text, .price::text'
).extract()
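Why this is messy: the query above returns a single flat list of strings, so the per-product grouping has to be rebuilt by hand. A minimal sketch, assuming the values come back as href, name, price for each product in turn (which only holds if every product has all three fields): take the list three items at a time. The helper name `regroup` and the hard-coded list are illustrative, not output captured from Scrapy.

```python
# Regroup a flat [href, name, price, href, name, price, ...] list into dicts.
# Assumption (hypothetical): every product contributes exactly these three
# values, in this order. One missing field shifts every later record.
FIELDS = ("url", "name", "price")


def regroup(flat, fields=FIELDS):
    """Split a flat list into consecutive chunks of len(fields) and label them."""
    size = len(fields)
    if len(flat) % size:
        # Catches some (not all) misalignments caused by a missing field.
        raise ValueError("flat list length is not a multiple of %d" % size)
    # Zipping the same iterator `size` times yields consecutive chunks.
    chunks = zip(*[iter(flat)] * size)
    return [dict(zip(fields, chunk)) for chunk in chunks]


flat = ["#", "Product A", "Rs 1,849", "#", "Product B", "Rs 1,849"]
print(regroup(flat))
```

The length check only detects some misalignments; if two products each lack a different field, the chunks silently pair the wrong values, which is a practical reason to prefer the loop.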

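But I'd recommend always looping. (By the way, why do you call extract() when you assign the result to products? It turns the selectors into plain strings, so you can no longer run .css() or .xpath() on each product.)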

def parse(self, response):
    # Keep the selectors (no extract()) so each product can be queried further
    products = response.xpath(
        '//ul[@class="products-grid"]//li//div[@class="product-block"]//div[@class="product-block-inner"]'
    )
    for product in products:
        # CSS selectors on `product` only search inside that one block
        yield {'name': product.css('.product-name a::text').extract_first(),
               'url': product.css('.product-name a::attr(href)').extract_first(),
               'price': product.css('.price::text').extract_first()}

(I used CSS selectors here because the equivalent XPath expressions are longer, but XPath works too.)
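To try the per-product loop outside a Scrapy project, here is a minimal sketch that swaps Scrapy selectors for the standard library's `xml.etree.ElementTree` and takes the XPath route. ElementTree supports only a subset of XPath (no `text()` or `@href` steps), so text and attributes are read with `.text`, `.get()` and `findtext()`. It assumes the markup is well-formed XML, so the `<img>` tags are self-closed here; `SAMPLE_HTML` and `parse_products` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup, made well-formed XML (self-closed <img/>).
SAMPLE_HTML = """
<ul class="products-grid">
  <li class="item"><div class="product-block"><div class="product-block-inner">
    <a href="#" title="Product A" class="product-image"><img src="#/producta.jpg"/></a>
    <h2 class="product-name"><a href="#">Product A</a></h2>
    <div class="price-box"><span class="regular-price" id="#"><span class="price">Rs 1,849</span></span></div>
  </div></div></li>
  <li class="item"><div class="product-block"><div class="product-block-inner">
    <a href="#" title="Product B" class="product-image"><img src="#/productb.jpg"/></a>
    <h2 class="product-name"><a href="#">Product B</a></h2>
    <div class="price-box"><span class="regular-price" id="#"><span class="price">Rs 1,849</span></span></div>
  </div></div></li>
</ul>
"""


def parse_products(markup):
    """Yield one dict per product block, mirroring the loop in the answer."""
    root = ET.fromstring(markup.strip())
    # Select every product block once...
    for block in root.findall(".//div[@class='product-block-inner']"):
        # ...then use relative paths (leading ".") inside each block, so a
        # missing field becomes None for that product only.
        link = block.find(".//h2[@class='product-name']/a")
        yield {
            "name": link.text if link is not None else None,
            "url": link.get("href") if link is not None else None,
            "price": block.findtext(".//span[@class='price']"),
        }


for product in parse_products(SAMPLE_HTML):
    print(product)
```

Because every lookup is scoped to its own block, a product without a price simply gets `None` instead of shifting the other products' values.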

【Discussion】:

The extract() was a copy-paste mistake on my part. Sorry.