Scrapy iteration over Scrapy selector: answers

【Question title】: Scrapy iteration over Scrapy selector
【Posted】: 2021-03-27 00:50:46
【Question】:

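I am trying to scrape a website that loads its HTML through an API, so I need to request the API and then scrape the HTML contained in the API response.

I used this post to get the API response and pull the HTML out of it.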

import json
import scrapy

resp = json.loads(response.text)  # `response` is the Scrapy response passed to the callback
selector = scrapy.Selector(text=resp['results'], type="html")

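This works well: when I try to get attributes from the page, I can use CSS or XPath selectors and get the items.

What I want to do now is iterate over the selector: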

        for item in selector:  # this line raises the TypeError shown below
            title = item.css('a h2').extract()
            items['title'] = title
            yield items

But when I run this loop, I get a TypeError:

TypeError: 'Selector' object is not iterable

So what I want to achieve is to iterate over this object:

<class 'scrapy.selector.unified.Selector'>

or find any other way to scrape the HTML embedded in a JSON API response.

Update: I can now iterate over the items, but I cannot get pagination to work.

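【Comments】:

When you say for item in resp:, do you mean for item in selector:? Because resp does not seem to be a selector.
Yes, I meant for item in selector. Editing it now.
What does the content of resp['results'] look like? Maybe you need to iterate over that content first and then load each item in it into a Selector.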

Tags: python json api scrapy css-selectors

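【Solution 1】:

A Selector itself is not iterable, but calling xpath() on it returns a list of selectors that is. Loop over that list and call extract() on each element to get its content.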

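import requests
from scrapy import Selector

url = 'https://en.wikipedia.org/wiki/Web_scraping'
html = requests.get(url).text  # Selector(text=...) expects str, not bytes
sel = Selector(text=html)

# sel.xpath() returns a SelectorList; each p is a Selector for one <p> element
for p in sel.xpath("//p"):
    print(p.xpath("text()").extract())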

【Comments】: