How to store yielded request responses in Scrapy (answers)

【Question title】: how to store yield request responses on scrapy
【Posted】: 2017-04-14 08:25:00
【Question】:

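Hello, I am new to Python and Scrapy, so this is going to be a beginner question. I searched but could not find anything that answers my question directly. I am trying to crawl the pages of the countries listed below and store their populations in an array, then print them all at once. As you can see, the code below prints on every request. How can I print everything in one batch from a results array? Thanks.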

class CrawlerSpider(scrapy.Spider):
    name = 'wikiCrawler'
    #allowed_domains = ['web']
    start_urls = ['https://en.wikipedia.org/wiki/List_of_sovereign_states']
    #counter = 1
    global i
    i = {}
    global list
    list = []

    def __init__(self):
        self.counter = 1
        pass

    def parse(self, response):

        for resultHref in response.xpath('//table[contains(@class, "wikitable")]//a[preceding-sibling::span[@class="flagicon"]]'):
            href = resultHref.xpath('./@href').extract_first()
            nameC = resultHref.xpath('./text()').extract_first()
            yield scrapy.Request(response.urljoin(href), callback=self.parse_item, meta={'Country': nameC})

    def parse_item(self, response):
        self.counter = self.counter + 1
        i['country'] = response.meta['Country']
        i['population'] = response.xpath('//tr[preceding-sibling::tr/th/a/text()="Population"]/td/text()').extract_first()
        yield i #this is where I would like to store the data instead of printing and then later print all together

【Question comments】:

Tags: python scrapy

【Solution 1】:

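Create the i variable inside the parse_item function rather than at class level. I tested this and it works, although the XPath selectors may need some refinement.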
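To illustrate why the dictionary should be created inside the callback, here is a minimal sketch in plain Python (no Scrapy required). The function names and the sample country names are made up for the illustration. It assumes the yielded items are kept somewhere, for example in a list; in that case a single shared dict makes every stored entry point at the same object.

```python
def shared_dict_items(names):
    # One dict created once and mutated on every iteration,
    # similar to the class-level `i = {}` in the question.
    item = {}
    for name in names:
        item["country"] = name
        yield item


def fresh_dict_items(names):
    # A new dict per iteration, similar to `i = {}` inside parse_item.
    for name in names:
        item = {}
        item["country"] = name
        yield item


names = ["France", "Japan", "Peru"]  # illustrative values only

shared = list(shared_dict_items(names))
fresh = list(fresh_dict_items(names))

# Every element of `shared` is the same dict, so only the last write survives.
print([d["country"] for d in shared])  # ['Peru', 'Peru', 'Peru']
print([d["country"] for d in fresh])   # ['France', 'Japan', 'Peru']
```

The spider below applies the same idea by building the dict inside parse_item.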

import scrapy


class CrawlerSpider(scrapy.Spider):
    name = 'wikiCrawler'
    #allowed_domains = ['web']
    start_urls = ['https://en.wikipedia.org/wiki/List_of_sovereign_states']

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider run its own initialisation before adding state.
        super().__init__(*args, **kwargs)
        self.counter = 1

    def parse(self, response):
        # Follow every country link that sits next to a flag icon.
        for resultHref in response.xpath('//table[contains(@class, "wikitable")]//a[preceding-sibling::span[@class="flagicon"]]'):
            href = resultHref.xpath('./@href').extract_first()
            nameC = resultHref.xpath('./text()').extract_first()
            yield scrapy.Request(response.urljoin(href), callback=self.parse_item, meta={'Country': nameC})

    def parse_item(self, response):
        # Build a fresh dict per response so items do not share state.
        i = {}
        self.counter = self.counter + 1
        i['country'] = response.meta['Country']
        i['population'] = response.xpath('//tr[preceding-sibling::tr/th/a/text()="Population"]/td/text()').extract_first()
        yield i
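The question also asks how to collect every result and print them together. One possible approach, which is not part of the original answer, is an item pipeline that appends each item to a list and prints the list when the spider closes. The sketch below assumes Scrapy's item pipeline convention of a plain class with open_spider, process_item and close_spider methods; the class name CollectPopulationPipeline is hypothetical, and the exact method signatures may differ between Scrapy versions. The last few lines simulate the calls by hand with placeholder values so the sketch runs without Scrapy.

```python
class CollectPopulationPipeline:
    """Hypothetical pipeline that stores items and prints them in one batch."""

    def open_spider(self, spider):
        # Called once when the spider starts: begin with an empty result list.
        self.results = []

    def process_item(self, item, spider):
        # Called for every item the spider yields: keep a copy, pass it on.
        self.results.append(dict(item))
        return item

    def close_spider(self, spider):
        # Called once when the crawl finishes: print everything together.
        for row in self.results:
            print(row["country"], row["population"])


# Hand-driven simulation (no Scrapy needed); the values are placeholders.
pipeline = CollectPopulationPipeline()
pipeline.open_spider(spider=None)
pipeline.process_item({"country": "A", "population": "placeholder-1"}, spider=None)
pipeline.process_item({"country": "B", "population": "placeholder-2"}, spider=None)
pipeline.close_spider(spider=None)
```

In a real project the class would be enabled through the ITEM_PIPELINES setting, for example by mapping its import path (which depends on your project layout) to an order value such as 300.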

【Comments】:

Hi. Sorry, this is not what I was looking for. I created a separate question. Could you help me? Thanks. stackoverflow.com/questions/43610655/…