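【Posted】: 2017-04-14 08:25:00
【Question】:
Hello, I'm new to Python and Scrapy, so this is a beginner question. I searched but couldn't find anything that answers it directly. I'm trying to crawl the pages of the countries listed at the URL below, store their populations in an array, and then print them all at once. As you can see, the code below prints a result each time a request completes. How can I collect the results in an array and print them in one batch instead? Thanks.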
import scrapy


class CrawlerSpider(scrapy.Spider):
    name = 'wikiCrawler'
    # allowed_domains = ['web']
    start_urls = ['https://en.wikipedia.org/wiki/List_of_sovereign_states']

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider run its own initialisation before adding state.
        super(CrawlerSpider, self).__init__(*args, **kwargs)
        self.counter = 1

    def parse(self, response):
        # Follow the link that sits next to each flag icon in the countries table.
        for resultHref in response.xpath('//table[contains(@class, "wikitable")]//a[preceding-sibling::span[@class="flagicon"]]'):
            href = resultHref.xpath('./@href').extract_first()
            nameC = resultHref.xpath('./text()').extract_first()
            yield scrapy.Request(response.urljoin(href), callback=self.parse_item, meta={'Country': nameC})

    def parse_item(self, response):
        self.counter = self.counter + 1
        # Build a fresh dict per response; a single shared global dict would be overwritten by later callbacks.
        i = {}
        i['country'] = response.meta['Country']
        i['population'] = response.xpath('//tr[preceding-sibling::tr/th/a/text()="Population"]/td/text()').extract_first()
        yield i  # this is where I would like to store the data instead of printing and then later print all together
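One possible way to get the batch behaviour described above is an item pipeline: Scrapy calls `process_item` for every item the spider yields and `close_spider` once when the crawl ends. The following is only a minimal sketch under assumptions. The class name `CollectPopulationsPipeline` and the attribute `results` are hypothetical, and the hook signatures are the classic ones that take a `spider` argument. It needs no imports because pipelines are plain Python classes.

```python
class CollectPopulationsPipeline:
    """Hypothetical pipeline: buffers every item and prints them once at the end."""

    def open_spider(self, spider):
        # Called once when the spider starts; begin with an empty buffer.
        self.results = []

    def process_item(self, item, spider):
        # Called for each item yielded by parse_item.
        # Store a copy so later changes to the same dict cannot alter stored rows.
        self.results.append(dict(item))
        return item

    def close_spider(self, spider):
        # Called once after all requests have finished; print everything together.
        for row in self.results:
            print(row['country'], row['population'])
```

To try it, the pipeline would have to be enabled in the project settings, for example ITEM_PIPELINES = {'myproject.pipelines.CollectPopulationsPipeline': 300}, where the module path `myproject.pipelines` is an assumption about the project layout. Because requests complete asynchronously, the rows are stored in arrival order, not in the order of the Wikipedia table.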
【Comments】: