【Posted】: 2018-07-18 16:14:39
【Question】:
How can I change the selector in Scrapy while the spider is running?
Please see my code:
# -*- coding: utf-8 -*-
import scrapy
from kino.items import KinoItem


class KinopoiskSpider(scrapy.Spider):
    name = 'kinopoisk'
    allowed_domains = ['kinopoisk.ru']
    start_urls = ['https://www.kinopoisk.ru/afisha/new/city/1/']

    def parse(self, response):
        links = response.css('div.name>a').xpath('@href').extract()
        for link in links:
            yield scrapy.Request(response.urljoin(link), callback=self.parse_moov, dont_filter=True)

    def parse_moov(self, response):
        item = KinoItem()
        item['orgname'] = response.css('div#headerFilm>span::text').extract()
        item['name'] = response.css('h1.moviename-big::text').extract()
        item['rating'] = response.css('div.block_2>div.div1>a>span.rating_ball::text').extract()
        item['r_critic'] = response.css('div.ratingNum>span::text').extract()
        item['waiting'] = response.xpath('//*[@id="block_rating"]/div[1]/div[3]/a[1]/text()').extract()
        if item['waiting'] is None:
            item['waiting_two'] = response.xpath('//*[@id="block_rating"]/div[1]/div[2]/a[1]/text()').extract()
        item['runtime'] = response.css('td#runtime::text').extract()
        item['premiere'] = response.xpath('//*[@id="div_rus_prem_td2"]/div/span[1]/a[1]/text()').extract()
        item['info'] = response.css('div.brand_words.film-synopsys::text').extract()
        yield item
The part with if item['waiting'] is None: item['waiting_two'] does not work: the fallback selector is never used.
Can anyone offer some help here?
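A likely reason the check never fires is that .extract() returns a list, and when a selector matches nothing that list is empty rather than None, so the comparison with None is always False. The sketch below reproduces this with plain lists standing in for the .extract() results. The helper first_match and the sample value '95%' are hypothetical and used only for illustration; they are not part of Scrapy or of the code above.

```python
def first_match(*candidates):
    """Return the first non-empty list among candidates, or an empty list.

    Hypothetical helper: each candidate stands in for the list that
    a Scrapy selector's .extract() call would return.
    """
    for values in candidates:
        if values:  # an empty list is falsy, so selectors that matched nothing are skipped
            return values
    return []


# Simulated .extract() results: the primary selector matched nothing.
primary = []
fallback = ['95%']  # made-up sample value

print(primary is None)                  # False, so the original "is None" branch is skipped
print(first_match(primary, fallback))   # ['95%']

# In the spider this could be used as (untested against the live site):
# item['waiting'] = first_match(
#     response.xpath('//*[@id="block_rating"]/div[1]/div[3]/a[1]/text()').extract(),
#     response.xpath('//*[@id="block_rating"]/div[1]/div[2]/a[1]/text()').extract(),
# )
```

Testing with if not item['waiting']: instead of is None would have the same effect. Alternatively, .extract_first() returns None when nothing matches, so an is None check would work with it, at the cost of getting a single string instead of a list.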
【Discussion】:
Tags: python-3.x web-scraping scrapy web-crawler scrapy-spider