Scrapy: log levels and passing data between requests

Contents:

Log levels
Passing data between requests
- Scraping the movie site https://www.4567tv.tv/frim/index6.html
- Scraping NetEase News

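- When you run a spider with scrapy crawl <spider name> (the value of the spider's name attribute), the output printed in the terminal is Scrapy's log.

  - Log levels used here, from most to least severe:

        ERROR: general errors

        WARNING: warnings

        INFO: general information

        DEBUG: debugging information

  - Controlling what gets logged:

    In the settings.py configuration file, add

LOG_LEVEL = '<level name>' to log only messages at that level or above.

LOG_FILE = 'log.txt' writes the log to the specified file instead of printing it to the terminal.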
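
The two settings above can be sketched as a settings.py fragment. The level ERROR and the file name log.txt are only example values; pick whatever suits the project.

```python
# settings.py (fragment)

# Only log messages at ERROR level or above; Scrapy's default level is DEBUG.
LOG_LEVEL = "ERROR"

# Write the log to this file instead of printing it to the terminal.
LOG_FILE = "log.txt"
```

The same settings can also be overridden for a single run from the command line, for example scrapy crawl movie -s LOG_LEVEL=WARNING.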

Passing data between requests

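- Sometimes the data we want is spread across more than one page. On a movie site, for example, the title and rating appear on the first-level (listing) page, while the remaining details are on each movie's second-level (detail) page. To assemble one item from both pages, the data collected on the first page has to be passed along with the request for the second page.

- Example: scrape the movie site https://www.4567tv.tv/frim/index6.html, collecting the movie name, genre, and rating from the first-level page, and the release date, director, and runtime from the second-level page.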


Spider file: movie

# -*- coding: utf-8 -*-
import scrapy

from ..items import MovieItem


class MovieSpider(scrapy.Spider):
    name = 'movie'
    # allowed_domains = ['www.mv.com']
    start_urls = ['https://www.4567tv.tv/frim/index6.html']

    # Callback for the detail page: picks up the item passed along with the request
    def detail_parse(self, response):
        item = response.meta['item']
        desc = response.xpath('/html/body/div[1]/div/div/div/div[2]/p[5]/span[2]/text()').extract_first()
        item['desc'] = desc
        yield item

    def parse(self, response):
        li_list = response.xpath('//div[@class="stui-pannel_bd"]/ul/li')
        for li in li_list:
            name = li.xpath(".//h4[@class='title text-overflow']/a/text()").extract_first()
            href = li.xpath('.//h4[@class="title text-overflow"]/a/@href').extract_first()
            if href is None:
                # Skip entries without a detail link instead of failing on None
                continue
            # Resolve the (possibly relative) link against the current page URL
            detail_url = response.urljoin(href)
            item = MovieItem()
            item["name"] = name
            # meta is a dict; all of its key-value pairs are passed on to the given callback
            yield scrapy.Request(url=detail_url, callback=self.detail_parse, meta={'item': item})

movie.py