The target site is encoded in gb2312, so the scraped text came out garbled. The fix used here is a middleware, specifically a DownloaderMiddleware.

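# At the top of middlewares.py
from scrapy.http import HtmlResponse

    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.
        # Must either:
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest

        # Rebuild the response with an explicit encoding so that Scrapy
        # decodes the body as UTF-8 instead of guessing the encoding.
        response = HtmlResponse(url=response.url, body=response.body, encoding='utf-8')
        return response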

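In other words, the page is handled as UTF-8 at the download stage. The middleware also has to be activated, which is done by adding the following to the settings.py configuration file: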

DOWNLOADER_MIDDLEWARES = {'news.middlewares.NewsDownloaderMiddleware': 1000}
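
The priority value is worth a second look. Scrapy calls process_request in increasing order of priority and process_response in decreasing order, so a middleware registered at 1000 sees the response before the built-in ones do. The sketch below only simulates that ordering in plain Python. The helper name response_order is hypothetical, and the value 590 for HttpCompressionMiddleware is taken from Scrapy's default settings as I understand them, so check it against your installed version.

```python
def response_order(middlewares: dict) -> list:
    """Return middleware names in the order their process_response would run."""
    # process_response is invoked from the highest priority number to the lowest.
    ranked = sorted(middlewares.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked]


# Assumed setup: one built-in middleware plus the custom one from this post.
enabled = {
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 590,
    "news.middlewares.NewsDownloaderMiddleware": 1000,
}

for name in response_order(enabled):
    print(name)
```

If the server returns compressed responses, this ordering means the custom middleware would receive the body before it is decompressed, in which case a priority below 590 may be the safer choice.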

With this in place, the spider itself needs no extra transcoding, and Chinese text displays correctly.
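
The middleware above relabels the body as UTF-8 without changing its bytes. If the bytes really are gb2312-encoded, an explicit decode and re-encode may be needed instead. The following is a minimal sketch of that step using only the standard library. The helper name to_utf8 and its source_encoding parameter are assumptions for illustration, not part of the original middleware.

```python
def to_utf8(body: bytes, source_encoding: str = "gb2312") -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    text = body.decode(source_encoding, errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    raw = "中文".encode("gb2312")  # simulate a gb2312 page body
    print(to_utf8(raw).decode("utf-8"))
```

Inside process_response, the result of to_utf8(response.body) could be passed as the body argument of HtmlResponse in place of response.body, provided the body has not been left compressed by the server.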
