[Question title]: Scrapy spider not scraping correct div
[Posted]: 2018-08-17 16:52:08
[Question description]:
import scrapy
class rottenTomatoesSpider(scrapy.Spider):
    name = "movieList"
    start_urls = [
         'https://www.rottentomatoes.com/'
    ]

    def parse(self, response):
        for movieList in response.xpath('//div[@id="homepage-opening-this-week"]'):
            yield {
               'score': response.css('td.left_col').extract_first(),
               'title': response.css('td.middle_col').extract_first(),
               'openingDate': response.css('td.right_col right').extract_first()
            }

So the spider is scraping <div id='homepage-tv-top'> instead.

I assume the "homepage-" prefix is confusing the script. Does anyone know a workaround?

[Comments]:

    Tags: python html scrapy rotten-tomatoes


    [Solution 1]:

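    You need to iterate over each tr and, inside the for loop, call the selectors on movieList instead of response. response.css(...) searches the whole page, so extract_first() returns the first match in the document, which is presumably why the output came from homepage-tv-top rather than from the div the loop selected. Note also that 'td.right_col right' looks for a <right> element inside the cell; 'td.right_col' is the selector you want.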

    # Loop over the table rows of the target section and query each row, not the whole response
    for movieList in response.xpath('//div[@id="homepage-opening-this-week"]//tr'):
        yield {
           # '*::text' collects the text of nested elements; join the pieces into one string
           'score': "".join(movieList.css('td.left_col *::text').extract()),
           'title': "".join(movieList.css('td.middle_col *::text').extract()),
           'openingDate': "".join(movieList.css('td.right_col *::text').extract())
        }
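
    To see why the scope of the query matters, here is a minimal sketch that needs no Scrapy install. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors, and the markup, the names PAGE and cell_text, and the sample values are all made up for illustration; the real page is certainly more complex. The principle should carry over: a query run against the whole document returns the first match anywhere on the page, while a query run against the current row only sees that row.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page: two sections whose
# tables reuse the same cell class names.
PAGE = """
<html><body>
  <div id="homepage-tv-top">
    <table>
      <tr><td class="left_col">90%</td><td class="middle_col">Some TV Show</td></tr>
    </table>
  </div>
  <div id="homepage-opening-this-week">
    <table>
      <tr><td class="left_col">75%</td><td class="middle_col">Movie A</td></tr>
      <tr><td class="left_col">60%</td><td class="middle_col">Movie B</td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)


def cell_text(scope, css_class):
    """Return the text of the first <td> with this class found under `scope`."""
    cell = scope.find(f".//td[@class='{css_class}']")
    return "".join(cell.itertext()).strip() if cell is not None else ""


# Rows of the section we actually want.
rows = root.findall(".//div[@id='homepage-opening-this-week']//tr")

# Mistake analogous to the question: the loop runs over the right rows,
# but each lookup searches the whole document (like response.css).
unscoped = [cell_text(root, "middle_col") for _ in rows]

# Fix analogous to the answer: each lookup is relative to the current row
# (like movieList.css).
scoped = [cell_text(row, "middle_col") for row in rows]

print(unscoped)  # ['Some TV Show', 'Some TV Show']
print(scoped)    # ['Movie A', 'Movie B']
```

    In the unscoped version the loop variable is never used for the lookup, so every item repeats the first matching cell on the page, which in this made-up markup belongs to the TV section.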
    

    [Discussion]:
