Scrapy spider not scraping the correct div

【Question title】: Scrapy spider not scraping correct div
【Posted】: 2018-08-17 16:52:08
【Question】:

import scrapy
class rottenTomatoesSpider(scrapy.Spider):
    name = "movieList"
    start_urls = [
         'https://www.rottentomatoes.com/'
    ]

    def parse(self, response):
        for movieList in response.xpath('//div[@id="homepage-opening-this-week"]'):
            yield {
                'score': response.css('td.left_col').extract_first(),
                'title': response.css('td.middle_col').extract_first(),
                'openingDate': response.css('td.right_col right').extract_first()
            }

However, the spider ends up scraping <div id='homepage-tv-top'> instead.

I assume the shared homepage- prefix in the ids is confusing the script. Does anyone know a fix?

【Comments】:

Tags: python html scrapy rotten-tomatoes

【Solution 1】:

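You need to iterate over each tr and, inside the for loop, call the selectors on movieList instead of response. Calls on response search the whole page, so extract_first() returns the first matching td anywhere in the document (here, one inside homepage-tv-top) rather than one inside the div you are looping over.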

for movieList in response.xpath('//div[@id="homepage-opening-this-week"]//tr'):
    # Query relative to the current row (movieList), not the whole response.
    yield {
        'score': "".join(movieList.css('td.left_col *::text').extract()),
        'title': "".join(movieList.css('td.middle_col *::text').extract()),
        'openingDate': "".join(movieList.css('td.right_col *::text').extract())
    }

【Comments】: