【Posted】: 2019-01-24 23:14:31
【Question】:
This is the page I want to scrape. When I open it with SplashRequest, I get a different page from the same origin. These are my settings for Splash:
ROBOTSTXT_OBEY = False
SPLASH_URL = 'http://192.168.99.100:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
My spider code:
import scrapy
from scrapy_splash import SplashRequest


class RealForeclosure(scrapy.Spider):
    name = 'realForeclosure'
    start_urls = [
        'https://www.miamidade.realforeclose.com/index.cfm?zaction=user&zmethod=calendar'
    ]

    def parse(self, response):
        # Base URL of the auction preview page; the auction date is appended below.
        link = 'https://www.miamidade.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE='
        # Take the dayid attribute of the 11th matching calendar cell (index 10).
        date = response.xpath('//div[@tabindex="0"]/@dayid').extract()[10]
        # Render the auction page through Splash.
        yield SplashRequest(link + date, callback=self.auction)

    def auction(self, response):
        # Emit the raw HTML of each auction item.
        for i in response.css('.AUCTION_ITEM').extract():
            yield {'item': i}
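To check which URL is actually handed to Splash, the concatenation done in parse() can be reproduced outside Scrapy. This is only a minimal sketch: the helper names build_preview_url and query_params are hypothetical, and the dayid value '01/25/2019' is an assumed example, not something taken from the real calendar page.

```python
from urllib.parse import urlsplit, parse_qs

# Same base URL as in the spider's parse() method.
PREVIEW_BASE = (
    'https://www.miamidade.realforeclose.com/index.cfm?'
    'zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE='
)


def build_preview_url(day_id):
    """Concatenate the base URL and the day id, as parse() does."""
    return PREVIEW_BASE + day_id


def query_params(url):
    """Return the query-string parameters of url as a dict of lists."""
    return parse_qs(urlsplit(url).query)


# Assumed dayid value; the real format is whatever the calendar page emits.
url = build_preview_url('01/25/2019')
params = query_params(url)
print(url)
print(params)
```

Comparing the printed URL with the address shown in a regular browser is one way to tell whether the wrong page comes from the URL itself or from how the request is rendered.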
【Comments】:
-
Please post your spider code.
-
I have added the spider code.
Tags: python web-scraping scrapy scrapy-splash