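[Posted]: 2020-01-23 11:18:42
[Question]:
If I try to use response.request.url in my parse() method to yield the URL, it returns the Splash endpoint instead of the page URL:
http://192.168.99.100:8050/execute
Returning the URL from the Lua script works, but I don't know how to yield it in the parse() method.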
import scrapy
from scrapy_splash import SplashRequest


class ComputersSpider(scrapy.Spider):
    name = 'computers'
    # allowed_domains takes bare domain names, not URLs with a scheme
    allowed_domains = ['daraz.pk']
    start_urls = ['http://daraz.pk']

    script = '''
        function main(splash, args)
            splash.private_mode_enabled = false
            assert(splash:go(args.url))
            assert(splash:wait(1))
            input = assert(splash:select("#q"))
            input:focus()
            input:send_text("computers")
            button = assert(splash:select(".search-box__button--1oH7"))
            button:mouse_click()
            assert(splash:wait(6))
            splash:set_viewport_full()
            return {
                html = splash:html(),
                link = splash:url(), -- "I WANT TO YIELD THIS THING IN THE PARSE() METHOD"
            }
        end '''

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url=url,
                callback=self.parse,
                endpoint='execute',
                args={'wait': 3, 'lua_source': self.script},
            )

    def parse(self, response):
        # This is the line in question: it yields the Splash endpoint URL
        link = response.request.url
        yield {
            'URL': link,
        }
I also tried response.url, but it returns the start URL.
[Discussion]:
Tags: web-scraping scrapy scrapy-splash
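One possible approach, offered as a sketch rather than a confirmed fix: with the `execute` endpoint, scrapy-splash hands the callback a JSON response whose `.data` attribute holds the decoded table returned by `main()`, so the `link` key from the Lua script should be readable as `response.data['link']`. The helper name `final_url_from_splash`, the `SimpleNamespace` stand-in and the example.com URL below are made up for illustration only; in the spider, `response` would be the real Splash response.

```python
from types import SimpleNamespace


def final_url_from_splash(response):
    """Return the URL reported by the Lua script, else fall back to response.url.

    Assumes `response.data` is the decoded table returned by main(),
    as scrapy-splash provides for the 'execute' endpoint.
    """
    data = getattr(response, "data", None) or {}
    return data.get("link", response.url)


def parse(response):
    # Same shape as the spider's parse(), written as a plain function here
    yield {"URL": final_url_from_splash(response)}


# Hypothetical stand-in for a Splash response, used only for this demo
fake_response = SimpleNamespace(
    url="http://daraz.pk",
    data={"html": "<html></html>", "link": "http://example.com/search?q=computers"},
)
items = list(parse(fake_response))
```

Inside the spider this would reduce to `yield {'URL': response.data['link']}`. As far as I recall from the scrapy-splash README, naming the Lua key `url` instead of `link` also makes `response.url` pick up that value, but that is worth verifying against the installed version.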