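[Posted]: 2020-05-26 20:25:42
[Question]:
Can anyone tell me why ParseLinks and ParseContent are not being called? The rest runs and prints/appends/does its work, but I get tumbleweed from the two parse functions. Any further error information or suggestions are also welcome.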
import scrapy
import scrapy.shell
from scrapy.crawler import CrawlerProcess

Websites = ("https://www.flylevel.com/", "https://www.latam.com/en_us/")
links = []
D = {}
#D = {main website: links: content}

def dictlayout():
    for W in Websites:
        D[W] = []

dictlayout()

class spider(scrapy.Spider):
    name = "spider"
    start_urls = Websites
    print("request level 1")

    def start_requests(self):
        print("request level 2")
        for U in self.start_urls:
            print("request level 3")
            yield scrapy.Request(U, callback = self.ParseLinks)
        print("links: ")
        print(links)

    def ParseLinks(self, response):
        Link = response.xpath("/html//@href")
        Links = link.extract()
        print("parser print")
        print(link)
        for L in Links:
            link.append(L)
            D[W]=L
            yield response.follow(url=L, callback=self.ParseContent)

    def ParseContent(self, response):
        content = ParseLinks.extract_first().strip()
        D[W][L] = content
        print("content")
        print(content)
        print(D)
        print(links)

process = CrawlerProcess()
process.crawl(spider)
process.start()
[Discussion]:
Tags: function web-scraping callback scrapy