【Question Title】: Scraping the data from the links simultaneously along with the data from the main page in Scrapy
【Posted】: 2020-11-11 06:59:25
【Question Description】:

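I am trying to scrape quotes from this website: https://quotes.toscrape.com/

For each quote I want to scrape the author's name, the quote text, and the tags. I also want the spider to follow the "(about)" hyperlink in each quote block, scrape the author's description and date of birth from that page, and save everything together in a CSV file.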
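The goal is one CSV row per quote that merges fields from the listing page with fields from the author page. In Scrapy the file is normally written by the feed exports (for example `scrapy crawl quotes -o quotes.csv`). The standard-library sketch below only illustrates what such a row looks like. The function `quote_rows_to_csv` and the placeholder record are hypothetical. List-valued fields such as the tags have to be flattened for CSV. Joining them with commas mirrors the default of Scrapy's CsvItemExporter (`join_multivalued=","`).

```python
import csv
import io


def quote_rows_to_csv(rows):
    """Hypothetical helper: write merged quote + author records as CSV text."""
    fieldnames = ["author", "text", "tags", "dob", "bio"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Flatten list values (e.g. tags) into one comma-joined cell
        flat = {k: ",".join(v) if isinstance(v, list) else v for k, v in row.items()}
        writer.writerow(flat)
    return buf.getvalue()


# Placeholder values, not real scraped data
example = {
    "author": "Some Author",
    "text": "Some quote text",
    "tags": ["tag-a", "tag-b"],
    "dob": "January 1, 1900",
    "bio": "Placeholder biography.",
}
print(quote_rows_to_csv([example]))
```

Cells that contain a comma (the joined tags, the date) are automatically quoted by the csv module.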

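I have seen some similar questions about how to do this kind of thing, but I didn't fully understand them.

I would appreciate it if someone could explain how to solve this, including how to use meta/cb_kwargs.

Here is my code: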

import scrapy


class QuoteSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "https://quotes.toscrape.com/"
    ]

    def parse(self, response):
        for quote in response.css(".quote"):
            author_link = response.css(".quote span a::attr(href)")
            yield response.follow_all(author_link, callback=self.author_parse)
            yield {
                "author": quote.css(".author::text").get(),
                "text": quote.css(".text::text").get(),
                "tags": quote.css(".tags .tag::text").getall(),
            }

    def author_parse(self, response):
        yield {
            "dob": response.css(".author-born-date::text").get(),
            "bio": response.css(".author-description::text").get(),
        }
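Two problems keep this attempt from working.

First, inside the loop, `response.css(".quote span a::attr(href)")` selects the "(about)" link of every quote on the page rather than the current one. It needs to be `quote.css(...)`.

Second, `yield response.follow_all(...)` yields the generator returned by `follow_all` as a single object instead of the requests it produces. Scrapy does not iterate into it, so the author pages are never requested; `yield from` is needed there.

The second point is plain Python generator behaviour. It is sketched below with a hypothetical `follow_all_stub` in place of Scrapy's method:

```python
def follow_all_stub(urls):
    """Hypothetical stand-in for response.follow_all(): yields one 'request' per URL."""
    for url in urls:
        yield f"Request({url})"


def parse_wrong(urls):
    # Yields the generator object itself: the caller gets ONE object back
    yield follow_all_stub(urls)


def parse_right(urls):
    # `yield from` re-yields every item the inner generator produces
    yield from follow_all_stub(urls)


urls = ["/author/A", "/author/B"]
print(list(parse_wrong(urls)))  # a list holding a single generator object
print(list(parse_right(urls)))  # ['Request(/author/A)', 'Request(/author/B)']
```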

【Question Discussion】:

Tags: python web-scraping scrapy

【Solution 1】:

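Using cb_kwargs is currently the preferred approach. Since Scrapy 1.7, Request.cb_kwargs is the recommended way to pass your own data to a callback, while meta is left for communicating with middlewares and extensions. Older answers use meta and read the value back from response.meta.

Build the partial item from each quote, then pass it to the author-page callback. That callback receives the item as a keyword argument and fills in the remaining fields: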

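import scrapy


class QuoteSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        for quote in response.css(".quote"):
            # Take the "(about)" link of this quote only (quote.css, not response.css)
            author_link = quote.css("span a::attr(href)").get()
            author = {
                "author": quote.css(".author::text").get(),
                "text": quote.css(".text::text").get(),
                "tags": quote.css(".tags .tag::text").getall(),
            }
            # Pass the partly filled item along via cb_kwargs; dont_filter=True keeps the
            # duplicate filter from dropping repeat requests for an author with several quotes
            yield response.follow(author_link, callback=self.author_parse,
                                  cb_kwargs={"author": author}, dont_filter=True)

    def author_parse(self, response, author):
        # Each cb_kwargs entry arrives as a keyword argument of the callback
        author["dob"] = response.css(".author-born-date::text").get()
        author["bio"] = response.css(".author-description::text").get(default="").strip()
        yield author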
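To see what cb_kwargs and dont_filter do, here is a toy, pure-Python model of the scheduling loop. It is not Scrapy code: `ToyRequest`, `toy_crawl` and the fake `PAGES` site are hypothetical.

The model shows two things:
- Each cb_kwargs entry is passed to the callback as a keyword argument.
- A second quote by the same author is lost unless dont_filter=True.

Real Scrapy deduplicates by request fingerprint rather than by raw URL, so this is a simplification.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyRequest:
    # Hypothetical stand-in for scrapy.Request with only the fields used here
    url: str
    callback: Callable
    cb_kwargs: dict = field(default_factory=dict)
    dont_filter: bool = False


def toy_crawl(start, pages):
    """Process requests FIFO; `pages` maps URL -> fake response content."""
    queue, seen, items = [start], set(), []
    while queue:
        req = queue.pop(0)
        # Simplified duplicate filter: skip already-fetched URLs unless dont_filter
        if req.url in seen and not req.dont_filter:
            continue
        seen.add(req.url)
        # Call the callback with cb_kwargs unpacked as keyword arguments
        for out in req.callback(pages[req.url], **req.cb_kwargs):
            if isinstance(out, ToyRequest):
                queue.append(out)
            else:
                items.append(out)
    return items


# Fake site: the listing page holds (quote text, author link) pairs
PAGES = {
    "/": [("Quote 1", "/author/x"), ("Quote 2", "/author/x"), ("Quote 3", "/author/y")],
    "/author/x": "born 1900",
    "/author/y": "born 1950",
}


def author_parse(page, item):
    item["dob"] = page  # `item` arrived via cb_kwargs
    yield item


def run(dont_filter):
    def parse(page):
        for text, link in page:
            item = {"text": text}  # a fresh dict per quote, never shared
            yield ToyRequest(link, author_parse, {"item": item}, dont_filter)
    return toy_crawl(ToyRequest("/", parse), PAGES)


print(run(dont_filter=False))  # Quote 2 is lost: its author URL was already seen
print(run(dont_filter=True))   # all three quotes come back with a "dob"
```

Creating a new dict per quote matters. If one dict were reused across iterations, every callback would mutate the same object.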

【Discussion】: