Python scrapy yield to .json file not working

【Question title】: Python scrapy yield to .json file not working
【Posted】: 2022-09-30 21:09:58
【Question】:

I want to use Scrapy to extract the titles of the different books at a URL and store them in a JSON file as an array of dictionaries.

Here is my code:

import scrapy

class BooksSpider(scrapy.Spider):
    name = "books"
    star_urls = [
        "http://books.toscrape.com"
    ]

def parse(self, response):
    titles = response.css("article.product_pod h3 a::attr(title)").getall()
    for title in titles:
        yield {"title": title}

Here is what I type in the terminal:

scrapy crawl books -o books.json

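The books.json file is created, but it is empty.

I checked that I am in the right directory and venv, but it still does not work.

However:

Earlier, I ran this spider to scrape the whole HTML page and write it to a books.html file, and everything worked.

Here is my code: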

import scrapy

class BooksSpider(scrapy.Spider):
    name = "books"
    star_urls = [
        "http://books.toscrape.com"
    ]
    def parse(self, response):
        with open("books.html", "wb") as file:
            file.write(response.body)

Here is what I type in the terminal:

scrapy crawl books

Any ideas about what I am doing wrong? Thanks.

Edit:

Entering response.css('article.product_pod h3 a::attr(title)').getall()

in the Scrapy shell outputs:

['A Light in the Attic', 'Tipping the Velvet', 'Soumission', 'Sharp Objects', 'Sapiens: A Brief History of Humankind', 'The Requiem Red', 'The Dirty Little Secrets of Getting Your Dream Job', 'The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull', 'The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics', 'The Black Maria', 'Starving Hearts (Triangular Trade Trilogy, #1)', "Shakespeare's Sonnets", 'Set Me Free', "Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)", 'Rip it Up and Start Again', 'Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991', 'Olio', 'Mesaerion: The Best Science Fiction Stories 1800-1849', 'Libertarianism for Beginners', "It's Only the Himalayas"]

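Have you verified, with a debugger or a print() call, that your .getall() actually returns something?
I tried it in the Scrapy shell first and got a list of titles, so it does return something.

Tags: python json python-3.x macos scrapy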

【Solution 1】:

Run the code below. It should work.

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):

        titles = response.css('.product_pod')
        for title in titles:
            yield {
                "title": title.css('h3 a::attr(title)').get()
                #"title": title.css('h3 a::text').get()
            }
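
The answer does not say what it changed, so here is one plausible reading, offered as a guess rather than a confirmed diagnosis. Compared with the question's code, this spider spells the attribute start_urls (the question has star_urls) and indents parse inside the class (the question's first listing defines it at module level). Scrapy reads its initial URLs from the attribute named exactly start_urls, and the base Spider.parse raises NotImplementedError unless the subclass overrides it. The sketch below imitates that behaviour with MiniSpider, a hypothetical and heavily simplified stand-in written only for illustration. It is not Scrapy's real class and needs no Scrapy install.

```python
class MiniSpider:
    """Hypothetical, simplified stand-in for scrapy.Spider (illustration only)."""

    start_urls = []  # default used when a subclass does not define start_urls

    def start_requests(self):
        # Only the attribute named exactly `start_urls` is consulted.
        for url in self.start_urls:
            yield url

    def parse(self, response):
        # The base class has no parsing logic of its own.
        raise NotImplementedError("parse callback is not defined")


class MisspelledSpider(MiniSpider):
    # Typo: this creates an unrelated attribute; start_urls stays empty.
    star_urls = ["http://books.toscrape.com"]


class DedentedParseSpider(MiniSpider):
    start_urls = ["http://books.toscrape.com"]


def parse(self, response):
    # Module-level function: it is NOT a method of DedentedParseSpider.
    yield {"title": response}


class CorrectSpider(MiniSpider):
    start_urls = ["http://books.toscrape.com"]

    def parse(self, response):
        # Indented inside the class, so it overrides MiniSpider.parse.
        yield {"title": response}


misspelled_requests = list(MisspelledSpider().start_requests())
correct_requests = list(CorrectSpider().start_requests())

print(misspelled_requests)  # → []
print(correct_requests)     # → ['http://books.toscrape.com']
```

Under these assumptions, a spider with the misspelled attribute schedules no requests at all, so the crawl finishes without yielding a single item and the feed file has nothing to hold. If the file is still empty with corrected code, the crawl log (request count, item count, any NotImplementedError traceback) is the next thing to check.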

【Discussion】:

Thanks for the suggestion, but the JSON file is still empty. Do you know what the cause might be?
Terminal command to run: scrapy crawl quotes -o data.json
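
When a crawl finishes but the output looks empty, a quick check of the feed file can help separate "nothing was exported" from "the file is just hard to read". This is a minimal sketch, assuming the feed is a single JSON array such as the data.json named in the command above. count_exported_items is a hypothetical helper written for this note, not part of Scrapy.

```python
import json
from pathlib import Path


def count_exported_items(path):
    """Return the number of items in a JSON feed file, or 0 if it is empty."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        # A zero-byte (or whitespace-only) file means no items were exported.
        return 0
    # Assumes the file contains one JSON array of item dictionaries.
    return len(json.loads(text))
```

For example, count_exported_items("data.json") returning 0 right after a crawl suggests the spider yielded no items, which points back at the spider definition rather than at the export step.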