[Posted]: 2022-09-30 21:09:58
[Question]:
I want to use Scrapy to extract the titles of the books listed at a URL and output/store them in a JSON file as an array of dictionaries.
Here is my code:
import scrapy
class BooksSpider(scrapy.Spider):
    name = "books"
    star_urls = [
        "http://books.toscrape.com"
    ]
    def parse(self, response):
        titles = response.css("article.product_pod h3 a::attr(title)").getall()
        for title in titles:
            yield {"title": title}
Here is what I typed in the terminal:
scrapy crawl books -o books.json
The books.json file is created, but it is empty.
I checked that I'm in the correct directory and venv, but it still doesn't work.
However:
Earlier, I ran this spider to scrape the page's entire HTML and write it to a books.html file, and that worked fine.
Here is that code:
import scrapy
class BooksSpider(scrapy.Spider):
    name = "books"
    star_urls = [
        "http://books.toscrape.com"
    ]
    def parse(self, response):
        with open("books.html", "wb") as file:
            file.write(response.body)
Here is what I typed in the terminal:
scrapy crawl books
Any ideas what I'm doing wrong? Thanks
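A plausible explanation, offered as a hedged guess: both spiders define `star_urls`, while Scrapy's default `start_requests()` only iterates over `start_urls`. As far as I know, `scrapy.Spider.__init__` sets `start_urls` to an empty list when a subclass doesn't define it, and Python raises no error for an unused, misspelled class attribute. The spider therefore schedules no requests, `parse()` is never called, and the feed has nothing to write. The snippet below is a simplified pure-Python model of that lookup, not Scrapy's actual source; `MiniSpider`, `TypoSpider` and `FixedSpider` are hypothetical names.

```python
class MiniSpider:
    """Simplified stand-in for scrapy.Spider (assumption: it mirrors only
    how start_urls is handled, not the real implementation)."""

    def __init__(self):
        # Like scrapy.Spider, fall back to an empty list when the
        # subclass did not define `start_urls`.
        if not hasattr(self, "start_urls"):
            self.start_urls = []

    def start_requests(self):
        # Only `start_urls` is consulted; any other attribute is ignored.
        for url in self.start_urls:
            yield url  # the real Spider yields scrapy.Request objects here


class TypoSpider(MiniSpider):
    star_urls = ["http://books.toscrape.com"]  # misspelled, so never read


class FixedSpider(MiniSpider):
    start_urls = ["http://books.toscrape.com"]


print(list(TypoSpider().start_requests()))   # []
print(list(FixedSpider().start_requests()))  # ['http://books.toscrape.com']
```

If the HTML-dumping spider really ran with the same spelling, the same reasoning says it should not have written anything either, so a books.html left over from an earlier run is worth ruling out; that is a guess, not something the post confirms.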
Edit:
Entering response.css('article.product_pod h3 a::attr(title)').getall()
into scrapy shell outputs:
['A Light in the Attic', 'Tipping the Velvet', 'Soumission', 'Sharp Objects', 'Sapiens: A Brief History of Humankind', 'The Requiem Red', 'The Dirty Little Secrets of Getting Your Dream Job', 'The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull', 'The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics', 'The Black Maria', 'Starving Hearts (Triangular Trade Trilogy, #1)', "Shakespeare's Sonnets", 'Set Me Free', "Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)", 'Rip it Up and Start Again', 'Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991', 'Olio', 'Mesaerion: The Best Science Fiction Stories 1800-1849', 'Libertarianism for Beginners', "It's Only the Himalayas"]
-
Have you verified, with a debugger or a print() call, that your .getall() actually returns anything?
-
I ran it in scrapy shell first and got a list of titles, so it does return something.
Tags: python json python-3.x macos scrapy