Posted: 2021-08-20 21:49:48
Question:
import scrapy


class BestBooksSpider(scrapy.Spider):
    name = 'best_books'
    page_num = 2
    allowed_domains = [
        'www.goodreads.com/list/show/1.Best_Books_Ever?page=1']
    start_urls = [
        'https://www.goodreads.com/list/show/1.Best_Books_Ever?page=1']

    def parse(self, response):
        page_num = 2
        for books in response.xpath('//tr'):
            yield {
                'Title': books.css('a.bookTitle span::text').get(),
                'Author': books.css('a.authorName *::text').get(),
                'Rating': books.css('span.minirating::text').get(),
            }
        # this part is not working, won't read past page 1
        next_page = 'https://www.goodreads.com/list/show/1.Best_Books_Ever?page=' + \
            str(BestBooksSpider.page_num)
        if BestBooksSpider.page_num < 3:
            BestBooksSpider.page_num += 1
            yield response.follow(next_page, callback=self.parse)
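The first page works fine, but the spider never reads the following pages. I have tried many variations of this code from other tutorials without success. Scrapy does not report any errors; it just says it has finished.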
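The pagination above keeps its state in a class-level counter (`BestBooksSpider.page_num`), which is easy to get out of step with the page actually being parsed. A minimal sketch of an alternative, assuming the page number always appears as the `page` query parameter: derive the next URL from the current one. `next_page_url` and `last_page` are hypothetical names introduced here, not part of Scrapy.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def next_page_url(current_url: str, last_page: int) -> Optional[str]:
    """Return the URL of the page after current_url, or None once last_page is reached.

    Hypothetical helper: reads the 'page' query parameter (defaulting to 1
    when it is absent) instead of keeping a counter on the spider class.
    """
    parts = urlparse(current_url)
    query = parse_qs(parts.query)
    page = int(query.get("page", ["1"])[0])
    if page >= last_page:
        return None  # stop paginating
    query["page"] = [str(page + 1)]
    # Rebuild the URL with only the query string changed.
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))
```

Inside `parse`, this would be used as `url = next_page_url(response.url, 3)` followed by `if url: yield response.follow(url, callback=self.parse)`. Note that this only changes how the next URL is computed; it does not by itself explain why page 2 is never fetched.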
Comments:
- What does the log say? For a start, your allowed_domains is wrong...
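The comment points at `allowed_domains`: it should contain bare domain names, but here it holds a full URL. Scrapy's offsite filtering compares only the host of each follow-up request (here `www.goodreads.com`) against a regex built from `allowed_domains`, so a path and query string in that list can never match, and the request for page 2 is silently dropped (at DEBUG level the log usually shows a "Filtered offsite request" line). The start URL is not affected, which fits "page 1 works". The sketch below is a simplified approximation of that host check, written for illustration; it is not Scrapy's actual source, and `host_regex`/`is_allowed` are hypothetical names.

```python
import re
from urllib.parse import urlparse


def host_regex(allowed_domains):
    """Simplified approximation of how Scrapy's offsite filter turns
    allowed_domains into a regex: the host must equal an entry or be a subdomain of it."""
    domains = [re.escape(d) for d in allowed_domains]
    return re.compile(rf"^(.*\.)?({'|'.join(domains)})$")


def is_allowed(url, allowed_domains):
    # Only the host name of the request URL is compared, never the path or query.
    host = urlparse(url).hostname or ""
    return bool(host_regex(allowed_domains).search(host))
```

Under this model, the likely fix is `allowed_domains = ['www.goodreads.com']` (or `['goodreads.com']`), with the full URL staying in `start_urls`.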
Tags: python pagination scrapy