Scrapy follow urls based on condition

【Question title】: Scrapy follow urls based on condition
【Posted】: 2021-04-12 11:05:02
【Question description】:

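I am using Scrapy and I want to extract every topic that has at least 4 posts. I have two separate selectors:

real_url_list, to get the href of each topic

nbpostsintopic_resp, to get the number of posts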

real_url_list = response.css("td.col-xs-8 a::attr(href)").getall()
for topic in real_url_list:
    nbpostsintopic_resp = response.css("td.center ::text").get()
    nbpostsintopic = nbpostsintopic_resp[0]
    if int(nbpostsintopic) > 4:
        yield response.follow(topic, callback=self.topic)

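URL: https://www.allodocteurs.fr/forums-et-chats/forums/allergies/allergies-aux-pollens/

Unfortunately, the code above does not work as expected: the number of posts does not seem to be taken into account. Is there a way to apply such a condition?

Thank you in advance.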

【Question comments】:

Tags: python web-scraping scrapy

【Solution 1】:

Your problem is in this line:

nbpostsintopic_resp = response.css("td.center ::text").get()

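Note that this always gives you the same result: it queries the whole response and never refers to your topic variable, so .get() returns the first matching cell on the page in every iteration.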
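
To see why the count never changes, the same pattern can be reproduced offline. The following is a minimal sketch that uses Python's standard-library `xml.etree.ElementTree` as a stand-in for Scrapy selectors (an assumption, chosen so that it runs without Scrapy or network access). The markup, hrefs, and counts are invented for illustration. Here `table.find(...)` plays the role of `response.css(...).get()`: both return the first match in the whole document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real forum page is more complex.
HTML = """
<table><tbody>
  <tr><td class="col-xs-8"><a href="/topic-a">A</a></td><td class="center">2</td></tr>
  <tr><td class="col-xs-8"><a href="/topic-b">B</a></td><td class="center">7</td></tr>
  <tr><td class="col-xs-8"><a href="/topic-c">C</a></td><td class="center">5</td></tr>
</tbody></table>
"""

table = ET.fromstring(HTML)

# All topic links, analogous to real_url_list in the question.
hrefs = [a.get("href") for a in table.findall(".//td[@class='col-xs-8']/a")]

seen = []
for href in hrefs:
    # Page-level query: it does not depend on href, so it always finds
    # the first td.center cell in the document.
    first_count = table.find(".//td[@class='center']").text
    seen.append((href, first_count))

print(seen)
# [('/topic-a', '2'), ('/topic-b', '2'), ('/topic-c', '2')]
```

Every topic is paired with the count of the first row, which is why the condition appears to be ignored.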

Instead, loop over the tr selectors and extract both the post count and the link from each row:

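def parse(self, response):
    # Iterate over rows so the count and the link belong to the same topic.
    for row in response.css("tbody > tr"):
        nbpostsintopic_resp = row.css("td.center::text").get()
        # Skip rows that have no numeric post count.
        if nbpostsintopic_resp and nbpostsintopic_resp.strip().isdigit():
            # The question's code uses > 4; use >= 4 if "at least 4" is meant.
            if int(nbpostsintopic_resp) > 4:
                # The request must be yielded, otherwise it is never scheduled.
                yield response.follow(row.css("td > a")[0], callback=self.topic)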
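
The filtering step can also be kept separate from the spider, so that it is testable without a network request. This is a sketch and not part of the original answer: `parse_post_count` and `select_topics` are hypothetical helper names, and the sample rows are invented. It assumes each row has already been reduced to an `(href, count_text)` pair, for example from `row.css("td > a::attr(href)").get()` and `row.css("td.center::text").get()`. It converts the whole cell text rather than only its first character, as `nbpostsintopic_resp[0]` does in the question.

```python
from typing import Iterable, Iterator, Optional, Tuple


def parse_post_count(text: Optional[str]) -> Optional[int]:
    """Return the post count as an int, or None if the cell has no number."""
    if text is None:
        return None
    try:
        # int() tolerates surrounding whitespace such as " 2 ".
        return int(text)
    except ValueError:
        return None


def select_topics(
    rows: Iterable[Tuple[Optional[str], Optional[str]]],
    min_posts: int = 4,
) -> Iterator[str]:
    """Yield hrefs of rows whose post count is at least min_posts."""
    for href, count_text in rows:
        count = parse_post_count(count_text)
        if href and count is not None and count >= min_posts:
            yield href


# Invented sample data standing in for rows scraped from the page.
rows = [
    ("/topic-a", " 2 "),
    ("/topic-b", "7"),
    (None, "9"),         # row without a link
    ("/topic-c", None),  # row without a count cell
    ("/topic-d", "4"),
]
selected = list(select_topics(rows))
print(selected)
# ['/topic-b', '/topic-d']
```

Inside `parse`, each selected href could then be passed to `response.follow(href, callback=self.topic)` and yielded. The default `min_posts=4` with `>=` follows the "at least 4 posts" wording of the question, whereas the posted code compares with `> 4`.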

【Comments】: