Start_urls not getting parsed

【Question title】: Start_urls not getting parsed
【Posted】: 2021-09-24 00:56:58
【Question description】:

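The code below is very close to what you see in most tutorials that use Scrapy's FormRequest, but for some reason, no matter what variation I try, I can't get it to work. My understanding (perhaps I'm wrong) is that each URL in start_urls should be handed to the parse function, which starts the process of scraping the site. Whenever I run this script, it just sets start_urls to the URL and then treats parse as a function that is never called (it skips it). I'm not sure what I'm doing wrong, but it's driving me crazy!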

import requests
import scrapy
from scrapy import Spider
from scrapy.http import FormRequest

def authentication_failed(response):
    # TODO: Check the contents of the response and return True if it failed
    # or False if it succeeded.
    pass

class LoginSpider(scrapy.Spider):
    name = 'example.com'
    start_urls = ["https://app.hubspot.com/login"]

    def parse(self, response):
        f = open("/PATH/auth.txt", "r")
        lines = f.readlines()
        username = lines[0]
        password = lines[1]
        f.close()
        yield scrapy.FormRequest.from_response(
            response,
            formdata={'email': username, 'password': password},
            callback=self.after_login(self, response)
        )

    def after_login(self, response):
        if authentication_failed(response):
            self.logger.error("Login failed")
            return

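【Question discussion】:

Please post the output of the code.
There's nothing to post. The script doesn't return anything, otherwise I would post it. The script runs, then finishes, with no errors and no output.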

Tags: python python-3.x web-scraping scrapy

【Solution 1】:

It is being passed to the parse function. This is the page:

It uses JavaScript to check your browser, so try using a headless browser.

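【Discussion】:

I'm curious how you figured that out? When I run the script above I get nothing, and when I set a breakpoint while debugging, it never breaks inside the parse method.
You can open it with scrapy shell and then type 'view(response)' to view it in your browser, or add this to your code: 'from scrapy.shell import inspect_response' followed by 'inspect_response(response, self)'.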