Closing database connection from pipeline and middleware in Scrapy (answers)

【Question title】: Closing database connection from pipeline and middleware in Scrapy
【Posted】: 2013-05-23 10:10:25
【Question description】:

I have a Scrapy project that uses a custom middleware and a custom pipeline to check and store entries in a Postgres database. The middleware looks a bit like this:

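class ExistingLinkCheckMiddleware(object):

    def __init__(self):
        # ... open connection to the database
        pass

    def process_request(self, request, spider):
        # ... before each request, check in the DB
        # that the page hasn't been scraped before
        pass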

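The pipeline looks similar:

class MachineLearningPipeline(object):

    def __init__(self):
        # ... open connection to the database
        pass

    def process_item(self, item, spider):
        # ... save the item to the database
        return item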

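It works fine, but it bugs me that I can't find a way to cleanly close these database connections when the spider finishes.

Does anyone know how to do that?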

【Question discussion】:

Tags: python web-scraping scrapy scrapy-pipeline

【Solution 1】:

I think the best way to do it is to use Scrapy's spider_closed signal, for example:

from scrapy import signals
from scrapy.xlib.pydispatch import dispatcher

class ExistingLinkCheckMiddleware(object):

    def __init__(self):
        # open connection to database

        # call self.spider_closed when the spider_closed signal fires
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider, reason):
        # close db connection
        pass

    def process_request(self, request, spider):
        # before each request check in the DB
        # that the page hasn't been scraped before
        pass
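The `scrapy.xlib.pydispatch` import above comes from older Scrapy versions and was later deprecated. A minimal sketch of the same idea in the style the Scrapy docs use for middlewares and extensions: build the middleware in a `from_crawler` classmethod and connect the handler through `crawler.signals`. Assumptions: sqlite3 stands in for the question's Postgres connection, and `db_path` is a hypothetical parameter.

```python
import sqlite3


class ExistingLinkCheckMiddleware(object):
    """Downloader middleware sketch that owns one DB connection per crawl."""

    def __init__(self, db_path=":memory:"):
        # Hypothetical stand-in: sqlite3 instead of the question's Postgres DB.
        self.conn = sqlite3.connect(db_path)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy uses this factory to build the middleware. The import is
        # local so the class can be created without Scrapy being installed.
        from scrapy import signals

        middleware = cls()
        # Have Scrapy call spider_closed when the spider finishes.
        crawler.signals.connect(middleware.spider_closed,
                                signal=signals.spider_closed)
        return middleware

    def process_request(self, request, spider):
        # The "already scraped?" lookup against self.conn goes here.
        # Returning None lets Scrapy continue processing the request.
        return None

    def spider_closed(self, spider, reason):
        # Scrapy passes the spider and the close reason (e.g. "finished").
        self.conn.close()
```

The middleware is enabled through `DOWNLOADER_MIDDLEWARES` in the project settings as usual; Scrapy then calls `from_crawler` instead of the plain constructor.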

Hope that helps.

【Discussion】:

I didn't know about the spider_closed signal. It's perfect, thanks!
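For the pipeline side, a signal is not strictly needed: Scrapy calls an item pipeline's `open_spider(spider)` and `close_spider(spider)` methods, if they are defined, when the spider is opened and closed. A minimal sketch of the question's pipeline using those hooks, again assuming sqlite3 as a stand-in for Postgres; the `db_path` parameter, the `items` table, and the `url`/`title` item fields are hypothetical.

```python
import sqlite3


class MachineLearningPipeline(object):
    """Item pipeline sketch: one DB connection per spider run."""

    def __init__(self, db_path=":memory:"):
        # Hypothetical stand-in: sqlite3 instead of the question's Postgres DB.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called by Scrapy when the spider is opened.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (url TEXT PRIMARY KEY, title TEXT)"
        )

    def process_item(self, item, spider):
        # Hypothetical schema: assumes items carry 'url' and 'title' fields.
        self.conn.execute(
            "INSERT OR REPLACE INTO items (url, title) VALUES (?, ?)",
            (item["url"], item["title"]),
        )
        return item

    def close_spider(self, spider):
        # Called by Scrapy when the spider is closed: commit, then release.
        self.conn.commit()
        self.conn.close()
```

Opening the connection in `open_spider` rather than `__init__` keeps both ends of the connection's lifetime tied to the spider run.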