【Question title】: How to pass system command line arguments to the Scrapy CrawlerProcess?
【Posted】: 2017-09-12 12:52:20
【Question】:

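I have a Scrapy spider to which I pass system arguments through the scrapy crawl command. I am trying to run this spider with CrawlerProcess instead of from the command line. How can I pass all the same command-line arguments to this crawler process?

scrapy crawl example -o data.jl -t jsonlines -s JOBDIR=/crawlstate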

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
# How do I pass arguments like -o data.jl -t jsonlines -s JOBDIR=/crawlstate here?
process.crawl('example')
process.start()

【Comments】:

    Tags: python-2.7 scrapy


    【Solution 1】:

    You can modify the project settings before passing them to the CrawlerProcess constructor:

    ...
    settings = get_project_settings()
    settings.set('FEED_URI', 'data.jl', priority='cmdline')
    settings.set('FEED_FORMAT', 'jsonlines', priority='cmdline')
    settings.set('JOBDIR', '/crawlstate', priority='cmdline')
    process = CrawlerProcess(settings)
    ...
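To reuse the exact argument list from the question, the options can be parsed first and then applied to the settings in one step. The following is only a sketch under some assumptions: `cli_args_to_settings` and `run` are hypothetical helper names, not part of Scrapy, and only `-o`, `-t` and `-s NAME=VALUE` are handled. It maps `-o` and `-t` to `FEED_URI` and `FEED_FORMAT` as in the answer above; Scrapy 2.1 and later deprecate those two settings in favour of `FEEDS`, so newer projects would likely need a different mapping.

```python
import argparse


def cli_args_to_settings(argv):
    """Translate `scrapy crawl`-style options into (spider_name, settings dict).

    Hypothetical helper, not part of Scrapy. Handles only -o, -t and -s.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("spider")
    parser.add_argument("-o", dest="output")
    parser.add_argument("-t", dest="output_format")
    parser.add_argument("-s", dest="overrides", action="append", default=[],
                        metavar="NAME=VALUE")
    args = parser.parse_args(argv)

    settings = {}
    if args.output:
        settings["FEED_URI"] = args.output
    if args.output_format:
        settings["FEED_FORMAT"] = args.output_format
    for pair in args.overrides:
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = pair.partition("=")
        if not sep:
            parser.error("-s expects NAME=VALUE, got %r" % pair)
        settings[name] = value
    return args.spider, settings


def run(argv):
    """Run the named spider in-process with the parsed overrides applied."""
    # Imported here so the parsing helper can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    spider_name, overrides = cli_args_to_settings(argv)
    settings = get_project_settings()
    # setdict stores every key/value pair at the given priority.
    settings.setdict(overrides, priority="cmdline")

    process = CrawlerProcess(settings)
    process.crawl(spider_name)
    process.start()  # blocks until the crawl finishes
```

Calling `run(["example", "-o", "data.jl", "-t", "jsonlines", "-s", "JOBDIR=/crawlstate"])` from inside the project directory is intended to behave like the command line in the question. Passing `sys.argv[1:]` instead would let the script accept the same options directly.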
    

    【Discussion】:
