Posted: 2018-05-13 03:29:54
Question:
I want to write a scheduler script that runs the same spider several times in sequence.
So far I have the following:
#!/usr/bin/python3
"""Scheduler for spiders."""
import time

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from my_project.spiders.deals import DealsSpider


def crawl_job():
    """Job to start spiders."""
    settings = get_project_settings()
    process = CrawlerProcess(settings)
    process.crawl(DealsSpider)
    process.start()  # the script will block here until the end of the crawl


if __name__ == '__main__':
    while True:
        crawl_job()
        time.sleep(30)  # wait 30 seconds then crawl again
The spider runs correctly the first time. After the delay it starts again, but before it begins crawling I get the following error:
Traceback (most recent call last):
  File "scheduler.py", line 27, in <module>
    crawl_job()
  File "scheduler.py", line 17, in crawl_job
    process.start() # the script will block here until the end of the crawl
  File "/usr/local/lib/python3.5/dist-packages/scrapy/crawler.py", line 285, in start
    reactor.run(installSignalHandlers=False) # blocking call
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 1193, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 1173, in startRunning
    ReactorBase.startRunning(self)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 684, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
Unfortunately, I'm not familiar with the Twisted framework and its reactors, so any help would be appreciated!
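The traceback shows the second `process.start()` reaching `reactor.run()`, and Twisted's reactor cannot be started again once it has run and stopped in the same process. Scrapy's documentation describes keeping a single reactor running and scheduling crawls through `CrawlerRunner`. A simpler workaround, sketched below under stated assumptions, is to launch each crawl in a fresh Python process so that every run gets its own reactor. The sketch assumes the spider's `name` attribute is `"deals"` (not shown in the question) and that the script is started from the Scrapy project directory; `run_crawl`, `schedule` and `CRAWL_CMD` are hypothetical helper names, not part of Scrapy.

```python
import subprocess
import sys
import time

# Hypothetical command: assumes DealsSpider's `name` attribute is "deals" and
# that the script is started from the Scrapy project directory.
# sys.executable runs Scrapy with the same interpreter / virtualenv.
CRAWL_CMD = [sys.executable, "-m", "scrapy", "crawl", "deals"]


def run_crawl(cmd=CRAWL_CMD):
    """Run one crawl in a separate Python process and return its exit code.

    The child process starts (and stops) its own Twisted reactor, so the
    parent process never calls reactor.run() a second time.
    """
    return subprocess.run(cmd).returncode


def schedule(cmd=CRAWL_CMD, interval=30.0, runs=None):
    """Run crawls one after another, sleeping `interval` seconds between them.

    runs=None repeats forever; an integer stops after that many crawls.
    Returns the exit code of every crawl that was started.
    """
    codes = []
    while runs is None or len(codes) < runs:
        if codes:
            time.sleep(interval)  # pause between crawls, not before the first
        codes.append(run_crawl(cmd))
    return codes
```

The question's `if __name__ == '__main__':` block would then call `schedule()` instead of looping over `crawl_job()`; passing for example `runs=3` stops after three crawls and returns their exit codes, where a non-zero value indicates a failed crawl. The trade-off is one interpreter start-up and one settings load per crawl, in exchange for not having to manage the reactor's lifetime yourself.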
Discussion:
Tags: python-3.x web-scraping scrapy twisted