【Question title】:Integration test of a Scrapy pipeline returning a Deferred
【Posted】:2016-06-05 00:44:02
【Question】:

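Is it possible to create an integration test for a Scrapy pipeline? I can't figure out how to do it. Specifically, I am trying to write a test for the FilesPipeline, and I also want it to persist my mocked response to Amazon S3.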

Here is my test:

def _mocked_download_func(request, info):
    # Use request.url here; no `response` object exists in this scope.
    return Response(url=request.url, status=200, body="test", request=request)

class FilesPipelineTests(unittest.TestCase):

    def setUp(self):
        self.settings = get_project_settings()
        crawler = Crawler(self.settings)
        crawler.configure()
        self.pipeline = FilesPipeline.from_crawler(crawler)
        self.pipeline.open_spider(None)
        self.pipeline.download_func = _mocked_download_func

    @defer.inlineCallbacks
    def test_file_should_be_directly_available_from_s3_when_processed(self):
        item = CrawlResult()
        item['id'] = "test"
        item['file_urls'] = ['http://localhost/test']
        result = yield self.pipeline.process_item(item, None)
        self.assertEqual(result['files'][0]['path'], "full/002338a87aab86c6b37ffa22100504ad1262f21b")

I always get the following error:

DirtyReactorAggregateError: Reactor was unclean.

How do I create a proper test using Twisted and Scrapy?

【Question discussion】:

    Tags: python-2.7 scrapy twisted nose


    【Solution 1】:

    For now I write my pipeline tests without calling from_crawler. That is not ideal, because the tests don't exercise from_crawler itself, but they work.

    I run them with an empty Spider instance:

    from scrapy.spiders import Spider
    # some other imports for my own stuff and standard libs
    
    @pytest.fixture
    def mqtt_client():
        client = mock.Mock()
    
        return client
    
    def test_mqtt_pipeline_does_return_item_after_process(mqtt_client):
        spider = Spider(name='spider')
        pipeline = MqttOutputPipeline(mqtt_client, 'dummy-namespace')
    
        item = BasicItem()
        item['url'] = 'http://example.com/'
        item['source'] = 'dummy source'
    
        ret = pipeline.process_item(item, spider)
    
        assert ret is not None
    

    (Admittedly, I forgot to call open_spider() in this test.)
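    A minimal sketch of the same kind of test with the open_spider() and close_spider() calls included. The pipeline class `PublishingPipeline` and its client methods (`connect`, `publish`, `disconnect`) are hypothetical stand-ins, not part of Scrapy or of the pipeline above, and a `mock.Mock()` replaces the spider so the sketch runs without Scrapy installed:

```python
from unittest import mock


class PublishingPipeline:
    """Hypothetical pipeline, used only for illustration."""

    def __init__(self, client, namespace):
        self.client = client
        self.namespace = namespace

    def open_spider(self, spider):
        # Acquire resources when the spider starts.
        self.client.connect()

    def close_spider(self, spider):
        # Release resources when the spider finishes.
        self.client.disconnect()

    def process_item(self, item, spider):
        self.client.publish(self.namespace + "/items", item["url"])
        return item


def run_pipeline_lifecycle():
    """Drive the pipeline through open -> process -> close, as Scrapy would."""
    client = mock.Mock()
    spider = mock.Mock()  # stand-in for a Spider instance
    pipeline = PublishingPipeline(client, "dummy-namespace")
    item = {"url": "http://example.com/", "source": "dummy source"}

    pipeline.open_spider(spider)
    try:
        ret = pipeline.process_item(item, spider)
    finally:
        # Always close, so a failing process_item does not leak resources.
        pipeline.close_spider(spider)
    return client, item, ret
```

    Calling close_spider() in a `finally` block matters most for pipelines that open connections or schedule work, since anything left behind can make a later test fail for unrelated reasons.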

    You can also look at how Scrapy itself tests its pipelines, e.g. for MediaPipeline:

    class BaseMediaPipelineTestCase(unittest.TestCase):
    
        pipeline_class = MediaPipeline
        settings = None
    
        def setUp(self):
            self.spider = Spider('media.com')
            self.pipe = self.pipeline_class(download_func=_mocked_download_func,
                                        settings=Settings(self.settings))
            self.pipe.open_spider(self.spider)
            self.info = self.pipe.spiderinfo
    
        def test_default_media_to_download(self):
            request = Request('http://url')
            assert self.pipe.media_to_download(request, self.info) is None
    

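    You can also browse their other unit tests. For me, they have always been good inspiration for how to unit test Scrapy components.

    If you also want to test the from_crawler function, take a look at their middleware tests. Those tests often use from_crawler to create the middleware, e.g. for OffsiteMiddleware: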

    from scrapy.spiders import Spider
    from scrapy.utils.test import get_crawler
    
    class TestOffsiteMiddleware(TestCase):
    
        def setUp(self):
            crawler = get_crawler(Spider)
            self.spider = crawler._create_spider(**self._get_spiderargs())
            self.mw = OffsiteMiddleware.from_crawler(crawler)
            self.mw.spider_opened(self.spider)
    

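    I assume the key component here is the call to get_crawler from scrapy.utils.test. It seems to wrap up some of the calls you need to make in order to get a working test environment.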
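    Where get_crawler is not an option, from_crawler can also be exercised with a plain stand-in object. This is a lighter technique than the one Scrapy's own tests use, and it only works under the assumption that from_crawler reads nothing but `crawler.settings`. The class `NamespacePipeline`, the helper `make_stub_crawler`, and the setting name `MQTT_NAMESPACE` are all made up for this sketch:

```python
from types import SimpleNamespace


class NamespacePipeline:
    """Hypothetical pipeline whose from_crawler() reads one setting."""

    def __init__(self, namespace):
        self.namespace = namespace

    @classmethod
    def from_crawler(cls, crawler):
        # MQTT_NAMESPACE is an invented setting name for this example.
        return cls(crawler.settings.get("MQTT_NAMESPACE", "default"))

    def process_item(self, item, spider):
        item["namespace"] = self.namespace
        return item


def make_stub_crawler(settings):
    # Stand-in for a real crawler: only the .settings attribute is provided,
    # backed by a plain dict instead of a Scrapy Settings object.
    return SimpleNamespace(settings=dict(settings))


def build_and_run():
    crawler = make_stub_crawler({"MQTT_NAMESPACE": "dummy-namespace"})
    pipeline = NamespacePipeline.from_crawler(crawler)
    return pipeline.process_item({"url": "http://example.com/"}, spider=None)
```

    A stub like this stops being enough once from_crawler touches signals, stats, or other crawler attributes. At that point a real crawler object is the safer choice.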

    【Discussion】:
