【Question title】: Connection Refused Error 61: Scrapy+splash Docker
【Posted】: 2017-10-31 11:20:08
【Question】:

I am having trouble scraping a JavaScript-heavy website. I am using scrapy-splash with Docker to render the JavaScript into HTML before scraping it.

import scrapy
from scrapy_splash import SplashRequest


class MySpider(scrapy.Spider):
    name = 'spd'
    start_urls = ['http://example.com']

    def start_requests(self):
        # Send each start URL through Splash so the JavaScript is rendered first.
        for url in self.start_urls:
            yield SplashRequest(url, self.parse, endpoint='render.html', args={'wait': 0.5})

    def parse(self, response):
        # 'xpath' is a placeholder for the real XPath expressions.
        for href in response.xpath('xpath'):
            yield {'info': href.xpath('xpath').get()}

This is what my terminal prints:

  2017-05-30 13:20:51 [scrapy.core.engine] INFO: Spider opened
  2017-05-30 13:20:51 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
  2017-05-30 13:20:51 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
  2017-05-30 13:20:51 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://example.com via http://192.168.99.100:8050/render.html> (failed 1 times): Connection was refused by other side: 61: Connection refused.
  2017-05-30 13:20:51 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://example.com via http://192.168.99.100:8050/render.html> (failed 2 times): Connection was refused by other side: 61: Connection refused.
  2017-05-30 13:20:51 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://example.com via http://192.168.99.100:8050/render.html> (failed 3 times): Connection was refused by other side: 61: Connection refused.
  2017-05-30 13:20:51 [scrapy.core.scraper] ERROR: Error downloading <GET http://example.com via http://192.168.99.100:8050/render.html>: Connection was refused by other side: 61: Connection refused.
  2017-05-30 13:20:51 [scrapy.core.engine] INFO: Closing spider (finished)

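【Comments】:

  • What is the problem? You need to provide more information. What are you expecting? Is there a specific error? What code are you using or starting from?
  • Are you trying to scrape example.com, or is it just an example?
  • example.com is only a placeholder for the actual website.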

Tags: javascript docker web-scraping scrapy-splash


【Solution 1】:

The following log messages indicate that the Splash Docker container is either not running or not listening on the expected port.

DEBUG: Retrying <GET http://example.com via http://192.168.99.100:8050/render.html> (failed 1 times): Connection was refused by other side: 61: Connection refused.
DEBUG: Retrying <GET http://example.com via http://192.168.99.100:8050/render.html> (failed 2 times): Connection was refused by other side: 61: Connection refused.
DEBUG: Gave up retrying <GET http://example.com via http://192.168.99.100:8050/render.html> (failed 3 times): Connection was refused by other side: 61: Connection refused.
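
To confirm from the Scrapy side that nothing is accepting connections at the address in the log, a plain TCP check can help. This is only a minimal sketch: the helper name `is_port_open` is hypothetical, and the host and port are copied from the log lines above, so adjust them to your own setup.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Address taken from the log above; change it if Splash runs elsewhere.
    host, port = "192.168.99.100", 8050
    state = "reachable" if is_port_open(host, port) else "not reachable"
    print(f"Splash at {host}:{port} is {state}")
```

If this reports "not reachable", the problem is between your machine and the container (container stopped, port not published, or wrong address) rather than in the spider code.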

To check the status of your Docker containers, including ones that have exited, try running:

sudo docker ps -a | grep scrapinghub/splash
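
If the container is listed as running, the other thing worth checking is whether `SPLASH_URL` in your Scrapy settings points at the address where the port is actually published. The fragment below is a sketch of the settings described in the scrapy-splash README; the `localhost` address is an assumption that fits a container started with `docker run -p 8050:8050 scrapinghub/splash` on a native Docker install. The `192.168.99.100` address in your log is typically the docker-machine / Docker Toolbox VM address, so keep it only if that is how you run Docker.

```python
# settings.py (sketch, adjust SPLASH_URL to your environment)

# Assumption: Splash is published on the local host at port 8050.
SPLASH_URL = "http://localhost:8050"

# Middlewares and helpers listed in the scrapy-splash README.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```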
