【Question title】: Unable to use proxies in Scrapy project
【Posted】: 2018-04-19 17:59:28
【Question】:

I have been trying to scrape a website that appears to have identified and blocked my IP; it now returns 429 Too Many Requests responses.

I installed scrapy-proxies from https://github.com/aivarsk/scrapy-proxies and followed the instructions given there. I got a list of proxies from http://www.gatherproxy.com/, and this is what my settings.py and proxylist.txt look like now:

settings.py

BOT_NAME = 'project'
SPIDER_MODULES = ['project.spiders']
NEWSPIDER_MODULE = 'project.spiders'
# Retry many times since proxies often fail
RETRY_TIMES = 10
# Retry on most error codes since proxies fail for different reasons
RETRY_HTTP_CODES = [429, 500, 503, 504, 400, 403, 404, 408]

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

PROXY_LIST = r"filepath\proxylist.txt"  # raw string so the backslash is not read as an escape
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 2

PROXY_MODE = 0
DOWNLOAD_HANDLERS = {'s3': None}

EXTENSIONS = {
   'scrapy.telnet.TelnetConsole': None
}

proxylist.txt

http://195.208.172.20:8080
http://154.119.56.179:9999
http://124.12.50.43:8088
http://61.7.168.232:52136
http://122.193.188.236:8118

However, when I run my spider, I get the following error:

[scrapy.proxies] DEBUG: Proxy user pass not found
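
One possible reading of this message, offered as an assumption rather than confirmed scrapy-proxies behaviour: it is logged at DEBUG level and may only report that the chosen proxy entry has no user:password part. The sketch below uses only the standard library to check which entries of a proxy list carry credentials. `has_credentials` is a hypothetical helper, not part of scrapy-proxies, and the third entry is made up for comparison.

```python
from urllib.parse import urlsplit


def has_credentials(proxy_url):
    """Return True if the proxy URL embeds both a user name and a password."""
    parts = urlsplit(proxy_url)
    return parts.username is not None and parts.password is not None


# The first two entries come from proxylist.txt in the question.
proxies = [
    "http://195.208.172.20:8080",
    "http://154.119.56.179:9999",
    "http://user:secret@192.0.2.10:8080",  # hypothetical entry with credentials
]

for url in proxies:
    print(url, has_credentials(url))
```

If every entry in the list reports no credentials, the message would be expected for every request under this reading, and the cause of the failed crawl would have to be looked for elsewhere.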

I tried googling this specific error but could not find a solution.

Any help would be appreciated. Thanks a lot in advance.

【Comments】:

Tags: python web-scraping proxy scrapy web-crawler


【Solution 1】:

I suggest creating your own middleware that specifies the IP:PORT, like this, and placing this proxies.py middleware file in your project's middleware folder:

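class ProxiesMiddleware(object):
    """Downloader middleware that routes every request through one proxy."""

    def __init__(self, settings):
        # The settings object is accepted but not used here.
        pass

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this to build the middleware from the running crawler.
        return cls(crawler.settings)

    def process_request(self, request, spider):
        # Replace IP:PORT with the address of a working proxy.
        request.meta['proxy'] = "http://IP:PORT"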

Then add the ProxiesMiddleware entry to your settings.py:

DOWNLOADER_MIDDLEWARES = {
    'yourproject.middleware.proxies.ProxiesMiddleware': 400,
}
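
The same idea extends to the proxy list from the question. Below is a minimal sketch, assuming a text file with one proxy URL per line. `load_proxies`, `RandomProxyMiddleware`, and the reuse of the `PROXY_LIST` setting name are illustrative choices, not part of Scrapy or scrapy-proxies. It picks a random proxy for each request and writes it to `request.meta['proxy']`, the same key the middleware above sets.

```python
import random


def load_proxies(path):
    """Read one proxy URL per line, skipping blank lines and # comments."""
    with open(path, encoding="utf-8") as handle:
        lines = (line.strip() for line in handle)
        return [line for line in lines if line and not line.startswith("#")]


class RandomProxyMiddleware(object):
    """Hypothetical downloader middleware: a random proxy for each request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy list is empty")
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # Assumes settings.py defines PROXY_LIST as the path to the list file.
        return cls(load_proxies(crawler.settings.get("PROXY_LIST")))

    def process_request(self, request, spider):
        # Returning None (implicitly) lets Scrapy continue handling the request.
        request.meta["proxy"] = random.choice(self.proxies)
```

To try it, register the class in DOWNLOADER_MIDDLEWARES under whatever dotted path matches the file you saved it in. Free public proxies fail often, so keeping RETRY_TIMES above the default, as the question's settings already do, is probably still worthwhile.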

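【Discussion】:

  • It gives this error: ImportError: No module named proxies
  • @Kunwar That probably depends on your folder hierarchy. You need to find the exact location of your ProxiesMiddleware file/class. You may have put it directly in your middleware folder/file, in which case you should remove .proxies from that entry in your DOWNLOADER_MIDDLEWARES setting.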
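
The dotted path in DOWNLOADER_MIDDLEWARES has to match the project layout: everything before the last dot is imported as a module, and the last part is looked up as an attribute of that module. A rough standard-library sketch of that lookup can be used to test a path before running the spider. `resolve_dotted_path` is a hypothetical helper written for illustration, not Scrapy's own loader.

```python
from importlib import import_module


def resolve_dotted_path(path):
    """Import the module part of 'pkg.module.Name' and return the Name attribute."""
    module_path, _, name = path.rpartition(".")
    if not module_path:
        raise ValueError("expected a dotted path, got %r" % path)
    module = import_module(module_path)  # raises ImportError if the module is missing
    return getattr(module, name)  # raises AttributeError if the name is missing


# Demonstrated with a standard-library class; substitute your own middleware path,
# for example 'yourproject.middleware.proxies.ProxiesMiddleware'.
print(resolve_dotted_path("collections.OrderedDict"))
```

Run from the project root, an ImportError here points at a wrong module path (for instance an extra or missing `.proxies`), while an AttributeError means the module was found but does not define the class name.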