【Question title】:Failure to use Netnut.io proxy with Apify Cheerio scraper
【Posted】:2020-11-16 13:50:35
【Question description】:

I'm developing a web crawler and I want to integrate a proxy from Netnut into it.

Netnut provides the following integration details:

Proxy URL: gw.ntnt.io, proxy port: 5959, proxy user: igorsavinkin-cc-any, proxy password: xxxxx

Example rotating IP format (IP:PORT:USERNAME-CC-COUNTRY:PASSWORD): gw.ntnt.io:5959:igorsavinkin-cc-any:xxxxx
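The `IP:PORT:USERNAME-CC-COUNTRY:PASSWORD` notation is Netnut's own shorthand, not a standard URL. A minimal sketch of splitting it into its fields; the helper name `parseNetnutString` is hypothetical, and it assumes that only the password could ever contain a `:`.

```javascript
// Hypothetical helper: split a Netnut-style "HOST:PORT:USERNAME:PASSWORD"
// string into its parts. Assumes host, port and username contain no ':'
// (everything after the third ':' is treated as the password).
function parseNetnutString(s) {
  const pieces = s.split(':');
  if (pieces.length < 4) {
    throw new Error(`Expected HOST:PORT:USERNAME:PASSWORD, got "${s}"`);
  }
  const [host, portText, username, ...rest] = pieces;
  const port = Number(portText);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid port "${portText}"`);
  }
  return { host, port, username, password: rest.join(':') };
}

const fields = parseNetnutString('gw.ntnt.io:5959:igorsavinkin-cc-any:xxxxx');
console.log(fields.host, fields.port); // gw.ntnt.io 5959
```

Keeping the fields separate makes it easy to reassemble them into whatever form the HTTP client expects.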

To change the country, replace "any" with the country you want (US, UK, IT, DE, etc.). Available countries: https://l.netnut.io/countries

Our IPs rotate automatically. If you want them to be static residential IPs, add a session ID to the username parameter, as in the example below:

username-cc-any-sid-any_number
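Following the username rules quoted above, a small sketch of composing the proxy user name. The helper `buildNetnutUsername` is hypothetical, and restricting the session ID to a non-negative integer is an assumption based on the `any_number` placeholder. Note that the format includes a `-cc-` segment before the country code, which the proxy string in the code below (`igorsavinkin-DE`) does not have.

```javascript
// Hypothetical helper that builds a Netnut username from the rules above:
// "<user>-cc-<country>" selects the country ("any" = no preference), and an
// optional "-sid-<number>" suffix requests a static (sticky) IP.
function buildNetnutUsername(user, { country = 'any', sessionId } = {}) {
  let name = `${user}-cc-${country}`;
  if (sessionId !== undefined) {
    // Assumption: the session ID is a plain non-negative integer.
    if (!Number.isInteger(sessionId) || sessionId < 0) {
      throw new Error('sessionId must be a non-negative integer');
    }
    name += `-sid-${sessionId}`;
  }
  return name;
}

console.log(buildNetnutUsername('igorsavinkin'));                   // igorsavinkin-cc-any
console.log(buildNetnutUsername('igorsavinkin', { country: 'DE' })); // igorsavinkin-cc-DE
console.log(buildNetnutUsername('igorsavinkin', { country: 'DE', sessionId: 42 }));
// igorsavinkin-cc-DE-sid-42
```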

Code:

    const Apify = require('apify');
    const { log } = Apify.utils;

    Apify.main(async () => {
        const proxyConfiguration = await Apify.createProxyConfiguration({
            proxyUrls: [
                'gw.ntnt.io:5959:igorsavinkin-DE:xxxxx',
            ],
        });
        // Add URLs to a RequestQueue
        const requestQueue = await Apify.openRequestQueue(queue_name);
        await requestQueue.addRequest({ url: 'https://ip.nf/me.txt' });

        // Create an instance of the CheerioCrawler class - a crawler
        // that automatically loads the URLs and parses their HTML using the cheerio library.
        const crawler = new Apify.CheerioCrawler({
            // Let the crawler fetch URLs from our queue.
            requestQueue,
            // To use the proxy IP session rotation logic, you must turn the proxy usage on.
            proxyConfiguration,
            // Run between 10 and 50 requests in parallel.
            minConcurrency: 10,
            maxConcurrency: 50,
            // On error, retry each page at most twice.
            maxRequestRetries: 2,

            // Increase the timeout for processing of each page.
            handlePageTimeoutSecs: 50,

            // Limit to 1000 requests per crawl.
            maxRequestsPerCrawl: 1000,

            handlePageFunction: async ({ request, $ /*, session */ }) => {
                const text = $('body').text();
                log.info(text);
                ...
            },
        });
        await crawler.run();
    });

Error: RequestError: getaddrinfo ENOTFOUND 5959 5959:80

It looks like the crawler is mixing up the URL ports 5959 and 80...
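To see why the colon-separated string causes trouble, it can be fed to Node's built-in WHATWG `URL` parser. This is only an illustration: the question does not show which parser Apify and its HTTP stack actually apply to `proxyUrls`, so the exact misreading may differ. The error above (host `5959`, port `80`) points the same way, though: `gw.ntnt.io` is never recognized as the proxy host.

```javascript
// The string from the question: the parser reads "gw.ntnt.io" as a URL
// scheme, so there is no host and no port at all.
const bad = new URL('gw.ntnt.io:5959:igorsavinkin-DE:xxxxx');
console.log(bad.protocol); // gw.ntnt.io:
console.log(bad.hostname === ''); // true
console.log(bad.pathname); // 5959:igorsavinkin-DE:xxxxx

// A conventional proxy URL: host, port and credentials land where expected.
const good = new URL('http://igorsavinkin-cc-DE:xxxxx@gw.ntnt.io:5959');
console.log(good.hostname, good.port, good.username); // gw.ntnt.io 5959 igorsavinkin-cc-DE
```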

    ERROR CheerioCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://ip.nf/me.txt","retryCount":3,"id":"F32s4Txz0fBUmwd"}
      RequestError: getaddrinfo ENOTFOUND 5959 5959:80
          at ClientRequest.request.once (C:\Users\User\Documents\RnD\Node.js\mercateo-scraper\node_modules\got\dist\source\core\index.js:953:111)
          at Object.onceWrapper (events.js:285:13)
          at ClientRequest.emit (events.js:202:15)
          at ClientRequest.origin.emit.args (C:\Users\User\Documents\RnD\Node.js\mercateo-scraper\node_modules\@szmarczak\http-timer\dist\source\index.js:39:20)
          at onerror (C:\Users\User\Documents\RnD\Node.js\mercateo-scraper\node_modules\agent-base\dist\src\index.js:115:21)
          at callbackError (C:\Users\User\Documents\RnD\Node.js\mercateo-scraper\node_modules\agent-base\dist\src\index.js:134:17)
          at processTicksAndRejections (internal/process/next_tick.js:81:5)

Is there any way to fix this?

【Question discussion】:

    Tags: proxy apify


    【Answer 1】:

    Try using it in this format:

    http://username:password@host:port
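A minimal sketch of assembling that format from the Netnut details, assuming the endpoint is an HTTP proxy (the `http://` scheme in the answer). The helper `toProxyUrl` is hypothetical. The credentials are percent-encoded so that characters such as `@` or `:` in a password cannot be mistaken for URL delimiters.

```javascript
// Hypothetical helper: build "http://username:password@host:port",
// percent-encoding the credentials so reserved characters stay literal.
function toProxyUrl({ host, port, username, password }) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `http://${user}:${pass}@${host}:${port}`;
}

const proxyUrl = toProxyUrl({
  host: 'gw.ntnt.io',
  port: 5959,
  username: 'igorsavinkin-cc-DE', // country DE; append "-sid-<n>" for a static IP
  password: 'xxxxx',
});
console.log(proxyUrl); // http://igorsavinkin-cc-DE:xxxxx@gw.ntnt.io:5959

// Sanity check: a URL parser now finds the intended host and port.
const parsed = new URL(proxyUrl);
console.log(parsed.hostname, parsed.port); // gw.ntnt.io 5959
```

The resulting string is what would go into the `proxyUrls` array of the question's `Apify.createProxyConfiguration()` call, in place of the colon-separated shorthand.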

    【Discussion】:
