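【Posted】: 2020-11-16 13:50:35
【Question】:
I am developing a web crawler and want to integrate a proxy from Netnut into it.
The Netnut integration instructions give:
Proxy URL: gw.ntnt.io
Proxy port: 5959
Proxy user: igorsavinkin-cc-any
Proxy password: xxxxx
Example rotating IP format (IP:PORT:USERNAME-CC-COUNTRY:PASSWORD): gw.ntnt.io:5959:igorsavinkin-cc-any:xxxxx
To change the country, change "any" to the country you want (US, UK, IT, DE, etc.). Available countries: https://l.netnut.io/countries
Our IPs rotate automatically. If you want them to be static residential, add a session ID to the username parameter, as in the example below:
username-cc-any-sid-any_number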
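The username convention quoted above (base user, then `-cc-<country>`, then an optional `-sid-<number>`) can be sketched as a small function. `buildNetnutUsername` is a hypothetical helper written only for illustration; it is not part of Netnut's or Apify's API, and it assumes the country code is passed through exactly as given.

```javascript
// Hypothetical helper: composes a username following the pattern quoted
// from the Netnut instructions: <user>-cc-<country>[-sid-<sessionId>].
function buildNetnutUsername(baseUser, country = 'any', sessionId = null) {
    let username = `${baseUser}-cc-${country}`;
    // A session ID is only appended when a static (sticky) IP is wanted.
    if (sessionId !== null) {
        username += `-sid-${sessionId}`;
    }
    return username;
}

console.log(buildNetnutUsername('igorsavinkin'));               // igorsavinkin-cc-any
console.log(buildNetnutUsername('igorsavinkin', 'DE'));         // igorsavinkin-cc-DE
console.log(buildNetnutUsername('igorsavinkin', 'any', 12345)); // igorsavinkin-cc-any-sid-12345
```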
Code:
const Apify = require('apify');
const { log } = Apify.utils;

Apify.main(async () => {
    const proxyConfiguration = await Apify.createProxyConfiguration({
        proxyUrls: [
            'gw.ntnt.io:5959:igorsavinkin-DE:xxxxx'
        ]
    });
    // Add URLs to a RequestQueue
    const requestQueue = await Apify.openRequestQueue(queue_name);
    await requestQueue.addRequest({ url: 'https://ip.nf/me.txt' });
    // Create an instance of the CheerioCrawler class - a crawler
    // that automatically loads the URLs and parses their HTML using the cheerio library.
    const crawler = new Apify.CheerioCrawler({
        // Let the crawler fetch URLs from our queue.
        requestQueue,
        // To use the proxy IP session rotation logic, you must turn the proxy usage on.
        proxyConfiguration,
        // Concurrency limits.
        minConcurrency: 10,
        maxConcurrency: 50,
        // On error, retry each page at most twice.
        maxRequestRetries: 2,
        // Increase the timeout for processing of each page.
        handlePageTimeoutSecs: 50,
        // Limit to 1000 requests per crawl.
        maxRequestsPerCrawl: 1000,
        handlePageFunction: async ({ request, $/*, session*/ }) => {
            const text = $('body').text();
            log.info(text);
            ...
        },
    });
    await crawler.run();
});
Error: RequestError: getaddrinfo ENOTFOUND 5959 5959:80
It looks like the crawler is mixing up the URL with the ports 5959 and 80...
ERROR CheerioCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://ip.nf/me.txt","retryCount":3,"id":"F32s4Txz0fBUmwd"}
  RequestError: getaddrinfo ENOTFOUND 5959 5959:80
    at ClientRequest.request.once (C:\Users\User\Documents\RnD\Node.js\mercateo-scraper\node_modules\got\dist\source\core\index.js:953:111)
    at Object.onceWrapper (events.js:285:13)
    at ClientRequest.emit (events.js:202:15)
    at ClientRequest.origin.emit.args (C:\Users\User\Documents\RnD\Node.js\mercateo-scraper\node_modules\@szmarczak\http-timer\dist\source\index.js:39:20)
    at onerror (C:\Users\User\Documents\RnD\Node.js\mercateo-scraper\node_modules\agent-base\dist\src\index.js:115:21)
    at callbackError (C:\Users\User\Documents\RnD\Node.js\mercateo-scraper\node_modules\agent-base\dist\src\index.js:134:17)
    at processTicksAndRejections (internal/process/next_tick.js:81:5)
Is there any way to fix this?
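One plausible reading of the error is that the colon-separated `host:port:user:pass` string is not a URL, so a URL parser finds no hostname in it. The sketch below shows how Node's WHATWG `URL` class interprets that string, and how it might be converted to the conventional `http://user:pass@host:port` form. It assumes that `proxyUrls` expects standard proxy URLs of that form, which should be checked against the Apify documentation; `toProxyUrl` is a hypothetical helper, not a library function.

```javascript
// Hypothetical helper: converts "host:port:user:pass" into a standard
// proxy URL of the form http://user:pass@host:port.
function toProxyUrl(entry) {
    const [host, port, username, ...rest] = entry.split(':');
    // Re-join in case the password itself contains a colon.
    const password = rest.join(':');
    return `http://${encodeURIComponent(username)}:${encodeURIComponent(password)}@${host}:${port}`;
}

const raw = 'gw.ntnt.io:5959:igorsavinkin-DE:xxxxx';

// Parsed as-is, everything before the first colon is taken as the scheme,
// so no hostname or port is recognized.
const asIs = new URL(raw);
console.log(asIs.protocol); // gw.ntnt.io:
console.log(asIs.hostname); // (empty string)
console.log(asIs.pathname); // 5959:igorsavinkin-DE:xxxxx

// After conversion, host, port, and credentials land in the right fields.
const fixed = new URL(toProxyUrl(raw));
console.log(fixed.hostname); // gw.ntnt.io
console.log(fixed.port);     // 5959
console.log(fixed.username); // igorsavinkin-DE
```

If that assumption holds, the value returned by `toProxyUrl(raw)` would be the string to place in the `proxyUrls` array.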
【Discussion】: