首先从国外一个网站爬取了免费的代理ip信息存到mongodb中;接着代码设置:

在爬虫客户端抽象类中添加属性:

设置代理的代码其实就以下几句:

firefoxProfile.setPreference("network.proxy.type", 1);
firefoxProfile.setPreference("network.proxy.no_proxies_on", "localhost, 127.0.0.1"); 

firefoxProfile.setPreference("network.proxy.http", proxyHttp.getIp());
firefoxProfile.setPreference("network.proxy.http_port", proxyHttp.getPort());

firefoxProfile.setPreference("network.proxy.ssl", proxyHttps.getIp());
firefoxProfile.setPreference("network.proxy.ssl_port", proxyHttps.getPort());

以下是具体实现代码:

/**
* 爬虫客户端抽象类
* 其生命周期如下
* setSpiderDao→setRootUrl→setParamsMap→init→runSpider→returnData→destory
*/
public abstract class SpiderClient {

private static final Logger logger = LoggerFactory.getLogger(SpiderClient.class);
protected SpiderDao spiderDao;
protected SpiderData spiderData;
protected WebDriver driver;
protected String rootUrl;
protected Map<String, Object> params;
private String collection;
protected boolean enableProxy;

 

//.. get set

/**
* 初始化工作
*/
public void init(){

FirefoxProfile firefoxProfile = new FirefoxProfile(); 

}
}
this.driver = new FirefoxDriver(firefoxProfile);
this.driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
this.spiderData = new SpiderData();
this.spiderData.setIds(new ArrayList<String>());

}

//先从China的ip获取(信号相对好,网速快)

private ProxyIP getProxyIPForHttp(){
MongoSpiderDao mongoSpiderDao = (MongoSpiderDao) spiderDao;
List<ProxyIP> list = mongoSpiderDao.getProxyIP("HTTP", "China", 20); //从mongodb中查询20条ip数据
if(list==null || list.isEmpty()){
return null;
}
return list.get(RandomUtils.nextInt(0, list.size()));
}
private ProxyIP getProxyIPForHttps(){
MongoSpiderDao mongoSpiderDao = (MongoSpiderDao) spiderDao;
List<ProxyIP> list = mongoSpiderDao.getProxyIP("HTTPS", "China", 20);
if(list==null || list.isEmpty()){
return null;
}
return list.get(RandomUtils.nextInt(0, list.size()));
}

...

}

 

有个很好的自动化获取有效免费代理ip的项目:https://github.com/yzf233/IPProxyTool,只需要跑一下命令即可;

相关文章: