【Posted】: 2011-08-17 22:29:56
【Question】:
I'm writing a web crawler in Java, but I'm behind a proxy server, which is making things very difficult.
Here is the connection code:
public void scrape(String url, String filename) throws Exception {
    this.url = url;
    this.filename = filename;
    System.out.println("Scraping " + url);
    System.out.println("Saving to \"" + this.filename + "\"");
    try {
        makeConnection();
        createStream();
        writeToFile();
        System.out.println("Scrape was successful");
    } catch (Exception e) {
        System.err.println("Error: " + e.getMessage());
    }
}

private void makeConnection() throws Exception {
    // Set proxy info
    System.setProperty("java.net.useSystemProxies", "true");
    URL address = new URL(url);
    connection = address.openConnection();
}
This is the output:
Scraping http://feeds.bbci.co.uk/news/northern_ireland/rss.xml
Saving to "../rss/northern_ireland.xml"
Error: Connection timed out
Is there a better way to configure the proxy?
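For reference, one alternative to `java.net.useSystemProxies` is to pass an explicit `java.net.Proxy` to `URL.openConnection(Proxy)`. The block below is only a minimal sketch of that approach, not a confirmed fix for the timeout above. The class name `ProxyConnectionSketch`, the helpers `buildProxy` and `open`, the host `proxy.example.com`, the port `8080`, and the 10-second timeouts are all assumptions made for illustration; the real proxy host and port would have to come from the local network configuration.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper class; not part of the original scraper.
class ProxyConnectionSketch {
    // Placeholder proxy settings (assumptions); replace with the real values.
    static final String PROXY_HOST = "proxy.example.com";
    static final int PROXY_PORT = 8080;

    // Builds an HTTP proxy descriptor. createUnresolved defers the DNS lookup
    // of the proxy host until a connection is actually attempted.
    static Proxy buildProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host, port));
    }

    // Prepares a connection that is routed through the given proxy.
    // No network traffic happens here; the socket is opened lazily,
    // for example on the first call to getInputStream().
    static URLConnection open(String url, Proxy proxy) throws Exception {
        URLConnection connection = new URL(url).openConnection(proxy);
        connection.setConnectTimeout(10_000); // fail after 10 s instead of hanging
        connection.setReadTimeout(10_000);
        return connection;
    }

    public static void main(String[] args) throws Exception {
        Proxy proxy = buildProxy(PROXY_HOST, PROXY_PORT);
        URLConnection connection =
                open("http://feeds.bbci.co.uk/news/northern_ireland/rss.xml", proxy);
        System.out.println("Prepared " + connection.getURL() + " via " + proxy);
    }
}
```

In the scraper above, `makeConnection()` could assign the result of `open(url, proxy)` to `connection`. Because the proxy is passed per connection, this does not depend on whether the JVM can detect the operating system's proxy settings.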
【Discussion】: