【Posted】: 2012-10-06 12:10:45
【Question】:
I have pages like these:
www.foo1.bar
www.foo2.bar
www.foo3.bar
.
.
www.foo100.bar
I am using the jsoup library and starting a Thread per page so that all pages are fetched concurrently:
Thread matchThread = new Thread(task);
matchThread.start();
Each task connects to its page like this and parses the HTML:
Jsoup.connect("www.fooX.bar").timeout(0).get();
I get a lot of these exceptions:
java.net.ConnectException: Connection timed out: connect
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:158)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:388)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:523)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:227)
    at sun.net.www.http.HttpClient.New(HttpClient.java:300)
    at sun.net.www.http.HttpClient.New(HttpClient.java:317)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:404)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:391)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:157)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:146)
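Does jsoup only allow one thread at a time, or am I doing something wrong? Any suggestions on how to connect to my pages faster would be welcome, because fetching them one by one takes a long time.
Edit:
All 700 threads call this method, so maybe that is the problem. Can this method handle that many threads, or does it behave like a singleton?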
private static Document connect(String url) {
    Document doc = null;
    try {
        doc = Jsoup.connect(url).timeout(0).get();
    } catch (IOException e) {
        System.out.println(url);
    }
    return doc;
}
Edit: the complete thread code
public class MatchWorker implements Callable<Match>{
    private Element element;
    public MatchWorker(Element element) {
        this.element = element;
    }
    @Override
    public Match call() throws Exception {
        Match match = null;
        Util.connectAndDoStuff();
        return match;
    }
}
And this runs for all of my 700 elements:
Collection<Match> matches = new ArrayList<Match>();
Collection<Future<Match>> results = new ArrayList<Future<Match>>();
for (Element element : elements) {
    MatchWorker matchWorker = new MatchWorker(element);
    FutureTask<Match> task = new FutureTask<Match>(matchWorker);
    results.add(task);
    Thread matchThread = new Thread(task);
    matchThread.start();
}
for(Future<Match> match : results) {
    try {
        matches.add(match.get());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
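If the timeouts come from opening hundreds of connections at the same moment (an assumption; nothing in the question confirms the cause), one option is to cap concurrency with a fixed-size pool instead of starting one Thread per element. The following is only a minimal sketch: `BoundedFetcher`, `fetchAll`, the `fetch` function, and the pool size of 10 are hypothetical names and values, not part of the original code. In the question's setting, `fetch` would wrap the existing `connect(url)` method; here it is a stand-in so the example runs without jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs one task per URL, but on a bounded pool.
final class BoundedFetcher {

    // Applies `fetch` to every URL using at most `poolSize` worker threads.
    // Results are returned in input order; a fetch that throws yields null
    // for that URL instead of aborting the whole batch.
    static <T> List<T> fetchAll(List<String> urls, int poolSize,
                                Function<String, T> fetch) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<T>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetch.apply(url));
            }
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has completed or failed.
            for (Future<T> future : pool.invokeAll(tasks)) {
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    results.add(null); // this URL failed; keep going
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            urls.add("http://www.foo" + i + ".bar");
        }
        // String::length is a placeholder for the real download-and-parse step.
        List<Integer> lengths = fetchAll(urls, 10, String::length);
        System.out.println(lengths.size() + " pages processed");
    }
}
```

The pool size is a tuning knob rather than a recommendation: a smaller value puts less load on the network and the remote servers, a larger one finishes sooner if the connections actually succeed.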
【Comments】:
-
Have you double-checked that www.fooX.bar is online? For example, can a browser reach it, or can you telnet to it on port 80?
-
Yes, I can access it. It is basically just a website like any other.
-
@Jaanus: I run into this problem a lot as well. Were you able to find a solution?
Tags: java multithreading jsoup runnable callable