【Question title】: How to call getPage from HtmlUnit WebClient and have setTimeout not wait forever?
【Posted】: 2012-02-03 09:55:25
【Question】:

I'm having the same problem described in the question "Call getPage from htmlunit WebClient with JavaScript disabled and setTimeout set to 10000 waits forever".

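There is only one relevant (and complicated) possible answer there (by theytoo). So I'd like to know:

  1. Does anyone have a simpler answer?
  2. Can anyone verify that that solution works?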

【Discussion】:

  • It might be better to post the simplest code in which this exception is not thrown, along with the HtmlUnit version you are using.
  • Yes, I have one. We're on HtmlUnit 2.9. How about: webClient = new WebClient(); webClient.setTimeout(180000); page = webClient.getPage("myurl"); inside one big try...

Tags: settimeout htmlunit
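As far as I understand HtmlUnit 2.x, the value given to setTimeout is passed to the underlying HttpClient as both the connection timeout and the socket (read) timeout. Both of those bound individual steps of a request, not the total duration of getPage. If the real requirement is that the call returns within a fixed wall-clock time, a common plain-Java approach is to run it on a worker thread and wait on a Future with a deadline. The sketch below uses only java.util.concurrent; the class name Deadline is hypothetical, and the slow task is simulated with Thread.sleep where real code would wrap something like webClient.getPage(url). It bounds how long the caller waits, not how long the worker runs: cancel(true) only interrupts the worker, and a thread blocked in socket I/O generally ignores interrupts, which is why the worker is a daemon thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class Deadline {

    /**
     * Runs task on a daemon worker thread and waits at most timeoutMs for its result.
     * Throws TimeoutException when the deadline passes. The worker is interrupted,
     * but code blocked in socket I/O may not react to the interrupt.
     */
    public static <T> T callWithin(Callable<T> task, long timeoutMs)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "deadline-worker");
            t.setDaemon(true); // an abandoned worker must not keep the JVM alive
            return t;
        });
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // best effort: interrupts the worker thread
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a slow call such as webClient.getPage(url).
        Callable<String> slow = () -> {
            Thread.sleep(5_000);
            return "page";
        };
        try {
            callWithin(slow, 200);
            System.out.println("finished in time");
        } catch (TimeoutException e) {
            System.out.println("gave up after 200 ms");
        }

        String result = callWithin(() -> "fast page", 200);
        System.out.println(result);
    }
}
```

Note that in later HtmlUnit 2.x releases (2.20, for example) the timeout setter is no longer on WebClient itself; the equivalent call is webClient.getOptions().setTimeout(ms). The wrapper above does not depend on either.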


【Answer 1】:

The code I used:

package main;

import java.io.IOException;
import java.net.MalformedURLException;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;

public class Test {

    public static void main(final String[] args) {
        final WebClient webClient = new WebClient();
        webClient.setTimeout(1000); // timeout in milliseconds; 0 means wait forever
        try {
            System.out.println("Querying");
            webClient.getPage("http://www.google.com");
            System.out.println("Success");
        } catch (final FailingHttpStatusCodeException e) {
            System.out.println("One");
            e.printStackTrace();
        } catch (final MalformedURLException e) {
            System.out.println("Two");
            e.printStackTrace();
        } catch (final IOException e) {
            System.out.println("Three");
            e.printStackTrace();
        } catch (final Exception e) {
            System.out.println("Four");
            e.printStackTrace();
        }
        System.out.println("Finished");
    }

}

Output (all CSS and JS warnings removed):

Querying
Success
Finished

After changing the timeout from 1000 to 1 (I can't reach Google in 1 ms):

Querying
Three
org.apache.http.conn.ConnectTimeoutException: Connect to www.google.com:80 timed out
    at com.gargoylesoftware.htmlunit.SocksSocketFactory.connectSocket(SocksSocketFactory.java:92)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:573)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:776)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:152)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1439)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1358)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:307)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:373)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:358)
    at main.Test.main(Test.java:17)
Finished

Conclusion: I can't reproduce it; it works as expected.
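The 1 ms experiment above only exercises the connect phase (the trace ends in ConnectTimeoutException). A page that hangs after the connection has been established would instead have to be cut off by the socket read timeout (SO_TIMEOUT). The sketch below shows that second mechanism with plain java.net against a local server that accepts the connection but never sends a byte, so it needs neither network access nor HtmlUnit; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /**
     * Connects to a local server that never writes anything, then blocks in read().
     * Returns true if the read was aborted by SO_TIMEOUT instead of hanging.
     */
    public static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port. The server never calls accept();
        // the OS still completes the TCP handshake for queued connections.
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket client = new Socket()) {
            // Connect phase: bounded by the connect timeout.
            client.connect(server.getLocalSocketAddress(), connectTimeoutMs);
            // Read phase: bounded by SO_TIMEOUT, which applies to each read() call.
            client.setSoTimeout(readTimeoutMs);
            InputStream in = client.getInputStream();
            in.read(); // the server never writes, so this can only time out
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        boolean timedOut = readTimesOut(1000, 200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("timed out: " + timedOut + " after ~" + elapsedMs + " ms");
    }
}
```

SO_TIMEOUT limits the wait for each read, not the whole response: a server that sends a byte slightly more often than every 200 ms would never trip it, which is one way a request can appear to wait forever even with a timeout set.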

【Discussion】:

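  • Yes, well, I guess I did try hard, but if my problem were that simple I wouldn't have posted it on StackOverflow. Based on the other question and also on the documentation at hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/…
  • ... Based on the other StackOverflow question mentioned above and the documentation at hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/…, I will try two things (they are configuration parameters, so it's easy): 1) drastically reduce the timeout from 3 minutes to a few (?) seconds, and 2) hard-code the IP address in the URL instead of the site's domain name. Thanks for the ideas; will keep you posted. A.R.
  • Which version are you using? I'm using HtmlUnit 2.20, and WebClient has no setTimeout method.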
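The second idea in the comments (putting the IP address in the URL instead of the domain name) targets a real gap in how Java times a connection: the host name is resolved (for example by InetAddress.getByName) before Socket.connect(address, timeout) starts counting, so a slow DNS lookup is not limited by the connect timeout. An IP literal, by contrast, is only checked for format and needs no lookup. The plain java.net sketch below makes the two steps explicit; TwoStepConnect is a hypothetical name, and this is not HtmlUnit or HttpClient code.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TwoStepConnect {

    /** Resolves host, then opens a TCP connection; only step 2 is bounded by connectTimeoutMs. */
    public static Socket connect(String host, int port, int connectTimeoutMs) throws IOException {
        // Step 1: name resolution. For a host name this may query DNS and is NOT
        // covered by connectTimeoutMs; for an IP literal only the format is checked.
        InetAddress address = InetAddress.getByName(host);

        // Step 2: TCP handshake, bounded by connectTimeoutMs (0 would mean wait forever).
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(address, port), connectTimeoutMs);
        } catch (IOException e) {
            socket.close(); // do not leak the unconnected socket
            throw e;
        }
        return socket;
    }

    public static void main(String[] args) throws IOException {
        // Local server so the example runs without network access.
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"));
             Socket socket = connect("127.0.0.1", server.getLocalPort(), 1000)) {
            System.out.println("connected to " + socket.getRemoteSocketAddress());
        }
    }
}
```

With a host name instead of "127.0.0.1", step 1 can block for as long as the system resolver takes, regardless of connectTimeoutMs.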