【Question title】: Why does HtmlUnit always show the host page no matter what URL I type in (crawlable GWT app)?
【Posted】: 2014-05-16 14:15:27
【Question】:

Here is the complete code:

public class CrawlServlet implements Filter{
 public static String getFullURL(HttpServletRequest request) {
    StringBuffer requestURL = request.getRequestURL();
    String queryString = request.getQueryString();


    if (queryString == null) {
        return requestURL.toString();
    } else {
        return requestURL.append('?').append(queryString).toString();
    }
 }

 @Override
 public void destroy() {
 // TODO Auto-generated method stub

 }

 @Override
 public void doFilter(ServletRequest request, ServletResponse response,
 FilterChain chain) throws IOException, ServletException {

 HttpServletRequest httpRequest = (HttpServletRequest) request;
 String fullURLQueryString = getFullURL(httpRequest);
 System.out.println(fullURLQueryString+" what wrong");

 if ((fullURLQueryString != null) && (fullURLQueryString.contains("_escaped_fragment_"))) {
     // remember to unescape any %XX characters
     fullURLQueryString=URLDecoder.decode(fullURLQueryString,"UTF-8");
     // rewrite the URL back to the original #! version
         String url_with_hash_fragment=fullURLQueryString.replace("?_escaped_fragment_=", "#!");


         final WebClient webClient = new WebClient();

         WebClientOptions options = webClient.getOptions();
         options.setCssEnabled(false);
         options.setThrowExceptionOnScriptError(false);
         options.setThrowExceptionOnFailingStatusCode(false);
         options.setJavaScriptEnabled(false);
         HtmlPage page = webClient.getPage(url_with_hash_fragment);

         // important!  Give the headless browser enough time to execute JavaScript
         // The exact time to wait may depend on your application.

         webClient.waitForBackgroundJavaScript(20000);

         // return the snapshot
         //String originalHtml=page.getWebResponse().getContentAsString();
         //System.out.println(originalHtml+" +++++++++");
         System.out.println(page.asXml()+" +++++++++");

         PrintWriter out = response.getWriter();
         out.println(page.asXml());
         //out.println(originalHtml);
     } else {
      try {
        // not an _escaped_fragment_ URL, so move up the chain of servlet (filters)
        chain.doFilter(request, response);
      } catch (ServletException e) {
        System.err.println("Servlet exception caught: " + e);
        e.printStackTrace();
      }
    }

 }


 @Override
 public void init(FilterConfig arg0) throws ServletException {
 // TODO Auto-generated method stub

 }


}

After opening the URL "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article", the response is the host page HTML, shown below:

<html>

<head>
<meta name="fragment" content="!">
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<!-- -->
<!--
 Consider inlining CSS to reduce the number of requested files 
-->
<!-- -->
<link type="text/css" rel="stylesheet" href="MyProject.css"/>
<!-- -->
<!-- Any title is fine -->
<!-- -->
<title>MyProject</title>
<!-- -->
<!-- This script loads your compiled module. -->
<!-- If you add any GWT meta tags, they must -->
<!-- be added before this line. -->
<!-- -->
<script type="text/javascript" language="javascript" ></script>
<!-- -->
<!-- The body can have arbitrary html, or -->
<!-- you can leave the body empty if you want -->
<!-- to create a completely dynamic UI. -->
<!-- -->
</head>
<body>

<div id="loading">
Loading
<br/>
<img src="../images/loading.gif"/>
</div>
<!-- OPTIONAL: include this if you want history support -->
<iframe src="javascript:''" id="__gwt_historyFrame" tabindex="-1" style="position: absolute; width: 0;height: 0; border:0;"></iframe>
<!--
 RECOMMENDED if your web app will not function without JavaScript enabled 
-->
<noscript>

<div style="width: 22em; position: absolute; left: 50%; margin-left: -11em; color: red; background-color: white; border: 1pxsolid red; padding: 4px; font-family: sans-serif;">
Your web browser must have JavaScript enabled in order for this application to display correctly.
</div>
</noscript>
</body>
</html>

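On the other hand, "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997#!article" works fine and shows the article without any problem.

I also compiled the whole project and ran it under Tomcat 7, but I hit the same problem. It always shows the host page's HTML.

Note: the article page is a nested presenter embedded in the header presenter. I don't think that is the main cause, though, because not even the header page is shown.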

【Question comments】:

    Tags: java gwt htmlunit gwtp


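    【Solution 1】:

    First, instead of ?_escaped_fragment_=article, maybe try &_escaped_fragment_=article. You already have a ? for gwt.codesvr, so a second ? may confuse the URL parameter parsing.

    Second, you need to make sure your filter can handle the case where the gwt.codesvr parameter is present. It looks like your filter assumes _escaped_fragment_ is the first parameter, i.e., that it starts with ?. I believe the example here does handle that case.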
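
To illustrate the second point, here is a minimal sketch of a rewrite step that accepts _escaped_fragment_ after either ? or &. The class name EscapedFragmentRewriter and the method toHashBang are hypothetical and not part of the original filter. The sketch assumes _escaped_fragment_ is the last query parameter, so everything after it is treated as the fragment.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Hypothetical helper for illustration; not part of the original filter. */
final class EscapedFragmentRewriter {
    private static final String PARAM = "_escaped_fragment_=";

    private EscapedFragmentRewriter() {}

    /**
     * Rewrites a crawler URL back to its #! form.
     * Assumes _escaped_fragment_ is the last query parameter.
     * Returns the input unchanged if the parameter is absent.
     */
    static String toHashBang(String fullUrl) {
        int idx = fullUrl.indexOf(PARAM);
        if (idx < 1) {
            return fullUrl;
        }
        // The parameter may be first (?) or follow another one (&),
        // for example gwt.codesvr in development mode.
        char separator = fullUrl.charAt(idx - 1);
        if (separator != '?' && separator != '&') {
            return fullUrl;
        }
        String base = fullUrl.substring(0, idx - 1);
        String rawFragment = fullUrl.substring(idx + PARAM.length());
        try {
            // Unescape only the fragment, so %XX sequences in the
            // rest of the URL are left untouched.
            return base + "#!" + URLDecoder.decode(rawFragment, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

In the filter from the question, the result of toHashBang(fullURLQueryString) would take the place of the hard-coded replace("?_escaped_fragment_=", "#!") call. This only addresses the URL rewriting; it does not by itself guarantee that HtmlUnit renders the page.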

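    【Discussion】:

    • That's not it. Why? Because I even hard-coded the URL right before the WebClient to test, and it still shows the host page. url_with_hash_fragment="127.0.0.1:8888/… final WebClient webClient = new WebClient();
    • Besides, I even compiled my project and tested with "mydomain.com?_escaped_fragment_=article", and the result is the same.
    • Changing it to String url_with_hash_fragment=fullURLQueryString.replace("&_escaped_fragment_=", "#!"); does not work either.
    • Anyway, I tried your code and it ran, but I got the same result: the host page. I even put the URL directly in doFilter: String url_with_hash_fragment="127.0.0.1:8888/Ekajati.html?gwt.codesvr=127.0.0.1:9997#!article"; final URL urlWithHashFragment = new URL(url_with_hash_fragment); final WebRequest webRequest = new WebRequest(urlWithHashFragment); It shows the same result.
    • I can't comment on GWTP. Have you tried running against the GWT-compiled JavaScript rather than development mode (i.e., using the escaped fragment, but compiled and without gwt.codesvr)? Running the GWT-compiled JavaScript locally works for me. However, I have a problem where development mode no longer runs on Linux, so unfortunately I can't test development mode. That's bad, but it's a separate issue!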