【Question title】: HtmlUnit JavaScript issue on page load - Cannot find function
【Posted】: 2018-09-14 13:13:29
【Question】:

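I am scraping a website protected by Cloudflare. Sometimes the request is redirected to a reCAPTCHA page, and that page fails to load at all because of a JavaScript error. The code fails in the getPage method, and I don't know why.

This is the code. It works for normal pages but fails on the confirmation page: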

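import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setJavaScriptEnabled(true);

// The exception shown in the log is thrown from this call.
final HtmlPage page = webClient.getPage("https://mydummy.site");

// Give background scripts up to 10 seconds to finish.
webClient.waitForBackgroundJavaScript(10000);

// Poll up to three more times, 200 ms each, while jobs are still pending.
int waitForBackgroundJavaScript = webClient.waitForBackgroundJavaScript(200);
int loopCount = 0;
while (waitForBackgroundJavaScript > 0 && loopCount < 2) {
    ++loopCount;
    waitForBackgroundJavaScript = webClient.waitForBackgroundJavaScript(200);
}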
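The polling loop in the question can be factored into a small reusable helper. The following is only a sketch: `BackgroundJsWaiter` and `waitUntilIdle` are hypothetical names, not part of HtmlUnit, and the helper assumes it is handed a function that takes a timeout in milliseconds and returns the number of background jobs still pending, which is how the question uses `waitForBackgroundJavaScript`.

```java
import java.util.function.LongToIntFunction;

/** Hypothetical helper for bounded polling; not part of HtmlUnit. */
final class BackgroundJsWaiter {
    private BackgroundJsWaiter() {}

    /**
     * Calls waitFn repeatedly until it reports no pending jobs
     * or maxPolls calls have been made. At least one call is always made.
     *
     * @param waitFn        takes a timeout in ms, returns the number of jobs still pending
     * @param timeoutMillis timeout passed to every call
     * @param maxPolls      upper bound on the total number of calls
     * @return the pending-job count from the last call (0 means idle)
     */
    static int waitUntilIdle(LongToIntFunction waitFn, long timeoutMillis, int maxPolls) {
        int pending = waitFn.applyAsInt(timeoutMillis);
        for (int polls = 1; pending > 0 && polls < maxPolls; polls++) {
            pending = waitFn.applyAsInt(timeoutMillis);
        }
        return pending;
    }
}
```

With the `webClient` from the question, a call such as `BackgroundJsWaiter.waitUntilIdle(webClient::waitForBackgroundJavaScript, 200L, 3)` would make the same three 200 ms polls as the original loop. It does not address the exception itself, which is thrown earlier, inside `getPage`.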

Log:

java.lang.RuntimeException: com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function start in object [object MessagePort]. (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#249) (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#253)
at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:305)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:539)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:399)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:316)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:467)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:449)
at Main.htmlUnit(Main.java:156)
at Main.main(Main.java:43)
Caused by: com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function start in object [object MessagePort]. (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#249) (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#253)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:892)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:616)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:532)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:772)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:748)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:104)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:992)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:371)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:246)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:298)

【Comments】:

    Tags: javascript java htmlunit cloudflare


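    【Solution 1】:

    We have been struggling with this as well. Our test suite ran perfectly until late 2018, when this issue broke all of our logins. I believe Google does this deliberately to thwart automated attempts to get past the CAPTCHA, because fixing one part of it only seems to lead to another problem. Both loading the page and submitting it trigger the error, even if you tell HtmlUnitDriver to ignore all JavaScript errors.

    I have tried several options at this point. If you use the Google-specified test site key, the error goes away. So if you have full server-side control over which site key is generated, you are fine. Remember to make sure the test site key is also rendered again on validation errors and in all similar cases, otherwise you will get the error again.

    (Unfortunately for us, our login page is plain JSP, so implementing this is a headache unless we want to change URLs everywhere. We are still debating what to do, because we do now have a working, if ugly, solution that involves conditional logic on some pages and catching JavaScript exceptions at other points in the test code.)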
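One way to keep the site key consistent across every render path (first load, validation error, and so on) is to route all of them through a single selection function. This is a minimal sketch, not the answerer's actual code: `RecaptchaKeys`, `siteKeyFor`, and the `automatedTestRun` flag are hypothetical names, and the constant is the reCAPTCHA v2 test site key as published in Google's FAQ for automated testing, which should be checked against the current documentation before use.

```java
/** Hypothetical helper; class, method, and parameter names are illustrative only. */
final class RecaptchaKeys {
    private RecaptchaKeys() {}

    // Assumption: the reCAPTCHA v2 site key Google publishes for automated tests.
    // Verify against Google's current reCAPTCHA documentation before relying on it.
    static final String GOOGLE_TEST_SITE_KEY = "6LeIxAcTAAAAAJcZVRqyHh71UMIEGNQ_MXjiZKhI";

    /**
     * Returns the site key to render into the page.
     *
     * @param automatedTestRun  true when the page is served to the test suite
     * @param productionSiteKey the real site key used for normal traffic
     */
    static String siteKeyFor(boolean automatedTestRun, String productionSiteKey) {
        if (automatedTestRun) {
            return GOOGLE_TEST_SITE_KEY;
        }
        if (productionSiteKey == null || productionSiteKey.isEmpty()) {
            throw new IllegalArgumentException("productionSiteKey must not be empty");
        }
        return productionSiteKey;
    }
}
```

Calling the same function from the normal view and from the validation-error view keeps the test key from silently reverting to the production key. Server-side verification would presumably also need the matching test secret key from the same Google documentation.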
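The "catch JavaScript exceptions in the test code" part of that workaround could look roughly like the sketch below. It is an assumption-laden illustration, not the answerer's implementation: `RecaptchaErrorFilter` and both method names are hypothetical, and the filter simply relies on the fact that the exception messages in the question's log contain the `gstatic.com/recaptcha/` script URL.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper; not part of HtmlUnit or Selenium. */
final class RecaptchaErrorFilter {
    private RecaptchaErrorFilter() {}

    /** True if any exception in the cause chain mentions the reCAPTCHA script URL. */
    static boolean isRecaptchaScriptError(Throwable error) {
        Throwable current = error;
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; current != null && depth < 32; depth++) {
            String message = current.getMessage();
            if (message != null && message.contains("gstatic.com/recaptcha/")) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }

    /**
     * Runs the action. Returns its result, or an empty Optional if it failed
     * with a reCAPTCHA script error. Any other runtime exception is rethrown.
     */
    static <T> Optional<T> callToleratingRecaptcha(Supplier<T> action) {
        try {
            return Optional.ofNullable(action.get());
        } catch (RuntimeException e) {
            if (isRecaptchaScriptError(e)) {
                return Optional.empty();
            }
            throw e;
        }
    }
}
```

Because `Supplier` cannot throw checked exceptions, a call such as `getPage` would have to be wrapped by the caller, for example by rethrowing `IOException` as `UncheckedIOException`. Matching on the message text is brittle, so this only suits test code where swallowing this one known error is acceptable.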

    【Discussion】:
