【发布时间】:2018-09-14 13:13:29
【问题描述】:
我正在抓取受 Cloudflare 保护的网站,有时由于使用 ReCapcha 重定向到页面而出现错误,由于某些 javascript 错误,该页面甚至无法加载。代码在#getPage 方法上失败,我不知道为什么。
这是正常页面的代码,但在确认页面上失败:
final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setJavaScriptEnabled(true);
final HtmlPage page = webClient.getPage("https://mydummy.site");
webClient.waitForBackgroundJavaScript(10000);
int waitForBackgroundJavaScript = webClient.waitForBackgroundJavaScript(200);
int loopCount = 0;
while (waitForBackgroundJavaScript > 0 && loopCount < 2) {
++loopCount;
waitForBackgroundJavaScript = webClient.waitForBackgroundJavaScript(200);
if (waitForBackgroundJavaScript == 0) {
break;
}
}
日志:
java.lang.RuntimeException: com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function start in object [object MessagePort]. (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#249) (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#253)
at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:305)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:539)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:399)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:316)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:467)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:449)
at Main.htmlUnit(Main.java:156)
at Main.main(Main.java:43)
Caused by: com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function start in object [object MessagePort]. (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#249) (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#253)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:892)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:616)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:532)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:772)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:748)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:104)
at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:992)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:371)
at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:246)
at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:298)
【问题讨论】:
标签: javascript java htmlunit cloudflare