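【Posted】: 2017-10-07 20:20:50
【Question】:
I'm trying to load the PlayStation Store page with HtmlUnit, but all it seems to load is a blank page containing the text "Loading..." (plus some JavaScript). I tried the following configuration to get HtmlUnit working, without success (the code is Kotlin):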
@Test
@Throws(Exception::class)
fun homePage() {
    val webClient = WebClient(BrowserVersion.INTERNET_EXPLORER).apply {
        ajaxController = NicelyResynchronizingAjaxController()
        options.isUseInsecureSSL = true
        options.isThrowExceptionOnScriptError = false
        options.isJavaScriptEnabled = true
        options.isCssEnabled = true
        options.isRedirectEnabled = true
        options.isThrowExceptionOnFailingStatusCode = false
        options.isDownloadImages = true
        cookieManager.isCookiesEnabled = true
        waitForBackgroundJavaScript(10000)
        waitForBackgroundJavaScriptStartingBefore(10000)
    }
    val page = webClient.getPage<HtmlPage>("https://store.playstation.com/")
    Thread.sleep(10000)
    assertFalse(page.asXml().contains("Loading"))
}
I don't see any specific errors while the page loads:
мая 09, 2017 4:08:22 PM com.gargoylesoftware.htmlunit.html.HtmlScript isExecutionNeeded
WARNING: Script is not JavaScript (type: application/json, language: ). Skipping execution.
мая 09, 2017 4:08:22 PM com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController processSynchron
INFO: Re-synchronized call to https://sonynetworkentertainment.112.2o7.net/b/ss/snestorewebloadglobal/1/chidv1/s75296982536092?AQB=1&ndh=1&t=9%2F5%2F2017%2016%3A8%3A22%202%20-180&ts=1494335302&vid=c61f4752-adfd-84d1-728c-187350f9aa37&pageName=web%3Aloading_start&v1=D%3DpageName&g=https%3A%2F%2Fstore.playstation.com%2F&r=&v2=xx-xx&ch=web%3Aloading_start&c68=D%3Dg&c72=web&v72=web&cc=USD&ce=UTF-8&server=web&events=event1&AQE=1
мая 09, 2017 4:08:22 PM com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController processSynchron
INFO: Re-synchronized call to https://store.playstation.com/kamaji/api/chihiro/00_09_000/geo
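The question is: what is preventing HtmlUnit from loading the page? I tried to work it out myself, but my only guess is that it is either some kind of defense against headless browsers or very heavy JS that HtmlUnit does not support. However, for example,
opens without any trouble.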
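One thing that may be worth ruling out: in the test above, both wait calls run inside `apply`, before `getPage` has been called, so there are probably no background scripts to wait for at that point. The sketch below shows a generic polling loop that could be run after the page is requested instead. `waitUntil`, `step`, and `condition` are hypothetical names introduced for illustration, not part of HtmlUnit, and the `main` function uses a stand-in counter rather than a real page.

```kotlin
// Hypothetical helper (not part of HtmlUnit): runs `step`, then checks
// `condition`, repeating until the condition holds or the timeout elapses.
fun waitUntil(
    timeoutMillis: Long,
    pollMillis: Long = 500,
    step: () -> Unit = {},
    condition: () -> Boolean
): Boolean {
    val deadline = System.nanoTime() + timeoutMillis * 1_000_000
    while (true) {
        step()                                   // e.g. let background JS run
        if (condition()) return true             // the page looks ready
        if (System.nanoTime() >= deadline) return false
        Thread.sleep(pollMillis)
    }
}

fun main() {
    // Stand-in for a page that stops showing "Loading" on the third check.
    var checks = 0
    val loaded = waitUntil(timeoutMillis = 2_000, pollMillis = 10) { ++checks >= 3 }
    println("loaded=$loaded after $checks checks") // prints "loaded=true after 3 checks"
}
```

With HtmlUnit, `step` could be `{ webClient.waitForBackgroundJavaScript(500) }` and `condition` could be `{ !page.asXml().contains("Loading") }`, both called after `getPage`. This only helps if the page's scripts actually run under HtmlUnit's JavaScript engine; if they fail silently, the loop will simply time out and return false.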
【Comments】:
Tags: javascript java web-scraping kotlin htmlunit