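[Posted]: 2017-06-20 16:05:34
[Question]:
I am trying to scrape this website (the URL is in the code below) using RSelenium. I have successfully scraped most of the page, but I am stuck on getting to the content behind "Facility Visits" and "Facility Complaints". Both buttons show a javascript href when I inspect them with the developer tools, which is why I am using phantomjs with RSelenium.
I can navigate to the page through phantom, but whenever I try to pull text out of a field with $getElementText, the following error is thrown: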
Selenium message:{"errorMessage":"Element does not exist in cache","request":{"headers":{"Accept":"application/json, text/xml, application/xml, */*","Accept-Encoding":"gzip, deflate","Host":"localhost:4444","User-Agent":"libcurl/7.53.1 r-curl/2.6 httr/1.2.1"},"httpVersion":"1.1","method":"GET","url":"/attribute/id","urlParsed":{"anchor":"","query":"","file":"id","directory":"/attribute/","path":"/attribute/id","relative":"/attribute/id","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/attribute/id","queryKey":{},"chunks":["attribute","id"]},"urlOriginal":"/session/c0f30500-55d0-11e7-96dd-3b147ee40d88/element/:wdc:1497974074536/attribute/id"}}
Error: Summary: StaleElementReference Detail: An element command failed because the referenced element is no longer attached to the DOM. class: org.openqa.selenium.StaleElementReferenceException Further Details: run errorDetails method
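When I use $getCurrentUrl() and $screenshot(display = T), they show the correct website and the correct link.
I understand this has something to do with how the element is attached to the DOM, but I am not sure how to work around it in R.
The code is below: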
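library(RSelenium)

url <- "https://dhs.arkansas.gov/dccece/cclas/FacilityInformation.aspx?FacilityNumber=23516"
rd <- remoteDriver(browserName = 'phantomjs')
rd$open()
rd$navigate(url)
# Locate the "Facility Visits" link button and click it
webElem <- rd$findElement(using = "xpath", value = '//*[@id="ctl00_ContentPlaceHolder1_lbtnVisits"]')
webElem$clickElement()
# Search below the clicked element (the result is not stored)
webElem$findElements('css', "#aspnetForm > div.page > div.main")
# The error above is raised here (its request path ends in /attribute/id)
webElem$getElementAttribute("id")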
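One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: if the click reloads or rebuilds the page, the handle stored in webElem refers to a node that no longer exists, so any later call on it fails. Under that assumption, the element would need to be looked up again from the driver (rd) after the click, with a short retry loop while the new page settles. The names wait_for and scrape_visits are hypothetical helpers, not part of RSelenium, and the CSS selector is simply the one from the code above.

```r
# Hypothetical helper: call `find_fn` repeatedly until it returns a
# non-NULL value without error, or until `timeout` seconds have passed.
wait_for <- function(find_fn, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    result <- tryCatch(find_fn(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    if (Sys.time() >= deadline) stop("timed out waiting for element")
    Sys.sleep(interval)
  }
}

# Hypothetical usage; `rd` is assumed to be an open RSelenium remoteDriver
# that has already navigated to the page.
scrape_visits <- function(rd) {
  link <- rd$findElement(using = "id",
                         value = "ctl00_ContentPlaceHolder1_lbtnVisits")
  link$clickElement()
  # Do not reuse `link` after the click: look the target up from the
  # driver again, and read its text inside the same retried call so a
  # stale handle in between simply triggers another attempt.
  wait_for(function() {
    main <- rd$findElement(using = "css selector",
                           value = "#aspnetForm > div.page > div.main")
    main$getElementText()[[1]]
  })
}
```

With a live session this would be called as scrape_visits(rd). Note that the retry only guards against errors; if the old page still answers the selector before the new content arrives, waiting for something specific to the visits view would be a safer condition.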
[Discussion]:
Tags: r selenium web-scraping