【Posted】: 2015-11-06 14:19:54
【Question】:
I built a web crawler with Selenium (selenium-server-standalone-2.47.1.jar) and PhantomJS (phantomjs -v returns 1.9.0 on Ubuntu 14.04). The code works on Windows 10 with both FirefoxDriver and PhantomJSDriver, but on Ubuntu 14.04 it works only with FirefoxDriver.
Sample code:
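    import java.util.List;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.phantomjs.PhantomJSDriver;
    import org.openqa.selenium.remote.DesiredCapabilities;

    // Class name taken from the stack trace (LinkScanner.main).
    public class LinkScanner {
        public static void main(String[] args) {
            // Tell GhostDriver where the PhantomJS binary lives on Ubuntu.
            DesiredCapabilities desiredCaps = new DesiredCapabilities();
            desiredCaps.setCapability("phantomjs.binary.path", "/usr/lib/phantomjs/phantomjs");
            WebDriver driver = new PhantomJSDriver(desiredCaps);
            try {
                String url = "https://xxx";
                driver.get(url);
                // This lookup is the call that throws NoSuchElementException on Ubuntu.
                WebElement rootWebElement = driver.findElement(By.id("main"));
                List<WebElement> parentElements = rootWebElement.findElements(By.tagName("li"));
                //243 , 240 (previous)
                for (int i = 106; i < parentElements.size(); i++) {
                    // findElement never returns null; it throws if nothing matches.
                    WebElement href = parentElements.get(i).findElement(By.tagName("a"));
                    String link = href.getAttribute("href");
                    // Scanner is my own class, not java.util.Scanner.
                    Scanner scanner = new Scanner(link);
                    try {
                        scanner.parseXML(link);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            } finally {
                // Shut down the PhantomJS process even when a lookup fails.
                driver.quit();
            }
        }
    }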
If you open the source of the given URL, you can easily see that a tag with id="main" exists.
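Since the element is present in the page source but a single findElement call fails after about 281 ms, one thing worth ruling out is timing: the lookup may run before PhantomJS has finished building the DOM. The following is only a minimal sketch of polling for a result instead of looking it up once. It uses the standard library alone so it can be run without a browser; the class name PollUntil and the method pollUntil are hypothetical names introduced for illustration and are not part of Selenium. It also assumes Java 8 or later for lambdas, whereas the trace above shows Java 1.7.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class PollUntil {
    // Hypothetical helper: repeats the lookup until it yields a value
    // or the timeout elapses, sleeping intervalMillis between attempts.
    public static <T> Optional<T> pollUntil(Supplier<Optional<T>> lookup,
                                            long timeoutMillis,
                                            long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            if (System.nanoTime() >= deadline) {
                return Optional.empty();
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up early.
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a page whose element only appears on the third lookup.
        final int[] attempts = {0};
        Optional<String> found = pollUntil(
                () -> ++attempts[0] >= 3 ? Optional.of("main") : Optional.<String>empty(),
                2000, 10);
        System.out.println(found.isPresent() + " after " + attempts[0] + " attempts");
        // prints "true after 3 attempts"
    }
}
```

With Selenium, the lookup could wrap driver.findElements(By.id("main")), which returns an empty list instead of throwing when nothing matches. Selenium's own WebDriverWait offers the same idea ready-made. If the element still never appears, the cause is probably not timing, and printing driver.getPageSource() would show what PhantomJS actually loaded.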
Stack trace:
PhantomJS is launching GhostDriver...
[INFO - 2015-08-13T14:15:57.720Z] GhostDriver - Main - running on port 8677
[INFO - 2015-08-13T14:15:58.361Z] Session [d17a3cc0-41c5-11e5-bedb-6fa39763a2c0] - CONSTRUCTOR - Desired Capabilities: {"phantomjs.binary.path":"/usr/lib/phantomjs/phantomjs"}
[INFO - 2015-08-13T14:15:58.370Z] Session [d17a3cc0-41c5-11e5-bedb-6fa39763a2c0] - CONSTRUCTOR - Negotiated Capabilities: {"browserName":"phantomjs","version":"1.9.0","driverName":"ghostdriver","driverVersion":"1.0.3","platform":"linux-unknown-32bit","javascriptEnabled":true,"takesScreenshot":true,"handlesAlerts":false,"databaseEnabled":false,"locationContextEnabled":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"cssSelectorsEnabled":true,"webStorageEnabled":false,"rotatable":false,"acceptSslCerts":false,"nativeEvents":true,"proxy":{"proxyType":"direct"}}
[INFO - 2015-08-13T14:15:58.371Z] SessionManagerReqHand - _postNewSessionCommand - New Session Created: d17a3cc0-41c5-11e5-bedb-6fa39763a2c0
Exception in thread "main" org.openqa.selenium.NoSuchElementException: Error Message => 'Unable to find element with id 'main''
 caused by Request => {"headers":{"Accept-Encoding":"gzip,deflate","Connection":"Keep-Alive","Content-Length":"29","Content-Type":"application/json; charset=utf-8","Host":"localhost:8677","User-Agent":"Apache-HttpClient/4.4.1 (Java/1.7.0_79)"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"id\",\"value\":\"main\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/d17a3cc0-41c5-11e5-bedb-6fa39763a2c0/element"}
Command duration or timeout: 281 milliseconds
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html
Build info: version: '2.47.1', revision: '411b314', time: '2015-07-30 03:03:16'
System info: host: 'Vmbox', ip: '127.0.1.1', os.name: 'Linux', os.arch: 'i386', os.version: '3.19.0-25-generic', java.version: '1.7.0_79'
*** Element info: {Using=id, value=main}
Session ID: d17a3cc0-41c5-11e5-bedb-6fa39763a2c0
Driver info: org.openqa.selenium.phantomjs.PhantomJSDriver
Capabilities [{platform=LINUX, acceptSslCerts=false, javascriptEnabled=true, browserName=phantomjs, rotatable=false, driverVersion=1.0.3, locationContextEnabled=false, version=1.9.0, cssSelectorsEnabled=true, databaseEnabled=false, handlesAlerts=false, browserConnectionEnabled=false, proxy={proxyType=direct}, nativeEvents=true, webStorageEnabled=false, driverName=ghostdriver, applicationCacheEnabled=false, takesScreenshot=true}]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.openqa.selenium.remote.ErrorHandler.createThrowable(ErrorHandler.java:206)
    at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:158)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:595)
    at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:348)
    at org.openqa.selenium.remote.RemoteWebDriver.findElementById(RemoteWebDriver.java:389)
    at org.openqa.selenium.By$ById.findElement(By.java:215)
    at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:340)
    at LinkScanner.main(LinkScanner.java:27)
Caused by: org.openqa.selenium.remote.ScreenshotException: Screen shot has been taken
Build info: version: '2.47.1', revision: '411b314', time: '2015-07-30 03:03:16'
System info: host: 'Vmbox', ip: '127.0.1.1', os.name: 'Linux', os.arch: 'i386', os.version: '3.19.0-25-generic', java.version: '1.7.0_79'
Driver info: driver.version: RemoteWebDriver
    at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:138)
    ... 6 more
Caused by: org.openqa.selenium.NoSuchElementException: Error Message => 'Unable to find element with id 'main''
 caused by Request => {"headers":{"Accept-Encoding":"gzip,deflate","Connection":"Keep-Alive","Content-Length":"29","Content-Type":"application/json; charset=utf-8","Host":"localhost:8677","User-Agent":"Apache-HttpClient/4.4.1 (Java/1.7.0_79)"},"httpVersion":"1.1","method":"POST","post":"{\"using\":\"id\",\"value\":\"main\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/d17a3cc0-41c5-11e5-bedb-6fa39763a2c0/element"}
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html
Build info: version: '2.47.1', revision: '411b314', time: '2015-07-30 03:03:16'
System info: host: 'Vmbox', ip: '127.0.1.1', os.name: 'Linux', os.arch: 'i386', os.version: '3.19.0-25-generic', java.version: '1.7.0_79'
Driver info: driver.version: unknown
【Discussion】:
Tags: java ubuntu selenium phantomjs nosuchelementexception