【Question title】: Selenium Throws SocketException Randomly
【Posted】: 2016-12-07 14:25:02
【Question】:

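I am trying to scrape about 1400 pages, but Selenium randomly throws a socket exception at around page 1000. I have tried Chrome, Firefox and PhantomJS, and none of them worked. PhantomJS cannot even handle the site properly, even though I set the javascriptenabled property to true, but that is a separate issue.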

Here is the log:

Ara 07, 2016 4:09:09 PM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.SocketException) caught when processing request to {}->http://localhost:1384: Permission denied: connect
Ara 07, 2016 4:09:09 PM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:1384
Exception in thread "main" org.openqa.selenium.WebDriverException: java.net.SocketException: Permission denied: connect
Build info: version: 'unknown', revision: '1969d75', time: '2016-10-18 09:43:45 -0700'
System info: host: 'DESKTOP-OA9G2Q7', ip: '192.168.1.7', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_101'
Driver info: driver.version: RemoteWebDriver
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:91)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:601)
    at org.openqa.selenium.remote.RemoteWebElement.execute(RemoteWebElement.java:274)
    at org.openqa.selenium.remote.RemoteWebElement.getAttribute(RemoteWebElement.java:126)
    at com.aliren.sp.scraping.Scraper.getRatioComparisonData(Scraper.java:227)
    at com.aliren.sp.scraping.Scraper.start(Scraper.java:136)
    at com.aliren.sp.scraping.Scraper.main(Scraper.java:104)
Caused by: java.net.SocketException: Permission denied: connect
    at java.net.DualStackPlainSocketImpl.connect0(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:83)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:74)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:141)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
    at org.openqa.selenium.remote.internal.ApacheHttpClient.fallBackExecute(ApacheHttpClient.java:142)
    at org.openqa.selenium.remote.internal.ApacheHttpClient.execute(ApacheHttpClient.java:88)
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:160)
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:82)
    ... 6 more

【Comments】:

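  • This is somewhat sporadic; I once ran into a similar problem. It might be something like a fail-safe that blocks repeated requests. Adding a wait between two requests may help. Alternatively, if it helps, try stackoverflow.com/a/7478027/5212566
  • Hmm, there is already a Thread.sleep() between URL changes. I'm not sure how the link you posted helps me. Could you explain?
  • Sorry, I shared the wrong thread. I meant to share stackoverflow.com/a/27949629/5212566. That user had a similar problem, though in a different context. The thread talks about "working around" it by simply catching the exception and retrying. In your case, since the failure is random, you could try catching the socket exception and retrying. It is not a fix, but it might work as a workaround.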
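The catch-and-retry workaround suggested in the last comment could look roughly like the sketch below. It is a hypothetical helper (the class `SocketRetry` and its methods are illustrative, not part of Selenium): it retries an action only when a `java.net.SocketException` appears somewhere in the exception's cause chain, which matches the log above, where `WebDriverException` (a `RuntimeException`) wraps the `SocketException`. The backoff delay and attempt limit are assumptions to tune.

```java
import java.net.SocketException;
import java.util.function.Supplier;

/** Hypothetical helper: retries an action when a SocketException is in its cause chain. */
public final class SocketRetry {

    private SocketRetry() {}

    /** Returns true if t or any of its causes is a SocketException. */
    static boolean causedBySocketException(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SocketException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs action up to maxAttempts times. Retries only RuntimeExceptions
     * (Selenium's WebDriverException is one) caused by a SocketException,
     * doubling the sleep after each failed attempt.
     */
    static <T> T withRetry(Supplier<T> action, int maxAttempts, long initialDelayMs)
            throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !causedBySocketException(e)) {
                    throw e; // not a socket problem, or out of attempts
                }
                Thread.sleep(delay);
                delay *= 2; // simple exponential backoff
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulates a call that fails twice with a wrapped SocketException, then succeeds.
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new RuntimeException(new SocketException("Permission denied: connect"));
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In the scraper, a call such as `element.getAttribute(...)` from the stack trace could be wrapped as `SocketRetry.withRetry(() -> element.getAttribute("href"), 3, 1000)`, where the attribute name is only an example. As the comment says, this hides the symptom rather than fixing the underlying connection failure.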

Tags: java selenium selenium-webdriver


【Solution 1】:

Set the JVM system property -Djava.net.preferIPv4Stack=true so that Java prefers the IPv4 stack. This solved my problem; my network supports IPv6.
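The usual way to apply this is to add `-Djava.net.preferIPv4Stack=true` to the `java` command that starts the scraper. If changing the launch command is inconvenient, a sketch of setting it from code is shown below, under the assumption that it runs first thing in `main`: the JVM reads this property when its networking code initialises, so setting it after any socket or `InetAddress` has been used likely has no effect. The class name `PreferIPv4` is hypothetical.

```java
/**
 * Hypothetical entry point that opts into the IPv4 stack before any
 * networking happens. The command-line flag
 * -Djava.net.preferIPv4Stack=true is the more reliable option, because
 * the property is only read when networking is first initialised.
 */
public final class PreferIPv4 {
    public static void main(String[] args) {
        // Must run before the WebDriver (or anything else) opens a socket.
        System.setProperty("java.net.preferIPv4Stack", "true");

        // ... create the WebDriver and start scraping after this point ...

        System.out.println("preferIPv4Stack=" + System.getProperty("java.net.preferIPv4Stack"));
    }
}
```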

【Comments】:
