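【Posted】: 2016-12-07 14:25:02
【Question】:
I'm trying to scrape about 1400 pages, but Selenium randomly throws a socket exception at around page 1000. I have tried Chrome, Firefox and PhantomJS, and none of them worked. PhantomJS can't even render the website properly, even though I set the javascriptEnabled property to true, but that's a separate issue.
Here is the log: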
Ara 07, 2016 4:09:09 PM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.SocketException) caught when processing request to {}->http://localhost:1384: Permission denied: connect
Ara 07, 2016 4:09:09 PM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:1384
Exception in thread "main" org.openqa.selenium.WebDriverException: java.net.SocketException: Permission denied: connect
Build info: version: 'unknown', revision: '1969d75', time: '2016-10-18 09:43:45 -0700'
System info: host: 'DESKTOP-OA9G2Q7', ip: '192.168.1.7', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_101'
Driver info: driver.version: RemoteWebDriver
at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:91)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:601)
at org.openqa.selenium.remote.RemoteWebElement.execute(RemoteWebElement.java:274)
at org.openqa.selenium.remote.RemoteWebElement.getAttribute(RemoteWebElement.java:126)
at com.aliren.sp.scraping.Scraper.getRatioComparisonData(Scraper.java:227)
at com.aliren.sp.scraping.Scraper.start(Scraper.java:136)
at com.aliren.sp.scraping.Scraper.main(Scraper.java:104)
Caused by: java.net.SocketException: Permission denied: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:83)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:74)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:141)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.openqa.selenium.remote.internal.ApacheHttpClient.fallBackExecute(ApacheHttpClient.java:142)
at org.openqa.selenium.remote.internal.ApacheHttpClient.execute(ApacheHttpClient.java:88)
at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:160)
at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:82)
... 6 more
【Comments】:
-
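This is somewhat sporadic; I ran into a similar problem once. It may be something like a fail-safe that blocks repeated requests, so adding a wait between two requests might help. Alternatively, see whether this helps: stackoverflow.com/a/7478027/5212566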
-
Hmm, there is already a Thread.sleep() between URL changes. I'm not sure how the link you posted helps me. Could you explain?
-
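Sorry, I shared the wrong thread. I meant to share stackoverflow.com/a/27949629/5212566. That user had a similar problem, but in a different context. The thread describes "getting around" it by simply catching the exception and retrying. Since the failure in your case is random, you could try catching the socket exception and retrying. It's not a fix, but I'd guess it may work as a workaround.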
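A minimal sketch of the catch-and-retry workaround from the last comment, combined with a growing wait between attempts as the first comment suggests. It assumes, as the log shows, that Selenium surfaces the failure as a WebDriverException (an unchecked RuntimeException) whose cause chain contains a java.net.SocketException. SocketRetry, withRetry and causedBySocketException are hypothetical helpers, not Selenium API; the demo simulates the failure with a plain RuntimeException so it runs without a browser.

```java
import java.net.SocketException;
import java.util.function.Supplier;

/**
 * Hypothetical retry helper (not part of Selenium). Re-runs an action when the
 * failure has a java.net.SocketException somewhere in its cause chain, waiting
 * a little longer before each new attempt. Any other exception is rethrown at once.
 */
public final class SocketRetry {

    private SocketRetry() {}

    /** Returns true if t or one of its causes (up to a fixed depth) is a SocketException. */
    public static boolean causedBySocketException(Throwable t) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < 20; cur = cur.getCause(), depth++) {
            if (cur instanceof SocketException) {
                return true;
            }
        }
        return false;
    }

    /** Runs action up to maxAttempts times, sleeping delayMillis * attempt between socket failures. */
    public static <T> T withRetry(Supplier<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!causedBySocketException(e)) {
                    throw e; // not a socket problem: fail fast
                }
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis * attempt); // linear backoff
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                        throw e; // give up with the original failure
                    }
                }
            }
        }
        throw last; // all attempts failed with a socket error
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                // Simulates Selenium: an unchecked exception wrapping a SocketException.
                throw new RuntimeException(new SocketException("Permission denied: connect"));
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```

In the scraper, a call such as the getAttribute at Scraper.java:227 could be wrapped roughly as SocketRetry.withRetry(() -> element.getAttribute("href"), 3, 2000), where element and "href" are placeholders. As the comment says, this only masks the symptom: if the exception keeps appearing at the same page count, the underlying cause is still worth tracking down.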
Tags: java selenium selenium-webdriver