Selenium throws an error getting the current URL

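【Question title】: Selenium throws an error getting the current URL
【Posted】: 2017-03-24 19:33:04
【Question description】:

To scrape data from a web page, I check the current URL to make sure I'm on the expected page. However, the script eventually throws an error, apparently while checking the URL. I can't figure out why, and when it happens is inconsistent: sometimes the script gets through many pages first, sometimes only a few.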

Traceback (most recent call last):
  File "scrape.py", line 5, in <module>
    scraper.start_search("ebook")
  File "/home/ubuntu/workspace/scraper/school/scraper.py", line 56, in start_search
    self.scrape_item(product_el)
  File "/home/ubuntu/workspace/scraper/school/scraper.py", line 97, in scrape_item
    if self.driver.current_url.split("/")[3] != "search":
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 493, in current_url
    return self.execute(Command.GET_CURRENT_URL)['value']
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 415, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 489, in _request
    resp = opener.open(request, timeout=self._timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>

The seemingly relevant code is just:

if self.driver.current_url.split("/")[3] != "search":
    time.sleep(random.randint(1, 3))
    self.driver.back()

I'm using Python 2.7, Selenium, and PhantomJS.
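
A side note on the check itself, separate from the connection error: `split("/")[3]` raises an IndexError when the URL has no path (for example `https://example.com`) and returns `search?q=x` rather than `search` when a query string follows directly. A minimal sketch of a more tolerant check is below. The helper name `first_path_segment` is hypothetical and not part of the original script, and the `example.com` URLs are placeholders.

```python
try:  # Python 2, as used in the question
    from urlparse import urlparse
except ImportError:  # Python 3
    from urllib.parse import urlparse


def first_path_segment(url):
    """Return the first non-empty path segment of `url`, or "" if none."""
    # urlparse separates the path from the scheme, host and query string
    path = urlparse(url).path
    segments = [part for part in path.split("/") if part]
    return segments[0] if segments else ""
```

With this helper, the condition could be written as `if first_path_segment(self.driver.current_url) != "search":`.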

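【Comments】:

Have you tried the other solutions on Stack Overflow for the keywords 111 Connection refused selenium python? There are 8 results, half of which have accepted answers.
@TemporalWolf Yes, none of those solutions helped me.
Have you verified that the URL is actually what you think it is? If you copy and paste that exact URL into a browser, does it work? Does the URL require authentication?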

Tags: python selenium phantomjs

【Solution 1】:

I don't know why this happens, though I have also seen current_url be flaky. Have you tried mitigating it with some exception handling?

import random
import time
from urllib2 import URLError

from retry import retry


# Try up to 3 times in total if reading the URL raises URLError
@retry(URLError, tries=3)
def get_url(driver):
    return driver.current_url


def main():
    # Whatever setup you have goes here; it must define `driver`
    # <...>

    if get_url(driver).split("/")[3] != "search":
        time.sleep(random.randint(1, 3))
        driver.back()


if __name__ == "__main__":
    main()

The retry package is available from PyPI.
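
If you would rather not add a dependency, the same idea can be sketched with a plain loop. This is only an illustration of the retry approach, not a fix for the root cause: the traceback shows the connection being refused on the client's HTTP request to the driver's server, so retrying helps only if that server starts answering again. The names `get_url_with_retry` and `FlakyDriver` are hypothetical; `FlakyDriver` is a stand-in that lets the sketch run without a browser.

```python
import time

try:  # Python 2, as used in the question
    from urllib2 import URLError
except ImportError:  # Python 3
    from urllib.error import URLError


def get_url_with_retry(driver, tries=3, delay=1.0):
    """Return driver.current_url, making at most `tries` attempts."""
    for attempt in range(1, tries + 1):
        try:
            return driver.current_url
        except URLError:
            if attempt == tries:
                raise  # out of attempts: surface the original error
            time.sleep(delay)  # brief pause before asking again


class FlakyDriver(object):
    """Hypothetical stand-in for a WebDriver whose first reads fail."""

    def __init__(self, failures, url):
        self.failures = failures  # how many reads should fail first
        self.url = url

    @property
    def current_url(self):
        if self.failures > 0:
            self.failures -= 1
            raise URLError("[Errno 111] Connection refused")
        return self.url
```

In the real script you would pass your own driver, for example `get_url_with_retry(self.driver, tries=3, delay=1.0)`. If the error still surfaces after the last attempt, the driver's server is probably gone for good and the driver needs to be recreated rather than retried.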

【Comments】: