【Question Title】: Selenium throws an error getting the current URL
【Posted】: 2017-03-24 19:33:04
【Question Description】:

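To scrape data from a web page, I check the current URL to make sure I am on the expected page. However, the script eventually raises an error, and it seems to happen while checking the URL. I can't figure out why, and when it happens is inconsistent: sometimes the script gets through several pages, sometimes only a few.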

Traceback (most recent call last):
  File "scrape.py", line 5, in <module>
    scraper.start_search("ebook")
  File "/home/ubuntu/workspace/scraper/school/scraper.py", line 56, in start_search
    self.scrape_item(product_el)
  File "/home/ubuntu/workspace/scraper/school/scraper.py", line 97, in scrape_item
    if self.driver.current_url.split("/")[3] != "search":
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 493, in current_url
    return self.execute(Command.GET_CURRENT_URL)['value']
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 415, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 489, in _request
    resp = opener.open(request, timeout=self._timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>
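The traceback shows that reading current_url is not a local attribute lookup: Selenium sends a GET_CURRENT_URL command over HTTP (remote_connection.py calling urllib2), and [Errno 111] Connection refused means nothing was accepting connections at the driver server's address when that request was made. A minimal diagnostic sketch, assuming you know the host and port your PhantomJS driver server listens on (how you find them depends on your setup; is_port_open is a hypothetical helper), checks whether that port still accepts TCP connections:

```python
import errno
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds,
    False if it is refused (ECONNREFUSED, Errno 111 on Linux)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return True
    except socket.error as exc:
        if exc.errno == errno.ECONNREFUSED:
            return False
        raise  # timeouts and other errors are not "refused"; let them surface
    finally:
        sock.close()
```

If this returns False right after the error, the driver process has most likely stopped, in which case retrying on the same driver object will keep failing.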

The only code that seems relevant is:

import random
import time

if self.driver.current_url.split("/")[3] != "search":  # not on the search page
    time.sleep(random.randint(1, 3))
    self.driver.back()

I'm using Python 2.7, Selenium and PhantomJS.
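Separately from the Errno 111 error, the split("/")[3] check assumes every URL has the form scheme://host/segment/...: for https://example.com/search?q=ebook it returns "search?q=ebook", and for a bare https://example.com it raises IndexError. A sketch of a sturdier check, parsing the path with urlparse (the helper names and example.com URLs are illustrative, not from the original code):

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def first_path_segment(url):
    """Return the first non-empty path segment of url, or '' if there is none."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else ""


def on_search_page(url):
    # Query strings and fragments are excluded by urlparse, so
    # ".../search?q=ebook" still counts as the search page.
    return first_path_segment(url) == "search"
```

In the scraper this would replace the split-based test, e.g. `if not on_search_page(self.driver.current_url):`.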

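【Question Discussion】:

  • Have you tried the other solutions on Stack Overflow for the search 111 Connection refused selenium python? There are 8 results, half of which have accepted answers.
  • @TemporalWolf Yes, none of those solutions helped me.
  • Have you verified that the URL is actually what you think it is? If you copy and paste that exact URL into a browser, does it work? Does the URL require authentication?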

Tags: python selenium phantomjs


【Solution 1】:

I don't know why this happens, although I have also seen current_url be flaky. Have you tried mitigating it with some exception handling?

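import random
import time
from urllib2 import URLError  # Python 2.7 (Python 3: urllib.error)

from retry import retry  # third-party: pip install retry
from selenium import webdriver


@retry(URLError, tries=3)
def get_url(driver):
    # Read current_url, making up to 3 attempts in total if it raises URLError
    return driver.current_url


def main():
    # Whatever setup you have goes here, e.g.:
    driver = webdriver.PhantomJS()
    # <...>

    if get_url(driver).split("/")[3] != "search":
        time.sleep(random.randint(1, 3))
        driver.back()


if __name__ == "__main__":
    main()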

The retry package is available from PyPI (pip install retry).
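If you would rather not add a dependency, the same idea can be sketched with the standard library alone. retry_on below is a hypothetical helper, not part of the retry package: it makes at most tries attempts, sleeps delay seconds between failed attempts, and re-raises the last exception. Retrying only helps when the failure is transient; if the driver server has actually stopped, every attempt will be refused.

```python
import functools
import time

try:
    from urllib.error import URLError  # Python 3
except ImportError:
    from urllib2 import URLError  # Python 2.7


def retry_on(exc_type, tries=3, delay=1.0):
    """Retry the wrapped call on exc_type, making at most `tries` attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)  # brief pause before the next attempt
        return wrapper
    return decorator


@retry_on(URLError, tries=3, delay=1.0)
def get_url(driver):
    return driver.current_url
```

Unlike the answer's version, this one pauses between attempts, which gives a briefly busy driver server a chance to respond.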
