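【Posted】: 2017-03-24 19:33:04
【Question】:
To scrape data from a web page, I check the current URL to make sure I am on the expected page. However, the script eventually raises an error, apparently while checking the URL. I can't work out why, and the point at which it fails is inconsistent: sometimes it happens many pages into the run, sometimes after only a few.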
Traceback (most recent call last):
  File "scrape.py", line 5, in <module>
    scraper.start_search("ebook")
  File "/home/ubuntu/workspace/scraper/school/scraper.py", line 56, in start_search
    self.scrape_item(product_el)
  File "/home/ubuntu/workspace/scraper/school/scraper.py", line 97, in scrape_item
    if self.driver.current_url.split("/")[3] != "search":
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 493, in current_url
    return self.execute(Command.GET_CURRENT_URL)['value']
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 415, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 489, in _request
    resp = opener.open(request, timeout=self._timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>
The only code that seems relevant is:
if self.driver.current_url.split("/")[3] != "search":
    time.sleep(random.randint(1, 3))
    self.driver.back()
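Separately from the connection error, the check above indexes into the split URL, which raises IndexError when the URL has no path segment at all. Here is a minimal sketch of a more defensive check, assuming only that search pages have "search" as their first path segment, as the code above implies. The helper name first_path_segment is hypothetical and not part of Selenium.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7, as used in the question


def first_path_segment(url):
    """Return the first path segment of url, or "" if there is none."""
    path = urlparse(url).path
    # Drop empty pieces produced by leading, trailing, or doubled slashes.
    segments = [part for part in path.split("/") if part]
    return segments[0] if segments else ""
```

With this helper the condition would read: if first_path_segment(self.driver.current_url) != "search". Unlike split("/")[3], it ignores any query string, so a URL whose path is /search followed by ?q=... still counts as a search page.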
I am using Python 2.7, Selenium, and PhantomJS.
【Comments】:
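Errno 111 in the traceback is raised when Selenium sends its HTTP request to the local driver process, not to the site being scraped. The most likely cause is that the PhantomJS process has stopped listening. Here is a minimal sketch of catching that case and recovering, assuming the failure surfaces as URLError as it does in the traceback; other Selenium versions may raise a different exception. The names read_current_url, get_driver, and restart are hypothetical and not part of Selenium.

```python
try:
    from urllib.error import URLError  # Python 3
except ImportError:
    from urllib2 import URLError  # Python 2.7, as in the traceback


def read_current_url(get_driver, restart, attempts=2):
    """Read driver.current_url, restarting the driver if the connection is refused.

    get_driver and restart are hypothetical callables supplied by the caller:
    get_driver() returns the current WebDriver-like object, and restart()
    replaces it with a fresh one (for example by quitting the old driver,
    creating a new one, and reloading the last known page).
    """
    for attempt in range(attempts):
        try:
            return get_driver().current_url
        except URLError:
            # The driver process is unreachable; re-raise on the last attempt.
            if attempt == attempts - 1:
                raise
            restart()
```

Logging each restart along with the page being scraped at the time would also show whether the driver dies on particular pages or after a certain amount of work.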
-
Have you tried the other solutions on Stack Overflow for the keywords "111 Connection refused selenium python"? There are 8 results, and half of them have accepted answers.
-
@TemporalWolf Yes, none of those solutions helped me.
-
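Have you verified that the URL is actually what you think it is? If you copy and paste that exact URL into a browser, does it work? Does the URL require authentication?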