【Question title】:Errno 61 When Web Scraping Using Selenium Python
【Posted】:2018-03-01 01:43:00
【Question】:

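When I run the following code, from a class I am writing to scrape Craigslist.org, I keep getting socket.error 61. I have tried various versions of Chromedriver and PhantomJS, but I can't make it go away. At first I thought my IP had been flagged, so I rotated through proxies, but that didn't help. I'm sure it is something simple, but I can't figure out what it is. Any help would be greatly appreciated!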

 def __init__(self):

    self.user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36'
    self.options = webdriver.ChromeOptions()
    self.options.add_argument('headless')
    self.options.add_argument('--proxy-server=http://12.221.240.25:8080')
    self.options.add_argument('user-agent={self.user_agent}')
    self.current_region = ''
    self.driver = webdriver.Chrome()
    self.driver.get('https://craigslist.org')
    self.proxy_list = ['208.95.62.81:3128', '208.95.62.80:3128', '159.203.181.50:3128', '35.196.26.166:3128']

 def scrape_test(self):
    self.scraper_wait(self.driver, '//*[@id="rightbar"]')
    rightbar = self.driver.find_element_by_xpath('//*[@id="rightbar"]')
    nearby_cl = rightbar.find_element_by_xpath('//*[@id="rightbar"]/ul/li[1]')
    while True:
        child_items = nearby_cl.find_elements_by_class_name('s')
        random = randint(1, len(child_items))
        try:
            time.sleep(10)
            print("Clicking {}".format(child_items[random].text))
            child_items[random].click()
            housing = self.driver.find_element_by_xpath('//*[@id="hhh"]/h4/a')
            housing.click()
            self.driver.back()
            time.sleep(5)
        except WebDriverException:
            continue
        except Exception as e:
            print(e.message)
            return
        finally:
            self.driver.quit()

The stack trace is as follows:

    File "scraper.py", line 131, in <module>
      cl.scrape_test()
    File "scraper.py", line 81, in scrape_test
      child_items = nearby_cl.find_elements_by_class_name('s')
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 299, in find_elements_by_class_name
      return self.find_elements(by=By.CLASS_NAME, value=name)
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 527, in find_elements
      {"using": by, "value": value})['value']
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 493, in _execute
      return self._parent.execute(command, params)
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 250, in execute
      response = self.command_executor.execute(driver_command, params)
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 464, in execute
      return self._request(command_info[0], url, body=data)
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 487, in _request
      self._conn.request(method, parsed_url.path, body, headers)
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1057, in request
      self._send_request(method, url, body, headers)
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1097, in _send_request
      self.endheaders(body)
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1053, in endheaders
      self._send_output(message_body)
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 897, in _send_output
      self.send(msg)
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 859, in send
      self.connect()
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 836, in connect
      self.timeout, self.source_address)
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 575, in create_connection
      raise err
    socket.error: [Errno 61] Connection refused

【Question comments】:

    Tags: python selenium selenium-chromedriver


    【Solution 1】:

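    The finally block sits inside the while loop, so you tear down the driver at the end of the first pass through the loop, before you have finished using it.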
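Why an early quit shows up as `[Errno 61] Connection refused`: Selenium sends each command to the chromedriver process over a local HTTP connection (the traceback passes through `httplib` and `socket.create_connection`), and once that process has exited, nothing is listening on its port. The sketch below reproduces the same class of error with plain sockets. It assumes Python 3 (the question uses 2.7), and the helper names `find_closed_port` and `try_connect` are made up for illustration. The numeric value of `ECONNREFUSED` is platform-dependent: 61 on macOS, 111 on Linux.

```python
import errno
import socket


def find_closed_port():
    """Bind to an ephemeral port, then close it so nothing listens there."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    port = sock.getsockname()[1]
    sock.close()
    return port


def try_connect(port):
    """Return None if the connection succeeds, else the errno of the failure."""
    try:
        conn = socket.create_connection(("127.0.0.1", port), timeout=2)
    except OSError as exc:
        return exc.errno
    conn.close()
    return None


closed_port = find_closed_port()
result = try_connect(closed_port)

# Compare against the symbolic constant rather than a hard-coded 61.
print("connection refused:", result == errno.ECONNREFUSED)
```

In the question's code, the second loop iteration is in the same position as `try_connect` here: it sends a command to a driver process that is already gone.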

    Instead, move the call to driver.quit() to a place where you are sure you are done with the driver, for example:

    def scrape_test(self):
        try:
            # ...
            while True:
                # ...
        finally:
            self.driver.quit()
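The difference between the two placements of `finally` can be checked without a browser. The following is only a sketch, assuming Python 3: `FakeDriver` is a hypothetical stand-in for `webdriver.Chrome` (not part of Selenium) that raises `ConnectionRefusedError` when used after `quit()`, and the two `scrape_*` functions are illustrative names, not the asker's methods.

```python
import errno


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome; records the calls it gets."""

    def __init__(self):
        self.alive = True
        self.calls = []

    def find_items(self):
        # A real driver would issue an HTTP request to chromedriver here.
        if not self.alive:
            raise ConnectionRefusedError(errno.ECONNREFUSED, "Connection refused")
        self.calls.append("find_items")
        return ["a", "b", "c"]

    def quit(self):
        self.calls.append("quit")
        self.alive = False


def scrape_quit_inside_loop(driver, rounds):
    """Mirrors the question: finally runs at the end of every iteration."""
    for _ in range(rounds):
        try:
            driver.find_items()
        finally:
            driver.quit()


def scrape_quit_after_loop(driver, rounds):
    """Mirrors the answer: a single try/finally wraps the whole loop."""
    try:
        for _ in range(rounds):
            driver.find_items()
    finally:
        driver.quit()


broken = FakeDriver()
try:
    scrape_quit_inside_loop(broken, rounds=3)
    broken_error = None
except ConnectionRefusedError as exc:
    broken_error = exc

fixed = FakeDriver()
scrape_quit_after_loop(fixed, rounds=3)

print("inside loop:", broken.calls, "error:", broken_error)
print("after loop: ", fixed.calls)
```

Note that `finally` runs on every exit from its `try` block, including `continue` and `return`, so in the question's loop the driver is shut down even when the `except WebDriverException: continue` branch is taken.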
    

    【Comments】:

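    • Oh... my... word. Thank you so much! It was so obvious, and it was driving me crazy! This is why having another pair of eyes is worth it. I'll try it when I get home!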