【Question title】:How to write a selenium loop in Python?
【Posted】:2018-01-01 13:35:48
【Question】:

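I want to scrape data from many different websites that contain JavaScript code (which is why I use Selenium to get the information). Everything works fine, but when I try to load the next URL, I get a very long error message: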

> Traceback (most recent call last):
  File "C:/Python27/air17.py", line 46, in <module>
    scrape(urls)
  File "C:/Python27/air17.py", line 28, in scrape
    browser.get(url)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 268, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 254, in execute
    response = self.command_executor.execute(driver_command, params)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 464, in execute
    return self._request(command_info[0], url, body=data)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 487, in _request
    self._conn.request(method, parsed_url.path, body, headers)
  File "C:\Python27\lib\httplib.py", line 1042, in request
    self._send_request(method, url, body, headers)
  File "C:\Python27\lib\httplib.py", line 1082, in _send_request
    self.endheaders(body)
  File "C:\Python27\lib\httplib.py", line 1038, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 882, in _send_output
    self.send(msg)
  File "C:\Python27\lib\httplib.py", line 844, in send
    self.connect()
  File "C:\Python27\lib\httplib.py", line 821, in connect
    self.timeout, self.source_address)
  File "C:\Python27\lib\socket.py", line 575, in create_connection
    raise err
error: [Errno 10061] 

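The data from the first website ends up in the csv file, but when the code tries to open the next website it freezes and I get this error message. What am I doing wrong?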

from bs4 import BeautifulSoup
from selenium import webdriver
import time
import urllib2
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
from bs4 import BeautifulSoup
import MySQLdb
import re
import contextlib
import selenium.webdriver.support.ui as ui

filename=r'output.csv'

resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')
output.writerow(['TIME','FLIGHT','FROM','AIRLANE','AIRCRAFT','IHAVETODELETETHIS','STATUS'])


def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup=BeautifulSoup(html,"html.parser")
        table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
        datatable=[]
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            datatable.append(temp_data)

        output.writerows(datatable)

        resultcsv.close()
        time.sleep(10) 
        browser.quit()

urls = ["https://www.flightradar24.com/data/airports/bud/arrivals", "https://www.flightradar24.com/data/airports/fco/arrivals"]
scrape(urls)

【Comments】:

  • These should also be outside the loop (one tab less): resultcsv.close() browser.quit()
  • That's the solution! Thanks, it's running! :)

Tags: python loops csv selenium


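【Solution 1】:

I'm not sure that calling browser.quit() at the end of the method is a good idea. According to the Selenium doc:

quit()

Quits the driver and closes every associated window.

I think browser.close() (as documented in the Selenium docs) is enough inside the loop. Keep browser.quit() outside the loop.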

【Comments】:

  • I don't think even browser.close() is needed in the loop
  • Indeed, quit is killing the webdriver
  • @CrazyElf Closing the current page is cleaner and frees memory.
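
A minimal sketch of the fix discussed above: start the browser once, reuse it for every URL, and call quit() a single time after the loop. It is written for Python 3 with the standard csv module rather than unicodecsv. The names make_browser and extract_rows are hypothetical parameters introduced only for this illustration; they stand in for webdriver.Firefox and for the BeautifulSoup parsing in the question, so the loop structure can be shown without a real browser.

```python
import csv


def scrape(urls, make_browser, extract_rows, out_file):
    """Visit every URL with a single browser and write all rows to one CSV.

    make_browser and extract_rows are hypothetical parameters used for this
    sketch: the first returns a WebDriver-like object, the second turns a
    page's HTML into a list of rows.
    """
    writer = csv.writer(out_file, delimiter=';', quotechar='"',
                        quoting=csv.QUOTE_NONNUMERIC)
    browser = make_browser()      # started once, before the loop
    try:
        for url in urls:
            browser.get(url)      # the same session is reused for each URL
            writer.writerows(extract_rows(browser.page_source))
    finally:
        browser.quit()            # runs once, after the last URL or on error


# Possible usage with the real driver (assumes selenium and Firefox are
# installed, and that parse_table wraps the parsing code from the question):
#
#   from selenium import webdriver
#   with open("output.csv", "w", newline="", encoding="latin-1") as f:
#       scrape(urls, webdriver.Firefox, parse_table, f)
```

Opening the file in a with block closes it after the whole loop has finished, so the second URL no longer writes to a file that is already closed. The try/finally is a design choice rather than something from the thread: it keeps a stray Firefox process from lingering if one of the pages fails to load.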