【发布时间】:2018-01-01 13:35:48
【问题描述】:
我想从包含 javascript 代码的许多不同网站上抓取数据(这就是为什么我使用 selenium 方法来获取信息的原因)。 一切都很好,但是当我尝试加载下一个 URL 时,我收到一条很长的错误消息:
> Traceback (most recent call last):
File "C:/Python27/air17.py", line 46, in <module>
scrape(urls)
File "C:/Python27/air17.py", line 28, in scrape
browser.get(url)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 268, in get
self.execute(Command.GET, {'url': url})
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 254, in execute
response = self.command_executor.execute(driver_command, params)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 464, in execute
return self._request(command_info[0], url, body=data)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 487, in _request
self._conn.request(method, parsed_url.path, body, headers)
File "C:\Python27\lib\httplib.py", line 1042, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 1082, in _send_request
self.endheaders(body)
File "C:\Python27\lib\httplib.py", line 1038, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 882, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 844, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 821, in connect
self.timeout, self.source_address)
File "C:\Python27\lib\socket.py", line 575, in create_connection
raise err
error: [Errno 10061]
第一个网站的数据在 csv 文件中,但是当代码尝试打开下一个网站时,它会冻结,并且我收到此错误消息。 我做错了什么?
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import urllib2
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
from bs4 import BeautifulSoup
import MySQLdb
import re
import contextlib
import selenium.webdriver.support.ui as ui
filename=r'output.csv'
resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')
output.writerow(['TIME','FLIGHT','FROM','AIRLANE','AIRCRAFT','IHAVETODELETETHIS','STATUS'])
def scrape(urls):
browser = webdriver.Firefox()
for url in urls:
browser.get(url)
html = browser.page_source
soup=BeautifulSoup(html,"html.parser")
table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
datatable=[]
for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
temp_data = []
for data in record.find_all("td"):
temp_data.append(data.text.encode('latin-1'))
datatable.append(temp_data)
output.writerows(datatable)
resultcsv.close()
time.sleep(10)
browser.quit()
urls = ["https://www.flightradar24.com/data/airports/bud/arrivals", "https://www.flightradar24.com/data/airports/fco/arrivals"]
scrape(urls)
【问题讨论】:
-
这些也超出了循环(少一个标签):resultcsv.close() browser.quit()
-
这就是解决方案!谢谢,正在运行! :)