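[Posted]: 2022-01-05 20:30:44
[Question]:
I want to scrape https://www.airport-data.com/manuf/Reims.html, iterate over all of its pages, and write the results to AircraftListing.csv.
The code runs without errors, but the output is populated incorrectly: not every record from the web pages ends up in the .csv file.
How can I export all Reims aircraft records to AircraftListing.csv?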
import requests
from bs4 import BeautifulSoup
import csv

root_url = "https://www.airport-data.com/manuf/Reims.html"
html = requests.get(root_url)
soup = BeautifulSoup(html.text, 'html.parser')
paging = soup.find("table",{"class":"table table-bordered table-condensed"}).find_all("td")
start_page = paging[1].text
last_page = paging[len(paging)-2].text

outfile = open('AircraftListing.csv','w', newline='')
writer = csv.writer(outfile)
writer.writerow(["Tail_Number", "Year_Maker_Model", "C_N","Engines", "Seats", "Location"])

pages = list(range(1,int(last_page)+1))
for page in pages:
    url = 'https://www.airport-data.com/manuf/Reims:%s.html' %(page)
    html = requests.get(url)
    soup = BeautifulSoup(html.text, 'html.parser')
    print('https://www.airport-data.com/manuf/Reims:%s' %(page))
    product_name_list = soup.find("table",{"class":"table table-bordered table-condensed"}).find_all("td")
    # Each row has 6 elements in it.
    # Loop through every sixth element. (The first element of each row)
    # Get all the other elements in the row by adding to index of the first.
    for i in range(int(len(product_name_list)/6)):
        Tail_Number = product_name_list[(i*6)].get_text('td')
        Year_Maker_Model = product_name_list[(i*6)+1].get_text('td')
        C_N = product_name_list[(i*6)+2].get_text('td')
        Engines = product_name_list[(i*6)+3].get_text('td')
        Seats = product_name_list[(i*6)+4].get_text('td')
        Location = product_name_list[(i*6)+5].get_text('td')
        writer.writerow([Tail_Number, Year_Maker_Model, C_N, Engines, Seats, Location])

outfile.close()
print('Done')
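Two details in the loop above may explain the misfilled output, though this is a guess made without inspecting the live page. First, the first positional argument of Beautiful Soup's `get_text()` is a separator string, so `get_text('td')` inserts the literal text `td` between text fragments of a cell that contains nested tags. Second, slicing one flat list of `td` cells into groups of six shifts every later record as soon as any row has a different number of cells. The sketch below walks the table one `tr` at a time instead and skips rows that do not have six cells. `SAMPLE_HTML`, `extract_rows`, and `rows_to_csv` are hypothetical names, and the sample markup is invented for illustration rather than copied from airport-data.com.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not the real page source).
SAMPLE_HTML = """
<table class="table table-bordered table-condensed">
  <tr><th>Tail Number</th><th>Year Maker Model</th><th>C/N</th>
      <th>Engines</th><th>Seats</th><th>Location</th></tr>
  <tr><td><a href="#">F-ABCD</a></td><td>1975 Reims <b>F172M</b></td>
      <td>1234</td><td>1</td><td>4</td><td>France</td></tr>
  <tr><td colspan="6">advertisement</td></tr>
  <tr><td><a href="#">G-WXYZ</a></td><td>1980 Reims F152</td>
      <td>5678</td><td>1</td><td>2</td><td>United Kingdom</td></tr>
</table>
"""

HEADER = ["Tail_Number", "Year_Maker_Model", "C_N", "Engines", "Seats", "Location"]


def extract_rows(html, expected_cells=6):
    """Return one list of cell texts per table row that has the expected width."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="table-bordered")
    rows = []
    if table is None:
        return rows
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        # Header rows (th only) and spanning rows are skipped instead of
        # shifting the column alignment of the records that follow them.
        if len(cells) != expected_cells:
            continue
        # " " is the separator between text fragments; strip=True trims each one.
        rows.append([cell.get_text(" ", strip=True) for cell in cells])
    return rows


def rows_to_csv(rows):
    """Serialize the header and rows to CSV text held in memory."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(extract_rows(SAMPLE_HTML)))
```

In the original script, the same per-row loop could replace the `range(int(len(product_name_list)/6))` indexing, with `writer.writerow(...)` called once per accepted row. Whether the real table contains such irregular rows is something to confirm by inspecting the page.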
[Discussion]:
Tags: python csv web-scraping beautifulsoup pagination