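[Posted]: 2022-01-14 02:39:50
[Question]:
This code does not crash, which is good. However, it creates icao_publications.csv and leaves it empty. I want to fill icao_publications.csv with every record from every page at the URL. The full dataset should be roughly 10,000 rows, and I want all of those rows in the CSV file.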
import requests, csv
from bs4 import BeautifulSoup

url = 'https://www.icao.int/publications/DOC8643/Pages/Search.aspx'

with open('Test1_Aircraft_Type_Designators.csv', "w", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Manufacturers", "Model", "Type_Designator", "Description", "Engine_Type", "Engine_Count", "WTC"])
    while True:
        html = requests.get(url)
        soup = BeautifulSoup(html.text, 'html.parser')
        for row in soup.select('table tbody tr'):
            writer.writerow([c.text if c.text else '' for c in row.select('td')])
        if soup.select_one('li.paginate_button.active + li a'):
            url = soup.select_one('li.paginate_button.active + li a')['href']
        else:
            break
[Discussion]:
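A possible first diagnostic, offered as a sketch rather than a confirmed explanation: if the table rows are filled in by JavaScript after the page loads (an assumption, not verified here), the HTML that `requests.get` returns would contain no `<tr>` inside `<tbody>`, so the inner `for` loop would never write anything. The helper below counts such rows in a string of HTML. It uses only the standard library so it runs without extra dependencies; `RowCounter`, `count_table_rows`, and the two sample strings are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> start tags that appear inside a <tbody> inside a <table>."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0
        self.tbody_depth = 0
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "tbody" and self.table_depth:
            self.tbody_depth += 1
        elif tag == "tr" and self.tbody_depth:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1
        elif tag == "tbody" and self.tbody_depth:
            self.tbody_depth -= 1


def count_table_rows(html_text):
    """Return how many body rows the given HTML text contains."""
    parser = RowCounter()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hypothetical sample markup, not taken from the real page.
EMPTY_SHELL = "<table><thead><tr><th>Model</th></tr></thead><tbody></tbody></table>"
FILLED = "<table><tbody><tr><td>A320</td></tr><tr><td>B738</td></tr></tbody></table>"

print(count_table_rows(EMPTY_SHELL))  # 0
print(count_table_rows(FILLED))       # 2
```

To try this against the live page, pass `requests.get(url).text` to `count_table_rows`. A result of 0 would suggest the rows are not in the static HTML, in which case the browser's network tab may show the request that actually delivers the data.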
Tags: pandas csv web-scraping beautifulsoup pagination