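【Posted】: 2020-02-23 09:06:56
【Problem description】:
I am trying to scrape a table that spans multiple pages and export it to a CSV file. Only one row of data seems to be exported, and it comes out jumbled.
I have searched online and tried many iterations, and I am now quite frustrated. As you can probably tell from the code, I am new to coding!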
import bs4 as bs
import urllib.request
import pandas as pd
import csv

max_page_num = 14
max_page_dig = 1 # number of digits in the page number

with open('result.csv',"w") as f:
    f.write("Name, Gender, State, Position, Grad, Club/HS, Rating, Commitment \n")

for i in range(0, max_page_num):
    page_num = (max_page_dig - len(str(i))) * "0" +str(i) #gives a string in the format of 1, 01 or 001, 005 etc
    print(page_num)
    source = "https://www.topdrawersoccer.com/search/?query=&divisionId=&genderId=m&graduationYear=2020&positionId=0&playerRating=&stateId=All&pageNo=" + page_num + "&area=commitments"
    print(source)
    url = urllib.request.urlopen(source).read()
    soup = bs.BeautifulSoup(url,'lxml')
    table = soup.find('table')
    table_rows = table.find_all('tr')
    for tr in table_rows:
        td = tr.find_all('td')
        row = [i.text for i in td]
        #final = row.strip("\n")
        #final = row.replace("\n","")
        with open('result.csv', 'a') as f:
            f.write(row)
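It seems that each write to the CSV file overwrites the previous one. It also puts everything on a single line and concatenates the player name with the school name. Thanks for all your help.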
【Discussion】:
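A few things in the posted code probably explain the symptoms. `row` is a list, but `f.write()` only accepts a string, so the commented-out `row.strip(...)` and `row.replace(...)` attempts cannot work on it either. The cell text also still contains the newlines from the page markup. Below is a minimal sketch of the writing side only, using the standard `csv` module. The names `clean_cell` and `write_rows` are made up for illustration, and the sample row is invented data standing in for the scraped cells, not real output from the site.

```python
import csv

# Same columns as the header line in the question.
HEADER = ["Name", "Gender", "State", "Position", "Grad",
          "Club/HS", "Rating", "Commitment"]


def clean_cell(text):
    """Collapse newlines and runs of whitespace in one cell into single spaces."""
    return " ".join(text.split())


def write_rows(path, rows):
    """Write the header and every non-empty row to `path`, opening the file once."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        for row in rows:
            cells = [clean_cell(cell) for cell in row]
            # A <tr> that holds only <th> cells gives an empty list; skip it.
            if any(cells):
                writer.writerow(cells)


if __name__ == "__main__":
    # Hypothetical sample data in place of the scraped <td> texts.
    sample = [
        [],
        ["John\n Doe", "M", "CA", "Forward", "2020",
         "Example Club", "4", "Example University"],
    ]
    write_rows("result.csv", sample)
```

In the scraping loop, the idea would be to collect `[td.get_text(" ", strip=True) for td in tr.find_all("td")]` for every row on every page into one list and call `write_rows` once at the end. Passing a separator to `get_text` may also stop the player name and school name from running together, assuming they sit in separate nested tags inside the same cell. `csv.writer` quotes any field that contains a comma, so a club name with a comma in it will not shift the columns.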
Tags: python-3.x web-scraping export-to-csv multipage