【Posted】:2021-08-18 14:53:41
【Question】:
I scraped data from an HTML table, but the code fails with the error "cannot set a row with mismatched columns".
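import requests
from bs4 import BeautifulSoup
import pandas as pd

# Request headers, kept separate from the table's column names below
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3'}
r = requests.get('https://jleague.co/clubs/sapporo/player/', headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
table = soup.find('table', class_='commonTable playerData')

# Column names come from the text of the <th> cells
columns = []
for i in table.find_all('th'):
    title = i.text.strip()
    columns.append(title)  # store the header text, not the table element
df = pd.DataFrame(columns=columns)

for row in table.find_all('tr')[1:]:
    data = row.find_all('td')
    row_data = [td.text.strip() for td in data]
    # Assigning a row whose length differs from the column count raises
    # "cannot set a row with mismatched columns", so skip such rows
    if len(row_data) != len(columns):
        continue
    length = len(df)
    df.loc[length] = row_data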
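For context on the error: `df.loc[len(df)] = row_data` only works when `row_data` has exactly as many items as the DataFrame has columns, so any `tr` whose `td` count differs from the number of `th` cells triggers it. Below is a minimal sketch of one way to guard against that, assuming short rows can be padded with empty strings and rows without any cells can be dropped. The helper name `fit_rows` and the sample values are hypothetical and not taken from the page.

```python
def fit_rows(rows, width, fill=""):
    """Return the rows padded or truncated to exactly `width` cells.

    Rows with no cells at all (for example a header-only <tr>) are dropped.
    """
    fitted = []
    for row in rows:
        if not row:
            continue  # nothing to store for an empty row
        # Pad short rows, then cut anything beyond the header width.
        fitted.append((list(row) + [fill] * width)[:width])
    return fitted


# Hypothetical cell values, standing in for the scraped <td> texts.
columns = ["No.", "Name", "Position"]
raw_rows = [[], ["1", "Player A", "GK"], ["2", "Player B"], ["3", "Player C", "DF", "extra"]]
rows = fit_rows(raw_rows, len(columns))
```

The fitted rows can then be passed to `pd.DataFrame(rows, columns=columns)` in one call, which also avoids growing the frame one row at a time. Truncation hides data, so printing the offending rows first may be the better choice while debugging.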
【Comments】:
-
I would like to get the output in CSV format.
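For the CSV output, `df.to_csv('players.csv', index=False)` is the usual pandas call once the DataFrame is filled. If the DataFrame is not needed otherwise, the standard-library `csv` module can write the rows directly. Here is a small sketch, assuming the headers and rows are already lists of strings. The function name `write_csv`, the file name `players.csv`, and the sample values are placeholders.

```python
import csv
import io


def write_csv(stream, columns, rows):
    """Write a header line followed by the data rows to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(columns)
    writer.writerows(rows)


# Placeholder data standing in for the scraped table.
columns = ["No.", "Name"]
rows = [["1", "Player A"], ["2", "Player B"]]

# Write to an in-memory buffer here; for a real file, use
# open("players.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_csv(buffer, columns, rows)
csv_text = buffer.getvalue()
```

Passing `newline=""` when opening a real file lets the `csv` module control line endings itself, which avoids blank lines between rows on Windows.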
Tags: python html web-scraping beautifulsoup