【Posted】: 2021-05-14 03:13:37
【Question】:
I've just started learning web scraping, and 30 minutes in I ran into a problem while scraping a table from a wiki page.
import requests
from bs4 import BeautifulSoup
import pandas as pd

start_url = 'https://en.wikipedia.org/wiki/The_Avengers_(2012_film)#Sequels'
downloaded_html = requests.get(start_url)
soup = BeautifulSoup(downloaded_html.text)

with open('downloaded.html', 'w', encoding="utf-8") as file:
    file.write(soup.prettify())

full_table = soup.select('table.wikitable tbody')[0]
table_head = full_table.select('tr th')

tabele_column = []
for element in table_head:
    colume_label = element.get_text(separator=" ", strip=True)
    colume_label = colume_label.replace(" ", "_")
    tabele_column.append(colume_label)

table_row = full_table.select('tr')
table_data = []
for index, element in enumerate(table_row):
    if index > 0:
        row_list = []
        values = element.select('td')
        for value in values:
            row_list.append(value.text.strip())
        table_data.append(row_list)

# print(table_data)
df = pd.DataFrame(table_data, columns=colume_label)
print(df)
I get the following error:
ValueError: 9 columns passed, passed data had 3 columns
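The two numbers in that message are the number of column labels pandas received and the number of cells it found in the data rows. A quick way to see where they diverge is to compare the two before building the DataFrame. The helper below is a minimal sketch in plain Python; `describe_shape` is a hypothetical name, and the sample values are made up to mirror the 9-versus-3 mismatch rather than taken from the actual page.

```python
def describe_shape(labels, rows):
    """Return (number of labels, sorted list of distinct row widths)."""
    widths = sorted({len(row) for row in rows})
    return len(labels), widths


# Hypothetical values shaped like the error: nine labels, three cells per row.
labels = [f"col_{i}" for i in range(9)]
rows = [["a", "b", "c"], ["d", "e", "f"]]

label_count, row_widths = describe_shape(labels, rows)
print(label_count, row_widths)
```

In the script above, the same check could be run as `describe_shape(tabele_column, table_data)`. If more than one row width comes back, the rows themselves are uneven and the header count is not the only problem.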
【Comments】:
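Two things in the posted code look like likely causes, though this has not been checked against the live page. First, `columns=colume_label` passes the last label string, while the list that was built is `tabele_column`. Second, `'tr th'` collects `th` cells from every row, not just the header row, and the data loop reads only `td` cells, so any row-header `th` is counted as a column label and dropped from its row. The sketch below shows one way to keep the two in step. It runs on a small inline table (`SAMPLE_HTML`, with placeholder values) instead of the real page, and `table_to_frame` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Film</th><th>Release date</th><th>Director</th></tr>
  <tr><th>Film A</th><td>2015</td><td>Director A</td></tr>
  <tr><th>Film B</th><td>2018</td><td>Director B</td></tr>
</table>
"""


def table_to_frame(html):
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one("table.wikitable")
    rows = table.find_all("tr")

    # Column labels come from the first row only.
    columns = [
        cell.get_text(separator=" ", strip=True).replace(" ", "_")
        for cell in rows[0].find_all(["th", "td"])
    ]

    # Each data row keeps its row-header <th> cell as well as its <td> cells.
    data = [
        [cell.get_text(separator=" ", strip=True)
         for cell in row.find_all(["th", "td"])]
        for row in rows[1:]
    ]

    # Pass the list of labels, not a single label string.
    return pd.DataFrame(data, columns=columns)


df = table_to_frame(SAMPLE_HTML)
print(df)
```

This does not handle `rowspan` or `colspan` cells, which are common in Wikipedia tables, so rows on the real page may still come out with uneven widths.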
Tags: python pandas web-scraping