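【Posted】: 2021-03-27 03:34:39
【Question】:
As a preface, I have very little experience with Python. I am trying to scrape football data for my favorite NFL team, the New England Patriots. The page I want to scrape is https://www.pro-football-reference.com/teams/nwe/2020.htm, and I am interested in the Schedule & Game Results table. My code does retrieve the data I want, but the resulting format is all wrong.
Any help would be greatly appreciated.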
import requests
import lxml.html as lh
import pandas as pd
import argparse
import re
import os
from bs4 import BeautifulSoup

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

twenty_twenty = []
link = "https://www.pro-football-reference.com/teams/nwe/2020.htm"
session = requests.Session()  # not defined in the posted snippet; assumed to be a requests session
r = session.get(link)
soup = BeautifulSoup(r.text, 'html.parser')
table_all = soup.find_all('div', {"class": "overthrow table_container"})
tbody = table_all[1].table.tbody
trs = tbody.find_all('tr')
week_dict = {}
for tr in trs:
    stat = str(tr.find('th')) #['data-stat'])
    val = str(tr.find('th').getText())
    week_dict.update({stat: val})
    tds = tr.find_all('td')
    for td in tds:
        stat = str((td)['data-stat'])
        val = str((td).getText())
        # Split the "W-L" record into separate win and loss counts
        if stat == 'team_record':
            record = (val.split('-'))
            wins = record[0]
            losses = record[-1]
            week_dict.update({'wins_to_date': wins, 'losses_to_date': losses})
        # '@' marks an away game
        if stat == 'game_location':
            if val == '@':
                week_dict.update({'home': 0})
            else:
                week_dict.update({'home': 1})
        if stat == 'overtime':
            if val == 'OT':
                week_dict.update({'OT': 1})
            else:
                week_dict.update({'OT': 0})
        week_dict.update({stat: val})
    twenty_twenty.append(week_dict)
print("Patriots" + " " + "Year 2020" + " " + "stats added.")
df2020 = pd.DataFrame(twenty_twenty)
df2020.head(16)
【Discussion】:
-
You should probably try to reduce this to a minimal reproducible example. Which part exactly are you having trouble with?
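One possible reduction, offered as a sketch rather than a confirmed diagnosis: the posted loop creates `week_dict` once, before the loop, and appends that same object for every row. The snippet below reproduces that pattern without any network access. The helper `build_rows`, its `fresh_dict_per_row` flag, and the sample cells are all hypothetical stand-ins for the parsed `td` elements, not data taken from the site.

```python
def build_rows(weeks, fresh_dict_per_row):
    """Collect one dict per week from (stat, value) cell pairs."""
    rows = []
    week_dict = {}  # shared by every row unless replaced below
    for cells in weeks:
        if fresh_dict_per_row:
            week_dict = {}  # start a new dict object for this row
        for stat, val in cells:
            if stat == 'team_record':
                record = val.split('-')
                week_dict.update({'wins_to_date': record[0],
                                  'losses_to_date': record[-1]})
            week_dict[stat] = val
        rows.append(week_dict)
    return rows


# Hypothetical sample cells (illustrative values only).
weeks = [
    [('week_num', '1'), ('team_record', '1-0'), ('game_location', '')],
    [('week_num', '2'), ('team_record', '1-1'), ('game_location', '@')],
]

shared = build_rows(weeks, fresh_dict_per_row=False)
fresh = build_rows(weeks, fresh_dict_per_row=True)

print([row['week_num'] for row in shared])  # ['2', '2']: both entries are the same dict
print([row['week_num'] for row in fresh])   # ['1', '2']: one distinct dict per week
```

If this is indeed the issue, moving `week_dict = {}` inside the `for tr in trs:` loop would give each week its own row. Separately, `str(tr.find('th'))` uses the whole tag's HTML as a dictionary key, so reading the tag's `data-stat` attribute instead (as the commented-out fragment on that line hints) may give cleaner column names.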