【Posted】: 2021-03-08 22:53:59
【Question】:
I have this code to scrape the oddsportal page:
https://www.oddsportal.com/soccer/england/premier-league/
import pandas as pd
from selenium import webdriver

browser = webdriver.Chrome()
browser.get("https://www.oddsportal.com/soccer/england/premier-league/")
df = pd.read_html(browser.page_source, header=0)[0]
timeList = []
dateList = []
gameList = []
home_odds = []
draw_odds = []
away_odds = []
for row in df.itertuples():
    if not isinstance(row[1], str):
        continue
    elif ':' not in row[1]:
        date = row[1].split('-')[0]
        continue
    time = timeList.append(row[1])
    dateList.append(date)
    gameList.append(row[2])
    home_odds.append(row[4])
    draw_odds.append(row[5])
    away_odds.append(row[6])
result = pd.DataFrame({'date': dateList,
                       'time': time,
                       'game': gameList,
                       'Home': home_odds,
                       'Draw': draw_odds,
                       'Away': away_odds})
The output I get is:
    date           time    game                           Home    Draw    Away
--  -------------  ------  -----------------------------  ------  ------  ------
 0  Today, 08 Mar          Chelsea - Everton              1.62    3.93    6.07
 1  Today, 08 Mar          West Ham - Leeds               2.25    3.61    3.18
 2  10 Mar 2021            Manchester City - Southampton  1.22    6.94    13.75
 3  12 Mar 2021            Newcastle - Aston Villa        3.8     3.59    2
 4  13 Mar 2021            Leeds - Chelsea                4.45    3.97    1.77
 5  13 Mar 2021            Crystal Palace - West Brom     2.1     3.34    3.77
 6  13 Mar 2021            Everton - Burnley              1.84    3.61    4.54
 7  13 Mar 2021            Fulham - Manchester City       10.05   5.16    1.34
 8  14 Mar 2021            Southampton - Brighton         2.8     3.11    2.77
 9  14 Mar 2021            Leicester - Sheffield Utd      1.5     4.34    7.06
10  14 Mar 2021            Arsenal - Tottenham            2.48    3.47    2.87
11  14 Mar 2021            Manchester Utd - West Ham      1.86    3.62    4.44
12  15 Mar 2021            Wolves - Liverpool             4.65    3.66    1.8
13  19 Mar 2021            Fulham - Leeds                 2.55    3.53    2.72
14  20 Mar 2021            Brighton - Newcastle           1.76    3.39    5.58
15  21 Mar 2021            West Ham - Arsenal             2.86    3.51    2.44
16  21 Mar 2021            Aston Villa - Tottenham        3.24    3.4     2.27
The time column has no values.
If I am missing something, can anyone help me understand? Am I defining time correctly?
【Discussion】:
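A likely cause: list.append changes the list in place and returns None, so `time = timeList.append(row[1])` binds time to None, and `'time': time` then fills the whole column with None. The values themselves are in timeList. Below is a minimal sketch of the loop with that fixed. The helper name parse_rows and the sample rows are hypothetical, made up for illustration; the column positions (1, 2, 4, 5, 6) are taken from the question and assumed to match the page's table.

```python
def parse_rows(rows):
    """Collect date, time, game and odds from rows shaped like df.itertuples()."""
    columns = {"date": [], "time": [], "game": [], "Home": [], "Draw": [], "Away": []}
    date = None  # most recent date header seen so far
    for row in rows:
        first = row[1]
        if not isinstance(first, str):
            continue  # skip rows whose first cell is not text (e.g. NaN)
        if ":" not in first:
            # Date header row; keep the part before the first "-"
            date = first.split("-")[0].strip()
            continue
        # append returns None, so call it without assigning the result
        columns["time"].append(first)
        columns["date"].append(date)
        columns["game"].append(row[2])
        columns["Home"].append(row[4])
        columns["Draw"].append(row[5])
        columns["Away"].append(row[6])
    return columns


# Hypothetical rows imitating df.itertuples(): (index, col1, ..., col6)
sample_rows = [
    (0, "10 Mar 2021 - Premier League", None, None, None, None, None),
    (1, "18:00", "Manchester City - Southampton", None, 1.22, 6.94, 13.75),
    (2, float("nan"), None, None, None, None, None),
]
parsed = parse_rows(sample_rows)
print(parsed["time"])  # ['18:00']
```

With the real page, `result = pd.DataFrame(parse_rows(df.itertuples()))` should give the same frame as in the question, with the time column filled in. In the original code, the smallest change is to drop the `time =` assignment and pass `'time': timeList` to pd.DataFrame.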
Tags: python web-scraping selenium-chromedriver