【Posted】: 2020-05-01 08:09:20
【Question】:
from urllib.request import urlopen
from selenium import webdriver
from bs4 import BeautifulSoup as BSoup
import requests
import pandas as pd
from requests_html import HTMLSession
import time
import xlsxwriter
import re
import os
urlpage = 'https://racing.hkjc.com/racing/information/English/Racing/LocalResults.aspx?RaceDate=2019/07/14&Racecourse=ST&RaceNo=1'
# Setup selenium
driver = webdriver.Firefox(executable_path = 'geckodriver path')
# get web page
driver.get(urlpage)
time.sleep(10)
bs_obj = BSoup(driver.page_source, 'html.parser')
# Scrape table content
table = bs_obj.find('table', class_='f_tac table_bd draggable')
rows = table.find_all('tr')
table_content = []
for row in rows[1:]:
    cell_row = []
    for cell in row.find_all('td'):
        cell_row.append(cell.text.replace(" ", "").replace("\n\n", " ").replace("\n", ""))
    table_content.append(cell_row)
header_content = []
for cell in rows[0].find_all('td'):
    header_content.append(cell.text)
driver.close()
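df = pd.DataFrame(table_content, columns=header_content)
# The with block saves and closes the workbook on exit; without a save/close the file is never written
with pd.ExcelWriter('export path', engine='xlsxwriter') as race_writer:
    df.to_excel(race_writer, sheet_name='game1')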
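Hi everyone, I am trying to scrape race results from the Hong Kong Jockey Club (HKJC). When I run the code above, one of the following problems occurs:
- No Excel file is created
- The DataFrame is not written to the Excel file
- If I scrape the results of game 1 successfully and then modify the script to scrape game 2, it still gives me the results of game 1.
Any help would be much appreciated.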
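For the third symptom, one possible cause is that the script still requests the URL ending in RaceNo=1 after the sheet name has been changed. Below is a minimal sketch of building one URL per race, assuming the race is selected only by the RaceNo query parameter as in the URL above. The helpers `race_url` and `sheet_name` are hypothetical names introduced for illustration and are not part of the original script.

```python
from urllib.parse import urlencode

# Base address taken from the URL in the question
BASE_URL = 'https://racing.hkjc.com/racing/information/English/Racing/LocalResults.aspx'


def race_url(race_no, race_date='2019/07/14', racecourse='ST'):
    """Build the results URL for one race (hypothetical helper)."""
    params = {'RaceDate': race_date, 'Racecourse': racecourse, 'RaceNo': race_no}
    # safe='/' keeps the slashes in the date, matching the URL in the question
    return BASE_URL + '?' + urlencode(params, safe='/')


def sheet_name(race_no):
    """Sheet name following the 'game1' pattern used in the question."""
    return f'game{race_no}'


# Each race gets its own URL and its own sheet name
for race_no in (1, 2):
    print(sheet_name(race_no), race_url(race_no))
```

Each URL would then be passed to `driver.get(...)` in turn, with the resulting DataFrame written to the matching sheet.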
【Comments】:
Tags: python-3.x pandas selenium web-scraping beautifulsoup