【Question title】: Python unable to refresh the execution of script
【Posted】: 2020-05-01 08:09:20
【Question description】:
from urllib.request import urlopen
from selenium import webdriver
from bs4 import BeautifulSoup as BSoup
import requests
import pandas as pd
from requests_html import HTMLSession
import time
import xlsxwriter
import re
import os

urlpage = 'https://racing.hkjc.com/racing/information/English/Racing/LocalResults.aspx?RaceDate=2019/07/14&Racecourse=ST&RaceNo=1'

# Setup selenium 
driver = webdriver.Firefox(executable_path = 'geckodriver path')
# get web page
driver.get(urlpage)
time.sleep(10)

bs_obj = BSoup(driver.page_source, 'html.parser')

# Scrape table content
table = bs_obj.find('table', {"f_tac table_bd draggable"})
rows = table.find_all('tr')
table_content = []

for row in rows[1:]:
    cell_row = []
    for cell in row.find_all('td'):
        cell_row.append(cell.text.replace(" ", "").replace("\n\n", " ").replace("\n", ""))
    table_content.append(cell_row)

header_content = []
for cell in rows[0].find_all('td'):
    header_content.append(cell.text)

driver.close()

race_writer = pd.ExcelWriter('export path', engine='xlsxwriter')

df = pd.DataFrame(table_content, columns=header_content)
df.to_excel(race_writer, sheet_name='game1')

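Hello everyone, I am trying to scrape race results from the Hong Kong Jockey Club (HKJC) website. When I run the code above, one of the following problems occurs:

  1. No Excel file is created.
  2. The DataFrame is not written to the Excel file.
  3. If I successfully scrape the results for game 1 and then modify the script to scrape the results for game 2, it still gives me the results for game 1.

Any help would be appreciated.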

【Comments】:

    Tags: python-3.x pandas selenium web-scraping beautifulsoup


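    【Solution 1】:

    I changed your script to the one below. The approach is to click each relevant "Sha Tin" race button (see range(1, len(shatin)-1)) and collect the race table that is displayed after each click. Each race table is appended to a list named races. Finally, each race table is written to a separate sheet in the Excel workbook (note that you no longer need BeautifulSoup).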

    Add these to your imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    

    Then:

    urlpage = 'https://racing.hkjc.com/racing/information/English/Racing/LocalResults.aspx?RaceDate=2019/07/14&Racecourse=ST&RaceNo=1'
    
    # Set up Selenium. Newer Selenium 4 releases no longer accept executable_path;
    # there, pass the path through selenium.webdriver.firefox.service.Service instead.
    driver = webdriver.Firefox(executable_path = 'geckodriver path')
    # Load the results page
    driver.get(urlpage)
    
    # XPath of the race-number buttons in the race card table
    buttons_xpath = "//table[@class='f_fs12 f_fr js_racecard']/tbody/tr/td"
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//table[@class='f_fs12 f_fr js_racecard']")))
    shatin = driver.find_elements(By.XPATH, buttons_xpath)
    
    races = []
    for i in range(1, len(shatin)-1):
        # Re-locate the buttons on every pass; earlier references may go stale once a click updates the page
        shatin = driver.find_elements(By.XPATH, buttons_xpath)
        #time.sleep(3)
        #WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@class='performance']")))
        shatin[i].click()
        # Parse the results table of the currently displayed race into a DataFrame
        table = pd.read_html(driver.find_element(By.XPATH, "//div[@class='performance']").get_attribute('outerHTML'))[0]
        races.append(table)
    
    # The with block saves and closes the workbook on exit, so no explicit save() call is needed
    with pd.ExcelWriter('races.xlsx') as writer:
        for i, race in enumerate(races):
            race.to_excel(writer, sheet_name=f'game{i+1}', index=False)
    
    driver.quit()
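
As a possible alternative to clicking each button, the race number could be varied directly in the URL, since the URL in the question carries a RaceNo query parameter. The sketch below only builds the per-race URLs with the standard library. It assumes the site serves a different race for each RaceNo value, which this thread does not confirm, and race_url is a hypothetical helper name, not part of any library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# URL taken from the question; it points at race 1
BASE_URL = ('https://racing.hkjc.com/racing/information/English/Racing/'
            'LocalResults.aspx?RaceDate=2019/07/14&Racecourse=ST&RaceNo=1')


def race_url(base_url, race_no):
    """Return base_url with its RaceNo query parameter set to race_no.

    Hypothetical helper: it assumes the page honors the RaceNo parameter.
    """
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query['RaceNo'] = str(race_no)
    # safe='/' keeps the slashes in RaceDate=2019/07/14 unescaped
    new_query = urlencode(query, safe='/')
    return urlunsplit(parts._replace(query=new_query))


if __name__ == '__main__':
    # Print the URLs for the first three races
    for n in (1, 2, 3):
        print(race_url(BASE_URL, n))
```

Each generated URL could then be passed to driver.get() in a loop, with the resulting table appended to races exactly as in the script above.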
    

    【Comments】:
