【Question Title】: No data table with Python3 Selenium
【Posted】: 2023-01-20 06:33:44
【Question】:

I need to improve this script so it extracts the daily data from this site. However, I get no data except the "Spot" column! Thanks for your help!

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import pandas as pd


chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--headless")
chrome_options.add_argument("start-maximized")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)

browser.get("https://www.eex.com/en/market-data/natural-gas/spot")
soup = BeautifulSoup(browser.page_source, 'html5lib')
table = soup.select('table')[1]

browser.quit()
final_list = []
for row in table.select('tr'):
    final_list.append([x.text for x in row.find_all(['td', 'th'])])
final_df = pd.DataFrame(final_list[1:], columns=final_list[0])  # first row holds the column headers
final_df[:-2]

final_df.to_excel('final_df.xlsx', index = False)
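The thread does not pin down why only "Spot" comes back. One possible cause, not confirmed here, is that the price cells are filled in by JavaScript after the initial load, so `browser.page_source` read immediately after `browser.get()` still has empty cells. A minimal sketch under that assumption waits until at least one data cell has text before reading the HTML. The helper names `table_has_data` and `fetch_page_source`, the CSS selector `table tbody td`, and the 20-second timeout are hypothetical choices, not taken from the original.

```python
def table_has_data(driver, css="table tbody td"):
    """Return True once at least one cell matching `css` has non-empty text.

    Written as a plain callable so it can be passed to
    WebDriverWait(...).until(...), which calls it repeatedly with the driver.
    The default selector is an assumption about the page structure.
    """
    cells = driver.find_elements("css selector", css)  # same value as By.CSS_SELECTOR
    return any(cell.text.strip() for cell in cells)


def fetch_page_source(url, timeout=20, headless=True):
    """Load `url`, wait until the table has data, and return the rendered HTML."""
    # Imported here so table_has_data() can be reused without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    chrome_options = Options()
    chrome_options.add_argument("--no-sandbox")
    if headless:
        chrome_options.add_argument("--headless")
    browser = webdriver.Chrome(options=chrome_options)
    try:
        browser.get(url)
        # Polls table_has_data(browser); raises TimeoutException if it never turns True.
        WebDriverWait(browser, timeout).until(table_has_data)
        return browser.page_source
    finally:
        browser.quit()  # release the driver even if the wait times out
```

Usage: `html = fetch_page_source("https://www.eex.com/en/market-data/natural-gas/spot")`, then pass `html` to BeautifulSoup as before. `webdriver.Chrome(options=...)` without a `service` relies on Selenium 4.6+ locating chromedriver itself; pass `service=Service(...)` as in the question if you manage the binary manually.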

【Question Discussion】:

    Tags: python python-3.x selenium selenium-webdriver selenium-chromedriver


    【Answer 1】:

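    I made a small adjustment so that all columns are extracted. The main idea is that the extraction logic has to be written against the actual structure of the page's HTML DOM.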

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from bs4 import BeautifulSoup as bs
    import pandas as pd
    
    def get_df(page_source):
        soup = bs(page_source, 'html.parser')
        table = soup.select('table')[1]  # assumes the quotes are in the second <table> on the page
        table_header = table.find("tr", {"class": "mv-quote-header-row"})
        table_body = table.select('tbody')
        result = {}

        # One list per non-empty header cell (Spot, Last Price, ...)
        for e_header in table_header.find_all('th'):
            if e_header.text:
                result[e_header.text] = []
        for e_r in table_body[0].find_all('tr'):
            # {'class': False} matches only <td> tags without a class attribute,
            # which skips button cells such as those with class "mv-quote-button"
            r1 = [e.text for e in e_r.find_all('td', {'class': False})]
            result['Spot'].append(r1[0])
            result['Last Price'].append(r1[1])
            result['Last Volume'].append(r1[2])
            result['End of Day Index'].append(r1[3])
            result['Volume Exchange'].append(r1[4])
        df = pd.DataFrame(result)
        return df
    
    
    chrome_options = Options()
    chrome_options.add_argument("--no-sandbox")
    #chrome_options.add_argument("--headless")
    chrome_options.add_argument("start-maximized")
    webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
    browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)

    browser.get("https://www.eex.com/en/market-data/natural-gas/spot")
    page_source = browser.page_source  # read the HTML while the browser session is still open

    final_df = get_df(page_source)
    browser.quit()
    final_df.to_excel('final_df.xlsx', index=False)
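The loop in `get_df` appends to five hard-coded keys, so it raises a `KeyError` if a header's text differs slightly (for example, trailing whitespace), and `pd.DataFrame(result)` fails with a length mismatch if the table gains a column whose list stays empty. A more generic sketch pairs each cell with its header by position. It assumes, as the answer does, that every full data row has one non-button cell per non-empty header, in the same order; `rows_to_columns` is a hypothetical name.

```python
def rows_to_columns(headers, rows):
    """Pair each cell with the header at the same position.

    headers: header texts, e.g. from the <th> cells of the header row
    rows:    one list of cell texts per data row
    Empty headers are dropped, rows shorter than the header list are skipped,
    and extra trailing cells in a row are ignored.
    """
    keys = [h.strip() for h in headers if h.strip()]  # drop empty header cells
    result = {key: [] for key in keys}
    for row in rows:
        if len(row) < len(keys):
            continue  # e.g. a spacer or footer row with fewer cells
        for key, value in zip(keys, row):
            result[key].append(value.strip())
    return result
```

Inside `get_df`, this could replace the five `append` calls: collect each `r1` into a list `rows`, then build the frame with `pd.DataFrame(rows_to_columns([th.text for th in table_header.find_all('th')], rows))`.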
    

    【Comments】:

    • Thanks, but I'm getting MaxRetryError: HTTPConnectionPool(host='localhost', port=51125)
    • Which line throws that error?
    • My mistake: when I copied the code, browser.quit() ended up in the wrong place :(. Fixed, it should work now.
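The `MaxRetryError` above is what Selenium typically surfaces when a script keeps talking to a driver session that has already been shut down, here because `browser.quit()` ran before `browser.page_source` was read. One way to make that ordering hard to get wrong is to tie the browser's lifetime to a `with` block. This sketch uses `contextlib.contextmanager`; `managed_browser` and its `factory` argument are hypothetical names.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(factory):
    """Create a browser with factory() and guarantee quit() when the with-block exits.

    factory: zero-argument callable returning a driver, e.g.
             lambda: webdriver.Chrome(service=webdriver_service, options=chrome_options)
    """
    browser = factory()
    try:
        yield browser
    finally:
        browser.quit()  # runs on normal exit and when an exception escapes the block
```

Read `browser.page_source` inside the `with` block, store it in a variable, and call `get_df` on that string after the block ends; nothing outside the block touches the closed session.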