【Question title】: Get the new HTML after .click() with Selenium
【Posted】: 2019-08-20 23:35:43
【Question】:

I am using Selenium to click a link, but I can't get the new table. What code do I use to retrieve the new page?

    df_list = []
    url = 'https://www.cartolafcbrasil.com.br/scouts/cartola-fc-2018/rodada-1' #+ str(i)
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    table = soup.find_all('table')[0]
    df = pd.read_html(str(table), encoding="UTF-8")

    driver = webdriver.PhantomJS(executable_path = 'C:\\Python27\\phantomjs-2.1.1-windows\\bin\\phantomjs')
    driver.get('https://www.cartolafcbrasil.com.br/scouts/cartola-fc-2018/rodada-1') 
    driver.find_element_by_xpath("/html[1]/body[1]/form[1]/div[1]/div[2]/div[3]/div[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[2]/div[1]/table[1]/tbody[1]/tr[52]/td[1]/table[1]/tbody[1]/tr[1]/td[2]/a[1]").click()



    ?????
    table = soup.find_all('table')[0]
    df = pd.read_html(str(table), encoding="UTF-8")

【Comments】:

  • Wow, I'm sure there is a better way to get the information you want than driver.find_element_by_xpath("/html[1]/body[1]/form[1]/div[1]/div[2]/div[3]/div[1]/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[2]/div[1]/table[1]/tbody[1]/tr[52]/td[1]/table[1]/tbody[1]/tr[1]/td[2]/a[1]").click()....
  • I can improve that later, but first I want to get the new table.

Tags: python selenium beautifulsoup python-requests


【Solution 1】:

If I understand your question, it is "how do I get the HTML of the newly loaded page from my driver object". The answer is driver.page_source:

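    import bs4

    driver.find_element_by_xpath("Some crazy shenanigans of an xpath").click()
    # page_source is already a str holding the current page's HTML, so it has no .text attribute
    html_from_page = driver.page_source
    soup = bs4.BeautifulSoup(html_from_page, 'html.parser')
    # more stuff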
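
One caveat with reading driver.page_source straight after .click(): if the page updates asynchronously, the source may still be the old one. Below is a minimal sketch of a hand-rolled polling helper, assuming only that the driver exposes a page_source attribute. The names click_and_get_html, click, timeout and poll are hypothetical and not part of Selenium; Selenium's own explicit-wait tool is WebDriverWait in selenium.webdriver.support.ui, which is usually the better choice when you can name an element to wait for.

```python
import time


def click_and_get_html(driver, click, timeout=10.0, poll=0.25):
    """Call click(), then poll driver.page_source until it differs from the pre-click HTML.

    `driver` only needs a `page_source` attribute (hypothetical helper, not a Selenium API).
    Raises TimeoutError if the source has not changed within `timeout` seconds.
    """
    before = driver.page_source      # snapshot of the HTML before clicking
    click()                          # e.g. lambda: element.click()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        html = driver.page_source
        if html != before:           # something on the page changed
            return html
        time.sleep(poll)
    raise TimeoutError("page source did not change after the click")
```

Usage would look roughly like html = click_and_get_html(driver, lambda: link.click()), followed by BeautifulSoup(html, 'html.parser'). A changed source does not guarantee the new table has finished rendering, so treat this as a starting point only.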

【Comments】:

  • Thanks. BeautifulSoup(html_from_page.text, 'html.parser') gives an error.
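
The error in that comment most likely comes from mixing two APIs: requests.get() returns a response object whose HTML is in .text, while driver.page_source is already a plain str and has no .text attribute. A small sketch of a helper that accepts either kind of value (html_text is a hypothetical name, not part of requests, Selenium, or BeautifulSoup):

```python
def html_text(source):
    """Return HTML as a str from either a plain string or a response-like object.

    A str (such as Selenium's driver.page_source) is returned unchanged;
    anything else is assumed to expose its HTML as a `.text` attribute,
    the way a requests response does.
    """
    if isinstance(source, str):
        return source
    return source.text
```

With it, BeautifulSoup(html_text(page), 'html.parser') and BeautifulSoup(html_text(driver.page_source), 'html.parser') would both receive a string.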
【Solution 2】:

Welcome to SO. Here is another approach, in which the script iterates over all the tables (pages) and collects the data.

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    from selenium import webdriver

    df_list = []
    url = 'https://www.cartolafcbrasil.com.br/scouts/cartola-fc-2018/rodada-1' #+ str(i)
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    table = soup.find_all('table')[0]
    df = pd.read_html(str(table), encoding="UTF-8")
    df_list.extend(df)  # pd.read_html returns a list of DataFrames

    driver = webdriver.PhantomJS(executable_path = 'C:\\Python27\\phantomjs-2.1.1-windows\\bin\\phantomjs')
    driver.get('https://www.cartolafcbrasil.com.br/scouts/cartola-fc-2018/rodada-1')
    # get the number of pages and iterate each of them
    numberOfPage = driver.find_element_by_xpath("(//tr[@class='tbpaging']//a)[last()]").text
    # note: range() excludes its end value, so the last index visited is numberOfPage - 1
    for i in range(2,int(numberOfPage)):
        # click on each page link and then get the details
        # i is an int, so convert it with str() before concatenating it into the XPath
        driver.find_element_by_xpath("(//tr[@class='tbpaging']//a)[" + str(i) +"]").click()
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        table = soup.find_all('table')[0]
        df = pd.read_html(str(table), encoding="UTF-8")
        df_list.extend(df)  # keep each page's table instead of overwriting it
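
Two details of the loop are easy to get wrong: concatenating an int into the XPath string raises TypeError, and range() stops one short of its end value. The sketch below isolates both in small helpers. PAGER_LINKS, pager_xpath and page_indexes are hypothetical names, and the assumption that the n-th pager link leads to page n is untested against the actual site.

```python
# XPath matching every link in the pager row (copied from the answer's code).
PAGER_LINKS = "(//tr[@class='tbpaging']//a)"


def pager_xpath(index):
    """Build the XPath for the index-th pager link; the f-string handles the int-to-str step."""
    return f"{PAGER_LINKS}[{int(index)}]"


def page_indexes(last_page_text, first=2):
    """Return the indexes first..last inclusive, given the text of the last pager link.

    Assumes that text is a page number; int() raises ValueError otherwise.
    """
    return range(first, int(last_page_text.strip()) + 1)
```

In the loop this would read for i in page_indexes(numberOfPage): followed by driver.find_element_by_xpath(pager_xpath(i)).click().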

【Comments】:
