Python: how to find data under a pre tag

【Question title】: Python how to find data under a pre tag
【Posted】: 2018-07-09 08:05:52
【Question description】:

I want to use Python to get some data that sits under a pre tag in an HTML page.

I tried Selenium first, but it could not find the element by XPath.

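from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Ie()
wait = WebDriverWait(browser, 5)
browser.get('file:\\\my_url.html')
body = wait.until(EC.presence_of_element_located((By.XPATH, "/html/body/pre[2]")))
print(body.text)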

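I also tried bs4. However, BeautifulSoup keeps telling me that my browser does not support the Frames extension. I am not familiar with bs4 and could not find a useful solution. Can anyone tell me how to change the IE browser settings so that the data can be read successfully? Thanks!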

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = " " #this html page is on a network drive and can be opened by IE\Chrome\...
html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

for script in soup(["script", "style"]):
    script.extract()    # rip it out

text = soup.get_text()
lines = (line.strip() for line in text.splitlines())
chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
text = '\n'.join(chunk for chunk in chunks if chunk)

print(text)

>>>This page is designed to be viewed by a browser which supports Frames extension. 
This text will be shown by browsers which do not support the Frames extension.
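
The message above is what one would expect from a frameset page: the top-level document holds only `<frame>` references plus a fallback text, and `get_text()` returns that fallback. A minimal sketch for checking which frames a page declares follows. The sample markup, the `nav` frame, and the file names are assumptions made for illustration, not the asker's actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical frameset markup, for illustration only; the real page may differ.
SAMPLE_FRAMESET = """
<html>
<frameset cols="20%,80%">
  <frame name="nav" src="nav.html">
  <frame name="glhstry_main" src="main.html">
  <noframes>This text will be shown by browsers which do not support the Frames extension.</noframes>
</frameset>
</html>
"""

def list_frames(html):
    """Return (name, src) pairs for every <frame> or <iframe> in the document."""
    soup = BeautifulSoup(html, "html.parser")
    return [(f.get("name"), f.get("src")) for f in soup.find_all(["frame", "iframe"])]

print(list_frames(SAMPLE_FRAMESET))
```

If the list is non-empty, the data is probably in one of the referenced frame documents rather than in the page that was downloaded.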

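【Comments】:

Which pages are you trying to get data from? At least provide the HTML structure of the page.
Many sections of data are stored under a single frameset.
Please, no images. SO is about code; your document is of no use here.

Tags: python html selenium beautifulsoup frames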

【Solution 1】:

Your pre element is inside a <frame> named "glhstry_main", so you need to switch to that frame before you can access the element. Like this:

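from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Ie()
wait = WebDriverWait(browser, 5)
browser.get('file:\\\my_url.html')
browser.switch_to.frame("glhstry_main")  # switch into the frame by its name
body = wait.until(EC.presence_of_element_located((By.XPATH, "/html/body/pre[2]")))
print(body.text)
# do your frame stuff
browser.switch_to.default_content()  # switch back to the top-level document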
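
If a BeautifulSoup-only approach is preferred, the same idea should apply: the `<pre>` elements are not in the top-level document, so the frame's own document has to be located and parsed. Below is a rough sketch, assuming the frameset declares the frame with `name` and `src` attributes. The helper names `frame_url` and `pre_texts`, the sample markup, and the example.com URL are made up for illustration.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def frame_url(frameset_html, base_url, frame_name):
    """Resolve the absolute URL of the frame called frame_name, or None if absent."""
    soup = BeautifulSoup(frameset_html, "html.parser")
    frame = soup.find(["frame", "iframe"], attrs={"name": frame_name})
    if frame is None or not frame.get("src"):
        return None
    return urljoin(base_url, frame["src"])

def pre_texts(frame_html):
    """Return the text of every <pre> element in the frame's document."""
    soup = BeautifulSoup(frame_html, "html.parser")
    return [pre.get_text() for pre in soup.find_all("pre")]

# Hypothetical inputs standing in for the real pages.
frameset = '<frameset><frame name="glhstry_main" src="main.html"></frameset>'
frame_doc = "<html><body><pre>first</pre><pre>second</pre></body></html>"

print(frame_url(frameset, "http://example.com/report/index.html", "glhstry_main"))
print(pre_texts(frame_doc)[1])  # second <pre>, the counterpart of pre[2] in the XPath
```

With a real page, the frame document would first be fetched, for example with `urlopen(frame_url(...)).read()`, and the result passed to `pre_texts`.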

【Comments】: