Extracting raw HTML from a locally saved HTML file using BeautifulSoup

【Question title】: Extracting raw html from locally saved html file using BeautifulSoup
【Posted】: 2017-03-12 11:00:10
【Question description】:

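I'm relatively new to BeautifulSoup. I'm trying to get the raw HTML out of a locally saved HTML file. From what I've read, Beautiful Soup seems to be the right tool for this. However, when I do this: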

from bs4 import BeautifulSoup
url = r"C:\example.html"
soup = BeautifulSoup(url, "html.parser")
text = soup.get_text()
print (text)

an empty string is printed. I think I'm missing a step. Any nudge in the right direction would be appreciated.


Tags: python html parsing beautifulsoup extract

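【Solution 1】:

The first argument to BeautifulSoup is the HTML markup itself (a string or an open file object), not a URL or a file path. Open the file, read its contents, and pass those in.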
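A minimal sketch of that advice follows. The helper name `load_soup`, the UTF-8 encoding, and the sample markup are assumptions made for illustration; the demo writes a throwaway file so it runs anywhere, and in practice you would pass your own path such as the `C:\example.html` from the question.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def load_soup(path):
    """Read an HTML file from disk and parse its contents (hypothetical helper)."""
    # Pass the file's contents, not the path itself, to BeautifulSoup.
    # UTF-8 is an assumption; use whatever encoding the file was saved in.
    html = Path(path).read_text(encoding="utf-8")
    return BeautifulSoup(html, "html.parser")


# Demo with a temporary file; the file name and markup are made up.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.html"
    sample.write_text("<html><body><p>Hello</p></body></html>", encoding="utf-8")
    soup = load_soup(sample)
    text = soup.get_text()

print(text)
```

Passing the path string directly, as in the question, makes BeautifulSoup treat the path as if it were the markup, so the file on disk is never read.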


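【Solution 2】:

Following up on the previous answer, there are two ways to open the HTML file and hand it to BeautifulSoup:

from bs4 import BeautifulSoup

# Option 1: the with block closes the file automatically
with open("example.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")

# Option 2: one-liner; the file handle is never closed explicitly
soup = BeautifulSoup(open("example.html"), "html.parser")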
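Once the file is parsed, note that `get_text()` returns only the text with the tags stripped, which may not be what "raw HTML" means in the question. The sketch below contrasts it with serializing the markup back to a string. The sample markup is hypothetical and stands in for the contents of the saved file.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the contents of the saved file.
html = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

text_only = soup.get_text()    # text content only, tags stripped
raw_html = str(soup)           # whole document serialized back to markup
first_p = str(soup.find("p"))  # markup of a single element
pretty = soup.prettify()       # same markup, re-indented for readability

print(text_only)
print(first_p)
```

If the goal is the file's bytes exactly as saved, reading the file directly is enough; BeautifulSoup is only needed when you want to navigate or clean up the markup first.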
