From escaped html -> to regular html? - Python answers

【Question title】: From escaped html -> to regular html? - Python
【Posted】: 2011-01-29 07:39:20
【Question】:

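I'm using BeautifulSoup to process XML files collected through a REST API.

The responses contain HTML code, but BeautifulSoup escapes all the HTML tags so that the output can be displayed nicely.

Unfortunately, I need the HTML code itself.

How would I go about converting the escaped HTML back into proper markup?

Any help would be much appreciated!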

【Question comments】:

【Solution 1】:

I think you want xml.sax.saxutils.unescape from the Python standard library.

For example:

>>> from xml.sax import saxutils as su
>>> s = '&lt;foo&gt;bar&lt;/foo&gt;'
>>> su.unescape(s)
'<foo>bar</foo>'
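
As a side note that is not part of the original answer: by default, saxutils.unescape only converts &amp;lt;, &amp;gt; and &amp;amp;. Other entities such as &amp;quot; have to be passed in through its optional entities dictionary. On Python 3.4 and later, html.unescape from the standard library decodes named and numeric character references in one call. The sample string below is made up for illustration; this is a minimal sketch, assuming Python 3:

```python
import html  # standard library, Python 3.4+
from xml.sax import saxutils as su

# Hypothetical escaped fragment, chosen to include a &quot; entity.
escaped = '&lt;a href=&quot;/x&quot;&gt;bar&lt;/a&gt;'

# Default saxutils.unescape handles only &lt;, &gt; and &amp;,
# so &quot; is left untouched here.
partial = su.unescape(escaped)

# Extra entities can be supplied as a dict of {entity: replacement}.
with_quotes = su.unescape(escaped, {'&quot;': '"'})

# html.unescape decodes named and numeric character references.
full = html.unescape(escaped)

print(partial)
print(with_quotes)
print(full)
```

If the escaped text may contain entities beyond the basic three, html.unescape is probably the simpler choice.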

【Discussion】:

【Solution 2】:

Could you try the urllib module?

It has an unquote() method that might suit your needs.
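
For context, unquote() decodes URL percent-encoding (such as %3C), not HTML entities (such as &amp;lt;), so it leaves escaped HTML unchanged. A small sketch, assuming Python 3, where the function lives in urllib.parse (in Python 2 it was urllib.unquote); the sample strings are made up for illustration:

```python
from urllib.parse import unquote

# Percent-encoded input is decoded.
from_percent = unquote('%3Cfoo%3Ebar%3C%2Ffoo%3E')

# HTML entities are not percent-encoding, so they pass through unchanged.
from_entities = unquote('&lt;foo&gt;bar&lt;/foo&gt;')

print(from_percent)
print(from_entities)
```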

Edit: on reflection (and after reading your question more closely), you probably just want to use string.replace().

Like this:

string.replace('&lt;','<')
string.replace('&gt;','>')

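【Discussion】:

Why bother writing separate replace steps (for lt, gt, amp) when saxutils.unescape does all of them for you?-) Also, remember: a replace call does not change the string, it builds a new one. The given code snippet is a slow no-op!-)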
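
To illustrate the point in the comment above: str.replace returns a new string, so the result has to be assigned or returned. If the manual route were taken anyway, a working version might look like the sketch below. The helper name unescape_basic is hypothetical, and &amp;amp; is deliberately replaced last so that a double-escaped sequence like &amp;amp;lt; is only unescaped one level:

```python
def unescape_basic(text):
    """Undo the three basic escapes; a hand-rolled stand-in for saxutils.unescape."""
    # Each replace builds a new string, so rebind the name to keep the result.
    text = text.replace('&lt;', '<')
    text = text.replace('&gt;', '>')
    # Replace &amp; last, otherwise '&amp;lt;' would collapse all the way to '<'.
    return text.replace('&amp;', '&')


print(unescape_basic('&lt;foo&gt;bar&lt;/foo&gt;'))
```

In practice, saxutils.unescape already performs these same steps, so the library call remains the shorter option.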