How to write an unescaped string to an XML element with ElementTree? (Answers)

【Question title】: How to write unescaped string to a XML element with ElementTree?
【Posted】: 2018-12-09 11:40:46
【Question】:

I have a string variable, contents, with the following value:

<ph type="0" x="1"></ph>

I try to write it to an XML element like this:

elemen_ref.text = contents

After writing the XML tree to a file and inspecting it in Notepad++, I see the following value written to the XML element:

&lt;ph type="0" x="1"&gt;&lt;/ph&gt;

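How can I write the string unescaped? Note that this value was copied from another XML element, which remains intact after the tree is written to the file, so the problem lies in assigning the value to the text attribute.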

Tags: python xml-parsing elementtree

【Solution 1】:

You are trying to do this:

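import xml.etree.ElementTree as ET

root = ET.Element('root')
content_str = '<ph type="0" x="1"></ph>'
# Assigning to .text stores the string as character data, so it is escaped on output.
root.text = content_str

# encoding='unicode' returns str instead of bytes, so no b'' prefix is printed.
print(ET.tostring(root, encoding='unicode'))
#  <root>&lt;ph type="0" x="1"&gt;&lt;/ph&gt;</root>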

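This essentially "injects" XML into the element's text attribute. ElementTree treats anything assigned to text as character data and escapes the markup characters when serializing, so this is not the right way to do it.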
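
To illustrate what assigning to text actually stores, here is a minimal sketch (the variable names are only for illustration). It serializes the element and parses the result back: the string round-trips unchanged, but it comes back as plain text, not as a child element.

```python
import xml.etree.ElementTree as ET

root = ET.Element('root')
root.text = '<ph type="0" x="1"></ph>'

# Markup characters in .text are escaped when the element is serialized ...
serialized = ET.tostring(root, encoding='unicode')

# ... and unescaped again when it is parsed, so the text survives intact.
reparsed = ET.fromstring(serialized)
text_round_trips = reparsed.text == root.text

# The reparsed <root> has no children: the value is character data only.
child_count = len(reparsed)

print(serialized)
print(text_round_trips, child_count)
```

In other words, the escaped form is a faithful encoding of the string; it only becomes a problem when the value is meant to be markup.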

Instead, you should convert the content string into an actual XML node that can be appended to the existing XML node.

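import xml.etree.ElementTree as ET

root = ET.Element('root')
content_str = '<ph type="0" x="1"></ph>'
# Parse the string into a real Element, then attach it as a child of root.
content_element = ET.fromstring(content_str)
root.append(content_element)

# encoding='unicode' returns str instead of bytes, so no b'' prefix is printed.
print(ET.tostring(root, encoding='unicode'))
#  <root><ph type="0" x="1" /></root>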

If you insist, you can use unescape on the serialized output:

import xml.etree.ElementTree as ET
from xml.sax.saxutils import unescape

root = ET.Element('root')
content_str = '<ph type="0" x="1"></ph>'
root.text = content_str

# unescape() runs over the whole serialized string, not just this one element.
print(unescape(ET.tostring(root).decode()))
#  <root><ph type="0" x="1"></ph></root>
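
One caveat with this approach is that unescape is applied to the entire serialized document. The following sketch, which assumes a hypothetical tree with a second element (note) holding literal text that must stay escaped, suggests how that can leave the output no longer well-formed:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import unescape

root = ET.Element('root')

# Hypothetical sibling element whose text legitimately contains < and &.
note = ET.SubElement(root, 'note')
note.text = 'a < b & c'

target = ET.SubElement(root, 'target')
target.text = '<ph type="0" x="1"></ph>'

# Unescaping the whole document also unescapes the text inside <note>.
raw = unescape(ET.tostring(root, encoding='unicode'))

try:
    ET.fromstring(raw)
    still_well_formed = True
except ET.ParseError:
    still_well_formed = False

print(still_well_formed)
```

Here the bare < and & inside note make the result unparseable, so this option is probably only safe when no other text in the document relies on escaping.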

【Comments】:

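Converting content looks like a valid option, but it does not work for other formatted text strings, such as: My test<pha type="0" /> and 16<bpt i="1" type="25" />th<ept i="1" /> November 2018
I am also unsure about the second option of unescaping the whole XML: I am changing a single node in a ~300 MB XML document, so I worry that unescaping everything would introduce unwanted modifications.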
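
For mixed content like the strings in the comment above, one possible workaround (not part of the original answer) is to wrap the fragment in a throwaway element before parsing, then copy the wrapper's text and children onto the target. The helper name set_inner_xml and the wrapper tag are hypothetical, and the sketch assumes the fragment is itself well-formed and uses no undeclared namespace prefixes.

```python
import xml.etree.ElementTree as ET


def set_inner_xml(target, fragment):
    """Replace target's content with a parsed XML fragment (hypothetical helper)."""
    # A temporary wrapper lets leading text and several sibling elements
    # parse as a single well-formed document.
    wrapper = ET.fromstring('<wrapper>' + fragment + '</wrapper>')
    # Drop existing children; the target's attributes and tail are kept.
    for child in list(target):
        target.remove(child)
    # Text before the first child lives on the parent's .text ...
    target.text = wrapper.text
    # ... and text after each child lives on that child's .tail.
    target.extend(list(wrapper))


root = ET.Element('root')
set_inner_xml(root, '16<bpt i="1" type="25" />th<ept i="1" /> November 2018')
result = ET.tostring(root, encoding='unicode')
print(result)
#  <root>16<bpt i="1" type="25" />th<ept i="1" /> November 2018</root>
```

Because only the target element is modified, text elsewhere in the tree stays character data and is escaped as usual when the tree is written out, which may address the concern about touching a single node in a large document.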