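[Posted]: 2015-01-27 07:46:51
[Question]:
I'm pulling data from a database and trying to create an XML file from it. The data is UTF-8 encoded and can contain characters such as á, š, or č. Here is the code: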
import xml.etree.cElementTree as ET
tree = ET.parse(metadata_file)
# ..some commands that alter the XML..
tree.write(metadata_file, encoding="UTF-8")
When the data is written, the script fails:
Traceback (most recent call last):
  File "get-data.py", line 306, in <module>
    main()
  File "get-data.py", line 303, in main
    tree.write(metadata_file, encoding="UTF-8")
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 32: ordinal not in range(128)
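A note on why an encode call reports a *decode* error: in Python 2, calling `.encode()` on a byte string (`str`) first decodes it implicitly with the ASCII codec, and that implicit step is what fails on byte 0xc3. The sketch below reproduces only that decode step explicitly, so it behaves the same on Python 2 and 3; the name `ascii_decode_fails` is a hypothetical helper, and the byte string stands in for a value as it might come from the database.

```python
# -*- coding: utf-8 -*-
# Assumption: `raw` stands in for a UTF-8 encoded byte string from the database.
raw = b"\xc3\xa1"  # the two UTF-8 bytes of "á"


def ascii_decode_fails(data):
    """Return the error message if `data` is not pure ASCII, else None."""
    try:
        # This is the step Python 2 performs implicitly before str.encode().
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = ascii_decode_fails(raw)
print(message)

# Decoding with the codec the bytes were actually written in succeeds.
decoded = raw.decode("utf-8")
```

The first byte of any non-ASCII UTF-8 sequence is already outside the ASCII range, which matches the 0xc3 reported in the traceback.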
The only way I found to prevent this is to decode the data before it is written to the XML file:
text = text.decode('utf-8')
However, the resulting file then contains, for example, &#269; instead of č. Any idea how to write the data to the file and keep it as UTF-8?
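To make the difference between the two outputs concrete, here is a minimal sketch, not a confirmed diagnosis of the script above. It assumes the element text is already decoded to a Unicode string, and it serializes the same tree twice: once as UTF-8 and once as us-ascii (ElementTree's default when no encoding is given). `serialize` is a hypothetical helper, and an in-memory buffer replaces the real file.

```python
# -*- coding: utf-8 -*-
import io
import xml.etree.ElementTree as ET


def serialize(text, encoding):
    """Build <data><elem>text</elem></data> and return the serialized bytes."""
    root = ET.Element("data")
    child = ET.SubElement(root, "elem")
    child.text = text
    buf = io.BytesIO()  # in-memory stand-in for the output file
    ET.ElementTree(root).write(buf, encoding=encoding)
    return buf.getvalue()


# Assumption: UTF-8 bytes as they might arrive from the database ("á, š, or č").
raw = b"\xc3\xa1, \xc5\xa1, or \xc4\x8d"
text = raw.decode("utf-8")  # Unicode string, as in the workaround above

utf8_bytes = serialize(text, "UTF-8")      # characters written as UTF-8 bytes
ascii_bytes = serialize(text, "us-ascii")  # characters written as &#NNN; references

print(repr(utf8_bytes))
print(repr(ascii_bytes))
```

In this sketch the numeric character references appear only when the target encoding cannot represent the character, so it may be worth checking which encoding actually reaches `tree.write` in the failing script.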
Edit:
Here is an example of what the script does:
]$ echo "<data></data>" > test.xml
]$ cat test.xml
<data></data>
]$ python
Python 2.7.5 (default, Nov 3 2014, 14:33:39)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.cElementTree as ET
>>> tree = ET.parse('./test.xml')
>>> root = tree.getroot()
>>> new = ET.Element("elem")
>>> new.text = "á, š, or č"
>>> root.append(new)
>>> tree.write('./text.xml', encoding="UTF-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
[Discussion]:
Tags: python xml utf-8 elementtree