【Question Title】: UnicodeDecodeError when parsing XML on Mac but works on PC
【Posted】: 2017-09-25 14:16:20
【Question】:

When parsing an XML file like this:

from lxml import etree

with open('cortex_full.xml', 'r') as infile:
    root = etree.parse(infile)

I get the UnicodeDecodeError below. However, this only happens on my Mac. If I parse the same file with the same script on my work PC, everything works fine.

File "/Users/Desktop/CPET/xml_test2.py", line 5, in <module>
    root = etree.parse(infile)
  File "src/lxml/lxml.etree.pyx", line 3442, in lxml.etree.parse (src/lxml/lxml.etree.c:81701)
  File "src/lxml/parser.pxi", line 1832, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:118888)
  File "src/lxml/parser.pxi", line 1852, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:119171)
  File "src/lxml/parser.pxi", line 1747, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:117959)
  File "src/lxml/parser.pxi", line 1162, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:112686)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105881)
  File "src/lxml/parser.pxi", line 702, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:107548)
  File "src/lxml/lxml.etree.pyx", line 324, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:12152)
  File "src/lxml/parser.pxi", line 373, in lxml.etree._FileReaderContext.copyToBuffer (src/lxml/lxml.etree.c:103210)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 783: ordinal not in range(128)

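Judging by the number of threads about this, it seems to be a common problem, but none of the suggested fixes seem to apply in this case. Any ideas on how to get it working? The full XML file is here.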
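The Mac/PC difference can be checked and reproduced without either machine. The minimal sketch below assumes a small hypothetical sample file in place of cortex_full.xml, containing a non-breaking space (U+00A0, encoded in UTF-8 as the bytes C2 A0). It prints the encoding that open() falls back to when no encoding= is given, then reads the same bytes once as ASCII and once as UTF-8.

```python
import locale
import os
import tempfile

# Encoding open() uses in text mode when no encoding= argument is passed.
# The 'ascii' codec in the traceback suggests this was ASCII on the Mac.
print("default text encoding:", locale.getpreferredencoding(False))

# Hypothetical sample file: UTF-8 XML with a non-breaking space (bytes C2 A0).
xml_bytes = '<?xml version="1.0" encoding="UTF-8"?><root>a\u00a0b</root>'.encode("utf-8")
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(xml_bytes)

def try_read(encoding):
    """Read the file in text mode with the given encoding; return the text or the error."""
    try:
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError as exc:
        return exc

ascii_result = try_read("ascii")  # fails on byte 0xc2, as in the traceback
utf8_result = try_read("utf-8")   # decodes without error
print(repr(ascii_result))
print(repr(utf8_result))
os.remove(path)
```

If the first line prints an ASCII-type encoding on the Mac and something else on the PC, that accounts for the same script behaving differently on the two machines.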

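【Discussion】:

  • Are you using the same version of Python on both systems?
  • Yes, both Python 3 and lxml are the latest versions.
  • with open('cortex_full.xml', 'r', encoding='utf-8') as infile:
  • Interesting, I tried that yesterday without success, but when I tried it just now it worked. I must have typed something different the first time.

Tags: python xml ascii lxml parsexml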


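【Solution 1】:

Posting the answer that worked for me for future reference. Thanks to @Burhan Khalid for the answer.

You need to set the encoding to utf-8 when opening the XML file. Without an explicit encoding argument, open() in text mode decodes with the platform's preferred encoding (locale.getpreferredencoding()), which on this Mac was ASCII, as the 'ascii' codec in the traceback shows. Byte 0xc2 is the first byte of a two-byte UTF-8 sequence, so the ASCII codec cannot decode it.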

with open('cortex_full.xml', 'r', encoding='utf-8') as infile:
    root = etree.parse(infile)
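An alternative that avoids choosing a text encoding at all is to open the file in binary mode ('rb') and let the XML parser decode the bytes according to the file's own XML declaration. With lxml that would be etree.parse(infile) on a file opened with 'rb', or simply etree.parse('cortex_full.xml') with the filename. The sketch below shows the same idea with the standard library's xml.etree.ElementTree, swapped in so it runs without lxml installed, and uses a hypothetical sample file rather than the real cortex_full.xml.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for cortex_full.xml: UTF-8 bytes with a
# non-breaking space (bytes C2 A0) inside an element.
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?><root><name>a\u00a0b</name></root>'.encode("utf-8"))

# 'rb' passes raw bytes to the parser, which decodes them using the XML
# declaration, so the OS locale's default encoding is never involved.
with open(path, "rb") as infile:
    tree = ET.parse(infile)

name_text = tree.getroot().find("name").text
print(repr(name_text))
os.remove(path)
```

This approach also respects files that declare an encoding other than UTF-8, which a hard-coded encoding='utf-8' would not.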

【Discussion】:
