Posted: 2015-10-27 16:02:12
Question:
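I have an application that modifies the contents of an XML file (via Beautiful Soup) and then writes it back to disk. Simple enough, and on my development machine (Linux) I have this working code.
First, load the file into the soup: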
from bs4 import BeautifulSoup

# load document (contentxml is the path to the XML file)
document = open(contentxml, encoding="utf-8")
# load into soup
soup = BeautifulSoup(document, "lxml")
# do soupy stuff here
with open(document.name, "w") as f:
    # soup is the beautiful soup data
    f.write(soup.decode("utf-8"))
All of that is fine and dandy, but when I run exactly the same code on the FreeBSD production system, I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 8253: ordinal not in range(128)
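The difference between the two machines is most likely the locale. In Python 3, open() in text mode without an encoding= argument uses locale.getpreferredencoding(False), which is UTF-8 on a typical Linux desktop but can resolve to ASCII under a bare C/POSIX locale (an assumption about this FreeBSD box, consistent with the 'ascii' in the traceback). A minimal stdlib-only sketch, with a made-up string standing in for the soup output, reproduces the error by forcing the ascii codec and shows that passing encoding="utf-8" to the write-side open() avoids it:

```python
import locale
import os
import tempfile

# Text-mode open() without encoding= falls back to this value, which depends
# on the process locale (UTF-8 on a typical Linux desktop, possibly ASCII
# under a bare C/POSIX locale; the latter is an assumption about production).
print("default text encoding:", locale.getpreferredencoding(False))

# Hypothetical stand-in for the decoded soup output; '\xa3' is the pound sign
# named in the traceback.
xml_text = "<price>\xa3 10</price>"
path = os.path.join(tempfile.mkdtemp(), "out.xml")

# Simulate the production machine by forcing the ASCII codec explicitly.
try:
    with open(path, "w", encoding="ascii") as f:
        f.write(xml_text)
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii:", exc)

# Passing encoding= makes the write independent of the locale.
with open(path, "w", encoding="utf-8") as f:
    f.write(xml_text)

with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

Applied to the code in the question, this would mean opening the output file with open(document.name, "w", encoding="utf-8") and writing str(soup) to it.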
So I thought I would try encoding the output first and then writing it to disk:
with open(document.name, "w") as f:
    # soup is the beautiful soup data
    # convert the output to a string, since you cannot write bytes to a text-mode file
    soup_enc = str(soup.encode('utf8'))
    f.write(soup_enc)
This runs without error, but it writes malformed XML to the output file, because the content comes out as
b'<myxmlcontent>'
which in turn makes the final file useless. What is the best way to solve this?
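The b'...' wrapper comes from calling str() on a bytes object, which returns its repr rather than decoding it. A minimal sketch, with a hypothetical byte string standing in for soup.encode('utf8'), shows that effect and the alternative of writing the bytes unchanged to a file opened in binary mode ("wb"):

```python
import os
import tempfile

# Hypothetical stand-in for soup.encode("utf8"): the document as UTF-8 bytes.
xml_bytes = "<myxmlcontent>\xa3</myxmlcontent>".encode("utf-8")

# str() on bytes returns the repr, b'...' prefix and \x escapes included,
# which is the broken output described above.
wrong = str(xml_bytes)
print(wrong)

path = os.path.join(tempfile.mkdtemp(), "out.xml")

# Binary mode accepts bytes directly, so there is no str() round trip and
# the locale's default text encoding is never consulted.
with open(path, "wb") as f:
    f.write(xml_bytes)

with open(path, "rb") as f:
    written = f.read()
```

Writing bytes in binary mode keeps the encoding decision in one place (the encode call) instead of splitting it between the serializer and the file object.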
Note:
Some reading online suggested not opening the original document with an explicit encoding, i.e. doing:
# load document
document = open(contentxml)
# load into soup
soup = BeautifulSoup(document, "lxml")
# do soupy stuff here
with open(document.name, "w") as f:
    # soup is the beautiful soup data
    f.write(str(soup))
This works fine on Linux, but on FreeBSD it raises the following error at the initial open(..):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 7551: ordinal not in range(128)
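The read-side error appears to have the same cause: without encoding=, open() decodes with the locale default, and 0xc2 is the first byte of the UTF-8 encoding of '£' (b'\xc2\xa3'), which ASCII cannot decode. In the sketch below (stdlib only, sample content made up), reading with encoding="utf-8" or in binary mode ("rb") both succeed. Beautiful Soup also accepts a binary file handle and detects the encoding itself, though that is not exercised here so the sketch stays free of the bs4 dependency.

```python
import os
import tempfile

# Hypothetical sample document saved as UTF-8; '\xa3' encodes to b'\xc2\xa3'.
original = "<price>\xa3 10</price>"
path = os.path.join(tempfile.mkdtemp(), "in.xml")
with open(path, "wb") as f:
    f.write(original.encode("utf-8"))

# Simulate a locale whose default text encoding is ASCII.
try:
    with open(path, encoding="ascii") as f:
        f.read()
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print("ascii:", exc)

# Option 1: state the file's real encoding explicitly.
with open(path, encoding="utf-8") as f:
    text = f.read()

# Option 2: read raw bytes and leave decoding to the consumer
# (Beautiful Soup, for instance, accepts bytes and detects the encoding).
with open(path, "rb") as f:
    raw = f.read()
```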
Discussion: