【Question title】: Python 3 encoding/decoding problems between FreeBSD and Linux with BeautifulSoup
【Posted】: 2015-10-27 16:02:12
【Question】:

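I have an application that modifies the contents of an XML file (via Beautiful Soup) and then writes it back to disk. Simple enough. On my development machine (Linux) I have this working code:

First, load the file into the soup, then write it back out: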

# load document
document = open(contentxml, encoding="utf-8")
# load into soup
soup = BeautifulSoup(document, "lxml")
# do soupy stuff here
with open(document.name, "w") as f:
    # soup is the beautiful soup data
    f.write(soup.decode("utf-8"))

This all works fine and dandy, but when I run the exact same code on the FreeBSD production system, I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 8253: ordinal not in range(128)

So in that case, I figured I would try encoding the content before writing it to disk:

with open(document.name, "w") as f:
    # soup is the beautiful soup data
    # convert the output to str, since bytes cannot be written to a text-mode file
    soup_enc = str(soup.encode('utf8'))
    f.write(soup_enc)

This runs without raising an error, but it writes invalid XML to the output file, because the content comes out as:

b'<myxmlcontent>'

This in turn makes the final file useless. What is the best way to fix this?
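A minimal sketch of why the second attempt produces `b'...'` in the file: calling `str()` on a `bytes` object returns its printable representation, not the decoded text. The sample markup below is made up for illustration and stands in for the output of `soup.encode('utf8')`.

```python
# Made-up sample standing in for soup.encode('utf8'); "\xa3" is the pound sign.
data = "<price>\xa35</price>".encode("utf-8")

# str() on bytes gives the repr, including the b'' wrapper and escape sequences.
as_repr = str(data)

# decode() turns the bytes back into the original text.
as_text = data.decode("utf-8")

print(as_repr)  # b'<price>\xc2\xa35</price>'
print(as_text == "<price>\xa35</price>")  # True
```

So the text that reaches the file is the Python literal for the bytes, which is why the result is no longer valid XML.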

Note:

Some reading online suggested not opening the original document with an explicit encoding, e.g. doing:

# load document
document = open(contentxml)
# load into soup
soup = BeautifulSoup(document, "lxml")
# do soupy stuff here
with open(document.name, "w") as f:
    # soup is the beautiful soup data
    f.write(str(soup))

This works fine on Linux, but on FreeBSD the initial open(..) step raises the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 7551: ordinal not in range(128)
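Both errors are consistent with text-mode `open()` falling back to the locale's preferred encoding when no `encoding` argument is given, which was the default behaviour of Python 3 at the time of the question. The following is a diagnostic sketch, assuming the FreeBSD host runs under a C/POSIX locale (the question does not confirm this). The helper name `ascii_errors` is hypothetical.

```python
import locale

# Encoding that text-mode open() uses when no encoding= argument is passed.
fallback = locale.getpreferredencoding(False)
print("open() default encoding:", fallback)

# The pound sign U+00A3 is two bytes in UTF-8: 0xC2 0xA3.
pound_utf8 = "\xa3".encode("utf-8")


def ascii_errors():
    """Reproduce both error types that an ASCII default yields for the pound sign."""
    seen = []
    try:
        "\xa3".encode("ascii")  # writing path: str -> bytes
    except UnicodeEncodeError as exc:
        seen.append(type(exc).__name__)
    try:
        pound_utf8.decode("ascii")  # reading path: bytes -> str
    except UnicodeDecodeError as exc:
        seen.append(type(exc).__name__)
    return seen


print(ascii_errors())
```

If `fallback` reports an ASCII variant on the production machine, the write fails on `'\xa3'` and the read fails on byte `0xc2`, matching the two tracebacks in the question.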

【Question discussion】:

    Tags: python-3.x beautifulsoup


    【Solution 1】:

    To write directly to a binary file, I needed to open it in the correct mode ('wb') and then write the encoded byte string:

    with open(document.name, 'wb') as f:
        f.write(soup.encode('utf8'))
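
A self-contained sketch of the approach above, using a plain string in place of the soup so that it runs without Beautiful Soup installed. The function names and the sample markup are hypothetical. The second function is an alternative not taken from the answer: it keeps text mode but passes `encoding="utf-8"` explicitly, so the locale default is never consulted.

```python
import os
import tempfile


def write_markup_bytes(path, markup):
    """Encode the text ourselves and write raw bytes, as in the answer above."""
    with open(path, "wb") as f:
        f.write(markup.encode("utf-8"))


def write_markup_text(path, markup):
    """Alternative: text mode with an explicit encoding instead of the locale default."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(markup)


# Made-up sample standing in for str(soup); "\xa3" is the pound sign.
sample = "<price>\xa35</price>"

with tempfile.TemporaryDirectory() as tmp:
    bytes_path = os.path.join(tmp, "bytes.xml")
    text_path = os.path.join(tmp, "text.xml")
    write_markup_bytes(bytes_path, sample)
    write_markup_text(text_path, sample)
    with open(bytes_path, "rb") as f:
        written_bytes = f.read()
    with open(text_path, "rb") as f:
        written_text = f.read()

print(written_bytes == written_text)  # True
```

Either way the result no longer depends on the machine's locale, which is what differed between the Linux and FreeBSD hosts.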
    

    【Discussion】:
