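[Posted]: 2015-06-15 13:47:03
[Question]:
I'm having trouble handling HTML that contains escaped Unicode characters (in the Chinese range) with Python 3 / BeautifulSoup on Windows. BeautifulSoup seems to work fine until I try to print an extracted tag or write it to a file. I have the default encoding set to utf-8, but the cp1252 codec seems to be selected anyway...
To reproduce: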
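from bs4 import BeautifulSoup

soup = BeautifulSoup("隱")
# No encoding is passed, so open() falls back to the platform default
f = open("out.html", "w")
f.write(soup.text)
f.close()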
Stack trace attached:
Traceback (most recent call last):
  File "scrape.py", line 143, in <module>
    test_uni()
  File "scrape.py", line 126, in test_uni
    f.write(soup.text)
  File "c:\venv\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u96b1' in position 0: character maps to <undefined>
[Discussion]:
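The traceback points at `cp1252.py`, which suggests the file was opened with the platform's locale encoding rather than utf-8. In Python 3, `open()` in text mode without an `encoding=` argument typically uses `locale.getpreferredencoding(False)`, which is independent of `sys.getdefaultencoding()`. A minimal sketch of one possible fix, assuming the extracted text is an ordinary `str` (a literal stands in for `soup.text` here so the snippet runs without BeautifulSoup):

```python
import locale

# Encoding that open() falls back to when encoding= is omitted.
# On many Western-locale Windows setups this reports cp1252.
print(locale.getpreferredencoding(False))

# Stand-in for soup.text (assumption): the character from the traceback.
text = "\u96b1"

# Passing encoding explicitly bypasses the locale default.
with open("out.html", "w", encoding="utf-8") as f:
    f.write(text)

# Read it back with the same encoding to confirm the round trip.
with open("out.html", encoding="utf-8") as f:
    round_tripped = f.read()
```

Printing to the console is a separate matter: `print()` encodes with the encoding of `sys.stdout`, so a cp1252 console can raise the same error even when file writes succeed.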
Tags: python windows python-3.x unicode beautifulsoup