【Posted】: 2021-11-12 11:34:25
【Question】:
I am trying to open a file that carries a utf-8 meta tag with BeautifulSoup, passing utf-8 as the encoding, but I get a parsing error:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open(filename), "html.parser", from_encoding="utf-8")
File header:
<!DOCTYPE html>
<html lang="en">
<head>
<title>
Logs
</title>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
Error:
$ python3.6 dom.py
Traceback (most recent call last):
  File "dom.py", line 56, in <module>
    soup = BeautifulSoup(open(filename), "html.parser", from_encoding="utf-8")
  File "/usr/local/lib/python3.6/site-packages/bs4/__init__.py", line 309, in __init__
    markup = markup.read()
  File "/usr/local/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 152902: ordinal not in range(128)
How should I go about debugging this? Thanks.
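One possible starting point for debugging, sketched under the assumption that the file really is UTF-8: the traceback fails inside `markup.read()`, so the error comes from Python decoding the file, before BeautifulSoup parses anything. `open(filename)` without an `encoding` argument falls back to the locale's preferred encoding, and the `encodings/ascii.py` frame shows that this was ASCII here. The helper names below (`default_text_encoding`, `first_non_ascii`, `decodes_as`) are hypothetical and use only the standard library.

```python
import locale


def default_text_encoding():
    """Return the locale's preferred encoding, which open(path) falls back to."""
    return locale.getpreferredencoding(False)


def first_non_ascii(path):
    """Return (offset, byte value) of the first byte >= 0x80, or None."""
    with open(path, "rb") as fh:
        data = fh.read()
    for offset, value in enumerate(data):
        if value >= 0x80:
            return offset, value
    return None


def decodes_as(path, encoding="utf-8"):
    """Return True if the whole file decodes cleanly with the given encoding."""
    with open(path, "rb") as fh:
        data = fh.read()
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True
```

If `default_text_encoding()` reports an ASCII variant while `decodes_as(filename, "utf-8")` is true, the file itself is probably fine and the encoding simply has to be passed to `open()` explicitly.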
【Comments】:
- Try:
soup = BeautifulSoup(open(filename, "r", encoding="utf-8").read(), "html.parser")
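The suggestion above works because `open()` does the decoding with an explicit encoding, so BeautifulSoup receives an already decoded `str`. `from_encoding` only matters when the markup is passed as bytes; for `str` input BeautifulSoup ignores it. A minimal sketch of both variants follows. The helper names (`read_text`, `read_bytes`, `soup_from_text`, `soup_from_bytes`) are hypothetical, and `bs4` is imported inside the functions only so that the file helpers also run where it is not installed.

```python
def read_text(path, encoding="utf-8"):
    """Read the file as text, decoding with an explicit encoding."""
    with open(path, "r", encoding=encoding) as fh:
        return fh.read()


def read_bytes(path):
    """Read the raw, undecoded bytes of the file."""
    with open(path, "rb") as fh:
        return fh.read()


def soup_from_text(path):
    """Variant 1: decode in open(), hand BeautifulSoup a str."""
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4
    return BeautifulSoup(read_text(path), "html.parser")


def soup_from_bytes(path):
    """Variant 2: hand BeautifulSoup bytes and let from_encoding apply."""
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4
    return BeautifulSoup(read_bytes(path), "html.parser", from_encoding="utf-8")
```

Either function could stand in for the `BeautifulSoup(open(filename), ...)` call in `dom.py`. The `with` blocks also close the file, which the one-liner leaves to the garbage collector.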
Tags: python beautifulsoup encoding