BeautifulSoup cannot read file despite correct charset

【Question title】: BeautifulSoup cannot read file despite correct charset
【Posted】: 2021-11-12 11:34:25
【Question description】:

I am trying to open a file that has a utf-8 meta tag with BeautifulSoup, specifying utf-8, but I get a decoding error:

from bs4 import BeautifulSoup
soup = BeautifulSoup(open(filename), "html.parser", from_encoding="utf-8")

File header:

<!DOCTYPE html>
<html lang="en">
 <head>
  <title>
   Logs
  </title>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>

Error:

$ python3.6 dom.py
Traceback (most recent call last):
  File "dom.py", line 56, in <module>
    soup = BeautifulSoup(open(filename), "html.parser", from_encoding="utf-8")
  File "/usr/local/lib/python3.6/site-packages/bs4/__init__.py", line 309, in __init__
    markup = markup.read()
  File "/usr/local/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 152902: ordinal not in range(128)

How should I go about debugging this? Thanks.

【Comments】:

Try: soup = BeautifulSoup(open(filename, "r", encoding="utf-8").read(), "html.parser")

Tags: python beautifulsoup encoding

【Solution 1】:

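You are not opening the file correctly. Without an explicit encoding, open() decodes the file in text mode using the locale's default encoding, which according to the traceback is ASCII on your system. The first non-ASCII byte (0xc2) therefore raises UnicodeDecodeError inside markup.read(), before BeautifulSoup parses anything; from_encoding never comes into play.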

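from bs4 import BeautifulSoup

# Pass the encoding to open(); otherwise the locale default (ASCII here) is used.
with open(filename, "r", encoding="utf-8") as f:
    # The file object yields already-decoded str, so from_encoding is not needed
    # (bs4 ignores it, with a warning, when the markup is not bytes).
    soup = BeautifulSoup(f, "html.parser")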

Or

from bs4 import BeautifulSoup

# Keep the file object in f so that f.close() works;
# calling .read() on the same line would leave f holding a str.
f = open(filename, "r", encoding="utf-8")
soup = BeautifulSoup(f.read(), "html.parser")
f.close()
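
On the "how do I debug this" part: a rough sketch of how you could check the cause yourself, using only the standard library. The helper names (default_text_encoding, try_decode) and the sample file are made up for illustration and are not part of bs4 or of your script. The idea is to print the encoding open() falls back to, then try decoding the raw bytes with each candidate codec and report where the first bad byte sits.

```python
import locale
import os
import tempfile


def default_text_encoding():
    """Return the locale's preferred encoding, which open() falls back to
    in text mode when no encoding argument is given."""
    return locale.getpreferredencoding(False)


def try_decode(path, encoding):
    """Read the file as raw bytes and attempt to decode it.

    Returns (True, None) on success, or (False, offset) where offset is
    the position of the first byte the codec could not decode.
    """
    with open(path, "rb") as fh:
        raw = fh.read()
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return False, exc.start
    return True, None


# Hypothetical sample file standing in for the real HTML document.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write('<meta charset="utf-8"/><p>caf\u00e9 \u00a9</p>'.encode("utf-8"))
    sample_path = tmp.name

print("default encoding:", default_text_encoding())
print("as ascii:", try_decode(sample_path, "ascii"))
print("as utf-8:", try_decode(sample_path, "utf-8"))

os.remove(sample_path)
```

If the default encoding printed on your machine is an ASCII variant (for example because the process runs under a C/POSIX locale), that would explain why open(filename) fails while open(filename, encoding="utf-8") works.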

【Discussion】: