What is the difference between BeautifulSoup's site.content and site.read()? Answers

【Question title】: What is the difference between BeautifulSoup's site.content and site.read()?
【Posted】: 2020-03-05 11:21:17
【Question】:

When I use a local HTML file stored on my laptop,

from bs4 import BeautifulSoup
site = open('smpl.htm', 'r')
page = BeautifulSoup(site.content, 'html.parser')
print(page)

it returns (in cmd):

Traceback (most recent call last):
  File "c:/~~~~~~/python/h.py", line 3, in <module>
    page = BeautifulSoup(site.content, 'html.parser')
AttributeError: '_io.TextIOWrapper' object has no attribute 'content'

But if I replace site.content with site.read(), the code prints the correct HTML and operates on it without any problems.

However, if I fetch the HTML from the web with requests, I have to write site.content instead of site.read() to parse it.

What is the difference between content and read(), and which one should be used when?

Tags: python beautifulsoup html-parsing

【Answer 1】:

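Neither name belongs to BeautifulSoup; each comes from the object you pass to it. Opening an HTML file on your laptop with open() returns a TextIOWrapper (a text file object), which has a read() method that returns the file's contents as a string.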

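Fetching a web page uses a different class with different methods: requests returns a Response object, which has no read() method but exposes the response body as bytes through its content attribute (and as decoded text through text).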
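
The distinction can be sketched roughly as follows. The helper name markup_from and the FakeResponse class are hypothetical and are not part of requests or BeautifulSoup; the stand-in is only there so the example runs without a network connection, on the assumption that a real requests.Response exposes its body through .content in the same way.

```python
import io


class FakeResponse:
    """Hypothetical stand-in for requests.Response: body available as bytes via .content."""

    def __init__(self, body: bytes):
        self.content = body


def markup_from(source):
    """Return raw HTML from either a file object or a response-like object."""
    if hasattr(source, "read"):
        # File objects returned by open() expose their data through read().
        return source.read()
    if hasattr(source, "content"):
        # Response-like objects expose the body through the .content attribute.
        return source.content
    raise TypeError(f"cannot read markup from {type(source).__name__}")


local_file = io.StringIO("<p>local</p>")    # behaves like open('smpl.htm', 'r')
web_reply = FakeResponse(b"<p>remote</p>")  # behaves like the result of requests.get(...)

print(markup_from(local_file))  # <p>local</p>
print(markup_from(web_reply))   # b'<p>remote</p>'
```

Either return value can then be passed to BeautifulSoup(markup, 'html.parser'), since BeautifulSoup accepts both str and bytes as markup.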
