最近对爬虫有点着迷,

在用bs4模块时,遇到报错:UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 9: illegal multibyte sequence

bs4获取本地文件内容  

from bs4 import BeautifulSoup
soup = BeautifulSoup(open('a.html'), 'html.parser')
print(soup.prettify()) # 打印本地文件的内容
其中,a.html的内容为:
<div>大家好</div>
<p>你好啊</p>

运行报错

UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 9: illegal multibyte sequence

上面是字符流的问题

from bs4 import BeautifulSoup
soup = BeautifulSoup(open('a.html', 'rb'), 'html.parser')
print(soup.prettify()) # 打印本地文件的内容

运行结果:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 9: illegal multibyte sequence

 

相关文章:

  • 2021-10-26
  • 2021-12-27
  • 2021-11-19
  • 2021-06-07
  • 2021-12-15
  • 2021-07-22
  • 2022-01-16
  • 2021-06-14
猜你喜欢
  • 2021-04-19
  • 2022-12-23
  • 2022-12-23
  • 2022-12-23
  • 2022-01-14
  • 2022-12-23
  • 2021-08-15
相关资源
相似解决方案