【Posted】: 2019-09-03 03:25:32
【Problem description】:
Suppose we have an HTML file like this:
test.html
<div>
<i>Some text here.</i>
Some text here also.<br>
2 &plus; 4 = 6<br>
2 &lt; 4 = True
</div>
If I pass this HTML to BeautifulSoup, it escapes the ampersand of the &plus; entity, and the output HTML looks like this:
<div>
<i>Some text here.</i>
Some text here also.<br>
2 &amp;plus 4 = 6<br>
2 &lt; 4 = True
</div>
For example, with this Python 3 code:
from bs4 import BeautifulSoup
# Parse the file with Python's built-in html.parser backend
with open('test.html', 'rb') as file:
    soup = BeautifulSoup(file, 'html.parser')
print(soup)
How can I avoid this behavior?
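One possible workaround, sketched under an assumption: the behavior above appears when the parser does not recognize `&plus;` as a named entity (it is an HTML5 name, absent from the older HTML 4 table in `html.entities.name2codepoint`) and therefore treats the `&` as literal text. If that is the cause, rewriting such entities as numeric character references before parsing should sidestep it. The helper name `normalize_html5_entities` is hypothetical and not part of BeautifulSoup; only the standard library is used.

```python
import re
from html.entities import html5, name2codepoint

# Matches named entities that end with a semicolon, e.g. "&plus;" or "&lt;".
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def normalize_html5_entities(markup: str) -> str:
    """Rewrite HTML5-only named entities as numeric character references.

    Hypothetical helper: entities already present in the HTML 4 table
    (such as &lt; or &amp;) and unknown names are left untouched.
    """
    def _replace(match: re.Match) -> str:
        name = match.group(1)
        if name in name2codepoint:
            return match.group(0)      # HTML 4 entity, keep as written
        chars = html5.get(name + ";")
        if chars is None:
            return match.group(0)      # not a known entity, keep as written
        # Some HTML5 entities expand to more than one code point.
        return "".join(f"&#{ord(c)};" for c in chars)

    return _NAMED_ENTITY.sub(_replace, markup)


sample = "2 &plus; 4 = 6<br>\n2 &lt; 4 = True"
print(normalize_html5_entities(sample))
```

The helper works on `str`, so the file would need to be opened in text mode (or decoded) before the result is passed to `BeautifulSoup(..., 'html.parser')`. Trying a different parser backend or a newer BeautifulSoup release may also change the outcome; this has not been verified here.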
【Discussion】:
Tags: python html python-3.x beautifulsoup html-parsing