Parsing HTML in Python 3.5 returns a strange type: answers

【Question title】: Parsing html in python 3.5 returns strange type
【Posted】: 2017-07-15 11:11:54
【Question】:

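I'm running Python 3.5 and trying to extract the BINGO data from this web page, but I've run into some problems. When I split the HTML response, every string in my list is prefixed with the letter b, which makes it impossible to check against. I inspected the HTML output, which I'm not familiar with, and its type is bytes. First, why is this b in front of all my strings? Second, how can I parse an HTML page more cleanly?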

```python
import urllib.request

with urllib.request.urlopen('http://www.executiveadministrator.com/cgi-local/inoutPROhosted4/inoutPRO.pl?refresh=1&ID=AFTCO') as response:
    html = response.read()  # read() returns bytes, not str

htmllist = html.split()  # so this is a list of bytes objects

print(htmllist)
for i in htmllist:
    #if i == 'BINGO':
    print(i)
```

Sample output:

```
b'class="colorlinkbody">Renew' b'Board' b'Contract' b'Copyright' b'1996-2013' b''
```
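To see why the commented-out `if i == 'BINGO':` check can never match, here is a minimal offline sketch. It assumes a hard-coded bytes literal as a stand-in for `response.read()`; the sample text is hypothetical, not the real page content.

```python
# Hypothetical stand-in for response.read(); a real urlopen() response
# also returns bytes, which is where the b'' prefix in the output comes from.
html = b'Renew Board BINGO Copyright 1996-2013'

tokens = html.split()  # bytes.split() yields a list of bytes objects

print(type(tokens[2]))                       # <class 'bytes'>
print(tokens[2] == 'BINGO')                  # False: bytes never equal str
print(tokens[2] == b'BINGO')                 # True: compare bytes with bytes
print(tokens[2].decode('utf-8') == 'BINGO')  # True: decode to str first
```

The b prefix is just how Python displays a bytes object; the fix is either to compare against a bytes literal or, more usefully, to decode to str before working with the text.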

【Comments】:

Because response.read() returns bytes, not str. Use decode().

Tags: python html python-3.x parsing urllib

【Solution 1】:

As mentioned in the comments, response.read() returns bytes rather than str. To get a string value from a bytes object, you have to call its decode(encoding) method. Change your print loop to:

```python
for i in htmllist:
    print(i.decode('utf-8'))
```
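Rather than decoding each token separately, one option is to decode the whole body once and then split the resulting str. The sketch below assumes the server declares a charset in its Content-Type header and falls back to UTF-8 when it does not (an assumption; the page may use another encoding). `decode_body` and `fetch_words` are hypothetical helper names, not part of urllib.

```python
import urllib.request
from email.message import Message


def decode_body(body: bytes, headers: Message, default: str = 'utf-8') -> str:
    """Decode an HTTP response body to str using the declared charset."""
    # get_content_charset() reads the charset= parameter of Content-Type
    # and returns None when the server did not declare one.
    charset = headers.get_content_charset() or default
    # errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
    return body.decode(charset, errors='replace')


def fetch_words(url: str) -> list:
    """Download a page and return its whitespace-separated words as str."""
    with urllib.request.urlopen(url) as response:
        # response.headers is an http.client.HTTPMessage, an email Message subclass
        text = decode_body(response.read(), response.headers)
    return text.split()


if __name__ == '__main__':
    # Offline demo: build the headers by hand instead of making a request.
    headers = Message()
    headers['Content-Type'] = 'text/html; charset=ISO-8859-1'
    words = decode_body(b'Renew Board BINGO', headers).split()
    print(words)             # ['Renew', 'Board', 'BINGO']
    print('BINGO' in words)  # True
```

Decoding once up front means every later comparison, split, or search works on ordinary str values, so a check like `'BINGO' in words` behaves as expected.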

【Discussion】:

Thanks, but this seems like a clumsy way to get a list of strings out of HTML. Is there a better way, meaning something other than urllib.request? I'm on Windows, if that matters.
It depends on what you want to do with them, but you should probably look into HTML parsing libraries such as lxml or BeautifulSoup (aka bs4).
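If you would rather avoid a third-party dependency, the standard library's `html.parser` module can pull out just the text between tags. This is a minimal sketch, not the lxml or BeautifulSoup API; `TextExtractor` and `page_text` are hypothetical names, and it assumes the page has already been decoded to str.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, skipping markup."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.chunks = []
        self._skip = 0  # nesting depth inside <script>/<style>, whose text is not page content

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside <script>/<style>.
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def page_text(html: str) -> list:
    """Return the stripped text chunks found between tags."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.chunks
```

With the decoded page from urlopen, `'BINGO' in page_text(text)` checks for a text node whose stripped content is exactly BINGO. If bs4 is installed, `BeautifulSoup(text, 'html.parser').get_text()` gives a comparable plain-text view with less code.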