AttributeError: 'HTTPResponse' object has no attribute 'split' (answers)

【Question title】: AttributeError: 'HTTPResponse' object has no attribute 'split'
【Posted】: 2016-09-19 02:08:23
【Question】:

I am trying to fetch some data from Google Finance, but I get this error:

AttributeError: 'HTTPResponse' object has no attribute 'split'

Here is my Python code:

import urllib.request
import urllib
from bs4 import BeautifulSoup

symbolsfile = open("Stocklist.txt")

symbolslist = symbolsfile.read()

thesymbolslist = symbolslist.split("\n")

i=0


while i<len (thesymbolslist):
    theurl = "http://www.google.com/finance/getprices?q=" + thesymbolslist[i] + "&i=10&p=25m&f=c"
    thepage = urllib.request.urlopen (theurl)
    print(thesymbolslist[i] + " price is " + thepage.split()[len(thepage.split())-1])
    i= i+1

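【Comments】:

What are you trying to do here? thepage.split()[len(thepage.split())-1])
I am trying to split the page into a list, then take the last element of that list and print it.
You need to call read() on thepage to get an actual string.

Tags: python python-3.x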

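【Solution 1】:

Cause of the problem

urllib.request.urlopen(theurl) returns an object that represents the connection (an HTTPResponse), not a string, so it has no split() method.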
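The point above can be checked without touching the network. This is a minimal sketch that inspects the `http.client.HTTPResponse` class (the type `urlopen()` returns for http and https URLs); the variable names are only illustrative.

```python
import http.client

# urlopen() hands back an http.client.HTTPResponse for http/https URLs.
response_type = http.client.HTTPResponse

# split() is a str/bytes method; the response object does not have it.
has_split = hasattr(response_type, "split")

# read() is the file-like method that returns the body.
has_read = hasattr(response_type, "read")

print(has_split, has_read)  # False True
```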

Solution

To read the data from this connection and actually get its content, you need to do this:

thepage = urllib.request.urlopen(theurl).read()

The rest of your code should then follow naturally.
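As a rough illustration of the remaining step, the sketch below takes the last whitespace-separated field from a response body. The helper name `last_field` and the sample body are hypothetical, and the sketch assumes the body is UTF-8, because in Python 3 `read()` returns bytes that must be decoded before they can be joined with a str.

```python
def last_field(body: bytes, encoding: str = "utf-8") -> str:
    """Decode a response body and return its last whitespace-separated field."""
    fields = body.decode(encoding).split()
    if not fields:
        raise ValueError("response body is empty")
    return fields[-1]


# Hypothetical sample body, standing in for urlopen(theurl).read().
sample_body = b"COLUMNS=CLOSE\n101.5\n102.25\n"
print("price is " + last_field(sample_body))
```

Indexing with `[-1]` replaces the original `[len(...)-1]` expression and avoids splitting the page twice.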

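Solution addendum

In Python 3, read() returns a bytes object rather than a str, because the raw data arrives in some character encoding that Python does not decode for you.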
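To make the bytes-versus-str distinction concrete, here is a small sketch that needs no network access. The literal `b"102.25"` is a made-up stand-in for what `read()` returns, and UTF-8 is assumed for the decode step.

```python
raw = b"102.25"  # stand-in for the bytes that read() returns

error_name = None
try:
    # Mixing str and bytes is rejected in Python 3.
    message = "price is " + raw
except TypeError as exc:
    error_name = type(exc).__name__

# Decoding turns the bytes into a str, after which concatenation works.
text = raw.decode("utf-8")
fixed_message = "price is " + text
print(error_name, "|", fixed_message)
```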

The correct way to handle this is to find the right character encoding and use it to decode the bytes into a regular string, as in this question:

thepage = urllib.request.urlopen(theurl)
# read the character encoding declared in the `Content-Type` response header
charset_encoding = thepage.info().get_content_charset()
# decode the raw bytes using that encoding
thepage = thepage.read().decode(charset_encoding)

Sometimes it is safe to assume that the character encoding is utf-8, in which case

thepage = urllib.request.urlopen(theurl).read().decode('utf-8')

does often work. If nothing else, it is statistically a good guess.
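The two approaches above can be combined: use the declared charset when there is one and fall back to UTF-8 otherwise. The following is only a sketch under that assumption; `decode_body` and `fetch_text` are hypothetical helper names, and the fallback policy is one choice among several.

```python
import urllib.request
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str], default: str = "utf-8") -> str:
    """Decode raw bytes with the declared charset, or with a default."""
    try:
        # get_content_charset() returns None when no charset is declared.
        return raw.decode(charset or default)
    except LookupError:
        # The declared charset name is unknown to Python; use the default.
        return raw.decode(default, errors="replace")


def fetch_text(url: str) -> str:
    """Fetch a URL and return its body as str (performs a network request)."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

With this in place, `fetch_text(theurl).split()[-1]` would give the last field of the page, provided the URL is still reachable.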

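【Comments】:

Once I did that, it gave me this error: TypeError: Can't convert 'bytes' object to str implicitly
That is because the string you received is in an encoding Python does not understand. Give me a moment to sort it out.
Your solution is more robust because it does not depend on the source encoding, so OP: better to mark this one as the correct answer :)

【Solution 2】:

Checking the documentation may save you time in the future. It says that urlopen() returns an HTTPResponse object, which has a read() method. In Python 3 you need to decode the output from the source encoding, which in this case is UTF-8. So just write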

thepage = urllib.request.urlopen(theurl).read().decode('utf-8')

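【Comments】:

Once I did that, it gave me this error: TypeError: Can't convert 'bytes' object to str implicitly
Python 3? Then see stackoverflow.com/questions/16699362/… and try thepage = urllib.request.urlopen(theurl).read().decode('utf-8')
@le_m That assumes the encoding is utf-8, which is usually right but is not necessarily the encoding that was sent. The correct approach is to check the encoding in the headers and apply it.
@AkshatMahajan Sure, you are right, but since OP is only querying google.com, we can safely assume UTF-8.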
@le_m You would be surprised what character encodings Google uses in lieu of UTF-8...