[Posted]: 2013-06-13 10:55:51
[Question]:
I am trying to parse arbitrary web pages with the requests and BeautifulSoup libraries, using the following code:
try:
    response = requests.get(url)
except Exception as error:
    return False
if response.encoding is None:
    soup = bs4.BeautifulSoup(response.text)  # This is line 155
else:
    soup = bs4.BeautifulSoup(response.text, from_encoding=response.encoding)
This works fine on most web pages. However, on some arbitrary pages it fails with the following TypeError:
Traceback (most recent call last):
  File "/home/dotancohen/code/parser.py", line 155, in has_css
    soup = bs4.BeautifulSoup(response.text)
  File "/usr/lib/python3/dist-packages/requests/models.py", line 809, in text
    content = str(self.content, encoding, errors='replace')
TypeError: str() argument 2 must be str, not None
For reference, this is the relevant method of the requests library:
@property
def text(self):
    """Content of the response, in unicode.
    if Response.encoding is None and chardet module is available, encoding
    will be guessed.
    """
    # Try charset from content-type
    content = None
    encoding = self.encoding
    # Fallback to auto-detected encoding.
    if self.encoding is None:
        if chardet is not None:
            encoding = chardet.detect(self.content)['encoding']
    # Decode unicode from given encoding.
    try:
        content = str(self.content, encoding, errors='replace')  # This is line 809
    except LookupError:
        # A LookupError is raised if the encoding was not found which could
        # indicate a misspelling or similar mistake.
        #
        # So we try blindly encoding.
        content = str(self.content, errors='replace')
    return content
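As can be seen, I am not passing in an encoding when this error is thrown. How am I using the library incorrectly, and what can I do to prevent this error? This is on Python 3.2.3, but I get the same result on Python 2 as well.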
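To make the failure easier to reason about, here is a minimal sketch that mirrors only the decoding step of the quoted `text` property. It assumes the encoding ends up as `None` (for instance because the header gave no charset and detection produced no guess). In that case `str()` raises a `TypeError`, which the `except LookupError` clause does not catch. The names `decode_like_text_property` and `decode_with_fallback` and the `'utf-8'` fallback are hypothetical illustrations, not part of requests.

```python
def decode_like_text_property(content, encoding):
    """Mirror the decoding step of the quoted `text` property."""
    try:
        return str(content, encoding, errors='replace')
    except LookupError:
        # Only unknown codec names land here; encoding=None does not.
        return str(content, errors='replace')


def decode_with_fallback(content, encoding, fallback='utf-8'):
    """Hypothetical helper: use a fallback codec when no encoding is known."""
    try:
        return str(content, encoding or fallback, errors='replace')
    except LookupError:
        # The codec name was not recognised, so decode with the fallback.
        return str(content, fallback, errors='replace')


raw = b'<html><body>caf\xc3\xa9</body></html>'

# Reproduce the failure: a None encoding raises TypeError, not LookupError.
try:
    decode_like_text_property(raw, None)
    reproduced = False
except TypeError:
    reproduced = True

# The guarded version decodes the same bytes without raising.
decoded = decode_with_fallback(raw, None)
```

In the calling code, two approaches that may avoid the problem are assigning a value to `response.encoding` before reading `response.text`, or passing the raw bytes in `response.content` to `bs4.BeautifulSoup` and letting it detect the encoding itself. Whether either is appropriate depends on the pages being parsed.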
[Comments]:
Tags: python exception beautifulsoup python-requests