【Posted】: 2016-09-18 05:55:36
【Question】:
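I can't figure out why I'm getting this ValueError... For some context, I'm using requests, BeautifulSoup, and json with Python to scrape JSON data from a website.
I'm not sure why it doesn't work for this URL. I've done the same thing with several other URLs without any problem. Even 'page 2' (http://hypem.com/playlist/loved/Bigdirtyian/json/2/data.js) was scraped successfully and stored in a dictionary.
I've included the IPython input/output below (the problematic URL and a successful one, page 3 and page 2 respectively):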
In [1]: url = 'http://hypem.com/playlist/loved/Bigdirtyian/json/3/data.js'
In [2]: import json
In [3]: import requests
In [4]: from bs4 import BeautifulSoup
In [5]: r = requests.get(url)
In [6]: content = r.content
In [7]: soup = BeautifulSoup(content, 'html.parser')
In [8]: page_json_dict = json.loads(str(soup))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-18cc0e11884e> in <module>()
----> 1 page_json_dict = json.loads(str(soup))
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
336 parse_int is None and parse_float is None and
337 parse_constant is None and object_pairs_hook is None and not kw):
--> 338 return _default_decoder.decode(s)
339 if cls is None:
340 cls = JSONDecoder
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.pyc in decode(self, s, _w)
367 end = _w(s, end).end()
368 if end != len(s):
--> 369 raise ValueError(errmsg("Extra data", s, end, len(s)))
370 return obj
371
ValueError: Extra data: line 1 column 18924 - line 1 column 18932 (char 18923 - 18931)
In [9]: url2 = 'http://hypem.com/playlist/loved/Bigdirtyian/json/2/data.js'
In [10]: r2 = requests.get(url2)
In [11]: content2 = r2.content
In [12]: soup2 = BeautifulSoup(content2, 'html.parser')
In [13]: page_json_dict2 = json.loads(str(soup2))
In [14]: //
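For anyone trying to reproduce this, here is a minimal sketch of one way to see what the "Extra data" actually is. It decodes only the first JSON value in a string and returns whatever is left over. The helper name `parse_first_json_value` and the `sample` string are hypothetical; the sample merely stands in for `str(soup)` and is not the real response. One possible cause (an assumption, not confirmed here) is that the HTML parser appended a closing tag for an unclosed tag found inside a JSON string.

```python
import json


def parse_first_json_value(text):
    """Decode the first JSON value in `text`; return (value, trailing_text)."""
    decoder = json.JSONDecoder()
    # raw_decode does not skip leading whitespace, so strip it first.
    stripped = text.lstrip()
    value, end = decoder.raw_decode(stripped)
    return value, stripped[end:]


# Hypothetical sample: valid JSON followed by a stray closing tag, similar to
# what an HTML parser might append when a JSON string contains an unclosed tag.
sample = '{"version": "1.1", "0": {"title": "a <b>song"}}</b>'

value, trailing = parse_first_json_value(sample)
print(value["version"])
print(repr(trailing))
```

If the trailing text turns out to be markup added by the parser, skipping BeautifulSoup and decoding the response directly with `r.json()` or `json.loads(r.text)` may avoid the error, since the endpoint appears to return JSON rather than HTML.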
Thanks in advance!
【Comments】:
Tags: python json web-scraping beautifulsoup python-requests