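@vijay, print json.loads(soup.find("pre").string[2:-2])["Author"] will do the job (this is Python 2 syntax; on Python 3, wrap the expression in print(...)). Have a look at the code below, executed in the Python interactive terminal.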
>>> import json
>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> html_text = """<html>
... <head></head>
... <body>
... <pre style="word-wrap: break-word; white-space: pre-wrap;">
... "{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}"
... </pre>
... </body>
... </html>"""
>>>
>>> soup = BeautifulSoup(html_text, "html.parser")
>>> print(soup.prettify())
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
"{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}"
</pre>
</body>
</html>
>>>
>>> print(soup.find("pre"))
<pre style="word-wrap: break-word; white-space: pre-wrap;">
"{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}"
</pre>
>>>
>>> print(soup.find("pre").string)
"{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}"
>>> print(soup.find("pre").string[2:-2])
{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}
>>>
>>> d = json.loads(soup.find("pre").string[2:-2])
>>> type(d)
<type 'dict'>
>>>
>>> d
{u'Author': u'Chetan Bhagat', u'Year': u'2016', u'Title': u'One Indian Girl'}
>>>
>>> d["Author"]
u'Chetan Bhagat'
>>>
>>> d["Year"]
u'2016'
>>>
>>> d["Title"]
u'One Indian Girl'
>>>
>>> # Place all in the list
...
>>> l = [d["Title"], d["Year"], d["Author"]]
>>> l
[u'One Indian Girl', u'2016', u'Chetan Bhagat']
>>>
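The [2:-2] slice above only works when the text has exactly one newline and one double quote on each side. Here is a minimal sketch, assuming the same <pre> content, that removes the surrounding whitespace and wrapping quotes without hard-coded offsets. `raw` is a stand-in for soup.find("pre").string, and `extract_record` is a hypothetical helper name, not part of BeautifulSoup or json.

```python
import json

# Stand-in for soup.find("pre").string from the session above (assumption).
raw = '\n"{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"}"\n'


def extract_record(text):
    """Parse a JSON object wrapped in whitespace and one pair of double quotes."""
    cleaned = text.strip()
    # Drop exactly one wrapping quote on each side, if both are present.
    if len(cleaned) >= 2 and cleaned[0] == '"' and cleaned[-1] == '"':
        cleaned = cleaned[1:-1]
    return json.loads(cleaned)


record = extract_record(raw)
print(record["Author"])  # Chetan Bhagat
```

The same helper also accepts text with no wrapping quotes, since it only strips them when both are there.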
» Getting the data into a list without referring to the dictionary keys as above.
>>> final_data = [str(a.strip().split(":")[1]) for a in soup.find("pre").string[2:-3].replace('\"', '').split(",")]
>>>
>>> final_data
['One Indian Girl', '2016', 'Chetan Bhagat']
>>>
Let's go through the one-liner above step by step to see how the list is built (update).
>>> data = soup.find("pre").string[2:-3]
>>> data
u'{"Title":"One Indian Girl","Year":"2016","Author":"Chetan Bhagat"'
>>>
>>> data = data.replace('\"', '')
>>> data
u'{Title:One Indian Girl,Year:2016,Author:Chetan Bhagat'
>>>
>>> arr = data.split(",")
>>> arr
[u'{Title:One Indian Girl', u'Year:2016', u'Author:Chetan Bhagat']
>>>
>>> final_data = [str(a.strip().split(":")[1]) for a in arr]
>>> final_data
['One Indian Girl', '2016', 'Chetan Bhagat']
>>>
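Note that splitting on "," and ":" breaks as soon as a value itself contains one of those characters. As a rough alternative, assuming the text is valid JSON once the wrapping quotes are removed, the values can be collected in their original order by letting json.loads build an OrderedDict. The record below is made up for illustration: its title deliberately contains a colon and a comma.

```python
import json
from collections import OrderedDict

# Hypothetical record; the title contains ':' and ',' on purpose.
payload = '{"Title":"A Title: With, Punctuation","Year":"2016","Author":"Chetan Bhagat"}'

# object_pairs_hook keeps the keys in the order they appear in the text,
# which a plain dict does not guarantee on Python 2.
record = json.loads(payload, object_pairs_hook=OrderedDict)

# Values in document order, with no manual string splitting.
final_data = list(record.values())
print(final_data)
```

On Python 2 the list items are unicode strings; wrap each in str() as in the one-liner above if plain byte strings are needed.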