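From your other post, I am guessing the URL is https://www.sciencedirect.com/journal/construction-and-building-materials/issues
When you click a link on that page, the page loads JSON from another URL. You can request that JSON yourself without clicking anything. All you need is the ISSN (09500618), which never changes, and the year, which you can pass in from a range. This even returns the data for the tab that is already expanded.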
import requests
import json

# The website rejects requests from user agents it has blacklisted,
# so set a browser-like User-Agent header
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0'
}

# range(1999, 2019) covers the years 1999 through 2018
for i in range(1999, 2019):
    url = "https://www.sciencedirect.com/journal/09500618/year/" + str(i) + "/issues"
    r = requests.get(url, headers=headers)
    j = r.json()
    for d in j['data']:
        # Print the json object
        print(json.dumps(d, indent=4, sort_keys=True))
        # Or print specific values
        print(d['coverDateText'], d['volumeFirst'], d['uriLookup'], d['srctitle'])
Output:
{
    "cid": "271475",
    "contentFamily": "serial",
    "contentType": "JL",
    "coverDateStart": "19991201",
    "coverDateText": "1 December 1999",
    "hubStage": "H300",
    "issn": "09500618",
    "issueFirst": "8",
    "pages": [
        {
            "firstPage": "417",
            "lastPage": "470"
        }
    ],
    "pii": "S0950061800X00323",
    "sortField": "1999001300008zzzzzzz",
    "srctitle": "Construction and Building Materials",
    "uriLookup": "/vol/13/issue/8",
    "volIssueSupplementText": "Volume 13, Issue 8",
    "volumeFirst": "13"
}
1 December 1999 13 /vol/13/issue/8 Construction and Building Materials
...
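
If you would rather collect the values than print them, the same idea can be split into small helpers. This is only a sketch: the function names (`issues_url`, `extract_rows`, `fetch_year`) are made up for illustration, the timeout value is arbitrary, and it assumes the endpoint still returns a top-level `data` list with the keys shown in the output above.

```python
ISSN = "09500618"  # the journal's ISSN, as used in the URL above

HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
}


def issues_url(issn, year):
    """Build the per-year JSON URL (hypothetical helper name)."""
    return "https://www.sciencedirect.com/journal/{}/year/{}/issues".format(issn, year)


def extract_rows(payload):
    """Pull the four printed fields out of one year's decoded JSON.

    Assumes the 'data' list and key names seen in the sample output;
    a payload without 'data' yields an empty list.
    """
    return [
        (d["coverDateText"], d["volumeFirst"], d["uriLookup"], d["srctitle"])
        for d in payload.get("data", [])
    ]


def fetch_year(issn, year):
    """Request one year's issues and return them as a list of tuples."""
    # Imported here so the two pure helpers above work without the dependency.
    import requests

    r = requests.get(issues_url(issn, year), headers=HEADERS, timeout=30)
    r.raise_for_status()  # fail loudly if the site rejects the request
    return extract_rows(r.json())
```

You could then build one list for all years with something like `rows = [row for y in range(1999, 2019) for row in fetch_year(ISSN, y)]`.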