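[Posted]: 2021-08-15 19:45:47
[Question]:
I'm trying to scrape tabular content from a webpage using the requests module. The page's content is highly dynamic; however, according to the browser's dev tools, it can be accessed through an API. I'm trying to mimic the same POST request with the appropriate parameters, but I always get status 403.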
import requests
from pprint import pprint
start_url = 'https://opensea.io/rankings'
link = 'https://api.opensea.io/graphql/'
payload = {"id":"rankingsQuery","query":"query rankingsQuery(\n $chain: [ChainScalar!]\n $count: Int!\n $cursor: String\n $sortBy: CollectionSort\n $parents: [CollectionSlug!]\n $createdAfter: DateTime\n) {\n ...rankings_collections\n}\n\nfragment rankings_collections on Query {\n collections(after: $cursor, chains: $chain, first: $count, sortBy: $sortBy, parents: $parents, createdAfter: $createdAfter, sortAscending: false, includeHidden: true, excludeZeroVolume: true) {\n edges {\n node {\n createdDate\n name\n slug\n logo\n stats {\n floorPrice\n marketCap\n numOwners\n totalSupply\n sevenDayChange\n sevenDayVolume\n oneDayChange\n oneDayVolume\n thirtyDayChange\n thirtyDayVolume\n totalVolume\n id\n }\n id\n __typename\n }\n cursor\n }\n pageInfo {\n endCursor\n hasNextPage\n }\n }\n}\n","variables":{"chain":None,"count":100,"cursor":"YXJyYXljb25uZWN0aW9uOjk5","sortBy":"SEVEN_DAY_VOLUME","parents":None,"createdAfter":None}}
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
    s.headers['x-api-key'] = '2f6f419a083c46de9d83ce3dbe7db601'
    s.headers['x-build-id'] = 'cplNDIqD8Uy8MvANX90r9'
    s.headers['referer'] = 'https://opensea.io/'
    res = s.post(link,json=payload)
    pprint(res.status_code)
    print(res.json())
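As a side note on the payload, the `cursor` value is a base64 string, and decoding it shows which page the request is asking for. Below is a minimal sketch using only the standard library. The helper names `decode_cursor` and `encode_cursor` are made up for illustration, and the assumption that the server accepts any cursor of the form `arrayconnection:<index>` is unverified.

```python
import base64


def decode_cursor(cursor: str) -> str:
    """Decode a base64-encoded pagination cursor into readable text."""
    return base64.b64decode(cursor).decode("utf-8")


def encode_cursor(index: int) -> str:
    """Build a cursor for a given index.

    ASSUMPTION: the API uses the 'arrayconnection:<index>' layout that the
    cursor in the question decodes to; this is not confirmed by the source.
    """
    return base64.b64encode(f"arrayconnection:{index}".encode("utf-8")).decode("ascii")


# The cursor taken from the payload above
print(decode_cursor("YXJyYXljb25uZWN0aW9uOjk5"))  # arrayconnection:99
```

If that layout holds, the request above asks for the 100 entries that come after index 99 rather than for the first page.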
How can I scrape the tabular content from that webpage using the requests module?
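While debugging, it may help to look at the response before parsing it. A 403 reply often carries an HTML or plain-text body, so an unconditional `res.json()` can raise and hide what the server actually returned. The sketch below is one defensive way to inspect it. `summarize_response` is a hypothetical helper, and it relies only on the `status_code`, `headers`, and `text` attributes that a `requests.Response` exposes; a stand-in object is used here so the snippet runs without a network call.

```python
import json
from types import SimpleNamespace


def summarize_response(res, preview_chars=200):
    """Return status, content type, and either parsed JSON or a body preview."""
    content_type = res.headers.get("Content-Type", "")
    summary = {
        "status": res.status_code,
        "content_type": content_type,
        "data": None,
        "preview": None,
    }
    if res.status_code == 200 and "json" in content_type.lower():
        # Only parse when the server reports success and a JSON body
        summary["data"] = json.loads(res.text)
    else:
        # Keep the start of the raw body so the refusal reason stays visible
        summary["preview"] = res.text[:preview_chars]
    return summary


# Stand-in for a blocked response (illustrative values, not real server output)
fake = SimpleNamespace(
    status_code=403,
    headers={"Content-Type": "text/html"},
    text="<html>Access denied</html>",
)
print(summarize_response(fake))
```

With the session from the question, the call would be `summarize_response(res)` in place of `print(res.json())`.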
[Discussion]:
Tags: python python-3.x web-scraping python-requests