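[Posted]: 2022-01-18 22:20:57
[Question]:
I have multiple accounts/tokens, and GitHub allows no more than 2500 requests per hour. How can I set up automatic token switching in Scrapy, so that the token changes either when a certain number of requests (e.g. 2500) has been sent or when a response returns 403?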
import scrapy
from scrapy import Request

# Hypothetical placeholders: the original snippet uses these names without defining them.
languages = ['python', 'javascript']
country = 'germany'


class GithubSpider(scrapy.Spider):
    name = 'github.com'
    start_urls = ['https://github.com']
    tokens = ['token1', 'token2', 'token3', 'token4']
    headers = {
        'Accept': 'application/vnd.github.v3+json',
        'Authorization': 'token ' + tokens[1],  # one fixed token, never rotated
    }

    def start_requests(self):
        for lang in languages:
            url = f'https://api.github.com/search/users?q=language:{lang}%20location:{country}&per_page=100'
            yield Request(url=url, headers=self.headers, callback=self.parse, cb_kwargs={'lang': lang})
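One possible approach, sketched under assumptions rather than offered as a definitive solution: move the token handling out of the spider and into a Scrapy downloader middleware, which sees every outgoing request (`process_request`) and every response (`process_response`). The names `TokenRotator`, `GithubTokenMiddleware`, the settings `GITHUB_TOKENS` / `GITHUB_TOKEN_LIMIT` and the meta keys `github_token` / `token_retries` are all hypothetical; the 2500 limit is taken from the question. The rotator switches tokens after `limit` requests, and a 403 forces a switch and re-schedules the request. It rotates only if the failed request used the *current* token, so several in-flight 403s for the same exhausted token do not skip over healthy ones, and it gives up after every token has been tried once for a request.

```python
class TokenRotator:
    """Round-robin over API tokens, switching after `limit` requests per token."""

    def __init__(self, tokens, limit=2500):
        if not tokens:
            raise ValueError("at least one token is required")
        self.tokens = list(tokens)
        self.limit = limit
        self.index = 0
        self.used = 0  # requests sent so far with the current token

    @property
    def current(self):
        return self.tokens[self.index]

    def rotate(self):
        """Switch to the next token (wrapping around) and reset the counter."""
        self.index = (self.index + 1) % len(self.tokens)
        self.used = 0
        return self.current

    def next_token(self):
        """Return the token for the next request, rotating once the limit is hit."""
        if self.used >= self.limit:
            self.rotate()
        self.used += 1
        return self.current


class GithubTokenMiddleware:
    """Sketch of a Scrapy downloader middleware that attaches a rotating token."""

    def __init__(self, tokens, limit=2500):
        self.rotator = TokenRotator(tokens, limit)

    @classmethod
    def from_crawler(cls, crawler):
        # GITHUB_TOKENS / GITHUB_TOKEN_LIMIT are hypothetical setting names.
        settings = crawler.settings
        return cls(settings.getlist('GITHUB_TOKENS'),
                   settings.getint('GITHUB_TOKEN_LIMIT', 2500))

    def process_request(self, request, spider):
        token = self.rotator.next_token()
        request.meta['github_token'] = token  # remember which token was used
        request.headers['Authorization'] = 'token ' + token
        return None  # let Scrapy continue downloading the request

    def process_response(self, request, response, spider):
        if response.status != 403:
            return response
        retries = request.meta.get('token_retries', 0)
        if retries >= len(self.rotator.tokens):
            return response  # every token has been tried; give up on this request
        # Rotate only if this request used the current token, so several
        # in-flight 403s for the same exhausted token rotate just once.
        if request.meta.get('github_token') == self.rotator.current:
            self.rotator.rotate()
        # Returning a Request re-schedules it; process_request then attaches
        # the new token. dont_filter stops the dupe filter from dropping it.
        retry = request.replace(dont_filter=True)
        retry.meta['token_retries'] = retries + 1
        return retry
```

To try it, the middleware would be enabled in `settings.py` with something like `DOWNLOADER_MIDDLEWARES = {'myproject.middlewares.GithubTokenMiddleware': 543}` (module path hypothetical) together with `GITHUB_TOKENS = ['token1', 'token2', 'token3', 'token4']`, and the hardcoded `'Authorization'` entry removed from the spider's `headers`. Note that when all tokens are exhausted, a 403 is simply passed through; waiting for the hourly reset is left out of this sketch.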
[Discussion]:
Tags: web-scraping scrapy github-api