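【Question title】: How to find when a request started and when it ended in Scrapy
【Posted】: 2017-08-04 18:11:10
【Question description】:
I am trying to measure the throughput of a system with Scrapy, and I want to find out when each HTTP request is fired and when it completes.
Any pointers toward a solution would be much appreciated.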
Tags:
python
scrapy
throughput
【Solution 1】:
You can use a custom downloader middleware:
import logging
from datetime import datetime


class MeasureMiddleware:
    # (url, start time) of every request still waiting for a response
    requests = []

    def process_request(self, request, spider):
        # store the URL and start time of every outgoing request
        self.requests.append((request.url, datetime.now()))

    def process_response(self, request, response, spider):
        # for every response, check whether one of the tracked requests came back
        # if so, log its start time and the current time
        filtered_requests = []
        # go through the tracked requests and check whether any match the current URL
        # (the loop variable must not shadow the `request` argument)
        for tracked in self.requests:
            url, start_date = tracked
            if url == request.url:
                logging.info(f'request {url} {start_date} - {datetime.now()}')
            else:
                filtered_requests.append(tracked)
        self.requests = filtered_requests
        # process_response must return a Response (or Request) object
        return response
Then enable the downloader middleware in your project settings:
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.MeasureMiddleware': 543,
}
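Note that, because Scrapy is asynchronous, these timings will not be accurate to the millisecond, but they should be accurate enough to give a general overview.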
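As a possible variation on the middleware above, the start time could be kept on the request itself instead of in a shared list, which avoids matching by URL when the same URL is requested more than once. This is only a sketch: the class name `MetaTimingMiddleware` and the meta keys `measure_start` and `measure_elapsed` are made up for illustration and are not part of Scrapy. It relies only on the request object having `url` and `meta` attributes, as Scrapy requests do.

```python
import logging
import time

logger = logging.getLogger(__name__)


class MetaTimingMiddleware:
    """Hypothetical variant: stores the start time in request.meta."""

    # hypothetical meta keys, chosen for this sketch only
    START_KEY = "measure_start"
    ELAPSED_KEY = "measure_elapsed"

    def process_request(self, request, spider):
        # monotonic clock: not affected by system clock adjustments
        request.meta[self.START_KEY] = time.monotonic()
        return None  # let Scrapy continue handling the request

    def process_response(self, request, response, spider):
        start = request.meta.get(self.START_KEY)
        if start is not None:
            elapsed = time.monotonic() - start
            # keep the result on the request so a spider callback can read it
            request.meta[self.ELAPSED_KEY] = elapsed
            logger.info("request %s took %.3f s", request.url, elapsed)
        # process_response must return a Response (or Request) object
        return response
```

It would be enabled through `DOWNLOADER_MIDDLEWARES` in the same way. The measured interval covers everything between this middleware's two hooks, so it is still an approximation. Scrapy also records a `download_latency` value in `request.meta` once a response has been downloaded, which may be enough if only the fetch time is needed.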