Python Async Crawlers (3): Coroutines (Part 2)
I.
First, set up a Flask server with three routes; each route sleeps for two seconds to simulate a slow response:
flask_server.py
```python
from flask import Flask
import time

app = Flask(__name__)

@app.route('/bobo')
def index_bobo():
    time.sleep(2)
    return 'Hello bobo'

@app.route('/jay')
def index_jay():
    time.sleep(2)
    return 'Hello jay'

@app.route('/tom')
def index_tom():
    time.sleep(2)
    return 'Hello tom'

if __name__ == "__main__":
    # threaded=True lets Flask handle the three requests in parallel,
    # so the server itself is not the bottleneck
    app.run(threaded=True)
```
Run it:
Next, write the multi-task coroutine script:
```python
import requests
import asyncio
import time

start = time.time()
urls = [
    'http://127.0.0.1:5000/bobo',
    'http://127.0.0.1:5000/jay',
    'http://127.0.0.1:5000/tom',
]

async def get_page(url):
    print('downloading', url)
    response = requests.get(url=url)
    print('finished', response.text)

tasks = []
for url in urls:
    c = get_page(url)
    task = asyncio.ensure_future(c)
    tasks.append(task)

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))

end = time.time()
print('total time:', end - start)
```
Output:
The total time is about the same as running the requests serially (roughly 6 seconds, 2 seconds per URL). Why?
Because requests.get() is synchronous: it blocks the whole event loop until the response arrives, so the three coroutines can never overlap. To send the requests concurrently, you must use an asynchronous network-request module.
aiohttp is exactly that: a module for making network requests asynchronously.
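The effect is easy to reproduce without any HTTP at all. Here is a minimal self-contained sketch (the names are mine, not from the original code): three coroutines that block with time.sleep() run one after another, while three that await asyncio.sleep() overlap:

```python
import asyncio
import time

async def blocking_job(i):
    # time.sleep() blocks the whole event loop: no other coroutine can run
    time.sleep(0.2)
    return i

async def async_job(i):
    # asyncio.sleep() suspends this coroutine and lets the others run
    await asyncio.sleep(0.2)
    return i

async def run_all(job):
    # schedule three jobs concurrently and collect their results
    return await asyncio.gather(*(job(i) for i in range(3)))

start = time.time()
asyncio.run(run_all(blocking_job))
blocking_elapsed = time.time() - start   # roughly 0.6 s: the sleeps serialize

start = time.time()
asyncio.run(run_all(async_job))
async_elapsed = time.time() - start      # roughly 0.2 s: the sleeps overlap

print(f"blocking: {blocking_elapsed:.2f}s, async: {async_elapsed:.2f}s")
```

This is exactly what happens with requests.get(): awaiting never gets a chance, because the blocking call never yields control back to the loop.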
```python
import asyncio
import time
import aiohttp

start = time.time()
urls = [
    'http://127.0.0.1:5000/bobo',
    'http://127.0.0.1:5000/jay',
    'http://127.0.0.1:5000/tom',
]

async def get_page(url):
    async with aiohttp.ClientSession() as session:
        # session.get()/post() accept headers, params/data,
        # and proxy='http://ip:port'
        async with session.get(url) as response:
            # text() returns the response body as a string
            # read() returns it as bytes
            # json() returns the parsed JSON object
            # note: these are coroutines, so they must be awaited
            page_text = await response.text()
            print(page_text)

tasks = []
for url in urls:
    c = get_page(url)
    task = asyncio.ensure_future(c)
    tasks.append(task)

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))

end = time.time()
print('total time:', end - start)
```
Output:
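One closing note: newer Python prefers asyncio.run() and asyncio.gather() over building the loop by hand with get_event_loop()/ensure_future(), and passing bare coroutines to asyncio.wait() has been deprecated since 3.8 (removed in 3.11). Here is a sketch of the same fan-out pattern in that style; the aiohttp request is replaced by an asyncio.sleep() stand-in so the snippet runs without the Flask server:

```python
import asyncio

async def get_page(url):
    # stand-in for the aiohttp request above:
    # pretend each download takes 0.1 s
    await asyncio.sleep(0.1)
    return f'downloaded {url}'

async def main(urls):
    # gather() runs all coroutines concurrently and returns
    # their results in the same order as the input
    return await asyncio.gather(*(get_page(u) for u in urls))

urls = [
    'http://127.0.0.1:5000/bobo',
    'http://127.0.0.1:5000/jay',
    'http://127.0.0.1:5000/tom',
]
results = asyncio.run(main(urls))
print(results)
```

Swapping the stand-in body back to the aiohttp ClientSession code from the previous listing gives a version of the crawler that works unchanged on current Python.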