
Python Asynchronous Crawlers (3): Coroutines (2)

1.

First, create a Flask test server with three routes:

flask_server.py

from flask import Flask
import time

app = Flask(__name__)

@app.route('/bobo')
def index_bobo():
    time.sleep(2)
    return 'Hello bobo'

@app.route('/jay')
def index_jay():
    time.sleep(2)
    return 'Hello jay'

@app.route('/tom')
def index_tom():
    time.sleep(2)
    return 'Hello tom'

if __name__ == "__main__":
    app.run(threaded=True)

Run it:


Then write the multi-task coroutine client:

import requests
import asyncio
import time


start = time.time()
urls = [
    'http://127.0.0.1:5000/bobo', 'http://127.0.0.1:5000/jay', 'http://127.0.0.1:5000/tom'
]

async def get_page(url):
    print('downloading', url)
    response = requests.get(url=url)  # blocking call
    print('done', response.text)

tasks = []

for url in urls:
    c = get_page(url)
    task = asyncio.ensure_future(c)
    tasks.append(task)

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))

end = time.time()
print('total time:', end - start)

Result:


The total time is roughly the same as running the requests serially. Why?

Because requests.get() is synchronous: each call blocks the event loop until it returns, so the coroutines never actually overlap. To send the requests concurrently, an asynchronous network request module is required.
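The blocking effect is easy to demonstrate without any server. In this sketch (the function names are illustrative, not from the original post), three coroutines that call the blocking time.sleep() run one after another, while three that await asyncio.sleep() overlap:

```python
import asyncio
import time

async def blocking_task(name):
    # time.sleep() blocks the whole event loop -- no other coroutine can run
    time.sleep(1)
    return name

async def non_blocking_task(name):
    # asyncio.sleep() yields control back to the event loop while waiting
    await asyncio.sleep(1)
    return name

async def run_all(task_func):
    start = time.time()
    await asyncio.gather(*(task_func(n) for n in ('bobo', 'jay', 'tom')))
    return time.time() - start

blocking_time = asyncio.run(run_all(blocking_task))        # roughly 3 s: serial
concurrent_time = asyncio.run(run_all(non_blocking_task))  # roughly 1 s: concurrent
print(f'blocking: {blocking_time:.1f}s, non-blocking: {concurrent_time:.1f}s')
```

requests.get() behaves like the time.sleep() case here: awaiting it never hands control back to the loop, so asyncio cannot interleave the downloads.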

aiohttp is such a module: an asynchronous HTTP client built on top of asyncio.

import asyncio
import time
import aiohttp

start = time.time()
urls = [
    'http://127.0.0.1:5000/bobo', 'http://127.0.0.1:5000/jay', 'http://127.0.0.1:5000/tom'
]

async def get_page(url):
    async with aiohttp.ClientSession() as session:
        # get()/post() accept headers, params/data, and proxy='http://ip:port'
        async with session.get(url) as response:
            # text() returns the response body as a string
            # read() returns the response body as bytes
            # json() returns the parsed JSON object
            # note: you must await before touching the response data
            page_text = await response.text()
            print(page_text)

tasks = []

for url in urls:
    c = get_page(url)
    task = asyncio.ensure_future(c)
    tasks.append(task)

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))

end = time.time()
print('total time:', end - start)
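As a side note, since Python 3.7 the get_event_loop()/run_until_complete() pattern used above is usually replaced by asyncio.run() plus asyncio.gather(). A minimal sketch of that shape follows; asyncio.sleep() stands in for the aiohttp request so it runs without the Flask server:

```python
import asyncio

# Stand-in for the aiohttp fetch: asyncio.sleep() simulates the network wait
async def get_page(url):
    await asyncio.sleep(0.1)
    return f'fetched {url}'

async def main():
    urls = ['http://127.0.0.1:5000/bobo',
            'http://127.0.0.1:5000/jay',
            'http://127.0.0.1:5000/tom']
    # gather() runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*(get_page(u) for u in urls))

results = asyncio.run(main())
print(results)
```

asyncio.run() creates and closes the event loop for you, which avoids the DeprecationWarning that get_event_loop() raises outside a running loop on newer Python versions.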

Result:


Posted on 2021-02-20 14:23 by 努力爬行的小虫子