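【Posted】: 2020-02-02 20:20:27
【Question】:
I need to parse all the links from several pages. I wrote a simple script that uses asynchronous methods.
At the moment it returns an empty links list, but I expect every link from the pages to be collected in links and printed to the console.
The script does not produce any error messages.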
import asyncio
import aiohttp
from bs4 import BeautifulSoup

links = []
host = 'https://avito.ru/saransk'
search_words = [
    'asus',
    'lenovo',
    'xiaomi',
    'apple',
    'ipad',
]


def get_data(html_text):
    paths = []
    soup = BeautifulSoup(html_text, 'lxml')
    link_obj = soup.find_all('a')
    for path in link_obj:
        paths.append(path['href'])
    links.extend(paths)
    return links


async def get_html(search_word):
    async with aiohttp.ClientSession() as session:
        resp = await session.get(host + '?q=' + search_word)
        assert resp.status == 200
        # print(await resp.text())
        resp2 = await get_data(resp.text())
        print('----------', resp2)


def main():
    ioloop = asyncio.get_event_loop()
    tasks = [ioloop.create_task(get_html(word)) for word in search_words]
    ioloop.run_until_complete(asyncio.wait(tasks))
    ioloop.close()
    print(links)


main()
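
A note on why a script like this can stay silent: `asyncio.wait` does not re-raise exceptions from the tasks it waits on. They are stored on the task objects instead. In the script above, `get_data` is a plain function, so awaiting its return value (a list) raises `TypeError`, and `resp.text()` is a coroutine that is passed along without being awaited. The sketch below uses only the standard library to show the difference between `asyncio.wait` and `asyncio.gather`. The names `broken_worker`, `fixed_worker`, `run_with_wait` and `run_with_gather` are hypothetical stand-ins, not part of the original script, and no real HTTP request is made.

```python
import asyncio


async def broken_worker(word):
    # Stand-in for get_html: awaits the result of a plain function.
    result = [word]       # a plain list, like the return value of get_data
    return await result   # raises TypeError: a list is not awaitable


async def fixed_worker(word):
    # Only the real awaitable is awaited; the plain value is used directly.
    await asyncio.sleep(0)  # stand-in for `await resp.text()`
    return [word]


async def run_with_wait(worker, words):
    tasks = [asyncio.ensure_future(worker(w)) for w in words]
    done, _pending = await asyncio.wait(tasks)
    # asyncio.wait returns normally; each failure is stored on its task.
    return [task.exception() for task in done]


async def run_with_gather(worker, words):
    # gather propagates the first exception to the caller,
    # and otherwise returns results in input order.
    return await asyncio.gather(*(worker(w) for w in words))
```

Applied to the script, the analogous changes would probably be to write `html = await resp.text()` followed by `resp2 = get_data(html)` (no `await` on `get_data`), and either to replace `asyncio.wait(tasks)` with `asyncio.gather(*tasks)` or to inspect `task.exception()` on each finished task. Separately, `path['href']` raises `KeyError` for anchors that have no `href` attribute; `soup.find_all('a', href=True)` avoids that.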
I am using Python 3.8 with the following requirements:
aiohttp==3.6.2
  - async-timeout [required: >=3.0,<4.0, installed: 3.0.1]
  - attrs [required: >=17.3.0, installed: 19.3.0]
  - chardet [required: >=2.0,<4.0, installed: 3.0.4]
  - multidict [required: >=4.5,<5.0, installed: 4.7.4]
  - yarl [required: >=1.0,<2.0, installed: 1.4.2]
    - idna [required: >=2.0, installed: 2.8]
    - multidict [required: >=4.0, installed: 4.7.4]
bs4==0.0.1
  - beautifulsoup4 [required: Any, installed: 4.8.2]
    - soupsieve [required: >=1.2, installed: 1.9.5]
fake-useragent==0.1.11
lxml==4.5.0
requests==2.22.0
  - certifi [required: >=2017.4.17, installed: 2019.11.28]
  - chardet [required: >=3.0.2,<3.1.0, installed: 3.0.4]
  - idna [required: >=2.5,<2.9, installed: 2.8]
  - urllib3 [required: >=1.21.1,<1.26,!=1.25.1,!=1.25.0, installed: 1.25.8]
【Discussion】:
Tags: python python-3.x beautifulsoup async-await