【Posted】: 2020-08-31 07:03:50
【Question】:
import asyncio
import aiohttp
import lxml
from bs4 import BeautifulSoup


async def get_content(session, url):
    # Download one page and parse it with the "lxml-xml" (XML) parser
    async with session.get(url) as response:
        data = await response.read()
        return BeautifulSoup(data.decode('utf-8'), 'lxml-xml')


async def parse(urls):
    # Fetch all pages concurrently over a single shared session
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.create_task(get_content(session, i)) for i in urls]
        soups = await asyncio.gather(*tasks, return_exceptions=True)
        return soups


url = "https://kolesa.kz/cars/almaty/?page={}"
urls = [url.format(i) for i in range(2, 201)]
loop = asyncio.get_event_loop()
soups = loop.run_until_complete(parse(urls))
loop.close()
print(soups[0])
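Because parse() passes return_exceptions=True to asyncio.gather, a request that fails does not raise; its exception object is placed in the result list at the same position as its task, so an entry in soups may be an exception rather than a BeautifulSoup object. A minimal stdlib-only sketch of separating the two, where fake_fetch is a hypothetical stand-in for get_content that fails on every third page:

```python
import asyncio


async def fake_fetch(page: int) -> str:
    # Hypothetical stand-in for get_content(): fails for every third page
    await asyncio.sleep(0)
    if page % 3 == 0:
        raise ValueError(f"page {page} failed")
    return f"content of page {page}"


async def fetch_all(pages):
    tasks = [asyncio.create_task(fake_fetch(p)) for p in pages]
    # With return_exceptions=True, exceptions are returned in place of
    # results (in task order) instead of being raised
    return await asyncio.gather(*tasks, return_exceptions=True)


def split_results(results):
    # Separate successful results from returned exception objects
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed


results = asyncio.run(fetch_all(range(1, 7)))
ok, failed = split_results(results)
print(len(ok), len(failed))  # 4 2
```

Checking isinstance(r, BaseException) before using a result avoids treating a failed request as an empty or broken page.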
Using BeautifulSoup with the lxml-xml parser, I can't parse the contents of the site's pages (pages 2 to 200).
soups[0] shows only this: <?xml version="1.0" encoding="utf-8"?>
Can I parse HTML pages with lxml-xml?
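In BeautifulSoup, "lxml-xml" selects lxml's XML parser, which expects well-formed XML, while ordinary web pages are HTML. A minimal sketch, assuming the pages are regular HTML: parse the raw bytes with an HTML parser ("html.parser" ships with Python; "lxml" also works if lxml is installed). The helper parse_page and the sample markup are hypothetical, used only for illustration.

```python
from bs4 import BeautifulSoup


def parse_page(data: bytes) -> BeautifulSoup:
    # Hypothetical helper: use an HTML parser instead of "lxml-xml".
    # Passing raw bytes lets BeautifulSoup detect the encoding itself
    # (here from the <meta charset> declaration), so no manual decode is needed.
    return BeautifulSoup(data, "html.parser")


# Hypothetical sample page with UTF-8 Cyrillic text
sample = (
    '<!DOCTYPE html><html><head><meta charset="utf-8"><title>Авто</title></head>'
    '<body><a href="/a/1">Первое</a><a href="/a/2">Второе</a><br></body></html>'
).encode("utf-8")

soup = parse_page(sample)
print(soup.title.string)
print([a["href"] for a in soup.find_all("a")])
```

In get_content, the equivalent change would be returning BeautifulSoup(data, "lxml") (or "html.parser") instead of using "lxml-xml".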
【Discussion】:
Tags: python parsing beautifulsoup utf-8 lxml