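【Posted】: 2018-10-28 07:41:47
【Question】:
I need to get a list of links from a sitemap.
I am using the code below, but I get nothing back, and no error either. Ultimately, I would like an Excel sheet containing the list.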
import bs4
from lxml import etree #added as suggested
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.example.com/sitemap.xml'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(site, "lxml.xml") #added as suggested
evensite = page_soup.findAll("table", {"class":"td"})
print(evensite)
After making the suggested changes, this is the error I get:
Traceback (most recent call last):
  File "/Users/user/Downloads/lxml.py", line 14, in <module>
    page_soup = soup(site, "lxml.xml")
  File "/anaconda3/lib/python3.6/site-packages/bs4/__init__.py", line 165, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml.xml. Do you need to install a parser library?
[Finished in 1.3s]
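A few details in the script and traceback above are consistent with this error. Beautiful Soup's documented feature names for lxml's XML parser are "lxml-xml" and "xml", not "lxml.xml", and lxml has to be installed for either to work. The call also passes `site`, which is never defined; the downloaded bytes are in `page_html`. The traceback shows the script saved as lxml.py, and a file with that name can shadow the installed lxml package on import, so renaming it may be necessary. Finally, a standard sitemap stores its URLs in `<loc>` elements, so searching for "table" elements would return an empty list. A minimal sketch under those assumptions follows; `fetch_bytes`, `sitemap_locs`, and `SAMPLE_SITEMAP` are hypothetical names used only for illustration.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_bytes(url):
    """Download a URL and return the raw response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def sitemap_locs(xml_bytes):
    """Return the text of every <loc> element in a sitemap document."""
    # "xml" selects lxml's XML parser; it requires the lxml package.
    soup = BeautifulSoup(xml_bytes, "xml")
    return [loc.get_text(strip=True) for loc in soup.find_all("loc")]


# Hypothetical sample data standing in for a real sitemap.xml response.
SAMPLE_SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    # For a live site, replace SAMPLE_SITEMAP with fetch_bytes(my_url).
    print(sitemap_locs(SAMPLE_SITEMAP))
```

Keeping the download separate from the parsing makes it possible to check the parsing step against a small sample before pointing the script at a real site.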
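【Comments】:
-
Explain the difference between the expected result and the actual result.
-
Is this the actual URL you are using?
-
@KarlRichter Currently I get nothing when I run the script, not even an error. When I call it, I get the HTML code, but I cannot find anything in that code. I want to get back all the hyperlinks on the website.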
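The stated goal (all the links from the sitemap, ending up in a spreadsheet) could be sketched with the standard library alone. This swaps in `xml.etree.ElementTree` in place of BeautifulSoup and writes a CSV file, which Excel can open, instead of a native workbook. It assumes the sitemap uses the standard sitemaps.org namespace; `locs_from_sitemap`, `write_links_csv`, and `SAMPLE_XML` are hypothetical names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (an assumption about the target site).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def locs_from_sitemap(xml_bytes):
    """Return the URL in every namespaced <loc> element of a sitemap."""
    root = ET.fromstring(xml_bytes)
    return [
        element.text.strip()
        for element in root.iterfind(".//sm:loc", SITEMAP_NS)
        if element.text
    ]


def write_links_csv(links, fileobj):
    """Write one URL per row, under a header row, to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(["url"])
    for link in links:
        writer.writerow([link])


# Hypothetical sample data standing in for the downloaded sitemap bytes.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    buffer = io.StringIO()
    write_links_csv(locs_from_sitemap(SAMPLE_XML), buffer)
    print(buffer.getvalue())
```

To write a real file, `open("links.csv", "w", newline="")` can be passed in place of the in-memory buffer; the file name here is only an example.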
Tags: python xml beautifulsoup