Obtaining an XML site using BeautifulSoup: answers

【Question title】: Obtaining an XML site using BeautifulSoup
【Posted】: 2018-10-28 07:41:47
【Question】:

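I need to get a list of links from a sitemap.
I am using the code below, but nothing is returned and no error is raised. Ultimately, I would like an Excel sheet containing the list.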

import bs4
from lxml import etree #added as suggested
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.example.com/sitemap.xml'

uClient = uReq(my_url)

page_html = uClient.read()
uClient.close()

page_soup = soup(site, "lxml.xml") #added as suggested

evensite = page_soup.findAll("table", {"class":"td"})

print(evensite)

After making the suggested changes, this is the error I get:

Traceback (most recent call last):
File "/Users/user/Downloads/lxml.py", line 14, in <module>
page_soup = soup(site, "lxml.xml")
File "/anaconda3/lib/python3.6/site-packages/bs4/__init__.py", line 165, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml.xml. Do you need to install a parser library?
[Finished in 1.3s]

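【Comments on the question】:

Please explain the difference between the expected result and the actual result.
Is this the actual URL you are using?
@KarlRichter Currently I get nothing back when I run the script, not even an error. When I call it, I do get the HTML code, but I can't find anything in that code. I want to retrieve all the hyperlinks on the site.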

Tags: python xml beautifulsoup

【Solution 1】:

I haven't tried this, but I don't think you can parse an .xml file with html.parser. Have you tried using the following?

page_soup = soup(page_html, "lxml-xml")
evensite = page_soup.findAll("link")
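For a sitemap that follows the standard sitemaps.org layout, the page URLs sit in `<loc>` elements rather than `<link>`, so searching for `link` may well come back empty. Below is a minimal sketch under that assumption. The inlined `SAMPLE_SITEMAP` and the helper names `extract_sitemap_links` and `write_links_csv` are hypothetical and not part of the original post, and the `lxml` package is assumed to be installed.

```python
import csv

from bs4 import BeautifulSoup

# Hypothetical sample sitemap, inlined as bytes (like the result of
# uClient.read()) so the sketch runs without network access.
SAMPLE_SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>
"""


def extract_sitemap_links(xml_data):
    """Return the text of every <loc> element in the sitemap."""
    # "lxml-xml" selects lxml's XML parser; it requires the lxml package.
    page_soup = BeautifulSoup(xml_data, "lxml-xml")
    return [loc.get_text(strip=True) for loc in page_soup.find_all("loc")]


def write_links_csv(links, path):
    """Write one URL per row to a CSV file, which Excel can open."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["url"])
        writer.writerows([link] for link in links)


links = extract_sitemap_links(SAMPLE_SITEMAP)
print(links)
```

With a live site, the bytes returned by `uClient.read()` in the question could be passed to `extract_sitemap_links` in place of the sample, and `write_links_csv(links, "links.csv")` would give a file that opens in Excel.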

【Comments】:

Strangely, I am currently using Anaconda and installed lxml through Anaconda, yet I still get this error: bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml.xml. Do you need to install a parser library?
It is lxml-xml, not lxml.xml. I mistyped it at first.
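The difference between the two spellings can be checked directly. The sketch below is only an illustration, and the helper name `can_build` is hypothetical. It asks BeautifulSoup to build a tree with each feature string and reports whether a tree builder was found. `lxml.xml` is not a registered feature name, so it fails even when lxml is installed, whereas `lxml-xml` and its alias `xml` succeed only if lxml is available.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def can_build(features):
    """Return True if BeautifulSoup finds a tree builder for `features`."""
    try:
        BeautifulSoup("<urlset/>", features)
        return True
    except FeatureNotFound:
        return False


for name in ("lxml.xml", "lxml-xml", "xml"):
    print(name, can_build(name))
```

If `lxml-xml` also reports `False`, the likely cause is that lxml is not installed in the Python environment that actually runs the script.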