lxml: grab only the prefixes, then return the element names (answers)

【Question title】: lxml to grab only prefix and then return the element names
【Posted】: 2017-03-20 23:27:23
【Question】:

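I need lxml to do two things: 1) list all the different prefixes used in an XML file; 2) once I specify a prefix, return all the element names in that prefix along with their attributes.

For this XML: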

<pref:MiscDetails contentRef='01-01_2016' misc='wha'>1000</pref:MiscDetails>
<pref:TestingThis contentRef='03-02_2017' misc='t' qual='5'>50</pref:TestingThis>
<pref:AnotherExample contentRef='01-01_2015' misc='x'>100000</pref:AnotherExample>
<test:AFinalExample contentRef='' te='t'>test</test:AFinalExample>

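The code should first tell me that the prefixes in this file are "pref" and "test". Then I want it to list the element names associated with "pref" and their attributes, followed by the same for "test".

Output 1: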

"Listing prefixes:"
"pref"
"test"

Output 2:

"Listing the prefix 'pref' element names and their attributes:"
"Element MiscDetails with attributes contentRef='01-01_2016' misc='wha'"
"Element TestingThis with attributes contentRef='03-02_2017' misc='t' qual='5'"
"Element AnotherExample with attributes contentRef='01-01_2015' misc='x'"

"Listing the prefix 'test' element names and their attributes:"
"Element AFinalExample with attributes contentRef='' te='t'"

Thanks!

【Comments】:

What have you tried so far?

Tags: python xml web-scraping lxml

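【Solution 1】:

The nsmap attribute on an element maps each namespace prefix in scope at that element to its URI. If you parsed a file with etree.parse(), call getroot() on the result to get an element first: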

>>> from lxml import etree
>>> doc = etree.fromstring("""<doc xmlns:pref='http://example.com'>
    <pref:MiscDetails>...</pref:MiscDetails></doc>""")
>>> doc.nsmap
{'pref': 'http://example.com'}
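
Note that nsmap only covers declarations on the element itself and its ancestors, so a prefix declared deeper in the tree does not show up on the root. A minimal sketch of collecting every prefix in a document follows; the sample XML, the namespace URIs and the helper name collect_prefixes are assumptions made up for illustration, not part of the original answer.

```python
import io

from lxml import etree

# Hypothetical sample: the "test" prefix is declared on a nested element only.
XML = b"""<root xmlns:pref="http://example.com/pref">
  <pref:MiscDetails contentRef="01-01_2016" misc="wha">1000</pref:MiscDetails>
  <inner xmlns:test="http://example.com/test">
    <test:AFinalExample contentRef="" te="t">test</test:AFinalExample>
  </inner>
</root>"""


def collect_prefixes(root):
    """Return {prefix: uri} for every prefix in scope anywhere under root."""
    prefixes = {}
    # Passing etree.Element restricts iteration to real elements
    # (comments and processing instructions are skipped).
    for el in root.iter(etree.Element):
        for prefix, uri in el.nsmap.items():
            if prefix is not None:  # None is the default (unprefixed) namespace
                prefixes.setdefault(prefix, uri)
    return prefixes


# etree.parse() returns an ElementTree, so getroot() is needed to get an element.
root = etree.parse(io.BytesIO(XML)).getroot()
all_prefixes = collect_prefixes(root)

print("Listing prefixes:")
for prefix in sorted(all_prefixes):
    print(prefix)
```

If the same prefix is bound to different URIs in different parts of the document, this sketch keeps only the first binding it encounters.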

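Use iter() with {namespace-uri}* to return all elements in that namespace. You have to use the URI here: it is the meaningful part of a namespace, whereas the prefix is only a convenience for human readers: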

>>> doc = etree.fromstring("<doc xmlns:pref='http://example.com'>"
...                        "<pref:foo/><pref:bar/></doc>")
>>> [ el.tag for el in doc.iter('{http://example.com}*') ]
['{http://example.com}foo', '{http://example.com}bar']
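
Putting the two pieces together, the output asked for in the question could be produced roughly as sketched below. This assumes all prefixes are declared on the root element; the wrapping root element, the namespace URIs and the helper name describe_namespace are invented for the example, since the XML in the question does not show its declarations.

```python
from lxml import etree

# Hypothetical document: the question's elements wrapped in a root element
# that declares both prefixes with made-up URIs.
XML = b"""<root xmlns:pref="http://example.com/pref" xmlns:test="http://example.com/test">
  <pref:MiscDetails contentRef="01-01_2016" misc="wha">1000</pref:MiscDetails>
  <pref:TestingThis contentRef="03-02_2017" misc="t" qual="5">50</pref:TestingThis>
  <test:AFinalExample contentRef="" te="t">test</test:AFinalExample>
</root>"""


def describe_namespace(root, uri):
    """Return (local_name, attributes) pairs for each element in the namespace."""
    return [
        # QName splits '{uri}name' so we can report the bare element name.
        (etree.QName(el).localname, dict(el.attrib))
        for el in root.iter("{%s}*" % uri)
    ]


root = etree.fromstring(XML)
for prefix, uri in root.nsmap.items():
    if prefix is None:
        continue  # skip a default namespace, which has no prefix
    print(f"Listing the prefix '{prefix}' element names and their attributes:")
    for name, attrs in describe_namespace(root, uri):
        formatted = " ".join(f"{key}='{value}'" for key, value in attrs.items())
        print(f"Element {name} with attributes {formatted}")
```

The lookup goes from prefix to URI through nsmap and then filters by URI, because the URI, not the prefix, is what identifies the namespace.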

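More information in the lxml documentation: http://lxml.de/tutorial.html#namespaces

【Comments】:

Thanks, I had to use getroot() as well, but it works now!
Follow-up question: how do I find all the elements associated with each prefix?
Added an extra example.
You're a rock star! Thank you!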