Why doesn't .findall read my XML file correctly? Answers

【Question title】: Why .findall in xml file doesn't read correctly?
【Posted】: 2021-11-02 10:26:46
【Question description】:

Consider the following XML file:

<?xml version="1.0" encoding="utf-8"?>
<root
    xmlns="urn:schemas-upnp-org:device-1-0">
    <specVersion>
        <major>1</major>
        <minor>0</minor>
    </specVersion>
    <URLBase>http://192.168.1.1:80</URLBase>
    <device>
        <serviceList>
            <service>
                <serviceType>1</serviceType>
            </service>
        </serviceList>
        <deviceList>
            <device>
                <serviceList>
                    <service>
                        <serviceType>2</serviceType>
                    </service>
                </serviceList>
                <deviceList>
                    <device>
                        <serviceList>
                            <service>
                                <serviceType>3</serviceType>
                            </service>
                        </serviceList>
                    </device>
                </deviceList>
            </device>
        </deviceList>
        <presentationURL>/</presentationURL>
    </device>
</root>

I want to extract all the services directly under device, so in this example the result should be just 1.

So I wrote:

import os
import sys
import xml.etree.ElementTree as ET

root = ET.fromstring(inner_xml)  # inner_xml holds the XML document shown above
device = root.find('{urn:schemas-upnp-org:device-1-0}device')
for serviceType in device.findall(
        './/{urn:schemas-upnp-org:device-1-0}serviceList//{urn:schemas-upnp-org:device-1-0}serviceType'):
    print(serviceType.text)

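But why do I also get 2 and 3? They are not in the serviceList directly under device.

【Question discussion】:

Welcome to Stack Overflow. When you have a problem, always post complete, copy-pasteable code (see minimal reproducible example), including the import statements. Don't post only code snippets.
@mzjn Done, please take a look.
Expressions such as .//x select all x descendants of the context node. In your case, the context node is the device element.

Tags: python python-3.x xml python-3.6 elementtree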

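【Solution 1】:

By starting the path with .//, your code "asks" for a recursive search, so serviceList elements at any depth below device are matched: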

for serviceType in device.findall(
        './/{urn:schemas-upnp-org:device-1-0}serviceList//{urn:schemas-upnp-org:device-1-0}serviceType'):

You need to use

for serviceType in device.findall(
        '{urn:schemas-upnp-org:device-1-0}serviceList//{urn:schemas-upnp-org:device-1-0}serviceType'):

Working code below:

import xml.etree.ElementTree as ET


xml = '''<?xml version="1.0" encoding="utf-8"?>
<root
    xmlns="urn:schemas-upnp-org:device-1-0">
    <specVersion>
        <major>1</major>
        <minor>0</minor>
    </specVersion>
    <URLBase>http://192.168.1.1:80</URLBase>
    <device>
        <serviceList>
            <service>
                <serviceType>1</serviceType>
            </service>
        </serviceList>
        <deviceList>
            <device>
                <serviceList>
                    <service>
                        <serviceType>2</serviceType>
                    </service>
                </serviceList>
                <deviceList>
                    <device>
                        <serviceList>
                            <service>
                                <serviceType>3</serviceType>
                            </service>
                        </serviceList>
                    </device>
                </deviceList>
            </device>
        </deviceList>
        <presentationURL>/</presentationURL>
    </device>
</root>'''

root = ET.fromstring(xml)
device = root.find('{urn:schemas-upnp-org:device-1-0}device')
for serviceType in device.findall(
        '{urn:schemas-upnp-org:device-1-0}serviceList//{urn:schemas-upnp-org:device-1-0}serviceType'):
    print(serviceType.text)
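
To see the effect of the leading .// side by side, here is a minimal sketch. It assumes a trimmed-down stand-in for the document above with only one nested device level; the names NS, xml_text, recursive and direct are illustrative and not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Namespace in ElementTree's "{uri}" form, so it can be prefixed to tag names.
NS = "{urn:schemas-upnp-org:device-1-0}"

# Assumption: shortened sample with a single nested <device>.
xml_text = """<root xmlns="urn:schemas-upnp-org:device-1-0">
  <device>
    <serviceList><service><serviceType>1</serviceType></service></serviceList>
    <deviceList>
      <device>
        <serviceList><service><serviceType>2</serviceType></service></serviceList>
      </device>
    </deviceList>
  </device>
</root>"""

device = ET.fromstring(xml_text).find(NS + "device")

# Leading './/': serviceList is matched at any depth below the top-level device.
recursive = [e.text for e in device.findall(f".//{NS}serviceList//{NS}serviceType")]

# No leading './/': only a serviceList that is a direct child of device matches.
direct = [e.text for e in device.findall(f"{NS}serviceList//{NS}serviceType")]

print(recursive)  # ['1', '2']
print(direct)     # ['1']
```

The second // is harmless here because a serviceList only contains its own services, not further nested devices.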

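【Discussion】:

Just declare namespaces = {"": "urn:schemas-upnp-org:device-1-0"} and pass it as the second argument to all the find functions; it will shorten your XPath.
What does recursive mean? What is inside it?
See docs.python.org/3.8/library/…

【Solution 2】: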

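You can use the simple XPath ./device/serviceList/service/serviceType to find the nodes. You can also pass the namespaces as the second argument to any of the find functions, so you don't have to spell them out for every node in the XPath expression. You can read more about it here: Parsing XML with Namespaces.

Code: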

import xml.etree.ElementTree as ET

source = ...
root = ET.fromstring(source)

namespaces = {"": "urn:schemas-upnp-org:device-1-0"}
for node in root.iterfind("./device/serviceList/service/serviceType", namespaces):
    print(node.text)
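
The question is tagged python-3.6, and as far as I know the empty-string prefix in the namespaces mapping is only honored from Python 3.8 onward. A hedged variant that should also work on older versions is to use a named prefix instead. In the sketch below the prefix upnp and the shortened xml_text are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumption: shortened sample with a single nested <device>.
xml_text = """<root xmlns="urn:schemas-upnp-org:device-1-0">
  <device>
    <serviceList><service><serviceType>1</serviceType></service></serviceList>
    <deviceList>
      <device>
        <serviceList><service><serviceType>2</serviceType></service></serviceList>
      </device>
    </deviceList>
  </device>
</root>"""

root = ET.fromstring(xml_text)

# "upnp" is an arbitrary prefix; it only has to match the one used in the path.
namespaces = {"upnp": "urn:schemas-upnp-org:device-1-0"}
path = "./upnp:device/upnp:serviceList/upnp:service/upnp:serviceType"

service_types = [node.text for node in root.iterfind(path, namespaces)]
print(service_types)  # ['1']
```

The prefix is local to the query: it does not need to appear in the XML itself, which declares the same URI as its default namespace.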

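【Discussion】:

What is the difference between .// and ./?
@Algo, it is described in the Abbreviated Syntax section of the XPath standard. Basically, ./node searches for <node> elements that are direct children of the context node (here, the root), while .//node searches for <node> elements at any depth below it (recursively). For example, the XPath expression in my answer can be shortened to ./device/serviceList//serviceType, but not to ./device//serviceType, because the latter would find every <serviceType> node contained in <device>. Is it clearer now, or do you need a more extended explanation?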
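
The two shortened expressions mentioned in that last comment can be compared directly. This is a small sketch under the same assumptions as before: a trimmed sample document, a hypothetical u prefix, and a hypothetical helper texts() that exists only for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: shortened sample with a single nested <device>.
xml_text = """<root xmlns="urn:schemas-upnp-org:device-1-0">
  <device>
    <serviceList><service><serviceType>1</serviceType></service></serviceList>
    <deviceList>
      <device>
        <serviceList><service><serviceType>2</serviceType></service></serviceList>
      </device>
    </deviceList>
  </device>
</root>"""

root = ET.fromstring(xml_text)
namespaces = {"u": "urn:schemas-upnp-org:device-1-0"}


def texts(path):
    """Return the text of every element matched by path, relative to root."""
    return [e.text for e in root.findall(path, namespaces)]


# '/' steps to direct children only; '//' descends to any depth.
narrow = texts("./u:device/u:serviceList//u:serviceType")
broad = texts("./u:device//u:serviceType")

print(narrow)  # ['1']
print(broad)   # ['1', '2']
```

The narrow form pins serviceList to a direct child of the top-level device before descending, which is why the nested device's service is excluded.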