Finding the values of multiple tags with lxml

【Question title】: Find for multiple tags' values with lxml
【Posted】: 2020-11-24 11:41:41
【Question】:

I am using lxml to parse XML like this sample:

<compounddef xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="d2/db7/class_foo" kind="class">
    <compoundname>FooClass</compoundname>
    <sectiondef kind="public-type">
        <memberdef kind="typedef" id="d2/db7/class_bar">
            <type><ref refid="d3/d73/struct_foo" kindref="compound">StructFoo</ref></type>
            <definition>StructFooDefinition</definition>
        </memberdef>
    </sectiondef>
</compounddef>

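I am trying to get the element that has a <ref> with refid "d3/d73/struct_foo" and a <definition> containing the text "Foo".

There may be many elements with that refid and many definitions containing Foo, but only one element has this combination.

I can first find all elements with that refid and then filter the list by checking which of them contain "Foo", but because I am working with a very large XML file (~1 GB) and the application is time-sensitive, I would like to avoid that.

I tried combining various etree paths using the keyword 'and' or '//precede:...', but without success. My last attempt was: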

self.dox_tree_root_.xpath(".//compounddef[@kind = 'class']//memberdef[@kind='typedef'][/type/ref[@refid='%s'] and contains(definition, 'name')]" % (independent_type_refid, name)))

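But it gives me an error.

Is there a way to combine the two filters in a single command?

【Comments】:

Please show us your code. If you can use lxml instead of the built-in ElementTree module, you can use more powerful XPath expressions.
@mzjn I updated the example and corrected the question to say that I am using lxml. Thanks.
You have not shown us any runnable Python code (see minimal reproducible example).
The unbalanced parentheses give SyntaxError: unmatched ')'. Fixing that gives TypeError: not all arguments converted during string formatting. As @mzjn said, provide an MRE.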

Tags: python lxml elementtree

【Answer 1】:

You can use XPath:

//a[.//ref[@refid="12345"] and contains(c, "Good")]
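As a rough sketch of how this pattern maps onto the sample XML from the question, the placeholders `a`, `c`, "12345" and "Good" can be swapped for `memberdef`, `definition`, the refid and "Foo". The variable names `SAMPLE`, `matches` and `ids` are illustrative only, and the expression is deliberately shorter than the one the asker needs in the real file.

```python
from lxml import etree

# Sample document copied from the question.
SAMPLE = b"""<compounddef xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="d2/db7/class_foo" kind="class">
    <compoundname>FooClass</compoundname>
    <sectiondef kind="public-type">
        <memberdef kind="typedef" id="d2/db7/class_bar">
            <type><ref refid="d3/d73/struct_foo" kindref="compound">StructFoo</ref></type>
            <definition>StructFooDefinition</definition>
        </memberdef>
    </sectiondef>
</compounddef>"""

root = etree.fromstring(SAMPLE)

# Both conditions sit inside one predicate and are joined with "and",
# so only memberdef elements satisfying both are returned.
matches = root.xpath(
    "//memberdef[.//ref[@refid='d3/d73/struct_foo'] and contains(definition, 'Foo')]"
)
ids = [m.get("id") for m in matches]
print(ids)  # ['d2/db7/class_bar']
```

Because the two tests are evaluated in a single XPath call, no intermediate list of refid matches has to be built and filtered in Python.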

【Comments】:

Thanks, but unfortunately I cannot get it to work in my real case. This is the actual line I am using: self.dox_tree_root_.xpath(".//compounddef[@kind = 'class']//memberdef[@kind='typedef'][/type/ref[@refid= '%s'] and contains(definition, 'name')]" % (independent_type_refid, name))) but I get an error in the path expression.
@Mdp11, update your source example with something that looks like real HTML/XML.
@Mdp11 Try this: ".//compounddef[@kind = 'class']//memberdef[@kind='typedef'][type/ref[@refid='%s'] and contains(definition, '%s')]" % (independent_type_refid, name)
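The suggested expression differs from the failing attempt in two ways: `type/ref` no longer starts with `/` (a leading slash makes the path absolute from the document root rather than relative to the `memberdef`), and the literal 'name' is replaced by a second placeholder so that both arguments are consumed. A possible variation, sketched below, passes the values as XPath variables, which lxml supports, instead of `%` formatting. The `<doxygen>` wrapper, the extra `memberdef` entries and the names `doc`, `find_typedef` and `found_ids` are assumptions made up for the illustration.

```python
from lxml import etree

# Hypothetical document: one matching memberdef and two near misses.
doc = etree.fromstring(b"""<doxygen>
  <compounddef kind="class">
    <sectiondef kind="public-type">
      <memberdef kind="typedef" id="match">
        <type><ref refid="d3/d73/struct_foo">StructFoo</ref></type>
        <definition>StructFooDefinition</definition>
      </memberdef>
      <memberdef kind="typedef" id="wrong_definition">
        <type><ref refid="d3/d73/struct_foo">StructFoo</ref></type>
        <definition>StructBarDefinition</definition>
      </memberdef>
      <memberdef kind="typedef" id="wrong_refid">
        <type><ref refid="d9/d99/other">Other</ref></type>
        <definition>OtherFooDefinition</definition>
      </memberdef>
    </sectiondef>
  </compounddef>
</doxygen>""")

# Compile once; $refid and $name are XPath variables bound at call time,
# so no string formatting or manual quoting is needed.
find_typedef = etree.XPath(
    ".//compounddef[@kind='class']//memberdef[@kind='typedef']"
    "[type/ref[@refid=$refid] and contains(definition, $name)]"
)

found = find_typedef(doc, refid="d3/d73/struct_foo", name="Foo")
found_ids = [m.get("id") for m in found]
print(found_ids)  # ['match']
```

Compiling the expression with `etree.XPath` may also help when the same query is run many times with different values, since the path is parsed only once.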

【Answer 2】:

If I understand you correctly, this should get you close enough:

.//compounddef[@kind = 'class']//memberdef[@kind='typedef'][./type/ref[@refid='d3/d73/struct_foo']][contains(.//definition, 'Foo')]//definition

Output:

StructFooDefinition
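A minimal sketch of running this expression with lxml follows. It assumes the sample from the question sits inside a parent element, here a hypothetical `<doxygen>` root, because `.//compounddef` searches the descendants of the context node and does not match the context node itself. The names `xml`, `definitions` and `texts` are illustrative.

```python
from lxml import etree

# Sample from the question, wrapped in an assumed <doxygen> root element.
xml = b"""<doxygen>
<compounddef xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="d2/db7/class_foo" kind="class">
    <compoundname>FooClass</compoundname>
    <sectiondef kind="public-type">
        <memberdef kind="typedef" id="d2/db7/class_bar">
            <type><ref refid="d3/d73/struct_foo" kindref="compound">StructFoo</ref></type>
            <definition>StructFooDefinition</definition>
        </memberdef>
    </sectiondef>
</compounddef>
</doxygen>"""

root = etree.fromstring(xml)

# Two chained predicates: a memberdef must pass the first to be tested
# against the second, which has the same effect as joining them with "and".
definitions = root.xpath(
    ".//compounddef[@kind = 'class']//memberdef[@kind='typedef']"
    "[./type/ref[@refid='d3/d73/struct_foo']][contains(.//definition, 'Foo')]//definition"
)

# xpath() returns the <definition> elements; .text gives their content.
texts = [d.text for d in definitions]
print(texts)  # ['StructFooDefinition']
```

Dropping the trailing `//definition` would return the matching `memberdef` element itself, which is what the question asks for.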

【Comments】: