Search all occurrences of a string only when the nodes are children of a specific node containing an attribute

【Question title】: Search all occurrences of a string only when the nodes are children of a specific node containing an attribute
【Posted】: 2019-03-20 17:26:15
【Question】:

Consider this example:

<foo attr1="dummy">
   <bar1>
     some text #{abc} some text
   </bar1>
   <bar2>
     <bar2bar2>
        some text #{def} some text
     </bar2bar2>
   </bar2>
</foo>

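I need an XPath 1.0 query (regular expressions are not supported) that finds all occurrences of #{*} in nodes that are (direct or indirect) children of a foo node having the attribute attr1. In other words, the query should return: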

some text #{abc} some text
some text #{def} some text

【Comments】:

Tags: xml xpath

【Solution 1】:

(Answering the original question): Try the following XPath 1.0 expression:

//text()[starts-with(normalize-space(.),'#{') and substring(normalize-space(.),string-length(normalize-space(.)),1)='}' and  ancestor::foo[@attr1]]

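It returns the desired text() nodes, but with their leading and trailing whitespace. This cannot be avoided in XPath 1.0: normalize-space() takes a single string argument, so when it is applied to a node-set it only processes the first node, and a function call cannot be used as a path step. In XPath 2.0 you can simply append /normalize-space() to the end of the expression to handle it.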
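Since XPath 1.0 cannot normalize each selected node, the trimming is usually left to the calling code. The following is a minimal Python sketch of that post-processing step, assuming the query results are already available as plain strings; the helper name `normalize_space` and the sample values are illustrative and not part of the answer.

```python
import re

# XPath's normalize-space() treats only space, tab, CR and LF as whitespace.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(value: str) -> str:
    """Mimic XPath normalize-space() for a single string."""
    # Collapse every whitespace run to one space, then trim both ends.
    return _XML_WS.sub(" ", value).strip(" ")


# Hypothetical string values of the text() nodes returned by the query.
raw_results = ["\n     #{abc}\n   ", "\n        #{def}\n     "]
cleaned = [normalize_space(r) for r in raw_results]
print(cleaned)
```

A regular expression is used instead of `str.split()` because Python's default whitespace set is wider than the four characters XPath considers.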

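【Comments】:

Thanks. This works for my original question (which I have now edited), where #{} appears on its own. But I phrased the question slightly wrong: what I actually want is to match a regular expression, so #{} may appear in the middle of other text. Is there any way to adapt it to catch those cases?
In XPath 1.0 it is not possible to apply a regex. The closest you can get is to use two contains(...) calls, so the following is only a good approximation: //text()[contains(.,'#{') and contains(substring-after(.,'#{'),'}') and ancestor::foo[@attr1]].
Just for completeness, if you can find a way to use XPath 2.0: //text()[matches(.,'.*#\{.*\}.*') and ancestor::foo[@attr1]] would satisfy all the criteria best.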
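To make the two-contains approximation concrete, here is a sketch that mirrors its logic in plain Python. It does not evaluate the XPath itself (the standard library's ElementTree supports only a small XPath subset without text() or contains()); instead it walks the tree and applies the same test to each piece of text. The function names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

XML = """<foo attr1="dummy">
   <bar1>
     some text #{abc} some text
   </bar1>
   <bar2>
     <bar2bar2>
        some text #{def} some text
     </bar2bar2>
   </bar2>
</foo>"""


def has_placeholder(text: str) -> bool:
    # contains(., '#{') and contains(substring-after(., '#{'), '}')
    _, found, rest = text.partition("#{")
    return bool(found) and "}" in rest


def find_placeholder_texts(root: ET.Element) -> List[str]:
    matches = []
    # iter("foo") visits the root and every descendant named foo.
    for foo in root.iter("foo"):
        if "attr1" not in foo.attrib:  # corresponds to foo[@attr1]
            continue
        # itertext() yields the text pieces below foo in document order,
        # which roughly corresponds to the text() nodes under it.
        for text in foo.itertext():
            if has_placeholder(text):
                matches.append(text.strip())
    return matches


root = ET.fromstring(XML)
print(find_placeholder_texts(root))
```

As with the XPath version, this only checks that some `}` follows the first `#{`, so it would also accept text such as `#{ never closed ... unrelated }`. With nested foo elements this sketch would report the same text more than once, which the XPath expression does not.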

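【Solution 2】:

Regarding "I need an XPath 1.0 query (regex not supported) to search all occurrences of #{*} when the nodes are (direct or indirect) children of the node foo having the attribute attr1", the following expression can be used: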

//foo[@attr1]//text()[contains(.,'#{')][contains(substring-after(.,'#{'),'}')]

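Note that this expression selects text nodes. If you have mixed content (elements containing both text and markup, such as an HTML p with an em or span inside), the string will be split across several text nodes. For that case you will need an answer like this one: How can I find a node in HTML which has marked-up text by searching for the plaintext?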
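The mixed-content caveat can be illustrated with a small Python sketch. The sample document is hypothetical: the placeholder is interrupted by an em element, so no single text node contains both `#{` and `}`, while the element's string value (the concatenation of all its descendant text) does.

```python
import xml.etree.ElementTree as ET


def has_placeholder(text: str) -> bool:
    # Same test as the XPath predicate: '#{' followed later by '}'.
    _, found, rest = text.partition("#{")
    return bool(found) and "}" in rest


# Hypothetical mixed content: markup sits inside the placeholder.
root = ET.fromstring('<foo attr1="dummy"><p>some #{<em>abc</em>} text</p></foo>')
p = root.find("p")

# The text is split into separate pieces around the <em> element.
text_nodes = list(p.itertext())
per_node = [has_placeholder(t) for t in text_nodes]

# Joining the pieces gives the element's string value, as XPath defines it.
string_value = "".join(text_nodes)
whole = has_placeholder(string_value)

print(text_nodes, per_node, whole)
```

Testing the joined string finds the placeholder, but it identifies the enclosing element rather than a single text node, which is the trade-off the linked question deals with.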

【Comments】: