How to use contains() in XPath? (Answers)

【Question title】: How to use contains() in XPath?
【Posted】: 2018-12-05 13:38:12
【Question description】:

I'm trying to collect information from web pages, but I can't get the XPath right to find it. Here is a snippet from the site:

<div class="posted">
  <div>
    June 20, 2018
  </div>
</div>

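I want to search each page for the div whose class is "posted" and return everything under it as a string. (A messy string is fine; I'll use `if "2018" in possibleDate` to look for the year.) Here is what I'm trying: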

possibleDate = str(tree.xpath("//div[contains(@class, ’posted’)]//@text"))

It says this is an invalid expression.
What am I doing wrong?

【Comments】:

Note that [contains(@class, 'posted')] is not wrong, but I suspect what you mean is [@class = 'posted']. The contains() version will also match @class="signposted"; the "=" version will not.
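
To make the difference in that comment concrete, here is a minimal sketch assuming lxml is installed. The markup, including the `signposted` and `posted featured` divs, is hypothetical. The third expression is a common XPath 1.0 idiom for matching a whole class token; it does not come from the thread.

```python
from lxml import html

# Hypothetical markup: three divs whose class values all contain "posted".
snippet = (
    '<div id="page">'
    '<div class="posted"><div>June 20, 2018</div></div>'
    '<div class="signposted"><div>not a date</div></div>'
    '<div class="posted featured"><div>July 4, 2018</div></div>'
    '</div>'
)
tree = html.fromstring(snippet)


def classes(expr):
    """Return the class attribute of every element matched by expr."""
    return [el.get("class") for el in tree.xpath(expr)]


# Substring test: matches any class value that contains "posted".
substring = classes("//div[contains(@class, 'posted')]")

# Exact test: the whole attribute value must equal "posted".
exact = classes("//div[@class = 'posted']")

# Token test (an addition, not from the thread): "posted" as a whole
# whitespace-separated class name.
token = classes(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' posted ')]"
)

print(substring)  # ['posted', 'signposted', 'posted featured']
print(exact)      # ['posted']
print(token)      # ['posted', 'posted featured']
```

Which form fits depends on the pages being scraped: the exact test is the strictest, while the token test tolerates extra class names on the same element.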

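【Solution 1】:

First, replace the ’ characters around posted with ' characters.

Next, replace @text with text(). @text selects an attribute named text, which these elements do not have; text() selects text nodes.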
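
The two fixes above can be checked in isolation. This is a rough sketch assuming lxml is installed. The one-line markup is a condensed, hypothetical version of the snippet in the question, and the variable names are illustrative only.

```python
from lxml import etree, html

tree = html.fromstring('<div class="posted"><div>June 20, 2018</div></div>')

# 1. Typographic quotes (U+2019) are not XPath string delimiters, so the
#    expression from the question cannot be parsed.
try:
    tree.xpath("//div[contains(@class, ’posted’)]//text()")
    curly_quote_error = None
except etree.XPathError as exc:
    curly_quote_error = exc

# 2. With straight quotes, @text is accepted, but it looks for an
#    attribute named "text", so nothing is selected.
attr_result = tree.xpath("//div[contains(@class, 'posted')]//@text")

# 3. text() selects the text nodes beneath the matched div.
text_result = tree.xpath("//div[contains(@class, 'posted')]//text()")

print(type(curly_quote_error).__name__, curly_quote_error)
print(attr_result)  # []
print(text_result)  # ['June 20, 2018']
```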

Also, you may want to use the space-normalized string value of the selected div rather than selecting text nodes:

possibleDate = str(tree.xpath("normalize-space(//div[@class='posted'])"))

This abstracts away any variations in the markup nested inside the target div.
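
As an illustration of that point, the sketch below runs both approaches against the markup from the question. It assumes lxml is installed; `markup`, `text_nodes`, and `possible_date` are illustrative names.

```python
from lxml import html

# The snippet from the question, whitespace included.
markup = """<div class="posted">
  <div>
    June 20, 2018
  </div>
</div>"""
tree = html.fromstring(markup)

# text() returns a list of raw text nodes; the date is still surrounded
# by the newlines and indentation from the markup.
text_nodes = tree.xpath("//div[@class='posted']//text()")

# normalize-space() returns a single string: the string value of the
# first matching div, trimmed and with whitespace runs collapsed.
possible_date = tree.xpath("normalize-space(//div[@class='posted'])")

print(text_nodes)
print(repr(possible_date))  # 'June 20, 2018'

if "2018" in possible_date:
    print("found the year")
```

Note that normalize-space() applied to a node-set uses only the first node in document order, so a page with several "posted" divs yields just the first date this way.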

See also: xpath: find a node whose class attribute matches a value and whose text contains a certain string

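【Comments】:

Replacing @text with text() still gives the invalid expression error. Using possibleDate = str(tree.xpath("normalize-space(//div[@class='posted'])")) raises no error, but it doesn't find anything.
Ah, you also have to replace the ’ characters around posted in the XPath with ' characters. Answer updated.
Thanks. I tested tree.xpath("//div[contains(@class, 'posted')]//text()"), tree.xpath("normalize-space(//div[@class='posted'])"), and tree.xpath("//div[contains(@class, 'posted')]"), but they all return only empty strings. I'm sure the pages being checked contain the right class, but the expressions still don't find it.
You'll need to update your question with a true minimal reproducible example so that we can help you further.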