How to select all children text but excluding a tag in Selenium's XPath selector?

【Question title】: How to select all children text but excluding a tag in Selenium's XPath selector?
【Posted】: 2015-02-19 21:28:18
【Question】:

I have this HTML:

<div id="content">
    <h1>Title 1</h1><br><br>

    <h2>Sub-Title 1</h2>
    <br><br>
    Description 1.<br><br>Description 2.
    <br><br>

    <h2>Sub-Title 2</h2>
    <br><br>
    Description 1<br>Description 2<br>
    <br><br>

    <div class="infobox">
        <font style="color:#000000"><b>Information Title</b></font>
        <br><br>Long Information Text
    </div>
</div>

Using Selenium's find_element_by_xpath function, I want to get all the text inside <div id="content"> except the content of <div class="infobox">, so the expected result is:

Title 1


Sub-Title 1


Description 1.

Description 2.


Sub-Title 2


Description 1
Description 2

I can get this result by running the following expression in an online XPath tester:

//div[@id="content"]/descendant::text()[not(ancestor::div/@class="infobox")]

But if I pass the same expression to Selenium's find_element_by_xpath, I get selenium.common.exceptions.InvalidSelectorException.

result = driver.find_element_by_xpath('//div[@id="content"]/descendant::text()[not(ancestor::div/@class="infobox")]')

【Question comments】:

Tags: python html selenium xpath selenium-webdriver

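【Solution 1】:

The XPath passed to find_element_by_xpath() must point to an element, not to a text node or an attribute.

The simplest approach here is to find the parent tag, find the child tag whose text you want to exclude, and then remove the child's text from the parent's text: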

parent = driver.find_element_by_id('content')
child = parent.find_element_by_class_name('infobox')
# Strip the infobox text from the parent's rendered text.
print(parent.text.replace(child.text, ''))
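
The subtraction step above can be sketched as a small helper that does not depend on Selenium, which makes it easy to try on plain strings. The function name `text_without_child` is hypothetical, and the trailing `strip()` is an addition not present in the answer. Like the original `replace` call, it removes every occurrence of the child text, so it may drop too much if the same text also appears elsewhere in the parent.

```python
def text_without_child(parent_text, child_text):
    """Return parent_text with child_text removed (hypothetical helper).

    Mirrors parent.text.replace(child.text, '') from the answer, then trims
    the whitespace left behind at the start or end.
    """
    if not child_text:
        # Nothing to exclude; return the parent text unchanged.
        return parent_text
    return parent_text.replace(child_text, "").strip()


if __name__ == "__main__":
    # Sample strings standing in for parent.text and child.text.
    parent_text = "Title 1\nSub-Title 1\nInformation Title\nLong Information Text"
    child_text = "Information Title\nLong Information Text"
    print(text_without_child(parent_text, child_text))
```

With a live driver this would presumably be called as `text_without_child(parent.text, child.text)`.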

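【Discussion】:

Great answer! I know you said this is the simplest way. But is it possible to write an XPath for this?
@Saifur Thank you. It is possible, although find_element_by_xpath() can't be used for it. Another option I can think of is to evaluate the XPath with JS through execute_script().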
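
The execute_script() idea from the last comment could look roughly like the sketch below. It assumes the browser supports `document.evaluate` and that `driver` is an already-created WebDriver with the page loaded. The names `XPATH_TEXT_JS`, `get_text_nodes` and `join_nonblank` are hypothetical. Because the XPath is evaluated by the browser rather than by find_element_by_xpath(), it can return text nodes.

```python
# JavaScript that evaluates an XPath in the page and returns the text of
# every matched node as an array of strings.
XPATH_TEXT_JS = """
var result = document.evaluate(
    arguments[0], document, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
var texts = [];
for (var i = 0; i < result.snapshotLength; i++) {
    texts.push(result.snapshotItem(i).textContent);
}
return texts;
"""

# The expression from the question: all text nodes under #content that
# are not inside the infobox div.
CONTENT_XPATH = ('//div[@id="content"]/descendant::text()'
                 '[not(ancestor::div/@class="infobox")]')


def get_text_nodes(driver, xpath=CONTENT_XPATH):
    """Return the text of each node matched by xpath (hypothetical helper)."""
    # Extra arguments to execute_script are exposed to the script as arguments[n].
    return driver.execute_script(XPATH_TEXT_JS, xpath)


def join_nonblank(texts):
    """Join the matched texts, dropping whitespace-only nodes."""
    return "\n".join(t.strip() for t in texts if t.strip())
```

Usage would be something like `print(join_nonblank(get_text_nodes(driver)))`. Unlike the expected output in the question, this sketch drops the blank lines produced by the <br> tags.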