lxml xpath cannot handle <p> tag

【Question title】: lxml xpath cannot handle <p> tag
【Posted】: 2015-03-19 14:13:15
【Question description】:

How can I get the p tag text "Blahblah" in this case?

When the text of the p tag comes after a strong tag, lxml does not seem to recognize it:

<p class="user_p"><strong>cc</strong>Blahblah</p>

====Code====

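from lxml import html

content = """
    <div>
    <p class="user_p">Blahblah<strong>cc</strong></p>
    <p class="user_p"><strong>cc</strong>Blahblah</p>
    </div>
"""
# Python 3: content is already a str, so no .decode('utf-8') is needed
tree = html.fromstring(content)

p = tree.xpath('//div/p')

print(p[0].text)  # the text comes before <strong>, so it is in p.text
print(p[1].text)  # the text comes after <strong>, so p.text is None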

====Output====

Blahblah
None

【Comments】:

Tags: html lxml

【Solution 1】:

In this HTML fragment,

<p class="user_p"><strong>cc</strong>Blahblah</p>

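the text "Blahblah" is the value of the tail attribute of the <strong> element. lxml stores text that follows an element's closing tag in that element's tail, while p.text holds only the text before the first child element. That is why p.text is None here.

Demo code: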

from lxml import html

content = """
    <div>
     <p class="user_p"><strong>cc</strong>Blahblah</p> 
    </div>"""

tree = html.fromstring(content)
s = tree.xpath('//div/p/strong')
print(s[0].tail)  # text that follows </strong> is stored as the element's tail

Output:

Blahblah
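
If both paragraph layouts from the question need to be handled the same way, one possible approach is to combine p.text with the tail of each direct child. The sketch below is only an illustration: own_text is a hypothetical helper name, not part of lxml, and it assumes only the text directly inside the p element is wanted, not the text inside <strong>.

```python
from lxml import html

content = """
    <div>
    <p class="user_p">Blahblah<strong>cc</strong></p>
    <p class="user_p"><strong>cc</strong>Blahblah</p>
    </div>
"""

tree = html.fromstring(content)


def own_text(p):
    """Hypothetical helper: join p.text with the tail of each direct child."""
    parts = [p.text or ""]                            # text before the first child
    parts.extend(child.tail or "" for child in p)     # text after each child
    return "".join(parts).strip()


results = [own_text(p) for p in tree.xpath('//div/p')]
print(results)
```

Both paragraphs should yield the same string this way, regardless of whether the text sits before or after the <strong> element.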

【Discussion】:

You are right. I also found another way: "//div/p/strong/following-sibling::text()". It extracts the text as well. Adding it here for reference.
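
For reference, the XPath expression from the comment above can be tried against the same fragment used in the answer. This is a minimal sketch; the variable name texts is only illustrative.

```python
from lxml import html

content = """
    <div>
     <p class="user_p"><strong>cc</strong>Blahblah</p>
    </div>"""

tree = html.fromstring(content)

# Select the text nodes that follow <strong> as siblings inside the same <p>
texts = tree.xpath('//div/p/strong/following-sibling::text()')
print(texts)
```

The result is a list of strings, so it may contain more than one entry when several text nodes follow the <strong> element.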