【Question title】: Select siblings between two nodes
【Posted】: 2012-03-28 17:54:43
【Question】:

I need to collect every category name together with all the divs beneath it whose class starts with "config-entry".

<h2>category 1</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<h2>category 2</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<h2>category 3</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<h2>category 4</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>
<div class='config-entry '>...</div>

I am using the XPath //h2[1]/following-sibling::h2[1]/preceding-sibling::div[starts-with(@class,'config-entry')], like this:

categories = root.xpath("//h2")
for i in range(len(categories)):
   print("----%s----" % categories[i].text)
   contents = root.xpath("//h2[1]/following-sibling::h2[1]/preceding-sibling::div[starts-with(@class,'config-entry')]")
   print(len(contents))

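This code only works for category 1: it selects all the divs between categories 1 and 2, but it goes wrong for the later categories. I tried changing h2[1] to 0, 2, and 3, but got nothing useful. Any clues?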

【Comments】:

    Tags: python html xpath


    【Solution 1】:

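    I suggest taking the union of the h2 tags and the div tags. XPath returns the union in document order, so as you iterate over the results each div "belongs" to the last h2 you saw.

    For example: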

    '//h2|//div[contains(@class,"config-entry")]'
    

    Working example:

    from lxml import etree
    
    doc = etree.HTML("""
    <html>
    <h2>category 1</h2>
    <div class='clear10'></div>
    <div class='config-entry selected-block'>...</div>
    <div class='config-entry '>...</div>
    <div class='config-entry '>...</div>
    <div class='config-entry '>...</div>
    <h2>category 2</h2>
    <div class='clear10'></div>
    <div class='config-entry selected-block'>...</div>
    <div class='config-entry '>...</div>
    <div class='config-entry '>...</div>
    <div class='config-entry '>...</div>
    <div class='config-entry '>...</div>
    <h2>category 3</h2>
    <div class='clear10'></div>
    <div class='config-entry selected-block'>...</div>
    <div class='config-entry '>...</div>
    <h2>category 4</h2>
    <div class='clear10'></div>
    <div class='config-entry selected-block'>...</div>
    <div class='config-entry '>...</div>
    <div class='config-entry '>...</div>
    <div class='config-entry '>...</div>
    </html>""")
    
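    category = None
    # The union is returned in document order, so every div follows its own h2.
    for ele in doc.xpath('//h2|//div[contains(@class,"config-entry")]'):
      if ele.tag == 'h2':
        # Remember the most recent heading; later divs belong to it.
        category = str(ele.text)
      else:
        # Skip any div that appears before the first h2.
        if category:
          print("%s: %s, %r" % (category, ele.tag, ele.attrib))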
    

    Output:

    category 1: div, {'class': 'config-entry selected-block'}
    category 1: div, {'class': 'config-entry '}
    category 1: div, {'class': 'config-entry '}
    category 1: div, {'class': 'config-entry '}
    category 2: div, {'class': 'config-entry selected-block'}
    category 2: div, {'class': 'config-entry '}
    category 2: div, {'class': 'config-entry '}
    category 2: div, {'class': 'config-entry '}
    category 2: div, {'class': 'config-entry '}
    category 3: div, {'class': 'config-entry selected-block'}
    category 3: div, {'class': 'config-entry '}
    category 4: div, {'class': 'config-entry selected-block'}
    category 4: div, {'class': 'config-entry '}
    category 4: div, {'class': 'config-entry '}
    category 4: div, {'class': 'config-entry '}
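
If you would rather keep one XPath query per category, as in the question, the following sketch may help. The original expression starts with the absolute path //h2[1] on every pass, so the loop index never affects what is selected. One possible fix is to evaluate a relative expression from each h2 and keep only the divs that have exactly as many preceding h2 siblings as the current heading's position. This assumes all h2 and div elements are siblings under the same parent, as in the sample markup. The helper name entries_by_category and the shortened sample document are illustrative and not part of the original post.

```python
from lxml import etree

# Shortened sample document (illustrative; same shape as the markup in the question).
HTML = """
<html><body>
<h2>category 1</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>a</div>
<div class='config-entry '>b</div>
<h2>category 2</h2>
<div class='clear10'></div>
<div class='config-entry selected-block'>c</div>
</body></html>
"""


def entries_by_category(root):
    """Map each h2's text to the config-entry divs between it and the next h2.

    Hypothetical helper; assumes every h2 and div shares one parent element.
    """
    result = {}
    for h2 in root.xpath("//h2"):
        # 1-based position of this h2 among its h2 siblings.
        position = len(h2.xpath("preceding-sibling::h2")) + 1
        # A following div lies before the next h2 exactly when the number of
        # h2 siblings preceding it equals this heading's position.
        divs = h2.xpath(
            "following-sibling::div[starts-with(@class, 'config-entry')]"
            "[count(preceding-sibling::h2) = $n]",
            n=position,
        )
        result[h2.text] = divs
    return result


root = etree.HTML(HTML)
for name, divs in entries_by_category(root).items():
    print(name, [d.text for d in divs])
```

The union approach above needs only a single pass over the document, so it is likely the simpler choice for large pages; this variant is mainly useful when the divs of one specific category are wanted on their own.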
    

    【Comments】:
