【Question title】: python, lxml retrieving all elements in a list
【Posted】: 2017-06-10 01:18:21
【Question】:

I am trying to get all the elements of a list from a website.

Given the following HTML snippet:

<ul>
    <li class="name"> James </li>
    <li> Male </li>
    <li> 5'8" </li>
</ul>

My current code uses XPath and stores only the name in a list. Is there a way to get all three fields as one list?

My code:

name = tree.xpath('//li[@class="name"]/text()')

【Comments】:

    Tags: python html web-scraping lxml


    【Solution 1】:
    import lxml.html as LH
    tree = LH.parse('data')  # 'data' is a file assumed to contain the HTML snippet above
    print(tree.xpath('//li[../li[@class="name" and position()=1]]/text()'))
    

    prints

    [' James ', ' Male ', ' 5\'8" ']
    

    The XPath '//li[../li[@class="name" and position()=1]]/text()' means

    //li             # all li elements
    [                # whose
    ..               # parent
    /                # has a child 
    li               # li element
      [              # whose
       @class="name" # class attribute equals "name"
       and           # and 
       position()=1] # and which is the first li child of that parent
      ]               
      /text()        # return the text of those elements 
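
The breakdown above can be tried without a file on disk. The following is a minimal sketch, assuming the snippet from the question is parsed from a string with `lxml.html.fromstring` instead of `LH.parse('data')`; the names `SNIPPET`, `raw`, and `fields` are illustrative and not part of the original answer. It also strips the surrounding whitespace, which the original answer leaves in place.

```python
from lxml import html

# Hypothetical inline copy of the HTML snippet from the question.
SNIPPET = '''<ul>
    <li class="name"> James </li>
    <li> Male </li>
    <li> 5'8" </li>
</ul>'''

tree = html.fromstring(SNIPPET)

# Select every li whose parent has a first li child with class="name",
# then return the text nodes of those li elements.
raw = tree.xpath('//li[../li[@class="name" and position()=1]]/text()')

# Optional cleanup step (an addition, not in the original answer):
# drop the leading and trailing spaces around each value.
fields = [s.strip() for s in raw]
print(fields)
```

Whether to strip is a matter of taste; `raw` holds the same padded strings that the original answer prints.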
    

    【Discussion】:

      【Solution 2】:
      from lxml import html
      
      text = '''<ul>
          <li class="name"> James </li>
          <li> Male </li>
          <li> 5'8" </li>
      </ul>
      <ul>
          <li class="name"> James </li>
          <li> Male </li>
          <li> 5'8" </li>
      </ul>
      <ul>
          <li class="name"> James </li>
          <li> Male </li>
          <li> 5'8" </li>
      </ul>'''
      
      tree = html.fromstring(text)
      for ul in tree.xpath('//ul[li[@class="name"]]'):  # loop over each ul that has an li child with class="name"
          print(ul.xpath("li/text()"))  # text of every li directly under this ul
      

      Output:

      [' James ', ' Male ', ' 5\'8" ']
      [' James ', ' Male ', ' 5\'8" ']
      [' James ', ' Male ', ' 5\'8" ']
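
When a page holds several such lists, it can help to collect one cleaned list per `ul` rather than printing them. The sketch below assumes the same per-`ul` XPath as this answer; the helper name `extract_records`, the wrapping `div`, and the second person's values are made up for illustration so that the grouping is visible.

```python
from lxml import html

# Hypothetical page with two lists; the second record is invented sample data.
PAGE = '''<div>
<ul>
    <li class="name"> James </li>
    <li> Male </li>
    <li> 5'8" </li>
</ul>
<ul>
    <li class="name"> Anna </li>
    <li> Female </li>
    <li> 5'5" </li>
</ul>
</div>'''


def extract_records(markup):
    """Return one list of stripped li texts per ul that contains an li.name."""
    tree = html.fromstring(markup)
    records = []
    for ul in tree.xpath('//ul[li[@class="name"]]'):
        # Relative path: only the li children of the current ul.
        records.append([li.text_content().strip() for li in ul.xpath('li')])
    return records


records = extract_records(PAGE)
print(records)
```

Using `text_content()` instead of `text()` also picks up text inside nested tags such as `<b>`, which may or may not be wanted depending on the page.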
      

      【Discussion】:
