python, lxml: retrieving all elements in a list

【Question title】: python, lxml retrieving all elements in a list
【Posted】: 2017-06-10 01:18:21
【Question description】:

I am trying to get all the elements of a list on a website, given the following HTML snippet:

<ul>
    <li class="name"> James </li>
    <li> Male </li>
    <li> 5'8" </li>
</ul>

My current code uses XPath and stores the name in a list. Is there a way to get all three fields as one list?

My code:

name = tree.xpath('//li[@class="name"]/text()')

【Question comments】:

Tags: python html web-scraping lxml

【Solution 1】:

import lxml.html as LH
# 'data' is a file containing the HTML snippet from the question
tree = LH.parse('data')
print(tree.xpath('//li[../li[@class="name" and position()=1]]/text()'))

prints

[' James ', ' Male ', ' 5\'8" ']

The XPath '//li[../li[@class="name" and position()=1]]/text()' means

//li             # all li elements
[                # whose
..               # parent
/                # has a child 
li               # li element
  [              # whose
   @class="name" # class attribute equals "name"
   and           # and 
   position()=1] # which is the first li child of that parent
  ]               
  /text()        # return the text of those elements
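
As a quick self-contained check of the expression above, here is a minimal sketch that parses an inline string rather than the 'data' file. The second list (with no class="name" item) and the wrapping div are assumptions added only for illustration, to show that a list without a leading name item is skipped.

```python
import lxml.html as LH

# Hypothetical markup: the first list is the one from the question;
# the second list is invented here to show that it is filtered out.
html_text = '''<div>
<ul>
    <li class="name"> James </li>
    <li> Male </li>
    <li> 5'8" </li>
</ul>
<ul>
    <li> Home </li>
    <li> About </li>
</ul>
</div>'''

tree = LH.fromstring(html_text)

# Keep only li elements whose parent's first li child has class="name"
fields = tree.xpath('//li[../li[@class="name" and position()=1]]/text()')
print(fields)
```

Only the three items of the first list are returned, with their surrounding whitespace intact.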

【Comments】:

【Solution 2】:

from lxml import html

text = '''<ul>
    <li class="name"> James </li>
    <li> Male </li>
    <li> 5'8" </li>
</ul>
<ul>
    <li class="name"> James </li>
    <li> Male </li>
    <li> 5'8" </li>
</ul>
<ul>
    <li class="name"> James </li>
    <li> Male </li>
    <li> 5'8" </li>
</ul>'''

tree = html.fromstring(text)
for ul in tree.xpath('//ul[li[@class="name"]]'):  # each ul that has an li child with class="name"
    print(ul.xpath("li/text()"))  # text of every li child of that ul

Output:

[' James ', ' Male ', ' 5\'8" ']
[' James ', ' Male ', ' 5\'8" ']
[' James ', ' Male ', ' 5\'8" ']
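
The values above still carry their surrounding whitespace. If clean strings are wanted, one possible variation (a sketch, not part of the original answer) is to iterate over the li elements and strip each one's text. The second record and the wrapping div below are made-up sample data.

```python
from lxml import html

# Hypothetical sample data: the second record is invented for illustration.
text = '''<div><ul>
    <li class="name"> James </li>
    <li> Male </li>
    <li> 5'8" </li>
</ul>
<ul>
    <li class="name"> Anna </li>
    <li> Female </li>
    <li> 5'4" </li>
</ul></div>'''

tree = html.fromstring(text)

# One inner list per ul; text_content() returns the element's text,
# and strip() removes the leading and trailing spaces.
records = [
    [li.text_content().strip() for li in ul.xpath('li')]
    for ul in tree.xpath('//ul[li[@class="name"]]')
]
print(records)
```

Each inner list then holds the three fields of one record without padding.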

【Comments】: