【Question title】: python, lxml retrieving all elements in a list
【Posted】: 2017-06-10 01:18:21
【Question】:
I am trying to get all the elements of a list from a website, given the following HTML snippet:
<ul>
<li class="name"> James </li>
<li> Male </li>
<li> 5'8" </li>
</ul>
My current code uses XPath and stores only the name in a list. Is there a way to get all three fields as one list?
My code:
name = tree.xpath('//li[@class="name"]/text()')
Tags:
python
html
web-scraping
lxml
【Solution 1】:
import lxml.html as LH
tree = LH.parse('data')
print(tree.xpath('//li[../li[@class="name" and position()=1]]/text()'))
This prints:
[' James ', ' Male ', ' 5\'8" ']
The XPath '//li[../li[@class="name" and position()=1]]/text()' means:
//li # all li elements
[ # whose
.. # parent
/ # has a child
li # li element
[ # whose
@class="name" # class attribute equals "name"
and # and
position()=1] # and which is the first li child of that parent
]
/text() # return the text of those elements
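To try the expression without a file named 'data', here is a minimal sketch that parses an inline string instead. The SAMPLE markup, including the second list with no class="name" item, is made up for illustration; it shows that only the items of a list whose first li has class="name" are returned, and that the surrounding whitespace can be stripped afterwards.

```python
from lxml import html

# Hypothetical sample markup: the first list starts with a class="name" item,
# the second list does not, so its items should be skipped.
SAMPLE = """<div>
  <ul>
    <li class="name"> James </li>
    <li> Male </li>
    <li> 5'8" </li>
  </ul>
  <ul>
    <li> Unrelated </li>
    <li> Items </li>
  </ul>
</div>"""

tree = html.fromstring(SAMPLE)

# Same XPath as in the answer above: every li whose parent's first li child
# has class="name", then the text nodes of those li elements.
fields = tree.xpath('//li[../li[@class="name" and position()=1]]/text()')

# The raw text nodes keep their padding (' James '), so strip each one.
cleaned = [field.strip() for field in fields]
print(cleaned)
```

Note that position()=1 ties the match to the name being the first li in its list; if the name item could appear elsewhere, dropping that condition would probably be the simpler choice.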
【Solution 2】:
from lxml import html
text = '''<ul>
<li class="name"> James </li>
<li> Male </li>
<li> 5'8" </li>
</ul>
<ul>
<li class="name"> James </li>
<li> Male </li>
<li> 5'8" </li>
</ul>
<ul>
<li class="name"> James </li>
<li> Male </li>
<li> 5'8" </li>
</ul>'''
tree = html.fromstring(text)
for ul in tree.xpath('//ul[li[@class="name"]]'):  # loop over every ul that has an li child with class="name"
    print(ul.xpath("li/text()"))  # get the text of every li in that ul
Output:
[' James ', ' Male ', ' 5\'8" ']
[' James ', ' Male ', ' 5\'8" ']
[' James ', ' Male ', ' 5\'8" ']
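If the page holds several such lists, the per-ul loop above also makes it easy to turn each list into a labelled record. The following is only a sketch: the field labels in FIELD_NAMES and the second person in SAMPLE are assumptions made for illustration and do not come from the question.

```python
from lxml import html

# Hypothetical sample markup with two lists of the same shape.
SAMPLE = """<div>
<ul><li class="name"> James </li><li> Male </li><li> 5'8" </li></ul>
<ul><li class="name"> Anna </li><li> Female </li><li> 5'5" </li></ul>
</div>"""

# Hypothetical labels for the three positions in each list.
FIELD_NAMES = ("name", "gender", "height")

tree = html.fromstring(SAMPLE)

records = []
# Select each ul that has an li child with class="name", as in the answer above.
for ul in tree.xpath('//ul[li[@class="name"]]'):
    # text_content() returns all text inside the li; strip() removes the padding.
    values = [li.text_content().strip() for li in ul.xpath("li")]
    records.append(dict(zip(FIELD_NAMES, values)))

print(records)
```

Because zip stops at the shorter input, a list with fewer than three items would yield a record with fewer keys rather than an error, which may or may not be what you want.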