You can easily collect the data with a list comprehension:
In [1]: from bs4 import BeautifulSoup
In [2]: html = """<div><span class='name'>Andrew</span><span class='value'>42</span></div>
...: <div><span class='name'>Bob</span><span class='value'>128</span></div>"""
In [3]: soup = BeautifulSoup(html, 'html.parser')
In [4]: patterns = ['div > span.name', 'div > span.value']
In [5]: data = [[product.text for product in soup.select(pattern)] for pattern in patterns]
In [6]: data
Out[6]: [['Andrew', 'Bob'], ['42', '128']]
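The result above is grouped by selector (all names, then all values). If you need one record per div instead, the two lists can be transposed with zip. This is only a sketch: the field names in `fields` are chosen here to match the CSS classes, and the pairing lines up only when every div contains both spans.

```python
# Column-oriented result, copied from Out[6] above.
data = [['Andrew', 'Bob'], ['42', '128']]

# zip(*data) transposes the columns into rows: one tuple per <div>.
rows = list(zip(*data))

# Hypothetical field names, chosen to match the CSS classes in the selectors.
fields = ['name', 'value']
records = [dict(zip(fields, row)) for row in rows]

print(rows)
print(records)
```

If a div is missing one of the spans, the lists have different lengths and zip silently pairs the wrong items, so this approach is safe only for regular markup.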
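However, the list comprehension above still runs a separate for loop for each selector pattern. If you want to use a single loop, you should provide an example of the document structure.
For the given document structure, I can suggest another solution: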
In [7]: html = '''<html><body><div id="pagecontent"><div id="container"><div id="content"><div id="tab-description"><div id="attributes">
...: <div class="attr">
...: <span class="name">Ugug</span>
...: <span class="value">dfgd454</span>
...: </div>'''
In [8]: soup = BeautifulSoup(html, 'html.parser')
In [9]: attrs = soup.select('div.attr')
In [10]: attrs
Out[10]:
[<div class="attr">
<span class="name">Ugug</span>
<span class="value">dfgd454</span>
</div>]
In [11]: def parse_attr(attr):
....:     return {
....:         'name': attr.find(class_='name').text,
....:         'value': attr.find(class_='value').text
....:     }
....:
In [12]: list(map(parse_attr, attrs))
Out[12]: [{'name': 'Ugug', 'value': 'dfgd454'}]
You can also extend the number of attributes. In that case, you can rewrite the parse_attr function as follows:
In [25]: def parse_attr(attr):
....:     # attr('span') is shorthand for attr.find_all('span')
....:     return {span['class'][0]: span.text for span in attr('span')}
....:
In [26]: list(map(parse_attr, attrs))
Out[26]: [{'name': 'Ugug', 'value': 'dfgd454'}]
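To show how the generic version picks up additional attributes, here is a self-contained sketch. The third span with class `unit` is a hypothetical addition that is not part of the original document, and the explicit `'html.parser'` argument is my choice of parser. The function keys each span by its first CSS class, so a span with several classes would be stored under the first one only.

```python
from bs4 import BeautifulSoup

# Same structure as above, plus a hypothetical extra <span class="unit">.
html = '''
<div id="attributes">
  <div class="attr">
    <span class="name">Ugug</span>
    <span class="value">dfgd454</span>
    <span class="unit">kg</span>
  </div>
</div>
'''

def parse_attr(attr):
    # Key each span by its first CSS class; skip spans without a class.
    return {
        span['class'][0]: span.get_text(strip=True)
        for span in attr.find_all('span', class_=True)
    }

soup = BeautifulSoup(html, 'html.parser')
records = [parse_attr(attr) for attr in soup.select('div.attr')]
print(records)
```

No change to parse_attr is needed when new span classes appear in the markup; they simply show up as new keys in each dictionary.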