【Posted】: 2014-11-27 09:34:21
【Question】:
I have a page that contains several repeated blocks of the form <div...><h4>...<p>..., for example:
html = '''
<div class="proletariat">
<h4>sickle</h4>
<p>Ignore this text</p>
</div>
<div class="proletariat">
<h4>hammer</h4>
<p>This is the text we want</p>
</div>
'''
from bs4 import BeautifulSoup
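soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly so results do not depend on which parsers are installed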
If I write print(soup.select('div[class^="proletariat"] > h4 ~ p')), I get:
[<p>Ignore this text</p>, <p>This is the text we want</p>]
How can I specify that I only want the text of the p that is preceded by <h4>hammer</h4>?
Thanks.
【Comments】:
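One possible approach, offered as a sketch rather than a definitive answer: the selector in the question matches on structure only, so it cannot tell the two h4 elements apart by their text. A workaround is to select the h4 elements, compare their text in Python, and then step to the following p with find_next_sibling. The helper name paragraph_after_heading is hypothetical, and 'html.parser' is an assumed parser choice.

```python
from bs4 import BeautifulSoup

html = '''
<div class="proletariat">
<h4>sickle</h4>
<p>Ignore this text</p>
</div>
<div class="proletariat">
<h4>hammer</h4>
<p>This is the text we want</p>
</div>
'''

# 'html.parser' is an assumption; any installed parser should behave the same here
soup = BeautifulSoup(html, 'html.parser')


def paragraph_after_heading(soup, heading_text):
    """Hypothetical helper: return the text of the first <p> sibling that
    follows an <h4> whose text equals heading_text, or None if not found."""
    for h4 in soup.select('div.proletariat > h4'):
        # CSS matched on structure; the text comparison is done in Python
        if h4.get_text(strip=True) == heading_text:
            p = h4.find_next_sibling('p')
            if p is not None:
                return p.get_text()
    return None


print(paragraph_after_heading(soup, 'hammer'))
```

Passing 'sickle' instead should return the other paragraph, and a heading that does not occur on the page returns None, so the caller can check for a miss.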
Tags: python html css-selectors beautifulsoup