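【Posted】:2015-04-10 14:03:45
【Question】:
For a span with a given class name, I need to parse some text that sits between unnamed br elements. In this example I need 0.36, which appears right after the named field "DS".
Here is what I have tried.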
from bs4 import BeautifulSoup
html="""
<pre5 style="">
<br><br>
<span class="field-name">DS :</span>
0.36 [null]<br><br> <br> <span> <b>FC</b> </span><span> : 0.0 </span><br> <br> <span> <b>FDC</b> </span><span> : 0.36 </span><br> <br> <span> <b>LDD</b> </span><span> : 4838400000 </span><br> <br> <span> <b>IFS</b> </span><span> : 0.5333333 </span><br>
</pre5>
"""
soup = BeautifulSoup(html, 'lxml')
divTag = soup.find_all("pre5", {"style": ""})
for tag in divTag:
    tdTags = tag.find_all("span", {"class": "field-name"})
    for span in tdTags:
        print(span.text)
# prints "DS :", but I want 0.36
# Alternatively:
soup = BeautifulSoup(html, 'lxml')
print(str(soup.span.next_sibling.strip()).replace('[null]', ''))
# prints 0.36, but I would like to be sure that this value really belongs to "DS :"
# instead of just taking the immediate next sibling. Is there a way to respect the
# named field DS and fetch the value for it?
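Also, going through string parsing/splitting/replacing would be slower. Can the tree structure be used directly?
Edit: in the case below, the value of DS should be 0.007. There is no guarantee that DS will be the first element among the spans.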
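For the first layout, one possible way to tie the value to its label instead of to its position is to search for the `field-name` span whose text is the label and only then read its `next_sibling`. The following is a minimal sketch, not a definitive answer: the helper name `value_after_label` is hypothetical, the markup is a trimmed copy of the example above, `html.parser` is used only so the sketch does not depend on lxml, and the value is assumed to be the first whitespace-separated token of the text node that follows the span.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the markup from the question.
html = """
<pre5 style="">
<br><br>
<span class="field-name">DS :</span>
0.36 [null]<br><br> <br> <span> <b>FC</b> </span><span> : 0.0 </span><br>
</pre5>
"""


def value_after_label(soup, label):
    """Hypothetical helper: return the token that follows the labelled span, or None."""
    for span in soup.find_all("span", class_="field-name"):
        # Normalise "DS :" to "DS" before comparing with the requested label.
        name = span.get_text().strip().rstrip(":").strip()
        if name != label:
            continue
        sibling = span.next_sibling
        # next_sibling can be a Tag or None; only a non-empty text node holds a value.
        if isinstance(sibling, NavigableString) and sibling.strip():
            # Assumption: the value is the first token, e.g. "0.36" in "0.36 [null]".
            return sibling.split()[0]
    return None


soup = BeautifulSoup(html, "html.parser")
print(value_after_label(soup, "DS"))  # 0.36
```

Because the span is selected by its label text, a differently labelled `field-name` span in first position would simply not match, and the function would return `None` instead of a wrong value.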
html="""
<pre5 style="">
<br><br>
<span class="field-name">FC :</span>
0.36 [null]<br><br> <br> <span> <b>DS:</b> </span><span> : 0.007 </span><br> <br> <span> <b>FDC</b> </span><span> : 0.36 </span><br> <br> <span> <b>LDD</b> </span><span> : 4838400000 </span><br> <br> <span> <b>IFS</b> </span><span> : 0.5333333 </span><br>
</pre5>
"""
【Discussion】:
Tags: python-2.7 parsing beautifulsoup lxml