How to skip a particular tag and crawl other tags' text in BeautifulSoup: answers

[Question title]: How to skip a particular tag and crawl other tags' text in BeautifulSoup
[Posted]: 2014-06-10 05:42:55
[Question]:

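I am crawling a web page with BeautifulSoup. In one case I want to skip the contents of one particular tag and get the contents of the other tags. In the HTML below, I don't want the content of the div tag, but I can't work out how to exclude it. Please help.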

HTML code:

<blockquote class="messagetext">
    <div style="margin: 5px; float: right;">
        unwanted text .....
    </div>
    Text..............
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text </a>
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text</a>
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text</a>
    ,text
</blockquote>

I tried this:

content = soup.find('blockquote',attrs={'class':'messagetext'}).text

But it also returns the unwanted text inside the div tag.
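The div's text shows up because .text (the same as get_text()) concatenates every string among the tag's descendants, however deeply nested. A minimal sketch, assuming bs4 (BeautifulSoup 4) and Python's built-in "html.parser":

```python
from bs4 import BeautifulSoup

html_doc = """
<blockquote class="messagetext">
    <div style="margin: 5px; float: right;">
        unwanted text .....
    </div>
    Text..............
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text </a>
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text</a>
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text</a>
    ,text
</blockquote>
"""

soup = BeautifulSoup(html_doc, "html.parser")
quote = soup.find("blockquote", attrs={"class": "messagetext"})

# .text joins every descendant string, so the string inside <div> is included
full_text = quote.text
print("unwanted text" in full_text)  # True

# stripped_strings lists the same descendant strings, whitespace-trimmed,
# which makes it easy to see where each piece of text comes from
print(list(quote.stripped_strings))
```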


Tags: python-2.7 beautifulsoup

[Answer 1]:

Use the tag's clear() method, which removes all of a tag's children, to empty the div before reading the text:

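from bs4 import BeautifulSoup

# html_doc holds the HTML from the question
soup = BeautifulSoup(html_doc, 'html.parser')
content = soup.find('blockquote', attrs={'class': 'messagetext'})

# Empty every <div> inside the blockquote so its text is not collected
for tag in content.findChildren():
    if tag.name == 'div':
        tag.clear()

print(content.text)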

This produces:

Text..............
text 
text
text
   ,text
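If the parsed tree should stay untouched, one alternative is to filter the strings instead of clearing the div. A minimal sketch, assuming bs4 and Python's built-in "html.parser"; the helper text_skipping is a hypothetical name, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

html_doc = """
<blockquote class="messagetext">
    <div style="margin: 5px; float: right;">
        unwanted text .....
    </div>
    Text..............
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text </a>
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text</a>
    <a class="externalLink" rel="nofollow" target="_blank" href="#">text</a>
    ,text
</blockquote>
"""


def text_skipping(tag, skip_name):
    """Hypothetical helper: text of `tag`, ignoring strings inside any `skip_name` element."""
    parts = []
    for string in tag.strings:
        parent = string.parent
        inside_skipped = False
        # Walk upward only as far as `tag` itself, so ancestors outside it
        # (e.g. a page-level <div>) do not cause everything to be skipped.
        while parent is not None and parent is not tag:
            if parent.name == skip_name:
                inside_skipped = True
                break
            parent = parent.parent
        if not inside_skipped:
            parts.append(string)
    return "".join(parts)


soup = BeautifulSoup(html_doc, "html.parser")
quote = soup.find("blockquote", attrs={"class": "messagetext"})
result = text_skipping(quote, "div")

# Collapse runs of whitespace for display
print(" ".join(result.split()))
```

Unlike clear(), this leaves the <div> and its text in the tree. If modifying the tree is acceptable, Tag.decompose() removes the <div> element itself (clear() leaves an empty <div></div> behind), and Tag.extract() removes it and returns it.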
