【Question title】: Splitting HTML with BeautifulSoup into the required format
【Posted】: 2023-03-13 02:56:01
【Question】:

I have an HTML snippet like this:

<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>

How can I parse it with Beautiful Soup to get:

Abc: test1, Def: test2

Here is what I have tried so far:

data = """<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>"""
temp = BeautifulSoup(data)
link = temp.select('.myTestCode')

# neither of these prints the expected output shown above
print str(link).split('<strong>')
print ''.join(link.stripped_strings) 

【Comments】:

  • I have also tried str(link).split('<strong>') and ''.join(link.stripped_strings), where link = temp.select('.myTestCode') and temp = BeautifulSoup(data).
  • Please edit your post to include your code.

Tags: python-2.7 beautifulsoup


【Solution 1】:

One possible approach:

from bs4 import BeautifulSoup

data = """<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>"""
# name the parser explicitly so bs4 does not have to guess one
temp = BeautifulSoup(data, 'html.parser')

# get the individual <strong> elements
strongs = temp.select('.myTestCode > strong')

# map each <strong> element to its text content concatenated with the text node that follows it
result = map(lambda x: x.text + x.next_sibling.strip(), strongs)

# join the pieces with commas and print (the parenthesized form works in Python 2.7 and 3)
print(', '.join(result))

# printed output:
# Abc: test1, Def: test2
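
The attempt in the question fails because select() returns a list of matches, and a list has no stripped_strings attribute. A minimal sketch of a variant that stays closer to that attempt is shown below. It assumes every label inside the div is followed by exactly one value; the helper name label_value_pairs is made up for illustration, and select_one() needs a reasonably recent bs4 release.

```python
from bs4 import BeautifulSoup

data = """<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>"""


def label_value_pairs(html):
    """Return one 'Label: value' string per <strong> label in .myTestCode.

    Hypothetical helper: assumes labels and values strictly alternate.
    """
    soup = BeautifulSoup(html, "html.parser")
    # select_one() returns a single Tag (or None), unlike select()
    div = soup.select_one(".myTestCode")
    # whitespace-only strings are skipped, the rest are stripped:
    # ['Abc:', 'test1', 'Def:', 'test2']
    parts = list(div.stripped_strings)
    # pair every label (even index) with the value right after it (odd index)
    return [label + " " + value for label, value in zip(parts[0::2], parts[1::2])]


result = ", ".join(label_value_pairs(data))
print(result)
```

If a label can appear without a value, the pairing by position breaks down, and walking from each <strong> to its next sibling, as in the answer above, is the safer choice.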

【Comments】:
