【Posted】: 2014-07-06 06:31:47
【Question】:
I've been trying to solve this, but the only way I've managed it so far is with a convoluted while loop.
I want to take this input:
"<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
and produce this output:
"This is a test (to see if this works) and I really hope it does"
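Essentially, I want to delete every tag: everything between "<" and ">", including the angle brackets themselves. The best I've managed with a few commands is: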
"This is a test (<i> to see </i> this works) and I really hope it does"
But I'm still left with these pesky tags: <i> and </i>
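The goal "delete everything between `<` and `>`" maps directly onto a single regular-expression substitution. This is a minimal sketch, assuming the markup is simple and well-formed (no `>` inside attribute values, no comments or `<script>` blocks); the helper name `strip_tags_regex` is hypothetical, and the whitespace collapsing is an added step, not part of the original question.

```python
import re

# Naive tag pattern: "<", then any run of characters that are not ">", then ">".
# Assumption: simple, well-formed markup; this is NOT a general HTML parser.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_regex(html: str) -> str:
    """Hypothetical helper: delete every <...> run, then collapse whitespace."""
    without_tags = TAG_RE.sub("", html)
    # The spaces that sat just inside <i> ... </i> would otherwise leave
    # a double space, so squeeze every whitespace run down to one space.
    return " ".join(without_tags.split())


if __name__ == "__main__":
    text = ("<td colspan='2' class='ToEx'>This is a test (<i> to see </i> "
            "this works) and I really hope it does</td>")
    print(strip_tags_regex(text))
```

Regex stripping is fine for small, predictable snippets like this one, but it breaks easily on real-world HTML; a parser such as BeautifulSoup copes with those edge cases.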
Here is my code:
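from bs4 import BeautifulSoup
text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
# Name the parser explicitly so bs4 doesn't have to guess (and warn about it)
soup = BeautifulSoup(text, "html.parser")
# All <td> elements whose CSS class is "ToEx"
content = soup.find_all("td", class_="ToEx")
# renderContents() returns the tag's inner HTML (bytes in Python 3), so <i>...</i> is still there
print(content[0].renderContents())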
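For comparison, here is a sketch of the BeautifulSoup route. `renderContents()` serialises the tag's inner HTML, whereas `get_text()` returns only the text nodes under the tag, so the `<i>`/`</i>` markup disappears while the words inside it stay. This assumes the `beautifulsoup4` package with the built-in `"html.parser"` backend; the variable names `cell`, `raw` and `clean` are illustrative, and the whitespace normalisation is an added step.

```python
from bs4 import BeautifulSoup

text = ("<td colspan='2' class='ToEx'>This is a test (<i> to see </i> "
        "this works) and I really hope it does</td>")

# "html.parser" ships with Python, so no extra parser (e.g. lxml) is needed.
soup = BeautifulSoup(text, "html.parser")

# find() returns the first matching <td>; class_ filters on the CSS class.
cell = soup.find("td", class_="ToEx")

# get_text() concatenates every text node under the tag, dropping the
# <i>...</i> markup but keeping the words inside it.
raw = cell.get_text()

# The spaces that sat just inside <i> ... </i> survive in raw,
# so collapse every whitespace run to a single space.
clean = " ".join(raw.split())
print(clean)
```

`cell.get_text(" ", strip=True)` is a close alternative: it strips each text fragment and joins the fragments with single spaces, which gives the same result for this input.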
【Discussion】:
Tags: python python-3.x web-scraping beautifulsoup