Regex Pattern for HTML Tags: Answers

【Question title】: Regex Pattern For HTML Tags
【Posted】: 2018-07-30 05:28:05
【Question description】:

I have been trying to find a pattern that can extract the text between > and < in the HTML below:

<li><a href="/web/20151030182314/https://www.wiki.edu/trees/">Forest Trees Green</a></li>

<span class="field-content">Tress, Design &amp; Plants</span></div> 

<h3><a href="http://web.archive.org/web/20151030182501/http://www.latimes.com">Trees</>
<div class="tf-text">
        Trees provide oxygen <a
<h4>Trees</h4>
<span class="field-content">Trees everywhere</span>  </div></li>
  </ul></div>    </div>
<h3 class="secondary-feature-headline">Through European Security Initiative, Stanford focuses on changing trees</h3>

Does anyone have any suggestions? P.S. I cannot use BeautifulSoup.

【Question comments】:

Tags: python regex python-3.x

【Solution 1】:

You can use BeautifulSoup to extract the result, or use the standard re module to extract the text:

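import re

# `text` is assumed to hold the HTML from the question
data = re.findall(r'>.*?<', text)
# join the matches and strip the angle brackets that each match still carries
result = ''.join(data).replace('>', '').replace('<', '')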

The output for the text above is as follows:

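Forest Trees GreenTress, Design &amp; PlantsTreesTreesTrees everywhere Through European Security Initiative, Stanford focuses on changing trees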
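A regex approach like the one in this answer can also use a capture group, which avoids the separate step of stripping the angle brackets and keeps the extracted pieces apart instead of running them together. The following is only a minimal sketch under a few assumptions: `sample_html` is a hypothetical excerpt of the question's markup, `extract_text` is an illustrative helper name that is not part of the original answer, and the markup is simple enough that no `<` or `>` appears inside attribute values or text.

```python
import re
from html import unescape

# Hypothetical sample: a few lines taken from the question's HTML.
sample_html = '''<li><a href="/web/20151030182314/https://www.wiki.edu/trees/">Forest Trees Green</a></li>
<span class="field-content">Tress, Design &amp; Plants</span></div>
<h4>Trees</h4>
<span class="field-content">Trees everywhere</span>  </div></li>'''


def extract_text(markup):
    """Return the non-empty text runs found between '>' and '<'."""
    # [^<>]+ needs at least one character and cannot cross a tag boundary,
    # so adjacent tags such as '</a></li>' produce no match at all.
    runs = re.findall(r'>([^<>]+)<', markup)
    # Drop whitespace-only runs and decode entities such as &amp;.
    return [unescape(run.strip()) for run in runs if run.strip()]


print(extract_text(sample_html))
# ['Forest Trees Green', 'Tress, Design & Plants', 'Trees', 'Trees everywhere']
```

Text that is not enclosed between a closing `>` and an opening `<`, such as a fragment at the very start or end of the input, is not captured by this sketch. Regular expressions remain fragile on real-world HTML, so this is best treated as a workaround for small, predictable snippets.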

【Comments】: