【Posted】: 2016-03-21 09:10:02
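【Question】:
Please see the HTML file below. I want to remove all tags except
<A href="MarineMammal.html">marine mammals.</A>
I am able to remove all tags, but I don't know how to keep one specific tag. I would also like to get the words surrounding the tag above. Those words should not contain any tags. Thanks!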
<TITLE> Whale </TITLE>
<H2> Whale </H2>
(from Wikipedia)
<p>
Whale is the common name for a widely distributed and diverse group of
fully aquatic placental
<A href="MarineMammal.html">marine mammals.</A>. They are an informal grouping
within the infraorder <A href="Cetacean.html">Cetacea,</A> usually excluding
<A href="Dolphin.html">dolphins</A> and
<A href="Porpoise.html">porpoises.</A>
Whales, dolphins and porpoises belong to the order Cetartiodactyla with
even-toed
<A href="Ungulate.html">ungulates</A> and their
closest living relatives are the
<A href="Hippopotamus.html">hippopotamuses,</A> having
diverged about 40 million years ago.
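One possible way to approach this, as a minimal sketch rather than a definitive answer: use a single regular expression with two alternatives, where the first captures the whole element to keep and the second matches any other tag. Matches of the first alternative are written back unchanged, and everything else is dropped. The class name `TagFilter` and the methods `stripTagsExcept` and `textAround` are hypothetical names chosen for illustration. The sketch assumes the kept anchor has only an `href` attribute in double quotes, as in the sample above; for arbitrary real-world HTML a proper parser is likely to be more robust than a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class TagFilter {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

    // Regex source for the one element we want to keep (assumes href is the only attribute).
    private static String keptElement(String keepHref) {
        return "<a\\s+href=\"" + Pattern.quote(keepHref) + "\"\\s*>.*?</a>";
    }

    // Removes every tag except the anchor whose href equals keepHref.
    static String stripTagsExcept(String html, String keepHref) {
        // Group 1 = the element to keep; second alternative = any other tag.
        Pattern p = Pattern.compile("(" + keptElement(keepHref) + ")|<[^>]+>", FLAGS);
        Matcher m = p.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String kept = m.group(1);
            // Write the kept element back verbatim; replace other tags with nothing.
            m.appendReplacement(out, kept == null ? "" : Matcher.quoteReplacement(kept));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns {text before the kept element, text after it}, both free of tags.
    static String[] textAround(String html, String keepHref) {
        String stripped = stripTagsExcept(html, keepHref);
        Matcher m = Pattern.compile(keptElement(keepHref), FLAGS).matcher(stripped);
        if (!m.find()) {
            return new String[] { stripped, "" };
        }
        return new String[] { stripped.substring(0, m.start()), stripped.substring(m.end()) };
    }
}
```

Calling `TagFilter.stripTagsExcept(html, "MarineMammal.html")` on the sample would leave only the `marine mammals.` link in place, and `TagFilter.textAround(html, "MarineMammal.html")` would give the plain text on either side of it, which can then be split into words.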
【Discussion】:
-
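Have you tried anything? Please show some code so we can see what you have attempted so far. There are plenty of good answers on parsing HTML in Java. This sounds like what you want: stackoverflow.com/questions/240546/…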
-
Sorry, I was only able to comment a few minutes ago, because JavaScript was not enabled for SO.
-
If you explain what you are trying to achieve, it will be easier to help you.