Removing HTML tags from Java HashMap values efficiently: answers

【Question title】: Removing HTML tags from Java HashMap values in an efficient way
【Posted】: 2021-11-10 19:18:38
【Question】:

How can HTML tags be removed from the values of a Java HashMap in an efficient way?

public static void main(String[] args) {
    HashMap<String, String> hm = new HashMap<>();
    hm.put("A", "Apple");
    hm.put("B", "<b>Ball</b>");
    hm.put("C", "Cat");
    hm.put("D", "Dog");
    hm.put("E", "<h1>Elephant</h1>");
}

// Only the HTML tags should be removed: B = <b>Ball</b> should become B = Ball,
// and E = <h1>Elephant</h1> should become E = Elephant.

【Comments】:

Tags: java hashmap iterator coding-efficiency

【Answer 1】:

import java.util.HashMap;
import java.util.stream.Collectors;
import java.util.Map;
public class MyClass {
    public static void main(String args[]) {
        HashMap<String, String> hm = new HashMap<>();
        hm.put("A", "Apple");
        hm.put("B", "<b>Ball</b>");
        hm.put("C", "Cat");
        hm.put("D", "Dog");
        hm.put("E", "<h1>Elephant</h1>");
        
        // Build a new map with the same keys and every "<...>" sequence removed from the values.
        Map<String, String> newHm = hm.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        e -> e.getValue().replaceAll("<[^>]*>", "")));

        System.out.println(newHm);
    }
}

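【Comments】:

【Answer 2】:

The method Map::replaceAll accepts a function that computes a replacement for each value.

In this case, a regular expression and the method String::replaceAll can be used to remove the HTML tags from the values: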

hm.replaceAll((k, v) -> v.replaceAll("(\\<\\w+\\>)(.*)(\\</\\w+\\>)", "$2"));

System.out.println(hm);

The output shows that the Ball and Elephant values have been stripped of their HTML tags:

{A=Apple, B=Ball, C=Cat, D=Dog, E=Elephant}

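The regular expression "(\\<\\w+\\>)(.*)(\\</\\w+\\>)" looks for a sequence made up of an opening tag (\\<\\w+\\>), a closing tag (\\</\\w+\\>), and any text (.*) between them; the replacement "$2" keeps only that text.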
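
Since the question asks about efficiency, it may be worth noting that String::replaceAll compiles its regular expression on every call. Below is a minimal sketch of the same Map::replaceAll idea with a precompiled Pattern. The names TagStripper, TAG, stripTags, and stripAll are hypothetical, and the pattern <[^>]*> rests on the assumption that any <...> sequence is a tag (it also covers tags with attributes and nested tags, which a single pass of the \\w+ based expression above does not).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

class TagStripper {
    // Compiled once and reused; String.replaceAll would recompile the regex on each call.
    // Assumption: every "<...>" sequence in a value is an HTML tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Removes tag-like sequences from a single value.
    static String stripTags(String value) {
        // Cheap pre-check: a value without '<' cannot contain a tag.
        if (value == null || value.indexOf('<') < 0) {
            return value;
        }
        return TAG.matcher(value).replaceAll("");
    }

    // Cleans every value of the map in place.
    static void stripAll(Map<String, String> map) {
        map.replaceAll((key, value) -> stripTags(value));
    }

    public static void main(String[] args) {
        Map<String, String> hm = new HashMap<>();
        hm.put("A", "Apple");
        hm.put("B", "<b>Ball</b>");
        hm.put("E", "<h1><i>Elephant</i></h1>");
        stripAll(hm);
        System.out.println(hm);
    }
}
```

Whether the precompiled pattern makes a measurable difference depends on the size of the map; for a handful of entries, the one-liner above is probably just as good.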

【Comments】:

【Answer 3】:

    @Test
    public void test1() {
        final Map<String, String> hm = new HashMap<>();
        hm.put("A", "Apple");
        hm.put("B", "<b>Ball</b>");
        hm.put("C", "Cat");
        hm.put("D", "Dog");
        hm.put("E", "<h1>Elephant</h1>");

        hm.entrySet().stream()
                .forEach(entry -> entry.setValue(entry.getValue().replaceAll("</.*>", "").replaceAll("<.*>", "")));
        assertEquals("Ball", hm.get("B"));
        assertEquals("Elephant", hm.get("E"));
    }

Be sure to replace the closing tags first.

This also works with multiple nested tags (e.g. <h1><b>Elephant</b></h1>).
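
One caveat with the greedy .* patterns: they handle nested tags, but when a value holds several separate tag pairs, the text between them is swallowed as well. The sketch below compares the two-step greedy replacement with a reluctant <.*?> pattern; the class and method names (GreedyVsReluctant, greedy, reluctant) and the sample strings are made up for illustration.

```java
class GreedyVsReluctant {
    // The approach from this answer: greedy patterns, closing tags first.
    static String greedy(String s) {
        return s.replaceAll("</.*>", "").replaceAll("<.*>", "");
    }

    // Reluctant variant: each match stops at the first '>' after a '<'.
    static String reluctant(String s) {
        return s.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String nested = "<h1><b>Elephant</b></h1>";
        String siblings = "<b>Big</b> grey <i>Elephant</i>";

        System.out.println(greedy(nested));      // Elephant
        System.out.println(reluctant(nested));   // Elephant
        System.out.println(greedy(siblings));    // Big
        System.out.println(reluctant(siblings)); // Big grey Elephant
    }
}
```

For values that contain at most one outer tag pair, as in the question, both variants give the same result.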

【Comments】:

【Answer 4】:

You can do this in several ways. The two simplest are:

Regular expression: match the HTML tags and remove them from the string

private static String removeHtmlTags(String input) {
    return input.replaceAll("<.*?>", "");
}

Use an external library such as Jsoup to parse the string as HTML and extract its text content. The drawback is that you have to add it to your pom.xml.

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.14.2</version>
</dependency>

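import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

private static String removeHtmlTagsUsingParser(String input) {
    // Parse the value as HTML and return only its text content.
    Document document = Jsoup.parse(input);
    return document.text();
}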

【Comments】: