Jsoup converts unicode entities to html entities while parsing: answers

[Question title]: Jsoup converting unicode entities to html entities while parsing
[Posted]: 2016-07-20 07:22:32
[Question]:

I have a string like the one below:

String input = "<div class=\"prov2Txt\">(2)&#x00a0;&#x00a0;Notwithstanding anything in any other written law and notwithstanding the making of an oath or declaration of secrecy, a person shall not be guilty of an offence by reason of anything done by him for the purposes of section&#x00a0;6.</div>";

I'm parsing it with Jsoup, and Jsoup converts all the Unicode entities (&#x00a0;) to HTML entities.

Document d = Jsoup.parse(input);
d.outputSettings(new Document.OutputSettings().prettyPrint(false));

This code converts &#x00a0; into its equivalent named HTML entity (&nbsp;).
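Why the original spelling disappears: while parsing, jsoup decodes a numeric character reference such as &#x00a0; into the character itself (U+00A0, a non-breaking space), so the parsed document no longer records how that character was written. On output, the escape mode alone decides the spelling, and the default base mode writes U+00A0 as &nbsp;. The sketch below is a simplified model of that round trip, not jsoup's actual code; the class NbspRoundTrip and its methods decodeHexRefs and escapeNbsp are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of a parse/serialize round trip (not jsoup's implementation).
public class NbspRoundTrip {
    // Matches hexadecimal numeric character references such as "&#x00a0;" or "&#xa0;".
    private static final Pattern HEX_REF = Pattern.compile("&#[xX]([0-9a-fA-F]+);");

    // Parse side: replace each hex reference with the character it names,
    // so "&#x00a0;" and "&#xa0;" both become the same single character U+00A0.
    public static String decodeHexRefs(String html) {
        Matcher m = HEX_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1), 16);
            String ch = new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Serialize side: the original spelling is gone, so only the mode picks the output form.
    // Models jsoup writing U+00A0 as "&nbsp;" in base mode and "&#xa0;" in xhtml mode.
    public static String escapeNbsp(String text, boolean xhtmlMode) {
        return text.replace("\u00a0", xhtmlMode ? "&#xa0;" : "&nbsp;");
    }

    public static void main(String[] args) {
        String decoded = decodeHexRefs("section&#x00a0;6.");
        System.out.println(escapeNbsp(decoded, false)); // section&nbsp;6.
        System.out.println(escapeNbsp(decoded, true));  // section&#xa0;6.
    }
}
```

Because both spellings collapse to one character during parsing, no output setting can tell which spelling the input originally used.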

Now I want to keep all the Unicode entities unchanged after parsing the input string.

[Question discussion]:

Tags: java jsoup

[Solution 1]:

The xhtml escape mode may suit your needs:

d.outputSettings(new Document.OutputSettings().escapeMode(Entities.EscapeMode.xhtml).prettyPrint(false));

It turns &#x00a0; into &#xa0; (the same character, U+00A0, written without the leading zeros).
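If the exact four-digit form &#x00a0; matters (for example, to diff the output against the original input), one possible workaround is to post-process the xhtml-mode output with a plain regular expression. This is a minimal sketch under assumptions: PadHexEntities and padHexRefs are hypothetical names, and the method pads every short hexadecimal reference to four digits, so it would also rewrite references that were already short in the original input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processing helper; it is not part of jsoup.
public class PadHexEntities {
    // Matches hex references with 1 to 3 digits, e.g. "&#xa0;"; four-digit ones are left alone.
    private static final Pattern SHORT_HEX_REF = Pattern.compile("&#x([0-9a-fA-F]{1,3});");

    // Left-pads the digits of each short reference to four, so "&#xa0;" becomes "&#x00a0;".
    // Assumption: every short reference in the text should be padded, not just U+00A0.
    public static String padHexRefs(String html) {
        Matcher m = SHORT_HEX_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String hex = m.group(1);
            String padded = "0000".substring(hex.length()) + hex;
            m.appendReplacement(out, "&#x" + padded + ";");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(padHexRefs("section&#xa0;6.")); // section&#x00a0;6.
    }
}
```

Usage, assuming d is the parsed Document configured with the xhtml escape mode as above: String out = PadHexEntities.padHexRefs(d.body().html());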

[Discussion]:

I have tried all of these; none of them work for me: document.outputSettings(new Document.OutputSettings().prettyPrint(false)); document.outputSettings(new Document.OutputSettings()); // Adjust escape mode document.outputSettings().escapeMode();
@vinaykaushik I don't see the code from the answer in what you posted. Did you try it?
Yes, I have tried it.
@vinaykaushik What are your JVM version, Jsoup version and Java locale?
java version "1.8.0_91" Java(TM) SE Runtime Environment (build 1.8.0_91-b15) Jsoup 1.9.1 Locale : en-us;English (United States)