【Question title】: How to unescape tags without unescaping content
【Posted】: 2013-12-30 08:45:58
【Question】:

How can I unescape only the tags and leave the content escaped? Let me explain with an example...

This is the original raw response:

&lt;GetWhoISResponse xmlns=&quot;http://www.webservicex.net&quot;&gt;
         &lt;GetWhoISResult&gt;Whois Server Version 2.0

To single out one record, look it up with &quot;xxx&quot;, where xxx is one of the
of the records displayed above. If the records are the same, look them up
with &quot;=xxx&quot; to receive a full display for each record.

&gt;&gt;&gt; Last update of whois database: Mon, 30 Dec 2013 08:20:00 UTC &lt;&lt;&lt;

NOTICE: The expiration date displayed in this record is the date the 
registrar&apos;s sponsorship of the domain name registration in the registry is 
currently set to expire. This date does not necessarily reflect the expiration 
date of the domain name registrant&apos;s agreement with the sponsoring 
registrar.  Users may consult the sponsoring registrar&apos;s Whois database to 
view the registrar&apos;s reported date of expiration for this registration.

&lt;/GetWhoISResult&gt;
      &lt;/GetWhoISResponse&gt;

If I unescape the text with StringEscapeUtils (unescapeXml), I get:

<GetWhoISResponse xmlns="http://www.webservicex.net">
    <GetWhoISResult>Whois Server Version 2.0

To single out one record, look it up with "xxx", where xxx is one of the
of the records displayed above. If the records are the same, look them up
with "=xxx" to receive a full display for each record.

>>> Last update of whois database: Mon, 30 Dec 2013 08:20:00 UTC <<<

NOTICE: The expiration date displayed in this record is the date the 
registrar's sponsorship of the domain name registration in the registry is 
currently set to expire. This date does not necessarily reflect the expiration 
date of the domain name registrant's agreement with the sponsoring 
registrar.  Users may consult the sponsoring registrar's Whois database to 
view the registrar's reported date of expiration for this registration.

    </GetWhoISResult>
</GetWhoISResponse>

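The problem is in the middle, on the line whose &lt; and &gt; entities belong to the content: unescapeXml turns them into literal < and > along with the tags. I need them to stay escaped because I want to convert the result to JSON, and right now I get a parse error.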
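
To make the failure concrete, here is a minimal sketch that uses only the JDK's built-in XML parser. The class name `UnescapeProblemDemo`, the helper `isWellFormed`, and the short `<r>` sample are made up for illustration; they stand in for the whois response above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class UnescapeProblemDemo {

    // Hypothetical helper: returns true if the string parses as well-formed XML.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the input
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Tags unescaped, content still escaped: this is what the question asks for.
        String wanted = "<r>a &gt;&gt;&gt; b &lt;&lt;&lt; c</r>";
        // Everything unescaped: the literal "<<<" in the content breaks the parser.
        String actual = "<r>a >>> b <<< c</r>";

        System.out.println(isWellFormed(wanted)); // prints "true"
        System.out.println(isWellFormed(actual)); // prints "false"
    }
}
```

A literal `>` in text is tolerated by XML parsers; it is the literal `<` that makes the second string fail.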

【Comments】:

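  • Does the >> stay the same in every version of the XML-ish data you retrieve, or are you looking for a general solution for special characters in the content?
  • I am looking for a general solution, because I don't know what the responses will look like. This is just the first problem I ran into :)
  • This is an interesting problem :)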

Tags: java xml json escaping


【Answer 1】:

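This is an interesting problem. I tried lenient XML parsers, but they do not seem able to parse the broken XML. The next best option is a regular expression, with which I managed to parse the given XML. The caveat is that the less-than and greater-than signs in the content must not form a tag-like pattern, for example: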

< some random text here and >

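After some research I settled on two regex patterns for the given XML (they should also work for the generic case, subject to the caveat above):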

public static final String LESSER_STRING = "<(.[^>]*)(<)+";
public static final String GREATER_STRING = ">[^<](.[^<]*)(>)+";

These strings are compiled into the regex patterns that the matcher uses to scan the input sequence.

Here is the working code with its output:

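    // Needs java.util.regex.Matcher, java.util.regex.Pattern and
    // StringEscapeUtils from Apache Commons Lang on the classpath.
    public static final String LESSER_STRING = "<(.[^>]*)(<)+";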
    public static final String GREATER_STRING = ">[^<](.[^<]*)(>)+";
    public static final String ESCAPED_XML = "&lt;GetWhoISResponse xmlns=&quot;http://www.webservicex.net&quot;&gt;&lt;GetWhoISResult&gt;Whois Server Version 2.0 To single out one record, look it up with &quot;xxx&quot;, where xxx is one of the of the records displayed above. If the records are the same, look them up with &quot;=xxx&quot; to receive a full display for each record. &gt;&gt;&gt; Last update of whois database: Mon, 30 Dec 2013 08:20:00 UTC &lt;&lt;&lt; NOTICE: The expiration date displayed in this record is the date the registrar&apos;s sponsorship of the domain name registration in the registry is currently set to expire. This date does not necessarily reflect the expiration date of the domain name registrant&apos;s agreement with the sponsoring registrar.  Users may consult the sponsoring registrar&apos;s Whois database to view the registrar&apos;s reported date of expiration for this registration.&lt;/GetWhoISResult&gt;&lt;/GetWhoISResponse&gt;";
    private static Matcher matcher;
    private static Pattern pattern;
    private static String alter;
    private static StringBuffer str = new StringBuffer();
    private static StringBuffer jsonString = new StringBuffer();

    public static void main(String[] args) {
        String xml = StringEscapeUtils.unescapeXml(ESCAPED_XML);

        pattern = Pattern.compile(GREATER_STRING);
        matcher = pattern.matcher(xml);

        while (matcher.find()) {
            System.out.println(matcher.group(0));
            System.out.println(matcher.group(0).substring(1));

            // Find the first encountered greater than sign assuming greater
            // than and less than do not form a 'tag' pattern

            // Picks the first value after the 'last opened tag' including the
            // greater sign - take substring 1
            alter = ">" + matcher.group(0).substring(1).replaceAll(">", "&gt;");
            // Quote the replacement so '$' or '\' in the content is taken literally
            matcher.appendReplacement(str, Matcher.quoteReplacement(alter));
        }

        matcher.appendTail(str);

        pattern = Pattern.compile(LESSER_STRING);
        matcher = pattern.matcher(str);

        while (matcher.find()) {
            System.out.println(matcher.group(0));
            System.out.println(matcher.group(0).substring(0,
                    matcher.group(0).length() - 1));

            // Find the encountered lesser than sign assuming greater
            // than and less than do not form a 'tag' pattern

            // Picks the content between the lesser tags and the last opened
            // tag; including the lesser sign of the tag
            // Reduce it by 1 to prevent the last tag getting replaced
            alter = matcher.group(0)
                    .substring(0, matcher.group(0).length() - 1);

            // Add the last tag as is without replacing
            alter = alter.replaceAll("<", "&lt;") + "<";
            // Quote the replacement so '$' or '\' in the content is taken literally
            matcher.appendReplacement(jsonString, Matcher.quoteReplacement(alter));

        }

        matcher.appendTail(jsonString);

        System.out.println(jsonString);
    }

Output:

<GetWhoISResponse xmlns="http://www.webservicex.net"><GetWhoISResult>Whois Server Version 2.0 To single out one record, look it up with "xxx", where xxx is one of the of the records displayed above. If the records are the same, look them up with "=xxx" to receive a full display for each record. &gt;&gt;&gt; Last update of whois database: Mon, 30 Dec 2013 08:20:00 UTC &lt;&lt;&lt; NOTICE: The expiration date displayed in this record is the date the registrar's sponsorship of the domain name registration in the registry is currently set to expire. This date does not necessarily reflect the expiration date of the domain name registrant's agreement with the sponsoring registrar.  Users may consult the sponsoring registrar's Whois database to view the registrar's reported date of expiration for this registration.</GetWhoISResult></GetWhoISResponse>
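
For reuse, the same two passes can be packaged as a single method. The following is only a sketch of that idea: it keeps the two patterns from this answer unchanged, but the class name `ContentReescaper`, the method name `reescapeContent`, and the short sample inputs are assumptions made for illustration. It uses only the JDK, so it expects a string that has already been unescaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentReescaper {

    // The two patterns from the answer, compiled once.
    private static final Pattern GREATER = Pattern.compile(">[^<](.[^<]*)(>)+");
    private static final Pattern LESSER = Pattern.compile("<(.[^>]*)(<)+");

    // Hypothetical helper: re-escapes stray '>' and '<' inside element content.
    public static String reescapeContent(String unescapedXml) {
        // Pass 1: keep the '>' that closes the tag, escape every later '>' in the match.
        StringBuffer afterGreater = new StringBuffer();
        Matcher m = GREATER.matcher(unescapedXml);
        while (m.find()) {
            String fixed = ">" + m.group().substring(1).replace(">", "&gt;");
            // quoteReplacement keeps '$' and '\' in the content literal.
            m.appendReplacement(afterGreater, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(afterGreater);

        // Pass 2: escape every '<' in the match except the last one, which opens the next tag.
        StringBuffer result = new StringBuffer();
        m = LESSER.matcher(afterGreater);
        while (m.find()) {
            String match = m.group();
            String fixed = match.substring(0, match.length() - 1).replace("<", "&lt;") + "<";
            m.appendReplacement(result, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(reescapeContent("<a><b>x >>> y <<< z</b></a>"));
        // prints: <a><b>x &gt;&gt;&gt; y &lt;&lt;&lt; z</b></a>
    }
}
```

The caveat from the answer still applies: for an input such as `<a>x < foo > y</a>`, where the stray signs look like a tag, neither pattern matches and the string comes back unchanged.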

【Comments】:

  • Top solution! :)

【Answer 2】:

You can read the content and replace "<" and ">" again.

【Comments】:

  • How do you replace them again automatically? Your answer is incomplete.
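
One possible reading of this answer, sketched under strong assumptions: the name of the element that carries the free text is known in advance (here `GetWhoISResult`), it occurs once, it has no attributes, and it contains only text. The class name `KnownElementEscaper` and the method name `escapeElementText` are hypothetical. This is not a general solution, because it only helps when the response layout is known.

```java
public class KnownElementEscaper {

    // Hypothetical helper: re-escapes everything between <elementName> and </elementName>.
    public static String escapeElementText(String unescapedXml, String elementName) {
        String open = "<" + elementName + ">";
        String close = "</" + elementName + ">";

        int start = unescapedXml.indexOf(open);
        int end = unescapedXml.lastIndexOf(close);
        int contentStart = start + open.length();
        if (start < 0 || end < contentStart) {
            return unescapedXml; // element not found, leave the input untouched
        }

        // Escape '&' first so the entities added afterwards are not escaped twice.
        String content = unescapedXml.substring(contentStart, end)
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");

        return unescapedXml.substring(0, contentStart) + content + unescapedXml.substring(end);
    }

    public static void main(String[] args) {
        String xml = "<GetWhoISResponse><GetWhoISResult>a >>> b <<< c</GetWhoISResult></GetWhoISResponse>";
        System.out.println(escapeElementText(xml, "GetWhoISResult"));
        // prints: <GetWhoISResponse><GetWhoISResult>a &gt;&gt;&gt; b &lt;&lt;&lt; c</GetWhoISResult></GetWhoISResponse>
    }
}
```

Unlike the regex approach in Answer 1, this also handles content whose stray signs look like a tag, but only inside the one element that is named explicitly.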