【Question title】: Why is my XSLT here stripping HTML tags?
【Posted】: 2014-01-24 21:12:27
【Question】:

I am using XSLT 1.0 to transform some XML into JSON output. Unfortunately, some of the XML I am working with contains HTML markup. Here is a sample of the XML input:

 <text>
 Kevin Love and Steph Curry can talk about their first-
 time starting gigs in the All-Star game Friday night when the Minnesota
 Timberwolves visit Oracle Arena to face the Golden State Warriors.
</text>
  <continue>
    <P>
 Love and Curry were two of four first-time All-Star starters when the league
 made the announcement on Thursday.
</P>
    <P>
 Love got a late push to overtake Houston Rockets center Dwight Howard in the
 final week of voting.
</P>
    <P>
 "I think it's a little sweeter this way because I really didn't expect it,"
 Love said on a conference call. "I was already humbled by the response the
 fans gave me to being very close to the top (frontcourt players). The outreach
 by the Minnesota fans and beyond was truly amazing."
</P>
</continue>

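The markup is not ideal, but I need to preserve the <P> tags in my JSON output. To handle the double quotes, I escape them. Here is the template I use to handle this: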

<xsl:variable name="escaped-continue">
      <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="continue"/>
        <xsl:with-param name="replace" select="'&quot;'" />
        <xsl:with-param name="with" select="'\&quot;'"/>
      </xsl:call-template>
    </xsl:variable>
     <xsl:variable name="escaped-text">
      <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="text"/>
        <xsl:with-param name="replace" select="'&quot;'" />
        <xsl:with-param name="with" select="'\&quot;'"/>
      </xsl:call-template>
    </xsl:variable>
 <xsl:template name="replace-string">
        <xsl:param name="text"/>
        <xsl:param name="replace"/>
        <xsl:param name="with"/>
        <xsl:choose>
            <xsl:when test="contains($text,$replace)">
                <xsl:value-of select="substring-before($text,$replace)"/>
                <xsl:value-of select="$with"/>
                <xsl:call-template name="replace-string">
                    <xsl:with-param name="text"
                        select="substring-after($text,$replace)"/>
                    <xsl:with-param name="replace" select="$replace"/>
                    <xsl:with-param name="with" select="$with"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text"/>
            </xsl:otherwise>
        </xsl:choose>
   </xsl:template>

I then output the JSON with something like the following:

{
    "text": "<xsl:value-of select="normalize-space($escaped-text)"/>", 
    "continue": "<xsl:value-of select="normalize-space($escaped-continue)"/>"
}

The problem is that the output looks like this:

{
 "text": "Kevin Love and Steph Curry can talk about their first- time starting gigs in the All-Star game Friday night when the Minnesota Timberwolves visit Oracle Arena to face the Golden State Warriors.", 
  "continue": "Love and Curry were two of four first-time All-Star starters when the league made the announcement on Thursday. Love got a late push to overtake Houston Rockets center Dwight Howard in the final week of voting. \"I think it's a little sweeter this way because I really didn't expect it,\" Love said on a conference call. \"I was already humbled by the response the fans gave me to being very close to the top (frontcourt players). The outreach by the Minnesota fans and beyond was truly amazing.\"
}

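As you can see, the double quotes are escaped correctly, but the <P> tags have been stripped and/or parsed away by the XSLT processor and then suppressed by normalize-space(). What is the best way to get the <P> tags back into my output here?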

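【Question discussion】:

  • Actually, I think the problem here is that you first extract the text into your escaped-text variable. If you want the elements in there, you need more than just the text nodes.

Tags: html xml json xslt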


【Solution 1】:

Try it this way:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes" />

<xsl:template match="/root">
    <xsl:text>{&#10;"text": "</xsl:text>
    <xsl:apply-templates select="text/text()"/>
    <xsl:text>"&#10;"continue": "</xsl:text>
    <xsl:apply-templates select="continue/*"/>
    <xsl:text>"&#10;}</xsl:text>
</xsl:template>

<xsl:template match="*">
    <xsl:copy>
        <xsl:apply-templates/>
    </xsl:copy>
</xsl:template>

<xsl:template match="text()">
<xsl:variable name="escaped-text">
    <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="."/>
        <xsl:with-param name="replace" select="'&quot;'" />
        <xsl:with-param name="with" select="'\&quot;'"/>
    </xsl:call-template>
</xsl:variable>
<xsl:value-of select="normalize-space($escaped-text)"/>
</xsl:template>

<xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="with"/>
    <xsl:choose>
        <xsl:when test="contains($text,$replace)">
            <xsl:value-of select="substring-before($text,$replace)"/>
            <xsl:value-of select="$with"/>
            <xsl:call-template name="replace-string">
                <xsl:with-param name="text"
                    select="substring-after($text,$replace)"/>
                <xsl:with-param name="replace" select="$replace"/>
                <xsl:with-param name="with" select="$with"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

Applied to a modified version of your input (with a root element and some extra markup added for testing):

<root>
    <text>
    Kevin Love and Steph Curry can talk about their first-
    time starting gigs in the All-Star game Friday night when the Minnesota
    Timberwolves visit Oracle Arena to face the Golden State Warriors.
    </text>
    <continue>
        <P>
        Love and Curry were <i>two of <b>four</b> first-time All-Star</i> starters when the league
        made the announcement on Thursday.
        </P>
        <P>
        Love got a late push to overtake Houston Rockets center Dwight Howard in the
        final week of voting.
        </P>
        <P>
        "I think it's a little sweeter this way because I really didn't expect it,"
        Love said on a conference call. "I was already humbled by the response the
        fans gave me to being very close to the top (frontcourt players). The outreach
        by the Minnesota fans and beyond was truly amazing."
        </P>
    </continue>
</root>

produces the following result:

{
"text": "Kevin Love and Steph Curry can talk about their first- time starting gigs in the All-Star game Friday night when the Minnesota Timberwolves visit Oracle Arena to face the Golden State Warriors."
"continue": "<P>Love and Curry were<i>two of<b>four</b>first-time All-Star</i>starters when the league made the announcement on Thursday.</P><P>Love got a late push to overtake Houston Rockets center Dwight Howard in the final week of voting.</P><P>\"I think it's a little sweeter this way because I really didn't expect it,\" Love said on a conference call. \"I was already humbled by the response the fans gave me to being very close to the top (frontcourt players). The outreach by the Minnesota fans and beyond was truly amazing.\"</P>"
}
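
The result above still has two rough edges: there is no comma between the "text" and "continue" members, and calling normalize-space() on every text node drops the spaces next to inline elements (for example `were<i>two`). The following is a minimal sketch of one way to address both, assuming the same `<root>` input. The boundary-space tests in the text() template are an illustrative addition, not part of the answer above, and element attributes are still not copied.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes"/>

<xsl:template match="/root">
    <xsl:text>{&#10;"text": "</xsl:text>
    <xsl:apply-templates select="text/text()"/>
    <!-- comma added between the two JSON members -->
    <xsl:text>",&#10;"continue": "</xsl:text>
    <xsl:apply-templates select="continue/*"/>
    <xsl:text>"&#10;}</xsl:text>
</xsl:template>

<!-- copy each element (without attributes) and recurse into its children -->
<xsl:template match="*">
    <xsl:copy>
        <xsl:apply-templates/>
    </xsl:copy>
</xsl:template>

<xsl:template match="text()">
    <xsl:variable name="norm" select="normalize-space(.)"/>
    <!-- assumption: keep one leading space when this text follows a sibling node -->
    <xsl:if test="preceding-sibling::node() and normalize-space(substring(., 1, 1)) = ''">
        <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="$norm"/>
        <xsl:with-param name="replace" select="'&quot;'"/>
        <xsl:with-param name="with" select="'\&quot;'"/>
    </xsl:call-template>
    <!-- assumption: keep one trailing space when non-blank text precedes a sibling node -->
    <xsl:if test="$norm != '' and following-sibling::node() and normalize-space(substring(., string-length(.), 1)) = ''">
        <xsl:text> </xsl:text>
    </xsl:if>
</xsl:template>

<!-- same recursive string replacement as in the question -->
<xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="with"/>
    <xsl:choose>
        <xsl:when test="contains($text,$replace)">
            <xsl:value-of select="substring-before($text,$replace)"/>
            <xsl:value-of select="$with"/>
            <xsl:call-template name="replace-string">
                <xsl:with-param name="text" select="substring-after($text,$replace)"/>
                <xsl:with-param name="replace" select="$replace"/>
                <xsl:with-param name="with" select="$with"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>
```

This sketch only escapes double quotes. Backslashes and control characters in the source text would also need escaping before the output is valid JSON in the general case.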

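【Discussion】:

【Solution 2】:

That is how xsl:value-of is defined to behave: it outputs the string value of the selected node, which contains no markup. If you want to keep the tags, use xsl:copy-of.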
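
The difference can be seen in a minimal sketch, assuming the input is wrapped in a `<root>` element as in Solution 1. The `<result>`, `<as-string>` and `<as-copy>` wrapper elements are hypothetical names used only for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:template match="/root">
    <result>
        <!-- string value: the text of all descendants concatenated, <P> tags gone -->
        <as-string>
            <xsl:value-of select="continue"/>
        </as-string>
        <!-- deep copy: the <P> children are written out as elements -->
        <as-copy>
            <xsl:copy-of select="continue/node()"/>
        </as-copy>
    </result>
</xsl:template>

</xsl:stylesheet>
```

Note that xsl:copy-of copies the nodes unchanged, so it leaves no room for escaping the quotes inside them.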

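【Discussion】:

• If I need to process the content, I don't see how I can use xsl:copy-of.
• Then use xsl:apply-templates to process the content as a tree of elements.

【Solution 3】:

When you pass continue as the text parameter for escaped-continue, you are stripping the <p> tags at that step. You can use EXSLT node-sets with XSLT 1.0 and process the nodes inside the replace-string template, or rewrite your escaped-continue to walk the nodes and text, calling replace-string only for the text nodes.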
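
The second option can be sketched while keeping the variable-based structure of the question. This is only an illustration: the `escape` mode name and the `<root>` wrapper are assumptions, and whitespace normalization is left out for brevity, so line breaks inside the strings would still need handling. In XSLT 1.0 the variable holds a result tree fragment, which xsl:copy-of can output with its markup. An extension such as exsl:node-set() would only be needed to navigate into that fragment again with path expressions.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:template match="/root">
    <!-- build the escaped content as a result tree fragment that keeps the elements -->
    <xsl:variable name="escaped-continue">
        <xsl:apply-templates select="continue/node()" mode="escape"/>
    </xsl:variable>
    <xsl:text>{ "continue": "</xsl:text>
    <!-- copy-of keeps the markup; value-of would flatten it to text again -->
    <xsl:copy-of select="$escaped-continue"/>
    <xsl:text>" }</xsl:text>
</xsl:template>

<!-- elements: shallow copy (no attributes), then recurse in the same mode -->
<xsl:template match="*" mode="escape">
    <xsl:copy>
        <xsl:apply-templates mode="escape"/>
    </xsl:copy>
</xsl:template>

<!-- text nodes: the only place where replace-string is called -->
<xsl:template match="text()" mode="escape">
    <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="."/>
        <xsl:with-param name="replace" select="'&quot;'"/>
        <xsl:with-param name="with" select="'\&quot;'"/>
    </xsl:call-template>
</xsl:template>

<!-- same recursive string replacement as in the question -->
<xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="with"/>
    <xsl:choose>
        <xsl:when test="contains($text,$replace)">
            <xsl:value-of select="substring-before($text,$replace)"/>
            <xsl:value-of select="$with"/>
            <xsl:call-template name="replace-string">
                <xsl:with-param name="text" select="substring-after($text,$replace)"/>
                <xsl:with-param name="replace" select="$replace"/>
                <xsl:with-param name="with" select="$with"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>
```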

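【Discussion】:

• And/or restructure the whole thing around <xsl:apply-templates/> and matching templates instead of explicitly calling named templates. If you write XSLT procedurally, you will almost always make life harder than it needs to be; learn to use it as a rule-based language.
• "When you pass continue as the text parameter for escaped-continue, you are stripping the <p> tags at that step." I'm afraid you are mistaken. The stripping happens later, when the passed parameter is processed as a string.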