Question title: Flatten child elements nested within text nodes
Posted: 2015-01-03 17:02:55
Question:

There are many flattening questions here, but none of them deals with this level of complexity.

I have an XML document that looks like this:

<document>
<div class='target-one'>
    maybe some text node, maybe not...1
    <randomElement>
        maybe some text node, maybe not...2
    </randomElement>

    <div class='target-one'>
        <randomElement>
            maybe some text node, maybe not...3
        </randomElement>
    </div>
    maybe some text node, maybe not...4
    <randomElement>
        maybe some text node, maybe not...5
    </randomElement>

    <div class='target-two'>
        maybe some text node, maybe not...6
        <randomElement>
            maybe some text node, maybe not...7
        </randomElement>
    </div>
    maybe some text node, maybe not...8
    <randomElement>
        maybe some text node, maybe not...9
    </randomElement>
</div>
<div class='target-two'>
    maybe some text node, maybe not...10
    <randomElement>
        maybe some text node, maybe not...11
    </randomElement>

    <div class='target-one'>
        <randomElement>
            maybe some text node, maybe not...12
        </randomElement>
    </div>
    maybe some text node, maybe not...13
    <randomElement>
        maybe some text node, maybe not...14
    </randomElement>

    <div class='target-two'>
        maybe some text node, maybe not...15
        <randomElement>
            maybe some text node, maybe not...16
        </randomElement>
    </div>
    maybe some text node, maybe not...17
    <randomElement>
        maybe some text node, maybe not...18
    </randomElement>
</div>

</document>

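So there is a list of target elements that can be nested in any order. Where they are nested, I want to flatten them by adding more parent elements that wrap the randomElements and text nodes separately, turning target children into target siblings. By that I mean the output should look like this: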

<document>
<div class='target-one'>
    maybe some text node, maybe not...1
    <randomElement>
        maybe some text node, maybe not...2
    </randomElement>
</div>
<div class='target-one'>
    <randomElement>
        maybe some text node, maybe not...3
    </randomElement>
</div>
<div class='target-one'>
    maybe some text node, maybe not...4
    <randomElement>
        maybe some text node, maybe not...5
    </randomElement>
</div>
<div class='target-two'>
    maybe some text node, maybe not...6
    <randomElement>
        maybe some text node, maybe not...7
    </randomElement>
</div>
<div class='target-one'>
    maybe some text node, maybe not...8
    <randomElement>
        maybe some text node, maybe not...9
    </randomElement>
</div>
<div class='target-two'>
    maybe some text node, maybe not...10
    <randomElement>
        maybe some text node, maybe not...11
    </randomElement>
</div>
<div class='target-one'>
    <randomElement>
        maybe some text node, maybe not...12
    </randomElement>
</div>
<div class='target-two'>
    maybe some text node, maybe not...13
    <randomElement>
        maybe some text node, maybe not...14
    </randomElement>
</div>
<div class='target-two'>
    maybe some text node, maybe not...15
    <randomElement>
        maybe some text node, maybe not...16
    </randomElement>
</div>
<div class='target-two'>
    maybe some text node, maybe not...17
    <randomElement>
        maybe some text node, maybe not...18
    </randomElement>
</div>

</document>

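So I end up with more parent divs, but all of the text and other nodes are in the right place. Note that randomElement could be a div that does not have a target class...

This is for reformatting e-books for pagination in an online library, so there may be a large number of elements before we actually hit a problem div. We therefore need some way to select all of the elements and text nodes between problem child divs as a group; wrapping each of them in its own div would be useless, because we would end up with every p, em or span as its own page.

Meanwhile, most parent divs have no problem children. As long as the solution passes through them, I can clean up any empty divs in another run, but I do need it to handle text without child elements, at least at a basic level.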
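A minimal sketch of the requested transformation, written in Python with the standard library's `xml.etree.ElementTree` instead of XSLT so it can be run and checked on its own. It assumes the target classes are exactly `target-one` and `target-two`, and, like `xsl:strip-space`, it drops whitespace-only text so no empty wrapper divs are created; `flatten`, `flatten_document` and the shortened `SAMPLE` (digits standing in for the text nodes) are hypothetical. ElementTree stores text in `.text` and `.tail` rather than as separate text nodes, which is why tails are handled explicitly.

```python
import copy
import xml.etree.ElementTree as ET

# Assumption: these are the only target classes.
TARGET_CLASSES = {"target-one", "target-two"}


def is_target(el):
    return el.tag == "div" and el.get("class") in TARGET_CLASSES


def flatten(target):
    """Return a flat list of divs built from one (possibly nested) target div.

    Every maximal run of non-target content is wrapped in a new element with
    the same tag and attributes as `target`; nested target divs are flattened
    recursively and emitted in document order.
    """
    out = []
    current = None  # wrapper for the run being collected, if any

    def open_wrapper():
        nonlocal current
        if current is None:
            current = ET.Element(target.tag, dict(target.attrib))
            out.append(current)
        return current

    def add_text(s):
        # Whitespace-only text never opens a wrapper, so no empty divs appear.
        if s and s.strip():
            w = open_wrapper()
            if len(w):
                w[-1].tail = (w[-1].tail or "") + s
            else:
                w.text = (w.text or "") + s

    add_text(target.text)
    for child in target:
        if is_target(child):
            out.extend(flatten(child))
            current = None  # content after a nested target starts a new run
        else:
            clone = copy.deepcopy(child)
            clone.tail = None  # the tail is added separately below
            open_wrapper().append(clone)
        add_text(child.tail)
    return out


def flatten_document(root):
    new_root = ET.Element(root.tag, dict(root.attrib))
    for child in root:
        if is_target(child):
            new_root.extend(flatten(child))
        else:
            new_root.append(copy.deepcopy(child))
    return new_root


SAMPLE = """<document>
<div class="target-one">1<randomElement>2</randomElement>
  <div class="target-one"><randomElement>3</randomElement></div>4<randomElement>5</randomElement>
  <div class="target-two">6<randomElement>7</randomElement></div>8<randomElement>9</randomElement>
</div>
</document>"""

for div in flatten_document(ET.fromstring(SAMPLE)):
    print(div.get("class"), list(div.itertext()))
# target-one ['1', '2']
# target-one ['3']
# target-one ['4', '5']
# target-two ['6', '7']
# target-one ['8', '9']
```

The recursion goes down only through target divs; everything else, including a non-target div, is copied whole into the current run, which is the grouping behaviour the question asks for.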

This is my first question on StackOverflow, since I can't work out the necessary recursion.

Thanks!

Edit, based on user52889's answer. This never worked, but I'm leaving it here for reference:

The XSL I can run in Saxon:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="html"
        indent="yes"
        encoding="utf-8"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
<xsl:template match="/"> 
    <xsl:apply-templates />  
</xsl:template>
<xsl:template match="div[matches(@class,'target-one|target-two','i')]">
    <xsl:for-each select="node()">
        <xsl:choose>
            <xsl:when test="self::*[matches(@class,'target-one|target-two','i')]">
                <xsl:apply-templates select="."/>
            </xsl:when>
            <xsl:when test="preceding-sibling::node()[0][not(self::*[matches(@class,'target-one|target-two','i')])]">
                <!-- do nothing, it will be handled by the next case -->
            </xsl:when>
            <xsl:otherwise>
                <!--
      create a copy of the element matched by the template, with its attrs
      add to it the current node and all nodes which follow it, up to the next SIGNIFICANT node
      or, put another way, all following siblings which either
      a) do not have a preceding significant node, or
      b) whose nearest preceding significant node is the same as the nearest preceding significant node of the current node, i.e. its following sibling node is the current node.
    -->
                <xsl:element name="{../name()}">
                    <xsl:apply-templates select="../@*"/>
                    <xsl:apply-templates select="following-sibling::node()[
          not(preceding-sibling::*[matches(@class,'target-one|target-two','i')])
          or 
          count(preceding-sibling::*[matches(@class,'target-one|target-two','i')][0]/following-sibling::node()[0] | current()) = 1
        ]" />
                </xsl:element>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

The current output from this file (contains child divs and duplicates):

<document>
<div class="target-one">
    <randomElement>
        maybe some text node, maybe not...2

    </randomElement>
    <div class="target-one"></div>
    maybe some text node, maybe not...4

    <randomElement>
        maybe some text node, maybe not...5

    </randomElement>
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...7

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...8

    <randomElement>
        maybe some text node, maybe not...9

    </randomElement>
</div>
<div class="target-one">
    <div class="target-one"></div>
    maybe some text node, maybe not...4

    <randomElement>
        maybe some text node, maybe not...5

    </randomElement>
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...7

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...8

    <randomElement>
        maybe some text node, maybe not...9

    </randomElement>
</div>
<div class="target-one"></div>
<div class="target-one">
    <randomElement>
        maybe some text node, maybe not...5

    </randomElement>
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...7

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...8

    <randomElement>
        maybe some text node, maybe not...9

    </randomElement>
</div>
<div class="target-one">
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...7

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...8

    <randomElement>
        maybe some text node, maybe not...9

    </randomElement>
</div>
<div class="target-two">
    <randomElement>
        maybe some text node, maybe not...7

    </randomElement>
</div>
<div class="target-two"></div>
<div class="target-one">
    <randomElement>
        maybe some text node, maybe not...9

    </randomElement>
</div>
<div class="target-one"></div>
<div class="target-two">
    <randomElement>
        maybe some text node, maybe not...11

    </randomElement>
    <div class="target-one"></div>
    maybe some text node, maybe not...13

    <randomElement>
        maybe some text node, maybe not...14

    </randomElement>
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...16

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...17

    <randomElement>
        maybe some text node, maybe not...18

    </randomElement>
</div>
<div class="target-two">
    <div class="target-one"></div>
    maybe some text node, maybe not...13

    <randomElement>
        maybe some text node, maybe not...14

    </randomElement>
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...16

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...17

    <randomElement>
        maybe some text node, maybe not...18

    </randomElement>
</div>
<div class="target-one"></div>
<div class="target-two">
    <randomElement>
        maybe some text node, maybe not...14

    </randomElement>
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...16

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...17

    <randomElement>
        maybe some text node, maybe not...18

    </randomElement>
</div>
<div class="target-two">
    <div class="target-two">
        <randomElement>
            maybe some text node, maybe not...16

        </randomElement>
    </div>
    <div class="target-two"></div>
    maybe some text node, maybe not...17

    <randomElement>
        maybe some text node, maybe not...18

    </randomElement>
</div>
<div class="target-two">
    <randomElement>
        maybe some text node, maybe not...16

    </randomElement>
</div>
<div class="target-two"></div>
<div class="target-two">
    <randomElement>
        maybe some text node, maybe not...18

    </randomElement>
</div>
<div class="target-two"></div>
</document>

    Tags: xml xslt xslt-2.0


    Answer 1:

    Treating it as a grouping problem, I came up with:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    
    <xsl:param name="prefix" select="'target-'"/>
    
    <xsl:output indent="yes"/>
    
    <xsl:template match="document">
      <xsl:copy>
        <xsl:for-each-group select="descendant::text()[normalize-space()]"
          group-adjacent="generate-id(ancestor::div[starts-with(@class, $prefix)][1])">
          <xsl:apply-templates select="ancestor::div[starts-with(@class, $prefix)][1]" mode="g">
            <xsl:with-param name="group" select="current-group()"/>
          </xsl:apply-templates>
        </xsl:for-each-group>
      </xsl:copy>
    </xsl:template>
    
    <xsl:template match="*" mode="g">
      <xsl:param name="group"/>
      <xsl:if test=". intersect $group/ancestor::*">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="node()" mode="g">
            <xsl:with-param name="group" select="$group"/>
          </xsl:apply-templates>
        </xsl:copy>
      </xsl:if>
    </xsl:template>
    
    <xsl:template match="text()" mode="g">
      <xsl:param name="group"/>
      <xsl:if test=". intersect $group">
        <xsl:copy/>
      </xsl:if>
    </xsl:template>
    
    </xsl:stylesheet>
    

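    This basically groups all non-whitespace descendant text nodes by their nearest ancestor div with the class you are looking for, and then, for each group, recreates the subtree of that ancestor containing only the grouped text nodes.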
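To see which text nodes end up together, the grouping step can be mimicked in Python. This sketch uses the standard library's ElementTree, with `PREFIX` mirroring the stylesheet's `$prefix` parameter; the function names and the shortened `SAMPLE` are hypothetical. It only computes the groups that `group-adjacent` forms and does not rebuild the pruned subtrees that the `mode="g"` templates produce.

```python
import itertools
import xml.etree.ElementTree as ET

PREFIX = "target-"  # mirrors the $prefix stylesheet parameter


def is_target(el):
    return el.tag == "div" and el.get("class", "").startswith(PREFIX)


def text_nodes(el, nearest=None):
    """Yield (nearest target-div ancestor, text) for each non-whitespace text
    node in document order. ElementTree stores text as .text and .tail."""
    if is_target(el):
        nearest = el
    if el.text and el.text.strip():
        yield nearest, el.text.strip()
    for child in el:
        yield from text_nodes(child, nearest)
        # A child's tail is a text node of `el`, so it shares el's nearest target.
        if child.tail and child.tail.strip():
            yield nearest, child.tail.strip()


def adjacent_groups(root):
    """Like group-adjacent="generate-id(ancestor::div[...][1])": consecutive
    text nodes whose nearest target div is the same element (compared by
    identity) form one group."""
    groups = []
    for _, items in itertools.groupby(text_nodes(root), key=lambda p: id(p[0])):
        items = list(items)
        target = items[0][0]
        cls = target.get("class") if target is not None else None
        groups.append((cls, [text for _, text in items]))
    return groups


SAMPLE = """<document>
<div class="target-one">1<randomElement>2</randomElement>
  <div class="target-one"><randomElement>3</randomElement></div>4<randomElement>5</randomElement>
  <div class="target-two">6<randomElement>7</randomElement></div>8<randomElement>9</randomElement>
</div>
</document>"""

for cls, texts in adjacent_groups(ET.fromstring(SAMPLE)):
    print(cls, texts)
# target-one ['1', '2']
# target-one ['3']
# target-one ['4', '5']
# target-two ['6', '7']
# target-one ['8', '9']
```

Grouping by identity rather than by class is what keeps the outer and the nested `target-one` divs apart. One consequence visible in the stylesheet itself: in mode `g` an element is copied only if it is an ancestor of a grouped text node, so an element with no text content (an empty `<img/>`, for example) is not carried over.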

    Comments:

    • Works 100%, even with a random number of tags and nodes between the divs, and even with multiple levels of nesting. Accepted answer.
    Answer 2:

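    It's hard to tell from your example what is a rule and what is just an example. The following stylesheet produces the desired result; perhaps this is what you're looking for. If not, please edit your question and explain the logic behind the requested transformation.

    XSLT 2.0 (or 1.0)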

    <xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:template match="/document">
        <document>
            <xsl:for-each select="//randomElement">
                <div class='{../@class}'>
                    <xsl:copy-of select=". | preceding-sibling::text()[1]"/>
                </div>
            </xsl:for-each>
        </document>
    </xsl:template>
    
    </xsl:stylesheet>
    

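    Comments:

    • Interesting solution. I would never have thought of outputting siblings! I've added more context to my question, just before "Thanks!", because it's not possible to target every randomElement this way... Thanks a lot!
    • I'm afraid you're still not clear. If it's "not possible to target every randomElement this way", how are they to be targeted? All we have is your example.
    • You can't, because the content is random. That's what makes it a hard case. The only way to do it is to walk the DOM until you hit a target div, select everything before it, wrap all of that in a new div, output the div you found, then check whether there are more nodes, and rinse and repeat until everything has been processed. Sorry about the example, but there's a limit to how much text I can include here. The target divs are chapters, while the document contains over a hundred other tags. We have to target those specific divs, not hundreds of random elements.
    • No content is random (if it were, you couldn't write an algorithm to process it).
    Answer 3:

    It sounds like you want something like the following, where SIGNIFICANT is an expression that matches all of, and only, the elements you want to become new list items (e.g. something like div[substring(@class,1,6)='target'])...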

    <xsl:template match="SIGNIFICANT">
      <xsl:for-each select="node()">
        <xsl:choose>
          <xsl:when test="self::SIGNIFICANT">
            <xsl:apply-templates select="."/>
          </xsl:when>
          <xsl:when test="preceding-sibling::node()[0][not(self::SIGNIFICANT)]">
            <!-- do nothing, it will be handled by the next case -->
          </xsl:when>
          <xsl:otherwise>
            <!--
              create a copy of the element matched by the template, with its attrs
              add to it the current node and all nodes which follow it, up to the next SIGNIFICANT node
              or, put another way, all following siblings which either
              a) do not have a preceding significant node, or
              b) whose nearest preceding significant node is the same as the nearest preceding significant node of the current node, i.e. its following sibling node is the current node.
            -->
            <xsl:element name="../name()">
              <xsl:apply-templates select="../@*"/>
              <xsl:apply-templates select="following-sibling::node()[
                  not(preceding-sibling::SIGNIFICANT)
                  or 
                  count(preceding-sibling::SIGNIFICANT[0]/following-sibling::node()[0] | current()) = 1
                ]">
            </xsl:element>
          </xsl:otherwise>
      </xsl:for-each>
    </xsl:template>
    

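    Note: this means that top-level divs with no child nodes will be dropped entirely. If you don't want that behaviour, you can simply add a choose/when.

    Also note: for very long lists, there may be a more performant recursive approach.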
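The grouping this template is aiming for, with each non-significant node keyed by its nearest preceding significant sibling, is easier to see outside XSLT. Below is a small Python sketch over a plain list of siblings; the function name and the strings standing in for nodes are hypothetical. One detail of the XPath above is worth checking: XPath positional predicates start at 1, so `preceding-sibling::node()[0]` and `preceding-sibling::SIGNIFICANT[0]` never select anything, while `[1]` selects the nearest one because `preceding-sibling` is a reverse axis. With `[0]`, the "do nothing" branch can never fire, which is consistent with the duplicated runs in the output shown in the question.

```python
import itertools


def runs_between(siblings, is_significant):
    """Split a list of sibling nodes into runs of non-significant nodes.

    Each non-significant node is keyed by the index of its nearest preceding
    significant sibling (-1 if there is none), i.e. what
    preceding-sibling::SIGNIFICANT[1] is meant to select; nodes sharing a key
    form one run. Significant nodes themselves are left out.
    """
    keyed = []
    nearest = -1
    for i, node in enumerate(siblings):
        if is_significant(node):
            nearest = i
        else:
            keyed.append((nearest, node))
    return [[node for _, node in run]
            for _, run in itertools.groupby(keyed, key=lambda p: p[0])]


siblings = ["text 1", "randomElement 2", "DIV", "text 4", "randomElement 5",
            "DIV", "DIV", "text 8"]
print(runs_between(siblings, lambda n: n == "DIV"))
# [['text 1', 'randomElement 2'], ['text 4', 'randomElement 5'], ['text 8']]
```

Two adjacent significant nodes produce no empty run, which matches the question's wish not to end up with empty wrapper divs.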

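    Comments:

    • I found a few typos (an unclosed choose tag, it seems to need * after self, and so on). I did manage to run it on the XML above, but it doesn't quite work. I still get child divs and, for reasons I don't fully understand, duplicates. Content isn't wrapped in the new tags; the tags are just inserted empty. I don't know how to fit all this code into comments, so I'll edit the additional code into the question above...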