Un-commenting an XML block using the sed command

【Question title】: Un-commenting xml block using sed command
【Posted】: 2014-06-08 07:31:14
【Question】:

I have an XML file that contains many commented-out elements. Of all those elements, I want to uncomment just one using the sed command.

My XML file is:

<!-- This is the sample xml
    which holds the data of the students -->
<Students>
    <!-- <student>
        <name>john</name>
        <id>123</id>
    </student> -->
    <student>
        <name>mike</name>
        <id>234</id>
    </student>
    <!-- <student>
        <name>NewName</name>
        <id>NewID</id>
    </student> -->
</Students>

In the XML file above, I want to uncomment the last XML block, so that my file looks like this:

<!-- This is the sample xml
    which holds the data of the students -->
<Students>
    <!-- <student>
        <name>john</name>
        <id>123</id>
    </student> -->
    <student>
        <name>mike</name>
        <id>234</id>
    </student>
    <student>
        <name>NewName</name>
        <id>NewID</id>
    </student> 
</Students>

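I went through the sed command, but could not work out how to delete just the <!-- and --> from the last block. Is it possible to uncomment the XML block whose <name> is NewName? I did not find anything other than deleting the entire line.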

Edit: Besides <name> and <id>, I can have many other XML elements, such as <address>, <city>, <class>, <marks>.

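【Comments】:

In your input, only the first student is wrapped in a comment. The third one is not. In fact, the input XML is invalid.
I have updated the question. My mistake, I made a typo here. But the problem remains.
I doubt that what you are trying to do is possible. Parsing XML is hard.
Is there another way then, using awk or something similar? I only want to use a shell script, though.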

Tags: xml bash sed

【Solution 1】:

Don't use sed. Use xsltproc.

<!-- uncomment.xsl -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- this copies every input node unchanged -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- this uncomments every comment that starts with a `<` -->
  <xsl:template match="comment()[substring(normalize-space(), 1, 1) = '&lt;']">
    <xsl:value-of select="." disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>

On the command line:

xsltproc -o output.xml uncomment.xsl input.xml

If it works correctly, you will get this for your input XML:

<!-- This is the sample xml
    which holds the data of the students -->
<Students>
    <student>
        <name>john</name>
        <id>123</id>
    </student>
    <student>
        <name>mike</name>
        <id>234</id>
    </student>
    <student>
        <name>NewName</name>
        <id>NewID</id>
    </student>
</Students>

【Comments】:

【Solution 2】:

This might work for you (GNU sed):

sed -r '/<Students>/,/<\/Students>/{/<Students>/{h;d};H;/<\/Students>/!d;g;s/(.*)<!-- (.*) -->(.*)/\1\2\3/}' file

This stores the Students data in the hold space, then uses greed to find the last occurrence of <!-- and -->, and removes them before printing the data.
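
The command above unwraps whichever comment comes last. If the target should instead be the block whose <name> is NewName, wherever it sits, one possible sketch in awk follows. The helper name `uncomment_by_name` is hypothetical and not part of this answer, and the sketch assumes each commented block opens with `<!-- <student>` on one line and that `-->` first appears on the line that closes it.

```shell
# uncomment_by_name is a hypothetical helper (an assumption, not from the answer).
# Usage: uncomment_by_name NAME < input.xml
uncomment_by_name() {
  awk -v name="$1" '
    # Start buffering at a line that opens a commented-out <student> block.
    !inblock && /<!--[ \t]*<student>/ { inblock = 1; buf = "" }
    inblock {
      buf = buf $0 "\n"
      if ($0 ~ /-->/) {                 # closing marker reached: decide, then flush
        if (index(buf, "<name>" name "</name>")) {
          sub(/<!--[ \t]*/, "", buf)    # drop the opening marker
          sub(/[ \t]*-->/, "", buf)     # drop the closing marker
        }
        printf "%s", buf
        inblock = 0
      }
      next
    }
    { print }                           # every other line passes through unchanged
  '
}
```

It could be run as `uncomment_by_name NewName < input.xml > output.xml`. Like the sed version, this is line-oriented text matching rather than XML parsing, so it only holds up while the file keeps the layout shown in the question.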

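【Comments】:

Here I only want to remove the comment markers from the last student element, the one whose name is NewName. Students is the parent tag and student is the child element here. So I am confused by the pattern matching in your command.
@Optimus It stores the lines, then removes the last start/end comment tags.
Thanks @potong, can you tell me why \1\2\3 is needed?
@Optimus Those are back references.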
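
To make the back references concrete, here is a minimal sketch that applies the same substitution to a single line holding two comments. The sample line and the variable names `line` and `result` are made up for illustration; `-E` is used in place of `-r` since GNU sed accepts both.

```shell
# One line with two comments; only the last one should be unwrapped.
line='<a/> <!-- <b/> --> <c/> <!-- <d/> --> <e/>'

# \1 = everything before the last "<!-- " (the greedy .* runs as far right as it can),
# \2 = the body of that comment,
# \3 = everything after its closing " -->".
# Writing \1\2\3 puts the three pieces back together without the markers.
result=$(printf '%s\n' "$line" | sed -E 's/(.*)<!-- (.*) -->(.*)/\1\2\3/')

printf '%s\n' "$result"
```

This prints `<a/> <!-- <b/> --> <c/> <d/> <e/>`: the first comment is untouched because the leading `(.*)` swallowed it whole.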