Using a shell script, find and update .xml tag values that occur multiple times (answers)

【Question Title】: Using Shell Script, find and update .xml file's tag values that are present multiple times
【Posted】: 2021-09-02 20:15:36
【Question】:

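I have an xml file that contains user names and passwords several times, along with connection URLs, all of which need to be changed dynamically.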

<datasources>
  <datasource jndi-name="java:jboss/datasources/TestFlow" pool-name="TestFlow" enabled="true" use-java-context="true" statistics-enabled="${wildfly.datasources.statistics-enabled:${wildfly.statistics-enabled:false}}">
    <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>
    <driver>h2</driver>
    <security>
      <user-name>test</user-name>
      <password>test</password>
    </security>
  </datasource>
  <datasource jta="false" jndi-name="java:/AdminDSource" poolname="AdminDSource" enabled="true" use-java-context="true">
    <connection-url>jdbc:oracle:thin:@xxxxxx.xxxxxxx.xxxxxxxx-1.rds.amazonaws.com:xxxx:ORCL</connection-url>
    <driver>oracle</driver>
    <security>
      <user-name>aldo</user-name>
      <password>aldo</password>
    </security>
  </datasource>
</datasources>

In the file above, I want to change the first occurrence of the connection URL, user name and password to some desired values:

<connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>
<user-name>test</user-name>
<password>test</password>

to

<connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</connection-url>
<user-name>Atom</user-name>
<password>Atom</password>

Likewise, the second occurrence of the same elements needs to change:

<connection-url>jdbc:oracle:thin:@{Content after the @ to be changed}</connection-url>
<user-name>{aldo to username}</user-name>
<password>{aldo to password}</password>

I have tried the following to update the user names and passwords:

for filename in *.xml; do
    if grep -q '<driver>h2</driver>' "$filename"; then
            sed -i.bak 's/<user-name>test<\/user-name>/<user-name>Atom<\/user-name>/g'  "$filename"
            
    fi
    if grep -q '<driver>h2</driver>' "$filename"; then
            
            sed -i.bak 's/<password>test<\/password>/<password>Atom<\/password>/g' "$filename"
    fi
    if grep -q '<driver>oracle</driver>' "$filename"; then
            sed -i.bak 's/<user-name>aldo<\/user-name>/<user-name>username<\/user-name>/g' "$filename"
            
    fi
    if grep -q '<driver>oracle</driver>' "$filename"; then
            
            sed -i.bak 's/<password>aldo<\/password>/<password>password<\/password>/g' "$filename"
    fi
done

But I would like a single script that makes all of the desired changes.

【Discussion】:

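What you posted is a single script, albeit not an efficient one. By the way, is it guaranteed that in your XML the opening and closing tags will always be on the same physical line?
The fastest way, unless you have 100+ xml files, is to open your favorite text editor and do a search/replace... That is much faster than writing and debugging a complex script.
Look into xmlstarlet or xsltproc.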

Tags: xml bash shell sed xmlstarlet

【Solution 1】:

If you can create a separate file (sample.sed), here is an answer.

$ cat sample.sed 
/<driver>h2<\/driver>/,/<\/security>/{
    s/<user-name>test<\/user-name>/<user-name>Atom<\/user-name>/g
    s/<password>test<\/password>/<password>Atom<\/password>/g
}
/<driver>oracle<\/driver>/,/<\/security>/{
    s/<user-name>aldo<\/user-name>/<user-name>username<\/user-name>/g
    s/<password>aldo<\/password>/<password>password<\/password>/g
}

for filename in *.xml; do
    sed -i.bak -f sample.sed "$filename"
done
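To make the /start/,/end/{ ... } range address in sample.sed concrete, here is a rough Python emulation of it (sed_range_sub is a hypothetical helper, not part of the answer): substitutions are applied only from the line matching the start pattern through the next line matching the end pattern, and, as in sed, the end pattern is only tested from the line after the one that opened the range.

```python
import re


def sed_range_sub(lines, start, end, subs):
    """Rough emulation of sed's /start/,/end/{ s/.../.../ } on a list of lines."""
    out = []
    in_range = False
    for line in lines:
        if in_range:
            # the line matching `end` still belongs to the range; the range closes after it
            if re.search(end, line):
                in_range = False
            selected = True
        elif re.search(start, line):
            # the range opens here; `end` is only tested from the next line on
            in_range = True
            selected = True
        else:
            selected = False
        if selected:
            for pattern, repl in subs:
                line = re.sub(pattern, repl, line)
        out.append(line)
    return out


sample = [
    "<datasource>",
    "  <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1</connection-url>",
    "  <driver>h2</driver>",
    "  <security>",
    "    <user-name>test</user-name>",
    "  </security>",
    "</datasource>",
    "<datasource>",
    "  <driver>oracle</driver>",
    "  <security>",
    "    <user-name>test</user-name>",
    "  </security>",
    "</datasource>",
]

result = sed_range_sub(sample, r"<driver>h2</driver>", r"</security>",
                       [(r"<user-name>test</user-name>", "<user-name>Atom</user-name>")])
for line in result:
    print(line)
```

Only the user-name inside the h2 range changes. One consequence worth noticing: in the question's file, <connection-url> comes before <driver>, so it falls outside the /<driver>h2<\/driver>/,/<\/security>/ range and sample.sed leaves the connection URL untouched. Starting the range at something earlier that is unique to each datasource, such as its jndi-name attribute, would be one way to bring that line in.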

【Discussion】:

【Solution 2】:

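The first question to ask is: do I need a script for this at all? Even if you have 10 files that all need the same information replaced, I think you could probably do them by hand (i.e. in a text editor) faster than you could write a bug-free script. Of course, if you have 50 or 100 files, that changes.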

But it really depends on what the replacement task actually involves. If you have something as simple as this in mind:

V0: replace every occurrence of <user-name>test</user-name> with <user-name>Atom</user-name>, etc.

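then sed is probably the right tool for the job. It processes a text file line by line, but it is not very good at taking context from preceding or following lines into account. So if your task is actually more like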

V1: replace <user-name>test</user-name> with <user-name>Atom</user-name>, but only if the preceding connection URL was <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>, in which case also change that to <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</connection-url>, etc.

then sed will have a much harder time.

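Another line-based command-line tool is awk, which is more powerful because it lets you write matching rules and keep contextual information in variables. But even that isn't straightforward if we reverse the order of the conditions in V1: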

V2: replace <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url> with <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</connection-url>, but only if the following user name is <user-name>test</user-name>, in which case also change that to <user-name>Atom</user-name>, etc.
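To make the V1/V2 difference concrete, here is a minimal Python sketch of both, assuming each element sits on its own line as in the question (the constants and function names are illustrative). rewrite_v1 only needs one remembered flag, the kind of state an awk variable can hold; rewrite_v2 has to hold lines back until a later line decides what happens to them.

```python
# Illustrative constants for the H2 datasource from the question.
OLD_URL = "<connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>"
NEW_URL = "<connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</connection-url>"
OLD_USER = "<user-name>test</user-name>"
NEW_USER = "<user-name>Atom</user-name>"


def rewrite_v1(lines):
    """V1: the condition looks backwards, so one flag of remembered context is enough."""
    out = []
    url_matched = False
    for line in lines:
        if "<connection-url>" in line:
            url_matched = OLD_URL in line
            line = line.replace(OLD_URL, NEW_URL)
        elif OLD_USER in line and url_matched:
            line = line.replace(OLD_USER, NEW_USER)
        elif "</datasource>" in line:
            url_matched = False  # the context ends with the datasource
        out.append(line)
    return out


def rewrite_v2(lines):
    """V2: the condition looks forwards, so lines are held back until it is decided."""
    out = []
    held = []  # lines since a matching connection-url that cannot be emitted yet
    for line in lines:
        if held:
            held.append(line)
            if OLD_USER in line:
                # the later line decides the fate of the earlier one
                held[0] = held[0].replace(OLD_URL, NEW_URL)
                held[-1] = line.replace(OLD_USER, NEW_USER)
            if "<user-name>" in line or "</datasource>" in line:
                out.extend(held)  # decision made: flush the buffer
                held = []
        elif OLD_URL in line:
            held = [line]  # start holding lines until the user-name shows up
        else:
            out.append(line)
    out.extend(held)  # input ended while still undecided: emit unchanged
    return out
```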

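Now you can no longer write out each replacement as soon as you process a line: you may have to hold back some lines for a while, because information you only encounter later in the file determines what to do with them. Again, it starts to get complicated. But it gets worse. What if, for some reason, your xml file is formatted slightly differently: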

<datasource jndi-name="java:jboss/datasources/TestFlow" 
            pool-name="TestFlow" 
            enabled="true" 
            use-java-context="true" 
            statistics-enabled="${wildfly.datasources.statistics-enabled:${wildfly.statistics-enabled:false}}">
  <connection-url>
    jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE
  </connection-url>
...

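Once things are no longer laid out neatly on one line each, processing with awk suddenly becomes much harder. In the worst case, you basically end up implementing an XML parser in awk, and of course nobody wants to do that.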
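A small Python check illustrates the point using only the standard library (re and xml.etree.ElementTree): the single-line pattern a line-oriented rule would rely on matches none of the lines of a wrapped connection-url like the one above, while an XML parser finds the same value once the surrounding whitespace is stripped.

```python
import re
import xml.etree.ElementTree as ET

wrapped = """<datasource>
  <connection-url>
    jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE
  </connection-url>
</datasource>"""

# A one-line pattern, as a line-based sed/awk rule would use, finds nothing on any line.
pattern = re.compile(r"<connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;.*</connection-url>")
line_hits = [line for line in wrapped.splitlines() if pattern.search(line)]

# An XML parser sees the same element no matter where the line breaks fall.
url = ET.fromstring(wrapped).find("connection-url").text.strip()

print(line_hits)  # []
print(url)        # jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE
```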

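So why not use a proper, existing XML parser in the first place? There are some options for doing this on the command line, but it may be better to switch to a more powerful scripting language. Here is an example of a small Python script that performs the replacements you want in a context-sensitive way: a datasource's elements are only touched if all three values (connection URL, user name, password) match.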

from bs4 import BeautifulSoup  # the 'xml' parser used below requires lxml to be installed
import re
import sys

# (connection-url, user-name, password) -> (connection-url, user-name, password)
REPLACEMENTS = {
    ('jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE', 'test', 'test'):
    ('jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE', 'Atom', 'Atom'),

    ('jdbc:oracle:thin:@xxxxxx.xxxxxxx.xxxxxxxx-1.rds.amazonaws.com:xxxx:ORCL', 'aldo', 'aldo'):
    ('jdbc:oracle:thin:@{Content after the @ to be changed}', '{aldo to username}', '{aldo to password}')
}

# check correct invocation
if len(sys.argv) != 3:
    print(f"USAGE: python {sys.argv[0]} <infile> <outfile>")
    sys.exit(1)

# read infile
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'xml')

# apply transformations to each direct <datasource> child of <datasources>
for datasource in soup.datasources.find_all("datasource", recursive=False):
    security = datasource.find('security', recursive=False)
    if security is None:
        continue  # no <security> block, so nothing can match
    elements = (datasource.find('connection-url', recursive=False),
                security.find('user-name', recursive=False),
                security.find('password', recursive=False))
    if all(elements):
        # strip() so values wrapped over several lines still match
        old = tuple(e.text.strip() for e in elements)
        if old in REPLACEMENTS:
            new = REPLACEMENTS[old]
            for e, text in zip(elements, new):
                e.string = text

# write outfile, doubling prettify()'s one-space indentation to two spaces
with open(sys.argv[2], 'w') as f:
    for line in soup.prettify().split('\n'):
        f.write(re.sub(r'^(\s+)', r'\1\1', line))
        f.write('\n')

As I wrote above, the simplest thing (a sed script) may already be good enough for the task, but that depends on the (probable) circumstances.
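If installing bs4 and lxml is not an option, the same context-sensitive replacement can be sketched with Python's standard xml.etree.ElementTree. This is a swapped-in variant rather than the script above, and it assumes the un-namespaced layout shown in the question; a real WildFly configuration file that declares XML namespaces would need namespace-qualified paths.

```python
import sys
import xml.etree.ElementTree as ET

# (connection-url, user-name, password) -> (connection-url, user-name, password)
REPLACEMENTS = {
    ("jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE", "test", "test"):
    ("jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE", "Atom", "Atom"),
    ("jdbc:oracle:thin:@xxxxxx.xxxxxxx.xxxxxxxx-1.rds.amazonaws.com:xxxx:ORCL", "aldo", "aldo"):
    ("jdbc:oracle:thin:@{Content after the @ to be changed}", "{aldo to username}", "{aldo to password}"),
}


def apply_replacements(root):
    """Rewrite a <datasource> only if all three current values match a key; return the count."""
    changed = 0
    for ds in root.findall("datasource"):
        elements = (ds.find("connection-url"),
                    ds.find("security/user-name"),
                    ds.find("security/password"))
        if any(e is None for e in elements):
            continue  # incomplete datasource: leave it alone
        # strip() so values wrapped over several lines still match
        old = tuple((e.text or "").strip() for e in elements)
        new = REPLACEMENTS.get(old)
        if new is not None:
            for e, text in zip(elements, new):
                e.text = text
            changed += 1
    return changed


if __name__ == "__main__":
    if len(sys.argv) == 3:
        tree = ET.parse(sys.argv[1])
        apply_replacements(tree.getroot())
        tree.write(sys.argv[2], encoding="utf-8", xml_declaration=True)
    else:
        print(f"USAGE: python {sys.argv[0]} <infile> <outfile>", file=sys.stderr)
```

Note the explicit `is None` checks: unlike a bs4 tag, an ElementTree element with no children currently evaluates as false (a deprecated behavior), so writing all(elements) here would silently skip every datasource. ElementTree also drops XML comments by default when parsing, so any comments in the input would not survive the rewrite.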

【Discussion】:

【Solution 3】:

The well-known Bash FAQ states the following:

Don't try [to update XML files] with sed, awk, grep, and so on (it leads to undesired results)

Below are a couple of different solutions that use XML-specific command-line tools.

Using the XMLStarlet command

Consider using the following XMLStarlet command:

xml ed -L -u "(//datasources/datasource)[1]/connection-url" -v "jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE" \
          -u "(//datasources/datasource)[1]/security/user-name" -v "Atom" \
          -u "(//datasources/datasource)[1]/security/password" -v "Atom" \
          -u "(//datasources/datasource)[2]/connection-url" -v "jdbc:oracle:thin:@{Content after the @ to be changed}" \
          -u "(//datasources/datasource)[2]/security/user-name" -v "{aldo to username}" \
          -u "(//datasources/datasource)[2]/security/password" -v "{aldo to password}" \
          ./some/path/to/file.xml

Note: you'll need to redefine the trailing ./some/path/to/file.xml path as necessary.

Explanation:

The parts of the command above break down as follows:

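xml - invokes the XMLStarlet command (on some systems, e.g. Debian/Ubuntu, the executable is named xmlstarlet instead).
ed - edit/update the XML document.
-L - edit the file in place (note: you may want to omit this at first while testing).
-u - update the <xpath>, followed by -v with the replacement <value>.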

Let's look at the XPath patterns used to match the nodes:

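(//datasources/datasource)[1]/connection-url - this matches the connection-url element node that is a child of the first datasources/datasource element node.
(//datasources/datasource)[1]/security/user-name - this matches the user-name element node whose parent element node is security, where security must be a child of the first datasources/datasource element node.
(//datasources/datasource)[1]/security/password - similar to the previous pattern, this matches the password element node whose parent element node is security, where security must be a child of the first datasources/datasource element node.
We use essentially the same patterns to match the second instance: to match the desired element nodes inside the second datasources/datasource element node, we change the index from [1] to [2].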
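For comparison, the same six path/value updates can be sketched with Python's xml.etree.ElementTree (a swapped-in tool, assuming the un-namespaced layout from the question). ElementTree implements only a subset of XPath, so the parenthesized (//datasources/datasource)[1] form is not available; datasource[1], evaluated relative to the <datasources> root, selects the same elements in this file. The helper update_all is hypothetical; it reports paths that matched nothing so that a mistyped path does not go unnoticed.

```python
import xml.etree.ElementTree as ET

# (path, value) pairs mirroring the -u/-v pairs of the xml ed command above;
# paths are relative to the <datasources> root element.
UPDATES = [
    ("datasource[1]/connection-url", "jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE"),
    ("datasource[1]/security/user-name", "Atom"),
    ("datasource[1]/security/password", "Atom"),
    ("datasource[2]/connection-url", "jdbc:oracle:thin:@{Content after the @ to be changed}"),
    ("datasource[2]/security/user-name", "{aldo to username}"),
    ("datasource[2]/security/password", "{aldo to password}"),
]


def update_all(root, updates):
    """Set the text of every element matching each path; return the paths that matched nothing."""
    missing = []
    for path, value in updates:
        matches = root.findall(path)
        if not matches:
            missing.append(path)
        for elem in matches:
            elem.text = value
    return missing
```

Usage would be along the lines of tree = ET.parse(path), update_all(tree.getroot(), UPDATES), then tree.write(path).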

Using xsltproc and XSLT in a bash script

If xsltproc is available on your host system, you may want to consider the following bash script:

script.sh

#!/usr/bin/env bash

xslt() {
cat <<'EOX'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="datasource[1]/connection-url/text()">
    <xsl:text>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</xsl:text>
  </xsl:template>

  <xsl:template match="datasource[1]/security/user-name/text()">
    <xsl:text>Atom</xsl:text>
  </xsl:template>

  <xsl:template match="datasource[1]/security/password/text()">
    <xsl:text>Atom</xsl:text>
  </xsl:template>


  <xsl:template match="datasource[2]/connection-url/text()">
    <xsl:text>jdbc:oracle:thin:@{Content after the @ to be changed}</xsl:text>
  </xsl:template>

  <xsl:template match="datasource[2]/security/user-name/text()">
    <xsl:text>{aldo to username}</xsl:text>
  </xsl:template>

  <xsl:template match="datasource[2]/security/password/text()">
    <xsl:text>{aldo to password}</xsl:text>
  </xsl:template>

</xsl:stylesheet>
EOX
}

xml_file=./some/path/to/file.xml

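# fall back to /tmp when TMPDIR is unset, and don't rely on TMPDIR ending in "/"
tmp_file="${TMPDIR:-/tmp}/result.xml"

# stop before overwriting the original if the transform failed
xsltproc --novalid <(xslt) - <"$xml_file" > "$tmp_file" || exit 1

mv -- "$tmp_file" "$xml_file" 2>/dev/null || {
  echo "Cannot move .xml from TMPDIR to ${xml_file}" >&2
  exit 1
}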

Note: you'll need to redefine the ./some/path/to/file.xml path assigned to the xml_file variable as necessary.

Explanation:

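An XSLT stylesheet is used that contains several templates to match the necessary element nodes and replace their text nodes as required.
The xsltproc tool/command transforms the source .xml file using the given XSLT.
The resulting .xml file is written to the system's temporary directory (i.e. TMPDIR) and then moved with the mv command to the same location as the original source xml_file, effectively overwriting it.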
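The write-to-a-temporary-file-then-move step can also be sketched in Python with the standard library. This variant (the helper name replace_file_contents is hypothetical) deliberately differs from the script in two ways: it creates the temporary file in the target's own directory rather than in TMPDIR, because os.replace() is only an atomic rename when both paths are on the same filesystem, and it copies the original file's permission bits, since tempfile.mkstemp() creates files that only the owner can read and write.

```python
import os
import shutil
import tempfile


def replace_file_contents(path, new_text):
    """Write new_text to a temporary file beside `path`, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # same directory => same filesystem, so os.replace() is a plain atomic rename
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_text)
        if os.path.exists(path):
            shutil.copymode(path, tmp_path)  # mkstemp() creates the file as owner-only
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave a half-written temporary file behind
        raise
```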

【Discussion】:

【Solution 4】:

This has been said countless times: don't parse HTML/XML or JSON with RegEx! Use a tool with native support instead.

With xidel, you can use its x:replace-nodes() function multiple times, feeding the output of one call into the next:

$ xidel -s input.xml -e '
  x:replace-nodes(
    (//security)[1]/node()/text(),
    "Atom"
  )/x:replace-nodes(
    (//security)[2]/user-name/text(),
    "{aldo to username}"
  )/x:replace-nodes(
    (//security)[2]/password/text(),
    "{aldo to password}"
  )
' --output-node-format=xml --output-node-indent

Alternatively, you can combine the 2nd and 3rd calls of the function:

$ xidel -s input.xml -e '
  x:replace-nodes(
    (//security)[1]/node()/text(),
    "Atom"
  )/x:replace-nodes(
    (//security)[2],
    element security {
      element user-name {"{aldo to username}"},
      element password {"{aldo to password}"}
    }
  )
' --output-node-format=xml --output-node-indent
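For readers more at home in Python, the idea behind the second query (swap the whole second <security> element for a freshly built one) can be sketched with xml.etree.ElementTree. ElementTree elements have no replace method, so the hypothetical helper replace_child removes the old child and inserts the new one at the same index. Unlike x:replace-nodes, whose results the answer chains from one call into the next, this edits the tree in place.

```python
import xml.etree.ElementTree as ET


def make_security(user, password):
    """Build <security><user-name>user</user-name><password>password</password></security>."""
    security = ET.Element("security")
    ET.SubElement(security, "user-name").text = user
    ET.SubElement(security, "password").text = password
    return security


def replace_child(parent, old, new):
    """Put `new` where `old` was among parent's children."""
    new.tail = old.tail  # keep the whitespace that followed the old element
    index = list(parent).index(old)
    parent.remove(old)
    parent.insert(index, new)


root = ET.fromstring(
    "<datasources>"
    "<datasource><driver>h2</driver><security><user-name>test</user-name></security></datasource>"
    "<datasource><driver>oracle</driver><security><user-name>aldo</user-name>"
    "<password>aldo</password></security></datasource>"
    "</datasources>"
)
second = root.findall("datasource")[1]
replace_child(second, second.find("security"),
              make_security("{aldo to username}", "{aldo to password}"))
print(ET.tostring(second, encoding="unicode"))
```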

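In both cases, the output goes to stdout (note that these queries only update the credentials; the connection-url elements are left unchanged):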

<datasources>
  <datasource jndi-name="java:jboss/datasources/TestFlow" pool-name="TestFlow" enabled="true" use-java-context="true" statistics-enabled="${wildfly.datasources.statistics-enabled:${wildfly.statistics-enabled:false}}">
    <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>
    <driver>h2</driver>
    <security>
      <user-name>Atom</user-name>
      <password>Atom</password>
    </security>
  </datasource>
  <datasource jta="false" jndi-name="java:/AdminDSource" poolname="AdminDSource" enabled="true" use-java-context="true">
    <connection-url>jdbc:oracle:thin:@xxxxxx.xxxxxxx.xxxxxxxx-1.rds.amazonaws.com:xxxx:ORCL</connection-url>
    <driver>oracle</driver>
    <security>
      <user-name>{aldo to username}</user-name>
      <password>{aldo to password}</password>
    </security>
  </datasource>
</datasources>

To update the input file instead, use the command-line option --in-place.

To process multiple xml files, you can let Bash handle it...

$ for file in *.xml; do
  xidel -s --in-place "$file" -e '
    [...]
  '
done

...but if you have a lot of xml files, calling xidel once per file isn't very efficient. xidel can do this more efficiently with its integrated EXPath File Module:

$ xidel -se '
  for $file in file:list(.,false(),"*.xml") return   (: iterate over all the current dir's xml-files :)
  file:write(
    $file,                                           (: essentially overwrite the input file :)
    x:replace-nodes(
      (doc($file)//security)[1]/node()/text(),       (: doc($file) to open the input file inside the query :)
      "Atom"
    )/x:replace-nodes(
      (//security)[2],
      element security {
        element user-name {"{aldo to username}"},
        element password {"{aldo to password}"}
      }
    ),
    {"indent":true()}                                (: "prettify" the output :)
  )
'
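The same one-process-for-many-files idea can be sketched in Python for comparison. This is a rough analogue, not a translation of the query: the hypothetical update_tree sets the text of every child of the first <security> element to "Atom" and fills in the second one's user-name and password rather than replacing that element wholesale, and files without two <security> elements are skipped.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def update_tree(root):
    """Apply the credential changes; return False if there are fewer than two <security> elements."""
    securities = root.findall(".//security")
    if len(securities) < 2:
        return False
    for child in securities[0]:
        child.text = "Atom"  # roughly what (//security)[1]/node()/text() -> "Atom" does
    for tag, value in (("user-name", "{aldo to username}"),
                       ("password", "{aldo to password}")):
        elem = securities[1].find(tag)
        if elem is None:
            elem = ET.SubElement(securities[1], tag)
        elem.text = value
    return True


def update_directory(directory="."):
    """Parse, update and rewrite every *.xml file in `directory` within one process."""
    updated = []
    for path in sorted(Path(directory).glob("*.xml")):
        tree = ET.parse(path)
        if update_tree(tree.getroot()):
            tree.write(path, encoding="utf-8", xml_declaration=True)
            updated.append(path.name)
    return updated
```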

【Discussion】: