POSIX sh: find and replace with function

【Question title】: POSIX sh: find and replace with function
【Posted】: 2018-10-19 11:08:58
【Question description】:

In JavaScript, you can do this:

someComplexProcessing = (wholeMatch, group1, group2, index, mystr)=> replacement...
mystr.replace(/some.* regex(with) multiple (capture groups)/g, someComplexProcessing)

For example:

const renderTemplate = (str, env)=> str.replace(/{{(.*?)}}/g, (_, name)=> env[name])
renderTemplate('{{salut}} {{name}}!', {salut: 'Hi', name: 'Leo'}) // "Hi Leo!"

What is the best POSIX-compatible, generic variant?

- reusability # eg. a function taking regex, processingFunction, and input, etc - that I could put in my .shellrc/source lib.sh or similar and reuse
- multiline # eg. if "uppercase everything between {{ and }}", `a {{b\nc}}` -> `a B\nC`
- no escape gotchas # eg. it shouldn't break if input, replacement, or regex contains special characters
- POSIX compatible # eg. running it under `docker run --rm -it alpine sh`, etc
- using regex # eg. perl regex seems like the most prominent one, please note differences from it if other is used

meriting:
- no/less dependencies # eg. as portable as possible
- multiple capture groups
- performance
- security # related to no escape gotchas, eg. ok with untrusted input

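I found some solutions for bash, and some compatible solutions for edge cases, though none comes close to the simplicity that JS's .replace provides. Ultimately, I want to program without thinking about implementation details/gotchas and without pulling in hundreds of MB (mainly for alpine containers, but ubuntu/OSX are used too), hence the attempt to build up a library of portable, POSIX-compatible snippets, functions and patterns.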

【Comments on the question】:

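You could look at the expr command, but you shouldn't expect the shell language to provide the kind of data manipulation that general-purpose programming languages provide. The shell is defined to run programs written in (for example) JavaScript, not to replace those programs.
To clarify: the starting point/environment I have access to is POSIX compatible. My end goal is to run the find-and-replace from sh, i.e. it could be a compiled C program as long as it satisfies some of the points above, though not JavaScript, since Node is more than 50 MB. Also, each match will be passed through another program, which makes sh seem like a good choice (it is part of the beginning and the middle anyway).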
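
As a rough illustration of the expr suggestion in the first comment, the sketch below pulls a single capture group out of a string. `first_group` is a hypothetical helper name introduced here, not something from the thread. expr uses POSIX basic regular expressions (not Perl syntax), anchors the pattern at the start of the string, and returns only one match, so on its own it does not cover the callback-per-match requirement.

```shell
#!/bin/sh
# Sketch: extract one capture group with expr (POSIX basic regular expressions).
# first_group is a hypothetical helper name used for illustration only.
first_group () {
  # $1 = input string, $2 = BRE containing one \( \) group.
  # The leading X keeps expr from mistaking the input for an operator;
  # expr anchors the pattern at the start and prints the first group.
  expr "X$1" : "X$2"
}

first_group 'Hello [there]!' '.*\[\(.*\)]'   # prints: there
```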

Tags: regex shell sh posix

【Solution 1】:

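An inefficient solution with some input escaping (it assumes the input contains no \r) but no escaping of the regex arguments, and with only one capture group (the middle, i.e. the text between the start and end matches). It is portable, though: it uses only tr and sed (plus printf and the -z empty-string test). The sed part could possibly be swapped for something that is generally compatible with Perl regexes.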
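
The multiline handling rests on one trick: fold every newline into \r so that sed sees the whole input as a single line, then unfold afterwards. A minimal sketch of just that step, assuming the input contains no \r; `fold_lines` and `unfold_lines` are hypothetical names used only for this illustration:

```shell
#!/bin/sh
# Sketch of the newline-folding trick (assumes the input has no \r).
# fold_lines / unfold_lines are hypothetical names used for illustration only.
fold_lines ()   { tr '\n' '\r'; }   # make the whole input one "line" for sed
unfold_lines () { tr '\r' '\n'; }   # restore the original line breaks

# Replace everything between [ and ] even though it spans two lines.
printf 'a [b\nc] d\n' | fold_lines | sed 's/\[.*]/X/' | unfold_lines
# prints: a X d
```

Note that `.*` is greedy, so with several bracketed regions on the input this one-liner would swallow everything from the first `[` to the last `]`; the library function avoids that by marking and consuming one start/end pair per loop iteration.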

lib.sh:

#!/usr/bin/env sh
multiline_substitute_with_fn () {
  sub_start="$1"; shift; fn_name="$1"; shift; sub_end="$1"; shift; left="$(cat)";
  # uppercase () { cat | tr 'a-z' 'A-Z'; }; echo 'Hello [there]!' | multiline_substitute_with_fn '\[' uppercase '\]'

  # make single-line, sanitize input against _SUB(START|END)_, a\ra {{echo "b\rb"}} c {{echo d}} e
  # printf '%s\n' is used instead of echo throughout: some sh implementations
  # (e.g. dash) expand backslash escapes in echo's arguments, which would corrupt the input
  # note: $sub_start and $sub_end are spliced into sed unescaped, so a literal / in them must be written as \/
  left="$(printf '%s\n' "$left" | tr '\n' '\r' | sed 's/_SUB/_ASUB/g')"

  while [ ! -z "$left" ]; do
    left="$(printf '%s\n' "$left" | sed "s/$sub_start/_SUBSTART_/")" # a\ra _SUBSTART_echo "b\rb"}} c {{echo d}} e
    printf '%s' "$(printf '%s\n' "$left" | sed 's/_SUBSTART_.*//' | sed 's/_ASUB/_SUB/g' | tr '\r' '\n')" # a\na

    lefttmp="$(printf '%s\n' "$left" | sed 's/.*_SUBSTART_//' | sed "s/$sub_end/_SUBEND_/")" # echo "b\rb"_SUBEND_ c {{echo d}} e
    if [ "$lefttmp" = "$left" ]; then left=''; break; fi
    left="$lefttmp"

    middle="$(printf '%s\n' "$left" | sed 's/_SUBEND_.*//' | tr '\r' '\n')" # echo "b\nb"
    [ ! -z "$middle" ] && printf '%s' "$(printf '%s\n' "$middle" | $fn_name | sed 's/_ASUB/_SUB/g')" # b\nb
    left="$(printf '%s\n' "$left" | sed 's/.*_SUBEND_//')" # c {{echo d}} e
  done
}

Usage:

cat file | multiline_substitute_with_fn 'start regex' processingFunction 'end regex'

Example usage:

#!/usr/bin/env sh
. ./lib.sh # load lib

uppercase () { cat | tr 'a-z' 'A-Z'; };
echo 'Hello [there]!' | multiline_substitute_with_fn '\[' uppercase '\]'
# -> Hello THERE!

eval_template () { # not "safe" in terms of eval
  # echo 'a\na {{echo "b\nb"}} c {{echo d}} e' | eval_template # -> 'a\na b\nb c d e'
  # hello=hi; echo '{{=$hello}} there' | eval_template # -> {{echo "$hello"}} there -> 'hi there'
  fn () {
    middle="$(cat)"
    case "$middle" in =*) middle="echo \"${middle#=}\"" ;; *);; esac # '=$a' -> 'echo "$a"'
    eval "$middle"
  }
  cat | multiline_substitute_with_fn '{{' fn '}}'
}

eval_template <<-EOF
a
a {{echo "b
b"}} c {{echo d}} e
EOF
# -> a
# a b
# b c d e

echo '{{=$salut}} {{=$name}}!' > my.template
salut=Hi; name="Leo Name";
cat my.template | eval_template
# Hi Leo Name!

【Comments】:

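As a simplification, printf '%s' "$(...)" is basically equivalent to running ... directly, unless you want to discard any trailing newlines in the output of ....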
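
The difference this comment points at can be checked directly. In the sketch below, `emit` is a hypothetical stand-in for the inner pipeline; the byte counts show that command substitution drops all trailing newlines while a direct call keeps them.

```shell
#!/bin/sh
# Sketch: command substitution drops trailing newlines; a direct call keeps them.
emit () { printf 'x\n\n'; }        # hypothetical producer: "x" plus two newlines

emit | wc -c                       # counts 3 bytes: x and both newlines
printf '%s' "$(emit)" | wc -c      # counts 1 byte: only x survives
```

In the library function the stripping happens to be relied on: every intermediate value is fed onward with a newline appended, so dropping the trailing newline keeps extra line breaks from accumulating between the printed pieces.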