Get specific strings from one long string: answers

【Question title】: Get specific strings from one long string
【Posted】: 2018-12-14 14:52:50
【Question description】:

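I have a problem in shell: I am trying to extract some specific strings from one very long string.

The string is formatted like this:

Something(first:test, second:test2, third:test4, fourth:test4, fifth(Field(test:1, test2:test2,...)), Any1:test1, Any2:test3.

I want to get the strings that follow first, third and Any1. I can easily split on "," and read the pieces as array values, but I cannot predict at which position Any1 will appear, so I have to detect the "Any1" key itself.

How can I do that?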

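【Question comments】:

What is the expected output for the sample input you provided? It sounds like you want test, test4 and test1, but I don't want to assume.
Yes, you are right, the desired output is test, test4 and test1.
Possible typo: the input has three "(" characters but only two ")" characters.

Tags: arrays string shell split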

【Solution 1】:

A multi-character RS (record separator) might not work in every awk (*), but:

$ awk -v RS="[(,] *"  '             # split records at "(" or ",", eating trailing spaces
BEGIN {
    a["first"];a["third"];a["Any1"] # define the keywords we are interested in
}
split($0,b,":") && (b[1] in a) {    # split on ":", keep the record if its key is wanted
    print b[2]
}' file
test
test4
test1

(*) This works with GNU awk, mawk and Busybox awk, but not with bwk awk.
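For an awk that does not accept a regular expression as RS, the same idea can be sketched with split(), which takes a regular-expression separator in every awk I am aware of. This is only an illustration, not part of the original answer: the function name extract_with_awk and the array names want, parts and kv are made up here, and the sketch assumes the whole string sits on one input line.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): reads the long string
# on stdin and prints the values that follow the keys first, third and Any1.
extract_with_awk() {
    awk '
    BEGIN {
        want["first"]; want["third"]; want["Any1"]   # keys we care about
    }
    {
        # Break the line at "(" or ",", swallowing any spaces that follow,
        # instead of relying on a multi-character RS.
        n = split($0, parts, /[(,] */)
        for (i = 1; i <= n; i++) {
            # Keep only "key:value" pieces whose key is in the wanted set.
            if (split(parts[i], kv, ":") == 2 && (kv[1] in want))
                print kv[2]
        }
    }'
}
```

With the sample line stored in file, it would be called as extract_with_awk < file. Because the match is by key rather than by position, it does not matter where Any1 appears in the string.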

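【Comments】:

【Solution 2】:

The question as stated does not impose any conditions or constraints on the solution. On the other hand, it does mention the shell ("I have a problem in shell"). Here is a shell (bash) solution that uses only two standard Linux utilities: grep and cut. (Note, though, that it assumes grep supports the -P switch, which is not a valid assumption on every platform, although it seems fairly common on Linux these days.)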

$ cat solution.sh
#!/bin/bash

grep -Po '\b(first|third|Any1):\w+' | cut -d: -f2

$ cat infile.txt
Something(first:test, second:test2, third:test4, fourth:test4, fifth(Field(test:1, test2:test2,...)), Any1:test1, Any2:test3.

$ ./solution.sh < infile.txt
test
test4
test1
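Where grep has no -P, a similar result can be sketched with tr and sed. This is an assumption-laden alternative rather than the answer's own method: the function name extract_with_sed is hypothetical, and the sketch assumes a sed that accepts -E for extended regular expressions (GNU, BSD and Busybox sed do). Putting every field on its own line first lets the ^ anchor stand in for the \b word boundary.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): reads the long string
# on stdin and prints the values that follow the keys first, third and Any1.
extract_with_sed() {
    # 1. tr turns every "(" and "," into a newline: one field per line.
    # 2. sed prints only the value of lines that start (after optional
    #    spaces) with one of the wanted keys followed by ":".
    tr '(,' '\n\n' |
        sed -nE 's/^ *(first|third|Any1):([[:alnum:]_]+).*$/\2/p'
}
```

It is used the same way as the script above, for example extract_with_sed < infile.txt. The character class [[:alnum:]_] plays the role of \w, so a value stops at the first character that is not a letter, digit or underscore.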

【Comments】: