There may be a better way, but I suggest the following approach:
INPUT:
$ cat to_transform.txt
abc "27,422,734" def"27,422,734" def
ltu "123,734" abc "345,678,123,734" vtu
xtz "345,678,123,734" vtu "345,678,123,734"
u "1" a
"123"
iu"abc"a "123,734"
CMD:
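$ paste -d' ' <(grep -oP '(?<=")(?:\d+,\d+)+(?=")' to_transform.txt) <(grep -oP '(?<=")(?:\d+,\d+)+(?=")' to_transform.txt | sed -e 's/,//g;:loop s/\([0-9]\{4\}\)\($\|,\)/\2,\1/g; s/,,/,/g; /\([0-9]\{5\}\)/b loop') | awk '{cmd="sed -i 0,/"$1"/s/" $1 "/" $2 "/ to_transform.txt"; system(cmd)}'
OUTPUT:
$ cat to_transform.txt
abc "2742,2734" def"2742,2734" def
ltu "12,3734" abc "3456,7812,3734" vtu
xtz "3456,7812,3734" vtu "3456,7812,3734"
u "1" a
"123"
iu"abc"a "12,3734"
Code details and explanations:
- <(grep -oP '(?<=")(?:\d+,\d+)+(?=")' to_transform.txt) extracts every number to be processed from the input file. -o prints only the matched text, and -P enables the Perl-compatible regex syntax that the lookarounds need (GNU grep). The lookbehind and lookahead require the match to be enclosed in double quotes, and (?:\d+,\d+)+ matches comma-grouped numbers such as 27,422,734. Quoted values without a comma, such as "1" or "123", are not matched and stay untouched.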
- The sed command takes the output of the grep command and performs the following operations:
SED details:
s/,//g #remove every comma in the number
:loop #define a label to loop back to
s/\([0-9]\{4\}\)\($\|,\)/\2,\1/g #insert a comma before each group of 4 digits that is followed by the end of the string or by a comma, so the grouping proceeds from the right
s/,,/,/g #collapse any doubled commas introduced by the previous step
/\([0-9]\{5\}\)/b loop #if a run of at least 5 digits remains, branch back to the label and repeat
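As a minimal sketch of the two stages described above, the snippet below runs the extraction and the regrouping on a single sample line. The function name regroup and the variables line and extracted are illustrative and not part of the original command. The sed script is the same one, only split across several -e expressions, and the extraction pattern is written with a non-capturing group. GNU grep (with -P support) and GNU sed are assumed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: regroup comma-separated digits read from stdin
# into groups of four, counted from the right.
regroup() {
  # Same sed steps as listed above, one -e per step.
  sed -e 's/,//g' \
      -e ':loop' \
      -e 's/\([0-9]\{4\}\)\($\|,\)/\2,\1/g' \
      -e 's/,,/,/g' \
      -e '/\([0-9]\{5\}\)/b loop'
}

# Stage 1: extract the quoted, comma-grouped numbers from one sample line.
line='ltu "123,734" abc "345,678,123,734" vtu'
extracted=$(printf '%s\n' "$line" | grep -oP '(?<=")(?:\d+,\d+)+(?=")')
printf '%s\n' "$extracted"

# Stage 2: regroup every extracted number.
printf '%s\n' "$extracted" | regroup
```

With the sample line, stage 1 prints 123,734 and 345,678,123,734, and stage 2 prints 12,3734 and 3456,7812,3734, which matches the corresponding rows of the paste output.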
Temporary output after the paste operation:
27,422,734 2742,2734
27,422,734 2742,2734
123,734 12,3734
345,678,123,734 3456,7812,3734
345,678,123,734 3456,7812,3734
345,678,123,734 3456,7812,3734
123,734 12,3734
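Last but not least, the awk command reads this two-column output and, for each row, runs a sed command that replaces the value in the first column with the corresponding value in the second column: awk '{cmd="sed -i 0,/"$1"/s/" $1 "/" $2 "/ to_transform.txt"; system(cmd)}'. The 0,/regex/ address is a GNU sed extension that limits the substitution to the lines up to and including the first match, and because s is used without the g flag, each command rewrites only the first remaining occurrence. This is why a number that appears several times in the file also appears several times in the paste output.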
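Since the awk step edits the file in place, it can help to look at the generated commands before running them. The following is a small dry-run sketch, not part of the original answer: it builds the same command string but prints it instead of passing it to system(). The variables pairs and commands are illustrative names, and the two rows are copied from the paste output above.

```shell
#!/usr/bin/env bash

# Two sample rows in the same "old new" layout as the paste output.
pairs='123,734 12,3734
345,678,123,734 3456,7812,3734'

# Dry run: build the same command string as the original awk program,
# but print it instead of executing it with system().
commands=$(printf '%s\n' "$pairs" |
  awk '{cmd="sed -i 0,/"$1"/s/" $1 "/" $2 "/ to_transform.txt"; print cmd}')
printf '%s\n' "$commands"
```

The first line printed is sed -i 0,/123,734/s/123,734/12,3734/ to_transform.txt. Once the commands look right, replacing print cmd with system(cmd) gives back the original behavior.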