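Replace each nth occurrence of 'foo' and 'bar' on two distinct columns by the numerically respective nth line of a supplied file in the respective columns: answers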

【Question title】: Replace each nth occurrence of 'foo' and 'bar' on two distinct columns by numerically respective nth line of a supplied file in respective columns
【Posted】: 2021-10-12 13:33:12
【Question】:

I have a source.txt file like the one below, containing two columns of data. Each column in source.txt is wrapped in [ ] (square brackets), as shown here:

[hot] [water]
[16] [boots and, juice]

I also have another file, target.txt, which contains empty lines as well as a period at the end of each line:

the weather is today (foo) but we still have (bar). 

= (

the next bus leaves at (foo) pm, we can't forget to take the (bar).

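I want to replace the foo on the nth line of target.txt with the "respective content" of the first column of source.txt, and the bar on the nth line of target.txt with the "respective content" of the second column of source.txt.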

I searched other sources to work out how to do this. I already had a command to "replace each nth occurrence of 'foo' by numerically respective nth line of a supplied file", but I could not adapt it:

awk 'NR==FNR {a[NR]=$0; next} /foo/{gsub("foo", a[++i])} 1' source.txt target.txt > output.txt;

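I remember seeing a way to use gsub with two columns of data, but I cannot remember exactly what the difference was.

Edit: target.txt sometimes contains symbols such as = and ( and ) in its text. I added these symbols to the example because some answers will not work when they are present in target.txt.

Note: the number of lines in target.txt, and the number of times bar and foo occur in that file, can vary; I am only showing a sample. However, foo and bar each occur exactly once on a line.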

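【Comments on the question】:

Do you mean that line n of the source should be substituted into line n of the target? Your example has 2 lines in the source but 3 in the target.
So you mean the expected result for the first line is "the weather is today hot but we still have water", and for the second "the next bus leaves at 16 pm, we can't forget to take the boots and, juice"?
For that you just need two arrays. If the line numbers should always match, you can use FNR instead of i++.
Please edit your question to clarify the requirements.
@7beggars_nnnnm, can there be multiple instances of foo or bar on a single line? Or do they always correspond one-to-one with each line of the source file?

Tags: string awk replace gsub text-processing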

【Solution 1】:

With your shown samples, please try the following answer. Written and tested in GNU awk.

awk -F'\\[|\\] \\[|\\]' '
FNR==NR{
  foo[FNR]=$2
  bar[FNR]=$3
  next
}
NF{
  gsub(/\<foo\>/,foo[++count])
  gsub(/\<bar\>/,bar[count])
}
1
' source.txt FS=" " target.txt

Explanation: a detailed, line-by-line explanation of the above.

awk -F'\\[|\\] \\[|\\]' '       ##Setting field separator as [ OR ] [ OR ] here.
FNR==NR{                        ##Checking condition FNR==NR which will be TRUE when source.txt will be read.
  foo[FNR]=$2                   ##Creating foo array with index of FNR and value of 2nd field here.   
  bar[FNR]=$3                   ##Creating bar array with index of FNR and value of 3rd field here.
  next                          ##next will skip all further statements from here.
}
NF{                             ##If line is NOT empty then do following.
  gsub(/\<foo\>/,foo[++count])  ##Globally substituting foo with array foo value, whose index is count.
  gsub(/\<bar\>/,bar[count])    ##Globally substituting bar with array of bar with index of count.
}
1                               ##printing line here.
' source.txt FS=" " target.txt  ##Mentioning Input_files names here.
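
One thing to watch in the script above: the NF test advances count on every non-empty line, so a line such as = ( in target.txt uses up an index even though it contains neither tag. Below is a minimal sketch of one way around that, assuming the tags always appear as the literal strings (foo) and (bar). The fill_tags wrapper and the counter name n are made up for illustration, and the [][] separator is used only to keep the sketch free of gawk-specific features; it is a variant, not the answerer's code.

```shell
# Hypothetical helper: fill_tags SOURCE TARGET
fill_tags() {
    awk -F'[][]' '
    # First file: "[hot] [water]" splits into "", "hot", " ", "water", "".
    FNR == NR { foo[FNR] = $2; bar[FNR] = $4; next }

    # Second file: only lines that contain a tag advance the counter.
    /\(foo\)|\(bar\)/ {
        n++
        sub(/\(foo\)/, foo[n])   # replaces the tag including its parentheses
        sub(/\(bar\)/, bar[n])
    }
    { print }
    ' "$1" "$2"
}

# Example call: fill_tags source.txt target.txt
```

With the sample files from the question, the = ( line should pass through untouched and the second sentence should still receive 16 and boots and, juice. As in the original, sub() still gives & in a replacement value a special meaning.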

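Edit: also adding the following solution, which handles n occurrences of [...] in the source and matches them up in the target file. Since this is the solution that worked for the OP (confirmed in the comments), I am adding it here. Fair warning: this will fail when source.txt contains &.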

awk '
FNR==NR{
  while(match($0,/\[[^]]*\]/)){
    arr[++count]=substr($0,RSTART+1,RLENGTH-2)
    $0=substr($0,RSTART+RLENGTH)
  }
  next
}
{
  line=$0
  while(match(line,/\(?[[:space:]]*(\<foo\>|\<bar\>)[[:space:]]*\)?/)){
    val=substr(line,RSTART,RLENGTH)
    sub(val,arr[++count1])
    line=substr(line,RSTART+RLENGTH)
  }
}
1
' source.txt target.txt
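
The warning about & comes from how sub() and gsub() read their replacement string: an unescaped & stands for the matched text. A small sketch of the effect, and of one way to sidestep it by splicing the value in with index() and substr() instead. The amp_demo wrapper, the replace_literal function and the sample value are all made up for illustration.

```shell
# Hypothetical demo: sub() versus a literal splice when the value contains "&".
amp_demo() {
    awk '
    # Replace the first literal occurrence of old in str; no regex, no "&" handling.
    function replace_literal(str, old, new,    pos) {
        pos = index(str, old)
        if (pos == 0) return str
        return substr(str, 1, pos - 1) new substr(str, pos + length(old))
    }
    BEGIN {
        val = "salt & pepper"        # made-up value containing "&"
        s = "pass the (foo)"

        t = s
        sub(/\(foo\)/, val, t)       # "&" expands to the matched text "(foo)"
        print t

        print replace_literal(s, "(foo)", val)
    }'
}

amp_demo
# pass the salt (foo) pepper
# pass the salt & pepper
```

The splice treats both the tag and the value as plain strings, so it also avoids surprises from regex metacharacters in either of them.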

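【Discussion】:

@7beggars_nnnnm, we can continue here the chat we left off in the earlier comments.
@7beggars_nnnnm, OK, try changing awk -F'\\[|\\] \\[|\\]' to awk -F'\\[|\\] \\[|\\]|=|\\(|\\)' once. Fair warning: this is untested and based only on the edit you posted. Let me know how it goes.
@7beggars_nnnnm, sorry, I am confused now. I am not sure what input line you are using. Is it [hot] [water]? Or (hot]) (water)?
@7beggars_nnnnm, OK, based on your previous comment I have edited my working solution for you. Please try awk ' FNR==NR{ while(match($0,/\[[^]]*\]/)){ arr[++count]=substr($0,RSTART+1,RLENGTH-2); $0=substr($0,RSTART+RLENGTH) } next } { for(i=1;i<=NF;i++){ if($i~/^\(?[[:space:]]*(foo|bar)[[:space:]]*\)?$/){ $i=arr[++count1] } } } 1 ' source.txt target.txt. Let me know how it goes.
@7beggars_nnnnm, yes, you mean it does not work when the tag looks like ( foo). If that is the case, please try the answer from my previous comment once. It should work IMHO, although it is untested and I wrote it on a mobile device. Let me know.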

【Solution 2】:

Using any awk in any shell on every Unix box:

$ cat tst.awk
BEGIN {
    FS="[][]"
    tags["foo"]
    tags["bar"]
}
NR==FNR {
    map["foo",NR] = $2
    map["bar",NR] = $4
    next
}
{
    found = 0
    head = ""
    while ( match($0,/\([^)]+)/) ) {
        tag = substr($0,RSTART+1,RLENGTH-2)
        if ( tag in tags ) {
            if ( !found++ ) {
                lineNr++
            }
            val = map[tag,lineNr]
        }
        else {
            val = substr($0,RSTART,RLENGTH)
        }
        head = head substr($0,1,RSTART-1) val
        $0 = substr($0,RSTART+RLENGTH)
    }
    print head $0
}

$ awk -f tst.awk source.txt target.txt
the weather is today hot but we still have water.

= (

the next bus leaves at 16 pm, we can't forget to take the boots and, juice.
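
The script reads the two columns from $2 and $4 rather than $1 and $2 because FS="[][]" is a bracket expression matching a single [ or ], so every bracket in a source line starts a new field. A small sketch that prints the resulting fields; the show_fields helper is made up for illustration.

```shell
# Hypothetical helper: print each field produced by splitting on "[" or "]".
show_fields() {
    printf '%s\n' "$1" | awk -F'[][]' '{ for (i = 1; i <= NF; i++) printf "%d=<%s>\n", i, $i }'
}

show_fields '[hot] [water]'
# 1=<>
# 2=<hot>
# 3=< >
# 4=<water>
# 5=<>
```

Field 3 is the single space between the two bracketed columns, which is why the second column ends up in $4.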

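【Discussion】:

I saved the script in a file named tst.awk and ran awk -f tst.awk source.txt target.txt, but it does not produce the expected output. What am I doing wrong?
Sorry, I made a last-minute tweak and did not notice that it had broken the script so that it no longer produced the expected output, and I missed your comment. I have fixed it now.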

【Solution 3】:

awk '
    NR==FNR { # build lookup

        # delete gumph
        gsub(/(^[[:space:]]*\[)|(\][[:space:]]*$)/, "")

        # split
        split($0, a, /\][[:space:]]+\[/)

        # store
        foo[FNR] = a[1]
        bar[FNR] = a[2]

        next
    }

    !/[^[:space:]]/ { next } # ignore blank lines

    { # do replacements
        VFNR++ # FNR - (ignored lines)

        # can use sub if foo/bar only appear once
        gsub(/\<foo\>/, foo[VFNR])
        gsub(/\<bar\>/, bar[VFNR])

        print
    }
' source.txt target.txt

Note: \< and \> are not in POSIX, but they are accepted by some versions of awk (e.g. gawk). I am not sure whether POSIX awk regular expressions have a "word boundary" at all.
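
If \< and \> are not available, one possible workaround is to find the word with index() and check the characters on either side by hand. A minimal sketch, assuming a "word character" means an ASCII letter, digit or underscore; the replace_word names and the target_word and new_value variables are made up for illustration.

```shell
# Hypothetical helper: replace_word WORD VALUE < input
# Replaces WORD only where it is not touching another word character.
replace_word() {
    awk -v target_word="$1" -v new_value="$2" '
    function replace_word(str, word, val,    out, pos, start, before, after, wlen) {
        wlen = length(word)
        out = ""
        start = 1                                  # start of the unscanned part of str
        while ((pos = index(substr(str, start), word)) > 0) {
            pos += start - 1                       # position of the hit within str
            before = (pos > 1) ? substr(str, pos - 1, 1) : ""
            after  = substr(str, pos + wlen, 1)
            if (before !~ /[A-Za-z0-9_]/ && after !~ /[A-Za-z0-9_]/)
                out = out substr(str, start, pos - start) val          # whole word: swap in val
            else
                out = out substr(str, start, pos - start + wlen)       # part of a longer word: keep
            start = pos + wlen
        }
        return out substr(str, start)
    }
    { print replace_word($0, target_word, new_value) }
    '
}

printf '%s\n' 'foo food (foo) foobar barfoo foo.' | replace_word foo X
# X food (X) foobar barfoo X.
```

Because the value is concatenated rather than passed to gsub(), an & in it is copied as-is. Note that awk -v still interprets backslash escapes in the value.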

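【Discussion】:

The first line of the output is printed as .he weather is today hot but we still have water]
The problem is that I always get water] at the end of the printed line, even with the latest edit of your answer.
My awk version on Arch Linux is "GNU awk 5.1.0, API: 3.0 (GNU MPFR 4.1.0, GNU MP 6.2.1)".
Does your source.txt have a space after water]?
Sorry for the confusion; source.txt has no space after water], exactly as in my post, just a line break before the second line.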
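
The thread does not settle the cause. One possible explanation, which is only a guess and is not confirmed anywhere above, is that source.txt has Windows-style CRLF line endings: a carriage return left after water] would send the cursor back to the start of the line, so the final period would overwrite the leading t. If that is the case, stripping the carriage return before processing may help. The strip_cr helper and the output file name are made up for illustration.

```shell
# Hypothetical helper: remove a trailing carriage return from every input line.
strip_cr() {
    awk '{ sub(/\r$/, ""); print }'
}

# Example use (file names are placeholders):
#   strip_cr < source.txt > source.unix.txt
```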