sed headaches: inserting lines upon singular matches in file (NOT per line): answers

[Question title]: sed headaches: inserting lines upon singular matches in file (NOT per line)
[Posted]: 2015-05-16 14:40:27
[Question description]:

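After more than 8 hours of searching, I'm throwing in the towel and creating a new question for this. The operation is simple, but I'm having the hardest time getting it to work, and I seem to have gone through every other solution on SO. I need two things: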

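1.) Insert a line before the line containing the FIRST MATCH of PBS in the whole file. This should happen only once in the entire file. For some reason, every solution I've tried ends up repeating the insertion for every occurrence in the file; I suspect that's because sed works on a per-line basis.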

So this needs to happen. Original file:

stuff here  
stuff here  
PBS -N  
PBS -V  
stuff here

becomes:

stuff here  
stuff here  
**inserted line**  
PBS -N  
PBS -V  
stuff here

2.) Append a line after the line containing the LAST MATCH of "PBS" in the whole file. As before, this should happen only once in the entire file.

So this needs to happen:

stuff here  
stuff here  
PBS -N  
PBS -V  
stuff here

becomes:

stuff here  
stuff here  
PBS -N  
PBS -V  
**inserted line**  
stuff here

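All the solutions I've seen online (I have about 20 tabs open at this point) suggest this should be relatively easy. I'm not ashamed to admit that sed has bruised my ego at this point... Thanks to anyone who can help.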

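[Question comments]:

I feel your pain. One-character commands... not obvious. I really only use sed for simple search-and-replace, or for printing/deleting specific lines. For anything complex I use another, more readable language, or wh*re for rep here ;)

Tags: regex bash unix sed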

[Solution 1]:

Here are three approaches: two using sed and one using awk.

Using sed alone

Insert once before the first occurrence:

$ sed ':a;$!{N;ba}; s/PBS/inserted line\nPBS/' file
stuff here
stuff here
inserted line
PBS -N
PBS -V
stuff here

Insert once after the last occurrence:

$ tac file | sed ':a;$!{N;ba}; s/PBS/inserted line\nPBS/' | tac
stuff here
stuff here
PBS -N
PBS -V
inserted line
stuff here

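How it works

:a;$!{N;ba};

This reads the whole file in at once. (If the file is very large, you will want to look at one of the other approaches.)

s/PBS/inserted line\nPBS/

This performs the substitution. Because the whole file is now a single pattern space, and s without the g flag replaces only the first match, the line is inserted exactly once. (The \n in the replacement text is a GNU sed extension.)

tac

Normally, there is no way to know which PBS is the last one in the file until we have read the whole file. However, tac reverses the order of the lines, so the last occurrence becomes the first.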
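A minimal sketch of the slurp-and-tac approach wrapped in two shell functions (the function names are hypothetical). It assumes GNU sed and GNU tac and, unlike the one-liners above, anchors the match with [^\n]*PBS so the new line goes in front of the whole PBS line even when PBS is not at the start of that line. It also assumes the inserted text contains no /, & or backslash characters, because it is spliced straight into the s command.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes GNU sed (\n in the replacement, \n inside [^...])
# and GNU tac. Function names are hypothetical.
# Assumes the inserted text contains no /, & or backslash characters.

# Insert $1 as a new line before the first line containing PBS (reads stdin).
insert_before_first() {
    local text=$1
    # :a;$!{N;ba} slurps all input into one pattern space. [^\n]* cannot
    # cross a newline, so the leftmost match starts at the beginning of
    # the first PBS-containing line; & puts that text back after the insert.
    sed ':a;$!{N;ba}; s/[^\n]*PBS/'"$text"'\n&/'
}

# Insert $1 as a new line after the last line containing PBS (reads stdin).
insert_after_last() {
    local text=$1
    # Reverse the lines, insert before the (now first) PBS line, reverse back.
    tac | insert_before_first "$text" | tac
}
```

For example, `insert_after_last 'inserted line' < file` reproduces the second output shown above. Feed it input that ends with a newline, since tac joins an unterminated last line onto the next one.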

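Using awk

The main advantage of awk is that it makes variables easy to use. Here we create a flag, f, that is set to true once we have seen the first occurrence of PBS: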

$ awk '/PBS/ && !f {print "inserted line"; f=1} 1'  file
stuff here
stuff here
inserted line
PBS -N
PBS -V
stuff here

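To insert after the last occurrence, we could use the tac solution above. For variety, this approach reads the file twice. On the first pass, it records the line number of the last PBS. On the second, it prints what needs to be printed: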

$ awk 'NR==FNR{if (/PBS/)n=FNR;next} 1{print} n==FNR {print "inserted line"}'  file file
stuff here
stuff here
PBS -N
PBS -V
inserted line
stuff here

These awk solutions process the file one line at a time, which helps limit memory usage if the file is very large.
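The two-pass awk above needs a real file, since it reads it twice. If the input arrives on a pipe, one possible single-pass alternative (a sketch, not part of the original answer; the function name insert_after_last_stream is hypothetical) prints each PBS line as soon as it is seen and buffers only the lines that follow it, because those might still turn out to come after the last match:

```shell
#!/usr/bin/env bash
# Sketch only: insert a line after the LAST line matching PBS in one pass,
# so it also works on stdin. The function name is hypothetical.

insert_after_last_stream() {
    awk -v text="$1" '
        /PBS/ {
            # Everything buffered so far comes before this PBS line: flush it.
            for (i = 1; i <= n; i++) { print buf[i]; delete buf[i] }
            n = 0
            print          # print the PBS line itself right away
            seen = 1
            next
        }
        { buf[++n] = $0 }  # lines that may still follow the last PBS
        END {
            if (seen) print text   # the last PBS line has already been printed
            for (i = 1; i <= n; i++) print buf[i]
        }
    '
}
```

Memory use is bounded by the longest run of lines between PBS matches (or after the last one), not by the whole file. Note that awk interprets escape sequences such as \n in -v values.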

Using grep and sed

Another approach is to use grep to tell us the line number we need to work on. This inserts before the first occurrence:

$ sed "$(grep -n PBS file | cut -d: -f1 | head -n1)"' s/PBS/inserted line\nPBS/' file
stuff here
stuff here
inserted line
PBS -N
PBS -V
stuff here

To insert after the last one:

$ sed  "$(grep -n PBS file | cut -d: -f1 | tail -n1)"' s/.*PBS.*/&\ninserted line/' file
stuff here
stuff here
PBS -N
PBS -V
inserted line
stuff here

This approach does not need to read the whole file into memory at once.
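One caveat with the grep + sed commands above: if PBS does not occur at all, the address is empty, so sed applies the command to every line. With s/PBS/.../ that is harmless, but it matters for commands such as i or a. The following sketch (insert_around_pbs is a hypothetical name) does both insertions in a single sed call and checks for the no-match case first. It assumes GNU sed for the one-line `i text` / `a text` forms, which strip leading whitespace from the text.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes GNU sed (one-line "Ni text" and "Na text" forms).
# The function name is hypothetical.

insert_around_pbs() {
    local file=$1 before=$2 after=$3 first last
    # grep -n prints "LINENO:line"; cut keeps only the number.
    first=$(grep -n PBS "$file" | head -n1 | cut -d: -f1)
    last=$(grep -n PBS "$file" | tail -n1 | cut -d: -f1)
    if [ -z "$first" ]; then
        cat "$file"          # no PBS at all: print the file unchanged
        return
    fi
    # i inserts before line $first, a appends after line $last.
    sed -e "${first}i $before" -e "${last}a $after" "$file"
}
```

For example, `insert_around_pbs file 'inserted line before' 'inserted line after'` produces the same result as the awk script in Solution 4.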

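[Comments]:

Thanks to everyone who answered, but I ended up using the grep + sed solution here. Very elegant, thank you John.

[Solution 2]:

@John1924's answer is nice. In this case you can also do the job in an inefficient way, e.g.: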

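- print only the lines before the first PBS
- echo the new line
- print only the lines from the first PBS (inclusive) onwards

For example, when ./pbsfile contains the following: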

line 1
line 2
PBS -N first
PBS -N second
line 3
PBS -V last-1
PBS -V last
line 4
line 5

the above can be done like this:

pbsfile="./pbsfile"

(
#delete the lines after the 1st PBS
#so remains only the lines before the 1st PBS
sed  '/PBS/,$d' "$pbsfile"

#echo the needed line
echo "THIS SHOULD BE INSERTED BEFORE 1st PBS"

#print only the lines from the 1st PBS (inclusive) onwards
sed -n '/PBS/,$p' "$pbsfile"

)

produces:

line 1
line 2
THIS SHOULD BE INSERTED BEFORE 1st PBS
PBS -N first
PBS -N second
line 3
PBS -V last-1
PBS -V last
line 4
line 5

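Likewise, you can do the same for the last PBS; just reverse the file before and after sed, as follows (tail -r is available on BSD/macOS; GNU coreutils tail has no -r option, so use tac there instead):

pbsfile="./pbsfile"

(
tail -r "$pbsfile" | sed -n '/PBS/,$p' | tail -r
echo "THIS SHOULD BE INSERTED AFTER THE LAST PBS"
tail -r "$pbsfile" | sed  '/PBS/,$d' | tail -r
)

which produces:

line 1
line 2
PBS -N first
PBS -N second
line 3
PBS -V last-1
PBS -V last
THIS SHOULD BE INSERTED AFTER THE LAST PBS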
line 4
line 5

Again, this is only meant as an (inefficient) alternative solution.
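Because tail -r only exists on BSD-like systems, one way to make the reversal recipe in this answer portable is a small helper that uses tac when it is available and falls back to tail -r otherwise. This is a sketch; reverse_lines and insert_after_last_pbs are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch only. Function names are hypothetical.

# Reverse line order with tac (GNU) if present, else tail -r (BSD/macOS).
reverse_lines() {
    if command -v tac >/dev/null 2>&1; then
        tac "$@"
    else
        tail -r "$@"
    fi
}

# The "after the last PBS" recipe from this answer, using the helper.
insert_after_last_pbs() {
    local pbsfile=$1 text=$2
    # Lines up to and including the last PBS.
    reverse_lines "$pbsfile" | sed -n '/PBS/,$p' | reverse_lines
    echo "$text"
    # Lines after the last PBS.
    reverse_lines "$pbsfile" | sed '/PBS/,$d' | reverse_lines
}
```

Like the original, this prints the new line at the very top if the file contains no PBS at all.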

[Comments]:

[Solution 3]:

Another sed approach:

sed '/PBS/ {
  # insert the new line
  i\
inserted line
  # then loop over the rest of the file, implicitly printing each line
  :a; n; ba
}' file

For the last match, this version does not need tac:

sed '
  # read the whole file into pattern space
  :a; $!{N;ba}
  # then use greedy matching (.*) to get to the *last* PBS,
  # and [^\n]*, which cannot cross a newline, to get to the end of that line.
  s/.*PBS[^\n]*/&\ninserted line/
' file
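For the first-match case there is also a GNU-only address form, 0,/PBS/, which selects every line up to and including the first PBS line (the 0 lets the range end even when line 1 itself matches). Inside that range only the last line contains PBS, so the substitution fires once and the rest of the file streams through untouched. A minimal sketch, with a hypothetical function name and the same caveat that the text must not contain /, & or backslashes:

```shell
#!/usr/bin/env bash
# Sketch only. Assumes GNU sed (0,/re/ addresses and \n in the replacement).
# The function name is hypothetical.

# Insert $1 as a new line before the first line containing PBS (reads stdin).
insert_before_first_pbs() {
    # 0,/PBS/ ends at the first PBS line; only that line matches .*PBS,
    # so s/// runs once and & restores the original line after the insert.
    sed '0,/PBS/s/.*PBS/'"$1"'\n&/'
}
```

Unlike the :a; n; ba loop, this does not depend on leaving the script early; lines after the range simply do not match the address.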

【讨论】：

[Solution 4]:

sed is the wrong tool for this kind of job; it is meant for simple substitutions on individual lines. Just use awk:

$ cat tst.awk
NR  == FNR { if (/PBS/) hits[++numHits] = NR; next }
FNR == hits[1] { print "inserted line before" }
{ print }
FNR == hits[numHits] { print "inserted line after" }

$ awk -f tst.awk file file
stuff here
stuff here
inserted line before
PBS -N
PBS -V
inserted line after
stuff here

【讨论】：

【解决方案5】：

Here is an awk that reads the file only once (it does, however, keep every line in memory):

cat file
line 1
line 2
PBS -N first
PBS -N second
line 3
PBS -V last-1
PBS -V last
line 4
line 5

awk '/PBS/ {last=NR;if (!f) {first=NR;f=1}} {a[NR]=$0} END {for (i=1;i<=NR;i++) {if (i==first) a[i]="new line before\n"a[i];if (i==last) a[i]=a[i]"\nnew line after";print a[i]}}' file
line 1
line 2
new line before
PBS -N first
PBS -N second
line 3
PBS -V last-1
PBS -V last
new line after
line 4
line 5

How it works:

awk '                                       # Start
/PBS/ {                                     # Does the line contain "PBS"?
    last=NR                                 # Set last to the current line number
    if (!f) {                               # Is flag "f" false?
        first=NR                            # Yes: set first to the current line number
        f=1}}                               # and set flag "f"
    {
    a[NR]=$0}                               # Store every line in array "a"
END {
    for (i=1;i<=NR;i++) {                   # Loop through all lines
        if (i==first)                       # Is this the first matching line?
            a[i]="new line before\n"a[i]    # Add data before the line
        if (i==last)                        # Is this the last matching line?
            a[i]=a[i]"\nnew line after"     # Add data after the line
        print a[i]}}                        # Print the line
' file

【讨论】：

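[Solution 6]:

To get sed to do this properly, you have to work around its line-at-a-time operation and then rebuild that with raw regular expressions. Not hard, just a bit fiddly.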

sed -E 'H;$!d;g
        s/\n[^\n]*PBS/\ninsert before first PBS-containing line&/
        s/.*PBS[^\n]*/&\ninsert after last PBS-containing line/;
        s/.//
'

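H;$!d;g slurps the whole file into the hold buffer with an extra newline at the front: H appends the current line to the hold buffer, preceded by a \n; $!d deletes the pattern space (ending the cycle) if this is not the last line; and g (together with everything after it) therefore runs only on the last line, where it retrieves the hold buffer.

So s/\n[^\n]*PBS finds the newline before the first PBS-containing line, since every line is now preceded by a newline; s/.*PBS[^\n]*/ finds the last PBS plus the rest of its line (up to, but not including, the next newline); and s/.// removes the artificial leading newline we added to make the first-occurrence search work.

Note that you can extend the first-occurrence insert to an arbitrary nth occurrence by adding the count to that substitution, e.g. s/\n[^\n]*PBS/\netc&/4 for the fourth.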
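To illustrate that numeric flag, here is a sketch (the function name insert_before_nth_pbs is hypothetical, and GNU sed is assumed) that inserts a line before the nth PBS-containing line. Each match of \n[^\n]*PBS starts at a newline and cannot cross the next one, so matches never overlap and the nth match is the nth matching line:

```shell
#!/usr/bin/env bash
# Sketch only. Assumes GNU sed (-E, \n in brackets and replacements).
# The function name is hypothetical; the text must not contain /, & or \.

# Insert $2 as a new line before the $1-th line containing PBS (reads stdin).
insert_before_nth_pbs() {
    local n=$1 text=$2
    # H;$!d;g slurps the input with a leading newline; the trailing $n flag
    # on s picks the nth match; s/.// drops the artificial leading newline.
    sed -E 'H;$!d;g
            s/\n[^\n]*PBS/\n'"$text"'&/'"$n"'
            s/.//'
}
```

If there are fewer than n matching lines, the substitution simply does not fire and the input is printed unchanged.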

[Comments]: