Pulling text between two patterns with an awk script

【Question title】: Pulling text between two patterns with an awk script
【Posted】: 2016-10-30 16:58:05
【Question】:

Input text file:

This is a simple test file.
#BEGIN
These lines should be extracted by our script.

Everything here will be copied.
#END
That should be all.
#BEGIN
Nothing from here.
#END

Expected output:

These lines should be extracted by our script.

Everything here will be copied.

My awk script is:

#!/usr/bin/awk -f
$1 ~ /#BEGIN/{a=1;next};a;$1 ~ /#END/ {exit}

My current output is:

These lines should be extracted by our script.

Everything here will be copied.
#END

The only problem is that "#END" is still printed. I have been trying to get rid of it, but I can't figure out how.

【Comments】:

Try this: $1 ~ /#BEGIN/{a=1;next}$1 ~ /#END/ {exit}a
@user000001 I think that works. Could you explain this line? I just want to understand how it works.
OK, I'll add an answer.

Tags: regex linux bash awk

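【Solution 1】:

The cause becomes obvious, IMO, if we comment each command in the script. The script from the question can be written like this: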

#!/usr/bin/awk -f
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
  next          # skip to the next line
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}
$1 ~ /#END/ {   # if we match the END line
  exit          # exit the process 
}

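Note that I expanded a into the equivalent form a != 0 {print $0} to make this clearer.

So the script starts printing every line once the flag is set, and when it reaches the END line, it has already printed that line before exiting. Since you don't want the END line printed, the script has to exit before the print rule runs. The script should therefore become: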

#!/usr/bin/awk -f
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
  next          # skip to the next line
}
$1 ~ /#END/ {   # if we match the END line
  exit          # exit the process 
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}

Here we exit before the line is printed. In short form, this can be written as:

awk '$1~/#BEGIN/{a=1;next}$1~/#END/{exit}a' file

or slightly shorter:

awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' file
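
To check both short forms against the sample input from the question, a small shell sketch like the following can be used. The temporary file and the variable names (input, out1, out2) are only illustrative and are not part of the original answer.

```shell
#!/bin/sh
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
This is a simple test file.
#BEGIN
These lines should be extracted by our script.

Everything here will be copied.
#END
That should be all.
#BEGIN
Nothing from here.
#END
EOF

# Form 1: set the flag and skip the #BEGIN line, exit on #END, otherwise print.
out1=$(awk '$1~/#BEGIN/{a=1;next}$1~/#END/{exit}a' "$input")

# Form 2: exit on #END first, print while the flag is set, set the flag last.
out2=$(awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' "$input")

printf '%s\n' "$out1"
rm -f "$input"
```

Both forms give the same result on this input. They differ only when a #BEGIN line appears inside the block or an #END line appears before it.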

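Regarding the additional constraints raised in the comments: to avoid skipping a #BEGIN line that appears inside the block being printed, we should remove the next statement and reorder the rules as in the example above. The expanded form looks like this: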

#!/usr/bin/awk -f
$1 ~ /#END/ {   # if we match the END line
  exit          # exit the process 
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
}
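
The effect of dropping next is easiest to see on input with a nested #BEGIN line. The input below is made up for illustration (it is not from the question), as are the variable names with_next and reordered.

```shell
#!/bin/sh
# Hypothetical input: a second #BEGIN appears inside the first block.
nested_input() {
  printf '%s\n' '#BEGIN' 'first' '#BEGIN' 'second' '#END' 'after'
}

# With next: every #BEGIN line is skipped, including the nested one.
with_next=$(nested_input | awk '$1~/#BEGIN/{a=1;next}$1~/#END/{exit}a')

# Reordered, without next: the nested #BEGIN is printed because the flag
# is already set when the print rule sees that line.
reordered=$(nested_input | awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}')

printf 'with next:\n%s\n\nreordered:\n%s\n' "$with_next" "$reordered"
```

The very first #BEGIN is still not printed by the reordered version, because the flag is only set after the print rule has already run for that line.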

To avoid exiting when an END line is found before the block to be printed, we can check whether the flag is set before exiting:

#!/usr/bin/awk -f
$1 ~ /#END/ && a != 0 {   # if we match the END line and the flag is set
  exit          # exit the process 
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
}

Or in concise form:

awk '$1~/#END/&&a{exit}a;$1~/#BEGIN/{a=1}' file
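
As a sketch of how this final form might be reused, the same three rules can be wrapped in a shell function that takes the markers as arguments. The function name extract_first_block and the awk variables start_marker, end_marker and found are hypothetical. Unlike the answer's regex match, this variant compares the first field for exact equality, which is a deliberate change (so that a line such as #BEGINNING would not count as a marker).

```shell
#!/bin/sh
# extract_first_block START END [FILE...]
# Hypothetical helper: prints the lines between the first START line and the
# next END line. Markers are compared with ==, not matched as regexes.
extract_first_block() {
  start_marker=$1
  end_marker=$2
  shift 2
  awk -v start_marker="$start_marker" -v end_marker="$end_marker" '
    $1 == end_marker && found { exit }   # stop at the first END after START
    found                                # print each line inside the block
    $1 == start_marker { found = 1 }     # printing starts on the next line
  ' "$@"
}

# Made-up input: a stray #END first, then a block with a nested #BEGIN.
result=$(printf '%s\n' '#END' 'skipped' '#BEGIN' 'kept' '#BEGIN' 'also kept' '#END' 'ignored' |
  extract_first_block '#BEGIN' '#END')
printf '%s\n' "$result"
```

The leading #END is ignored because the flag is not yet set, and the nested #BEGIN is printed because it lies inside the first block.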

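【Comments】:

Thanks, this helped me understand awk better! However, I have some constraints that I didn't mention in my question. For any text file, I only want to extract whatever lies between the first #BEGIN and #END block. With this code, if there are two BEGINs followed by one END, it prints everything between the first BEGIN and the END but does not print the second #BEGIN (I want the second BEGIN printed, because it lies inside the first BEGIN/END block). Also, if the text file starts with #END, followed by a #BEGIN and #END block, it prints nothing. How can I ignore the first #END?
@asddddddaaaad2: I added more examples to handle these constraints.
How would I do the same thing in sed? So far I have: /#BEGIN/,/#END/!d *********** /#END/q ************ /#BEGIN/,/ #END/{/#BEGIN/d;/#END/d;p;}
@asddddddaaaad2: I'm not familiar enough with sed to attempt an equivalent script. You could ask VIPIN KUMAR under his answer, since he answered using sed. Otherwise, you could ask a new question.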

【Solution 2】:

Try the sed command below to get the desired output:

vipin@kali:~$ sed  '/#BEGIN/,/#END/!d;/END/q' kk.txt|sed '1d;$d'
These lines should be extracted by our script.

Everything here will be copied.
vipin@kali:~$

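Explanation:

The d command deletes lines, and the ! negates the address, so /#BEGIN/,/#END/!d deletes every line that is not inside a #BEGIN to #END range and keeps only the lines within a block. /END/q then quits as soon as a line containing END is reached, which stops processing after the first block. The second sed command, 1d;$d, deletes the first and last of the remaining lines, which in our case are #BEGIN and #END.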
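
The two sed invocations can probably be folded into one. The following is a minimal sketch, not the answerer's command: it assumes that -n (suppress automatic printing) combined with an explicit p is acceptable, and it has only been reasoned through against the sample input from the question. The variable name out is illustrative.

```shell
#!/bin/sh
# Single-pass variant: inside the #BEGIN..#END range, drop the #BEGIN line,
# quit at the #END line (nothing is printed because of -n), print the rest.
out=$(printf '%s\n' \
  'This is a simple test file.' \
  '#BEGIN' \
  'These lines should be extracted by our script.' \
  '' \
  'Everything here will be copied.' \
  '#END' \
  'That should be all.' \
  '#BEGIN' \
  'Nothing from here.' \
  '#END' |
sed -n '/#BEGIN/,/#END/{
  /#BEGIN/d
  /#END/q
  p
}')
printf '%s\n' "$out"
```

Note that this variant deletes every #BEGIN line inside the range, so a nested #BEGIN would not be printed. An #END line that appears before the first #BEGIN is ignored, since the range has not started yet.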

【Comments】: