AWK: dynamically change FS or RS (answers)

【Question title】: AWK: dynamically change FS or RS
【Posted】: 2020-06-03 10:49:22
【Question】:

I can't seem to get the hang of dynamically switching the FS/RS variables so that I get the following output from this input:

Input file

header 1
header 2
{
something that should not be removed
}

50

( 
auto1
{
    type        good;
    remove      not useful;
}

 auto2
{
    type        good;
    keep        useful;
}

 auto3
{
    type        moderate;
    remove      not useful;
}
)

Output file

header 1
header 2
{
something that should not be removed
}

50

( 
auto1//good
{
    type        good;//good
}

auto2//good
{
    type        good;//good
    keep        useful;
}

auto3//moderate
{
    type        moderate;//moderate
}
)

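The key points are:

If a code block {...} is not preceded by autoX (X can be 1, 2, 3, etc.), nothing should change.
When autoX is followed by a code block {...}, the changes should be applied.
The autoX line and the value inside the code block are modified by appending //good or //moderate, which has to be read from {...} itself.
If a line contains the word remove, the whole line should be deleted from {...}.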

Hint: this can probably be done with regex, along the lines of the idea explained here, especially the example.

So far I can only meet the last requirement, with the following code:

awk ' {$1=="{"; FS=="}";} {$1!="}"; gsub("remove",""); print NR"\t\t"$0}' Input_file
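A side note on the attempt above, offered as a hedged sketch rather than a fix: `==` is a comparison, so `FS=="}"` never assigns anything, and even a real assignment to FS inside a rule only affects records read afterwards in POSIX-conforming awks (gawk, mawk and current versions of the one-true-awk behave this way; some very old implementations reportedly differ). The sample data and the variable names `late`, `resplit` and `early` are made up for illustration.

```shell
# FS assigned inside a rule: the current record was already split with the old FS,
# so the first line still yields "a:b" and only the second line is split on ":".
late=$(printf 'a:b c\nd:e f\n' | awk '{ FS = ":"; print $1 }' | paste -sd ' ' -)

# Assigning $0 to itself forces the record to be re-split with the current FS.
resplit=$(printf 'a:b c\nd:e f\n' | awk '{ FS = ":"; $0 = $0; print $1 }' | paste -sd ' ' -)

# Setting FS before any input is read (BEGIN block or -F) applies to every record.
early=$(printf 'a:b c\nd:e f\n' | awk 'BEGIN { FS = ":" } { print $1 }' | paste -sd ' ' -)

printf 'late:    %s\nresplit: %s\nearly:   %s\n' "$late" "$resplit" "$early"
```

This is why both answers below avoid switching FS or RS per line: one sets RS once in BEGIN, the other keeps the defaults and tracks state in variables.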

Thanks in advance for your skill and time in solving this with awk.

【Question comments】:

Tags: regex awk environment-variables text-manipulation

【Solution 1】:

You can use two newlines as the record separator and process each record, which may contain one block of the form

autoX
{
  ...
  ...
}

awk '
BEGIN{
  RS="\n\n"                          # set record separator RS to two newlines
  a["good"]; a["moderate"]           # create array a with indices "good" and "moderate"
}                                    
{                                    
  sub(/\n[ \t]+remove[^;]+;/, "")    # remove line containing "remove xxx;"
  for (i in a){                      # loop array indices "good" and "moderate"
    if (index($0, i)){               # if value exists in record
      sub(i";", i";//"i)             # add "//good" to "good;" or "//moderate" to "moderate;"
      match($0, /(auto[0-9]+)/)      # get pos. RSTART and length RLENGTH of "autoX"
      if (RSTART){                   # RSTART > 0 ?
                                     # set prefix including "autox", "//value" and suffix
        $0=substr($0, 1, RSTART+RLENGTH-1) "//"i substr($0, RSTART+RLENGTH)
      }
      break                          # stop looping (we already replaced "autoX")
    }
  }
  printf "%s", (FNR==1 ? "" : RS)$0  # print modified line prefixed by RS if not the first line
}
' Input_file

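【Comments】:

This is more compact, but there is a problem with the gensub command, which is only supported by GAWK. Is there anything more widely applicable, for example to NAWK/MAWK?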
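
On portability: the program above does not call gensub, but `RS="\n\n"` relies on a multi-character RS, whose behavior POSIX leaves unspecified (gawk and mawk treat it as a regular expression). A possible variant, sketched here under the assumption that blank lines in the input are truly empty, uses the POSIX paragraph mode `RS=""` instead. The rest of the logic is taken from the program above; the inline sample data is made up for the demonstration.

```shell
sample='header
{
keep me
}

auto1
{
    type        good;
    remove      not useful;
}

auto3
{
    type        moderate;
    remove      not useful;
}
'

out=$(printf '%s' "$sample" | awk '
BEGIN {
  RS = ""                            # paragraph mode: blank lines separate records
  a["good"]; a["moderate"]           # known type values, as in the program above
}
{
  sub(/\n[ \t]+remove[^;]+;/, "")    # drop the "remove ...;" line
  for (i in a) {
    if (index($0, i)) {
      sub(i ";", i ";//" i)          # annotate the type line
      if (match($0, /auto[0-9]+/))   # annotate the autoX line
        $0 = substr($0, 1, RSTART + RLENGTH - 1) "//" i substr($0, RSTART + RLENGTH)
      break
    }
  }
  printf "%s%s", (NR == 1 ? "" : "\n\n"), $0   # one blank line between records
}
END { if (NR) printf "\n" }          # terminate the last record
')
printf '%s\n' "$out"
```

One trade-off to be aware of: paragraph mode ignores leading blank lines and collapses runs of several blank lines into one separator, so the original spacing is not always reproduced exactly.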

【Solution 2】:

Here is my attempt at solving this problem:

awk '
FNR==NR{
  if($0~/auto[0-9]+/){
    found1=1
    val=$0
    next
  }
  if(found1 && $0 ~ /{/){
    found2=1
    next
  }
  if(found1 && found2 && $0 ~ /type/){
    sub(/;/,"",$NF)
    a[val]=$NF
    next
  }
  if($0 ~ /}/){
    found1=found2=val=""
  }
  next
}
found3 && /not useful/{
  next
}
/}/{
  found3=val1=""
}
found3 && /type/{
  sub($NF,$NF"//"a[val1])
}
/auto[0-9]+/ && $0 in a{
  print $0"//"a[$0]
  found3=1
  val1=$0
  next
}
1
'  Input_file  Input_file

Explanation: adding a detailed explanation of the above code here.

awk '                                      ##Starting awk program from here.
FNR==NR{                                   ##FNR==NR will be TRUE when first time Input_file is being read.
  if($0~/auto[0-9]+/){                     ##Check condition if a line is having auto string followed by digits then do following.
    found1=1                               ##Setting found1 to 1 which makes sure that the line with auto is FOUND to later logic.
    val=$0                                 ##Storing current line value to variable val here.
    next                                   ##next will skip all further statements from here.
  }
  if(found1 && $0 ~ /{/){                  ##Checking condition if found1 is SET and line has { in it then do following.
    found2=1                               ##Setting found2 value as 1 which tells program further that after auto { is also found now.
    next                                   ##next will skip all further statements from here.
  }
  if(found1 && found2 && $0 ~ /type/){     ##Checking condition if found1 and found2 are SET AND line has type in it then do following.
    sub(/;/,"",$NF)                        ##Substituting semi colon in last field with NULL.
    a[val]=$NF                             ##creating array a with index val and its value is last column of current line.
    next                                   ##next will skip all further statements from here.
  }
  if($0 ~ /}/){                            ##Checking if line has } in it then do following, which basically means previous block is getting closed here.
    found1=found2=val=""                   ##Nullify all variables value found1, found2 and val here.
  }
  next                                     ##next will skip all further statements from here.
}
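found3 && /not useful/{                    ##Statements from here will be executed when 2nd time Input_file is being read, checking if found3 is SET and line has not useful in it.
  next                                     ##next will skip this line, so it is not printed.
}
/}/{                                       ##Checking if line has } in it, which means the block is getting closed here.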
  found3=val1=""                           ##Nullifying found3 and val1 variables here.
}
found3 && /type/{                          ##Checking if found3 is SET and line has type keyword in it then do following.
  sub($NF,$NF"//"a[val1])                  ##Substituting last field value with last field and array a value with index val1 here.
}
/auto[0-9]+/ && $0 in a{                   ##Searching string auto with digits and checking if current line is present in array a then do following.
  print $0"//"a[$0]                        ##Printing current line // and value of array a with index $0.
  found3=1                                 ##Setting found3 value to 1 here.
  val1=$0                                  ##Setting current line value to val1 here.
  next                                     ##next will skip all further statements from here.
}
1                                          ##1 will print all edited/non-edited lines here.
'  Input_file  Input_file                  ##Mentioning Input_file names here.
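
The core idiom in the program above is the two-pass read: the same file is named twice, and `FNR==NR` is true only while the first copy is being read. A stripped-down sketch of just that idiom follows. The sample data, the temporary file created with `mktemp` (widely available, though not part of POSIX) and the names `kind` and `name` are assumptions made for illustration.

```shell
tmp=$(mktemp)
printf 'auto1\ntype good;\nauto2\ntype moderate;\n' > "$tmp"

out=$(awk '
FNR == NR {                      # first pass: true only for the first file argument
  if ($1 == "type") {
    sub(/;$/, "", $2)            # strip the trailing semicolon from the value
    kind[name] = $2              # remember the type under the preceding name
  } else {
    name = $1                    # any other line is taken as a block name here
  }
  next                           # do not fall through to the second-pass rules
}
($1 in kind) {                   # second pass: annotate names seen in the first pass
  print $1 "//" kind[$1]
  next
}
{ print }                        # second pass: print everything else unchanged
' "$tmp" "$tmp")

rm -f "$tmp"
printf '%s\n' "$out"
```

Note that `FNR==NR` is also true for the second file when the first one is empty, so this idiom assumes non-empty input.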

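【Comments】:

Could you please add some explanation of the code? Then I would be able to understand what is going on.
@massisenergy, a detailed explanation has been added now; please have a look and let me know if you have any questions.
@massisenergy, thank you for correcting the typo. I was rushing to a meeting and forgot to fix it afterwards. Thanks.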