How to combine the two awk files into one? Answers

【Question title】: How to combine the two awk files into one?
【Posted】: 2017-04-04 13:55:00
【Question description】:

This is a raw awk file that I want to format.

Input ---- the raw awk file, named test.txt

awk 'BEGIN {maxlength = 0}\
     {\
           if (length($0) > maxlength) {\
                maxlength = length($0);\
                longest = $0;\
           }\
     }\
     END   {print longest}' somefile

Expected output ---- the well-formatted awk file

awk 'BEGIN {maxlength = 0}                      \
     {                                          \
           if (length($0) > maxlength) {        \
                maxlength = length($0);         \
                longest = $0;                   \
           }                                    \
     }                                          \
     END   {print longest}' somefile

Step 1: get the length, in characters, of the longest line

step1.awk

#!/usr/bin/awk -f
# Track the length of the longest input line and print it at the end.
BEGIN { max = 0 }
{
    if (length($0) > max) { max = length($0) }
}
END { print max }

awk -f step1.awk test.txt

This shows that the maximum length over all lines is 50.

Step 2: put the \ at position 50+2=52.

step2.awk

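#!/usr/bin/awk -f
# n is passed in with -v n=...; lines ending in a backslash are padded to
# n columns and the backslash is re-appended, all other lines pass through.
{
    if ($0 ~ /\\$/) {
        gsub(/\\$/, "", $0);
        printf("%-*s\\\n", n, $0);
    }
    else {
        printf("%s\n", $0);
    }
}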

awk -f step2.awk -v n=52 test.txt > well_formatted.txt

How can I combine step 1 and step 2 into a single step, and merge step1.awk and step2.awk into one awk file?

【Question discussion】:

Post the input and the expected output to get help quickly

Tags: bash awk

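【Solution 1】:

A better version: you can use sub() instead of gsub(), and avoid testing the same regex twice by using the substitution itself as the pattern, sub(/\\$/,""){ ... }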

awk 'FNR==NR{ 
             if(length>max)max = length 
             next
     }
     sub(/\\$/,""){
             printf "%-*s\\\n", max+2, $0
             next
     }1' test.txt test.txt

Explanation

awk 'FNR==NR{                             # First pass over the file: find the
                                          # maximum line length.
                                          # FNR==NR is true only while awk reads the first file

             if(length>max)max = length   # track the maximum length
             next                         # skip the remaining rules, go to the next line
     }
     sub(/\\$/,""){                       # Second pass over the same file:
                                          # true if a trailing backslash was removed from the record

             printf "%-*s\\\n", max+2, $0 # pad to max+2 columns, then re-append the backslash
             next                         # go to the next line
     }1                                   # 1 triggers the default action, print $0,
                                          # i.e. the else branch printf("%s\n",$0) of step2
     ' test.txt test.txt

You haven't shown us what your input and expected output are, so this rests on some assumptions.

If your input looks like this

akshay@db-3325:/tmp$ cat f
123      \
\
12345     
123456   \
1234567  \
123456789 
12345

you get the following output

akshay@db-3325:/tmp$ awk 'FNR==NR{ if(length>max)max = length; next}
sub(/\\$/,"",$0){ printf "%-*s\\\n",max+2,$0; next }1' f f
123         \
            \
12345     
123456      \
1234567     \
123456789 
12345
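
The question also asks for a single awk file. A minimal sketch of one way to get there, assuming the whole input fits in memory, is to buffer every line in an array and do the padding in END, so the input only has to be named once. This is a different technique from the two-pass command above, and the names combined.awk, sample.txt, line, max and fmt are made up for illustration.

```shell
# Write a hypothetical single-file version (combined.awk is an assumed name).
cat > combined.awk <<'EOF'
# Read the input once, remembering every line and the longest length.
{
    line[NR] = $0
    if (length($0) > max) max = length($0)
}
# After the last line, print everything; a line that ended in a backslash
# is padded to max+2 columns and gets its backslash back.
END {
    fmt = "%-" (max + 2) "s\\\n"
    for (i = 1; i <= NR; i++) {
        s = line[i]
        if (sub(/\\$/, "", s))
            printf fmt, s
        else
            print s
    }
}
EOF

# Small sample input (assumed, for illustration only).
printf '%s\n' 'a \' 'bbbb \' 'cc' > sample.txt

result=$(awk -f combined.awk sample.txt)
printf '%s\n' "$result"
```

With this layout the call becomes awk -f combined.awk test.txt, at the cost of holding the file in memory.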

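【Discussion】:

You can use sub() instead of gsub(); to keep your code's originality I didn't modify it, I just merged it into one
Almost the same code (with some slight differences for the purpose), hence +1 (but it is objective :-D).
@NeronLeVelu Thanks, the OP didn't show anything other than the code :)
The only remaining problem is one inherited from the question: it counts characters, so a tab is 1 character but prints looking like up to 8 (usually), so the backslashes in the output won't necessarily look aligned. Try converting all 8-space sequences in your input file to tabs, then run your tool to see what I mean. If you wanted to make this "work" robustly (assuming "work" means just getting the right characters in the right place) then you'd need pr -e -t file | awk '...', but then you can't run awk twice on the input, and any literal tabs in the print strings would be converted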
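
One way to sidestep the tab issue from the last comment while keeping the two-pass approach is to expand the tabs into a temporary copy first and run awk on that copy twice. The sketch below uses the POSIX expand utility rather than pr -e -t; the file names tabs.txt and expanded.txt are assumptions, and, as the comment warns, tabs inside string literals are converted as well.

```shell
# Sample input containing a literal tab (assumed, for illustration only).
printf 'x\ty \\\n' > tabs.txt
printf 'a long line without tabs \\\n' >> tabs.txt

# expand replaces each tab with spaces up to the next 8-column tab stop,
# so length() then matches the width the line occupies on screen.
expand tabs.txt > expanded.txt

# Same two-pass idea as in the answer, run on the tab-free copy.
result=$(awk 'FNR==NR { if (length($0) > max) max = length($0); next }
     sub(/\\$/, "") { printf("%-" (max + 2) "s\\\n", $0); next }
     1' expanded.txt expanded.txt)
printf '%s\n' "$result"
```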

【Solution 2】:

awk '
   # first round 
   FNR == NR {
      # take longest (compare and take longest line by line)
      M = M < (l = length( $0) ) ? l : M
      # go to next line
      next
      }

   # for every line of second round (due to previous next) that finish by /
   /[/]$/ {          
      # if a modification is needed
      if ( ( l = length( $0) ) < M ) {
         # add the missing space (using sprintf "%9s" for 9 spaces)
         sub( /[/]$/, sprintf( "%" (M - l) "s/", ""))
         }
      }
  # print all line [modified or not] (7 is private joke but important is <> 0 )
  7
  ' test.txt test.txt

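Notes:

The file is listed twice at the end because it has to be read twice
Assumes nothing follows the last / (no whitespace). This could easily be adapted, but it is not the purpose here
Assumes that lines without a / are not modified but are still printed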
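
The script above matches a forward slash, whereas the lines in the question end with a backslash. A minimal sketch of the same idea adapted to backslashes could look like this. It is an adaptation, not the answer's own code: it rebuilds the line with sprintf instead of calling sub(), to avoid escaping a backslash in a replacement string, and the names maxlen, len and sample.txt are chosen for illustration. As in the original, lines are padded to the length of the longest line, so the longest line itself is left untouched.

```shell
# Small sample input (assumed, for illustration only).
printf '%s\n' 'a\' 'bbbb\' 'cc' > sample.txt

result=$(awk '
   # first pass: remember the longest line length
   FNR == NR {
      len = length($0)
      if (len > maxlen) maxlen = len
      next
   }
   # second pass: lines ending in a backslash that are shorter than the longest
   /\\$/ && length($0) < maxlen {
      # rebuild the line: text without the backslash, left-justified in
      # maxlen-1 columns, then the backslash in column maxlen
      $0 = sprintf("%-" (maxlen - 1) "s\\", substr($0, 1, length($0) - 1))
   }
   # print every line, modified or not
   1
   ' sample.txt sample.txt)
printf '%s\n' "$result"
```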

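【Discussion】:

There seems to be some confusion between forward slashes and backslashes (maybe the confusion is mine :). /[/]$/ also gives me a syntax error.
Upvoted for your "7 is private joke" ++. I guess it should be a backslash, shouldn't it?
No, it comes from another Stack user who explained that on his keyboard typing 7 is faster than typing 1, so he chose 7. Every time I pick it in code I keep the habit with a smile (and another one when people reading the code try to work out why)
Ah yes, there's nothing more fun than obfuscated code :-). Speaking of which: never use the letter l as a variable name, because it looks far too much like the number 1.
On my keyboard layout, } is produced with AltGr-0. If it were under any other digit, 9 for example, I could type AltGr-9 9 to close the last code block and print: ...}9, but it had to be under 0. Whoever designed this layout was surely not an awk wizard.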

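【Solution 3】:

Here's one for GNU awk. It makes two passes: the first finds the maximum length and the second produces the output. FS is set to "" so that each character lands in its own field and the last character is in $NF: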

$ awk 'BEGIN{FS=OFS=""}NR==FNR{m=(m<NF?NF:m);next}$NF=="\\"{$NF=sprintf("% "m-NF+2"s",$NF)}1' file file

Output:

awk 'BEGIN {maxlength = 0}               \
     {                                   \
           if (length($0) > maxlength) { \
                maxlength = length($0);  \
                longest = $0;            \
           }                             \
     }                                   \
     END   {print longest}' somefile

Explanation:

BEGIN     { FS=OFS="" }                         # each char on different field
NR==FNR   { m=(m<NF?NF:m); next }               # find max length
$NF=="\\" { $NF=sprintf("% " m-NF+2 "s",$NF) }  # NF gets space padded
1                                               # output

If you want the \s further away from the code, change the 2 in the sprintf to suit your taste.
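
Splitting a record into one field per character with an empty FS is an extension that POSIX does not specify, so it may not behave this way in other awks. A sketch of the same two-pass idea that relies only on length() and substr() might look like the following. It is an adaptation rather than the answer's code, and sample.txt is an assumed file name. The total line length comes out as m+1, which is what the m-NF+2 width above works out to.

```shell
# Small sample input (assumed, for illustration only).
printf '%s\n' 'ab\' 'abcdef\' 'xyz' > sample.txt

result=$(awk '
   # pass 1: find the maximum line length
   NR == FNR { if (length($0) > m) m = length($0); next }
   # pass 2: lines whose last character is a backslash
   substr($0, length($0)) == "\\" {
      # strip the backslash, pad the text to m columns, then put the
      # backslash back in column m+1
      $0 = sprintf("%-" m "s\\", substr($0, 1, length($0) - 1))
   }
   # print every line, modified or not
   1
   ' sample.txt sample.txt)
printf '%s\n' "$result"
```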

【Discussion】:

【Solution 4】:

Maybe something like this?

wc -L test.txt | cut -f1 -d' ' | xargs -I{} sed -i -e :a -e 's/^.\{1,'{}'\}$/& /;ta' test.txt && sed -i -r 's/(\\)([ ]*)$/\2\1/g' test.txt

【Discussion】: