【Question title】: Ordering several tables in the same file using awk
【Posted】: 2015-03-11 12:13:53
【Question】:

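My workflow produces files that contain simple tables, each with a two-line header (see the end of the post). I want to sort these tables numerically: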

(head -n 2 && tail -n +3 | sort -n -r) > ordered.txt

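This works fine for a single table, but I don't know how to split the file so that I can sort each table and print all of them into one file. My attempt was: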

awk '/^TARGET/ {(head -n 2 && tail -n +3 | sort -n -r) >> ordered.txt}' output.txt

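However, this results in an error message. I would like to avoid any intermediate output files. What is missing from my awk command?

The input file looks like this: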

TARGET  1
Sample1 Sample2 Sample3 Pattern
3   3   3   z..........................Z........................................z.........Z...z
147 171 49  Z..........................Z........................................Z.........Z...Z
27  28  13  z..........................Z........................................z.........z...z
75  64  32  Z..........................Z........................................Z.........z...Z

TARGET  2
Sample1 Sample2 Sample3 Pattern
2   0   1   z..........................z........................................z.........Z...Z
21  21  7   z..........................Z........................................Z.........Z...Z
1   0   0   ...........................Z........................................Z.............Z
4   8   6   Z..........................Z........................................z.........Z...z
2   0   1   Z..........................Z........................................Z.........Z....
1   0   0   z..........................Z........................................Z.............Z
1   0   0   z...................................................................Z.........Z...Z

TARGET  3
Sample1 Sample2 Sample3 Pattern
1   0   0   z..........................Z........................................z.............z
1   3   0   z..........................z........................................Z.........Z...Z
1   1   0   Z..........................Z........................................Z.............z
1   0   0   Z..........................Z........................................Z.............Z
0   1   2   ...........................Z........................................Z.........Z...Z
0   0   1   z..........................z........................................z..............

My output should look like this, with no lines removed:

TARGET  1
Sample1 Sample2 Sample3 Pattern
147 171 49  Z..........................Z........................................Z.........Z...Z
75  64  32  Z..........................Z........................................Z.........z...Z
27  28  13  z..........................Z........................................z.........z...z
3   3   3   z..........................Z........................................z.........Z...z

TARGET  2
Sample1 Sample2 Sample3 Pattern
21  21  7   z..........................Z........................................Z.........Z...Z
4   8   6   Z..........................Z........................................z.........Z...z
2   0   1   z..........................z........................................z.........Z...Z
2   0   1   z..........................z........................................z.........Z...Z
1   0   0   ...........................Z........................................Z.............Z
1   0   0   ...........................Z........................................Z.............Z
1   0   0   ...........................Z........................................Z.............Z

TARGET  3
Sample1 Sample2 Sample3 Pattern
1   0   0   z..........................Z........................................z.............z
1   0   0   z..........................Z........................................z.............z
1   0   0   z..........................Z........................................z.............z
1   0   0   z..........................Z........................................z.............z
0   1   2   ...........................Z........................................Z.........Z...Z
0   0   1   z..........................z........................................z..............

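【Question comments】:

  • It is unclear what the output should be.
  • The output should look like glenn jackman's output, only with the numbers in descending order.
  • So you want half of table 2 to disappear?
  • What your awk command is missing is the awk language. awk is not a shell, just as C is not a shell. It is a completely separate tool with its own language. I'm with @Jidder: I have no idea what your output should be, so please add an explanation to clarify.
  • No, I don't want any line to disappear. I just noticed that this is what happens with glenn's code. Thanks for the hint!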
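
As the comment above points out, shell syntax such as `head`, `tail` and `|` cannot appear inside an awk action. What awk does provide is `print | "command"`, which pipes output into an external command, and `close()`, which ends that pipe. A minimal sketch along those lines, assuming every table starts with a `TARGET` line followed by one more header line and ends at a blank line or at end of file. The temporary directory, the `cmd` variable and the shortened two-table sample are made up for illustration:

```shell
# Made-up sample in the layout from the question, written to a temp dir.
tmp=$(mktemp -d)
cat > "$tmp/output.txt" <<'EOF'
TARGET  1
Sample1 Sample2 Sample3 Pattern
3   3   3   z..Z
147 171 49  Z..Z
27  28  13  z..z

TARGET  2
Sample1 Sample2 Sample3 Pattern
2   0   1   z..Z
21  21  7   Z..Z
EOF

awk '
    # Header: print both lines and flush them before sort writes anything.
    /^TARGET/ { print; getline; print; fflush(); next }
    # Blank line: closing the pipe makes sort emit the finished table.
    /^$/      { close(cmd); print; fflush(); next }
    # Data row: hand it to an external sort (numeric, descending).
              { print | cmd }
    # End of file: flush the last table.
    END       { close(cmd) }
' cmd='sort -n -r' "$tmp/output.txt" > "$tmp/ordered.txt"

cat "$tmp/ordered.txt"
```

No intermediate file is needed because `sort` inherits awk's standard output. The explicit `fflush()` calls are there so that awk's buffered header lines reach the file before `sort` writes its rows.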

Tags: awk


【Solution 1】:

This requires GNU awk, which supports sorted array traversal:

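gawk '
    # Traverse arrays in ascending numeric order of their values (GNU awk only).
    BEGIN {PROCINFO["sorted_in"] = "@val_num_asc"}
    # Print the buffered rows of the current table, then reset the buffer.
    function output_table() {
        for (key in table) print table[key]
        delete table
        i=0
    }
    /TARGET/ {print; getline; print; next}   # pass both header lines through
    /^$/ {output_table(); print; next}       # a blank line ends the table
    {table[++i] = $0}                        # buffer one data row
    END {output_table()}                     # flush the last table
' file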

Output:

TARGET  1
Sample1 Sample2 Sample3 Pattern
3   3   3   z..........................Z........................................z.........Z...z
27  28  13  z..........................Z........................................z.........z...z
75  64  32  Z..........................Z........................................Z.........z...Z
147 171 49  Z..........................Z........................................Z.........Z...Z

TARGET  2
Sample1 Sample2 Sample3 Pattern
1   0   0   ...........................Z........................................Z.............Z
1   0   0   z...................................................................Z.........Z...Z
1   0   0   z..........................Z........................................Z.............Z
2   0   1   Z..........................Z........................................Z.........Z....
2   0   1   z..........................z........................................z.........Z...Z
4   8   6   Z..........................Z........................................z.........Z...z
21  21  7   z..........................Z........................................Z.........Z...Z

TARGET  3
Sample1 Sample2 Sample3 Pattern
0   0   1   z..........................z........................................z..............
0   1   2   ...........................Z........................................Z.........Z...Z
1   0   0   Z..........................Z........................................Z.............Z
1   0   0   z..........................Z........................................z.............z
1   1   0   Z..........................Z........................................Z.............z
1   3   0   z..........................z........................................Z.........Z...Z
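
The question asks for descending order, while the output above is ascending. gawk also accepts `"@val_num_desc"` as a value for `PROCINFO["sorted_in"]`, so a descending variant can be sketched as follows. It requires GNU awk, hence the guard. The three-row input and the `sorted` variable are made up for illustration:

```shell
if command -v gawk >/dev/null 2>&1; then
    sorted=$(printf '%s\n' '3   3   3   a' '147 171 49  b' '27  28  13  c' |
        gawk '
            # Traverse by value, numerically, largest first (GNU awk only).
            BEGIN { PROCINFO["sorted_in"] = "@val_num_desc" }
            # Buffer every row under a running counter so none is overwritten.
            { table[++i] = $0 }
            END { for (key in table) print table[key] }
        ')
    printf '%s\n' "$sorted"
else
    echo "gawk is not installed; skipping" >&2
fi
```

Each row is compared by the number at its start, so the row beginning with 147 comes first and the row beginning with 3 comes last.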

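【Comments】:

  • Works perfectly. I'm having trouble understanding these two lines: {table[$1,$2,$3] = $0} and END {output_table()}. Why do you use gawk? Is it faster?
  • First, I use gawk because I'm on a GNU system, where gawk is the default awk. More importantly, gawk implements sorted array traversal, so it is required for this answer.
  • The line {table[$1,$2,$3] = $0} stores the "table" row in an array. The array key is the first 3 columns joined with the awk SUBSEP variable. This lets you sort numerically. END {output_table()} prints the third table. It is needed because the file may not end with a blank line.
  • Should half of the records disappear?
  • Ack. Quite right. Answer updated. Only a minor change was needed.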
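
The "disappearing" records discussed above can be reproduced in isolation. In this sketch two rows share the same first three columns, so the key `$1 SUBSEP $2 SUBSEP $3` is identical for both and the second assignment overwrites the first. A running counter as the key keeps every row. The three-row input and the variable names are made up for illustration:

```shell
rows='1 0 0 first
1 0 0 second
2 0 1 third'

# Key built from the first three columns: duplicate keys overwrite each other.
by_columns=$(printf '%s\n' "$rows" |
    awk '{ table[$1,$2,$3] = $0 } END { for (k in table) n++; print n + 0 }')

# Key taken from a running counter: every row gets its own slot.
by_counter=$(printf '%s\n' "$rows" |
    awk '{ table[++i] = $0 } END { for (k in table) n++; print n + 0 }')

echo "keyed by columns: $by_columns rows, keyed by counter: $by_counter rows"
# prints: keyed by columns: 2 rows, keyed by counter: 3 rows
```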
【Solution 2】:

This is a bit messy, but assuming you don't want to lose records when sorting, it should work:

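 awk 'function sortit(){
           # asort is a gawk extension: sort the collected first-column values
           x=asort(a)
           # d counts the rows already printed for each first-column value
           for(i=1;i<=x;i++)print b[a[i]" "d[a[i]]++]
           delete a;delete b;delete c;delete d;n=0
      }
      # a holds one first-column value per row; c numbers rows sharing that value
      /^[0-9]/{a[++n]=$1;b[$1" "c[$1]++]=$0}
      /TARGET/{print;getline;print}
      !NF{sortit();print}
      END{sortit()}' file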

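【Comments】:

  • You're right: I don't want to skip any lines, so your code does the job. There was just one typo in your code: a parenthesis instead of a brace after "END". Thanks for sharing the code!
  • However, it does not work for all input files. Lines starting with "0 0 1 x..." are dropped, as happens with my modified input file!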