UNIX: Creating table-formatted output from a delimited text file (answers)

【Question title】: UNIX: Creating table-formatted output from a delimited text file
【Posted】: 2020-05-06 06:10:40
【Question】:

I need table-formatted output from a text file, and I am trying to produce it with the awk command below.

Delimited file

ACTIVE#1238917238971238#USA#The U.S. is a country of 50 states covering a vast swath of North America.
ACTIVE#21389721839781237812#INDIA#India, officially the Republic of India, is a country in South Asia.
ACTIVE#3121278372183782137812#AUSTRALIA#Australia, officially the Commonwealth of Australia, is a sovereign country comprising the mainland of the Australian continent

AWK command

awk -F"#" 'BEGIN {{printf "%-80s\n","--------------------------------------------------------------------------------------------------------------------------------------"} {printf "|%-12s|%-30s|%-38s|%-50s|\n","STATUS","ID", "Country", "Description"} {printf "%-80s\n","--------------------------------------------------------------------------------------------------------------------------------------"}} {printf "|%-12s|%-30s|%-38s|%-50s|\n",$1,$2,$3,$4} END{printf "%-80s\n","--------------------------------------------------------------------------------------------------------------------------------------"}' /tmp/test.txt

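Output:

As you can see in the output for the Description column, the text is not kept within its own column; because of the string length, it breaks the layout of the whole table.

Could someone take a look and suggest how I can display the Description column better?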
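The cause can be sketched with a shorter string. In awk's printf, the number in %-10s is only a minimum width, so a longer value is printed in full and pushes the closing | to the right. Adding a precision (%-10.10s) truncates instead, which keeps the table aligned but loses text, and that is why the answers wrap the field rather than cut it. The helper name show_widths and the 10-character width are illustrative assumptions, not part of the original command.

```shell
# Hypothetical helper: print one value twice, first with a minimum
# width only, then with a width plus a precision.
show_widths() {
    awk -v s="$1" 'BEGIN {
        printf "|%-10s|\n", s       # width only: long values overflow
        printf "|%-10.10s|\n", s    # width + precision: cut to 10 chars
    }'
}

show_widths "The U.S. is a country"
# prints:
# |The U.S. is a country|
# |The U.S. i|
```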

【Comments on the question】:

Resize the terminal window?

Tags: shell unix awk

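【Solution 1】:

I would use the Term::Table module in Perl (installable through your OS package manager or from CPAN), which calculates the column widths automatically and wraps text as needed: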

#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
use Term::Table;

my @lines = map { chomp; [ split /#/ ] } <>;
say for Term::Table->new(
    max_width => 80,
    header => ["Status", "ID", "Country", "Description"],
    rows => \@lines
    )->render;

Example usage:

$ ./table.pl < input.txt
+--------+--------------------------+-----------+--------------------------+
| Status | ID                       | Country   | Description              |
+--------+--------------------------+-----------+--------------------------+
| ACTIVE | 1238917238971238         | USA       | The U.S. is a country of |
|        |                          |           |  50 states covering a va |
|        |                          |           | st swath of North Americ |
|        |                          |           | a.                       |
|        |                          |           |                          |
| ACTIVE | 21389721839781237812     | INDIA     | India, officially the Re |
|        |                          |           | public of India, is a co |
|        |                          |           | untry in South Asia.     |
|        |                          |           |                          |
| ACTIVE | 3121278372183782137812   | AUSTRALIA | Australia, officially th |
|        |                          |           | e Commonwealth of Austra |
|        |                          |           | lia, is a sovereign coun |
|        |                          |           | try comprising the mainl |
|        |                          |           | and of the Australian co |
|        |                          |           | ntinent                  |
+--------+--------------------------+-----------+--------------------------+

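Come to think of it, this can also be done without any non-core modules, thanks to Perl formats. I actually prefer this approach because it wraps lines more nicely (although changing the overall width of the table, or even of individual columns, becomes more of a hassle):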

#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;

my ($status, $id, $country, $description);
while (<>) {
    chomp;
    ($status, $id, $country, $description) = split /#/;
    write;
}
say "+--------+------------------------+-----------+-------------------------------+";

format STDOUT_TOP =
+--------+------------------------+-----------+-------------------------------+
| Status | Id                     | Country   | Description                   |
+--------+------------------------+-----------+-------------------------------+
.

format STDOUT =
| @<<<<< | @<<<<<<<<<<<<<<<<<<<<< | @<<<<<<<< | ^<<<<<<<<<<<<<<<<<<<<<<<<<<<< |
  $status, $id,                     $country,   $description
|~~      |                        |           | ^<<<<<<<<<<<<<<<<<<<<<<<<<<<< |
                                                $description
|        |                        |           |                               |
.

$ ./table.pl < input.txt
+--------+------------------------+-----------+-------------------------------+
| Status | Id                     | Country   | Description                   |
+--------+------------------------+-----------+-------------------------------+
| ACTIVE | 1238917238971238       | USA       | The U.S. is a country of 50   |
|        |                        |           | states covering a vast swath  |
|        |                        |           | of North America.             |
|        |                        |           |                               |
| ACTIVE | 21389721839781237812   | INDIA     | India, officially the         |
|        |                        |           | Republic of India, is a       |
|        |                        |           | country in South Asia.        |
|        |                        |           |                               |
| ACTIVE | 3121278372183782137812 | AUSTRALIA | Australia, officially the     |
|        |                        |           | Commonwealth of Australia, is |
|        |                        |           | a sovereign country           |
|        |                        |           | comprising the mainland of    |
|        |                        |           | the Australian continent      |
|        |                        |           |                               |
+--------+------------------------+-----------+-------------------------------+

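【Comments】:

Thanks, Shawn, for your suggestion and your time. :)

【Solution 2】:

I would have the UNIX utility fold do the wrapping of the fields that need it, since it knows to try to split at whitespace and so on, which keeps the wrapped text as readable as possible: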

$ cat tst.awk
BEGIN {
    FS = "#"
    OFS = "|"
}
NR == 1 {
    split("8 12 10 45",fldWidths," ")
    rowWidth = NF + 1   # for the OFSs between fields and at the start/end of the line
    for (i in fldWidths) {
        rowWidth += fldWidths[i]
    }

    rowSep = sprintf("%*s",rowWidth,"")
    gsub(/ /,"-",rowSep)

    print rowSep
    split("STATUS ID Country Description",hdrs," ")
    for (i=1; i<=NF; i++) {
        printf "%s%-*s", OFS, fldWidths[i], hdrs[i]
    }
    print OFS
    print rowSep
}
{
    numRows = 0
    split("", rows)   # clear wrapped lines left over from the previous record
    for (fldNr=1; fldNr<=NF; fldNr++) {
        cmd = "printf \047%s\n\047 \047" $fldNr "\047 | fold -s -w " fldWidths[fldNr]
        rowNr = 0
        while ( (cmd | getline line) > 0 ) {
            rows[++rowNr,fldNr] = line
            numRows = (numRows > rowNr ? numRows : rowNr)
        }
        close(cmd)
    }
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        for (fldNr=1; fldNr<=NF; fldNr++) {
            printf "%s%-*s", OFS, fldWidths[fldNr], rows[rowNr,fldNr]
        }
        print OFS
    }
    print rowSep
}

$ awk -f tst.awk file
--------------------------------------------------------------------------------
|STATUS  |ID          |Country   |Description                                  |
--------------------------------------------------------------------------------
|ACTIVE  |123891723897|USA       |The U.S. is a country of 50 states covering  |
|        |1238        |          |a vast swath of North America.               |
--------------------------------------------------------------------------------
|ACTIVE  |213897218397|INDIA     |India, officially the Republic of India, is  |
|        |81237812    |          |a country in South Asia.                     |
--------------------------------------------------------------------------------
|ACTIVE  |312127837218|AUSTRALIA |Australia, officially the Commonwealth of    |
|        |3782137812  |          |Australia, is a sovereign country comprising |
|        |            |          |the mainland of the Australian continent     |
--------------------------------------------------------------------------------

Adjust the field widths as needed.
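What fold contributes on its own can be seen by running it on a single field. This is only a sketch: wrap_field is a hypothetical helper name, and the widths 45 and 12 are the ones used in the fldWidths list of the script above.

```shell
# Hypothetical helper: wrap one string to a given width.
# fold -s breaks after the last blank that fits within the width;
# a value without blanks is cut at exactly the width.
wrap_field() {
    printf '%s\n' "$1" | fold -s -w "$2"
}

# Text with blanks: wrapped between words.
wrap_field "India, officially the Republic of India, is a country in South Asia." 45

# No blanks to break on: split after 12 characters.
wrap_field "21389721839781237812" 12
```

Note that with -s the blank at the break point stays at the end of the first line, so a wrapped line may carry a trailing space.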

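【Comments】:

Perfect! Exactly the answer I was hoping for.
Thank you very much for the script! Thanks a lot :)

【Solution 3】:

EDIT: To include a header, try the following.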

awk -v line="-----------------------------------" '
BEGIN{
  FS="#"
  OFS="|"
  num=split("STATUS,ID,Country,Description",a,",")
  print line
}
FNR==NR{
  for(i=2;i<=NF;i++){
    max[i]=max[i]>=length($i)?max[i]:length($i)
  }
  next
}
FNR==1{
  for(i=1;i<=num;i++){
    header=(header?header OFS:"")sprintf("%-"max[i]"s",a[i])
  }
  print header
}
{
  for(i=1;i<=NF;i++){
    $i=sprintf("%-"max[i]"s",$i)
  }
}
1;
END{
  print line
}
'  Input_file  Input_file

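The OP did not state the logic for padding the fields with spaces, but judging from the output it appears to be based on the longest value in each field. On that assumption, could you please try the following (written and tested with the samples shown).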

awk '
BEGIN{
  FS="#"
  OFS="|"
}
FNR==NR{
  for(i=2;i<=NF;i++){
    max[i]=max[i]>=length($i)?max[i]:length($i)
  }
  next
}
{
  for(i=1;i<=NF;i++){
    $i=sprintf("%-"max[i]"s",$i)
  }
}
1
'  Input_file Input_file

Explanation: a detailed explanation of the solution above.

awk '                                                ##Starting awk program from here.
BEGIN{                                               ##Starting BEGIN section of this program from here.
  FS="#"                                             ##Setting FS (input field separator) as # here.
  OFS="|"                                            ##Setting OFS as | here for all lines.
}
FNR==NR{                                             ##Checking condition FNR==NR which will be TRUE when first Input_file is being read here.
  for(i=2;i<=NF;i++){                                ##Running for loop from 2nd field to last field of lines.
    max[i]=max[i]>=length($i)?max[i]:length($i)      ##Creating array max with index and value of either current field length OR max array value.
  }
  next                                               ##next will skip all further statements from here.
}
{
  for(i=1;i<=NF;i++){                                ##Running for loop from 1st field to last field of lines.
    $i=sprintf("%-"max[i]"s",$i)                     ##Re-creating each field with sprintf, left-justified and padded with spaces to the max length of its column.
  }
}
1                                                    ##Mentioning 1 will print current line here.
' Input_file Input_file                              ##Mentioning Input_file names here.
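The first pass above starts at field 2, so the first column is never measured, and the header names in the EDIT version are not counted toward the widths either. That works for the sample because every status is as long as the word STATUS. A minimal sketch of a variant that measures every field and the header names is shown below; the helper name pad_table and the sample rows are hypothetical, and this is one possible adjustment rather than the answerer's own code.

```shell
# Hypothetical helper: pad every column (including the first) to the
# widest value found in the data or in the header names.
# Takes a file name because the file is read twice.
pad_table() {
    awk -F'#' -v OFS='|' '
    BEGIN {                            # start from the header lengths
        num = split("STATUS#ID#Country#Description", hdr, "#")
        for (i = 1; i <= num; i++) max[i] = length(hdr[i])
    }
    FNR == NR {                        # first pass: measure each field
        for (i = 1; i <= NF; i++)
            if (length($i) > max[i]) max[i] = length($i)
        next
    }
    FNR == 1 {                         # second pass, first line: header
        for (i = 1; i <= num; i++)
            printf "%s%-" max[i] "s", (i > 1 ? OFS : ""), hdr[i]
        print ""
    }
    {                                  # second pass: pad and print
        for (i = 1; i <= NF; i++) $i = sprintf("%-" max[i] "s", $i)
        print
    }' "$1" "$1"
}

tmp=$(mktemp)
printf '%s\n' 'ACTIVE#12#USA#short' 'DONE#3456#AUSTRALIA#a longer description' > "$tmp"
pad_table "$tmp"
rm -f "$tmp"
```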

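【Comments】:

Could you please also help with the outline of the table, that is, the table structure?
@ManojKumar, sure, could you please try my EDIT solution?

【Solution 4】:

Here is another awk. It computes the average length of each field and then gives each field a share of the terminal width in proportion to those averages. There may be better measures than the average (or the maximum), but those are the only two I tried. It uses tput cols to get the terminal width: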

$ awk '
BEGIN {
    FS="#"                                             # delims
    OFS=""                                             # to allow length==0
}
NR==FNR {                                              # avg field lengths *
    for(i=1;i<=NF;i++)
        avg[i]+=length($i)
    next
}
FNR==1 {
    if(("tput cols"|getline cols)<0 || cols<2*NF-1) {  # get terminal width
        print "Yours is too small"                     # exit if too small
        exit                                           # in reality fails when
    }                                                  # field width rounds to 0
    for(i in avg) {         
        avg[i]=avg[i]/(NR-1)                           # * avg divided here
        avgs+=avg[i]
    }
    for(i=1;i<=NF;i++)                                 # below: field terminal 
        size[i]=((v=sprintf("%0.f",((avg[i]/avgs)*cols)-1))>0?v:1) # proportions
}                                                      # rounded with %0.f, min 1
{
    while(length>0)                                    # while unprinted chars
    for(i=1;i<=NF;i++) {                               # keep outputting
        printf "%-" size[i] "s%s",substr($i,1,size[i]),(i==NF?ORS:"|")
        $i=substr($i,size[i]+1)                        # cut printed from fields
    }
}' file file                                           # 2 runs

Output on a 64-column terminal:

AC|123891723|US|The U.S. is a country of 50 states covering a v
TI|8971238  |A |ast swath of North America.                    
VE|         |  |                                               
AC|213897218|IN|India, officially the Republic of India, is a c
TI|397812378|DI|ountry in South Asia.                          
VE|12       |A |                                               
AC|312127837|AU|Australia, officially the Commonwealth of Austr
TI|218378213|ST|alia, is a sovereign country comprising the mai
VE|7812     |RA|nland of the Australian continent              
  |         |LI|                                               
  |         |A |
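The column widths in that output (2, 9, 2 and 47) follow from the proportional rule and can be reproduced in isolation. The sketch below is an assumption-laden restatement rather than the answer's code: column_sizes is a hypothetical helper, it reads the data once from standard input, and the terminal width is passed as an argument instead of being read from tput cols.

```shell
# Hypothetical helper: print the width each column would get when a
# terminal of the given width is shared in proportion to the average
# field lengths (one character per column is reserved, minimum 1).
column_sizes() {
    awk -F'#' -v cols="$1" '
    { for (i = 1; i <= NF; i++) sum[i] += length($i); nf = NF }
    END {
        for (i = 1; i <= nf; i++) total += sum[i] / NR   # sum of averages
        for (i = 1; i <= nf; i++) {
            v = sprintf("%.0f", (sum[i] / NR) / total * cols - 1) + 0
            printf "%s%d", (i > 1 ? " " : ""), (v > 0 ? v : 1)
        }
        print ""
    }'
}

printf '%s\n' \
  'ACTIVE#1238917238971238#USA#The U.S. is a country of 50 states covering a vast swath of North America.' \
  'ACTIVE#21389721839781237812#INDIA#India, officially the Republic of India, is a country in South Asia.' \
  'ACTIVE#3121278372183782137812#AUSTRALIA#Australia, officially the Commonwealth of Australia, is a sovereign country comprising the mainland of the Australian continent' |
  column_sizes 64
# prints: 2 9 2 47
```

The average lengths are 6, about 19.3, about 5.7 and about 89.7 characters, so the long Description field takes most of the line while STATUS and Country shrink to two characters, which is why those values are stacked vertically in the output.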

【Comments】:

Thanks, James, for your suggestion and your time. :)