File manipulation with command-line tools on Linux: answers

【Question title】: File manipulation with command-line tools on Linux
【Posted】: 2015-02-25 17:54:37
【Question description】:

I want to convert a file in this format:

1;a;34;34;a
1;a;34;23;d
1;a;34;23;v
1;a;4;2;r
1;a;3;2;d
2;f;54;3;f
2;f;34;23;e
2;f;23;5;d
2;f;23;23;g
3;t;26;67;t
3;t;34;45;v
3;t;25;34;h
3;t;34;23;u
3;t;34;34;z

into this format:

1;a;34;34;a;34;23;d;34;23;v;4;2;r;3;2;d
2;f;54;3;f;34;23;e;23;5;d;23;23;g;;;
3;t;26;67;t;34;45;v;25;34;h;34;23;u;34;34;z

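These are CSV files, so this should be doable with awk or sed, but so far I have failed. Whenever the first value is the same, I want to append the last three values of that line to the first line with that value. This should continue through to the last entry in the file.

Here is some awk code I tried, but it does not work: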

#!/usr/bin/awk -f

BEGIN{ FS = " *; *"} 
    { ORS = "\;" }    
    {
        x = $1
        print $0
    }
     { if (x == $1)
        print $3, $4, $5
       else
        print "\n"
    }
    END{
        print "\n"
    }

【Question comments】:

Tags: linux command-line awk sed

【Solution 1】:

$ cat tst.awk
BEGIN { FS=OFS=";" }
{ curr = $1 FS $2 }
curr == prev {
    sub(/^[^;]*;[^;]*/,"")
    printf "%s", $0
    next
}
{
    printf "%s%s", (NR>1?ORS:""), $0
    prev = curr
}
END { print "" }

$ awk -f tst.awk file
1;a;34;34;a;34;23;d;34;23;v;4;2;r;3;2;d
2;f;54;3;f;34;23;e;23;5;d;23;23;g
3;t;26;67;t;34;45;v;25;34;h;34;23;u;34;34;z

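【Comments】:

This doesn't work on awk version 20070501 - what am I doing wrong?
I don't know what awk version 20070501 means, but the above will work with any modern awk. What OS are you on, and what is the path to your awk command?
He's on a Mac; I tested it there and it works as expected. @querendus Did you copy and paste the script correctly?
I use Mac OS X 10.10.2 - all files are in the same directory: awk -f test.awk file > result
@querendus You are doing something wrong either when copying/pasting the script or when executing it. Please re-read my answer and make sure you follow it.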

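【Solution 2】:

If I understand you correctly, you want to build one line from fields 3-5 of all lines that share the same first two fields (preceded by those two fields). In that case: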

awk -F \; 'key != $1 FS $2 { if(NR != 1) print line; key = $1 FS $2; line = key } { line = line FS $3 FS $4 FS $5 } END { print line }' filename

That is:

key != $1 FS $2 {                 # if the key (first two fields) changed
  if(NR != 1) print line;         # print the line (except at the very
                                  # beginning, to not get an empty line there)

  key = $1 FS $2                  # remember the new key
  line = key                      # and start building the next line
}
{
  line = line FS $3 FS $4 FS $5   # take the value fields from each line
}
END {                             # and at the very end,
  print line                      # print the last line (that the block above
}                                 # cannot handle)
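As a quick check, the expanded script can be run against the sample from the question. This is only a sketch: it assumes a POSIX shell, and the helper `sample` and the variable `result` are names made up for this illustration. Note that the second line comes out without the trailing `;;;` shown in the requested output, because nothing here pads short groups.

```sh
# Hypothetical helper: emits the sample input from the question.
sample() {
cat <<'EOF'
1;a;34;34;a
1;a;34;23;d
1;a;34;23;v
1;a;4;2;r
1;a;3;2;d
2;f;54;3;f
2;f;34;23;e
2;f;23;5;d
2;f;23;23;g
3;t;26;67;t
3;t;34;45;v
3;t;25;34;h
3;t;34;23;u
3;t;34;34;z
EOF
}

# Same logic as the script above: start a new output line whenever the
# key (first two fields) changes, and append fields 3-5 of every record.
result=$(sample | awk -F ';' '
key != $1 FS $2 {
  if (NR != 1) print line
  key = $1 FS $2
  line = key
}
{ line = line FS $3 FS $4 FS $5 }
END { print line }
')

printf '%s\n' "$result"
# 1;a;34;34;a;34;23;d;34;23;v;4;2;r;3;2;d
# 2;f;54;3;f;34;23;e;23;5;d;23;23;g
# 3;t;26;67;t;34;45;v;25;34;h;34;23;u;34;34;z
```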

【Comments】:

【Solution 3】:

You already have good answers in awk. Here is one in Perl:

perl -F';' -lane'
    $key = join ";", @F[0..1];               # Establish your key
    $seen{$key}++ or push @rec, $key;        # Remember the order
    push @{ $h{$key} }, @F[2..$#F]           # Build your data structure
}{ 
    $, = ";";                                # Set the output list separator
    print $_, @{ $h{$_} } for @rec' file     # Print as per order

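【Comments】:

【Solution 4】:

This looks considerably more complicated than the other answers, but it adds a few things:

It computes the maximum number of fields across all built lines.
It appends all missing fields as blanks to the end of each built line.
The POSIX awk on the Mac does not preserve the order of array elements when using the for(key in array) syntax, even when the keys are numbered. To preserve the output order, you can track it as I have done, or sort afterwards.

Having a matching number of fields on every output line appears to be a requirement, judging by the specified output. Since that number is not known in advance, this awk script first loads all lines, computes the maximum number of fields in any output line, and then prints the lines in order with any padding needed.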

#!/usr/bin/awk -f

BEGIN {FS=OFS=";"}

{
    key = $1
    # create an order array for the mac's version of awk
    if( key != last_key ) {
        order[++key_cnt] = key
        last_key = key
    }
    val = a[key]
    # build up an output line in array a for the given key
    start = (val=="" ? $1 OFS $2 : val)
    a[key] = start OFS $3 OFS $4 OFS $5
    # count number of fields for each built up output line
    nf_a[key] += 3
}

END {
    # compute the max number of fields per any built up output line
    for(k in nf_a) {
        nf_max = (nf_a[k]>nf_max ? nf_a[k] : nf_max)
    }
    for(i=1; i<=key_cnt; i++) {
        key = order[i]
        # compute the number of blank flds necessary
        nf_pad = nf_max - nf_a[key]
        blank_flds = nf_pad!=0 ? sprintf( "%*s", nf_pad, OFS ) : ""
        gsub( / /, OFS, blank_flds )
        # output lines along with appended blank fields in order
        print a[key] blank_flds
    }
}

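If the required number of fields per output line is known in advance, simply appending blank fields at each key change, without all these arrays, would work and make for a simpler script.

I get the following output: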

1;a;34;34;a;34;23;d;34;23;v;4;2;r;3;2;d
2;f;54;3;f;34;23;e;23;5;d;23;23;g;;;
3;t;26;67;t;34;45;v;25;34;h;34;23;u;34;34;z
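
For the simpler case mentioned above, where the number of fields per output line is known in advance, a sketch could look like the following. It is not the script that produced the output above: the variable `want` (set here to 17, that is 2 key fields plus 5 groups of 3), the awk function `emit`, and the shell helper `sample` are all assumptions made for this illustration.

```sh
# Hypothetical helper: emits the sample input from the question.
sample() {
cat <<'EOF'
1;a;34;34;a
1;a;34;23;d
1;a;34;23;v
1;a;4;2;r
1;a;3;2;d
2;f;54;3;f
2;f;34;23;e
2;f;23;5;d
2;f;23;23;g
3;t;26;67;t
3;t;34;45;v
3;t;25;34;h
3;t;34;23;u
3;t;34;34;z
EOF
}

result=$(sample | awk -F ';' -v want=17 '
# Pad a built line with empty fields until it has "want" fields, then print it.
function emit(line, count) {
  while (count < want) { line = line FS; count++ }
  print line
}
# On a key change, emit the previous line (if any) and start a new one.
key != $1 FS $2 {
  if (NR != 1) emit(line, nf)
  key = $1 FS $2
  line = key
  nf = 2
}
# Append the three value fields of every record and keep a field count.
{ line = line FS $3 FS $4 FS $5; nf += 3 }
# Emit the last line, unless the input was empty.
END { if (NR) emit(line, nf) }
')

printf '%s\n' "$result"
```

Because each line is padded and printed as soon as its key changes, no arrays are needed and the input order is preserved automatically. The trade-off is that `want` has to be supplied by the caller, and the input must already be grouped by key.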

【Comments】: