If one column has duplicate values, copy the values from the other columns to the line above

【Question title】: If duplicate values in one column, then copy value from other column to a line above
【Posted】: 2013-04-16 00:47:04
【Question description】:

I am working with a table that looks like this

C1    C2    C3
1     a     b
2     c     d
4     e     g
4     f     h
5     x     y
...   ...   ...

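If a value in C1 occurs more than once (in this example, 4 appears twice), I want the C2 and C3 values of the later row appended to the first row that has 4 in C1, and then I want to delete the second row that has 4 in C1. So in the end it should look like this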

C1    C2    C3
1     a     b
2     c     d
4     e,f   g,h
5     x     y

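I am using a Perl script and reading the file in a while loop. I have used things like my %seen or a counter in other scripts, but I can't work out how to use them here. It seems like it should be really simple...

This is what my while loop looks like at the moment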

 while (<$DATA>) {
    @columns = split;
    $var1 = $columns[0];
    $var2 = $columns[1];
    $var3 = $columns[2];
 }

【Question discussion】:

Tags: perl

【Solution 1】:

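Use a hash to keep track of the duplicates. In my example I used a hash of hashes (%info): the outer key is the C1 value and the inner keys are C2 and C3. Each inner entry holds an array reference, and the values from duplicate rows are pushed onto it.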

use strict;
use warnings;

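my %info;    # C1 value => { C2 => [...], C3 => [...] }
while (<DATA>) {
    my @columns = split ' ';      # split on whitespace, ignoring leading blanks
    next unless @columns >= 3;    # skip blank or incomplete lines
    if ( exists $info{ $columns[0] } ) {
        # C1 value already seen: append to the existing lists
        push @{ $info{ $columns[0] }->{C2} }, $columns[1];
        push @{ $info{ $columns[0] }->{C3} }, $columns[2];
    }
    else {
        # first occurrence of this C1 value: start new lists
        $info{ $columns[0] } = { C2 => [ $columns[1] ], C3 => [ $columns[2] ] };
    }
}

# print the rows in numeric order of C1, joining collected values with commas
foreach my $c1 ( sort { $a <=> $b } keys %info ) {
    print $c1, "\t",
          join( ',', @{ $info{$c1}->{C2} } ), "\t",
          join( ',', @{ $info{$c1}->{C3} } ), "\n";
}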


__DATA__
1     a     b
2     c     d
4     e     g
4     f     h
5     x     y
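
The code above sorts the output numerically by C1, which fits the sample data. If the C1 values might not be numeric, or the original row order matters, one possible variation is to remember the order in which each key first appears. The following is only a sketch under that assumption: merge_rows and @order are hypothetical names introduced here for illustration, and the input is given as an inline list instead of being read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): merges rows that share
# the same first column and keeps rows in the order their key first appeared.
sub merge_rows {
    my (@lines) = @_;
    my ( %info, @order );
    for my $line (@lines) {
        my ( $c1, $c2, $c3 ) = split ' ', $line;
        next unless defined $c3;                      # skip blank or short lines
        push @order, $c1 unless exists $info{$c1};    # record first appearance
        push @{ $info{$c1}{C2} }, $c2;                # autovivification creates the array refs
        push @{ $info{$c1}{C3} }, $c3;
    }
    my @out;
    for my $c1 (@order) {
        push @out, join "\t", $c1,
            join( ',', @{ $info{$c1}{C2} } ),
            join( ',', @{ $info{$c1}{C3} } );
    }
    return @out;
}

# Sample rows taken from the question
my @input = (
    "1     a     b",
    "2     c     d",
    "4     e     g",
    "4     f     h",
    "5     x     y",
);
print "$_\n" for merge_rows(@input);
```

Because push autovivifies the nested array references, the if/else from the answer is not needed in this variant. The trade-off is the extra @order array, which exists only to preserve the input order.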

【Discussion】: