How to arrange/sort the first column of an array in increasing order in Perl? Answers

【Question title】: How to arrange/sort the first column of array in increasing order in Perl?
【Posted】: 2021-01-16 04:18:57
【Question】:

This is what my file looks like:

Time   Send     Receive  Address
100    35       57       x9871
03     37       59       x9873
45     39       61       x9875
90     41       63       x9877
1234   43       65       x9879
45     76       89       x9768

I want to arrange the rows in order of increasing Time, so the result should look like this:

Time   Send     Receive  Address
03     37       59       x9873
45     76       89       x9768
45     39       61       x9875
90     41       63       x9877
100    35       57       x9871
1234   43       65       x9879

If two rows have the same Time, both rows should be printed. So far I can only read the file line by line.

#!/usr/bin/perl

use warnings;
use strict;
use feature 'say';

my $logout_file = "ll.log";
my $temp1 = "temp1.log";

# Append to the output log and read the input file line by line.
open(my $out, '>>', $logout_file) or die "Could not open file $logout_file: $!";
open(my $in, '<', $temp1) or die "Couldn't open $temp1: $!";
while (my $aa = <$in>) {
    my @fields = split " ", $aa;
    say $out join("|", @fields);
}
close $in;
close $out;

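Mainly, I don't know how to get started. I have googled a lot but found nothing relevant. Could anyone suggest how to do this in a standard way (a while loop or a foreach loop) without using any modules? Thanks.

Update: what if some rows have additional columns (log2.txt):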

Time   Send     Receive  Address
100    35       57       x9871
03     37       59       x9873
45     39       61       x9875    x7890   x8976
90     41       63       x9877    x8765
1234   43       65       x9879
45     76       89       x9768

Update: with Data::Dumper the output looks like this:

$VAR1 = {
          '9' => [
                   '9  41 63 x9877'
                 ],
          '345678' => [
                        '345678 4554 5445 5656'
                      ],
          '3' => [
                   '3  37 59 x9873'
                 ],
         
        };

Output

Time   Send     Receive  Address
03     37       59       x9873
03     37       59       x9873
45     39       61       x9875
45     76       89       x9768
45     39       61       x9875    x7890   x8976
45     76       89       x9768
90     41       63       x9877
90     41       63       x9877    x8765
100    35       57       x9871
100    35       57       x9871
1234   43       65       x9879
1234   43       65       x9879

【Comments】:

Could you elaborate on what Time means? Is it in seconds? E.g. 100 seconds, 3 seconds, 45 seconds...?
@vkk05, yes, we can treat it as seconds.
Related: stackoverflow.com/q/64121838/725418

Tags: arrays regex perl hash

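【Solution 1】:

The algorithm for this problem is as follows:

Use the Time field as a hash key
Push each line read onto an array stored in the hash under its Time key
Print the header
Walk the hash keys in numerically sorted order and print every element of each array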

use strict;
use warnings;
use feature 'say';

my $fname1 = 'log1.txt';
my $fname2 = 'log2.txt';

my($header,%data);

read_file($fname1);
read_file($fname2);

say $header;
for my $time ( sort { $a <=> $b } keys %data ) {
    say for @{$data{$time}};
}

exit 0;

sub read_file {
    my $fname = shift;
    
    open my $fh, '<', $fname
        or die "Couldn't open $fname: $!";
    
    while( <$fh> ) {
        chomp;
        next if /^#Log/;
        my @line = split;
        if( /^Time/ ) {
            $header = $_;
        } else {
            push @{$data{$line[0]}},$_;
        }
    }
    
    close $fh;
}

Output

Time   Send     Receive  Address
03     37       59       x9873
03     37       59       x9873
45     39       61       x9875
45     76       89       x9768
45     76       89       x9768
45     39       61       x9875
90     41       63       x9877
90     41       63       x9877
100    35       57       x9871
100    35       57       x9871

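【Comments】:

Your answer works now, and it is nice to see that you used my suggestion. I still would not use a sub, though, or global variables inside it. Using the diamond operator instead of hard-coded file names would make it more reusable. Technically a hash is not needed either: you can use a two-dimensional array that stores Time at index 0 and the string at index 1, with no need for arrays of lines that share a Time. It is a kind of Schwartzian transform: for (map { $_->[1] } sort { $a->[0] <=> $b->[0] } @data ) { say }. Nicely done.
@TLP -- I don't care about your suggestions, I used common sense -- this time the OP stated that the input data has duplicates in the Time field.
There is no point in being toxic. This is a learning environment.
@PolarBear, in my code, if I write the for in say for @{$data{$time}}; it prints nothing, and if I remove the for, rows with the same key are printed on one line.
@HG -- you should use Data::Dumper to look at your data; it will help you understand how the data is organized.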
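
The array-of-pairs approach suggested in the first comment could look roughly like the sketch below. The sample rows and the sort_by_time name are made up for illustration; in a real script the rows would come from the input files.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample rows; a real script would read them from <> instead.
my @lines = (
    '100    35       57       x9871',
    '03     37       59       x9873',
    '45     39       61       x9875',
    '45     76       89       x9768',
);

# Sort lines numerically by their first whitespace-separated field.
# Read the chain bottom-up: pair, sort, unwrap.
sub sort_by_time {
    my @rows = @_;
    return map  { $_->[1] }                 # 3. unwrap the original line
           sort { $a->[0] <=> $b->[0] }     # 2. compare the stored Time values
           map  { [ (split)[0], $_ ] }      # 1. pair each line with its Time
           @rows;
}

say for sort_by_time(@lines);
```

Because every row is kept as its own pair, rows that share a Time need no special handling.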

【Solution 2】:

As I mentioned in your previous question on this topic, this can be solved with a one-liner.

perl -e'print sort { $a <=> $b} grep /^\d/,<>' log1.log log2.log

You can put that code in a file and run it like this:

$ perl foo.pl log1.log log2.log > log_all.log

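You will need to fix up the header yourself.

Explanation:

<> reads all input lines in list context (from standard input or from the files named as arguments), grep /^\d/ removes every line that does not start with a digit, sort { $a <=> $b} sorts the remaining lines numerically, and print prints them.

You don't need to split the lines or do any other processing.

This code has one caveat: it converts the whole line into a number. I.e. it takes a string such as 03 37 59 x9873 and converts it to the number 03. When it does that, it uses the leading number 03 and ignores the rest. With warnings enabled you would get a lot of warnings, so we leave them off, because we know what we are doing: we are just sorting lines by the first number on each line. And since no remaining line starts with anything but a digit, we can do that.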
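
To see that conversion in isolation, here is a small sketch. The __WARN__ handler is only there to count warnings instead of printing them; it is not part of the one-liner.

```perl
use strict;
use warnings;

my $line = "03     37       59       x9873";

# Collect warnings rather than printing them, so the effect is visible.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Numeric context: Perl uses the leading "03" and ignores the rest.
my $number = $line + 0;

print "value: $number\n";
print "warnings raised: ", scalar(@warnings), "\n";
```

With warnings enabled this reports the value 3 and one "isn't numeric" warning, which is the warning the one-liner avoids by running without them.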

If I try my one-liner on your new input, I get:

03     37       59       x9873
03     37       59       x9873
45     39       61       x9875
45     76       89       x9768
45     76       89       x9768
45     39       61       x9875
90     41       63       x9877
90     41       63       x9877
100    35       57       x9871
100    35       57       x9871
1234   43       65       x9879
1234   43       65       x9879

This seems to match what you asked for.

If you really want to use use warnings, you can lexically turn off these warnings with no warnings 'numeric'.
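
A script version with warnings enabled might look like the sketch below. The sort_numeric_lines name and the sample rows are assumptions for illustration, and printing the first header line is just one possible way to handle the header.

```perl
use strict;
use warnings;

# Sort data lines by their leading number. Lines that do not start with a
# digit (such as the header) are dropped, as in the one-liner.
sub sort_numeric_lines {
    my @lines = @_;
    no warnings 'numeric';    # whole lines are compared as numbers on purpose
    return sort { $a <=> $b } grep { /^\d/ } @lines;
}

# Hypothetical sample input; use `my @input = <>;` to read the files instead.
my @input = (
    "Time   Send     Receive  Address\n",
    "100    35       57       x9871\n",
    "03     37       59       x9873\n",
    "45     39       61       x9875    x7890   x8976\n",
);

my ($header) = grep { /^Time/ } @input;    # keep the first header line only
print $header if defined $header;
print sort_numeric_lines(@input);
```

The no warnings 'numeric' line applies only inside the sub, so the rest of the script still gets the usual warnings.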

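【Comments】:

The OP's output includes the header; your output omits it.
@PolarBear Yes, I mentioned that in my answer.

【Solution 3】:

See if this helps you.

I use each line number as a hash key, and the data of the whole line is stored in the hash under that line number.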

#!/usr/bin/perl

use strict;
use warnings;

use feature 'say';

my (@fields, %hash);
my $count = 0;

while (my $aa = <DATA>){
    
    next if ($aa =~ /Time\s+Send\s+Receive\s+Address/); 
    
    $count++;
    
    @fields = split " ",$aa;
    
    $hash{$count}{TIME}    = $fields[0];
    $hash{$count}{SEND}    = $fields[1];
    $hash{$count}{RECEIVE} = $fields[2];
    $hash{$count}{ADDRESS} = $fields[3];
}

my @headers = ("Time", "Send", "Receive", "Address");
say join("\t", @headers);

foreach my $key (sort { $hash{$a}->{TIME} <=> $hash{$b}->{TIME} } keys %hash){
    say "$hash{$key}{TIME}\t$hash{$key}{SEND}\t$hash{$key}{RECEIVE}\t$hash{$key}{ADDRESS}";
}

__DATA__
Time   Send     Receive  Address
100    35       57       x9871
03     37       59       x9873
45     39       61       x9875
90     41       63       x9877
1234   43       65       x9879
45     76       89       x9768

Output:

Time    Send    Receive Address
03      37      59      x9873
45      76      89      x9768
45      39      61      x9875
90      41      63      x9877
100     35      57      x9871
1234    43      65      x9879

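【Comments】:

Rather than using a hash with incrementing numbers as keys and then sorting those keys, why not just use an array?
@vkk05, could you check the update section of the question? Thank you. For that case I added `$hash{$count}{ADD} = $fields[5..$length];` where length is the line length, but this does not work. What would you suggest?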
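
One possible way to handle the variable number of address columns raised in the last comment is an array slice stored as an array reference. This is only a sketch: the parse_row name is hypothetical, and it assumes addresses start at the fourth field (index 3), as in the sample data.

```perl
use strict;
use warnings;
use feature 'say';

# Split one row into the three fixed columns plus a list of addresses.
sub parse_row {
    my ($row) = @_;
    my @fields = split " ", $row;
    return {
        TIME    => $fields[0],
        SEND    => $fields[1],
        RECEIVE => $fields[2],
        # Array slice: every field from index 3 up to the last index.
        ADDRESS => [ @fields[3 .. $#fields] ],
    };
}

my $rec = parse_row("45     39       61       x9875    x7890   x8976");
say join "\t", @{$rec}{qw(TIME SEND RECEIVE)}, @{ $rec->{ADDRESS} };
```

The slice needs the @ sigil (@fields[...]) because it returns a list, and $#fields is the last index of the array, so no separate length variable is needed.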