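Answers: Search a compressed log file against a list of keywords

【Question title】: Search a compressed log file against a list of keywords
【Posted】: 2016-05-11 06:00:59
【Question】:

I'm trying to open a log file, search it against a list of keywords, print every line that contains one of the keywords, and then compress the results file to .gz.

I came up with the code below. It starts without any compile errors and it creates the results file, but when I run the script it never finishes and never finds any results. Any help?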

    #!/usr/bin/perl 

use IO::Uncompress::Gunzip qw($GunzipError);
use IO::Compress::Gzip qw(gzip $GzipError) ;
use diagnostics;
use strict;
use warnings;

my %LOGLINES = ();
my %count = ();

open(FILE, "</data/bro/scripts/Keywords.txt"); 
my %keywords = map { chomp $_; $_, 1 } <FILE>; 
close(FILE);

my $logfile = IO::Uncompress::Gunzip->new( "/data/bro/logs/2016-05-05/http.00:00:00-06:00:00.log.gz" )
    or die "IO::Uncompress::Gunzip failed: $GunzipError\n"; 

open(FILE, "+>Results.txt"); 
my @results = <FILE>; 

foreach my $line ($logfile) { 
    while (<>) {
        my @F=split("\t");
            next unless ($F[2] =~ /^(199|168|151|162|166|150)/);

        $count{ $F[2] }++;

        if ($count{ $F[2] } == 10) {
            print @{ $LOGLINES{$F[2]} };   # print all the log lines we've seen so far
            print $_;                      # print the current line
        } elsif ($count{ $F[2] } > 10) {
            print $_;                      # print the current line
        } else {
            push @{ $LOGLINES{$F[2]} }, $_; # store the log line for later use
        }

    my $flag_found = grep {exists $keywords{$_} } split /\s+/, $line;
    print $line if $flag_found;
    }
}
IO::Compress::Gzip("results.gz")
            or die "IO::Compress::Gunzip failed: $GzipError\n";   
close(FILE);

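【Comments】:

With no command-line arguments, while (<>) reads from standard input, i.e. the keyboard. That is probably why your script "never finishes".
@red0ct is right. What is the while loop for? It is waiting for you to type something. You are already looping over $logfile with foreach (though only once, because you never call anything on the ::Gunzip object).
The while loop is meant to keep searching each line of the log file until it reaches the end. Am I going about this the wrong way?
I think while (<$logfile>) will help you. AFAIK the object returned by IO::Uncompress::Gunzip->new can be treated like a normal filehandle.
Also, why open Results.txt in read-write mode? You never use @results, and you never write to FILE. You have strict and warnings, so your code already looks pretty good. Using three-argument open would make it better still. :)

Tags: perl grep

【Solution 1】:

You probably don't need the while (<>) loop in your script, because that line reads from standard input (the keyboard).

The object $logfile returned by the IO::Uncompress::Gunzip->new constructor can be treated like a normal filehandle, so you can write while (<$logfile>):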

use IO::Uncompress::Gunzip qw($GunzipError);
use IO::Compress::Gzip qw(gzip $GzipError) ;
use strict;
use warnings;
use feature 'say';

#...
my @loglines;

open my $fh, '<', '/data/bro/scripts/Keywords.txt' or die "Cannot open keywords file: $!";   # three-argument open
my %keywords = map { chomp; $_ => 0 } <$fh>;
close $fh;

my $logfile = IO::Uncompress::Gunzip->new( "..." )
    or die "IO::Uncompress::Gunzip failed: $GunzipError\n"; 

while (<$logfile>) {
    my @line = split /\t/;
    next if ! $line[2];
    for my $key (keys %keywords) {
        if ($line[2] =~ /^$key/) { $keywords{$key}++; push @loglines, $_; say; last  }
    }
}
# ... pack using gzip

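As a result, the @loglines array holds every log line whose third tab-separated field ($line[2]) starts with one of your keywords. The %keywords hash has the keywords as keys and the number of lines matched by each keyword as values.

Note (edit): you could store the log lines in a hash instead, where each key is a keyword and each value is an array or hash of the matching lines (or substrings, or both). For the sake of example I just push the matching lines onto an array. Process them however you need, then pack them with gzip in whatever way is convenient.
Also, it is better not to use a global bareword name such as FILE, because other code could accidentally use the same handle. And always check that the file was opened successfully, for example with or die.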
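
The note above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample keyword and log data, the %matches hash, and the variable names are hypothetical, and in-memory filehandles stand in for the real files so the sketch runs on its own. It groups matching lines per keyword in a hash of arrays and uses lexical filehandles with three-argument open.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for Keywords.txt and the log file.
my $keyword_text = "199\n168\n";
my $log_text     = join '',
    "t1\tuid1\t199.10.0.1\tGET\n",
    "t2\tuid2\t10.0.0.5\tGET\n",
    "t3\tuid3\t168.1.2.3\tPOST\n",
    "t4\tuid4\t199.10.0.9\tGET\n";

# Three-argument open with lexical handles; a scalar reference opens an
# in-memory file, so nothing is read from disk here.
open my $kw_fh, '<', \$keyword_text or die "Cannot open keywords: $!";
chomp(my @keywords = <$kw_fh>);
close $kw_fh;

my %matches;    # keyword => array reference of matching log lines

open my $log_fh, '<', \$log_text or die "Cannot open log: $!";
while (my $line = <$log_fh>) {
    my @fields = split /\t/, $line;
    next unless defined $fields[2];          # skip lines without a third field
    for my $key (@keywords) {
        # \Q...\E treats the keyword as literal text (an assumption about
        # what the keywords file contains).
        if ($fields[2] =~ /^\Q$key\E/) {
            push @{ $matches{$key} }, $line;
            last;
        }
    }
}
close $log_fh;

# Report how many lines each keyword matched.
for my $key (sort keys %matches) {
    printf "%s: %d line(s)\n", $key, scalar @{ $matches{$key} };
}
```

Keeping one array per keyword makes it easy to write out or count the hits for a single keyword later, at the cost of holding all matching lines in memory.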

【Comments】:

@MichaelMeis Edited. Also take a look at this.

【Solution 2】:

IO::Uncompress::Gunzip->new returns an IO::Uncompress::Gunzip object.

foreach my $line ($logfile) { 
    while (<>) {
      ...
    }
}

That loop makes no sense: it just sets $line to the IO::Uncompress::Gunzip object (once) and then waits for keyboard input.

Try this instead:

while (my $line = <$logfile>) {
  ...
}
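
To make the loop above concrete, here is a minimal self-contained sketch. It assumes hypothetical sample text that is compressed in memory first, so no log file is needed; the variable names $plain, $compressed and $count are illustrative only. It shows that the object returned by IO::Uncompress::Gunzip->new can be read line by line with the usual readline operator.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical sample data, compressed in memory so the sketch needs no files.
my $plain = "first line\nsecond line\nthird line\n";
my $compressed;
gzip \$plain => \$compressed
    or die "gzip failed: $GzipError\n";

# A scalar reference is accepted in place of a file name.
my $logfile = IO::Uncompress::Gunzip->new(\$compressed)
    or die "IO::Uncompress::Gunzip failed: $GunzipError\n";

my $count = 0;
while (my $line = <$logfile>) {    # one decompressed line per iteration
    $count++;
    print "$count: $line";
}
$logfile->close;
```

With a real log you would pass the path of the .gz file to new instead of the scalar reference.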

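You are also not using IO::Compress::Gzip correctly. You can create the IO::Compress::Gzip object before processing the log file and then print to it. Something like the following should work: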

...
my $z = IO::Compress::Gzip->new("results.gz")
            or die "IO::Compress::Gzip failed: $GzipError\n";
while (my $line = <$logfile>) {
    my @F=split("\t", $line);
        next unless ($F[2] =~ /^(199|168|151|162|166|150)/);

    $count{ $F[2] }++;

    if ($count{ $F[2] } == 10) {
        print $z @{ $LOGLINES{$F[2]} };   # print all the log lines we've seen so far
        print $z $line;                      # print the current line
    } elsif ($count{ $F[2] } > 10) {
        print $z $line;                      # print the current line
    } else {
        push @{ $LOGLINES{$F[2]} }, $line; # store the log line for later use
    }

    my $flag_found = grep {exists $keywords{$_} } split /\s+/, $line;
    print $z $line if $flag_found;
}
$z->close;   # finish the gzip stream and flush it to results.gz

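You should read the documentation for IO::Uncompress::Gunzip and IO::Compress::Gzip (via perldoc or cpan.org). It shows examples of how to use these modules correctly.

【Comments】:

Thank you. I'm still confused, but I think this has me heading in the right direction now. I'm reading more about gzip and looking into the new errors I'm getting.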
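
As a small illustration of the writing side, the sketch below prints to an IO::Compress::Gzip object and then decompresses the result to check it. The @matched lines are hypothetical, and an in-memory buffer stands in for results.gz so the example does not touch the disk.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical lines that the search would have selected.
my @matched = ("alpha\tone\n", "beta\ttwo\n");

# In-memory stand-in for results.gz; pass a file name to write a real file.
my $compressed;
my $z = IO::Compress::Gzip->new(\$compressed)
    or die "IO::Compress::Gzip failed: $GzipError\n";

print {$z} $_ for @matched;    # the object works as a filehandle for print
$z->close
    or die "close failed: $GzipError\n";

# Decompress again to confirm the round trip.
my $roundtrip;
gunzip \$compressed => \$roundtrip
    or die "gunzip failed: $GunzipError\n";
print $roundtrip;
```

Closing the object explicitly finishes the gzip stream before anything tries to read the output.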