【Posted】: 2011-07-02 06:44:26
【Question】:
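I'm new to Perl and have a question about its syntax. I was given some code that parses a file containing specific information. What is the if (/DID/) part of the get_number subroutine doing? Is it using a regular expression? I'm not sure, because a regular-expression match normally looks like $_ =~ /some expression/. Finally, is the while loop inside the get_number subroutine necessary?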
#!/usr/bin/env perl
use Scalar::Util qw/ looks_like_number /;
use WWW::Mechanize;

# store the names of all the OCR files in an array
my @file_list=qw{
    blah.txt
};

# set the scalar index to zero
my $file_index=0;

# open the file titled 'outputfile.txt' and write to it
# (or indicate that the file can't be opened)
open(OUT_FILE, '>', 'outputfile.txt')
    or die "Can't open output file\n";

while($file_index < 1){
    # open the OCR file and store it in the filehandle IN_FILE
    open(IN_FILE, '<', "$file_list[$file_index]")
        or die "Can't read source file!\n";
    print "Processing file $file_list[$file_index]\n";
    while(<IN_FILE>){
        my $citing_pat=get_number();
        get_country($citing_pat);
    }
    $file_index=$file_index+1;
}

close IN_FILE;
close OUT_FILE;
get_number is defined as follows.
sub get_number {
    while(<IN_FILE>){
        if(/DID/){
            my @fields=split / /;
            chomp($fields[3]);
            if($fields[3] !~ /\D/){
                return $fields[3];
            }
        }
    }
}
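For reference on the if(/DID/) line: in Perl, a pattern match written without the =~ binding operator is applied to the default variable $_, and while(<IN_FILE>) assigns each line it reads to $_. Calling split with no string argument also operates on $_. The following is a minimal sketch of that behaviour, assuming hypothetical sample lines held in an array (the real OCR file format is not shown in the question, and @lines, @implicit, @explicit and @numbers are illustrative names only):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample data; the actual input file is not shown in the question.
my @lines = ("foo DID bar 12345\n", "no match here\n");

my (@implicit, @explicit, @numbers);

for (@lines) {
    # A bare /DID/ is matched against $_ by default ...
    push @implicit, $_ if /DID/;
    # ... so it behaves the same as the explicit binding form.
    push @explicit, $_ if $_ =~ /DID/;
}

for (@implicit) {
    # split with only a pattern also works on $_.
    my @fields = split / /;
    chomp $fields[3];
    # !~ /\D/ is true when the field contains no non-digit character.
    push @numbers, $fields[3] if $fields[3] !~ /\D/;
}

print "fourth field: $_\n" for @numbers;
```

Both forms select the same single line here, and the fourth space-separated field of that line is the all-digit string that the subroutine in the question would return.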
【Discussion】:
Tags: regex perl parsing file-io screen-scraping