【Question title】: Perl CSV to hash
【Posted】: 2013-02-24 01:05:24
【Question】:

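I have a CSV file that contains comment text before the header row and the data, and I would like to read it into a hash for further manipulation. The primary key of the hash will be a combination of two data values. How do I do this?

Search for the header row using the pattern 'index'
Use the header fields as the keys
Read in the rest of the file.

Example CSV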

#
#
#
#
Description information of source of file.

index,label,bit,desc,mnemonic
6,370,11,three,THRE
9,240,23,four,FOR
11,120,n/a,five,FIV

Example of the desired hash

(
    '37011'  => { 'index' => '6',  'label' => '370', 'bit' => '11',  'desc' => 'three', 'mnemonic' => 'THRE' },
    '24023'  => { 'index' => '9',  'label' => '240', 'bit' => '23',  'desc' => 'four',  'mnemonic' => 'FOR' },
    '120n/a' => { 'index' => '11', 'label' => '120', 'bit' => 'n/a', 'desc' => 'five',  'mnemonic' => 'FIV' },
)

【Question discussion】:

Tags: perl csv hash

【Solution 1】:

For this you will need the Text::CSV module:

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use Text::CSV;

my $filename = 'test.csv';

# watch out for the encoding!
open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Can't open $filename: $!";

# skip to the header
my $header;
while (<$fh>) {
    if (/^index,/) {
        chomp($header = $_);
        last;
    }
}
die "No header row found in $filename" unless defined $header;

my $csv = Text::CSV->new
    or die "Text::CSV error: " . Text::CSV->error_diag;

# define column names
$csv->parse($header)
    or die "Can't parse header: " . $csv->error_input;
$csv->column_names([$csv->fields]);

# parse the rest
while (my $row = $csv->getline_hr($fh)) {
    my $pkey = $row->{label} . $row->{bit};
    print Dumper { $pkey => $row };
}

$csv->eof or $csv->error_diag;
close $fh;
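
The loop above prints each row as it is read. A minimal sketch of collecting the rows into one hash keyed on label plus bit, which is what the question asks for, might look like this. The in-memory filehandle and the %hash name are assumptions made to keep the example self-contained; a real script would open the CSV file instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

# Sample input held in a string (assumption: stands in for the real file)
my $data = <<'CSV';
#
#
Description information of source of file.

index,label,bit,desc,mnemonic
6,370,11,three,THRE
9,240,23,four,FOR
11,120,n/a,five,FIV
CSV

open(my $fh, '<', \$data)
    or die "Can't open in-memory file: $!";

my $csv = Text::CSV->new
    or die "Text::CSV error: " . Text::CSV->error_diag;

# Skip the comment lines until the header row turns up
my $header;
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^index,/) {
        $header = $line;
        last;
    }
}
die "No header row found" unless defined $header;

# Use the header fields as column names
$csv->parse($header) or die "Can't parse header";
$csv->column_names([$csv->fields]);

# Store each row under the combined label . bit key
my %hash;
while (my $row = $csv->getline_hr($fh)) {
    $hash{ $row->{label} . $row->{bit} } = $row;
}

$csv->eof or $csv->error_diag;
close $fh;
```

After this runs, a lookup such as $hash{'37011'}{mnemonic} gives the value for that row.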

【Discussion】:

Text::CSV::Simple makes this even simpler.

【Solution 2】:

You could always do something like this:

#!/usr/bin/env perl

use strict;
use warnings;

my %hash;
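while( <DATA> ){ last if /^index,/ } # Skip the comment lines; stop at the header
my $labels = $_;  # The header line supplies the hash keys
chomp $labels;    # chomp removes only a trailing newline; chop would eat any last character
my @keys = split /,/, $labels;
while(<DATA>){
    chomp;
    next unless length;  # ignore blank lines
    my @a = split /,/;
    my %h;
    @h{@keys} = @a;  # hash slice: pair each header field with its value
    $hash{ $a[1] . $a[2] } = \%h;  # key is label . bit
}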

while( my ( $K, $H ) = each %hash ){
    print "$K :: ";
    while( my( $k, $v ) = each( %$H ) ) {
        print $k . "=>" . $v . "  ";
    }
    print "\n";
}

__DATA__

#
#
#
#
Description information of source of file.

index,label,bit,desc,mnemonic
6,370,11,three,THRE
9,240,23,four,FOR
11,120,n/a,five,FIV

【Discussion】:

I agree... if you know the input format, there is no need to pull in a clunky module and thousands of lines of code.
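
One caveat worth illustrating: a plain split only holds up while no field contains a quoted comma. The sketch below uses a made-up row (it is not from the question's file) to show what happens otherwise.

```perl
use strict;
use warnings;

# Hypothetical row whose fourth field is quoted and contains a comma
my $line = '6,370,11,"three, or so",THRE';

my @fields = split /,/, $line;

# split cuts the quoted field in two and leaves the quote characters in place
printf "%d fields: %s\n", scalar @fields, join(' | ', @fields);
```

The sample data in the question has no quoted fields, so the split approach works there.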

【Solution 3】:

Text::CSV::Simple has been around since 2005...

From the docs:

# Map the fields to a hash
my $parser = Text::CSV::Simple->new;
$parser->field_map(qw/id name null town/);
my @data = $parser->read_file($datafile);

...simple!
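
As far as I understand the docs, read_file returns one hash reference per row once field_map is set. Assuming that, and assuming the comment lines and the header row have already been dropped, a sketch of turning such a list into the hash from the question could be as follows. The @data literal is a stand-in for the parser's return value, not real output.

```perl
use strict;
use warnings;

# Stand-in for what read_file would return (a list of hash references);
# the values are copied from the question's sample data.
my @data = (
    { index => '6',  label => '370', bit => '11',  desc => 'three', mnemonic => 'THRE' },
    { index => '9',  label => '240', bit => '23',  desc => 'four',  mnemonic => 'FOR'  },
    { index => '11', label => '120', bit => 'n/a', desc => 'five',  mnemonic => 'FIV'  },
);

# Key each row on label . bit; the inner parentheses make sure Perl
# treats the braces as a block rather than an anonymous hash
my %hash = map { ( $_->{label} . $_->{bit} => $_ ) } @data;
```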

【Discussion】:

【Solution 4】:

A simple, pasteable parser

sub parse_csv {
    my ($f, $s, %op) = @_;  # file, sub, options
    my $d = $op{delim}?$op{delim}:"\t";  # delimiter, can be a regex
    open IN, '<', $f or die "Can't open $f: $!";
    $_=<IN>; chomp;
    my @h=map {s/"//g; lc} split /$d/; # header assumed, could be an option
    $h[0]="id" if $h[0] eq ""; # r compatible
    while(<IN>) {
        chomp;
        my @d=split /$d/;
        s/^"//, s/"$// for @d; # strip surrounding quotes; any file with junk in it should fail anyway
        push @h, "id" if (@h == (@d - 1)); # r compat
        my %d=map {$h[$_]=>$d[$_]} (0..$#d);
        &{$s}(\%d); # call the sub with a reference to the row hash
    }
    close IN;
}

Example usage:

use Data::Dumper;

parse_csv("file.txt", sub {
   die Dumper $_[0];  # dump the first row and stop
});

Note that things like $. and $_ can still be used inside the sub.

【Discussion】:

A masterpiece! Could you briefly explain, for example, what the line push @h, "id" if (@h == (@d - 1)); or the line &{$s}(\%d); does? What is lc? And what does "r compatible" mean?
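
A short sketch of the Perl constructs asked about, using made-up names ($callback, @seen, %row). It does not cover the "r compatible" remark, which presumably refers to files written by R.

```perl
use strict;
use warnings;

my @seen;
my $callback = sub { my ($row) = @_; push @seen, $row->{name} };

my %row = (name => 'alpha');
&{$callback}(\%row);   # dereference the code reference and call it with a hash reference
$callback->(\%row);    # the same call written with the arrow syntax

# lc returns a lowercased copy of its argument
my $lower = lc 'THRE';

# An array in numeric context yields its element count, so this appends
# 'id' only when the header has exactly one field fewer than the data row
my @h = ('a', 'b');
my @d = (1, 2, 3);
push @h, 'id' if (@h == (@d - 1));
```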