【Question title】: Perl CSV to hash
【Posted】: 2013-02-24 01:05:24
【Question description】:

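I have a CSV file that contains comment text before the header row and the data, and I would like to read it into a hash for further manipulation. The primary key of the hash will be a combination of two data values. How do I do this?

  1. Search for the header row using the pattern "index"
  2. Use the header fields as the keys
  3. Read in the rest of the file.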

Example CSV

#
#
#
#
Description information of source of file.

index,label,bit,desc,mnemonic
6,370,11,three,THRE
9,240,23,four,FOR
11,120,n/a,five,FIV

Desired hash example

( '37011' => { 'index' => '6', 'label' => '370', 'bit' => '11', 'desc' => 'three', 'mnemonic' => 'THRE'}, '24023' => {'index' => '9', 'label'  => '240', 'bit' => '23', 'desc' => 'four', 'mnemonic' => 'FOR'}, '120n/a' => {'index' => '11', 'label'  => '120', 'bit' => 'n/a', 'desc' => 'five', 'mnemonic' => 'FIV'} )   

【Question discussion】:

    Tags: perl csv hash


    【Solution 1】:

    For this you want the Text::CSV module:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;
    use Text::CSV;
    
    my $filename = 'test.csv';
    
    # watch out for the encoding!
    open(my $fh, '<:encoding(UTF-8)', $filename)
        or die "Can't open $filename: $!";
    
    # skip to the header
    my $header = '';
    while (<$fh>) {
        if (/^index,/x) {
            $header = $_;
            last;
        }
    }
    
    my $csv = Text::CSV->new
        or die "Text::CSV error: " . Text::CSV->error_diag;
    
    # define column names    
    $csv->parse($header);
    $csv->column_names([$csv->fields]);
    
    # parse the rest
    while (my $row = $csv->getline_hr($fh)) {
        my $pkey = $row->{label} . $row->{bit};
        print Dumper { $pkey => $row };
    }
    
    $csv->eof or $csv->error_diag;
    close $fh;
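
The loop above prints each row rather than storing it. A minimal sketch of collecting the rows into the hash the question asks for might look like this. It assumes the same header pattern and the `label` and `bit` columns from the sample data; the in-memory `$text` and the `%data` name are illustrative only, not part of the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

# Sample input held in memory so the sketch runs without an external file.
my $text = <<'END';
#
# Description information of source of file.

index,label,bit,desc,mnemonic
6,370,11,three,THRE
9,240,23,four,FOR
11,120,n/a,five,FIV
END

open my $fh, '<', \$text or die "Can't open in-memory file: $!";

# Skip the comment text until the header row turns up.
my $header;
while (my $line = <$fh>) {
    if ($line =~ /^index,/) {
        chomp($header = $line);
        last;
    }
}
die "No header row found\n" unless defined $header;

my $csv = Text::CSV->new({ binary => 1 })
    or die "Text::CSV error: " . Text::CSV->error_diag;

$csv->parse($header) or die "Can't parse header row\n";
$csv->column_names([ $csv->fields ]);

# Collect every row under the combined "label" . "bit" key.
my %data;
while (my $row = $csv->getline_hr($fh)) {
    # Skip rows that lack the key columns, e.g. blank lines.
    next unless defined $row->{label} && defined $row->{bit};
    $data{ $row->{label} . $row->{bit} } = $row;
}
close $fh;

print join(', ', sort keys %data), "\n";
```

Rows that share the same `label` and `bit` values would overwrite each other here; whether that matters depends on the real data.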
    

    【Discussion】:

    【Solution 2】:

    You could always do it like this:

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    
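    my %hash;
    while( <DATA> ){ last if /index/ } # Skip the comment text; stop at the header row
    my $labels = $_;  # Save the header line for hash keys
    chomp $labels;    # chomp removes only a trailing newline; chop drops any last character
    my @keys = split ',', $labels;
    while(<DATA>){
        chomp;
        next unless /\S/;  # skip blank lines
        my @a = split ',';
        my %h;
        @h{@keys} = @a;    # hash slice: header names => field values
        $hash{ $a[1] . $a[2] } = \%h;  # key is label . bit
    }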
    
    while( my ( $K, $H ) = each %hash ){
        print "$K :: ";
        while( my( $k, $v ) = each( %$H ) ) {
            print $k . "=>" . $v . "  ";
        }
        print "\n";
    }
    
    __DATA__
    
    #
    #
    #
    #
    Description information of source of file.
    
    index,label,bit,desc,mnemonic
    6,370,11,three,THRE
    9,240,23,four,FOR
    11,120,n/a,five,FIV
    

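    【Discussion】:

    • I agree... if you know the input format, there is no need to pull in a clunky module and thousands of lines of code.
    【Solution 3】:

    Text::CSV::Simple has been around since 2005...

    From the docs: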

    # Map the fields to a hash
    my $parser = Text::CSV::Simple->new;
    $parser->field_map(qw/id name null town/);
    my @data = $parser->read_file($datafile);
    

    ...simple!

    【Discussion】:

      【Solution 4】:

      A simple, pasteable parser

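      sub parse_csv {
          my ($f, $s, %op) = @_;  # file, callback sub, options
          my $d = $op{delim} ? $op{delim} : "\t";  # delimiter, can be a regex
          open my $in, '<', $f or die "Can't open $f: $!";
          local $_ = <$in>;  # header line assumed, could be an option
          chomp;
          my @h = map { s/"//g; lc } split /$d/;  # strip quotes, lowercase the names
          $h[0] = "id" if $h[0] eq "";  # R compatible: name an unnamed first column
          while (<$in>) {
              chomp;
              my @d = split /$d/;
              s/^"//, s/"$// for @d;  # strip surrounding quotes; any file with junk in it should fail anyway
              push @h, "id" if @h == @d - 1;  # R compat: header one field short of the data
              my %d = map { $h[$_] => $d[$_] } 0 .. $#d;
              $s->(\%d);  # call the callback with a hash ref for this row
          }
          close $in;
      }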
      

      Example usage:

      use Data::Dumper;

      parse_csv("file.txt", sub {
         die Dumper $_[0];  # dump the first row and stop
      });
      

      Note that things like $. and $_ can still be used inside the sub.
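
To tie the callback style back to the question, here is a rough, core-Perl-only sketch. `each_csv_row`, its `delim` and `header_re` options, and the `%line_of` hash are hypothetical names made up for this illustration; the sketch assumes a plain string delimiter and no quoted fields, so it is no substitute for a real CSV parser.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): call $callback once per
# data row, passing a hash ref keyed by the header names.
sub each_csv_row {
    my ($fh, $callback, %opt) = @_;
    my $delim     = $opt{delim}     // ',';          # plain string delimiter
    my $header_re = $opt{header_re} // qr/^index,/;  # assumed marker for the header row

    my @names;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;                   # skip blank lines
        if (!@names) {
            # Everything before the header row is treated as comment text.
            @names = split /\Q$delim\E/, $line if $line =~ $header_re;
            next;
        }
        my %row;
        @row{@names} = split /\Q$delim\E/, $line;    # hash slice: names => values
        $callback->(\%row);
    }
    return;
}

my $text = <<'END';
#
# Description information of source of file.

index,label,bit,desc,mnemonic
6,370,11,three,THRE
9,240,23,four,FOR
11,120,n/a,five,FIV
END

open my $fh, '<', \$text or die "Can't open in-memory file: $!";

my (%data, %line_of);
each_csv_row($fh, sub {
    my ($row) = @_;
    my $key = $row->{label} . $row->{bit};
    $data{$key}    = $row;
    $line_of{$key} = $.;   # $. is the line number of the handle read last
});
close $fh;

printf "%s (line %d): %s\n", $_, $line_of{$_}, $data{$_}{mnemonic}
    for sort keys %data;
```

The callback closes over `%data`, so the parsing loop stays generic while the caller decides how the key is built.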

      【Discussion】:

      • Masterpiece! Could you briefly explain what lines such as push @h, "id" if (@h == (@d - 1)); or &{$s}(\%d); do? What is lc? What does "r compatible" mean?