【Question title】: Out of memory error while using Text::CSV_XS and reading multiple CSV files
【Posted】: 2014-12-10 08:01:11
【Question】:

Below is my code to print the list of unique projects whose History is not empty.

use strict;
use warnings;
use Text::CSV_XS qw ( csv );

my $q = 0;
my $r = 0;
my @array1;
my @array2;
my @array3;
my %uniqueproject;
my @files = glob("*.csv");
foreach my $s (@files) {
    open( my $fh, "<", "$s" ) or die "cannot open the file $!";
    my @aoh = @{ csv( in => $fh, headers => "auto" ) };
    foreach my $i (@aoh) {
        if ( defined( $aoh[$q]{History} ) ) {
            if ( $aoh[$q]{History} ne "" ) {
                $array1[$r] = $aoh[$q]{PROJECT};
                $array2[$r] = $aoh[$q]{IDENTIFIER};
                $r++;
            }
        }
        $q++;
    }
    close($fh);
}
foreach (@array1) {
    $uniqueproject{$_} = 1;
}
@array3 = keys(%uniqueproject);
foreach (@array3) {
    print $_. "\n";
}

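The above code works fine if there is only one CSV in the folder. With multiple CSV files I get an out of memory error. I can't work out the cause of this error. Please let me know what is filling up the memory. If a foreach loop is not suitable for iterating over the files, please suggest the right loop.

My sample CSVs are: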

test1.csv:

"SEVERITY","DESCRIPTION","PROJECT","Attachments","priority","IDENTIFIER","STATUS","History","TITLE"
"3","fdlkfjalskfjlskfla
fkdalsfjkasljfkl
dksajdfklsajkl","hadkf","dklsfj/dksfj.dskak/fsajk","4","123","pending","repeat","test csv"
"3","fdlkfjalskfjlskfla
fkdalsfjkasljfkl
dksajdfklsajkl","hadkf","dklsfj/dksfj.dskak/fsajk","4","124","pending","repeat","test csv"
"3","fdlkfjalskfjlskfla
fkdalsfjkasljfkl
dksajdfklsajkl","hadkf","dklsfj/dksfj.dskak/fsajk","4","125","pending","repeat","test csv"
"3","fdlkfjalskfjlskfla
fkdalsfjkasljfkl
dksajdfklsajkl","hadkf","dklsfj/dksfj.dskak/fsajk","4","126","pending","repeat","test csv"

test2.csv:

"SEVERITY","DESCRIPTION","PROJECT","Attachments","priority","IDENTIFIER","STATUS","History","TITLE"
"3","fdlkfjalskfjlskflafkdalsfjkasljfkldksajdfklsajkl","hadkf3","dklsfj/dksfj.dskak/fsajk","4","123","pending","repeat","test csv"
"3","fdlkfjalskfjlskfla
fkdalsfjkasljfkl
dksajdfklsajkl","hadkf4","dklsfj/dksfj.dskak/fsajk","4","124","pending","repeat","test csv"
"3","fdlkfjalskfjlskfla
fkdalsfjkasljfkl
dksajdfklsajkl","hadkf4","dklsfj/dksfj.dskak/fsajk","4","125","pending","repeat","test csv"
"3","fdlkfjalskfjlskfla
fkdalsfjkasljfkl
dksajdfklsajkl","hadkf4","dklsfj/dksfj.dskak/fsajk","4","126","pending","repeat","test csv"

【Discussion】:

    Tags: perl csv perl-module


    【Solution 1】:

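    It's not entirely clear to me what you mean by "unique" projects, but I assume you are trying to extract all the IDs and projects that have a value in History. If it's something else, you'll have to edit your question to clarify. Unfortunately, the test data you provided is not much help, so I'm not sure whether IDENTIFIER and PROJECT are both unique: several rows with different IDs have the same PROJECT name. I'm assuming that IDENTIFIER is a unique identifier.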

    use warnings;
    use strict;
    use Data::Dumper;
    use feature ':5.10';
    
    use Text::CSV_XS qw ( csv );
    
    # we will store project info in this hash
    my %unique;
    my @files = glob("*.csv");
    
    for my $s (@files) {
        open (my $fh, "<","$s") or die "cannot open the file $!";
        my @aoh = @{csv (in => $fh, headers => "auto")};
    
        # go through the results...
        for (@aoh) {
            # if 'History' is defined and has some content (\w tests for alphanumeric chars)
            if ($_->{History} && $_->{History} =~ /\w/) {
                # add it to the hash of unique projects
                # store the ID as the key and the project name as the value
                $unique{ $_->{IDENTIFIER} } = $_->{PROJECT};
            }
        }
        close ($fh);
    }
    
    # now you can go through the hash of projects and print out the ID and project name
    for (keys %unique) {
        say "id: $_; project: $unique{$_}";
    }
    

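    The reason your code didn't work has to do with how you were checking the entries. After each file was parsed, you went through the array of hashes produced by parsing it, but you used a mixture of a numeric index and a loop variable to refer to what should have been the same entity. For example: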

    foreach my $i (@aoh) {
        if ( defined( $aoh[$q]{History} ) ) {
            if ( $aoh[$q]{History} ne "" ) {
    

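    Inside the foreach loop, you don't need to refer to $aoh[$q]: the current entry is already in $i, so you can write if ( defined $i->{History} ) (with the arrow, because $i holds a hash reference). Using the numeric index becomes a problem because you never reset it to 0 after the first file, so when you start looking at the results from file 2, $q is not 0; it is already set to the number of results from the first file. The first time if (defined $aoh[$q]{History}) runs on the file 2 results, it looks at, say, $aoh[6]{History} rather than $aoh[0]{History}! Unfortunately, when you look up $aoh[6]{History}, Perl automatically assumes that $aoh[6] exists and creates it if it doesn't (this is known as autovivification). Because the loop is iterating over @aoh while each lookup appends a new element, it never reaches the end of the array, and memory eventually runs out.

    If you modify your code as follows, you can get a good idea of what is happening: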
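    The effect described above can be reproduced without any CSV files. The following is only a minimal sketch: the two rows and the stale index value 5 are made up for illustration and stand in for a freshly parsed file and a $q left over from an earlier one.

```perl
use strict;
use warnings;

# Two rows, standing in for what parsing one CSV file might produce.
my @rows = ( { History => "repeat" }, { History => "" } );

# Hypothetical index left over from an earlier, longer file.
my $stale_index = 5;

my $before = scalar @rows;
# Looking "through" a missing element creates it as an empty hash ref.
my $found  = defined $rows[$stale_index]{History};
my $after  = scalar @rows;

print "before: $before, after: $after\n";    # before: 2, after: 6

# Testing the element itself first does not create anything.
my @safe = ( { History => "repeat" } );
my $safe_found = defined( $safe[$stale_index] )
              && defined( $safe[$stale_index]{History} );
print "safe length: ", scalar(@safe), "\n";  # safe length: 1
```

    The lookup returns false in both cases, but only the first one changes the array, which is why the original loop kept finding more elements to visit.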

    # (needs "use feature 'say';" at the top of the script)
    foreach my $s (@files) {
        open( my $fh, "<", "$s" ) or die "cannot open the file $!";
        my @aoh = @{ csv( in => $fh, headers => "auto" ) };
        say "Parsed file $s; found " . @aoh . " entries";
    
        # add an accumulator 
        my $acc = 0;
        foreach my $i (@aoh) {
            say "looking at array entry $acc, aoh length: " . @aoh . "; q: $q; r: $r";
            if ( defined( $aoh[$q]{History} ) ) {
                if ( $aoh[$q]{History} ne "" ) {
                    $array1[$r] = $aoh[$q]{PROJECT};
                    $array2[$r] = $aoh[$q]{IDENTIFIER};
                    $r++;
                }
            }
            $acc++;
            $q++;
            # die after 20 iterations or we'll be here all night!
            die if $acc == 20;
        }
        close($fh);
    }
    

    Partial output:

    Parsed file file2.csv; found 10 entries
    looking at array entry 0, aoh length: 10; q: 12; r: 4
    looking at array entry 1, aoh length: 13; q: 13; r: 4
    looking at array entry 2, aoh length: 14; q: 14; r: 4
    looking at array entry 3, aoh length: 15; q: 15; r: 4
    looking at array entry 4, aoh length: 16; q: 16; r: 4
    looking at array entry 5, aoh length: 17; q: 17; r: 4
    looking at array entry 6, aoh length: 18; q: 18; r: 4
    looking at array entry 7, aoh length: 19; q: 19; r: 4
    looking at array entry 8, aoh length: 20; q: 20; r: 4
    looking at array entry 9, aoh length: 21; q: 21; r: 4
    looking at array entry 10, aoh length: 22; q: 22; r: 4
    

    With every entry you check, the array @aoh gets longer!
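    If the goal is the one stated in the question (a list of unique PROJECT names with a non-empty History), one possible shape for the loop is sketched below. It uses the loop variable directly, so there is no index to reset, and keys the hash on PROJECT as the question's %uniqueproject did. The helper name collect_projects and the inline rows are made up for illustration; they stand in for the parsed contents of two files.

```perl
use strict;
use warnings;

# Record the PROJECT of every row whose History is defined and non-empty.
# $rows is a reference to an array of hash refs; $seen accumulates results
# across calls, so it can be shared between files.
sub collect_projects {
    my ( $rows, $seen ) = @_;
    for my $row (@$rows) {
        next unless defined $row->{History} && $row->{History} ne "";
        $seen->{ $row->{PROJECT} } = 1;
    }
    return $seen;
}

# Made-up rows standing in for two parsed files.
my @file1 = (
    { PROJECT => "hadkf",  IDENTIFIER => "123", History => "repeat" },
    { PROJECT => "hadkf",  IDENTIFIER => "124", History => "" },
);
my @file2 = (
    { PROJECT => "hadkf3", IDENTIFIER => "123", History => "repeat" },
    { PROJECT => "hadkf4", IDENTIFIER => "124", History => "repeat" },
);

my %uniqueproject;
collect_projects( $_, \%uniqueproject ) for \@file1, \@file2;

# Prints hadkf, hadkf3 and hadkf4, one per line.
print "$_\n" for sort keys %uniqueproject;
```

    In the real script, the array reference returned by csv( in => $fh, headers => "auto" ) for each file could be passed to the helper in place of the inline rows. Keying on PROJECT rather than IDENTIFIER is a choice made here because the sample files reuse the same IDENTIFIER values; whether that is the right key depends on the actual data.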

    【Comments】:

    • I can't use if (defined $i{History}); I can use if (defined $i->{History}).