Segmentation fault on merging threads (Perl): answers

【Question title】: Segmentation fault on merging threads (Perl)
【Posted】: 2014-01-01 15:01:10
【Question】:

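I have some working code that I tried to make multithreaded using the tutorial on dreamincode: http://www.dreamincode.net/forums/topic/255487-multithreading-in-perl/

The sample code there seems to work fine, but I can't for the life of me work out why mine doesn't. Judging by the debug messages I put in, every thread gets all the way to the end of its subroutine, the program sits there for a while, and then it hits a segmentation fault and dumps core. That said, I also can't find the core dump file anywhere (Ubuntu 13.10).

If anyone has any suggestions for reading, or can see the error in the rather messy code below, I'd be eternally grateful.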

#!/usr/bin/env perl

use Email::Valid;
use LWP::Simple;
use XML::LibXML;
use Text::Trim;
use threads;
use DB_File;

use Getopt::Long;

my $sourcefile   = "thislevel.csv";
my $startOffset = 0;
my $chunk = 10000;
my $num_threads = 8;

$result = GetOptions ("start=i" => \$startOffset,    # numeric
              "chunk=i" => \$chunk,    # numeric
                  "file=s"   => \$sourcefile,      # string
                  "threads=i" => \$num_threads,     #numeric
                  "verbose"  => \$verbose);  # flag


$tie = tie(@filedata, "DB_File", $sourcefile, O_RDWR, 0666, $DB_RECNO)
    or die "Cannot open file $sourcefile: $!\n";

my $filenumlines = $tie->length;

if ($filenumlines>$startOffset + $chunk){
    $numlines = $startOffset + $chunk;
} else {
    $numlines = $filenumlines;
}


open (emails, '>>emails.csv');
open (errorfile, '>>errors.csv');
open (nxtlvl, '>>nextlevel.csv');
open (donefile, '>>donelines.csv');
my $line = '';
my $found = false;

my $linenum=0;

my @threads = initThreads();



foreach(@threads){

    $_ = threads->create(\&do_search);

}


foreach(@threads){
    $_->join();
}


close nxtlvl;
close emails;
close errorfile;
close donefile;


sub initThreads{
    # An array to place our threads in
    my @initThreads;
    for(my $i = 1;$i<=$num_threads;$i++){
        push(@initThreads,$i);
    }
    return @initThreads;
}




sub do_search{
    my $id = threads->tid();

    my $linenum=$startOffset-1+$id;

    my $parser = XML::LibXML->new();
    $parser->set_options({ recover           => 2,
                           validation        => 0,
                       suppress_errors   => 1,
                       suppress_warnings => 1,
                       pedantic_parser   => 0,
                       load_ext_dtd      => 0, });


    while ($linenum < $numlines) {

        $found = false;
        @full_line = split ',', $filedata[$linenum-1];

        $line = trim(@full_line[1]);
        $this_url = trim(@full_line[2]);
        print "Thread $id Scanning $linenum of $filenumlines\: ";
        printf "%.3f\%\n", 100 * $linenum / $filenumlines;

        my $content = get trim($this_url);

        if (!defined($content)) {

            print errorfile "$this_url, no content\n";

        }elsif (length($content)<100) {

            print errorfile "$this_url, short\n";

        }else {

            my $doc = $parser->load_html(string => $content);

            if(defined($doc)){

                for my $anchor ( $doc->findnodes("//a[\@href]") )
                {
                    $is_email = substr $anchor->getAttribute("href") ,7;
                    if(Email::Valid->address($is_email)) {
                        printf emails "%s, %s\n", $line, $is_email;
                        $found = true;
                    } else{
                        $link = $anchor->getAttribute("href");
                        if (substr lc(trim($link)),0,4 eq "http"){
                            printf nxtlvl "%s, %s\n", $line, $link;
                        } else {
                            printf nxtlvl "%s, %s/%s\n", $line, $line, $link;
                        }
                    }
                } 
            }
            if ($found=false){

                my @lines = split '\n',$content;

                foreach my $cline (@lines){
                    my @words = split ' ',$cline;
                        foreach my $word (@words) { 
                        my @subwords = split '"',$word ;
                        foreach my $subword (@subwords) {

                            if(Email::Valid->address($subword)) {
                                    printf emails "%s, %s\n", $line, $subword;  
                            }
                        }
                    }
                    }
            }
        }
        printf donefile "%s\n",$linenum;
        $linenum = $linenum + $num_threads;     
    }
    threads->exit();
}

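【Comments】:

That tutorial contains a lot of bad and/or useless code. Forget it and read the threads documentation instead. I don't know why your code segfaults; it may be a bug in one of the modules you use (my money is on DB_File). That said, I doubt your code works in non-threaded form: Perl has no true or false, so use 1 or 0 instead. Add use strict; use warnings; to get warnings about this kind of thing (Perl::Critic can help too).
Is DB_File thread-safe? Have you tried the latest version of Perl?
You're right, the true/false issue does change the behaviour, so I've fixed that. The crash happens when the last thread reaches threads->exit(); where do I check whether DB_File is thread-safe? There's nothing on its CPAN page: [search.cpan.org/~pmqs/DB_File-1.831/DB_File.pm]
Threads need to use thread-safe libraries, and DB_File doesn't appear to be thread-safe - perlmonks.org/?node_id=733599 (although this is an old post, I can't see any reason it would have magically changed without that being written into the product's man page)
Thanks, everyone - DB_File was causing the error. I've coded this and the other errors out and everything is working again.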
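
The true/false point raised in the comments can be illustrated with a small sketch. It assumes a script without use strict, as in the question; the variable names $is_truthy, $assign_branch, $flag and $not_found_branch are made up for the demonstration.

```perl
#!/usr/bin/env perl
# Sketch: without "use strict", the barewords true and false are treated as
# the plain strings "true" and "false", and both strings are truthy.

my $found = false;                 # $found now holds the string "false"
my $is_truthy = $found ? 1 : 0;    # any non-empty string other than "0" is true

# "=" assigns rather than compares, so this condition is always true as well.
my $assign_branch = 0;
if ($found = false) {
    $assign_branch = 1;
}

# Idiomatic alternative: store 1 or 0 and test the flag directly.
my $flag = 0;
my $not_found_branch = !$flag ? 1 : 0;

print "bareword false is truthy: $is_truthy\n";
print "assignment branch taken:  $assign_branch\n";
print "numeric flag test works:  $not_found_branch\n";
```

With use strict enabled, the bareword lines fail at compile time, which is how the problem would normally be caught.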

Tags: multithreading perl segmentation-fault

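【Solution 1】:

Aside from the various coding errors that mean my code should never be used as an example by other visitors, DB_File is not a thread-safe module.

Annoyingly, and perhaps misleadingly, it works absolutely fine until you close the threads that have been successfully accessing the file throughout the code.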
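
The answer does not show the replacement code. One way to avoid sharing a tied DB_File array across threads is to load the rows into a plain array before any thread starts and have each worker return its results through join(). The following is only a sketch under that assumption: the @rows data, the placeholder processing step, and the reuse of the do_search name are hypothetical, not the author's actual fix.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use threads;

# Hypothetical stand-in for the CSV rows. A real script would read them from
# the source file with plain open/readline before creating any thread.
my @rows = map { "id$_,name$_,http://example.com/$_" } 1 .. 20;
my $num_threads = 4;

# Each worker handles every $num_threads-th row, starting at its own offset,
# and returns its results instead of writing to shared file handles.
sub do_search {
    my ($offset) = @_;
    my @done;
    for (my $i = $offset; $i < @rows; $i += $num_threads) {
        my (undef, $name) = split /,/, $rows[$i];
        push @done, "$i:$name";    # placeholder for the fetch-and-parse step
    }
    return @done;                  # returning normally ends the thread
}

# Request list context explicitly so that join() returns the whole list.
my @workers = map {
    threads->create({ context => 'list' }, \&do_search, $_)
} 0 .. $num_threads - 1;

my @results = map { $_->join() } @workers;

printf "%d rows processed by %d threads\n", scalar @results, scalar @workers;
```

Each thread works on its own copy of @rows, so no module-level handle is shared, and the main thread can write the collected results to the output files after all joins complete.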
