Perl multithreading slower when memory usage gets high

【Question title】: Perl multithreading slower when memory usage getting high
【Posted】: 2021-03-06 15:12:11
【Question description】:

Hi all~ I wrote a very simple Perl script that uses multithreading. The code is as follows.

#!/bin/perl

use strict;
use threads;
use Benchmark qw(:hireswallclock);

my $starttime;
my $finishtime;
my $timespent;
my $num_of_threads = 1;
my $total_size = 10000000;
my $chunk_size = int($total_size / $num_of_threads);

if($total_size % $num_of_threads){
        $chunk_size++;
}

my @threads = ();

$starttime = Benchmark->new;

for(my $i = 0; $i < $num_of_threads; $i++) {
        my $thread = threads->new(\&search);
        push (@threads, $thread);
}

foreach my $thread (@threads) {
        $thread->join();
}

$finishtime = Benchmark->new;  # already declared above; a second "my" would mask it
$timespent = timediff($finishtime, $starttime);
print "$num_of_threads threads used in ".timestr($timespent)."\nDone!\n";

sub search{
        my $i = 0;
        while($i < $chunk_size){
            $i++;
        }

        return 1;
}

This code works fine, and it runs faster as the number of threads increases.

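However, once I add extra lines in the middle that create a large array, the code runs slower as more threads are added. The code with the additional lines is shown below.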

#!/bin/perl

use strict;
use threads;
use Benchmark qw(:hireswallclock);

my $starttime;
my $finishtime;
my $timespent;
my $num_of_threads = 1;
my $total_size = 10000000;
my $chunk_size = int($total_size / $num_of_threads);

if($total_size % $num_of_threads){
        $chunk_size++;
}

##########Additional codes##########
print "Preparing data...\n";
$starttime = Benchmark->new;

my @array = ();

for(my $i = 0; $i < $total_size; $i++){
        my $rn = rand();
        push(@array, $rn);
}

$finishtime = Benchmark->new;
$timespent = timediff($finishtime, $starttime);
print "Used ".timestr($timespent)."\n";
######################################

my @threads = ();

$starttime = Benchmark->new;

for(my $i = 0; $i < $num_of_threads; $i++) {
        my $thread = threads->new(\&search);
        push (@threads, $thread);
}

foreach my $thread (@threads) {
        $thread->join();
}

$finishtime = Benchmark->new;  # already declared above; a second "my" would mask it
$timespent = timediff($finishtime, $starttime);
print "$num_of_threads threads used in ".timestr($timespent)."\nDone!\n";

sub search{
        my $i = 0;
        while($i < $chunk_size){
            $i++;
        }

        return 1;
}

I am quite confused by this behavior of multithreading in Perl. Does anyone know what might be going wrong here?

Thanks!

【Question comments】:

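I don't have time to look at the details, but every thread copies @array (and any other variable not declared as shared), and my guess is that this is the slow part. Use threads::shared and declare @array as :shared; that should improve performance.
Thanks @Dada! It does work for the code above.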

Tags: multithreading perl memory

【Solution 1】:

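You have to remember that with ithreads (interpreter threads), the entire Perl interpreter, including code and memory, is cloned into the new thread. So the more data there is to clone, the longer it takes. There are ways to control what gets cloned; have a look at the threads perldoc.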
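To make the cloning concrete, here is a minimal sketch (the variable names are illustrative, not from the question). It assumes a Perl built with ithreads support. An ordinary variable is cloned into the thread, so a change made there never reaches the parent, while a variable declared :shared is a single value seen by every thread.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $private = 0;            # ordinary variable: each thread gets its own clone
my $visible :shared = 0;    # shared variable: one value seen by all threads

my $thr = threads->create(sub {
    $private = 42;          # changes only this thread's cloned copy
    lock($visible);         # serialize access to the shared value
    $visible = 42;          # changes the single shared value
    return $private;        # inside the thread, the clone really is 42
});
my $seen_in_thread = $thr->join();

print "private in parent: $private\n";          # still 0
print "shared  in parent: $visible\n";          # 42
print "private in thread: $seen_in_thread\n";   # 42
```

The same mechanism explains the slowdown in the question: a 10,000,000-element unshared @array is cloned once per thread at creation time.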

You should do as little as possible, and not even load many modules, before you spawn your threads.

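If you do have a large amount of data that will be used by all the threads, share it with threads::shared. Share data structures using shared_clone(). You cannot simply share() anything but simple variables. A shared variable can only contain plain scalars or other shared references.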
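As a rough illustration of shared_clone(), the sketch below builds a nested structure (the keys and values are made up for the example), lets three threads update it under a lock, and then shows that storing an unshared reference into a shared structure is rejected at run time.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# shared_clone() returns a shared copy of a nested structure
my $data = shared_clone({
    intervals => [ [0.1, 0.4], [0.2, 0.9] ],   # illustrative values
    count     => 0,
});

my @workers;
for (1 .. 3) {
    my $thr = threads->create(sub {
        lock($data);    # lock() follows the reference and locks the shared hash
        $data->{count} += scalar @{ $data->{intervals} };
        return 1;
    });
    push @workers, $thr;
}
$_->join() for @workers;

print "count = $data->{count}\n";   # 3 threads x 2 intervals = 6

# Storing a reference to unshared data in a shared structure dies,
# so the assignment is wrapped in eval to show the failure safely
my $ok = eval { $data->{extra} = [1, 2]; 1 };
print "unshared reference rejected\n" unless $ok;
```

To store the extra array successfully, it would have to be shared first, for example with shared_clone([1, 2]).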

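If you are going to consume or drain that data, make it a queue with the Thread::Queue module. It shares the values automatically and takes care of locking. After spawning a pool of worker threads, control them with Thread::Semaphore. That way they won't terminate before you have given them anything to do. You also prevent race conditions.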

https://metacpan.org/pod/threads::shared

HTH

【Comments】:

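1) You can share() a scalar, an array, or a hash. 2) If you use Thread::Queue, you probably don't need Thread::Semaphore to keep the threads from terminating: use while (defined(my $item = $queue->dequeue)) in the threads, and they won't terminate until you call $queue->end.
You may want semaphores to regulate other, non-queue activity. Of course, TIMTOWTDI.
Thanks @lordadmira for pointing out the sharing issue. It does work at a small scale, as in the code above. I'm also reading the documentation, but it may take me a while to fully understand these mechanisms. If I have a more complex script with some larger hashes and arrays, and I want to use multithreading in just one subroutine, should I make all of those large variables shared?
See this for a Thread::Queue example. There is also the similar Thread::Queue::Any, which automatically serializes queued values so they don't need to be shared.
@seallin Re "what if I have a more complex script": you can pass references to shared arrays/hashes around just as if they were not shared.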
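A small sketch of that last point, assuming a hypothetical helper named sum_slice: the array is declared :shared so it is not cloned per thread, and each worker receives an ordinary reference to it plus the index range it should process.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @numbers :shared = (1 .. 1000);   # shared, so threads do not clone it

# Hypothetical helper: sums one slice of the array through a reference
sub sum_slice {
    my ($aref, $from, $to) = @_;
    my $sum = 0;
    $sum += $aref->[$_] for $from .. $to;
    return $sum;
}

my $num_threads = 4;
my $chunk = @numbers / $num_threads;   # 250 elements per thread

my @workers;
for my $n (0 .. $num_threads - 1) {
    my $from = $n * $chunk;
    # Scalar assignment, so join() later returns the sub's scalar result
    my $thr = threads->create(\&sum_slice, \@numbers, $from, $from + $chunk - 1);
    push @workers, $thr;
}

my $total = 0;
$total += $_->join() for @workers;
print "total = $total\n";   # 500500
```

Reads here need no lock because no thread writes to the array; anything that modifies shared data should take a lock() first.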

【Solution 2】:

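Thank you all for pointing me in the right direction! I studied and tried different things, including how to use sharing and queues, which should solve the problem. So I modified the script as follows: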

#!/bin/perl

use strict;
use threads;
use threads::shared;
use Thread::Queue;
use Benchmark qw(:hireswallclock);

my $starttime;
my $finishtime;
my $timespent;
my $num_of_threads = shift @ARGV;
my $total_size = 100000;
######Initiation of a 2D queue######
print "Preparing queue...\n";
$starttime = Benchmark->new;
my $queue = Thread::Queue->new();
for(my $i = 0; $i < $total_size; $i++){
        my $rn1 = rand();
        my $rn2 = rand();
        my @interval :shared = sort($rn1, $rn2);
        $queue->enqueue(\@interval);
}
$finishtime = Benchmark->new;
$timespent = timediff($finishtime, $starttime);
print "Used ".timestr($timespent)."\n";
#####################################
$starttime = Benchmark->new;
my $queue_copy = $queue; #Copy the 2D queue so that the original queue can be kept
for(my $i = 0; $i < $num_of_threads; $i++) {
        my $thread = threads->create(\&search, $queue_copy);
}
foreach my $thread (threads->list()) {
        $thread->join();
}
$finishtime = Benchmark->new;
$timespent = timediff($finishtime, $starttime);
print "$num_of_threads threads used in ".timestr($timespent)."\nDone!\n";
#####################################
sub search{
        my $temp_queue = $_[0];
        while(my $temp_interval = $temp_queue->dequeue_nb()){
                #Do something
        }
        return 1;
}

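What I do first is create a queue of arrays, each holding two numbers. I make a copy of the queue because I want to keep the original while going through it. Then multiple threads work through the copied queue. However, I still find that it runs slower as more threads are added, and I don't know why.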

【Comments】:

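You have discovered the law of diminishing returns. Remember that threading is a way to use spare system resources that are not being used by any line of execution. More threads only correlate with higher performance until all idle resources are in use, whether that is CPU time, network time, or something else. For example, you have 4 CPU cores, so you spawn 4 worker threads that can max out those 4 cores. Adding more threads slows the whole workflow down, because more CPU time has to be diverted from the work to managing thread scheduling.
Your code has a few issues that can explain the performance you observe (and the "law of diminishing returns" is far from the main one). @interval doesn't have to be shared. my $queue_copy = $queue; does not copy the queue. Your use of the queue is a bit odd. What you should probably do is 1) create the queue, 2) create the threads (which should do while (defined(my $interval = $queue->dequeue)) rather than dequeue_nb), 3) fill the queue, 4) wait for the threads to finish.
The main issue IMO, though, is that the thread's main loop (in sub search) is empty, so the execution time is dominated by creating the threads, dequeuing from the queue, managing the threads, and so on. Use a real example and you should see some differences.
(lordadmira's point is that if your CPU only has 4 logical cores, there is no point in using more than 4 threads.)
Consider the following one-liner: time perl -Mthreads -MThread::Queue -e '$q = Thread::Queue->new; threads->create(\&search) for 1 .. shift; $q->enqueue($_) for 1 .. 10000; $q->end; $_->join for threads->list; sub search { while (defined$q->dequeue) { $x = 0; $x ++ while $x < 100000 } }' 1. Change the last digit to choose the number of threads to use. (Add some line breaks if you have trouble reading the code.) On my computer it runs in 24 seconds with 1 thread, 12 seconds with 2 threads, 8.5 seconds with 3 threads, and 7 seconds with 4 threads.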
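For readability, here is one way the one-liner could be laid out as a script, following the four steps from the comment above. This is a sketch rather than the commenter's exact code: the item count and the inner loop bound are reduced so it finishes quickly, the default of 2 threads is an assumption, and the per-thread counter is added only to make the result visible. It needs a Thread::Queue version that provides end() (3.01 or later).

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $num_threads = $ARGV[0] || 2;      # thread count; the default of 2 is an assumption
my $queue = Thread::Queue->new();     # 1) create the queue

sub search {
    my $done = 0;
    # dequeue() blocks until an item arrives; once the queue has been
    # ended and drained it returns undef, which stops the loop
    while (defined(my $item = $queue->dequeue())) {
        my $x = 0;
        $x++ while $x < 1000;         # stand-in for real work
        $done++;
    }
    return $done;
}

my @workers;
for (1 .. $num_threads) {
    my $thr = threads->create(\&search);   # 2) create the threads
    push @workers, $thr;
}

$queue->enqueue($_) for 1 .. 1000;    # 3) fill the queue
$queue->end();                        #    no more items will be added

my $processed = 0;
$processed += $_->join() for @workers;     # 4) wait for the threads
print "processed $processed items\n";
```

Because the threads are created while the queue is still empty, almost nothing has to be cloned, and the blocking dequeue() keeps the workers alive until end() is called.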