In Perl, how can I generate all possible combinations of a list?

【Question title】: In Perl, how can I generate all possible combinations of a list?
【Posted】: 2012-05-05 05:03:29
【Question】:

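I have a file containing a list, and I need to produce a file that pairs each line with every other line. For example, my file contains this: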

AAA
BBB
CCC
DDD
EEE

I would like the final list to look like this:

AAA BBB
AAA CCC
AAA DDD
AAA EEE
BBB CCC
BBB DDD
BBB EEE
CCC DDD
CCC EEE
DDD EEE

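This is my first attempt at doing this in Perl, and I am having some trouble. I do know that you need to create an array and then split it, but I am stuck after that.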

【Comments】:

Please post the code you have so far.

Tags: arrays perl combinations combinatorics

【Solution 1】:

Use Algorithm::Combinatorics. The iterator-based approach is preferable to generating everything at once.

#!/usr/bin/env perl

use strict; use warnings;
use Algorithm::Combinatorics qw(combinations);

my $strings = [qw(AAA BBB CCC DDD EEE)];

my $iter = combinations($strings, 2);

while (my $c = $iter->next) {
    print "@$c\n";
}

Output:

AAA BBB
AAA CCC
AAA DDD
AAA EEE
BBB CCC
BBB DDD
BBB EEE
CCC DDD
CCC EEE
DDD EEE
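
To illustrate what "iterator-based" means without installing anything, here is a minimal core-Perl sketch for the 2-element case only. The helper name pair_iterator is hypothetical and is not part of Algorithm::Combinatorics; it only mimics the idea of producing one combination per call so that the full list of pairs never has to sit in memory at once.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any CPAN module): returns a closure that
# yields one 2-element combination per call, then undef once exhausted.
sub pair_iterator {
    my @items = @_;
    my ($i, $j) = (0, 1);
    return sub {
        return if $j > $#items;    # exhausted, or fewer than 2 items
        my $pair = [ @items[$i, $j] ];
        # Advance the inner index; when it runs off the end,
        # move the outer index forward and restart just after it.
        if (++$j > $#items) {
            $i++;
            $j = $i + 1;
        }
        return $pair;
    };
}

my $next = pair_iterator(qw(AAA BBB CCC DDD EEE));
while (my $c = $next->()) {
    print "@$c\n";
}
```

For anything beyond pairs, or for production use, the module shown above is likely the better choice; this sketch is only meant to show the shape of the approach.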

【Comments】:

【Solution 2】:

It is straightforward using recursion.

The code below demonstrates it.

use strict;
use warnings;

my $strings = [qw(AAA BBB CCC DDD EEE)];

sub combine;

print "@$_\n" for combine $strings, 5;

sub combine {

  my ($list, $n) = @_;
  die "Insufficient list members" if $n > @$list;

  return map [$_], @$list if $n <= 1;

  my @comb;

  for my $i (0 .. $#$list) {
    my @rest = @$list;
    my $val  = splice @rest, $i, 1;
    push @comb, [$val, @$_] for combine \@rest, $n-1;
  }

  return @comb;
}

Edit

Apologies - the code above generates permutations rather than combinations.

The code below is correct.

use strict;
use warnings;

my $strings = [qw(AAA BBB CCC DDD EEE)];

sub combine;

print "@$_\n" for combine $strings, 2;

sub combine {

  my ($list, $n) = @_;
  die "Insufficient list members" if $n > @$list;

  return map [$_], @$list if $n <= 1;

  my @comb;

  for (my $i = 0; $i+$n <= @$list; ++$i) {
    my $val  = $list->[$i];
    my @rest = @$list[$i+1..$#$list];
    push @comb, [$val, @$_] for combine \@rest, $n-1;
  }

  return @comb;
}

Output

AAA BBB
AAA CCC
AAA DDD
AAA EEE
BBB CCC
BBB DDD
BBB EEE
CCC DDD
CCC EEE
DDD EEE

【Comments】:

【Solution 3】:

Have a look at Math::Combinatorics - Perform combinations and permutations on lists

Example copied from CPAN:

use Math::Combinatorics;

  my @n = qw(a b c);
  my $combinat = Math::Combinatorics->new(count => 2,
                                          data => [@n],
                                         );

  print "combinations of 2 from: ".join(" ",@n)."\n";
  print "------------------------".("--" x scalar(@n))."\n";
  while(my @combo = $combinat->next_combination){
    print join(' ', @combo)."\n";
  }

  print "\n";

  print "permutations of 3 from: ".join(" ",@n)."\n";
  print "------------------------".("--" x scalar(@n))."\n";
  while(my @permu = $combinat->next_permutation){
    print join(' ', @permu)."\n";
  }

  output:
combinations of 2 from: a b c
  ------------------------------
  a b
  a c
  b c

  permutations of 3 from: a b c
  ------------------------------
  a b c
  a c b
  b a c
  b c a
  c a b
  c b a

【Comments】:

Why don't you use the sample data from the question?

【Solution 4】:

Here is a hack using glob:

my @list = qw(AAA BBB CCC DDD EEE);

for my $i (0..$#list-1) {
    print join "\n", glob sprintf "{'$list[$i] '}{%s}",
          join ",", @list[$i+1..$#list];
    print "\n";
}

Output:

AAA BBB
AAA CCC
AAA DDD
AAA EEE
BBB CCC
BBB DDD
BBB EEE
CCC DDD
CCC EEE
DDD EEE

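P.S. You may want to use the Text::Glob::Expand or String::Glob::Permute modules instead of plain glob() to avoid the caveats of matching files in the current working directory.

【Comments】:

The glob trick should always be accompanied by the various caveats about when it fails.

【Solution 5】:

I benchmarked the following Perl modules: Math::Combinatorics, Algorithm::Combinatorics, and Cmb.

The benchmark consisted of doing what the OP asked, combining 2 items, but with the set of words increased to 10,000 instead of the 5 originally requested (AAA BBB CCC DDD EEE).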

Test script for Math::Combinatorics

#!/usr/bin/env perl
use strict; use warnings;
use Math::Combinatorics;
my $strings = [qw(AAA BBB CCC DDD EEE) x 2000];
my $iter = Math::Combinatorics->new(count => 2, data => $strings);
while (my @c = $iter->next_combination) {
    print "@c\n";
}

This produced about 53,479 combinations per second.

Test script for Algorithm::Combinatorics

#!/usr/bin/env perl
use strict; use warnings;
use Algorithm::Combinatorics qw(combinations);
my $strings = [qw(AAA BBB CCC DDD EEE) x 2000];
my $iter = combinations($strings, 2);
while (my $c = $iter->next) {
    print "@$c\n";
}

This produced about 861,982 combinations per second.

Test script for Cmb

#!/usr/bin/env perl
use strict; use warnings;
use Cmb;
my $strings = [qw(AAA BBB CCC DDD EEE) x 2000];
my $cmb = new Cmb { size_min => 2, size_max => 2 };
$cmb->cmb_callback($#$strings + 1, $strings, sub {
    print "@_\n";
    return 0;
});

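This produced about 2,940,882 combinations per second.

But if you only need to print the combinations, Cmb can actually do that even faster than the code above.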

#!/usr/bin/env perl
use strict; use warnings;
use Cmb;
my $strings = [qw(AAA BBB CCC DDD EEE) x 2000];
my $cmb = new Cmb { size_min => 2, size_max => 2 };
$cmb->cmb($#$strings + 1, $strings);

This produced about 3,333,000 combinations per second.

The benchmarks were measured using dpv on CentOS Linux release 7.7.1908 (Core), kernel 3.10.0-1062.1.1.el7.x86_64 x86_64, with Perl 5.16.3 on an Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz.
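
To put the rates in perspective, the number of 2-element combinations of n items is n(n-1)/2, which is 49,995,000 for n = 10,000. The sketch below is not part of the benchmark; it only divides that total by the rates quoted in this answer to estimate wall-clock time, assuming each rate stays constant over the whole run. The row labels and the helper name choose2 are my own.

```perl
use strict;
use warnings;

# Number of 2-element combinations of n items: n * (n - 1) / 2.
sub choose2 {
    my ($n) = @_;
    return $n * ($n - 1) / 2;
}

my $total = choose2(10_000);    # 49,995,000 pairs

# Rates in combinations per second, as reported in this answer.
# The labels are informal descriptions, not module or method names.
my @rates = (
    [ 'Math::Combinatorics'      => 53_479 ],
    [ 'Algorithm::Combinatorics' => 861_982 ],
    [ 'Cmb (callback)'           => 2_940_882 ],
    [ 'Cmb (direct print)'       => 3_333_000 ],
);

printf "%d combinations in total\n", $total;
printf "%-26s about %.0f s\n", $_->[0], $total / $_->[1] for @rates;
```

Under that assumption the slowest module needs roughly a quarter of an hour for the full run, while the fastest finishes in about 15 seconds.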

【Comments】:

【Solution 6】:

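1. Take the first string.
2. Iterate over the array from the next position to the end.
   1. Append the next string to the original string.
3. Take the next string and go back to step 2.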
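
The steps above can be sketched as two nested loops. Since the question starts from a file with one item per line, this version reads from an input filehandle and writes to an output filehandle. The helper name write_pairs is hypothetical, and in-memory filehandles stand in for real files so the sketch runs as-is.

```perl
use strict;
use warnings;

# Hypothetical helper: reads one item per line from $in and writes every
# unordered pair, one pair per line, to $out.
sub write_pairs {
    my ($in, $out) = @_;
    chomp(my @items = <$in>);
    @items = grep { length } @items;       # skip blank lines
    for my $i (0 .. $#items - 1) {         # take a string
        for my $j ($i + 1 .. $#items) {    # walk the rest of the array
            print {$out} "$items[$i] $items[$j]\n";
        }
    }
    return;
}

# In-memory filehandles stand in for real files here.
my $input  = "AAA\nBBB\nCCC\nDDD\nEEE\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";
write_pairs($in, $out);
close $out;
print $output;
```

For real files, replace the two scalar references with file names of your choice in the open calls; the rest stays the same. Unlike the glob-based answers, this does not depend on the contents of the current directory or on the characters in the items.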

【Comments】:

【Solution 7】:

How about:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dump qw(dump);

my @in = qw(AAA BBB CCC DDD EEE);
my @list;
while(my $first = shift @in) {
    last unless @in;
    my $rest = join',',@in;
    push @list, glob("{$first}{$rest}");
}
dump @list;

Output:

(
  "AAABBB",
  "AAACCC",
  "AAADDD",
  "AAAEEE",
  "BBBCCC",
  "BBBDDD",
  "BBBEEE",
  "CCCDDD",
  "CCCEEE",
  "DDDEEE",
)

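【Comments】:

The glob trick should always be accompanied by the various caveats about when it fails.
@daxim: Do you mean the "side effect" of matching files in the current working directory? If so, since he is not using ?, [] or *, isn't this completely safe?
All of them. I am annoyed now: the caveats should be listed explicitly as part of the answer, not attached as a rhetorical question in a low-visibility comment. It is not a "side effect", it really happens, and qualifying the word like that is wrong. It is not safe: the user has obviously supplied made-up/anonymized data in the question and will be in for a nasty surprise under real-world conditions. SO answers should strive not to set people up for failure, and should always make them aware of subtleties and risks; given that, I have now downvoted this answer to motivate M42 to improve it. -- continued:
I recommend Text::Glob::Expand or String::Glob::Permute over plain glob, since they are better documented and operate on in-memory data structures, unaffected by external factors such as the shell or the contents of the current directory.
@daxim: That is a good point. The quotes around "side effect" were just a joke: I agree that SO is not the best place to teach such tricks.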