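The only way I can think of to do this is to generate every possible arrangement of the letters and compare each one against the dictionary. The fastest way to do that comparison is to turn the dictionary into a hash, so you can quickly look up whether a candidate is a valid word.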
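To be on the safe side, I key my dictionary by lowercasing all the letters in each dictionary word and then removing every non-alphabetic character. For the value, I store the actual dictionary word. For example: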
cant => "can't",
google => "Google",
That way, I can display the correctly spelled word.
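A minimal sketch of that keying scheme, using only core Perl. The `make_key` helper and the three-word list are hypothetical stand-ins for illustration; the real program reads the system word list instead.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a dictionary word into a lookup key by
# removing every non-letter and lowercasing what is left.
sub make_key {
    my ($word) = @_;
    (my $key = $word) =~ s/[^[:alpha:]]//g;
    return lc $key;
}

# Tiny stand-in word list (an assumption for this sketch).
my %dictionary;
$dictionary{ make_key($_) } = $_ for ("can't", "Google", "able");

for my $candidate (qw(cant google able xyz)) {
    if (exists $dictionary{$candidate}) {
        print "$candidate => $dictionary{$candidate}\n";
    }
    else {
        print "$candidate is not in the dictionary\n";
    }
}
```

The lookup is always done with the normalized key, while the stored value keeps the apostrophe or capital letter for display.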
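I found Math::Combinatorics, which looked pretty good but didn't work quite the way I wanted. You give it a list of letters, and it returns all combinations of those letters of the size you specify. So I thought all I had to do was split the word into a list of individual letters and simply loop through all the possible combinations!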
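No... that gives me only the unordered combinations. For each combination, I then had to list every possible permutation of its letters. Bah! Great! Yay!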
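To make the difference concrete, here is a small core-Perl sketch. The `combinations` and `permutations` subs are hypothetical stand-ins written for this illustration; they are not the Math::Combinatorics implementation, they only mimic the idea of "choose k items" versus "every ordering".

```perl
use strict;
use warnings;

# Hypothetical stand-in: every way to choose $k items from @items,
# ignoring order.
sub combinations {
    my ($k, @items) = @_;
    return ([]) if $k == 0;
    return ()   if @items < $k;
    my ($first, @rest) = @items;
    return (
        (map { [ $first, @$_ ] } combinations($k - 1, @rest)),
        combinations($k, @rest),
    );
}

# Hypothetical stand-in: every ordering of @items.
sub permutations {
    my (@items) = @_;
    return ([]) unless @items;
    my @result;
    for my $i (0 .. $#items) {
        my @rest = @items;
        my ($picked) = splice @rest, $i, 1;
        push @result, map { [ $picked, @$_ ] } permutations(@rest);
    }
    return @result;
}

my @letters = qw(A B C);
my @combos  = map { join "", @$_ } combinations(2, @letters);
my @words   = map { join "", @$_ }
              map { permutations(@$_) } combinations(2, @letters);

print "combinations: @combos\n";   # AB AC BC
print "permutations: @words\n";    # AB BA AC CA BC CB
```

Three letters taken two at a time give only three combinations, but six ordered candidates, which is why the combinations alone miss most possible words.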
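So, the infamous loop within a loop. Actually, three loops.
* The outer loop simply counts up from 1 to the number of letters in the word.
* The next one finds all unordered combinations of that many letters.
* Finally, the last one takes each unordered combination and returns the list of its permutations.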
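Now I can finally compare those letter permutations against my dictionary. Surprisingly, the program ran much faster than I expected, considering it has to turn a 235,886-word dictionary into a hash and then go through a triple-nested loop to find all permutations of all combinations of every possible number of letters. The whole program ran in under two seconds.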
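A back-of-the-envelope count helps explain the speed. The sketch below assumes each of the seven letter positions is treated as distinct (so the two L's count separately) and adds up the ordered arrangements for every length from 1 to 7.

```perl
use strict;
use warnings;

my $n     = 7;    # number of letters in "EBLAIDL"
my $total = 0;

for my $k (1 .. $n) {
    # Ordered arrangements of $k letters out of $n:
    # n * (n-1) * ... * (n-k+1)
    my $arrangements = 1;
    $arrangements *= $_ for ($n - $k + 1) .. $n;
    printf "%d letters: %5d candidates\n", $k, $arrangements;
    $total += $arrangements;
}

print "total: $total\n";    # total: 13699
```

Under that assumption the three loops produce fewer than fourteen thousand candidates, each checked with a single hash lookup, so most of the run time probably goes into loading the dictionary.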
#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);
use autodie;
use Math::Combinatorics;
use constant {
    LETTERS    => "EBLAIDL",
    DICTIONARY => "/usr/share/dict/words",
};
#
# Create Dictionary Hash
#
open my $dict_fh, "<", DICTIONARY;
my %dictionary;
while (my $word = <$dict_fh>) {
    chomp $word;
    # Remove *every* non-letter (note the /g flag), then lowercase the key
    (my $key = $word) =~ s/[^[:alpha:]]//g;
    $dictionary{lc $key} = $word;
}
close $dict_fh;
#
# Now take the letters and create a Perl list of them.
#
my @letter_list = split // => LETTERS;
my %valid_word_hash;
#
# Outer Loop: This is a range from one letter combinations to the
# maximum letters combination
#
foreach my $num_of_letters (1 .. scalar @letter_list) {
    #
    # Now we generate a list of lists of all letter
    # combinations of $num_of_letters long. From there, we need to
    # take the permutations of all those letters.
    #
    foreach my $letter_list_ref (combine($num_of_letters, @letter_list)) {
        my @combination = @{$letter_list_ref};
        # For each combination of letters $num_of_letters long,
        # we now generate every permutation of those letters.
        #
        foreach my $word_letters_ref (permute(@combination)) {
            my $word = join "" => @{$word_letters_ref};
            #
            # This $word is just a possible candidate for a word.
            # We now have to compare it to the words in the dictionary
            # to verify it's a word
            #
            $word = lc $word;
            if (exists $dictionary{$word}) {
                my $dictionary_word = $dictionary{$word};
                $valid_word_hash{$word} = $dictionary_word;
            }
        }
    }
}
#
# I got lazy here... Just dumping out the list of actual words.
# You need to go through this list to find your longest and
# shortest words. Number of syllables? That's trickier, you could
# see if you can divide on CVC and CVVC divides where C = consonant
# and V = vowel.
#
say join "\n", sort keys %valid_word_hash;
Running this program produces:
$ ./test.pl | column
a       al      balei   bile    del     i       lai
ab      alb     bali    bill    delia   iba     laid
abdiel  albe    ball    billa   dell    ibad    lea
abe     albi    balled  billed  della   id      lead
abed    ale     balli   blad    di      ida     leal
abel    alible  be      blade   dial    ide     led
abide   all     bea     blae    dib     idea    leda
abie    alle    bead    d       die     ideal   lei
able    allie   beal    da      dieb    idle    leila
ad      allied  bed     dab     dill    ie      lelia
ade     b       beid    dae     e       ila     li
adib    ba      bel     dail    ea      ill     liable
adiel   bad     bela    dal     ed      l       libel
ae      bade    beld    dale    el      la      lid
ai      bae     belial  dali    elb     lab     lida
aid     bail    bell    dalle   eld     label   lide
aide    bal     bella   de      eli     labile  lie
aiel    bald    bid     deal    elia    lad     lied
ail     baldie  bide    deb     ell     lade    lila
aile    bale    bield   debi    ella    ladle   lile
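The closing comment in the program leaves finding the longest and shortest words to the reader. One possible sketch, using a handful of words copied from the output above as a stand-in list (the real program would use the keys of its `%valid_word_hash`); `List::Util` ships with Perl.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Stand-in sample taken from the output; an assumption for this sketch.
my @words = qw(a ab able abdiel belial liable id);

my $longest  = max(map { length($_) } @words);
my $shortest = min(map { length($_) } @words);

# Keep every word that ties for longest or shortest.
my @longest_words  = sort grep { length($_) == $longest }  @words;
my @shortest_words = sort grep { length($_) == $shortest } @words;

print "longest ($longest letters): @longest_words\n";
print "shortest ($shortest letters): @shortest_words\n";
```

Because several words can share the same length, the sketch keeps all ties rather than picking a single winner.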