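Since you know all the strings in advance, you can use gperf to generate a collision-free perfect hash function. For example, given the four input strings AAAA, ABBA, ACEA, and ALFG, it generates the following hash function (command line: gperf -L ANSI-C input.txt):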
#include <string.h>

/* Constants that gperf emits above these functions. They are not shown in
   the excerpt; the values follow from the four 4-character keywords and
   the tables below. */
#define MIN_WORD_LENGTH 4
#define MAX_WORD_LENGTH 4
#define MAX_HASH_VALUE 11

static unsigned int
hash (register const char *str, register unsigned int len)
{
  /* One entry per possible byte value; only 'A', 'B', 'C' and 'L'
     (the second characters of the keywords) differ from 12. */
  static unsigned char asso_values[] =
    {
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 7, 2, 5, 12, 12,
      12, 12, 12, 12, 12, 12, 0, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
      12, 12, 12, 12, 12, 12
    };
  /* Hash value = length + table entry for the second character. */
  return len + asso_values[(unsigned char)str[1]];
}

const char *
in_word_set (register const char *str, register unsigned int len)
{
  /* Keywords stored at the index equal to their hash value. */
  static const char * wordlist[] =
    {
      "", "", "", "",
      "ALFG",
      "",
      "ABBA",
      "", "",
      "ACEA",
      "",
      "AAAA"
    };

  if (len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH)
    {
      register int key = hash (str, len);

      if (key <= MAX_HASH_VALUE && key >= 0)
        {
          register const char *s = wordlist[key];

          /* Confirm that the input really is the keyword in this slot. */
          if (*str == *s && !strcmp (str + 1, s + 1))
            return s;
        }
    }
  return 0;
}
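This requires only a single table lookup, a length comparison, and a string comparison. If you are certain that the word being hashed is one of the source words, you can skip the string comparison.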
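As a rough sketch of what skipping the string comparison could look like, the hash value can be used directly as an index into a table of per-word data. Everything named here is hypothetical and not part of gperf's output: `assoc` is a compact stand-in for the 256-entry `asso_values` table, `word_hash` repeats the generated formula, and the `payload` values are made up for illustration. It assumes the caller only ever passes one of the four keywords.

```c
#include <assert.h>
#include <string.h>

/* Compact stand-in for gperf's asso_values table: only the second
   characters of the four keywords map to a value below 12. */
static unsigned int assoc(unsigned char c)
{
    switch (c) {
    case 'A': return 7;
    case 'B': return 2;
    case 'C': return 5;
    case 'L': return 0;
    default:  return 12;
    }
}

/* Same formula as the generated hash(): length plus the value for str[1]. */
static unsigned int word_hash(const char *str, unsigned int len)
{
    return len + assoc((unsigned char)str[1]);
}

/* Hypothetical per-word data, indexed directly by hash value (4..11).
   Unused slots are zero-initialized. */
static const int payload[12] = {
    [4]  = 40,  /* ALFG */
    [6]  = 20,  /* ABBA */
    [9]  = 30,  /* ACEA */
    [11] = 10   /* AAAA */
};

/* The caller guarantees that str is one of the four keywords, so neither
   the word list nor strcmp is needed. */
static int payload_for_known_word(const char *str)
{
    unsigned int key = word_hash(str, (unsigned int)strlen(str));
    assert(key <= 11);  /* holds for every keyword */
    return payload[key];
}
```

With this layout, `payload_for_known_word("ABBA")` hashes to slot 6 and reads the entry stored there. The trade-off is that any string outside the keyword set produces a meaningless (or out-of-range) index, so this shortcut is only safe when membership is guaranteed by the caller.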
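Scaling the input from 4 to 10,000 randomly generated strings grows the hash function to only 4 table lookups, plus the length comparison and the string comparison. However, because the string comparison requires every source string to be stored, the compiled object file ends up containing a very large table (1.4 MB). If you do not need the string comparison, you can omit that table.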