In 1952, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}; both compress the string into 14 bits. Another set of codes can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2 ≤ N ≤ 63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤ 1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a non-empty string of no more than 63 '0's and '1's.

Output Specification:

For each test case, print in each line either "Yes" if the student's submission is correct, or "No" if not.

Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.

Sample Input:

7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11

Sample Output:

Yes
Yes
No
No

 

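Solution approach

  To decide whether a submission is an optimal code, two things have to be checked for each set of codes: whether its WPL (weighted path length) is minimal, and whether it is a prefix code.

  The first approach, given below, builds a Huffman tree and uses a heap to do so.

  The tree node is defined as follows:

struct Data {
    char letter;    // the character
    int freq;       // its frequency
};

struct TNode {
    Data data;
    TNode *left, *right;
};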

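  First, we build the Huffman tree that corresponds to the given character frequencies and compute its WPL.

  Because the construction relies on a heap, the given frequencies first have to be placed into a min-heap.

  This requires a heap type and the usual heap operations. The code is shown directly; it is the standard array-based binary heap with a sentinel at index 0.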

struct Heap {
    TNode *H;       // heap array; every element is a tree node (TNode)
    int size;       // number of elements currently stored
    int capacity;   // maximum number of elements
};

// Reads n "character frequency" pairs from stdin and builds a min-heap keyed on frequency.
Heap *createMinHeap(int n) {
    Heap *minHeap = new Heap;
    minHeap->size = 0;
    minHeap->capacity = n;
    minHeap->H = new TNode[n + 1];      // slot 0 holds a sentinel, elements start at index 1
    minHeap->H[0].data.freq = -1;       // sentinel smaller than any valid frequency

    for (int i = 0; i < minHeap->capacity; i++) {
        TNode *tmp = new TNode;
        tmp->left = tmp->right = NULL;
        getchar();                      // skip the separator before the character
        scanf("%c %d", &tmp->data.letter, &tmp->data.freq);
        insertHeap(minHeap, tmp);
    }

    return minHeap;
}

// Inserts a copy of *treeNode, percolating up until the parent is not larger.
void insertHeap(Heap *minHeap, TNode *treeNode) {
    int pos = ++minHeap->size;
    for ( ; treeNode->data.freq < minHeap->H[pos / 2].data.freq; pos /= 2) {
        minHeap->H[pos] = minHeap->H[pos / 2];
    }
    minHeap->H[pos] = *treeNode;
}

// Removes the minimum element and returns a newly allocated copy of it.
TNode *deleteMin(Heap *minHeap) {
    TNode *minTreeNode = new TNode;
    *minTreeNode = minHeap->H[1];
    TNode tmp = minHeap->H[minHeap->size--];    // last element, to be re-inserted from the root

    int parent = 1, child;
    for ( ; parent * 2 <= minHeap->size; parent = child) {
        child = parent * 2;
        // pick the smaller of the two children
        if (child != minHeap->size && minHeap->H[child].data.freq > minHeap->H[child + 1].data.freq) child++;
        if (tmp.data.freq < minHeap->H[child].data.freq) break;
        else minHeap->H[parent] = minHeap->H[child];
    }
    minHeap->H[parent] = tmp;

    return minTreeNode;
}

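  Once the min-heap is built, the next step is to construct the Huffman tree from it.

  Each round pops the two nodes with the smallest frequencies from the heap, attaches them as the left and right children of a new node, and pushes the new node back into the heap. After n-1 rounds (where n is the number of characters) only one element is left in the heap. That element is the root of the Huffman tree, so we simply return it.

  The code that follows this idea is: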

TNode *createHuffmanTree(Heap *minHeap) {
    int n = minHeap->size - 1;          // n leaves need n-1 merges
    while (n--) {
        TNode *tmp = new TNode;
        tmp->left = deleteMin(minHeap);
        tmp->right = deleteMin(minHeap);
        tmp->data.freq = tmp->left->data.freq + tmp->right->data.freq;

        insertHeap(minHeap, tmp);
    }

    return deleteMin(minHeap);          // the only remaining element is the root
}

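  Next, we compute the WPL from this Huffman tree, using recursion.

  If the node is a leaf, return the current depth multiplied by its frequency. Otherwise, recursively compute the WPL of the left and right subtrees and return their sum.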

int WPL(TNode *T, int depth) {
    if (T->left == NULL && T->right == NULL) return depth * T->data.freq;   // leaf: return its contribution directly
    else return WPL(T->left, depth + 1) + WPL(T->right, depth + 1);         // internal node: sum both subtrees, one level deeper
}

  After all this work, we finally have the minimal WPL for the given frequencies.
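  As a cross-check on the tree-based value, the same WPL can be computed without building a tree at all: every merge of the two smallest weights creates one internal node, and the sum of all internal-node weights equals the WPL. The following is only a minimal sketch of that idea, not the code used in this post; the name minimalWPL is hypothetical, and it uses std::priority_queue instead of the hand-written heap above.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: minimum weighted path length for the given frequencies.
// Repeatedly merges the two smallest weights; each merged weight is added once,
// which is equivalent to summing depth * frequency over the leaves.
int minimalWPL(const std::vector<int>& freqs) {
    std::priority_queue<int, std::vector<int>, std::greater<int> > heap;
    for (std::size_t i = 0; i < freqs.size(); ++i) heap.push(freqs[i]);

    int wpl = 0;
    while (heap.size() > 1) {
        int a = heap.top(); heap.pop();
        int b = heap.top(); heap.pop();
        wpl += a + b;       // weight of the new internal node
        heap.push(a + b);
    }
    return wpl;
}
```

  For the sample frequencies 1 1 1 3 3 6 6 the merges produce internal nodes of weight 2, 3, 6, 9, 12 and 21, so the minimal WPL is 53.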

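  Next, for each submission we check whether its WPL is minimal, that is, whether it equals the WPL computed from the given frequencies.

  The calculation is simple. For each submission the WPL is $\text{codeLen} = \sum_{i=1}^{N} f[i] \cdot |\text{code}[i]|$, where $|\text{code}[i]|$ is the number of bits in code[i].

  We then compare codeLen with the WPL obtained above. If they differ, the submission is not an optimal code and there is no need to test the prefix property. If they are equal, we go on to check whether it is a prefix code.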
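  The length computation described above can be sketched in isolation as follows. This is an illustrative helper rather than part of the original solution: the name encodedLength is hypothetical, and it assumes freqs[i] and codes[i] refer to the same character.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: total number of bits when every character i is written
// freqs[i] times using codes[i]. Assumes both vectors have the same length.
long long encodedLength(const std::vector<int>& freqs,
                        const std::vector<std::string>& codes) {
    long long total = 0;
    for (std::size_t i = 0; i < freqs.size(); ++i) {
        total += static_cast<long long>(freqs[i]) *
                 static_cast<long long>(codes[i].size());
    }
    return total;
}
```

  With the sample frequencies, the first submission gives 1*5 + 1*5 + 1*4 + 3*3 + 3*2 + 6*2 + 6*2 = 53, while the third submission (all codes 3 bits long) gives 21*3 = 63 and is rejected on length alone.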

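  There is also a pitfall here. In an optimal code, no single character's code is longer than n-1 bits: the code tree of an optimal code is a full binary tree with n leaves and therefore n-1 internal nodes, so even the most skewed tree has depth n-1. Consequently, if any code is longer than n-1, the submission is not an optimal code.

  The function below both computes the encoded length and checks the prefix property: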

bool check(TNode *huffmanTree, int n) {
    int wpl = WPL(huffmanTree, 0);  // WPL of the Huffman tree built from the given frequencies

    std::string code[n];            // code of each character (variable-length array: a GCC/Clang extension)
    int codeLen = 0;
    bool ret = true;                // whether this submission can still be an optimal code

    for (int i = 0; i < n; i++) {
        char letter;
        getchar();
        scanf("%c", &letter);
        getchar();
        std::cin >> code[i];

        if (ret) {                  // once the submission is known to be wrong, skip the arithmetic but keep reading input
            if (code[i].size() > n - 1) ret = false;                    // a code longer than n-1 cannot be optimal
            codeLen += code[i].size() * findFreq(huffmanTree, letter);  // accumulate the encoded length
        }
    }

    if (ret && codeLen == wpl) {        // length is optimal, so now check the prefix property
        TNode *T = new TNode;           // root of a binary tree built from the submitted codes
        T->data.freq = 0;
        T->left = T->right = NULL;

        for (int i = 0; i < n; i++) {   // insert the n codes one by one
            TNode *pre = T;             // every code starts from the root

            for (std::string::iterator it = code[i].begin(); it != code[i].end(); it++) {   // walk the code bit by bit
                if (pre->data.freq != 0) break;     // an earlier code ends here, so it is a prefix of this one
                if (*it == '0') {                   // bit 0: go left
                    if (pre->left == NULL) {        // create the left child if it does not exist yet
                        TNode *tmp = new TNode;
                        tmp->data.freq = 0;         // freq 0 marks a node not occupied by any character
                        tmp->left = tmp->right = NULL;
                        pre->left = tmp;
                    }
                    pre = pre->left;
                }
                else {                              // bit 1: go right
                    if (pre->right == NULL) {       // create the right child if it does not exist yet
                        TNode *tmp = new TNode;
                        tmp->data.freq = 0;         // freq 0 marks a node not occupied by any character
                        tmp->left = tmp->right = NULL;
                        pre->right = tmp;
                    }
                    pre = pre->right;
                }
            }

            // pre now points at the node this character should occupy (or at the
            // occupied node where the walk stopped early).
            // The node must be a leaf and must not be occupied by another character.
            if (pre->left == NULL && pre->right == NULL && pre->data.freq == 0) {
                pre->data.freq = 1;                  // occupy the leaf by marking its freq as 1
            }
            else {                                   // any failed condition breaks the prefix property
                ret = false;
                break;                               // no need to examine the remaining codes
            }
        }
    }
    else {          // ret == false or the length differs from the WPL: not an optimal code
        ret = false;
    }

    return ret;
}

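  The prefix property is checked here by building a binary tree from the submitted codes, in the same shape a Huffman tree would have.

  The procedure is as follows:

  A pointer starts at the root of this tree.

  • If the bit is '0', check whether the current node has a left child. If not, create it first and then move the pointer to it. If it already exists, simply move the pointer to it.
  • If the bit is '1', check whether the current node has a right child. If not, create it first and then move the pointer to it. If it already exists, simply move the pointer to it.

  After the whole code of a character has been read, the character should be placed in the node the pointer refers to. The node must satisfy two conditions:

  • Both of its children are empty, that is, the node is a leaf.
  • It has not been marked yet, that is, no other character is stored in it.

  If either condition fails, the submission is not a prefix code. The walk must also never pass through a node that is already occupied, because that means an earlier code is a prefix of the current one.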
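  The prefix property can also be verified independently of the tree walk, which is handy for testing. The sketch below uses a different technique from the one described above: it sorts the codes lexicographically, after which any code that is a prefix of another is necessarily a prefix of its immediate successor, so comparing neighbours is enough. The function name isPrefixFree is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true when no code is a prefix of another
// (duplicate codes also count as a violation).
bool isPrefixFree(std::vector<std::string> codes) {
    std::sort(codes.begin(), codes.end());
    for (std::size_t i = 0; i + 1 < codes.size(); ++i) {
        const std::string& a = codes[i];
        const std::string& b = codes[i + 1];
        // a <= b after sorting, so only "a is a prefix of b" needs testing
        if (a.size() <= b.size() && b.compare(0, a.size(), a) == 0) return false;
    }
    return true;
}
```

  On the codes from the problem statement, {0, 10, 110, 111} is accepted, while {0, 01, 011, 001} is rejected because 0 is a prefix of 001.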

  Finally, here is the complete accepted (AC) code for this approach. It is fairly long.

#include <cstdio>
#include <iostream>
#include <string>

struct Data {
    char letter;
    int freq;
};

struct TNode {
    Data data;
    TNode *left, *right;
};

struct Heap {
    TNode *H;
    int size;
    int capacity;
};

Heap *createMinHeap(int n);
void insertHeap(Heap *minHeap, TNode *treeNode);
TNode *deleteHeap(Heap *minHeap);
TNode *createHuffmanTree(Heap *minHeap);
bool check(TNode *huffmanTree, int n);
int WPL(TNode *T, int depth);
int findFreq(TNode *huffmanTree, char letter);

int main() {
    int n;
    scanf("%d", &n);
    
    Heap *minHeap = createMinHeap(n);
    TNode *huffmanTree = createHuffmanTree(minHeap);
    
    int m;
    scanf("%d", &m);
    for (int i = 0; i < m; i++) {
        bool ret = check(huffmanTree, n);
        printf("%s\n", ret ? "Yes" : "No");
    }
    
    return 0;
}

Heap *createMinHeap(int n) {
    Heap *minHeap = new Heap;
    minHeap->size = 0;
    minHeap->capacity = n;
    minHeap->H = new TNode[n + 1];
    minHeap->H[0].data.freq = -1;
    
    for (int i = 0; i < minHeap->capacity; i++) {
        TNode *tmp = new TNode;
        tmp->left = tmp->right = NULL;
        getchar();
        scanf("%c %d", &tmp->data.letter, &tmp->data.freq);
        insertHeap(minHeap, tmp);
    }
    
    return minHeap;
}

void insertHeap(Heap *minHeap, TNode *treeNode) {
    int pos = ++minHeap->size;
    for ( ; treeNode->data.freq < minHeap->H[pos / 2].data.freq; pos /= 2) {
        minHeap->H[pos] = minHeap->H[pos / 2];
    }
    minHeap->H[pos] = *treeNode;
}

TNode *deleteHeap(Heap *minHeap) {
    TNode *minTreeNode = new TNode;
    *minTreeNode = minHeap->H[1];
    TNode tmp = minHeap->H[minHeap->size--];
    
    int parent = 1, child;
    for ( ; parent * 2 <= minHeap->size; parent = child) {
        child = parent * 2;
        if (child != minHeap->size && minHeap->H[child].data.freq > minHeap->H[child + 1].data.freq) child++;
        if (tmp.data.freq < minHeap->H[child].data.freq) break;
        else minHeap->H[parent] = minHeap->H[child];
    }
    minHeap->H[parent] = tmp;
    
    return minTreeNode;
}

TNode *createHuffmanTree(Heap *minHeap) {
    int n = minHeap->size - 1;
    while (n--) {
        TNode *tmp = new TNode;
        tmp->left = deleteHeap(minHeap);
        tmp->right = deleteHeap(minHeap);
        tmp->data.freq = tmp->left->data.freq + tmp->right->data.freq;
        
        insertHeap(minHeap, tmp);
    }
    
    return deleteHeap(minHeap);
}

bool check(TNode *huffmanTree, int n) {
    int wpl = WPL(huffmanTree, 0);

    std::string code[n];
    int codeLen = 0;
    bool ret = true;
    
    for (int i = 0; i < n; i++) {
        char letter;
        getchar();
        scanf("%c", &letter);
        getchar();
        std::cin >> code[i];
        
        if (ret) {
            if (code[i].size() > n - 1) ret = false;
            codeLen += code[i].size() * findFreq(huffmanTree, letter);
        }
    }
    
    if (ret && codeLen == wpl) {
        TNode *T = new TNode;
        T->data.freq = 0;
        T->left = T->right = NULL;
            
        for (int i = 0; i < n; i++) {
            TNode *pre = T;
            
            for (std::string::iterator it = code[i].begin(); it != code[i].end(); it++) {
                if (pre->data.freq != 0) break;     // an earlier code ends here, so it is a prefix of this one
                if (*it == '0') {
                    if (pre->left == NULL) {
                        TNode *tmp = new TNode;
                        tmp->data.freq = 0;
                        tmp->left = tmp->right = NULL;
                        pre->left = tmp;
                    }
                    pre = pre->left;
                }
                else {
                    if (pre->right == NULL) {
                        TNode *tmp = new TNode;
                        tmp->data.freq = 0;
                        tmp->left = tmp->right = NULL;
                        pre->right = tmp;
                    }
                    pre = pre->right;
                }
            }
            
            if (pre->left == NULL && pre->right == NULL && pre->data.freq == 0) {
                pre->data.freq = 1;
            }
            else {
                ret = false;
                break;
            }
        }
    }
    else {
        ret = false;
    }
    
    return ret;
}

int WPL(TNode *T, int depth) {
    if (T->left == NULL && T->right == NULL) return depth * T->data.freq;
    else return WPL(T->left, depth + 1) + WPL(T->right, depth + 1);
}

int findFreq(TNode *huffmanTree, char letter) {
    int ret = 0;
    if (huffmanTree) {
        if (huffmanTree->left == NULL && huffmanTree->right == NULL && huffmanTree->data.letter == letter) ret = huffmanTree->data.freq;
        if (ret == 0) ret = findFreq(huffmanTree->left, letter);
        if (ret == 0) ret = findFreq(huffmanTree->right, letter);
    }
    
    return ret;
}