Optimizing a binary tree function, Huffman trees (answers)

【Question title】: Optimizing a binary tree function, Huffman trees
【Posted】: 2016-08-12 13:00:21
【Question】:

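So the scenario is that someone you know gives you a Huffman tree, but it is not optimal. (I know all Huffman trees are optimal; just pretend this one is not, although it does follow the Huffman style where only leaves hold values.)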

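The function should improve the tree as much as possible without changing its actual "shape", with the help of a dictionary that maps each symbol to the number of times it occurs in the hypothetical text you are compressing. The function does this by swapping nodes. So the end result is not necessarily an optimal tree, but it is improved as much as possible. For example...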

class Node:
    def __init__(self, item=None, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right

    def __repr__(self):
        return 'Node({}, {}, {})'.format(self.item, self.left, self.right)

dictionary = {54: 12, 101: 34, 29: 22, 65: 3, 20: 13}

Your friend gives you...

Node(None, Node(None, Node(20), Node(54)), Node(None, Node(65), Node(None, Node(101), Node(29))))

or...

               None  
          /     |     \
     None       |       None
   /      \     |     /      \
20          54  |  65       None
                |         /      \
                |      101        29

where the desired result is...

Node(None, Node(None, Node(20), Node(29)), Node(None, Node(101), Node(None, Node(65), Node(54))))

or...

               None  
          /     |     \
     None       |       None
   /      \     |     /      \
20          29  |  101       None
                |         /      \
                |       65        54

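How do I locate a leaf node, work out where it should be, swap it there, and then do the same for every other leaf node, all while making sure the shape of the tree stays the same, regardless of whether it is optimal? This is in Python.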

【Comments】:

Tags: python function optimization tree huffman-code

【Answer 1】:

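From the basic technique for constructing a Huffman tree, the nodes with the least probable values are the first to be linked to a parent node. These nodes therefore end up deeper in the Huffman tree than any of the others. From this we can deduce that the deeper you go into the tree, the less frequent the values you encounter.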
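To make "improved" concrete, one common measure is the weighted path length: the sum over all leaves of frequency times depth. The sketch below is only an illustration; `tree_cost` is a hypothetical helper that does not appear in the question or the answer, and the `Node` class is a minimal stand-in for the one in the question.

```python
class Node:
    """Minimal stand-in for the Node class from the question."""
    def __init__(self, item=None, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right


def tree_cost(tree, frequencies, depth=0):
    """Hypothetical helper: sum of frequency * depth over all leaves."""
    if tree.left is None and tree.right is None:  # a leaf holds a symbol
        return frequencies[tree.item] * depth
    return (tree_cost(tree.left, frequencies, depth + 1)
            + tree_cost(tree.right, frequencies, depth + 1))


frequencies = {54: 12, 101: 34, 29: 22, 65: 3, 20: 13}

# The tree the friend hands over, and the desired result from the question.
given = Node(None, Node(None, Node(20), Node(54)),
             Node(None, Node(65), Node(None, Node(101), Node(29))))
wanted = Node(None, Node(None, Node(20), Node(29)),
              Node(None, Node(101), Node(None, Node(65), Node(54))))

print(tree_cost(given, frequencies))   # 224
print(tree_cost(wanted, frequencies))  # 183
```

Under this measure the desired tree is cheaper because the two most frequent symbols (101 and 29) have moved up from depth 3 to depth 2.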

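That deeper leaves should hold less frequent values is essential to developing the optimization function, because there is no need to perform a series of swaps when we can get it right the first time: get a list of all the items in the tree together with their matching values in sorted order, get the number of leaves at each depth, and insert the items into their respective depths wherever there are leaves. Here is the solution I wrote: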

def optimize_tree(tree, dictionary):

    def grab_items(tree):
        # Collect the items of all leaves, left to right.
        # Testing `is not None` (not truthiness) keeps a symbol such as 0 a leaf.
        if tree.item is not None:
            return [tree.item]
        else:
            return grab_items(tree.left) + grab_items(tree.right)

    def grab_depth_info(tree):
        # Map each depth level to the number of leaves found at that level.
        def _grab_depth_info(tree, depth):
            if tree.item is not None:
                return {depth: 1}
            else:
                depth_info_list = [_grab_depth_info(child, depth + 1) for child in [tree.left, tree.right]]
                depth_info = depth_info_list[0]
                # Merge the right subtree's counts into the left subtree's.
                for level in depth_info_list[1]:
                    if level in depth_info:
                        depth_info[level] += depth_info_list[1][level]
                    else:
                        depth_info[level] = depth_info_list[1][level]
                return depth_info

        return _grab_depth_info(tree, 0)

    def make_inverse_dictionary(dictionary):
        # Map each frequency to the sorted list of symbols that have it.
        inv_dictionary = {}
        for key in dictionary:
            if dictionary[key] in inv_dictionary:
                inv_dictionary[dictionary[key]].append(key)
            else:
                inv_dictionary[dictionary[key]] = [key]

        for key in inv_dictionary:
            inv_dictionary[key].sort()

        return inv_dictionary

    def get_depth_to_items(depth_info, actual_values):
        # actual_values is sorted high to low, so walk the levels from the
        # shallowest down; dict order alone does not guarantee that.
        depth_to_items = {}
        for depth in sorted(depth_info):
            count = depth_info[depth]
            depth_to_items[depth] = sorted(actual_values[:count])
            # Remove exactly the values that were just assigned to this level.
            del actual_values[:count]

        return depth_to_items

    def update_tree(tree, depth_to_items, reference):
        # Write the chosen symbols back into the leaves, left to right.
        def _update_tree(tree, depth, depth_to_items, reference):
            if tree.item is not None:
                tree.item = reference[depth_to_items[depth].pop(0)].pop(0)
            else:
                for child in [tree.left, tree.right]:
                    _update_tree(child, depth + 1, depth_to_items, reference)
        _update_tree(tree, 0, depth_to_items, reference)

    items = grab_items(tree)
    depth_info = grab_depth_info(tree)
    actual_values = [dictionary[item] for item in items]
    actual_values.sort(reverse=True)
    inv_dictionary = make_inverse_dictionary(dictionary)

    depth_to_items = get_depth_to_items(depth_info, actual_values)

    update_tree(tree, depth_to_items, inv_dictionary)

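Explanation:

The optimize_tree function requires the user to pass in two arguments:

tree: the root node of the Huffman tree.
dictionary: the dictionary that maps symbols to their frequencies.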

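The function begins by defining five inner functions:

grab_items is a function that takes in a tree and returns a list of all the items in it.
grab_depth_info returns a dictionary where the keys are depth levels and the values are the number of leaf nodes at that level.
make_inverse_dictionary returns a dictionary that is the inverse of the given one. (It handles the case where one value maps to more than one key.)
get_depth_to_items returns a dictionary where the keys are depth levels and the values are lists of the actual values (from the dictionary) that belong at that level for the tree to be optimized.
update_tree inserts the items where they belong so that the tree is optimized.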
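To give a feel for the intermediate data these helpers produce, here is a small sketch that builds the same kind of depth-count dictionary and inverse dictionary for the example from the question. It uses `collections.Counter` and `collections.defaultdict` instead of the answer's hand-written loops, so it is an equivalent illustration rather than the answer's code; `leaf_depths` is a hypothetical helper name, and unlike `make_inverse_dictionary` it does not sort the symbol lists.

```python
from collections import Counter, defaultdict


class Node:
    """Minimal stand-in for the Node class from the question."""
    def __init__(self, item=None, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right


def leaf_depths(tree, depth=0):
    """Hypothetical helper: yield the depth of every leaf, left to right."""
    if tree.left is None and tree.right is None:
        yield depth
    else:
        yield from leaf_depths(tree.left, depth + 1)
        yield from leaf_depths(tree.right, depth + 1)


tree = Node(None, Node(None, Node(20), Node(54)),
            Node(None, Node(65), Node(None, Node(101), Node(29))))
dictionary = {54: 12, 101: 34, 29: 22, 65: 3, 20: 13}

# Same kind of data as grab_depth_info: depth level -> number of leaves.
depth_info = dict(Counter(leaf_depths(tree)))

# Same kind of data as make_inverse_dictionary: frequency -> list of symbols.
inverse = defaultdict(list)
for symbol, count in dictionary.items():
    inverse[count].append(symbol)

print(depth_info)     # {2: 3, 3: 2}
print(dict(inverse))  # {12: [54], 34: [101], 22: [29], 3: [65], 13: [20]}
```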

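Note: grab_depth_info and update_tree each define an inner function of their own so that they can work recursively.

These five inner functions are needed for the following algorithm: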

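1. First, the function grabs the list of items and the depth info from the tree.
2. Then it uses the list of items to get a list of actual values from the given dictionary and sorts it in descending order. (This is so that the least frequent values are matched with the greatest depth levels in step 4.)
3. Next, it builds the inverse of the given dictionary, with keys and values swapped. (This is to help with step 5.)
4. With these preparations done, the function passes the depth info and the list of actual values into get_depth_to_items to get a dictionary from depth levels to ordered lists of values.
5. Finally, the function passes the tree, the dictionary made in the previous step, and the inverted dictionary into update_tree, which uses its inner function to recursively visit every node in the tree and update the item attribute of each leaf with the original keys from the inverted dictionary.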
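The steps above can also be sketched more compactly. The version below is an alternative illustration, not the answer's implementation: it collects the leaves with their depths, sorts the symbols from most to least frequent, sorts the leaves from shallowest to deepest, and pairs them up. The names `leaves_with_depth` and `reassign_leaves` are hypothetical, and leaves are detected by having no children rather than by their item.

```python
class Node:
    """Minimal stand-in for the Node class from the question."""
    def __init__(self, item=None, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right


def leaves_with_depth(tree, depth=0):
    """Yield (depth, leaf) pairs, left to right."""
    if tree.left is None and tree.right is None:
        yield depth, tree
    else:
        yield from leaves_with_depth(tree.left, depth + 1)
        yield from leaves_with_depth(tree.right, depth + 1)


def reassign_leaves(tree, frequencies):
    """Put the most frequent symbols on the shallowest leaves, in place."""
    leaves = list(leaves_with_depth(tree))
    # Most frequent symbol first.
    symbols = sorted((leaf.item for _, leaf in leaves),
                     key=lambda s: frequencies[s], reverse=True)
    # Shallowest leaf first; the sort is stable, so leaves at the same
    # depth keep their left-to-right order.
    leaves.sort(key=lambda pair: pair[0])
    for (_, leaf), symbol in zip(leaves, symbols):
        leaf.item = symbol


frequencies = {54: 12, 101: 34, 29: 22, 65: 3, 20: 13}
tree = Node(None, Node(None, Node(20), Node(54)),
            Node(None, Node(65), Node(None, Node(101), Node(29))))
reassign_leaves(tree, frequencies)
print(sorted((depth, leaf.item) for depth, leaf in leaves_with_depth(tree)))
# [(2, 20), (2, 29), (2, 101), (3, 54), (3, 65)]
```

Each depth level receives the same set of symbols as in the answer, so the cost is the same; only the left-to-right order within a level differs, because this sketch places the most frequent symbol of each level first while the answer places the least frequent first.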

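The result of using this algorithm is that the tree you passed in ends up in the best state its shape allows, without its actual shape being changed.

I can confirm that this works by executing the following lines of code: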

tree = Node(None, Node(None, Node(20), Node(54)), Node(None, Node(65), Node(None, Node(101), Node(29))))
dictionary = {54: 12, 101: 34, 29: 22, 65: 3, 20: 13}
optimize_tree(tree, dictionary)
print(tree)

The output of this is:

Node(None, Node(None, Node(20, None, None), Node(29, None, None)), Node(None, Node(101, None, None), Node(None, Node(65, None, None), Node(54, None, None))))
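As an extra sanity check, a brute-force search over every way of placing the five symbols on the five leaves can confirm that no arrangement beats this output for this tree shape. This is a sketch under the assumption that cost means frequency times depth summed over the leaves; the list of leaf depths is read off the example tree by hand, and `cost` is a hypothetical helper.

```python
from itertools import permutations

frequencies = {54: 12, 101: 34, 29: 22, 65: 3, 20: 13}

# Leaf depths of the example tree, left to right (read off by hand).
leaf_depths = [2, 2, 2, 3, 3]


def cost(assignment):
    """Frequency * depth summed over the leaves, for one left-to-right assignment."""
    return sum(frequencies[symbol] * depth
               for symbol, depth in zip(assignment, leaf_depths))


# Try all 120 ways of placing the five symbols on the five leaves.
best = min(cost(p) for p in permutations(frequencies))

# Leaves of the output above, left to right.
result = cost([20, 29, 101, 65, 54])

print(best, result)  # 183 183
```

Exhaustive search is only practical for tiny trees like this one, since the number of permutations grows factorially with the number of leaves.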

【Comments】: