【发布时间】:2022-01-12 02:52:35
【问题描述】:
我正在尝试用 Python 编写一个程序来使用 Huffman Compression 压缩文本。我遇到的问题是压缩文本在保存到文本文件时最终比原始文本大,我不知道为什么会这样。我已经实现了一个名为 Heapnode 的类来帮助我构建优先级队列并使用 Heapq 构建我的二叉树。在 HuffmanCoding 类中,我已经实现了获取每个字符的频率、创建优先级队列、使用它将“节点”合并为一种二叉树并遍历该树为每个字符构建霍夫曼代码的方法。
class HeapNode:
def __init__(self, char, freq):
self.char = char
self.freq = freq
self.left = None
self.right = None
def __lt__(self, other): # if the frequency of one character is lower than the frequency of another one
return self.freq < other.freq
def __eq__(self, other): # if two characters have the same frequencies
if other == None:
return False
if not isinstance(other, HeapNode): # checks if the character is a node or not
return False
return self.freq == other.freq
class HuffmanCoding:
def __init__(self, text_to_compress):
self.text_to_compress = text_to_compress # text that will be compressed
self.heap = []
self.codes = {} # will store the Huffman code of each character
self.decompress_map = {}
def get_frequency(self): # method to find frequency of each character in text - RLE
frequency_Dictionary = {} # creates an empty dictionary where frequency of each character will be stored
for character in self.text_to_compress: # Iterates through the text to be compressed
if character in frequency_Dictionary:
frequency_Dictionary[character] = frequency_Dictionary[character] + 1 # if character already exists in
# dictionary, its value is increased by 1
else:
frequency_Dictionary[character] = 1 # if character is not present in list, its value is set to 1
return frequency_Dictionary
def make_queue(self, frequency): # creates the priority queue of each character and its associated frequency
for key in frequency:
node = HeapNode(key, frequency[key]) # create node (character) and store its frequency alongside it
heapq.heappush(self.heap, node) # Push the node into the heap
def merge_nodes(
self): # creates HuffmanTree by getting the two minimum nodes and merging them together, until theres
# only one node left
while len(self.heap) > 1:
node1 = heapq.heappop(self.heap) # pop node from top of heap
node2 = heapq.heappop(self.heap) # pop next node which is now at the top of heap
merged = HeapNode(None, node1.freq + node2.freq) # merge the two nodes we popped out from heap
merged.left = node1
merged.right = node2
heapq.heappush(self.heap, merged) # push merged node back into the heap
def make_codes(self, root, current_code): # Creates Huffman code for each character
if root == None:
return
if root.char != None:
self.codes[root.char] = current_code
self.decompress_map[current_code] = root.char
self.make_codes(root.left, current_code + "0") # Every time you traverse left, add a 0 - Recursive Call
self.make_codes(root.right, current_code + "1") # Every time you traverse right, add a 1 - Recursive Call
def assignCodes(self): # Assigns codes to each character
root = heapq.heappop(self.heap) # extracts root node from heap
current_code = ""
self.make_codes(root, current_code)
def get_compressed_text(self, text): # Replaces characters in original text with codes
compressed_text = ""
for character in text:
compressed_text += self.codes[character]
return compressed_text
def show_compressed_text(self):
frequency = self.get_frequency()
self.make_queue(frequency)
self.merge_nodes()
self.assignCodes()
compressed_text = self.get_compressed_text(self.text_to_compress)
return compressed_text
print(HuffmanCoding('This sentence will get compressed').show_compressed_text())
【问题讨论】:
-
您的 Huffman 编码表示的消息是一个由 '0'/'1' 字符组成的字符串 - 这很浪费,您需要将此字符串的每个 8 个字符块压缩成一个字节,并且解码时做相反的事情。我不是 python 人,所以我不能说最有效的方法是什么。
标签: python compression huffman-code