Adp Huffman Coding

File compression
Using Huffman coding
KABILAN D - 19IT043
GOPINATH K S - 19IT030
Problem definition
Data transfer speed: Even a system built to handle large data
transfers slows down when many users connect to it at once.
Storage space: File compression reduces the storage a file needs
by removing redundancy that carries no additional information,
such as white space on a page.
Solution methodology
MESSAGE - BCCABBDDAECCBBAEDDCC
LENGTH OF MESSAGE - 20
EACH CHARACTER IS STORED AS 8 BITS
SO THE SIZE OF THE MESSAGE IS 20 * 8 = 160 BITS
“FROM THE ABOVE WE CAN SAY THAT
THE SIZE OF THE DATA IS BULKY”
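The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the helper name fixed_width_size and the constant BITS_PER_CHAR are made up for illustration, and the 8 bits per character figure is the assumption stated on the slide.

```python
message = "BCCABBDDAECCBBAEDDCC"
BITS_PER_CHAR = 8  # assumption from the slide: one byte per character


def fixed_width_size(text, bits_per_char=BITS_PER_CHAR):
    """Return the size in bits of text stored with a fixed-width code."""
    return len(text) * bits_per_char


print(len(message))               # 20
print(fixed_width_size(message))  # 160
```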

WHAT HUFFMAN CODING DOES:
Huffman coding is a greedy algorithm that assigns variable-length codes to
input characters. The length of each assigned code depends on the frequency
of the corresponding character: the most frequent character gets the
shortest code and the least frequent character gets the longest code.
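The frequencies that drive the code lengths can be counted directly. The sketch below uses collections.Counter from the standard library on the sample message; it is an illustration, not the counting routine used in the project code.

```python
from collections import Counter

message = "BCCABBDDAECCBBAEDDCC"

# Count how often each character occurs in the message.
frequencies = Counter(message)

# List the characters from least frequent to most frequent.
for char, freq in sorted(frequencies.items(), key=lambda item: item[1]):
    print(char, freq)
```

For the sample message this lists E, A, D, B, C with frequencies 2, 3, 4, 5, 6, so C should end up with one of the shortest codes and E with one of the longest.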
THE CHARACTERS ARE PLACED IN A MIN HEAP ORDERED BY ASCENDING FREQUENCY, AND A
GREEDY TECHNIQUE IS USED TO BUILD THE TREE:
● Extract the two nodes with the minimum frequency from
the min heap.
● Create a new internal node with a frequency equal to
the sum of the two nodes' frequencies. Make the first
extracted node its left child and the other extracted
node its right child. Add this node to the min heap.
● Repeat the two steps above until the heap contains only
one node. The remaining node is the root node and the
tree is complete.
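The steps above can be sketched with Python's heapq module. This is a minimal illustration under stated assumptions, not the project's class: the function name build_huffman_tree is hypothetical, leaves are plain characters, internal nodes are (left, right) tuples, a running counter breaks frequency ties, and the input is assumed to be non-empty.

```python
import heapq
from itertools import count


def build_huffman_tree(frequencies):
    """Return (total_frequency, root) for a non-empty frequency table."""
    tie_breaker = count()  # keeps heap entries comparable when frequencies tie
    heap = [(freq, next(tie_breaker), char) for char, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Extract the two nodes with the minimum frequency.
        freq1, _, left = heapq.heappop(heap)
        freq2, _, right = heapq.heappop(heap)
        # Merge them under a new internal node and push it back.
        heapq.heappush(heap, (freq1 + freq2, next(tie_breaker), (left, right)))
    total, _, root = heap[0]
    return total, root


frequencies = {"A": 3, "B": 5, "C": 6, "D": 4, "E": 2}
total, root = build_huffman_tree(frequencies)
print(total)  # 20
print(root)
```

When two nodes have the same frequency, the order in which they are extracted depends on the tie-breaking rule, so different implementations can produce different trees. The code lengths may differ in detail, but the total encoded size is the same for every valid Huffman tree.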
The tree is then traversed starting from the root,
maintaining an auxiliary array. While moving to the
left child, write 0 to the array.
While moving to the right child,
write 1 to the array. When a leaf node is reached,
the array holds the code for that leaf's character.
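The traversal can be sketched as a short recursive function. The assumptions here are illustrative: the function name assign_codes is hypothetical, the tree is written by hand as nested (left, right) tuples, and a string prefix plays the role of the auxiliary array.

```python
def assign_codes(node, prefix="", codes=None):
    """Walk the tree, appending "0" for left and "1" for right."""
    if codes is None:
        codes = {}
    if isinstance(node, str):
        # Leaf node: the path taken so far is this character's code.
        codes[node] = prefix
        return codes
    left, right = node
    assign_codes(left, prefix + "0", codes)
    assign_codes(right, prefix + "1", codes)
    return codes


# Hand-written tree for the sample message: leaves are characters,
# internal nodes are (left, right) tuples.
tree = ((("E", "A"), "D"), ("B", "C"))
codes = assign_codes(tree)
print(codes)
```

No code is a prefix of another, because characters only appear at the leaves. That property is what lets a decoder read the bit stream without separators.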
Now the size of the message becomes:
CHAR   FREQ   CODE   BITS
A      3      001    3*3 = 9
B      5      10     5*2 = 10
C      6      11     6*2 = 12
D      4      01     4*2 = 8
E      2      000    2*3 = 6
TOTAL                45 BITS (down from 160 BITS)
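The table can be verified by encoding the sample message with the listed codes and decoding it again. This is a small sketch; the decode helper is hypothetical and relies only on the prefix property of the codes.

```python
from collections import Counter

message = "BCCABBDDAECCBBAEDDCC"
codes = {"A": "001", "B": "10", "C": "11", "D": "01", "E": "000"}

# Encode: replace every character by its code.
encoded = "".join(codes[ch] for ch in message)

# Bits contributed by each character: frequency * code length.
bits_per_char = {ch: freq * len(codes[ch]) for ch, freq in Counter(message).items()}


def decode(bits, codes):
    """Read bits left to right and emit a character whenever a code matches."""
    reverse = {code: ch for ch, code in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            decoded.append(reverse[current])
            current = ""
    return "".join(decoded)


print(len(encoded))                       # 45
print(decode(encoded, codes) == message)  # True
```

The 45 bits cover the message only. A real compressed file also has to store the code table (or the frequencies) so that the decoder can rebuild it, which adds some overhead.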
FLOWCHART
code
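import heapq
import os


class HuffmanCoding:
    def __init__(self, path):
        self.path = path            # path of the text file to compress
        self.heap = []              # min heap of HeapNode objects
        self.codes = {}             # character -> bit string
        self.reverse_mapping = {}   # bit string -> character

    class HeapNode:
        def __init__(self, char, freq):
            self.char = char        # None for internal nodes
            self.freq = freq
            self.left = None
            self.right = None

        # heapq orders nodes with "<", so nodes compare by frequency
        def __lt__(self, other):
            return self.freq < other.freq

        def __eq__(self, other):
            # type(self) is used because the nested class name HeapNode
            # is not in scope inside its own methods
            if not isinstance(other, type(self)):
                return False
            return self.freq == other.freq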
    def make_frequency_dict(self, text):
        # count how often each character occurs
        frequency = {}
        for character in text:
            if character not in frequency:
                frequency[character] = 0
            frequency[character] += 1
        return frequency

    def make_heap(self, frequency):
        # one leaf node per distinct character
        for key in frequency:
            node = self.HeapNode(key, frequency[key])
            heapq.heappush(self.heap, node)

    def merge_nodes(self):
        # merge the two least frequent nodes until only the root remains
        while len(self.heap) > 1:
            node1 = heapq.heappop(self.heap)
            node2 = heapq.heappop(self.heap)
            merged = self.HeapNode(None, node1.freq + node2.freq)
            merged.left = node1
            merged.right = node2
            heapq.heappush(self.heap, merged)

    def make_codes_helper(self, root, current_code):
        if root is None:
            return
        if root.char is not None:
            # leaf node: the path from the root is this character's code
            self.codes[root.char] = current_code
            self.reverse_mapping[current_code] = root.char
            return
        self.make_codes_helper(root.left, current_code + "0")
        self.make_codes_helper(root.right, current_code + "1")
    def make_codes(self):
        # the single node left in the heap is the root of the tree
        root = heapq.heappop(self.heap)
        current_code = ""
        self.make_codes_helper(root, current_code)

    def get_encoded_text(self, text):
        encoded_text = ""
        for character in text:
            encoded_text += self.codes[character]
        return encoded_text

    def pad_encoded_text(self, encoded_text):
        # pad with zeros to a multiple of 8 bits; this yields 8 (a full
        # padding byte) when the length is already a multiple of 8
        extra_padding = 8 - len(encoded_text) % 8
        for i in range(extra_padding):
            encoded_text += "0"
        # the first byte records how many padding bits were added
        padded_info = "{0:08b}".format(extra_padding)
        encoded_text = padded_info + encoded_text
        return encoded_text
    def get_byte_array(self, padded_encoded_text):
        if len(padded_encoded_text) % 8 != 0:
            # raise instead of exit(0), which would report success
            raise ValueError("Encoded text not padded properly")
        b = bytearray()
        for i in range(0, len(padded_encoded_text), 8):
            byte = padded_encoded_text[i:i+8]
            b.append(int(byte, 2))
        return b

    def compress(self):
        filename, file_extension = os.path.splitext(self.path)
        output_path = filename + ".bin"
        # the input file is only read, so 'r' is enough
        with open(self.path, 'r') as file, open(output_path, 'wb') as output:
            text = file.read()
            # note: trailing whitespace is dropped and not compressed
            text = text.rstrip()
            frequency = self.make_frequency_dict(text)
            self.make_heap(frequency)
            self.merge_nodes()
            self.make_codes()
            encoded_text = self.get_encoded_text(text)
            padded_encoded_text = self.pad_encoded_text(encoded_text)
            b = self.get_byte_array(padded_encoded_text)
            output.write(bytes(b))
        print("Compressed")
        return output_path
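The padding scheme used above can be checked in isolation. The sketch below restates the padding and byte-packing steps as standalone functions and adds a hypothetical inverse, unpad_bits, which is not part of the code shown in the slides; it only illustrates how the first byte lets a reader strip the padding again.

```python
def pad_bits(bits):
    """Prefix one info byte and pad with zeros to a multiple of 8 bits."""
    extra = 8 - len(bits) % 8  # 8 when the length is already a multiple of 8
    return "{0:08b}".format(extra) + bits + "0" * extra


def to_bytes(padded_bits):
    """Pack a bit string whose length is a multiple of 8 into bytes."""
    return bytes(int(padded_bits[i:i + 8], 2) for i in range(0, len(padded_bits), 8))


def unpad_bits(data):
    """Hypothetical inverse: drop the info byte and the padding bits."""
    bit_string = "".join("{0:08b}".format(byte) for byte in data)
    extra = int(bit_string[:8], 2)
    return bit_string[8:len(bit_string) - extra]


bits = "10" * 22 + "1"        # an arbitrary 45-bit string
packed = to_bytes(pad_bits(bits))
print(len(packed))                 # 7
print(unpad_bits(packed) == bits)  # True
```

For a 45-bit payload, 3 padding bits and the 8-bit info byte bring the total to 56 bits, or 7 bytes.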
Result and discussion
Original text
Compressed file using Huffman coding
We can see that the size of the file is reduced using Huffman coding. For the
given character frequencies, no other prefix code that assigns one code word
to each character produces a shorter encoding.
Thus file compression is achieved through Huffman encoding.

Time and space complexity
TIME COMPLEXITY: The time complexity analysis of Huffman coding is as follows:
● extractMin() is called 2 x (n-1) times if there are n distinct characters (leaf nodes).
● As extractMin() calls minHeapify(), each call takes O(log n) time.
Thus, the overall time complexity of building the Huffman tree becomes O(n log n).
SPACE COMPLEXITY: If we have n symbols, we need to store each symbol in an array, so
space complexity = O(n).
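The count of extractMin() calls can be confirmed empirically. This is a small sketch with a hypothetical function name, count_extract_min_calls; it works on bare frequencies and uses heapq.heappop in the role of extractMin().

```python
import heapq


def count_extract_min_calls(frequencies):
    """Run the merge loop on bare frequencies and count the heap pops."""
    heap = list(frequencies)
    heapq.heapify(heap)
    pops = 0
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        pops += 2
        heapq.heappush(heap, first + second)
    return pops


print(count_extract_min_calls([3, 5, 6, 4, 2]))  # 8, which is 2 * (5 - 1)
```

Each pass through the loop removes two nodes and adds one, so n leaves need n - 1 passes and 2 x (n-1) pops.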
