
DATA STRUCTURE

Project: Huffman Tree

Submitted to : Sir Abdul Wahab

Submitted by:
Muzmmal Hussain
Muhammad Zia Shahid
Riasat Ali
What Is a Huffman Tree?

“In computer science and information theory, Huffman coding is an entropy
encoding algorithm used for lossless data compression.”
Huffman coding is a data compression technique that varies the length of each
encoded symbol in proportion to its information content; that is, the more often a
symbol or token is used, the shorter the binary string used to represent it
in the compressed stream. Huffman codes can be properly decoded
because they obey the prefix property, which means that no code word is
a prefix of another, and so the complete set of codes can be
represented as a binary tree, known as a Huffman tree. Huffman coding
was first described in a seminal paper by D. A. Huffman in 1952.
Need Of Huffman Coding

The Huffman tree is one of the most elegant forms of data compression. It is
based on minimum redundancy coding: we need to represent the data in a
way that makes it require less space.
IS THIS DATA STRUCTURE DERIVED FROM ANY DATA STRUCTURE?
ARE THERE ANY DS THAT ARE DERIVED FROM IT?

The Huffman tree is derived from the binary tree; Huffman coding is the most
efficient form of binary tree coding.

Adaptive Huffman coding is derived from it.
Advantages of Huffman

Huffman coding is used to compress files.

Huffman coding is used to minimize the binary code.

Huffman coding is used in compression tools and also in fax machines.

It reduces the storage needed.

It reduces the size of data, e.g. images, audio, video and text.

It reduces transmission cost and bandwidth.


Disadvantages of Huffman

Changing ensemble
• If the ensemble changes, the frequencies and probabilities change, and so
the optimal coding changes.
• e.g. in text compression, symbol frequencies vary with context.
• Re-computing the Huffman code means running through the entire file
in advance.
• The code itself must also be saved or transmitted.

Does not consider ‘blocks of symbols’

• After ‘strings_of_ch’, the next nine symbols ‘aracters_’ are predictable,
but full code words are still spent on them without conveying much new information.
Cost Of Huffman

The run-time complexity of Huffman coding is O(n), where n is the number of
symbols in the original data; each of its passes over the data runs in O(n) time.

The time to build the Huffman tree does not affect the
complexity of Huffman compression, because the running time of this
step depends only on the number of different symbols in the data,
which in this implementation is a constant.
Text Compression

On a computer: changing the representation of a file so that it takes less
space to store and/or less time to transmit.

– The original file can be reconstructed exactly from the
compressed representation.

This differs from data compression in general:

– Text compression has to be lossless.

– Compare with sound and images, where small changes and noise are
tolerated.
Construction Of Huffman

We can construct a lossless compression scheme with the following reasoning.

Consider the word ABRACADABRA.

– What is the most economical way to write this string in a binary
representation?

– Generally speaking, if a text consists of N different characters, we
need ⌈log₂ N⌉ bits to represent each one using a fixed-length encoding.

– Thus, it would require 3 bits for each of the 5
different letters, or 33 bits for the 11 letters.

– Can we do it better?

YES!!!!

We can do better, provided:

– Some characters are more frequent than others.

– Characters may be encoded with different bit lengths, so that, for

example, in the English alphabet the letter a may use

only one or two bits, while the letter y may use several.


– We have a unique way of decoding the bit stream.
Using Variable-length Encoding (1)

Magic word: ABRACADABRA

LET A = 0
B = 100
C = 1010
D = 1011
R = 11
Thus, ABRACADABRA = 01001101010010110100110
So 11 letters demand 23 bits < 33 bits, an improvement of about
30%.
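As a quick check, a short Python sketch (the code table is taken from the slide; the helper function is only for illustration) reproduces this 23-bit string:

```python
# Code table from the slide: more frequent symbols get shorter code words.
CODE = {'A': '0', 'B': '100', 'C': '1010', 'D': '1011', 'R': '11'}

def encode(text, code=CODE):
    """Concatenate the code word of each symbol in the text."""
    return ''.join(code[ch] for ch in text)

bits = encode("ABRACADABRA")
print(bits, len(bits))   # 01001101010010110100110 23
```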
Using Variable-length Encoding (2)

However, there is a serious danger: How to ensure unique


reconstruction?
Let A -> 01 and B -> 0101
How to decode 010101?
AB?

BA?
AAA?
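A tiny sketch makes the danger concrete: with this non-prefix code, three different strings produce exactly the same bit stream.

```python
# A code that is NOT a prefix code: the code for A is a prefix of the code for B.
CODE = {'A': '01', 'B': '0101'}

def encode(text, code=CODE):
    return ''.join(code[ch] for ch in text)

# Three different inputs, one and the same bit stream -- decoding is ambiguous.
print(encode("AB"), encode("BA"), encode("AAA"))   # 010101 010101 010101
```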
No Problem…

…if we use prefix codes: no code word is a prefix of another code word.
Any prefix code can be represented by a full binary tree.
Each leaf stores a symbol.
Each internal node has two children – the left branch means 0, the right means 1.
A code word is the path from the root to a leaf, interpreting the
left and right branches accordingly.
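A minimal sketch of that tree representation, using nested Python dicts as tree nodes (a layout assumed here for illustration, not part of the slides): each internal node maps '0' and '1' to its children, and a leaf is just the symbol itself.

```python
# Build a binary trie from a prefix code: the left branch is '0', the right is '1'.
# This only works because no code word is a prefix of another.
def build_tree(code):
    root = {}
    for symbol, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})   # descend, creating internal nodes as needed
        node[word[-1]] = symbol               # the last bit points at the leaf symbol
    return root

CODE = {'A': '0', 'B': '100', 'C': '1010', 'D': '1011', 'R': '11'}
print(build_tree(CODE))
# {'0': 'A', '1': {'0': {'0': 'B', '1': {'0': 'C', '1': 'D'}}, '1': 'R'}}
```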
Prefix Codes (2)

ABRACADABRA

A = 0

B = 100

C = 1010

D = 1011

R = 11
Decoding is unique and simple!

Read the bit stream from left to right, starting from the root;

whenever a leaf is reached,

write down its symbol and return to the root.
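A decoding sketch following exactly this rule, with the ABRACADABRA tree written out in the same nested-dict layout assumed above:

```python
# Prefix-code tree for ABRACADABRA: internal nodes map '0'/'1' to children,
# leaves are symbol strings.
TREE = {'0': 'A',
        '1': {'0': {'0': 'B',
                    '1': {'0': 'C', '1': 'D'}},
              '1': 'R'}}

def decode(bits, tree=TREE):
    """Follow the branches from the root; at each leaf, emit its symbol and restart."""
    out, node = [], tree
    for bit in bits:
        node = node[bit]              # left branch on '0', right branch on '1'
        if isinstance(node, str):     # reached a leaf
            out.append(node)
            node = tree               # return to the root
    return ''.join(out)

print(decode("01001101010010110100110"))   # ABRACADABRA
```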


HUFFMAN’S IDEA

The two symbols with the smallest frequencies must be at the bottom
of the optimal tree, as children of the lowest internal node.
Repeat until all nodes are merged into one tree:

– Remove two nodes with the lowest frequencies.


– Create a new internal node, with the two just-removed nodes as
children (either node can be either child) and the sum of their
frequencies as the new frequency.
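A compact sketch of this greedy procedure, using Python's standard heapq module as the priority queue (the tuple-based tree layout and the tie-breaking counter are choices made here for illustration):

```python
import heapq
from itertools import count
from collections import Counter

def huffman_codes(freq):
    """Build a Huffman tree from {symbol: frequency} and return {symbol: code word}."""
    tiebreak = count()                       # breaks ties between equal frequencies
    # Each heap entry is (frequency, tie-breaker, node); a leaf node is just its symbol.
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # remove the two lowest-frequency nodes...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))  # ...and merge them
    _, _, root = heap[0]

    codes = {}
    def walk(node, word):
        if isinstance(node, tuple):          # internal node: left branch 0, right branch 1
            walk(node[0], word + '0')
            walk(node[1], word + '1')
        else:                                # leaf: record the accumulated code word
            codes[node] = word or '0'
    walk(root, '')
    return codes

print(huffman_codes(Counter("ABRACADABRA")))
# e.g. {'A': '0', 'C': '100', 'D': '101', 'B': '110', 'R': '111'}
```

Because ties between equal frequencies can be broken either way, the code words this sketch prints differ from the table on the earlier slides, but the encoded length of ABRACADABRA is again 23 bits, so both codes are optimal.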
CONSTRUCTING A HUFFMAN CODE (1)

Assume that the frequencies of the symbols are:

– A: 40 B: 20 C: 10 D: 10 R: 20
The smallest numbers are 10 and 10 (C and D), so connect them.
CONSTRUCTING A HUFFMAN CODE (2)

C and D have already been used, and the new node above them
(call it C+D) has value 20.
The smallest values are now B, C+D, and R, all of which have value 20.

– Connect any two of these.

It is clear that the algorithm does not construct a unique tree, but
even if we had chosen the other possible connection, the code would
be optimal too!
CONSTRUCTING A HUFFMAN CODE (3)

R now has the smallest value (20), while A and B+C+D both have value 40.

Connect R to either of the others.


CONSTRUCTING A HUFFMAN CODE (4)

Connect the final two nodes, adding 0 and 1 to each left and
right branch respectively.

A: 40 B: 20 C: 10 D: 10 R: 20

[Final Huffman tree: the root (100) has A (40) on its 0 branch and an internal
node (60) on its 1 branch; the 60-node has an internal node (40) on 0 and R (20)
on 1; the 40-node has B (20) on 0 and a node (20) on 1, which splits into
C (10) and D (10).]

Resulting codes:

A = 0
B = 100
C = 1010
D = 1011
R = 11
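As a check on the result: treating the frequencies as counts per 100 symbols, the total cost of this code is 40·1 + 20·3 + 10·4 + 10·4 + 20·2 = 40 + 60 + 40 + 40 + 40 = 220 bits, i.e. 2.2 bits per symbol on average, compared with 3 bits per symbol (300 bits in total) for a fixed-length code over the same five symbols.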
