
Name of the Teacher Zohaib Hasan Khan

Mobile Number 7752846666


Email ID zhkhan@iul.ac.in
Designation Assistant Professor
University Name Integral University, Lucknow
Stream Engineering
Faculty Name Engineering
Department Name Electronics and Communication Engineering

Subject Name Data Compression


Program Name BCA
Program Duration 3 years
Subtopic Huffman Algorithm

Content Type Presentation


Search Keywords Minimum Variance Huffman Codes, Update Procedure, Encoding Procedure, Decoding Procedure
Unit 2
Data Compression (CA209)
by
Zohaib Hasan Khan
Assistant Professor
Department of Electronics and Communication Engg.
Integral University, Lucknow
Encoding and Compression of Data
• Definition
• Reduce size of data
(number of bits needed to represent data)
• Benefits
• Reduce storage needed
• Reduce transmission cost / latency / bandwidth
• Applicable to many forms of data transmission
• Our example: text files
Huffman Codes
• For compressing data (sequence of characters)
• Widely used
• Very efficient (saving 20-90%)
• Use a table to keep frequencies of occurrence
of characters.
• Output binary string.

Huffman code algorithm - building the code bottom-up

Proposed by Dr. David A. Huffman in 1952 in the paper
“A Method for the Construction of Minimum-Redundancy Codes”

Huffman realized that the code, to be optimal, that is, to have the shortest average code length, had to obey these conditions:
• No code is a prefix of another code, so no extra information is needed to delimit codes.
• Symbols, sorted in non-increasing order of frequency of occurrence, receive codes of non-decreasing length; in particular, the two least frequent symbols receive codes of the same length. In other words, frequent symbols get the shorter codes and the least frequent symbols get the longer codes.
• The last two codes, the longest, differ only in the last bit.
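For example, for four symbols with frequencies A: 0.4, B: 0.3, C: 0.2 and D: 0.1, the code A = 0, B = 10, C = 110, D = 111 satisfies all of these conditions: no codeword is a prefix of another, code lengths never decrease as frequency decreases, and the two longest codewords (110 and 111) differ only in their last bit.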

This code is uniquely decodable because it is a prefix code, that is, a code in which no codeword is a prefix of another codeword.
A "prefix code" is a type of encoding mechanism ("code"). For something to be a
prefix code, the entire set of possible encoded values ("codewords") must not
contain any values that start with any other value in the set.
For example: [3, 11, 22] is a prefix code, because none of the values start with ("have
a prefix of") any of the other values. However, [1, 12, 33] is not a prefix code, because
one of the values (12) starts with another of the values (1).
Prefix codes are useful because, if you have a complete and accurate sequence
of values, you can pick out each value without needing to know where one value
starts and ends.
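As a quick illustration, the prefix-code property can be checked with a few lines of Python; this is a small sketch of our own, not taken from any library:

def is_prefix_code(codewords):
    """Return True if no codeword is a prefix of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_code(["3", "11", "22"]))   # True:  a valid prefix code
print(is_prefix_code(["1", "12", "33"]))   # False: "1" is a prefix of "12"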
Huffman Coding
• Huffman codes can be used to compress information
• Like WinZip – although WinZip does not use the Huffman algorithm alone (its DEFLATE method combines LZ77 with Huffman coding)
• JPEGs do use Huffman as part of their compression process
• Not all characters occur with the same frequency!
• The basic idea is that instead of storing each character
in a file as an 8-bit ASCII value, we will instead store the
more frequently occurring characters using fewer bits
and less frequently occurring characters using more bits
• On average this should decrease the file size (often by roughly half)
How Huffman Coding Works
First, form a node for each symbol containing the symbol itself and its frequency of occurrence.
Then, form a list with these nodes and sort them from most to least frequent.
At the tail of the list, merge the last two nodes, yielding an internal node with no symbol but whose frequency of occurrence is the sum of the frequencies of the two merged nodes, and put it back in the list so that the list remains sorted from most to least frequent.
Repeat until there’s only one node left.
Major Steps in Huffman Coding
There are two major steps in Huffman Coding:
1. Building a Huffman Tree from the input characters.
2. Assigning codes to the characters by traversing the Huffman Tree.

Step-01:
• Create a leaf node for each character of the text.
• The leaf node of a character contains the frequency of occurrence of that character.
Step-02:
• Arrange all the nodes in increasing order of their frequency value.
Step-03:
Considering the two nodes with minimum frequency,
• Create a new internal node.
• The frequency of this new node is the sum of the frequencies of those two nodes.
• Make the first node the left child and the other node the right child of the newly created node.
Step-04:
• Keep repeating Step-02 and Step-03 until all the nodes form a single tree.
• The tree finally obtained is the desired Huffman Tree.
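The steps above can be sketched in a few lines of Python; this is a minimal illustration of our own (using a heap in place of an explicitly re-sorted list), not code from the slides:

import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman tree from character frequencies and return a code table."""
    freq = Counter(text)
    # Each heap entry is (frequency, tie-breaker, tree); a tree is either a
    # single character (leaf node) or a (left, right) pair (internal node).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:                      # Step-02/03: merge the two minimum-frequency nodes
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def assign(node, prefix):                 # traverse the finished tree to assign codes
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")     # left child gets a 0
            assign(node[1], prefix + "1")     # right child gets a 1
        else:
            codes[node] = prefix or "0"       # leaf: record its codeword
    assign(heap[0][2], "")
    return codes

print(huffman_codes("abracadabra"))
# the most frequent character ('a') receives the shortest codeword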
Adaptive Huffman coding

Adaptive Huffman coding (also called Dynamic Huffman coding) is an adaptive coding technique based on Huffman coding. It permits building the code as the symbols are being transmitted, with no initial knowledge of the source distribution, which allows one-pass encoding and adaptation to changing conditions in the data.

The benefit of the one-pass procedure is that the source can be encoded in real time, though it becomes more sensitive to transmission errors, since a single loss can ruin the whole code.
Tree Manipulation
Each node has a sibling.
Nodes with higher weights have higher orders.
On each level, the node farthest to the right has the highest order, although there might be other nodes with equal weight.
Leaf nodes contain character values, except the Not Yet Transmitted (NYT) node, which is the node where all new characters are added.
Internal nodes contain weights equal to the sum of their children's weights.
All nodes of the same weight are in consecutive order.

Every tree contains a root and a NYT node, where the NYT node is the node with the lowest order in the tree. When a character is read, first check whether the tree already contains that character. If it doesn't, the NYT node spawns two new nodes. The node to its right is a new node containing the character, and the new left node is the new NYT node. If the character is already in the tree, you simply update the weight of that particular tree node. In some cases, when the node is not the highest-ordered node in its weight class, you will need to swap this node so that it fulfills the property that nodes with higher weight have higher orders. To do this, before you update the node's weight, search the tree for all nodes of equal weight and swap the soon-to-be-updated node with the highest-ordered node of equal weight. Finally, update the weight.

However, in both cases of inserting values, the weight of a leaf changes, and this change will affect all nodes above it. Therefore, after you insert a node, you must check the parent above it, following the same procedure you followed when updating already-seen values. Check whether the node in question is the highest-order node in its weight class prior to updating. If not, swap it with the node that is the highest order, making sure to reassign only the pointers of the two nodes being swapped.
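A minimal sketch of this update walk, assuming a hypothetical tree class with parent pointers and helper methods find_highest_in_weight_class and swap_subtrees (these names are our own, not from the slides):

def update_weights(tree, leaf):
    """Walk from a leaf to the root, restoring the ordering property at each step."""
    node = leaf
    while node is not None:
        # Before incrementing, find the highest-ordered node of the same weight;
        # if it is not this node (or this node's parent), swap the two subtrees.
        highest = tree.find_highest_in_weight_class(node.weight)
        if highest is not node and highest is not node.parent:
            tree.swap_subtrees(node, highest)   # reassign only the two nodes' pointers
        node.weight += 1                        # now it is safe to increase the weight
        node = node.parent                      # repeat for every ancestor up to the root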
Tree Manipulation Procedure: Be sure to notice the key verbs here: insert new value, give birth to new nodes, update weight, check if max in weight class, swap, isRoot, move to parent. Not all of these will be functions, but these actions will form the basis of a tree manipulation class for encoding and decoding.
Encoding Procedure
Once you have the functions of your tree manipulation working correctly, it is relatively easy to complete the encoding and decoding parts of adaptive Huffman coding. To encode, you simply read through the file to be compressed one character at a time. If you have seen the character before, you write to the output file the root-to-leaf path, with a 1 denoting a move right and a 0 denoting a move left, the same as you would in static Huffman coding. If the character is new, write out the root-to-leaf path of the NYT node to alert the decoder that a new character follows. Then write out the new character itself, using nine bits in anticipation of the PSEUDO_EOF. Finally, update the tree by calling the appropriate insert function with the new value. Read through the entire file in this manner and, when you are done, manually write out the final root-to-NYT path followed by the PSEUDO_EOF character.
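A hedged sketch of this encoding loop, where the tree object and its path_to, path_to_nyt and insert helpers are hypothetical names chosen here for illustration:

PSEUDO_EOF = 256                                  # a 9-bit value no 8-bit character can collide with

def encode(text, tree, write_bits):
    """Compress text one character at a time, updating the tree after every symbol."""
    seen = set()
    for ch in text:
        if ch in seen:
            write_bits(tree.path_to(ch))          # root-to-leaf path: 1 = move right, 0 = move left
        else:
            write_bits(tree.path_to_nyt())        # NYT path tells the decoder a new character follows
            write_bits(format(ord(ch), "09b"))    # the new character itself, written in nine bits
            seen.add(ch)
        tree.insert(ch)                           # update the tree after every symbol
    write_bits(tree.path_to_nyt())                # finish with the root-to-NYT path ...
    write_bits(format(PSEUDO_EOF, "09b"))         # ... followed by PSEUDO_EOF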
Decoding Procedure
The decoding procedure is very similar to the encoding procedure and should be easy to figure out based on the information in the previous section. To uncompress the compressed file, read it in one bit at a time, traversing the tree as it exists up to that point. Eventually, you will come to a leaf. If that leaf has a character value, write out the eight-bit translation of the character to the uncompressed file and then update the count of that character in the tree, making sure all necessary changes are made to the tree as a whole. If the leaf is the NYT node, read in the next nine bits and write out the eight-bit translation of that character. Then insert the new character into the tree. It is extremely important to remember that the compressor and decompressor, although reading in characters in different manners, should construct exactly the same trees. At any given point in a file, in either operation, the trees would be the same if compared. Do not confuse this with saying that the compressor and decompressor run simultaneously. They don't. But they do construct the same trees after reading the same information. You might even say that they use the same "Adaptive Huffman Tree class", but that might be just a tad presumptuous.
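A matching sketch of the decoding loop, again using hypothetical names (tree.root, tree.nyt, node.is_leaf, tree.insert) for illustration only:

def decode(read_bit, read_bits, tree, out):
    """Walk the tree bit by bit, emitting characters and updating the tree as the encoder did."""
    node = tree.root
    while True:
        node = node.right if read_bit() == 1 else node.left   # 1 = move right, 0 = move left
        if node.is_leaf():
            if node is tree.nyt:
                code = read_bits(9)              # new character: the next nine bits are its raw code
                if code == PSEUDO_EOF:
                    break                        # end of the compressed stream
                ch = chr(code)
            else:
                ch = node.char                   # known character: take it from the leaf
            out.write(ch)                        # write the eight-bit translation
            tree.insert(ch)                      # update the tree exactly as the encoder did
            node = tree.root                     # restart at the root for the next character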
Applications of Huffman Encoding
• Huffman encoding is widely used in compression formats like GZIP, PKZIP (WinZip) and BZIP2.
• Multimedia codecs like JPEG, PNG and MP3 use Huffman encoding (to be more precise, prefix codes).
• Huffman encoding still dominates the compression industry, since newer arithmetic and range coding schemes have often been avoided due to patent issues.
