
# Huffman Coding

Submitted by: Ravish Nirvan (3070070103)

Submitted to: Supratik Banerjee

**Content**

- Introduction
- Basic technique
- Types of Huffman coding
- Dynamic Huffman Code
  - Construction of the Tree
  - Example: Steps 1-4 and the code tree after each step
  - Code Table
  - Encoding
  - Decoding
  - Huffman Code Tree
- Adaptive Huffman Code
  - Principle
  - Initialization
  - Weight of a node, Root, Initial Tree, Block
  - Table representing the Code Tree, Initial Table
  - Update Procedure (flow chart)
- Static Huffman coding
  - Algorithm
  - Example
- Coding
- Output
- Summary
- Reference

Introduction: In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file), where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. The running time of Huffman's method is fairly efficient: it takes O(n log n) operations to construct a code for n symbols. A method was later found to design a Huffman code in linear time if the input probabilities are sorted.

Prefix code: Prefix codes are sometimes called "prefix-free codes"; the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol. For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when such a code is not produced by Huffman's algorithm.

Basic technique: The technique works by creating a binary tree of nodes. These can be stored in a regular array, the size of which depends on the number of symbols, n. A node can be either a leaf node or an internal node.
Initially, all nodes are leaf nodes, which contain the symbol itself, the weight (frequency of appearance) of the symbol, and optionally a link to a parent node, which makes it easy to read the code (in reverse) starting from a leaf node. Internal nodes contain a weight, links to two child nodes, and the optional link to a parent node. As a common convention, bit '0' represents following the left child and bit '1' represents following the right child. A finished tree has up to n leaf nodes and n - 1 internal nodes. A Huffman tree that omits unused symbols produces optimal code lengths.

Types of Huffman coding: Dynamic Huffman Code and Adaptive Huffman Code.

Dynamic Huffman Code: This coding scheme presupposes a previous determination of the symbol distribution. The actual algorithm starts with this distribution, which is regarded as constant over the entire data. If the symbol distribution changes, either losses in compression or a completely new construction of the code tree (including the header data it requires) must be accepted.

Construction of the Tree: The Huffman algorithm generates the most efficient binary code tree for a given frequency distribution. The prerequisite is a table with all symbols and their frequencies; each symbol is represented by a leaf node within the tree. The following general procedure has to be applied:

1. Search for the two nodes with the lowest frequency that are not yet assigned to a parent node.
2. Couple these nodes together into a new interior node.
3. Add both frequencies and assign this value to the new interior node.

The procedure is repeated until all nodes are combined in a single root node.

| Symbol | Frequency |
|--------|-----------|
| a      | 5         |
| b      | 2         |
| r      | 2         |
| c      | 1         |
| d      | 1         |

According to the outlined coding scheme, the symbols "d" and "c" are coupled together in the first step. The new interior node gets the frequency 2.

Step 1: couple c (1) and d (1) into a new interior node with frequency 1 + 1 = 2. Remaining free nodes: a = 5, b = 2, r = 2, {c, d} = 2.

## Code tree after the 1st step:

Step 2: couple r (2) and {c, d} (2) into a new interior node with frequency 2 + 2 = 4. Remaining free nodes: a = 5, b = 2, {r, c, d} = 4.

## Code tree after the 2nd step:

Step 3: couple b (2) and {r, c, d} (4) into a new interior node with frequency 2 + 4 = 6. Remaining free nodes: a = 5, {b, r, c, d} = 6.

## Code tree after the 3rd step:

Step 4: couple a (5) and {b, r, c, d} (6) into the root node with frequency 5 + 6 = 11.

## Code tree after the 4th step:
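The merge steps above can be reproduced with a priority queue; the following is a minimal sketch. Python's min-heap may break ties between equal frequencies differently than the figures (it may pair b before r), so the exact pairing of nodes can differ, but the interior-node frequencies and the resulting total code length are the same.

```python
import heapq

# Frequency table of the example.
freq = {"a": 5, "b": 2, "r": 2, "c": 1, "d": 1}

# Each heap entry is (frequency, label); heapq always pops the two
# lowest-frequency free nodes first, exactly as the procedure requires.
heap = [(f, sym) for sym, f in freq.items()]
heapq.heapify(heap)

interior = []                          # frequencies of the new interior nodes
while len(heap) > 1:
    f1, s1 = heapq.heappop(heap)       # lowest-frequency free node
    f2, s2 = heapq.heappop(heap)       # second-lowest free node
    interior.append(f1 + f2)           # new interior node gets the sum
    heapq.heappush(heap, (f1 + f2, s1 + s2))

print(interior)                        # [2, 4, 6, 11] -- one entry per step
```

The last interior node (frequency 11) is the root, and 11 is the total number of symbols in the original data.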

Code Table: If only one single node remains in the table, it forms the root of the Huffman tree. The path from the root node to a leaf node defines the code word for the corresponding symbol:

## Complete Huffman Tree:

Encoding: The original data is encoded with this code table as follows:

encoded data: 23 bit; original data: 33 bit (11 symbols at 3 bit each)
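As an illustration, assume the original data is the string "abracadabra" (an assumption, but it matches the frequency table) and read '0' for left edges; the code table derived from the tree is then a = 0, b = 10, r = 110, c = 1110, d = 1111, and encoding is a simple table lookup:

```python
# Assumed code table from the example tree ('0' = left child, '1' = right child).
codes = {"a": "0", "b": "10", "r": "110", "c": "1110", "d": "1111"}

data = "abracadabra"          # assumed original data; matches the frequency table
encoded = "".join(codes[ch] for ch in data)
print(encoded)                # 01011001110011110101100
print(len(encoded))           # 23 bits, versus 11 symbols * 3 bit = 33 bits uncoded
```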


Decoding: For decoding, the Huffman tree is traversed with the encoded data bit by bit. Whenever a node without successors (a leaf) is reached, the assigned symbol is written to the decoded data and the traversal restarts at the root.
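A minimal decoding sketch, using the same assumed code table (a = 0, b = 10, r = 110, c = 1110, d = 1111) with the tree written out by hand:

```python
# Interior nodes are (left, right) tuples, leaves are symbol strings.
tree = ("a", ("b", ("r", ("c", "d"))))

def decode(bits, tree):
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]   # '0' -> left, '1' -> right
        if isinstance(node, str):                   # leaf reached: emit symbol,
            out.append(node)                        # then restart at the root
            node = tree
    return "".join(out)

print(decode("01011001110011110101100", tree))      # abracadabra
```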


## Huffman Code Tree:

Adaptive Huffman Code: The adaptive Huffman code is developed in the course of the coding, step by step. In contrast to dynamic coding, the symbol distribution is not determined in advance; it is built up in parallel from the symbols already processed.

Principle: The weight of each node representing the last encoded symbol is increased; afterwards the corresponding part of the tree is adapted. Thus the tree gradually approaches the current distribution of the symbols. However, the code tree always reflects the "past", not the real distribution.

Initialization: Because the adaptive Huffman code uses previously encoded symbols, an initialization problem arises at the beginning of the coding. At first the code tree is empty and contains no symbols from already encoded data. To solve this, a suitable initialization has to be used. Among others, the following options are available:

- A standard distribution is used that is available at the encoder as well as the decoder.
- An initial code tree is generated with a frequency of 1 for each symbol.
- A special control character is added that identifies new symbols following it.

Initialization with Standard Distribution: A base for standard distributions may be, for example, the analysis of English-language texts. A proper compression rate is then achieved from the very beginning of the coding, without having to wait for the code tree to develop. This advantage is offset by the fact that the standard tree must be stored at both the encoder and the decoder. In principle such a method is only suitable if the content is restricted to certain data types (e.g. text); otherwise the compression rate decreases drastically, especially if completely different data types have to be encoded.

Initialization with Uniform Distribution: For initialization, a frequency of 1 is assigned to each possible symbol, independently of its later use. The code tree generated in this way is afterwards adapted successively with every encoded symbol. The code length of the individual elements is identical to a fixed-length code, which is a particular case of a variable-length code. If a message source is able to generate 8 different messages, a code length of 3 bit is required. Initial code tree with 8 symbols:
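The fixed code length of the uniform initial tree follows directly from the number of possible symbols; a quick check:

```python
from math import ceil, log2

# With n equally weighted symbols, the balanced initial tree assigns every
# symbol a fixed-length code of ceil(log2(n)) bits -- e.g. 3 bit for 8 symbols.
for n in (2, 8, 256):
    print(n, "symbols ->", ceil(log2(n)), "bit")
```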

Extension for New Symbols: For the initialization with a standard or uniform distribution, the entire set of symbols must be contained in the code tree, even if some of them are never used in the original data. The introduction of a special code offers a solution: this code identifies a new symbol immediately following it. Thus it is guaranteed that only symbols already encoded are part of the code tree.

The disadvantage of this variant is the increased coding effort for the first appearance of a symbol, which consists of the uncoded symbol plus the Huffman code for the control character. Due to the structure of the Huffman tree, the control character requires a proportionally large code length. Initially the Huffman tree consists of one single node, which forms the root of the tree and represents the control character. With each additional symbol the number of leaf nodes grows by one.


Example:

Weight of a node: Every node has an attribute called its weight. The weight of a leaf node is equal to the frequency with which the corresponding symbol has been coded so far. The weight of an interior node is equal to the sum of the weights of its two subordinate nodes. The control character NYA gets the weight 0.

Structure of the Code Table: All nodes are sorted according to their weight in ascending order. The NYA node is always the lowest node in the hierarchy.

Root: The node with the highest weight is placed at the highest level in the hierarchy and forms the current root of the tree.

Initial Tree: The initial tree contains only the NYA node with the weight 0.

Data format for uncoded symbols: In the simplest case, symbols which are not contained in the code tree are encoded linearly (e.g. with 8 bit from a set of 256 symbols). A better compression efficiency can be achieved if the data format is adapted to the number of remaining symbols. A variety of options are available to optimize the code length; a suitable procedure guarantees full utilization of the range of values.

Maximum number of nodes: Assuming a set of n symbols, the total number of nodes (leaf and interior) is 2n - 1. Accordingly, the maximum number of nodes is 2 * 256 - 1 = 511 if the standard unit is the byte.
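The weight rules above can be sketched with explicit parent links. This is only an illustration of the bookkeeping, not a full adaptive-Huffman update: the real update procedure also reorders nodes so that the table stays sorted by weight.

```python
# Illustrative sketch of the weight rule: after a symbol is coded, its leaf
# and every ancestor up to the root gain weight 1; NYA keeps weight 0.
class Node:
    def __init__(self, symbol=None, weight=0, parent=None):
        self.symbol, self.weight, self.parent = symbol, weight, parent

root = Node(weight=0)
nya = Node(symbol="NYA", weight=0, parent=root)    # control character
leaf_a = Node(symbol="a", weight=0, parent=root)   # hypothetical symbol leaf

def code_symbol(leaf):
    """Increase the weight of the coded symbol's leaf and all its ancestors."""
    node = leaf
    while node is not None:
        node.weight += 1
        node = node.parent

for _ in range(3):                  # code the symbol "a" three times
    code_symbol(leaf_a)
print(leaf_a.weight, root.weight, nya.weight)   # 3 3 0
```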


Block: All nodes of identical weight are logically summarized into a block. Nodes which are part of a block are always neighboring in the code table representing the code tree.

Update Procedure: Flow chart of adaptive Huffman coding:


Text Compression: Static Huffman coding assigns variable-length codes to symbols based on their frequency of occurrence in the given message. Low-frequency symbols are encoded using many bits, and high-frequency symbols are encoded using fewer bits. The message to be transmitted is first analyzed to find the relative frequencies of its constituent characters. The coding process generates a binary tree, the Huffman code tree, with branches labeled with bits (0 and 1). The Huffman tree (or the character-codeword pairs) must be sent with the compressed information to enable the receiver to decode the message.

Static Huffman Coding Algorithm:

    Find the frequency of each character in the file to be compressed;
    For each distinct character, create a one-node binary tree containing
        the character and its frequency as its priority;
    Insert the one-node binary trees into a priority queue in increasing
        order of frequency;
    while (there is more than one tree in the priority queue) {
        dequeue two trees t1 and t2;
        create a tree t that contains t1 as its left subtree and t2 as its
            right subtree;
        priority(t) = priority(t1) + priority(t2);
        insert t in its proper location in the priority queue;
    }
    Assign 0 and 1 labels to the edges of the resulting tree, such that the
        left and right edge of each node do not have the same label;
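The algorithm above can be sketched in Python, representing each tree in the priority queue as a map from symbols to their codewords so far. Ties between equal frequencies may be broken differently than in the worked example that follows, so individual codewords can differ from the table shown there, but the total code length is identical.

```python
import heapq
from itertools import count

# Frequencies from the example below.
freq = {"A": 45, "E": 65, "I": 13, "N": 45, "O": 18, "S": 22, "T": 53}

def static_huffman(freq):
    """Repeatedly merge the two lowest-priority trees; merging prepends
    '0' to every codeword of the left subtree and '1' to the right."""
    tick = count()                                  # tie-breaker for equal priorities
    queue = [(f, next(tick), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(queue)
    while len(queue) > 1:                           # more than one tree remains
        p1, _, t1 = heapq.heappop(queue)            # dequeue two trees t1 and t2
        p2, _, t2 = heapq.heappop(queue)
        t = {s: "0" + c for s, c in t1.items()}     # t1 becomes the left subtree
        t.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(queue, (p1 + p2, next(tick), t))
    return queue[0][2]

codes = static_huffman(freq)
print(sum(len(codes[c]) * f for c, f in freq.items()))   # 696
```

The weighted total of 696 bits is the same for every valid Huffman code of this distribution, regardless of tie-breaking.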


## Static Huffman Coding example

| Character | A  | E  | I  | N  | O  | S  | T  |
|-----------|----|----|----|----|----|----|----|
| Frequency | 45 | 65 | 13 | 45 | 18 | 22 | 53 |

Use the Huffman technique to answer the following questions:

1. Build the Huffman code tree for the message.
2. Use the Huffman tree to find the codeword for each character.
3. If the data consists of only these characters, what is the total number of bits to be transmitted?
4. What is the compression ratio?
5. Verify that your computed Huffman codewords satisfy the prefix property.

Step 1: couple I (13) and O (18) into a new node with frequency 31. Remaining: 22(S), 31, 45(A), 45(N), 53(T), 65(E).

Step 2: couple S (22) and the node 31 into a new node with frequency 53. Remaining: 45(A), 45(N), 53(T), 53, 65(E).

Step 3: couple A (45) and N (45) into a new node with frequency 90. Remaining: 53(T), 53, 65(E), 90.

Step 4: couple T (53) and the node 53 into a new node with frequency 106. Remaining: 65(E), 90, 106.

Step 5: couple E (65) and the node 90 into a new node with frequency 155. Remaining: 106, 155.

Step 6: couple the nodes 106 and 155 into the root with frequency 261.

| Character | Huffman codeword |
|-----------|------------------|
| A         | 110              |
| E         | 10               |
| I         | 0110             |
| N         | 111              |
| O         | 0111             |
| S         | 010              |
| T         | 00               |
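The prefix property of these codewords can be checked mechanically; a small sketch:

```python
# Codeword table from the example above.
codes = {"A": "110", "E": "10", "I": "0110", "N": "111",
         "O": "0111", "S": "010", "T": "00"}

def is_prefix_free(codes):
    """No codeword may be a prefix of another. Sorting the codewords places
    any prefix immediately before one of its extensions, so it is enough to
    compare adjacent pairs."""
    words = sorted(codes.values())
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_free(codes))                       # True
print(is_prefix_free({"a": "0", "b": "01"}))       # False: "0" prefixes "01"
```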

If we assume the message consists of only the characters a, e, i, n, o, s, t, then the number of bits for the compressed message will be 45\*3 + 65\*2 + 13\*4 + 45\*3 + 18\*4 + 22\*3 + 53\*2 = 696.

If the message is sent uncompressed with an 8-bit ASCII representation for the characters, we have 261 (the sum of all frequencies) \* 8 = 2088 bits.


Assuming that the number of character-codeword pairs and the pairs themselves are included at the beginning of the binary file containing the compressed message, in the following format:

Number of bits for the transmitted file = bits(count) + bits(characters) + bits(codewords) + bits(compressed message) = 3 + (7\*8) + 21 + 696 = 776. (The pair count 7 fits in 3 bits; the seven codewords together contain 3 + 2 + 4 + 3 + 4 + 3 + 2 = 21 bits.)

Compression ratio = bits for ASCII representation / number of bits transmitted = 2088 / 776 = 2.69
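The bit counts above can be verified directly from the frequency and codeword tables:

```python
# Frequencies and codewords from this example.
freq  = {"A": 45, "E": 65, "I": 13, "N": 45, "O": 18, "S": 22, "T": 53}
codes = {"A": "110", "E": "10", "I": "0110", "N": "111",
         "O": "0111", "S": "010", "T": "00"}

compressed   = sum(freq[c] * len(codes[c]) for c in freq)     # message bits
header       = 3 + 7 * 8 + sum(len(w) for w in codes.values())  # count + chars + codewords
uncompressed = sum(freq.values()) * 8                         # 8-bit ASCII

print(compressed)                                     # 696
print(header + compressed)                            # 776
print(round(uncompressed / (header + compressed), 2)) # 2.69
```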

Coding:
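The original program listing is not preserved in this copy. The following is a minimal, self-contained Python sketch of the complete round trip (build the tree, derive the code table, encode, decode), not the original program:

```python
import heapq
from collections import Counter
from itertools import count

def build_tree(freq):
    """Merge the two lowest-frequency nodes until one root remains.
    Interior nodes are (left, right) tuples, leaves are symbol strings."""
    tick = count()                                   # tie-breaker for equal frequencies
    heap = [(f, next(tick), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (a, b)))
    return heap[0][2]

def code_table(tree, prefix=""):
    """Read codewords off the tree: '0' for left edges, '1' for right edges."""
    if isinstance(tree, tuple):
        return {**code_table(tree[0], prefix + "0"),
                **code_table(tree[1], prefix + "1")}
    return {tree: prefix or "0"}

def decode(bits, tree):
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if not isinstance(node, tuple):              # leaf: emit, restart at root
            out.append(node)
            node = tree
    return "".join(out)

message = "abracadabra"                              # sample input for the demo
tree = build_tree(Counter(message))
codes = code_table(tree)
encoded = "".join(codes[ch] for ch in message)
assert decode(encoded, tree) == message              # lossless round trip
print(len(encoded))                                  # 23
```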


Output:


SUMMARY: Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. The running time of Huffman's method is fairly efficient: it takes O(n log n) operations to construct a code for n symbols. A method was later found to design a Huffman code in linear time if the input probabilities are sorted.
