October 15, 2015 comp_dep_educ@yahoo.com

Contents

Prefix Code
Representing Prefix Codes Using Binary Trees
Binary Tree Terminology
Decoding a Prefix Code
Example
Huffman Coding
Cost of a Huffman Tree
Optimality


Prefix code

If every word in the code has the same length, the code is called a
fixed-length code, or a block code.

A prefix code is a type of code system (typically a variable-length code)
distinguished by its possession of the "prefix property", which requires that
there is no code word in the system that is a prefix (initial segment) of any
other code word in the system. For example, a code with code words
{9, 59, 55} has the prefix property; a code consisting of {9, 5, 59, 55} does
not, because "5" is a prefix of "59" and also of "55".
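The prefix property can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name is_prefix_free is a hypothetical choice, and code words are assumed to be given as strings.

```python
def is_prefix_free(code_words):
    """Return True if no code word is a prefix of a different code word."""
    words = list(code_words)
    for i, w in enumerate(words):
        for j, v in enumerate(words):
            # Compare every ordered pair of distinct positions.
            if i != j and v.startswith(w):
                return False
    return True


print(is_prefix_free(["9", "59", "55"]))       # True
print(is_prefix_free(["9", "5", "59", "55"]))  # False: "5" is a prefix of "59"
```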

A prefix code is a uniquely decodable code: a receiver can identify each
word without requiring a special marker between words.


Prefix codes

Suppose we have two binary code words a and b, where a is k bits long, b
is n bits long, and k < n. If the first k bits of b are identical to a, then a is
called a prefix of b. The last n - k bits of b are called the dangling suffix.

For example, if

a = 010 and b = 01011,

then a is a prefix of b and the dangling suffix is 11.
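The definition can be written as a small helper. This is an illustrative sketch; the name dangling_suffix is hypothetical and code words are assumed to be strings of '0' and '1'.

```python
def dangling_suffix(a, b):
    """Return the last n - k bits of b if a (k bits) is a prefix of b (n bits), k < n.

    Returns None when a is not a proper prefix of b.
    """
    if len(a) < len(b) and b.startswith(a):
        return b[len(a):]
    return None


print(dangling_suffix("010", "01011"))  # 11
```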


Representing Prefix Codes using Binary Trees

A prefix code is most easily represented by a binary tree in which the external nodes
are labeled with single characters that are combined to form the message. The
encoding for a character is determined by following the path down from the root of
the tree to the external node that holds that character: a 0 bit identifies a left branch
in the path, and a 1 bit identifies a right branch.
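As a sketch of this rule, the code below walks a tree and records the bit path to every external node. The tree representation (a string for an external node, a pair for an internal node) and the sample tree are assumptions made for illustration; the sample was chosen so that it yields the code words used in the "abracadabra!" example.

```python
def code_table(tree, path=""):
    """Map each character at an external node to its root-to-leaf bit path."""
    if isinstance(tree, str):
        # External node: holds a single character.
        return {tree: path}
    left, right = tree  # internal node: (left subtree, right subtree)
    table = code_table(left, path + "0")          # 0 identifies a left branch
    table.update(code_table(right, path + "1"))   # 1 identifies a right branch
    return table


# Hypothetical tree, written as nested pairs.
TREE = ("a", (("!", ("d", "c")), ("r", "b")))

print(code_table(TREE))
```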

In order for this encoding scheme to reduce the number of bits in a message, we use
short encodings for frequently used characters, and long encodings for infrequent ones.


A fundamental property of prefix codes is that messages can be formed
by simply stringing together the code words from left to right. For example,
the bit string

0111110010110101001111100100

encodes the message "abracadabra!". The first 0 must encode 'a', then
the next three 1's must encode 'b', then 110 must encode 'r', and so on, as
follows:

|0|111|110|0|1011|0|1010|0|111|110|0|100|
a b r a c a d a b r a !
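The left-to-right decoding above can be sketched as follows. The code table is read off the segmentation shown in the example; the function names encode and decode are hypothetical.

```python
CODE = {"a": "0", "b": "111", "r": "110", "c": "1011", "d": "1010", "!": "100"}


def encode(message, code):
    """Concatenate the code words of the characters, left to right."""
    return "".join(code[ch] for ch in message)


def decode(bits, code):
    """Decode a bit string produced with a prefix code."""
    inverse = {word: sym for sym, word in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        # Because no code word is a prefix of another, the first match is the only match.
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a code word")
    return "".join(out)


print(decode("0111110010110101001111100100", CODE))  # abracadabra!
```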


Binary Tree Terminology

1. Each node, except the root, has a unique parent.

2. Each internal node has exactly two children.


Decoding a Prefix Code

11000111100


Example

1. For a given list of symbols, develop a corresponding list of probabilities or frequency
counts so that each symbol's relative frequency of occurrence is known.
(We assume the following frequency counts: A : 15, B : 7, C : 6, D : 6, E : 5.)

2. Sort the list of symbols according to frequency, with the most frequently occurring
symbols at the left and the least common at the right.

3. Divide the list into two parts, with the total frequency counts of the left half being as
close to the total of the right as possible.

4. The left half of the list is assigned the binary digit 0, and the right half is assigned the
digit 1. This means that the codes for the symbols in the first half will all start with 0, and
the codes in the second half will all start with 1.

5. Recursively apply steps 3 and 4 to each of the two halves,
subdividing groups and adding bits to the codes until each symbol has
become a corresponding code leaf on the tree.
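Steps 1 to 5 can be sketched in code. This top-down splitting procedure is commonly known as Shannon-Fano coding; that name, the function name shannon_fano, and the tie-breaking rule (the earliest best split wins) are additions here, not taken from the slides.

```python
def shannon_fano(freqs):
    """Return {symbol: code} built by recursive splitting (steps 2 to 5)."""
    # Step 2: sort symbols by decreasing frequency (stable for ties).
    symbols = sorted(freqs, key=lambda s: -freqs[s])
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return  # a single symbol has become a leaf
        total = sum(freqs[s] for s in group)
        # Step 3: choose the cut that makes the two totals as close as possible.
        best_cut, best_diff, running = 1, None, 0
        for i in range(1, len(group)):
            running += freqs[group[i - 1]]
            diff = abs(total - 2 * running)
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = i, diff
        left, right = group[:best_cut], group[best_cut:]
        # Step 4: the left part gets a 0, the right part gets a 1.
        for s in left:
            codes[s] += "0"
        for s in right:
            codes[s] += "1"
        # Step 5: recurse on both parts.
        split(left)
        split(right)

    split(symbols)
    return codes


print(shannon_fano({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))
```

With the frequency counts assumed in step 1, the first split is {A, B} (total 22) against {C, D, E} (total 17), which gives the codes A = 00, B = 01, C = 10, D = 110, E = 111.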


Huffman Coding

Huffman (1951)
Uses frequencies of symbols in a string to build a variable rate prefix
code.
Each symbol is mapped to a binary string.
More frequent symbols have shorter codes.
No code is a prefix of another.
Example:

a 0
b 100
c 101
d 11


Huffman Coding

Q. Given a text that uses 32 symbols (26 different letters, space, and some
punctuation characters), how can we encode this text in bits?

A. With a fixed-length code of 5 bits per symbol: 2^5 = 32, so 5 bits give each of the 32 symbols its own bit pattern.
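The number of bits a fixed-length code needs for a given alphabet size can be computed directly. A minimal sketch; the function name bits_per_symbol is hypothetical.

```python
def bits_per_symbol(alphabet_size):
    """Smallest b such that 2**b >= alphabet_size (at least 1 bit)."""
    return max(1, (alphabet_size - 1).bit_length())


print(bits_per_symbol(32))  # 5
print(bits_per_symbol(33))  # 6
```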


Huffman Coding

Q. Some symbols (e, t, a, o, i, n) are used far more often than others.
How can we use this to reduce our encoding?

A. Encode these characters with fewer bits, and the others with more bits.


Huffman Coding

Q. How do we know when the next symbol begins?

A. Use a separation symbol (like the pause in Morse), or make sure that there is no
ambiguity by ensuring that no code is a prefix of another one.

Ex. c(a) = 01
c(b) = 010
c(e) = 1

What is 0101?
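To see why this code is ambiguous, one can enumerate every way of cutting a bit string into code words. This is an illustrative sketch with a hypothetical function name; it uses plain recursion and is only meant for short strings.

```python
def all_parses(bits, code):
    """Return every message whose encoding under `code` equals `bits`."""
    if not bits:
        return [""]
    results = []
    for sym, word in code.items():
        if bits.startswith(word):
            # Try this code word first, then parse the remainder.
            results += [sym + rest for rest in all_parses(bits[len(word):], code)]
    return results


CODE = {"a": "01", "b": "010", "e": "1"}
print(all_parses("0101", CODE))  # ['aa', 'be']
```

Two parses exist because c(a) = 01 is a prefix of c(b) = 010, so the code is not a prefix code.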


Cost of a Huffman Tree

Let p1, p2, ... , pm be the probabilities for the symbols a1, a2, ... , am,
respectively.
Define the cost of the Huffman tree T to be

$$C(T) = \sum_{i=1}^{m} p_i r_i$$

where ri is the length of the path from the root to ai.


C(T) is the expected length of the code of a symbol coded by the tree T.
C(T) is the bit rate of the code.


Cost of a Huffman Tree

Input: Probabilities p1, p2, ... , pm for symbols a1, a2, ... , am, respectively.
Output: A prefix code that has the lowest possible average number of bits per symbol.

That is, a code that minimizes $$\sum_{i=1}^{m} p_i r_i$$, where ri is the number of bits in the code word for ai.

Suppose we model the code as a binary tree.


Example of Cost

Example: a : 1/2, b : 1/8, c : 1/8, d : 1/4

C(T) = 1 x 1/2 + 3 x 1/8 + 3 x 1/8 + 2 x 1/4 = 1.75
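The same calculation as a short sketch. The code words are the ones from the earlier Huffman example (a = 0, b = 100, c = 101, d = 11), whose lengths match the depths 1, 3, 3, 2 used above; the function name cost is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def cost(probs, code):
    """Expected code length: sum of probability times code word length."""
    return sum(probs[s] * len(code[s]) for s in probs)


PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 8), "c": Fraction(1, 8), "d": Fraction(1, 4)}
CODE = {"a": "0", "b": "100", "c": "101", "d": "11"}

print(cost(PROBS, CODE))         # 7/4
print(float(cost(PROBS, CODE)))  # 1.75
```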


[Figure: code tree with leaves a, b, c, d]


Ex. c(a) = 11
c(e) = 01
c(k) = 001
c(l) = 10
c(u) = 000

Note: only the leaves have a label.

An encoding of x is a prefix of an encoding of y if and only if the path of x
is a prefix of the path of y.
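A sketch of this correspondence: insert each code word into a binary tree as a path and insist that labels end up only on leaves. The nested-dictionary representation and the name build_tree are assumptions made for illustration.

```python
def build_tree(code):
    """Build a code tree as nested dicts; raise ValueError if a label is not on a leaf."""
    root = {}
    for sym, word in code.items():
        node = root
        for bit in word:
            if "label" in node:
                # An earlier code word is a prefix of this one.
                raise ValueError(f"path of {sym!r} passes through a labelled node")
            node = node.setdefault(bit, {})
        if node:
            # This node already has children or a label, so it cannot be a new leaf.
            raise ValueError(f"code word of {sym!r} does not end at a new leaf")
        node["label"] = sym
    return root


CODE = {"a": "11", "e": "01", "k": "001", "l": "10", "u": "000"}
tree = build_tree(CODE)
print(tree["0"]["0"]["1"]["label"])  # k
```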


Optimality

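Principle 1
In a Huffman tree a lowest probability symbol has maximum distance from
the root.
If it does not, exchanging a lowest probability symbol with one at maximum
distance will not raise the cost.

Let p be a lowest probability, at depth k in T, let q >= p be the probability of a
symbol at maximum depth h >= k, and let T' be the tree after the exchange. Then

$$C(T) - C(T') = hq + kp - hp - kq = (hq - hp) - (kq - kp) = (h-k)(q-p) \ge 0$$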
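A numeric check of the exchange argument, as a sketch. Here p is a lowest probability sitting at depth k, q is the probability of a symbol at maximum depth h, and the tree after the exchange is called T2 in the code. The four symbols, their probabilities, and their depths are made up for illustration.

```python
from fractions import Fraction as F


def cost(probs, depth):
    """Sum of probability times depth over all symbols."""
    return sum(probs[s] * depth[s] for s in probs)


# Hypothetical tree T: x has a lowest probability but sits near the root.
probs = {"x": F(1, 8), "y": F(1, 2), "z": F(1, 4), "w": F(1, 8)}
depth_T = {"x": 1, "z": 2, "y": 3, "w": 3}

# T2: the same tree with x and y exchanged.
depth_T2 = dict(depth_T, x=depth_T["y"], y=depth_T["x"])

k, h = depth_T["x"], depth_T["y"]
p, q = probs["x"], probs["y"]
diff = cost(probs, depth_T) - cost(probs, depth_T2)

print(diff)               # 3/4
print((h - k) * (q - p))  # 3/4
```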


Optimality

Principle 2
The second lowest probability symbol is a sibling of the lowest probability
symbol in some Huffman tree.
If it is not, we can move it there without raising the cost.


Optimality

Principle 3
Assuming we have a Huffman tree T whose two lowest probability symbols
are siblings at maximum depth, they can be replaced by a new symbol whose
probability is the sum of their probabilities.
The resulting tree is optimal for the new symbol set.


Optimality

1. If there is just one symbol, a tree with one node is optimal. Otherwise:
2. Find the two lowest probability symbols, with probabilities p and q
respectively.
3. Replace these with a new symbol with probability p + q.
4. Solve the problem recursively for the new symbol set.
5. Replace the leaf holding the new symbol with an internal node whose two
children are the old symbols.
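The steps above can be sketched with a priority queue. This is an iterative version of the recursive description, not the slides' own code: repeatedly merging the two lowest probability entries builds the same tree bottom-up. The function name huffman_code is hypothetical, symbols are assumed to be strings, and ties are broken by insertion order.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Return {symbol: code word} for a dict of symbol probabilities."""
    if len(probs) == 1:
        # Step 1: a single symbol is a one-node tree with an empty code word.
        return {next(iter(probs)): ""}
    tie = count()  # unique counter so the heap never compares tree nodes
    heap = [(p, next(tie), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 2: take the two lowest probability entries.
        p, _, left = heapq.heappop(heap)
        q, _, right = heapq.heappop(heap)
        # Steps 3 and 5: the merged entry is an internal node with both as children.
        heapq.heappush(heap, (p + q, next(tie), (left, right)))
    _, _, root = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(root, "")
    return codes


codes = huffman_code({"a": 0.5, "b": 0.125, "c": 0.125, "d": 0.25})
print({s: len(w) for s, w in codes.items()})
```

For the probabilities a : 1/2, b : 1/8, c : 1/8, d : 1/4 this yields code word lengths 1, 3, 3, 2, the same depths as in the cost example. The actual bit patterns depend on which child is labelled 0 and how ties are broken.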


Optimality

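Principle 3 (cont)
Let T' be the tree obtained from T by this replacement. If T' were not optimal
then we could find a lower cost tree T'' for the new symbol set. Expanding the
new symbol in T'' back into its two siblings would lead to a tree for the
original alphabet with lower cost than T, which contradicts the optimality of T.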


Optimality

Q. What is the meaning of 111010001111101000?

A. simpel

Q. How can this prefix code be made more efficient?


A. Change encoding of p and s to a shorter one.
This tree is now full.


Optimality

Definition. A tree is full if every node that is not a leaf has two children.

Claim. The binary tree corresponding to the optimal prefix code is full.
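The definition of a full tree can be tested directly on the code words of a prefix code. In this sketch, internal nodes are identified with the proper prefixes of the code words and leaves with the code words themselves; the function name is_full and the second example are hypothetical.

```python
def is_full(code_words):
    """Return True if every non-leaf node of the code tree has two children.

    Assumes code_words form a prefix code given as strings of '0' and '1'.
    """
    # Every proper prefix of a code word (including "") is an internal node.
    internal = {w[:i] for w in code_words for i in range(len(w))}
    nodes = internal | set(code_words)
    return all(n + "0" in nodes and n + "1" in nodes for n in internal)


print(is_full(["0", "100", "101", "11"]))  # True
print(is_full(["0", "100", "11"]))         # False: node 10 has only one child
```

In the second example the code word 100 could be shortened to 10 without losing the prefix property, which is the kind of saving the claim relies on.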

Q. Where in the tree of an optimal prefix code should symbols with a high
frequency be placed?

A. Near the top.
