
Huffman and Arithmetic Coding

Coding and Its Application

Introduction
Huffman codes can be classified as instantaneous codes
They have the following property:
Huffman codes are compact codes, i.e. they produce a code with
an average length which is the smallest possible to achieve for
the given number of source symbols, code alphabet, and source
statistics
Huffman codes operate by reducing a source with q
symbols to a source with r symbols, where r is the size of
the code alphabet
Introduction
Consider the source S with q symbols $\{s_i : i = 1, 2, \ldots, q\}$ and
associated probabilities $\{P(s_i) : i = 1, 2, \ldots, q\}$
Let the symbols be renumbered so that
$P(s_1) \ge P(s_2) \ge \cdots \ge P(s_q)$
Combine the last r symbols of S, $\{s_{q-r+1}, s_{q-r+2}, \ldots, s_q\}$,
into one symbol $s_{q-r+1}$ with probability
$P(s_{q-r+1}) = \sum_{i=1}^{r} P(s_{q-r+i})$
The trivial r-ary compact code for the reduced source
with r symbols is used to design the compact code for
the preceding reduced source
Binary Huffman Coding
The algorithm:
Re-order the source symbols in decreasing order of symbol
probability
Reduce the source by combining the last two symbols and re-
ordering the new set in decreasing order
Assign a compact code for the final reduced source. For a two-
symbol source the trivial code is {0, 1}
Backtrack to the original source S assigning a compact code
Example:
Consider a 5-symbol source with the following probabilities
Binary Huffman Coding
The average length is 2.2 bits/symbol
The efficiency is 96.5%
Are Huffman codes unique?
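As a concrete illustration, here is a minimal Python sketch of the
algorithm above. The probabilities {0.4, 0.2, 0.2, 0.1, 0.1} are an
assumption, chosen because they reproduce the stated 2.2 bits/symbol
and 96.5% efficiency (the slide's own probability table is not
included in this text). Note that tie-breaking during the merges can
yield different, equally compact codes, which is why Huffman codes
are not unique.

import heapq

def huffman_code(probs):
    # Build a binary Huffman code from a dict {symbol: probability}.
    # Heap entries: (probability, tie-break counter, {symbol: code-so-far}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # the two least probable groups
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}         # prepend 0 to one group
        merged.update({s: "1" + c for s, c in c1.items()})   # and 1 to the other
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"s1": 0.4, "s2": 0.2, "s3": 0.2, "s4": 0.1, "s5": 0.1}  # assumed values
code = huffman_code(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)
print(code, avg)   # average length: 2.2 bits/symbol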
r-ary Huffman Codes
Calculate $\alpha = \frac{q - r}{r - 1}$. If $\alpha$ is a non-integer value then append
dummy symbols to the source with zero probability until
there are $q' = r + \lceil \alpha \rceil (r - 1)$ symbols
Re-order the source symbols in decreasing order of symbol
probability
Reduce the source S to $S_1$, then $S_2$ and so on by combining
the last r symbols of $S_j$ into a combined symbol and re-
ordering the new set of symbol probabilities for $S_{j+1}$ in
decreasing order. For each source keep track of the position of
the combined symbol
Terminate the source reduction when a source with exactly r
symbols is produced. For a source with $q'$ symbols the reduced
source with r symbols will be $S_{\lceil \alpha \rceil}$
r-ary Huffman Codes
Assign a compact r-ary code for the final reduced source.
For a source with r symbols the trivial code is $\{0, 1, \ldots, r - 1\}$
Backtrack to the original source S assigning a compact code
to the j-th reduced source $S_j$. The compact code assigned
to S, minus the code words assigned to any dummy
symbols, is the r-ary Huffman code
r-ary Huffman Codes
Example: we want to design a compact quaternary code
for a source with 11 symbols
First, we calculate $\alpha = \frac{11 - 4}{4 - 1} = 2.33$, which is not an integer
We need to append dummy symbols, so that we have a
source with $q' = 4 + \lceil 2.33 \rceil (4 - 1) = 13$ symbols
The appended symbols are $\{s_{12}, s_{13}\}$ with
$P(s_{12}) = P(s_{13}) = 0.00$
r-ary Huffman Codes
[Reduction table for the symbols $s_1$ through $s_{13}$, showing the successive quaternary source reductions]
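Since the reduction table itself is not reproduced here, the
following Python sketch (extending the binary version above) shows
the r-ary procedure with dummy-symbol padding; the dummy labels are
illustrative. With r = 4 and q = 11 it appends two dummy symbols,
matching the example.

import heapq
from math import ceil

def rary_huffman(probs, r):
    # Build an r-ary Huffman code, padding with zero-probability dummy
    # symbols so that every reduction can combine exactly r symbols.
    q = len(probs)
    alpha = (q - r) / (r - 1)
    n_dummy = r + ceil(alpha) * (r - 1) - q
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    for d in range(n_dummy):
        heap.append((0.0, q + d, {f"dummy{d}": ""}))
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        merged, total = {}, 0.0
        for digit in range(r):              # combine the r least probable groups
            p, _, codes = heapq.heappop(heap)
            total += p
            merged.update({s: str(digit) + c for s, c in codes.items()})
        heapq.heappush(heap, (total, counter, merged))
        counter += 1
    # Drop the code words assigned to the dummy symbols
    return {s: c for s, c in heap[0][2].items() if not s.startswith("dummy")}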
Arithmetic Coding
Problems related to Huffman coding:
The size of the Huffman code table represents an exponential
increase in memory and computational requirements as the block
length n grows
The code table needs to be transmitted to the receiver
The source statistics are assumed stationary
Encoding and decoding are performed on a per-block basis: the
code is not produced until a block of n symbols is received
One solution to the cost of using Huffman coding on increasingly
larger extensions of the source is to use arithmetic
coding
Arithmetic coding also needs source statistics
Arithmetic Coding
Consider the N-length source message $s_{i_1}, s_{i_2}, \ldots, s_{i_N}$,
where $\{s_i : i = 1, 2, \ldots, q\}$ are the source symbols and $s_{i_j}$
indicates that the j-th character in the message is the
source symbol $s_i$
Arithmetic coding assumes that $P(s_{i_j} \mid s_{i_1}, s_{i_2}, \ldots, s_{i_{j-1}})$
for $j = 1, 2, \ldots, N$ can be calculated. Recall the Markov
model!
The goal of this coding is to assign a unique interval along
the unit number line of length equal to the probability of
the given source message, with its position on the
number line given by the cumulative probability of the given
source message, $\mathrm{Cum}(s_{i_1}, s_{i_2}, \ldots, s_{i_N})$
Arithmetic Coding
The basic operation of arithmetic coding is to produce
this unique interval by starting with the interval [0,1) and
iteratively subdividing it by $P(s_{i_j} \mid s_{i_1}, s_{i_2}, \ldots, s_{i_{j-1}})$
for $j = 1, 2, \ldots, N$
Consider the first letter of the message, namely $s_{i_1}$
The individual symbols $s_i$ are each assigned the interval
$[lb_i, hb_i)$, where
$hb_i = \mathrm{Cum}(s_i) = \sum_{k=1}^{i} P(s_k)$
and
$lb_i = \mathrm{Cum}(s_i) - P(s_i) = \sum_{k=1}^{i-1} P(s_k)$
Arithmetic Coding
The length of each interval is $hb_i - lb_i = P(s_i)$ and the end of
the interval is given by $\mathrm{Cum}(s_i)$
The interval corresponding to the symbol $s_{i_1}$ is then selected
Next we consider the second letter of the message, $s_{i_2}$
The individual symbols are now assigned the interval $[lb_i, hb_i)$,
where:
$hb_i = lb_{i_1} + \mathrm{Cum}(s_i \mid s_{i_1}) \cdot P(s_{i_1}) = lb_{i_1} + \left\{ \sum_{k=1}^{i} P(s_k \mid s_{i_1}) \right\} \cdot R_1$
$lb_i = lb_{i_1} + \left\{ \mathrm{Cum}(s_i \mid s_{i_1}) - P(s_i \mid s_{i_1}) \right\} \cdot P(s_{i_1}) = lb_{i_1} + \left\{ \sum_{k=1}^{i-1} P(s_k \mid s_{i_1}) \right\} \cdot R_1$
where $R_1 = hb_{i_1} - lb_{i_1} = P(s_{i_1})$
Arithmetic Coding
The length of each interval is $hb_i - lb_i = P(s_i \mid s_{i_1}) \cdot P(s_{i_1})$
The interval corresponding to the symbol $s_{i_2}$, that is
$[lb_{i_2}, hb_{i_2})$, is then selected
The length of the interval corresponding to the message
seen so far, $s_{i_1}, s_{i_2}$, is
$hb_{i_2} - lb_{i_2} = P(s_{i_2} \mid s_{i_1}) \cdot P(s_{i_1}) = P(s_{i_1}, s_{i_2})$
Arithmetic Coding
Example: consider the message $s_2, s_2, s_1$ originating from
the 3-symbol source with the following individual and
cumulative probabilities
We assume that the source is zero-memory, such that
$P(s_{i_j} \mid s_{i_1}, s_{i_2}, \ldots, s_{i_{j-1}}) = P(s_{i_j})$
Initially, the probability line [0,1) is divided into three
intervals: [0,0.2), [0.2,0.7), and [0.7,1.0), corresponding
to $s_1, s_2, s_3$ with the length ratios 0.2 : 0.5 : 0.3
Arithmetic Coding
The first letter of the message is $s_2$, so the interval
[0.2,0.7) is selected
The interval [0.2,0.7) is divided into three subintervals of
length ratios 0.2 : 0.5 : 0.3, that is [0.2,0.3), [0.3,0.55), and
[0.55,0.7)
When the second letter $s_2$ is received the subinterval
[0.3,0.55) is selected
The interval [0.3,0.55) is subdivided into [0.3,0.35),
[0.35,0.475), and [0.475,0.55) with length ratios 0.2 : 0.5 :
0.3
Arithmetic Coding
When the last letter of the message, $s_1$, is received, the
corresponding interval [0.3,0.35) of length
$P(s_2, s_2, s_1) = 0.05$ is selected
The final interval can be written, in binary, as
[0.01001100..., 0.01011001...)
We need to select a number of significant bits that is able
to represent it.
This should be about
$-\log_2 P(s_2, s_2, s_1) = -\log_2 0.05 = 4.32 \approx 4$ bits
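The subdivision just described fits in a few lines of code. Below is
a minimal Python sketch for this zero-memory example; the function
name encode_interval and the symbol labels are illustrative
assumptions, not from the slides.

probs = {"s1": 0.2, "s2": 0.5, "s3": 0.3}   # P(s1), P(s2), P(s3) from the example

def encode_interval(message, probs):
    # Return the final [low, high) interval for a symbol sequence.
    low, high = 0.0, 1.0
    for sym in message:
        width = high - low
        cum = 0.0
        for s, p in probs.items():           # walk the cumulative probability scale
            if s == sym:
                high = low + (cum + p) * width   # new upper bound (uses old low)
                low = low + cum * width          # new lower bound
                break
            cum += p
    return low, high

low, high = encode_interval(["s2", "s2", "s1"], probs)
print(low, high)   # 0.3 0.35 (up to floating-point rounding)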
Arithmetic Coding
[Graphical interpretation of the previous example: the nested intervals [0,1) → [0.2,0.7) → [0.3,0.55) → [0.3,0.35)]
Arithmetic Coding
How does one select the number that falls within the
final interval so that it can be transmitted in the least
number of bits?
Let [low,high) denote the final interval
Since low < high, at the first place where they differ there will
be a 0 in the expansion for low and a 1 in the expansion for high:
$low = 0.a_1 a_2 \cdots a_{t-1} 0 \ldots$
$high = 0.a_1 a_2 \cdots a_{t-1} 1 \ldots$
The number $0.a_1 a_2 \cdots a_{t-1} 1$ is selected and transmitted
as the t-bit code sequence $a_1 a_2 \cdots a_{t-1} 1$
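A small Python sketch of this selection rule, assuming
0 < low < high < 1 and, as the slides implicitly do, that the number
$0.a_1 a_2 \cdots a_{t-1} 1$ still lies below high:

def shortest_tag(low, high):
    # Emit the shared binary prefix of low and high, then stop with a final 1
    # at the first bit position where the expansions differ (0 in low, 1 in high).
    bits = []
    while True:
        low, high = low * 2, high * 2
        bl, bh = int(low), int(high)   # next binary digit of each endpoint
        if bl == bh:
            bits.append(str(bl))       # shared prefix bit
            low, high = low - bl, high - bh
        else:                          # here low continues with 0 and high with 1
            bits.append("1")
            return "".join(bits)

print(shortest_tag(0.3, 0.35))   # '0101', i.e. 0.0101 = 0.3125, inside [0.3, 0.35)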
Encoding Algorithm
[Pseudocode for the arithmetic encoding algorithm, shown on the slides]
Decoding Algorithm
[Pseudocode for the arithmetic decoding algorithm, shown on the slides]
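Since the slides' pseudocode is not reproduced in this text, here is
a minimal zero-memory decoder sketch in Python, mirroring the encoder
sketch given earlier; the function name decode_interval, the symbol
labels, and the fixed symbol count are illustrative assumptions.

def decode_interval(value, probs, n_symbols):
    # Recover n_symbols symbols from a number lying in the final interval.
    out = []
    for _ in range(n_symbols):
        cum = 0.0
        for s, p in probs.items():          # find the subinterval holding value
            if cum <= value < cum + p:
                out.append(s)
                value = (value - cum) / p   # rescale to [0,1) for the next symbol
                break
            cum += p
    return out

print(decode_interval(0.3125, {"s1": 0.2, "s2": 0.5, "s3": 0.3}, 3))
# -> ['s2', 's2', 's1'], the message from the earlier example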
Example of Encoding
Consider this zero-memory source
Suppose that the message bad_dab. is generated, where .
is the EOF symbol
Example of Encoding
Applying the algorithm, we obtain the final interval
[0.434249583, 0.434250482)
Taking the binary representation of this interval, we transmit
the 16-bit value 0110111100101011
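As a quick sanity check (not on the slides), these bits evaluate to a
number inside the stated interval:

bits = "0110111100101011"
value = sum(int(b) / 2 ** (k + 1) for k, b in enumerate(bits))  # 0.0110111100101011 in binary
print(value)                                 # 0.4342498779296875
print(0.434249583 <= value < 0.434250482)    # True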
Example of Decoding
Suppose we want to decode the result from the last
example
Which is Better?
A question arises: which is better, arithmetic
coding or Huffman coding?
Huffman coding and arithmetic coding exhibit similar
performance in theory
Huffman coding becomes computationally prohibitive with
increasing n because the computational complexity is of the
order of $q^n$