
Review

Shannon-Fano Code

Bounds on the Optimal Code Length

Shannon Competitive Optimality

Shannon-Fano-Elias Code

Data Compression
Besma Smida
ES250: Lecture 7

Fall 2008-09

B. Smida (ES250) · Data Compression · Fall 2008-09 · 1 / 22


Today's outline

Review of Kraft Inequality and Optimal Codes

Shannon-Fano Code
Bounds on the optimal code length
Shannon Competitive Optimality
Shannon-Fano-Elias code



Kraft Inequality

Theorem: Kraft inequality
For any uniquely decodable code (including any prefix code) over an alphabet of size D, the codeword lengths l_1, l_2, ..., l_m must satisfy the inequality

    ∑_i D^{-l_i} ≤ 1.

Conversely, given a set of codeword lengths that satisfy this inequality, there exists an instantaneous (prefix) code with these word lengths.

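Both directions of the theorem are easy to check computationally. Below is an illustrative Python sketch (not from the lecture; the function names `kraft_sum` and `build_prefix_code` are made up): `kraft_sum` evaluates ∑_i D^{-l_i} exactly, and `build_prefix_code` realizes the converse for binary codes by assigning codewords in ascending length order.

```python
from fractions import Fraction

def kraft_sum(lengths, D=2):
    """Evaluate sum_i D^{-l_i} exactly; the Kraft inequality requires this <= 1."""
    return sum(Fraction(1, D**l) for l in lengths)

def build_prefix_code(lengths):
    """Construct a binary prefix code with the given codeword lengths
    (canonical construction: assign codewords in ascending length order)."""
    assert kraft_sum(lengths) <= 1, "Kraft inequality violated"
    codewords, code, prev = [], 0, 0
    for l in sorted(lengths):
        code <<= l - prev             # descend to depth l in the code tree
        codewords.append(format(code, f"0{l}b"))
        code += 1                     # next free node at this depth
        prev = l
    return codewords
```

For lengths (1, 2, 3, 3) the Kraft sum is exactly 1 and the construction returns the code {0, 10, 110, 111}; for (1, 1, 2) the sum exceeds 1 and no uniquely decodable code exists.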


Optimal Codes

Problem: We wish to find a prefix code with the shortest average code length L = E[l(X)].
This is equivalent to solving the constrained minimization problem:

    min { ∑_{k=1}^m p_k l_k  :  ∑_{k=1}^m D^{-l_k} ≤ 1,  l_k ∈ ℕ,  k = 1, ..., m }.



Solution

Using a Lagrangian argument, we obtained the minimizing lengths

    l_k* = -log_D p_k.

Note that L* = ∑_{k=1}^m p_k l_k* = H_D(X).

In general l_k* is not an integer.

Definition:
A probability distribution is called D-adic if each of the probabilities is equal to D^{-n} for some integer n.



Lower Bound

Theorem: Lower bound on codeword length
The expected length L of any instantaneous D-ary code for a random variable X is greater than or equal to the base-D entropy H_D(X):

    L ≥ H_D(X),

with equality iff D^{-l_i} = p_i for all i.

Equality therefore holds iff the distribution of X is D-adic.



Upper Bound

Theorem: Upper bound on codeword length
We exhibit a (possibly sub-optimal) prefix code whose expected length L is within one bit of the base-D entropy H_D(X):

    L < H_D(X) + 1.



Proof
Let ⌈x⌉ denote the smallest integer greater than or equal to x, and choose

    l_k = ⌈log_D (1/p_k)⌉.

This choice of code lengths satisfies the Kraft inequality:

    ∑_k D^{-⌈log_D (1/p_k)⌉} ≤ ∑_k D^{-log_D (1/p_k)} = ∑_k p_k = 1,

and since

    log_D (1/p_k) ≤ l_k < log_D (1/p_k) + 1,

taking expectations over p gives H_D(X) ≤ L < H_D(X) + 1. The result follows.

Note: this code is called the Shannon code.



Shannon Code

Round up the optimal code lengths: l_k = ⌈log_D (1/p_k)⌉.
This choice of code lengths satisfies the Kraft inequality, hence a prefix code exists.
Put the l_k into ascending order and set

    c_k = ∑_{i=1}^{k-1} D^{-l_i}    or    c_k = ∑_{i=1}^{k-1} p(x_i),

the sum of the probabilities of all symbols preceding symbol k.

Then the codeword for symbol k is the number c_k rounded off to l_k bits.

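The construction above can be sketched in a few lines of Python (illustrative code, not from the lecture; the name `shannon_code` is made up). Exact rational arithmetic avoids floating-point trouble when truncating the cumulative sums to l_k bits.

```python
from fractions import Fraction

def shannon_code(probs):
    """Binary Shannon code: sort probabilities in decreasing order (so the
    lengths come out ascending), accumulate c_k, truncate c_k to l_k bits."""
    probs = sorted((Fraction(p) for p in probs), reverse=True)
    codewords = []
    cum = Fraction(0)
    for p in probs:
        l = 0
        while Fraction(1, 2**l) > p:   # l = ceil(log2(1/p)), computed exactly
            l += 1
        bits = int(cum * 2**l)         # c_k truncated to l bits
        codewords.append(format(bits, f"0{l}b"))
        cum += p
    return codewords
```

On Example 1 below, shannon_code([0.5, 0.25, 0.125, 0.125]) reproduces the codewords 0, 10, 110, 111.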


Shannon Code Examples

Example 1:

    p(x)                  = [0.5  0.25  0.125  0.125]
    -log2 p(x)            = [1  2  3  3]
    l_x = ⌈-log2 p(x)⌉    = [1  2  3  3]
    L_C                   = 1.75 bits,  H(X) = 1.75 bits

    l_k    c_k = ∑_{i=1}^{k-1} p(x_i)    Code
    1      0.0   = 0.0_2                 0
    2      0.5   = 0.10_2                10
    3      0.75  = 0.110_2               110
    3      0.875 = 0.111_2               111


Shannon Code Examples

Example 2:

    p(x)                  = [0.99  0.01]
    -log2 p(x)            = [0.0145  6.64]
    l_x = ⌈-log2 p(x)⌉    = [1  7]
    L_C                   = 1.06 bits,  H(X) = 0.08 bits

We can make the H(X) + 1 bound tighter by encoding longer blocks of source symbols as a single super-symbol.



Shannon-Fano Code

Put the probabilities in decreasing order.
Split as close to 50-50 as possible; repeat with each half.

H(X) = 2.81 bits and L_SF = 2.89 bits. Always H(X) ≤ L_SF < H(X) + 1.
Intuitively natural but not optimal.

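The split-and-recurse procedure above can be sketched as follows (illustrative code, not from the lecture; the name `shannon_fano` is made up). On a dyadic distribution every split is exactly 50-50 and the resulting code meets the entropy.

```python
def shannon_fano(probs):
    """Shannon-Fano code: sort probabilities in decreasing order, split each
    group as close to 50-50 as possible, prepend 0/1, and recurse."""
    def split(group):
        if len(group) == 1:
            return [""]
        total = sum(group)
        best_i, best_diff, run = 1, float("inf"), 0.0
        for i in range(1, len(group)):        # find the most balanced split point
            run += group[i - 1]
            diff = abs(run - (total - run))
            if diff < best_diff:
                best_diff, best_i = diff, i
        left = split(group[:best_i])
        right = split(group[best_i:])
        return ["0" + c for c in left] + ["1" + c for c in right]
    return split(sorted(probs, reverse=True))
```

For the dyadic distribution (0.5, 0.25, 0.125, 0.125) this returns 0, 10, 110, 111, with average length equal to H(X) = 1.75 bits.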


Bounds on the Optimal Code Length

Theorem: Optimal expected codeword length
Let l_1*, l_2*, ..., l_m* be optimal codeword lengths for a source distribution p and a D-ary alphabet, and let L* = ∑_i p_i l_i* be the associated expected length of an optimal code. Then

    H_D(X) ≤ L* < H_D(X) + 1.



Block coding

Consider sending a sequence of n symbols drawn i.i.d. according to p(x) in a block, so that we have a super-symbol from X^n.
Let L_n be the expected codeword length per input symbol:

    L_n := (1/n) E[l(X_1, X_2, ..., X_n)].

Applying the entropy bounds to the block gives

    H(X_1, ..., X_n)/n ≤ L_n < (H(X_1, ..., X_n) + 1)/n,

and since the symbols are i.i.d., H(X_1, ..., X_n) = n H(X), so

    H(X) ≤ L_n < H(X) + 1/n.

Then by letting the block length n become large, we may achieve an expected length per symbol L_n arbitrarily close to the entropy.

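The squeeze H(X) ≤ L_n < H(X) + 1/n can be checked exactly for a small source (an illustrative sketch, not from the lecture; the binary source with p = (0.9, 0.1) and the function names are made up). We apply Shannon code lengths to each n-block and compute L_n in exact arithmetic.

```python
from fractions import Fraction
from math import comb, log2

def ceil_log2_inv(p):
    """Smallest integer l with 2^{-l} <= p, i.e. ceil(log2(1/p)), exactly."""
    l = 0
    while Fraction(1, 2**l) > p:
        l += 1
    return l

def block_length_per_symbol(p1, n):
    """L_n for blocks of n i.i.d. symbols from a binary source (p1, 1 - p1)."""
    p1 = Fraction(p1)
    total = Fraction(0)
    for k in range(n + 1):                    # k = number of high-probability symbols
        prob = p1**k * (1 - p1)**(n - k)      # probability of one such block
        total += comb(n, k) * prob * ceil_log2_inv(prob)
    return total / n                          # expected bits per source symbol

H = -(0.9 * log2(0.9) + 0.1 * log2(0.1))      # entropy of the source, about 0.469 bits
for n in (1, 2, 5, 10):
    Ln = block_length_per_symbol(Fraction(9, 10), n)
    print(n, float(Ln))                       # squeezed between H and H + 1/n
```

For this source L_1 = 1.3 bits, far above H; by n = 10 the overhead is already below 0.1 bit per symbol.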


Stationary stochastic process

Theorem: Distributing the extra overhead bit
The minimum expected codeword length per symbol satisfies

    H(X_1, ..., X_n)/n ≤ L_n* < H(X_1, ..., X_n)/n + 1/n.

Moreover, if X_1, X_2, ... is a stationary stochastic process, then L_n* → H(𝒳), where H(𝒳) is the entropy rate of the process.

This theorem confirms that the entropy rate of a stationary stochastic process is indeed the minimum expected number of bits per symbol needed to describe the process.



Wrong distribution

If we design a code for the wrong input distribution, then the increase in expected description length is given exactly by the relative entropy:

Theorem: Wrong code
The expected length under p(x) of the code assignment l(x) = ⌈log (1/q(x))⌉ satisfies

    H(p) + D(p‖q) ≤ E_p[l(X)] < H(p) + D(p‖q) + 1.

If you use the wrong distribution, the penalty is D(p‖q).

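A small numerical illustration of the theorem above (an illustrative sketch; the two distributions are made up). We design code lengths for q while the true distribution is p; because this q happens to be dyadic, the ceiling does nothing and the lower bound is met with equality.

```python
from math import ceil, log2

p = [0.5, 0.25, 0.125, 0.125]   # true source distribution
q = [0.25, 0.25, 0.25, 0.25]    # assumed (wrong) distribution

lengths = [ceil(log2(1 / qi)) for qi in q]              # code designed for q
expected_len = sum(pi * li for pi, li in zip(p, lengths))
H_p = sum(pi * log2(1 / pi) for pi in p)                # entropy of p
D_pq = sum(pi * log2(pi / qi) for pi, qi in zip(p, q))  # relative entropy D(p||q)

# theorem: H(p) + D(p||q) <= E_p[l(X)] < H(p) + D(p||q) + 1
print(expected_len, H_p + D_pq, H_p + D_pq + 1)
```

Here H(p) = 1.75 bits and D(p‖q) = 0.25 bits, so coding for the uniform q costs exactly the 0.25-bit penalty.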


Proof

We have

    E_p[l(X)] = ∑_{x∈X} p(x) ⌈log (1/q(x))⌉
              < ∑_{x∈X} p(x) (log (1/q(x)) + 1)
              = ∑_{x∈X} p(x) (log (p(x)/q(x)) + log (1/p(x))) + 1
              = D(p‖q) + H(p) + 1.

The lower bound is derived similarly.



Shannon Competitive Optimality

Theorem:
Let l(x) be the codeword lengths associated with the Shannon code, and let l′(x) be the codeword lengths associated with any other uniquely decodable code. Then

    Pr(l(X) ≥ l′(X) + c) ≤ 1/2^{c-1}.

Proof:

    Pr(l(X) ≥ l′(X) + c) = Pr(⌈log (1/p(X))⌉ ≥ l′(X) + c)
                         ≤ Pr(log (1/p(X)) ≥ l′(X) + c − 1)
                         = Pr(p(X) ≤ 2^{−l′(X)−c+1})
                         = ∑_{x: p(x) ≤ 2^{−l′(x)−c+1}} p(x)
                         ≤ ∑_x 2^{−l′(x)−c+1}
                         = 2^{−(c−1)} ∑_x 2^{−l′(x)} ≤ 2^{−(c−1)},

where the last step uses the Kraft inequality for the uniquely decodable code l′.

No other code can do much better than the Shannon code most of the time.
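As a concrete check of the bound above, one can let the optimal Huffman lengths play the role of the competing code l′ (an illustrative sketch, not from the lecture; the distribution and the helper name `huffman_lengths` are made up).

```python
import heapq
from math import ceil, log2

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, ids1 = heapq.heappop(heap)   # merge the two least probable subtrees
        p2, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:
            lengths[i] += 1              # every symbol in the merge gets one bit deeper
        heapq.heappush(heap, (p1 + p2, ids1 + ids2))
    return lengths

p = [0.45, 0.25, 0.15, 0.1, 0.05]            # made-up source distribution
shannon = [ceil(log2(1 / pi)) for pi in p]   # Shannon code lengths l(x)
huff = huffman_lengths(p)                    # competing lengths l'(x)
for c in (1, 2, 3):
    prob = sum(pi for pi, l, lp in zip(p, shannon, huff) if l >= lp + c)
    assert prob <= 1 / 2 ** (c - 1)          # Pr(l >= l' + c) <= 2^{-(c-1)}
```

Even against the optimal code, the probability that the Shannon code loses by c or more bits decays as 2^{-(c-1)}.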


Dyadic Competitive Optimality

Theorem:
For a dyadic probability mass function p(x), let l(x) be the word lengths of the binary Shannon code for the source (so p(x) = 2^{−l(x)}), and let l′(x) be the lengths of any other uniquely decodable binary code for the source. Then

    Pr(l(X) < l′(X)) ≥ Pr(l(X) > l′(X)),

with equality if and only if l′(x) = l(x) for all x.

Proof: Note that sgn(t) ≤ 2^t − 1 for any integer t. Then

    Pr(l(X) > l′(X)) − Pr(l(X) < l′(X)) = ∑_x p(x) sgn(l(x) − l′(x))
                                        ≤ ∑_x p(x) (2^{l(x)−l′(x)} − 1)
                                        = ∑_x 2^{−l(x)} 2^{l(x)−l′(x)} − ∑_x 2^{−l(x)}
                                        = ∑_x 2^{−l′(x)} − 1
                                        ≤ 1 − 1 = 0.


Shannon-Fano-Elias Code

Shannon-Fano-Elias coding makes direct use of the cumulative distribution function (cdf) F(x) to assign codewords.
By using the midpoint

    F̄(x) = ∑_{t<x} p(t) + (1/2) p(x)

of each jump in the cdf, we derive a prefix-free code C.
The codeword of x is obtained by rounding off F̄(x) to l(x) bits.
We show it is sufficient to choose l(x) = ⌈log (1/p(x))⌉ + 1.



Shannon-Fano-Elias Code

Let ⌊F̄(x)⌋_{l(x)} denote F̄(x) truncated to l(x) bits. By definition

    F̄(x) − ⌊F̄(x)⌋_{l(x)} < 2^{−l(x)}.

We also have

    2^{−l(x)} = 2^{−⌈log (1/p(x))⌉ − 1} ≤ p(x)/2,

so

    F̄(x) − ⌊F̄(x)⌋_{l(x)} < p(x)/2.

Therefore ⌊F̄(x)⌋_{l(x)} lies within the step of the cdf corresponding to x. The sets

    [ ⌊F̄(x)⌋_{l(x)},  ⌊F̄(x)⌋_{l(x)} + 2^{−l(x)} )

are disjoint. Thus l(x) = ⌈log (1/p(x))⌉ + 1 bits suffice to describe x and guarantee the prefix code.

Finally, note that

    L = ∑_{x∈X} p(x) l(x) = ∑_{x∈X} p(x) (⌈log (1/p(x))⌉ + 1) < H(X) + 2.
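The construction can be sketched as follows (illustrative code, not from the lecture; the name `sfe_code` is made up). Exact rational arithmetic keeps the truncation of F̄(x) faithful to the analysis above.

```python
from fractions import Fraction

def sfe_code(probs):
    """Shannon-Fano-Elias code: codeword of x is the midpoint Fbar(x)
    truncated to l(x) = ceil(log2(1/p(x))) + 1 bits."""
    probs = [Fraction(p) for p in probs]
    codewords = []
    F = Fraction(0)                      # cumulative probability sum_{t<x} p(t)
    for p in probs:
        Fbar = F + p / 2                 # midpoint of the jump at x
        l = 1
        while Fraction(1, 2**(l - 1)) > p:   # l = ceil(log2(1/p)) + 1, exactly
            l += 1
        bits = int(Fbar * 2**l)          # truncate Fbar to l bits
        codewords.append(format(bits, f"0{l}b"))
        F += p
    return codewords
```

For the distribution (0.5, 0.25, 0.125, 0.125) this yields the prefix-free code 01, 101, 1101, 1111, with average length 2.75 < H(X) + 2 = 3.75 bits.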


Shannon-Fano-Elias Code: Example

