
Review

Shannon-Fano Code

Bounds on the Optimal Code Length

Shannon Competitive Optimality

Shannon-Fano-Elias Code

Data Compression
Besma Smida
ES250: Lecture 7

Fall 2008-09

B. Smida (ES250) · Data Compression · Fall 2008-09 · 1 / 22


Today's outline

Review of Kraft Inequality and Optimal Codes

Shannon-Fano Code
Bounds on the optimal code length
Shannon Competitive Optimality
Shannon-Fano-Elias code



Kraft Inequality

Theorem: Kraft inequality
For any uniquely decodable code (including any prefix code) over an alphabet of size D, the codeword lengths l_1, l_2, ..., l_m must satisfy the inequality

    ∑_i D^{-l_i} ≤ 1.

Conversely, given a set of codeword lengths that satisfy this inequality, there exists an instantaneous (prefix) code with these word lengths.

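Both directions of the theorem are easy to check computationally. Below is an illustrative Python sketch (not from the lecture; the function names `kraft_sum` and `build_prefix_code` are made up): `kraft_sum` evaluates ∑_i D^{-l_i} exactly, and `build_prefix_code` realizes the converse for binary codes by assigning codewords in ascending length order.

```python
from fractions import Fraction

def kraft_sum(lengths, D=2):
    """Evaluate sum_i D^{-l_i} exactly; the Kraft inequality requires this <= 1."""
    return sum(Fraction(1, D**l) for l in lengths)

def build_prefix_code(lengths):
    """Construct a binary prefix code with the given codeword lengths
    (canonical construction: assign codewords in ascending length order)."""
    assert kraft_sum(lengths) <= 1, "Kraft inequality violated"
    codewords, code, prev = [], 0, 0
    for l in sorted(lengths):
        code <<= l - prev             # descend to depth l in the code tree
        codewords.append(format(code, f"0{l}b"))
        code += 1                     # next free node at this depth
        prev = l
    return codewords
```

For lengths (1, 2, 3, 3) the Kraft sum is exactly 1 and the construction returns the code {0, 10, 110, 111}; for (1, 1, 2) the sum exceeds 1 and no uniquely decodable code exists.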


Optimal Codes

Problem: We wish to find a prefix code with the shortest average code length L = E[l(X)].
This is equivalent to solving the constrained minimization problem:

    min { ∑_{k=1}^m p_k l_k  :  ∑_{k=1}^m D^{-l_k} ≤ 1,  l_k ∈ ℕ,  k = 1, ..., m }.



Solution

Using a Lagrangian argument, we obtained the minimizing lengths

    l_k* = -log_D p_k.

Note that L* = ∑_{k=1}^m p_k l_k* = H_D(X).

In general l_k* is not an integer.

Definition:
A probability distribution is called D-adic if each of the probabilities is equal to D^{-n} for some integer n.



Lower Bound

Theorem: Lower bound on codeword length
The expected length L of any instantaneous D-ary code for a random variable X is greater than or equal to the base-D entropy H_D(X):

    L ≥ H_D(X),

with equality iff D^{-l_i} = p_i for all i.

Equality therefore holds iff the distribution of X is D-adic.



Upper Bound

Theorem: Upper bound on codeword length
We exhibit a (possibly sub-optimal) prefix code whose expected length L is within one bit of the base-D entropy H_D(X):

    L < H_D(X) + 1.



Proof
Let ⌈x⌉ denote the smallest integer greater than or equal to x, and choose

    l_k = ⌈log_D (1/p_k)⌉.

This choice of code lengths satisfies the Kraft inequality:

    ∑_k D^{-⌈log_D (1/p_k)⌉} ≤ ∑_k D^{-log_D (1/p_k)} = ∑_k p_k = 1,

and since

    log_D (1/p_k) ≤ l_k < log_D (1/p_k) + 1,

taking expectations over p gives H_D(X) ≤ L < H_D(X) + 1. The result follows.

Note: this code is called the Shannon code.



Shannon Code

Round up the optimal code lengths: l_k = ⌈log_D (1/p_k)⌉.
This choice of code lengths satisfies the Kraft inequality, hence a prefix code exists.
Put the l_k into ascending order and set

    c_k = ∑_{i=1}^{k-1} D^{-l_i}    or    c_k = ∑_{i=1}^{k-1} p(x_i),

the sum of the probabilities of all symbols preceding symbol k.

Then the codeword for symbol k is the number c_k rounded off to l_k bits.

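The construction above can be sketched in a few lines of Python (illustrative code, not from the lecture; the name `shannon_code` is made up). Exact rational arithmetic avoids floating-point trouble when truncating the cumulative sums to l_k bits.

```python
from fractions import Fraction

def shannon_code(probs):
    """Binary Shannon code: sort probabilities in decreasing order (so the
    lengths come out ascending), accumulate c_k, truncate c_k to l_k bits."""
    probs = sorted((Fraction(p) for p in probs), reverse=True)
    codewords = []
    cum = Fraction(0)
    for p in probs:
        l = 0
        while Fraction(1, 2**l) > p:   # l = ceil(log2(1/p)), computed exactly
            l += 1
        bits = int(cum * 2**l)         # c_k truncated to l bits
        codewords.append(format(bits, f"0{l}b"))
        cum += p
    return codewords
```

On Example 1 below, shannon_code([0.5, 0.25, 0.125, 0.125]) reproduces the codewords 0, 10, 110, 111.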


Shannon Code Examples

Example 1:

    p(x)                  = [0.5  0.25  0.125  0.125]
    -log2 p(x)            = [1  2  3  3]
    l_x = ⌈-log2 p(x)⌉    = [1  2  3  3]
    L_C                   = 1.75 bits,  H(X) = 1.75 bits

    l_k    c_k = ∑_{i=1}^{k-1} p(x_i)    Code
    1      0.0   = 0.0_2                 0
    2      0.5   = 0.10_2                10
    3      0.75  = 0.110_2               110
    3      0.875 = 0.111_2               111


Shannon Code Examples

Example 2:

    p(x)                  = [0.99  0.01]
    -log2 p(x)            = [0.0145  6.64]
    l_x = ⌈-log2 p(x)⌉    = [1  7]
    L_C                   = 1.06 bits,  H(X) = 0.08 bits

We can make the H(X) + 1 bound tighter by encoding longer blocks of source symbols as a single super-symbol.



Shannon-Fano Code

Put the probabilities in decreasing order.
Split as close to 50-50 as possible; repeat with each half.

H(X) = 2.81 bits and L_SF = 2.89 bits. Always H(X) ≤ L_SF < H(X) + 1.
Intuitively natural but not optimal.

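The split-and-recurse procedure above can be sketched as follows (illustrative code, not from the lecture; the name `shannon_fano` is made up). On a dyadic distribution every split is exactly 50-50 and the resulting code meets the entropy.

```python
def shannon_fano(probs):
    """Shannon-Fano code: sort probabilities in decreasing order, split each
    group as close to 50-50 as possible, prepend 0/1, and recurse."""
    def split(group):
        if len(group) == 1:
            return [""]
        total = sum(group)
        best_i, best_diff, run = 1, float("inf"), 0.0
        for i in range(1, len(group)):        # find the most balanced split point
            run += group[i - 1]
            diff = abs(run - (total - run))
            if diff < best_diff:
                best_diff, best_i = diff, i
        left = split(group[:best_i])
        right = split(group[best_i:])
        return ["0" + c for c in left] + ["1" + c for c in right]
    return split(sorted(probs, reverse=True))
```

For the dyadic distribution (0.5, 0.25, 0.125, 0.125) this returns 0, 10, 110, 111, with average length equal to H(X) = 1.75 bits.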


Bounds on the Optimal Code Length

Theorem: Optimal expected codeword length
Let l_1*, l_2*, ..., l_m* be optimal codeword lengths for a source distribution p and a D-ary alphabet, and let L* = ∑_i p_i l_i* be the associated expected length of an optimal code. Then

    H_D(X) ≤ L* < H_D(X) + 1.



Block coding

Consider sending a sequence of n symbols drawn i.i.d. according to p(x) in a block, so that we have a super-symbol from X^n.
Let L_n be the expected codeword length per input symbol:

    L_n := (1/n) E[l(X_1, X_2, ..., X_n)].

Applying the entropy bounds to the block gives

    H(X_1, ..., X_n)/n ≤ L_n < (H(X_1, ..., X_n) + 1)/n,

and since the symbols are i.i.d., H(X_1, ..., X_n) = n H(X), so

    H(X) ≤ L_n < H(X) + 1/n.

Then by letting the block length n become large, we may achieve an expected length per symbol L_n arbitrarily close to the entropy.

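The squeeze H(X) ≤ L_n < H(X) + 1/n can be checked exactly for a small source (an illustrative sketch, not from the lecture; the binary source with p = (0.9, 0.1) and the function names are made up). We apply Shannon code lengths to each n-block and compute L_n in exact arithmetic.

```python
from fractions import Fraction
from math import comb, log2

def ceil_log2_inv(p):
    """Smallest integer l with 2^{-l} <= p, i.e. ceil(log2(1/p)), exactly."""
    l = 0
    while Fraction(1, 2**l) > p:
        l += 1
    return l

def block_length_per_symbol(p1, n):
    """L_n for blocks of n i.i.d. symbols from a binary source (p1, 1 - p1)."""
    p1 = Fraction(p1)
    total = Fraction(0)
    for k in range(n + 1):                    # k = number of high-probability symbols
        prob = p1**k * (1 - p1)**(n - k)      # probability of one such block
        total += comb(n, k) * prob * ceil_log2_inv(prob)
    return total / n                          # expected bits per source symbol

H = -(0.9 * log2(0.9) + 0.1 * log2(0.1))      # entropy of the source, about 0.469 bits
for n in (1, 2, 5, 10):
    Ln = block_length_per_symbol(Fraction(9, 10), n)
    print(n, float(Ln))                       # squeezed between H and H + 1/n
```

For this source L_1 = 1.3 bits, far above H; by n = 10 the overhead is already below 0.1 bit per symbol.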


Stationary stochastic process

Theorem: Distributing the extra overhead bit
The minimum expected codeword length per symbol satisfies

    H(X_1, ..., X_n)/n ≤ L_n* < H(X_1, ..., X_n)/n + 1/n.

Moreover, if X_1, X_2, ... is a stationary stochastic process, then L_n* → H(𝒳), where H(𝒳) is the entropy rate of the process.

This theorem confirms that the entropy rate of a stationary stochastic process is indeed the minimum expected number of bits per symbol needed to describe the process.



Wrong distribution

If we design a code for the wrong input distribution, then the increase in expected description length is given exactly by the relative entropy:

Theorem: Wrong code
The expected length under p(x) of the code assignment l(x) = ⌈log (1/q(x))⌉ satisfies

    H(p) + D(p‖q) ≤ E_p[l(X)] < H(p) + D(p‖q) + 1.

If you use the wrong distribution, the penalty is D(p‖q).

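A small numerical illustration of the theorem above (an illustrative sketch; the two distributions are made up). We design code lengths for q while the true distribution is p; because this q happens to be dyadic, the ceiling does nothing and the lower bound is met with equality.

```python
from math import ceil, log2

p = [0.5, 0.25, 0.125, 0.125]   # true source distribution
q = [0.25, 0.25, 0.25, 0.25]    # assumed (wrong) distribution

lengths = [ceil(log2(1 / qi)) for qi in q]              # code designed for q
expected_len = sum(pi * li for pi, li in zip(p, lengths))
H_p = sum(pi * log2(1 / pi) for pi in p)                # entropy of p
D_pq = sum(pi * log2(pi / qi) for pi, qi in zip(p, q))  # relative entropy D(p||q)

# theorem: H(p) + D(p||q) <= E_p[l(X)] < H(p) + D(p||q) + 1
print(expected_len, H_p + D_pq, H_p + D_pq + 1)
```

Here H(p) = 1.75 bits and D(p‖q) = 0.25 bits, so coding for the uniform q costs exactly the 0.25-bit penalty.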


Proof

We have

    E_p[l(X)] = ∑_{x∈X} p(x) ⌈log (1/q(x))⌉
              < ∑_{x∈X} p(x) (log (1/q(x)) + 1)
              = ∑_{x∈X} p(x) (log (p(x)/q(x)) + log (1/p(x))) + 1
              = D(p‖q) + H(p) + 1.

The lower bound is derived similarly.



Shannon Competitive Optimality

Theorem:
Let l(x) be the codeword lengths associated with the Shannon code, and let l′(x) be the codeword lengths associated with any other uniquely decodable code. Then

    Pr(l(X) ≥ l′(X) + c) ≤ 1/2^{c-1}.

Proof:

    Pr(l(X) ≥ l′(X) + c) = Pr(⌈log (1/p(X))⌉ ≥ l′(X) + c)
                         ≤ Pr(log (1/p(X)) ≥ l′(X) + c − 1)
                         = Pr(p(X) ≤ 2^{−l′(X)−c+1})
                         = ∑_{x: p(x) ≤ 2^{−l′(x)−c+1}} p(x)
                         ≤ ∑_x 2^{−l′(x)−c+1}
                         = 2^{−(c−1)} ∑_x 2^{−l′(x)} ≤ 2^{−(c−1)},

where the last step uses the Kraft inequality for the uniquely decodable code l′.

No other code can do much better than the Shannon code most of the time.
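As a concrete check of the bound above, one can let the optimal Huffman lengths play the role of the competing code l′ (an illustrative sketch, not from the lecture; the distribution and the helper name `huffman_lengths` are made up).

```python
import heapq
from math import ceil, log2

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, ids1 = heapq.heappop(heap)   # merge the two least probable subtrees
        p2, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:
            lengths[i] += 1              # every symbol in the merge gets one bit deeper
        heapq.heappush(heap, (p1 + p2, ids1 + ids2))
    return lengths

p = [0.45, 0.25, 0.15, 0.1, 0.05]            # made-up source distribution
shannon = [ceil(log2(1 / pi)) for pi in p]   # Shannon code lengths l(x)
huff = huffman_lengths(p)                    # competing lengths l'(x)
for c in (1, 2, 3):
    prob = sum(pi for pi, l, lp in zip(p, shannon, huff) if l >= lp + c)
    assert prob <= 1 / 2 ** (c - 1)          # Pr(l >= l' + c) <= 2^{-(c-1)}
```

Even against the optimal code, the probability that the Shannon code loses by c or more bits decays as 2^{-(c-1)}.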


Dyadic Competitive Optimality

Theorem:
For a dyadic probability mass function p(x), let l(x) be the word lengths of the binary Shannon code for the source (so p(x) = 2^{−l(x)}), and let l′(x) be the lengths of any other uniquely decodable binary code for the source. Then

    Pr(l(X) < l′(X)) ≥ Pr(l(X) > l′(X)),

with equality if and only if l′(x) = l(x) for all x.

Proof: Note that sgn(t) ≤ 2^t − 1 for any integer t. Then

    Pr(l(X) > l′(X)) − Pr(l(X) < l′(X)) = ∑_x p(x) sgn(l(x) − l′(x))
                                        ≤ ∑_x p(x) (2^{l(x)−l′(x)} − 1)
                                        = ∑_x 2^{−l(x)} 2^{l(x)−l′(x)} − ∑_x 2^{−l(x)}
                                        = ∑_x 2^{−l′(x)} − 1
                                        ≤ 1 − 1 = 0.


Shannon-Fano-Elias Code

Shannon-Fano-Elias coding makes direct use of the cumulative distribution function (cdf) F(x) to assign codewords.
By using the midpoint

    F̄(x) = ∑_{t<x} p(t) + (1/2) p(x)

of each jump in the cdf, we derive a prefix-free code C.
The codeword of x is obtained by rounding off F̄(x) to l(x) bits.
We show it is sufficient to choose l(x) = ⌈log (1/p(x))⌉ + 1.



Shannon-Fano-Elias Code

Let ⌊F̄(x)⌋_{l(x)} denote F̄(x) truncated to l(x) bits. By definition

    F̄(x) − ⌊F̄(x)⌋_{l(x)} < 2^{−l(x)}.

We also have

    2^{−l(x)} = 2^{−⌈log (1/p(x))⌉ − 1} ≤ p(x)/2,

so

    F̄(x) − ⌊F̄(x)⌋_{l(x)} < p(x)/2.

Therefore ⌊F̄(x)⌋_{l(x)} lies within the step of the cdf corresponding to x. The sets

    [ ⌊F̄(x)⌋_{l(x)},  ⌊F̄(x)⌋_{l(x)} + 2^{−l(x)} )

are disjoint. Thus l(x) = ⌈log (1/p(x))⌉ + 1 bits suffice to describe x and guarantee the prefix code.

Finally, note that

    L = ∑_{x∈X} p(x) l(x) = ∑_{x∈X} p(x) (⌈log (1/p(x))⌉ + 1) < H(X) + 2.
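The construction can be sketched as follows (illustrative code, not from the lecture; the name `sfe_code` is made up). Exact rational arithmetic keeps the truncation of F̄(x) faithful to the analysis above.

```python
from fractions import Fraction

def sfe_code(probs):
    """Shannon-Fano-Elias code: codeword of x is the midpoint Fbar(x)
    truncated to l(x) = ceil(log2(1/p(x))) + 1 bits."""
    probs = [Fraction(p) for p in probs]
    codewords = []
    F = Fraction(0)                      # cumulative probability sum_{t<x} p(t)
    for p in probs:
        Fbar = F + p / 2                 # midpoint of the jump at x
        l = 1
        while Fraction(1, 2**(l - 1)) > p:   # l = ceil(log2(1/p)) + 1, exactly
            l += 1
        bits = int(Fbar * 2**l)          # truncate Fbar to l bits
        codewords.append(format(bits, f"0{l}b"))
        F += p
    return codewords
```

For the distribution (0.5, 0.25, 0.125, 0.125) this yields the prefix-free code 01, 101, 1101, 1111, with average length 2.75 < H(X) + 2 = 3.75 bits.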


Shannon-Fano-Elias Code: Example

