One-Minute Survey Result
Thank you for your responses
Kristen, Anusha, Ian, Christofer, Bernard, Greg, Michael, Shalini,
Brian and Justin
Valentine's challenge
Min: 30-45 minutes, Max: 5 hours, Ave: 2-3 hours
Muddiest points
Regular tree grammar (CS410: Compilers, or CS422: Automata)
Fractal geometry (“The fractal geometry of nature” by Mandelbrot)
Seeing the Connection
Remember the first story in Steve Jobs' speech "Staying Hungry,
Staying Foolish”?
In addition to Jobs and Shannon, I have two more examples:
Charles Darwin and Bruce Lee
EE465: Introduction to Digital Image Processing
2
Data Compression Basics
Discrete source
Information=uncertainty
Quantification of uncertainty
Source entropy
Variable length codes
Motivation
Prefix condition
Huffman coding algorithm
Information
What do we mean by information?
“A numerical measure of the uncertainty of an
experimental outcome” – Webster Dictionary
How to quantitatively measure and represent
information?
Shannon proposes a statistical-mechanics-inspired
approach
Let us first look at how we assess the amount of
information in our daily lives using common
sense
Information = Uncertainty
Zero information
Pittsburgh Steelers won Super Bowl XL (past news, no
uncertainty)
Yao Ming plays for the Houston Rockets (celebrity fact, no uncertainty)
Little information
It will be very cold in Chicago tomorrow (not much uncertainty
since this is winter time)
It is going to rain in Seattle next week (not much uncertainty since
it rains nine months a year in NW)
Large information
An earthquake is going to hit CA in July 2006 (are you sure? an
unlikely event)
Someone has shown P=NP (Wow! Really? Who did it?)
Shannon's Picture on Communication (1948)
source → source encoder → channel encoder → channel
       → channel decoder → source decoder → destination
(the channel encoder, channel, and channel decoder together
form the "super-channel")
Examples of source:
Human speeches, photos, text messages, computer programs …
Examples of channel:
storage media, telephone lines, wireless transmission …
The goal of communication is to move information
from here to there and from now to then
Source-Channel Separation Principle*
The role of source coding (data compression):
Facilitate storage and transmission by eliminating source redundancy
Our goal is to maximally remove the source redundancy
by intelligently designing the source encoder/decoder
The role of channel coding:
Fight against channel errors for reliable transmission of information
(design of channel encoder/decoder is considered in EE461)
We simply assume the super-channel achieves error-free transmission
Discrete Source
A discrete source is characterized by a discrete
random variable X
Examples
Coin flipping: P(X=H) = P(X=T) = 1/2
Dice tossing: P(X=k) = 1/6, k = 1, ..., 6
Playing-card suit drawing:
P(X=S) = P(X=H) = P(X=D) = P(X=C) = 1/4
What is the redundancy with a discrete source?
Two Extreme Cases
Case 1: tossing a fair coin (Head or Tail?) sent through
source encoder → channel → source decoder
P(X=H) = P(X=T) = 1/2: maximum uncertainty,
minimum (zero) redundancy, compression impossible
Case 2: tossing a coin with two identical sides (HHHH… or TTTT…),
where the "encoder" is mere duplication
P(X=H) = 1, P(X=T) = 0: minimum uncertainty,
maximum redundancy, compression trivial (1 bit is enough)
Redundancy is the opposite of uncertainty
Quantifying Uncertainty of an Event
Self-information: I(p) = -log2(p),
where p is the probability of the event x
(e.g., x can be X=H or X=T)
p = 1: I(p) = 0   (must happen, no uncertainty)
p = 0: I(p) = ∞   (unlikely to happen, infinite amount of uncertainty)
Intuitively, I(p) measures the amount of uncertainty with event x
Weighted Self-information
I_w(p) = -p·log2(p) = p·I(p)
p = 0:   I_w(p) = 0
p = 1/2: I_w(p) = 1/2
p = 1:   I_w(p) = 0
As p evolves from 0 to 1, weighted self-information
first increases and then decreases
Question: Which value of p maximizes I_w(p)?
Maximum of Weighted Self-information*
I_w(p) reaches its maximum at p = 1/e:
I_w(1/e) = 1/(e·ln 2) ≈ 0.531
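The claimed maximizer can be confirmed numerically; this is a minimal sketch (the function name `weighted_self_info` and the grid resolution are my own choices):

```python
import math

def weighted_self_info(p):
    """I_w(p) = -p * log2(p), one event's contribution to the entropy."""
    return -p * math.log2(p) if p > 0 else 0.0

# Scan p over (0, 1) on a fine grid; the maximum sits near p = 1/e
grid = [i / 100000 for i in range(1, 100000)]
p_star = max(grid, key=weighted_self_info)
print(p_star)                          # close to 1/e ~ 0.3679
print(weighted_self_info(1 / math.e))  # 1/(e ln 2) ~ 0.531
```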
Quantification of Uncertainty of a Discrete Source
A discrete source (random variable) is a collection
(set) of individual events whose probabilities sum to 1
X is a discrete random variable: x ∈ {1, 2, ..., N}
p_i = prob(x = i), i = 1, 2, ..., N, with Σ_{i=1}^{N} p_i = 1
To quantify the uncertainty of a discrete source,
we simply take the summation of weighted
self-information over the whole set
Shannon's Source Entropy Formula
H(X) = Σ_{i=1}^{N} I_w(p_i) = -Σ_{i=1}^{N} p_i·log2(p_i)
(bits/sample, or bps)
The probabilities p_i act as the weighting coefficients
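The entropy formula can be sanity-checked in a few lines of Python (the helper name `entropy` is mine, not part of the lecture):

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum(p_i * log2(p_i)) in bits/sample.

    Terms with p_i = 0 contribute nothing (the limit of -p log2 p is 0).
    """
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Fair coin: maximum uncertainty for a binary source
print(entropy([0.5, 0.5]))  # 1.0
```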
Source Entropy Examples
Example 1: (binary Bernoulli source)
Flipping a coin with probability of head being p (0<p<1):
p = prob(x = 0), q = 1 - p = prob(x = 1)
H(X) = -(p·log2(p) + q·log2(q))
Check the two extreme cases:
As p goes to zero, H(X) goes to 0 bps → compression gains the most
As p goes to a half, H(X) goes to 1 bps → no compression can help
Entropy of Binary Bernoulli Source
Source Entropy Examples
Example 2: (4-way random walk)
prob(x = S) = 1/2, prob(x = N) = 1/4,
prob(x = E) = prob(x = W) = 1/8
H(X) = -(1/2·log2(1/2) + 1/4·log2(1/4) + 1/8·log2(1/8) + 1/8·log2(1/8))
     = 1.75 bps
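A quick numeric check of this example (plain Python, nothing assumed beyond the probabilities above):

```python
import math

# 4-way random walk: S, N, E, W with probabilities 1/2, 1/4, 1/8, 1/8
probs = {"S": 0.5, "N": 0.25, "E": 0.125, "W": 0.125}
H = -sum(p * math.log2(p) for p in probs.values())
print(H)  # 1.75
```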
Source Entropy Examples (Con't)
Example 3:
A jar contains the same number of balls with two different colors: blue and red.
Each time a ball is randomly picked out from the jar and then put back. Consider
the event that at the kth picking, it is the first time to see a red ball – what is the
probability of such an event?
p = prob(x = red) = 1/2, 1 - p = prob(x = blue) = 1/2
Prob(event) = Prob(blue in the first k-1 picks) · Prob(red in the kth pick)
            = (1/2)^(k-1) · (1/2) = (1/2)^k
(source with geometric distribution)
Source Entropy Calculation
If we consider all possible events, the sum of their probabilities will be one.
Then we can define a discrete random variable X with
P(x = k) = (1/2)^k, k = 1, 2, ...
Check: Σ_{k=1}^{∞} (1/2)^k = 1
Entropy:
H(X) = -Σ_{k=1}^{∞} p_k·log2(p_k) = Σ_{k=1}^{∞} k·(1/2)^k = 2 bps
Problem 1 in HW3 is slightly more complex than this example
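Both the probability check and the entropy series can be verified numerically; truncating the infinite sums at k = 200 (my choice) leaves a negligible tail:

```python
# Geometric source P(x=k) = (1/2)^k: probabilities sum to 1,
# and the entropy series sum_{k>=1} k * (1/2)^k converges to 2 bps.
total = sum(0.5 ** k for k in range(1, 201))
H = sum(k * 0.5 ** k for k in range(1, 201))
print(total)  # ~1.0
print(H)      # ~2.0
```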
Properties of Source Entropy
Nonnegative and concave
Achieves the maximum when the source
observes a uniform distribution (i.e., P(x=k)=1/N,
k=1,...,N)
Goes to zero (minimum) as the source becomes
more and more skewed (i.e., P(x=k)→1 for one k,
P(x=j)→0 for all j≠k)
History of Entropy
Origin: Greek root for "transformation content"
First created by Rudolf Clausius to study
thermodynamic systems in 1862
Developed by Ludwig Eduard Boltzmann in the
1870s-1880s (the first serious attempt to
understand nature in a statistical language)
Borrowed by Shannon in his landmark work "A
Mathematical Theory of Communication" in
1948
A Little Bit of Mathematics*
Entropy S is proportional to log P (P is the
relative probability of a state)
Consider an ideal gas of N identical particles,
of which N_i are in the ith microscopic
condition (range) of position and momentum.
Using Stirling's formula, log N! ≈ N·log N - N, and
noting that p_i = N_i/N, you will get S ~ -Σ_i p_i·log p_i
Entropy-related Quotes
"My greatest concern was what to call it. I thought of calling it
'information', but the word was overly used, so I decided to call it
'uncertainty'. When I discussed it with John von Neumann, he had a
better idea. Von Neumann told me, 'You should call it entropy, for two
reasons. In the first place your uncertainty function has been used in
statistical mechanics under that name, so it already has a name. In the
second place, and more important, nobody knows what entropy really
is, so in a debate you will always have the advantage.'"
– Conversation between Claude Shannon and John von Neumann regarding
what name to give to the "measure of uncertainty" or attenuation in
phone-line signals (1949)
Other Uses of Entropy
In biology
“the order produced within cells as they grow and
divide is more than compensated for by the
disorder they create in their surroundings in the
course of growth and division.” – A. Lehninger
Ecological entropy is a measure of biodiversity in
the study of biological ecology.
In cosmology
“black holes have the maximum possible entropy of
any object of equal size” – Stephen Hawking
What is the use of H(X)?
Shannon's first theorem (noiseless coding theorem)
For a memoryless discrete source X, its entropy H(X)
defines the minimum average code length required to
noiselessly code the source.
Notes:
1. Memoryless means that the events are independently
generated (e.g., the outcomes of flipping a coin N times
are independent events)
2. Source redundancy can be then understood as the
difference between raw data rate and source entropy
Code Redundancy*
Theoretical bound (source entropy):
H(X) = Σ_{i=1}^{N} p_i·log2(1/p_i)
Practical performance (average code length):
l = Σ_{i=1}^{N} p_i·l_i
where l_i is the length of the codeword assigned to the ith symbol
Code redundancy: r = l - H(X) ≥ 0
Note: if we represent each symbol by q bits (fixed-length codes),
then the redundancy is simply q - H(X) bps
How to achieve source entropy?
discrete source X → entropy coding → binary bit stream
(entropy coding exploits knowledge of P(X))
Note: The above entropy coding problem is based on the simplifying
assumptions that the discrete source X is memoryless and that P(X)
is completely known. Those assumptions often do not hold for
real-world data such as images, and we will recheck them later.
Data Compression Basics
Discrete source
Information=uncertainty
Quantification of uncertainty
Source entropy
Variable length codes
Motivation
Prefix condition
Huffman coding algorithm
Recall:
Variable Length Codes (VLC)
Self-information: I(p) = -log2(p)
Assign a long codeword to an event with small probability
Assign a short codeword to an event with large probability
It follows from the above formula that a small-probability event contains
much information and is therefore worth many bits to represent. Conversely,
if some event frequently occurs, it is probably a good idea to use as few bits
as possible to represent it. Such an observation leads to the idea of varying
the code lengths based on the events' probabilities:
l(x) = ⌈-log2 p(x)⌉
4-way Random Walk Example
symbol k   p_k     fixed-length codeword   variable-length codeword
S          0.5     00                      0
N          0.25    01                      10
E          0.125   10                      110
W          0.125   11                      111
symbol stream:
S S N W S E N N W S S S N E S S
fixed length (32 bits):
00 00 01 11 00 10 01 01 11 00 00 00 01 10 00 00
variable length (28 bits):
0 0 10 111 0 110 10 10 111 0 0 0 10 110 0 0
4 bits savings achieved by VLC (redundancy eliminated)
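The bit counts above can be reproduced directly; this sketch encodes the same symbol stream under both codes (the dictionary names are mine):

```python
# Codeword tables from the 4-way random walk example
fixed = {"S": "00", "N": "01", "E": "10", "W": "11"}
vlc = {"S": "0", "N": "10", "E": "110", "W": "111"}

stream = "S S N W S E N N W S S S N E S S".split()

fixed_bits = "".join(fixed[s] for s in stream)
vlc_bits = "".join(vlc[s] for s in stream)
print(len(fixed_bits), len(vlc_bits))  # 32 28
```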
Toy Example (Con't)
• average code length:
l = (total number of bits) / (total number of symbols)  (bps)
fixed-length:    l = 2 bps > H(X)
variable-length: l = 0.5×1 + 0.25×2 + 0.125×3 + 0.125×3
                   = 1.75 bits/symbol
• source entropy:
H(X) = -Σ_{k=1}^{4} p_k·log2(p_k) = 1.75 bps
The VLC thus achieves the entropy: l = 1.75 bps = H(X)
Problems with VLC
When codewords have fixed lengths, the
boundary of codewords is always identifiable.
For codewords with variable lengths, their
boundary could become ambiguous.
symbol   VLC
S        0
W        1
N        10
E        11
The same bit stream admits multiple parsings, e.g.:
0 0 1 11 0 10 …  →  S S W E S N …
0 0 11 1 0 10 …  →  S S E W S N …
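The ambiguity can be exhibited mechanically; this sketch enumerates every valid segmentation of a bit string (the function `parses` is my own illustration, not course material):

```python
def parses(bits, codebook):
    """All ways to segment a bit string into codewords (empty if none)."""
    if not bits:
        return [[]]
    out = []
    for sym, cw in codebook.items():
        if bits.startswith(cw):
            out += [[sym] + rest for rest in parses(bits[len(cw):], codebook)]
    return out

# The ambiguous code {S:0, W:1, N:10, E:11}: several decodings coexist
ambiguous = {"S": "0", "W": "1", "N": "10", "E": "11"}
print(len(parses("0011010", ambiguous)) > 1)  # True
```

Running the same enumeration on a prefix-free code always yields at most one parse, which is exactly the point of the prefix condition introduced next.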
Uniquely Decodable Codes
To avoid the ambiguity in decoding, we need to
enforce certain conditions with VLC to make
them uniquely decodable
Since ambiguity arises when some codeword
becomes the prefix of another, it is natural to
consider the prefix condition
Example: (a ≺ b: a is a prefix of b)
p ≺ pr ≺ pre ≺ pref ≺ prefi ≺ prefix
Prefix condition
No codeword is allowed to
be the prefix of any other
codeword.
We will graphically illustrate this condition
with the aid of a binary codeword tree
Binary Codeword Tree
root
Level 1: 0, 1               (2 codewords)
Level 2: 00, 01, 10, 11     (2^2 codewords)
…
Level k:                    (2^k codewords)
Prefix Condition Examples
symbol x   codeword 1   codeword 2
W          0            0
E          1            10
S          10           110
N          11           111
Codeword set 1 violates the prefix condition: 1 ≺ 10 and 1 ≺ 11.
Codeword set 2 satisfies it: on the binary codeword tree, every
codeword is a leaf (no codeword is an ancestor of another).
How to satisfy prefix condition?
Basic rule: If a node is used as a codeword,
then all its descendants cannot be used as
codewords.
Example: picking 0, 10, 110, 111 obeys the rule: once 0 is a
codeword, no codeword may start with 0; once 10 is a codeword,
no codeword may start with 10; and so on.
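The basic rule is easy to check programmatically; this small helper (my own naming) tests whether a set of codewords satisfies the prefix condition:

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another (the prefix condition)."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))  # True
print(is_prefix_free(["0", "1", "10", "11"]))     # False ("1" prefixes "10")
```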
Property of Prefix Codes
Kraft's inequality:
Σ_{i=1}^{N} 2^(-l_i) ≤ 1
where l_i is the length of the ith codeword (proof skipped)
Example
symbol x   VLC 1   VLC 2
W          0       0
E          1       10
S          10      110
N          11      111
VLC 1: Σ_{i=1}^{4} 2^(-l_i) = 1/2 + 1/2 + 1/4 + 1/4 = 1.5 > 1
VLC 2: Σ_{i=1}^{4} 2^(-l_i) = 1/2 + 1/4 + 1/8 + 1/8 = 1
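The two Kraft sums in the example are one line of Python each (the helper name `kraft_sum` is mine):

```python
def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality: sum of 2^(-l_i)."""
    return sum(2.0 ** -l for l in lengths)

print(kraft_sum([1, 1, 2, 2]))  # 1.5 (VLC 1: violates Kraft's inequality)
print(kraft_sum([1, 2, 3, 3]))  # 1.0 (VLC 2: satisfies it with equality)
```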
Two Goals of VLC design
• achieve optimal code length (i.e., minimal redundancy)
• satisfy prefix condition
For an event x with probability p(x), the optimal
code length is ⌈-log2 p(x)⌉, where ⌈x⌉ denotes the
smallest integer no less than x (e.g., ⌈3.4⌉ = 4)
code redundancy: r = l - H(X) ≥ 0
Unless the probabilities of events are all powers of 2,
we often have r > 0
Solution:
Huffman Coding (Huffman, 1952) –
we will cover it later while studying JPEG
Arithmetic Coding (1980s) –
not covered by EE465 but by EE565 (F2008)
Golomb Codes for Geometric Distribution
Optimal VLC for a geometric source: P(X=k) = (1/2)^k, k = 1, 2, …
k    codeword
1    0
2    10
3    110
4    1110
5    11110
6    111110
7    1111110
8    11111110
…    …
(each codeword is a leaf of the binary codeword tree:
k-1 ones followed by a single zero)
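The table above is the unary code, and its optimality for this source can be checked numerically: the expected code length equals the source entropy, 2 bps. A minimal sketch (function names mine; the truncation at k = 200 is my choice):

```python
def unary_encode(k):
    """Codeword for k = 1, 2, ...: (k-1) ones followed by a single zero."""
    return "1" * (k - 1) + "0"

def unary_decode(bits):
    """Split a bitstream back into symbols (each '0' ends a codeword)."""
    out, run = [], 0
    for b in bits:
        if b == "1":
            run += 1
        else:
            out.append(run + 1)
            run = 0
    return out

# Expected code length under P(X=k) = (1/2)^k approaches the entropy, 2 bps
avg_len = sum(0.5 ** k * len(unary_encode(k)) for k in range(1, 201))
print(avg_len)                                          # ~2.0
print(unary_decode(unary_encode(3) + unary_encode(1)))  # [3, 1]
```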
Summary of Data Compression Basics
Shannon's source entropy formula (theory)
H(X) = -Σ_{i=1}^{N} p_i·log2(p_i)  (bps)
Entropy (uncertainty) is quantified by weighted
self-information
VLC thumb rule (practice)
Long codeword ↔ small-probability event
Short codeword ↔ large-probability event
l(x) = ⌈-log2 p(x)⌉