In this Lecture
We will go through:
– Entropy and some related properties
– Source Coding
• Shannon’s Source Coding Theorem
Information Theory
Example:
– binary source: x ∈ {0, 1} with P(x = 0) = p, P(x = 1) = 1 − p
– M-ary source: x ∈ {1, 2, …, M} with Σᵢ Pᵢ = 1
Information Source
Discrete finite ensemble:
a, b, c, d → 00, 01, 10, 11
In general: k binary digits specify 2^k messages
M messages need log₂(M) bits.
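A minimal Python sketch of this count (the helper name bits_needed is my own, not from the lecture):

```python
import math

def bits_needed(M: int) -> int:
    """Minimum number of whole bits needed to index M distinct messages."""
    return math.ceil(math.log2(M))

print(bits_needed(4))   # 2 -> a, b, c, d map to 00, 01, 10, 11
print(bits_needed(26))  # 5 -> a 26-letter alphabet needs 5 bits
```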
Minimum number of bits for a source
What is the minimum number of bits/symbol required to
communicate an information source having n symbols?
Minimum number of bits for a source
• Let there be a source X that wants to communicate information
of its direction to a destination
– i.e., n=4 symbols: North (N), South (S), East (E), West (W)
• With a fixed-length code, e.g., N → 00, S → 01, E → 10, W → 11, we can communicate this source using log₂(4) = 2 bits/symbol
Minimum number of bits for a source
Are 2 bits/symbol the minimum number of bits/symbol required to
communicate an information source having n=4 symbols?
Minimum number of bits for a source
• So far in this example, we implicitly assumed that all symbols are
equally likely
• Let’s now assume that symbols are generated according to a
probability mass function pX
[Figure: pmf p_X with P(N) = 0.6, P(S) = 0.3, P(E) = P(W) = 0.05]
Minimum number of bits for a source
• Assign shorter codewords to more likely symbols:
– N: 0 (p = 0.6)
– S: 01 (p = 0.3)
– E: 011 (p = 0.05)
– W: 0111 (p = 0.05)
Now if 1000 symbols are generated by X, how many bits are required to transmit
these 1000 symbols?
On average, the 1000 symbols will have:
600 N’s, 300 S’s, 50 E’s and 50 W’s
Total bits = 600×1 + 300×2 + 50×3 + 50×4 = 1550
1550 bits are required to communicate 1000 symbols → 1.55 bits/symbol
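A minimal sketch reproducing these numbers from the pmf and the codeword lengths (variable names are my own):

```python
pmf     = {"N": 0.6, "S": 0.3, "E": 0.05, "W": 0.05}
lengths = {"N": 1,   "S": 2,   "E": 3,    "W": 4}   # lengths of 0, 01, 011, 0111

# Average code length in bits/symbol: sum over symbols of p(s) * len(codeword)
avg_bits = sum(pmf[s] * lengths[s] for s in pmf)
print(avg_bits)          # 1.55
print(1000 * avg_bits)   # 1550 bits, on average, for 1000 symbols
```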
Minimum number of bits for a source
• Coming back to our original question:
Are 1.55 bits/symbol the minimum number of bits/symbol required
to communicate an information source having n=4 symbols?
Information content of a source
• The minimum number of bits/symbol required to communicate
the symbols of a source is the information content of the source
Information content of a source
• We start quantification of a source’s information content using a
simple question
Minimum number of bits for a source
• Now reverse the assignment, giving longer codewords to more likely symbols:
– N: 0111 (p = 0.6)
– S: 011 (p = 0.3)
– E: 01 (p = 0.05)
– W: 0 (p = 0.05)
Now if 1000 symbols are generated by X, how many bits are required to transmit
these 1000 symbols?
Total bits = 600×4 + 300×3 + 50×2 + 50×1 = 3450 → 3.45 bits/symbol
These are more bits than we would need if we assumed all symbols to be equally likely
Information content of a source
• So in the worst-case scenario, we can simply ignore the
probability of each symbol and assign an equal-length codeword
to each symbol
– i.e., we are inherently assuming that all symbols are equally
likely
Information content of a source
• If we assume equally-likely symbols, we will always be able to
communicate all the symbols of the source using log2(n)
bits/symbol
Information content of “uniform” sources
If the source’s symbols are in fact equally likely, what is the
minimum number of bits required to communicate this source?
The minimum number of bits required to represent a source with
equally-likely symbols is log2(n) bits/symbol
[Figure: uniform pmf, p_X(i) = 1/n for i = 1, 2, …, n]
Information content of “uniform” sources
• The minimum number of bits required to represent a discrete
uniform source is log2(n) bits/symbol
• For any discrete source where all symbols are not equally-likely
(i.e., non-uniform source), log2(n) represents the maximum
number of bits/symbol
Information content of “uniform” sources
• Two uniform sources S1 and S2
• n1 and n2 respectively represent the total number of symbols for the two
sources with n1 > n2
• For example, compare the (North, South, East, West) source with a source
having the symbols (North, South, East, West, Northwest, Northeast,
Southeast, Southwest)
Information content of “uniform” sources
• Thus if there are multiple sources with equally-likely symbols, the source with
the maximum number of symbols has the maximum information content
• In other words, for equally likely sources, a function H(.) that quantifies
information content of a source should be an increasing function of the
number of symbols
– Let’s call this function H(n)
Information content of “uniform” sources
• You should convince yourself that for a uniform source:
H(n) = log₂(n)
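One way to convince yourself: information from independent sources should add, and the logarithm is the increasing function with exactly this property. A minimal sketch (the source sizes are arbitrary choices):

```python
import math

# Two independent uniform sources with n1 and n2 symbols jointly form a
# single uniform source with n1 * n2 symbols, and the bit counts add.
n1, n2 = 4, 8
print(math.log2(n1 * n2))             # 5.0 bits for the joint source
print(math.log2(n1) + math.log2(n2))  # 2.0 + 3.0 = 5.0 bits, the same
```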
Information content of non-uniform sources
• Generally, information sources do not have equally-likely
symbols; i.e., they are non-uniform
Information content of a non-uniform source
• As more likely symbols will occur more often than less likely ones, the total number of bits required by a code that assigns shorter codewords to more likely symbols will be less than log₂(n)
Information content of a non-uniform source
• A function to quantify the information content of a non-uniform
source X should be a function of the probability distribution pX of
X, say H(pX)
• Since more bits are assigned to less likely symbols, H(pX) should
increase as pX decreases
Information content of a non-uniform source
• For a given symbol i, the information content of that symbol is given by:
$$H(p_X = i) = \log_2\frac{1}{p_X(i)} = -\log_2 p_X(i)$$
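Applying this to the direction source, a small sketch (the printed values are computed here, not taken from the slides):

```python
import math

pmf = {"N": 0.6, "S": 0.3, "E": 0.05, "W": 0.05}
for symbol, p in pmf.items():
    # Self-information: rarer symbols carry more bits
    print(symbol, math.log2(1 / p))
# N ≈ 0.74, S ≈ 1.74, E = W ≈ 4.32 bits
```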
Information content of a non-uniform source
What is the expected or average value of the information content of
all the symbols of pX?
Entropy of a Discrete Information Source
The information content of a discrete source with symbol
distribution pX is:
$$H(p_X) = -\sum_{i=1}^{N} p_X(i)\, \log_2 p_X(i)$$
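Evaluating this for the direction source, as a sketch (the ≈1.40 figure is computed here, not stated on the slides):

```python
import math

pmf = [0.6, 0.3, 0.05, 0.05]   # P(N), P(S), P(E), P(W)

# H(p_X) = -sum_i p_X(i) * log2(p_X(i))
H = -sum(p * math.log2(p) for p in pmf)
print(H)   # ≈ 1.395 bits/symbol, below the 1.55 achieved by the code above
```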
Entropy of a Discrete Information Source
Before finishing our discussion on information sources, apply the
formula for entropy on a uniform source:
$$H(p_X) = -\sum_{i=1}^{N} p_X(i)\, \log_2 p_X(i)$$
[Figure: uniform pmf, p_X(i) = 1/n for i = 1, 2, …, n]
Entropy of a Discrete Information Source
If we apply the formula for entropy on a discrete uniform source,
we get:
$$H(p_X) = -\sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{1}{n} = -n \cdot \frac{1}{n} \log_2 \frac{1}{n} = \log_2 n$$
Note that this is the same function that we had deduced earlier
Entropy of a Continuous Information Source
• We can extend the definition of entropy for continuous sources
(e.g., Gaussian, exponential, etc.) as follows:
$$H(f_X) = -\int_{-\infty}^{\infty} f_X(x)\, \log_2 f_X(x)\, dx$$
Entropy of a Gaussian Source
• A particularly important source is a Gaussian source with zero mean and variance σ²
Entropy of a Gaussian Source
What is the entropy of a Gaussian source with zero mean and variance σ²?
$$H(f_X) = -\int_{-\infty}^{\infty} f_X(x)\, \log f_X(x)\, dx, \qquad f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}$$
Entropy of a Gaussian Source
What is the entropy of a Gaussian source with zero mean and variance σ²?
$$H(f_X) = -\int f_X(x) \ln f_X(x)\, dx = -\int f_X(x) \ln\!\left(\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}\right) dx$$
$$= -\int f_X(x) \ln\frac{1}{\sqrt{2\pi}\,\sigma}\, dx + \int f_X(x)\, \frac{x^2}{2\sigma^2}\, dx$$
$$= \ln\!\left(\sqrt{2\pi}\,\sigma\right) \int f_X(x)\, dx + \frac{1}{2\sigma^2} \int x^2 f_X(x)\, dx$$
$$= \frac{1}{2}\ln\!\left(2\pi\sigma^2\right) + \frac{1}{2}\ln e = \frac{1}{2}\ln\!\left(2\pi e\sigma^2\right)$$
Since we have used the ln(·) function, the units of this entropy are nats/symbol
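A numerical sanity check of this closed form, as a sketch (σ, the grid limits, and the step size are arbitrary choices):

```python
import math

sigma = 2.0

def f(x: float) -> float:
    """Zero-mean Gaussian pdf with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

# H(f_X) = -integral of f(x) ln f(x) dx, approximated by a Riemann sum over ±10σ
dx = 0.001
H = -sum(f(x) * math.log(f(x)) * dx for x in (i * dx for i in range(-20000, 20001)))

print(H)                                                 # ≈ 2.112 nats
print(0.5 * math.log(2 * math.pi * math.e * sigma**2))   # closed form, same value
```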
Entropy of a Gaussian Source
$$H(f_X) = \frac{1}{2}\ln\!\left(2\pi e\sigma^2\right)$$
• Recall that the discrete uniform distribution has the highest entropy among all the distributions defined on a given number of symbols
• Analogously, the Gaussian distribution has the highest differential entropy among all continuous distributions with a given variance
DETOUR
Communication System Fundamentals
Radio Waves
• Radio waves are characterized by:
– Oscillating in time at a frequency, f
– Travelling through the air at speed of light, c
– Distance covered by one cycle of the wave or wavelength,
λ
$$\lambda = \frac{c}{f}$$
– e.g., at f = 900 MHz, λ = (3×10⁸ m/s) / (9×10⁸ Hz) ≈ 0.33 m
Signal Bandwidth versus Channel Bandwidth
• A signal is usually made up of sinusoidal signals
of varying frequencies
Signal Bandwidth versus Channel Bandwidth
• We can have an equivalent representation of any signal in the frequency domain
[Figure omitted; image courtesy of William Stallings’s lecture notes]
Signal Bandwidth versus Channel Bandwidth
• Can you see how we can represent the following signal in terms of its sinusoidal components?
[Figure omitted; image courtesy of William Stallings’s lecture notes]
Signal Bandwidth versus Channel Bandwidth
• The difference between the maximum and minimum frequency components of a signal is generally referred to as:
– signal bandwidth, information bandwidth, source bandwidth, absolute bandwidth, …
Fourier Series of Periodic Signals
• Fourier also showed that if a signal is periodic
(i.e., repeats over time), then it can be
represented as a sum of discrete, attenuated sine
waves
Signal Bandwidth versus Channel Bandwidth
• The signal bandwidth should be less than the channel (medium) bandwidth for the signal to be transmitted accurately
Signal Bandwidth versus Channel Bandwidth
• Can you see how we can represent a square wave in terms of its sinusoidal components? (See the sketch below.)
[Figure omitted; image courtesy of William Stallings’s lecture notes]
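A sketch of the idea: a square wave of frequency f can be approximated by summing its odd harmonics, (4/π)[sin(2πft) + sin(2π·3ft)/3 + sin(2π·5ft)/5 + …]; the harmonic counts below are arbitrary choices.

```python
import math

def square_wave_approx(t: float, f: float, k: int) -> float:
    """Partial Fourier series of a unit square wave: first k odd harmonics."""
    return (4 / math.pi) * sum(
        math.sin(2 * math.pi * (2 * i + 1) * f * t) / (2 * i + 1)
        for i in range(k)
    )

# The ideal square wave equals 1.0 at t = 0.25 (middle of the high half-cycle);
# adding more harmonics brings the approximation closer to that value
for k in (1, 3, 10, 100):
    print(k, square_wave_approx(t=0.25, f=1.0, k=k))
```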
Bandwidth and Data Rate
• Consider a transmitter that uses the following approximate sine wave to send binary (0, 1) symbols
[Figure omitted; image courtesy of William Stallings’s lecture notes]
Bandwidth and Data Rate
[Figures omitted: data rate versus bandwidth illustrations; images courtesy of William Stallings’s lecture notes]
Spectral Efficiency
• It can be easily observed that, for a given signaling scheme, the data rate increases linearly with the channel bandwidth
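As a hedged illustration of that linear relation, using Nyquist's signaling-rate formula R = 2·B·log₂(M) for a noiseless channel with bandwidth B and M signal levels (a standard result, though not derived on these slides):

```python
import math

def nyquist_rate(bandwidth_hz: float, levels: int) -> float:
    """Maximum data rate (bits/s) over a noiseless channel: R = 2*B*log2(M)."""
    return 2 * bandwidth_hz * math.log2(levels)

# Doubling the bandwidth doubles the data rate: the relation is linear
print(nyquist_rate(3_000, 2))   # 6000.0 bits/s
print(nyquist_rate(6_000, 2))   # 12000.0 bits/s
```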
END OF DETOUR
Nyquist-Shannon Sampling Theorem
A Typical Communication System
A Typical Communication System: Transmitter
[Block diagram: Information source → Source Encoder → Channel Encoder → Interleaver → Modulator → Channel]
– Channel Encoder: adds redundancy to allow error detection and correction at the receiver
– Interleaver: pseudo-randomizes data transmission to cater for burst errors
Sampling
[Figure: message signal m(t), sampling pulse train p(t), and the sampled signal mₛ(t) = m(t)·p(t)]
Sampling
• The important question then is how frequently
should we sample a given analog signal?
– i.e., what Ts should be used so that the signal can be
reconstructed at the receiver?
[Figure: sampled waveform with samples spaced Tₛ seconds apart]
Sampling
• Let W represent the highest frequency in the
analog signal
[Figure: spectrum M(f) occupying −W to W, a total width of 2W]
Sampling
• The sampled signal can be written as:
$$m_s(t) = a_0\, m(t) + \sum_{n=1}^{\infty} a_n\, m(t) \cos(2\pi n f_s t)$$
Sampling
Each cosine basis function is separated from its lower and higher frequency neighbours by fₛ Hz:
0×fₛ, 1×fₛ, 2×fₛ, 3×fₛ, 4×fₛ, …
[Figure: replicas of M(f) centred at integer multiples of fₛ]
The spectral replicas do not overlap as long as fₛ ≥ 2W; this is called the sampling theorem, and 2W is called the Nyquist rate
Sampling Theorem
$$f_s \geq 2W$$
Sampling Theorem: If a signal is sampled at regular intervals at a rate higher than twice the highest signal frequency, the samples contain all the information of the original signal
[Figure: sampled-signal spectrum for fₛ > 2W]
Aliasing
[Figure: sampled-signal spectra for fₛ = 2W, fₛ > 2W, and fₛ < 2W; when fₛ < 2W the spectral replicas overlap, causing aliasing]
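A minimal numerical illustration of aliasing (the frequencies and rates are my own choices): a 3 Hz cosine sampled at 4 Hz, below its Nyquist rate of 6 Hz, produces exactly the same samples as a 1 Hz cosine.

```python
import math

f_signal, f_alias = 3.0, 1.0   # Hz; at f_s = 4 Hz, 3 Hz aliases to 1 Hz

for f_s in (20.0, 4.0):        # 20 Hz > Nyquist rate (6 Hz); 4 Hz is below it
    x3 = [math.cos(2 * math.pi * f_signal * n / f_s) for n in range(8)]
    x1 = [math.cos(2 * math.pi * f_alias  * n / f_s) for n in range(8)]
    identical = all(abs(a - b) < 1e-9 for a, b in zip(x3, x1))
    print(f"f_s = {f_s} Hz -> samples identical: {identical}")
# At 20 Hz the two cosines are distinguishable; at 4 Hz they are not
```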
Sampled Signal Recovery at the Receiver
To recover the signal, the receiver filters the
signal to remove the additional frequency
components
[Figure: the receiver low-pass filters the sampled signal, retaining only the −W to W band of M(f)]
Entropy of a Sampled Signal
• A source sampled at the Nyquist rate, Xₛ, produces information at a rate of 2W symbols (samples) per second
Shannon’s Source Coding Theorem
A Typical Communication System: Source Encoder
[Block diagram: Information source → Source Encoder]
– Source Encoder: maps continuous-time values to discrete values; an irreversible removal of information
Shannon’s Source Coding Theorem
Shannon’s Source Coding Theorem: Given a discrete memoryless information source characterized by a certain amount of entropy, the average codeword length (bits/symbol) of any lossless source encoding scheme is lower bounded by the entropy.
Shannon’s Source Coding Theorem
• Thus the efficiency of a source code that uses L bits/symbol is:
$$\text{efficiency} = \frac{H(X)}{L}$$
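To close the loop, a sketch that builds a Huffman code for the direction source and checks H(X) ≤ L (Huffman coding is a standard optimal scheme, though not covered on these slides; huffman_lengths is my own helper):

```python
import heapq
import math

def huffman_lengths(pmf: dict) -> dict:
    """Codeword length per symbol for an optimal binary Huffman code."""
    # Heap entries: (probability, tie-breaker, {symbol: depth-so-far})
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)   # merge the two least likely subtrees
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

pmf = {"N": 0.6, "S": 0.3, "E": 0.05, "W": 0.05}
lengths = huffman_lengths(pmf)
L = sum(pmf[s] * lengths[s] for s in pmf)          # average codeword length
H = -sum(p * math.log2(p) for p in pmf.values())   # source entropy

print(lengths)       # {'N': 1, 'S': 2, 'E': 3, 'W': 3}
print(H, L, H / L)   # ≈ 1.395 <= 1.5; efficiency ≈ 0.93
```

Note that this Huffman code averages 1.5 bits/symbol, slightly better than the 1.55 bits/symbol of the code used earlier, and still above the entropy, exactly as the theorem requires.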