In this Lecture
We will go through:
– Entropy and some related properties
– Source Coding
• Shannon’s Source Coding Theorem
Information Theory
Example:
– binary source: x ∈ {0, 1} with P(x = 0) = p, P(x = 1) = 1 − p
– M-ary source: x ∈ {1, 2, …, M} with Σᵢ Pᵢ = 1
Information Source
Discrete finite ensemble:
a, b, c, d → 00, 01, 10, 11
In general: k binary digits specify 2^k messages
M messages need log₂(M) bits.
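A minimal Python sketch of this count (the helper name bits_needed is my own, not from the lecture):

```python
import math

def bits_needed(M: int) -> int:
    """Minimum number of whole bits needed to index M distinct messages."""
    return math.ceil(math.log2(M))

print(bits_needed(4))   # 2 -> a, b, c, d map to 00, 01, 10, 11
print(bits_needed(26))  # 5 -> a 26-letter alphabet needs 5 bits
```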
Minimum number of bits for a source
What is the minimum number of bits/symbol required to
communicate an information source having n symbols?
Minimum number of bits for a source
• Let there be a source X that wants to communicate information
of its direction to a destination
– i.e., n=4 symbols: North (N), South (S), East (E), West (W)
• With a fixed-length code, e.g., N → 00, S → 01, E → 10, W → 11, we can communicate this source using log₂(4) = 2 bits/symbol
Minimum number of bits for a source
Are 2 bits/symbol the minimum number of bits/symbol required to
communicate an information source having n=4 symbols?
Minimum number of bits for a source
• So far in this example, we implicitly assumed that all symbols are
equally likely
• Let’s now assume that symbols are generated according to a
probability mass function pX
[Figure: pmf p_X with P(N) = 0.6, P(S) = 0.3, P(E) = P(W) = 0.05]
Minimum number of bits for a source
• Assign shorter codewords to more likely symbols:
– N: 0 (p = 0.6)
– S: 01 (p = 0.3)
– E: 011 (p = 0.05)
– W: 0111 (p = 0.05)
Now if 1000 symbols are generated by X, how many bits are required to transmit
these 1000 symbols?
On average, the 1000 symbols will have:
600 N’s, 300 S’s, 50 E’s and 50 W’s
Total bits = 600×1 + 300×2 + 50×3 + 50×4 = 1550
1550 bits are required to communicate 1000 symbols → 1.55 bits/symbol
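A minimal sketch reproducing these numbers from the pmf and the codeword lengths (variable names are my own):

```python
pmf     = {"N": 0.6, "S": 0.3, "E": 0.05, "W": 0.05}
lengths = {"N": 1,   "S": 2,   "E": 3,    "W": 4}   # lengths of 0, 01, 011, 0111

# Average code length in bits/symbol: sum over symbols of p(s) * len(codeword)
avg_bits = sum(pmf[s] * lengths[s] for s in pmf)
print(avg_bits)          # 1.55
print(1000 * avg_bits)   # 1550 bits, on average, for 1000 symbols
```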
Minimum number of bits for a source
• Coming back to our original question:
Are 1.55 bits/symbol the minimum number of bits/symbol required
to communicate an information source having n=4 symbols?
Information content of a source
• The minimum number of bits/symbol required to communicate
the symbols of a source is the information content of the source
Information content of a source
• We start quantification of a source’s information content using a
simple question
Minimum number of bits for a source
• Now reverse the assignment, giving longer codewords to more likely symbols:
– N: 0111 (p = 0.6)
– S: 011 (p = 0.3)
– E: 01 (p = 0.05)
– W: 0 (p = 0.05)
Now if 1000 symbols are generated by X, how many bits are required to transmit
these 1000 symbols?
Total bits = 600×4 + 300×3 + 50×2 + 50×1 = 3450 → 3.45 bits/symbol
These are more bits than we would need if we assumed all symbols to be equally likely
Information content of a source
• So in the worst-case scenario, we can simply ignore the
probability of each symbol and assign an equal-length codeword
to each symbol
– i.e., we are inherently assuming that all symbols are equally
likely
Information content of a source
• If we assume equally-likely symbols, we will always be able to
communicate all the symbols of the source using log2(n)
bits/symbol
Information content of “uniform” sources
If the source’s symbols are in fact equally likely, what is the
minimum number of bits required to communicate this source?
The minimum number of bits required to represent a source with
equally-likely symbols is log2(n) bits/symbol
[Figure: uniform pmf, p_X(i) = 1/n for i = 1, 2, …, n]
Information content of “uniform” sources
• The minimum number of bits required to represent a discrete
uniform source is log2(n) bits/symbol
• For any discrete source where all symbols are not equally-likely
(i.e., non-uniform source), log2(n) represents the maximum
number of bits/symbol
Information content of “uniform” sources
• Two uniform sources S1 and S2
• n1 and n2 respectively represent the total number of symbols for the two
sources with n1 > n2
• For example, compare the (North, South, East, West) source with a source
having the symbols (North, South, East, West, Northwest, Northeast,
Southeast, Southwest)
Information content of “uniform” sources
• Thus if there are multiple sources with equally-likely symbols, the source with
the maximum number of symbols has the maximum information content
• In other words, for equally likely sources, a function H(.) that quantifies
information content of a source should be an increasing function of the
number of symbols
– Let’s call this function H(n)
Information content of “uniform” sources
• You should convince yourself that for a uniform source:
H(n) = log₂(n)
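One way to convince yourself: information from independent sources should add, and the logarithm is the increasing function with exactly this property. A minimal sketch (the source sizes are arbitrary choices):

```python
import math

# Two independent uniform sources with n1 and n2 symbols jointly form a
# single uniform source with n1 * n2 symbols, and the bit counts add.
n1, n2 = 4, 8
print(math.log2(n1 * n2))             # 5.0 bits for the joint source
print(math.log2(n1) + math.log2(n2))  # 2.0 + 3.0 = 5.0 bits, the same
```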
Information content of non-uniform sources
• Generally, information sources do not have equally-likely
symbols; i.e., they are non-uniform
Information content of a non-uniform source
• As more likely symbols will occur more often than less likely ones, the total number of bits required by a code that assigns shorter codewords to more likely symbols will be less than log₂(n)
Information content of a non-uniform source
• A function to quantify the information content of a non-uniform
source X should be a function of the probability distribution pX of
X, say H(pX)
• Since more bits are assigned to less likely symbols, H(pX) should
increase as pX decreases
Information content of a non-uniform source
• For a given symbol i, the information content of that symbol is given by:
$$H(p_X = i) = \log_2\frac{1}{p_X(i)} = -\log_2 p_X(i)$$
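Applying this to the direction source, a small sketch (the printed values are computed here, not taken from the slides):

```python
import math

pmf = {"N": 0.6, "S": 0.3, "E": 0.05, "W": 0.05}
for symbol, p in pmf.items():
    # Self-information: rarer symbols carry more bits
    print(symbol, math.log2(1 / p))
# N ≈ 0.74, S ≈ 1.74, E = W ≈ 4.32 bits
```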
Information content of a non-uniform source
What is the expected or average value of the information content of
all the symbols of pX?
Entropy of a Discrete Information Source
The information content of a discrete source with symbol
distribution pX is:
$$H(p_X) = -\sum_{i=1}^{N} p_X(i)\, \log_2 p_X(i)$$
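Evaluating this for the direction source, as a sketch (the ≈1.40 figure is computed here, not stated on the slides):

```python
import math

pmf = [0.6, 0.3, 0.05, 0.05]   # P(N), P(S), P(E), P(W)

# H(p_X) = -sum_i p_X(i) * log2(p_X(i))
H = -sum(p * math.log2(p) for p in pmf)
print(H)   # ≈ 1.395 bits/symbol, below the 1.55 achieved by the code above
```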
Entropy of a Discrete Information Source
Before finishing our discussion on information sources, apply the
formula for entropy on a uniform source:
$$H(p_X) = -\sum_{i=1}^{N} p_X(i)\, \log_2 p_X(i)$$
[Figure: uniform pmf, p_X(i) = 1/n for i = 1, 2, …, n]
Entropy of a Discrete Information Source
If we apply the formula for entropy on a discrete uniform source,
we get:
$$H(p_X) = -\sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{1}{n} = -n \cdot \frac{1}{n} \log_2 \frac{1}{n} = \log_2 n$$
Note that this is the same function that we had deduced earlier
Entropy of a Continuous Information Source
• We can extend the definition of entropy for continuous sources
(e.g., Gaussian, exponential, etc.) as follows:
$$H(f_X) = -\int_{-\infty}^{\infty} f_X(x)\, \log_2 f_X(x)\, dx$$
Entropy of a Gaussian Source
• A particularly important source is a Gaussian source with zero mean and variance σ²
Entropy of a Gaussian Source
What is the entropy of a Gaussian source with zero mean and variance σ²?
$$H(f_X) = -\int_{-\infty}^{\infty} f_X(x)\, \log f_X(x)\, dx, \qquad f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}$$
Entropy of a Gaussian Source
What is the entropy of a Gaussian source with zero mean and variance σ²?
$$H(f_X) = -\int f_X(x) \ln f_X(x)\, dx = -\int f_X(x) \ln\!\left(\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/2\sigma^2}\right) dx$$
$$= -\int f_X(x) \ln\frac{1}{\sqrt{2\pi}\,\sigma}\, dx + \int f_X(x)\, \frac{x^2}{2\sigma^2}\, dx$$
$$= \ln\!\left(\sqrt{2\pi}\,\sigma\right) \int f_X(x)\, dx + \frac{1}{2\sigma^2} \int x^2 f_X(x)\, dx$$
$$= \frac{1}{2}\ln\!\left(2\pi\sigma^2\right) + \frac{1}{2}\ln e = \frac{1}{2}\ln\!\left(2\pi e\sigma^2\right)$$
Since we have used the ln(·) function, the units of this entropy are nats/symbol
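A numerical sanity check of this closed form, as a sketch (σ, the grid limits, and the step size are arbitrary choices):

```python
import math

sigma = 2.0

def f(x: float) -> float:
    """Zero-mean Gaussian pdf with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

# H(f_X) = -integral of f(x) ln f(x) dx, approximated by a Riemann sum over ±10σ
dx = 0.001
H = -sum(f(x) * math.log(f(x)) * dx for x in (i * dx for i in range(-20000, 20001)))

print(H)                                                 # ≈ 2.112 nats
print(0.5 * math.log(2 * math.pi * math.e * sigma**2))   # closed form, same value
```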
Entropy of a Gaussian Source
$$H(f_X) = \frac{1}{2}\ln\!\left(2\pi e\sigma^2\right)$$
• Recall that the discrete uniform distribution has the highest entropy among all the distributions defined on a given number of symbols
• Analogously, the Gaussian distribution has the highest differential entropy among all continuous distributions with a given variance
DETOUR
Communication System Fundamentals
Radio Waves
• Radio waves are characterized by:
– Oscillating in time at a frequency, f
– Travelling through the air at speed of light, c
– Distance covered by one cycle of the wave or wavelength,
λ
$$\lambda = \frac{c}{f}$$
– e.g., at f = 900 MHz, λ = (3×10⁸ m/s) / (9×10⁸ Hz) ≈ 0.33 m
Signal Bandwidth versus Channel Bandwidth
• A signal is usually made up of sinusoidal signals
of varying frequencies
Signal Bandwidth versus Channel Bandwidth
• We can have an equivalent representation of any signal in the frequency domain
[Figure omitted; image courtesy of William Stallings’s lecture notes]
Signal Bandwidth versus Channel Bandwidth
• Can you see how we can represent the following signal in terms of its sinusoidal components?
[Figure omitted; image courtesy of William Stallings’s lecture notes]
Signal Bandwidth versus Channel Bandwidth
• The difference between the maximum and minimum frequency components of a signal is generally referred to as:
– signal bandwidth, information bandwidth, source bandwidth, absolute bandwidth, …
Fourier Series of Periodic Signals
• Fourier also showed that if a signal is periodic
(i.e., repeats over time), then it can be
represented as a sum of discrete, attenuated sine
waves
Signal Bandwidth versus Channel Bandwidth
• The signal bandwidth should be less than the channel (medium) bandwidth for the signal to be transmitted accurately
Signal Bandwidth versus Channel Bandwidth
• Can you see how we can represent a square wave in terms of its sinusoidal components? (See the sketch below.)
[Figure omitted; image courtesy of William Stallings’s lecture notes]
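A sketch of the idea: a square wave of frequency f can be approximated by summing its odd harmonics, (4/π)[sin(2πft) + sin(2π·3ft)/3 + sin(2π·5ft)/5 + …]; the harmonic counts below are arbitrary choices.

```python
import math

def square_wave_approx(t: float, f: float, k: int) -> float:
    """Partial Fourier series of a unit square wave: first k odd harmonics."""
    return (4 / math.pi) * sum(
        math.sin(2 * math.pi * (2 * i + 1) * f * t) / (2 * i + 1)
        for i in range(k)
    )

# The ideal square wave equals 1.0 at t = 0.25 (middle of the high half-cycle);
# adding more harmonics brings the approximation closer to that value
for k in (1, 3, 10, 100):
    print(k, square_wave_approx(t=0.25, f=1.0, k=k))
```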
Bandwidth and Data Rate
• Consider a transmitter that uses the following approximate sine wave to send binary (0, 1) symbols
[Figure omitted; image courtesy of William Stallings’s lecture notes]
Bandwidth and Data Rate
[Figures omitted: data rate versus bandwidth illustrations; images courtesy of William Stallings’s lecture notes]
Spectral Efficiency
• It can be easily observed that, for a given signaling scheme, the data rate increases linearly with the channel bandwidth
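As a hedged illustration of that linear relation, using Nyquist's signaling-rate formula R = 2·B·log₂(M) for a noiseless channel with bandwidth B and M signal levels (a standard result, though not derived on these slides):

```python
import math

def nyquist_rate(bandwidth_hz: float, levels: int) -> float:
    """Maximum data rate (bits/s) over a noiseless channel: R = 2*B*log2(M)."""
    return 2 * bandwidth_hz * math.log2(levels)

# Doubling the bandwidth doubles the data rate: the relation is linear
print(nyquist_rate(3_000, 2))   # 6000.0 bits/s
print(nyquist_rate(6_000, 2))   # 12000.0 bits/s
```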
END OF DETOUR
Nyquist-Shannon Sampling Theorem
A Typical Communication System
A Typical Communication System: Transmitter
[Block diagram: Information source → Source Encoder → Channel Encoder → Interleaver → Modulator → Channel]
– Channel Encoder: adds redundancy to allow error detection and correction at the receiver
– Interleaver: pseudo-randomizes data transmission to cater for burst errors
Sampling
[Figure: message signal m(t), sampling pulse train p(t), and the sampled signal mₛ(t) = m(t)·p(t)]
Sampling
• The important question then is how frequently
should we sample a given analog signal?
– i.e., what Ts should be used so that the signal can be
reconstructed at the receiver?
[Figure: sampled waveform with samples spaced Tₛ seconds apart]
Sampling
• Let W represent the highest frequency in the
analog signal
[Figure: spectrum M(f) occupying −W to W, a total width of 2W]
Sampling
• The sampled signal can be written as:
$$m_s(t) = a_0\, m(t) + \sum_{n=1}^{\infty} a_n\, m(t) \cos(2\pi n f_s t)$$
Sampling
Each cosine basis function is separated from its lower and higher frequency neighbours by fₛ Hz:
0×fₛ, 1×fₛ, 2×fₛ, 3×fₛ, 4×fₛ, …
[Figure: replicas of M(f) centred at integer multiples of fₛ]
The spectral replicas do not overlap as long as fₛ ≥ 2W; this is called the sampling theorem, and 2W is called the Nyquist rate
Sampling Theorem
$$f_s \geq 2W$$
Sampling Theorem: If a signal is sampled at regular intervals at a rate higher than twice the highest signal frequency, the samples contain all the information of the original signal
[Figure: sampled-signal spectrum for fₛ > 2W]
Aliasing
[Figure: sampled-signal spectra for fₛ = 2W, fₛ > 2W, and fₛ < 2W; when fₛ < 2W the spectral replicas overlap, causing aliasing]
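A minimal numerical illustration of aliasing (the frequencies and rates are my own choices): a 3 Hz cosine sampled at 4 Hz, below its Nyquist rate of 6 Hz, produces exactly the same samples as a 1 Hz cosine.

```python
import math

f_signal, f_alias = 3.0, 1.0   # Hz; at f_s = 4 Hz, 3 Hz aliases to 1 Hz

for f_s in (20.0, 4.0):        # 20 Hz > Nyquist rate (6 Hz); 4 Hz is below it
    x3 = [math.cos(2 * math.pi * f_signal * n / f_s) for n in range(8)]
    x1 = [math.cos(2 * math.pi * f_alias  * n / f_s) for n in range(8)]
    identical = all(abs(a - b) < 1e-9 for a, b in zip(x3, x1))
    print(f"f_s = {f_s} Hz -> samples identical: {identical}")
# At 20 Hz the two cosines are distinguishable; at 4 Hz they are not
```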
Sampled Signal Recovery at the Receiver
To recover the signal, the receiver filters the
signal to remove the additional frequency
components
[Figure: the receiver low-pass filters the sampled signal, retaining only the −W to W band of M(f)]
Entropy of a Sampled Signal
• A source sampled at the Nyquist rate, Xₛ, produces information at a rate of 2W symbols (samples) per second
Shannon’s Source Coding Theorem
A Typical Communication System: Source Encoder
[Block diagram: Information source → Source Encoder]
– Source Encoder: maps continuous-time values to discrete values; an irreversible removal of information
Shannon’s Source Coding Theorem
Shannon’s Source Coding Theorem: Given a discrete memoryless information source characterized by a certain amount of entropy, the average codeword length (bits/symbol) of any lossless source encoding scheme is lower bounded by the entropy.
Shannon’s Source Coding Theorem
• Thus the efficiency of a source code that uses L bits/symbol is:
$$\text{efficiency} = \frac{H(X)}{L}$$
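To close the loop, a sketch that builds a Huffman code for the direction source and checks H(X) ≤ L (Huffman coding is a standard optimal scheme, though not covered on these slides; huffman_lengths is my own helper):

```python
import heapq
import math

def huffman_lengths(pmf: dict) -> dict:
    """Codeword length per symbol for an optimal binary Huffman code."""
    # Heap entries: (probability, tie-breaker, {symbol: depth-so-far})
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)   # merge the two least likely subtrees
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

pmf = {"N": 0.6, "S": 0.3, "E": 0.05, "W": 0.05}
lengths = huffman_lengths(pmf)
L = sum(pmf[s] * lengths[s] for s in pmf)          # average codeword length
H = -sum(p * math.log2(p) for p in pmf.values())   # source entropy

print(lengths)       # {'N': 1, 'S': 2, 'E': 3, 'W': 3}
print(H, L, H / L)   # ≈ 1.395 <= 1.5; efficiency ≈ 0.93
```

Note that this Huffman code averages 1.5 bits/symbol, slightly better than the 1.55 bits/symbol of the code used earlier, and still above the entropy, exactly as the theorem requires.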