
ECE 141 Lecture 10:

Information Theory
Entropy, Joint Entropy, Conditional Entropy, Mutual Information, Source Coding, Channel Coding



What is Information?
❑ In our daily lives, information is often obtained by learning something new.
❑ Consider some example sentences that convey “information”:
▪ The weather will be good tomorrow
▪ The weather was bad last Sunday
▪ I will get a passing grade in ECE 141 this semester
▪ I will be able to pass all my subjects this semester
Which of these statements do you think contains the most information?
❑ In other words, one gets information when learning about something that he or she was uncertain about beforehand.

Shannon: “Information is the resolution of uncertainty”



Discrete Sources
Data Source → 𝑌

❑ The data output 𝑌 is a discrete random variable coming from a set with finite cardinality:
$$Y \in \{y_1, y_2, \dots, y_N\}$$
❑ Each element of the set has a probability of being the output of the data source, $P(Y)$. It follows that:
$$\sum_{k=1}^{N} P(y_k) = 1$$
❑ Examples: English Alphabet, Morse Code, Braille Code



Mutual Information
Data Source → 𝑋 → Channel → 𝑌

❑ Consider two data sources with outputs 𝑋 and 𝑌.
❑ Suppose we are observing 𝑌; we want to know how much information about 𝑋 we can get from that observation.
❑ This is an a posteriori problem:
$$P(X = x \mid Y = y) = P(x \mid y)$$
❑ Here $x \in \bar{X}$ and $y \in \bar{Y}$, where $\bar{X}$ and $\bar{Y}$ are arbitrary sets of finite cardinality.



Mutual Information
Data Source → 𝑋 → Channel → 𝑌

❑ If the value of 𝑌 depends on 𝑋, then we must also consider the probability distribution of 𝑋, $P(X = x)$.
❑ The mutual information between the outcome 𝑌 = 𝑦 and the outcome 𝑋 = 𝑥 is defined as:
$$I(x, y) = \log \frac{P(x \mid y)}{P(x)}$$
❑ This is the amount of information that observing 𝑌 = 𝑦 tells about 𝑋 = 𝑥.



Mutual Information
Data Source → 𝑋 → Channel → 𝑌

❑ The mutual information between random variables 𝑋 and 𝑌 is the sum of the mutual information between their possible values, weighted by the joint probabilities:
$$I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(X = x, Y = y) \log \frac{P(x \mid y)}{P(x)}$$

❑ The units depend on the logarithm base: base 𝑒 gives nats, while base 2 gives bits.
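❑ For illustration (not part of the original slides), a minimal Python sketch that evaluates this double sum for an assumed joint PMF; the function name and the example numbers are hypothetical:

```python
import numpy as np

def mutual_information(p_xy, base=2.0):
    """Evaluate I(X, Y) from a joint PMF given as a 2-D array (rows: x, columns: y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)     # marginal P(x), column vector
    p_y = p_xy.sum(axis=0, keepdims=True)     # marginal P(y), row vector
    mask = p_xy > 0                           # terms with P(x, y) = 0 contribute nothing
    ratio = p_xy[mask] / (p_x @ p_y)[mask]    # P(x, y)/(P(x)P(y)) = P(x|y)/P(x)
    return float(np.sum(p_xy[mask] * np.log(ratio)) / np.log(base))

# Hypothetical joint PMF of a noisy binary channel with equiprobable inputs
p_joint = [[0.45, 0.05],
           [0.05, 0.45]]
print(mutual_information(p_joint))   # about 0.531 bits
```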



Properties of Mutual
Information
❑ Property 1: Mutual information is symmetric.
$$I(X, Y) = I(Y, X)$$
❑ Proof:
$$I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(X = x, Y = y) \log \frac{P(x \mid y)}{P(x)}$$
$$P(x \mid y)\,P(y) = P(y \mid x)\,P(x) \;\Rightarrow\; \frac{P(x \mid y)}{P(x)} = \frac{P(y \mid x)}{P(y)}$$
$$I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(X = x, Y = y) \log \frac{P(y \mid x)}{P(y)} = I(Y, X)$$



Properties of Mutual
Information
❑ Property 2: Mutual information is non-negative.
$$I(X, Y) \ge 0$$
❑ Proof: rewrite the definition using $P(x \mid y) = P(x, y)/P(y)$,
$$I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(x, y) \log \frac{P(x, y)}{P(x)\,P(y)}$$
Then, since $\log$ is concave, Jensen's inequality gives
$$-I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(x, y) \log \frac{P(x)\,P(y)}{P(x, y)} \le \log \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(x)\,P(y) = \log 1 = 0$$
$$\therefore \; I(X, Y) \ge 0$$
with equality if and only if 𝑋 and 𝑌 are independent.



Independent Events
❑ Recall from probability theory that two random variables 𝑋 and 𝑌 are independent of each other if:
$$P(X, Y) = P(X)\,P(Y)$$
❑ Note that if they are independent:
$$P(X \mid Y) = \frac{P(X, Y)}{P(Y)} = P(X)$$
❑ The mutual information between independent random variables is therefore:
$$I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(x, y) \log \frac{P(x)}{P(x)} = 0$$
❑ This means that observing 𝑌 DOES NOT give any information about 𝑋.



Dependent Events
❑ In the extreme case that observing 𝑌 uniquely determines the value of 𝑋, i.e., knowing 𝑌 means knowing the value of 𝑋 with certainty, for every $y \in \bar{Y}$ there is a single $x \in \bar{X}$ with:
$$P(x \mid y) = 1$$
❑ The mutual information becomes:
$$I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(x, y) \log \frac{1}{P(x)} = \sum_{x \in \bar{X}} P(x) \log \frac{1}{P(x)} = -\sum_{x \in \bar{X}} P(x) \log P(x)$$
❑ $I(X, Y)$ in this case is called the entropy of the source.



Independent vs. Dependent
Events
Data Source → 𝑋 → Channel → 𝑌   (𝑋 is independent of 𝑌)

Data Source → 𝑋 → Channel → 𝑌   (𝑋 is completely dependent on 𝑌)



Entropy
Data Source → 𝑋

$$H(X) = -\sum_{x \in \bar{X}} P(x) \log P(x)$$
❑ The entropy of a random variable 𝑋 is a measure of
uncertainty or ambiguity.
❑ If we observe 𝑋, uncertainty about it is removed. By
consequence, 𝐻 𝑋 can be thought of as the measure of
information acquired by observing 𝑋.
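❑ A small Python sketch of this definition (not from the original slides; the helper name is mine), useful for checking the numerical examples later on:

```python
import numpy as np

def entropy(pmf, base=2.0):
    """H(X) = -sum_x P(x) log P(x); zero-probability symbols are skipped (0 log 0 := 0)."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(base))

print(entropy([0.5, 0.5]))         # 1.0 bit    (fair coin)
print(entropy([1/6] * 6))          # 2.585 bits (fair die)
print(entropy([1.0, 0.0]))         # 0.0 bits   (no uncertainty)
```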



Properties of Entropy
1. $0 \le H(X) \le \log c(\bar{X})$
▪ Entropy is non-negative, and its maximum value is the logarithm of the cardinality $c(\bar{X})$ of the symbol set.
2. $I(X, X) = H(X)$
▪ The mutual information between 𝑋 and itself is equal to its entropy.
▪ Entropy is also called self-information.
3. $I(X, Y) \le \min\{H(X), H(Y)\}$
▪ The mutual information between two random variables is at most the entropy of the variable with the smaller entropy.
4. If $Y = g(X)$, then $H(Y) \le H(X)$
▪ The output of a function has at most as much entropy as its input.



Bounds of Entropy
$$0 \le H(X) \le \log c(\bar{X})$$
❑ Entropy is a measure of uncertainty, so if $H(X) = 0$ we are always certain of the value of 𝑋. This occurs when, for a single symbol:
$$P(x) = 1, \quad x \in \bar{X}$$
❑ The source is then always generating that same symbol.
❑ At the other extreme, we are most uncertain when $H(X) = \log c(\bar{X})$. This occurs when:
$$P(x) = \frac{1}{c(\bar{X})} \quad \forall x \in \bar{X}$$
❑ Equiprobable symbols give the most uncertainty, since the source has no bias in generating symbols.



Example: Binary Source
❑ For a binary source with $P(1) = p$ and $P(0) = 1 - p$, solve for the entropy in bits:
$$H(X) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$
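❑ As a quick numerical illustration (my own sketch, not from the slides), the binary entropy function can be evaluated for a few values of $p$; it peaks at 1 bit when $p = 0.5$ and is 0 at $p = 0$ or $p = 1$:

```python
import numpy as np

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2(1 - p), with H(0) = H(1) = 0."""
    p = np.asarray(p, dtype=float)
    h = np.zeros_like(p)
    inside = (p > 0) & (p < 1)
    q = p[inside]
    h[inside] = -q * np.log2(q) - (1 - q) * np.log2(1 - q)
    return h

print(binary_entropy(np.array([0.0, 0.1, 0.5, 0.9, 1.0])))
# [0.    0.469 1.    0.469 0.   ]  -> maximum of 1 bit at p = 0.5
```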



Example: Gaussian Random
Variable
❑ For a continuous source, we change the summation to an integral (differential entropy):
$$H(x) = -\int p(x) \log p(x)\, dx$$
❑ Recall that a Gaussian random variable is defined by the PDF:
$$p(x) = \frac{1}{\sqrt{2\pi\sigma_x^2}}\, e^{-\frac{(x - \bar{x})^2}{2\sigma_x^2}}$$
❑ The entropy (in bits) is:
$$H(x) = 0.5 \log_2 \left(2\pi e \sigma_x^2\right)$$
❑ Note: It can be shown that, for a given variance $\sigma_x^2$, the Gaussian PDF has the largest differential entropy.
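❑ A numerical cross-check (my own sketch, with an assumed $\sigma_x = 2$): approximate the integral on a fine grid and compare it with the closed-form expression above.

```python
import numpy as np

sigma = 2.0
x = np.linspace(-10 * sigma, 10 * sigma, 200_001)
p = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

numeric = -np.trapz(p * np.log2(p), x)               # -∫ p(x) log2 p(x) dx, in bits
closed_form = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
print(numeric, closed_form)                          # both about 3.047 bits
```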



Joint Entropy
❑ For two random variables 𝑋 and 𝑌 with joint probability $P(X, Y)$, the joint entropy between them is:
$$H(X, Y) = -\sum_{(x, y) \in \bar{X} \times \bar{Y}} P(x, y) \log P(x, y)$$
❑ It represents the amount of information in the joint outcome of the two random variables.
❑ For multiple random variables:
$$H(X_1, \dots, X_N) = -\sum_{(x_1, \dots, x_N) \in \bar{X}_1 \times \dots \times \bar{X}_N} P(x_1, \dots, x_N) \log P(x_1, \dots, x_N)$$



Conditional Entropy
❑ If the random variable 𝑋 is given to be 𝑥, then the entropy of 𝑌 becomes:
$$H(Y \mid X = x) = -\sum_{y \in \bar{Y}} P(Y = y \mid X = x) \log P(Y = y \mid X = x)$$
❑ Averaging this term over all possible values of 𝑋, we get the conditional entropy of 𝑌 given 𝑋:
$$H(Y \mid X) = \sum_{x \in \bar{X}} P(x)\, H(Y \mid X = x)$$
$$H(Y \mid X) = -\sum_{(x, y) \in \bar{X} \times \bar{Y}} P(X = x, Y = y) \log P(Y = y \mid X = x)$$
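❑ A short Python sketch (not from the slides; the joint PMF is an assumed example) that computes joint and conditional entropy and checks the identities used in the next slides:

```python
import numpy as np

def H(p, base=2.0):
    """Entropy of a PMF (flattened); zero entries are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(base))

# Assumed joint PMF P(x, y), rows: x, columns: y (for illustration only)
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
H_xy = H(p_xy)                       # joint entropy H(X, Y)
H_x_given_y = H_xy - H(p_y)          # chain rule: H(X|Y) = H(X, Y) - H(Y)
I_xy = H(p_x) - H_x_given_y          # I(X, Y) = H(X) - H(X|Y)

print(H_xy, H_x_given_y, I_xy)
```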



Properties of Joint and
Conditional Entropy
1. $H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)$
2. $0 \le H(X \mid Y) \le H(X)$
▪ Conditional entropy is non-negative and is bounded by the entropy of the random variable itself.
▪ It is maximum when 𝑋 and 𝑌 are independent of each other.
3. $H(X, Y) \le H(X) + H(Y)$
▪ The maximum value of the joint entropy is the sum of the entropies of the individual variables.
4. $I(X, Y) = H(X) + H(Y) - H(X, Y)$
▪ Mutual information is the information shared by both 𝑋 and 𝑌.



Interpretation of Conditional
Entropy
$$0 \le H(X \mid Y) \le H(X)$$
❑ Observing 𝑌 gives information about 𝑋.
❑ $H(X \mid Y)$ is the remaining uncertainty about 𝑋 when 𝑌 is observed. It is maximum when 𝑋 and 𝑌 are independent of each other, the case when 𝑌 gives no information about 𝑋.
❑ On the other hand, $H(X \mid Y) = 0$ if 𝑌 uniquely determines 𝑋: the entropy or uncertainty is completely removed.



Conditional Entropy and
Mutual Information
❑ Using properties 1 and 4 of joint and conditional entropy:
$$I(X, Y) = H(X) - H(X \mid Y), \qquad 0 \le H(X \mid Y) \le H(X)$$
❑ If 𝑌 uniquely determines 𝑋, then $I(X, Y) = H(X)$, which means complete information about 𝑋 can be obtained by observing 𝑌.
❑ If 𝑋 is independent of 𝑌, then $I(X, Y) = 0$, which means no information about 𝑋 can be obtained by observing 𝑌.



ECE 141 Lecture 10:
Information Theory
Entropy, Joint Entropy, Conditional Entropy, Mutual Information, Source Coding, Channel Coding



Source Coding
❑ A discrete source transmits or “conveys” information to its
destination.
❑ Information, however, is abstract. Entropy defines a measure of
information but how does a system interpret it?
❑ Source coding is a method of representing information.
❑ The goal of source coding is to represent the information from a discrete source without errors, using as few bits as possible.
❑ Example:
$$\{0, 1, 2, 3\} \rightarrow \{00, 01, 10, 11\}$$



More Examples of Source
Coding
❑ The letter “A” can be represented in different forms:

Form            Representation
Spoken          [ɑ]
Written         A, a
Sign Language   (hand shape, shown graphically on the slide)
Morse Code      · −
Braille Code    ⠁
ASCII Code      065
Binary Code     01000001



Source Coding Theorem
❑ Assumptions: (1) the data is noiseless, (2) the source alphabet has a finite number of elements, and (3) the statistics of the source are known.
❑ Goal: to call a representation optimal, the average number of bits used to faithfully represent the source alphabet must be at a minimum.
❑ Intuition:
▪ Assign shorter (longer) codewords to frequent (rare) source symbols.
▪ The average codeword length is:
$$\bar{L} = \sum_{k=0}^{K-1} l_k\, p_k, \quad \text{where } l_k \text{ is the codeword length of the } k\text{-th source symbol and } p_k \text{ is its probability}$$
▪ Coding efficiency is defined as $\eta = L_{\min} / \bar{L}$, where $L_{\min}$ is the optimal (minimum possible) average codeword length; see the sketch after this list.
▪ Q: How do we determine $L_{\min}$?
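❑ A small sketch of these two definitions (mine, not from the slides), using made-up dyadic probabilities for which a code can reach $\eta = 1$; $L_{\min} = H(X)$ anticipates the theorem on the next slides:

```python
import numpy as np

def average_length(lengths, probs):
    """L_bar = sum_k l_k * p_k."""
    return float(np.dot(lengths, probs))

def coding_efficiency(lengths, probs):
    """eta = L_min / L_bar, taking L_min = H(X) (lossless source coding theorem)."""
    p = np.asarray(probs, dtype=float)
    H = float(-np.sum(p[p > 0] * np.log2(p[p > 0])))
    return H / average_length(lengths, p)

# Hypothetical source with dyadic probabilities and matching codeword lengths
p = [0.5, 0.25, 0.125, 0.125]
l = [1, 2, 3, 3]
print(average_length(l, p), coding_efficiency(l, p))   # 1.75 bits/symbol, eta = 1.0
```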



Lossless Source Coding
Theorem
❑ Assume a stream of information 𝒙 of very large length 𝑛 whose symbols are taken from the set $\{a_1, a_2, \dots, a_N\}$. Let the source be memoryless and let 𝒙 be a typical transmission.
❑ If $p_i$ is the probability that $a_i$ occurs, then the number of times $a_i$ appears in the stream is approximately $n p_i$.
❑ The probability that the transmission is 𝒙 therefore satisfies:
$$\log_2 P(\boldsymbol{X} = \boldsymbol{x}) \approx \log_2 \prod_{i=1}^{N} p_i^{\,n p_i} = \sum_{i=1}^{N} n p_i \log_2 p_i = -n H(X)$$
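❑ An empirical check of this statement (my own sketch, with an assumed three-symbol source): for a long i.i.d. stream, $\log_2 P(\boldsymbol{x})/n$ is close to $-H(X)$.

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.5, 0.3, 0.2])           # assumed source probabilities
n = 100_000
stream = rng.choice(len(p), size=n, p=p)

log2_prob = np.sum(np.log2(p[stream]))   # log2 of the probability of this exact stream
H = -np.sum(p * np.log2(p))              # source entropy in bits/symbol

print(log2_prob / n, -H)                 # both close to -1.485
```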



Lossless Source Coding
Theorem
❑ Since 𝒙 is a typical sequence from the set 𝑨 of all typical sequences and 𝑛 is very large, the typical sequences are essentially equiprobable, so the entropy of the stream 𝑿 is at its maximum:
$$H(\boldsymbol{X}) = n H(X) = \log_2 c(\boldsymbol{A}) \quad\Rightarrow\quad c(\boldsymbol{A}) = \frac{1}{P(\boldsymbol{X} = \boldsymbol{x})} = 2^{\,n H(X)}$$
❑ This means that a stream of information with length 𝑛 can be represented with $n H(X)$ bits!
❑ It follows that each symbol in the stream is represented by:
$$\bar{L} \approx \frac{n H(X)\ \text{bits}}{n\ \text{symbols}} = H(X)\ \text{bits/symbol}$$
❑ This is Shannon’s First Theorem, or the Lossless Source Coding Theorem.



Shannon’s First Theorem
“Let 𝑋 denote a discrete memoryless source with
entropy 𝐻 𝑋 . There exists a lossless source code for
this source at any rate 𝐿ത if 𝐿ത > 𝐻 𝑋 . There exists no
lossless code for this source at rates less than 𝐻 𝑋 .”
❑ Any representation of the information with fewer than $H(X)$ bits per symbol becomes a lossy compression.
❑ The rate $\bar{L}$, also denoted $R$, is called the average codeword length.

Data Source → 𝑋



Example
❑ Represent the symbol set $X \in \{0, 1, 2, 3, 4, 5\}$ assuming equiprobable symbols.
❑ The entropy is:
$$H(X) = \log_2 6 = 2.585\ \text{bits}$$
❑ Theoretically, we can use as few as 2.585 bits per symbol on average to represent the set.

Symbol  Representation      Symbol  Representation
0       000                 3       011
1       001                 4       100
2       010                 5       101

❑ Now, since there are 6 symbols, 3 bits can be used, which gives 2³ = 8 possible combinations. That leaves 2 unused combinations for this set.



Example
❑ Represent the symbol set $X \in \{0, 1, 2, 3, 4, 5\}$ assuming equiprobable symbols.
❑ If we use 2 bits per symbol:

Symbol  Representation      Symbol  Representation
0       00                  3       11
1       01                  4       −
2       10                  5       −

❑ Only 2² = 4 combinations are possible. That leaves 2 symbols that cannot be represented.
❑ That is, there is a loss of two symbols: while the number of bits is reduced, some loss of information occurs.



Example
❑ Represent the symbol set $X \in \{0, 1, 2, 3, 4, 5\}$ assuming equiprobable symbols.
❑ Now try:

Symbol  Representation      Symbol  Representation
0       000                 3       011
1       001                 4       10
2       010                 5       11

❑ This representation is valid, that is, it does not cause confusion when the receiver decodes it (no codeword is a prefix of another).
❑ The average codeword length is:
$$\bar{L} = \frac{3(4) + 2(2)}{6} = 2.667\ \text{bits/symbol}$$



Huffman Coding
❑ A type of variable-length coding algorithm.
❑ Steps (a short code sketch follows this list):
1. The two symbols with the lowest probabilities are assigned bits 0 and 1. It doesn’t matter which symbol gets which bit.
2. Add the probabilities of those two symbols to form a combined symbol.
3. Repeat step 1 on the remaining symbols together with the combined symbols until the total probability is 1.
4. Tracing the assigned bits from the final combination back to each original symbol gives the representation of the different symbols.
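❑ A minimal Python sketch of this procedure (my own implementation, not from the slides); ties are broken arbitrarily, so the exact bits may differ from the slides, but the codeword lengths are the same:

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}; returns {symbol: bitstring}."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # lowest probability  -> prepend bit 0
        p1, _, c1 = heapq.heappop(heap)   # next lowest         -> prepend bit 1
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

code = huffman_code({0: 0.3, 1: 0.25, 2: 0.15, 3: 0.12, 4: 0.1, 5: 0.08})
print(code)   # codeword lengths: 2, 2, 3, 3, 3, 3
```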



Example
❑ Represent the symbol set $X \in \{0, 1, 2, 3, 4, 5\}$ assuming equiprobable symbols.

(Huffman tree shown on the slide: the six 1/6 probabilities are paired into three 1/3 groups; two of these combine into 2/3, which then combines with the remaining 1/3 to reach a total probability of 1.)

Symbol  Representation
0       000
1       001
2       010
3       011
4       10
5       11



Example
❑ Represent the symbol set $X \in \{0, 1, 2, 3, 4, 5\}$ assuming the probabilities are $\{0.3, 0.25, 0.15, 0.12, 0.1, 0.08\}$.

(Huffman tree shown on the slide: 0.1 + 0.08 = 0.18, then 0.15 + 0.12 = 0.27, then 0.25 + 0.18 = 0.43, then 0.3 + 0.27 = 0.57, and finally 0.57 + 0.43 = 1.)

Symbol  Representation
0       00
1       10
2       010
3       011
4       110
5       111
❑ Note: $H(X) = 2.422$ bits/symbol and $\bar{L} = 2.45$ bits/symbol.
❑ This results in an efficiency of $\eta = H(X)/\bar{L} = 98.86\%$.
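❑ These numbers can be reproduced directly from the table above (a quick check of my own, with the codeword lengths read off the table):

```python
import numpy as np

p = np.array([0.3, 0.25, 0.15, 0.12, 0.1, 0.08])
lengths = np.array([2, 2, 3, 3, 3, 3])      # lengths of the Huffman codewords above

H = -np.sum(p * np.log2(p))                 # 2.422 bits/symbol
L_bar = np.dot(p, lengths)                  # 2.45  bits/symbol
print(H, L_bar, H / L_bar)                  # efficiency about 0.9886
```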



Huffman Coding Notes
❑ The average codeword length of a Huffman code is bounded by:
$$H(X) \le \bar{L} < H(X) + 1$$
❑ In the previous examples, no codeword is a prefix of another codeword, which gives the set of codes unique decodability (in fact, instantaneous decodability).
❑ Decreasing $\bar{L}$ below $H(X)$ would produce errors; such a code necessarily conveys less information than the source produces.



Increasing the Efficiency
❑ If we combine 𝑛 source symbols into one “supersymbol” and use a Huffman code to represent the supersymbols, the average number of bits per original symbol becomes bounded by:
$$H(X) \le \bar{L} \le H(X) + \frac{1}{n}$$
❑ This is referred to as the 𝑛-th extension of the code.
❑ As $n \to \infty$, the code approaches the theoretical minimum, $H(X)$.



Example:
❑ Binary source with $p(a_1) = 0.9$ and $p(a_2) = 0.1$.
❑ If each symbol is coded individually, the average codeword length is 1 bit/symbol, while the theoretical minimum is $H(X) = 0.47$ bits/symbol.
❑ Pair them up (2nd extension; the Huffman tree on the slide combines 0.01 + 0.09 = 0.10, then 0.10 + 0.09 = 0.19, then 0.19 + 0.81 = 1):

Symbol  Probability  Representation
a1a1    0.81         0
a1a2    0.09         10
a2a1    0.09         110
a2a2    0.01         111

❑ $\bar{L} = 1.29$ bits per pair $= 0.645$ bits/symbol.
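❑ A quick check of this figure (my own sketch, with the lengths taken from the table above):

```python
import numpy as np

# 2nd extension of the binary source p(a1) = 0.9, p(a2) = 0.1
pair_probs = np.array([0.81, 0.09, 0.09, 0.01])   # a1a1, a1a2, a2a1, a2a2
pair_lengths = np.array([1, 2, 3, 3])

L_per_pair = np.dot(pair_probs, pair_lengths)      # 1.29 bits per pair
H = -(0.9 * np.log2(0.9) + 0.1 * np.log2(0.1))     # 0.469 bits/symbol

print(L_per_pair / 2, H)   # 0.645 bits/symbol, approaching H(X) as n grows
```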



Channel
𝑥 → Channel → 𝑦

❑ A symbol 𝑥 goes in and a noise-corrupted version, 𝑦, comes out of the channel.
❑ A stream of input symbols $\boldsymbol{x} = (x_1, x_2, \dots, x_n)$ produces output symbols $\boldsymbol{y} = (y_1, y_2, \dots, y_n)$.
❑ The channel is called memoryless if:
$$P(\boldsymbol{y} \mid \boldsymbol{x}) = \prod_{i=1}^{n} P(y_i \mid x_i)$$
❑ This means that each output $y_i$ depends only on the corresponding input $x_i$.



Binary Symmetric Channel
(BSC)
❑ The input and output symbols both come from the set $\{0, 1\}$.
❑ Let the probability of error (crossover probability) be 𝑝; then:
$$P(y = 0 \mid x = 0) = P(y = 1 \mid x = 1) = 1 - p$$
$$P(y = 1 \mid x = 0) = P(y = 0 \mid x = 1) = p$$

(Transition diagram: 0 → 0 and 1 → 1 with probability $1 - p$; 0 → 1 and 1 → 0 with probability $p$.)
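❑ As an illustration (my own sketch, not from the slides), the mutual information of a BSC with equiprobable inputs can be computed directly from its joint PMF; it falls from 1 bit at $p = 0$ to 0 bits at $p = 0.5$:

```python
import numpy as np

def bsc_mutual_information(p, base=2.0):
    """I(X, Y) of a binary symmetric channel with crossover p and equiprobable inputs."""
    p_x = np.array([0.5, 0.5])
    trans = np.array([[1 - p, p],          # P(y | x = 0)
                      [p, 1 - p]])         # P(y | x = 1)
    p_xy = p_x[:, None] * trans            # joint P(x, y)
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0
    ratio = p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask]
    return float(np.sum(p_xy[mask] * np.log(ratio)) / np.log(base))

for p in [0.0, 0.1, 0.5]:
    print(p, bsc_mutual_information(p))   # 1.0, about 0.531, 0.0 bits
```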



Transition Probability Matrix
❑ In a discrete channel, the conditional probability that the output is $y_k$ given that the input is $x_i$ is $P_{ki} = P(y_k \mid x_i)$. The transition probability matrix has the form:
$$\mathbf{P} = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1N} \\ P_{21} & P_{22} & \cdots & P_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ P_{N1} & P_{N2} & \cdots & P_{NN} \end{bmatrix}$$
$$\begin{bmatrix} P(y_1) \\ P(y_2) \\ \vdots \\ P(y_N) \end{bmatrix} = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1N} \\ P_{21} & P_{22} & \cdots & P_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ P_{N1} & P_{N2} & \cdots & P_{NN} \end{bmatrix} \begin{bmatrix} P(x_1) \\ P(x_2) \\ \vdots \\ P(x_N) \end{bmatrix}$$
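❑ A small numerical illustration of this matrix-vector relation (the channel matrix and input distribution below are assumed, not from the slides):

```python
import numpy as np

# Output distribution P(y) = P @ P(x) for an assumed 3-symbol channel.
# Rows are outputs y_k, columns are inputs x_i, entries are P(y_k | x_i);
# every column therefore sums to 1.
P = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
p_x = np.array([0.5, 0.3, 0.2])

p_y = P @ p_x
print(p_y, p_y.sum())   # [0.43 0.36 0.21], sums to 1
```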



Channel Capacity
❑ The channel capacity is the maximum mutual information $I(X, Y)$ that can be transmitted through the channel:
$$C' = \max_{p(x)} I(X, Y)\ \ \text{bits/symbol}, \qquad C = C' R_c\ \ \text{bits/s}$$
❑ Here $R_c$ is the symbol rate, $C'$ is the maximum mutual information that can be transferred per symbol, and $C$ is the maximum bit rate that can be transmitted.



Example: AWGN
❑ The output 𝑌 is continuous and the input 𝑋 is discrete.
$$I(X, Y) = H(Y) - H(Y \mid X)$$
$$H(Y \mid X) = -\int \left[ \int P(y \mid x) \log_2 P(y \mid x)\, dy \right] P(x)\, dx$$

❑ The noise is Gaussian:
$$P(y \mid x) = \frac{1}{\sqrt{\pi N_0}}\, e^{-\frac{(y - x)^2}{N_0}}$$
❑ Note that, using the entropy of a Gaussian distribution:
$$H(Y \mid X) = 0.5 \log_2 (\pi e N_0)$$



Example: AWGN
❑ The channel capacity becomes:
$$C' = \max_{p(x)} H(Y) - 0.5 \log_2 (\pi e N_0) = \max_{p(x)} H(Y) - 0.5 \log_2 (2\pi e \sigma_n^2)$$
where $\sigma_n^2 = N_0 / 2$ is the noise variance.
❑ The maximum of $H(Y)$ occurs when 𝑌 is a Gaussian random variable. If its variance is $\sigma_y^2$, then:
$$H(Y) = 0.5 \log_2 (2\pi e \sigma_y^2)$$
❑ Note: $\sigma_y^2 = \sigma_x^2 + \sigma_n^2$, where $\sigma_x^2$ is the signal power and $\sigma_n^2$ is the noise power.



Example: AWGN
$$C' = 0.5 \log_2 (2\pi e \sigma_y^2) - 0.5 \log_2 (2\pi e \sigma_n^2) = 0.5 \log_2 \frac{\sigma_y^2}{\sigma_n^2}$$
$$C' = 0.5 \log_2 \left(1 + \frac{\sigma_x^2}{\sigma_n^2}\right) = 0.5 \log_2 \left(1 + \frac{S}{N}\right)$$

❑ Note: The channel capacity depends on the signal-to-noise ratio.



Shannon’s Second Theorem
“Reliable communication over discrete memoryless channel
is possible if the communication rate 𝑅 satisfies 𝑅 < 𝐶,
where 𝐶 is the channel capacity. At rates higher than
capacity, reliable communication is not possible.”
❑ For a noisy channel, transmitting at a rate faster than the capacity makes the received data unreliable (effectively garbage) due to the large number of errors.

𝑥 → Channel → 𝑦



Information Transfer Analogy

❑ Task: Get water from a well and bring it home.

❑ The amount of water you can carry is limited by:
▪ Pail size and your strength (resource limitation)
▪ External factors affecting the process (shaking, etc.)



Capacity of Band-limited
Gaussian Channel
❑ The maximum bit rate that a noisy channel can accommodate follows from the previous result:
$$C' = \frac{1}{2} \log_2 \left(1 + \frac{S}{N}\right)\ \ \text{bits/symbol}$$
❑ If the channel bandwidth is $B_c$, then the maximum symbol rate is $R_c = 2 B_c$ symbols/s.
❑ The maximum bit rate on the noisy channel is therefore:
$$C = R_c\, C' = B_c \log_2 \left(1 + \frac{S}{N}\right)\ \ \text{bits/s}$$
❑ This is the Shannon-Hartley Capacity Theorem.
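❑ A one-line calculation of this formula (my own sketch; the bandwidth and SNR below are illustrative values, not from the slides):

```python
import numpy as np

def shannon_hartley_capacity(bandwidth_hz, snr_db):
    """C = B * log2(1 + S/N) in bits/s, with the SNR given in dB."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * np.log2(1 + snr_linear)

# Example: a 3.1 kHz telephone-type channel at 30 dB SNR
print(shannon_hartley_capacity(3100, 30))   # about 30,900 bits/s
```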



Shannon Limit
❑ The signal power is $S = E_b R_b$, where $R_b$ is the bit rate and $E_b$ is the energy per bit. The noise power is $N = N_0 B_c$, where $B_c$ is the channel bandwidth.
❑ Shannon-Hartley capacity:
$$C = B_c \log_2 \left(1 + \frac{E_b R_b}{N_0 B_c}\right) \quad\Rightarrow\quad \frac{C}{B_c} = \log_2 \left(1 + \frac{E_b}{N_0} \cdot \frac{R_b}{B_c}\right)$$
❑ Operating at capacity ($R_b = C$) and taking the limit as $B_c \to \infty$, the required $E_b/N_0$ approaches $\ln 2 \approx 0.693$, or $-1.6$ dB. This is the Shannon Limit.
❑ Any $E_b/N_0$ lower than this does not allow reliable communication, even with infinite bandwidth.
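❑ A numerical check of this limit (my own sketch): at capacity, $E_b/N_0 = (2^x - 1)/x$ with $x = C/B_c$, which tends to $\ln 2$ as the bandwidth grows ($x \to 0$).

```python
import numpy as np

for x in [1.0, 0.1, 0.01, 0.0001]:          # x = C / Bc, shrinking as Bc grows
    ebno = (2**x - 1) / x
    print(x, 10 * np.log10(ebno))           # required Eb/N0 in dB

print(10 * np.log10(np.log(2)))             # limiting value: ln 2 -> about -1.59 dB
```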



(Bandwidth-efficiency plane shown on the slide: it marks the region where $R_b > C$, the bandwidth-limited region (more spectrally efficient), and the power-limited region (more energy efficient).)

