
ECE 141 Lecture 10:

Information Theory
Entropy, Joint Entropy, Conditional Entropy, Mutual Information, Source Coding, Channel Coding



What is Information?
❑ In our daily lives, information is often obtained by learning something new.
❑ Consider some example sentences that convey “information”:
▪ The weather will be good tomorrow
▪ The weather was bad last Sunday
▪ I will get a passing grade in ECE 141 this semester
▪ I will be able to pass all my subjects this semester
Which of these statements do you think contains the most information?
❑ In other words, one gets information when learning about something that he or she was uncertain about beforehand.

Shannon: “Information is the resolution of uncertainty”



Discrete Sources
Data Source → 𝑌

❑ The data output 𝑌 is a discrete random variable coming from a set with finite cardinality:
$$Y \in \{y_1, y_2, \dots, y_N\}$$
❑ Each element of the set has a probability of being the output of the data source, $P(Y)$. It follows that:
$$\sum_{k=1}^{N} P(y_k) = 1$$
❑ Examples: English Alphabet, Morse Code, Braille Code



Mutual Information
Data Source → 𝑋 → Channel → 𝑌

❑ Consider two data sources with outputs 𝑋 and 𝑌.
❑ Suppose we are observing 𝑌; we want to know how much information about 𝑋 we can get from that observation.
❑ This is an a posteriori problem:
$$P(X = x \mid Y = y) = P(x \mid y)$$
❑ Here $x \in \bar{X}$ and $y \in \bar{Y}$, where $\bar{X}$ and $\bar{Y}$ are arbitrary sets of finite cardinality.



Mutual Information
Data Source → 𝑋 → Channel → 𝑌

❑ If the value of 𝑌 depends on 𝑋, then we must also consider the probability distribution of 𝑋, $P(X = x)$.
❑ The mutual information between the outcome 𝑌 = 𝑦 and the outcome 𝑋 = 𝑥 is defined as:
$$I(x, y) = \log \frac{P(x \mid y)}{P(x)}$$
❑ This is the amount of information that observing 𝑌 = 𝑦 tells about 𝑋 = 𝑥.



Mutual Information
Data Source → 𝑋 → Channel → 𝑌

❑ The mutual information between random variables 𝑋 and 𝑌 is the sum of the mutual information between their possible values, weighted by the joint probabilities:
$$I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(X = x, Y = y) \log \frac{P(x \mid y)}{P(x)}$$

❑ The units depend on the logarithm base: base 𝑒 gives nats, while base 2 gives bits.
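❑ For illustration (not part of the original slides), a minimal Python sketch that evaluates this double sum for an assumed joint PMF; the function name and the example numbers are hypothetical:

```python
import numpy as np

def mutual_information(p_xy, base=2.0):
    """Evaluate I(X, Y) from a joint PMF given as a 2-D array (rows: x, columns: y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)     # marginal P(x), column vector
    p_y = p_xy.sum(axis=0, keepdims=True)     # marginal P(y), row vector
    mask = p_xy > 0                           # terms with P(x, y) = 0 contribute nothing
    ratio = p_xy[mask] / (p_x @ p_y)[mask]    # P(x, y)/(P(x)P(y)) = P(x|y)/P(x)
    return float(np.sum(p_xy[mask] * np.log(ratio)) / np.log(base))

# Hypothetical joint PMF of a noisy binary channel with equiprobable inputs
p_joint = [[0.45, 0.05],
           [0.05, 0.45]]
print(mutual_information(p_joint))   # about 0.531 bits
```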



Properties of Mutual
Information
❑ Property 1: Mutual information is symmetric.
$$I(X, Y) = I(Y, X)$$
❑ Proof:
$$I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(X = x, Y = y) \log \frac{P(x \mid y)}{P(x)}$$
$$P(x \mid y)\,P(y) = P(y \mid x)\,P(x) \;\Rightarrow\; \frac{P(x \mid y)}{P(x)} = \frac{P(y \mid x)}{P(y)}$$
$$I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(X = x, Y = y) \log \frac{P(y \mid x)}{P(y)} = I(Y, X)$$



Properties of Mutual
Information
❑ Property 2: Mutual information is non-negative.
$$I(X, Y) \ge 0$$
❑ Proof: rewrite the definition using $P(x \mid y) = P(x, y)/P(y)$,
$$I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(x, y) \log \frac{P(x, y)}{P(x)\,P(y)}$$
Then, since $\log$ is concave, Jensen's inequality gives
$$-I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(x, y) \log \frac{P(x)\,P(y)}{P(x, y)} \le \log \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(x)\,P(y) = \log 1 = 0$$
$$\therefore \; I(X, Y) \ge 0$$
with equality if and only if 𝑋 and 𝑌 are independent.



Independent Events
❑ Recall from probability theory that two random variables 𝑋 and 𝑌 are independent of each other if:
$$P(X, Y) = P(X)\,P(Y)$$
❑ Note that if they are independent:
$$P(X \mid Y) = \frac{P(X, Y)}{P(Y)} = P(X)$$
❑ The mutual information between independent random variables is therefore:
$$I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(x, y) \log \frac{P(x)}{P(x)} = 0$$
❑ This means that observing 𝑌 DOES NOT give any information about 𝑋.



Dependent Events
❑ In the extreme case that observing 𝑌 uniquely determines the value of 𝑋, i.e., knowing 𝑌 means knowing the value of 𝑋 with certainty, for every $y \in \bar{Y}$ there is a single $x \in \bar{X}$ with:
$$P(x \mid y) = 1$$
❑ The mutual information becomes:
$$I(X, Y) = \sum_{x \in \bar{X}} \sum_{y \in \bar{Y}} P(x, y) \log \frac{1}{P(x)} = \sum_{x \in \bar{X}} P(x) \log \frac{1}{P(x)} = -\sum_{x \in \bar{X}} P(x) \log P(x)$$
❑ $I(X, Y)$ in this case is called the entropy of the source.



Independent vs. Dependent
Events
Data Source → 𝑋 → Channel → 𝑌   (𝑋 is independent of 𝑌)

Data Source → 𝑋 → Channel → 𝑌   (𝑋 is completely dependent on 𝑌)



Entropy
Data Source → 𝑋

$$H(X) = -\sum_{x \in \bar{X}} P(x) \log P(x)$$
❑ The entropy of a random variable 𝑋 is a measure of
uncertainty or ambiguity.
❑ If we observe 𝑋, uncertainty about it is removed. By
consequence, 𝐻 𝑋 can be thought of as the measure of
information acquired by observing 𝑋.
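❑ A small Python sketch of this definition (not from the original slides; the helper name is mine), useful for checking the numerical examples later on:

```python
import numpy as np

def entropy(pmf, base=2.0):
    """H(X) = -sum_x P(x) log P(x); zero-probability symbols are skipped (0 log 0 := 0)."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(base))

print(entropy([0.5, 0.5]))         # 1.0 bit    (fair coin)
print(entropy([1/6] * 6))          # 2.585 bits (fair die)
print(entropy([1.0, 0.0]))         # 0.0 bits   (no uncertainty)
```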



Properties of Entropy
1. $0 \le H(X) \le \log c(\bar{X})$
▪ Entropy is non-negative, and its maximum value is the logarithm of the cardinality $c(\bar{X})$ of the symbol set.
2. $I(X, X) = H(X)$
▪ The mutual information between 𝑋 and itself is equal to its entropy.
▪ Entropy is also called self-information.
3. $I(X, Y) \le \min\{H(X), H(Y)\}$
▪ The mutual information between two random variables is at most the entropy of the variable with the smaller entropy.
4. If $Y = g(X)$, then $H(Y) \le H(X)$
▪ The output of a function has at most as much entropy as its input.



Bounds of Entropy
$$0 \le H(X) \le \log c(\bar{X})$$
❑ Entropy is a measure of uncertainty, so if $H(X) = 0$ we are always certain of the value of 𝑋. This occurs when, for a single symbol:
$$P(x) = 1, \quad x \in \bar{X}$$
❑ The source is then always generating that same symbol.
❑ At the other extreme, we are most uncertain when $H(X) = \log c(\bar{X})$. This occurs when:
$$P(x) = \frac{1}{c(\bar{X})} \quad \forall x \in \bar{X}$$
❑ Equiprobable symbols give the most uncertainty, since the source has no bias in generating symbols.



Example: Binary Source
❑ For a binary source with $P(1) = p$ and $P(0) = 1 - p$, solve for the entropy in bits:
$$H(X) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$
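❑ As a quick numerical illustration (my own sketch, not from the slides), the binary entropy function can be evaluated for a few values of $p$; it peaks at 1 bit when $p = 0.5$ and is 0 at $p = 0$ or $p = 1$:

```python
import numpy as np

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2(1 - p), with H(0) = H(1) = 0."""
    p = np.asarray(p, dtype=float)
    h = np.zeros_like(p)
    inside = (p > 0) & (p < 1)
    q = p[inside]
    h[inside] = -q * np.log2(q) - (1 - q) * np.log2(1 - q)
    return h

print(binary_entropy(np.array([0.0, 0.1, 0.5, 0.9, 1.0])))
# [0.    0.469 1.    0.469 0.   ]  -> maximum of 1 bit at p = 0.5
```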



Example: Gaussian Random
Variable
❑ For a continuous source, we change the summation to an integral (differential entropy):
$$H(x) = -\int p(x) \log p(x)\, dx$$
❑ Recall that a Gaussian random variable is defined by the PDF:
$$p(x) = \frac{1}{\sqrt{2\pi\sigma_x^2}}\, e^{-\frac{(x - \bar{x})^2}{2\sigma_x^2}}$$
❑ The entropy (in bits) is:
$$H(x) = 0.5 \log_2 \left(2\pi e \sigma_x^2\right)$$
❑ Note: It can be shown that, for a given variance $\sigma_x^2$, the Gaussian PDF has the largest differential entropy.
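❑ A numerical cross-check (my own sketch, with an assumed $\sigma_x = 2$): approximate the integral on a fine grid and compare it with the closed-form expression above.

```python
import numpy as np

sigma = 2.0
x = np.linspace(-10 * sigma, 10 * sigma, 200_001)
p = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

numeric = -np.trapz(p * np.log2(p), x)               # -∫ p(x) log2 p(x) dx, in bits
closed_form = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
print(numeric, closed_form)                          # both about 3.047 bits
```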



Joint Entropy
❑ For two random variables 𝑋 and 𝑌 with joint probability $P(X, Y)$, the joint entropy between them is:
$$H(X, Y) = -\sum_{(x, y) \in \bar{X} \times \bar{Y}} P(x, y) \log P(x, y)$$
❑ It represents the amount of information in the joint outcome of the two random variables.
❑ For multiple random variables:
$$H(X_1, \dots, X_N) = -\sum_{(x_1, \dots, x_N) \in \bar{X}_1 \times \dots \times \bar{X}_N} P(x_1, \dots, x_N) \log P(x_1, \dots, x_N)$$



Conditional Entropy
❑ If the random variable 𝑋 is given to be 𝑥, then the entropy of 𝑌 becomes:
$$H(Y \mid X = x) = -\sum_{y \in \bar{Y}} P(Y = y \mid X = x) \log P(Y = y \mid X = x)$$
❑ Averaging this term over all possible values of 𝑋, we get the conditional entropy of 𝑌 given 𝑋:
$$H(Y \mid X) = \sum_{x \in \bar{X}} P(x)\, H(Y \mid X = x)$$
$$H(Y \mid X) = -\sum_{(x, y) \in \bar{X} \times \bar{Y}} P(X = x, Y = y) \log P(Y = y \mid X = x)$$
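❑ A short Python sketch (not from the slides; the joint PMF is an assumed example) that computes joint and conditional entropy and checks the identities used in the next slides:

```python
import numpy as np

def H(p, base=2.0):
    """Entropy of a PMF (flattened); zero entries are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(base))

# Assumed joint PMF P(x, y), rows: x, columns: y (for illustration only)
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
H_xy = H(p_xy)                       # joint entropy H(X, Y)
H_x_given_y = H_xy - H(p_y)          # chain rule: H(X|Y) = H(X, Y) - H(Y)
I_xy = H(p_x) - H_x_given_y          # I(X, Y) = H(X) - H(X|Y)

print(H_xy, H_x_given_y, I_xy)
```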



Properties of Joint and
Conditional Entropy
1. $H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)$
2. $0 \le H(X \mid Y) \le H(X)$
▪ Conditional entropy is non-negative and is bounded by the entropy of the random variable itself.
▪ It is maximum when 𝑋 and 𝑌 are independent of each other.
3. $H(X, Y) \le H(X) + H(Y)$
▪ The maximum value of the joint entropy is the sum of the entropies of the individual variables.
4. $I(X, Y) = H(X) + H(Y) - H(X, Y)$
▪ Mutual information is the information shared by both 𝑋 and 𝑌.



Interpretation of Conditional
Entropy
$$0 \le H(X \mid Y) \le H(X)$$
❑ Observing 𝑌 gives information about 𝑋.
❑ $H(X \mid Y)$ is the remaining uncertainty about 𝑋 when 𝑌 is observed. It is maximum when 𝑋 and 𝑌 are independent of each other, the case when 𝑌 gives no information about 𝑋.
❑ On the other hand, $H(X \mid Y) = 0$ if 𝑌 uniquely determines 𝑋: the entropy or uncertainty is completely removed.



Conditional Entropy and
Mutual Information
❑ Using properties 1 and 4 of joint and conditional entropy:
$$I(X, Y) = H(X) - H(X \mid Y), \qquad 0 \le H(X \mid Y) \le H(X)$$
❑ If 𝑌 uniquely determines 𝑋, then $I(X, Y) = H(X)$, which means complete information about 𝑋 can be obtained by observing 𝑌.
❑ If 𝑋 is independent of 𝑌, then $I(X, Y) = 0$, which means no information about 𝑋 can be obtained by observing 𝑌.



ECE 141 Lecture 10:
Information Theory
Entropy, Joint Entropy, Conditional Entropy, Mutual Information, Source Coding, Channel Coding



Source Coding
❑ A discrete source transmits or “conveys” information to its
destination.
❑ Information, however, is abstract. Entropy defines a measure of
information but how does a system interpret it?
❑ Source coding is a method of representing information.
❑ The goal of source coding is to represent the information from a discrete source without errors, using as few bits as possible.
❑ Example:
$$\{0, 1, 2, 3\} \rightarrow \{00, 01, 10, 11\}$$



More Examples of Source
Coding
❑ The letter “A” can be represented in different forms:

Form            Representation
Spoken          [ɑ]
Written         A, a
Sign Language   (hand shape, shown graphically on the slide)
Morse Code      · −
Braille Code    ⠁
ASCII Code      065
Binary Code     01000001



Source Coding Theorem
❑ Assumptions: (1) the data is noiseless, (2) the source alphabet has a finite number of elements, and (3) the statistics of the source are known.
❑ Goal: to call a representation optimal, the average number of bits used to faithfully represent the source alphabet must be at a minimum.
❑ Intuition:
▪ Assign shorter (longer) codewords to frequent (rare) source symbols.
▪ The average codeword length is:
$$\bar{L} = \sum_{k=0}^{K-1} l_k\, p_k, \quad \text{where } l_k \text{ is the codeword length of the } k\text{-th source symbol and } p_k \text{ is its probability}$$
▪ Coding efficiency is defined as $\eta = L_{\min} / \bar{L}$, where $L_{\min}$ is the optimal (minimum possible) average codeword length; see the sketch after this list.
▪ Q: How do we determine $L_{\min}$?
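❑ A small sketch of these two definitions (mine, not from the slides), using made-up dyadic probabilities for which a code can reach $\eta = 1$; $L_{\min} = H(X)$ anticipates the theorem on the next slides:

```python
import numpy as np

def average_length(lengths, probs):
    """L_bar = sum_k l_k * p_k."""
    return float(np.dot(lengths, probs))

def coding_efficiency(lengths, probs):
    """eta = L_min / L_bar, taking L_min = H(X) (lossless source coding theorem)."""
    p = np.asarray(probs, dtype=float)
    H = float(-np.sum(p[p > 0] * np.log2(p[p > 0])))
    return H / average_length(lengths, p)

# Hypothetical source with dyadic probabilities and matching codeword lengths
p = [0.5, 0.25, 0.125, 0.125]
l = [1, 2, 3, 3]
print(average_length(l, p), coding_efficiency(l, p))   # 1.75 bits/symbol, eta = 1.0
```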



Lossless Source Coding
Theorem
❑ Assume a stream of information 𝒙 of very large length 𝑛 whose symbols are taken from the set $\{a_1, a_2, \dots, a_N\}$. Let the source be memoryless and let 𝒙 be a typical transmission.
❑ If $p_i$ is the probability that $a_i$ occurs, then the number of times $a_i$ appears in the stream is approximately $n p_i$.
❑ The probability that the transmission is 𝒙 therefore satisfies:
$$\log_2 P(\boldsymbol{X} = \boldsymbol{x}) \approx \log_2 \prod_{i=1}^{N} p_i^{\,n p_i} = \sum_{i=1}^{N} n p_i \log_2 p_i = -n H(X)$$
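❑ An empirical check of this statement (my own sketch, with an assumed three-symbol source): for a long i.i.d. stream, $\log_2 P(\boldsymbol{x})/n$ is close to $-H(X)$.

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.5, 0.3, 0.2])           # assumed source probabilities
n = 100_000
stream = rng.choice(len(p), size=n, p=p)

log2_prob = np.sum(np.log2(p[stream]))   # log2 of the probability of this exact stream
H = -np.sum(p * np.log2(p))              # source entropy in bits/symbol

print(log2_prob / n, -H)                 # both close to -1.485
```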



Lossless Source Coding
Theorem
❑ Since 𝒙 is a typical sequence from the set 𝑨 of all typical sequences and 𝑛 is very large, the typical sequences are essentially equiprobable, so the entropy of the stream 𝑿 is at its maximum:
$$H(\boldsymbol{X}) = n H(X) = \log_2 c(\boldsymbol{A}) \quad\Rightarrow\quad c(\boldsymbol{A}) = \frac{1}{P(\boldsymbol{X} = \boldsymbol{x})} = 2^{\,n H(X)}$$
❑ This means that a stream of information with length 𝑛 can be represented with $n H(X)$ bits!
❑ It follows that each symbol in the stream is represented by:
$$\bar{L} \approx \frac{n H(X)\ \text{bits}}{n\ \text{symbols}} = H(X)\ \text{bits/symbol}$$
❑ This is Shannon’s First Theorem, or the Lossless Source Coding Theorem.



Shannon’s First Theorem
“Let 𝑋 denote a discrete memoryless source with
entropy 𝐻 𝑋 . There exists a lossless source code for
this source at any rate 𝐿ത if 𝐿ത > 𝐻 𝑋 . There exists no
lossless code for this source at rates less than 𝐻 𝑋 .”
❑ Any representation of the information with fewer than $H(X)$ bits per symbol becomes a lossy compression.
❑ The rate $\bar{L}$, also denoted $R$, is called the average codeword length.

Data Source → 𝑋



Example
❑ Represent the symbol set $X \in \{0, 1, 2, 3, 4, 5\}$ assuming equiprobable symbols.
❑ The entropy is:
$$H(X) = \log_2 6 = 2.585\ \text{bits}$$
❑ Theoretically, we can use as few as 2.585 bits per symbol on average to represent the set.

Symbol  Representation      Symbol  Representation
0       000                 3       011
1       001                 4       100
2       010                 5       101

❑ Now, since there are 6 symbols, 3 bits can be used, which gives 2³ = 8 possible combinations. That leaves 2 unused combinations for this set.



Example
❑ Represent the symbol set $X \in \{0, 1, 2, 3, 4, 5\}$ assuming equiprobable symbols.
❑ If we use 2 bits per symbol:

Symbol  Representation      Symbol  Representation
0       00                  3       11
1       01                  4       −
2       10                  5       −

❑ Only 2² = 4 combinations are possible. That leaves 2 symbols that cannot be represented.
❑ That is, there is a loss of two symbols: while the number of bits is reduced, some loss of information occurs.



Example
❑ Represent the symbol set $X \in \{0, 1, 2, 3, 4, 5\}$ assuming equiprobable symbols.
❑ Now try:

Symbol  Representation      Symbol  Representation
0       000                 3       011
1       001                 4       10
2       010                 5       11

❑ This representation is valid, that is, it does not cause confusion when the receiver decodes it (no codeword is a prefix of another).
❑ The average codeword length is:
$$\bar{L} = \frac{3(4) + 2(2)}{6} = 2.667\ \text{bits/symbol}$$



Huffman Coding
❑ A type of variable-length coding algorithm.
❑ Steps (a short code sketch follows this list):
1. The two symbols with the lowest probabilities are assigned bits 0 and 1. It doesn’t matter which symbol gets which bit.
2. Add the probabilities of those two symbols to form a combined symbol.
3. Repeat step 1 on the remaining symbols together with the combined symbols until the total probability is 1.
4. Tracing the assigned bits from the final combination back to each original symbol gives the representation of the different symbols.
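❑ A minimal Python sketch of this procedure (my own implementation, not from the slides); ties are broken arbitrarily, so the exact bits may differ from the slides, but the codeword lengths are the same:

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}; returns {symbol: bitstring}."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # lowest probability  -> prepend bit 0
        p1, _, c1 = heapq.heappop(heap)   # next lowest         -> prepend bit 1
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

code = huffman_code({0: 0.3, 1: 0.25, 2: 0.15, 3: 0.12, 4: 0.1, 5: 0.08})
print(code)   # codeword lengths: 2, 2, 3, 3, 3, 3
```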



Example
❑ Represent the symbol set $X \in \{0, 1, 2, 3, 4, 5\}$ assuming equiprobable symbols.

(Huffman tree shown on the slide: the six 1/6 probabilities are paired into three 1/3 groups; two of these combine into 2/3, which then combines with the remaining 1/3 to reach a total probability of 1.)

Symbol  Representation
0       000
1       001
2       010
3       011
4       10
5       11



Example
❑ Represent the symbol set $X \in \{0, 1, 2, 3, 4, 5\}$ assuming the probabilities are $\{0.3, 0.25, 0.15, 0.12, 0.1, 0.08\}$.

(Huffman tree shown on the slide: 0.1 + 0.08 = 0.18, then 0.15 + 0.12 = 0.27, then 0.25 + 0.18 = 0.43, then 0.3 + 0.27 = 0.57, and finally 0.57 + 0.43 = 1.)

Symbol  Representation
0       00
1       10
2       010
3       011
4       110
5       111
❑ Note: $H(X) = 2.422$ bits/symbol and $\bar{L} = 2.45$ bits/symbol.
❑ This results in an efficiency of $\eta = H(X)/\bar{L} = 98.86\%$.
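❑ These numbers can be reproduced directly from the table above (a quick check of my own, with the codeword lengths read off the table):

```python
import numpy as np

p = np.array([0.3, 0.25, 0.15, 0.12, 0.1, 0.08])
lengths = np.array([2, 2, 3, 3, 3, 3])      # lengths of the Huffman codewords above

H = -np.sum(p * np.log2(p))                 # 2.422 bits/symbol
L_bar = np.dot(p, lengths)                  # 2.45  bits/symbol
print(H, L_bar, H / L_bar)                  # efficiency about 0.9886
```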



Huffman Coding Notes
❑ The average codeword length of a Huffman code is bounded by:
$$H(X) \le \bar{L} < H(X) + 1$$
❑ In the previous examples, no codeword is a prefix of another codeword, which gives the set of codes unique decodability (in fact, instantaneous decodability).
❑ Decreasing $\bar{L}$ below $H(X)$ would produce errors; such a code necessarily conveys less information than the source produces.



Increasing the Efficiency
❑ If we combine 𝑛 source symbols into one “supersymbol” and use a Huffman code to represent the supersymbols, the average number of bits per original symbol becomes bounded by:
$$H(X) \le \bar{L} \le H(X) + \frac{1}{n}$$
❑ This is referred to as the 𝑛-th extension of the code.
❑ As $n \to \infty$, the code approaches the theoretical minimum, $H(X)$.



Example:
❑ Binary source with $p(a_1) = 0.9$ and $p(a_2) = 0.1$.
❑ If each symbol is coded individually, the average codeword length is 1 bit/symbol, while the theoretical minimum is $H(X) = 0.47$ bits/symbol.
❑ Pair them up (2nd extension; the Huffman tree on the slide combines 0.01 + 0.09 = 0.10, then 0.10 + 0.09 = 0.19, then 0.19 + 0.81 = 1):

Symbol  Probability  Representation
a1a1    0.81         0
a1a2    0.09         10
a2a1    0.09         110
a2a2    0.01         111

❑ $\bar{L} = 1.29$ bits per pair $= 0.645$ bits/symbol.
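❑ A quick check of this figure (my own sketch, with the lengths taken from the table above):

```python
import numpy as np

# 2nd extension of the binary source p(a1) = 0.9, p(a2) = 0.1
pair_probs = np.array([0.81, 0.09, 0.09, 0.01])   # a1a1, a1a2, a2a1, a2a2
pair_lengths = np.array([1, 2, 3, 3])

L_per_pair = np.dot(pair_probs, pair_lengths)      # 1.29 bits per pair
H = -(0.9 * np.log2(0.9) + 0.1 * np.log2(0.1))     # 0.469 bits/symbol

print(L_per_pair / 2, H)   # 0.645 bits/symbol, approaching H(X) as n grows
```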



Channel
𝑥 → Channel → 𝑦

❑ A symbol 𝑥 goes in and a noise-corrupted version, 𝑦, comes out of the channel.
❑ A stream of input symbols $\boldsymbol{x} = (x_1, x_2, \dots, x_n)$ produces output symbols $\boldsymbol{y} = (y_1, y_2, \dots, y_n)$.
❑ The channel is called memoryless if:
$$P(\boldsymbol{y} \mid \boldsymbol{x}) = \prod_{i=1}^{n} P(y_i \mid x_i)$$
❑ This means that each output $y_i$ depends only on the corresponding input $x_i$.



Binary Symmetric Channel
(BSC)
❑ The input and output symbols both come from the set $\{0, 1\}$.
❑ Let the probability of error (crossover probability) be 𝑝; then:
$$P(y = 0 \mid x = 0) = P(y = 1 \mid x = 1) = 1 - p$$
$$P(y = 1 \mid x = 0) = P(y = 0 \mid x = 1) = p$$

(Transition diagram: 0 → 0 and 1 → 1 with probability $1 - p$; 0 → 1 and 1 → 0 with probability $p$.)
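❑ As an illustration (my own sketch, not from the slides), the mutual information of a BSC with equiprobable inputs can be computed directly from its joint PMF; it falls from 1 bit at $p = 0$ to 0 bits at $p = 0.5$:

```python
import numpy as np

def bsc_mutual_information(p, base=2.0):
    """I(X, Y) of a binary symmetric channel with crossover p and equiprobable inputs."""
    p_x = np.array([0.5, 0.5])
    trans = np.array([[1 - p, p],          # P(y | x = 0)
                      [p, 1 - p]])         # P(y | x = 1)
    p_xy = p_x[:, None] * trans            # joint P(x, y)
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0
    ratio = p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask]
    return float(np.sum(p_xy[mask] * np.log(ratio)) / np.log(base))

for p in [0.0, 0.1, 0.5]:
    print(p, bsc_mutual_information(p))   # 1.0, about 0.531, 0.0 bits
```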



Transition Probability Matrix
❑ In a discrete channel, the conditional probability that the output is $y_k$ given that the input is $x_i$ is $P_{ki} = P(y_k \mid x_i)$. The transition probability matrix has the form:
$$\mathbf{P} = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1N} \\ P_{21} & P_{22} & \cdots & P_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ P_{N1} & P_{N2} & \cdots & P_{NN} \end{bmatrix}$$
$$\begin{bmatrix} P(y_1) \\ P(y_2) \\ \vdots \\ P(y_N) \end{bmatrix} = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1N} \\ P_{21} & P_{22} & \cdots & P_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ P_{N1} & P_{N2} & \cdots & P_{NN} \end{bmatrix} \begin{bmatrix} P(x_1) \\ P(x_2) \\ \vdots \\ P(x_N) \end{bmatrix}$$
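❑ A small numerical illustration of this matrix-vector relation (the channel matrix and input distribution below are assumed, not from the slides):

```python
import numpy as np

# Output distribution P(y) = P @ P(x) for an assumed 3-symbol channel.
# Rows are outputs y_k, columns are inputs x_i, entries are P(y_k | x_i);
# every column therefore sums to 1.
P = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
p_x = np.array([0.5, 0.3, 0.2])

p_y = P @ p_x
print(p_y, p_y.sum())   # [0.43 0.36 0.21], sums to 1
```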



Channel Capacity
❑ The channel capacity is the maximum mutual information $I(X, Y)$ that can be transmitted through the channel:
$$C' = \max_{p(x)} I(X, Y)\ \ \text{bits/symbol}, \qquad C = C' R_c\ \ \text{bits/s}$$
❑ Here $R_c$ is the symbol rate, $C'$ is the maximum mutual information that can be transferred per symbol, and $C$ is the maximum bit rate that can be transmitted.



Example: AWGN
❑ The output 𝑌 is continuous and the input 𝑋 is discrete.
$$I(X, Y) = H(Y) - H(Y \mid X)$$
$$H(Y \mid X) = -\int \left[ \int P(y \mid x) \log_2 P(y \mid x)\, dy \right] P(x)\, dx$$

❑ The noise is Gaussian:
$$P(y \mid x) = \frac{1}{\sqrt{\pi N_0}}\, e^{-\frac{(y - x)^2}{N_0}}$$
❑ Note that, using the entropy of a Gaussian distribution:
$$H(Y \mid X) = 0.5 \log_2 (\pi e N_0)$$



Example: AWGN
❑ The channel capacity becomes:
$$C' = \max_{p(x)} H(Y) - 0.5 \log_2 (\pi e N_0) = \max_{p(x)} H(Y) - 0.5 \log_2 (2\pi e \sigma_n^2)$$
where $\sigma_n^2 = N_0 / 2$ is the noise variance.
❑ The maximum of $H(Y)$ occurs when 𝑌 is a Gaussian random variable. If its variance is $\sigma_y^2$, then:
$$H(Y) = 0.5 \log_2 (2\pi e \sigma_y^2)$$
❑ Note: $\sigma_y^2 = \sigma_x^2 + \sigma_n^2$, where $\sigma_x^2$ is the signal power and $\sigma_n^2$ is the noise power.



Example: AWGN
$$C' = 0.5 \log_2 (2\pi e \sigma_y^2) - 0.5 \log_2 (2\pi e \sigma_n^2) = 0.5 \log_2 \frac{\sigma_y^2}{\sigma_n^2}$$
$$C' = 0.5 \log_2 \left(1 + \frac{\sigma_x^2}{\sigma_n^2}\right) = 0.5 \log_2 \left(1 + \frac{S}{N}\right)$$

❑ Note: The channel capacity depends on the signal-to-noise ratio.



Shannon’s Second Theorem
“Reliable communication over discrete memoryless channel
is possible if the communication rate 𝑅 satisfies 𝑅 < 𝐶,
where 𝐶 is the channel capacity. At rates higher than
capacity, reliable communication is not possible.”
❑ For a noisy channel, transmitting at a rate faster than the capacity makes the received data unreliable (effectively garbage) due to the large number of errors.

𝑥 → Channel → 𝑦



Information Transfer Analogy

❑ Task: Get water from a well and bring it home.

❑ The amount of water you can carry is limited by:
▪ Pail size and your strength (resource limitation)
▪ External factors affecting the process (shaking, etc.)



Capacity of Band-limited
Gaussian Channel
❑ The maximum bit rate that a noisy channel can accommodate follows from the previous result:
$$C' = \frac{1}{2} \log_2 \left(1 + \frac{S}{N}\right)\ \ \text{bits/symbol}$$
❑ If the channel bandwidth is $B_c$, then the maximum symbol rate is $R_c = 2 B_c$ symbols/s.
❑ The maximum bit rate on the noisy channel is therefore:
$$C = R_c\, C' = B_c \log_2 \left(1 + \frac{S}{N}\right)\ \ \text{bits/s}$$
❑ This is the Shannon-Hartley Capacity Theorem.
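❑ A one-line calculation of this formula (my own sketch; the bandwidth and SNR below are illustrative values, not from the slides):

```python
import numpy as np

def shannon_hartley_capacity(bandwidth_hz, snr_db):
    """C = B * log2(1 + S/N) in bits/s, with the SNR given in dB."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * np.log2(1 + snr_linear)

# Example: a 3.1 kHz telephone-type channel at 30 dB SNR
print(shannon_hartley_capacity(3100, 30))   # about 30,900 bits/s
```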



Shannon Limit
❑ The signal power is $S = E_b R_b$, where $R_b$ is the bit rate and $E_b$ is the energy per bit. The noise power is $N = N_0 B_c$, where $B_c$ is the channel bandwidth.
❑ Shannon-Hartley capacity:
$$C = B_c \log_2 \left(1 + \frac{E_b R_b}{N_0 B_c}\right) \quad\Rightarrow\quad \frac{C}{B_c} = \log_2 \left(1 + \frac{E_b}{N_0} \cdot \frac{R_b}{B_c}\right)$$
❑ Operating at capacity ($R_b = C$) and taking the limit as $B_c \to \infty$, the required $E_b/N_0$ approaches $\ln 2 \approx 0.693$, or $-1.6$ dB. This is the Shannon Limit.
❑ Any $E_b/N_0$ lower than this does not allow reliable communication, even with infinite bandwidth.
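❑ A numerical check of this limit (my own sketch): at capacity, $E_b/N_0 = (2^x - 1)/x$ with $x = C/B_c$, which tends to $\ln 2$ as the bandwidth grows ($x \to 0$).

```python
import numpy as np

for x in [1.0, 0.1, 0.01, 0.0001]:          # x = C / Bc, shrinking as Bc grows
    ebno = (2**x - 1) / x
    print(x, 10 * np.log10(ebno))           # required Eb/N0 in dB

print(10 * np.log10(np.log(2)))             # limiting value: ln 2 -> about -1.59 dB
```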



(Bandwidth-efficiency plane shown on the slide: it marks the region where $R_b > C$, the bandwidth-limited region (more spectrally efficient), and the power-limited region (more energy efficient).)

