
Lecture 22: Final Review

• Nuts and bolts


• Fundamental questions and limits
• Tools
• Practical algorithms
• Future topics

Dr. Yao Xie, ECE587, Information Theory, Duke University


Basics



Nuts and bolts

• Entropy: H(X) = −∑_x p(x) log2 p(x) (bits)

• Differential entropy: H(X) = −∫ f(x) log f(x) dx (bits)

• Conditional entropy: H(X|Y ), joint entropy: H(X, Y )

• Mutual information: reduction in uncertainty due to another random variable:

I(X; Y) = H(X) − H(X|Y)
• Relative entropy: D(p||q) = ∑_x p(x) log (p(x)/q(x))
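A minimal numerical sketch of these definitions (NumPy assumed); the joint pmf pxy and the helper names are illustrative, not from the lecture.

import numpy as np

def entropy(p):
    # H = -sum p log2 p, with the convention 0 log 0 = 0
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(pxy):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

def kl_divergence(p, q):
    # D(p||q) = sum p log2(p/q); assumes q > 0 wherever p > 0
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])                    # toy joint pmf p(x, y)
print(entropy(pxy.sum(axis=1)))                 # H(X) = 1 bit
print(mutual_information(pxy))                  # I(X;Y) ~ 0.28 bits
print(kl_divergence(np.array([0.5, 0.5]), np.array([0.9, 0.1])))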



• For stochastic processes: entropy rate

H(X) = lim_{n→∞} H(X^n)/n = lim_{n→∞} H(X_n | X_{n−1}, . . . , X_1)

     = −∑_{ij} µ_i P_{ij} log P_{ij} for a first-order Markov chain
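A short sketch (NumPy assumed) of the Markov-chain entropy-rate formula: find the stationary distribution µ of a transition matrix P and evaluate −∑_ij µ_i P_ij log2 P_ij. The matrix below is illustrative and assumed to have strictly positive entries.

import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])                      # row-stochastic transition matrix

# stationary distribution: left eigenvector of P for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
mu = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
mu = mu / mu.sum()                              # here mu = [0.8, 0.2]

entropy_rate = -np.sum(mu[:, None] * P * np.log2(P))
print(mu, entropy_rate)                         # about 0.57 bits per step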

[Figure: recap of the basic definitions: Entropy H(X), Relative Entropy, and Mutual Information I(X;Y) = H(X) − H(X|Y)]



[Figure: Venn diagram relating H(X), H(Y), H(X,Y), H(X|Y), H(Y|X), and I(X;Y)]



Thou Shalt Know (In)equalities

• Chain rules:

H(X, Y) = H(X) + H(Y|X)

H(X_1, X_2, · · · , X_n) = ∑_{i=1}^n H(X_i | X_{i−1}, · · · , X_1)

I(X_1, X_2, · · · , X_n; Y) = ∑_{i=1}^n I(X_i; Y | X_{i−1}, · · · , X_1)

• Jensen’s inequality: if f is a convex function, Ef (X) ≥ f (EX).

• Conditioning reduces entropy: H(X|Y ) ≤ H(X)

• H(X) ≥ 0 (but differential entropy can be < 0), I(X; Y ) ≥ 0 (for both
discrete and continuous)



• Data processing inequality: X → Y → Z

I(X; Z) ≤ I(X; Y )

I(X; Z) ≤ I(Y ; Z)

• Fano's inequality:

P_e ≥ (H(X|Y) − 1) / log |X|
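A small numerical check of Fano's inequality (NumPy assumed), on an illustrative joint pmf where X is uniform on 4 symbols and the correct symbol is observed with probability 0.7: the error of the optimal (MAP) decoder stays above (H(X|Y) − 1)/log2|X|.

import numpy as np

m, eps = 4, 0.3
pxy = np.full((m, m), eps / (m - 1)) / m        # joint pmf p(x, y), rows indexed by x
np.fill_diagonal(pxy, (1 - eps) / m)

py = pxy.sum(axis=0)
H_x_given_y = -np.sum(pxy * np.log2(pxy / py))  # H(X|Y)
pe_map = 1 - np.sum(pxy.max(axis=0))            # error probability of the MAP decoder
fano_bound = (H_x_given_y - 1) / np.log2(m)
print(pe_map, fano_bound)                       # 0.30 >= 0.18, as Fano requires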



Fundamental Questions and Limits



Fundamental questions

[Figure: Source → Encoder → Transmitter → Physical Channel → Receiver → Decoder]

• Data compression limit (lossless source coding)

• Data transmission limit (channel capacity)

• Tradeoff between rate and distortion (lossy compression)



Codebook

[Figure: a codebook mapping messages to codewords]



Fundamental limits

Data compression limit: min I(X; X̂)        Data transmission limit: max I(X; Y)

Lossless compression:
• I(X; X̂) = H(X) − H(X|X̂)

• H(X|X̂) = 0, I(X; X̂) = H(X)



• Data compression limit: ∑_x p(x) l(x) ≥ H(X)



• Instantaneous code: ∑_{i=1}^m D^{−l_i} ≤ 1

• Optimal code length: l_i* = −log_D p_i
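A quick check of both facts, using the codewords {10, 110, 111} from the code tree on this slide and an illustrative dyadic source; only the Python standard library is used.

import math

D = 2
codewords = ["10", "110", "111"]                     # from the code tree on this slide
lengths = [len(c) for c in codewords]
print(sum(D ** (-l) for l in lengths))               # Kraft sum = 0.5 <= 1

probs = [0.5, 0.25, 0.25]                            # an illustrative dyadic source
print([-math.log(p, D) for p in probs])              # l_i* = -log_D p_i ~ [1, 2, 2]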

[Figure: binary code tree with codewords 10, 110, 111 hanging off the root]



Data compression limit: min I(X; X̂)        Data transmission limit: max I(X; Y)

Channel capacity:
• given fixed channel with transition probability p(y|x)

• C = max_{p(x)} I(X; Y)
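Since C = max_{p(x)} I(X; Y) is a concave maximization over p(x), it can be computed numerically. Below is a minimal Blahut-Arimoto sketch (NumPy assumed, strictly positive channel matrix assumed); W[x, y] holds p(y|x), here a BSC(0.1) for illustration.

import numpy as np

def blahut_arimoto(W, iters=200):
    # W[x, y] = p(y|x); assumes all entries are strictly positive
    r = np.full(W.shape[0], 1.0 / W.shape[0])   # current input distribution p(x)
    for _ in range(iters):
        q = r[:, None] * W                      # unnormalized posterior p(x|y)
        q /= q.sum(axis=0, keepdims=True)
        r = np.exp(np.sum(W * np.log(q), axis=1))
        r /= r.sum()
    p_y = r @ W
    I = np.sum(r[:, None] * W * np.log2(W / p_y))   # I(X;Y) in bits
    return I, r

W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(blahut_arimoto(W))                        # ~0.531 bits = 1 - H(0.1), p(x) uniform

For the BSC this simply recovers 1 − H(p), but the same iteration handles an arbitrary discrete memoryless channel.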



• BSC: C = 1 − H(p), Erasure: C = 1 − α

• Gaussian channel: C = (1/2) log(1 + P/N)

• water-filling

[Figure: water-filling: powers P1 and P2 fill Channels 1 and 2 above their noise levels N1 and N2 up to a common water level, while Channel 3 with noise N3 gets no power]
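A water-filling sketch (NumPy assumed): pick a water level ν by bisection so that the powers P_i = (ν − N_i)^+ exactly use the total power budget. The noise levels and budget are illustrative.

import numpy as np

def water_fill(noise, total_power, tol=1e-9):
    lo, hi = 0.0, noise.max() + total_power     # bracket the water level
    while hi - lo > tol:
        nu = 0.5 * (lo + hi)
        if np.sum(np.maximum(nu - noise, 0.0)) > total_power:
            hi = nu
        else:
            lo = nu
    powers = np.maximum(lo - noise, 0.0)        # P_i = (nu - N_i)^+
    capacity = 0.5 * np.sum(np.log2(1.0 + powers / noise))
    return powers, capacity

noise = np.array([1.0, 2.0, 4.0])               # N1, N2, N3
print(water_fill(noise, total_power=3.0))       # powers [2, 1, 0]: the worst channel stays dry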



!"

%&'()"!*" %&'()"!!"

$" #"



Data compression limit: min I(X; X̂)        Data transmission limit: max I(X; Y)

Rate-distortion:
• given source with distribution p(x)

• R(D) = min_{p(x̂|x): ∑ p(x)p(x̂|x) d(x,x̂) ≤ D} I(X; X̂)

• compute R(D): construct “test” channel



• Bernoulli source: R(D) = (H(p) − H(D))^+

• Gaussian source: R(D) = ((1/2) log(σ²/D))^+
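These closed forms are easy to evaluate; a small sketch (NumPy assumed; the function names are illustrative):

import numpy as np

def rd_bernoulli(p, D):
    # R(D) = (H(p) - H(D))^+, Bernoulli(p) source, Hamming distortion, 0 < D <= min(p, 1-p)
    h = lambda q: -q * np.log2(q) - (1 - q) * np.log2(1 - q)
    return max(h(p) - h(D), 0.0)

def rd_gaussian(sigma2, D):
    # R(D) = ((1/2) log2(sigma^2 / D))^+, N(0, sigma^2) source, squared-error distortion
    return max(0.5 * np.log2(sigma2 / D), 0.0)

print(rd_bernoulli(0.5, 0.1))                   # ~0.53 bits per symbol
print(rd_gaussian(1.0, 0.25))                   # 1 bit per symbol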



Tools



Tools: from probability

• Law of Large Numbers (LLN): if the x_n are independent and identically distributed,

(1/N) ∑_{n=1}^N x_n → E{X}, w.p. 1

• Variance: σ² = E{(X − µ)²} = E{X²} − µ²

• Central Limit Theorem (CLT):

(1/√(Nσ²)) ∑_{n=1}^N (x_n − µ) → N(0, 1)
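A quick simulation of both statements (NumPy assumed), using an illustrative Exponential(1) source, for which µ = 1 and σ² = 1.

import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, N = 1.0, 1.0, 100_000
x = rng.exponential(scale=1.0, size=N)

print(np.mean(x))                               # LLN: sample mean -> mu = 1
print(np.sum(x - mu) / np.sqrt(N * sigma2))     # CLT: a draw from roughly N(0, 1)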



Tools: AEP

• LLN for products of i.i.d. random variables:

(∏_{i=1}^n X_i)^{1/n} → e^{E(log X)}

• AEP:

(1/n) log(1/p(X_1, X_2, . . . , X_n)) → H(X)

p(X_1, X_2, . . . , X_n) ≈ 2^{−nH(X)}
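A Monte Carlo illustration of the AEP (NumPy assumed): the per-symbol log-likelihood of an i.i.d. sequence concentrates around H(X) as n grows; the pmf below is an assumption for illustration.

import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.25, 0.125, 0.125])         # illustrative source pmf
print(-np.sum(p * np.log2(p)))                  # H(X) = 1.75 bits

for n in (10, 100, 10_000):
    x = rng.choice(len(p), size=n, p=p)         # one sampled sequence X_1, ..., X_n
    print(n, -np.mean(np.log2(p[x])))           # -(1/n) log2 p(X^n) -> H(X)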

Divide all sequences in X^n into two sets:



Typicality

[Figure: the set of all sequences X^n, with |X|^n elements, split into the non-typical set and the typical set A_ε^(n), which has about 2^{n(H+ε)} elements]



Joint typicality
[Figure: jointly typical pairs (x^n, y^n) shown as scattered dots in the X^n × Y^n grid]



Tools: Maximum entropy

Discrete:
• H(X) ≤ log |X |, equality when X has uniform distribution

Continuous:
• H(X) ≤ (1/2) log((2πe)^n |K|) for E[XX^T] = K; equality when X ∼ N(0, K)
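A one-line numerical check (NumPy assumed) of the continuous statement in one dimension: for a fixed variance σ², the Gaussian has larger differential entropy than, say, a uniform density with the same variance.

import numpy as np

s2 = 1.0
h_gauss = 0.5 * np.log2(2 * np.pi * np.e * s2)  # (1/2) log2(2*pi*e*sigma^2)
h_unif = np.log2(np.sqrt(12 * s2))              # uniform interval with variance sigma^2
print(h_gauss, h_unif, h_gauss >= h_unif)       # ~2.05 vs ~1.79 bits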



Practical Algorithms



Practical algorithms: source coding

• Huffman, Shannon-Fano-Elias, Arithmetic code

Codeword length   Codeword   X   Probabilities (Huffman merging steps)
2                 01         1   0.25   0.3    0.45   0.55   1
2                 10         2   0.25   0.25   0.3    0.45
2                 11         3   0.2    0.25   0.25
3                 000        4   0.15   0.2
3                 001        5   0.15

This code has average length 2.3 bits.
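A compact Huffman sketch using heapq from the standard library; run on the source above it reproduces the 2.3-bit average length (the individual codewords may differ from the table, since optimal codes are not unique).

import heapq
from itertools import count

def huffman(probs):
    tie = count()                               # tiebreaker so the heap never compares dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)         # merge the two least likely groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

probs = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}
code = huffman(probs)
print(code, sum(probs[s] * len(w) for s, w in code.items()))   # average length 2.3 bits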



Practical algorithms: channel coding
!"#$%&'()*+,'' .*#/*)01*#%)'
+*-$' +*-$'

2%33"#4' 80&(*'
+*-$'

5$$-6
!9:.'
7*)*3*#'

?.2' :*)%&'

;'<=%&1%)>'-"%4&%3'



Practical algorithms: decoding
Viterbi algorithm
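A hard-decision Viterbi sketch (standard library only) for the classic rate-1/2, constraint-length-3 convolutional code with generators (7, 5) octal; the encoder, trellis bookkeeping, and test message are illustrative, not taken from the lecture.

def conv_step(state, u):
    # output pair and next state; state holds the bits (u[k-1], u[k-2])
    s1, s2 = (state >> 1) & 1, state & 1
    return (u ^ s1 ^ s2, u ^ s2), ((u << 1) | s1) & 0b11

def conv_encode(bits):
    state, out = 0, []
    for u in bits:
        (v1, v2), state = conv_step(state, u)
        out += [v1, v2]
    return out

def viterbi_decode(received, n_bits):
    INF = float("inf")
    metric = [0, INF, INF, INF]                 # start in the all-zero state
    paths = [[] for _ in range(4)]
    for k in range(n_bits):
        r = received[2 * k: 2 * k + 2]
        new_metric, new_paths = [INF] * 4, [None] * 4
        for s in range(4):
            if metric[s] == INF:
                continue
            for u in (0, 1):                    # extend each survivor by both inputs
                (v1, v2), nxt = conv_step(s, u)
                m = metric[s] + (v1 != r[0]) + (v2 != r[1])   # Hamming branch metric
                if m < new_metric[nxt]:
                    new_metric[nxt], new_paths[nxt] = m, paths[s] + [u]
        metric, paths = new_metric, new_paths
    return paths[min(range(4), key=lambda s: metric[s])]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
coded = conv_encode(bits)
coded[3] ^= 1                                   # flip one channel bit
print(viterbi_decode(coded, len(bits)) == bits) # True: the single error is corrected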



What are some future topics?



Distributed lossless coding
• Consider two correlated i.i.d. sources (U, V) ∼ p(u, v)

[Figure: U^n → Encoder 1 → index I; V^n → Encoder 2 → index J; a joint Decoder outputs (Û^n, V̂^n)]

Slepian-Wolf region: R1 ≥ H(U|V), R2 ≥ H(V|U), R1 + R2 ≥ H(U, V)

[Figure: the rate region in the (R1, R2) plane, with intercepts H(U|V), H(U) on the R1 axis and H(V|U), H(V) on the R2 axis]
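A tiny membership check for this rate region (NumPy assumed); the joint pmf and the rate pairs are illustrative.

import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

puv = np.array([[0.45, 0.05],
                [0.05, 0.45]])                  # toy joint pmf p(u, v)
H_uv = H(puv.ravel())
H_u_given_v = H_uv - H(puv.sum(axis=0))         # H(U|V) = H(U,V) - H(V)
H_v_given_u = H_uv - H(puv.sum(axis=1))         # H(V|U) = H(U,V) - H(U)

def achievable(R1, R2):
    return R1 >= H_u_given_v and R2 >= H_v_given_u and R1 + R2 >= H_uv

print(achievable(0.5, 1.0))                     # True: near the corner point (H(U|V), H(V))
print(achievable(0.3, 0.3))                     # False: sum rate below H(U, V)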



Multi-user information theory



Multiple access channel
[Figure: W1 → Encoder 1 → X1^n and W2 → Encoder 2 → X2^n enter the channel p(y|x1, x2); the Decoder maps Y^n to (Ŵ1, Ŵ2)]

For a fixed product input distribution p(x1)p(x2), the achievable rate region is the pentagon

R1 ≤ I(X1; Y | X2), R2 ≤ I(X2; Y | X1), R1 + R2 ≤ I(X1, X2; Y)

[Figure: the pentagon in the (R1, R2) plane, with intercepts I(X1; Y), I(X1; Y | X2) on the R1 axis and I(X2; Y), I(X2; Y | X1) on the R2 axis]

Theorem [1,2]: the capacity region of the DM-MAC is the convex closure of the union of these regions over all p(x1)p(x2).
Statistics

• Large deviation theory

• Stein’s lemma...



One last thing...

Make things as simple as possible, but not simpler.


– A. Einstein

