
Lecture 22: Final Review

• Nuts and bolts


• Fundamental questions and limits
• Tools
• Practical algorithms
• Future topics

Dr. Yao Xie, ECE587, Information Theory, Duke University


Basics



Nuts and bolts

• Entropy: H(X) = −∑_x p(x) log2 p(x) (bits)

• Differential entropy: H(X) = −∫ f(x) log f(x) dx (bits)

• Conditional entropy: H(X|Y ), joint entropy: H(X, Y )

• Mutual information: reduction in uncertainty due to another random variable:

I(X; Y) = H(X) − H(X|Y)
• Relative entropy: D(p||q) = ∑_x p(x) log (p(x)/q(x))
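A minimal numerical sketch of these definitions (NumPy assumed); the joint pmf pxy and the helper names are illustrative, not from the lecture.

import numpy as np

def entropy(p):
    # H = -sum p log2 p, with the convention 0 log 0 = 0
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(pxy):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

def kl_divergence(p, q):
    # D(p||q) = sum p log2(p/q); assumes q > 0 wherever p > 0
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])                    # toy joint pmf p(x, y)
print(entropy(pxy.sum(axis=1)))                 # H(X) = 1 bit
print(mutual_information(pxy))                  # I(X;Y) ~ 0.28 bits
print(kl_divergence(np.array([0.5, 0.5]), np.array([0.9, 0.1])))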



• For stochastic processes: entropy rate

H(X) = lim_{n→∞} H(X^n)/n = lim_{n→∞} H(X_n | X_{n−1}, . . . , X_1)

     = −∑_{ij} µ_i P_{ij} log P_{ij} for a first-order Markov chain
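A short sketch (NumPy assumed) of the Markov-chain entropy-rate formula: find the stationary distribution µ of a transition matrix P and evaluate −∑_ij µ_i P_ij log2 P_ij. The matrix below is illustrative and assumed to have strictly positive entries.

import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])                      # row-stochastic transition matrix

# stationary distribution: left eigenvector of P for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
mu = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
mu = mu / mu.sum()                              # here mu = [0.8, 0.2]

entropy_rate = -np.sum(mu[:, None] * P * np.log2(P))
print(mu, entropy_rate)                         # about 0.57 bits per step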

[Figure: recap of the basic definitions: Entropy H(X), Relative Entropy, and Mutual Information I(X;Y) = H(X) − H(X|Y)]



[Figure: Venn diagram relating H(X), H(Y), H(X,Y), H(X|Y), H(Y|X), and I(X;Y)]



Thou Shalt Know (In)equalities

• Chain rules:

H(X, Y) = H(X) + H(Y|X)

H(X_1, X_2, · · · , X_n) = ∑_{i=1}^n H(X_i | X_{i−1}, · · · , X_1)

I(X_1, X_2, · · · , X_n; Y) = ∑_{i=1}^n I(X_i; Y | X_{i−1}, · · · , X_1)

• Jensen’s inequality: if f is a convex function, Ef (X) ≥ f (EX).

• Conditioning reduces entropy: H(X|Y ) ≤ H(X)

• H(X) ≥ 0 (but differential entropy can be < 0), I(X; Y ) ≥ 0 (for both
discrete and continuous)



• Data processing inequality: X → Y → Z

I(X; Z) ≤ I(X; Y )

I(X; Z) ≤ I(Y ; Z)

• Fano's inequality:

P_e ≥ (H(X|Y) − 1) / log |X|
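A small numerical check of Fano's inequality (NumPy assumed), on an illustrative joint pmf where X is uniform on 4 symbols and the correct symbol is observed with probability 0.7: the error of the optimal (MAP) decoder stays above (H(X|Y) − 1)/log2|X|.

import numpy as np

m, eps = 4, 0.3
pxy = np.full((m, m), eps / (m - 1)) / m        # joint pmf p(x, y), rows indexed by x
np.fill_diagonal(pxy, (1 - eps) / m)

py = pxy.sum(axis=0)
H_x_given_y = -np.sum(pxy * np.log2(pxy / py))  # H(X|Y)
pe_map = 1 - np.sum(pxy.max(axis=0))            # error probability of the MAP decoder
fano_bound = (H_x_given_y - 1) / np.log2(m)
print(pe_map, fano_bound)                       # 0.30 >= 0.18, as Fano requires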



Fundamental Questions and Limits



Fundamental questions

[Figure: Source → Encoder → Transmitter → Physical Channel → Receiver → Decoder]

• Data compression limit (lossless source coding)

• Data transmission limit (channel capacity)

• Tradeoff between rate and distortion (lossy compression)



Codebook

[Figure: a codebook mapping messages to codewords]



Fundamental limits

Data compression limit: min I(X; X̂)        Data transmission limit: max I(X; Y)

Lossless compression:
• I(X; X̂) = H(X) − H(X|X̂)

• H(X|X̂) = 0, I(X; X̂) = H(X)



• Data compression limit: ∑_x p(x) l(x) ≥ H(X)



• Instantaneous code: ∑_{i=1}^m D^{−l_i} ≤ 1

• Optimal code length: l_i* = −log_D p_i
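A quick check of both facts, using the codewords {10, 110, 111} from the code tree on this slide and an illustrative dyadic source; only the Python standard library is used.

import math

D = 2
codewords = ["10", "110", "111"]                     # from the code tree on this slide
lengths = [len(c) for c in codewords]
print(sum(D ** (-l) for l in lengths))               # Kraft sum = 0.5 <= 1

probs = [0.5, 0.25, 0.25]                            # an illustrative dyadic source
print([-math.log(p, D) for p in probs])              # l_i* = -log_D p_i ~ [1, 2, 2]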

[Figure: binary code tree with codewords 10, 110, 111 hanging off the root]



Data compression limit: min I(X; X̂)        Data transmission limit: max I(X; Y)

Channel capacity:
• given fixed channel with transition probability p(y|x)

• C = max_{p(x)} I(X; Y)
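Since C = max_{p(x)} I(X; Y) is a concave maximization over p(x), it can be computed numerically. Below is a minimal Blahut-Arimoto sketch (NumPy assumed, strictly positive channel matrix assumed); W[x, y] holds p(y|x), here a BSC(0.1) for illustration.

import numpy as np

def blahut_arimoto(W, iters=200):
    # W[x, y] = p(y|x); assumes all entries are strictly positive
    r = np.full(W.shape[0], 1.0 / W.shape[0])   # current input distribution p(x)
    for _ in range(iters):
        q = r[:, None] * W                      # unnormalized posterior p(x|y)
        q /= q.sum(axis=0, keepdims=True)
        r = np.exp(np.sum(W * np.log(q), axis=1))
        r /= r.sum()
    p_y = r @ W
    I = np.sum(r[:, None] * W * np.log2(W / p_y))   # I(X;Y) in bits
    return I, r

W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(blahut_arimoto(W))                        # ~0.531 bits = 1 - H(0.1), p(x) uniform

For the BSC this simply recovers 1 − H(p), but the same iteration handles an arbitrary discrete memoryless channel.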



• BSC: C = 1 − H(p), Erasure: C = 1 − α

• Gaussian channel: C = (1/2) log(1 + P/N)

• water-filling

[Figure: water-filling: powers P1 and P2 fill Channels 1 and 2 above their noise levels N1 and N2 up to a common water level, while Channel 3 with noise N3 gets no power]
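A water-filling sketch (NumPy assumed): pick a water level ν by bisection so that the powers P_i = (ν − N_i)^+ exactly use the total power budget. The noise levels and budget are illustrative.

import numpy as np

def water_fill(noise, total_power, tol=1e-9):
    lo, hi = 0.0, noise.max() + total_power     # bracket the water level
    while hi - lo > tol:
        nu = 0.5 * (lo + hi)
        if np.sum(np.maximum(nu - noise, 0.0)) > total_power:
            hi = nu
        else:
            lo = nu
    powers = np.maximum(lo - noise, 0.0)        # P_i = (nu - N_i)^+
    capacity = 0.5 * np.sum(np.log2(1.0 + powers / noise))
    return powers, capacity

noise = np.array([1.0, 2.0, 4.0])               # N1, N2, N3
print(water_fill(noise, total_power=3.0))       # powers [2, 1, 0]: the worst channel stays dry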



!"

%&'()"!*" %&'()"!!"

$" #"



Data compression limit: min I(X; X̂)        Data transmission limit: max I(X; Y)

Rate-distortion:
• given source with distribution p(x)

• R(D) = min_{p(x̂|x): ∑ p(x)p(x̂|x) d(x,x̂) ≤ D} I(X; X̂)

• compute R(D): construct “test” channel



• Bernoulli source: R(D) = (H(p) − H(D))^+

• Gaussian source: R(D) = ((1/2) log(σ²/D))^+
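These closed forms are easy to evaluate; a small sketch (NumPy assumed; the function names are illustrative):

import numpy as np

def rd_bernoulli(p, D):
    # R(D) = (H(p) - H(D))^+, Bernoulli(p) source, Hamming distortion, 0 < D <= min(p, 1-p)
    h = lambda q: -q * np.log2(q) - (1 - q) * np.log2(1 - q)
    return max(h(p) - h(D), 0.0)

def rd_gaussian(sigma2, D):
    # R(D) = ((1/2) log2(sigma^2 / D))^+, N(0, sigma^2) source, squared-error distortion
    return max(0.5 * np.log2(sigma2 / D), 0.0)

print(rd_bernoulli(0.5, 0.1))                   # ~0.53 bits per symbol
print(rd_gaussian(1.0, 0.25))                   # 1 bit per symbol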



Tools



Tools: from probability

• Law of Large Numbers (LLN): if the x_n are independent and identically distributed,

(1/N) ∑_{n=1}^N x_n → E{X}, w.p. 1

• Variance: σ² = E{(X − µ)²} = E{X²} − µ²

• Central Limit Theorem (CLT):

(1/√(Nσ²)) ∑_{n=1}^N (x_n − µ) → N(0, 1)
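A quick simulation of both statements (NumPy assumed), using an illustrative Exponential(1) source, for which µ = 1 and σ² = 1.

import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, N = 1.0, 1.0, 100_000
x = rng.exponential(scale=1.0, size=N)

print(np.mean(x))                               # LLN: sample mean -> mu = 1
print(np.sum(x - mu) / np.sqrt(N * sigma2))     # CLT: a draw from roughly N(0, 1)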



Tools: AEP

• LLN for products of i.i.d. random variables:

(∏_{i=1}^n X_i)^{1/n} → e^{E(log X)}

• AEP:

(1/n) log(1/p(X_1, X_2, . . . , X_n)) → H(X)

p(X_1, X_2, . . . , X_n) ≈ 2^{−nH(X)}
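A Monte Carlo illustration of the AEP (NumPy assumed): the per-symbol log-likelihood of an i.i.d. sequence concentrates around H(X) as n grows; the pmf below is an assumption for illustration.

import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.25, 0.125, 0.125])         # illustrative source pmf
print(-np.sum(p * np.log2(p)))                  # H(X) = 1.75 bits

for n in (10, 100, 10_000):
    x = rng.choice(len(p), size=n, p=p)         # one sampled sequence X_1, ..., X_n
    print(n, -np.mean(np.log2(p[x])))           # -(1/n) log2 p(X^n) -> H(X)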

Divide all sequences in X^n into two sets:



Typicality

[Figure: the set of all sequences X^n, with |X|^n elements, split into the non-typical set and the typical set A_ε^(n), which has about 2^{n(H+ε)} elements]



Joint typicality
[Figure: jointly typical pairs (x^n, y^n) shown as scattered dots in the X^n × Y^n grid]



Tools: Maximum entropy

Discrete:
• H(X) ≤ log |X |, equality when X has uniform distribution

Continuous:
• H(X) ≤ (1/2) log((2πe)^n |K|) for E[XX^T] = K; equality when X ∼ N(0, K)
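A one-line numerical check (NumPy assumed) of the continuous statement in one dimension: for a fixed variance σ², the Gaussian has larger differential entropy than, say, a uniform density with the same variance.

import numpy as np

s2 = 1.0
h_gauss = 0.5 * np.log2(2 * np.pi * np.e * s2)  # (1/2) log2(2*pi*e*sigma^2)
h_unif = np.log2(np.sqrt(12 * s2))              # uniform interval with variance sigma^2
print(h_gauss, h_unif, h_gauss >= h_unif)       # ~2.05 vs ~1.79 bits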



Practical Algorithms



Practical algorithms: source coding

• Huffman, Shannon-Fano-Elias, Arithmetic code

Codeword length   Codeword   X   Probabilities (Huffman merging steps)
2                 01         1   0.25   0.3    0.45   0.55   1
2                 10         2   0.25   0.25   0.3    0.45
2                 11         3   0.2    0.25   0.25
3                 000        4   0.15   0.2
3                 001        5   0.15

This code has average length 2.3 bits.
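A compact Huffman sketch using heapq from the standard library; run on the source above it reproduces the 2.3-bit average length (the individual codewords may differ from the table, since optimal codes are not unique).

import heapq
from itertools import count

def huffman(probs):
    tie = count()                               # tiebreaker so the heap never compares dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)         # merge the two least likely groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

probs = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}
code = huffman(probs)
print(code, sum(probs[s] * len(w) for s, w in code.items()))   # average length 2.3 bits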



Practical algorithms: channel coding
!"#$%&'()*+,'' .*#/*)01*#%)'
+*-$' +*-$'

2%33"#4' 80&(*'
+*-$'

5$$-6
!9:.'
7*)*3*#'

?.2' :*)%&'

;'<=%&1%)>'-"%4&%3'



Practical algorithms: decoding
Viterbi algorithm
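A hard-decision Viterbi sketch (standard library only) for the classic rate-1/2, constraint-length-3 convolutional code with generators (7, 5) octal; the encoder, trellis bookkeeping, and test message are illustrative, not taken from the lecture.

def conv_step(state, u):
    # output pair and next state; state holds the bits (u[k-1], u[k-2])
    s1, s2 = (state >> 1) & 1, state & 1
    return (u ^ s1 ^ s2, u ^ s2), ((u << 1) | s1) & 0b11

def conv_encode(bits):
    state, out = 0, []
    for u in bits:
        (v1, v2), state = conv_step(state, u)
        out += [v1, v2]
    return out

def viterbi_decode(received, n_bits):
    INF = float("inf")
    metric = [0, INF, INF, INF]                 # start in the all-zero state
    paths = [[] for _ in range(4)]
    for k in range(n_bits):
        r = received[2 * k: 2 * k + 2]
        new_metric, new_paths = [INF] * 4, [None] * 4
        for s in range(4):
            if metric[s] == INF:
                continue
            for u in (0, 1):                    # extend each survivor by both inputs
                (v1, v2), nxt = conv_step(s, u)
                m = metric[s] + (v1 != r[0]) + (v2 != r[1])   # Hamming branch metric
                if m < new_metric[nxt]:
                    new_metric[nxt], new_paths[nxt] = m, paths[s] + [u]
        metric, paths = new_metric, new_paths
    return paths[min(range(4), key=lambda s: metric[s])]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
coded = conv_encode(bits)
coded[3] ^= 1                                   # flip one channel bit
print(viterbi_decode(coded, len(bits)) == bits) # True: the single error is corrected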



What are some future topics?



Distributed lossless coding
• Consider two correlated i.i.d. sources (U, V) ∼ p(u, v)

[Figure: U^n → Encoder 1 → index I; V^n → Encoder 2 → index J; a joint Decoder outputs (Û^n, V̂^n)]

Slepian-Wolf region: R1 ≥ H(U|V), R2 ≥ H(V|U), R1 + R2 ≥ H(U, V)

[Figure: the rate region in the (R1, R2) plane, with intercepts H(U|V), H(U) on the R1 axis and H(V|U), H(V) on the R2 axis]
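A tiny membership check for this rate region (NumPy assumed); the joint pmf and the rate pairs are illustrative.

import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

puv = np.array([[0.45, 0.05],
                [0.05, 0.45]])                  # toy joint pmf p(u, v)
H_uv = H(puv.ravel())
H_u_given_v = H_uv - H(puv.sum(axis=0))         # H(U|V) = H(U,V) - H(V)
H_v_given_u = H_uv - H(puv.sum(axis=1))         # H(V|U) = H(U,V) - H(U)

def achievable(R1, R2):
    return R1 >= H_u_given_v and R2 >= H_v_given_u and R1 + R2 >= H_uv

print(achievable(0.5, 1.0))                     # True: near the corner point (H(U|V), H(V))
print(achievable(0.3, 0.3))                     # False: sum rate below H(U, V)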



Multi-user information theory



Multiple access channel
[Figure: W1 → Encoder 1 → X1^n and W2 → Encoder 2 → X2^n enter the channel p(y|x1, x2); the Decoder maps Y^n to (Ŵ1, Ŵ2)]

For a fixed product input distribution p(x1)p(x2), the achievable rate region is the pentagon

R1 ≤ I(X1; Y | X2), R2 ≤ I(X2; Y | X1), R1 + R2 ≤ I(X1, X2; Y)

[Figure: the pentagon in the (R1, R2) plane, with intercepts I(X1; Y), I(X1; Y | X2) on the R1 axis and I(X2; Y), I(X2; Y | X1) on the R2 axis]

Theorem [1,2]: the capacity region of the DM-MAC is the convex closure of the union of these regions over all p(x1)p(x2).
Statistics

• Large deviation theory

• Stein’s lemma...



One last thing...

Make things as simple as possible, but not simpler.


– A. Einstein

