Information Theory
Final Exam
Prof. Dr. sc. techn. Gerhard Kramer
Summer Semester 2019
September 30, 2019
• Write your name and student ID on every sheet. Please have your student card ready for
examination.
• The exam duration is 90 minutes.
• This exam has 5 questions on 20 pages (excluding the cover page).
• Do not write with pencils or red pens.
• You can get full credit only if you give reasons for your answers.
• Problem parts that are marked with an =⇒ can be solved independently of previous parts
of the same problem.
• Please remain seated until the end of the exam.
• You must hand in this problem set!
• Unless a problem part begins with "Prove...", you may use results from the lecture without proof.
• No handwritten notes and no calculators are permitted.
Good Luck!
Name:
a) For two discrete random variables X and Y taking values in the same set, define the following quantities as functions of their probability distributions: H(X), H(X|Y = b), D(P_X ‖ P_Y), I(X; Y). (4 pt)
Solution:
H(X) = − ∑_{x ∈ supp(P_X)} P_X(x) log2 P_X(x)

H(X | Y = b) = − ∑_{x ∈ supp(P_{X|Y}(·|b))} P_{X|Y}(x|b) log2 P_{X|Y}(x|b)

D(P_X ‖ P_Y) = ∑_{x ∈ supp(P_X)} P_X(x) log2 [ P_X(x) / P_Y(x) ]

I(X; Y) = ∑_{(x,y) ∈ supp(P_{XY})} P_{XY}(x, y) log2 [ P_{XY}(x, y) / (P_X(x) P_Y(y)) ]
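As a quick numerical illustration of these definitions, the sketch below evaluates all four quantities for a small, made-up joint PMF (the distribution is hypothetical and not taken from the exam):

from math import log2

# Hypothetical joint PMF P_XY on {0,1} x {0,1}, chosen only for illustration.
P_XY = {(0, 0): 1/2, (0, 1): 1/4, (1, 0): 1/8, (1, 1): 1/8}
P_X = {x: sum(p for (a, _), p in P_XY.items() if a == x) for x in (0, 1)}
P_Y = {y: sum(p for (_, b), p in P_XY.items() if b == y) for y in (0, 1)}

# H(X): sum over the support of P_X only.
H_X = -sum(p * log2(p) for p in P_X.values() if p > 0)

# H(X | Y = b): entropy of the conditional PMF P_{X|Y}(. | b) = P_XY(., b) / P_Y(b).
b = 0
P_X_given_b = {x: P_XY[(x, b)] / P_Y[b] for x in (0, 1)}
H_X_given_b = -sum(p * log2(p) for p in P_X_given_b.values() if p > 0)

# D(P_X || P_Y) and I(X; Y) follow the same support-restricted sums.
D = sum(P_X[x] * log2(P_X[x] / P_Y[x]) for x in (0, 1) if P_X[x] > 0)
I = sum(p * log2(p / (P_X[x] * P_Y[y])) for (x, y), p in P_XY.items() if p > 0)

print(H_X, H_X_given_b, D, I)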
=⇒ b) Let X be a discrete random variable and g be some function defined on the set in which X takes values. Show that (you may use basic properties of entropy without proof) (4 pt)
(i) H(X, g(X)) = H(X)
(ii) H(X, g(X)) ≥ H(g(X)). When does equality hold?
Solution:
(i) We have H(X, g(X)) = H(X) + H(g(X) | X) = H(X), since g(X) is a deterministic function of X and thus H(g(X) | X) = 0.
(ii) We have H(X, g(X)) = H(g(X)) + H(X | g(X)) ≥ H(g(X)), since H(X | g(X)) ≥ 0. Equality holds iff H(X | g(X)) = 0, i.e., iff X is a deterministic function of g(X) (g is invertible on the support of P_X).
Solution:
(i)
(ii)
(iii)
H(XY) = 8 · (1/8) · log2 8 = 3 bits
I(X; Y) = H(X) + H(Y) − H(XY) = 7/4 + 2 − 3 = 3/4 bits
(iv)
Solution:
−H(X) = E[log2 P_X(X)]
      ≤ log2 E[P_X(X)]                      (a)
      = log2 ∑_{x ∈ supp(P_X)} P_X(x)² ,
where (a) follows from Jensen’s inequality and the concavity of log2 (·). Multiplying
both sides by (−1) yields the desired inequality.
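A short numerical check of this bound, using an arbitrary example PMF (any distribution should satisfy it):

from math import log2

# Arbitrary example PMF; the bound H(X) >= -log2(sum_x P_X(x)^2) holds for any PMF.
P_X = [0.5, 0.3, 0.15, 0.05]

H = -sum(p * log2(p) for p in P_X if p > 0)      # entropy H(X)
bound = -log2(sum(p ** 2 for p in P_X))          # -log2 of the collision probability
print(H, bound, H >= bound)                      # ~1.648, ~1.454, True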
Solution:
with equality iff Y is conditionally independent of Z given X, i.e., iff Y – X – Z forms a Markov chain.
Solution:
(i) X ∼ Ber(1/2) and Y = Z = X. We have I(X; Y | Z) = 0 and I(X; Y) = H(X) = 1.
(ii) X ∼ Ber(1/2), Z ∼ Ber(1/2) independent of X, and Y = X ⊕ Z. We have I(X; Y) = 0 and I(X; Y | Z) = H(X) = 1.
Solution:
(i) uniquely decodable since it is suffix-free
(ii) not u.d.: the string 0110 could be parsed as 01, 10 or as 0, 110.
(iii) not u.d.: e.g., 1 is coded as 1 and 3 is coded as 11, so the string 11 can be parsed as 1, 1 or as 3.
(iv) u.d. since both are u.d. and we can first decode the Huffman code and then decode
the Tunstall code.
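The ambiguity in (ii) can also be found mechanically. Below is a minimal sketch; it assumes only that the code contains the words 0, 01, 10, 110 mentioned above (the full code tables of the problem are not reproduced here):

def parses(s, code, prefix=()):
    # Return every way of splitting the string s into code words from `code`.
    if not s:
        return [prefix]
    result = []
    for w in code:
        if s.startswith(w):
            result += parses(s[len(w):], code, prefix + (w,))
    return result

code = ["0", "01", "10", "110"]   # words taken from the solution text; full code not shown
print(parses("0110", code))       # two distinct parses -> the code is not uniquely decodable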
Solution:
⌈log3 4⌉ = ⌈log3 8⌉ = 2,  ⌈log3 16⌉ = 3
d) (i) Compute the expected code word lengths for the codes in (a) and (b). (6 pt)
(ii) Is either of the two codes an optimal ternary prefix-free code? If so, which one?
(iii) Can you improve the efficiency of the Huffman code by changing the alphabet size
of the code? If so, to what number? Justify your answer. The efficiency of a D-ary block-to-variable-length code is defined as H(X) / (E[L] · log2 D).
Solution:
(i)
(ii) Huffman codes are optimal prefix-free codes for a given alphabet size, so the Huffman code is an optimal ternary prefix-free code.
(iii) The source probabilities are all negative powers of two. Hence, a Huffman code for D = 2 will achieve ℓ(x) = −log2 P_X(x) for all x and thus E[L] = H(X), i.e., an efficiency of 1.
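This effect of a dyadic source on a binary Huffman code can be reproduced with a short sketch. The distribution below is a made-up dyadic example (the exam's actual source table is not shown here):

import heapq
from math import log2

# Hypothetical dyadic source distribution, for illustration only.
probs = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}

# Binary Huffman construction: repeatedly merge the two least probable nodes;
# every merge adds one bit to the code words of all symbols in the merged subtree.
heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
heapq.heapify(heap)
lengths = {s: 0 for s in probs}
counter = len(heap)
while len(heap) > 1:
    p1, _, syms1 = heapq.heappop(heap)
    p2, _, syms2 = heapq.heappop(heap)
    for s in syms1 + syms2:
        lengths[s] += 1
    heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
    counter += 1

expected_length = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * log2(p) for p in probs.values())
print(lengths)                     # l(x) = -log2 P_X(x) for every symbol
print(expected_length, entropy)    # E[L] = H(X) = 1.75 bits, i.e. efficiency 1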
Consider the following discrete stationary source emitting X1, X2, . . ., where each Xi takes values in {a, b, c}. Each branch of the state diagram specifies the conditional probability of a transition from one state to the state the arrow points to.
[State diagram: a three-state Markov chain on {a, b, c} with branch labels β, 1 − β, γ, and 1 − γ.]
a) Derive the stationary distribution of the source. What is the memory? (5 pt)
Solution: The memory of the source is µ = 1, since the transition probability only
depends on the current state.
Solving the balance equations of the chain together with the normalization P_X(a) + P_X(b) + P_X(c) = 1 yields

P_X(a) = β(1 − γ) / (1 + (1 + β)(1 − γ))
P_X(b) = (1 − γ) / (1 + (1 + β)(1 − γ))
P_X(c) = 1 / (1 + (1 + β)(1 − γ))
Solution:
H∞(X) = lim_{B→∞} H(X^B) / B
      = H(X2 | X1)
      = P_X(a) · H(X2 | X1 = a) + P_X(b) · H(X2 | X1 = b) + P_X(c) · H(X2 | X1 = c)
      = P_X(b) · H2(β) + P_X(c) · H2(γ),

since H(X2 | X1 = a) = 0, H(X2 | X1 = b) = H2(β), and H(X2 | X1 = c) = H2(γ).
If β, γ ∈ {0, 1}, the entropy rate is zero since the next state is always deterministic
given the current state.
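For concreteness, the stationary distribution and the entropy rate can be evaluated numerically. The sketch below plugs arbitrary example values of β and γ (not taken from the exam) into the closed-form expressions above:

from math import log2

def H2(p):
    # Binary entropy function in bits.
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

beta, gamma = 0.3, 0.6                       # made-up example parameters
denom = 1 + (1 + beta) * (1 - gamma)
P_a = beta * (1 - gamma) / denom
P_b = (1 - gamma) / denom
P_c = 1 / denom

print(P_a + P_b + P_c)                       # sanity check: the probabilities sum to 1
print(P_b * H2(beta) + P_c * H2(gamma))      # entropy rate H_inf(X) in bits per symbol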
Let C be a fair coin flip with Pr[C = heads] = Pr[C = tails] = 1/2. Further let X be a
real-valued random variable that is influenced by C in the following way: if C is heads, then
X is uniformly distributed over the set [0, 1]; if C is tails, then X is uniformly distributed
over the set [0, 1/2].
a) Compute h(X|C). (4 pt)
=⇒ b) Compute h(X). (4 pt)
Hint: You may use log2 (3) ≈ 3/2.
Solution: a) Given C, X is uniformly distributed on an interval, so h(X | C = heads) = log2 1 = 0 and h(X | C = tails) = log2(1/2) = −1. Thus h(X | C) = (1/2) · 0 + (1/2) · (−1) = −1/2 bits.
b) We have f_X(x) = (1/2) · 1 + (1/2) · 2 = 3/2 for x ∈ [0, 1/2] and f_X(x) = 1/2 for x ∈ (1/2, 1]. Thus,

h(X) = ∫_0^{1/2} (3/2) log2(2/3) dx + ∫_{1/2}^{1} (1/2) log2 2 dx
     ≈ (3/4)(1 − 3/2) + 1/4
     = −1/8 bits
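The approximation can be cross-checked against the exact value with one line of arithmetic (the exact number below is computed here, not from the exam sheet):

from math import log2

# f(x) = 3/2 on [0, 1/2] (the two uniforms overlap there) and 1/2 on (1/2, 1].
h = 0.5 * (3/2) * log2(2/3) + 0.5 * (1/2) * log2(2)
print(h)   # exact value is about -0.189 bits; with the hint log2(3) ~ 3/2 it becomes -1/8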
Solution:
(i) Translation invariance of differential entropy (Y is constant if we condition on it).
(ii) Conditioning does not increase entropy. Equality iff (X − Y) is independent of Y.
(iii) Maximum entropy property of Gaussian random variables under a second-moment constraint. Equality iff X − Y is Gaussian.
a) Find the capacity and the capacity achieving input distribution PX of the following (6 pt)
channel (all transition probabilities are written above their branches). Justify your
derivations.
[Channel diagram P_{Y|X}: inputs X ∈ {1, 2, 3}, outputs Y ∈ {4, 5, 6}; each input reaches its own output with probability 1 − ε and exactly one other output with probability ε.]
Solution:
C = max_{P_X} I(X; Y)
  = max_{P_X} [H(Y) − H(Y|X)]
  = max_{P_X} H(Y) − H2(ε)
  = log2 3 − H2(ε),

where the third line uses H(Y | X = x) = H2(ε) for every input x, and in the last line we used that by symmetry PY is uniform if PX is uniform and thus H(Y) = log2 3 is achievable. Hence, the capacity C = log2 3 − H2(ε) is achieved by a uniform input distribution.
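As a numerical sanity check, the sketch below computes I(X; Y) for a uniform input and an example value of ε. The exact wiring of the ε-branches is assumed (cyclic); any wiring in which every output receives one (1 − ε)-branch and one ε-branch gives the same value:

from math import log2

def H2(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

eps = 0.1                               # example crossover probability, not from the exam
P_Y_given_X = [                         # rows: inputs 1, 2, 3; columns: outputs 4, 5, 6
    [1 - eps, eps, 0.0],                # assumed cyclic wiring of the eps-branches
    [0.0, 1 - eps, eps],
    [eps, 0.0, 1 - eps],
]

P_X = [1/3, 1/3, 1/3]                   # uniform (capacity-achieving) input distribution
P_Y = [sum(P_X[x] * P_Y_given_X[x][y] for x in range(3)) for y in range(3)]
I = sum(P_X[x] * P_Y_given_X[x][y] * log2(P_Y_given_X[x][y] / P_Y[y])
        for x in range(3) for y in range(3) if P_Y_given_X[x][y] > 0)

print(I, log2(3) - H2(eps))             # both are ~1.116 bits for eps = 0.1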
[Block diagram: X passes through an adder, a multiplier, and a second adder to produce Y.]
Solution:
The equivalent AWGN channel model is Y = X′ + Z′, where X′ has power α²P and Var[Z′] = N(α² + 2αρ + 1). Hence

C = (1/2) · log2( 1 + α²P / (N(α² + 2αρ + 1)) ).    (5)
(c) To maximize the capacity, we need to minimize Var[Z′]. Since Var[Z′] is linear in ρ, we must choose ρ as small as possible, i.e., ρ = −1. We get

C = (1/2) · log2( 1 + α²P / (N(α − 1)²) ).
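A quick check of the choice ρ = −1: evaluating expression (5) on a grid of ρ values for made-up parameters α, P, N (these numbers are not from the exam) confirms that the capacity is largest at ρ = −1:

from math import log2

alpha, P, N = 2.0, 1.0, 0.5            # arbitrary example parameters

def C(rho):
    # Capacity expression (5) as a function of the noise correlation rho.
    return 0.5 * log2(1 + alpha**2 * P / (N * (alpha**2 + 2 * alpha * rho + 1)))

rhos = [i / 100 for i in range(-100, 101)]
best = max(rhos, key=C)
print(best, C(best))                   # rho = -1.0 gives the largest capacity on the grid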