Information Theory
Final Exam
Prof. Dr. sc. techn. Gerhard Kramer
Summer Semester 2019
September 30, 2019
• Write your name and student ID on every sheet. Please have your student card ready for
examination.
• The exam duration is 90 minutes.
• This exam has 5 questions on 20 pages (excluding the cover page).
• Do not write with pencils or red pens.
• You can get full credit only if you give reasons for your answers.
• Problem parts that are marked with an =⇒ can be solved independently of previous parts
of the same problem.
• Please remain seated until the end of the exam.
• You must hand in this problem set!
• Unless a problem part begins with "Prove...", you may use results from the lecture without proof.
• No handwritten notes and no calculators are permitted.
Good Luck!
Name:
a) For two discrete random variables X and Y taking values in the same set, define the following quantities as functions of their probability distributions: H(X), H(X|Y = b), D(P_X ‖ P_Y), I(X; Y). (4 pt)
Solution:
H(X) = − ∑_{x ∈ supp(P_X)} P_X(x) log2 P_X(x)

H(X | Y = b) = − ∑_{x ∈ supp(P_{X|Y}(·|b))} P_{X|Y}(x|b) log2 P_{X|Y}(x|b)

D(P_X ‖ P_Y) = ∑_{x ∈ supp(P_X)} P_X(x) log2 [ P_X(x) / P_Y(x) ]

I(X; Y) = ∑_{(x,y) ∈ supp(P_{XY})} P_{XY}(x, y) log2 [ P_{XY}(x, y) / (P_X(x) P_Y(y)) ]
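As a quick numerical illustration of these definitions, the sketch below evaluates all four quantities for a small, made-up joint PMF (the distribution is hypothetical and not taken from the exam):

from math import log2

# Hypothetical joint PMF P_XY on {0,1} x {0,1}, chosen only for illustration.
P_XY = {(0, 0): 1/2, (0, 1): 1/4, (1, 0): 1/8, (1, 1): 1/8}
P_X = {x: sum(p for (a, _), p in P_XY.items() if a == x) for x in (0, 1)}
P_Y = {y: sum(p for (_, b), p in P_XY.items() if b == y) for y in (0, 1)}

# H(X): sum over the support of P_X only.
H_X = -sum(p * log2(p) for p in P_X.values() if p > 0)

# H(X | Y = b): entropy of the conditional PMF P_{X|Y}(. | b) = P_XY(., b) / P_Y(b).
b = 0
P_X_given_b = {x: P_XY[(x, b)] / P_Y[b] for x in (0, 1)}
H_X_given_b = -sum(p * log2(p) for p in P_X_given_b.values() if p > 0)

# D(P_X || P_Y) and I(X; Y) follow the same support-restricted sums.
D = sum(P_X[x] * log2(P_X[x] / P_Y[x]) for x in (0, 1) if P_X[x] > 0)
I = sum(p * log2(p / (P_X[x] * P_Y[y])) for (x, y), p in P_XY.items() if p > 0)

print(H_X, H_X_given_b, D, I)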
=⇒ b) Let X be a discrete random variable and g be some function defined on the set in which X takes values. Show that (you may use basic properties of entropy without proof) (4 pt)
(i) H(X, g(X)) = H(X)
(ii) H(X, g(X)) ≥ H(g(X)). When does equality hold?
Solution:
(i) We have H(X, g(X)) = H(X) + H(g(X) | X) = H(X), since g(X) is a deterministic function of X and thus H(g(X) | X) = 0.
(ii) We have H(X, g(X)) = H(g(X)) + H(X | g(X)) ≥ H(g(X)), since H(X | g(X)) ≥ 0. Equality holds iff H(X | g(X)) = 0, i.e., iff X is a deterministic function of g(X) (g is invertible on the support of P_X).
Solution:
(i)
(ii)
(iii)
H(XY) = 8 · (1/8) · log2 8 = 3 bits
I(X; Y) = H(X) + H(Y) − H(XY) = 7/4 + 2 − 3 = 3/4 bits
(iv)
Solution:
−H(X) = E[log2 P_X(X)]
      ≤ log2 E[P_X(X)]                      (a)
      = log2 ∑_{x ∈ supp(P_X)} P_X(x)² ,
where (a) follows from Jensen’s inequality and the concavity of log2 (·). Multiplying
both sides by (−1) yields the desired inequality.
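A short numerical check of this bound, using an arbitrary example PMF (any distribution should satisfy it):

from math import log2

# Arbitrary example PMF; the bound H(X) >= -log2(sum_x P_X(x)^2) holds for any PMF.
P_X = [0.5, 0.3, 0.15, 0.05]

H = -sum(p * log2(p) for p in P_X if p > 0)      # entropy H(X)
bound = -log2(sum(p ** 2 for p in P_X))          # -log2 of the collision probability
print(H, bound, H >= bound)                      # ~1.648, ~1.454, True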
Solution:
with equality iff Y is conditionally independent of Z given X, i.e., iff Y – X – Z forms a Markov chain.
Solution:
(i) X ∼ Ber(1/2) and Y = Z = X. We have I(X; Y | Z) = 0 and I(X; Y) = H(X) = 1.
(ii) X ∼ Ber(1/2), Z ∼ Ber(1/2) independent of X, and Y = X ⊕ Z. We have I(X; Y) = 0 and I(X; Y | Z) = H(X) = 1.
Solution:
(i) uniquely decodable since it is suffix-free
(ii) not u.d.: the string 0110 could be parsed as 01, 10 or as 0, 110.
(iii) not u.d.: e.g., 1 is coded as 1 and 3 is coded as 11, so the string 11 can be parsed as 1, 1 or as 3.
(iv) u.d. since both are u.d. and we can first decode the Huffman code and then decode
the Tunstall code.
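The ambiguity in (ii) can also be found mechanically. Below is a minimal sketch; it assumes only that the code contains the words 0, 01, 10, 110 mentioned above (the full code tables of the problem are not reproduced here):

def parses(s, code, prefix=()):
    # Return every way of splitting the string s into code words from `code`.
    if not s:
        return [prefix]
    result = []
    for w in code:
        if s.startswith(w):
            result += parses(s[len(w):], code, prefix + (w,))
    return result

code = ["0", "01", "10", "110"]   # words taken from the solution text; full code not shown
print(parses("0110", code))       # two distinct parses -> the code is not uniquely decodable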
Solution:
⌈log3 4⌉ = ⌈log3 8⌉ = 2,  ⌈log3 16⌉ = 3
d) (i) Compute the expected code word lengths for the codes in (a) and (b). (6 pt)
(ii) Is either of the two codes an optimal ternary prefix-free code? If so, which one?
(iii) Can you improve the efficiency of the Huffman code by changing the alphabet size
of the code? If so, to what number? Justify your answer. The efficiency of a D-ary block-to-variable-length code is defined as H(X) / (E[L] · log2 D).
Solution:
(i)
(ii) Huffman codes are optimal prefix-free codes for a given alphabet size, so the Huffman code is an optimal ternary prefix-free code.
(iii) The source probabilities are all negative powers of two. Hence, a Huffman code for D = 2 will achieve ℓ(x) = −log2 P_X(x) for all x and thus E[L] = H(X), i.e., an efficiency of 1.
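This effect of a dyadic source on a binary Huffman code can be reproduced with a short sketch. The distribution below is a made-up dyadic example (the exam's actual source table is not shown here):

import heapq
from math import log2

# Hypothetical dyadic source distribution, for illustration only.
probs = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}

# Binary Huffman construction: repeatedly merge the two least probable nodes;
# every merge adds one bit to the code words of all symbols in the merged subtree.
heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
heapq.heapify(heap)
lengths = {s: 0 for s in probs}
counter = len(heap)
while len(heap) > 1:
    p1, _, syms1 = heapq.heappop(heap)
    p2, _, syms2 = heapq.heappop(heap)
    for s in syms1 + syms2:
        lengths[s] += 1
    heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
    counter += 1

expected_length = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * log2(p) for p in probs.values())
print(lengths)                     # l(x) = -log2 P_X(x) for every symbol
print(expected_length, entropy)    # E[L] = H(X) = 1.75 bits, i.e. efficiency 1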
Consider the following discrete stationary source emitting X1, X2, . . ., where each Xi takes values in {a, b, c}. Each branch of the state diagram specifies the conditional probability of a transition from one state to the state the arrow points to.
[State diagram: a three-state Markov chain on {a, b, c} with branch labels β, 1 − β, γ, and 1 − γ.]
a) Derive the stationary distribution of the source. What is the memory? (5 pt)
Solution: The memory of the source is µ = 1, since the transition probability only
depends on the current state.
Solving the balance equations of the chain together with the normalization P_X(a) + P_X(b) + P_X(c) = 1 yields

P_X(a) = β(1 − γ) / (1 + (1 + β)(1 − γ))
P_X(b) = (1 − γ) / (1 + (1 + β)(1 − γ))
P_X(c) = 1 / (1 + (1 + β)(1 − γ))
Solution:
H∞(X) = lim_{B→∞} H(X^B) / B
      = H(X2 | X1)
      = P_X(a) · H(X2 | X1 = a) + P_X(b) · H(X2 | X1 = b) + P_X(c) · H(X2 | X1 = c)
      = P_X(b) · H2(β) + P_X(c) · H2(γ),

since H(X2 | X1 = a) = 0, H(X2 | X1 = b) = H2(β), and H(X2 | X1 = c) = H2(γ).
If β, γ ∈ {0, 1}, the entropy rate is zero since the next state is always deterministic
given the current state.
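For concreteness, the stationary distribution and the entropy rate can be evaluated numerically. The sketch below plugs arbitrary example values of β and γ (not taken from the exam) into the closed-form expressions above:

from math import log2

def H2(p):
    # Binary entropy function in bits.
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

beta, gamma = 0.3, 0.6                       # made-up example parameters
denom = 1 + (1 + beta) * (1 - gamma)
P_a = beta * (1 - gamma) / denom
P_b = (1 - gamma) / denom
P_c = 1 / denom

print(P_a + P_b + P_c)                       # sanity check: the probabilities sum to 1
print(P_b * H2(beta) + P_c * H2(gamma))      # entropy rate H_inf(X) in bits per symbol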
Let C be a fair coin flip with Pr[C = heads] = Pr[C = tails] = 1/2. Further let X be a
real-valued random variable that is influenced by C in the following way: if C is heads, then
X is uniformly distributed over the set [0, 1]; if C is tails, then X is uniformly distributed
over the set [0, 1/2].
a) Compute h(X|C). (4 pt)
=⇒ b) Compute h(X). (4 pt)
Hint: You may use log2 (3) ≈ 3/2.
Solution: a) Given C, X is uniformly distributed on an interval, so h(X | C = heads) = log2 1 = 0 and h(X | C = tails) = log2(1/2) = −1. Thus h(X | C) = (1/2) · 0 + (1/2) · (−1) = −1/2 bits.
b) We have f_X(x) = (1/2) · 1 + (1/2) · 2 = 3/2 for x ∈ [0, 1/2] and f_X(x) = 1/2 for x ∈ (1/2, 1]. Thus,

h(X) = ∫_0^{1/2} (3/2) log2(2/3) dx + ∫_{1/2}^{1} (1/2) log2 2 dx
     ≈ (3/4)(1 − 3/2) + 1/4
     = −1/8 bits
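The approximation can be cross-checked against the exact value with one line of arithmetic (the exact number below is computed here, not from the exam sheet):

from math import log2

# f(x) = 3/2 on [0, 1/2] (the two uniforms overlap there) and 1/2 on (1/2, 1].
h = 0.5 * (3/2) * log2(2/3) + 0.5 * (1/2) * log2(2)
print(h)   # exact value is about -0.189 bits; with the hint log2(3) ~ 3/2 it becomes -1/8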
Solution:
(i) Translation invariance of differential entropy (Y is constant if we condition on it).
(ii) Conditioning does not increase entropy. Equality iff (X − Y) is independent of Y.
(iii) Maximum entropy property of Gaussian random variables under a second-moment constraint. Equality iff X − Y is Gaussian.
a) Find the capacity and the capacity achieving input distribution PX of the following (6 pt)
channel (all transition probabilities are written above their branches). Justify your
derivations.
[Channel diagram P_{Y|X}: inputs X ∈ {1, 2, 3}, outputs Y ∈ {4, 5, 6}; each input reaches its own output with probability 1 − ε and exactly one other output with probability ε.]
Solution:
C = max_{P_X} I(X; Y)
  = max_{P_X} [H(Y) − H(Y|X)]
  = max_{P_X} H(Y) − H2(ε)
  = log2 3 − H2(ε),

where the third line uses H(Y | X = x) = H2(ε) for every input x, and in the last line we used that by symmetry PY is uniform if PX is uniform and thus H(Y) = log2 3 is achievable. Hence, the capacity C = log2 3 − H2(ε) is achieved by a uniform input distribution.
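As a numerical sanity check, the sketch below computes I(X; Y) for a uniform input and an example value of ε. The exact wiring of the ε-branches is assumed (cyclic); any wiring in which every output receives one (1 − ε)-branch and one ε-branch gives the same value:

from math import log2

def H2(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

eps = 0.1                               # example crossover probability, not from the exam
P_Y_given_X = [                         # rows: inputs 1, 2, 3; columns: outputs 4, 5, 6
    [1 - eps, eps, 0.0],                # assumed cyclic wiring of the eps-branches
    [0.0, 1 - eps, eps],
    [eps, 0.0, 1 - eps],
]

P_X = [1/3, 1/3, 1/3]                   # uniform (capacity-achieving) input distribution
P_Y = [sum(P_X[x] * P_Y_given_X[x][y] for x in range(3)) for y in range(3)]
I = sum(P_X[x] * P_Y_given_X[x][y] * log2(P_Y_given_X[x][y] / P_Y[y])
        for x in range(3) for y in range(3) if P_Y_given_X[x][y] > 0)

print(I, log2(3) - H2(eps))             # both are ~1.116 bits for eps = 0.1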
[Block diagram: X passes through an adder, a multiplier, and a second adder to produce Y.]
Solution:
The equivalent AWGN channel model is Y = X′ + Z′, where X′ has power α²P and Var[Z′] = N(α² + 2αρ + 1). Hence

C = (1/2) · log2( 1 + α²P / (N(α² + 2αρ + 1)) ).    (5)
(c) To maximize the capacity, we need to minimize Var[Z′]. Since Var[Z′] is linear in ρ, we must choose ρ as small as possible, i.e., ρ = −1. We get

C = (1/2) · log2( 1 + α²P / (N(α − 1)²) ).
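A quick check of the choice ρ = −1: evaluating expression (5) on a grid of ρ values for made-up parameters α, P, N (these numbers are not from the exam) confirms that the capacity is largest at ρ = −1:

from math import log2

alpha, P, N = 2.0, 1.0, 0.5            # arbitrary example parameters

def C(rho):
    # Capacity expression (5) as a function of the noise correlation rho.
    return 0.5 * log2(1 + alpha**2 * P / (N * (alpha**2 + 2 * alpha * rho + 1)))

rhos = [i / 100 for i in range(-100, 101)]
best = max(rhos, key=C)
print(best, C(best))                   # rho = -1.0 gives the largest capacity on the grid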