# COMP-761: Quantum Information Theory

Winter 2009

Lecture 4 — January 20
Lecturer: Patrick Hayden

Scribe: Mazen Al Borno

## 4.1 Erasure Channel

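Back in 1948, it was a surprise to find that positive rates with vanishing error probability are achievable over noisy channels in general. If this seems obvious to you, keep in mind that the rate achievable using repetition codes over a non-trivial erasure channel (one whose erasure probability is strictly between 0 and 1) is exactly zero. A length-$n$ repetition code has rate $1/n$, and the only way to drive its error probability to zero is to repeat an unbounded number of times.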
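To make the comparison concrete, here is a small sketch. It assumes a binary erasure channel that erases each transmitted bit independently with probability `alpha`; the names `alpha`, `repetition_failure_prob`, and `repetitions_needed` are illustrative, not from the lecture. A length-$n$ repetition code fails only when all $n$ copies are erased, so its failure probability is $\alpha^n$ while its rate is $1/n$. Pushing the failure probability down forces $n$ up and the rate toward zero, whereas the erasure channel's capacity is the constant $1 - \alpha$.

```python
import math

def repetition_failure_prob(alpha: float, n: int) -> float:
    """Probability that all n copies of a bit are erased, so decoding fails."""
    return alpha ** n

def repetitions_needed(alpha: float, target: float) -> int:
    """Smallest n with alpha**n <= target (assumes 0 < alpha < 1 and 0 < target < 1)."""
    return math.ceil(math.log(target) / math.log(alpha))

alpha = 0.5  # illustrative erasure probability
for target in (1e-2, 1e-4, 1e-8):
    n = repetitions_needed(alpha, target)
    print(f"target {target:g}: n = {n}, rate = 1/{n} = {1 / n:.4f}")
print(f"erasure channel capacity 1 - alpha = {1 - alpha}")
```

Each hundredfold reduction in the target failure probability adds a roughly fixed number of repetitions here, so the rate keeps shrinking toward zero, while the capacity stays at $1 - \alpha$.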

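## 4.2 Jointly Typical Sequences

The set of jointly typical sequences $A_\epsilon^{(n)} \subseteq \mathcal{X}^n \times \mathcal{Y}^n$ with respect to $p(x, y)$ is:

$$
A_\epsilon^{(n)} = \Big\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n :
\Big| -\tfrac{1}{n} \log p(x^n) - H(X) \Big| < \epsilon,\;
\Big| -\tfrac{1}{n} \log p(y^n) - H(Y) \Big| < \epsilon,\;
\Big| -\tfrac{1}{n} \log p(x^n, y^n) - H(X,Y) \Big| < \epsilon \Big\}.
$$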
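The three conditions in the definition can be checked directly for a given pair of sequences. The sketch below assumes logarithms base 2, an i.i.d. source so that $p(x^n, y^n) = \prod_i p(x_i, y_i)$, and a joint pmf supplied as a Python dict `p_xy` keyed by `(x, y)`. It treats any pair containing a zero-probability symbol as atypical. The helper names and the example pmf (a uniform bit sent through a binary symmetric channel with crossover probability 0.1) are illustrative choices.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a pmf given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginals(p_xy):
    """Marginal pmfs p(x) and p(y) of a joint pmf {(x, y): probability}."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():
        p_x[x] += p
        p_y[y] += p
    return dict(p_x), dict(p_y)

def is_jointly_typical(xs, ys, p_xy, eps):
    """True if (xs, ys) satisfies all three conditions defining A_eps^(n)."""
    n = len(xs)
    p_x, p_y = marginals(p_xy)
    try:
        # Empirical -(1/n) log2 of p(x^n), p(y^n), p(x^n, y^n) under i.i.d. p(x, y).
        sx = -sum(math.log2(p_x[x]) for x in xs) / n
        sy = -sum(math.log2(p_y[y]) for y in ys) / n
        sxy = -sum(math.log2(p_xy[(x, y)]) for x, y in zip(xs, ys)) / n
    except (KeyError, ValueError):
        return False  # some symbol or pair has probability zero
    return (abs(sx - entropy(p_x)) < eps
            and abs(sy - entropy(p_y)) < eps
            and abs(sxy - entropy(p_xy)) < eps)

# Uniform input bit through a binary symmetric channel with crossover 0.1.
p_xy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
xs = [0] * 10 + [1] * 10
ys_close = [1 - x if i in (0, 10) else x for i, x in enumerate(xs)]  # 2 of 20 bits flipped
ys_far = [1 - x if i % 2 == 0 else x for i, x in enumerate(xs)]      # 10 of 20 bits flipped
print(is_jointly_typical(xs, ys_close, p_xy, eps=0.05))
print(is_jointly_typical(xs, ys_far, p_xy, eps=0.05))
```

Flipping 2 of 20 bits matches the channel's crossover rate, so that pair passes all three tests. Flipping 10 of 20 leaves both marginals typical (every binary sequence is typical for a uniform bit) but fails the joint condition, which is exactly the situation the joint condition is there to catch.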

**Theorem 4.2.1 (Joint Asymptotic Equipartition Theorem, Joint AEP).** Let $(x^n, y^n) \sim p(x^n, y^n)$ be drawn i.i.d. according to $p(x, y)$. Then:

1. $\Pr\{(x^n, y^n) \in A_\epsilon^{(n)}\} \to 1$ as $n \to \infty$.
2. $|A_\epsilon^{(n)}| \le 2^{n(H(X,Y)+\epsilon)}$, and for all $\delta > 0$, $|A_\epsilon^{(n)}| \ge (1-\delta)\, 2^{n(H(X,Y)-\epsilon)}$ for $n$ sufficiently large.
3. If $(\tilde{x}^n, \tilde{y}^n) \sim p(x^n)p(y^n)$ (i.e., with the same marginals as $(x^n, y^n)$, but with $\tilde{x}^n$ and $\tilde{y}^n$ independent), then
   $\Pr\{(\tilde{x}^n, \tilde{y}^n) \in A_\epsilon^{(n)}\} \le 2^{-n(I(X;Y)-3\epsilon)}$, and for all $\delta > 0$,
   $\Pr\{(\tilde{x}^n, \tilde{y}^n) \in A_\epsilon^{(n)}\} \ge (1-\delta)\, 2^{-n(I(X;Y)+3\epsilon)}$ for $n$ sufficiently large.

*Proof (of Joint AEP).* Part 1 follows from applying the AEP three times (to $x^n$, to $y^n$, and to the pairs $(x_i, y_i)$). Part 2 follows by the same procedure as for the AEP. For part 3, we first give an intuitive argument. The size of the $X$-typical set is about $2^{nH(X)}$ and the size of the $Y$-typical set is about $2^{nH(Y)}$. However, the size of the jointly typical set is only about $2^{nH(X,Y)}$.

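Assuming all sequences in these sets are equiprobable, an independent pair lands in the jointly typical set with probability

$$
\Pr\{(\tilde{x}^n, \tilde{y}^n) \in A_\epsilon^{(n)}\} \approx \frac{2^{nH(X,Y)}}{2^{nH(X)}\, 2^{nH(Y)}} = 2^{-nI(X;Y)}.
$$

Formally, since every pair in $A_\epsilon^{(n)}$ satisfies $p(x^n) \le 2^{-n(H(X)-\epsilon)}$ and $p(y^n) \le 2^{-n(H(Y)-\epsilon)}$, and $|A_\epsilon^{(n)}|$ is bounded by part 2,

$$
\begin{aligned}
\Pr\{(\tilde{x}^n, \tilde{y}^n) \in A_\epsilon^{(n)}\}
&= \sum_{(x^n, y^n) \in A_\epsilon^{(n)}} p(x^n)\, p(y^n) \\
&\le |A_\epsilon^{(n)}|\, 2^{-n(H(X)-\epsilon)}\, 2^{-n(H(Y)-\epsilon)} \\
&\le 2^{n(H(X,Y)+\epsilon)}\, 2^{-n(H(X)-\epsilon)}\, 2^{-n(H(Y)-\epsilon)} \\
&= 2^{-n(I(X;Y)-3\epsilon)}.
\end{aligned}
$$

Proving the lower bound on $\Pr\{(\tilde{x}^n, \tilde{y}^n) \in A_\epsilon^{(n)}\}$ is similar.

## 4.3 Shannon's Noisy Coding Achievability

Fix $p(x)$ and let $\epsilon > 0$. We want to generate a $(2^{nR}, n)$ code. Let $M = 2^{nR}$.

1. Choose the codewords $x^n(1), x^n(2), \ldots, x^n(2^{nR})$ i.i.d. according to $p^n(x^n) = \prod_{j=1}^{n} p(x_j)$.
2. Use typical set decoding: decode $y^n$ as the unique $w$ such that $(x^n(w), y^n) \in A_\epsilon^{(n)}$. If no such $w$ exists (or it is not unique), declare a failure.

We begin by estimating $E_c \bar{P}_e^{(n)}$, the average over all codes $c$ of the average error probability:

$$
\begin{aligned}
E_c \bar{P}_e^{(n)} &= E_c \frac{1}{M} \sum_{w=1}^{M} P_e^{(n)}(w) \\
&= \frac{1}{M} \sum_{w=1}^{M} E_c P_e^{(n)}(w) \\
&= E_c P_e^{(n)}(1),
\end{aligned}
$$

where the last step holds by the symmetry of the random code ensemble with respect to permutations of the messages.

**Error types.** Let $y^n$ be generated by the channel on input $x^n(1)$. There are two possible sources of error for a given message $w$ (here $w = 1$):

(a) $(x^n(w), y^n)$ is not jointly typical.

(b) There exists $w' \ne w$ such that $(x^n(w'), y^n)$ is jointly typical.

For (a), part 1 of the Joint AEP gives $\Pr\{(x^n(1), y^n) \notin A_\epsilon^{(n)}\} < \epsilon$ for $n$ sufficiently large.

For (b), if $w' \ne 1$, then $x^n(w')$ and $x^n(1)$ are independent, which implies that $y^n$ and $x^n(w')$ are independent. By part 3 of the Joint AEP, $\Pr\{(x^n(w'), y^n) \in A_\epsilon^{(n)}\} \le 2^{-n(I(X;Y)-3\epsilon)}$. Therefore,

$$
\Pr\{\exists\, w' \ne 1 \text{ such that } (x^n(w'), y^n) \in A_\epsilon^{(n)}\} \le (2^{nR} - 1)\, 2^{-n(I(X;Y)-3\epsilon)},
$$

where the last step is justified by the union bound $\Pr\{A \cup B\} \le \Pr\{A\} + \Pr\{B\}$. If $R < I(X;Y) - 3\epsilon$, the right-hand side is at most $2^{-n(I(X;Y)-3\epsilon-R)}$, so it is less than $\epsilon$ for $n$ sufficiently large. Combining (a) and (b) with the union bound, $E_c \bar{P}_e^{(n)} < 2\epsilon$ for $n$ sufficiently large.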
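Step (b) rests on part 3 of the Joint AEP. For binary alphabets, whether a pair is jointly typical, and its probability, depend only on its joint type (the counts of the four symbol pairs). So both $\Pr\{(x^n, y^n) \in A_\epsilon^{(n)}\}$ under $p(x,y)$ and $\Pr\{(\tilde{x}^n, \tilde{y}^n) \in A_\epsilon^{(n)}\}$ under $p(x)p(y)$ can be computed exactly by summing over types and compared with the bound $2^{-n(I(X;Y)-3\epsilon)}$. The sketch below assumes binary $X$ and $Y$, logs base 2, and a joint pmf with all four entries positive. The function name `typical_probabilities` and the example channel are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def typical_probabilities(p_xy, n, eps):
    """Exact Pr{pair in A_eps^(n)} under p(x, y) and under p(x)p(y), binary alphabets.

    p_xy[a][b] = p(X = a, Y = b); all four entries are assumed positive.
    Returns (pr_dependent, pr_independent, mutual_information).
    """
    px = [p_xy[0][0] + p_xy[0][1], p_xy[1][0] + p_xy[1][1]]
    py = [p_xy[0][0] + p_xy[1][0], p_xy[0][1] + p_xy[1][1]]
    hx, hy = entropy(px), entropy(py)
    hxy = entropy([p_xy[a][b] for a in (0, 1) for b in (0, 1)])
    lx = [math.log2(q) for q in px]
    ly = [math.log2(q) for q in py]
    lxy = [[math.log2(p_xy[a][b]) for b in (0, 1)] for a in (0, 1)]
    pr_dep = pr_ind = 0.0
    # (k00, k01, k10, k11) = number of positions i with (x_i, y_i) = (0,0), (0,1), (1,0), (1,1).
    for k00 in range(n + 1):
        for k01 in range(n + 1 - k00):
            for k10 in range(n + 1 - k00 - k01):
                k11 = n - k00 - k01 - k10
                nx0, ny0 = k00 + k01, k00 + k10  # zeros in x^n and in y^n
                log_px = nx0 * lx[0] + (n - nx0) * lx[1]
                log_py = ny0 * ly[0] + (n - ny0) * ly[1]
                log_pxy = (k00 * lxy[0][0] + k01 * lxy[0][1]
                           + k10 * lxy[1][0] + k11 * lxy[1][1])
                if (abs(-log_px / n - hx) < eps and abs(-log_py / n - hy) < eps
                        and abs(-log_pxy / n - hxy) < eps):
                    # Number of sequence pairs with this joint type (a multinomial coefficient).
                    count = (math.comb(n, k00) * math.comb(n - k00, k01)
                             * math.comb(n - k00 - k01, k10))
                    pr_dep += count * 2.0 ** log_pxy
                    pr_ind += count * 2.0 ** (log_px + log_py)
    return pr_dep, pr_ind, hx + hy - hxy

bsc = [[0.45, 0.05], [0.05, 0.45]]  # uniform input through a BSC with crossover 0.1
eps = 0.1
for n in (20, 50, 100):
    pr_dep, pr_ind, mi = typical_probabilities(bsc, n, eps)
    bound = 2.0 ** (-n * (mi - 3 * eps))
    print(f"n={n}: dependent {pr_dep:.3f}, independent {pr_ind:.2e}, bound {bound:.2e}")
```

The upper bound in part 3 holds for every $n$, not only asymptotically, so the independent probability should never exceed the printed bound. The dependent probability approaches 1 only slowly at this small $\epsilon$, which is why the achievability argument is stated for $n$ sufficiently large.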

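Since the expectation over codes of $\bar{P}_e^{(n)}$ is less than $2\epsilon$, at least one code must have average error probability $\bar{P}_e^{(n)} < 2\epsilon$. Starting from that code, we now want a code that is good under the worst-case error criterion $P_e^{(n)} = \max_w P_e^{(n)}(w)$.

**Expurgation.** Throw away the half of the codewords with the worst $P_e(w)$. Keeping the same decoder, the error probabilities of the remaining codewords are unchanged, and every remaining codeword has $P_e^{(n)}(w) \le 4\epsilon$.

*Proof.* Suppose not. Since the discarded codewords are at least as bad as any remaining one, every discarded codeword must have $P_e(w) > 4\epsilon$. Then

$$
\bar{P}_e^{(n)} = \frac{1}{M} \sum_{w=1}^{M} P_e^{(n)}(w) \ge \frac{1}{M} \sum_{w\ \text{discarded}} P_e^{(n)}(w) > \frac{1}{M} \cdot \frac{M}{2} \cdot 4\epsilon = 2\epsilon.
$$

This contradicts $\bar{P}_e^{(n)} < 2\epsilon$. The expurgated code has rate $\frac{1}{n} \log\left(2^{nR}/2\right) = R - \frac{1}{n}$.

Why can't we do better?

*Proof (the converse).* Assume for the moment that we have a code at rate $R$ with $P_e^{(n)} = 0$. (We will relax this assumption in the next lecture.) Assign a uniform distribution to the messages $w$. We have a Markov chain $w \to x^n \to y^n \to \tilde{w}$, where $\tilde{w} = w$ because the code makes no errors. Then

$$
\begin{aligned}
nR = H(w) &= I(w; y^n) + H(w \mid y^n) \\
&= I(w; y^n) \qquad (H(w \mid y^n) = 0 \text{ since } P_e^{(n)} = 0) \\
&\le I(x^n; y^n) \qquad (\text{data-processing inequality}) \\
&\overset{(*)}{\le} \sum_{j=1}^{n} I(x_j; y_j) \\
&\le n \max_{p(x)} I(X;Y).
\end{aligned}
$$

Dividing by $n$ gives $R \le \max_{p(x)} I(X;Y)$.

Why $(*)$?

$$
\begin{aligned}
I(x^n; y^n) &= H(y^n) - H(y^n \mid x^n) \\
&= H(y^n) - \sum_{j=1}^{n} H(y_j \mid x^n, y_1, \ldots, y_{j-1}) \qquad (\text{chain rule}) \\
&= H(y^n) - \sum_{j=1}^{n} H(y_j \mid x_j) \\
&\le \sum_{j=1}^{n} \big[ H(y_j) - H(y_j \mid x_j) \big] \qquad (\text{subadditivity: } H(y^n) \le \textstyle\sum_j H(y_j)) \\
&= \sum_{j=1}^{n} I(x_j; y_j).
\end{aligned}
$$

The third line uses the fact that the channel is memoryless (and used without feedback), so $P(y_j \mid x^n, y_1, \ldots, y_{j-1}) = P(y_j \mid x_j)$.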
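Inequality $(*)$ holds for any input distribution on $x^n$ once the channel is memoryless, and it holds with equality when the inputs $x_j$ are independent. The sketch below checks this numerically for $n = 2$ uses of a binary symmetric channel. It builds the joint distribution of $(x^n, y^n)$ exactly and compares $I(x^n; y^n)$ with $\sum_j I(x_j; y_j)$. The channel, the crossover probability, the two input distributions, and the helper names are illustrative assumptions.

```python
import math
from itertools import product

def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), q in joint.items():
        pa[a] = pa.get(a, 0.0) + q
        pb[b] = pb.get(b, 0.0) + q
    return sum(q * math.log2(q / (pa[a] * pb[b]))
               for (a, b), q in joint.items() if q > 0)

def memoryless_bsc_joint(p_xn, p, n):
    """Joint pmf of (x^n, y^n) when x^n ~ p_xn is sent over n independent uses of a BSC(p)."""
    joint = {}
    for xn, px in p_xn.items():
        for yn in product((0, 1), repeat=n):
            # Memoryless channel: P(y^n | x^n) = prod_j P(y_j | x_j).
            channel = math.prod(1 - p if y == x else p for x, y in zip(xn, yn))
            joint[(xn, yn)] = px * channel
    return joint

def per_letter_joint(joint, j):
    """Joint pmf of the single pair (x_j, y_j)."""
    out = {}
    for (xn, yn), q in joint.items():
        key = (xn[j], yn[j])
        out[key] = out.get(key, 0.0) + q
    return out

def compare(p_xn, p, n):
    """Return (I(x^n; y^n), sum_j I(x_j; y_j))."""
    joint = memoryless_bsc_joint(p_xn, p, n)
    block = mutual_information(joint)
    letters = sum(mutual_information(per_letter_joint(joint, j)) for j in range(n))
    return block, letters

p = 0.1
repeated = {(0, 0): 0.5, (1, 1): 0.5}                      # correlated inputs: x_1 = x_2
uniform = {xn: 0.25 for xn in product((0, 1), repeat=2)}  # independent uniform inputs
print(compare(repeated, p, 2))
print(compare(uniform, p, 2))
```

With repeated inputs the block mutual information falls strictly below the per-letter sum, while independent uniform inputs give equality, matching the bound $n \max_{p(x)} I(X;Y)$ used in the last line of the converse.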

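## 4.4 Binary Symmetric Channel

Reminder: $p(0 \mid 0) = p(1 \mid 1) = 1 - p$ and $p(0 \mid 1) = p(1 \mid 0) = p$. Then

$$
\begin{aligned}
I(X;Y) &= H(Y) - H(Y \mid X) \\
&= H(Y) - \sum_x p(x)\, H(Y \mid X = x) \\
&= H(Y) - H_2(p) \\
&\le 1 - H_2(p),
\end{aligned}
$$

where $H_2(p) = -p \log p - (1-p) \log(1-p)$ is the binary entropy, i.e., $H_2(p) = H(X)$ for $X \sim \text{Bernoulli}(p)$. The third line holds because $H(Y \mid X = x) = H_2(p)$ for both inputs, and the inequality holds because $Y$ is binary. For $p(x) = \frac{1}{2}$, $Y$ is uniform, so $I(X;Y) = 1 - H_2(p)$. Therefore,

$$
\max_{p(x)} I(X;Y) = 1 - H_2(p).
$$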
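The maximization can also be checked numerically. The sketch below evaluates $I(X;Y) = H_2(\Pr\{Y = 1\}) - H_2(p)$, with $\Pr\{Y = 1\} = q(1-p) + (1-q)p$, over a grid of input distributions $q = \Pr\{X = 1\}$. It confirms that the maximum sits at $q = 1/2$ with value $1 - H_2(p)$. The grid resolution, the crossover value, and the helper names are illustrative choices.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(q, p):
    """I(X;Y) for a BSC with crossover p and input Pr{X = 1} = q."""
    py1 = q * (1 - p) + (1 - q) * p  # Pr{Y = 1}
    return h2(py1) - h2(p)

p = 0.1
grid = [i / 1000 for i in range(1001)]
best_q = max(grid, key=lambda q: bsc_mutual_information(q, p))
print(best_q, bsc_mutual_information(best_q, p), 1 - h2(p))
```

Because $I(X;Y)$ is symmetric about $q = 1/2$ and $H(Y)$ is maximized when $Y$ is uniform, the grid search lands exactly on $q = 0.5$. At $p = 1/2$ every input gives $I(X;Y) = 0$, the useless channel.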