
Solutions to Chapter 5 Problems

Introduction to Communication Systems, by Upamanyu Madhow

Problem 5.1(a) The event that we get a sum of 10 can be written as


E = {(6, 4), (5, 5), (4, 6)}
where the outcome (a, b) denotes getting a on the first throw and b on the second. Given that
we pick the unfair die, the conditional probability of the event is therefore given by
P[E|unfair] = P[(6, 4)|unfair] + P[(5, 5)|unfair] + P[(4, 6)|unfair]
            = (1/2)(1/10) + (1/10)(1/10) + (1/10)(1/2) = 11/100
(b) Using Bayes’ rule and the law of total probability,
P[unfair|E] = P[E|unfair]P[unfair]/P[E] = P[E|unfair]P[unfair] / (P[E|unfair]P[unfair] + P[E|fair]P[fair])
Since we pick between the fair and unfair die at random, we have P[unfair] = P[fair] = 1/2, so that
P[unfair|E] = P[E|unfair] / (P[E|unfair] + P[E|fair])
It remains to find the conditional probability of getting a sum of 10 given that we pick the fair
die:
P[E|fair] = P[(6, 4)|fair] + P[(5, 5)|fair] + P[(4, 6)|fair] = 3(1/6)^2 = 1/12
We therefore obtain
P[unfair|E] = (11/100) / (11/100 + 1/12) = 33/58
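The result is easy to verify by simulation. The following Matlab fragment (ours, not part of the original solution) assumes the unfair die has P[6] = 1/2 and P[k] = 1/10 for k = 1, ..., 5, as used above.

runs = 1e5;
pick_unfair = (rand(1,runs) < 0.5);  %1 if the unfair die is picked
count_E = 0; count_E_and_unfair = 0;
for i = 1:runs
    if pick_unfair(i)
        pmf = [0.1 0.1 0.1 0.1 0.1 0.5];  %unfair die
    else
        pmf = ones(1,6)/6;                %fair die
    end
    cdf = cumsum(pmf);
    throws = [find(rand < cdf, 1), find(rand < cdf, 1)];  %two throws of the chosen die
    if sum(throws) == 10
        count_E = count_E + 1;
        count_E_and_unfair = count_E_and_unfair + pick_unfair(i);
    end
end
estimate = count_E_and_unfair/count_E   %should be close to 33/58 = 0.569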

Problem 5.2 Let F denote the event that a student fails (hence F^c is the event that the student passes), and S denote the event that a student studies (hence S^c is the event that the student does not study). We are given that P(F^c|S) = 0.9 (hence P(F|S) = 1 − 0.9 = 0.1), P(F|S^c) = 0.9 (hence P(F^c|S^c) = 1 − 0.9 = 0.1), and P(S) = 0.7 (hence P(S^c) = 1 − 0.7 = 0.3).
(a) By the law of total probability,
P (F ) = P (F |S)P (S) + P (F |S c )P (S c) = 0.1 × 0.7 + 0.9 × 0.3 = 0.34
(b) By Bayes’ rule, the conditional probability that a student that failed studied for the exam is
given by
P(S|F) = P(F|S)P(S)/P(F) = (0.1 × 0.7)/0.34 = 7/34
(c) The conditional probability that a student that failed did not study is P(S^c|F) = 1 − P(S|F) = 1 − 7/34 = 27/34.
(d) Yes, since conditional probabilities obey the same rule as probabilities, as long as we are
conditioning on the same event.
Remark: On the other hand, we would not expect P (S|F ) and P (S c|F c ) to add up to one,
since we are conditioning on different events. To see this, let us use Bayes’ rule to compute the
probability that a student that passed did not study:
P(S^c|F^c) = P(F^c|S^c)P(S^c)/P(F^c) = (0.1 × 0.3)/(1 − 0.34) = 1/22
Adding this to the result of (b) does not have the interpretation of adding the probabilities of
complementary events (and, indeed, gives a result unequal to one), since we are conditioning on
different events.
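The numbers above are quickly checked in Matlab; a minimal fragment (ours, not part of the original solution) is given below.

P_S = 0.7; P_F_given_S = 0.1; P_F_given_Sc = 0.9;
P_F = P_F_given_S*P_S + P_F_given_Sc*(1-P_S)        %law of total probability: 0.34
P_S_given_F = P_F_given_S*P_S/P_F                   %Bayes' rule: 7/34 = 0.2059
P_Sc_given_F = 1 - P_S_given_F                      %27/34
P_Sc_given_Fc = (1-P_F_given_Sc)*(1-P_S)/(1-P_F)    %0.1*0.3/0.66 = 1/22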
Problem 5.3 We have Y ∼ Exp(1) if 0 is sent, Y ∼ Exp(1/10) if 1 is sent, and P[0 sent] = 1 − P[1 sent] = 0.6. Recall that the complementary CDF of an exponential random variable is given by P[Exp(µ) > z] = e^{−µz}, z ≥ 0.
(a) P[Y > 5|0 sent] = e^{−5}.
(b) P[Y > 5|1 sent] = e^{−5/10} = e^{−1/2}.
(c) Using the law of total probability,
P[Y > 5] = P[Y > 5|0 sent]P[0 sent] + P[Y > 5|1 sent]P[1 sent] = e^{−5} × 0.6 + e^{−1/2} × 0.4 = 0.2467

(d) Using Bayes' rule,

P[0 sent|Y > 5] = P[Y > 5|0 sent]P[0 sent]/P[Y > 5] = (e^{−5} × 0.6)/0.2467 = 0.0164

(e) Using Bayes' rule,

P[0 sent|Y = 5] = p_{Y|0}(5|0)P[0 sent]/p_Y(5)

where, for y ≥ 0, the conditional densities are p_{Y|0}(y|0) = e^{−y} and p_{Y|1}(y|1) = (1/10) e^{−y/10}, and the unconditional density is

p_Y(y) = p_{Y|0}(y|0)P[0 sent] + p_{Y|1}(y|1)P[1 sent] = 0.6 e^{−y} + 0.4 (1/10) e^{−y/10}

Plugging these in, we obtain

P[0 sent|Y = 5] = (e^{−5} × 0.6) / (e^{−5} × 0.6 + (1/10) e^{−5/10} × 0.4) = 0.1428

Problem 5.4(a) We have

P[b3 = 0] = P[b1 = 0, b2 = 0] + P[b1 = 1, b2 = 1] = P[b1 = 0]P[b2 = 0] + P[b1 = 1]P[b2 = 1] = 0.8 × 0.1 + 0.2 × 0.9 = 0.26
(b) Denoting p_i = P[b_i = 0], we have e^{L_i} = p_i/(1 − p_i). Thus,

e^{L_3} = p_3/(1 − p_3) = [p_1 p_2 + (1 − p_1)(1 − p_2)] / [p_1(1 − p_2) + (1 − p_1)p_2]

Dividing numerator and denominator of the right-hand side by (1 − p_1)(1 − p_2), we obtain that

e^{L_3} = [ (p_1/(1 − p_1))(p_2/(1 − p_2)) + 1 ] / [ p_1/(1 − p_1) + p_2/(1 − p_2) ] = (e^{L_1 + L_2} + 1) / (e^{L_1} + e^{L_2})

so that

L_3 = log [ (e^{L_1 + L_2} + 1) / (e^{L_1} + e^{L_2}) ]
We can view L_i as an expression of our "belief" about the value of bit b_i, and the preceding computation is a key component of "belief propagation" based decoding of channel codes, as discussed in Chapter 7.
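A small numerical check of the combining formula (ours, not part of the original solution), using p_1 = 0.8 and p_2 = 0.1 as above:

p1 = 0.8; p2 = 0.1;
L1 = log(p1/(1-p1)); L2 = log(p2/(1-p2));
p3 = p1*p2 + (1-p1)*(1-p2);                          %P[b3 = 0] = 0.26
L3_direct = log(p3/(1-p3))                           %LLR computed from p3
L3_formula = log((exp(L1+L2)+1)/(exp(L1)+exp(L2)))   %should match L3_direct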

Problem 5.5 Let Y1 , ..., Yn denote the outputs corresponding to the n channel uses. Then
Z = Y1 + ... + Yn .
(a) Given X = 0, Y_1, ..., Y_n are conditionally i.i.d. Bernoulli random variables with P[Y_i = 1|X = 0] = a = 1 − P[Y_i = 0|X = 0], so that their sum Z is conditionally binomial: Z|X=0 ∼ Bin(n, a). Thus, the conditional pmf is given by

P[Z = z|X = 0] = p(z|0) = \binom{n}{z} a^z (1 − a)^{n−z}

(b) According to the majority rule, the receiver says 1 if Z > ⌊n/2⌋ (assume n odd). Thus, the conditional probability of error given that 0 is sent is

P_{e|0} = P[Z > ⌊n/2⌋ | X = 0] = \sum_{z=⌊n/2⌋+1}^{n} \binom{n}{z} a^z (1 − a)^{n−z}

For n = 5, a = 0.1, this evaluates to Pe|0 = 0.0086.
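The quoted number can be reproduced with a short binomial sum in Matlab (a check we add here; it is not part of the original solution):

n = 5; a = 0.1;
Pe0 = 0;
for z = floor(n/2)+1 : n
    Pe0 = Pe0 + nchoosek(n,z) * a^z * (1-a)^(n-z);  %binomial tail sum
end
Pe0   %approximately 0.0086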


Figure 1: Posterior probability of X = 0 given Z = m in Problem 5.5(c) (equal priors).

(c) The posterior probability

P[X = 0|Z = m] = P[Z = m|X = 0]P[X = 0]/P[Z = m] = P[Z = m|X = 0]P[X = 0] / (P[Z = m|X = 0]P[X = 0] + P[Z = m|X = 1]P[X = 1])

Since Z|X=0 ∼ Bin(n, a) and Z|X=1 ∼ Bin(n, 1 − a), the preceding simplifies to

P[X = 0|Z = m] = a^m (1 − a)^{n−m} / (a^m (1 − a)^{n−m} + a^{n−m} (1 − a)^m) = 1 / (1 + ((1 − a)/a)^{2m−n})

assuming equal priors. The stem plot is shown in Figure 1. As expected, smaller values of Z
correspond to a higher posterior probability for X = 0.
(d) When P[X = 0] = 0.9, the posterior probability can be written as

P[X = 0|Z = m] = 0.9 a^m (1 − a)^{n−m} / (0.9 a^m (1 − a)^{n−m} + 0.1 a^{n−m} (1 − a)^m) = 1 / (1 + (1/9)((1 − a)/a)^{2m−n})

Figure 2: Posterior probability of X = 0 given Z = m in Problem 5.5(d), when P[X = 0] = 0.9.

The corresponding stem plot is shown in Figure 2. Comparing with Figure 1, we see that a higher prior probability for X = 0 leads to a higher posterior probability for X = 0, but for large values of m, we still have small posterior probabilities for 0 sent (if we get a large enough number of 1s from the channel, it outweighs our prior information that 0 is significantly more likely to be sent).
(e) The LLR for a given value of Z is given by

LLR(m) = log( P[X = 0|Z = m] / P[X = 1|Z = m] ) = log( P[X = 0|Z = m] / (1 − P[X = 0|Z = m]) )
The stem plots corresponding to (c)-(d) are shown in Figure 3.

(a) LLRs (equal priors)  (b) LLRs (unequal priors)

Figure 3: LLRs for equal and unequal priors in Problem 5.5(e).
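For completeness, here is a sketch of Matlab code that could generate stem plots like those in Figures 1-3 (this code is ours, not the author's; variable names are illustrative):

n = 5; a = 0.1; m = 0:n;
post_equal = 1 ./ (1 + ((1-a)/a).^(2*m-n));          %posterior, equal priors
post_unequal = 1 ./ (1 + (1/9)*((1-a)/a).^(2*m-n));  %posterior, P[X=0] = 0.9
figure; stem(m, post_equal); xlabel('m'); ylabel('P[X=0|Z=m]');
figure; stem(m, post_unequal); xlabel('m'); ylabel('P[X=0|Z=m]');
figure; stem(m, log(post_equal./(1-post_equal))); xlabel('m'); ylabel('LLR(m)');
figure; stem(m, log(post_unequal./(1-post_unequal))); xlabel('m'); ylabel('LLR(m)');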

Problem 5.6 (a) Using Bayes' rule and the law of total probability, the posterior probabilities are given by

P[X = 0|Y = y] = P[Y = y|X = 0]P[X = 0]/P[Y = y] = P[Y = y|X = 0]P[X = 0] / (P[Y = y|X = 0]P[X = 0] + P[Y = y|X = 1]P[X = 1])

For 0, 1 equally likely, this reduces to

P[X = 0|Y = y] = P[Y = y|X = 0] / (P[Y = y|X = 0] + P[Y = y|X = 1]),   0, 1 equiprobable

which gives

P[X = 0|Y = y] =
  (1 − p − q − r)/(1 − q − r) = 0.917,  y = +3
  r/(r + q) = 0.75,                     y = +1
  q/(r + q) = 0.25,                     y = −1
  p/(1 − q − r) = 0.083,                y = −3

(b) The LLRs are given by L(y) = log( P[X = 0|Y = y] / P[X = 1|Y = y] ) = log( P[X = 0|Y = y] / (1 − P[X = 0|Y = y]) ), and can therefore be calculated by plugging in answers from (a). However, it is worth noting that they can also be written as

L(y) = log( (P[Y = y|X = 0]P[X = 0]/P[Y = y]) / (P[Y = y|X = 1]P[X = 1]/P[Y = y]) ) = log( P[Y = y|X = 0] / P[Y = y|X = 1] ) + log( P[X = 0] / P[X = 1] )

Thus, the LLR is a sum of two terms, one corresponding to the transition probabilities {P[Y = y|X = i]} and one to the prior probabilities {P[X = i]}, where i = 0, 1. For equiprobable priors, the second term is zero, hence the LLRs are given by

L(y) = log( P[Y = y|X = 0] / P[Y = y|X = 1] ) =
  log((1 − p − q − r)/p) = 2.4,    y = +3
  log(r/q) = 1.1,                  y = +1
  log(q/r) = −1.1,                 y = −1
  log(p/(1 − p − q − r)) = −2.4,   y = −3

(c) Since the channel uses are conditionally independent, the required conditional probabilities
are given by

P [Y = y|X = i] = P [Y1 = y1 , Y2 = y2 , Y3 = y3 |X = i]
= P [Y1 = y1 |X = i]P [Y2 = y2 |X = i]P [Y3 = y3 |X = i], i = 0, 1

For y = (+1, +3, −1)T , we get

P [Y = y|X = 0] = r(1 − p − q − r)q = 0.0165, P [Y = y|X = 1] = qpr = 0.0015

(d) We replicate the argument in (a) to emphasize that the key ideas apply to vector observations gathered over multiple channel uses as well. Using Bayes' rule, we have

P[X = 0|Y = y] = P[Y = y|X = 0]P[X = 0]/P[Y = y] = P[Y = y|X = 0]P[X = 0] / (P[Y = y|X = 0]P[X = 0] + P[Y = y|X = 1]P[X = 1])

For 0, 1 equiprobable, we have

P[X = 0|Y = y] = P[Y = y|X = 0] / (P[Y = y|X = 0] + P[Y = y|X = 1]) = 0.0165/(0.0165 + 0.0015) = 0.917

where we have used the results of (c). As before, the LLR can be written as

L(y) = log( P[X = 0|Y = y] / P[X = 1|Y = y] ) = log( P[Y = y|X = 0] / P[Y = y|X = 1] ) + log( P[X = 0] / P[X = 1] )

Thus, for equal priors, we have

L(y) = log( P[Y = y|X = 0] / P[Y = y|X = 1] ) = log( r(1 − p − q − r)q / (qpr) ) = log( (1 − p − q − r)/p ) = 2.4

Remark: For independent channel uses, we can write the LLR as

L(y) = \sum_{k=1}^{3} log( P[Y_k = y_k|X = 0] / P[Y_k = y_k|X = 1] ) + log( P[X = 0] / P[X = 1] )

so that the contributions from the different channel uses and the priors simply add up. This illustrates why the LLR is an attractive means of combining information from prior probabilities and observations.
(e) Since the LLR is positive (i.e., the posterior probability of 0 is higher than that of 1), we
would decide on 0 based on the channel output +1,+3,-1.
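As a numerical cross-check (ours, not the author's), note that the values quoted above are consistent with transition probabilities p = 0.05, q = 0.1, r = 0.3; under that assumption, the answers can be reproduced as follows:

p = 0.05; q = 0.1; r = 0.3;          %assumed values, consistent with the numbers above
P0 = [1-p-q-r, r, q, p];             %P[Y = +3, +1, -1, -3 | X = 0]
P1 = fliplr(P0);                     %symmetric channel: P[Y = y | X = 1] = P[Y = -y | X = 0]
post = P0 ./ (P0 + P1)               %posteriors for y = +3, +1, -1, -3: 0.917 0.75 0.25 0.083
LLR = log(P0 ./ P1)                  %LLRs: 2.4 1.1 -1.1 -2.4
Py_given_0 = r*(1-p-q-r)*q           %vector observation (+1,+3,-1): 0.0165
Py_given_1 = q*p*r                   %0.0015
post_vec = Py_given_0/(Py_given_0 + Py_given_1)   %0.917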

Problem 5.7 The random variable X ∼ Exp(µ = 1/10) (mean E[X] = 1/µ = 10).
(a) P[X > x] = e^{−µx} for x ≥ 0, hence P[X > 20] = e^{−20/10} = e^{−2} = 0.1353.
(b) P[X ≤ x] = P[X < x] = 1 − e^{−µx} for x ≥ 0, hence P[X < 5] = 1 − e^{−5/10} = 0.3935.
(c) By the definition of conditional probability,

P[X > 20|X > 10] = P[X > 20, X > 10]/P[X > 10] = P[X > 20]/P[X > 10] = e^{−20/10}/e^{−10/10} = e^{−1} = 0.3679
(d) We have

E[e^{−X}] = ∫ e^{−x} p(x) dx = ∫_0^∞ e^{−x} µ e^{−µx} dx = µ e^{−(µ+1)x}/(−(µ+1)) |_0^∞ = µ/(µ + 1) = 1/11

setting µ = 1/10.
(e) We have

E[X^3] = ∫ x^3 p(x) dx = ∫_0^∞ x^3 µ e^{−µx} dx = (1/µ^3) ∫_0^∞ t^3 e^{−t} dt

substituting t = µx. As discussed in the text, the integral evaluates to Γ(4) = 3! = 6, so that

E[X^3] = 6/µ^3 = 6000
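These moments are easy to verify by simulation; a short Matlab check (ours, not part of the original solution) follows.

runs = 1e6;
X = -10*log(rand(1,runs));      %exponential random variables with mean 10
E_expX = mean(exp(-X))          %should be close to 1/11 = 0.0909
E_X3 = mean(X.^3)               %should be close to 6000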

Problem 5.8 (a) For X = max(U_1, ..., U_n), we have X ≤ x if and only if U_1 ≤ x, ..., U_n ≤ x. The CDF of X is therefore given by

F_X(x) = P[X ≤ x] = P[U_1 ≤ x, ..., U_n ≤ x] = P[U_1 ≤ x] ... P[U_n ≤ x] = F_U^n(x)

where we have used the independence of the {U_i}.


(b) For Y = min (U1 , ..., Un ), we have Y > y if and only if U1 > y, ..., Un > y. Thus,

P [Y > y] = P [U1 > y, ..., Un > y] = P [U1 > y]...P [Un > y] = (1 − FU (y))n

Thus, the CDF of Y is given by

FY (y) = P [Y ≤ y] = 1 − P [Y > y] = 1 − (1 − FU (y))n

(c) When the {U_i} are uniform over [0, 1], we have F_U(u) = u, 0 ≤ u ≤ 1, hence

F_X(x) = x^n, 0 ≤ x ≤ 1

(a) CDF of maximum  (b) CDF of minimum

Figure 4: The CDFs of the maximum and minimum of n i.i.d. uniform random variables (curves for n = 1, 5, 10).

(Of course, F_X(x) = 0 for x < 0 and F_X(x) = 1 for x ≥ 1, since X lies in [0, 1].) Figure 4(a) plots the CDF of X. The probability mass shifts towards one as n increases. For 0 < x < 1, x^n → 0 as n → ∞, so that the limiting CDF concentrates all of its probability mass at one:

lim_{n→∞} F_X(x) = { 0, x < 1 ; 1, x ≥ 1 }

Thus, in the limit of a large number of continuous random variables, we get a discrete random variable putting all its probability mass at one.
(d) When the {Ui } are uniform over [0, 1], their minimum Y also lies in [0, 1]. The CDF is given
by
FY (y) = 1 − (1 − y)n , 0 ≤ y ≤ 1
Figure 4(b) plots the CDF of Y. The probability mass shifts towards zero as n increases. For 0 < y ≤ 1, we have 0 ≤ 1 − y < 1 and (1 − y)^n → 0, so that

lim_{n→∞} F_Y(y) = { 0, y ≤ 0 ; 1, y > 0 }

This limit is actually not a valid CDF, since it is not right-continuous at y = 0. But it does
show a unit jump at y = 0, indicating that all the probability mass is concentrated at zero. We
can prove the latter rigorously using more sophisticated techniques, but we do not attempt to
do this here.
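A simulation sketch (ours, not part of the original solution) that reproduces the qualitative behavior in Figure 4:

n = 10; runs = 1e4;
U = rand(n, runs);
X = max(U); Y = min(U);               %maximum and minimum of each column
x = 0:0.01:1;
Fmax_emp = mean(X(:) <= x, 1);        %empirical CDF of the maximum
Fmin_emp = mean(Y(:) <= x, 1);        %empirical CDF of the minimum
plot(x, Fmax_emp, x, x.^n, '--');             %compare with x^n
figure; plot(x, Fmin_emp, x, 1-(1-x).^n, '--'); %compare with 1-(1-x)^n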

Problem 5.9 Let U1 ∼ Exp(µ1 ) and U2 ∼ Exp(µ2 ) denote two independent exponential random
variables.
(a) We wish to express events involving min (U1 , U2 ) in terms of an intersection of events involving
U1 and U2 , in order to exploit the independence assumption. We have min (U1 , U2 ) > x if and
only if U1 > x and U2 > x. Thus, we have, for x ≥ 0,

P [min (U1 , U2 ) > x] = P [U1 > x, U2 > x] = P [U1 > x]P [U2 > x]
= e−µ1 x e−µ2 x = e−(µ1 +µ2 )x

which is in the form of the complementary CDF of an exponential random variable. That is,
min (U1 , U2 ) ∼ Exp(µ1 + µ2 ), hence the statement is True.

(b) Since max (U1 , U2 ) ≤ x if and only if U1 ≤ x and U2 ≤ x, we have, for x ≥ 0,

P [max (U1 , U2 ) ≤ x] = P [U1 ≤ x, U2 ≤ x] = P [U1 ≤ x]P [U2 ≤ x]


= (1 − e−µ1 x ) (1 − e−µ2 x ) = 1 + e−(µ1 +µ2 )x − e−µ1 x − e−µ2 x

which is not in the 1 − e−µx form of an exponential CDF. Thus, the maximum is not an expo-
nential random variable, and the statement is False.

Problem 5.10 (a) This is the same as Problem 5.8(b), but we can see the explicit form here:

P[min(U, V) < x] = 1 − (1 − x)^2 = 2x − x^2, 0 ≤ x ≤ 1

Sketch omitted. See Figure 4(b) for similar plots.
(b) Y = V/U takes values in [0, ∞) for U, V taking values in [0, 1].
Method 1: The CDF can be computed by conditioning on U = u, and then removing the conditioning, as follows:

F_Y(y) = P[Y = V/U ≤ y] = P[V ≤ Uy] = ∫ P[V ≤ Uy|U = u] p(u) du = ∫_0^1 P[V ≤ uy] du

where we have used the independence of U, V, and the uniform distribution of U, in the last equality. We know that P[V ≤ v] = v, 0 ≤ v ≤ 1, and P[V ≤ v] = 1 for v ≥ 1. For y ≤ 1, uy ≤ 1, so that

F_Y(y) = ∫_0^1 uy du = y/2, 0 ≤ y ≤ 1

For y > 1, uy > 1 and P[V ≤ uy] = 1 for u > 1/y, so that

F_Y(y) = ∫_0^{1/y} uy du + ∫_{1/y}^1 1 du = 1 − 1/(2y), y > 1

Figure 5: Pictorial computation of the CDF of V/U, where U, V are i.i.d. uniform over [0, 1].

Method 2: We can also compute the CDF pictorially as shown in Figure 5. The joint distribution

of (U, V) is uniform over the unit square in the (u, v) plane, so that the event V/U ≤ y, whose probability we wish to find, corresponds to the shaded regions shown, and the probability itself is simply the area of these shaded regions. For y ≤ 1, the region V/U ≤ y is triangular, with area y/2. For y > 1, the region V/U ≤ y is the complement of a triangular region of area 1/(2y).
Problem 5.11 We have already derived the joint density of R = √(X1^2 + X2^2) = √Z and Θ = tan^{−1}(X2/X1) in Example 5.4.3, so we will just use those results.
(a) We have p(r) = 2r e^{−r^2} I_{r≥0} (the Rayleigh density with v^2 = 1/2). Since Z = R^2 and dz/dr = 2r = 2√z, we have

p(z) = p(r)/|dz/dr| |_{r=√z} = e^{−z} I_{z≥0}

showing that Z ∼ Exp(1).
(b) The statement is True. Since R and Θ are independent, so are Z = R^2 and Θ.

Problem 5.12(a) The CDF of X = F −1 (U) is given as

P [X ≤ x] = P [F −1(U) ≤ x] = P [U ≤ F (x)] = F (x)

(If F is not monotone increasing, then we have to define the inverse carefully, but the preceding
result still holds.)
(b) For Y ∼ Exp(1/2) (mean 2), we have

F (y) = FY (y) = 1 − e−y/2 , y ≥ 0

For F −1 (u) = y, we have u = F (y) = 1−e−y/2 so that y = −2 log(1−u). Thus, the transformation
Y = −2 log(1 − U) works. Since 1 − U is also uniform over [0, 1], we can replace it by U, so
Y = −2 log U also works.
(c) The Matlab code for generating the histogram in Figure 6 is given below.

uniform = rand(1000,1); %1000 uniform random variables
exponential = -2*log(uniform); %1000 exponential random variables of mean 2
hist(exponential,100); %histogram with 100 bins

Figure 6: Histogram for exponential random variable of mean 2, simulated using uniform random variables as in Problem 5.12.

Problem 5.13: While all the derivations in this problem are in the text and other problems, we do them from scratch here in order to reinforce the concepts.
(a) Since U_1 and U_2 are independent, so are Z = −2 ln U_1 and Θ = 2πU_2. Clearly, Θ is uniform over [0, 2π]. Since U_1 takes values in [0, 1], the random variable Z takes values in [0, ∞). The CDF of Z is given by

P[Z ≤ z] = P[−2 ln U_1 ≤ z] = P[U_1 ≥ e^{−z/2}] = 1 − e^{−z/2}, z ≥ 0

We recognize that Z ∼ Exp(1/2), an exponential random variable with mean E[Z] = 2.
Remark: This provides a good opportunity to emphasize that one does not always have to write down explicit expressions for the joint CDF or density in order to specify the joint distribution. Any specification that would allow one to write down such expressions if needed is sufficient. In the preceding, we provided such a specification by stating that Z and Θ are independent, with Z ∼ Exp(1/2) and Θ ∼ Unif[0, 2π].
(b) Let us do this from scratch instead of using results from prior examples and problems. We have

p(z, θ) = p(z)p(θ) = (1/2) e^{−z/2} (1/(2π)), z ≥ 0, 0 ≤ θ ≤ 2π     (1)

and

p(x_1, x_2) = p(z, θ)/|det(J(x_1, x_2; z, θ))|, evaluated at z = x_1^2 + x_2^2, θ = tan^{−1}(x_2/x_1)     (2)

where the Jacobian can be computed as

J(x_1, x_2; z, θ) = [ (1/(2√z)) cos θ   (1/(2√z)) sin θ ; −√z sin θ   √z cos θ ]

so that det(J(x_1, x_2; z, θ)) = (1/2) cos^2 θ + (1/2) sin^2 θ = 1/2. Plugging this and (1) into (2), we obtain that the joint density of X_1, X_2 is given by

p(x_1, x_2) = (1/(2π)) e^{−(x_1^2 + x_2^2)/2}, −∞ < x_1, x_2 < ∞

We recognize that this is a product of two N(0, 1) densities, so that X_1, X_2 are i.i.d. N(0, 1) random variables.
(c) A code fragment using the preceding to generate N(0, 1) random variables is provided below,
and the histogram generated is shown in Figure 7.
Figure 7: Histogram based on 2000 N(0, 1) random variables generated using the method in Problem 5.13.

%generating Gaussian random variables from uniform random variables


N=1000; %half the number of Gaussians needed

%generate uniform random variables
U1 = rand(N,1);
U2 = rand(N,1);
Z = -2*log(U1); %exponentials, mean 2
theta=2*pi*U2; % uniform over [0,2 pi]
%transform to standard Gaussian
X1=sqrt(Z).*cos(theta);
X2=sqrt(Z).*sin(theta);
X = [X1;X2];%2N independent N(0,1) random variables
hist(X,100); %histogram with hundred bins

(d) We estimate E[X 2 ] as the empirical mean by adding the following code fragment:

estimated_power = sum(X.^2)/(2*N)

The answer should be close to the theoretical answer E[X 2 ] = var(X) + (E[X])2 = 1 + 02 = 1.
(e) The desired probability P [X 3 + X > 3] can be estimated by adding the following code
fragment.

E = (X.^3 + X > 3); %indicates whether the desired event occurs


probability = sum(E)/(2*N) %counts fraction of times desired event occurs

We get an answer of about 0.11.

Figure 8: CDF of the Bernoulli random variable Y1 in Problem 5.14. The dots at the jumps at
0 and 1 indicate the right continuity of the CDF.

Problem 5.14 (a) Y_1 takes values 0 and 1, with P[Y_1 = 0] = P[U_1 ≤ 0.7] = 0.7 and P[Y_1 = 1] = 1 − P[Y_1 = 0] = 0.3. Thus, the CDF is given by

F(y_1) = P[Y_1 ≤ y_1] = { 0, y_1 < 0 ; 0.7, 0 ≤ y_1 < 1 ; 1, y_1 ≥ 1 }

and is sketched in Figure 8.


(b) The random variable Z = Y_1 + ... + Y_n ∼ Bin(n, p) for p = P[Y_1 = 1] = 0.3. The pmf is therefore given by

P[Z = k] = \binom{n}{k} p^k (1 − p)^{n−k}, k = 0, 1, ..., n
The plot for n = 20 and p = 0.3 is given in Figure 5.6.
(c)-(e) We skip the histogram, but show how to compute moments using simulation. Since Z is a sum of n Bernoulli random variables, its first moment is simply

E[Z] = nE[Y_1] = np

The second and third moments can be computed in a number of ways, including using moment generating functions. We skip deriving these, but give the expressions in the code below. If we run the code for n = 20 and p = 0.3, we get E[Z] = 6, E[Z^2] = 40.2 (this was not asked for) and E[Z^3] = 293.3. Simulations with 10000 runs come very close to these values, but you should check what happens with fewer runs, say 1000.

n=20;
p=0.3;
runs=10000;%number of simulation runs needed
U=rand(n,runs); %matrix of unif(0,1) random variables
Y=U > 0.7; %threshold to get matrix Bernoulli random variables P[1]=0.3
Z=sum(Y); %add n rows to get Bin(n,p) random variables
%simulation-based moment computations
first_moment_estimate = sum(Z)/runs
second_moment_estimate=sum(Z.^2)/runs
third_moment_estimate = sum(Z.^3)/runs
%analytical computation of moments
first_moment_analytical=n*p
second_moment_analytical=n*(n-1)*p^2+n*p
third_moment_analytical=n*(n-1)*(n-2)*p^3+3*n*(n-1)*p^2+n*p

Problem 5.15 (a) The joint density must integrate to one, hence we must have

1 = K ∫_0^∞ ∫_0^∞ e^{−(2x^2+y^2)/2} dx dy + K ∫_{−∞}^0 ∫_{−∞}^0 e^{−(2x^2+y^2)/2} dx dy = 2K ∫_0^∞ ∫_0^∞ e^{−(2x^2+y^2)/2} dx dy

where we have used symmetry. The integrals in x and y separate out, and we have

∫_0^∞ e^{−(2x^2)/2} dx = ∫_0^∞ e^{−x^2/(2 v_1^2)} dx = (1/2) √(2π v_1^2) = √π/2

massaging the x integrand into an N(0, v_1^2) density, with v_1^2 = 1/2. Similarly, we can massage the y integrand into an N(0, v_2^2) density with v_2^2 = 1 to get

∫_0^∞ e^{−y^2/2} dy = (1/2) √(2π v_2^2) = √(2π)/2

We therefore have 1 = 2K (√π/2)(√(2π)/2), or K = √2/π.
(b) The marginal density of X is

p(x) = ∫ p(x, y) dy = { K ∫_0^∞ e^{−(2x^2+y^2)/2} dy, x ≥ 0 ; K ∫_{−∞}^0 e^{−(2x^2+y^2)/2} dy, x < 0 }

By symmetry, the y integrals evaluate to the same answer for the two cases above, so that p(x) ∝ e^{−x^2}. Thus, X ∼ N(0, 1/2) (the constant must work out to whatever is needed for p(x) to integrate to one). A similar reasoning shows that Y ∼ N(0, 1).
(c) The event X^2 + X > 2 can be written as

X^2 + X − 2 = (X + 2)(X − 1) > 0

which happens if X + 2 > 0, X − 1 > 0, or X + 2 < 0, X − 1 < 0. That is, it happens if X > 1 or X < −2. Thus,

P[X^2 + X > 2] = P[X > 1] + P[X < −2] = Q((1 − 0)/√(1/2)) + Φ((−2 − 0)/√(1/2)) = Q(√2) + Q(2√2)

where we have used X ∼ N(0, 1/2) and Φ(−x) = Q(x).


(d) X, Y are not jointly Gaussian, since the probability mass is constrained to two of the four
quadrants, unlike the joint Gaussian density, for which the probability mass is spread over the
entire plane.
(e) If X > 0, then Y > 0 (even though Y can take both positive and negative values). Hence X
and Y cannot be independent.
(f) From the marginals in (b), we know that E[X] = E[Y ] = 0. However, E[XY ] > 0, since all
the probability mass falls in the region xy > 0. Thus, cov(X, Y ) = E[XY ] − E[X]E[Y ] > 0. So
X, Y are not uncorrelated.
(g) The conditional density is p(x|y) = p(x, y)/p(y). If y > 0, this evaluates to p(x|y) = k_1 e^{−x^2} I_{x≥0}. If y < 0, it evaluates to p(x|y) = k_1 e^{−x^2} I_{x<0}. Since the probability mass of the conditional density is constrained to part of the real line, it is not Gaussian.
Problem 5.16 (a) Manual computation using bilinearity:

cov(Y_1, Y_2) = cov(X_1 + 2X_2, −X_1 + X_2)
             = −cov(X_1, X_1) + cov(X_1, X_2) − 2cov(X_2, X_1) + 2cov(X_2, X_2)
             = −1 + (−1) − 2(−1) + 2(4) = 8

(b) We have Y = AX ∼ N(m_Y = A m_X, C_Y = A C_X A^T). We find m_Y = (4, −1)^T and

C_Y = [ 13  8 ; 8  7 ]

The inverse of the covariance matrix is given by

C_Y^{−1} = (1/27) [ 7  −8 ; −8  13 ] = [ 0.2593  −0.2963 ; −0.2963  0.4815 ]

Plugging into the expression for the joint Gaussian density, we obtain

p(y_1, y_2) = (1/(2π√27)) exp{ −(1/54) [ 7(y_1 − 4)^2 − 16(y_1 − 4)(y_2 + 1) + 13(y_2 + 1)^2 ] }

(c) Z = Y_1 − 2Y_2 = a^T Y, with a^T = (1, −2), so that Z ∼ N(a^T m_Y = 6, a^T C_Y a = 9). Thus,

P[Y_1 > 2Y_2 + 1] = P[Z > 1] = Q((1 − 6)/√9) = Q(−5/3) = 1 − Q(5/3) = 0.9522

Problem 5.17 (a) Using bilinearity of covariance, we have

cov(Y1 , Y2) = cov(2X1 − X2 , −X1 + 3X2 )


= −2cov(X1 , X1 ) + 6cov(X1 , X2 ) + cov(X2 , X1 ) − 3cov(X2 , X2 )
= −2(4) + 6(−2) + (−2) − 3(9) = −49

(b) We have Y = AX, where

A = [ 2  −1 ; −1  3 ]

so that Y ∼ N(m_Y = Am, C_Y = ACA^T). We can compute m_Y = (−8, 9)^T and

C_Y = [ 33  −49 ; −49  97 ]

(I used Matlab, even though I could have computed it by hand.) We can now plug into formula
(6.10) for the joint Gaussian density.
(c) P[Y_2 > 2Y_1 − 1] = P[Z > 0], where Z = Y_2 − 2Y_1 + 1 = a^T Y + 1, with a = (−2, 1)^T. Thus, Z is Gaussian with mean E[Z] = a^T m_Y + 1 = 26 and var(Z) = a^T C_Y a = 425, where I have again used Matlab. Since Y = AX, we could also express Z in terms of the original random vector X: Z = a^T AX + 1 = (A^T a)^T X + 1 = a_1^T X + 1, where a_1 = A^T a = (−5, 5)^T. We would then obtain E[Z] = a_1^T m + 1 and var(Z) = a_1^T C a_1, which, as can be checked, give the same answers as before. Now that we know that Z ∼ N(26, 425), we have

P[Z > 0] = Q((0 − 26)/√425) = 1 − Q(26/√425) = 0.8964
(d) We know that Y_1 ∼ N(−8, 33) from (b). The desired probability can be written as

P[Y_1^2 > 3Y_1 + 10] = P[Y_1^2 − 3Y_1 − 10 > 0] = P[(Y_1 − 5)(Y_1 + 2) > 0]
                    = P[Y_1 > 5, Y_1 > −2] + P[Y_1 < 5, Y_1 < −2] = P[Y_1 > 5] + P[Y_1 < −2]
                    = Q((5 − (−8))/√33) + Φ((−2 − (−8))/√33) = Q(13/√33) + Φ(6/√33)
                    = Q(13/√33) + 1 − Q(6/√33) = 0.8637
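Since the solution mentions using Matlab, here is a sketch of those computations (ours; the mean vector m = (−3, 2)^T is inferred from m_Y = Am = (−8, 9)^T, and C = [4 −2; −2 9] from the covariances used in (a)):

A = [2 -1; -1 3];
m = [-3; 2]; C = [4 -2; -2 9];     %assumed inputs, consistent with the solution above
mY = A*m                            %(-8, 9)'
CY = A*C*A'                         %[33 -49; -49 97]
a = [-2; 1];
EZ = a'*mY + 1                      %26
varZ = a'*CY*a                      %425
Qfun = @(x) 0.5*erfc(x/sqrt(2));    %Q function via erfc
Pc = 1 - Qfun(EZ/sqrt(varZ))        %part (c): P[Z > 0] = 0.8964
Pd = Qfun(13/sqrt(33)) + 1 - Qfun(6/sqrt(33))   %part (d): 0.8637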

Problem 5.18 (a)-(c) The densities and contour plots are given in Figures 9-11.
(d) The contour plots are ellipses which are aligned with the x and y axes for ρ = 0, but are rotated for ρ ≠ 0. Deriving the specific relationship between σ_X^2, σ_Y^2, ρ and the ellipse major and minor axes and their alignment is left as an exercise.
The code for producing these plots is provided below (the numerical values set for the parameters are for part (c)).

var_x=4; %variance of X
var_y=1; %variance of Y
rho=0.5; %normalized correlation between X and Y, lies in (-1,1)
r12=rho*sqrt(var_x*var_y);
R=[var_x r12;r12 var_y];
[x,y]=meshgrid(-5:0.1:5);
Rinv=inv(R);
exp_arg= Rinv(1,1)*(x.^2)+Rinv(2,2)*(y.^2)+2*Rinv(1,2)*(x.*y);

normalization = 1/(2*pi*sqrt(det(R)));
z=normalization*exp(-0.5*exp_arg);
%density plot
figure;
surf(x,y,z);
xlabel(’x’);
ylabel(’y’);
zlabel(’Joint Gaussian Density’);
figure;
%contour plot
contour(x,y,z);
xlabel(’x’);
ylabel(’y’);

(a) Joint density  (b) Contours

Figure 9: Joint Gaussian density with σ_X^2 = 1, σ_Y^2 = 1, ρ = 0.

(a) Joint density  (b) Contours

Figure 10: Joint Gaussian density with σ_X^2 = 1, σ_Y^2 = 1, ρ = 0.5.

Problem 5.19 Assuming X, Y are zero mean and jointly Gaussian, Z = X − 2Y ∼ N(0, σ_Z^2), with

σ_Z^2 = var(Z) = cov(X − 2Y, X − 2Y) = cov(X, X) − 4cov(X, Y) + 4cov(Y, Y) = σ_X^2 − 4ρσ_X σ_Y + 4σ_Y^2

(a) For parts (a)-(c) in Problem 5.18, we have σ_Z^2 = 1 − 0 + 4(1) = 5, σ_Z^2 = 1 − 4(0.5)(1)(1) + 4(1) = 3, and σ_Z^2 = 4 − 4(0.5)(2)(1) + 4(1) = 4, respectively.
(b) Z = X − 2Y and X are jointly Gaussian, so they are independent if uncorrelated. We have

cov(Z, X) = cov(X − 2Y, X) = cov(X, X) − 2cov(X, Y) = σ_X^2 − 2ρσ_X σ_Y

which equals zero if σ_X = 2ρσ_Y. This holds only in case (b) of Problem 5.18.
(a) Joint density  (b) Contours

Figure 11: Joint Gaussian density with σ_X^2 = 4, σ_Y^2 = 1, ρ = 0.5.
Problem 5.20 (a) We have cov(X, Y) = ρσ_X σ_Y = −3/4, so the covariance matrix is given by

C_X = [ 1  −3/4 ; −3/4  1 ]

where X = (X, Y)^T.
(b) Z = a^T X, where a^T = (2, 3), so that Z ∼ N(a^T m_X = 8, a^T C_X a = 4).
(c) In order to compute P[Z^2 − Z > 6] = P[Z^2 − Z − 6 > 0], we factorize

Z^2 − Z − 6 = Z^2 − 3Z + 2Z − 6 = (Z + 2)(Z − 3)

This expression is positive if both factors are positive (Z > −2 and Z > 3, which is equivalent to Z > 3), or if both factors are negative (Z < −2 and Z < 3, which is equivalent to Z < −2). These two events are mutually exclusive, hence

P[{Z > 3} or {Z < −2}] = P[Z > 3] + P[Z < −2] = Q((3 − 8)/√4) + Φ((−2 − 8)/√4) = Q(−5/2) + Φ(−5) = 1 − Q(5/2) + Q(5) = 0.9938
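The final number can be checked with the complementary error function (a sketch we add here; Q(x) = 0.5 erfc(x/√2)):

Qfun = @(x) 0.5*erfc(x/sqrt(2));          %Q function via erfc
P = Qfun((3-8)/2) + Qfun((8+2)/2)         %P[Z>3] + P[Z<-2] = Q(-5/2) + Q(5) = 0.9938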

Problem 5.21 (a), (b) This was worked out in Example 5.4.3 for v^2 = 1. Using the same reasoning, we obtain that the joint density is

p(r, φ) = (r/(2πv^2)) e^{−r^2/(2v^2)} I_{r≥0} I_{φ∈[0,2π]}

so that R and Φ are independent, with Φ uniform over [0, 2π] and R a Rayleigh random variable with density

p(r) = (r/v^2) e^{−r^2/(2v^2)} I_{r≥0}

(c) Z = R^2 takes values in [0, ∞) with

p(z) = p(r)/|dz/dr| |_{r=√z} = [ (r/v^2) e^{−r^2/(2v^2)} / (2r) ] |_{r=√z} = (1/(2v^2)) e^{−z/(2v^2)} I_{z≥0}

Thus, Z ∼ Exp(1/(2v^2)), i.e., it is exponential with mean 2v^2.
(d) 20 dB below corresponds to a factor of 0.01. Using the well-known expression for the exponential CDF, we have

P[Z ≤ 0.01(2v^2)] = 1 − e^{−0.01(2v^2)/(2v^2)} = 1 − e^{−0.01} ≈ 0.01

using the approximation e^x ≈ 1 + x for |x| small. The answer does not depend on v^2.
Problem 5.22 (a) The mean function is given by

m_X(t) = E[2 sin(20πt + Θ)]
       = (1/4) [2 sin(20πt) + 2 sin(20πt + π/2) + 2 sin(20πt + π) + 2 sin(20πt + 3π/2)]
       = (1/4) (2 sin(20πt) + 2 cos(20πt) − 2 sin(20πt) − 2 cos(20πt))
       = 0

Since 2 sin θ_1 sin θ_2 = cos(θ_1 − θ_2) − cos(θ_1 + θ_2), the autocorrelation function is given by

R_X(t_1, t_2) = E[X(t_1)X(t_2)] = E[2 sin(20πt_1 + Θ) 2 sin(20πt_2 + Θ)]
             = E[2 cos(20π(t_1 − t_2)) − 2 cos(20π(t_1 + t_2) + 2Θ)]
             = 2 cos(20π(t_1 − t_2))

since

E[cos(20π(t_1 + t_2) + 2Θ)]
 = (1/4) (cos(20π(t_1 + t_2)) + cos(20π(t_1 + t_2) + π) + cos(20π(t_1 + t_2) + 2π) + cos(20π(t_1 + t_2) + 3π))
 = (1/4) (cos(20π(t_1 + t_2)) − cos(20π(t_1 + t_2)) + cos(20π(t_1 + t_2)) − cos(20π(t_1 + t_2)))
 = 0
(b) X is WSS, since its mean function is constant and its autocorrelation function depends only on the time difference.
(c) A delayed version of X is given by

X̃(t) = X(t − δ) = 2 sin(20π(t − δ) + Θ) = 2 sin(20πt + Γ)

where

Γ = Θ − 20πδ modulo 2π

Setting 20πδ = π/4 (i.e., δ = 1/80), for example, we obtain that Γ is uniform over {−π/4, π/4, 3π/4, 5π/4}. Thus, while X̃(t) is also a sinusoid with random phase, the phase variable Γ does not have the same distribution as the corresponding phase Θ in X(t), which implies that X is not stationary.
Easy check: X(0) = 2 sin Θ and X̃(0) = 2 sin Γ have different distributions, taking values in {−2, 0, 2} and {−√2, √2}, respectively.
(d) The time averaged mean and autocorrelation function of X can be computed exactly as in
the example of a sinusoid with random phase in the text, and match the ensemble averages in
(a).
(e) Yes, X is ergodic in mean and autocorrelation.
Problem 5.23 The three candidate functions are sketched in Figure 12.
(a) The triangle function is a convolution of two boxes: f1 (τ ) = I[−1/2,1/2] ∗ I[−1/2,1/2] . Its Fourier
transform is F1 (f ) = sinc2 (f ). The latter is symmetric and nonnegative, and hence is a valid
PSD, hence f1 is a valid autocorrelation function.
(b) The shifted triangle f2 (τ ) is not symmetric, and hence is not a valid autocorrelation function.
(c) Taking the Fourier transform of f3, we obtain

F_3(f) = F_1(f) − (1/2) F_1(f) (e^{−j2πf} + e^{j2πf}) = sinc^2(f) (1 − cos 2πf) ≥ 0

Figure 12: Three candidate functions in Problem 5.23.

since cosine is bounded above by one. Thus, F3 (f ) is symmetric and nonnegative, and hence
f3 (τ ) is a valid autocorrelation function.

Problem 5.24 (a) The mean function is given by

E [Xp (t)] = E [Xc (t)] cos 2πfc t − E [Xs (t)] sin 2πfc t

Since cosine and sine are linearly independent, the preceding can be constant if and only if

E [Xc (t)] = E [Xs (t)] ≡ 0

Plugging this in, the autocorrelation function is given by

R_Xp(t_1, t_2) = E[X_p(t_1)X_p(t_2)]
             = R_Xc(t_1, t_2) cos 2πf_c t_1 cos 2πf_c t_2 + R_Xs(t_1, t_2) sin 2πf_c t_1 sin 2πf_c t_2
               − R_Xc,Xs(t_1, t_2) cos 2πf_c t_1 sin 2πf_c t_2 − R_Xs,Xc(t_1, t_2) sin 2πf_c t_1 cos 2πf_c t_2     (3)

Using trigonometric identities we can write these out in terms of t_1 − t_2 and t_1 + t_2. We have

cos 2πf_c t_1 cos 2πf_c t_2 = (1/2) cos 2πf_c(t_1 − t_2) + (1/2) cos 2πf_c(t_1 + t_2)
sin 2πf_c t_1 sin 2πf_c t_2 = (1/2) cos 2πf_c(t_1 − t_2) − (1/2) cos 2πf_c(t_1 + t_2)
sin 2πf_c t_1 cos 2πf_c t_2 = (1/2) sin 2πf_c(t_1 − t_2) + (1/2) sin 2πf_c(t_1 + t_2)     (4)
cos 2πf_c t_1 sin 2πf_c t_2 = −(1/2) sin 2πf_c(t_1 − t_2) + (1/2) sin 2πf_c(t_1 + t_2)

Plugging (4) into (3), we obtain

R_Xp(t_1, t_2) = A(t_1 − t_2) + B(t_1 + t_2)

where

A = (1/2)(R_Xc(t_1, t_2) + R_Xs(t_1, t_2)) cos 2πf_c(t_1 − t_2) − (1/2)(R_Xs,Xc(t_1, t_2) − R_Xc,Xs(t_1, t_2)) sin 2πf_c(t_1 − t_2)     (5)

and

B = (1/2)(R_Xc(t_1, t_2) − R_Xs(t_1, t_2)) cos 2πf_c(t_1 + t_2) − (1/2)(R_Xs,Xc(t_1, t_2) + R_Xc,Xs(t_1, t_2)) sin 2πf_c(t_1 + t_2)     (6)

In order for the autocorrelation function to depend on t_1 − t_2 alone, the undesired t_1 + t_2 terms in (6) must vanish, which requires that the coefficients of the cosine and sine in the preceding equation must vanish:

R_Xc(t_1, t_2) − R_Xs(t_1, t_2) = 0,   R_Xs,Xc(t_1, t_2) + R_Xc,Xs(t_1, t_2) = 0     (7)

Plugging into (5), we obtain that

A = RXc (t1 , t2 ) cos 2πfc (t1 − t2 ) − RXs ,Xc (t1 , t2 ) sin 2πfc (t1 − t2 )

This depends on t1 − t2 alone if RXc (t1 , t2 ) and RXs ,Xc (t1 , t2 ) depend on t1 − t2 alone. Putting
these together with (7), we obtain that Xp is WSS if Xc , Xs are zero mean, jointly WSS with

RXc (τ ) = RXs (τ ) , RXs ,Xc (τ ) + RXs ,Xc (−τ ) = 0 (8)

where we have rewritten the conditions (7) in terms of τ = t1 − t2 using joint wide sense
stationarity.
(b) Under the conditions derived in (a), we obtain that

RXp (τ ) = RXc (τ ) cos 2πfc τ − RXs ,Xc (τ ) sin 2πfc τ (9)

Taking Fourier transforms, we have

S_Xp(f) = (1/2)(S_Xc(f − f_c) + S_Xc(f + f_c)) + (j/2)(S_Xs,Xc(f − f_c) − S_Xs,Xc(f + f_c))     (10)

We see that S_Xp(f) is passband (i.e., the PSD is nonzero in a band away from DC) if X_c and X_s are baseband (i.e., S_Xc(f) = S_Xs(f) and S_Xs,Xc(f) are nonzero in a band around DC with bandwidth smaller than f_c).

Figure 13: Autocorrelation computation in Problem 5.25(a).

Problem 5.25 As we see from Figure 13, the signal x(t) is periodic with period 2, and hence so
is its time-averaged autocorrelation function

Rx (τ ) = x(t)x(t − τ ) (11)

We can see this by replacing τ by τ + 2 in (11). Hence we only need to compute Rx (τ ) over
a single period, say for τ ∈ [−1, 1]. Furthermore, since Rx (τ ) is even, we can focus on τ ∈ [0, 1].
Finally, we only need to average over a period to compute Rx ; we can see this by replacing t by

Figure 14: Autocorrelation function in Problem 5.25(a).

Figure 15: Fourier series via differentiation trick in Problem 5.25(b).

t + 2 in (11). Thus, we can restrict the average over t to a period. Figure 13 shows x(t) and
x(t − τ ), with t ranging over a period, and for an arbitrary τ ∈ [0, 1]. We see that
R_x(τ) = (1/2) ∫_{−1/2}^{3/2} x(t)x(t − τ) dt = (1/2)(−τ + (1 − τ) − τ + (1 − τ)) = 1 − 2τ, 0 ≤ τ ≤ 1

We may now replace τ by its magnitude (R_x is an even function), and invoke periodicity, to specify the autocorrelation function as follows:

R_x(τ) = 1 − 2|τ|, |τ| ≤ 1,   R_x(τ) = R_x(τ + 2)
The autocorrelation function is sketched in Figure 14.
(b) For finding the Fourier series of x(t), we use the differentiation trick to reduce it to an impulse train (see Example 2.4.1). The procedure is shown in Figure 15. Since z(t) = dx/dt is a sum of two interleaved impulse trains, its Fourier series is easily computed as follows:

z(t) = dx/dt ↔ z_k = (1/T_0) (2 e^{−j2πkf_0(−1/2)} − 2 e^{−j2πkf_0(1/2)})

where T_0 = 2 is the period and f_0 = 1/T_0 = 1/2 is the fundamental frequency. Simplifying, we obtain z_k = 2j sin(πk/2), and hence, for k ≠ 0,

x_k = z_k/(j2πkf_0) = 2 sin(πk/2)/(πk) = { 0, k even ; ±2/(πk), k odd }

The PSD is given by

S_x(f) = \sum_k |x_k|^2 δ(f − kf_0) = \sum_{k odd} (4/(π^2 k^2)) δ(f − k/2)

(c) It is left as an exercise to check that the Fourier series of Rx (τ ) derived in (a) is given by
{|xk |2 }, so that the answers in (a) and (b) are consistent.
Problem 5.26: Typo in the problem. We should have set D to be uniform over [0, 2] (i.e., the
period of x(t)) in order to generate a WSS and stationary process X(t). In this case, taking
expectation over D effectively performs averaging over a period. Ensemble averages therefore
give the same answer as time averages of the sort computed in Problem 5.25. Details omitted.
Figure 16: PSDs of x1 and x2 in Problem 5.27(a)-(b).

Problem 5.27 We derive the PSDs below and plot them in Figure 16.
(a) x1(t) = (n ∗ h)(t), where h is the differentiation operator d/dt, with H(f) = j2πf. Thus,

S_x1(f) = |H(f)|^2 S_n(f) = 4π^2 f^2 I_{[−1,1]}(f)


(b) x2(t) = (n ∗ g)(t), where g(t) = (1/d)(δ(t) − δ(t − d)), so that

G(f) = (1 − e^{−j2πfd})/d

|G(f)|^2 = G(f)G^*(f) = (1 − e^{−j2πfd})(1 − e^{j2πfd})/d^2 = (2 − 2 cos 2πfd)/d^2 = 4 sin^2(πfd)/d^2 = 4π^2 f^2 (sin(πfd)/(πfd))^2 = 4π^2 f^2 sinc^2(fd)

Thus,

S_x2(f) = |G(f)|^2 S_n(f) = 4π^2 f^2 sinc^2(fd) I_{[−1,1]}(f)
As d → 0, g(t) tends to the derivative, and Sx2 → Sx1 . For nonzero d, since sinc2 (f d) ≤ 1 for all
f , we have Sx2 (f ) ≤ Sx1 (f ) for all f , with strict inequality everywhere except at f = 0. Thus,
x2 has smaller power than x1 .
(c) The power of x1 is given by

R_x1(0) = ∫ S_x1(f) df = ∫_{−1}^{1} 4π^2 f^2 df = 8π^2/3 ≈ 26.3

The power of x2 (for d = 1/2) is given by

R_x2(0) = ∫ S_x2(f) df = ∫_{−1}^{1} 4π^2 f^2 sinc^2(f/2) df ≈ 16.0

where the integral is evaluated numerically.
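The numerical value can be reproduced with a short numerical integration (a sketch we add here, using d = 1/2; the expression 4 sin^2(πfd)/d^2 is used to avoid relying on a sinc routine):

d = 1/2;
Sx1 = @(f) 4*pi^2*f.^2;
Sx2 = @(f) 4*sin(pi*f*d).^2 / d^2;        %equals 4*pi^2*f.^2*sinc^2(f*d)
P1 = integral(Sx1, -1, 1)                 %8*pi^2/3 = 26.3
P2 = integral(Sx2, -1, 1)                 %approximately 16.0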
Problem 5.28: Taking the Fourier transform of the autocorrelation function, we get

S_X(f) = ∫_0^∞ e^{−aτ} e^{−j2πfτ} dτ + ∫_{−∞}^0 e^{aτ} e^{−j2πfτ} dτ = 1/(a + j2πf) + 1/(a − j2πf) = 2a/(a^2 + 4π^2 f^2)

When X is passed through an ideal LPF of bandwidth W, i.e., with transfer function I_{[−W,W]}(f), the output power is given by

Output Power = ∫_{−W}^{W} S_X(f) df = ∫_{−W}^{W} 2a/(a^2 + 4π^2 f^2) df

Making the standard substitution 2πf = a tan θ (so that df = (a/(2π)) sec^2 θ dθ), we get

Output Power = ∫_{−tan^{−1}(2πW/a)}^{tan^{−1}(2πW/a)} (2a/(a^2 + a^2 tan^2 θ)) (a/(2π)) sec^2 θ dθ = (2/π) tan^{−1}(2πW/a)

since 1 + tan^2 θ = sec^2 θ.


(b) Letting W → ∞ gives the power of X as one, since tan^{−1}(2πW/a) → π/2.
The 99% power containment bandwidth satisfies

(2/π) tan^{−1}(2πW/a) = 0.99

so that

W_{0.99} = (a/(2π)) tan(0.99π/2) ≈ 10.13a

which scales linearly with a. This is to be expected: since R_X(τ) = e^{−a|τ|}, and the exponent is dimensionless, we can think of 1/a as the unit of time, and hence a as the unit of frequency.
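A quick numerical confirmation (a sketch we add here, using a = 1):

a = 1;
W99 = a*tan(0.99*pi/2)/(2*pi)                              %approximately 10.13
frac = (2/pi)*atan(2*pi*W99/a)                             %should be 0.99
total = integral(@(f) 2*a./(a^2+4*pi^2*f.^2), -Inf, Inf)   %total power = 1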

Figure 17: PSD of m in Problem 5.29. The area under it gives the signal power at the channel input.

Problem 5.29 (a) Signal power at channel input is the area under the PSD shown in Figure 17:
P_m = ∫_{−∞}^{∞} S_m(f) df = (1/2) × base × height = (1/2) × 4 × 2 = 4

(b) The PSD of the signal at the channel output is given by S_m1(f) = S_m(f)|H(f)|^2. The signal power is therefore given by

P_m1 = ∫_{−∞}^{∞} S_m(f)|H(f)|^2 df = 2 ∫_0^2 S_m(f)|H(f)|^2 df
     = 2 ∫_0^1 2(1 − f/2) 2^2 df + 2 ∫_1^2 2(1 − f/2) 1^2 df = 16(f − f^2/4) |_0^1 + 4(f − f^2/4) |_1^2
     = 16 × (1 − 1^2/4) + 4 × ((2 − 2^2/4) − (1 − 1^2/4)) = 13

(c) Noise power at equalizer input is given by

N_in = ∫_{−∞}^{∞} S_n(f) df = ∫_{−3}^{3} (1/4) df = 3/2

From (b), we know that the signal power S_in = 13, so that the SNR at equalizer input is given by

SNR_in = S_in/N_in = 13/(3/2) = 26/3 = 8.67

(d) The message sees the cascade of channel and equalizer, which has transfer function 2 I_{[−2,2]}(f), hence the signal power at the equalizer output is given by S_out = 2^2 P_m = 16. The noise sees only the equalizer's transfer function, say G(f), so the noise power at the equalizer output is given by

N_out = ∫_{−∞}^{∞} S_n(f)|G(f)|^2 df = 2 ∫_0^1 (1/4) 1^2 df + 2 ∫_1^2 (1/4) 2^2 df = 5/2

The SNR at the equalizer output is therefore given by

SNRout = Sout /Nout = 16/(5/2) = 32/5 = 6.4

Note that the SNR at the equalizer output is actually smaller than that at the equalizer input.
Thus, in undoing the distortion of the channel, we have enhanced the noise. Such noise enhance-
ment is also seen in digital communication over dispersive channels, as we see in Chapter 8.

Problem 5.30: We have S_X(f) = (1 − |f|) I_{[−1,1]}(f) ↔ R_X(τ) = sinc^2(τ).
(a) E[X(100)X(100.5)] = R_X(0.5) = sinc^2(0.5) = 0.4053.
(b) We have H(f) = I_{[−1/2,1/2]}(f) ↔ h(t) = sinc(t), so

Output Power = ∫ |H(f)|^2 S_X(f) df = ∫_{−1/2}^{1/2} (1 − |f|) df = 2 ∫_0^{1/2} (1 − f) df = −(1 − f)^2 |_0^{1/2} = 3/4

(A more pleasant approach is to draw a picture of the PSD and find the appropriate area under the curve; try it.)

Problem 5.31 (a) WGN has infinite power, so the SNR at the filter input is zero.
(b) For B ≤ 1, the signal power is given by

S = 2 ∫_0^B (1 − f) df = 2(f − f^2/2) |_0^B = 2B − B^2

The noise power is given by N = 0.002B, so that

SNR = S/N = (2B − B^2)/(0.002B) = 500(2 − B)

This is decreasing in B, hence the SNR for B = 1/2 is better than for B = 1, even though the former distorts the signal.
Problem 5.32 (a) The autocorrelation function is given by

R_y(τ) = (N_0/2) (h ∗ h_MF)(τ) = (N_0/2) ∫_{−∞}^{∞} h(t)h(t − τ) dt

For τ ≥ 0, we obtain

R_y(τ) = (N_0/2) ∫_τ^∞ e^{−t/T_0} e^{−(t−τ)/T_0} dt = (N_0/2) e^{τ/T_0} ∫_τ^∞ e^{−2t/T_0} dt = (N_0 T_0/4) e^{−τ/T_0}, τ ≥ 0

Since the autocorrelation function is symmetric, we can replace τ by its magnitude to get

R_y(τ) = (N_0 T_0/4) e^{−|τ|/T_0}
The power equals R_y(0) = N_0 T_0/4. The PSD can be obtained by directly taking the Fourier transform of R_y, but we choose to use the formula

S_y(f) = (N_0/2) |H(f)|^2

where

H(f) = ∫_{−∞}^{∞} h(t) e^{−j2πft} dt = ∫_0^∞ e^{−t/T_0} e^{−j2πft} dt = 1/(j2πf + 1/T_0) = T_0/(j2πf T_0 + 1)

The PSD is therefore given by

S_y(f) = (N_0/2) |H(f)|^2 = N_0 T_0^2 / (2(4π^2 f^2 T_0^2 + 1))

(b) Since n, and therefore y, is a Gaussian random process, y(0) and y(t_0) − (1/2)y(0) are jointly Gaussian, and are independent if they are uncorrelated. The covariance is given by

cov(y(t_0) − (1/2)y(0), y(0)) = R_y(t_0) − (1/2)R_y(0) = (N_0 T_0/4) (e^{−|t_0|/T_0} − 1/2)

This equals zero for

|t_0| = T_0 ln 2 ≈ 0.69 T_0

(a) Baseband filter  (b) Passband filter

Figure 18: Filter frequency responses in Problem 5.33.

Problem 5.33 The noise power in each case is given by

P_n = (N_0/2) ||h||^2 = (N_0/2) ∫_{−∞}^{∞} |H(f)|^2 df = N_0 ∫_0^∞ |H(f)|^2 df
(a) The filter transfer function is as shown in Figure 18(a), so that the noise power at the output is given by

P_n = N_0 ∫_0^1 (1 − f)^2 df = N_0/3
(b) The filter transfer function is as shown in Figure 18(b), and the noise power at the output is given by

P_n = N_0 ∫_{49}^{51} |H(f)|^2 df = N_0 ∫_{49}^{51} ((1/2)(1 − |f − 50|))^2 df = 2N_0 (1/2)^2 ∫_0^1 (1 − f)^2 df = N_0/6

after doing a change of variables to evaluate the integral. We could also have recognized that ||h||^2 in (b) is half that in (a): the filter in (b) has a passband impulse response whose I component is the impulse response in (a) (with Q component zero), so that the energy in the impulse response in (b) is half that in (a).

Figure 19: Autocorrelation function of y in Problem 5.34.

Problem 5.34 (a) The filter is specified as h(t) = I_{[−1,1]}(t) ↔ H(f) = 2 sinc(2f). We have

S_y(f) = S_n(f)|H(f)|^2 = σ^2 · 4 sinc^2(2f) = 4 sinc^2(2f)

PSD sketch omitted.


(b) The samples are jointly Gaussian with mean zero, with

cov(y(t_1), y(t_2)) = C_y(t_1 − t_2) = R_y(t_1 − t_2) = R_y(|t_1 − t_2|)

where R_y(τ) = σ^2 (h ∗ h_MF)(τ) is sketched in Figure 19. Thus, Y = (y(1), y(2), y(3))^T ∼ N(0, C), where

C = [ R_y(0)  R_y(1)  R_y(2) ; R_y(1)  R_y(0)  R_y(1) ; R_y(2)  R_y(1)  R_y(0) ] = [ 2  1  0 ; 1  2  1 ; 0  1  2 ]
(c) We can write this as Z = aT Y ∼ N(0, σZ2 ), where a = (1, −2, 1)T and σZ2 = aT Ca = 4.
Alternatively, we can compute the variance as σZ2 = cov(y(1) − 2y(2) + y(3), y(1) − 2y(2) + y(3))
and use the bilinearity of covariance.

Problem 5.35: The samples are given by

Y_1 = ∫_0^2 y(t) dt = ∫_0^2 s(t) dt + ∫_0^2 n(t) dt = 2 + N_1

Y_2 = ∫_1^3 y(t) dt = ∫_1^3 s(t) dt + ∫_1^3 n(t) dt = 2 + N_2
where N_1 = ∫_0^2 n(t) dt and N_2 = ∫_1^3 n(t) dt are jointly Gaussian and zero mean, by the Gaussianity of n. Thus, Y_1 and Y_2 are jointly Gaussian, with means E[Y_1] = E[Y_2] = 2, and covariances

cov(Y_1, Y_1) = cov(N_1, N_1) = E[ ∫_0^2 n(t) dt ∫_0^2 n(u) du ] = ∫_0^2 ∫_0^2 E[n(t)n(u)] dt du

The noise autocorrelation function inside the integral is given by

E[n(t)n(u)] = σ^2 δ(t − u)

Substituting, the two integrals collapse into one, and we obtain

var(Y_1) = cov(Y_1, Y_1) = σ^2 ∫_0^2 dt = 2σ^2 = 1/2
A similar computation gives var(Y_2) = 2σ^2 = 1/2. The cross-term is given by

cov(Y_1, Y_2) = cov(N_1, N_2) = E[ ∫_0^2 n(t) dt ∫_1^3 n(u) du ] = ∫_0^2 ∫_1^3 E[n(t)n(u)] dt du

Again, plugging in the delta function, we have

cov(Y_1, Y_2) = σ^2 ∫_1^2 dt = σ^2 = 1/4

since we can only set t = u over the common integration interval [1, 2].
Note: In Chapter 6, we specify formulas for covariances obtained by correlating WGN against multiple signals. What we did here was a special case where we are correlating against rectangular pulses.
(b) The random variable Z = Y_1 + Y_2 is Gaussian with mean E[Z] = E[Y_1] + E[Y_2] = 4 and variance

var(Z) = cov(Y_1 + Y_2, Y_1 + Y_2) = var(Y_1) + var(Y_2) + 2cov(Y_1, Y_2) = 3/2

We can now compute the desired probability:

P[Y_1 + Y_2 < 2] = Φ((2 − 4)/√(3/2)) = Φ(−2√(2/3)) = Q(2√(2/3))

Problem 5.36: We have h(t) = p(t) − p(t − 1), where p(t) = I_{[0,1]}(t), and

R_z(τ) = σ^2 (h ∗ h_MF)(τ)

Note that h_MF(t) = h(−t) = p(−t) − p(−t − 1) = p_MF(t) − p_MF(t + 1), so that we can break up the desired convolution as follows:

(h ∗ h_MF)(τ) = (p(τ) − p(τ − 1)) ∗ (p_MF(τ) − p_MF(τ + 1))
             = p(τ) ∗ p_MF(τ) − p(τ) ∗ p_MF(τ + 1) − p(τ − 1) ∗ p_MF(τ) + p(τ − 1) ∗ p_MF(τ + 1)
             = a(τ) − a(τ + 1) − a(τ − 1) + a(τ) = 2a(τ) − a(τ + 1) − a(τ − 1)

where a(τ) = (p ∗ p_MF)(τ) is a tent function centered at the origin. This computation, and the resulting autocorrelation function, are depicted in Figure 20.

Figure 20: Computation of autocorrelation function in Problem 5.36.

(b) z(49) and z(50) are linear transformations of a zero mean Gaussian process, and are therefore zero mean Gaussian random variables with covariance matrix given by

C = [ R_z(|49 − 49|)  R_z(|49 − 50|) ; R_z(|50 − 49|)  R_z(|50 − 50|) ] = [ 2  −1 ; −1  2 ]

(c) As in (b), z(49) and z(52) are zero mean, jointly Gaussian, each with variance R_z(0) = 2, but their covariance is R_z(|49 − 52|) = R_z(3) = 0, hence they are independent. Thus, they are i.i.d. N(0, 2).
(d) The required probability can be written as P[Y > 0], where Y = 2z(50) − z(49) + z(51) is zero mean Gaussian. Hence P[Y > 0] = 1/2 by the symmetry of the Gaussian density around its mean.
(e) The required probability can be written as P[Y > 2] = Q((2 − 0)/σ_Y). We now compute the variance of Y using bilinearity of covariance:

var(Y) = cov(2z(50) − z(49) + z(51), 2z(50) − z(49) + z(51)) = 4R_z(0) + R_z(0) + R_z(0) − 4R_z(1) + 4R_z(1) − 2R_z(2) = 6R_z(0) − 2R_z(2) = 12

Thus, P[Y > 2] = Q(2/√12) = Q(1/√3).

Problem 5.37 (a) The autocorrelation function equals Rz (τ ) = σ 2 (h ∗ hM F )(τ ). To compute


the convolution, it helps to express it as a sum of simpler pulses whose convolutions are easy
to compute. In particular, we can write h(t) = 2I[0,2] (t) − I[1,2] (t) = 2p(t) + p(t − 1), where
p(t) = I[0,1] (t). The matched filter can therefore be written as hM F (t) = h(−t) = 2p(−t) +
p(−t − 1) = 2pM F (t) + pM F (t + 1). The desired convolution can therefore be written as
(h ∗ hM F )(τ ) = (2p(τ ) + p(τ − 1)) ∗ (2pM F (τ ) + pM F (τ + 1))
= 4p(τ ) ∗ pM F (τ ) + 2p(τ ) ∗ pM F (τ + 1) + 2p(τ − 1) ∗ pM F (τ ) + p(τ − 1) ∗ pM F (τ + 1)
= 4a(τ ) + 2a(τ + 1) + 2a(τ − 1) + a(τ ) = 5a(τ ) + 2a(τ + 1) + 2a(τ − 1)
where a(τ ) = (p ∗ pM F )(τ ) is a tent function centered at the origin. The convolution computation
is depicted in Figure 21 (the result needs to be scaled by σ 2 to obtain the autocorrelation

Figure 21: Computation of autocorrelation function in Problem 5.37.

function).
(b) The random vector Z = (z(0), z(1), z(2))^T is a linear transformation of a zero mean Gaussian random process, and is therefore a zero mean Gaussian random vector with covariance matrix given by

C = [ R_z(|0 − 0|)  R_z(|0 − 1|)  R_z(|0 − 2|) ; R_z(|1 − 0|)  R_z(|1 − 1|)  R_z(|1 − 2|) ; R_z(|2 − 0|)  R_z(|2 − 1|)  R_z(|2 − 2|) ] = σ^2 [ 5  2  0 ; 2  5  2 ; 0  2  5 ]
(c) We wish to compute P[Y > 4], where Y = z(0) − z(1) + z(2) is zero mean Gaussian with variance computed as

var(Y) = cov(z(0) − z(1) + z(2), z(0) − z(1) + z(2)) = R_z(|0 − 0|) + R_z(|1 − 1|) + R_z(|2 − 2|) − 2R_z(|0 − 1|) − 2R_z(|1 − 2|) + 2R_z(|0 − 2|)
       = 3R_z(0) − 4R_z(1) + 2R_z(2) = 3 × 5 − 4 × 2 + 2 × 0 = 7

We therefore obtain P[Y > 4] = Q((4 − 0)/√7) = Q(4/√7).
7

Figure 22: Convolution for time autocorrelation function in Problem 5.38.

Problem 5.38 (a) The random process z(t) is zero mean Gaussian, being a linear transformation of the zero mean Gaussian process n. We can therefore compute the covariance of samples of z(t) as follows:

cov(z[n], z[m]) = E[z(nT_s)z^*(mT_s)] = R_z((n − m)T_s) = (N_0/2)(h ∗ h_MF)((n − m)T_s)
               = (N_0/2) ∫ h(t)h_MF((n − m)T_s − t) dt = (N_0/2) ∫ h(t)h^*(t − (n − m)T_s) dt
since hM F (t) = h∗ (−t).
(b) For h(t) = I_{[0,1]}(t), the convolution h ∗ h_MF is the tent function a(τ) shown in Figure 22. The samples Z = (z[1], z[2], z[3]) have covariances given by cov(z[m], z[n]) = (N_0/2) a((m − n)T_s) = (N_0/2) a(|m − n|T_s). For T_s = 1/2, we get the covariance matrix

C = (N_0/2) [ a(0)  a(1/2)  a(1) ; a(1/2)  a(0)  a(1/2) ; a(1)  a(1/2)  a(0) ] = (N_0/2) [ 1  1/2  0 ; 1/2  1  1/2 ; 0  1/2  1 ]

(c) For T_s = 1, cov(z[m], z[n]) = (N_0/2) a(|m − n|) = 0 for m ≠ n, so that the samples are independent (since they are jointly Gaussian and uncorrelated). As before, var(z[m]) = (N_0/2) a(0) = N_0/2, so that {z[m]} are i.i.d. N(0, N_0/2).
(d) The samples are independent if cov(z[n], z[m]) = 0 for all m ≠ n, which happens if (h ∗ h_MF)((n − m)T_s) = ∫ h(t)h^*(t − (n − m)T_s) dt = 0 for m ≠ n. Thus, q(t) = (h ∗ h_MF)(t) ↔ Q(f) = H(f)H_MF(f) = H(f)H^*(f) = |H(f)|^2 is Nyquist at rate 1/T_s, so that h(t) ↔ H(f) is square root Nyquist at rate 1/T_s.

Figure 23: The signal and its matched filter for Problem 5.39.

Figure 24: The convolution of the signal with its matched filter for Problem 5.39(b).

Problem 5.39 (a) The signal s(t) and its matched filter sM F (t) = s(−t) are sketched in Figure
23.
(b) In order to compute s∗sM F , we break them into smaller pieces. Specifically, we see from Figure
23 that we can write s(t) = p(t) − p(t − 1) − 2p(t − 2), where p(t) = I[0,1] (t). The matched filter
impulse response can therefore be written as sM F (t) = s(−t) = p(−t) − p(−t − 1) − 2p(−t − 2) =
pM F (t) − pM F (t + 1) − 2pM F (t + 2), where pM F (t) = p(−t). The convolution can therefore be

written as
(s ∗ sM F )(t) = (p(t) − p(t − 1) − 2p(t − 2)) ∗ (pM F (t) − pM F (t + 1) − 2pM F (t + 2))
= p(t) ∗ pM F (t) + p(t − 1) ∗ pM F (t + 1) + 4p(t − 2) ∗ pM F (t + 2)
− p(t) ∗ pM F (t + 1) − 2p(t) ∗ pM F (t + 2) − p(t − 1) ∗ pM F (t) + 2p(t − 1) ∗ pM F (t + 2)
− 2p(t − 2) ∗ pM F (t) + 2p(t − 2) ∗ pM F (t + 1)
= 6a(t) − a(t + 1) − 2a(t + 2) − a(t − 1) + 2a(t + 1) − 2a(t − 2) + 2a(t − 1)
= 6a(t) + a(t + 1) + a(t − 1) − 2a(t + 2) − 2a(t − 2)
where a(t) = (p ∗ pM F )(t) is a tent function centered at the origin. The computation and the
final waveform is shown in Figure 24.
(c) Using the decomposition in (b), we have pM F (t) = I[−1,0] (t) = h(t + 1), so that sM F (t) =
pM F (t) − pM F (t + 1) − 2pM F (t + 2) = h(t + 1) − h(t + 2) − 2h(t + 3). Thus, (x ∗ sM F )(t) =
y(t + 1) − y(t + 2) − 2y(t + 3), where y(t) = (x ∗ h)(t).

Figure 25: Effective correlators corresponding to samples and linear combinations of samples at
filter output.

Problem 5.40(a) We have

y(2) = ∫ x(t)h(2 − t) dt = ⟨x, g_1⟩

where g_1(t) = h(2 − t) = h(−(t − 2)) = h_MF(t − 2).

(b) We have

y(1) − 2y(2) = ∫ x(t)h(1 − t) dt − 2 ∫ x(t)h(2 − t) dt = ∫ x(t)(h(1 − t) − 2h(2 − t)) dt = ⟨x, g_2⟩

where g_2(t) = h(1 − t) − 2h(2 − t) = h_MF(t − 1) − 2h_MF(t − 2).

The relevant signals are sketched in Figure 25.

Problem 5.41 (a) A sample at a given time is given by

y(t_i) = ∫ x(t)h(t_i − t) dt

so that a linear combination can be written as

\sum_{i=1}^{n} α_i y(t_i) = \sum_{i=1}^{n} α_i ∫ x(t)h(t_i − t) dt = ∫ x(t) ( \sum_{i=1}^{n} α_i h(t_i − t) ) dt = ⟨x, g⟩

where

g(t) = \sum_{i=1}^{n} α_i h(t_i − t) = \sum_{i=1}^{n} α_i h_MF(t − t_i)

(b) The triangular waveform can be approximated by three boxes, as shown in Figure 26, with sampling times {t_i} given by 0, 1/2, 1 and α_i ≡ 1.

Figure 26: Approximating correlation with g(t) using three samples at the output of h(t) = I_{[0,1]}(t).

Figure 27: Sampling at t_0 = 1/2 gives the best match between g(t) = h(t_0 − t) and s(t).

Figure 28: Choosing three samples and combining them to produce an effective correlator g which approximates the triangular shape of the signal s.

Problem 5.42 We have a signal corrupted by white noise with PSD N_0/2 = σ^2 = 0.1.
(a) The integrator output can be written as the output of a correlator, ⟨y, g⟩ = ⟨s, g⟩ + ⟨n, g⟩, where g(t) = I_{[−1,1]}(t). In general, the signal contribution at the output of a correlator g is ⟨s, g⟩, and the noise contribution N = ⟨n, g⟩ ∼ N(0, σ^2 ||g||^2). The SNR at the output of a correlator g is therefore given by

SNR = |⟨s, g⟩|^2 / E[N^2] = |⟨s, g⟩|^2 / (σ^2 ||g||^2) = (1/σ^2) |⟨s, g/||g||⟩|^2     (12)

For g(t) = I_{[−1,1]}(t), we have ⟨s, g⟩ = 1 (area under triangle of base 2 and height 1) and ||g||^2 = 2. Plugging into (12) and setting σ^2 = 0.1, we obtain SNR = 5.
(b) As stated in Theorem 5.7.1, the SNR is maximized by correlating against the signal s(t). Setting g(t) = s(t) in (12), we have ⟨s, g⟩ = ||s||^2 and N = ⟨n, s⟩ ∼ N(0, σ^2 ||s||^2), which gives

SNR_max = ||s||^2 / σ^2 = 20/3 ≈ 6.67

since ||s||^2 = 2/3 and σ^2 = 0.1.
(c) The filter output at time t_0 is given by

(y ∗ h)(t_0) = ∫ y(t)h(t_0 − t) dt = ⟨y, g⟩

where g(t) = h(t_0 − t). From (12) (which repeats the discussion before Theorem 5.7.1), we see that we should choose the sampling time such that g(t) "best matches" s(t) in its shape, in the sense of maximizing |⟨s, g/||g||⟩|^2, subject to whatever constraints we are placing on the choice of g. If unconstrained, g = s (or any scalar multiple thereof) is optimal, but here we are constraining g to take the form g(t) = h(t_0 − t). From Figure 27, we see that this happens when t_0 = 1/2. We have ||g||^2 = 1 and

⟨s, g⟩ = ∫_{−1/2}^{1/2} (1 − |t|) dt = 3/4

Plugging into (12), we obtain SNR = 5.625.
(d) If we can now take linear combinations of samples at the output of the filter, we obtain

\sum_i a_i (y ∗ h)(t_i) = \sum_i a_i ∫ y(t)h(t_i − t) dt = ⟨y, g⟩

where

g(t) = \sum_i a_i h(t_i − t)

We now have to choose the sampling times and combining coefficients so that the shape of g matches up well with that of s. For example, t_0 = 0, t_1 = 1/2 and t_2 = 1 with a_0 = a_1 = a_2 = 1 works well, as shown in Figure 28. For the scaling shown in the figure, we have ||g||^2 = 5 and ⟨s, g⟩ = 7/4. Plugging into (12), we get SNR = 6.125, which is slightly better than in (c). We could certainly play with the coefficients to try to get a better matching of shape, but we leave it at this.
Take-away: Even when implementation constraints prevent us from using the optimal correlator g = s, we can construct approximations to s within these constraints by trying to match the shape of s as closely as we can.
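The four SNRs can also be checked numerically on a fine time grid (a sketch we add here; s(t) = (1 − |t|) I_{[−1,1]}(t) and h(t) = I_{[0,1]}(t) as in the figures):

sigma2 = 0.1; dt = 1e-4; t = -2:dt:2;
s = max(1 - abs(t), 0);                        %triangular signal
snr = @(g) (sum(s.*g)*dt)^2 / (sigma2 * sum(g.^2)*dt);   %SNR = |<s,g>|^2/(sigma2*||g||^2)
g_a = double(abs(t) <= 1);                     %(a) integrate over [-1,1]: SNR = 5
g_b = s;                                       %(b) matched correlator: SNR = 20/3
g_c = double(t >= -0.5 & t <= 0.5);            %(c) h(0.5 - t): SNR = 5.625
g_d = double(t >= -1 & t <= 0) + double(t >= -0.5 & t <= 0.5) + double(t >= 0 & t <= 1);  %(d): SNR = 6.125
[snr(g_a), snr(g_b), snr(g_c), snr(g_d)]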
