
Communication and Detection Theory: Lecture 1

Amos Lapidoth
ETH Zurich

February 21, 2017

Teaching Assistant: Mr. Tibor Keresztfalvi.

Why Should I Take this Class?

• Digital communication is everywhere.


• The wireless industry is huge.
• Math and Engineering in beautiful harmony.
• You have taken and enjoyed:
Signals and Systems, Linear Algebra, and Probability Theory.

This Class Is Not for You if:

• You are trying to avoid math.


• You are looking for easy credit.
• You prefer to skip the theory and rush to the practice.

If You Are Staying:

The Plan for Today

• Course information
• A point-to-point digital communication system.
• Functions, signals, and time-reversal.
• The inner product, orthogonality, and energy.
• The Fourier Transform (review):
• On 2π, f, ω, i, and j.
• Conjugate Symmetry
• Parseval’s Theorem
• The definition of bandwidth.

The Textbook

The textbook: A Foundation in Digital Communication, Second Edition, by Amos Lapidoth.
• Greater mathematical precision than the lecture.
• The lecture’s level suffices for the exam.
• The exercises, however, are at the course’s level.
• Open-book exam.
• No electronic devices allowed in the exam.

If you have seen some of the material before, now is the chance to
cover it in depth. I love questions!

Warning!

• Some classes cover material you cannot understand:


• You may not have the prerequisites.
• There may not be enough lecture hours.
• Or maybe the whole field is on shaky ground.
• Homework problems then indicate what you need to know.
• The good news: Here you can understand everything.
• The bad news: Here you must understand everything.
• The problems check your understanding, but the exam will be
different!

[Block diagram: a point-to-point digital communication system. Transmit side: source (waveform, file, etc.) → source encoder (bits) → encryption (encrypted bits) → channel encoder (waveform) → channel. Receive side: channel → channel decoder (encrypted bits) → decryption (bits) → source decoder (waveform) → reconstruction → sink.]
Functions
• A function or a mapping

u: A → B

associates with each element in its domain A a unique


element in its co-domain B.
• The rule specifying, for each element of the domain, the element of the range to which it is mapped is often written to the right or underneath, e.g.,

  u : R → (−5, ∞), t ↦ t².

• u(t) is the result of applying u to t, e.g., u(17).
• The function t ↦ x(t) cos(2πfc t) has no name, and its domain and co-domain are unspecified.

Signals

• If the domain of a function u is R and its co-domain is R,


then we sometimes say that u is a real-valued signal or a real
signal, especially if the argument of u stands for time.
• Similarly, we sometimes refer to a function u : R → C as a
complex-valued signal or a complex signal.
• If I say that u is “a signal,” then whether it is real or complex
should be:
• clear from the context, or
• immaterial, or
• you should ask!

Caution
• While u and u(·) denote functions, u(t) denotes the result of applying u to t. If u is a real-valued signal, then u(t) is a real number!
• If x and y are signals, then

  x ⋆ y

denotes their convolution.
• The value of their convolution at time 17 is

  (x ⋆ y)(17).

• It is also perfectly fine to write

  (t ↦ x(t) cos(2πfc t)) ⋆ h.

• But I really don’t like

  x(t) cos(2πfc t) ⋆ h(t).
Shifting a Signal in Time
[Figure: a pulse x(t) with support near t = 1, and its time shift x(t − 2) with support near t = 3.]
Reflecting a Signal: x⃖ : t ↦ x(−t)

[Figure: a pulse x(t) with support near t = 1, and its reflection x⃖(t) with support near t = −1.]
The Energy in a Real Signal: Ch. 3

• We define the energy in a real signal u : R → R as

  ∫_{−∞}^{∞} u²(t) dt.

• If this is finite, then we say that u is a finite-energy signal.
• The square root of the energy is denoted ‖u‖₂:

  ‖u‖₂ ≜ √( ∫_{−∞}^{∞} u²(t) dt ).

This use of the word “energy” is justified when u corresponds to the current through a unit load or the voltage across such a load.
The Energy in a Complex Signal

• We define the energy in a complex signal u : R → C as

  ∫_{−∞}^{∞} |u(t)|² dt.

• If this is finite, then we say that u is a finite-energy signal.
• The square root of the energy is denoted ‖u‖₂:

  ‖u‖₂ ≜ √( ∫_{−∞}^{∞} |u(t)|² dt ).

The justification lies in the baseband representation of passband signals.
The Inner Product
We define the inner product ⟨u, v⟩ between the signals u and v as

  ⟨u, v⟩ ≜ ∫_{−∞}^{∞} u(t) v*(t) dt,

whenever the integral exists:

  ⟨u, v⟩ = ∫_{−∞}^{∞} Re(u(t) v*(t)) dt + i ∫_{−∞}^{∞} Im(u(t) v*(t)) dt.

Mathematicians might object. . . .

If ⟨u, v⟩ is zero, then we say that u and v are orthogonal.

  ‖u‖₂² = ⟨u, u⟩.
The Fourier Transform: Ch. 6
The Fourier Transform of an integrable signal x : R → C is the mapping x̂ : R → C defined by

  x̂ : f ↦ ∫_{−∞}^{∞} x(t) e^{−i2πft} dt.

In contrast to the engineering notation X(jω):
• we use i for √−1;
• we use x̂ instead of X(·), so
• we can write things like x̂̂ = x⃖,
• and reserve upper-case letters for random things.
• We use f instead of ω.
• But most important is the 2π.
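A minimal numerical sketch of this convention (assuming Python with numpy; the Gaussian test signal is just an illustration): with the 2π inside the exponent, t ↦ e^{−πt²} is its own transform, so no normalization constants appear.

    import numpy as np

    t = np.linspace(-10, 10, 4001)          # time grid
    dt = t[1] - t[0]
    x = np.exp(-np.pi * t**2)               # x(t) = e^{-pi t^2}

    def ft(x, t, f):
        """Riemann-sum approximation of x_hat(f) = integral x(t) e^{-i 2 pi f t} dt."""
        return np.array([np.sum(x * np.exp(-2j*np.pi*fk*t)) * dt for fk in f])

    f = np.linspace(-3, 3, 61)
    x_hat = ft(x, t, f)
    print(np.max(np.abs(x_hat - np.exp(-np.pi * f**2))))   # ~1e-15: self-dual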
The Inverse Fourier Transform

The Inverse Fourier Transform (IFT) of an integrable function g : R → C is denoted ǧ and is defined as

  ǧ : t ↦ ∫_{−∞}^{∞} g(f) e^{i2πft} df.

• Very similar to the FT—instead of e^{−i2πft} use e^{i2πft}.
• For symmetric functions, FT and IFT are identical!
• Hence, fewer pairs to memorize.
Parseval’s Theorem

The Fourier Transform preserves inner products and energies:

  ⟨u, v⟩ = ⟨û, v̂⟩    and    ‖u‖₂ = ‖û‖₂.
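A numerical sanity check of Parseval (a sketch assuming numpy; the modulated-Gaussian test signal is illustrative): the time-domain energy matches the energy of the Riemann-sum FT.

    import numpy as np

    t = np.linspace(-10, 10, 4001); dt = t[1] - t[0]
    u = np.exp(-np.pi * t**2) * np.cos(6*np.pi*t)      # some finite-energy signal

    f = np.linspace(-10, 10, 4001); df = f[1] - f[0]
    u_hat = np.array([np.sum(u*np.exp(-2j*np.pi*fk*t))*dt for fk in f])

    # energies agree to numerical precision
    print(np.sum(np.abs(u)**2)*dt, np.sum(np.abs(u_hat)**2)*df)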

The Fourier Transform of Real Signals (1)

If x is a real signal, then its FT is conjugate symmetric:

  ( x̂(−f) = x̂*(f), f ∈ R )  ⟺  x is real.

Thus, if x is real, then
• the magnitude of x̂ is symmetric, and
• the phase of x̂ is antisymmetric.
We indicate this by dashing the plot of x̂ at the negative
frequencies.

The Fourier Transform of Real Signals (2)

[Figure: The FT x̂ of a real signal x, dashed at the negative frequencies.]

The Fourier Transform of Real Signals (3)

[Figure: The FT ŷ of a real signal y, dashed at the negative frequencies.]
The Fourier Transform of Real Signals (4)
Since

  x̂ : f ↦ ∫_{−∞}^{∞} x(t) e^{−i2πft} dt,

  x̂(−f) = ∫_{−∞}^{∞} x(t) e^{−i2π(−f)t} dt
         = ∫_{−∞}^{∞} x(t) e^{i2πft} dt
         = ∫_{−∞}^{∞} ( (x(t) e^{i2πft})* )* dt
         = ( ∫_{−∞}^{∞} (x(t) e^{i2πft})* dt )*
         = ( ∫_{−∞}^{∞} x*(t) e^{−i2πft} dt )*
         = ( ∫_{−∞}^{∞} x(t) e^{−i2πft} dt )*    (x is real)
         = x̂*(f).
An Important Fourier Pair

[Figure: an important Fourier pair. The time-domain pulse (height 1) has its first zero at 1/α; the transform has cutoff γ, with δ = 2γβ.]
Bandwidth
We say that x is bandlimited to W Hz if

  x̂(f) = 0,  |f| > W.

The bandwidth of x is the smallest W to which it is bandlimited.

[Figure: a spectrum x̂(f) supported on −W ≤ f ≤ W.]
Some Notes on Bandwidth
• Find the shortest symmetric interval containing all the frequencies where x̂ is not zero.
• Half the length of this interval is the bandwidth.
• We seek a symmetric interval, but we only measure its part where the frequencies are positive.
• Not to be confused with bandwidth around the carrier frequency fc.

[Figure: a spectrum supported on [−W, W]; the bandwidth W is measured on the positive-frequency side.]
Another Example of Bandwidth

[Figure: another spectrum supported within [−W, W], again of bandwidth W.]
Not to Be Confused with Bandwidth around a Carrier Frequency

[Figure: the FT ŷ(f) of a passband signal.]
More Precise Definition of Bandwidth

We say that the signal x is an integrable signal that is bandlimited to W Hz if x is integrable and if it is unaltered when it is lowpass filtered by an ideal unit-gain lowpass filter of cutoff frequency W:

  x(t) = (x ⋆ LPF_W)(t),  t ∈ R.
The Ideal Unit-Gain Lowpass Filter
Here

  LPF_{Wc}(t) ≜ { 2Wc · sin(2πWc t)/(2πWc t)  if t ≠ 0,
                  2Wc                          if t = 0,      t ∈ R.

[Figure: the FT L̂PF_{Wc}(f): unit height on −Wc ≤ f ≤ Wc, zero elsewhere.]
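A minimal sketch of this filter in code (assuming numpy; note numpy’s sinc(x) = sin(πx)/(πx), so LPF_{Wc}(t) = 2Wc sinc(2Wc t), with the t = 0 case handled automatically):

    import numpy as np

    def lpf(t, Wc):
        """Ideal unit-gain lowpass filter impulse response of cutoff Wc."""
        return 2*Wc*np.sinc(2*Wc*t)        # equals 2*Wc at t = 0

    # check: the integral of LPF over t approximates its unit DC gain
    t = np.arange(-200, 200, 0.01)
    print(np.trapz(lpf(t, 2.0), t))        # ~1.0 (slowly converging tails)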

The Two Are Not the Same

• Changing a signal at one point can make it discontinuous, but it does not affect its Fourier Transform.
• The output of a LPF is continuous.
Set of Measure Zero (1): (Section 2.5)

• Changing the value of an integrand at one point does not change the integral.
• Likewise for a finite number of points.
• Likewise for a countably infinite number of points.
• Likewise for?
Set of Measure Zero (2)

• Changing the value of an integrand at a set of points of


measure zero does not change the integral.
• If a nonnegative function integrates to zero, then it must be
zero outside a set of measure zero.
• We say that two functions are indistinguishable if they differ
on a set of measure zero.
• Every countable set is of measure zero, but there are some
sets of measure zero that are not countable.

Set of Measure Zero (3)

We say that a subset N of the real line R is a set of Lebesgue measure zero (or a Lebesgue null set) if for every ε > 0 we can find a sequence of intervals [a₁, b₁], [a₂, b₂], . . . such that the total length of the intervals is smaller than or equal to ε,

  Σ_{j=1}^{∞} (b_j − a_j) ≤ ε,

and the union of the intervals covers the set N:

  N ⊆ [a₁, b₁] ∪ [a₂, b₂] ∪ · · · .
Communication and Detection Theory: Lecture 2

Amos Lapidoth
ETH Zurich

February 28, 2017

Passband Signals: Bandwidth and Representation

Today: Ch. 7

• Passband signals.
• Bandpass filters (Sec. 6.3).
• Signals that are bandlimited to W Hz around a carrier
frequency fc .
• Bandwidth around a carrier frequency.
• Multiplying a signal by a carrier.
• The Analytic representation.
• The Baseband representation.
• Inner products in passband and baseband.
• Baseband representation of xPB ? yPB .
• Baseband representation of xPB ? h.

The FT of a Real Passband Signal

[Figure: a spectrum ŷ(f) occupying the bands fc − W/2 ≤ |f| ≤ fc + W/2.]
Passband Signals

Loosely speaking, xPB is a passband signal that is bandlimited to W Hz around the carrier frequency fc if

  fc > W/2 > 0

and

  x̂PB(f) = 0,  | |f| − fc | > W/2.
The Ideal Unit-Gain Bandpass Filter

  BPF_{W,fc}(t) = 2W cos(2πfc t) sinc(Wt),  t ∈ R.

  B̂PF_{W,fc}(f) ≜ I{ | |f| − fc | ≤ W/2 },  f ∈ R.

[Figure: B̂PF_{W,fc}(f): unit height on two bands of width W centered at ±fc.]
Passband Signals

A signal xPB is said to be an integrable passband signal that is bandlimited to W Hz around the carrier frequency fc if it is integrable,

  xPB ∈ L1;  (2a)

the carrier frequency fc satisfies

  fc > W/2 > 0;  (2b)

and xPB is unaltered when it is fed to an ideal unit-gain bandpass filter of bandwidth W around the carrier frequency fc:

  xPB(t) = (xPB ⋆ BPF_{W,fc})(t),  t ∈ R.  (2c)
Bandwidth around fc

The bandwidth of xPB around fc is the smallest W s.t. xPB is bandlimited to W Hz around fc.

[Figure: ŷ(f) occupying fc − W/2 ≤ |f| ≤ fc + W/2.]
Remarks on the Bandwidth around fc

• We look at equal-length, symmetric intervals around fc and around −fc.
• We measure the “length” of the positive frequencies.
• Depends on both xPB and fc.
The Bandwidth around fc Depends on fc

[Figure: the same spectrum measured around a different carrier frequency yields a different bandwidth around the carrier.]
The FT of t ↦ x(t) e^{i2πfc t} is f ↦ x̂(f − fc):

  ∫_{−∞}^{∞} x(t) e^{i2πfc t} e^{−i2πft} dt = ∫_{−∞}^{∞} x(t) e^{−i2π(f−fc)t} dt
                                           = x̂(f − fc).

Likewise, t ↦ x(t) e^{−i2πfc t} has FT f ↦ x̂(f + fc). So

  t ↦ x(t) cos(2πfc t) has FT f ↦ ½ x̂(f − fc) + ½ x̂(f + fc)

because

  cos(2πfc t) = ½ e^{i2πfc t} + ½ e^{−i2πfc t}.
Multiplication by a Carrier Doubles the Bandwidth

If x is of bandwidth W Hz and if fc > W, then t ↦ x(t) cos(2πfc t) is a passband signal of bandwidth 2W around the carrier frequency fc.

Recall that

  t ↦ x(t) cos(2πfc t) has FT f ↦ ½ x̂(f − fc) + ½ x̂(f + fc).
[Figure: a baseband spectrum x̂(f) on [−W, W], and the resulting passband spectrum ŷ(f) of height ½ occupying fc − W ≤ |f| ≤ fc + W, i.e., of bandwidth 2W around fc.]
The Analytic and the Baseband Representations

• We’ll use the analytic representation as a stepping stone towards the baseband representation.
• The baseband representation allows us to separate the things that depend on the carrier from those that don’t.
• It is important in sampling and in simulation.
First Aid

• Only real passband signals have such representations.


• The transmitted signals are real.
• Integrals and convolutions of complex signals can be handled
by working with the real and imaginary parts separately.

The Analytic Representation

xA is a complex signal whose FT x̂A is

  x̂A(f) = { x̂PB(f)  if f ≥ 0,
            0        otherwise.

It is obviously complex, because its FT is not conjugate symmetric.
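A discrete sketch of the analytic representation (assuming numpy; the test signal, carrier, and rates are illustrative): keep only the nonnegative frequencies of a real passband signal, the FFT analogue of the definition above.

    import numpy as np

    fs, fc = 1000.0, 100.0                      # sample rate and carrier (Hz)
    t = np.arange(0, 1, 1/fs)
    x_pb = np.cos(2*np.pi*fc*t) * (1 + 0.5*np.cos(2*np.pi*5*t))  # real passband

    X = np.fft.fft(x_pb)
    f = np.fft.fftfreq(len(t), 1/fs)
    X_a = np.where(f >= 0, X, 0)                # zero out negative frequencies
    x_a = np.fft.ifft(X_a)                      # complex analytic representation

    # x_PB = 2 Re(x_A); this signal has no DC or Nyquist component,
    # so no half-weight bin corrections are needed here.
    print(np.max(np.abs(2*np.real(x_a) - x_pb)))   # ~0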

[Figure: x̂PB(f) with bands at ±fc, and x̂A(f) keeping only the band at +fc.]
From xA to xPB
Because xPB is real,

  x̂PB(−f) = x̂PB*(f),  f ∈ R.

And, by definition,

  x̂A(f) = { x̂PB(f)  if f ≥ 0,
            0        otherwise.

So

  x̂PB(f) = x̂A(f) + x̂A*(−f),  f ∈ R,

and hence, as we next argue,

  xPB(t) = 2 Re(xA(t)),  t ∈ R.
The FT of x∗ and Re(x)

x* has FT f ↦ x̂*(−f) because

  ∫_{−∞}^{∞} x*(t) e^{−i2πft} dt = ( ∫_{−∞}^{∞} x(t) e^{i2πft} dt )* = x̂*(−f).

Since Re(x) equals (x + x*)/2,

  Re(x) has FT f ↦ ½ x̂(f) + ½ x̂*(−f),

and

  2 Re(x) has FT f ↦ x̂(f) + x̂*(−f).
‖xPB‖₂² = 2 ‖xA‖₂²

Proof:
• Parseval.
• xPB is real, so |x̂PB| is symmetric.

  ‖xPB‖₂² = ∫_{−∞}^{∞} |x̂PB(f)|² df
          = 2 ∫₀^{∞} |x̂PB(f)|² df
          = 2 ∫₀^{∞} |x̂A(f)|² df
          = 2 ‖xA‖₂².
[Figure: as before, x̂PB(f) and its analytic spectrum x̂A(f).]



⟨xPB, yPB⟩ = 2 Re⟨xA, yA⟩

  ‖u + v‖₂² = ⟨u + v, u + v⟩
            = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
            = ‖u‖₂² + ⟨u, v⟩ + ⟨u, v⟩* + ‖v‖₂²
            = ‖u‖₂² + ‖v‖₂² + 2 Re⟨u, v⟩.

Thus,

  ‖xA + yA‖₂² = ‖xA‖₂² + ‖yA‖₂² + 2 Re⟨xA, yA⟩,
  ‖xPB + yPB‖₂² = ‖xPB‖₂² + ‖yPB‖₂² + 2 ⟨xPB, yPB⟩.

The result now follows from

  ‖xPB‖₂² = 2‖xA‖₂²,  ‖yPB‖₂² = 2‖yA‖₂²,  ‖xPB + yPB‖₂² = 2‖xA + yA‖₂².
[Figure: the spectra x̂PB(f) and ŷPB(f) of two real passband signals.]
Some Comments on the Analytic Representation

• The representation is linear in the sense that if α and β are real, then the representation of αxPB + βyPB is αxA + βyA.
• xPB = 2 Re(z) does not imply that z equals xA.
• But if xPB = 2 Re(z) and ẑ is zero at all negative frequencies, then z indeed equals xA.
The Baseband Representation of xPB w.r.t. fc

In the time domain:

  xBB(t) ≜ e^{−i2πfc t} xA(t),  t ∈ R.

In the frequency domain:

  x̂BB(f) = x̂A(f + fc),  f ∈ R.
[Figure: x̂PB(f), x̂A(f), and the baseband spectrum x̂BB(f) obtained by shifting x̂A down by fc.]
[Figure: x̂PB(f); the shifted spectrum x̂PB(f + fc); the window g₀(f) of cutoff Wc; and the resulting x̂BB(f), supported on [−W/2, W/2].]
From xPB to xBB

  xBB = (t ↦ e^{−i2πfc t} xPB(t)) ⋆ ǧ₀,

where g₀ : f ↦ g₀(f) is any integrable function satisfying

  g₀(f) = 1,  |f| ≤ W/2,

and

  g₀(f) = 0,  |f + 2fc| ≤ W/2.

For example,

  xBB = (t ↦ e^{−i2πfc t} xPB(t)) ⋆ LPF_{Wc},

where Wc is any cutoff frequency in the range

  W/2 ≤ Wc ≤ 2fc − W/2.
Convolving a Complex Signal with a Real Signal


  Re(x ⋆ h) = Re(x) ⋆ h,
  Im(x ⋆ h) = Im(x) ⋆ h,     h is real-valued.

Proof: Start with

  (x ⋆ h)(t) = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ;

recall that Re(∫) = ∫ Re and Im(∫) = ∫ Im; and note that if h(·) is real-valued, then for all t, τ ∈ R,

  Re(x(τ) h(t − τ)) = Re(x(τ)) h(t − τ),
  Im(x(τ) h(t − τ)) = Im(x(τ)) h(t − τ).
The In-Phase and Quadrature Components
Because xPB is real,

  Re(xPB(t) e^{−i2πfc t}) = xPB(t) cos(2πfc t),  t ∈ R,
  Im(xPB(t) e^{−i2πfc t}) = −xPB(t) sin(2πfc t),  t ∈ R.

And because we are convolving t ↦ xPB(t) e^{−i2πfc t} with a real filter LPF_{Wc},

  Re(xBB) = (t ↦ xPB(t) cos(2πfc t)) ⋆ LPF_{Wc},
  Im(xBB) = −(t ↦ xPB(t) sin(2πfc t)) ⋆ LPF_{Wc}.

Re(xBB) ≜ in-phase component;  Im(xBB) ≜ quadrature component.
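A discrete sketch of quadrature demodulation (assuming numpy; carrier, cutoff, and test tones are illustrative): mix down by fc, lowpass filter to get xBB, then remodulate to recover xPB = 2 Re(xBB e^{i2πfc t}).

    import numpy as np

    fs, fc, Wc = 1000.0, 100.0, 20.0
    t = np.arange(0, 1, 1/fs)
    x_pb = np.cos(2*np.pi*102*t) + 0.3*np.sin(2*np.pi*97*t)  # passband near fc

    def ideal_lpf(x, fs, Wc):
        """Ideal lowpass filtering implemented by masking the FFT."""
        X = np.fft.fft(x)
        f = np.fft.fftfreq(len(x), 1/fs)
        return np.fft.ifft(np.where(np.abs(f) <= Wc, X, 0))

    x_bb = ideal_lpf(x_pb * np.exp(-2j*np.pi*fc*t), fs, Wc)  # baseband rep.
    x_rec = 2*np.real(x_bb * np.exp(2j*np.pi*fc*t))          # back to passband
    print(np.max(np.abs(x_rec - x_pb)))                       # ~0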

From xPB to xBB

[Block diagram: xPB(t) is multiplied by cos(2πfc t) and lowpass filtered (cutoff W/2 ≤ Wc ≤ 2fc − W/2) to give Re(xBB(t)); a 90°-shifted branch multiplies by −sin(2πfc t) and lowpass filters to give Im(xBB(t)).]
Bandwidth

The bandwidth of xPB around fc is twice the bandwidth of xBB.

It’s all in the figure. . .
[Figure: as before, x̂PB(f), x̂PB(f + fc), g₀(f), and x̂BB(f) on [−W/2, W/2].]
Recovering xPB from xBB and fc

Recall that

  xBB(t) ≜ e^{−i2πfc t} xA(t),  t ∈ R,

and that

  xPB(t) = 2 Re(xA(t)),  t ∈ R.

Consequently,

  xPB(t) = 2 Re(xBB(t) e^{i2πfc t}),  t ∈ R.
[Figure: x̂BB(f) shifted up to x̂BB(f − fc); x̂BB*(−f) shifted down to x̂BB*(−f − fc); and their sum x̂PB(f) = x̂BB(f − fc) + x̂BB*(−f − fc).]
Some Remarks on the Baseband Representation

• The baseband representation is linear in the sense that if α and β are real, then the representation of αxPB + βyPB is αxBB + βyBB.
• To recover xPB you need xBB and fc.
• This is not a bug; it is a feature!
• xPB(t) = 2 Re(z(t) e^{i2πfc t}) does not imply that z equals xBB.
• However, if xPB(t) = 2 Re(z(t) e^{i2πfc t}) and z is bandlimited to W/2 Hz, then z equals xBB.
Inner Products


  ⟨xPB, yPB⟩ = 2 Re⟨xBB, yBB⟩,    ‖xPB‖₂² = 2 ‖xBB‖₂².

Proof:

  ⟨xBB, yBB⟩ = ∫_{−∞}^{∞} xBB(t) yBB*(t) dt
             = ∫_{−∞}^{∞} e^{−i2πfc t} xA(t) (e^{−i2πfc t} yA(t))* dt
             = ∫_{−∞}^{∞} e^{−i2πfc t} xA(t) e^{i2πfc t} yA*(t) dt
             = ⟨xA, yA⟩.
[Figure: the spectra x̂PB(f) and ŷPB(f).]
Orthogonality in Passband

Two real passband signals xPB , yPB are orthogonal iff the inner
product between their baseband representations is purely imaginary.

The Baseband Representation of xPB ? yPB

Recalling that the FT of a convolution is the product of the transforms, we obtain:

The baseband representation of xPB ⋆ yPB is xBB ⋆ yBB.
[Figure: the spectra x̂PB(f) and ŷPB(f).]
[Figure 7.13: The convolution of two real passband signals and its baseband representation: x̂PB(f), ŷPB(f), and their product x̂PB(f)ŷPB(f) (peak 1.5), together with x̂BB(f), ŷBB(f), and x̂BB(f)ŷBB(f).]
The Baseband Representation of xPB ? h

Here
• xPB is a real passband signal that is bandlimited to W Hz
around fc , but
• h is a general (not necessarily bandpass) real impulse
response.
• xPB ? h is the filter’s response to xPB .

[Figure: x̂PB(f) of unit height and width W around ±fc; a general ĥ(f); and the product x̂PB(f)ĥ(f).]
Frequency Response w.r.t. the Bandwidth W around fc (1)

[Figure: ĥ(f) over a band of width W around fc, and its shifted restriction plotted on [−W/2, W/2].]
Frequency Response w.r.t. the bandwidth W around fc (2)

Definition (Frequency Response with Respect to a Band)
For a stable real filter of impulse response h we define the frequency response with respect to the bandwidth W around the carrier frequency fc (satisfying fc > W/2) as the mapping

  f ↦ ĥ(f + fc) I{|f| ≤ W/2}.

The FT of the baseband representation of xPB ⋆ h is the product of x̂BB by the filter’s frequency response with respect to the bandwidth W around the carrier frequency fc.
[Figure: as before, x̂PB(f), ĥ(f), and x̂PB(f)ĥ(f).]
[Figure: the resulting baseband spectrum x̂BB(f) on [−W/2, W/2].]
Next Week

We’ll cover Chapter 8. Please review your linear algebra (inner product spaces) by reading Chapter 4.

Thank you!
Communication and Detection Theory: Lecture 3

Amos Lapidoth
ETH Zurich

March 7, 2017

The Geometry of L2 and the Sampling Theorem

Today

Chapters 4 and 8:
• L2 as vector space.
• Finite-dimensional subspaces of L2 : bases, dimension,
orthonormal bases, and the Gram-Schmidt procedure (review).
• Expressing a signal in terms of a given orthonormal basis.
• Projections onto a finite-dimensional subspace.
• The projection and closest element in the subspace.
• Complete Orthonormal Systems (CONS).
• CONS and Parseval’s Theorem.
• The Sampling Theorem as an orthonormal expansion.

Amplification and Superposition
L2 is the space of energy-limited signals.

Given a complex signal u and α ∈ C, the amplification-by-α of u is the signal αu:

  t ↦ α u(t),  t ∈ R.

The superposition u + v of the signals u and v is the signal

  t ↦ u(t) + v(t),  t ∈ R.

With these operations, L2 forms a vector space over C.

A finite sum u₁ + · · · + u_n is similarly defined.
Linear Subspaces

A subset U ⊆ L2 is a linear subspace of L2 if it is not empty; it is closed under superposition,

  u₁ + u₂ ∈ U,  u₁, u₂ ∈ U;

and it is closed under amplification,

  αu ∈ U,  (α ∈ C, u ∈ U).

Example: The set of all energy-limited signals that are zero whenever t ≠ 17.
Another Linear Subspace
All signals of the form

  t ↦ p(t) e^{−|t|},

where p(t) is any complex polynomial of degree ≤ 3:

  u : t ↦ (α₀ + α₁t + α₂t² + α₃t³) e^{−|t|},
  αu : t ↦ (αα₀ + αα₁t + αα₂t² + αα₃t³) e^{−|t|}.

If u is as above and

  v : t ↦ (β₀ + β₁t + β₂t² + β₃t³) e^{−|t|},

then

  u + v : t ↦ ((α₀+β₀) + (α₁+β₁)t + (α₂+β₂)t² + (α₃+β₃)t³) e^{−|t|}.
Linear Combinations, Span, and Independence
• v ∈ L2 is a linear combination of (v₁, . . . , v_n) if it equals

  α₁v₁ + · · · + α_n v_n,  i.e.,  Σ_{ν=1}^{n} α_ν v_ν,

for some α₁, . . . , α_n ∈ C.
• span(v₁, . . . , v_n) is the set of all vectors in L2 that are linear combinations of (v₁, . . . , v_n).
• span(v₁, . . . , v_n) is a linear subspace of L2.
• The n-tuple (v₁, . . . , v_n) is linearly independent if

  ( Σ_{ν=1}^{n} α_ν v_ν = 0 ) ⟹ ( α_ν = 0, ν = 1, . . . , n ).
Dimension, Finite and Infinite

• A subspace U of L2 is finite-dimensional if there exists an


n-tuple (u1 , . . . , un ) such that span(u1 , . . . , un ) = U.
• (u1 , . . . , ud ) is a basis for U if it is
1. linearly independent and
2. span(u1 , . . . , ud ) = U.
• All bases for a finite-dimensional subspace U have the same
number of elements: the dimension of U — dim U.

Some Examples

• The set of all signals of the form t ↦ p(t) e^{−|t|}, where p(·) is any polynomial, is infinite dimensional.
• If p(·) is restricted to degree ≤ 3, then dim U = 4, and a basis is

  t ↦ e^{−|t|},  t ↦ t e^{−|t|},  t ↦ t² e^{−|t|},  t ↦ t³ e^{−|t|}.

• If U comprises all signals that vanish whenever t ≠ 17, then dim U = 1 and a basis is

  t ↦ I{t = 17}.
‖u‖₂ as the “Length” of the Signal u(·)
For u, v ∈ L2 and α ∈ C,

  ‖αu‖₂ = |α| ‖u‖₂,
  ‖u + v‖₂ ≤ ‖u‖₂ + ‖v‖₂,

and

  (‖u‖₂ = 0) ⟺ (u ≡ 0).

Also,

  | ‖u‖₂ − ‖v‖₂ | ≤ ‖u + v‖₂ ≤ ‖u‖₂ + ‖v‖₂,  u, v ∈ L2,

because

  ‖v‖₂ = ‖(v + u) + (−u)‖₂ ≤ ‖v + u‖₂ + ‖−u‖₂ = ‖v + u‖₂ + ‖u‖₂

and likewise when you swap u and v.
The Triangle Inequality for Energy-Limited Signals

[Figure: the triangle formed by u, v, and u + v.]
A Pythagorean Theorem
Last time:

  ‖u + v‖₂² = ‖u‖₂² + ‖v‖₂² + 2 Re⟨u, v⟩,

so

  ‖u + v‖₂² = ‖u‖₂² + ‖v‖₂²,  when u and v are orthogonal.

By induction,

  ‖u₁ + · · · + u_n‖₂² = ‖u₁‖₂² + · · · + ‖u_n‖₂²,  u₁, . . . , u_n pairwise orthogonal.

Indeed, if u ≜ u₁ and v ≜ u₂ + · · · + u_n, then the pairwise orthogonality implies ⟨u, v⟩ = 0 because

  ⟨u, v⟩ = ⟨u₁, u₂ + · · · + u_n⟩ = ⟨u₁, u₂⟩ + · · · + ⟨u₁, u_n⟩ = 0.

Hence ‖u + v‖₂² = ‖u‖₂² + ‖v‖₂², i.e.,

  ‖u₁ + · · · + u_n‖₂² = ‖u₁‖₂² + ‖u₂ + · · · + u_n‖₂².

Now use the induction hypothesis.
Projecting v onto u (1)

[Figure: the projection w of v onto u:

  w = (length of v) · cos(angle between v and u) · u / (length of u).]
Projecting v onto u (2)

The projection of the signal v ∈ L2 onto the signal u ∈ L2 is the signal w that satisfies both
1. w = αu for some α ∈ C, and
2. v − w is orthogonal to u.
Projecting v onto u (3)
The projection of the signal v ∈ L2 onto the signal u ∈ L2 is the signal w that satisfies both
1. w = αu for some α ∈ C, and
2. v − w is orthogonal to u:

  ⟨v − αu, u⟩ = 0,  i.e.,  ⟨v, u⟩ − α ‖u‖₂² = 0.

For ‖u‖₂ > 0 (strictly; otherwise the projection is not defined),

  α = ⟨v, u⟩ / ‖u‖₂²,

and the projection w is thus unique and is given by

  w = ( ⟨v, u⟩ / ‖u‖₂² ) u.
Projecting v onto u (4)

Since v − w is orthogonal to u, and since w equals αu, it follows that v − w is orthogonal to w. Consequently, the Pythagorean Theorem yields that projection reduces length:

  ‖v‖₂² = ‖(v − w) + w‖₂² = ‖v − w‖₂² + ‖w‖₂² ≥ ‖w‖₂².

And since

  w = ( ⟨v, u⟩ / ‖u‖₂² ) u,

we obtain the Cauchy-Schwarz Inequality

  |⟨u, v⟩| ≤ ‖u‖₂ ‖v‖₂.
Orthonormal Tuples

The n-tuple of L2 signals (φ₁, . . . , φ_n) is orthonormal if

  ⟨φ_ℓ, φ_ℓ′⟩ = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ {1, . . . , n}.
Orthonormal Tuples Are Linearly Independent
If

  Σ_{ℓ=1}^{n} α_ℓ φ_ℓ = 0,

then for every ℓ′ ∈ {1, . . . , n}

  0 = ⟨0, φ_ℓ′⟩
    = ⟨ Σ_{ℓ=1}^{n} α_ℓ φ_ℓ, φ_ℓ′ ⟩
    = Σ_{ℓ=1}^{n} α_ℓ ⟨φ_ℓ, φ_ℓ′⟩
    = Σ_{ℓ=1}^{n} α_ℓ I{ℓ = ℓ′}
    = α_ℓ′.
An Orthonormal Basis

A d-tuple of signals in L2 is an orthonormal basis for the linear


subspace U ⊂ L2 if it is orthonormal and its span is U.

If (φ₁, . . . , φ_d) is an orthonormal basis for U ⊂ L2, then

  u = Σ_{ℓ=1}^{d} ⟨u, φ_ℓ⟩ φ_ℓ,  u ∈ U.

Since (φ₁, . . . , φ_d) is a basis for U, any u ∈ U can be expressed as u = Σ_{ℓ=1}^{d} α_ℓ φ_ℓ, and

  ⟨u, φ_ℓ′⟩ = ⟨ Σ_{ℓ=1}^{d} α_ℓ φ_ℓ, φ_ℓ′ ⟩
            = Σ_{ℓ=1}^{d} α_ℓ ⟨φ_ℓ, φ_ℓ′⟩
            = Σ_{ℓ=1}^{d} α_ℓ I{ℓ = ℓ′}
            = α_ℓ′.
Projections (1)

Lemma
Let (φ₁, . . . , φ_d) be an orthonormal basis for U ⊂ L2. Let v ∈ L2 be arbitrary.
1. v − Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ φ_ℓ is orthogonal to every signal in U:

  ⟨ v − Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ φ_ℓ, u ⟩ = 0,  (v ∈ L2, u ∈ U).

2. If w ∈ U is s.t. v − w is orthogonal to every signal in U, then

  w = Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ φ_ℓ.
Projections (2)

Proof.
The signal v − Σ_{ℓ=1}^{d} α_ℓ φ_ℓ is orthogonal to φ_ℓ′ iff α_ℓ′ = ⟨v, φ_ℓ′⟩. Hence:
1. If α_ℓ′ = ⟨v, φ_ℓ′⟩ for all ℓ′ ∈ {1, . . . , d}, then v − Σ_{ℓ=1}^{d} α_ℓ φ_ℓ is orthogonal to each basis function and hence to every u ∈ U.
2. If w is in U, it can be written as Σ_{ℓ=1}^{d} α_ℓ φ_ℓ, and if, additionally, v − w is orthogonal to each φ_ℓ′, then α_ℓ′ must equal ⟨v, φ_ℓ′⟩.
Projections (3)

Definition (Projection of v ∈ L2 onto U)
Let U ⊂ L2 be a finite-dimensional linear subspace of L2 having an orthonormal basis. Let v ∈ L2 be an arbitrary energy-limited signal. Then the projection of v onto U is the unique element w of U satisfying

  ⟨v − w, u⟩ = 0,  u ∈ U.

Note: If (φ₁, . . . , φ_d) is an orthonormal basis for U, then the projection of v ∈ L2 onto U is

  Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ φ_ℓ.
Projection as Best Approximation
Let (φ₁, . . . , φ_d) be an orthonormal basis for U ⊂ L2. Let v ∈ L2 be arbitrary. Then the projection w of v onto U is the element in U that, among all the elements of U, is closest to v in the sense that

  ‖v − u‖₂ ≥ ‖v − w‖₂,  u ∈ U.

Proof.
Let u be any element of U. Then so is w − u. Since v − w is orthogonal to all elements of U, it is a fortiori also orthogonal to w − u. Thus, by Pythagoras,

  ‖v − u‖₂² = ‖(v − w) + (w − u)‖₂²
            = ‖v − w‖₂² + ‖w − u‖₂²
            ≥ ‖v − w‖₂².
Energy and Inner Products and Orthonormal Bases (1)

Let (φ₁, . . . , φ_d) be an orthonormal basis for U ⊂ L2. Then

  ‖u‖₂² = Σ_{ℓ=1}^{d} |⟨u, φ_ℓ⟩|²,  u ∈ U.

Proof.
Follows from the Pythagorean Theorem and

  u = Σ_{ℓ=1}^{d} ⟨u, φ_ℓ⟩ φ_ℓ,  u ∈ U.
Energy and Inner Products and Orthonormal Bases (2)
Let (φ₁, . . . , φ_d) be an orthonormal basis for U ⊂ L2. Then

  ‖v‖₂² ≥ Σ_{ℓ=1}^{d} |⟨v, φ_ℓ⟩|²,  v ∈ L2.

Proof.
Let w be the projection of v onto U. Then

  ‖v‖₂² = ‖(v − w) + w‖₂² = ‖v − w‖₂² + ‖w‖₂² ≥ ‖w‖₂².

And by the expression for w and the Pythagorean Theorem,

  ‖w‖₂² = ‖ Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ φ_ℓ ‖₂² = Σ_{ℓ=1}^{d} |⟨v, φ_ℓ⟩|².
Energy and Inner Products and Orthonormal Bases (3)
Let (φ₁, . . . , φ_d) be an orthonormal basis for U ⊂ L2. Then

  ⟨v, u⟩ = Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ ⟨u, φ_ℓ⟩*,  (v ∈ L2, u ∈ U).

Since u ∈ U,

  u = Σ_{ℓ=1}^{d} ⟨u, φ_ℓ⟩ φ_ℓ.

Consequently, using the sesquilinearity of the inner product,

  ⟨v, u⟩ = ⟨ v, Σ_{ℓ=1}^{d} ⟨u, φ_ℓ⟩ φ_ℓ ⟩ = Σ_{ℓ=1}^{d} ⟨u, φ_ℓ⟩* ⟨v, φ_ℓ⟩.
Does Every Finite-Dimensional Subspace Have an
Orthonormal Basis?

In general, no:

  {u ∈ L2 : u(t) = 0 whenever t ≠ 17}

is a one-dimensional subspace that does not.

If U is a finite-dimensional subspace of L2, then the following two statements are equivalent:
1. U has an orthonormal basis.
2. The only element of U of zero energy is the all-zero signal 0.
Gram-Schmidt

Input: a basis (u₁, . . . , u_d).

  φ₁ ≜ u₁ / ‖u₁‖₂,

  φ_ν = ( u_ν − Σ_{ℓ=1}^{ν−1} ⟨u_ν, φ_ℓ⟩ φ_ℓ ) / ‖ u_ν − Σ_{ℓ=1}^{ν−1} ⟨u_ν, φ_ℓ⟩ φ_ℓ ‖₂.
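A minimal numpy sketch of Gram-Schmidt on sampled signals (assuming numpy; the two test signals are illustrative), treating a signal as a vector and ⟨u, v⟩ as the Riemann sum Σ u·conj(v)·dt:

    import numpy as np

    def gram_schmidt(signals, dt):
        """Orthonormalize sampled signals (assumed linearly independent)."""
        phis = []
        for u in signals:
            v = u.astype(complex).copy()
            for phi in phis:
                v -= np.sum(v * np.conj(phi)) * dt * phi  # subtract projection
            v /= np.sqrt(np.sum(np.abs(v)**2) * dt)       # normalize energy
            phis.append(v)
        return phis

    t = np.linspace(-10, 10, 4001); dt = t[1] - t[0]
    basis = gram_schmidt([np.exp(-np.abs(t)), t*np.exp(-np.abs(t))], dt)
    print(np.sum(basis[0]*np.conj(basis[1]))*dt)   # ~0: orthonormal pair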

The Benefits of Orthonormal Bases

  u = Σ_{ℓ=1}^{d} ⟨u, φ_ℓ⟩ φ_ℓ,  u ∈ U;

  ‖u‖₂² = Σ_{ℓ=1}^{d} |⟨u, φ_ℓ⟩|²,  u ∈ U;

  ⟨v, u⟩ = Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ ⟨u, φ_ℓ⟩*,  (v ∈ L2, u ∈ U);

  w = Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ φ_ℓ.
Complete Orthonormal System (CONS)
Definition: . . . , φ₋₁, φ₀, φ₁, . . . is a complete orthonormal system (CONS) for the linear subspace U ⊆ L2 if:
1. φ_ℓ ∈ U, ℓ ∈ Z.
2. ⟨φ_ℓ, φ_ℓ′⟩ = I{ℓ = ℓ′}, ℓ, ℓ′ ∈ Z.
3. ‖u‖₂² = Σ_{ℓ=−∞}^{∞} |⟨u, φ_ℓ⟩|², u ∈ U.

Note: Orthonormality suffices for

  ‖u‖₂² ≥ Σ_{ℓ=−L}^{L} |⟨u, φ_ℓ⟩|²,  u ∈ U,

and hence, letting L → ∞,

  ‖u‖₂² ≥ Σ_{ℓ=−∞}^{∞} |⟨u, φ_ℓ⟩|²,  u ∈ U.
If {φ_ℓ} ⊂ U are orthonormal, then the following are equivalent:
• ∀u ∈ U and ∀ε > 0 there exists some positive integer L(ε) and coefficients α₋L(ε), . . . , α_L(ε) ∈ C s.t.

  ‖ u − Σ_{ℓ=−L(ε)}^{L(ε)} α_ℓ φ_ℓ ‖₂ < ε.  (3)

• For every u ∈ U

  lim_{L→∞} ‖ u − Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ ‖₂ = 0.  (4)

• For every u ∈ U

  ‖u‖₂² = Σ_{ℓ=−∞}^{∞} |⟨u, φ_ℓ⟩|².  (5)

• For every u, v ∈ U

  ⟨u, v⟩ = Σ_{ℓ=−∞}^{∞} ⟨u, φ_ℓ⟩ ⟨v, φ_ℓ⟩*.  (6)
• ∀u ∈ U and ∀ε > 0 there exists some positive integer L(ε) and coefficients α₋L(ε), . . . , α_L(ε) ∈ C s.t.

  ‖ u − Σ_{ℓ=−L(ε)}^{L(ε)} α_ℓ φ_ℓ ‖₂ < ε.

• For every u ∈ U

  lim_{L→∞} ‖ u − Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ ‖₂ = 0.

One direction is obvious, and the other holds because

  Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ

is the closest element to u in span(φ₋L, . . . , φ_L).
  lim_{L→∞} ‖ u − Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ ‖₂ = 0

implies

  ‖u‖₂² = Σ_{ℓ=−∞}^{∞} |⟨u, φ_ℓ⟩|²

because

  | ‖u‖₂ − ‖v‖₂ | ≤ ‖u + v‖₂ ≤ ‖u‖₂ + ‖v‖₂,  u, v ∈ L2.

(The distance → 0 only if the lengths have the same limit.) Conversely,

  ⟨ u − Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ, φ_ℓ′ ⟩ = ⟨u, φ_ℓ′⟩ I{|ℓ′| > L},  (ℓ′ ∈ Z, u ∈ L2),

so

  ‖ u − Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ ‖₂² = Σ_{|ℓ|>L} |⟨u, φ_ℓ⟩|² → 0,  u ∈ U.

  ‖u‖₂² = Σ_{ℓ=−∞}^{∞} |⟨u, φ_ℓ⟩|²,  u ∈ U,

is implied by

  ⟨u, v⟩ = Σ_{ℓ=−∞}^{∞} ⟨u, φ_ℓ⟩ ⟨v, φ_ℓ⟩*,  u, v ∈ U.

Conversely, the former implies

  lim_{L→∞} ‖u − u_L‖₂ = 0,  lim_{L→∞} ‖v − v_L‖₂ = 0,  (7)

where

  u_L ≜ Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ,  v_L ≜ Σ_{ℓ=−L}^{L} ⟨v, φ_ℓ⟩ φ_ℓ.

Expand

  ⟨u, v⟩ = ⟨(u − u_L) + u_L, (v − v_L) + v_L⟩,

where

  ⟨u_L, v_L⟩ = Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ ⟨v, φ_ℓ⟩*.

The cross-terms tend to zero by (7) and Cauchy-Schwarz.
Fourier Series

Let S be positive. The bi-infinite sequence of functions defined for every η ∈ Z by

  s ↦ (1/√S) e^{i2πηs/S} I{−S/2 ≤ s < S/2},  s ∈ R,

forms a CONS for the subspace of square-integrable functions that vanish outside the interval [−S/2, S/2).

See Theorem A.3.3 in the appendix.
The Frequency Domain

For W > 0 define the linear subspace

  V = {g ∈ L2 : g(f) = 0 whenever |f| > W}.

The functions defined for every integer ℓ by

  f ↦ (1/√(2W)) e^{iπℓf/W} I{|f| ≤ W}

form a CONS for V.
Energy-Limited Signals that Are Bandlimited to W Hz

The signal x is an energy-limited signal that is bandlimited to W Hz if, and only if, x = ǧ for some function g : f ↦ g(f) satisfying

  g(f) = 0,  |f| > W,

and

  ∫_{−W}^{W} |g(f)|² df < ∞.

Thus, if

  V = {g ∈ L2 : g(f) = 0 whenever |f| > W},

then V̌ is the set of all energy-limited signals that are bandlimited to W Hz.
If {ψ_ℓ} is a CONS for V, then {ψ̌_ℓ} is a CONS for V̌
Let {ψ_ℓ} be a CONS for V. Orthonormality follows from Parseval:

  ⟨ψ̌_ℓ, ψ̌_ℓ′⟩ = ⟨ψ_ℓ, ψ_ℓ′⟩,  ℓ, ℓ′ ∈ Z.

We need to verify that for every x ∈ V̌,

  Σ_{ℓ=−∞}^{∞} |⟨x, ψ̌_ℓ⟩|² = ‖x‖₂².

Since x is in V̌, there exists some g ∈ V s.t. x = ǧ. Then

  Σ_{ℓ=−∞}^{∞} |⟨ǧ, ψ̌_ℓ⟩|² = Σ_{ℓ=−∞}^{∞} |⟨g, ψ_ℓ⟩|² = ‖g‖₂² = ‖ǧ‖₂²,  g ∈ V.
A CONS for V̌
The sequence of signals that are defined for every integer ℓ by

  t ↦ √(2W) sinc(2Wt + ℓ)

forms a CONS for the space of energy-limited signals that are bandlimited to W Hz. Indeed,

  ψ_ℓ : f ↦ (1/√(2W)) e^{iπℓf/W} I{|f| ≤ W}

form a CONS for the subspace V, hence their IFTs form a CONS for V̌:

  ψ̌_ℓ(t) = ∫_{−∞}^{∞} ψ_ℓ(f) e^{i2πft} df
          = ∫_{−W}^{W} (1/√(2W)) e^{iπℓf/W} e^{i2πft} df
          = √(2W) sinc(2Wt + ℓ).
  ⟨x, t ↦ √(2W) sinc(2Wt + ℓ)⟩ = (1/√(2W)) x(−ℓ/(2W)),  ℓ ∈ Z.

Let g ∈ V be such that x = ǧ, i.e.,

  x(t) = ∫_{−W}^{W} g(f) e^{i2πft} df,  t ∈ R.

Then

  ⟨x, t ↦ √(2W) sinc(2Wt + ℓ)⟩ = ⟨x, ψ̌_ℓ⟩
    = ⟨ǧ, ψ̌_ℓ⟩
    = ⟨g, ψ_ℓ⟩
    = ∫_{−W}^{W} g(f) ( (1/√(2W)) e^{iπℓf/W} )* df
    = (1/√(2W)) ∫_{−W}^{W} g(f) e^{−iπℓf/W} df
    = (1/√(2W)) x(−ℓ/(2W)),  ℓ ∈ Z.
If {φ_ℓ} ⊂ U are orthonormal, then the following are equivalent:
• ∀u ∈ U and ∀ε > 0 there exists some positive integer L(ε) and coefficients α₋L(ε), . . . , α_L(ε) ∈ C s.t.

  ‖ u − Σ_{ℓ=−L(ε)}^{L(ε)} α_ℓ φ_ℓ ‖₂ < ε.  (8)

• For every u ∈ U

  lim_{L→∞} ‖ u − Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ ‖₂ = 0.  (9)

• For every u ∈ U

  ‖u‖₂² = Σ_{ℓ=−∞}^{∞} |⟨u, φ_ℓ⟩|².  (10)

• For every u, v ∈ U

  ⟨u, v⟩ = Σ_{ℓ=−∞}^{∞} ⟨u, φ_ℓ⟩ ⟨v, φ_ℓ⟩*.  (11)
Applying this with φ_ℓ = ψ̌_ℓ:

Theorem (L2-Sampling Theorem)
Let x be an energy-limited signal that is bandlimited to W Hz, and let

  T = 1/(2W).

Let . . . , x(−2T), x(−T), x(0), x(T), x(2T), . . . be the samples of x. Then

  lim_{L→∞} ∫_{−∞}^{∞} | x(t) − Σ_{ℓ=−L}^{L} x(−ℓT) sinc(t/T + ℓ) |² dt = 0,

  ∫_{−∞}^{∞} |x(t)|² dt = T Σ_{ℓ=−∞}^{∞} |x(ℓT)|².

If y is another energy-limited signal that is bandlimited to W Hz, then

  ⟨x, y⟩ = T Σ_{ℓ=−∞}^{∞} x(ℓT) y*(ℓT).
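A numerical sketch of the reconstruction (assuming numpy; the test signal and the truncation of the bi-infinite sum are illustrative): a signal bandlimited to W Hz is rebuilt from its samples x(ℓT), T = 1/(2W).

    import numpy as np

    W = 4.0; T = 1/(2*W)
    x = lambda t: np.sinc(2*W*t/3)**2      # bandlimited to 2W/3 <= W Hz

    ell = np.arange(-400, 401)             # truncate the bi-infinite sum
    t = np.linspace(-2, 2, 1001)
    x_rec = (x(ell*T)[None, :] * np.sinc(t[:, None]/T - ell[None, :])).sum(1)
    print(np.max(np.abs(x_rec - x(t))))    # small; shrinks as L grows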
Pointwise Sampling Theorem
If the signal x can be represented as

  x(t) = ∫_{−W}^{W} g(f) e^{i2πft} df,  t ∈ R,

for some function g satisfying

  ∫_{−W}^{W} |g(f)| df < ∞,

and if 0 < T ≤ 1/(2W), then for every t ∈ R

  x(t) = lim_{L→∞} Σ_{ℓ=−L}^{L} x(−ℓT) sinc(t/T + ℓ).

A special case is when x is an energy-limited signal that is bandlimited to W Hz.
Table 8.1: The duality between the Sampling Theorem and the Fourier Series Representation.

V̌ — the energy-limited signals that are bandlimited to W Hz; generic element x : t ↦ x(t); CONS . . . , ψ̌₋₁, ψ̌₀, ψ̌₁, . . . with ψ̌_ℓ(t) = √(2W) sinc(2Wt + ℓ); inner product ⟨x, ψ̌_ℓ⟩ = ∫_{−∞}^{∞} x(t) √(2W) sinc(2Wt + ℓ) dt = (1/√(2W)) x(−ℓ/(2W)); the expansion lim_{L→∞} ‖x − Σ_{ℓ=−L}^{L} ⟨x, ψ̌_ℓ⟩ ψ̌_ℓ‖₂ = 0 is the Sampling Theorem.

V — the energy-limited functions that vanish outside the interval [−W, W); generic element g : f ↦ g(f); CONS . . . , ψ₋₁, ψ₀, ψ₁, . . . with ψ_ℓ(f) = (1/√(2W)) e^{iπℓf/W} I{−W ≤ f < W}; inner product ⟨g, ψ_ℓ⟩ = ∫_{−W}^{W} g(f) (1/√(2W)) e^{−iπℓf/W} df = g’s ℓ-th Fourier Series coefficient (≜ c_ℓ); the expansion lim_{L→∞} ‖g − Σ_{ℓ=−L}^{L} ⟨g, ψ_ℓ⟩ ψ_ℓ‖₂ = 0 is the Fourier Series representation.
The Sampling Theorem also holds for
complex signals.

An Isomorphism

If {α_ℓ} is a bi-infinite square-summable sequence, then there exists an energy-limited signal u that is bandlimited to W Hz such that its samples are given by

  u(ℓT) = α_ℓ,  ℓ ∈ Z.
Next Week

• Sampling of real passband signals (Chapter 9).


• Mapping bits to waveforms (Chapter 10).
• A glimpse at stochastic processes (Chapter 12).

Thank you!

Communication and Detection Theory: Lecture 4

Amos Lapidoth
ETH Zurich

March 14, 2017

Sampling Real Passband Signals

Today

• Sampling real passband signals (Chapter 9).


• Mapping bits to waveforms (Chapter 10).
• A glimpse at stochastic processes (Chapter 12).

Bandwidth vs. Bandwidth-around-fc

[Figure: a passband spectrum ŷ(f) occupying fc − W/2 ≤ |f| ≤ fc + W/2.]

The bandwidth is fc + W/2; the bandwidth around fc is W.

A direct application of the Sampling Theorem would require

  2 (fc + W/2)  (real) samples/sec.
The Holy Grail

2W real samples per second, or W complex samples per second.

• The sampling rate should not depend on fc.
• Separation of carrier-selection and sampling circuits.
Recall

• If xPB is a real passband signal whose bandwidth around fc is


W, then its baseband representation xBB (w.r.t. fc ) is a
complex signal of bandwidth W/2.
• The Sampling Theorem applies also to complex signals.
• xPB can be recovered from xBB and fc .

The Solution
Sampling:
• From xPB generate xBB.
• Sample xBB at its Nyquist rate:

  2 × (bandwidth of xBB) = 2 × (W/2) = W  complex samples/sec.

Reconstruction:
• From the samples of xBB reconstruct xBB.
• From xBB (and fc) reconstruct xPB:

  xPB(t) = 2 Re(e^{i2πfc t} xBB(t)),  t ∈ R.
[Figure: as in Lecture 2, x̂PB(f), x̂PB(f + fc), g₀(f), and x̂BB(f) on [−W/2, W/2].]
Complex Sampling
We seek the samples of xBB:

  xBB(ℓ/W),  ℓ ∈ Z.

Recalling that

  xBB = (t ↦ e^{−i2πfc t} xPB(t)) ⋆ LPF_{Wc},

where W/2 ≤ Wc ≤ 2fc − W/2,

  xBB(ℓ/W) = ( (t ↦ e^{−i2πfc t} xPB(t)) ⋆ LPF_{Wc} )(ℓ/W)
           = ( (t ↦ xPB(t) cos(2πfc t)) ⋆ LPF_{Wc} )(ℓ/W)
             − i ( (t ↦ xPB(t) sin(2πfc t)) ⋆ LPF_{Wc} )(ℓ/W),  ℓ ∈ Z.
Complex Sampling

[Block diagram: as before, xPB(t) is mixed with cos(2πfc t) and −sin(2πfc t), lowpass filtered with cutoff W/2 ≤ Wc ≤ 2fc − W/2, and each branch is sampled at times ℓ/W to give Re(xBB(ℓ/W)) and Im(xBB(ℓ/W)).]
Reconstruction
By the (Pointwise) Sampling Theorem,

  xBB(t) = Σ_{ℓ=−∞}^{∞} xBB(ℓ/W) sinc(Wt − ℓ),  t ∈ R.

Consequently,

  xPB(t) = 2 Re( e^{i2πfc t} Σ_{ℓ=−∞}^{∞} xBB(ℓ/W) sinc(Wt − ℓ) ),  t ∈ R.

Since sinc(·) is real,

  xPB(t) = 2 Σ_{ℓ=−∞}^{∞} Re(xBB(ℓ/W)) sinc(Wt − ℓ) cos(2πfc t)
           − 2 Σ_{ℓ=−∞}^{∞} Im(xBB(ℓ/W)) sinc(Wt − ℓ) sin(2πfc t),  t ∈ R.
Convergence in L2 (1)
  lim_{L→∞} ‖ t ↦ xBB(t) − Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ) ‖₂² = 0.

We’ll show shortly that

  t ↦ xBB(t) − Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ)

is the baseband representation of the real passband signal

  t ↦ xPB(t) − 2 Re( e^{i2πfc t} Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ) ).

So the energy in the latter is twice the energy of the former and thus also converges to zero.
Convergence in L2 (2)

Thus,

  lim_{L→∞} ‖ t ↦ xPB(t) − 2 Re( e^{i2πfc t} Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ) ) ‖₂ = 0.
Convergence in L2 (3)

To deliver what we owe, recall that:

  “if xPB(t) = 2 Re(z(t) e^{i2πfc t}) and z is bandlimited to W/2 Hz, then z equals xBB.” (Lecture 2)

Hence,

  t ↦ xBB(t) − Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ)

is the baseband representation of

  t ↦ xPB(t) − 2 Re( e^{i2πfc t} Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ) ).
The Sampling Theorem for Real Passband Signals
Let xPB be a real energy-limited passband signal that is bandlimited to W Hz around the carrier frequency fc. Let xBB(ℓ/W) denote the time-ℓ/W sample of xBB.
1. xPB can be reconstructed from the samples in the L2 sense:

  lim_{L→∞} ∫_{−∞}^{∞} ( xPB(t) − 2 Re( e^{i2πfc t} Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ) ) )² dt = 0.

2. ‖xPB‖₂² can be computed from the complex samples:

  ‖xPB‖₂² = (2/W) Σ_{ℓ=−∞}^{∞} |xBB(ℓ/W)|².

3. If yPB is another real energy-limited passband signal that is bandlimited to W Hz around fc, then

  ⟨xPB, yPB⟩ = (2/W) Re( Σ_{ℓ=−∞}^{∞} xBB(ℓ/W) yBB*(ℓ/W) ).
Mapping Bits to Waveforms (Modulation)
• Data bits have no physical attributes.
• Upper-case letters because they are random.
• The modulator maps them to waveforms.
• The modulator’s output is a stochastic process.

Mapping a single bit:

  X(t) = { x₀(t) if D = 0,
           x₁(t) if D = 1,     t ∈ R.

For example,

  x₀(t) = { A e^{−t/T} if t/T ≥ 0,
            0          otherwise,     t ∈ R,

and

  x₁(t) = { A if 0 ≤ t/T ≤ 1,
            0 otherwise,     t ∈ R.
Stochastic Process

A probability space is a triple (Ω, F, P ), where


• the elements of Ω are the “outcomes,”
• the elements of F are the “events,”
• and P (·) assigns a probability to every event.

A SP is a function of time and “luck/draw”, i.e., of time and the


results of all the random experiments in the system.

X: Ω × R → R
(ω, t) 7→ X(ω, t).

Sample Paths

Fixing a draw ω ∈ Ω, the SP becomes a function of time:


sample-path, or trajectory, or sample-path realization, or sample
function

X(ω, ·) : R → R
t 7→ X(ω, t).

A Stochastic Process and its Sample-Paths

  X(t) = { x₀(t) if D = 0,
           x₁(t) if D = 1,     t ∈ R,

where

  x₀(t) = { A e^{−t/T} if t/T ≥ 0,
            0          otherwise,     t ∈ R,

  x₁(t) = { A if 0 ≤ t/T ≤ 1,
            0 otherwise,     t ∈ R,

has two sample-paths: x₀(·) and x₁(·).
Time-t Sample

Fixing an epoch t ∈ R and viewing the SP as a function of “luck”


only, we obtain a random variable: the value of the process at
time t or the time-t sample of the process

X(·, t) : Ω → R
ω 7→ X(ω, t).

A Formal Definition


A stochastic process X(t), t ∈ T is an indexed family of random
variables that are defined on a common probability space
(Ω, F, P ).

• T = R ⇒ continuous-time stochastic process.


• T = Z ⇒ discrete-time stochastic process.

From Bits to Real Numbers

k—number of data bits transmitted during the system’s lifetime.

The data bits transmitted are

D1 , D2 , . . . , Dk .

The mapping
ϕ : {0, 1}k → Rn
(d1 , . . . , dk ) 7→ (x1 , . . . , xn )

maps k-tuples of data bits to n-tuples of real numbers.

We only consider mappings ϕ(·) that are one-to-one (injective).

Examples (1)
• n = k, i.e., one bit per real symbol, and

  X_j = { +1 if D_j = 0,
          −1 if D_j = 1,     j = 1, . . . , k.

• k is even; the data bits {D_j} are broken into pairs

  (D₁, D₂), (D₃, D₄), . . . , (D_{k−1}, D_k);

and each pair is mapped to a single real number:

  (D_{2j−1}, D_{2j}) ↦ { +3 if D_{2j−1} = D_{2j} = 0,
                         +1 if D_{2j−1} = 0 and D_{2j} = 1,
                         −1 if D_{2j−1} = 1 and D_{2j} = 0,
                         −3 if D_{2j−1} = D_{2j} = 1,     j = 1, . . . , k/2.

Here n = k/2, and the rate is two bits per real symbol.
Examples (2)

• Each data bit D_j produces two real numbers by repetition:

  D_j ↦ { (+1, +1) if D_j = 0,
          (−1, −1) if D_j = 1,     j = 1, . . . , k.

Here n = 2k, and the rate is half a bit per real symbol.
Block-Mode Mapping of Bits to Real Numbers

D₁, . . . , D_K │ D_{K+1}, . . . , D_{2K} │ · · · │ D_{k−K+1}, . . . , D_k
   ↓ enc(·)          ↓ enc(·)                      ↓ enc(·)
X₁, . . . , X_N │ X_{N+1}, . . . , X_{2N} │ · · · │ X_{n−N+1}, . . . , X_n

enc : {0, 1}^K → R^N is a (K, N) binary-to-reals block encoder of rate

  K/N  bits per real symbol.

Always assumed one-to-one.
Zero Padding

Apply the (K, N) encoder in block-mode to

  D₁, . . . , D_k, 0, . . . , 0    (k′ − k zeros appended),

where

  k′ = ⌈k/K⌉ K.
From Real Numbers to Waveforms with Linear Modulation

We map D₁, . . . , D_k to the real numbers X₁, . . . , X_n and produce the waveform

  X(t) = A Σ_{ℓ=1}^{n} X_ℓ g_ℓ(t),  t ∈ R.

X is a scaled-by-A linear combination of the tuple (g₁, . . . , g_n) with the coefficients X₁, . . . , X_n:

  X = A Σ_{ℓ=1}^{n} X_ℓ g_ℓ.
The Energy in the Transmitted Waveform
The transmitted energy is a random variable!

  ‖X‖₂² = ∫_{−∞}^{∞} X²(t) dt
        = ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{n} X_ℓ g_ℓ(t) )² dt
        = ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{n} X_ℓ g_ℓ(t) ) ( A Σ_{ℓ′=1}^{n} X_ℓ′ g_ℓ′(t) ) dt
        = A² Σ_{ℓ=1}^{n} Σ_{ℓ′=1}^{n} X_ℓ X_ℓ′ ⟨g_ℓ, g_ℓ′⟩.

  ‖X‖₂² = A² Σ_{ℓ=1}^{n} X_ℓ²,  when {g_ℓ} are orthonormal.
Gram-Schmidt to the Rescue

There is no loss in generality in assuming that (g₁, . . . , g_n) is orthonormal.
Recovering the Symbols with a Matched Filter

  ϕ : (D₁, . . . , D_k) ↦ (X₁, . . . , X_n),

  X(t) = A Σ_{ℓ=1}^{n} X_ℓ φ_ℓ(t),  t ∈ R,  (φ₁, . . . , φ_n) orthonormal.

How can we recover the symbols?

  X_ℓ = (1/A) ⟨X, φ_ℓ⟩,  ℓ = 1, . . . , n.

This requires n inner-product computations.
Computing Inner Products with a Matched Filter


  ⟨u, φ⟩ = (u ⋆ φ⃖*)(0),  u, φ ∈ L2.

More generally, if g : t ↦ φ(t − t₀), then

  ⟨u, g⟩ = ∫_{−∞}^{∞} u(t) φ*(t − t₀) dt = (u ⋆ φ⃖*)(t₀).

We express the time-t₀ output of the matched filter as:

  (u ⋆ φ⃖*)(t₀) = ∫_{−∞}^{∞} u(τ) φ⃖*(t₀ − τ) dτ
               = ∫_{−∞}^{∞} u(τ) φ*(τ − t₀) dτ.
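A discrete sketch (assuming numpy; the unit pulse and shift times are illustrative): the inner products ⟨u, t ↦ φ(t − t_j)⟩ are samples of the matched-filter output, computed here as a correlation.

    import numpy as np

    dt = 0.01
    t = np.arange(0, 10, dt)
    phi = np.where(t < 1, 1.0, 0.0)               # a unit pulse on [0, 1)
    u = np.roll(phi, 300) + 0.5*np.roll(phi, 500) # shifts of phi (t_j = 3, 5 s)

    # np.correlate conjugates its second argument, so this is (u * mirrored phi*)
    corr = np.correlate(u, phi, mode='full') * dt
    # sampling the output at the shift lags recovers the inner products
    print(corr[len(phi)-1 + 300], corr[len(phi)-1 + 500])   # ~1.0 and ~0.5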

Computing Many Inner Products Is sometimes Easy

If g₁, . . . , g_J ∈ L2 are all time shifts of the same signal φ,

  g_j : t ↦ φ(t − t_j),  j = 1, . . . , J,

and if u ∈ L2 is arbitrary, then all J inner products ⟨u, g_j⟩, j = 1, . . . , J, can be computed using one filter by feeding u to a matched filter for φ and sampling the output at the appropriate times t₁, . . . , t_J:

  ⟨u, g_j⟩ = (u ⋆ φ⃖*)(t_j),  j = 1, . . . , J.
Back to Linear Modulation
  X(t) = A Σ_{ℓ=1}^{n} X_ℓ φ_ℓ(t),  t ∈ R,  (φ₁, . . . , φ_n) orthonormal.

Suppose now that

  φ_ℓ(t) = φ(t − ℓTs),  (ℓ ∈ {1, . . . , n}, t ∈ R).

Then we can compute all the inner products using one circuit!

  X_ℓ = (1/A) ∫_{−∞}^{∞} X(τ) φ_ℓ(τ) dτ
      = (1/A) ∫_{−∞}^{∞} X(τ) φ(τ − ℓTs) dτ
      = (1/A) ∫_{−∞}^{∞} X(τ) φ⃖(ℓTs − τ) dτ
      = (1/A) (X ⋆ φ⃖)(ℓTs),  ℓ = 1, . . . , n.
Recovering the Symbols Using One MF

[Block diagram: X(·) is fed to the matched filter φ⃖ and sampled at times ℓTs, yielding A X_ℓ.]

This motivates

  X(t) = A Σ_{ℓ=1}^{n} X_ℓ φ(t − ℓTs),  t ∈ R.
Pulse Amplitude Modulation
The bits D₁, . . . , D_k are mapped to the symbols X₁, . . . , X_n, and

  X(t) = A Σ_{ℓ=1}^{n} X_ℓ g(t − ℓTs),  t ∈ R.

• A ≥ 0 is a scaling factor.
• g : R → R is the pulse shape.
• Ts > 0 is the baud period.
• 1/Ts is the baud rate, in real symbols per second.

If the time shifts of g are orthonormal (we then write φ), the rate is 1/Ts real dimensions per second.

[Portrait: J.M.E. Baudot (1845–1903).]
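A small numpy sketch of PAM (the bits, amplitudes, and the shift-orthonormal sinc pulse are illustrative choices): map bits to ±1 symbols and superpose time-shifted pulses.

    import numpy as np

    A, Ts, fs = 1.0, 1.0, 100.0                 # amplitude, baud period, sim rate
    bits = np.array([0, 1, 1, 0, 1])
    X_l = 1 - 2*bits                            # antipodal: 0 -> +1, 1 -> -1

    t = np.arange(0, (len(X_l)+4)*Ts, 1/fs)
    g = lambda tau: np.sinc(tau/Ts)/np.sqrt(Ts) # shift-orthonormal pulse shape
    # X(t) = A sum_l X_l g(t - l Ts)
    X = A * sum(x * g(t - (l+1)*Ts) for l, x in enumerate(X_l))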
The Constellation

The constellation of ϕ(·) is denoted X. It contains x iff for some choice of the binary k-tuple (d₁, . . . , d_k) and for some ℓ ∈ {1, . . . , n} the ℓ-th component of ϕ(d₁, . . . , d_k) is equal to x.

For example,

  {−5, −3, −1, +1, +3, +5}  or  {−2, −1, +1, +2}.
Constellation Parameters

The minimum distance of a constellation X is

  δ ≜ min_{x, x′ ∈ X, x ≠ x′} |x − x′|.

The second moment of X is

  (1/#X) Σ_{x ∈ X} x².
Uncoded Transmission

The transmission is uncoded or the system is uncoded if the range of ϕ equals X^n,

  {ϕ(d) : d ∈ {0, 1}^k} = X^n,

i.e., if every sequence in X^n can be produced by the encoder by feeding it the appropriate data sequence.
Examples:
1. Uncoded: antipodal signaling D_j ↦ 1 − 2D_j.
2. Coded: repetition coding 0 ↦ (+1, +1) and 1 ↦ (−1, −1).
Next Week

• PAM and Nyquist’s Criterion (Chapter 11).


• Energy and Power in PAM (Chapter 14).

Reading Assignment:
• PAM Implementation Considerations (Section 10.12).
• Stationary Discrete-Time SP (Chapter 13).

Thank you!

Communication and Detection Theory: Lecture 5

Amos Lapidoth
ETH Zurich

March 21, 2017

Nyquist’s Criterion

Nyquist’s Criterion

• Nyquist’s Criterion.
• The Self-Similarity Function.
• Excess Bandwidth.
• Band-edge symmetry.

PAM with a Shift-Orthonormal Pulse Shape
  X(t) = A Σ_{ℓ=1}^{n} X_ℓ φ(t − ℓTs),  t ∈ R,

where

  ∫_{−∞}^{∞} φ(t − ℓTs) φ(t − ℓ′Ts) dt = I{ℓ′ = ℓ},  ℓ, ℓ′ ∈ {1, . . . , n}.

• Energy:

  ‖X‖₂² = A² Σ_{ℓ=1}^{n} X_ℓ².

• Recovering X_ℓ is easy:

  X_ℓ = A⁻¹ ⟨X, t ↦ φ(t − ℓTs)⟩ = A⁻¹ (X ⋆ φ⃖)(ℓTs),  ℓ = 1, . . . , n.

One matched filter computes all the inner products.
Massaging the Constraint

• Instead of

  ∫_{−∞}^{∞} φ(t − ℓTs) φ(t − ℓ′Ts) dt = I{ℓ′ = ℓ},  ℓ, ℓ′ ∈ {1, . . . , n},

we’ll require

  ∫_{−∞}^{∞} φ(t − ℓTs) φ(t − ℓ′Ts) dt = I{ℓ′ = ℓ},  ℓ, ℓ′ ∈ Z.

• And we’ll solve the complex case:

  ∫_{−∞}^{∞} φ(t − ℓTs) φ*(t − ℓ′Ts) dt = I{ℓ′ = ℓ},  ℓ, ℓ′ ∈ Z.
The Time Shifts Are Orthogonal when they don’t Overlap

The Self-Similarity Function Rvv of v ∈ L2

  Rvv : τ ↦ ∫_{−∞}^{∞} v(t + τ) v*(t) dt,  τ ∈ R.

[Figure: the overlap between v*(t) and the shifted v(t + τ).]
The Self-Similarity Function Rvv of v ∈ L2
  Rvv : τ ↦ ∫_{−∞}^{∞} v(t + τ) v*(t) dt,  τ ∈ R.

1. Value at zero:  Rvv(0) = ∫_{−∞}^{∞} |v(t)|² dt.
2. Maximum at zero:  |Rvv(τ)| ≤ Rvv(0),  τ ∈ R.
3. Conjugate symmetry:  Rvv(−τ) = Rvv*(τ),  τ ∈ R.
4. Integral representation:  Rvv(τ) = ∫_{−∞}^{∞} |v̂(f)|² e^{i2πfτ} df,  τ ∈ R.
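A discrete sketch of the self-similarity function (assuming numpy; the complex Gaussian test signal is illustrative), approximating the integral by np.correlate:

    import numpy as np

    dt = 0.01
    t = np.arange(-5, 5, dt)
    v = np.exp(-t**2) * np.exp(1j*2*np.pi*t)       # some complex L2 signal

    R = np.correlate(v, v, 'full') * dt            # R[k] ~ R_vv(tau_k)
    tau0 = len(v) - 1                              # index of tau = 0
    print(np.isclose(R[tau0], np.sum(np.abs(v)**2)*dt))  # R(0) = ||v||^2: True
    print(np.max(np.abs(R)) <= np.abs(R[tau0]) + 1e-12)  # max at tau = 0: True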
Some Proofs

  Rvv : τ ↦ ∫_{−∞}^{∞} v(t + τ) v*(t) dt,  τ ∈ R.

1. Rvv(0) = ‖v‖₂² follows by substituting τ = 0.
2. |Rvv(τ)| ≤ Rvv(0) follows from Cauchy-Schwarz:

  |Rvv(τ)| = |⟨t ↦ v(t + τ), v⟩| ≤ ‖t ↦ v(t + τ)‖₂ ‖v‖₂ = ‖v‖₂².
3. Conjugate symmetry follows by substituting s ≜ t + τ in

  Rvv(τ) = ∫_{−∞}^{∞} v(t + τ) v*(t) dt
         = ∫_{−∞}^{∞} v(s) v*(s − τ) ds
         = ( ∫_{−∞}^{∞} v(s − τ) v*(s) ds )*
         = Rvv*(−τ),  τ ∈ R.

4. The integral representation follows from Parseval:

  Rvv(τ) = ∫_{−∞}^{∞} v(t + τ) v*(t) dt
         = ⟨t ↦ v(t + τ), t ↦ v(t)⟩
         = ⟨f ↦ e^{i2πfτ} v̂(f), f ↦ v̂(f)⟩
         = ∫_{−∞}^{∞} e^{i2πfτ} |v̂(f)|² df,  τ ∈ R.
More Properties of Rvv
5. Uniform continuity: Rvv is uniformly continuous. (The IFT of an integrable function is uniformly continuous.)
6. Convolution representation:

  Rvv(τ) = (v ⋆ v⃖*)(τ),  τ ∈ R:

  Rvv(τ) = ∫_{−∞}^{∞} v(t + τ) v*(t) dt
         = ∫_{−∞}^{∞} v(s) v*(s − τ) ds
         = ∫_{−∞}^{∞} v(s) v⃖*(τ − s) ds
         = (v ⋆ v⃖*)(τ).
Shift-Orthonormality and the Self-Similarity Func.
Let φ be energy-limited. The shift-orthonormality condition

  ∫_{−∞}^{∞} φ(t − ℓTs) φ*(t − ℓ′Ts) dt = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ Z,

is equivalent to the condition

  Rφφ(ℓTs) = I{ℓ = 0},  ℓ ∈ Z.

Proof: Recall

  Rvv : τ ↦ ∫_{−∞}^{∞} v(t + τ) v*(t) dt,  τ ∈ R.

Substituting s ≜ t − ℓ′Ts,

  ∫_{−∞}^{∞} φ(t − ℓTs) φ*(t − ℓ′Ts) dt = ∫_{−∞}^{∞} φ(s + (ℓ′ − ℓ)Ts) φ*(s) ds = Rφφ((ℓ′ − ℓ)Ts).
Nyquist Pulse
We say that v : R → C is a Nyquist Pulse of parameter Ts if

  v(ℓTs) = I{ℓ = 0},  ℓ ∈ Z.

• Can we characterize Nyquist pulses in the frequency domain?
• Under what conditions on φ̂ is Rφφ a Nyquist Pulse of parameter Ts?

[Portrait: Harry Nyquist (1889–1976).]
Nyquist’s Criterion
Let Ts > 0 be given, and assume v = ǧ for some integrable g : f ↦ g(f). Then v is a Nyquist Pulse of parameter Ts iff

  lim_{J→∞} ∫_{−1/(2Ts)}^{1/(2Ts)} | Ts − Σ_{j=−J}^{J} g(f + j/Ts) | df = 0.

“Equivalently” (ignoring pointwise convergence issues),

  Σ_{j=−∞}^{∞} g(f + j/Ts) = Ts,  −1/(2Ts) ≤ f ≤ 1/(2Ts),

which is equivalent (by periodicity) to

  Σ_{j=−∞}^{∞} g(f + j/Ts) = Ts,  f ∈ R.
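A numerical sketch of the criterion (assuming numpy; the rectangular g, i.e. v(t) = sinc(t/Ts), is an illustrative choice): the aliased spectrum Σ_j g(f + j/Ts) is identically Ts.

    import numpy as np

    Ts = 2.0
    g = lambda f: Ts * (np.abs(f) <= 1/(2*Ts))    # FT of sinc(t/Ts)

    # interior grid (the interval endpoints are a measure-zero set)
    f = np.linspace(-1/(2*Ts), 1/(2*Ts), 1001)[1:-1]
    alias = sum(g(f + j/Ts) for j in range(-5, 6))
    print(np.allclose(alias, Ts))                 # True: v(l Ts) = I{l = 0}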

[Figure: g(f) of height Ts on [−1/(2Ts), 1/(2Ts)]; its translates g(f + 1/Ts) and g(f − 1/Ts); and the sum Σ_j g(f + j/Ts) = Ts, flat over all f.]
Proof of Nyquist’s Criterion (1)

We’ll show that v(−ℓTs) is the ℓ-th Fourier Series Coefficient of

  f ↦ (1/√Ts) Σ_{j=−∞}^{∞} g(f + j/Ts),  −1/(2Ts) ≤ f ≤ 1/(2Ts).

• A function is (indistinguishable from) a constant iff all but its zeroth Fourier Series coefficients are zero.
• The zeroth Fourier Series coefficient of the constant c with respect to the interval [−1/(2Ts), +1/(2Ts)] is c/√Ts.
Proof of Nyquist’s Criterion (2)
  v(−ℓTs) = ∫_{−∞}^{∞} g(f) e^{−i2πfℓTs} df
          = Σ_{j=−∞}^{∞} ∫_{j/Ts − 1/(2Ts)}^{j/Ts + 1/(2Ts)} g(f) e^{−i2πfℓTs} df
          = Σ_{j=−∞}^{∞} ∫_{−1/(2Ts)}^{1/(2Ts)} g(f̃ + j/Ts) e^{−i2π(f̃ + j/Ts)ℓTs} df̃
          = Σ_{j=−∞}^{∞} ∫_{−1/(2Ts)}^{1/(2Ts)} g(f̃ + j/Ts) e^{−i2πf̃ℓTs} df̃
          = ∫_{−1/(2Ts)}^{1/(2Ts)} Σ_{j=−∞}^{∞} g(f̃ + j/Ts) e^{−i2πf̃ℓTs} df̃
          = ∫_{−1/(2Ts)}^{1/(2Ts)} ( (1/√Ts) Σ_{j=−∞}^{∞} g(f̃ + j/Ts) ) √Ts e^{−i2πf̃ℓTs} df̃.
Characterization of Shift-Orthonormal Pulses
Let φ : R → C be energy-limited and Ts > 0. The condition

  ∫_{−∞}^{∞} φ(t − ℓTs) φ*(t − ℓ′Ts) dt = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ Z,

is equivalent to the condition

  Σ_{j=−∞}^{∞} |φ̂(f + j/Ts)|² ≡ Ts.

Proof: The shift-orthonormality is equivalent to

  Rφφ(ℓTs) = I{ℓ = 0},  ℓ ∈ Z,

and

  Rφφ(τ) = ∫_{−∞}^{∞} |φ̂(f)|² e^{i2πfτ} df,  τ ∈ R.
Minimum Bandwidth of Shift-Orthonormal Pulses
Let Ts > 0 be fixed, and let φ be an energy-limited signal that is bandlimited to W Hz. If the time shifts of φ by integer multiples of Ts are orthonormal, then

  W ≥ 1/(2Ts).

Equality is achieved if

  φ̂(f) = √Ts I{|f| ≤ 1/(2Ts)},  f ∈ R,

and, in particular, by the sinc(·) pulse

  φ(t) = (1/√Ts) sinc(t/Ts),  t ∈ R,

or any time-shift thereof.
[Figure: φ̂(f) on [−W, W]; |φ̂(f)|²; the translates |φ̂(f − 1/Ts)|² and |φ̂(f + 1/Ts)|²; and the sum |φ̂(f + 1/Ts)|² + |φ̂(f)|² + |φ̂(f − 1/Ts)|², examined on [−1/(2Ts), 1/(2Ts)].]
Excess Bandwidth

The excess bandwidth in percent of a signal φ relative to Ts > 0 is

  100% × ( (bandwidth of φ) / (1/(2Ts)) − 1 ).
Band-Edge Symmetry
Assume Ts > 0 and φ a real energy-limited signal that is bandlimited to W Hz, where W < 1/Ts, so φ is of excess bandwidth smaller than 100%. Then φ is shift-orthonormal iff

  |φ̂(1/(2Ts) − f)|² + |φ̂(1/(2Ts) + f)|² ≡ Ts,  0 < f ≤ 1/(2Ts).

[Figure: band-edge symmetry of |φ̂(f)|²: the curve |φ̂(f′ + 1/(2Ts))|² − Ts/2 is antisymmetric about the band edge 1/(2Ts).]
[Figure: as before, g(f), its translates g(f ± 1/Ts), and their constant sum Ts.]
Sketch of Proof (1)

[Figure: as above, the band-edge symmetry of |φ̂(f)|² around f = 1/(2Ts).]
Sketch of Proof (2)
• φ real ⇒ |φ̂(f)| symmetric ⇒ it suffices to consider f ≥ 0.
• Excess bandwidth < 100% ⇒ only two terms contribute to the sum (for f > 0):

  |φ̂(f)|² + |φ̂(f − 1/Ts)|² ≡ Ts,  0 ≤ f < 1/(2Ts).

Substituting f′ ≜ 1/(2Ts) − f leads to the condition

  |φ̂(1/(2Ts) − f′)|² + |φ̂(−f′ − 1/(2Ts))|² ≡ Ts,  0 < f′ ≤ 1/(2Ts),

which, in view of the symmetry of |φ̂(·)|, is equivalent to

  |φ̂(1/(2Ts) − f′)|² + |φ̂(f′ + 1/(2Ts))|² ≡ Ts,  0 < f′ ≤ 1/(2Ts).
Raised-Cosine (1)

[Figure: the raised-cosine spectrum |φ̂(f)|², flat at Ts out to (1−β)/(2Ts) and rolling off to zero at (1+β)/(2Ts).]

  |φ̂(f)|² = { Ts                                              if 0 ≤ |f| ≤ (1−β)/(2Ts),
              (Ts/2) ( 1 + cos( (πTs/β)(|f| − (1−β)/(2Ts)) ) ) if (1−β)/(2Ts) < |f| ≤ (1+β)/(2Ts),
              0                                               if |f| > (1+β)/(2Ts).
Raised-Cosine (2)

√
 1−β

 Ts if 0 ≤ |f | ≤ 2Ts ,
q r  
Ts πTs 1−β 1−β 1+β
φ̂(f ) = 1 + cos β (|f | − 2Ts ) if < |f | ≤ 2Ts ,


2 2Ts

0 1+β
if |f | > 2Ts ,

 sin ((1−β)π Tt )
cos (1 + β)π Tts + 4β Tt
s

φ(t) = √ s
, t ∈ R.
π Ts 1 − (4β Tts )2
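A numerical sketch (assuming numpy; grid sizes and the roll-off β = 0.5 are illustrative): build φ from the spectrum φ̂ above by an inverse FFT and check shift orthonormality, Rφφ(ℓTs) = I{ℓ = 0}.

    import numpy as np

    Ts, beta, fs = 1.0, 0.5, 64.0                 # baud period, roll-off, grid
    f = np.fft.fftfreq(4096, 1/fs)
    af = np.abs(f)
    f1, f2 = (1-beta)/(2*Ts), (1+beta)/(2*Ts)
    mag2 = np.where(af <= f1, Ts,
           np.where(af <= f2, Ts/2*(1+np.cos(np.pi*Ts/beta*(af-f1))), 0.0))
    phi = np.fft.ifft(np.sqrt(mag2)).real * fs    # phi = IFT of phi_hat
    phi = np.roll(phi, 2048)                      # center the pulse in time

    R = np.correlate(phi, phi, 'full') / fs       # self-similarity function
    mid = len(R)//2                               # lag tau = 0
    print([round(R[mid + int(l*Ts*fs)], 3) for l in range(-3, 4)])
    # -> [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]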

[Figure: the pulse φ(t) and its self-similarity function Rφφ(τ), which vanishes at the nonzero multiples of Ts.]
A refresher on discrete-time stochastic processes.
Read Chapter 13!

Stationary Processes

A discrete-time SP (X_ν) is stationary (or strict-sense stationary, or strongly stationary) if for every n ∈ N and all integers η, η′ the tuples

  (X_η, . . . , X_{η+n−1})  and  (X_{η′}, . . . , X_{η′+n−1})

have the same law. In particular,

  ((X_ν, ν ∈ Z) stationary) ⟹ (X_ν has the same law as X₁, ν ∈ Z),

and

  ((X_ν, ν ∈ Z) stationary) ⟹ ((X_ν, X_ν′) has the same law as (X_{η+ν}, X_{η+ν′}), ν, ν′, η ∈ Z).
Wide-Sense Stationary Processes

We say that a discrete-time SP Xν , ν ∈ Z is wide-sense
stationary (WSS) if the following three conditions are satisfied:

Var[Xν] < ∞, ν ∈ Z. (12a)

E[Xν] = E[X1], ν ∈ Z. (12b)

E[Xν′ Xν] = E[Xη+ν′ Xη+ν], ν, ν′, η ∈ Z. (12c)

Choosing ν = ν′ we obtain

( (Xν, ν ∈ Z) WSS )  ⇒  ( Var[Xν] = Var[X1], ν ∈ Z ).

c
Lecture 5, Amos Lapidoth 2017
Stationarity and Wide-Sense Stationarity
Every finite-variance discrete-time stationary SP is WSS.

( (Xν, ν ∈ Z) stationary )  ⇒  ( Xν =ᴸ X1, ν ∈ Z ),

so

( (Xν, ν ∈ Z) stationary )  ⇒  ( E[Xν] = E[X1], ν ∈ Z ).

( (Xν, ν ∈ Z) stationary )
  ⇒  ( (Xν, Xν′) =ᴸ (Xη+ν, Xη+ν′), ν, ν′, η ∈ Z ),

so

( (Xν, ν ∈ Z) stationary )
  ⇒  ( E[Xν Xν′] = E[Xη+ν Xη+ν′], ν, ν′, η ∈ Z ),
c
Lecture 5, Amos Lapidoth 2017
The Autocovariance Function

The autocovariance function KXX : Z → R of a WSS discrete-time
SP (Xν) is defined by

KXX(η) ≜ Cov[Xν+η, Xν], η ∈ Z.

Key Properties:

KXX(η) = KXX(−η), η ∈ Z.

Σ_{ν=1}^{n} Σ_{ν′=1}^{n} αν αν′ KXX(ν − ν′) ≥ 0,  α1, …, αn ∈ R.

c
Lecture 5, Amos Lapidoth 2017
Proof of Key Properties

KXX(η) = Cov[Xν+η, Xν]
       = Cov[Xν̃, Xν̃−η]
       = Cov[Xν̃−η, Xν̃]
       = KXX(−η), η ∈ Z,

Σ_{ν=1}^{n} Σ_{ν′=1}^{n} αν αν′ KXX(ν − ν′) = Σ_{ν=1}^{n} Σ_{ν′=1}^{n} αν αν′ Cov[Xν, Xν′]
  = Cov[ Σ_{ν=1}^{n} αν Xν, Σ_{ν′=1}^{n} αν′ Xν′ ]
  = Var[ Σ_{ν=1}^{n} αν Xν ]
  ≥ 0.
c
Lecture 5, Amos Lapidoth 2017
The Power Spectral Density Function


We say that the discrete-time WSS SP (Xν) is of
power spectral density SXX if SXX : [−1/2, 1/2) → R is
nonnegative, symmetric, integrable, and

KXX(η) = ∫_{−1/2}^{1/2} SXX(θ) e^{i2πηθ} dθ,  η ∈ Z. (13)

• Two PSDs of the same SP must be indistinguishable.

• If we only impose (13), then SXX(θ) can be negative only on a set of Lebesgue
measure zero, and it is indistinguishable from a symmetric
function.

c
Lecture 5, Amos Lapidoth 2017
PSD when KXX Is Absolutely Summable
If

Σ_{η=−∞}^{∞} |KXX(η)| < ∞,

then the function

S(θ) = Σ_{η=−∞}^{∞} KXX(η) e^{−i2πηθ},  θ ∈ [−1/2, 1/2]

is continuous, symmetric, nonnegative, and satisfies

∫_{−1/2}^{1/2} S(θ) e^{i2πηθ} dθ = KXX(η),  η ∈ Z.

Consequently, S(·) is a PSD for KXX .
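A quick numeric illustration (a sketch; the geometric autocovariance KXX(η) = a^{|η|} is an assumed example, not from the slides): truncate the sum defining S(θ), then check that integrating S(θ) e^{i2πηθ} recovers KXX(η).

```python
import numpy as np

a = 0.6
etas = np.arange(-200, 201)      # truncation of the (absolutely summable) sum
K = a ** np.abs(etas)

theta = np.linspace(-0.5, 0.5, 4001)
S = (K[:, None] * np.exp(-2j * np.pi * etas[:, None] * theta[None, :])).sum(axis=0).real

for eta in [0, 1, 3]:
    K_back = np.trapz(S * np.cos(2 * np.pi * eta * theta), theta)  # S is symmetric
    print(eta, round(K_back, 4), round(a ** eta, 4))               # the two values agree
```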

c
Lecture 5, Amos Lapidoth 2017
Next Week

Energy and Power in PAM (Chapter 14).

Thank you!

c
Lecture 5, Amos Lapidoth 2017
Communication and Detection Theory: Lecture 6

Amos Lapidoth
ETH Zurich

March 28, 2017

Energy and Power in PAM

c
Lecture 6, Amos Lapidoth 2017
Today

• Energy in PAM.
• Defining power in PAM.
• Zero-mean signals for additive noise channels.
• The power when:
• The symbols form a centered WSS discrete-time SP.
• Bi-infinite block-mode.
• The pulse shape is shift orthonormal.

c
Lecture 6, Amos Lapidoth 2017
Pulse Amplitude Modulation

Mapping the bits to symbols,

ϕ : {0, 1}k → Rn
(d1 , . . . , dk ) 7→ (x1 , . . . , xn ),

and mapping the symbols to waveform


X(t) = A Σ_{ℓ=1}^{n} Xℓ g(t − ℓTs),  t ∈ R.

• k—number of data bits sent over the system’s lifetime.


• ϕ(·) is one-to-one (injective).

c
Lecture 6, Amos Lapidoth 2017
Block-Mode Mapping of Bits to Real Numbers

D1 , D2 , . . . , DK , DK+1 , . . . , D2K , , Dk−K+1 , . . . , Dk


enc(·) enc(·) enc(·)

X1 , X2 , ... , XN , XN+1 , ... , X2N , , Xn−N+1 , ... , Xn

enc(D1 , . . . , DK ) enc(DK+1 , . . . , D2K ) enc(Dk−K+1 , . . . , Dk )

enc : {0, 1}^K → R^N is a (K, N) binary-to-reals block encoder of rate

K/N  [bit / real symbol].

Always assumed one-to-one.


c
Lecture 6, Amos Lapidoth 2017
Zero Padding

D1 , D2 , . . . , DK , DK+1 , . . . , D2K , , Dk0 −K+1 , . . . , Dk , 0, . . . , 0


enc(·) enc(·) enc(·)

X1 , X2 , ... , XN ,XN+1 , ... , X2N , , Xn0 −N+1 , ... , Xn0

enc(D1 , . . . , DK ) enc(DK+1 , . . . , D2K ) enc(Dk−K+1 , . . . , Dk , 0, . . . , 0)

Apply the (K, N) encoder in block-mode to

D1, …, Dk, 0, …, 0   (k′ − k zeros appended),

where

k′ = ⌈k/K⌉ K.
c
Lecture 6, Amos Lapidoth 2017
Energy in Transmitting a Single Block (1)

K IID random data bits D1, …, DK are mapped by
enc : {0, 1}^K → R^N to N real numbers (X1, …, XN), and

X(t) = A Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs),  t ∈ R.

The energy,

ω ↦ ∫_{−∞}^{∞} X²(ω, t) dt,

is a RV whose expectation—the expected energy—is

E ≜ E[ ∫_{−∞}^{∞} X²(t) dt ].

c
Lecture 6, Amos Lapidoth 2017
E = E[ ∫_{−∞}^{∞} X²(t) dt ]

  = A² E[ ∫_{−∞}^{∞} ( Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ]

  = A² E[ ∫_{−∞}^{∞} ( Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) ) ( Σ_{ℓ′=1}^{N} Xℓ′ g(t − ℓ′Ts) ) dt ]

  = A² E[ ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} Xℓ Xℓ′ g(t − ℓTs) g(t − ℓ′Ts) dt ]

  = A² ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] g(t − ℓTs) g(t − ℓ′Ts) dt

  = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] ∫_{−∞}^{∞} g(t − ℓTs) g(t − ℓ′Ts) dt

  = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts ).
c
Lecture 6, Amos Lapidoth 2017
Energy in Transmitting a Single Block (3)
• Since

Rgg(τ) = ∫_{−∞}^{∞} |ĝ(f)|² e^{i2πfτ} df,  τ ∈ R,

we can also express the energy as

E = A² ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|² df.

• The energy per bit is

Eb ≜ E/K  [energy/bit].

• The energy per real symbol is

Es ≜ E/N  [energy/real symbol].
c
Lecture 6, Amos Lapidoth 2017
Energy in Transmitting a Single Block (4)
The expression

E = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts )

sometimes simplifies:

E = A² ‖g‖₂² Σ_{ℓ=1}^{N} E[Xℓ²],   ( the time shifts t ↦ g(t − ℓTs) orthogonal ).

E = A² ‖g‖₂² Σ_{ℓ=1}^{N} E[Xℓ²],   ( (Xℓ) zero-mean & uncorrelated ).
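A Monte-Carlo sketch of the simplified formula (all values assumed for illustration: N = 4 IID N(0,1) symbols, A = 2, and the unit-energy rectangular pulse, whose shifts are orthogonal), comparing A²‖g‖₂² Σ E[Xℓ²] with a direct numeric estimate of E[∫ X²(t) dt]:

```python
import numpy as np

rng = np.random.default_rng(0)
A, Ts, N = 2.0, 1.0, 4
t = np.arange(-2.0, 8.0, 1e-3)
g = lambda t: ((t >= 0) & (t < Ts)).astype(float)   # unit-energy rectangle

E_formula = A**2 * 1.0 * N                          # A^2 ||g||^2 sum_l E[X_l^2] = 16

acc, trials = 0.0, 2000
for _ in range(trials):
    X = rng.standard_normal(N)
    x = A * sum(X[l] * g(t - (l + 1) * Ts) for l in range(N))
    acc += np.trapz(x**2, t)
print(E_formula, round(acc / trials, 2))            # both close to 16
```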

c
Lecture 6, Amos Lapidoth 2017
Defining Power


The power P in the SP (X(t), t ∈ R) is

P ≜ lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ].

c
Lecture 6, Amos Lapidoth 2017
Gordian Knot

Over its lifetime, the system will transmit finite energy!

So with this definition, the power is zero.

c
Lecture 6, Amos Lapidoth 2017
The Alexandrian Solution

We “pretend” to send infinitely many symbols



X(t) = A Σ_{ℓ=−∞}^{∞} Xℓ g(t − ℓTs),  t ∈ R.

But new questions arise:


• Does this converge?
• How are the infinitely-many symbols generated?
c
Lecture 6, Amos Lapidoth 2017
Convergence

We shall assume

• Bounded Symbols:

|Xℓ| ≤ γ, ℓ ∈ Z.

• The pulse shape decays faster than 1/t:

|g(t)| ≤ β / (1 + |t/Ts|^{1+α}),  t ∈ R

for some

α, β > 0.

c
Lecture 6, Amos Lapidoth 2017
Generating Infinitely Many Symbols


• Just assume (Xν) is a WSS SP.
• Bi-Infinite Block Encoding.
• Shift-orthonormal pulse shape.

c
Lecture 6, Amos Lapidoth 2017
Bi-Infinite Block Encoding

D−K+1 , . . . , D0 , D1 , . . . , DK , DK+1 , · · · , D2K


enc(·) enc(·) enc(·)

, X−N+1 , . . . , X0 , X1 , ... , XN ,XN+1 , · · · , X2N ,

enc(D−K+1 , . . . , D0 ) enc(D1 , . . . , DK ) enc(DK+1 , . . . , D2K )

c
Lecture 6, Amos Lapidoth 2017
Zero-Mean Signals for Additive-Noise Channels

(Block diagrams: in the first system, TX1 sends X over an additive-noise channel and RX1 observes Y = X + N. In the second, the mean c is subtracted at the transmitter (channel input X − c) and added back at the receiver, which again observes X + N.)
How should we choose c(·)?


c
Lecture 6, Amos Lapidoth 2017
Subtracting the Mean (1)
 
E[(W − c)²] ≥ Var[W],  c ∈ R,

with equality iff

c = E[W].

E[(W − c)²]
  = E[ ( (W − E[W]) + (E[W] − c) )² ]
  = E[(W − E[W])²] + 2 E[W − E[W]] (E[W] − c) + (E[W] − c)²   (middle term = 0)
  = E[(W − E[W])²] + (E[W] − c)²
  ≥ E[(W − E[W])²]
  = Var[W],

with equality iff c = E[W]. (Huygens-Steiner)


c
Lecture 6, Amos Lapidoth 2017
Subtracting the Mean (2)

To minimize

(1/(2T)) ∫_{−T}^{T} E[ (X(t) − c(t))² ] dt,

we minimize the integrand, i.e., we choose c(t) to minimize

E[ (X(t) − c(t))² ],

and thus choose

c(t) = E[X(t)],  t ∈ R.

The transmitted signal X − c is then centered!

c
Lecture 6, Amos Lapidoth 2017

The Power when X` Is Zero-Mean and WSS
We ignore how the symbols were generated and assume

E[Xℓ] = 0, ℓ ∈ Z,

E[Xℓ Xℓ+m] = KXX(m),  ℓ, m ∈ Z.

The former guarantees a zero-mean transmitted waveform:

E[X(t)] = E[ A Σ_{ℓ=−∞}^{∞} Xℓ g(t − ℓTs) ]
        = A Σ_{ℓ=−∞}^{∞} E[Xℓ] g(t − ℓTs)
        = 0,  t ∈ R.

c
Lecture 6, Amos Lapidoth 2017
E[ ∫_τ^{τ+Ts} X²(t) dt ] = A² E[ ∫_τ^{τ+Ts} ( Σ_{ℓ=−∞}^{∞} Xℓ g(t − ℓTs) )² dt ]

  = A² E[ ∫_τ^{τ+Ts} Σ_ℓ Σ_{ℓ′} Xℓ Xℓ′ g(t − ℓTs) g(t − ℓ′Ts) dt ]

  = A² ∫_τ^{τ+Ts} Σ_ℓ Σ_{ℓ′} E[Xℓ Xℓ′] g(t − ℓTs) g(t − ℓ′Ts) dt

  = A² ∫_τ^{τ+Ts} Σ_ℓ Σ_m E[Xℓ Xℓ+m] g(t − ℓTs) g( t − (ℓ + m)Ts ) dt

  = A² ∫_τ^{τ+Ts} Σ_m KXX(m) Σ_ℓ g(t − ℓTs) g( t − (ℓ + m)Ts ) dt

  = A² Σ_m KXX(m) Σ_ℓ ∫_{τ−ℓTs}^{τ+Ts−ℓTs} g(t′) g(t′ − mTs) dt′

  = A² Σ_m KXX(m) ∫_{−∞}^{∞} g(t′) g(t′ − mTs) dt′

  = A² Σ_m KXX(m) Rgg(mTs).
c
Lecture 6, Amos Lapidoth 2017

The Power when X` Is Zero-Mean and WSS

Since [−T, +T) contains ⌊2T/Ts⌋ disjoint intervals of the form
[τ, τ + Ts), and since it is contained in the union of ⌈2T/Ts⌉ such
intervals,

⌊2T/Ts⌋ E[ ∫_τ^{τ+Ts} X²(t) dt ] ≤ E[ ∫_{−T}^{T} X²(t) dt ] ≤ ⌈2T/Ts⌉ E[ ∫_τ^{τ+Ts} X²(t) dt ].

We now divide by 2T and study the limit.

c
Lecture 6, Amos Lapidoth 2017
The Sandwich Theorem

(Images: a salami sandwich; John Montagu, 4th Earl of Sandwich.)

If the sequence {an} is sandwiched between {bn} and {cn},

bn ≤ an ≤ cn,  n = 1, 2, 3, …,

and if {bn} and {cn} converge to the same limit,
then {an} also converges, and to that same limit.

(a.k.a. Two Carabinieri Theorem.)

c
Lecture 6, Amos Lapidoth 2017
First Application of the Sandwich Theorem

Using

ξ − 1 < ⌊ξ⌋ ≤ ξ,  ξ ∈ R,

we obtain

(1/(2T)) (2T/Ts) − 1/(2T)  <  (1/(2T)) ⌊2T/Ts⌋  ≤  (1/(2T)) (2T/Ts),

where both bounding terms tend to 1/Ts. Consequently,

lim_{T→∞} (1/(2T)) ⌊2T/Ts⌋ = 1/Ts,  Ts > 0.

c
Lecture 6, Amos Lapidoth 2017
Second Application of the Sandwich Theorem

Using

ξ ≤ ⌈ξ⌉ < ξ + 1,  ξ ∈ R,

we obtain

(1/(2T)) (2T/Ts)  ≤  (1/(2T)) ⌈2T/Ts⌉  <  (1/(2T)) (2T/Ts) + 1/(2T).

Consequently,

lim_{T→∞} (1/(2T)) ⌈2T/Ts⌉ = 1/Ts,  Ts > 0.

c
Lecture 6, Amos Lapidoth 2017
Third Application of the Sandwich Theorem
⌊2T/Ts⌋ E[ ∫_τ^{τ+Ts} X²(t) dt ] ≤ E[ ∫_{−T}^{T} X²(t) dt ] ≤ ⌈2T/Ts⌉ E[ ∫_τ^{τ+Ts} X²(t) dt ].

Dividing by 2T,

(1/(2T)) ⌊2T/Ts⌋ E[ ∫_τ^{τ+Ts} X²(t) dt ]    (the prefactor → 1/Ts)
  ≤ (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] ≤
(1/(2T)) ⌈2T/Ts⌉ E[ ∫_τ^{τ+Ts} X²(t) dt ]    (the prefactor → 1/Ts).

Hence, by the Sandwich Theorem,

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (1/Ts) E[ ∫_τ^{τ+Ts} X²(t) dt ].
c
Lecture 6, Amos Lapidoth 2017

The Power when X` Is Zero-Mean and WSS
From

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (1/Ts) E[ ∫_τ^{τ+Ts} X²(t) dt ],

and

E[ ∫_τ^{τ+Ts} X²(t) dt ] = A² Σ_{m=−∞}^{∞} KXX(m) Rgg(mTs),

we conclude

P = (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) Rgg(mTs).

c
Lecture 6, Amos Lapidoth 2017
Special Cases and Different Forms
Since

P = (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) Rgg(mTs),

P = (A²/Ts) ‖g‖₂² σX²,   ( (Xℓ) centered, variance σX², uncorrelated ).

Also,

P = (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) ∫_{−∞}^{∞} |ĝ(f)|² e^{i2πfmTs} df
  = (A²/Ts) ∫_{−∞}^{∞} Σ_{m=−∞}^{∞} KXX(m) e^{i2πfmTs} |ĝ(f)|² df.
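A sanity check (a sketch with assumed values): for IID, zero-mean, unit-variance symbols and a unit-energy rectangular pulse, the formula collapses to P = A²/Ts; the empirical time average of X²(t) agrees.

```python
import numpy as np

rng = np.random.default_rng(1)
A, Ts, L, dt = 1.5, 1.0, 5000, 1e-2
t = np.arange(0.0, L * Ts, dt)
X = rng.standard_normal(L)          # zero-mean, unit-variance, uncorrelated

# With the rectangular pulse the waveform is piecewise constant:
# X(t) = A X_l on [l Ts, (l+1) Ts).
x = A * X[np.floor(t / Ts).astype(int)]
print(A**2 / Ts, round(np.mean(x**2), 3))   # formula vs. empirical time average
```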

c
Lecture 6, Amos Lapidoth 2017
The Power in Bi-Infinite Block-Mode (1)

D−K+1 , . . . , D0 , D1 , . . . , DK , DK+1 , · · · , D2K


enc(·) enc(·) enc(·)

, X−N+1 , . . . , X0 , X1 , ... , XN ,XN+1 , · · · , X2N ,

enc(D−K+1 , . . . , D0 ) enc(D1 , . . . , DK ) enc(DK+1 , . . . , D2K )


Dν ≜ (DνK+1, …, DνK+K),  ν ∈ Z.

Xν ≜ enc(Dν),  ν ∈ Z.

(XνN+1, …, XνN+N) = Xν,  ν ∈ Z.
c
Lecture 6, Amos Lapidoth 2017
The Power in Bi-Infinite Block-Mode (2)

• If the data bits are IID random bits, and
• if enc(D) is zero mean whenever D comprises IID random bits,

then

P = (1/(NTs)) E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ].

Thus,

P = Es/Ts.

Stop and smell the roses.

c
Lecture 6, Amos Lapidoth 2017
E = E[ ∫_{−∞}^{∞} X²(t) dt ]

  = A² E[ ∫_{−∞}^{∞} ( Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ]

  = A² E[ ∫_{−∞}^{∞} ( Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) ) ( Σ_{ℓ′=1}^{N} Xℓ′ g(t − ℓ′Ts) ) dt ]

  = A² ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] g(t − ℓTs) g(t − ℓ′Ts) dt

  = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] ∫_{−∞}^{∞} g(t − ℓTs) g(t − ℓ′Ts) dt

  = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts ),
c
Lecture 6, Amos Lapidoth 2017
The Power in Bi-Infinite Block-Mode (3)

E = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts ),

so

P = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts ),

or

P = (A²/(NTs)) ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|² df.

c
Lecture 6, Amos Lapidoth 2017
The Power in Bi-Infinite Block-Mode (4)
To derive the power we express X(·) as

X(t) = A Σ_{ℓ=−∞}^{∞} Xℓ g(t − ℓTs)
     = A Σ_{ν=−∞}^{∞} Σ_{η=1}^{N} XνN+η g( t − (νN + η)Ts )
     = A Σ_{ν=−∞}^{∞} u( Xν, t − νNTs ),  t ∈ R,

where

u : (x1, …, xN, t) ↦ Σ_{η=1}^{N} xη g(t − ηTs).

c
Lecture 6, Amos Lapidoth 2017
The Power in Bi-Infinite Block-Mode (5)
• Because the law of Dν does not depend on ν, neither does
the law of Xν (= enc(Dν)):

Xν =ᴸ Xν′,  ν, ν′ ∈ Z.

• The assumption that enc(D) is of zero mean whenever
D ∼ U({0, 1}^K) implies

E[ u(Xν, t) ] = 0,  ν ∈ Z, t ∈ R.

• Since the data bits are IID, so are (Dν, ν ∈ Z), and hence
(Xν, ν ∈ Z) are also IID. Since the independence of Xν and
Xν′ implies the independence of u(Xν, t) and u(Xν′, t′),

E[ u(Xν, t) u(Xν′, t′) ] = 0,  t, t′ ∈ R, ν ≠ ν′, ν, ν′ ∈ Z.

c
Lecture 6, Amos Lapidoth 2017
E[ ∫_τ^{τ+NTs} X²(t) dt ] = E[ ∫_τ^{τ+NTs} ( A Σ_{ν=−∞}^{∞} u(Xν, t − νNTs) )² dt ]

  = A² ∫_τ^{τ+NTs} Σ_ν Σ_{ν′} E[ u(Xν, t − νNTs) u(Xν′, t − ν′NTs) ] dt

  = A² ∫_τ^{τ+NTs} Σ_ν E[ u²(Xν, t − νNTs) ] dt

  = A² ∫_τ^{τ+NTs} Σ_ν E[ u²(X0, t − νNTs) ] dt

  = A² Σ_ν ∫_{τ−νNTs}^{τ−(ν−1)NTs} E[ u²(X0, t′) ] dt′

  = A² ∫_{−∞}^{∞} E[ u²(X0, t′) ] dt′

  = E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Xℓ g(t′ − ℓTs) )² dt′ ],  τ ∈ R.

c
Lecture 6, Amos Lapidoth 2017
The Power in Bi-Infinite Block-Mode (7)
There are ⌊2T/(NTs)⌋ disjoint length-NTs half-open intervals
contained in the interval [−T, T); and ⌈2T/(NTs)⌉ such intervals
suffice to cover the interval [−T, T), so

⌊2T/(NTs)⌋ E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ]
  ≤ E[ ∫_{−T}^{T} X²(t) dt ] ≤
⌈2T/(NTs)⌉ E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ].

Dividing by 2T, letting T → ∞, and using the Sandwich Theorem,

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (1/(NTs)) E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ].

c
Lecture 6, Amos Lapidoth 2017
Time Shifts of Pulse Shape Are Orthonormal

X(t) = A Σ_{ℓ=−∞}^{∞} Xℓ φ(t − ℓTs),  t ∈ R,

∫_{−∞}^{∞} φ(t − ℓTs) φ(t − ℓ′Ts) dt = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ Z.

Assume

|φ(t)| ≤ β / (1 + |t/Ts|^{1+α}),  t ∈ R

for some α, β > 0, and that |Xℓ| ≤ γ, ℓ ∈ Z. Then

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (A²/Ts) lim_{L→∞} (1/(2L+1)) Σ_{ℓ=−L}^{L} E[Xℓ²]

whenever the limit on the RHS exists.
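A small simulation sketch of the theorem (assumed values: Ts = A = 1, unit-energy sinc pulse, bounded ±1 symbols), comparing the time-averaged power with the symbol-averaged second moment:

```python
import numpy as np

rng = np.random.default_rng(2)
L = 200
X = rng.choice([-1.0, 1.0], size=2 * L + 1)   # bounded symbols, |X_l| <= 1
t = np.arange(-float(L), float(L), 1e-2)

x = np.zeros_like(t)
for l, Xl in zip(range(-L, L + 1), X):
    x += Xl * np.sinc(t - l)                  # phi(t) = sinc(t), Ts = 1

P_time = np.trapz(x**2, t) / (2 * L)          # (1/2T) int_{-T}^{T} X^2(t) dt
P_symb = np.mean(X**2)                        # here exactly 1
print(round(P_time, 3), P_symb)               # approx. equal (edge effects aside)
```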


c
Lecture 6, Amos Lapidoth 2017
Some Comments

The theorem is cool:


1. Except for boundedness, there are no statistical assumptions
on the symbols.
2. Beautifully connects power in continuous-time with power in
discrete-time.
But
1. Does not hold for general pulse shapes.
2. The proof is a pain.

c
Lecture 6, Amos Lapidoth 2017
Some Intuition (1)
We focus on the case Ts = 1. We need to study

E[ ∫_{−T}^{T} X²(t) dt ] = E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=−∞}^{∞} Xℓ φ(t − ℓTs) )² I{|t| ≤ T} dt ].

Define

φℓ : t ↦ φ(t − ℓ)

and its “windowed version”

φℓ,w : t ↦ φ(t − ℓ) I{|t| ≤ T},

so

∫_{−T}^{T} X²(t) dt = A² ‖ Σ_{ℓ=−∞}^{∞} Xℓ φℓ,w ‖₂².

The windowed time-shifts {φℓ,w} are not orthogonal. . .
c
Lecture 6, Amos Lapidoth 2017
Some Intuition (2)
For fixed (large) ν and all T > ν, define

X0 = Σ_{|ℓ|≤T−ν} Xℓ φℓ,w,
X1 = Σ_{T−ν<|ℓ|≤T+ν} Xℓ φℓ,w,
X2 = Σ_{T+ν<|ℓ|<∞} Xℓ φℓ,w.

We seek

E[ ‖X0 + X1 + X2‖₂² ].

• The terms in X0 are “nearly orthogonal” (for ν large).


• Only 4ν (bounded) terms in X1 —many but independent of T.
• Many terms in X2 , but very small (by the decay condition).
c
Lecture 6, Amos Lapidoth 2017
Some Intuition (3)

(Figure: the time axis marked at −T − ν, −T, −T + ν, T − ν, T, T + ν.)

c
Lecture 6, Amos Lapidoth 2017
Some Intuition (4)
(1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (1/(2T)) E[ ‖X0 + X1 + X2‖₂² ]

  ≈ (1/(2T)) E[ ‖X0‖₂² ]

  = (1/(2T)) A² E[ ‖ Σ_{|ℓ|≤T−ν} Xℓ φℓ,w ‖₂² ]

  ≈ (1/(2T)) A² E[ ‖ Σ_{|ℓ|≤T−ν} Xℓ φℓ ‖₂² ]

  = (1/(2T)) A² Σ_{ℓ=−(T−ν)}^{T−ν} E[Xℓ²]

  = A² · (2(T−ν)+1)/(2T) · (1/(2(T−ν)+1)) Σ_{ℓ=−(T−ν)}^{T−ν} E[Xℓ²],

where (2(T−ν)+1)/(2T) → 1.
c
Lecture 6, Amos Lapidoth 2017
Recap (1)
• Energy in transmitting a single block:

E = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts ),

Eb ≜ E/K  [energy/bit],

Es ≜ E/N  [energy/real symbol].

• Issues related to defining power as

P ≜ lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ].

c
Lecture 6, Amos Lapidoth 2017
Recap (2)

• Zero-mean signals for additive noise channels.


• The Sandwich Theorem.

• The power when (Xℓ) is zero-mean and WSS:

P = (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) Rgg(mTs)
  = (A²/Ts) ∫_{−∞}^{∞} Σ_{m=−∞}^{∞} KXX(m) e^{i2πfmTs} |ĝ(f)|² df.

c
Lecture 6, Amos Lapidoth 2017
Recap (3)

• The power in bi-infinite block-mode:

P = (1/(NTs)) E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ]
  = Es/Ts.

• The power when the pulse shape is shift-orthonormal:

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (A²/Ts) lim_{L→∞} (1/(2L+1)) Σ_{ℓ=−L}^{L} E[Xℓ²].

c
Lecture 6, Amos Lapidoth 2017
Next Week

Operational Power Spectral Density (Chapter 15).

Thank you!

c
Lecture 6, Amos Lapidoth 2017
Communication and Detection Theory: Lecture 7

Amos Lapidoth
ETH Zurich

April 4, 2017

The Operational Power Spectral Density

c
Lecture 7, Amos Lapidoth 2017
Today

• Defining the Operational Power Spectral Density.


• Computing the OPSD for PAM signals.
• The bandwidth of a SP.
• The bandwidth of PAM.

c
Lecture 7, Amos Lapidoth 2017
What Are the Issues?

• Traditionally defined only for WSS SPs.


• PAM signals are typically not WSS.
• We would like a general definition.
• The result should be useful.

c
Lecture 7, Amos Lapidoth 2017
Two Approaches to Definitions

1. How is the quantity computed?


• The Fourier Transform of x is

x̂(f) = ∫_{−∞}^{∞} x(t) e^{−i2πft} dt,  f ∈ R.

• The derivative of y(·) at ξ is

lim_{h→0} ( y(ξ + h) − y(ξ) ) / h.
2. How is the quantity used:
• A map’s coloring number is the minimum number of colors
that suffice to color the countries under the restriction that no
two countries sharing a border have the same color.

c
Lecture 7, Amos Lapidoth 2017
The Preservation-of-Sweat Law

1. If you give an explicit “formula” for the quantity


=⇒ must work to explain why it is useful.
2. If you define a quantity by how it is used
=⇒ must work to show how to compute it.

c
Lecture 7, Amos Lapidoth 2017
An Example: Charge Density
ϱ(x, y, z) is

lim_{∆↓0}  ( Charge in the box {(x′, y′, z′) : |x − x′| ≤ ∆/2, |y − y′| ≤ ∆/2, |z − z′| ≤ ∆/2} )
          / ( Volume of that box ),

i.e.,

lim_{∆↓0}  ( Charge in the box {(x′, y′, z′) : |x − x′| ≤ ∆/2, |y − y′| ≤ ∆/2, |z − z′| ≤ ∆/2} ) / ∆³.

ϱ(·) is the charge density if for every region D ⊂ R³

Charge in D = ∫_{(x,y,z)∈D} ϱ(x, y, z) dx dy dz,  D ⊂ R³.

c
Lecture 7, Amos Lapidoth 2017
Pros and Cons of the Second Approach

• The motivation comes first.


• No need for a general formula for the quantity.

• Does such a quantity exist?


• Is it unique?
• If not, does it matter? Can we be more explicit?

c
Lecture 7, Amos Lapidoth 2017
The Definition of Charge Density Revisited
ϱ(·) is the charge density if for every region D ⊂ R³

Charge in D = ∫_{(x,y,z)∈D} ϱ(x, y, z) dx dy dz,  D ⊂ R³.

• Does such a function exist? Not if there are point charges. . .


• Is it unique? Two such functions can differ on a null set.
• Is such a function necessarily nonnegative? No, but if a
function like this exists, then so does one that is nonnegative.
• So let us add nonnegativity to the definition.

This is the approach we’ll adopt.

c
Lecture 7, Amos Lapidoth 2017
Some Etymology
Operational Power Spectral Density

function                        quantity of interest    per unit of
charge (spatial) density        charge                  space
mass (spatial) density          mass                    space
mass line density               mass                    length
probability density             probability             unit of X
power spectral density          power                   frequency (Hz)

This suggests something like

Power of X in D = ∫_{f∈D} SXX(f) df,  D ⊂ R,

i.e.,

Power of X in D = ∫_{all frequencies} I{f ∈ D} SXX(f) df,  D ⊂ R.

But what does this mean?

c
Lecture 7, Amos Lapidoth 2017
Some Hand-waving

Imagine a filter of frequency response

ĥ(f) = I{f ∈ D},

and think of the power of X(·) in the frequencies D as
the average power of X ⋆ h.

We extend the requirement to more general filters, but “nice”:

Power of X ⋆ h = ∫_{all frequencies} |ĥ(f)|² SXX(f) df,  h “nice.”

c
Lecture 7, Amos Lapidoth 2017
Uniqueness

For real filters:

Power of X ⋆ h = ∫_{−∞}^{∞} |ĥ(f)|² SXX(f) df,  h real and “nice”
              = ∫_{0}^{∞} ( |ĥ(f)|² SXX(f) + |ĥ(−f)|² SXX(−f) ) df
              = ∫_{0}^{∞} |ĥ(f)|² ( SXX(f) + SXX(−f) ) df.

Thus, if SXX satisfies the requirement and

S̃(f) + S̃(−f) = SXX(f) + SXX(−f),  f ∈ R,

then S̃(·) also satisfies the requirement. No uniqueness!

c
Lecture 7, Amos Lapidoth 2017
Insisting on Symmetry
Suppose we have found some S(·) satisfying

Power of X ⋆ h = ∫_{−∞}^{∞} |ĥ(f)|² S(f) df,  h real and “nice.”

Define

S̃(f) = (1/2) ( S(f) + S(−f) ),  f ∈ R.

Then

S̃(f) + S̃(−f) = S(f) + S(−f),  f ∈ R,

so

Power of X ⋆ h = ∫_{−∞}^{∞} |ĥ(f)|² S̃(f) df,  h real and “nice,”

and S̃(·) is symmetric.
c
Lecture 7, Amos Lapidoth 2017
The Definition of the Operational PSD


The continuous-time real SP (X(t)) is of operational power
spectral density SXX if it is a measurable SP; SXX : R → R is
integrable and symmetric; and for every stable real filter of impulse
response h ∈ L1,

Power in X ⋆ h = ∫_{−∞}^{∞} SXX(f) |ĥ(f)|² df.

c
Lecture 7, Amos Lapidoth 2017
Uniqueness


If both SXX and S′XX(·) are operational PSDs for (X(t)), then the
set of frequencies at which they differ is of Lebesgue measure zero.

(Corollary 15.3.6)

c
Lecture 7, Amos Lapidoth 2017
Nonnegativity


If X(t) is of operational PSD SXX , then SXX must be
nonnegative except possibly on a set of frequencies of Lebesgue
measure zero.

(Corollary 15.3.3)

c
Lecture 7, Amos Lapidoth 2017
Filtering PAM Signals

Passing a PAM signal of pulse shape g through a stable filter of


impulse response h is tantamount to changing its pulse shape from
g to g ? h:
( (σ ↦ A Σ_ℓ Xℓ g(σ − ℓTs)) ⋆ h )(t) = A Σ_ℓ Xℓ (g ⋆ h)(t − ℓTs),  t ∈ R.

If you know how to compute the power in PAM, then you also
know how to compute the power in a filtered PAM (see the numeric sketch below).
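A minimal numeric sketch of the identity above (grid, pulse, and filter are all assumed for illustration): filtering the PAM waveform equals PAM with the filtered pulse.

```python
import numpy as np

rng = np.random.default_rng(3)
dt, Ts = 1e-2, 1.0
t = np.arange(0, 30, dt)
g = np.where((t >= 0) & (t < Ts), 1.0, 0.0)   # pulse, sampled on the grid
h = np.exp(-t) * dt                            # impulse response (dt-scaled so
                                               # discrete convolution approximates the integral)
X = rng.standard_normal(8)

def pam(pulse):
    out = np.zeros_like(t)
    for l, Xl in enumerate(X, start=1):
        shift = int(l * Ts / dt)
        out[shift:] += Xl * pulse[: len(t) - shift]
    return out

lhs = np.convolve(pam(g), h)[: len(t)]         # (PAM with g) * h
rhs = pam(np.convolve(g, h)[: len(t)])         # PAM with (g * h)
print(np.allclose(lhs, rhs))                   # True
```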

c
Lecture 7, Amos Lapidoth 2017
Filtering a PAM Signal—Proof
• Convolution is linear:

(αg1 + βg2) ⋆ h = α(g1 ⋆ h) + β(g2 ⋆ h).

• It commutes with the shift:

(time-shifted g) ⋆ h = time-shift of (g ⋆ h).

(X ⋆ h)(t) = ( (σ ↦ A Σ_{ℓ=−∞}^{∞} Xℓ g(σ − ℓTs)) ⋆ h )(t)
           = A Σ_{ℓ=−∞}^{∞} Xℓ ∫_{−∞}^{∞} h(s) g(t − s − ℓTs) ds
           = A Σ_{ℓ=−∞}^{∞} Xℓ (g ⋆ h)(t − ℓTs),  t ∈ R.

c
Lecture 7, Amos Lapidoth 2017

X` Is Centered and WSS


P = (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) Rgg(mTs)
  = (A²/Ts) ∫_{−∞}^{∞} Σ_{m=−∞}^{∞} KXX(m) e^{i2πfmTs} |ĝ(f)|² df.

For the power in X ⋆ h we replace g with g ⋆ h and use

(g ⋆ h)^(f) = ĝ(f) ĥ(f),  f ∈ R:

Power in X ⋆ h = ∫_{−∞}^{∞} [ (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) e^{i2πfmTs} |ĝ(f)|² ] |ĥ(f)|² df,

where the bracketed factor is SXX(f).

But we must verify symmetry!
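To see a concrete SXX(f) (a sketch; the geometric KXX(m) = 0.5^{|m|} and the unit-energy rectangular pulse are assumed examples), evaluate the bracketed expression above on a frequency grid and check nonnegativity and symmetry:

```python
import numpy as np

A, Ts = 1.0, 1.0
m = np.arange(-100, 101)
K = 0.5 ** np.abs(m)                                      # assumed K_XX

f = np.linspace(-3, 3, 1201)
g_hat2 = (Ts * np.sinc(f * Ts)) ** 2                      # |g^(f)|^2 of the rectangle
S = (A**2 / Ts) * (K[:, None]
    * np.exp(2j * np.pi * f[None, :] * m[:, None] * Ts)).sum(axis=0).real * g_hat2

print(S.min() >= -1e-12, np.allclose(S, S[::-1]))         # nonnegative and symmetric
```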


c
Lecture 7, Amos Lapidoth 2017
Verifying Symmetry
• |ĝ(−f)|² = |ĝ(f)|² (g is real).
• KXX(−m) = KXX(m) (autocovariance of a real DT SP).

Σ_{m=−∞}^{∞} KXX(m) e^{i2π(−f)mTs} |ĝ(−f)|²
  = Σ_{m=−∞}^{∞} KXX(m) e^{i2π(−f)mTs} |ĝ(f)|²
  = Σ_{m′=−∞}^{∞} KXX(−m′) e^{i2π(−f)(−m′)Ts} |ĝ(f)|²
  = Σ_{m′=−∞}^{∞} KXX(m′) e^{i2πfm′Ts} |ĝ(f)|²
  = Σ_{m=−∞}^{∞} KXX(m) e^{i2πfmTs} |ĝ(f)|².

c
Lecture 7, Amos Lapidoth 2017
Bi-Infinite Block Mode

P = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts )
  = (A²/(NTs)) ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|² df.

For the power in X ⋆ h we replace g with g ⋆ h and use

(g ⋆ h)^(f) = ĝ(f) ĥ(f),  f ∈ R:

Power in X ⋆ h = ∫_{−∞}^{∞} [ (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|² ] |ĥ(f)|² df,

where the bracketed factor is SXX(f).

But we must still verify symmetry. . .


c
Lecture 7, Amos Lapidoth 2017
Verifying Symmetry
SXX(f) = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|²,

whose summands we denote aℓ,ℓ′ ≜ E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts}. Use

Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} aℓ,ℓ′ = Σ_{ℓ=1}^{N} aℓ,ℓ + Σ_{ℓ=2}^{N} Σ_{ℓ′=1}^{ℓ−1} ( aℓ,ℓ′ + aℓ′,ℓ )

and E[Xℓ Xℓ′] = E[Xℓ′ Xℓ]:

aℓ,ℓ′ + aℓ′,ℓ = E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} + E[Xℓ′ Xℓ] e^{i2πf(ℓ′−ℓ)Ts}
             = 2 E[Xℓ Xℓ′] cos( 2πf(ℓ − ℓ′)Ts ).

So SXX(f) equals

(A²/(NTs)) ( Σ_{ℓ=1}^{N} E[Xℓ²] + 2 Σ_{ℓ=2}^{N} Σ_{ℓ′=1}^{ℓ−1} E[Xℓ Xℓ′] cos( 2πf(ℓ−ℓ′)Ts ) ) |ĝ(f)|²,

which is symmetric in f.
c
Lecture 7, Amos Lapidoth 2017
Haven’t We Forgotten Something?

What about when the time shifts of the pulse shape by integer
multiples of Ts are orthonormal?
lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (A²/Ts) lim_{L→∞} (1/(2L+1)) Σ_{ℓ=−L}^{L} E[Xℓ²].

This property isn’t preserved under filtering: φ ⋆ h need not have it.

c
Lecture 7, Amos Lapidoth 2017
The Bandwidth of a SP


We say that a SP (X(t)) of operational PSD SXX is
bandlimited to W Hz if, except on a set of frequencies of Lebesgue
measure zero, SXX(f) is zero whenever |f| > W.

The smallest W to which (X(t)) is bandlimited is the bandwidth of
(X(t)).

c
Lecture 7, Amos Lapidoth 2017
The Bandwidth of PAM

Assume bi-infinite block-mode and


A > 0,  Σ_{ℓ=1}^{N} E[Xℓ²] > 0,

so X is not deterministically zero.

The bandwidth of X is the bandwidth of the pulse shape g.

c
Lecture 7, Amos Lapidoth 2017
Proof (1)

It cannot be larger because

SXX(f) = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|²,

so

( ĝ(f) = 0 )  ⇒  ( SXX(f) = 0 ).

c
Lecture 7, Amos Lapidoth 2017
Proof (2)
SXX(f) = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|².

There could be frequencies where SXX(f) is zero but ĝ(f) is not,
i.e., the zeros of

σ(f) ≜ (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts}      (set m = ℓ − ℓ′)

     = Σ_{m=−N+1}^{N−1} γm e^{i2πfmTs}

     = Σ_{m=−N+1}^{N−1} γm z^m |_{z=e^{i2πfTs}},

γm = (A²/(NTs)) Σ_{ℓ=max{1,m+1}}^{min{N,N+m}} E[Xℓ Xℓ−m],  m ∈ {−N+1, …, N−1}.
c
Lecture 7, Amos Lapidoth 2017
Proof (3)
SXX(f) is zero while ĝ(f) is not only if e^{i2πfTs} is a root of

z ↦ Σ_{m=−N+1}^{N−1} γm z^m.

Since e^{i2πfTs} is nonzero, we can multiply by z^{N−1}. Thus, σ(f) is
zero iff e^{i2πfTs} is a root of

z ↦ Σ_{ν=0}^{2N−2} γ_{ν−N+1} z^ν.

Our assumptions guarantee that γ0 > 0, so the polynomial is
nonzero. Hence, it has at most 2N − 2 distinct roots. Denote
those of unit magnitude

e^{iθ1}, …, e^{iθd},  d ≤ 2N − 2 and θ1, …, θd ∈ [−π, π).


c
Lecture 7, Amos Lapidoth 2017
Proof (4)

SXX(f) is zero while ĝ(f) is not only if e^{i2πfTs} ∈ {e^{iθ1}, …, e^{iθd}}.

( e^{i2πfTs} = e^{iθ} )  ⇐⇒  ( f = θ/(2πTs) + η/Ts, η ∈ Z ).

Thus, SXX(f) is zero while ĝ(f) is not only if f is in the set

{ θ1/(2πTs) + η/Ts : η ∈ Z } ∪ ··· ∪ { θd/(2πTs) + η/Ts : η ∈ Z }.

This set is countable, so the bandwidth of X cannot be smaller
than the bandwidth of g.

(If g is bandlimited, then there can be at most a finite number of
frequencies at which SXX is zero and ĝ is not.)

c
Lecture 7, Amos Lapidoth 2017
Recap (1)


• (X(t)) is of operational PSD SXX if SXX : R → R is
symmetric and for every stable real filter

Power in X ⋆ h = ∫_{−∞}^{∞} SXX(f) |ĥ(f)|² df.

• Two such functions must be equal (outside a null set).
• Any such function must be nonnegative (outside a null set).
• Passing a PAM signal of pulse shape g through a stable filter
of impulse response h is tantamount to changing its pulse
shape from g to g ⋆ h.

c
Lecture 7, Amos Lapidoth 2017
Recap (2)


• If (Xℓ) is WSS and centered:

SXX(f) = (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) e^{i2πfmTs} |ĝ(f)|².

• In bi-infinite block-mode with enc(·):

SXX(f) = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|².

• No analogous result if we only assume shift-orthonormality.

c
Lecture 7, Amos Lapidoth 2017
Recap (3)


• A SP (X(t)) of operational PSD SXX is bandlimited to W Hz
if, except on a set of frequencies of Lebesgue measure zero,
SXX(f) is zero whenever |f| > W.
• The smallest W to which (X(t)) is bandlimited is called the
bandwidth of (X(t)).
• The bandwidth of a nonzero PAM signal equals the
bandwidth of its pulse shape.

c
Lecture 7, Amos Lapidoth 2017
Next Week

Quadrature Amplitude Modulation (Chapter 16).

Thank you!

c
Lecture 7, Amos Lapidoth 2017
Communication and Detection Theory: Lecture 8

Amos Lapidoth
ETH Zurich

April 11, 2017

Quadrature Amplitude Modulation (QAM)

c
Lecture 8, Amos Lapidoth 2017
Today

• Linear passband communication.


• Quadrature Amplitude Modulation (QAM).
• Bandwidth around fc .
• Orthogonality.
• Spectral efficiency.
• Constellations.
• Symbol recovery in the absence of noise.
• A glimpse at complex random variables.

c
Lecture 8, Amos Lapidoth 2017
Passband Communication

The transmitted signals must be bandlimited to W Hz around the


carrier frequency fc .

We assume throughout

fc > W/2.

c
Lecture 8, Amos Lapidoth 2017
The Good-Old Baseband
The pulse shape

t ↦ √(2W) sinc(2Wt)

is of bandwidth W, and its time shifts by integer multiples of
1/(2W) are orthonormal. By using it with PAM we can send
symbols arriving at rate

Rs  [real symbol/second]

as the coefficients in a linear combination of orthonormal signals
whose bandwidth does not exceed

Rs/2  [Hz].

For each 1 Hz at baseband we obtain 2 real dimensions per second:
our spectral efficiency is

2  [real dimension/sec] / [baseband Hz].
c
Lecture 8, Amos Lapidoth 2017
Objective
Transmit real symbols arriving at rate Rs [real symbol/second] as
the coefficients in a linear combination of orthonormal passband
signals occupying a bandwidth of W Hz around the carrier
frequency fc, where W equals Rs/2:

2  [real dimension/sec] / [passband Hz].

Since real symbols at rate Rs [real symbol/second] can be viewed
as complex symbols at rate Rs/2 [complex symbol/second],

1  [complex dimension/sec] / [passband Hz].

And don’t make things too carrier dependent.

c
Lecture 8, Amos Lapidoth 2017
The PAM Solution—Not Great

Find a pulse shape φ that is bandlimited to W Hz around fc and
that satisfies the Nyquist criterion

Σ_{j=−∞}^{∞} |φ̂(f + j/Ts)|² ≡ Ts.

Why is this not so great?

• Can achieve the spectral efficiency only if

4 fc Ts is an odd integer.

• The choice of the pulse shape depends on fc.

c
Lecture 8, Amos Lapidoth 2017
QAM in a Nutshell

The baseband representation of the transmitted signal is PAM with


complex symbols and (possibly) complex pulse shapes.

c
Lecture 8, Amos Lapidoth 2017
The QAM Signal
• Map the bits to complex symbols

ϕ : {0, 1}^k → C^n.

• The rate is

k/n  [bit / complex symbol].

• The baseband representation of the transmitted signal is

XBB(t) = A Σ_{ℓ=1}^{n} Cℓ g(t − ℓTs),  t ∈ R.

• The transmitted signal is

XPB(t) = 2 Re( A Σ_{ℓ=1}^{n} Cℓ g(t − ℓTs) e^{i2πfc t} ),  t ∈ R.

c
Lecture 8, Amos Lapidoth 2017
Alternative Representation
Using

Re(wz) = Re(w) Re(z) − Im(w) Im(z),  Im(z) = −Re(iz),

with w = Cℓ:

XPB(t) = √2 A Σ_{ℓ=1}^{n} Re(Cℓ) · 2 Re( (1/√2) g(t − ℓTs) e^{i2πfc t} )
       + √2 A Σ_{ℓ=1}^{n} Im(Cℓ) · 2 Re( i (1/√2) g(t − ℓTs) e^{i2πfc t} ),  t ∈ R,

where the first passband factor is gI,ℓ(t), of baseband representation
gI,ℓ,BB(t) = (1/√2) g(t − ℓTs), and the second is gQ,ℓ(t), of baseband
representation gQ,ℓ,BB(t) = i (1/√2) g(t − ℓTs).

c
Lecture 8, Amos Lapidoth 2017
If the Pulse Shape is Real:

XPB(t) = 2A Σ_{ℓ=1}^{n} Re(Cℓ) g(t − ℓTs) cos(2πfc t)
       − 2A Σ_{ℓ=1}^{n} Im(Cℓ) g(t − ℓTs) sin(2πfc t),  g real.

When g is real, the QAM signal is the sum of:

• the result of feeding {Re(Cℓ)} to a PAM modulator of pulse
shape g and multiplying the result by cos(2πfc t), and
• the result of feeding {Im(Cℓ)} to a PAM modulator of pulse
shape g and multiplying the result by −sin(2πfc t)
(see the sketch below).
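A runnable sketch of this decomposition (A, Ts, fc, the rectangular pulse, and the 4-QAM symbols are all illustrative assumptions, not from the slides):

```python
import numpy as np

A, Ts, fc = 1.0, 1.0, 10.0        # fc well above the bandwidth of g
dt = 1e-3
t = np.arange(0, 12, dt)
g = lambda t: np.where((t >= 0) & (t < Ts), 1.0, 0.0)   # a simple real pulse

C = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])        # 4-QAM symbols

pam_I = A * sum(C.real[l] * g(t - (l + 1) * Ts) for l in range(len(C)))
pam_Q = A * sum(C.imag[l] * g(t - (l + 1) * Ts) for l in range(len(C)))
x_pb = 2 * pam_I * np.cos(2 * np.pi * fc * t) - 2 * pam_Q * np.sin(2 * np.pi * fc * t)

# Equivalently, 2 Re{ A sum_l C_l g(t - l Ts) e^{i 2 pi fc t} }:
x_alt = 2 * np.real(A * sum(C[l] * g(t - (l + 1) * Ts) for l in range(len(C)))
                    * np.exp(2j * np.pi * fc * t))
print(np.allclose(x_pb, x_alt))   # True
```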

c
Lecture 8, Amos Lapidoth 2017
QAM Modulator with a Real Pulse Shape
(Block diagram: the symbol stream {Cℓ} is split into Re(·) and Im(·); each branch feeds a PAM modulator with pulse shape g; the I branch is multiplied by cos(2πfc t) and the Q branch by −sin(2πfc t) (obtained via a 90° phase shift), and the two products are summed to give xPB(t)/2.)

c
Lecture 8, Amos Lapidoth 2017
Bandwidth Considerations

• The bandwidth of xPB around fc is twice the bandwidth
of xBB.
• The bandwidth of xBB (if nonzero) is the bandwidth of g.

The bandwidth of a QAM signal around the carrier frequency is
twice the bandwidth of its pulse shape.

We multiplied the PAM signal by a carrier.

c
Lecture 8, Amos Lapidoth 2017
Orthogonality Considerations (1)
If the pulse shape φ satisfies

∫_{−∞}^{∞} φ(t − ℓTs) φ*(t − ℓ′Ts) dt = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ Z,

then the QAM signal XPB(·) can be expressed as

XPB = √2 A Σ_{ℓ=1}^{n} Re(Cℓ) ψI,ℓ + √2 A Σ_{ℓ=1}^{n} Im(Cℓ) ψQ,ℓ,

where

…, ψI,−1, ψQ,−1, ψI,0, ψQ,0, ψI,1, ψQ,1, …

are orthonormal functions:

ψI,ℓ : t ↦ 2 Re( (1/√2) φ(t − ℓTs) e^{i2πfc t} ),  ℓ ∈ Z,

ψQ,ℓ : t ↦ 2 Re( i (1/√2) φ(t − ℓTs) e^{i2πfc t} ),  ℓ ∈ Z.
c
Lecture 8, Amos Lapidoth 2017
Orthogonality Considerations (2)
XPB(t) = √2 A Σ_{ℓ=1}^{n} Re(Cℓ) ψI,ℓ(t) + √2 A Σ_{ℓ=1}^{n} Im(Cℓ) ψQ,ℓ(t),  t ∈ R,

with ψI,ℓ,BB(t) = (1/√2) φ(t − ℓTs) and ψQ,ℓ,BB(t) = i (1/√2) φ(t − ℓTs).

Recall (Theorem 7.6.10)

⟨xPB, yPB⟩ = 2 Re( ⟨xBB, yBB⟩ ),

so xPB and yPB are orthogonal iff ⟨xBB, yBB⟩ is purely imaginary.
c
Lecture 8, Amos Lapidoth 2017
Orthogonality Considerations (3)



⟨ψI,ℓ, ψI,ℓ′⟩ = 2 Re( ⟨ψI,ℓ,BB, ψI,ℓ′,BB⟩ )
            = 2 Re( ⟨t ↦ (1/√2) φ(t − ℓTs), t ↦ (1/√2) φ(t − ℓ′Ts)⟩ )
            = Re( I{ℓ = ℓ′} )
            = I{ℓ = ℓ′},

⟨ψQ,ℓ, ψQ,ℓ′⟩ = 2 Re( ⟨ψQ,ℓ,BB, ψQ,ℓ′,BB⟩ )
            = 2 Re( ⟨t ↦ i(1/√2) φ(t − ℓTs), t ↦ i(1/√2) φ(t − ℓ′Ts)⟩ )
            = Re( i i* I{ℓ = ℓ′} )
            = I{ℓ = ℓ′},

⟨ψI,ℓ, ψQ,ℓ′⟩ = 2 Re( ⟨t ↦ (1/√2) φ(t − ℓTs), t ↦ i(1/√2) φ(t − ℓ′Ts)⟩ )
            = Re( i* I{ℓ = ℓ′} ) = 0.
c
Lecture 8, Amos Lapidoth 2017
Spectral Efficiency

• Choose φ of bandwidth W/2, e.g., φ : t ↦ √W sinc(Wt).
• The QAM signal is then of bandwidth W around fc.
• To satisfy Nyquist,

Ts ≥ 1/W.

• We are then sending complex symbols at rate 1/Ts, i.e., W
complex symbols per second.
• This corresponds to 2W real symbols per second.
• By orthogonality, we achieve

2  [real dimension/sec] / [passband Hz].

c
Lecture 8, Amos Lapidoth 2017
Mission Accomplished

QAM with the bandwidth-W/2, unit-energy pulse
shape t ↦ √W sinc(Wt) transmits a sequence of real
symbols arriving at a rate of 2W real symbols per sec-
ond as the coefficients in a linear combination of or-
thogonal signals, with the resulting waveform being
bandlimited to W Hz around the carrier frequency fc.
It thus achieves a spectral efficiency of

2 [real dimension/sec] / [passband Hz]  =  1 [complex dimension/sec] / [passband Hz].

c
Lecture 8, Amos Lapidoth 2017
QAM Constellations

The constellation C is the smallest subset of C s.t.

Ci ∈ C, i = 1, . . . , n.

c
Lecture 8, Amos Lapidoth 2017
(Figure: four example constellations: 4-QAM, 16-QAM, 8-PSK, and 32-QAM.)

c
Lecture 8, Amos Lapidoth 2017
The Constellation’s Parameters

The minimum distance of C is

δ ≜ min_{c,c′∈C, c≠c′} |c − c′|.

The second moment of C is

(1/#C) Σ_{c∈C} |c|².
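For instance (a sketch; the 16-QAM grid {a + ib : a, b ∈ {−3, −1, 1, 3}} is the usual example, assumed here), both parameters can be computed directly:

```python
import numpy as np
from itertools import product

C = np.array([a + 1j * b for a, b in product([-3, -1, 1, 3], repeat=2)])  # 16-QAM

delta = min(abs(c - cp) for c in C for cp in C if c != cp)   # minimum distance
second_moment = np.mean(np.abs(C) ** 2)                      # (1/#C) sum |c|^2
print(delta, second_moment)                                  # 2.0 and 10.0
```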

c
Lecture 8, Amos Lapidoth 2017
Recovering the Symbols

If the time shifts of φ by integer multiples of Ts are orthonormal,

XPB = √2 A Σ_{ℓ=1}^{n} Re(Cℓ) ψI,ℓ + √2 A Σ_{ℓ=1}^{n} Im(Cℓ) ψQ,ℓ,

where …, ψI,−1, ψQ,−1, ψI,0, ψQ,0, ψI,1, ψQ,1, … are orthonormal.

Hence

Re(Cℓ) = (1/(√2 A)) ⟨XPB, ψI,ℓ⟩,  ℓ ∈ {1, …, n},

Im(Cℓ) = (1/(√2 A)) ⟨XPB, ψQ,ℓ⟩,  ℓ ∈ {1, …, n}.

c
Lecture 8, Amos Lapidoth 2017
Computing hr, ψI,` i and hr, ψQ,` i (1)
More generally, we’ll compute

⟨r, gI,ℓ⟩,  ⟨r, gQ,ℓ⟩,

where, as before,

XPB(t) = √2 A Σ_{ℓ=1}^{n} Re(Cℓ) gI,ℓ(t) + √2 A Σ_{ℓ=1}^{n} Im(Cℓ) gQ,ℓ(t),  t ∈ R,

with gI,ℓ,BB(t) = (1/√2) g(t − ℓTs) and gQ,ℓ,BB(t) = i (1/√2) g(t − ℓTs).

Both gI,ℓ and gQ,ℓ are bandlimited to W Hz around fc.


c
Lecture 8, Amos Lapidoth 2017
Computing hr, ψI,` i and hr, ψQ,` i (2)
With XPB expanded as on the previous slide: since gI,ℓ and gQ,ℓ are
bandlimited to W Hz around fc,

⟨r, gI,ℓ⟩ = ⟨s, gI,ℓ⟩,
⟨r, gQ,ℓ⟩ = ⟨s, gQ,ℓ⟩,

where

s = r ⋆ BPF_{W,fc}.
c
Lecture 8, Amos Lapidoth 2017
Computing hr, ψI,` i and hr, ψQ,` i (3)

⟨r, gI,ℓ⟩ = ⟨s, gI,ℓ⟩,
⟨r, gQ,ℓ⟩ = ⟨s, gQ,ℓ⟩,

where

s = r ⋆ BPF_{W,fc}.

Denoting the baseband representation of s by sBB,

⟨r, gI,ℓ⟩ = ⟨s, gI,ℓ⟩
         = 2 Re( ⟨sBB, gI,ℓ,BB⟩ )
         = √2 Re( ⟨sBB, t ↦ g(t − ℓTs)⟩ ).

⟨r, gQ,ℓ⟩ = ⟨s, gQ,ℓ⟩
         = 2 Re( ⟨sBB, gQ,ℓ,BB⟩ )
         = √2 Re( ⟨sBB, t ↦ i g(t − ℓTs)⟩ )
         = √2 Im( ⟨sBB, t ↦ g(t − ℓTs)⟩ ).
c
Lecture 8, Amos Lapidoth 2017
Bandpass Filtering and Baseband Conversion

(Block diagram: r(t) is passed through BPF_{W,fc} to give s(t); s(t) is multiplied by cos(2πfc t) and, via a 90° phase shift, by −sin(2πfc t); each product is lowpass-filtered with cutoff Wc, where W/2 ≤ Wc ≤ 2fc − W/2, yielding Re(sBB) and Im(sBB).)

c
Lecture 8, Amos Lapidoth 2017
⟨r, gI,ℓ⟩ = √2 Re( ⟨sBB, t ↦ g(t − ℓTs)⟩ ),
⟨r, gQ,ℓ⟩ = √2 Im( ⟨sBB, t ↦ g(t − ℓTs)⟩ )

can be computed with real operations:

⟨r, gI,ℓ⟩ = √2 Re( ∫_{−∞}^{∞} sBB(t) g*(t − ℓTs) dt )
         = √2 Re( ∫_{−∞}^{∞} sBB(t) Re( g(t − ℓTs) ) dt )
         + √2 Im( ∫_{−∞}^{∞} sBB(t) Im( g(t − ℓTs) ) dt ),

and

⟨r, gQ,ℓ⟩ = √2 Im( ∫_{−∞}^{∞} sBB(t) g*(t − ℓTs) dt )
         = √2 Im( ∫_{−∞}^{∞} sBB(t) Re( g(t − ℓTs) ) dt )
         − √2 Re( ∫_{−∞}^{∞} sBB(t) Im( g(t − ℓTs) ) dt ).
c
Lecture 8, Amos Lapidoth 2017
Computing ⟨r, ψI,ℓ⟩ and ⟨r, ψQ,ℓ⟩: Real Pulse Shape

⟨r, gI,ℓ⟩ = √2 Re( ∫_{−∞}^{∞} sBB(t) g*(t − ℓTs) dt )
         = √2 Re( ∫_{−∞}^{∞} sBB(t) Re( g(t − ℓTs) ) dt )
         + √2 Im( ∫_{−∞}^{∞} sBB(t) Im( g(t − ℓTs) ) dt )
         = √2 Re( ∫_{−∞}^{∞} sBB(t) g(t − ℓTs) dt ),

⟨r, gQ,ℓ⟩ = √2 Im( ∫_{−∞}^{∞} sBB(t) g*(t − ℓTs) dt )
         = √2 Im( ∫_{−∞}^{∞} sBB(t) Re( g(t − ℓTs) ) dt )
         − √2 Re( ∫_{−∞}^{∞} sBB(t) Im( g(t − ℓTs) ) dt )
         = √2 Im( ∫_{−∞}^{∞} sBB(t) g(t − ℓTs) dt ).

c
Lecture 8, Amos Lapidoth 2017
Bandpass Filtering and Baseband Conversion

(Block diagram: r(t) is passed through BPF_{W,fc} to give s(t); s(t) is multiplied by cos(2πfc t) and, via a 90° phase shift, by −sin(2πfc t); each product is lowpass-filtered with cutoff Wc, where W/2 ≤ Wc ≤ 2fc − W/2, yielding Re(sBB) and Im(sBB).)

c
Lecture 8, Amos Lapidoth 2017
Matched Filtering in Baseband (g Real)

(Diagram: Re(sBB) and Im(sBB) are each passed through a filter matched to g and sampled at the times ℓTs, producing (1/√2)⟨r, gI,ℓ⟩ and (1/√2)⟨r, gQ,ℓ⟩.)

This circuit does not depend on fc (see the sketch below).
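A noiseless baseband sketch of the recovery (the unit-energy sinc pulse and the 4-QAM symbols are illustrative assumptions): correlating sBB with the shifted pulse returns A·Cℓ, up to truncation error.

```python
import numpy as np

rng = np.random.default_rng(4)
A, Ts, dt = 1.0, 1.0, 1e-2
t = np.arange(-50, 60, dt)
phi = lambda t: np.sinc(t / Ts) / np.sqrt(Ts)   # unit-energy, shift-orthonormal

C = rng.choice([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j], size=5)
s_bb = A * sum(C[l] * phi(t - (l + 1) * Ts) for l in range(len(C)))

for l in range(len(C)):
    inner = np.trapz(s_bb * phi(t - (l + 1) * Ts), t)   # <s_BB, phi(.-lTs)>, phi real
    print(C[l], np.round(inner / A, 2))                 # approximately C_l
```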

c
Lecture 8, Amos Lapidoth 2017
Filtering QAM Signals

Let us recall our discussion (Section 7.6.7) of

(xPB ⋆ h)_BB,
when
• xPB is an integrable signal that is bandlimited to W Hz
around fc , and
• h ∈ L1 is a real stable filter.

c
Lecture 8, Amos Lapidoth 2017
(Figure: x̂PB(f), occupying bandwidth W around ±fc; the filter response ĥ(f); and the product x̂PB(f) ĥ(f).)

c
Lecture 8, Amos Lapidoth 2017
(Figure: ĥ(f) near fc over a bandwidth W, and the corresponding baseband mapping, supported on [−W/2, W/2].)

c
Lecture 8, Amos Lapidoth 2017
The frequency response of the real impulse response h ∈ L1
with respect to the bandwidth W around the carrier
frequency fc is the mapping

f ↦ ĥ(f + fc) I{|f| ≤ W/2}.

The FT of

(xPB ⋆ h)_BB

is the product of x̂BB by the filter’s frequency response with
respect to the bandwidth W around the carrier frequency fc:

f ↦ x̂BB(f) ĥ(f + fc) I{|f| ≤ W/2}.

c
Lecture 8, Amos Lapidoth 2017
(Figure: x̂PB(f), occupying bandwidth W around ±fc; the filter response ĥ(f); and the product x̂PB(f) ĥ(f).)

c
Lecture 8, Amos Lapidoth 2017
(Figure: x̂BB(f), supported on [−W/2, W/2].)

c
Lecture 8, Amos Lapidoth 2017
Returning to Filtered QAM
The baseband representation of QAM is a complex PAM, so

X̂BB(f) = A Σ_{ℓ=1}^{n} Cℓ e^{−i2πfℓTs} ĝ(f),  f ∈ R.

The baseband representation of XPB ⋆ h is hence of FT

f ↦ A Σ_{ℓ=1}^{n} Cℓ e^{−i2πfℓTs} ĝ(f) ĥ(f + fc),  f ∈ R.

In the time domain,

(XPB ⋆ h)_BB(t) = A Σ_{ℓ=1}^{n} Cℓ p(t − ℓTs),

where

p(t) = ∫_{−∞}^{∞} ĝ(f) ĥ(f + fc) e^{i2πft} df,  t ∈ R.

(XPB ⋆ h)_BB is a complex PAM with g replaced by p.
c
Lecture 8, Amos Lapidoth 2017
Filtering a QAM Signal

Filtering a QAM signal xPB through h ∈ L1 is tantamount to
replacing its pulse shape g by the pulse shape p, where

p(t) = ∫_{−∞}^{∞} ĝ(f) ĥ(f + fc) e^{i2πft} df.

Note that p may be complex even if g is real.

c
Lecture 8, Amos Lapidoth 2017
Complex Random Variables

• A CRV maps experiment outcomes ω ∈ Ω to C.


• Its real and imaginary parts are (real) RVs.
• Any two real RVs X and Y can be used to construct the CRV

Z = X + iY.

• The distribution of a CRV is determined by the joint law of its


real and imaginary parts.
So far nothing more than a trivial data structure for pairs of real
random variables. . .

c
Lecture 8, Amos Lapidoth 2017
The Density of a CRV

The PDF fZ(·) of Z at z ∈ C is the joint PDF of the real pair
(Re(Z), Im(Z)) at (Re(z), Im(z)):

fZ(z) ≜ f_{Re(Z),Im(Z)}( Re(z), Im(z) ),  z ∈ C.

Thus,

fZ(z) = ∂²/∂x∂y Pr[ Re(Z) ≤ x, Im(Z) ≤ y ] |_{x=Re(z), y=Im(z)},  z ∈ C.

c
Lecture 8, Amos Lapidoth 2017
The Expectation of a CRV

E[Z] = E[Re(Z)] + i E[Im(Z)].

Thus,

Re(E[Z]) = E[Re(Z)],
Im(E[Z]) = E[Im(Z)].

Consequently, conjugation and expectation commute:

E[Z*] = (E[Z])*.

If g : C → C, then

E[g(Z)] = ∫_{z∈C} fZ(z) g(z) dz
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fZ(x + iy) Re( g(x + iy) ) dx dy
        + i ∫_{−∞}^{∞} ∫_{−∞}^{∞} fZ(x + iy) Im( g(x + iy) ) dx dy.
c
Lecture 8, Amos Lapidoth 2017
The Variance

Here we do not treat Z as a pair!

Var[Z] ≜ E[ |Z − E[Z]|² ]
       = E[ |Z|² ] − |E[Z]|²
       = Var[Re(Z)] + Var[Im(Z)].

Contrast with the covariance matrix of the pair (Re(Z), Im(Z)):

⎛ Var[Re(Z)]          Cov[Re(Z), Im(Z)] ⎞
⎝ Cov[Re(Z), Im(Z)]   Var[Im(Z)]        ⎠.

Var[Z] is the trace of the covariance matrix of (Re(Z), Im(Z)).

c
Lecture 8, Amos Lapidoth 2017
Proper CRV
A CRV Z is proper if it is zero-mean; of finite variance; and

E[Z²] = 0.

Since

E[Z²] = E[ Re(Z)² − Im(Z)² ] + i 2 E[ Re(Z) Im(Z) ],

the condition E[Z²] = 0 is equivalent to

E[Re(Z)²] = E[Im(Z)²]

and

E[Re(Z) Im(Z)] = 0.

Z is proper iff: Z is of zero mean; Re(Z) & Im(Z) have the same
finite variance; and Re(Z) & Im(Z) are uncorrelated.
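A one-line Monte-Carlo sketch (assumed example: X, Y IID standard Gaussians) of properness: E[Z] ≈ 0 and E[Z²] ≈ 0, while E[|Z|²] ≈ 2.

```python
import numpy as np

rng = np.random.default_rng(5)
Z = rng.standard_normal(10**6) + 1j * rng.standard_normal(10**6)
print(np.round(Z.mean(), 3),            # approx. 0   (zero mean)
      np.round((Z**2).mean(), 3),       # approx. 0   (E[Z^2] = 0: proper)
      np.round((np.abs(Z)**2).mean(), 3))  # approx. 2 (Var[Z])
```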
c
Lecture 8, Amos Lapidoth 2017

The Covariance Matrix of Re(Z), Im(Z)


In general, the covariance matrix of (Re(Z), Im(Z)) is

⎛ Var[Re(Z)]          Cov[Re(Z), Im(Z)] ⎞
⎝ Cov[Re(Z), Im(Z)]   Var[Im(Z)]        ⎠.

But if Z is proper,

⎛ (1/2) Var[Z]    0             ⎞
⎝ 0               (1/2) Var[Z]  ⎠.

c
Lecture 8, Amos Lapidoth 2017
The Covariance

Cov[Z, W] ≜ E[ (Z − E[Z]) (W − E[W])* ].

This is not a matrix!

c
Lecture 8, Amos Lapidoth 2017
Properties of the Covariance (1)
1. Conjugate Symmetry:

Cov[Z, W] = (Cov[W, Z])*.

2. Sesquilinearity:

Cov[αZ, W] = α Cov[Z, W],
Cov[Z1 + Z2, W] = Cov[Z1, W] + Cov[Z2, W],
Cov[Z, βW] = β* Cov[Z, W],
Cov[Z, W1 + W2] = Cov[Z, W1] + Cov[Z, W2],

and, more generally,

Cov[ Σ_{j=1}^{n} αj Zj, Σ_{j′=1}^{n′} βj′ Wj′ ] = Σ_{j=1}^{n} Σ_{j′=1}^{n′} αj βj′* Cov[Zj, Wj′].

c
Lecture 8, Amos Lapidoth 2017
Properties of the Covariance (2)

3. Relation with Variance:

Var[Z] = Cov[Z, Z].

4. Variance of Linear Functionals:

Var[ Σ_{j=1}^{n} αj Zj ] = Σ_{j=1}^{n} Σ_{j′=1}^{n} αj αj′* Cov[Zj, Zj′].

c
Lecture 8, Amos Lapidoth 2017
WSS Discrete-Time Complex Stochastic Processes


A discrete-time CSP (Zν) is wide-sense stationary if:

1. For every ν ∈ Z the CRV Zν is of finite variance.
2. The mean of Zν does not depend on ν.
3. E[Zν Zν′*] depends on ν and ν′ only via ν − ν′:

E[Zν Zν′*] = E[Zη+ν Zη+ν′*],  ν, ν′, η ∈ Z.

Note: we do not require that E[Zν′ Zν] (unconjugated) be
computable from ν − ν′; it may or may not be.

c
Lecture 8, Amos Lapidoth 2017
Autocovariance Function

KZZ(η) ≜ Cov[Zν+η, Zν]
       = E[ (Zν+η − E[Z1]) (Zν − E[Z1])* ],  η ∈ Z.

Key properties:

• KZZ is conjugate symmetric:

KZZ(−η) = (KZZ(η))*,  η ∈ Z.

• KZZ is a positive-definite function:

Σ_{ν=1}^{n} Σ_{ν′=1}^{n} αν αν′* KZZ(ν − ν′) ≥ 0,  α1, …, αn ∈ C.

c
Lecture 8, Amos Lapidoth 2017
The PSD of a Complex Discrete-Time SP


(Zν) is of power spectral density SZZ if

KZZ(η) = ∫_{−1/2}^{1/2} SZZ(θ) e^{i2πηθ} dθ,  η ∈ Z.

Note that SZZ need not be symmetric!

SZZ must be nonnegative outside a null set. By altering it on that
set, we can always assume that the PSD, if it exists, is nonnegative.

c
Lecture 8, Amos Lapidoth 2017
The PSD when the Autocovariance Function is Summable
If the autocovariance function KZZ is absolutely summable, i.e.,

Σ_{η=−∞}^{∞} |KZZ(η)| < ∞,

then the function

S(θ) = Σ_{η=−∞}^{∞} KZZ(η) e^{−i2πηθ},  θ ∈ [−1/2, 1/2]

is continuous, nonnegative, and

∫_{−1/2}^{1/2} S(θ) e^{i2πηθ} dθ = KZZ(η),  η ∈ Z.

c
Lecture 8, Amos Lapidoth 2017
The Intuition
The complex exponentials are orthonormal:

∫_{−1/2}^{1/2} e^{i2π(η−η′)θ} dθ = I{η = η′},  η, η′ ∈ Z.

Hence,

∫_{−1/2}^{1/2} S(θ) e^{i2πηθ} dθ = ∫_{−1/2}^{1/2} ( Σ_{η′=−∞}^{∞} KZZ(η′) e^{−i2πη′θ} ) e^{i2πηθ} dθ
  = Σ_{η′=−∞}^{∞} KZZ(η′) ∫_{−1/2}^{1/2} e^{−i2πη′θ} e^{i2πηθ} dθ
  = Σ_{η′=−∞}^{∞} KZZ(η′) ∫_{−1/2}^{1/2} e^{i2π(η−η′)θ} dθ
  = Σ_{η′=−∞}^{∞} KZZ(η′) I{η = η′}
  = KZZ(η),  η ∈ Z.
c
Lecture 8, Amos Lapidoth 2017
Next Week

Energy, Power, and Operational PSD of QAM (Chapter 18).

Please read Chapter 19 through Section 19.7.

Thank you!

c
Lecture 8, Amos Lapidoth 2017
Communication and Detection Theory: Lecture 9

Amos Lapidoth
ETH Zurich

April 25, 2017

Energy, Power, and Operational PSD of QAM

c
Lecture 9, Amos Lapidoth 2017
Today

• Energy in QAM.
• Power in QAM.
• Operational PSD of QAM.

c
Lecture 9, Amos Lapidoth 2017
Sending a Single Block
• K IID random bits D1, …, DK are transmitted.
• These bits are mapped by

enc : {0, 1}^K → C^N

to N complex symbols C1, …, CN.
• The transmitted signal is

X(t) = 2 Re( XBB(t) e^{i2πfc t} )
     = 2 Re( A Σ_{ℓ=1}^{N} Cℓ g(t − ℓTs) e^{i2πfc t} ),  t ∈ R,

where the baseband representation of the transmitted signal is

XBB(t) = A Σ_{ℓ=1}^{N} Cℓ g(t − ℓTs),  t ∈ R.

c
Lecture 9, Amos Lapidoth 2017
Assumptions

• D1 , . . . , DK are IID random bits.


• g is bandlimited to W/2 Hz.
• fc > W/2.

c
Lecture 9, Amos Lapidoth 2017
The Energy in a Single Block

We seek

E ≜ E[ ∫_{−∞}^{∞} X²(t) dt ].

Since XBB(·) is bandlimited to W/2 Hz, and since fc > W/2,

E = 2 E[ ∫_{−∞}^{∞} |XBB(t)|² dt ].

Calculate as in PAM, but with complex symbols and pulse shape:

• Use |w|² = w w*, w ∈ C, and
• swap summations, integrations, and expectations.

c
Lecture 9, Amos Lapidoth 2017
The Energy in Baseband
E[ ∫_{−∞}^{∞} |XBB(t)|² dt ] = E[ ∫_{−∞}^{∞} | A Σ_{ℓ=1}^{N} Cℓ g(t − ℓTs) |² dt ]

  = E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Cℓ g(t − ℓTs) ) ( A Σ_{ℓ′=1}^{N} Cℓ′ g(t − ℓ′Ts) )* dt ]

  = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] ∫_{−∞}^{∞} g(t − ℓTs) g*(t − ℓ′Ts) dt

  = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] Rgg( (ℓ′ − ℓ)Ts ),

where Rgg is the self-similarity function of the pulse shape g:

Rgg(τ) = ∫_{−∞}^{∞} g(t + τ) g*(t) dt,  τ ∈ R.
c
Lecture 9, Amos Lapidoth 2017
Simplifications
This simplifies if {Cℓ} are of zero mean and uncorrelated,

E[ ∫_{−∞}^{∞} |XBB(t)|² dt ] = A² ‖g‖₂² Σ_{ℓ=1}^{N} E[ |Cℓ|² ],

( E[Cℓ Cℓ′*] = E[|Cℓ|²] I{ℓ = ℓ′},  ℓ, ℓ′ ∈ {1, …, N} ),

or if the time shifts of the pulse shape by integer multiples of Ts
are orthonormal,

E[ ∫_{−∞}^{∞} |XBB(t)|² dt ] = A² Σ_{ℓ=1}^{N} E[ |Cℓ|² ],

( ∫_{−∞}^{∞} g(t − ℓTs) g*(t − ℓ′Ts) dt = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ {1, …, N} ).

c
Lecture 9, Amos Lapidoth 2017
The Energy in XPB
Rgg(τ) = ∫_{−∞}^{∞} |ĝ(f)|² e^{i2πfτ} df,  τ ∈ R,

so

E = 2A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] Rgg( (ℓ′ − ℓ)Ts )
  = 2A² ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] e^{i2πf(ℓ′−ℓ)Ts} |ĝ(f)|² df.

Only expectations of the form E[Cℓ Cℓ′*] show up; not E[Cℓ Cℓ′].

We define the energy per bit Eb,

Eb ≜ E/K,

and the energy per complex symbol Es,

Es ≜ E/N.
c
Lecture 9, Amos Lapidoth 2017
Relating Power in Passband to Power in Baseband
• The energy in a passband signal is twice the energy in its
baseband representation.
• But power is trickier:

(1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] ≠ 2 (1/(2T)) E[ ∫_{−T}^{T} |XBB(t)|² dt ],

because t ↦ X(t) I{|t| ≤ T} is not bandlimited around fc.

Fortunately, equality holds in the limit:

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = 2 lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} |XBB(t)|² dt ].

The power in QAM is twice the power in its baseband
representation.

c
Lecture 9, Amos Lapidoth 2017

C` Is Zero-Mean and WSS (1)


Assume that (Cℓ) is a zero-mean WSS discrete-time CSP of
autocovariance function KCC:

E[Cℓ] = 0, ℓ ∈ Z,

E[Cℓ+m Cℓ*] = KCC(m),  m, ℓ ∈ Z.

We calculate

E[ ∫_τ^{τ+Ts} |XBB(t)|² dt ]

and show that it does not depend on τ.

c
Lecture 9, Amos Lapidoth 2017
E[ ∫_τ^{τ+Ts} |XBB(t)|² dt ] = A² E[ ∫_τ^{τ+Ts} | Σ_{ℓ=−∞}^{∞} Cℓ g(t − ℓTs) |² dt ]

  = A² ∫_τ^{τ+Ts} Σ_ℓ Σ_{ℓ′} E[Cℓ Cℓ′*] g(t − ℓTs) g*(t − ℓ′Ts) dt

  = A² ∫_τ^{τ+Ts} Σ_m Σ_{ℓ′} E[Cℓ′+m Cℓ′*] g( t − (ℓ′ + m)Ts ) g*(t − ℓ′Ts) dt

  = A² ∫_τ^{τ+Ts} Σ_m KCC(m) Σ_{ℓ′} g( t − (ℓ′ + m)Ts ) g*(t − ℓ′Ts) dt

  = A² Σ_m KCC(m) Σ_{ℓ′} ∫_{τ−ℓ′Ts}^{τ+Ts−ℓ′Ts} g(t′ − mTs) g*(t′) dt′

  = A² Σ_m KCC(m) ∫_{−∞}^{∞} g*(t′) g(t′ − mTs) dt′

  = A² Σ_m KCC(m) Rgg*(mTs).

c
Lecture 9, Amos Lapidoth 2017
We lower-bound the energy of XBB(·) in the interval [−T, +T] by

⌊2T/Ts⌋ E[ ∫_τ^{τ+Ts} |XBB(t)|² dt ]

and upper-bound it by

⌈2T/Ts⌉ E[ ∫_τ^{τ+Ts} |XBB(t)|² dt ],

to obtain (Sandwich Theorem)

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} |XBB(t)|² dt ] = (1/Ts) E[ ∫_τ^{τ+Ts} |XBB(t)|² dt ]
  = (A²/Ts) Σ_{m=−∞}^{∞} KCC(m) Rgg*(mTs).

c
Lecture 9, Amos Lapidoth 2017
The Power in Passband

Since the power in passband is twice the power in baseband:

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (2A²/Ts) Σ_{m=−∞}^{∞} KCC(m) Rgg*(mTs),

and

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (2A²/Ts) ∫_{−∞}^{∞} Σ_{m=−∞}^{∞} KCC(m) e^{−i2πfmTs} |ĝ(f)|² df.

c
Lecture 9, Amos Lapidoth 2017
The Power in QAM in Bi-Infinite Block-Mode
If enc(·) produces zero-mean symbols from IID random bits:

PBB = (1/(NTs)) E[ ∫_{−∞}^{∞} | A Σ_{ℓ=1}^{N} Cℓ g(t − ℓTs) |² dt ]
    = (A²/(NTs)) ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] e^{i2πf(ℓ′−ℓ)Ts} |ĝ(f)|² df.

Consequently,

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = Es/Ts,

where Es = E/N, and

E = 2A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] Rgg( (ℓ′ − ℓ)Ts )
  = 2A² ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] e^{i2πf(ℓ′−ℓ)Ts} |ĝ(f)|² df.

c
Lecture 9, Amos Lapidoth 2017
Time Shifts of Pulse Shape Are Orthonormal
Suppose

X(t) = 2 Re( A Σ_{ℓ=−∞}^{∞} Cℓ φ(t − ℓTs) e^{i2πfc t} ),  t ∈ R,

where φ is bandlimited to W/2 Hz and satisfies

∫_{−∞}^{∞} φ(t − ℓTs) φ*(t − ℓ′Ts) dt = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ Z,

and fc > W/2 > 0. Then

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (2A²/Ts) lim_{L→∞} (1/(2L+1)) Σ_{ℓ=−L}^{L} E[ |Cℓ|² ],

whenever the limit on the RHS exists.


c
Lecture 9, Amos Lapidoth 2017
The Operational PSD of a Complex Stochastic Process

We say that a CSP (Z(t)) is of operational power spectral density
SZZ if, for every integrable complex-valued function h,

Power in Z ⋆ h = ∫_{−∞}^{∞} SZZ(f) |ĥ(f)|² df.

We dropped the symmetry requirement. Nevertheless:

The operational PSD of a CSP is unique in the sense that if a CSP
is of two different operational power spectral densities, then the
two must be indistinguishable.

c
Lecture 9, Amos Lapidoth 2017
The Operational PSD of QAM

If XBB is of operational PSD SBB(·), then the operational PSD of
the QAM signal is

SXX(f) = SBB( |f| − fc ),  f ∈ R.

For a formal proof, see Section 18.6; intuition follows.

c
Lecture 9, Amos Lapidoth 2017
XBB Is Bandlimited to W/2 Hz

We argue that, because g is bandlimited to W/2 Hz,

SBB(f) = 0,  |f| > W/2.

More precisely, we’ll assume that XBB is of operational PSD
SBB(·) and show that it is also of operational PSD

f ↦ SBB(f) I{|f| ≤ W/2}.

c
Lecture 9, Amos Lapidoth 2017
Power in XBB ⋆ h = Power in ( (t ↦ A Σ_{ℓ∈Z} Cℓ g(t − ℓTs)) ⋆ h )
  = Power in t ↦ A Σ_{ℓ∈Z} Cℓ (g ⋆ h)(t − ℓTs)
  = Power in t ↦ A Σ_{ℓ∈Z} Cℓ ( (g ⋆ LPF_{W/2}) ⋆ h )(t − ℓTs)
  = Power in t ↦ A Σ_{ℓ∈Z} Cℓ ( g ⋆ (h ⋆ LPF_{W/2}) )(t − ℓTs)
  = Power in ( (t ↦ A Σ_{ℓ∈Z} Cℓ g(t − ℓTs)) ⋆ (h ⋆ LPF_{W/2}) )
  = ∫_{−∞}^{∞} SBB(f) | ĥ(f) I{|f| ≤ W/2} |² df
  = ∫_{−∞}^{∞} ( SBB(f) I{|f| ≤ W/2} ) |ĥ(f)|² df.

c
Lecture 9, Amos Lapidoth 2017
The Baseband Representation of X ? h

Loosely speaking, if h : R → R is integrable, then the baseband
representation of X ⋆ h is XBB ⋆ h′BB, where h′BB : R → C is the
baseband representation of h ⋆ BPF_{W,fc}:

ĥ′BB(f) = ĥ(f + fc) I{|f| ≤ W/2},  f ∈ R.

f ↦ ĥ(f + fc) I{|f| ≤ W/2} is the frequency response of h w.r.t.
the bandwidth W around fc.

c
Lecture 9, Amos Lapidoth 2017
(Figure: x̂PB(f), occupying bandwidth W around ±fc; the filter response ĥ(f); and the product x̂PB(f) ĥ(f).)

c
Lecture 9, Amos Lapidoth 2017
(Figure: ĥ(f) near fc over a bandwidth W, and the corresponding baseband response ĥ′BB(f), supported on [−W/2, W/2].)

c
Lecture 9, Amos Lapidoth 2017
Power in X ⋆ h
  = 2 · Power in XBB ⋆ h′BB
  = 2 ∫_{−∞}^{∞} SBB(f) |ĥ′BB(f)|² df
  = 2 ∫_{−∞}^{∞} SBB(f) | ĥ(f + fc) I{|f| ≤ W/2} |² df
  = 2 ∫_{−∞}^{∞} SBB(f) |ĥ(f + fc)|² df
  = 2 ∫_{−∞}^{∞} SBB(f̃ − fc) |ĥ(f̃)|² df̃
  = ∫_{−∞}^{∞} SBB(f̃ − fc) |ĥ(f̃)|² df̃ + ∫_{−∞}^{∞} SBB(f̃ − fc) |ĥ(−f̃)|² df̃
  = ∫_{−∞}^{∞} SBB(f̃ − fc) |ĥ(f̃)|² df̃ + ∫_{−∞}^{∞} SBB(−f′ − fc) |ĥ(f′)|² df′
  = ∫_{−∞}^{∞} ( SBB(f − fc) + SBB(−f − fc) ) |ĥ(f)|² df
  = ∫_{−∞}^{∞} SBB( |f| − fc ) |ĥ(f)|² df.
c
Lecture 9, Amos Lapidoth 2017
Computing SBB (·)
• To compute SBB(·) we need the power in XBB ⋆ h.
• Also for complex PAM, feeding XBB to a filter of impulse
response h is tantamount to changing its pulse shape from g
to g ⋆ h:

(X ⋆ h)(t) = ( (σ ↦ A Σ_{ℓ=−∞}^{∞} Xℓ g(σ − ℓTs)) ⋆ h )(t)
           = A Σ_{ℓ=−∞}^{∞} Xℓ ∫_{−∞}^{∞} h(s) g(t − s − ℓTs) ds
           = A Σ_{ℓ=−∞}^{∞} Xℓ (g ⋆ h)(t − ℓTs),  t ∈ R.

• If you know how to compute the power in a complex PAM,
you also know how to compute it for a filtered complex PAM.
c
Lecture 9, Amos Lapidoth 2017

C` Zero-Mean WSS and Bounded
We compute the operational PSD of XBB by replacing g with
g ⋆ h:

Power in XBB ⋆ h = ∫_{−∞}^{∞} (A²/Ts) Σ_{m=−∞}^{∞} KCC(m) e^{−i2πfmTs} |ĝ(f)|² |ĥ(f)|² df.

The operational PSD of XBB is thus

SBB(f) = (A²/Ts) Σ_{m=−∞}^{∞} KCC(m) e^{−i2πfmTs} |ĝ(f)|²,  f ∈ R.

Consequently,

SXX(f) = (A²/Ts) Σ_{m=−∞}^{∞} KCC(m) e^{−i2π(|f|−fc)mTs} |ĝ( |f| − fc )|²,  f ∈ R.

c
Lecture 9, Amos Lapidoth 2017

(Cℓ) Zero-Mean, Variance-σC², and Uncorrelated

In this case

(A²/Ts) Σ_{m=−∞}^{∞} KCC(m) e^{−i2π(|f|−fc)mTs} |ĝ( |f| − fc )|²,  f ∈ R

simplifies to

SXX(f) = (A²/Ts) σC² |ĝ( |f| − fc )|²,  f ∈ R.

c
Lecture 9, Amos Lapidoth 2017
(Figure 18.1: the relationship between the Fourier Transform of the pulse shape ĝ(f), |ĝ(f)|², and |ĝ(|f| − fc)|².)

c
Lecture 9, Amos Lapidoth 2017
The Operational PSD of QAM in Bi-Infinite Block-Mode
To compute the operational PSD of XBB, replace g with g ⋆ h:

Power in XBB ⋆ h
  = ∫_{−∞}^{∞} (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] e^{i2πf(ℓ′−ℓ)Ts} |ĝ(f)|² |ĥ(f)|² df.

Hence,

SBB(f) = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] e^{i2πf(ℓ′−ℓ)Ts} |ĝ(f)|²,  f ∈ R.

Consequently,

SXX(f) = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] e^{i2π(|f|−fc)(ℓ′−ℓ)Ts} |ĝ( |f| − fc )|².

c
Lecture 9, Amos Lapidoth 2017
You have all read Chapter 19.
But let’s quickly review the Q-function.

c
Lecture 9, Amos Lapidoth 2017
Standard Gaussian
fW(w) = (1/√(2π)) e^{−w²/2},  w ∈ R.

It is of zero mean and unit variance.

(Figure: the standard Gaussian density fW(w).)

c
Lecture 9, Amos Lapidoth 2017
Gaussian Random Variables
• X is a centered Gaussian if

X = aW

for some deterministic a ∈ R and for some standard


Gaussian W .
• X is Gaussian if
X = aW + b
for some deterministic a, b ∈ R and for some standard
Gaussian W .
• Note that a may be zero, in which case X is deterministic.
• There is only one mean-µ, variance-σ² Gaussian distribution,
N(µ, σ²).

c
Lecture 9, Amos Lapidoth 2017
Standardizing a Gaussian

• Any affine transformation of a Gaussian is Gaussian.


• There is only one zero-mean unit-variance Gaussian, the
Standard Gaussian.

• If X ∼ N µ, σ 2 with σ 2 > 0, then

X −µ
∼ N (0, 1)
σ
and is thus a standard Gaussian.

c
Lecture 9, Amos Lapidoth 2017
The Q-Function
The Q-function maps every α ∈ R to the probability that a
standard Gaussian exceeds it:

Q(α) ≜ (1/√(2π)) ∫_α^{∞} e^{−ξ²/2} dξ,  α ∈ R.

(Figure: Q(α) as the tail area of the standard Gaussian density to the right of α.)
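In code, Q is conveniently expressed through the complementary error function, via the standard identity Q(α) = ½ erfc(α/√2) — a minimal sketch:

```python
import math

def Q(alpha: float) -> float:
    """Tail probability of a standard Gaussian: Q(alpha) = 0.5 * erfc(alpha / sqrt(2))."""
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))

print(Q(0.0), Q(1.0))   # 0.5 and about 0.1587
```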
c
Lecture 9, Amos Lapidoth 2017
The Q-Function

(Figure: the graph of Q(α), decreasing through Q(0) = 1/2.)
c
Lecture 9, Amos Lapidoth 2017
The Q-Function and Intervals
The CDF of a Standard Gaussian:

FW(w) = Pr[W ≤ w]
      = 1 − Pr[W ≥ w]
      = 1 − Q(w),  w ∈ R.

More generally,

Pr[a ≤ W ≤ b] = Pr[W ≥ a] − Pr[W ≥ b]
             = Q(a) − Q(b),  a ≤ b.

If X ∼ N(µ, σ²) with σ > 0, then

Pr[a ≤ X ≤ b] = Pr[X ≥ a] − Pr[X ≥ b]
             = Pr[ (X − µ)/σ ≥ (a − µ)/σ ] − Pr[ (X − µ)/σ ≥ (b − µ)/σ ]
             = Q( (a − µ)/σ ) − Q( (b − µ)/σ ),  a ≤ b.
c
Lecture 9, Amos Lapidoth 2017
The Q-Function and Rays

Letting b → +∞, we obtain the probability of a half ray:

Pr[X ≥ a] = Q( (a − µ)/σ ),  σ > 0.

Letting a → −∞ we obtain

Pr[X ≤ b] = 1 − Q( (b − µ)/σ ),  σ > 0.

c
Lecture 9, Amos Lapidoth 2017
The Q-Function with Negative Arguments
• The standard Gaussian density is symmetric. Let
W ∼ N(0, 1).

Pr[W ≥ −α] = Pr[−W ≤ α]
           = Pr[W ≤ α]
           = 1 − Pr[W ≥ α],  α ∈ R.

Consequently,

Q(−α) = 1 − Q(α),  α ∈ R.

• Please use only nonnegative arguments to the Q-function!
• Q(0) = 1/2.

c
Lecture 9, Amos Lapidoth 2017
(Figure 19.4: The identity Q(α) + Q(−α) = 1.)
c
Lecture 9, Amos Lapidoth 2017
Linear Combinations of Independent Gaussians

Suppose Z1, …, ZJ are independent centered Gaussians,

Zj ∼ N(0, σj²),  j = 1, …, J.

Let

α1, …, αJ ∈ R

be deterministic constants. Then

Σ_{j=1}^{J} αj Zj ∼ N(0, σ²),  σ² = Σ_{j=1}^{J} αj² σj².

c
Lecture 9, Amos Lapidoth 2017
Next Week

Binary Hypothesis Testing (Chapter 20).

Thank you!

c
Lecture 9, Amos Lapidoth 2017
Communication and Detection Theory:
Lecture 10

Amos Lapidoth
ETH Zurich

May 2, 2017

Binary Hypothesis Testing

c
Lecture 10, Amos Lapidoth 2017
Today

• Binary Hypothesis Testing.

c
Lecture 10, Amos Lapidoth 2017
Guessing H
H takes on the values 0 and 1 according to the prior

π0 = Pr[H = 0],  π1 = Pr[H = 1].

We wish to guess H based on the observation Y,

Y = ( Y⁽¹⁾, …, Y⁽ᵈ⁾ )ᵀ.

Given the prior (π0, π1) and the conditional densities

fY|H=0(·),  fY|H=1(·),

we wish to design a guessing rule

φGuess : Rᵈ → {0, 1}

that maps the observed value yobs of Y to our guess of H.
c
Lecture 10, Amos Lapidoth 2017
The Probability of Error

• The probability of error associated with φGuess : Rᵈ → {0, 1} is

Pr(error) ≜ Pr[ φGuess(Y) ≠ H ].

• A guessing rule is optimal if no other guessing rule attains a


smaller probability of error.
• The probability of error associated with optimal guessing rules
is the optimal probability of error

p∗ (error).

1. How to find an optimal decision rule?


2. What is its performance?

c
Lecture 10, Amos Lapidoth 2017
Guessing in the Absence of Observables

• There are only two guessing rules: φ0 , which guesses


“H = 0,” and φ1 , which guesses “H = 1.”
• The probability of error associated with φ0 is π1 .
• The probability of error associated with φ1 is π0 .
• φ0 is optimal if π0 ≥ π1 .
• φ1 is optimal if π0 ≤ π1 .

Guess the value of H that has the highest a priori probability.

p∗ (error) = min{Pr[H = 0], Pr[H = 1]}.

Check by case!

c
Lecture 10, Amos Lapidoth 2017
The Joint Law of H and Y
We are typically given the prior (π0 , π1 ) and the conditionals

fY|H=0 (·), fY|H=1 (·).

The (unconditional) density of Y is

fY (y) = π0 fY|H=0 (y) + π1 fY|H=1 (y), y ∈ Rd .

The a posteriori probabilities are

Pr[H = 0 | Y = yobs ] , π0 fY|H=0 (yobs )/fY (yobs ) if fY (yobs ) > 0, and 1/2 otherwise;
Pr[H = 1 | Y = yobs ] , π1 fY|H=1 (yobs )/fY (yobs ) if fY (yobs ) > 0, and 1/2 otherwise.

c
Lecture 10, Amos Lapidoth 2017
Intuition
Suppose the observation is a scalar Y .

Pr[H = 0 | Y = yobs ] = lim_{δ↓0} Pr[H = 0, Y ∈ (yobs − δ, yobs + δ)] / Pr[Y ∈ (yobs − δ, yobs + δ)].

Now approximate

Pr[H = 0, Y ∈ (yobs − δ, yobs + δ)] = π0 ∫_{yobs−δ}^{yobs+δ} fY |H=0 (y) dy ≈ π0 2δ fY |H=0 (yobs ),   δ ≪ 1,
Pr[Y ∈ (yobs − δ, yobs + δ)] = ∫_{yobs−δ}^{yobs+δ} fY (y) dy ≈ 2δ fY (yobs ),   δ ≪ 1,

and the ratio is approximately π0 fY |H=0 (yobs )/fY (yobs ).

c
Lecture 10, Amos Lapidoth 2017
Advice

In a first reading of this chapter,


assume fY|H=0 (·) and fY|H=1 (·) are positive.
And assume π0 , π1 > 0.

c
Lecture 10, Amos Lapidoth 2017
Guessing after Observing Y—Heuristics
Having observed that Y = yobs , we associate with H the a
posteriori probabilities Pr[H = 0|Y = yobs ], Pr[H = 1|Y = yobs ].
φ∗Guess (yobs ) = 0 if Pr[H = 0 | Y = yobs ] ≥ Pr[H = 1 | Y = yobs ], and 1 otherwise,

i.e.,

φ∗Guess (yobs ) = 0 if π0 fY|H=0 (yobs ) ≥ π1 fY|H=1 (yobs ), and 1 otherwise.

The conditional error probability is

p∗ (error | Y = yobs ) = min{Pr[H = 0 | Y = yobs ], Pr[H = 1 | Y = yobs ]},

so

p∗ (error) = ∫_{Rd} min{Pr[H = 0 | Y = y], Pr[H = 1 | Y = y]} fY (y) dy
           = ∫_{Rd} min{π0 fY|H=0 (y), π1 fY|H=1 (y)} dy.
c
Lecture 10, Amos Lapidoth 2017
The Error
Let φGuess : Rd → {0, 1} be any guessing rule, and let

D = {y ∈ Rd : φGuess (y) = 0}.

Then

p(error | H = 0) = ∫_{y∉D} fY|H=0 (y) dy,
p(error | H = 1) = ∫_{y∈D} fY|H=1 (y) dy,

and

p(error) = π0 ∫_{y∉D} fY|H=0 (y) dy + π1 ∫_{y∈D} fY|H=1 (y) dy
         = ∫_{Rd} ( π0 fY|H=0 (y) I{y ∉ D} + π1 fY|H=1 (y) I{y ∈ D} ) dy.

c
Lecture 10, Amos Lapidoth 2017
The Main Result
If φ∗Guess guesses “H = 0” only when π0 fY|H=0 (yobs ) ≥ π1 fY|H=1 (yobs ),

( φ∗Guess (yobs ) = 0 ) =⇒ ( π0 fY|H=0 (yobs ) ≥ π1 fY|H=1 (yobs ) ),

and guesses “H = 1” only when π1 fY|H=1 (yobs ) ≥ π0 fY|H=0 (yobs ),

( φ∗Guess (yobs ) = 1 ) =⇒ ( π1 fY|H=1 (yobs ) ≥ π0 fY|H=0 (yobs ) ),

then φ∗Guess is optimal and

Pr[φ∗Guess (Y) ≠ H] = ∫_{Rd} min{π0 fY|H=0 (y), π1 fY|H=1 (y)} dy.

c
Lecture 10, Amos Lapidoth 2017
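A hedged Python sketch of the main result for a scalar observation: guess “H = 0” iff π0 fY|H=0 (y) ≥ π1 fY|H=1 (y), and approximate p∗(error) as the integral of the pointwise minimum on a grid. The two Gaussian densities and the prior below are illustrative stand-ins, not part of the lecture.

    import numpy as np
    from scipy.stats import norm

    pi0, pi1 = 0.3, 0.7
    f0 = norm(loc=+1.0, scale=1.0).pdf   # stand-in for fY|H=0
    f1 = norm(loc=-1.0, scale=1.0).pdf   # stand-in for fY|H=1

    def phi_map(y):
        # Guess 0 iff pi0*f0(y) >= pi1*f1(y) (ties guessed as 0).
        return np.where(pi0 * f0(y) >= pi1 * f1(y), 0, 1)

    # p*(error) = integral over R of min{pi0 f0, pi1 f1}, here on a fine grid.
    y = np.linspace(-12.0, 12.0, 200001)
    p_err = np.trapz(np.minimum(pi0 * f0(y), pi1 * f1(y)), y)
    print(p_err)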
Proof
Let φGuess : Rd → {0, 1} be any guessing rule, and D = {y ∈ Rd : φGuess (y) = 0}.
Then

Pr[φGuess (Y) ≠ H]
 = ∫_{Rd} ( π0 fY|H=0 (y) I{y ∉ D} + π1 fY|H=1 (y) I{y ∈ D} ) dy
 ≥ ∫_{Rd} min{π0 fY|H=0 (y), π1 fY|H=1 (y)} dy.

But φ∗Guess (·) achieves this lower bound! Indeed, with D∗ = {y ∈ Rd : φ∗Guess (y) = 0} we have

π0 fY|H=0 (y) I{y ∉ D∗ } + π1 fY|H=1 (y) I{y ∈ D∗ } = min{π0 fY|H=0 (y), π1 fY|H=1 (y)},   y ∈ Rd .
c
Lecture 10, Amos Lapidoth 2017
Randomized Guessing Rules

(Block diagram: a bias calculator maps yobs to b(yobs ); a random number generator draws Θ ∼ U([0, 1]); the rule guesses “H = 0” if Θ < b(yobs ) and “H = 1” if Θ ≥ b(yobs ).)

c
Lecture 10, Amos Lapidoth 2017
Randomized Guessing Rules Are not Better
Deterministic rules are randomized rules where b(yobs ) ∈ {0, 1}.

For a randomized rule, b(yobs ) is the probability of guessing “H = 0,” so

Pr[error | Y = yobs ]
 = b(yobs ) Pr[H = 1 | Y = yobs ] + (1 − b(yobs )) Pr[H = 0 | Y = yobs ]
 ≥ min{Pr[H = 1 | Y = yobs ], Pr[H = 0 | Y = yobs ]},

because the weighted average of Pr[H = 0 | Y = yobs ] and Pr[H = 1 | Y = yobs ] cannot be smaller than the minimum.

The minimum is achieved by a deterministic rule that guesses


“H = 0” iff π0 fY|H=0 (y) ≥ π1 fY|H=1 (y).

c
Lecture 10, Amos Lapidoth 2017
Alternative Proof
The randomized rule is a deterministic rule based on (Y, Θ)!

fY,Θ|H=0 (y, θ) = fY|H=0 (y) fΘ|Y=y,H=0 (θ)


= fY|H=0 (y) fΘ (θ)
= fY|H=0 (y) I{0 ≤ θ ≤ 1}.

Similarly,

fY,Θ|H=1 (y, θ) = fY|H=1 (y) I{0 ≤ θ ≤ 1}.

The rule ‘guess “H = 0” iff π0 fY|H=0 (y) ≥ π1 fY|H=1 (y)’ is


optimal for this setting too because it guesses “H = 0” only when
fY,Θ|H=0 (y, θ) ≥ fY,Θ|H=1 (y, θ) and it guesses “H = 1” only
when fY,Θ|H=1 (y, θ) ≥ fY,Θ|H=0 (y, θ).

c
Lecture 10, Amos Lapidoth 2017
The Maximum A Posteriori Rule

The MAP rule resolves ties at random:

φMAP (yobs )
 , 0 if Pr[H = 0 | Y = yobs ] > Pr[H = 1 | Y = yobs ],
   1 if Pr[H = 0 | Y = yobs ] < Pr[H = 1 | Y = yobs ],
   U({0, 1}) if Pr[H = 0 | Y = yobs ] = Pr[H = 1 | Y = yobs ],
 = 0 if π0 fY|H=0 (yobs ) > π1 fY|H=1 (yobs ),
   1 if π0 fY|H=0 (yobs ) < π1 fY|H=1 (yobs ),
   U({0, 1}) if π0 fY|H=0 (yobs ) = π1 fY|H=1 (yobs ).

c
Lecture 10, Amos Lapidoth 2017
The Likelihood-Ratio Function

LR : Rd → [0, ∞],

LR(y) , fY|H=0 (y)/fY|H=1 (y),   y ∈ Rd ,

using the convention α/0 = ∞ for α > 0, and 0/0 = 1.
Using this function (and assuming π0 , π1 , fY (yobs ) > 0),

φMAP (yobs ) = 0 if LR(yobs ) > π1 /π0 ,
              1 if LR(yobs ) < π1 /π0 ,
              U({0, 1}) if LR(yobs ) = π1 /π0 .

c
Lecture 10, Amos Lapidoth 2017
The Maximum-Likelihood Rule

• The ML rule ignores the prior.


• It is the MAP corresponding to a uniform prior.
• In general, it is suboptimal.


φML (yobs ) , 0 if fY|H=0 (yobs ) > fY|H=1 (yobs ),
             1 if fY|H=0 (yobs ) < fY|H=1 (yobs ),
             U({0, 1}) if fY|H=0 (yobs ) = fY|H=1 (yobs )
           = 0 if LR(yobs ) > 1,
             1 if LR(yobs ) < 1,
             U({0, 1}) if LR(yobs ) = 1.

c
Lecture 10, Amos Lapidoth 2017
The Bhattacharyya Bound
p∗ (error) = ∫_{Rd} min{π0 fY|H=0 (y), π1 fY|H=1 (y)} dy
           ≤ ∫_{Rd} √( π0 fY|H=0 (y) π1 fY|H=1 (y) ) dy
           = √(π0 π1 ) ∫_{Rd} √( fY|H=0 (y) fY|H=1 (y) ) dy
           ≤ (1/2) ∫_{Rd} √( fY|H=0 (y) fY|H=1 (y) ) dy,

where we have used

min{a, b} ≤ √(ab) ≤ (a + b)/2,   a, b ≥ 0.

Thus,

p∗ (error) ≤ (1/2) ∫_{Rd} √( fY|H=0 (y) fY|H=1 (y) ) dy.
c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (1)

H is uniform and

fY |H=0 (y) = (1/√(2πσ²)) e^(−(y−A)²/(2σ²)),   y ∈ R,
fY |H=1 (y) = (1/√(2πσ²)) e^(−(y+A)²/(2σ²)),   y ∈ R,

for some deterministic A, σ > 0.

Since the prior is uniform, the MAP and the ML rules both guess
“H = 0” or “H = 1” depending on whether LR(yobs ) is greater or
smaller than one.

c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (2)

LR(y) = fY |H=0 (y)/fY |H=1 (y)
      = e^(−(y−A)²/(2σ²)) / e^(−(y+A)²/(2σ²))
      = e^(4yA/(2σ²)),   y ∈ R.

LR(yobs ) > 1 ⇐⇒ e^(4yobs A/(2σ²)) > 1
            ⇐⇒ 4yobs A/(2σ²) > 0
            ⇐⇒ yobs > 0.

c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (3)

Likewise,

LR(yobs ) < 1 ⇐⇒ e^(4yobs A/(2σ²)) < 1
            ⇐⇒ 4yobs A/(2σ²) < 0
            ⇐⇒ yobs < 0.

The MAP and ML rules thus guess “H = 0,” if yobs > 0; they
guess “H = 1,” if yobs < 0; and they guess “H = 0” or “H = 1”
equiprobably, if yobs = 0 (i.e., in the case of a tie).

c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (4)

The probability of a tie is zero. Indeed, under both hypotheses, the


probability that the observed variable Y is exactly equal to zero is
zero:
     
Pr Y = 0 H = 0 = Pr Y = 0 H = 1 = Pr Y = 0 = 0.

Consequently, the way ties are resolved is immaterial.

c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (4)
pMAP (error|H = 1) = Pr[Y > 0 | H = 1] = Q(A/σ),

because, conditional on H = 1, the RV Y is N(−A, σ²), so the origin is A/σ standard deviations away (to the right).

pMAP (error|H = 0) = Pr[Y < 0 | H = 0] = Q(A/σ),

because, conditional on H = 0, the RV Y is N(A, σ²), and the origin is again A/σ standard deviations away (to the left).
By the symmetry of the setup, the two types of error are of equal probability.
c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (5)

Since

p∗ (error) = π0 pMAP (error|H = 0) + π1 pMAP (error|H = 1),

we conclude that

p∗ (error) = Q(A/σ).

c
Lecture 10, Amos Lapidoth 2017
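A quick Monte Carlo check of p∗(error) = Q(A/σ), assuming illustrative values A = 1 and σ = 0.8; the sign detector below is the MAP/ML rule just derived.

    import numpy as np
    from scipy.special import erfc

    A, sigma, n = 1.0, 0.8, 500000
    rng = np.random.default_rng(0)
    H = rng.integers(0, 2, n)                        # uniform prior on {0, 1}
    Y = np.where(H == 0, A, -A) + sigma * rng.standard_normal(n)
    guesses = np.where(Y > 0, 0, 1)                  # ties (Y == 0) have probability zero
    print((guesses != H).mean(), 0.5 * erfc(A / sigma / np.sqrt(2.0)))  # empirical vs. Q(A/sigma)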
(Figure: the densities fY |H=1 (·) and fY |H=0 (·), centered at −A and A, together with fY (·); the rule guesses “H = 1” for y < 0 and “H = 0” for y > 0, and the shaded tail is pMAP (error|H = 0).)
c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (6)
The Bhattacharyya Bound:

p∗ (error) ≤ (1/2) ∫_{−∞}^{∞} √( fY |H=0 (y) fY |H=1 (y) ) dy
          = (1/2) ∫_{−∞}^{∞} √( (1/√(2πσ²)) e^(−(y−A)²/(2σ²)) · (1/√(2πσ²)) e^(−(y+A)²/(2σ²)) ) dy
          = (1/2) e^(−A²/(2σ²)) ∫_{−∞}^{∞} (1/√(2πσ²)) e^(−y²/(2σ²)) dy
          = (1/2) e^(−A²/(2σ²)).

As an aside, we obtained

Q(α) ≤ (1/2) e^(−α²/2),   α ≥ 0.

c
Lecture 10, Amos Lapidoth 2017
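A short numerical comparison of the exact error probability Q(A/σ) with the Bhattacharyya bound e^(−A²/(2σ²))/2, for the same illustrative A and σ as above.

    import numpy as np
    from scipy.special import erfc

    def Q(a):
        return 0.5 * erfc(a / np.sqrt(2.0))

    A, sigma = 1.0, 0.8
    exact = Q(A / sigma)                           # p*(error) from the previous slides
    bound = 0.5 * np.exp(-A**2 / (2 * sigma**2))   # Bhattacharyya bound
    print(exact, bound)                            # exact <= bound, as Q(a) <= e^(-a^2/2)/2 predicts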
Deterministic Processing is Futile

(Block diagram: yobs is fed to a processor g(·); the output g(yobs ) is fed to a rule that guesses H based on g(yobs ).)

No rule based on g(yobs ) can outperform an optimal rule based


on yobs .

Computing g(yobs ) and then deciding based on the answer is a


special case of guessing based on yobs .

c
Lecture 10, Amos Lapidoth 2017
More General Processing
The processor generates Θ independently of (H, Y) and forms
g(Y, Θ).
This too is futile!
Cannot outperform an optimal rule based on (yobs , θobs ), where
fY,Θ|H=0 (yobs , θobs ) = fY|H=0 (yobs ) fΘ (θobs ),
fY,Θ|H=1 (yobs , θobs ) = fY|H=1 (yobs ) fΘ (θobs ).
But,

LR(yobs , θobs ) = fY,Θ|H=0 (yobs , θobs ) / fY,Θ|H=1 (yobs , θobs )
               = ( fY|H=0 (yobs ) fΘ (θobs ) ) / ( fY|H=1 (yobs ) fΘ (θobs ) )
               = fY|H=0 (yobs ) / fY|H=1 (yobs ),   fΘ (θobs ) ≠ 0
               = LR(yobs ),   fΘ (θobs ) ≠ 0.

c
Lecture 10, Amos Lapidoth 2017
Recall that X and Y are conditionally independent given Z,

X −− Z −− Y,

if

PX,Y |Z (x, y|z) = PX|Z (x|z) PY |Z (y|z),   PZ (z) > 0.

We say that Z is the result of processing Y with respect to H if H


and Z are conditionally independent given Y.

Processing the observables does not decrease the optimal


probability of error.

c
Lecture 10, Amos Lapidoth 2017
(Block diagram: a local Gaussian RV generator draws W ∼ N(0, δ²), independent of (Y, H); the sum yobs + W is fed to the MAP rule for testing N(α0 , σ² + δ²) vs. N(α1 , σ² + δ²) with prior (π0 , π1 ).)

This is a randomized rule for N(α0 , σ²) vs. N(α1 , σ²) that attains the optimal probability of error for N(α0 , σ² + δ²) vs. N(α1 , σ² + δ²).
c
Lecture 10, Amos Lapidoth 2017
Sufficient Statistics—an Example (1)
Let H have a uniform prior. We observe (Y1 , Y2 ). Conditional on H = 0, they are IID N(0, σ0²), whereas conditional on H = 1 they are IID N(0, σ1²), where σ0 > σ1 > 0. Thus,

fY1 ,Y2 |H=0 (y1 , y2 ) = (1/(2πσ0²)) exp( −(y1² + y2²)/(2σ0²) ),   y1 , y2 ∈ R,
fY1 ,Y2 |H=1 (y1 , y2 ) = (1/(2πσ1²)) exp( −(y1² + y2²)/(2σ1²) ),   y1 , y2 ∈ R.

LR(y1 , y2 ) = fY1 ,Y2 |H=0 (y1 , y2 ) / fY1 ,Y2 |H=1 (y1 , y2 )
            = (σ1²/σ0²) exp( (1/2)(1/σ1² − 1/σ0²)(y1² + y2²) ),   y1 , y2 ∈ R.
c
Lecture 10, Amos Lapidoth 2017
Sufficient Statistics—an Example (2)

  
LR(y1 , y2 ) > 1 ⇐⇒ exp( (1/2)(1/σ1² − 1/σ0²)(y1² + y2²) ) > σ0²/σ1²
              ⇐⇒ (1/2)(1/σ1² − 1/σ0²)(y1² + y2²) > ln(σ0²/σ1²)
              ⇐⇒ ((σ0² − σ1²)/(2σ0²σ1²))(y1² + y2²) > ln(σ0²/σ1²)
              ⇐⇒ y1² + y2² > (2σ0²σ1²/(σ0² − σ1²)) ln(σ0²/σ1²).

The ML/MAP compares Y12 + Y22 to a threshold.


To implement it, one need not observe Y1 and Y2 directly; it
suffices to observe
T , Y12 + Y22 .

c
Lecture 10, Amos Lapidoth 2017
Sufficient Statistics—an Example (3)

• Being the result of processing (Y1 , Y2 ) with respect to H, no


guess based on T can outperform an optimal guess based on
(Y1 , Y2 ).
• In this example, even though pre-processing the observations
to produce T = Y12 + Y22 is not reversible, basing one’s
decision on T incurs no loss in optimality.
This is all because LR(y1 , y2 ) is computable from y12 + y22 . In this
sense T = Y12 + Y22 forms a sufficient statistic for guessing H from
(Y1 , Y2 ).

c
Lecture 10, Amos Lapidoth 2017
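A minimal sketch of the resulting detector for the uniform-prior variance test above (the function name and interface are mine, not the book's):

    import numpy as np

    def guess_variance(y1, y2, sigma0, sigma1):
        # ML/MAP guess for IID N(0, sigma0^2) vs IID N(0, sigma1^2), sigma0 > sigma1 > 0.
        t = y1**2 + y2**2                                  # the sufficient statistic
        thresh = (2 * sigma0**2 * sigma1**2 / (sigma0**2 - sigma1**2)) \
                 * np.log(sigma0**2 / sigma1**2)
        return 0 if t > thresh else 1                      # t == thresh has probability zero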
Sufficient Statistics—Informal Definition
A mapping T : Rd → Rd′ forms a sufficient statistic for the densities fY|H=0 (·) and fY|H=1 (·) if the likelihood-ratio LR(yobs ) can be computed from T (yobs ) for every yobs in Rd .

Has nothing to do with the prior!

For technical reasons


• we require that LR(yobs ) be computable from T (yobs ) only
when fY|H=0 (yobs ) and fY|H=1 (yobs ) are not both zero;
• and we allow some Y0 ⊂ Rd of Lebesgue measure zero
containing observations where the computation of LR(yobs )
from T (yobs ) may fail.

c
Lecture 10, Amos Lapidoth 2017
Sufficient Statistics—Formal Definition
A mapping T : Rd → Rd′ forms a sufficient statistic for the densities fY|H=0 (·) and fY|H=1 (·) on Rd if it is Borel measurable and if there exists a set Y0 ⊂ Rd of Lebesgue measure zero and a Borel measurable function ζ : Rd′ → [0, ∞] such that for all yobs ∈ Rd satisfying

yobs ∉ Y0 and fY|H=0 (yobs ) + fY|H=1 (yobs ) > 0

we have

fY|H=0 (yobs ) / fY|H=1 (yobs ) = ζ( T (yobs ) ),

where on the LHS of the above we define a/0 to be +∞ whenever a > 0.

c
Lecture 10, Amos Lapidoth 2017
Basing the Decision on a Sufficient Statistic Is Optimal

If T : Rd → Rd′ is a sufficient statistic for the densities fY|H=0 (·) and fY|H=1 (·), then, for every prior of H, there exists a decision rule that guesses H based on T (Y) and which is as good as any optimal guessing rule based on Y.
Indeed, the rule

φT (T (yobs )) = 0 if ζ(T (yobs )) > π1 /π0 ,
                1 if ζ(T (yobs )) < π1 /π0 ,
                U({0, 1}) if ζ(T (yobs )) = π1 /π0

has the same performance as the MAP rule based on Y.

c
Lecture 10, Amos Lapidoth 2017
Computability of the a Posteriori Distribution—Informal
T : Rd → Rd′ is a sufficient statistic for fY|H=0 (·) and fY|H=1 (·) iff for every prior (π0 , π1 ) there exist functions

t ↦ ψm (π0 , π1 , t),   m = 0, 1,

such that the vector

( ψ0 (π0 , π1 , T (yobs )), ψ1 (π0 , π1 , T (yobs )) )ᵀ

equals

( Pr[H = 0 | Y = yobs ], Pr[H = 1 | Y = yobs ] )ᵀ,

where the above a posteriori distributions are computed for H of prior (π0 , π1 ) and for the conditional densities fY|H=0 (·) and fY|H=1 (·).
c
Lecture 10, Amos Lapidoth 2017
Informal Proof (1)
Suppose LR(yobs ) = ζ(T (yobs )) for every yobs , and let (π0 , π1 ) be any (nondegenerate) prior. Then,

Pr[H = 0 | Y = yobs ]
 = π0 fY|H=0 (yobs ) / ( π0 fY|H=0 (yobs ) + π1 fY|H=1 (yobs ) )
 = π0 LR(yobs ) / ( π0 LR(yobs ) + π1 )
 = π0 ζ(T (yobs )) / ( π0 ζ(T (yobs )) + π1 ).

And

Pr[H = 1 | Y = yobs ] = 1 − Pr[H = 0 | Y = yobs ]
                     = 1 − π0 ζ(T (yobs )) / ( π0 ζ(T (yobs )) + π1 ).
c
Lecture 10, Amos Lapidoth 2017
Informal Proof (2)
Suppose the a posteriori distribution is computable from (π0 , π1 , T (yobs )) for every prior and, a fortiori, for the uniform prior.

Pr[H = 0 | Y = yobs ] = π0 fY|H=0 (yobs ) / ( π0 fY|H=0 (yobs ) + π1 fY|H=1 (yobs ) ),
Pr[H = 1 | Y = yobs ] = π1 fY|H=1 (yobs ) / ( π0 fY|H=0 (yobs ) + π1 fY|H=1 (yobs ) ).

Substituting the uniform prior and dividing the equations,

Pr[H = 0 | Y = yobs ] / Pr[H = 1 | Y = yobs ] = LR(yobs )   (uniform prior).

So if the LHS is computable from T (yobs ) then so is the RHS.


c
Lecture 10, Amos Lapidoth 2017
After Identifying a Sufficient Statistic (1)

Method 1: Ignore this fact and use the MAP rule

φMAP (yobs ) = 0 if LR(yobs ) > π1 /π0 ,
              1 if LR(yobs ) < π1 /π0 ,
              U({0, 1}) if LR(yobs ) = π1 /π0 .

(Because T (Y) is sufficient, LR(yobs ) will be computable from T (yobs ), but who cares?)

c
Lecture 10, Amos Lapidoth 2017
After Identifying a Sufficient Statistic (2)

Method 2: Use the MAP rule for guessing H based on the new
d0 -dimensional observations tobs = T (yobs ). You’ll need the
conditional densities of T = T (Y) given H.
φGuess (T (yobs )) = 0 if π0 fT|H=0 (T (yobs )) > π1 fT|H=1 (T (yobs )),
                   1 if π0 fT|H=0 (T (yobs )) < π1 fT|H=1 (T (yobs )),

with ties being resolved at random.

c
Lecture 10, Amos Lapidoth 2017
Applying Method 2 in the Example
The squares of two IID centered Gaussians sum to an exponential:

fT |H=0 (t) = (1/(2σ0²)) exp( −t/(2σ0²) ),   t ≥ 0,
fT |H=1 (t) = (1/(2σ1²)) exp( −t/(2σ1²) ),   t ≥ 0.

So,

fT |H=0 (t)/fT |H=1 (t) = (σ1²/σ0²) exp( t (1/(2σ1²) − 1/(2σ0²)) ),   t ≥ 0,
ln( fT |H=0 (t)/fT |H=1 (t) ) = ln(σ1²/σ0²) + t (1/(2σ1²) − 1/(2σ0²)),   t ≥ 0.

We thus guess “H = 0” if the log likelihood-ratio is nonnegative,

t ≥ (2σ0²σ1²/(σ0² − σ1²)) ln(σ0²/σ1²)
⇐⇒ y1² + y2² ≥ (2σ0²σ1²/(σ0² − σ1²)) ln(σ0²/σ1²).
c
Lecture 10, Amos Lapidoth 2017
Multi-Dimensional Binary Gaussian Hypothesis Testing
H is of nondegenerate prior (π0 , π1 ). The observable is

Y = ( Y (1) , . . . , Y (J) )ᵀ,
H = 0 : Y (j) = s0 (j) + Z (j) ,   j = 1, 2, . . . , J,
H = 1 : Y (j) = s1 (j) + Z (j) ,   j = 1, 2, . . . , J,

where Z (1) , Z (2) , . . . , Z (J) are IID N(0, σ²) and

s0 = ( s0 (1) , . . . , s0 (J) )ᵀ,   s1 = ( s1 (1) , . . . , s1 (J) )ᵀ

are deterministic. The Euclidean inner product and norm in RJ are

hu, viE , Σ_{j=1}^J u (j) v (j) ,
kuk , √( hu, uiE ) = √( Σ_{j=1}^J (u (j) )² ).

c
Lecture 10, Amos Lapidoth 2017
The Likelihood Function

LR(y) = fY|H=0 (y) / fY|H=1 (y)
      = Π_{j=1}^J (1/√(2πσ²)) exp( −(y (j) − s0 (j) )²/(2σ²) ) / Π_{j=1}^J (1/√(2πσ²)) exp( −(y (j) − s1 (j) )²/(2σ²) )
      = Π_{j=1}^J exp( −(y (j) − s0 (j) )²/(2σ²) + (y (j) − s1 (j) )²/(2σ²) ),   y ∈ RJ .

c
Lecture 10, Amos Lapidoth 2017
The Log-Likelihood Function
LLR(y) = (1/(2σ²)) Σ_{j=1}^J ( (y (j) − s1 (j) )² − (y (j) − s0 (j) )² )
       = (1/σ²) ( hy, s0 − s1 iE + (ks1 k² − ks0 k²)/2 )
       = (1/σ²) ( hy, s0 − s1 iE − ( hs0 , s0 − s1 iE + hs1 , s0 − s1 iE )/2 )
       = (ks0 − s1 k/σ²) ( hy, φiE − ( hs0 , φiE + hs1 , φiE )/2 ),   y ∈ RJ ,

where

φ = (s0 − s1 )/ks0 − s1 k

is a unit-norm vector pointing from s1 to s0 .
c
Lecture 10, Amos Lapidoth 2017
Decision Rule
An optimal rule is to guess “H = 0” when LLR(y) ≥ ln(π1 /π0 ):

Guess “H = 0” if hy, φiE ≥ ( hs0 , φiE + hs1 , φiE )/2 + ( σ²/ks0 − s1 k ) ln(π1 /π0 ).

(Figure: the decision boundary is a hyperplane perpendicular to φ; it passes through the midpoint of s0 and s1 when π0 = π1 , and shifts toward s0 when π0 < π1 and toward s1 when π0 > π1 .)

c
Lecture 10, Amos Lapidoth 2017
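A hedged Python sketch of the resulting detector: project y onto φ and compare with the threshold derived above (function and variable names are mine).

    import numpy as np

    def guess_binary_gaussian(y, s0, s1, sigma, pi0, pi1):
        # MAP guess for Y = s_H + Z with Z IID N(0, sigma^2).
        d = np.linalg.norm(s0 - s1)
        phi = (s0 - s1) / d                               # unit vector from s1 toward s0
        thresh = 0.5 * (np.dot(s0, phi) + np.dot(s1, phi)) \
                 + (sigma**2 / d) * np.log(pi1 / pi0)
        return 0 if np.dot(y, phi) >= thresh else 1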
(Figure: an observation y and its projection onto the unit vector φ, drawn together with s0 and s1 .)
The projection of Y onto φ = (s0 − s1 )/ks0 − s1 k forms a


sufficient statistic for guessing H based on Y.
c
Lecture 10, Amos Lapidoth 2017
Error Probability Lemma

Suppose s0 , s1 ∈ RJ are deterministic and different. Let

Y = s0 + Z,   Z (1) , . . . , Z (J) ∼ IID N(0, σ²).

Then,

Pr[ kY − s1 k ≤ kY − s0 k ] = Q( ks0 − s1 k/(2σ) ).

• ks0 − s1 k/2 is half the Euclidean distance.
• ks0 − s1 k/(2σ) is half the distance measured in standard deviations of the noise.
• For a more general result see Lemma 20.14.1.

c
Lecture 10, Amos Lapidoth 2017
Error Probability Lemma

Pr[ kY − s1 k ≤ kY − s0 k ]
 = Pr[ kZ + s0 − s1 k ≤ kZk ]
 = Pr[ kZ + s0 − s1 k² ≤ kZk² ]
 = Pr[ kZk² + ks0 − s1 k² + 2 hZ, s0 − s1 iE ≤ kZk² ]
 = Pr[ −2 hZ, s0 − s1 iE ≥ ks0 − s1 k² ]
 = Pr[ 2 hZ, s0 − s1 iE ≥ ks0 − s1 k² ],

and the result follows because

hZ, s0 − s1 iE ∼ N( 0, ks0 − s1 k² σ² ).

c
Lecture 10, Amos Lapidoth 2017
Linear Combinations of Independent Gaussians

Suppose Z1 , . . . , ZJ are independent centered Gaussians

Zj ∼ N(0, σj²),   j = 1, . . . , J.

Let α1 , . . . , αJ ∈ R be deterministic constants. Then

Σ_{j=1}^J αj Zj ∼ N(0, σ²),   σ² = Σ_{j=1}^J αj² σj².

(Choose αj as the j-th component of s0 − s1 , and σj² as σ².)

c
Lecture 10, Amos Lapidoth 2017
For Our Problem
For a uniform prior,

Pr[error | H = 0] = Pr[error | H = 1] = Pr[error] = Q( ks0 − s1 k/(2σ) ).

More generally,

pMAP (error|H = 0) = Q( ks0 − s1 k/(2σ) + ( σ/ks0 − s1 k ) ln(π0 /π1 ) ),
pMAP (error|H = 1) = Q( ks0 − s1 k/(2σ) + ( σ/ks0 − s1 k ) ln(π1 /π0 ) ),

p∗ (error) = π0 Q( ks0 − s1 k/(2σ) + ( σ/ks0 − s1 k ) ln(π0 /π1 ) )
           + π1 Q( ks0 − s1 k/(2σ) + ( σ/ks0 − s1 k ) ln(π1 /π0 ) ).
c
Lecture 10, Amos Lapidoth 2017
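The formulas above, collected into one small Python helper (a sketch; the interface is mine):

    import numpy as np
    from scipy.special import erfc

    def Q(a):
        return 0.5 * erfc(a / np.sqrt(2.0))

    def error_probs(s0, s1, sigma, pi0, pi1):
        # Returns (pMAP(error|H=0), pMAP(error|H=1), p*(error)).
        d = np.linalg.norm(np.asarray(s0, float) - np.asarray(s1, float))
        p0 = Q(d / (2 * sigma) + (sigma / d) * np.log(pi0 / pi1))
        p1 = Q(d / (2 * sigma) + (sigma / d) * np.log(pi1 / pi0))
        return p0, p1, pi0 * p0 + pi1 * p1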
Random Parameter Not Observed—Nuisance Parameter
Instead of fY|H=0 (·) and fY|H=1 (·), we are given

fΘ (·), fY|Θ=θ,H=0 (·), fY|Θ=θ,H=1 (·),   with Θ independent of H.

fY|H=0 (yobs ) = ∫ fY,Θ|H=0 (yobs , θ) dθ
             = ∫ fY|Θ=θ,H=0 (yobs ) fΘ|H=0 (θ) dθ
             = ∫ fY|Θ=θ,H=0 (yobs ) fΘ (θ) dθ.

(Think about conditioning on H = 0 as specifying the law.)

fY|H=1 (yobs ) = ∫ fY|Θ=θ,H=1 (yobs ) fΘ (θ) dθ.

LR(yobs ) = ∫ fY|Θ=θ,H=0 (yobs ) fΘ (θ) dθ / ∫ fY|Θ=θ,H=1 (yobs ) fΘ (θ) dθ.

c
Lecture 10, Amos Lapidoth 2017
Random Parameter Observed
If Θ is observed, we merely view the observable as (Y, Θ).

LR(yobs , θobs ) = fY,Θ|H=0 (yobs , θobs ) / fY,Θ|H=1 (yobs , θobs ).
The twist is that, because Θ is independent of H,

fY,Θ|H=0 (yobs , θobs ) = fΘ|H=0 (θobs )fY|Θ=θobs ,H=0 (yobs )


= fΘ (θobs )fY|Θ=θobs ,H=0 (yobs ).

Likewise,

fY,Θ|H=1 (yobs , θobs ) = fΘ (θobs )fY|Θ=θobs ,H=1 (yobs ),

so

LR(yobs , θobs ) = fY|H=0,Θ=θobs (yobs ) / fY|H=1,Θ=θobs (yobs ).

c
Lecture 10, Amos Lapidoth 2017
Next Week

Multi-Hypothesis Testing (Chapter 21 & 22).

Thank you!

c
Lecture 10, Amos Lapidoth 2017
Communication and Detection Theory:
Lecture 11

Amos Lapidoth
ETH Zurich

May 9, 2017

Multi-Hypothesis Testing

c
Lecture 11, Amos Lapidoth 2017
Today

• A bit more on binary hypothesis testing.


• Multi-hypothesis testing.

c
Lecture 11, Amos Lapidoth 2017
Multiple Hypotheses
M takes value in the set M = {1, . . . , M}, where M ≥ 2,
according to the prior

πm = Pr[M = m], m ∈ M.

The prior is nondegenerate if

πm > 0, m ∈ M.

The observation Y is a d-dimensional random vector. Conditional


on M = m, its density is

fY|M =m (·), m ∈ M.

A guessing rule is a mapping

φGuess : Rd → M.

After observing that Y = yobs we guess that M is φGuess (yobs ).


c
Lecture 11, Amos Lapidoth 2017
Performance

The error probability associated with φGuess (·) is


 
Pr[ φGuess (Y) ≠ M ].

A rule is optimal if no rule achieves a lower probability of error.


The optimal error probability

p∗ (error)

is the probability of error associated with an optimal decision rule.

c
Lecture 11, Amos Lapidoth 2017
Guessing in the Absence of Observables
• Only M deterministic decision rules: φ1 , . . . , φM , where

φm guesses “M = m”.

• The probability of success of φm is πm .


• The guessing rule “guess m̃” is optimal iff

πm̃ = max_{m′∈M} πm′ .

• For an optimal guessing rule the probability of success is

p∗ (correct) = max_{m′∈M} πm′ ,

and the optimal error probability is thus

p∗ (error) = 1 − max_{m′∈M} πm′ .

c
Lecture 11, Amos Lapidoth 2017
The Joint Law of M and Y

In terms of the prior and the conditional densities


X
fY (y) = πm fY|M =m (y), y ∈ Rd .
m∈M

And, as in the binary case,


Pr[M = m | Y = yobs ] , πm fY|M=m (yobs )/fY (yobs ) if fY (yobs ) > 0, and 1/M otherwise.

c
Lecture 11, Amos Lapidoth 2017
Guessing in the Presence of Observables
• After observing that Y = yobs , we associate with each
m ∈ M the a posteriori probability Pr[M = m|Y = yobs ].
• We pick the message of highest a posteriori probability.
• A tie occurs when more than one outcome attains the highest
a posteriori probability. Any one of the maximum-achieving
messages will do.
• We thus guess “m̃” only if

Pr[M = m̃ | Y = yobs ] = max_{m′∈M} Pr[M = m′ | Y = yobs ].

• For this rule

p∗ (correct | Y = yobs ) = max_{m′∈M} Pr[M = m′ | Y = yobs ],
p∗ (error | Y = yobs ) = 1 − max_{m′∈M} Pr[M = m′ | Y = yobs ],
p∗ (error) = 1 − ∫_{Rd} ( max_{m′∈M} Pr[M = m′ | Y = y] ) fY (y) dy.
c
Lecture 11, Amos Lapidoth 2017
The Main Result

Consider the set of messages of maximal a posteriori probability

M̃(yobs )
 , { m̃ ∈ M : Pr[M = m̃ | Y = yobs ] = max_{m′∈M} Pr[M = m′ | Y = yobs ] }
 = { m̃ ∈ M : πm̃ fY|M=m̃ (yobs ) = max_{m′∈M} πm′ fY|M=m′ (yobs ) }.

Any guessing rule φ∗Guess : Rd → M that satisfies

φ∗Guess (yobs ) ∈ M̃(yobs ),   yobs ∈ Rd ,

is optimal.

c
Lecture 11, Amos Lapidoth 2017
Proof
Given any φGuess (·), define the disjoint sets

Dm = { yobs ∈ Rd : φGuess (yobs ) = m },   m ∈ M.

Pr(correct) = Σ_{m∈M} πm ∫_{Dm} fY|M=m (y) dy
            = Σ_{m∈M} πm ∫_{Rd} fY|M=m (y) I{y ∈ Dm } dy
            = ∫_{Rd} ( Σ_{m∈M} πm fY|M=m (y) I{y ∈ Dm } ) dy
            ≤ ∫_{Rd} max_{m∈M} { πm fY|M=m (y) } dy.

Equality is attained if

( y ∈ Dm̃ ) =⇒ ( πm̃ fY|M=m̃ (y) = max_{m′∈M} πm′ fY|M=m′ (y) ).
c
Lecture 11, Amos Lapidoth 2017
Randomized Rules, the MAP, and the ML Rules

• Randomization does not help.


• The Maximum A Posteriori rule picks uniformly at random an
element of M̃(yobs ). It is optimal.
• The Maximum-Likelihood rule ignores the prior. It picks
uniformly at random an element of

{ m̃ ∈ M : fY|M=m̃ (yobs ) = max_{m′∈M} fY|M=m′ (yobs ) }.

It is optimal when the prior is uniform.

c
Lecture 11, Amos Lapidoth 2017
Processing

Z is the result of processing Y with respect to M if

M −− Y −− Z.

If Z is the result of processing Y with respect to M , then no


decision rule based on Z can outperform an optimal decision rule
based on Y.

c
Lecture 11, Amos Lapidoth 2017
Multi-Hypothesis Testing for 2D Signals

• M is uniform over M = {1, . . . , M}.


• Y is two-dimensional of components Y (1) and Y (2) .
• Conditional on M = m, the random variables Y (1) and Y (2) are independent with Y (1) ∼ N(am , σ²) and Y (2) ∼ N(bm , σ²). Here σ² > 0.

fY (1) ,Y (2) |M =m (y (1) , y (2) ) = (1/(2πσ²)) exp( −( (y (1) − am )² + (y (2) − bm )² )/(2σ²) ).

c
Lecture 11, Amos Lapidoth 2017
8PSK
8PSK corresponds to M = 8 and

am = A cos(2πm/8),   bm = A sin(2πm/8),   m = 1, . . . , 8.

(Figure: the eight constellation points (a1 , b1 ), . . . , (a8 , b8 ) equally spaced on a circle of radius A.)

c
Lecture 11, Amos Lapidoth 2017
The “Nearest-Neighbor” Decoding Rule
Since M is uniform, the MAP picks an element of

argmax_{m′∈M} fY (1) ,Y (2) |M =m′ (y (1) , y (2) )
 = argmax_{m′∈M} (1/(2πσ²)) exp( −( (y (1) − am′ )² + (y (2) − bm′ )² )/(2σ²) )
 = argmax_{m′∈M} exp( −( (y (1) − am′ )² + (y (2) − bm′ )² )/(2σ²) )
 = argmax_{m′∈M} ( −( (y (1) − am′ )² + (y (2) − bm′ )² )/(2σ²) )
 = argmin_{m′∈M} ( (y (1) − am′ )² + (y (2) − bm′ )² )
 = argmin_{m′∈M} ky − sm′ k.

c
Lecture 11, Amos Lapidoth 2017
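A minimal nearest-neighbor decoder for 8PSK in Python (the constellation indexing follows the slides; A = 1 is an illustrative choice):

    import numpy as np

    A, M = 1.0, 8
    const = np.array([[A * np.cos(2 * np.pi * m / M),
                       A * np.sin(2 * np.pi * m / M)] for m in range(1, M + 1)])

    def nearest_neighbor(y):
        # Return the ML guess in {1, ..., 8} for an observation y in R^2.
        return 1 + int(np.argmin(np.linalg.norm(const - np.asarray(y), axis=1)))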
Nearest-Neighbor Decoding for 8PSK
(Figure: the nearest-neighbor decision regions for 8PSK; the wedge around (a1 , b1 ) is shaded.)
Observations in the shaded region lead the ML to guess “M = 1.”

c
Lecture 11, Amos Lapidoth 2017
Error Analysis for 8PSK

• By symmetry, it suffices to study pMAP (error|M = 4).


• Conditional on M = 4,

( Y (1) , Y (2) )ᵀ = (−A, 0)ᵀ + ( Z (1) , Z (2) )ᵀ,

where Z (1) , Z (2) are IID N(0, σ²):

fZ (1) ,Z (2) (z (1) , z (2) ) = (1/(2πσ²)) exp( −( (z (1) )² + (z (2) )² )/(2σ²) ).
• The contour lines of fY (1) ,Y (2) |M =4 (·) are circles centered
around the conditional mean (a4 , b4 ) = (−A, 0).
• We need to integrate this density over the complement of the
decoding region of 4.

c
Lecture 11, Amos Lapidoth 2017
(Figure: circular contour lines of the density fY (1) ,Y (2) |M =4 (·), centered at (a4 , b4 ) = (−A, 0); the shaded region corresponds to guessing “M = 4”.)

c
Lecture 11, Amos Lapidoth 2017
The Union-of-Events Bound
• The probability of the union of two disjoint events is the sum
of their probabilities.
• Given two not necessarily disjoint events V and W,
V ∪ W = W ∪ (V \ W),
so
Pr(V ∪ W) = Pr(W) + Pr(V \ W).
• To study Pr(V \ W), note that
V = (V \ W) ∪ (V ∩ W).
so
Pr(V \ W) = Pr(V) − Pr(V ∩ W).
Hence,
Pr(V ∪ W) = Pr(V) + Pr(W) − Pr(V ∩ W)
≤ Pr(V) + Pr(W).
c
Lecture 11, Amos Lapidoth 2017
The Union-of-Events Bound for Finite Collections of Events
If V1 , V2 , . . . , is a finite (or countably-infinite) collection of events,
then

Pr( ∪j Vj ) ≤ Σj Pr(Vj ).

The proof in the finite case is by induction:


Pr( ∪_{j=1}^n Vj ) = Pr( V1 ∪ ( ∪_{j=2}^n Vj ) )
                  ≤ Pr(V1 ) + Pr( ∪_{j=2}^n Vj )
                  ≤ Pr(V1 ) + Σ_{j=2}^n Pr(Vj )
                  = Σ_{j=1}^n Pr(Vj ).
c
Lecture 11, Amos Lapidoth 2017
Applications to Hypothesis Testing
Define for every m′ ≠ m the set Bm,m′ ⊂ Rd by

Bm,m′ = { y ∈ Rd : πm′ fY|M=m′ (y) ≥ πm fY|M=m (y) }.

Note:
y ∈ Bm,m′ does not imply that the MAP rule will guess m′.
(A third hypothesis might be a posteriori even more likely.)
y ∈ Bm,m′ does not imply that the MAP will not guess m.
(There could be a tie resolved in m’s favor.)

If m was not guessed by the MAP rule, then some m′ ≠ m must have had an a posteriori probability that is at least as high as that of m:

( m was not guessed ) =⇒ ( Y ∈ ∪_{m′≠m} Bm,m′ ).

c
Lecture 11, Amos Lapidoth 2017
( m was not guessed ) =⇒ ( Y ∈ ∪_{m′≠m} Bm,m′ )

implies that

Pr[ m was not guessed ] ≤ Pr[ Y ∈ ∪_{m′≠m} Bm,m′ ].

pMAP (error|M = m) ≤ Pr[ Y ∈ ∪_{m′≠m} Bm,m′ | M = m ]
                  = Pr[ ∪_{m′≠m} { ω ∈ Ω : Y(ω) ∈ Bm,m′ } | M = m ]
                  ≤ Σ_{m′≠m} Pr[ { ω ∈ Ω : Y(ω) ∈ Bm,m′ } | M = m ]
                  = Σ_{m′≠m} Pr[ Y ∈ Bm,m′ | M = m ]
                  = Σ_{m′≠m} ∫_{Bm,m′} fY|M=m (y) dy.
c
Lecture 11, Amos Lapidoth 2017
The Union-of-Events Bound in Hypothesis Testing
pMAP (error|M = m)
 ≤ Σ_{m′≠m} Pr[ Y ∈ Bm,m′ | M = m ]
 = Σ_{m′≠m} ∫_{Bm,m′} fY|M=m (y) dy
 = Σ_{m′≠m} Pr[ πm′ fY|M=m′ (Y) ≥ πm fY|M=m (Y) | M = m ],

where

Bm,m′ = { y ∈ Rd : πm′ fY|M=m′ (y) ≥ πm fY|M=m (y) }.

If ties occur with probability zero, then Pr[Y ∈ Bm,m′ | M = m] is the conditional probability of error of the MAP rule for

fY|M=m (·) vs. fY|M=m′ (·) with prior ( πm /(πm + πm′ ), πm′ /(πm + πm′ ) ).
c
Lecture 11, Amos Lapidoth 2017
The Union Bound for 8-PSK—pMAP (error|M = 4)
(Figure: the half-planes B4,3 and B4,5 around the constellation points 3, 4, 5, and their union B4,3 ∪ B4,5 .)

Here Bm,m′ comprises the vectors that are at least as close to (am′ , bm′ ) as to (am , bm ):

{ y ∈ R² : (y (1) − am′ )² + (y (2) − bm′ )² ≤ (y (1) − am )² + (y (2) − bm )² }.

Given M = 4, an error occurs only if Y is at least as close to


(a3 , b3 ) as to (a4 , b4 ), or if it is at least as close to (a5 , b5 ) as to
(a4 , b4 ), i.e., only if Y ∈ B4,3 ∪ B4,5 .
c
Lecture 11, Amos Lapidoth 2017
The Union Bound for 8-PSK—pMAP (error|M = 4)

(Figure: as above, the regions B4,3 , B4,5 , and their union B4,3 ∪ B4,5 .)

The events Y ∈ B4,5 and Y ∈ B4,3 are not mutually exclusive, but,

pMAP (error|M = 4) ≤ Pr[Y ∈ B4,3 ∪ B4,5 | M = 4]


≤ Pr[Y ∈ B4,3 |M = 4] + Pr[Y ∈ B4,5 | M = 4].

(The first inequality is an equality because, for this problem, the


probability of a tie is zero.)
c
Lecture 11, Amos Lapidoth 2017
(Figure: the regions B4,3 , B4,5 , and B4,3 ∪ B4,5 , as above.)

From our analysis of multi-dimensional binary hypothesis testing,

Pr[ Y ∈ B4,3 | M = 4 ] = Q( √( (a4 − a3 )² + (b4 − b3 )² )/(2σ) ) = Q( (A/σ) sin(π/8) ),
Pr[ Y ∈ B4,5 | M = 4 ] = Q( √( (a4 − a5 )² + (b4 − b5 )² )/(2σ) ) = Q( (A/σ) sin(π/8) ),

pMAP (error|M = 4) ≤ 2 Q( (A/σ) sin(π/8) ).

By symmetry,

p∗ (error) ≤ 2 Q( (A/σ) sin(π/8) ).
c
Lecture 11, Amos Lapidoth 2017
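A quick Monte Carlo sanity check of this union bound, conditioning on M = 4; the values of A, σ, and the sample size are illustrative.

    import numpy as np
    from scipy.special import erfc

    def Q(a):
        return 0.5 * erfc(a / np.sqrt(2.0))

    A, sigma, n = 1.0, 0.25, 200000
    rng = np.random.default_rng(1)
    const = np.array([[A * np.cos(2 * np.pi * m / 8),
                       A * np.sin(2 * np.pi * m / 8)] for m in range(1, 9)])
    y = const[3] + sigma * rng.standard_normal((n, 2))       # M = 4 is row index 3
    guesses = 1 + np.argmin(((y[:, None, :] - const[None, :, :])**2).sum(-1), axis=1)
    print((guesses != 4).mean(), 2 * Q((A / sigma) * np.sin(np.pi / 8)))  # estimate vs. bound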
Multi-Dimensional M-ary Gaussian Hypothesis Testing

• M in M = {1, . . . , M} with nondegenerate prior {πm }.


• The observation Y is a J-dimensional vector.
• Conditional on M = m, the components of Y are independent, with Y (j) ∼ N(sm (j) , σ²), where sm ∈ RJ is deterministic and σ² > 0:

fY|M=m (y) = Π_{j=1}^J (1/√(2πσ²)) exp( −(y (j) − sm (j) )²/(2σ²) ).

c
Lecture 11, Amos Lapidoth 2017
Optimal Guessing Rule
Having observed that Y = y, the MAP rule randomly picks an element from the set

M̃(y) = { m̃ ∈ M : πm̃ fY|M=m̃ (y) = max_{m′∈M} πm′ fY|M=m′ (y) }
      = { m̃ ∈ M : ln( πm̃ fY|M=m̃ (y) ) = max_{m′∈M} ln( πm′ fY|M=m′ (y) ) }.

Here

ln( πm fY|M=m (y) ) = ln πm − (J/2) ln(2πσ²) − (1/(2σ²)) Σ_{j=1}^J (y (j) − sm (j) )².

The term −(J/2) ln(2πσ²) is common to all m, so

M̃(y) = argmax_{m̃∈M} { ln πm̃ − Σ_{j=1}^J (y (j) − sm̃ (j) )²/(2σ²) }.

c
Lecture 11, Amos Lapidoth 2017
Optimal Rule for a Uniform Prior

When the prior is uniform, the expression for M̃(y) simplifies:

M̃(y) = argmax_{m̃∈M} { −Σ_{j=1}^J (y (j) − sm̃ (j) )²/(2σ²) }
      = argmin_{m̃∈M} Σ_{j=1}^J (y (j) − sm̃ (j) )²
      = argmin_{m̃∈M} ky − sm̃ k²
      = argmin_{m̃∈M} ky − sm̃ k,   M uniform.

This is the “nearest-neighbor rule”. No need to know σ 2 .

c
Lecture 11, Amos Lapidoth 2017
Uniform Prior and Equi-Norm Vectors
A further simplification arises when

ks1 k = ks2 k = · · · = ksM k.

In this case the nearest-neighbor rule coincides with the “highest correlation rule”

M̃(y) = argmax_{m̃∈M} Σ_{j=1}^J y (j) sm̃ (j) .

Indeed, starting from the nearest-neighbor rule, we note

ky − sm̃ k² = Σ_{j=1}^J (y (j) − sm̃ (j) )²
           = Σ_{j=1}^J (y (j) )² − 2 Σ_{j=1}^J y (j) sm̃ (j) + Σ_{j=1}^J (sm̃ (j) )²,

where the first term is (always) common to all m̃, and the last term is common by the equi-norm assumption.
c
Lecture 11, Amos Lapidoth 2017
Ties
If the mean vectors s1 , . . . , sM are distinct,

ksm′ − sm″ k > 0,   m′ ≠ m″,

then the probability of ties is zero. That is,


• the probability of observing a vector y for which # M̃(y) > 1
is zero;
• the probability that the observable Y will be such that the
MAP will require randomization is zero;
• with probability one the observed vector y is such that there
is a unique message of highest a posteriori probability.
To prove this we show that, irrespective of m, for all m′ ≠ m″,

Pr[ score of Message m′ = score of Message m″ | M = m ] = 0.

See Proposition 21.6.2 for the details.


c
Lecture 11, Amos Lapidoth 2017
The Union Bound

pMAP (error|M = m) ≤ Σ_{m′≠m} Pr[ πm′ fY|M=m′ (Y) ≥ πm fY|M=m (Y) | M = m ]
                  = Σ_{m′≠m} Q( ksm − sm′ k/(2σ) + ( σ/ksm − sm′ k ) ln(πm /πm′ ) ).

Thus,

p∗ (error) ≤ Σ_{m∈M} πm Σ_{m′≠m} Q( ksm − sm′ k/(2σ) + ( σ/ksm − sm′ k ) ln(πm /πm′ ) ).

For a uniform prior these simplify to:

c
Lecture 11, Amos Lapidoth 2017
The Union Bound for the Gaussian Problem with a
Uniform Prior

pMAP (error|M = m) ≤ Σ_{m′≠m} Q( ksm − sm′ k/(2σ) ),   M uniform,

p∗ (error) ≤ (1/M) Σ_{m∈M} Σ_{m′≠m} Q( ksm − sm′ k/(2σ) ),   M uniform.

c
Lecture 11, Amos Lapidoth 2017
A Lower Bound
If the score of Message m′ is higher than that of Message m, then the MAP decoder will surely not guess “M = m.”
(Whether it will guess “M = m′ ” depends on the other scores.)
Thus, for each message m′ ≠ m,

pMAP (error|M = m) ≥ Pr[ πm′ fY|M=m′ (Y) > πm fY|M=m (Y) | M = m ]
                  = Q( ksm − sm′ k/(2σ) + ( σ/ksm − sm′ k ) ln(πm /πm′ ) ).

To tighten the bound we maximize over m′:

pMAP (error|M = m) ≥ max_{m′∈M\{m}} Q( ksm − sm′ k/(2σ) + ( σ/ksm − sm′ k ) ln(πm /πm′ ) ),

p∗ (error) ≥ Σ_{m∈M} πm max_{m′∈M\{m}} Q( ksm − sm′ k/(2σ) + ( σ/ksm − sm′ k ) ln(πm /πm′ ) ).

This simplifies for a uniform prior:
c
Lecture 11, Amos Lapidoth 2017
A Lower Bound for the Gaussian Problem with a
Uniform Prior
When the prior is uniform,

pMAP (error|M = m) ≥ max_{m′∈M\{m}} Q( ksm − sm′ k/(2σ) ).

Since Q(·) is strictly decreasing,

pMAP (error|M = m) ≥ Q( min_{m′∈M\{m}} ksm − sm′ k/(2σ) ),   M uniform,

p∗ (error) ≥ (1/M) Σ_{m∈M} Q( min_{m′∈M\{m}} ksm − sm′ k/(2σ) ),   M uniform,

and, since at least one m attains the overall minimum distance,

p∗ (error) ≥ (1/M) Q( min_{m′≠m″} ksm′ − sm″ k/(2σ) ),   M uniform.

The minimum distance!
c
Lecture 11, Amos Lapidoth 2017
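A sketch that evaluates both the union upper bound and the averaged lower bound for an arbitrary constellation under a uniform prior (the interface is mine; S holds one mean vector per row):

    import numpy as np
    from scipy.special import erfc

    def Q(a):
        return 0.5 * erfc(a / np.sqrt(2.0))

    def bounds_uniform(S, sigma):
        # Lower and upper bounds on p*(error) for mean vectors S of shape (M, J).
        D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
        np.fill_diagonal(D, np.inf)                 # exclude m' = m; Q(inf) = 0
        upper = Q(D / (2 * sigma)).sum(axis=1).mean()
        lower = Q(D.min(axis=1) / (2 * sigma)).mean()
        return lower, upper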
Sufficient Statistic—Informal Definition
(Figure 22.1: a black box that, when fed any prior {πm } and T (yobs ) (but not the observation yobs directly), produces a probability vector

( ψ1 ({πm }, T (yobs )), . . . , ψM ({πm }, T (yobs )) )ᵀ

that is equal to

( Pr[M = 1 | Y = yobs ], . . . , Pr[M = M | Y = yobs ] )ᵀ

whenever both Σ_{m∈M} πm fY|M=m (yobs ) > 0 and yobs ∉ Y0 are satisfied.)

c
Lecture 11, Amos Lapidoth 2017
Technicalities

While the black box must always produce a probability vector, we


only require that this vector be the a posteriori distribution of M
given Y = yobs for observations yobs that satisfy
Σ_{m∈M} πm fY|M=m (yobs ) > 0

and that lie outside some prespecified null set Y0 ⊂ Rd .

The exception set Y0 is not allowed to depend on {πm }.

The black box need not indicate whether yobs is in Y0 and/or whether Σ_{m∈M} πm fY|M=m (yobs ) > 0.

c
Lecture 11, Amos Lapidoth 2017
Sufficient Statistic—Formal Definition
T : Rd → Rd′ forms a sufficient statistic for the densities fY|M=1 (·), . . . , fY|M=M (·) on Rd if it is Borel measurable and if for some Y0 ⊂ Rd of Lebesgue measure zero we have that for every prior {πm } there exist M Borel measurable functions from Rd′ to [0, 1],

T (yobs ) ↦ ψm ({πm }, T (yobs )),   m ∈ M,

such that the vector

( ψ1 ({πm }, T (yobs )), . . . , ψM ({πm }, T (yobs )) )ᵀ

is a probability vector and such that this probability vector equals

( Pr[M = 1 | Y = yobs ], . . . , Pr[M = M | Y = yobs ] )ᵀ

whenever both the condition yobs ∉ Y0 and the condition

Σ_{m=1}^M πm fY|M=m (yobs ) > 0
are satisfied.
c
Lecture 11, Amos Lapidoth 2017
Guessing Based on a Sufficient Statistic Is Optimal

The MAP is optimal, and it is computable from T (yobs ).

(Ignoring the technicalities.)

c
Lecture 11, Amos Lapidoth 2017
Sufficiency Implies Pairwise Sufficiency

Pr[M = m | Y = y] , πm fY|M=m (y)/fY (y) if fY (y) > 0, and 1/M otherwise,   m ∈ M, y ∈ Rd ,

and ignore the second case. For a uniform prior

Pr[M = m′ | Y = y] = M⁻¹ fY|M=m′ (y) / Σ_{m∈M} M⁻¹ fY|M=m (y),
Pr[M = m″ | Y = y] = M⁻¹ fY|M=m″ (y) / Σ_{m∈M} M⁻¹ fY|M=m (y).

Dividing,

Pr[M = m′ | Y = y] / Pr[M = m″ | Y = y] = fY|M=m′ (y) / fY|M=m″ (y),

so if the LHS is computable from T (y) then so is the RHS.
c
Lecture 11, Amos Lapidoth 2017
Pairwise Sufficiency Implies Sufficiency
Consider M densities {fY|M=m (·)}m∈M on Rd , and assume that T : Rd → Rd′ forms a sufficient statistic for every pair of densities fY|M=m′ (·), fY|M=m″ (·), where m′ ≠ m″ are both in M. Then T (·) is a sufficient statistic for the M densities {fY|M=m (·)}m∈M .

Pr[M = m | Y = y] = πm fY|M=m (y)/fY (y),   fY (y) > 0, m ∈ M, y ∈ Rd .

Consequently, for any prior,

Pr[M = m | Y = y] = πm fY|M=m (y) / fY (y)
                 = πm fY|M=m (y) / Σ_{m′∈M} πm′ fY|M=m′ (y)
                 = πm / Σ_{m′∈M} πm′ ( fY|M=m′ (y)/fY|M=m (y) ),

and each ratio fY|M=m′ (y)/fY|M=m (y) is computable from T (y) by pairwise sufficiency.

c
Lecture 11, Amos Lapidoth 2017
A Markov Condition

A Borel measurable function T : Rd → Rd′ forms a sufficient statistic for the M densities {fY|M=m (·)}m∈M if, and only if, for any prior {πm },

M −− T (Y) −− Y.

Intuition: Sufficiency is tantamount to the a posteriori distribution of M given Y being computable from T (·).
This is equivalent to the conditional distribution of M given (Y, T (Y)) being the same as given T (Y).

c
Lecture 11, Amos Lapidoth 2017
Simulating the Observables

• The condition

M −− T (Y) −− Y

is equivalent to the distribution of Y given (M, T (Y)) being the same as given T (Y).
• Stated differently, the distribution of Y given T (Y) under fY|M=m does not depend on m.
• If we generate Ỹ from T (Y) according to this conditional law, then Ỹ will be of the same conditional law given M = m as Y.
• We could then feed Ỹ to a decoder that was designed for Y and get the same performance!

c
Lecture 11, Amos Lapidoth 2017
Guessing Based on the Simulated Observables

(Figure 22.2: If T (Y) forms a sufficient statistic for guessing M based on Y, then, even though Y cannot typically be recovered from T (Y), the performance of any given detector based on Y can be achieved based on T (Y) and a local random number generator: using T (yobs ) and local randomness Θ, one produces a Ỹ whose conditional law given M = m is the same as that of Y, for each m ∈ M, and feeds Ỹ to the given detector.)
c
Lecture 11, Amos Lapidoth 2017
The Example Revisited (1)
Let H have a uniform prior. We observe (Y1 , Y2 ). Conditional on H = 0, they are IID N(0, σ0²), whereas conditional on H = 1 they are IID N(0, σ1²), where σ0 > σ1 > 0. Thus,

fY1 ,Y2 |H=0 (y1 , y2 ) = (1/(2πσ0²)) exp( −(y1² + y2²)/(2σ0²) ),   y1 , y2 ∈ R,
fY1 ,Y2 |H=1 (y1 , y2 ) = (1/(2πσ1²)) exp( −(y1² + y2²)/(2σ1²) ),   y1 , y2 ∈ R.

LR(y1 , y2 ) = fY1 ,Y2 |H=0 (y1 , y2 ) / fY1 ,Y2 |H=1 (y1 , y2 )
            = (σ1²/σ0²) exp( (1/2)(1/σ1² − 1/σ0²)(y1² + y2²) ),   y1 , y2 ∈ R.
c
Lecture 11, Amos Lapidoth 2017
The Example Revisited (2)

  
LR(y1 , y2 ) > 1 ⇐⇒ exp( (1/2)(1/σ1² − 1/σ0²)(y1² + y2²) ) > σ0²/σ1²
              ⇐⇒ (1/2)(1/σ1² − 1/σ0²)(y1² + y2²) > ln(σ0²/σ1²)
              ⇐⇒ ((σ0² − σ1²)/(2σ0²σ1²))(y1² + y2²) > ln(σ0²/σ1²)
              ⇐⇒ y1² + y2² > (2σ0²σ1²/(σ0² − σ1²)) ln(σ0²/σ1²).

The ML/MAP compares Y12 + Y22 to a threshold.


To implement it, one need not observe Y1 and Y2 directly; it
suffices to observe
T , Y12 + Y22 .

c
Lecture 11, Amos Lapidoth 2017
The Example Revisited (3)

• Being the result of processing (Y1 , Y2 ) with respect to H, no


guess based on T can outperform an optimal guess based on
(Y1 , Y2 ).
• In this example, even though pre-processing the observations
to produce T = Y12 + Y22 is not reversible, basing one’s
decision on T incurs no loss in optimality.
This is all because LR(y1 , y2 ) is computable from y12 + y22 . In this
sense T = Y12 + Y22 forms a sufficient statistic for guessing H from
(Y1 , Y2 ).

c
Lecture 11, Amos Lapidoth 2017
The Example Revisited—Simulating the Observables

From T (y1 , y2 ) = y1² + y2² we can generate Ỹ using Θ ∼ U ([0, 1)):

Ỹ1 = √(T (Y)) cos(2πΘ),
Ỹ2 = √(T (Y)) sin(2πΘ).

That is, after observing T (yobs ) = t, we generate (Ỹ1 , Ỹ2 ) uniformly over the tuples that are at radius √t from the origin.

c
Lecture 11, Amos Lapidoth 2017
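A two-line Python sketch of this simulator (Θ is drawn locally, as in the slide):

    import numpy as np

    rng = np.random.default_rng(2)

    def simulate_observables(t):
        # Given T = Y1^2 + Y2^2 = t, draw (Y1~, Y2~) uniformly on the circle of radius sqrt(t).
        theta = rng.uniform(0.0, 1.0)
        return np.sqrt(t) * np.cos(2 * np.pi * theta), np.sqrt(t) * np.sin(2 * np.pi * theta)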
Guessing Based on the Simulated Observables

(Figure 22.2, repeated: using T (yobs ) and local randomness Θ, one produces a Ỹ whose conditional law given M = m is the same as that of Y, for each m ∈ M, and feeds Ỹ to the given detector; in this way any detector designed for Y can be run based on T (Y) alone.)
c
Lecture 11, Amos Lapidoth 2017
Guessing whether M Lies in a Given Subset of M

Let K ⊂ M be a nonempty strict subset of M. Let {πm } be a prior under which Pr[M ∈ K] and Pr[M ∉ K] are both positive:

Pr[M ∈ K] = Σ_{m∈K} πm ,   Pr[M ∉ K] = Σ_{m∉K} πm .

The conditional densities of Y given M ∈ K and given M ∉ K are

fY|M∈K (y) = (1/Pr[M ∈ K]) Σ_{m∈K} πm fY|M=m (y),
fY|M∉K (y) = (1/Pr[M ∉ K]) Σ_{m∉K} πm fY|M=m (y).

c
Lecture 11, Amos Lapidoth 2017
Indeed

Pr[Y ∈ A | M ∈ K] = (1/Pr[M ∈ K]) Pr[M ∈ K, Y ∈ A]
                 = (1/Pr[M ∈ K]) Σ_{m∈K} Pr[M = m, Y ∈ A]
                 = (1/Pr[M ∈ K]) Σ_{m∈K} Pr[M = m] Pr[Y ∈ A | M = m]
                 = (1/Pr[M ∈ K]) Σ_{m∈K} πm ∫_A fY|M=m (y) dy
                 = ∫_A ( (1/Pr[M ∈ K]) Σ_{m∈K} πm fY|M=m (y) ) dy,

and the integrand is fY|M∈K (y).

c
Lecture 11, Amos Lapidoth 2017
Sufficiency and Testing whether M is in K
Let T : Rd → Rd′ form a sufficient statistic for the M densities {fY|M=m (·)}m∈M . Then T (·) is also sufficient for fY|M∈K (·), fY|M∉K (·).

fY|M∈K (y) = (1/Pr[M ∈ K]) Σ_{m∈K} πm fY|M=m (y),
fY|M∉K (y) = (1/Pr[M ∉ K]) Σ_{m∉K} πm fY|M=m (y),

so

fY|M∈K (y) / fY|M∉K (y)
 = ( Pr[M ∉ K]/Pr[M ∈ K] ) · Σ_{m∈K} πm fY|M=m (y) / Σ_{m∉K} πm fY|M=m (y)
 = ( Pr[M ∉ K]/Pr[M ∈ K] ) · Σ_{m∈K} πm ( fY|M=m (y)/fY|M=1 (y) ) / Σ_{m∉K} πm ( fY|M=m (y)/fY|M=1 (y) ),

and all the terms are computable from T (y) and M ’s PMF.
c
Lecture 11, Amos Lapidoth 2017
Next Week

The Multivariate Gaussian Distribution (Chapter 23).

Thank you!

c
Lecture 11, Amos Lapidoth 2017
Communication and Detection Theory:
Lecture 12

Amos Lapidoth
ETH Zurich

May 16, 2017

The Multivariate Gaussian Distribution

c
Lecture 12, Amos Lapidoth 2017
Today

• Sufficient Statistics in multi-hypothesis testing


• Gaussian Random Vectors

c
Lecture 12, Amos Lapidoth 2017
The Multivariate Gaussian Distribution (Chapter 23).

c
Lecture 12, Amos Lapidoth 2017
Definitions

• A random vector W is a standard Gaussian if its components are IID N (0, 1).
• A random n-vector X is a centered Gaussian if there exists some deterministic n × m matrix A such that

X =ᴸ AW,

where W is a standard Gaussian m-vector.
• A random n-vector X is Gaussian if there exists some deterministic n × m matrix A and some deterministic vector µ ∈ Rn such that

X =ᴸ AW + µ,

where W is a standard Gaussian m-vector.

(Here =ᴸ denotes equality of law.)

c
Lecture 12, Amos Lapidoth 2017
Orthogonal Matrix—Definition

An n × n real matrix U is orthogonal if

UUT = In .

This condition is equivalent to

UT U = In .

c
Lecture 12, Amos Lapidoth 2017
Writing U in terms of its columns,

U = ( ψ1 · · · ψn ),

the condition Uᵀ U = In is

In = ( ψ1ᵀ ; · · · ; ψnᵀ ) ( ψ1 · · · ψn )
   = [ ψ1ᵀψ1  ψ1ᵀψ2  · · ·  ψ1ᵀψn ;
       ψ2ᵀψ1  ψ2ᵀψ2  · · ·  ψ2ᵀψn ;
       · · ·
       ψnᵀψ1  ψnᵀψ2  · · ·  ψnᵀψn ].
c
Lecture 12, Amos Lapidoth 2017
The Columns of an Orthogonal Matrix

A square real matrix is orthogonal iff its columns are orthonormal.

ψνT ψν 0 = I{ν = ν 0 }, ν, ν 0 ∈ {1, . . . , n}.


Using the equivalent condition UUT = In

A square real matrix is orthogonal iff its rows are orthonormal.

c
Lecture 12, Amos Lapidoth 2017
2 × 2 Orthogonal Matrices

The 2 × 2 orthogonal matrices are

[ cos θ  −sin θ ;        [ cos θ   sin θ ;
  sin θ   cos θ ],         sin θ  −cos θ ].

The first corresponds to a rotation by θ and has determinant +1, and the second to a reflection followed by a rotation,

[ cos θ   sin θ ;    [ cos θ  −sin θ ;   [ 1   0 ;
  sin θ  −cos θ ]  =   sin θ   cos θ ]     0  −1 ],

and has determinant −1.

c
Lecture 12, Amos Lapidoth 2017
Eigenvectors of Symmetric Matrices (1)

• A matrix is symmetric if it equals its transpose.


• The vector ψ is an eigenvector of the matrix A corresponding
to the eigenvalue λ if it is nonzero and

Aψ = λψ.

• If A ∈ Rn×n is symmetric, then A has n (not necessarily


distinct) real eigenvalues λ1 , . . . , λn ∈ R with corresponding
orthonormal eigenvectors ψ1 , . . . , ψn ∈ Rn

ψνT ψν 0 = I{ν = ν 0 }, ν, ν 0 ∈ {1, . . . , n}.

c
Lecture 12, Amos Lapidoth 2017
Eigenvectors of Symmetric Matrices (2)

   
A ( ψ1 · · · ψn ) = ( Aψ1 · · · Aψn )

and

( ψ1 · · · ψn ) diag(λ1 , . . . , λn ) = ( λ1 ψ1 · · · λn ψn ).

c
Lecture 12, Amos Lapidoth 2017
Eigenvectors of Symmetric Matrices (3)
The eigen-vectors/values relation is thus AU = UΛ, where

U = ( ψ1 · · · ψn )   and   Λ = diag(λ1 , . . . , λn ).

Thus,

A = UΛU⁻¹.

The orthonormality of the eigenvectors is equivalent to U being orthogonal. So, alternatively,

A = UΛUᵀ.

c
Lecture 12, Amos Lapidoth 2017
Spectral Decomposition Theorem for Real Symmetric
Matrices

If A ∈ Rn×n is symmetric, then

A = UΛUT ,

where Λ ∈ Rn×n is a diagonal matrix whose diagonal elements are


the eigenvalues of A, and where U ∈ Rn×n is an orthogonal matrix
whose ν-th column is an eigenvector of A corresponding to the
eigenvalue in the ν-th position on the diagonal of Λ.

c
Lecture 12, Amos Lapidoth 2017
Positive Semidefinite Matrices
• K ∈ Rn×n is positive semidefinite or nonnegative definite, K ⪰ 0, if K is symmetric and

αᵀKα ≥ 0,   α ∈ Rn .

• K ∈ Rn×n is positive definite, K ≻ 0, if K is symmetric and

αᵀKα > 0,   α ≠ 0, α ∈ Rn .

c
Lecture 12, Amos Lapidoth 2017
Characterizing Positive Semidefinite Matrices

Let K be a real n × n matrix. Then the statement that K is positive


semidefinite is equivalent to each of the following statements:
1. K can be written in the form

K = ST S

for some S ∈ Rn×n .


2. K is symmetric and all its eigenvalues are nonnegative.
3. K can be expressed as

K = UΛUT ,

where Λ ∈ Rn×n is diagonal with nonnegative entries on the


diagonal and where U ∈ Rn×n is orthogonal.

c
Lecture 12, Amos Lapidoth 2017
Characterizing Positive Definite Matrices

Let K be a real n × n matrix. Then the statement that K is


positive definite is equivalent to each of the following statements.
1. K = ST S for some nonsingular S ∈ Rn×n .
2. K is symmetric and all its eigenvalues are positive.
3. K can be expressed as

K = UΛUT ,

where Λ ∈ Rn×n is diagonal with positive diagonal entries and


where U ∈ Rn×n is orthogonal.

c
Lecture 12, Amos Lapidoth 2017
Finding S Satisfying K = ST S
Given K ⪰ 0, how can we find a matrix S satisfying K = SᵀS?
There are many. E.g., find matrices U and Λ as above satisfying

K = UΛUᵀ.

Define

Λ^(1/2) = diag( √λ1 , . . . , √λn ).

Now choose

S = Λ^(1/2) Uᵀ.

Indeed, with this definition of S we have

SᵀS = ( Λ^(1/2) Uᵀ )ᵀ Λ^(1/2) Uᵀ = U Λ^(1/2) Λ^(1/2) Uᵀ = UΛUᵀ = K.
c
Lecture 12, Amos Lapidoth 2017
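A short numerical sketch of this construction using numpy's symmetric eigendecomposition (numpy.linalg.eigh); the clipping of tiny negative eigenvalues is a numerical safeguard, not part of the math.

    import numpy as np

    def sqrt_factor(K):
        # Return S with S.T @ S = K for a positive semidefinite K, via K = U Lambda U^T.
        lam, U = np.linalg.eigh(K)
        lam = np.clip(lam, 0.0, None)          # guard against round-off
        return np.diag(np.sqrt(lam)) @ U.T     # S = Lambda^(1/2) U^T

    K = np.array([[2.0, 1.0], [1.0, 2.0]])
    S = sqrt_factor(K)
    assert np.allclose(S.T @ S, K)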
Random Vectors

• A random n-vector over the probability space (Ω, F, P ) is a


(measurable) mapping from Ω to Rn .
• It can be viewed as an array of n random variables.
• Its density is the joint density of its components.
An n × m random matrix H is an n × m array of random variables
defined over a common probability space.

c
Lecture 12, Amos Lapidoth 2017
Expectations

The expectation E[X] of a random n-vector X = ( X (1) , . . . , X (n) )ᵀ is a vector whose components are the expectations of the corresponding components of X:

E[X] , ( E[X (1) ], . . . , E[X (n) ] )ᵀ.

The j-th element of E[X] is thus the expectation of the j-th component of X, namely, E[X (j) ]. Similarly, the expectation of a random matrix is the matrix of expectations.

c
Lecture 12, Amos Lapidoth 2017
The Covariance Matrix
The n × n covariance matrix KXX of the random n-vector X is

KXX , E[ (X − E[X]) (X − E[X])ᵀ ]

     = [ Var[X (1) ]          Cov[X (1) , X (2) ]   · · ·  Cov[X (1) , X (n) ] ;
         Cov[X (2) , X (1) ]  Var[X (2) ]           · · ·  Cov[X (2) , X (n) ] ;
         · · ·
         Cov[X (n) , X (1) ]  Cov[X (n) , X (2) ]   · · ·  Var[X (n) ] ].

c
Lecture 12, Amos Lapidoth 2017
The Covariance Matrix of a Subset of the Components

The r × r covariance matrix of

( X (j1 ) , X (j2 ) , . . . , X (jr ) )ᵀ,

where 1 ≤ j1 < j2 < · · · < jr ≤ n, is obtained from KXX by picking Rows and Columns j1 , . . . , jr . For example, if

KXX = [ 30 31  9  7 ;
        31 39 11 13 ;
         9 11  9 12 ;
         7 13 12 26 ],

then the covariance matrix of ( X (2) , X (4) )ᵀ is [ 39 13 ; 13 26 ].

c
Lecture 12, Amos Lapidoth 2017
Mean and Covariance under Linear Transformation
If H is a random matrix and A, B are deterministic matrices, then

E[AH] = A E[H],   E[HB] = E[H] B.

The transpose operation commutes with expectation:

E[Hᵀ] = (E[H])ᵀ.

As to the covariance matrix,

( Y = AX ) =⇒ ( KYY = A KXX Aᵀ ).

Indeed,

KYY , E[ (Y − E[Y])(Y − E[Y])ᵀ ]
    = E[ (AX − E[AX])(AX − E[AX])ᵀ ]
    = E[ A(X − E[X]) ( A(X − E[X]) )ᵀ ]
    = E[ A(X − E[X])(X − E[X])ᵀ Aᵀ ]
    = A E[ (X − E[X])(X − E[X])ᵀ ] Aᵀ
    = A KXX Aᵀ.
c
Lecture 12, Amos Lapidoth 2017
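An empirical check of KYY = A KXX Aᵀ with an illustrative A and standard-Gaussian X (so KXX = I):

    import numpy as np

    rng = np.random.default_rng(3)
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, -1.0]])
    X = rng.standard_normal((3, 100000))   # centered, K_XX ~= I_3
    Y = A @ X
    print(np.cov(Y))                       # empirical K_YY
    print(A @ A.T)                         # A K_XX A^T with K_XX = I_3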
Singular Covariance Matrices—Example
Let X be centered with covariance matrix

KXX = [ 3  5  7 ;
        5  9 13 ;
        7 13 19 ].

As we’ll see, because the columns of KXX satisfy

−(3, 5, 7)ᵀ + 2 (5, 9, 13)ᵀ − (7, 13, 19)ᵀ = 0,

it follows that

−X (1) + 2X (2) − X (3) = 0, with probability one.

Consequently, in manipulating X we can pick the two components ( X (2) , X (3) ), which are of nonsingular covariance matrix [ 9 13 ; 13 19 ], and keep track “on the side” of the fact that X (1) is equal, with probability one, to 2X (2) − X (3) .
c
Lecture 12, Amos Lapidoth 2017
Manipulating Singular Covariance Matrices

Let X be a centered random n-vector. Its `-th component X (`) is


a deterministic linear combination of X (`1 ) , . . . , X (`η ) iff the `-th
column of KXX is a linear combination of Columns `1 , . . . , `η .
(Proposition 23.4.1.)

Let X be a centered n-vector. Then:


• KXX is singular iff a component of X is a linear combination
of the other components.
• If Columns `1 , . . . , `d of KXX form a basis for the subspace of
Rn spanned by the columns of KXX , then every component of
X can be written as a linear combination of X (`1 ) , . . . , X (`d ) ,
T
and X (`1 ) , . . . , X (`d ) has a nonsingular covariance matrix.

c
Lecture 12, Amos Lapidoth 2017
The Characteristic Function

If X is a random n-vector, then its characteristic function ΦX (·) is a mapping from Rn to C that maps each vector ϖ = ( ϖ (1) , . . . , ϖ (n) )ᵀ in Rn to ΦX (ϖ), where

ΦX (ϖ) , E[ e^(iϖᵀX) ] = E[ exp( i Σ_{ℓ=1}^n ϖ (ℓ) X (ℓ) ) ],   ϖ ∈ Rn .

If X has the density fX (·), then

ΦX (ϖ) = ∫ · · · ∫ fX (x) e^( i Σ_{ℓ=1}^n ϖ (ℓ) x (ℓ) ) dx (1) · · · dx (n) .

c
Lecture 12, Amos Lapidoth 2017
Random Vectors of Identical Characteristic Functions
Have Identical Laws

Two random n-vectors are of equal distribution iff they have identical characteristic functions:

( X =ᴸ Y ) ⇐⇒ ( ΦX (ϖ) = ΦY (ϖ), ϖ ∈ Rn ).

c
Lecture 12, Amos Lapidoth 2017
Establishing Independence via the Characteristic Function
X and Y are independent iff

E[ e^(i(ϖ1 X+ϖ2 Y )) ] = E[ e^(iϖ1 X) ] E[ e^(iϖ2 Y ) ],   ϖ1 , ϖ2 ∈ R.   (16)

• Independence implies (16), because if X & Y are independent then so are e^(iϖ1 X) & e^(iϖ2 Y ), and hence

E[ e^(i(ϖ1 X+ϖ2 Y )) ] = E[ e^(iϖ1 X) e^(iϖ2 Y ) ] = E[ e^(iϖ1 X) ] E[ e^(iϖ2 Y ) ].

• To prove the reverse, let X ′ and Y ′ be independent with X ′ =ᴸ X and Y ′ =ᴸ Y . The c.f. of (X ′ , Y ′ ) is thus

(ϖ1 , ϖ2 ) ↦ E[ e^(iϖ1 X) ] E[ e^(iϖ2 Y ) ].

If (16) holds, then (X, Y ) and (X ′ , Y ′ ) have identical characteristic functions and hence identical laws. Hence, like (X ′ , Y ′ ), also the vector (X, Y ) has independent components.
c
Lecture 12, Amos Lapidoth 2017
A Standard Gaussian Vector

W is a standard Gaussian if its components are IID N (0, 1):

fW (w) = Π_{ℓ=1}^n (1/√(2π)) exp( −(w (ℓ) )²/2 )
       = (2π)^(−n/2) exp( −(1/2) Σ_{ℓ=1}^n (w (ℓ) )² )
       = (2π)^(−n/2) e^(−kwk²/2),   w ∈ Rn .

A standard Gaussian random variable can be viewed as a standard Gaussian 1-vector. Also,

E[W] = 0 and KWW = In .

c
Lecture 12, Amos Lapidoth 2017
Gaussian Random Vectors
A random n-vector X is Gaussian if for some positive integer m there exists an n × m matrix A; a standard Gaussian random m-vector W; and a deterministic vector µ ∈ Rn such that

X =ᴸ AW + µ.

Computing the expectation and covariance of both sides, we obtain

( X =ᴸ AW + µ and W standard ) =⇒ ( E[X] = µ and KXX = AAᵀ ).

The law of X does not determine A. Not even the number of its columns m. It only determines AAᵀ.
Every positive semidefinite matrix is the covariance matrix of some centered Gaussian random vector.
c
Lecture 12, Amos Lapidoth 2017
Examples and Basic Properties


1. Every N(µ, σ²) random variable, when viewed as a random 1-vector, is Gaussian.
Such a random variable has the same law as σW + µ, where W is a standard univariate Gaussian.
2. Every deterministic vector is a Gaussian vector.
Choose the matrix A as the all-zero matrix 0.
3. If the components of X are independent univariate Gaussians
(not necessarily of equal variance), then X is a Gaussian
vector.
Choose A to be an appropriate diagonal matrix.

c
Lecture 12, Amos Lapidoth 2017
The Definition of Independent Vectors

The random vectors

X = ( X (1) , . . . , X (nx ) )ᵀ   and   Y = ( Y (1) , . . . , Y (ny ) )ᵀ

are independent if, for every choice of ξ1 , . . . , ξnx ∈ R and η1 , . . . , ηny ∈ R,

Pr[ X (1) ≤ ξ1 , . . . , X (nx ) ≤ ξnx , Y (1) ≤ η1 , . . . , Y (ny ) ≤ ηny ]
 = Pr[ X (1) ≤ ξ1 , . . . , X (nx ) ≤ ξnx ] Pr[ Y (1) ≤ η1 , . . . , Y (ny ) ≤ ηny ].

c
Lecture 12, Amos Lapidoth 2017
Stacking Independent Gaussians Yields a Gaussian
Let A1 ∈ Rn1×m1 and µ1 ∈ Rn1 be such that X1 =ᴸ A1 W1 + µ1 , where W1 is a standard Gaussian m1-vector. Similarly, let A2 ∈ Rn2×m2 and µ2 ∈ Rn2 represent the n2-vector X2 . Let W1 & W2 be independent standard Gaussians.

[ A1 0 ; 0 A2 ] ( W1 ; W2 ) + ( µ1 ; µ2 ) = ( A1 W1 + µ1 ; A2 W2 + µ2 ) =ᴸ ( X1 ; X2 ),

where we have used that if X1 & X2 are independent, X1 =ᴸ X1′ , X2 =ᴸ X2′ , and X1′ & X2′ are independent, then

( X1 ; X2 ) =ᴸ ( X1′ ; X2′ ).
c
Lecture 12, Amos Lapidoth 2017
An Affine Transformation of a Gaussian Is a Gaussian

Let X be a Gaussian n-vector. If C ∈ Rν×n and if d ∈ Rν , then


the random ν-vector CX + d is Gaussian.

Indeed, if X =ᴸ AW + µ, then

CX + d =ᴸ C(AW + µ) + d = (CA)W + (Cµ + d),

so CX + d is Gaussian.

c
Lecture 12, Amos Lapidoth 2017
Permuting and Selecting Components
Permuting the components of a Gaussian vector results in a
Gaussian vector. Hence we speak of jointly Gaussian without
specifying the order.
Choose C as a permutation matrix, e.g.,

( X (3) , X (1) , X (2) )ᵀ = [ 0 0 1 ; 1 0 0 ; 0 1 0 ] ( X (1) , X (2) , X (3) )ᵀ.

Constructing a random p-vector from a Gaussian n-vector by picking p of its components (allowing for repetition) yields a Gaussian vector.
Picking is also an affine transformation, e.g.,

( X (3) , X (1) )ᵀ = [ 0 0 1 ; 1 0 0 ] ( X (1) , X (2) , X (3) )ᵀ.
c
Lecture 12, Amos Lapidoth 2017
Every Component of a Gaussian Vector is a Gaussian RV

• Picking a component of a Gaussian vector yields a Gaussian


1-vector.
• We need to show that the sole component of a Gaussian
1-vector is a Gaussian RV.
• Let X be such a 1-vector and let it be represented by the row matrix A and the scalar µ, so

X =ᴸ Σ_{ℓ=1}^m a (1,ℓ) W (ℓ) + µ.

• The RHS is Gaussian because a linear combination of the


independent univariate Gaussians W (1) , . . . , W (m) is Gaussian,
and adding a constant to a Gaussian results in a Gaussian.

c
Lecture 12, Amos Lapidoth 2017
The Mean and Covariance Determine the
Law of a Gaussian

We show that if X is Gaussian of mean µ and covariance KXX , then

ΦX (ϖ) = e^( −ϖᵀKXXϖ/2 + iϖᵀµ ),   ϖ ∈ Rn .

The c.f. is thus fully specified by the mean vector and the covariance matrix of X, and consequently so is the distribution.

c
Lecture 12, Amos Lapidoth 2017
Computing the Characteristic Function of
a Gaussian Vector
• We compute ΦX (·) when X is a Gaussian n-vector.
• We need to compute E[ e^(iϖᵀX) ] for every ϖ ∈ Rn .
• ϖᵀX is a Gaussian 1-vector, whose sole component is thus a Gaussian RV. Its mean is ϖᵀµ and its variance is ϖᵀKXXϖ:

ϖᵀX ∼ N( ϖᵀµ, ϖᵀKXXϖ ),   ϖ ∈ Rn .

• From the c.f. of the univariate Gaussian distribution (with the substitution ϖᵀµ for µ, the substitution ϖᵀKXXϖ for σ², and the substitution 1 for ϖ), we obtain

E[ e^(iϖᵀX) ] = e^( −ϖᵀKXXϖ/2 + iϖᵀµ ),   ϖ ∈ Rn .

c
Lecture 12, Amos Lapidoth 2017
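A Monte Carlo estimate of E[exp(i ϖᵀX)] can be compared against the closed form. A sketch — the matrix A, mean µ, and test frequency ϖ are arbitrary choices of mine:

    import numpy as np

    rng = np.random.default_rng(1)
    A = np.array([[1.0, 0.0], [2.0, 0.5]])
    mu = np.array([1.0, -1.0])
    K = A @ A.T                                    # covariance of X = AW + mu

    X = A @ rng.standard_normal((2, 200_000)) + mu[:, None]
    w = np.array([0.3, -0.7])                      # a test frequency vector

    mc = np.exp(1j * w @ X).mean()                 # Monte Carlo estimate
    cf = np.exp(-0.5 * w @ K @ w + 1j * w @ mu)    # closed-form c.f.
    print(mc, cf)                                  # the two should nearly agree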
There Is only One Gaussian Distribution of Given Mean
and Covariance
• Every positive semidefinite matrix is the covariance matrix of
some centered Gaussian random vector.
• The law of a Gaussian is determined by the mean and
covariance.
For every µ ∈ R^n and every positive semidefinite matrix K ∈ R^{n×n}
there exists one, and only one, Gaussian distribution of mean µ
and covariance matrix K. We denote it

    N(µ, K).

    X ∼ N(µ, K) =⇒ ΦX(ϖ) = exp(−½ ϖᵀKϖ + i ϖᵀµ),   ϖ ∈ R^n.

c
Lecture 12, Amos Lapidoth 2017
Jointly Gaussian Vectors

Two random vectors are said to be jointly Gaussian if the vector


that results when one is stacked on top of the other is Gaussian.

c
Lecture 12, Amos Lapidoth 2017
Independence between Jointly Gaussian Vectors
Suppose that X and Y are jointly Gaussian. Then they are
independent iff they are uncorrelated.
• Independence always implies uncorrelatedness.
• Suppose now that X and Y are centered, jointly Gaussian,
  and uncorrelated. Let X′ and Y′ be independent random
  vectors such that X′ =ᴸ X and Y′ =ᴸ Y.
• (X′; Y′) is Gaussian of covariance [KXX 0; 0 KYY], just like (X; Y)!
• The two are thus centered Gaussians of identical covariances,
  and hence of identical laws.
• Since X′ and Y′ are independent, so must X and Y also be.

c
Lecture 12, Amos Lapidoth 2017
More Generally

If the components of a Gaussian vector are uncorrelated, then they


are independent.

c
Lecture 12, Amos Lapidoth 2017
Pairwise Independence

The RVs X1, . . . , Xn are pairwise independent if for each pair of
distinct indices ν′, ν″ ∈ {1, . . . , n} and all ξν′, ξν″ ∈ R,

    Pr[Xν′ ≤ ξν′, Xν″ ≤ ξν″] = Pr[Xν′ ≤ ξν′] Pr[Xν″ ≤ ξν″].

The RVs X1, . . . , Xn are independent if for all ξ1, . . . , ξn ∈ R,

    Pr[Xj ≤ ξj for all j ∈ {1, . . . , n}] = ∏_{j=1}^{n} Pr[Xj ≤ ξj].

Independence implies pairwise independence, but the two are not


equivalent. However,

c
Lecture 12, Amos Lapidoth 2017
Pairwise Independence of Jointly Gaussians

If the components of a Gaussian random vector are pairwise
independent, then they are independent.

Pairwise independence implies a diagonal covariance matrix.

c
Lecture 12, Amos Lapidoth 2017
The Matrix A Can be Chosen Square
If X is a centered Gaussian n-vector, then there exists a
deterministic square n × n matrix A such that X =ᴸ AW, where
W is a standard Gaussian n-vector.

• Being a covariance matrix, KXX must be positive semidefinite.


• There thus exists some square S ∈ R^{n×n} such that

      KXX = SᵀS.

• Consider now the centered Gaussian SᵀW, where W is a
  standard Gaussian n-vector.
• Its covariance is SᵀS, which equals KXX.
• The Gaussian vectors ST W and X are both centered and have
identical covariance matrices. They are thus of equal law.

c
Lecture 12, Amos Lapidoth 2017
A Canonical Representation of a Centered Gaussian
We can generate any Gaussian by stretching a standard Gaussian
and rotating the result:
Let X be a centered Gaussian n-vector. Then
√ 
λ1 W (1)
L  .. 
X = UΛ1/2 W = U  . ,
√ (n)
λn W

where W is a standard Gaussian n-vector; U ∈ Rn×n is


orthogonal; Λ ∈ Rn×n is diagonal; the diagonal elements of Λ are
the eigenvalues of KXX ; and the j-th column of U is an
eigenvector corresponding to the eigenvalue of KXX that is equal
to the j-th diagonal element of Λ.
Proof: Choose U and Λ as in the spectral representation of KXX;
define S = Λ^{1/2} Uᵀ; and verify that KXX = SᵀS.
c
Lecture 12, Amos Lapidoth 2017
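A sketch of this recipe in NumPy, assuming a given positive semidefinite KXX (the example matrix is mine; numpy.linalg.eigh returns the diagonal of Λ and the matrix U):

    import numpy as np

    K = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]])        # an example covariance matrix

    lam, U = np.linalg.eigh(K)             # K = U diag(lam) U^T
    lam = np.clip(lam, 0.0, None)          # guard against tiny negative eigenvalues

    rng = np.random.default_rng(2)
    W = rng.standard_normal((3, 100_000))  # standard Gaussian n-vectors
    X = U @ (np.sqrt(lam)[:, None] * W)    # X = U Lambda^{1/2} W

    print(np.cov(X))                       # approx. K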
Transforming a Gaussian to a Standard Gaussian


If X ∼ N(µ, σ²) with σ ≠ 0, then (X − µ)/σ is standard. What
about vectors?
Suppose X ∼ N(µ, K), where µ ∈ R^n and K ≻ 0. Let Λ and U be
as in the spectral representation of K. Then

    Λ^{−1/2} Uᵀ (X − µ) ∼ N(0, In),

where Λ−1/2 is the diagonal matrix whose diagonal entries are the
reciprocals of the square roots of the diagonal elements of Λ.

Proof: Being the result of linearly transforming X, the vector is


Gaussian. Now check that its covariance is In .

c
Lecture 12, Amos Lapidoth 2017
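The whitening map Λ^{−1/2} Uᵀ (X − µ) is easy to check numerically. A sketch, assuming K ≻ 0 (X is generated via a Cholesky square root, which has the same law):

    import numpy as np

    K = np.array([[2.0, 1.0], [1.0, 2.0]])
    mu = np.array([3.0, -3.0])

    lam, U = np.linalg.eigh(K)
    rng = np.random.default_rng(3)
    X = np.linalg.cholesky(K) @ rng.standard_normal((2, 100_000)) + mu[:, None]

    Z = np.diag(1.0 / np.sqrt(lam)) @ U.T @ (X - mu[:, None])
    print(np.cov(Z))           # approx. the 2x2 identity matrix
    print(Z.mean(axis=1))      # approx. (0, 0)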
The Density of a Gaussian Vector (1)
If X ∼ N(0, K), then X =ᴸ BW, where

    B = U Λ^{1/2},

and U and Λ are as in the spectral representation. Then

    fX(x) = fW(B⁻¹x) / |det B|.

Since BBᵀ = K,

    |det B| = √(det B · det B)
            = √(det B · det Bᵀ)
            = √(det(BBᵀ))
            = √(det K).

We next use the density of the standard Gaussian and X =ᴸ BW:
c
Lecture 12, Amos Lapidoth 2017

    fX(x) = fW(B⁻¹x) / |det B|
          = exp(−½ (B⁻¹x)ᵀ(B⁻¹x)) / ((2π)^{n/2} |det B|)
          = exp(−½ xᵀ(B⁻¹)ᵀB⁻¹x) / ((2π)^{n/2} |det B|)
          = exp(−½ xᵀ(BBᵀ)⁻¹x) / ((2π)^{n/2} |det B|)
          = exp(−½ xᵀK⁻¹x) / ((2π)^{n/2} |det B|)
          = exp(−½ xᵀK⁻¹x) / ((2π)^{n/2} √(det K)).

Thus,
c
Lecture 12, Amos Lapidoth 2017

    fX(x) = exp(−½ xᵀK⁻¹x) / √((2π)ⁿ det K),   x ∈ R^n.

If X ∼ N(µ, K) where K ≻ 0, then

    fX(x) = exp(−½ (x − µ)ᵀK⁻¹(x − µ)) / √((2π)ⁿ det K),   x ∈ R^n.

c
Lecture 12, Amos Lapidoth 2017
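A quick numerical check of this density formula against scipy.stats.multivariate_normal (a sketch; µ, K, and the evaluation point are arbitrary):

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([1.0, 2.0])
    K = np.array([[2.0, 0.5], [0.5, 1.0]])
    x = np.array([0.3, 2.2])

    d = x - mu
    n = len(mu)
    f = np.exp(-0.5 * d @ np.linalg.inv(K) @ d) / np.sqrt((2 * np.pi) ** n * np.linalg.det(K))

    print(f)
    print(multivariate_normal(mean=mu, cov=K).pdf(x))   # should match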
Linear Functionals of Gaussian Vectors
• A linear functional on Rn is a linear mapping from Rn to R.
• All linear functionals are of the form

      x ↦ αᵀx.

  (The j-th component of α is the result of applying the linear
  functional to the vector ej.)

If X is a Gaussian n-vector and α ∈ R^n, then αᵀX is a Gaussian
RV.

    αᵀX is a random 1-vector (because it is the result of
    linearly transforming X), and all the components of a
    Gaussian vector are Gaussian RVs.

A random vector X is Gaussian iff every linear functional of X has


a univariate Gaussian distribution.

c
Lecture 12, Amos Lapidoth 2017
Proof

• To prove the remaining direction, we compute ΦX(ϖ).
• For every ϖ ∈ R^n the mapping x ↦ ϖᵀx is a linear
  functional. Consequently, our assumption that applying every
  linear functional to X yields a univariate Gaussian
  distribution implies

      ϖᵀX ∼ N(ϖᵀµ, ϖᵀKXX ϖ),   ϖ ∈ R^n.

• Using the c.f. of a univariate Gaussian, we compute

      E[exp(i ϖᵀX)] = exp(−½ ϖᵀKXX ϖ + i ϖᵀµ),   ϖ ∈ R^n.

• The c.f. of X is thus that of a Gaussian, so X is Gaussian.

c
Lecture 12, Amos Lapidoth 2017
Next Week

Continuous-Time Stochastic Processes and White Noise


(Chapter 25).

Thank you!

c
Lecture 12, Amos Lapidoth 2017
Communication and Detection Theory:
Lecture 13

Amos Lapidoth
ETH Zurich

May 23, 2017

Continuous-Time Stochastic Processes


and White Noise

c
Lecture 13, Amos Lapidoth 2017
Today

Continuous-Time Stochastic Processes:


• Definition
• FDDs
• Stationarity
• Gaussian SPs
• Linear functionals of Gaussian SP
• White noise w.r.t. some bandwidth
• Linear functionals of white noise
• Projecting white noise onto a finite-dimensional subspace

c
Lecture 13, Amos Lapidoth 2017
Notation

A continuous-time stochastic process (X(t), t ∈ R) is a family of
random variables that are defined on a common probability space
(Ω, F, P) and that are indexed by the reals:

    X : Ω × R → R,   (ω, t) ↦ X(ω, t).

• If t ∈ R is fixed, then X(t), or ω ↦ X(ω, t), or X(·, t) is the
  time-t sample of (X(t), t ∈ R), or the state at time t.
• If ω ∈ Ω is fixed, then X(ω, ·) or t ↦ X(ω, t) is a trajectory,
  sample-path, path, sample-function, or realization.

    ω ↦ X(ω, t)   time-t sample for a fixed t ∈ R (random variable)
    t ↦ X(ω, t)   trajectory for a fixed ω ∈ Ω (function of time)

c
Lecture 13, Amos Lapidoth 2017
The Finite-Dimensional Distributions

The FDDs of a continuous-time SP (X(t)) are the collection of all
joint distributions of

    (X(t1), . . . , X(tn)),

where
• n can be any positive integer and
• t1, . . . , tn ∈ R are arbitrary epochs.

To specify the FDDs of (X(t)) we must specify for every n ∈ N
and for every choice of the epochs t1, . . . , tn ∈ R the distribution of

    (X(t1), . . . , X(tn)).

c
Lecture 13, Amos Lapidoth 2017
Do the FDDs Tell Us Everything about a SP?

What is the probability that the sample-path is continuous?

    Pr{ω ∈ Ω : X(ω, ·) is continuous} = ?

This cannot be answered based on the FDDs alone!

The σ-algebra generated by (X(t)) is the set of events whose
probability can be computed from the FDDs of (X(t)) using only
the axioms of probability.

c
Lecture 13, Amos Lapidoth 2017
Independent Stochastic Processes

 
(X(t)) and (Y(t)) are independent stochastic processes if for
every n ∈ N and any choice of the epochs t1, . . . , tn ∈ R,

    (X(t1), . . . , X(tn)) and (Y(t1), . . . , Y(tn)) are independent.

c
Lecture 13, Amos Lapidoth 2017
Gaussian SP: Definition


(X(t)) is a Gaussian stochastic process if for every n ∈ N and
every choice of the epochs t1, . . . , tn ∈ R, the random vector
(X(t1), . . . , X(tn))ᵀ is Gaussian.

c
Lecture 13, Amos Lapidoth 2017

The FDDs of Gaussian SPs
If (X(t)) is a centered Gaussian SP, then its FDDs are determined
by the mapping

    (t1, t2) ↦ Cov[X(t1), X(t2)],   t1, t2 ∈ R.

Proof: Since (X(t)) is a Gaussian SP,

    (X(t1), . . . , X(tn))ᵀ

is Gaussian, and its law is specified by its mean (which is zero) and
its covariance matrix, which is determined by the mapping:

    [ Cov[X(t1), X(t1)]   Cov[X(t1), X(t2)]   · · ·   Cov[X(t1), X(tn)] ]
    [         ⋮                    ⋮            ⋱              ⋮        ]
    [ Cov[X(tn), X(t1)]   Cov[X(tn), X(t2)]   · · ·   Cov[X(tn), X(tn)] ]
c
Lecture 13, Amos Lapidoth 2017
Stationary Stochastic Processes

(X(t)) is stationary if all its time shifts have identical FDDs: for
every τ ∈ R, every n ∈ N, and all epochs t1, . . . , tn ∈ R,

    (X(t1 + τ), . . . , X(tn + τ)) =ᴸ (X(t1), . . . , X(tn)).

• Choosing n = 1: if (X(t)) is stationary, then all its samples
  have the same distribution:

      X(t) =ᴸ X(t + τ),   t, τ ∈ R.

• Choosing n = 2: if (X(t)) is stationary, then the joint
  distribution of any two of its samples depends on the elapsed
  time between them and not on the absolute time at which
  they are taken:

      (X(t1), X(t2)) =ᴸ (X(t1 + τ), X(t2 + τ)),   t1, t2, τ ∈ R.

c
Lecture 13, Amos Lapidoth 2017
Wide-Sense Stationary Stochastic Processes

(X(t), t ∈ R) is wide-sense stationary if
1. it is of finite variance;
2. its mean is constant:

       E[X(t)] = E[X(t + τ)],   t, τ ∈ R;

3. and the covariance between its samples satisfies

       Cov[X(t1), X(t2)] = Cov[X(t1 + τ), X(t2 + τ)],   t1, t2, τ ∈ R.

Every finite-variance stationary SP is also wide-sense stationary.
Indeed, if (X(t)) is stationary, then

    X(t) =ᴸ X(t + τ),   t, τ ∈ R,
    (X(t1), X(t2)) =ᴸ (X(t1 + τ), X(t2 + τ)),   t1, t2, τ ∈ R.
c
Lecture 13, Amos Lapidoth 2017
Autocovariance Function


The autocovariance function KXX : R → R of a WSS SP (X(t)) is

    KXX(τ) ≜ Cov[X(t + τ), X(t)]

(which does not depend on t because (X(t)) is WSS).

c
Lecture 13, Amos Lapidoth 2017
Stationary Gaussian Stochastic Processes

A Gaussian SP is stationary iff it is wide-sense stationary.

Proof: Every Gaussian SP is of finite variance, so if it is
additionally stationary, it must also be WSS.

Assume now that (X(t)) is WSS. We'll show

    (X(t1 + τ), . . . , X(tn + τ)) =ᴸ (X(t1), . . . , X(tn)).

Both vectors are Gaussian (because X is a Gaussian SP), so we
only need to show identical means and covariances. The mean
vectors are both (E[X(0)], . . . , E[X(0)])ᵀ (because X is WSS).
As to the covariance matrices:

c
Lecture 13, Amos Lapidoth 2017
The former's covariance is

    [ Cov[X(t1), X(t1)]   · · ·   Cov[X(t1), X(tn)] ]
    [         ⋮             ⋱             ⋮         ]
    [ Cov[X(tn), X(t1)]   · · ·   Cov[X(tn), X(tn)] ]

and the latter's is

    [ Cov[X(t1 + τ), X(t1 + τ)]   · · ·   Cov[X(t1 + τ), X(tn + τ)] ]
    [             ⋮                 ⋱                ⋮              ]
    [ Cov[X(tn + τ), X(t1 + τ)]   · · ·   Cov[X(tn + τ), X(tn + τ)] ]

They are identical by wide-sense stationarity.

c
Lecture 13, Amos Lapidoth 2017
The FDDs of a Stationary Gaussian SP
The FDDs of a centered stationary Gaussian SP are fully specified
by its autocovariance function.

Proof: Since (X(t)) is Gaussian, the vector

    (X(t1), . . . , X(tn))ᵀ

is Gaussian, and its law is thus determined by its mean and covariance.

• Its mean is 0 because (X(t)) is centered.
• The Row-j Column-ℓ entry of its covariance matrix is

      Cov[X(tj), X(tℓ)],

  which is KXX(tℓ − tj) and thus determined by KXX.

c
Lecture 13, Amos Lapidoth 2017
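This is exactly what one exploits to simulate a centered stationary Gaussian SP at finitely many epochs: build the covariance matrix from KXX and draw a Gaussian vector. A sketch with the assumed autocovariance KXX(τ) = e^{−|τ|} (any valid autocovariance would do):

    import numpy as np

    def kxx(tau):
        return np.exp(-np.abs(tau))            # assumed autocovariance

    t = np.linspace(0.0, 5.0, 64)              # epochs t1, ..., tn
    K = kxx(t[None, :] - t[:, None])           # K[j, l] = KXX(t_l - t_j)

    rng = np.random.default_rng(4)
    L = np.linalg.cholesky(K + 1e-10 * np.eye(len(t)))  # small jitter for safety
    path = L @ rng.standard_normal(len(t))     # one draw of (X(t1), ..., X(tn))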
The PSD of a Continuous-Time WSS SP


The WSS SP (X(t)) is of power spectral density (PSD) SXX if
SXX : R → R is nonnegative, symmetric, integrable, and its IFT is
the autocovariance function KXX of (X(t)):

    KXX(τ) = ∫_{−∞}^{∞} SXX(f) e^{i2πfτ} df,   τ ∈ R.

c
Lecture 13, Amos Lapidoth 2017
Remarks on the PSD


If KXX is continuous at the origin and integrable, then X(t) is of
PSD K̂XX (·). (Proposition 25.7.1.)

Every nonnegative, symmetric, integrable function is the PSD of


some stationary Gaussian SP whose autocovariance function is
continuous. (Proposition 25.7.3.)

c
Lecture 13, Amos Lapidoth 2017
The PSD and Operational PSD of a WSS SP


Let (X(t)) be a measurable, centered, WSS SP with continuous
autocovariance function KXX. Let S(·) be a nonnegative,
symmetric, integrable function. Then the following two conditions
are equivalent:
1. KXX is the Inverse Fourier Transform of S(·).
2. For every integrable h : R → R, the power in X ⋆ h is given by

       Power of X ⋆ h = ∫_{−∞}^{∞} S(f) |ĥ(f)|² df.

(Theorem 25.14.3)

c
Lecture 13, Amos Lapidoth 2017
The Average Power

We would like to discuss


    (1/T) ∫_{−T/2}^{T/2} X²(ω, t) dt   or   ω ↦ (1/T) ∫_{−T/2}^{T/2} X²(ω, t) dt,   ω ∈ Ω.

Some technicalities:
• For some ω ∈ Ω the mapping t ↦ X²(ω, t) might be
  ill-behaved.
• The mapping from ω to the result of the integral might not
be measurable.


These difficulties are eliminated if X(t) is measurable.

c
Lecture 13, Amos Lapidoth 2017

Power in a Centered WSS SP
If (X(t)) is a measurable, centered, WSS SP of autocovariance
function KXX, then for all a < b,

    ω ↦ (1/(b − a)) ∫_a^b X²(ω, t) dt

defines a RV (possibly taking on the value +∞) satisfying

    E[(1/(b − a)) ∫_a^b X²(t) dt] = KXX(0).

The power in (X(t)) is thus KXX(0).
Proof: Swapping integration and expectation we obtain

    E[∫_a^b X²(t) dt] = ∫_a^b E[X²(t)] dt
                      = ∫_a^b KXX(0) dt
                      = (b − a) KXX(0).
c
Lecture 13, Amos Lapidoth 2017
Linear Functionals

Let (X(t)) be WSS. We wish to study the RV

    ω ↦ ∫_{−∞}^{∞} X(ω, t) s(t) dt.

We focus on the mean and variance:

    E[∫_{−∞}^{∞} X(t) s(t) dt] = ∫_{−∞}^{∞} E[X(t)] s(t) dt
                               = E[X(0)] ∫_{−∞}^{∞} s(t) dt.

As to the variance:

c
Lecture 13, Amos Lapidoth 2017
Linear Functionals

We first consider the centered case:

    Var[∫ X(t) s(t) dt]
      = E[(∫ X(t) s(t) dt)²]
      = E[∫ X(t) s(t) dt · ∫ X(τ) s(τ) dτ]
      = E[∫∫ X(t) s(t) X(τ) s(τ) dt dτ]
      = ∫∫ s(t) s(τ) E[X(t) X(τ)] dt dτ
      = ∫∫ s(t) KXX(t − τ) s(τ) dt dτ

(all integrals over R).

This can be written in two forms:

c
Lecture 13, Amos Lapidoth 2017
Linear Functionals

    Var[∫ X(t) s(t) dt] = ∫∫ s(t) KXX(t − τ) s(τ) dt dτ
                        = ∫∫ s(σ + τ) KXX(σ) s(τ) dσ dτ
                        = ∫ KXX(σ) (∫ s(σ + τ) s(τ) dτ) dσ
                        = ∫ KXX(σ) Rss(σ) dσ.

Or

    Var[∫ X(t) s(t) dt] = ∫ SXX(f) |ŝ(f)|² df.

c
Lecture 13, Amos Lapidoth 2017
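The two time-domain forms can be checked against each other numerically. A sketch in which both KXX and s are illustrative assumptions (KXX(τ) = e^{−|τ|}, s the indicator of [0, 1]):

    import numpy as np

    dt = 0.01
    t = np.arange(-5.0, 5.0, dt)
    s = ((t >= 0) & (t <= 1)).astype(float)     # s = indicator of [0, 1]
    kxx = lambda tau: np.exp(-np.abs(tau))

    # Double-integral form: sum over (t, tau) of s(t) KXX(t - tau) s(tau).
    v1 = s @ kxx(t[None, :] - t[:, None]) @ s * dt * dt

    # Self-similarity form: integral of KXX(sigma) Rss(sigma).
    rss = np.correlate(s, s, mode="full") * dt  # Rss on a lag grid
    lags = np.arange(-len(t) + 1, len(t)) * dt
    v2 = np.sum(kxx(lags) * rss) * dt

    print(v1, v2)                               # the two should nearly agree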

What if X(t) is WSS but not Centered?

Consider the centered SP (X̃(t)):

    X̃(t) = X(t) − µ,   µ = E[X(t)].

    Var[∫ X(t) s(t) dt] = Var[∫ (X̃(t) + µ) s(t) dt]
                        = Var[∫ X̃(t) s(t) dt + µ ∫ s(t) dt]
                        = Var[∫ X̃(t) s(t) dt]
                        = ∫∫ s(t) K_X̃X̃(t − τ) s(τ) dt dτ
                        = ∫∫ s(t) KXX(t − τ) s(τ) dt dτ.

c
Lecture 13, Amos Lapidoth 2017
Linear Functionals of Gaussian Processes

If (X(t)) is stationary and Gaussian, then

    ∫_{−∞}^{∞} X(t) s(t) dt + Σ_{ν=1}^{n} αν X(tν)   is a Gaussian RV.

Here:
• s : R → R is deterministic and integrable;
• n is an arbitrary nonnegative integer;
• α1 , . . . , αn ∈ R are arbitrary coefficients; and
• the epochs t1 , . . . , tn ∈ R are arbitrary.

The mean and variance determine the distribution!

c
Lecture 13, Amos Lapidoth 2017
Some Intuition

Approximating the integral with a Riemann sum:

    ∫ X(t) s(t) dt + Σ_{ν=1}^{n} αν X(tν) ≈ δ Σ_{k=−K}^{K} s(δk) X(δk) + Σ_{ν=1}^{n} αν X(tν).

The RHS is a linear functional of the vector

    (X(−Kδ), . . . , X(Kδ), X(t1), . . . , X(tn))ᵀ,

which is Gaussian because (X(t)) is a Gaussian SP.
Being a linear functional of a Gaussian vector, the RHS is thus
Gaussian.

c
Lecture 13, Amos Lapidoth 2017
Computing the Mean

    E[∫ X(t) s(t) dt + Σ_{ν=1}^{n} αν X(tν)]
      = E[∫ X(t) s(t) dt] + Σ_{ν=1}^{n} αν E[X(tν)]
      = E[X(0)] (∫ s(t) dt + Σ_{ν=1}^{n} αν).

c
Lecture 13, Amos Lapidoth 2017
The Variance (1)
    Var[∫ X(t) s(t) dt + Σ_{ν=1}^{n} αν X(tν)] = Var[∫ X(t) s(t) dt]
      + Var[Σ_{ν=1}^{n} αν X(tν)] + 2 Σ_{ν=1}^{n} αν Cov[∫ X(t) s(t) dt, X(tν)].

We already saw

    Var[∫ X(t) s(t) dt] = ∫ KXX(σ) Rss(σ) dσ.

The bilinearity of the covariance yields

    Var[Σ_{ν=1}^{n} αν X(tν)] = Σ_{ν=1}^{n} Σ_{ν′=1}^{n} αν αν′ KXX(tν − tν′).

c
Lecture 13, Amos Lapidoth 2017
The Variance (2)
It remains to compute the covariance:

    E[X(tν) ∫ X(t) s(t) dt] = E[∫ X(t) X(tν) s(t) dt]
                            = ∫ s(t) E[X(t) X(tν)] dt
                            = ∫ s(t) KXX(t − tν) dt.

Combining all the terms we obtain

    Var[∫ X(t) s(t) dt + Σ_{ν=1}^{n} αν X(tν)] = ∫ KXX(σ) Rss(σ) dσ
      + Σ_{ν=1}^{n} Σ_{ν′=1}^{n} αν αν′ KXX(tν − tν′) + 2 Σ_{ν=1}^{n} αν ∫ s(t) KXX(t − tν) dt.

c
Lecture 13, Amos Lapidoth 2017
Linear Functionals of a Gaussian SP Are Jointly Gaussian
The m linear functionals

    ∫ X(t) s1(t) dt + Σ_{ν=1}^{n1} α1,ν X(t1,ν),   . . . ,
    ∫ X(t) sm(t) dt + Σ_{ν=1}^{nm} αm,ν X(tm,ν)

of a measurable, stationary, Gaussian SP (X(t)) are jointly
Gaussian.
Here:
• m ∈ N is the number of functionals;
• the m functions s1, . . . , sm are integrable;
• the coefficients {αj,ν} and the epochs {tj,ν} are deterministic
  real numbers for all j ∈ {1, . . . , m} and all ν ∈ {1, . . . , nj}.

All we need is the mean vector and covariance matrix.


c
Lecture 13, Amos Lapidoth 2017
Proof
We'll show that any linear combination of these m RVs is Gaussian:
For any choice of γ1, . . . , γm ∈ R the linear combination

    γ1 (∫ X(t) s1(t) dt + Σ_{ν=1}^{n1} α1,ν X(t1,ν)) + · · ·
      + γm (∫ X(t) sm(t) dt + Σ_{ν=1}^{nm} αm,ν X(tm,ν))

can also be written as a linear functional of (X(t)):

    ∫ X(t) (Σ_{j=1}^{m} γj sj(t)) dt + Σ_{j=1}^{m} Σ_{ν=1}^{nj} γj αj,ν X(tj,ν),

and is thus Gaussian.


c
Lecture 13, Amos Lapidoth 2017
Computing the Covariance Matrix (1)

"Z nj Z nk
#
∞ X ∞ X
Cov X(t) sj (t) dt + αj,ν X(tj,ν ), X(t) sk (t) dt + αk,ν 0 X(tk,ν 0 )
−∞ ν=1 −∞ ν 0 =1
Z ∞ Z ∞ 
= Cov X(t) sj (t) dt, X(t) sk (t) dt
−∞ −∞
nj
X  Z ∞ 
+ αj,ν Cov X(tj,ν ), X(t) sk (t) dt
ν=1 −∞
Xnk  Z ∞ 
+ αk,ν 0 Cov X(tk,ν 0 ), X(t) sj (t) dt
ν 0 =1 −∞
nj nk
X X  
+ αj,ν αk,ν 0 Cov X(tj,ν ), X(tk,ν 0 ) , j, k ∈ {1, . . . , m}.
ν=1 ν 0 =1

We have seen all the terms except the first:


c
Lecture 13, Amos Lapidoth 2017
Computing the Covariance Matrix (2)
    Cov[∫ X(t) sj(t) dt, ∫ X(t) sk(t) dt]
      = E[∫ X(t) sj(t) dt · ∫ X(τ) sk(τ) dτ]
      = E[∫∫ X(t) sj(t) X(τ) sk(τ) dt dτ]
      = ∫∫ E[X(t) X(τ)] sj(t) sk(τ) dt dτ
      = ∫∫ KXX(t − τ) sj(t) sk(τ) dt dτ
      = ∫ KXX(σ) (∫ sj(t) sk(t − σ) dt) dσ
      = ∫ KXX(σ) (∫ sj(t) ~sk(σ − t) dt) dσ
      = ∫ KXX(σ) (sj ⋆ ~sk)(σ) dσ,

where ~sk denotes the mirror image of sk, i.e., t ↦ sk(−t).
c
Lecture 13, Amos Lapidoth 2017
Computing the Covariance Matrix (3)
    Cov[∫ X(t) sj(t) dt, ∫ X(t) sk(t) dt] = ∫ KXX(σ) (sj ⋆ ~sk)(σ) dσ.

If (X(t)) is of PSD SXX, then we can rewrite this as

    Cov[∫ X(t) sj(t) dt, ∫ X(t) sk(t) dt] = ∫ SXX(f) ŝj(f) ŝk*(f) df,

because the FT of sj ⋆ ~sk is the product of the FT of sj and the
FT of ~sk, and because the FT of ~sk is f ↦ ŝk(−f), which,
because sk is real, is also given by f ↦ ŝk*(f).
(The covariances are summarized in Theorem 25.12.2.)
(The covariances are summarized in Theorem 25.12.2.)
c
Lecture 13, Amos Lapidoth 2017

White Noise

(N(t)) is white Gaussian noise of double-sided spectral density
N0/2 with respect to the bandwidth W if (N(t)) is a measurable,
stationary, centered, Gaussian SP that has a PSD SNN satisfying

    SNN(f) = N0/2,   f ∈ [−W, W].

[Figure: SNN(f) equals N0/2 over the band −W ≤ f ≤ W.]

c
Lecture 13, Amos Lapidoth 2017
Key Properties of White Gaussian Noise (1)
• If s is any integrable function that is bandlimited to W Hz, then

      ∫_{−∞}^{∞} N(t) s(t) dt ∼ N(0, (N0/2) ‖s‖2²).

• If s1, . . . , sm are integrable functions that are bandlimited to
  W Hz, then the m random variables

      ∫ N(t) s1(t) dt,   . . . ,   ∫ N(t) sm(t) dt

  are jointly Gaussian centered random variables of covariance matrix

      (N0/2) [ ⟨s1, s1⟩  ⟨s1, s2⟩  · · ·  ⟨s1, sm⟩ ]
             [ ⟨s2, s1⟩  ⟨s2, s2⟩  · · ·  ⟨s2, sm⟩ ]
             [    ⋮         ⋮       ⋱       ⋮     ]
             [ ⟨sm, s1⟩  ⟨sm, s2⟩  · · ·  ⟨sm, sm⟩ ].

c
Lecture 13, Amos Lapidoth 2017
Key Properties of White Gaussian Noise (2)
• If φ1, . . . , φm are integrable functions that are bandlimited to
  W Hz and are orthonormal, then

      (∫ N(t) φ1(t) dt, . . . , ∫ N(t) φm(t) dt) ∼ IID N(0, N0/2).

• If s is any integrable function that is bandlimited to W Hz,

      KNN ⋆ s = (N0/2) s.

• If s is an integrable function that is bandlimited to W Hz,
  then for every epoch t ∈ R,

      Cov[∫ N(σ) s(σ) dσ, N(t)] = (N0/2) s(t).

c
Lecture 13, Amos Lapidoth 2017
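The second property can be verified numerically by convolving a concrete KNN with a bandlimited s. A sketch, assuming the PSD is exactly N0/2 on [−W, W] and zero elsewhere — so that KNN(τ) = N0 W sinc(2Wτ) — and taking s(t) = sinc(2Bt) with B < W:

    import numpy as np

    N0, W, B = 2.0, 4.0, 1.0
    dt = 0.01
    t = np.arange(-30.0, 30.0 + dt, dt)           # odd number of samples, centered

    knn = N0 * W * np.sinc(2 * W * t)             # K_NN for a brick-wall PSD
    s = np.sinc(2 * B * t)                        # bandlimited to B < W Hz

    conv = np.convolve(knn, s, mode="same") * dt  # (K_NN * s)(t) on the grid
    mid = np.abs(t) < 5.0                         # stay clear of truncation edges
    print(np.max(np.abs(conv[mid] - (N0 / 2) * s[mid])))   # nearly zero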
Proof (1)

    Cov[∫ N(t) sj(t) dt, ∫ N(t) sk(t) dt]
      = ∫_{−∞}^{∞} SNN(f) ŝj(f) ŝk*(f) df
      = ∫_{−W}^{W} SNN(f) ŝj(f) ŝk*(f) df
      = (N0/2) ∫_{−W}^{W} ŝj(f) ŝk*(f) df
      = (N0/2) ⟨sj, sk⟩,   j, k ∈ {1, . . . , m}.

c
Lecture 13, Amos Lapidoth 2017
Proof (2)
    (KNN ⋆ s)(t) = ∫ s(τ) KNN(t − τ) dτ
                 = ∫ s(τ) ∫ SNN(f) e^{i2πf(t−τ)} df dτ
                 = ∫ SNN(f) e^{i2πft} ∫ s(τ) e^{−i2πfτ} dτ df
                 = ∫ SNN(f) ŝ(f) e^{i2πft} df
                 = ∫_{−W}^{W} SNN(f) ŝ(f) e^{i2πft} df
                 = (N0/2) ∫_{−W}^{W} ŝ(f) e^{i2πft} df
                 = (N0/2) s(t),   t ∈ R.
c
Lecture 13, Amos Lapidoth 2017
Proof (3)

    Cov[∫ N(σ) s(σ) dσ, N(t)] = ∫ SNN(f) ŝ(f) e^{i2πft} df
                              = ∫_{−W}^{W} SNN(f) ŝ(f) e^{i2πft} df
                              = (N0/2) ∫_{−W}^{W} ŝ(f) e^{i2πft} df
                              = (N0/2) s(t),   t ∈ R.

c
Lecture 13, Amos Lapidoth 2017
Projecting a SP
If X is a measurable WSS SP, and if φ1, . . . , φd ∈ L1 ∩ L2 are
orthonormal, then the projection of X onto span(φ1, . . . , φd) is
the SP

    (ω, t) ↦ Σ_{ℓ=1}^{d} ⟨X, φℓ⟩(ω) φℓ(t),

i.e.,

    Σ_{ℓ=1}^{d} ⟨X, φℓ⟩ φℓ.

For a given ω ∈ Ω, it is the projection of the sample-path X(ω, ·):

    Σ_{ℓ=1}^{d} (∫_{−∞}^{∞} X(ω, t) φℓ(t) dt) φℓ = Σ_{ℓ=1}^{d} ⟨t ↦ X(ω, t), φℓ⟩ φℓ.

c
Lecture 13, Amos Lapidoth 2017
Projecting White Noise


Let (N(t)) be white Gaussian noise of power spectral density N0/2
w.r.t. the bandwidth W, and let φ1, . . . , φd be orthonormal
integrable signals that are bandlimited to W Hz. Then

    Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ   and   N − Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ

are independent Gaussian stochastic processes.

c
Lecture 13, Amos Lapidoth 2017
A Small Detour (1)
Suppose

    N = g(N) + h(N),                                        (17a)

where

    g(N) and h(N) are independent.                          (17b)

Then

    N =ᴸ G + H                                              (18a)

whenever

    G =ᴸ g(N),   H =ᴸ h(N),   G and H are independent.      (18b)

One way to generate such G and H is to generate N and set
G = g(N) and H = h(N).

But here is another way.

c
Lecture 13, Amos Lapidoth 2017
A Small Detour (2)

• Generate N′ of the same law as N but independently of it.
• Set G = g(N) and H = h(N′).
In this case too

    N =ᴸ G + H.

Indeed,
• G and H are independent because N and N′ are independent.
• G =ᴸ g(N) and H =ᴸ h(N) because N′ =ᴸ N.

c
Lecture 13, Amos Lapidoth 2017
Simulating White Noise of a Given Projection

Let N be white Gaussian noise of double-sided power spectral
density N0/2 w.r.t. the bandwidth W. Let N′ be of the same law
as N but independent of it. Let φ1, . . . , φd be orthonormal
integrable signals that are bandlimited to W Hz. Then

    Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ + (N′ − Σ_{ℓ=1}^{d} ⟨N′, φℓ⟩ φℓ)

is a measurable SP of the same FDDs as N.

c
Lecture 13, Amos Lapidoth 2017
Projecting White Noise


Let (N(t)) be white Gaussian noise of power spectral density N0/2
w.r.t. the bandwidth W, and let φ1, . . . , φd be orthonormal
integrable signals that are bandlimited to W Hz. Then

    Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ   and   N − Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ

are independent Gaussian stochastic processes.

c
Lecture 13, Amos Lapidoth 2017
Define

    N1 ≜ Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ,   N2 ≜ N − Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ,

i.e.,

    N1(t) ≜ Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ(t),   N2(t) ≜ N(t) − Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ(t).

We need to show that for every n ∈ N and epochs t1, . . . , tn ∈ R,

    (N1(t1), . . . , N1(tn))ᵀ and (N2(t1), . . . , N2(tn))ᵀ

are independent Gaussian vectors. They are jointly Gaussian
because they are linear functionals of a Gaussian SP:

    N1(tν) = ⟨N, Σ_{ℓ=1}^{d} φℓ(tν) φℓ⟩ = ∫_{−∞}^{∞} N(t) Σ_{ℓ=1}^{d} φℓ(tν) φℓ(t) dt,

    N2(tν′) = ∫_{−∞}^{∞} N(t) (−Σ_{ℓ=1}^{d} φℓ(tν′) φℓ(t)) dt + N(tν′).

It thus remains to establish that they are uncorrelated.
c
Lecture 13, Amos Lapidoth 2017

Because (N(t)) is centered, N1(tν) and N2(tν′) are centered and

    Cov[N1(tν), N2(tν′)] = E[N1(tν) N2(tν′)].

It thus remains to establish

    E[N1(tν) N2(tν′)] = 0,   ν, ν′ ∈ {1, . . . , n}.

More generally,

    E[N1(t) N2(t′)]
      = E[(Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ(t)) (N(t′) − Σ_{ℓ′=1}^{d} ⟨N, φℓ′⟩ φℓ′(t′))]
      = Σ_{ℓ=1}^{d} φℓ(t) E[⟨N, φℓ⟩ N(t′)] − Σ_{ℓ=1}^{d} Σ_{ℓ′=1}^{d} φℓ(t) φℓ′(t′) E[⟨N, φℓ⟩⟨N, φℓ′⟩]
      = Σ_{ℓ=1}^{d} φℓ(t) φℓ(t′) (N0/2) − Σ_{ℓ=1}^{d} φℓ(t) φℓ(t′) (N0/2)
      = 0,   t, t′ ∈ R.
c
Lecture 13, Amos Lapidoth 2017
Next Week

Detection in White Noise (Chapter 26).

Thank you!

c
Lecture 13, Amos Lapidoth 2017
Communication and Detection Theory:
Lecture 14

Amos Lapidoth
ETH Zurich

May 30, 2017

Detection in White Noise

c
Lecture 14, Amos Lapidoth 2017
Signals in White Noise
The “message” M takes value in M = {1, . . . , M}, with prior

πm = Pr[M = m], m ∈ M.

The “observation” Y (t) is a continuous-time SP.
Conditional on M = m,

Y (t) = sm (t) + N (t), t ∈ R.

• The “mean signals” s1 , . . . , sM are real, deterministic,


integrable signals that are bandlimited to W Hz.

• The “noise” N (t) is independent of M and is white
Gaussian noise of double-sided spectral density N0 /2 w.r.t.
the bandwidth W.

Based on Y (t) we wish to guess M with the smallest possible
probability of error.
c
Lecture 14, Amos Lapidoth 2017
A Technicality

We only consider guessing rules whose performance is determined


by the FDDs.

(I.e., that
 are measurable w.r.t. the σ-algebra generated by
Y (t) .)

c
Lecture 14, Amos Lapidoth 2017
From a SP to a Random Vector

If (φ1, . . . , φd) is an orthonormal basis for span(s1, . . . , sM), then
to every decision rule based on Y (that is measurable w.r.t. the
σ-algebra generated by Y) there corresponds a (randomized)
decision rule based on

    T ≜ (⟨Y, φ1⟩, . . . , ⟨Y, φd⟩)ᵀ

of identical performance. Consequently,

no measurable decision rule based on Y can outperform an
optimal rule based on T.
c
Lecture 14, Amos Lapidoth 2017
Computing the Inner Products

[Figure: Y(t) is passed through a bank of matched filters
~φ1, . . . , ~φd; each filter output is sampled at t = 0 to produce
⟨Y, φ1⟩, . . . , ⟨Y, φd⟩, which feed the decision rule that outputs the
guess.]

c
Lecture 14, Amos Lapidoth 2017
More Generally
• span(φ1 , . . . , φd ) need not equal span(s1 , . . . , sM ): it suffices
that
span(φ1 , . . . , φd ) ⊇ span(s1 , . . . , sM ).
• The same holds for any vector S provided that T is
computable from S.
• If s̃1 , . . . , s̃n are integrable signals that are bandlimited to W
Hz and
span(s1 , . . . , sM ) ⊆ span(s̃1 , . . . , s̃n ),
  then it is optimal to base our guess on

      (⟨Y, s̃1⟩, . . . , ⟨Y, s̃n⟩)ᵀ.

  Indeed, T is computable from this vector because

      φℓ = Σ_{j=1}^{n} αℓ,j s̃j  =⇒  ⟨Y, φℓ⟩ = Σ_{j=1}^{n} αℓ,j ⟨Y, s̃j⟩.

c
Lecture 14, Amos Lapidoth 2017
The Conditional Law of T

Given M = m, what is the conditional law of

    T ≜ (⟨Y, φ1⟩, . . . , ⟨Y, φd⟩)ᵀ?

Given M = m, we have Y = sm + N, so

    T = (⟨sm, φ1⟩, . . . , ⟨sm, φd⟩)ᵀ + (⟨N, φ1⟩, . . . , ⟨N, φd⟩)ᵀ.

And since N is white w.r.t. W, and (φ1, . . . , φd) are orthonormal
and bandlimited to W Hz,

    (⟨N, φ1⟩, . . . , ⟨N, φd⟩) ∼ IID N(0, N0/2).

c
Lecture 14, Amos Lapidoth 2017
Key Properties of White Gaussian Noise (2)
• If φ1, . . . , φm are integrable functions that are bandlimited to
  W Hz and are orthonormal, then

      (∫ N(t) φ1(t) dt, . . . , ∫ N(t) φm(t) dt) ∼ IID N(0, N0/2).

• If s is any integrable function that is bandlimited to W Hz,

      KNN ⋆ s = (N0/2) s.

• If s is an integrable function that is bandlimited to W Hz,
  then for every epoch t ∈ R,

      Cov[∫ N(σ) s(σ) dσ, N(t)] = (N0/2) s(t).
c
Lecture 14, Amos Lapidoth 2017
Reduction to the Multi-Dimensional Multi-Hypothesis
Gaussian Problem

                                     Before                        Now
observed vector                      Y                             T
number of components of
the observed vector                  J                             d
variance of noise added to
each component                       σ²                            N0/2
number of hypotheses                 M                             M
conditional mean of
observation given M = m              (sm^(1), . . . , sm^(J))ᵀ     (⟨sm, φ1⟩, . . . , ⟨sm, φd⟩)ᵀ
sum of squared components
of mean vector                       Σ_{j=1}^{J} (sm^(j))²         Σ_{ℓ=1}^{d} ⟨sm, φℓ⟩² = ∫ sm²(t) dt

c
Lecture 14, Amos Lapidoth 2017
Optimal Rule Based on T

Picking uniformly at random from

    argmax_{m′∈M} { ln πm′ − (1/N0) Σ_{ℓ=1}^{d} (⟨Y, φℓ⟩ − ⟨sm′, φℓ⟩)² }

minimizes the probability of a guessing error.

If the prior is uniform:

c
Lecture 14, Amos Lapidoth 2017
Optimal Rule Based on T—Uniform Prior

If M is uniform, then this rule does not depend on the value of
N0. It picks uniformly at random from

    argmin_{m′∈M} Σ_{ℓ=1}^{d} (⟨Y, φℓ⟩ − ⟨sm′, φℓ⟩)².

c
Lecture 14, Amos Lapidoth 2017
What if s1 , . . . , sM all Have the same Energy?
Since (φ1, . . . , φd) is orthonormal,

    sm = Σ_{ℓ=1}^{d} ⟨sm, φℓ⟩ φℓ,   m ∈ M,

and

    ‖sm‖2² = Σ_{ℓ=1}^{d} ⟨sm, φℓ⟩²,   m ∈ M.

Consequently,

    ‖s1‖2 = ‖s2‖2 = · · · = ‖sM‖2
      =⇒ Σ_{ℓ=1}^{d} ⟨s1, φℓ⟩² = Σ_{ℓ=1}^{d} ⟨s2, φℓ⟩² = · · · = Σ_{ℓ=1}^{d} ⟨sM, φℓ⟩².

In this case the Euclidean norms of all mean vectors are equal!
c
Lecture 14, Amos Lapidoth 2017
Optimal Rule for a Uniform Prior and Equi-Energy Mean
Signals

If M has a uniform distribution and, in addition, the mean signals
are of equal energy, i.e.,

    ‖s1‖2 = ‖s2‖2 = · · · = ‖sM‖2,

then it is optimal to use the maximum-correlation rule

    argmax_{m′∈M} Σ_{ℓ=1}^{d} ⟨sm′, φℓ⟩⟨Y, φℓ⟩.

c
Lecture 14, Amos Lapidoth 2017
The Decision Rule without Reference to a Basis
    ln πm′ − (1/N0) Σ_{ℓ=1}^{d} (⟨Y, φℓ⟩ − ⟨sm′, φℓ⟩)²

can be expressed by opening the square as

    ln πm′ − (1/N0) Σ_{ℓ=1}^{d} ⟨Y, φℓ⟩² + (2/N0) Σ_{ℓ=1}^{d} ⟨Y, φℓ⟩⟨sm′, φℓ⟩
           − (1/N0) Σ_{ℓ=1}^{d} ⟨sm′, φℓ⟩².

The term −(1/N0) Σℓ ⟨Y, φℓ⟩² does not depend on m′, so we choose
at random a message in

    argmax_{m′∈M} { ln πm′ + (2/N0) Σ_{ℓ=1}^{d} ⟨Y, φℓ⟩⟨sm′, φℓ⟩ − (1/N0) Σ_{ℓ=1}^{d} ⟨sm′, φℓ⟩² },

where the middle sum equals ⟨Y, sm′⟩ and the last sum equals ‖sm′‖2².

c
Lecture 14, Amos Lapidoth 2017
    Σ_{ℓ=1}^{d} ⟨Y, φℓ⟩⟨sm, φℓ⟩ = ⟨Y, Σ_{ℓ=1}^{d} ⟨sm, φℓ⟩ φℓ⟩ = ⟨Y, sm⟩,   m ∈ M.

c
Lecture 14, Amos Lapidoth 2017
Optimal Rule

Pick at random an element of

    argmax_{m′∈M} { ln πm′ + (2/N0) (∫ Y(t) sm′(t) dt − ½ ∫ sm′²(t) dt) }.

For a uniform prior:

    argmax_{m′∈M} { ∫ Y(t) sm′(t) dt − ½ ∫ sm′²(t) dt }.

For a uniform prior and equi-energy mean signals:

    argmax_{m′∈M} ∫ Y(t) sm′(t) dt.

c
Lecture 14, Amos Lapidoth 2017
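A discrete-time sketch of this receiver (Riemann-sum inner products; the signals, prior, and noise scaling below are all illustrative choices, not from the lecture):

    import numpy as np

    rng = np.random.default_rng(5)
    dt = 0.01
    t = np.arange(0.0, 1.0, dt)

    S = np.stack([np.sin(2 * np.pi * 1 * t),      # mean signals s_1, s_2, s_3
                  np.sin(2 * np.pi * 2 * t),
                  np.sin(2 * np.pi * 3 * t)])
    prior = np.array([0.5, 0.25, 0.25])
    N0 = 0.5

    m = 1                                         # true message (0-indexed)
    # Discrete-time surrogate for white noise: variance (N0/2)/dt per sample,
    # so that Riemann-sum inner products have variance (N0/2)||s||^2.
    y = S[m] + rng.standard_normal(len(t)) * np.sqrt(N0 / (2 * dt))

    corr = (S @ y) * dt                           # <Y, s_m'> for every m'
    energy = (S * S).sum(axis=1) * dt             # ||s_m'||^2
    metric = np.log(prior) + (2 / N0) * (corr - energy / 2)
    print(np.argmax(metric))                      # the guess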
Performance Analysis
As in the Multi-Dimensional Gaussian Multi-Hypothesis Problem!

                                     Before                        Now
observed vector                      Y                             T
number of components of
the observed vector                  J                             d
variance of noise added to
each component                       σ²                            N0/2
number of hypotheses                 M                             M
conditional mean of
observation given M = m              (sm^(1), . . . , sm^(J))ᵀ     (⟨sm, φ1⟩, . . . , ⟨sm, φd⟩)ᵀ
sum of squared components
of mean vector                       Σ_{j=1}^{J} (sm^(j))²         Σ_{ℓ=1}^{d} ⟨sm, φℓ⟩² = ∫ sm²(t) dt

And note

    Σ_{ℓ=1}^{d} (⟨sm, φℓ⟩ − ⟨sm′, φℓ⟩)² = ‖sm − sm′‖2².

c
Lecture 14, Amos Lapidoth 2017
    pMAP(error|M = m) ≤ Σ_{m′≠m} Q( ‖sm − sm′‖2/√(2N0) + (√(N0/2)/‖sm − sm′‖2) ln(πm/πm′) )

    pMAP(error|M = m) ≤ Σ_{m′≠m} Q( √(‖sm − sm′‖2²/(2N0)) ),   M uniform

    pMAP(error|M = m) ≥ max_{m′≠m} Q( ‖sm − sm′‖2/√(2N0) + (√(N0/2)/‖sm − sm′‖2) ln(πm/πm′) )

    pMAP(error|M = m) ≥ max_{m′≠m} Q( √(‖sm − sm′‖2²/(2N0)) ),   M uniform

c
Lecture 14, Amos Lapidoth 2017
Antipodal Signaling
Consider the binary case with a uniform prior:

    s0 = −s1 = s,

where s is a nonzero integrable signal that is bandlimited to W Hz, and

    Es ≜ ‖s‖2².

Here span(s0, s1) is one-dimensional and is spanned by the
unit-norm signal

    φ = s/‖s‖2.

We guess based on

    T = ⟨Y, φ⟩.

Conditional on H = 0, we have T ∼ N(√Es, N0/2), whereas,
conditional on H = 1, we have T ∼ N(−√Es, N0/2).
How to guess H based on T we have already seen, with

    A = √Es,   σ² = N0/2.
c
Lecture 14, Amos Lapidoth 2017
[Figure: the conditional densities fY|H=1(y) and fY|H=0(y), centered
at −A and A; the region y < 0 is "Guess H = 1" and the region
y ≥ 0 is "Guess H = 0"; the tail of fY|H=0 to the left of 0 gives
pMAP(error|H = 0).]
c
Lecture 14, Amos Lapidoth 2017
Antipodal Signaling
It is optimal to guess "H = 0" if T ≥ 0 and to guess "H = 1" if
T < 0. That is,

    Guess "H = 0" if ∫_{−∞}^{∞} Y(t) s(t) dt ≥ 0.

Substituting

    √Es = A,   N0/2 = σ²

we obtain

    p*(error) = Q(√(2Es/N0)).

The distance is ‖s − (−s)‖2, i.e., 2‖s‖2. Half the distance is ‖s‖2,
i.e., √Es. Measured in standard deviations it is √Es/√(N0/2).
c
Lecture 14, Amos Lapidoth 2017
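A quick Monte Carlo check of p*(error) = Q(√(2Es/N0)), working directly with the sufficient statistic T ∼ N(±√Es, N0/2) — a sketch; the parameter values are arbitrary, and scipy's norm.sf plays the role of the Q-function:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(6)
    Es, N0, trials = 1.0, 0.5, 1_000_000

    bits = rng.integers(0, 2, trials)                  # H in {0, 1}
    mean = np.where(bits == 0, np.sqrt(Es), -np.sqrt(Es))
    T = mean + rng.standard_normal(trials) * np.sqrt(N0 / 2)
    guesses = (T < 0).astype(int)                      # guess "H = 1" iff T < 0

    print((guesses != bits).mean())                    # simulated error rate
    print(norm.sf(np.sqrt(2 * Es / N0)))               # Q(sqrt(2 Es / N0))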
General Binary Signaling (1)
Assume a uniform prior and mean signals s0 and s1.
We could find an orthonormal basis for span(s0, s1).
Instead, we subtract (s0 + s1)/2 from Y, so

    Ỹ(t) = Y(t) − ½(s0(t) + s1(t)),   t ∈ R.

Since Y can be recovered from Ỹ, we can guess based on Ỹ.
Conditional on H = 0,

    Ỹ = Y − (s0 + s1)/2 = s0 + N − (s0 + s1)/2 = (s0 − s1)/2 + N.

Conditional on H = 1,

    Ỹ = Y − (s0 + s1)/2 = s1 + N − (s0 + s1)/2 = −(s0 − s1)/2 + N.
c
Lecture 14, Amos Lapidoth 2017
General Binary Signaling (2)

Thus, the guessing problem given (Ỹ(t)) is the antipodal signaling
problem with

    s ≜ (s0 − s1)/2.

An optimal decision rule is to guess "H = 0" if
∫ Ỹ(t) (s0(t) − s1(t))/2 dt is nonnegative, i.e.,

    Guess "H = 0" if ∫ (Y(t) − (s0(t) + s1(t))/2) (s0(t) − s1(t))/2 dt ≥ 0.

    p*(error) = Q(√(‖s0 − s1‖2²/(2N0))).

Half the distance is ‖s0 − s1‖2/2, which in standard deviations is

    (‖s0 − s1‖2/2) / √(N0/2).
c
Lecture 14, Amos Lapidoth 2017
M-ary Orthogonal Keying

Suppose M is uniform, and the mean signals are orthogonal and of
equal energy Es > 0:

    ⟨sm′, sm″⟩ = Es I{m′ = m″},   m′, m″ ∈ M.

Since M is uniform, and since the mean signals are of equal
energy, the "max-correlation" rule is optimal:

    Guess "m" if ⟨Y, sm⟩ = max_{m′∈M} ⟨Y, sm′⟩,

with ties resolved by picking any message achieving the maximum.

c
Lecture 14, Amos Lapidoth 2017
The Probability of Error

Define

    T^(ℓ) = ∫_{−∞}^{∞} Y(t) (sℓ(t)/√Es) dt,   ℓ ∈ {1, . . . , M}.

We guess "M = m" if T^(m) = max_{m′∈M} T^(m′), with ties being
resolved at random among the components of T that are maximal.

The mean signals are distinct, and hence the probability of a tie is
zero, and

    pMAP(error|M = m)
      = Pr[max{T^(1), . . . , T^(m−1), T^(m+1), . . . , T^(M)} > T^(m) | M = m].

c
Lecture 14, Amos Lapidoth 2017
The Conditional Law of T

Conditional on M = m,
• the components of T are independent,
• with the m-th being N(√Es, N0/2), and
• the other components N(0, N0/2).

pMAP(error|M = m) is the probability that at least one of M − 1
IID N(0, N0/2) random variables exceeds the value of a
N(√Es, N0/2) random variable that is independent of them.
Consequently,

    pMAP(error|M = m) = pMAP(error|M = 1),   m ∈ M.

c
Lecture 14, Amos Lapidoth 2017
pMAP(error|M = 1)
  = Pr[max{T^(2), . . . , T^(M)} > T^(1) | M = 1]
  = 1 − Pr[max{T^(2), . . . , T^(M)} ≤ T^(1) | M = 1]
  = 1 − ∫ f_{T^(1)|M=1}(t) Pr[max{T^(2), . . . , T^(M)} ≤ t | M = 1, T^(1) = t] dt
  = 1 − ∫ f_{T^(1)|M=1}(t) Pr[max{T^(2), . . . , T^(M)} ≤ t | M = 1] dt
  = 1 − ∫ f_{T^(1)|M=1}(t) Pr[T^(2) ≤ t, . . . , T^(M) ≤ t | M = 1] dt
  = 1 − ∫ f_{T^(1)|M=1}(t) (Pr[T^(2) ≤ t | M = 1])^{M−1} dt
  = 1 − ∫ f_{T^(1)|M=1}(t) (1 − Q(t/√(N0/2)))^{M−1} dt
  = 1 − ∫ (1/√(πN0)) e^{−(t−√Es)²/N0} (1 − Q(t/√(N0/2)))^{M−1} dt
  = 1 − ∫ (1/√(2π)) e^{−τ²/2} (1 − Q(τ + √(2Es/N0)))^{M−1} dτ.
c
Lecture 14, Amos Lapidoth 2017
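The final integral is easy to evaluate numerically, and a Monte Carlo run over the (normalized) statistics T^(ℓ) agrees with it. A sketch — M and Es/N0 are arbitrary, and scipy's norm.sf is the Q-function:

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    M, EsN0 = 8, 4.0

    def integrand(tau):
        return norm.pdf(tau) * (1 - norm.sf(tau + np.sqrt(2 * EsN0))) ** (M - 1)

    p_err = 1 - quad(integrand, -10, 10)[0]

    # Monte Carlo: after dividing by sqrt(N0/2), T^(1) ~ N(sqrt(2 Es/N0), 1)
    # and the other components are IID N(0, 1).
    rng = np.random.default_rng(7)
    T = rng.standard_normal((200_000, M))
    T[:, 0] += np.sqrt(2 * EsN0)
    mc = (T[:, 1:].max(axis=1) > T[:, 0]).mean()

    print(p_err, mc)          # the two estimates should nearly agree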
Zero-Mean Signals for Additive-Noise Channels

[Figure: baseline system: TX1 maps the data {Dj} to X and sends it
over the additive-noise channel Y = X + N to RX1, which produces
the estimates {Dj est}. Modified system: TX2 subtracts c from X
before transmission and RX2 adds c back, so the channel input is
X − c while the decision circuitry again sees X + N.]
How should we choose c(·)?


c
Lecture 14, Amos Lapidoth 2017
Subtracting the Mean (1)
 
    E[(W − c)²] ≥ Var[W],   c ∈ R,

with equality iff

    c = E[W].

Indeed,

    E[(W − c)²]
      = E[((W − E[W]) + (E[W] − c))²]
      = E[(W − E[W])²] + 2 E[W − E[W]] (E[W] − c) + (E[W] − c)²
      = E[(W − E[W])²] + (E[W] − c)²          (the cross term is zero)
      ≥ E[(W − E[W])²]
      = Var[W],

with equality iff c = E[W]. (Huygens–Steiner)


c
Lecture 14, Amos Lapidoth 2017
Subtracting the Mean (2)

To minimize

    (1/2T) ∫_{−T}^{T} E[(X(t) − c(t))²] dt,

we minimize the integrand, i.e., we choose c(t) to minimize

    E[(X(t) − c(t))²],

and thus choose

    c(t) = E[X(t)],   t ∈ R.

The transmitted signal X − c is then centered!

c
Lecture 14, Amos Lapidoth 2017
The M-ary Simplex
Start from Orthogonal Keying and subtract the mean.
Let φ1, . . . , φM be orthonormal. Let φ̄ be their "center of gravity":

    φ̄ = (1/M) Σ_{m∈M} φm.

The M-ary Simplex is a scaled version of

    φ1 − φ̄, . . . , φM − φ̄.

[Figure: for M = 2, the orthonormal pair φ1, φ2 and their midpoint
φ̄; subtracting φ̄ yields two antipodal vectors.]
c
Lecture 14, Amos Lapidoth 2017
Constructing the 3-Ary Simplex

c
Lecture 14, Amos Lapidoth 2017
The Simplex: Inner Products and Energies
φ1, . . . , φM are orthonormal and

    φ̄ = (1/M) Σ_{m∈M} φm.

Consequently,

    ⟨φm′ − φ̄, φm″ − φ̄⟩
      = ⟨φm′, φm″⟩ − ⟨φm′, φ̄⟩ − ⟨φm″, φ̄⟩ + ⟨φ̄, φ̄⟩
      = I{m′ = m″} − (1/M)⟨φm′, Σm φm⟩ − (1/M)⟨φm″, Σm φm⟩ + (1/M²)‖Σm φm‖2²
      = I{m′ = m″} − 1/M − 1/M + M/M²
      = I{m′ = m″} − 1/M,   m′, m″ ∈ M.
c
Lecture 14, Amos Lapidoth 2017
Normalization
Since

    ‖φm − φ̄‖2² = 1 − 1/M = (M − 1)/M,

we define the energy-Es M-ary simplex constellation as

    sm = √(Es M/(M − 1)) (φm − φ̄),   m ∈ M,

with the result that

    ‖sm‖2² = Es and ⟨sm′, sm″⟩ = −Es/(M − 1),   m′ ≠ m″.

{sm} can be viewed as the result of subtracting the center of
gravity from orthogonal signals of energy Es M/(M − 1):

    sm = √(Es M/(M − 1)) φm − √(Es M/(M − 1)) φ̄,   m ∈ M.
c
Lecture 14, Amos Lapidoth 2017
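A sketch verifying these inner products numerically, starting for concreteness from the standard basis of R^M as the orthonormal "signals" (any orthonormal family would do):

    import numpy as np

    M, Es = 4, 2.0
    phi = np.eye(M)                                  # orthonormal rows
    phibar = phi.mean(axis=0)                        # center of gravity

    S = np.sqrt(Es * M / (M - 1)) * (phi - phibar)   # simplex constellation
    G = S @ S.T                                      # Gram matrix of {s_m}

    print(np.diag(G))        # all approx. Es
    print(G[0, 1])           # approx. -Es / (M - 1)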
The Probability of Error for the Simplex

Since {sm} can be viewed as the result of subtracting the center of
gravity from orthogonal signals of energy Es M/(M − 1), p*(error)
for the energy-Es simplex is the same as for orthogonal keying with
energy Es M/(M − 1):

    p*(error) = 1 − (1/√(2π)) ∫_{−∞}^{∞} e^{−τ²/2} (1 − Q(τ + √((M/(M − 1)) · 2Es/N0)))^{M−1} dτ.

c
Lecture 14, Amos Lapidoth 2017
From the Simplex to Orthogonal Keying
If ψ is of unit energy and orthogonal to {s1, . . . , sM}, then

    { sm + √(Es/(M − 1)) ψ },   m ∈ M,

are orthogonal, each of energy Es M/(M − 1).

[Figure: for M = 2, adding √(Es/(M − 1)) ψ, with ψ orthogonal to
the antipodal simplex signals s1, s2, yields two orthogonal signals.]
c
Lecture 14, Amos Lapidoth 2017
Decoding the Simplex

To decode, add √(Es/(M − 1)) ψ to Y and use a decoder for
orthogonal keying.

c
Lecture 14, Amos Lapidoth 2017
Bi-Orthogonal Keying

The 2κ mean signals are

    sν,u = +√Es φν and sν,d = −√Es φν,   ν ∈ {1, . . . , κ},

where (φ1, . . . , φκ) are orthonormal.

    ‖sν,u‖2 = ‖sν,d‖2 = √Es,   ν ∈ {1, . . . , κ}.
c
Lecture 14, Amos Lapidoth 2017
Optimal Guessing Rule
Since the prior is uniform and the mean signals of equal energy, we
should pick the message corresponding to the largest of

    ⟨Y, s1,u⟩, ⟨Y, s1,d⟩, . . . , ⟨Y, sκ,u⟩, ⟨Y, sκ,d⟩.

Since sν,u = −sν,d,

    max{⟨Y, sν,u⟩, ⟨Y, sν,d⟩} = |⟨Y, sν,u⟩| = √Es |⟨Y, φν⟩|,   ν ∈ {1, . . . , κ}.

We can also compare in pairs and then compare the κ results:

    max{⟨Y, s1,u⟩, ⟨Y, s1,d⟩, . . . , ⟨Y, sκ,u⟩, ⟨Y, sκ,d⟩}
      = max{max{⟨Y, s1,u⟩, ⟨Y, s1,d⟩}, . . . , max{⟨Y, sκ,u⟩, ⟨Y, sκ,d⟩}}.

Find which ν* in {1, . . . , κ} attains

    max_{ν∈{1,...,κ}} |⟨Y, φν⟩|

and then guess "sν*,u" if ⟨Y, φν*⟩ > 0 and guess "sν*,d"
otherwise.
c
Lecture 14, Amos Lapidoth 2017
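A sketch of this two-stage detector operating directly on the vector of inner products (⟨Y, φ1⟩, . . . , ⟨Y, φκ⟩), simulated under one transmitted signal (parameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(8)
    kappa, Es, N0 = 4, 1.0, 0.25

    nu_true, sign_true = 2, +1                  # transmitted signal: s_{nu,u}, nu=2 (0-indexed)
    T = rng.standard_normal(kappa) * np.sqrt(N0 / 2)   # <Y, phi_nu> under noise only
    T[nu_true] += sign_true * np.sqrt(Es)       # add the mean of the sent signal

    nu_star = np.argmax(np.abs(T))              # stage 1: largest |<Y, phi_nu>|
    guess = (nu_star, "u" if T[nu_star] > 0 else "d")  # stage 2: the sign
    print(guess)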
The Probability of Error
pMAP(correct|s1,u) (ties occur with probability zero)
  = Pr[−⟨Y, φ1⟩ ≤ ⟨Y, φ1⟩ and max_{2≤ν≤κ} |⟨Y, φν⟩| ≤ ⟨Y, φ1⟩ | s1,u]
  = Pr[⟨Y, φ1⟩ ≥ 0 and max_{2≤ν≤κ} |⟨Y, φν⟩| ≤ ⟨Y, φ1⟩ | s1,u]
  = ∫_0^∞ f_{⟨Y,φ1⟩|s1,u}(t) Pr[max_{2≤ν≤κ} |⟨Y, φν⟩| ≤ t | s1,u, ⟨Y, φ1⟩ = t] dt
  = ∫_0^∞ f_{⟨Y,φ1⟩|s1,u}(t) Pr[max_{2≤ν≤κ} |⟨Y, φν⟩| ≤ t | s1,u] dt
  = ∫_0^∞ f_{⟨Y,φ1⟩|s1,u}(t) (Pr[|⟨Y, φ2⟩| ≤ t | s1,u])^{κ−1} dt
  = ∫_0^∞ (1/√(πN0)) e^{−(t−√Es)²/N0} (1 − 2Q(t/√(N0/2)))^{κ−1} dt
  = ∫_{−√(2Es/N0)}^{∞} (2π)^{−1/2} e^{−τ²/2} (1 − 2Q(τ + √(2Es/N0)))^{κ−1} dτ.
c
Lecture 14, Amos Lapidoth 2017
Justification for the Red Terms

Conditional on s1,u being sent, ⟨Y, φ2⟩ ∼ N(0, N0/2), so

    Pr[|⟨Y, φ2⟩| ≤ t | s1,u]
      = Pr[|⟨Y, φ2⟩|/√(N0/2) ≤ t/√(N0/2) | s1,u]
      = 1 − Pr[|⟨Y, φ2⟩|/√(N0/2) ≥ t/√(N0/2) | s1,u]
      = 1 − Pr[⟨Y, φ2⟩/√(N0/2) ≥ t/√(N0/2) | s1,u]
          − Pr[⟨Y, φ2⟩/√(N0/2) ≤ −t/√(N0/2) | s1,u]
      = 1 − 2Q(t/√(N0/2)).

c
Lecture 14, Amos Lapidoth 2017
From a SP to a Random Vector

If (φ1, . . . , φd) is an orthonormal basis for span(s1, . . . , sM), then
to every decision rule based on Y (that is measurable w.r.t. the
σ-algebra generated by Y) there corresponds a randomized
decision rule based on

    T ≜ (⟨Y, φ1⟩, . . . , ⟨Y, φd⟩)ᵀ

of identical performance.

Consequently, no measurable decision rule based on Y can
outperform an optimal rule based on T.
c
Lecture 14, Amos Lapidoth 2017
A Toy Problem
We observe a pair (Y1, Y2).

    H = 0:  Y1 = s0 + N1,  Y2 = N2.
    H = 1:  Y1 = s1 + N1,  Y2 = N2.

Here

    N1 ∼ N(0, σ²),  N2 ∼ N(0, σ²)

are independent of H.
(Later s0, s1 will be waveforms and N1, N2 stochastic processes.)
• Can Y2 be discarded?
• Does (y1, y2) ↦ y1 form a sufficient statistic?

Not necessarily!

c
Lecture 14, Amos Lapidoth 2017
Can Y2 be Discarded?

If N1 and N2 are not independent, Y2 could be useful!

If N2 is equal to N1 , we can guess H error-free based on Y1 − Y2 !

But if N2 is independent of N1 , then Y2 can be discarded!

c
Lecture 14, Amos Lapidoth 2017
Discarding Y2 when N1 and N2 Are Independent

    LR(y1, y2) = fY1,Y2|H=0(y1, y2) / fY1,Y2|H=1(y1, y2)
               = (fY1|H=0(y1) fN2(y2)) / (fY1|H=1(y1) fN2(y2))
               = fN1(y1 − s0) / fN1(y1 − s1),

which is computable from y1.

Here is a proof that extends better to stochastic processes:

c
Lecture 14, Amos Lapidoth 2017
[Figure: a given rule for guessing H based on (Y1, Y2), fed with the
inputs y1 and y2.]

c
Lecture 14, Amos Lapidoth 2017
[Figure: top, the given rule fed with (y1, y2); bottom, the same rule
with only y1 available and the y2 input still to be supplied.]

c
Lecture 14, Amos Lapidoth 2017
[Figure: top, the given rule fed with (y1, y2); bottom, the same rule
fed with (y1, ỹ2), where ỹ2 is generated from local randomness with
Ỹ2 ∼ fN2(·), independently of everything else.]
c
Lecture 14, Amos Lapidoth 2017
Back to the Real Problem (1)
    Y1 ≜ Σ_{ℓ=1}^{d} ⟨Y, φℓ⟩ φℓ,   Y2 ≜ Y − Σ_{ℓ=1}^{d} ⟨Y, φℓ⟩ φℓ.

Since Y = Y1 + Y2, we can guess based on (Y1, Y2).
Conditional on M = m,

    Y1 = Σ_{ℓ=1}^{d} ⟨(sm + N), φℓ⟩ φℓ
       = Σ_{ℓ=1}^{d} ⟨sm, φℓ⟩ φℓ + Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ
       = sm + Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ,

and

    Y2 = N − Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ.
c
Lecture 14, Amos Lapidoth 2017
Back to the Real Problem (2)
Conditional on M = m,

    Y1 = sm + N1,   Y2 = N2,

where

    N1 = Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ,   N2 = N − Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ.

Since N1 and N2 are independent, Y2 can be discarded!

And Y1 can be reconstructed from

    (⟨Y, φ1⟩, . . . , ⟨Y, φd⟩)ᵀ.

QED.
c
Lecture 14, Amos Lapidoth 2017
[Figure: two equivalent receivers. Top: (Y(t), t ∈ R) goes directly
into the decision device. Bottom: Y is projected onto φ1, . . . , φd;
the inner products ⟨Y, φ1⟩, . . . , ⟨Y, φd⟩ feed the decision device and
also reconstruct Σℓ ⟨Y, φℓ⟩ φℓ, to which N′ − Σℓ ⟨N′, φℓ⟩ φℓ is added,
where (N′(t)) is generated from local randomness with the same
FDDs as (N(t)). The resulting sum Y′(t) enters the same decision
device, and both receivers produce guesses of identical
performance.]
c
Lecture 14, Amos Lapidoth 2017
Thank you!

Kindly read Section 26.9; it has numerous useful examples.

c
Lecture 14, Amos Lapidoth 2017
