
Communication and Detection Theory: Lecture 1

Amos Lapidoth
ETH Zurich

February 21, 2017

Teaching Assistant: Mr. Tibor Keresztfalvi.

Why Should I Take this Class?

• Digital communication is everywhere.


• The wireless industry is huge.
• Math and Engineering in beautiful harmony.
• You have taken and enjoyed:
Signals and Systems, Linear Algebra, and Probability Theory.

This Class Is Not for You if:

• You are trying to avoid math.


• You are looking for easy credit.
• You prefer to skip the theory and rush to the practice.

If You Are Staying:

The Plan for Today

• Course information
• A point-to-point digital communication system.
• Functions, signals, and time-reversal.
• The inner product, orthogonality, and energy.
• The Fourier Transform (review):
• On 2π, f, ω, i, and j.
• Conjugate Symmetry
• Parseval’s Theorem
• The definition of bandwidth.

The Textbook

The textbook: A Foundation in Digital Communication, Second Edition, by Amos Lapidoth.
• Greater mathematical precision than the lecture.
• The lecture’s level suffices for the exam.
• The exercises, however, are at the course’s level.
• Open-book exam.
• No electronic devices allowed in the exam.

If you have seen some of the material before, now is the chance to
cover it in depth. I love questions!

Warning!

• Some classes cover material you cannot understand:


• You may not have the prerequisites.
• There may not be enough lecture hours.
• Or maybe the whole field is on shaky ground.
• Homework problems then indicate what you need to know.
• The good news: Here you can understand everything.
• The bad news: Here you must understand everything.
• The problems check your understanding, but the exam will be
different!

[Block diagram: a point-to-point digital communication system. Transmit side: source (waveform, file, etc.) → source encoder (bits) → encryption (encrypted bits) → channel encoder (waveform) → channel. Receive side: channel → channel decoder (encrypted bits) → decryption (bits) → source decoder (waveform) → reconstruction → sink.]
Functions
• A function or a mapping

u: A → B

associates with each element in its domain A a unique


element in its co-domain B.
• The rule specifying, for each element of the domain, the element of the range to which it is mapped is often written to the right or underneath, e.g.,

  u : R → (−5, ∞), t ↦ t².

• u(t) is the result of applying u to t, e.g., u(17).
• The function t ↦ x(t) cos(2πfc t) has no name, and its domain and co-domain are unspecified.

Signals

• If the domain of a function u is R and its co-domain is R,


then we sometimes say that u is a real-valued signal or a real
signal, especially if the argument of u stands for time.
• Similarly, we sometimes refer to a function u : R → C as a
complex-valued signal or a complex signal.
• If I say that u is “a signal,” then whether it is real or complex
should be:
• clear from the context, or
• immaterial, or
• you should ask!

Caution
• While u and u(·) denote functions, u(t) denotes the result of applying u to t. If u is a real-valued signal, then u(t) is a real number!
• If x and y are signals, then

  x ⋆ y

denotes their convolution.
• The value of their convolution at time 17 is

  (x ⋆ y)(17).

• It is also perfectly fine to write

  (t ↦ x(t) cos(2πfc t)) ⋆ h.

• But I really don’t like

  x(t) cos(2πfc t) ⋆ h(t).
Shifting a Signal in Time
[Figure: a pulse x(t) with support near t = 1, and its time shift x(t − 2) with support near t = 3.]
Reflecting a Signal: x⃖ : t ↦ x(−t)

[Figure: a pulse x(t) with support near t = 1, and its reflection x⃖(t) with support near t = −1.]
The Energy in a Real Signal: Ch. 3

• We define the energy in a real signal u : R → R as

  ∫_{−∞}^{∞} u²(t) dt.

• If this is finite, then we say that u is a finite-energy signal.
• The square root of the energy is denoted ‖u‖₂:

  ‖u‖₂ ≜ √( ∫_{−∞}^{∞} u²(t) dt ).

This use of the word “energy” is justified when u corresponds to the current through a unit load or the voltage across such a load.
The Energy in a Complex Signal

• We define the energy in a complex signal u : R → C as

  ∫_{−∞}^{∞} |u(t)|² dt.

• If this is finite, then we say that u is a finite-energy signal.
• The square root of the energy is denoted ‖u‖₂:

  ‖u‖₂ ≜ √( ∫_{−∞}^{∞} |u(t)|² dt ).

The justification lies in the baseband representation of passband signals.
The Inner Product
We define the inner product ⟨u, v⟩ between the signals u and v as

  ⟨u, v⟩ ≜ ∫_{−∞}^{∞} u(t) v*(t) dt,

whenever the integral exists:

  ⟨u, v⟩ = ∫_{−∞}^{∞} Re(u(t) v*(t)) dt + i ∫_{−∞}^{∞} Im(u(t) v*(t)) dt.

Mathematicians might object. . . .

If ⟨u, v⟩ is zero, then we say that u and v are orthogonal.

  ‖u‖₂² = ⟨u, u⟩.
The Fourier Transform: Ch. 6
The Fourier Transform of an integrable signal x : R → C is the mapping x̂ : R → C defined by

  x̂ : f ↦ ∫_{−∞}^{∞} x(t) e^{−i2πft} dt.

In contrast to the engineering notation X(jω):
• we use i for √−1;
• we use x̂ instead of X(·), so
• we can write things like x̂̂ = x⃖,
• and reserve upper-case letters for random things.
• We use f instead of ω.
• But most important is the 2π.
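A minimal numerical sketch of this convention (assuming Python with numpy; the Gaussian test signal is just an illustration): with the 2π inside the exponent, t ↦ e^{−πt²} is its own transform, so no normalization constants appear.

    import numpy as np

    t = np.linspace(-10, 10, 4001)          # time grid
    dt = t[1] - t[0]
    x = np.exp(-np.pi * t**2)               # x(t) = e^{-pi t^2}

    def ft(x, t, f):
        """Riemann-sum approximation of x_hat(f) = integral x(t) e^{-i 2 pi f t} dt."""
        return np.array([np.sum(x * np.exp(-2j*np.pi*fk*t)) * dt for fk in f])

    f = np.linspace(-3, 3, 61)
    x_hat = ft(x, t, f)
    print(np.max(np.abs(x_hat - np.exp(-np.pi * f**2))))   # ~1e-15: self-dual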
The Inverse Fourier Transform

The Inverse Fourier Transform (IFT) of an integrable function g : R → C is denoted ǧ and is defined as

  ǧ : t ↦ ∫_{−∞}^{∞} g(f) e^{i2πft} df.

• Very similar to the FT—instead of e^{−i2πft} use e^{i2πft}.
• For symmetric functions, FT and IFT are identical!
• Hence, fewer pairs to memorize.
Parseval’s Theorem

The Fourier Transform preserves inner products and energies:

  ⟨u, v⟩ = ⟨û, v̂⟩    and    ‖u‖₂ = ‖û‖₂.
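A numerical sanity check of Parseval (a sketch assuming numpy; the modulated-Gaussian test signal is illustrative): the time-domain energy matches the energy of the Riemann-sum FT.

    import numpy as np

    t = np.linspace(-10, 10, 4001); dt = t[1] - t[0]
    u = np.exp(-np.pi * t**2) * np.cos(6*np.pi*t)      # some finite-energy signal

    f = np.linspace(-10, 10, 4001); df = f[1] - f[0]
    u_hat = np.array([np.sum(u*np.exp(-2j*np.pi*fk*t))*dt for fk in f])

    # energies agree to numerical precision
    print(np.sum(np.abs(u)**2)*dt, np.sum(np.abs(u_hat)**2)*df)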

The Fourier Transform of Real Signals (1)

If x is a real signal, then its FT is conjugate symmetric:

  ( x̂(−f) = x̂*(f), f ∈ R )  ⟺  x is real.

Thus, if x is real, then
• the magnitude of x̂ is symmetric, and
• the phase of x̂ is antisymmetric.
We indicate this by dashing the plot of x̂ at the negative
frequencies.

The Fourier Transform of Real Signals (2)

[Figure: The FT x̂ of a real signal x, dashed at the negative frequencies.]

The Fourier Transform of Real Signals (3)

[Figure: The FT ŷ of a real signal y, dashed at the negative frequencies.]
The Fourier Transform of Real Signals (4)
Since

  x̂ : f ↦ ∫_{−∞}^{∞} x(t) e^{−i2πft} dt,

  x̂(−f) = ∫_{−∞}^{∞} x(t) e^{−i2π(−f)t} dt
         = ∫_{−∞}^{∞} x(t) e^{i2πft} dt
         = ∫_{−∞}^{∞} ( (x(t) e^{i2πft})* )* dt
         = ( ∫_{−∞}^{∞} (x(t) e^{i2πft})* dt )*
         = ( ∫_{−∞}^{∞} x*(t) e^{−i2πft} dt )*
         = ( ∫_{−∞}^{∞} x(t) e^{−i2πft} dt )*    (x is real)
         = x̂*(f).
An Important Fourier Pair

[Figure: an important Fourier pair. The time-domain pulse (height 1) has its first zero at 1/α; the transform has cutoff γ, with δ = 2γβ.]
Bandwidth
We say that x is bandlimited to W Hz if

  x̂(f) = 0,  |f| > W.

The bandwidth of x is the smallest W to which it is bandlimited.

[Figure: a spectrum x̂(f) supported on −W ≤ f ≤ W.]
Some Notes on Bandwidth
• Find the shortest symmetric interval containing all the frequencies where x̂ is not zero.
• Half the length of this interval is the bandwidth.
• We seek a symmetric interval, but we only measure its part where the frequencies are positive.
• Not to be confused with bandwidth around the carrier frequency fc.

[Figure: a spectrum supported on [−W, W]; the bandwidth W is measured on the positive-frequency side.]
Another Example of Bandwidth

[Figure: another spectrum supported within [−W, W], again of bandwidth W.]
Not to Be Confused with Bandwidth around a Carrier Frequency

[Figure: the FT ŷ(f) of a passband signal.]
More Precise Definition of Bandwidth

We say that the signal x is an integrable signal that is bandlimited to W Hz if x is integrable and if it is unaltered when it is lowpass filtered by an ideal unit-gain lowpass filter of cutoff frequency W:

  x(t) = (x ⋆ LPF_W)(t),  t ∈ R.
The Ideal Unit-Gain Lowpass Filter
Here

  LPF_{Wc}(t) ≜ { 2Wc · sin(2πWc t)/(2πWc t)  if t ≠ 0,
                  2Wc                          if t = 0,      t ∈ R.

[Figure: the FT L̂PF_{Wc}(f): unit height on −Wc ≤ f ≤ Wc, zero elsewhere.]
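A minimal sketch of this filter in code (assuming numpy; note numpy’s sinc(x) = sin(πx)/(πx), so LPF_{Wc}(t) = 2Wc sinc(2Wc t), with the t = 0 case handled automatically):

    import numpy as np

    def lpf(t, Wc):
        """Ideal unit-gain lowpass filter impulse response of cutoff Wc."""
        return 2*Wc*np.sinc(2*Wc*t)        # equals 2*Wc at t = 0

    # check: the integral of LPF over t approximates its unit DC gain
    t = np.arange(-200, 200, 0.01)
    print(np.trapz(lpf(t, 2.0), t))        # ~1.0 (slowly converging tails)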

The Two Are Not the Same

• Changing a signal at one point can make it discontinuous, but it does not affect its Fourier Transform.
• The output of a LPF is continuous.
Set of Measure Zero (1): (Section 2.5)

• Changing the value of an integrand at one point does not change the integral.
• Likewise for a finite number of points.
• Likewise for a countably infinite number of points.
• Likewise for?
Set of Measure Zero (2)

• Changing the value of an integrand at a set of points of


measure zero does not change the integral.
• If a nonnegative function integrates to zero, then it must be
zero outside a set of measure zero.
• We say that two functions are indistinguishable if they differ
on a set of measure zero.
• Every countable set is of measure zero, but there are some
sets of measure zero that are not countable.

Set of Measure Zero (3)

We say that a subset N of the real line R is a set of Lebesgue measure zero (or a Lebesgue null set) if for every ε > 0 we can find a sequence of intervals [a₁, b₁], [a₂, b₂], . . . such that the total length of the intervals is smaller than or equal to ε,

  Σ_{j=1}^{∞} (b_j − a_j) ≤ ε,

and the union of the intervals covers the set N:

  N ⊆ [a₁, b₁] ∪ [a₂, b₂] ∪ · · · .
Communication and Detection Theory: Lecture 2

Amos Lapidoth
ETH Zurich

February 28, 2017

Passband Signals: Bandwidth and Representation

Today: Ch. 7

• Passband signals.
• Bandpass filters (Sec. 6.3).
• Signals that are bandlimited to W Hz around a carrier
frequency fc .
• Bandwidth around a carrier frequency.
• Multiplying a signal by a carrier.
• The Analytic representation.
• The Baseband representation.
• Inner products in passband and baseband.
• Baseband representation of xPB ? yPB .
• Baseband representation of xPB ? h.

The FT of a Real Passband Signal

[Figure: a spectrum ŷ(f) occupying the bands fc − W/2 ≤ |f| ≤ fc + W/2.]
Passband Signals

Loosely speaking, xPB is a passband signal that is bandlimited to W Hz around the carrier frequency fc if

  fc > W/2 > 0

and

  x̂PB(f) = 0,  | |f| − fc | > W/2.
The Ideal Unit-Gain Bandpass Filter

  BPF_{W,fc}(t) = 2W cos(2πfc t) sinc(Wt),  t ∈ R.

  B̂PF_{W,fc}(f) ≜ I{ | |f| − fc | ≤ W/2 },  f ∈ R.

[Figure: B̂PF_{W,fc}(f): unit height on two bands of width W centered at ±fc.]
Passband Signals

A signal xPB is said to be an integrable passband signal that is bandlimited to W Hz around the carrier frequency fc if it is integrable,

  xPB ∈ L1;  (2a)

the carrier frequency fc satisfies

  fc > W/2 > 0;  (2b)

and xPB is unaltered when it is fed to an ideal unit-gain bandpass filter of bandwidth W around the carrier frequency fc:

  xPB(t) = (xPB ⋆ BPF_{W,fc})(t),  t ∈ R.  (2c)
Bandwidth around fc

The bandwidth of xPB around fc is the smallest W s.t. xPB is bandlimited to W Hz around fc.

[Figure: ŷ(f) occupying fc − W/2 ≤ |f| ≤ fc + W/2.]
Remarks on the Bandwidth around fc

• We look at equal-length, symmetric intervals around fc and around −fc.
• We measure the “length” of the positive frequencies.
• Depends on both xPB and fc.
The Bandwidth around fc Depends on fc

[Figure: the same spectrum measured around a different carrier frequency yields a different bandwidth around the carrier.]
The FT of t ↦ x(t) e^{i2πfc t} is f ↦ x̂(f − fc):

  ∫_{−∞}^{∞} x(t) e^{i2πfc t} e^{−i2πft} dt = ∫_{−∞}^{∞} x(t) e^{−i2π(f−fc)t} dt
                                           = x̂(f − fc).

Likewise, t ↦ x(t) e^{−i2πfc t} has FT f ↦ x̂(f + fc). So

  t ↦ x(t) cos(2πfc t) has FT f ↦ ½ x̂(f − fc) + ½ x̂(f + fc)

because

  cos(2πfc t) = ½ e^{i2πfc t} + ½ e^{−i2πfc t}.
Multiplication by a Carrier Doubles the Bandwidth

If x is of bandwidth W Hz and if fc > W, then t ↦ x(t) cos(2πfc t) is a passband signal of bandwidth 2W around the carrier frequency fc.

Recall that

  t ↦ x(t) cos(2πfc t) has FT f ↦ ½ x̂(f − fc) + ½ x̂(f + fc).
[Figure: a baseband spectrum x̂(f) on [−W, W], and the resulting passband spectrum ŷ(f) of height ½ occupying fc − W ≤ |f| ≤ fc + W, i.e., of bandwidth 2W around fc.]
The Analytic and the Baseband Representations

• We’ll use the analytic representation as a stepping stone towards the baseband representation.
• The baseband representation allows us to separate the things that depend on the carrier from those that don’t.
• It is important in sampling and in simulation.
First Aid

• Only real passband signals have such representations.


• The transmitted signals are real.
• Integrals and convolutions of complex signals can be handled
by working with the real and imaginary parts separately.

The Analytic Representation

xA is a complex signal whose FT x̂A is

  x̂A(f) = { x̂PB(f)  if f ≥ 0,
            0        otherwise.

It is obviously complex, because its FT is not conjugate symmetric.
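A discrete sketch of the analytic representation (assuming numpy; the test signal, carrier, and rates are illustrative): keep only the nonnegative frequencies of a real passband signal, the FFT analogue of the definition above.

    import numpy as np

    fs, fc = 1000.0, 100.0                      # sample rate and carrier (Hz)
    t = np.arange(0, 1, 1/fs)
    x_pb = np.cos(2*np.pi*fc*t) * (1 + 0.5*np.cos(2*np.pi*5*t))  # real passband

    X = np.fft.fft(x_pb)
    f = np.fft.fftfreq(len(t), 1/fs)
    X_a = np.where(f >= 0, X, 0)                # zero out negative frequencies
    x_a = np.fft.ifft(X_a)                      # complex analytic representation

    # x_PB = 2 Re(x_A); this signal has no DC or Nyquist component,
    # so no half-weight bin corrections are needed here.
    print(np.max(np.abs(2*np.real(x_a) - x_pb)))   # ~0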

[Figure: x̂PB(f) with bands at ±fc, and x̂A(f) keeping only the band at +fc.]
From xA to xPB
Because xPB is real,

  x̂PB(−f) = x̂PB*(f),  f ∈ R.

And, by definition,

  x̂A(f) = { x̂PB(f)  if f ≥ 0,
            0        otherwise.

So

  x̂PB(f) = x̂A(f) + x̂A*(−f),  f ∈ R,

and hence, as we next argue,

  xPB(t) = 2 Re(xA(t)),  t ∈ R.
The FT of x∗ and Re(x)

x* has FT f ↦ x̂*(−f) because

  ∫_{−∞}^{∞} x*(t) e^{−i2πft} dt = ( ∫_{−∞}^{∞} x(t) e^{i2πft} dt )* = x̂*(−f).

Since Re(x) equals (x + x*)/2,

  Re(x) has FT f ↦ ½ x̂(f) + ½ x̂*(−f),

and

  2 Re(x) has FT f ↦ x̂(f) + x̂*(−f).
‖xPB‖₂² = 2 ‖xA‖₂²

Proof:
• Parseval.
• xPB is real, so |x̂PB| is symmetric.

  ‖xPB‖₂² = ∫_{−∞}^{∞} |x̂PB(f)|² df
          = 2 ∫₀^{∞} |x̂PB(f)|² df
          = 2 ∫₀^{∞} |x̂A(f)|² df
          = 2 ‖xA‖₂².
[Figure: as before, x̂PB(f) and its analytic spectrum x̂A(f).]



⟨xPB, yPB⟩ = 2 Re⟨xA, yA⟩

  ‖u + v‖₂² = ⟨u + v, u + v⟩
            = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
            = ‖u‖₂² + ⟨u, v⟩ + ⟨u, v⟩* + ‖v‖₂²
            = ‖u‖₂² + ‖v‖₂² + 2 Re⟨u, v⟩.

Thus,

  ‖xA + yA‖₂² = ‖xA‖₂² + ‖yA‖₂² + 2 Re⟨xA, yA⟩,
  ‖xPB + yPB‖₂² = ‖xPB‖₂² + ‖yPB‖₂² + 2 ⟨xPB, yPB⟩.

The result now follows from

  ‖xPB‖₂² = 2‖xA‖₂²,  ‖yPB‖₂² = 2‖yA‖₂²,  ‖xPB + yPB‖₂² = 2‖xA + yA‖₂².
[Figure: the spectra x̂PB(f) and ŷPB(f) of two real passband signals.]
Some Comments on the Analytic Representation

• The representation is linear in the sense that if α and β are real, then the representation of αxPB + βyPB is αxA + βyA.
• xPB = 2 Re(z) does not imply that z equals xA.
• But if xPB = 2 Re(z) and ẑ is zero at all negative frequencies, then z indeed equals xA.
The Baseband Representation of xPB w.r.t. fc

In the time domain:

  xBB(t) ≜ e^{−i2πfc t} xA(t),  t ∈ R.

In the frequency domain:

  x̂BB(f) = x̂A(f + fc),  f ∈ R.
[Figure: x̂PB(f), x̂A(f), and the baseband spectrum x̂BB(f) obtained by shifting x̂A down by fc.]
[Figure: x̂PB(f); the shifted spectrum x̂PB(f + fc); the window g₀(f) of cutoff Wc; and the resulting x̂BB(f), supported on [−W/2, W/2].]
From xPB to xBB

  xBB = (t ↦ e^{−i2πfc t} xPB(t)) ⋆ ǧ₀,

where g₀ : f ↦ g₀(f) is any integrable function satisfying

  g₀(f) = 1,  |f| ≤ W/2,

and

  g₀(f) = 0,  |f + 2fc| ≤ W/2.

For example,

  xBB = (t ↦ e^{−i2πfc t} xPB(t)) ⋆ LPF_{Wc},

where Wc is any cutoff frequency in the range

  W/2 ≤ Wc ≤ 2fc − W/2.
Convolving a Complex Signal with a Real Signal


  Re(x ⋆ h) = Re(x) ⋆ h,
  Im(x ⋆ h) = Im(x) ⋆ h,     h is real-valued.

Proof: Start with

  (x ⋆ h)(t) = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ;

recall that Re(∫) = ∫ Re and Im(∫) = ∫ Im; and note that if h(·) is real-valued, then for all t, τ ∈ R,

  Re(x(τ) h(t − τ)) = Re(x(τ)) h(t − τ),
  Im(x(τ) h(t − τ)) = Im(x(τ)) h(t − τ).
The In-Phase and Quadrature Components
Because xPB is real,

  Re(xPB(t) e^{−i2πfc t}) = xPB(t) cos(2πfc t),  t ∈ R,
  Im(xPB(t) e^{−i2πfc t}) = −xPB(t) sin(2πfc t),  t ∈ R.

And because we are convolving t ↦ xPB(t) e^{−i2πfc t} with a real filter LPF_{Wc},

  Re(xBB) = (t ↦ xPB(t) cos(2πfc t)) ⋆ LPF_{Wc},
  Im(xBB) = −(t ↦ xPB(t) sin(2πfc t)) ⋆ LPF_{Wc}.

Re(xBB) ≜ in-phase component;  Im(xBB) ≜ quadrature component.
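A discrete sketch of quadrature demodulation (assuming numpy; carrier, cutoff, and test tones are illustrative): mix down by fc, lowpass filter to get xBB, then remodulate to recover xPB = 2 Re(xBB e^{i2πfc t}).

    import numpy as np

    fs, fc, Wc = 1000.0, 100.0, 20.0
    t = np.arange(0, 1, 1/fs)
    x_pb = np.cos(2*np.pi*102*t) + 0.3*np.sin(2*np.pi*97*t)  # passband near fc

    def ideal_lpf(x, fs, Wc):
        """Ideal lowpass filtering implemented by masking the FFT."""
        X = np.fft.fft(x)
        f = np.fft.fftfreq(len(x), 1/fs)
        return np.fft.ifft(np.where(np.abs(f) <= Wc, X, 0))

    x_bb = ideal_lpf(x_pb * np.exp(-2j*np.pi*fc*t), fs, Wc)  # baseband rep.
    x_rec = 2*np.real(x_bb * np.exp(2j*np.pi*fc*t))          # back to passband
    print(np.max(np.abs(x_rec - x_pb)))                       # ~0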

From xPB to xBB

[Block diagram: xPB(t) is multiplied by cos(2πfc t) and lowpass filtered (cutoff W/2 ≤ Wc ≤ 2fc − W/2) to give Re(xBB(t)); a 90°-shifted branch multiplies by −sin(2πfc t) and lowpass filters to give Im(xBB(t)).]
Bandwidth

The bandwidth of xPB around fc is twice the bandwidth of xBB.

It’s all in the figure. . .
[Figure: as before, x̂PB(f), x̂PB(f + fc), g₀(f), and x̂BB(f) on [−W/2, W/2].]
Recovering xPB from xBB and fc

Recall that

  xBB(t) ≜ e^{−i2πfc t} xA(t),  t ∈ R,

and that

  xPB(t) = 2 Re(xA(t)),  t ∈ R.

Consequently,

  xPB(t) = 2 Re(xBB(t) e^{i2πfc t}),  t ∈ R.
[Figure: x̂BB(f) shifted up to x̂BB(f − fc); x̂BB*(−f) shifted down to x̂BB*(−f − fc); and their sum x̂PB(f) = x̂BB(f − fc) + x̂BB*(−f − fc).]
Some Remarks on the Baseband Representation

• The baseband representation is linear in the sense that if α and β are real, then the representation of αxPB + βyPB is αxBB + βyBB.
• To recover xPB you need xBB and fc.
• This is not a bug; it is a feature!
• xPB(t) = 2 Re(z(t) e^{i2πfc t}) does not imply that z equals xBB.
• However, if xPB(t) = 2 Re(z(t) e^{i2πfc t}) and z is bandlimited to W/2 Hz, then z equals xBB.
Inner Products


  ⟨xPB, yPB⟩ = 2 Re⟨xBB, yBB⟩,    ‖xPB‖₂² = 2 ‖xBB‖₂².

Proof:

  ⟨xBB, yBB⟩ = ∫_{−∞}^{∞} xBB(t) yBB*(t) dt
             = ∫_{−∞}^{∞} e^{−i2πfc t} xA(t) (e^{−i2πfc t} yA(t))* dt
             = ∫_{−∞}^{∞} e^{−i2πfc t} xA(t) e^{i2πfc t} yA*(t) dt
             = ⟨xA, yA⟩.
[Figure: the spectra x̂PB(f) and ŷPB(f).]
Orthogonality in Passband

Two real passband signals xPB , yPB are orthogonal iff the inner
product between their baseband representations is purely imaginary.

The Baseband Representation of xPB ? yPB

Recalling that the FT of a convolution is the product of the transforms, we obtain:

The baseband representation of xPB ⋆ yPB is xBB ⋆ yBB.
[Figure: the spectra x̂PB(f) and ŷPB(f).]
[Figure 7.13: The convolution of two real passband signals and its baseband representation: x̂PB(f), ŷPB(f), and their product x̂PB(f)ŷPB(f) (peak 1.5), together with x̂BB(f), ŷBB(f), and x̂BB(f)ŷBB(f).]
The Baseband Representation of xPB ? h

Here
• xPB is a real passband signal that is bandlimited to W Hz
around fc , but
• h is a general (not necessarily bandpass) real impulse
response.
• xPB ? h is the filter’s response to xPB .

[Figure: x̂PB(f) of unit height and width W around ±fc; a general ĥ(f); and the product x̂PB(f)ĥ(f).]
Frequency Response w.r.t. the Bandwidth W around fc (1)

[Figure: ĥ(f) over a band of width W around fc, and its shifted restriction plotted on [−W/2, W/2].]
Frequency Response w.r.t. the bandwidth W around fc (2)

Definition (Frequency Response with Respect to a Band)
For a stable real filter of impulse response h we define the frequency response with respect to the bandwidth W around the carrier frequency fc (satisfying fc > W/2) as the mapping

  f ↦ ĥ(f + fc) I{|f| ≤ W/2}.

The FT of the baseband representation of xPB ⋆ h is the product of x̂BB by the filter’s frequency response with respect to the bandwidth W around the carrier frequency fc.
[Figure: as before, x̂PB(f), ĥ(f), and x̂PB(f)ĥ(f).]
[Figure: the resulting baseband spectrum x̂BB(f) on [−W/2, W/2].]
Next Week

We’ll cover Chapter 8. Please review your linear algebra (inner product spaces) by reading Chapter 4.

Thank you!
Communication and Detection Theory: Lecture 3

Amos Lapidoth
ETH Zurich

March 7, 2017

The Geometry of L2 and the Sampling Theorem

Today

Chapters 4 and 8:
• L2 as vector space.
• Finite-dimensional subspaces of L2 : bases, dimension,
orthonormal bases, and the Gram-Schmidt procedure (review).
• Expressing a signal in terms of a given orthonormal basis.
• Projections onto a finite-dimensional subspace.
• The projection and closest element in the subspace.
• Complete Orthonormal Systems (CONS).
• CONS and Parseval’s Theorem.
• The Sampling Theorem as an orthonormal expansion.

Amplification and Superposition
L2 is the space of energy-limited signals.

Given a complex signal u and α ∈ C, the amplification-by-α of u is the signal αu:

  t ↦ α u(t),  t ∈ R.

The superposition u + v of the signals u and v is the signal

  t ↦ u(t) + v(t),  t ∈ R.

With these operations, L2 forms a vector space over C.

A finite sum u₁ + · · · + u_n is similarly defined.
Linear Subspaces

A subset U ⊆ L2 is a linear subspace of L2 if it is not empty; it is closed under superposition,

  u₁ + u₂ ∈ U,  u₁, u₂ ∈ U;

and it is closed under amplification,

  αu ∈ U,  (α ∈ C, u ∈ U).

Example: The set of all energy-limited signals that are zero whenever t ≠ 17.
Another Linear Subspace
All signals of the form

  t ↦ p(t) e^{−|t|},

where p(t) is any complex polynomial of degree ≤ 3:

  u : t ↦ (α₀ + α₁t + α₂t² + α₃t³) e^{−|t|},
  αu : t ↦ (αα₀ + αα₁t + αα₂t² + αα₃t³) e^{−|t|}.

If u is as above and

  v : t ↦ (β₀ + β₁t + β₂t² + β₃t³) e^{−|t|},

then

  u + v : t ↦ ((α₀+β₀) + (α₁+β₁)t + (α₂+β₂)t² + (α₃+β₃)t³) e^{−|t|}.
Linear Combinations, Span, and Independence
• v ∈ L2 is a linear combination of (v₁, . . . , v_n) if it equals

  α₁v₁ + · · · + α_n v_n,  i.e.,  Σ_{ν=1}^{n} α_ν v_ν,

for some α₁, . . . , α_n ∈ C.
• span(v₁, . . . , v_n) is the set of all vectors in L2 that are linear combinations of (v₁, . . . , v_n).
• span(v₁, . . . , v_n) is a linear subspace of L2.
• The n-tuple (v₁, . . . , v_n) is linearly independent if

  ( Σ_{ν=1}^{n} α_ν v_ν = 0 ) ⟹ ( α_ν = 0, ν = 1, . . . , n ).
Dimension, Finite and Infinite

• A subspace U of L2 is finite-dimensional if there exists an


n-tuple (u1 , . . . , un ) such that span(u1 , . . . , un ) = U.
• (u1 , . . . , ud ) is a basis for U if it is
1. linearly independent and
2. span(u1 , . . . , ud ) = U.
• All bases for a finite-dimensional subspace U have the same
number of elements: the dimension of U — dim U.

Some Examples

• The set of all signals of the form t ↦ p(t) e^{−|t|}, where p(·) is any polynomial, is infinite dimensional.
• If p(·) is restricted to degree ≤ 3, then dim U = 4, and a basis is

  t ↦ e^{−|t|},  t ↦ t e^{−|t|},  t ↦ t² e^{−|t|},  t ↦ t³ e^{−|t|}.

• If U comprises all signals that vanish whenever t ≠ 17, then dim U = 1 and a basis is

  t ↦ I{t = 17}.
‖u‖₂ as the “Length” of the Signal u(·)
For u, v ∈ L2 and α ∈ C,

  ‖αu‖₂ = |α| ‖u‖₂,
  ‖u + v‖₂ ≤ ‖u‖₂ + ‖v‖₂,

and

  (‖u‖₂ = 0) ⟺ (u ≡ 0).

Also,

  | ‖u‖₂ − ‖v‖₂ | ≤ ‖u + v‖₂ ≤ ‖u‖₂ + ‖v‖₂,  u, v ∈ L2,

because

  ‖v‖₂ = ‖(v + u) + (−u)‖₂ ≤ ‖v + u‖₂ + ‖−u‖₂ = ‖v + u‖₂ + ‖u‖₂

and likewise when you swap u and v.
The Triangle Inequality for Energy-Limited Signals

[Figure: the triangle formed by u, v, and u + v.]
A Pythagorean Theorem
Last time:

  ‖u + v‖₂² = ‖u‖₂² + ‖v‖₂² + 2 Re⟨u, v⟩,

so

  ‖u + v‖₂² = ‖u‖₂² + ‖v‖₂²,  when u and v are orthogonal.

By induction,

  ‖u₁ + · · · + u_n‖₂² = ‖u₁‖₂² + · · · + ‖u_n‖₂²,  u₁, . . . , u_n pairwise orthogonal.

Indeed, if u ≜ u₁ and v ≜ u₂ + · · · + u_n, then the pairwise orthogonality implies ⟨u, v⟩ = 0 because

  ⟨u, v⟩ = ⟨u₁, u₂ + · · · + u_n⟩ = ⟨u₁, u₂⟩ + · · · + ⟨u₁, u_n⟩ = 0.

Hence ‖u + v‖₂² = ‖u‖₂² + ‖v‖₂², i.e.,

  ‖u₁ + · · · + u_n‖₂² = ‖u₁‖₂² + ‖u₂ + · · · + u_n‖₂².

Now use the induction hypothesis.
Projecting v onto u (1)

[Figure: the projection w of v onto u:

  w = (length of v) · cos(angle between v and u) · u / (length of u).]
Projecting v onto u (2)

The projection of the signal v ∈ L2 onto the signal u ∈ L2 is the signal w that satisfies both
1. w = αu for some α ∈ C, and
2. v − w is orthogonal to u.
Projecting v onto u (3)
The projection of the signal v ∈ L2 onto the signal u ∈ L2 is the signal w that satisfies both
1. w = αu for some α ∈ C, and
2. v − w is orthogonal to u:

  ⟨v − αu, u⟩ = 0,  i.e.,  ⟨v, u⟩ − α ‖u‖₂² = 0.

For ‖u‖₂ > 0 (strictly; otherwise the projection is not defined),

  α = ⟨v, u⟩ / ‖u‖₂²,

and the projection w is thus unique and is given by

  w = ( ⟨v, u⟩ / ‖u‖₂² ) u.
Projecting v onto u (4)

Since v − w is orthogonal to u, and since w equals αu, it follows that v − w is orthogonal to w. Consequently, the Pythagorean Theorem yields that projection reduces length:

  ‖v‖₂² = ‖(v − w) + w‖₂² = ‖v − w‖₂² + ‖w‖₂² ≥ ‖w‖₂².

And since

  w = ( ⟨v, u⟩ / ‖u‖₂² ) u,

we obtain the Cauchy-Schwarz Inequality

  |⟨u, v⟩| ≤ ‖u‖₂ ‖v‖₂.
Orthonormal Tuples

The n-tuple of L2 signals (φ₁, . . . , φ_n) is orthonormal if

  ⟨φ_ℓ, φ_ℓ′⟩ = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ {1, . . . , n}.
Orthonormal Tuples Are Linearly Independent
If

  Σ_{ℓ=1}^{n} α_ℓ φ_ℓ = 0,

then for every ℓ′ ∈ {1, . . . , n}

  0 = ⟨0, φ_ℓ′⟩
    = ⟨ Σ_{ℓ=1}^{n} α_ℓ φ_ℓ, φ_ℓ′ ⟩
    = Σ_{ℓ=1}^{n} α_ℓ ⟨φ_ℓ, φ_ℓ′⟩
    = Σ_{ℓ=1}^{n} α_ℓ I{ℓ = ℓ′}
    = α_ℓ′.
An Orthonormal Basis

A d-tuple of signals in L2 is an orthonormal basis for the linear


subspace U ⊂ L2 if it is orthonormal and its span is U.

If (φ₁, . . . , φ_d) is an orthonormal basis for U ⊂ L2, then

  u = Σ_{ℓ=1}^{d} ⟨u, φ_ℓ⟩ φ_ℓ,  u ∈ U.

Since (φ₁, . . . , φ_d) is a basis for U, any u ∈ U can be expressed as u = Σ_{ℓ=1}^{d} α_ℓ φ_ℓ, and

  ⟨u, φ_ℓ′⟩ = ⟨ Σ_{ℓ=1}^{d} α_ℓ φ_ℓ, φ_ℓ′ ⟩
            = Σ_{ℓ=1}^{d} α_ℓ ⟨φ_ℓ, φ_ℓ′⟩
            = Σ_{ℓ=1}^{d} α_ℓ I{ℓ = ℓ′}
            = α_ℓ′.
Projections (1)

Lemma
Let (φ₁, . . . , φ_d) be an orthonormal basis for U ⊂ L2. Let v ∈ L2 be arbitrary.
1. v − Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ φ_ℓ is orthogonal to every signal in U:

  ⟨ v − Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ φ_ℓ, u ⟩ = 0,  (v ∈ L2, u ∈ U).

2. If w ∈ U is s.t. v − w is orthogonal to every signal in U, then

  w = Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ φ_ℓ.
Projections (2)

Proof.
The signal v − Σ_{ℓ=1}^{d} α_ℓ φ_ℓ is orthogonal to φ_ℓ′ iff α_ℓ′ = ⟨v, φ_ℓ′⟩. Hence:
1. If α_ℓ′ = ⟨v, φ_ℓ′⟩ for all ℓ′ ∈ {1, . . . , d}, then v − Σ_{ℓ=1}^{d} α_ℓ φ_ℓ is orthogonal to each basis function and hence to every u ∈ U.
2. If w is in U, it can be written as Σ_{ℓ=1}^{d} α_ℓ φ_ℓ, and if, additionally, v − w is orthogonal to each φ_ℓ′, then α_ℓ′ must equal ⟨v, φ_ℓ′⟩.
Projections (3)

Definition (Projection of v ∈ L2 onto U)
Let U ⊂ L2 be a finite-dimensional linear subspace of L2 having an orthonormal basis. Let v ∈ L2 be an arbitrary energy-limited signal. Then the projection of v onto U is the unique element w of U satisfying

  ⟨v − w, u⟩ = 0,  u ∈ U.

Note: If (φ₁, . . . , φ_d) is an orthonormal basis for U, then the projection of v ∈ L2 onto U is

  Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ φ_ℓ.
Projection as Best Approximation
Let (φ₁, . . . , φ_d) be an orthonormal basis for U ⊂ L2. Let v ∈ L2 be arbitrary. Then the projection w of v onto U is the element in U that, among all the elements of U, is closest to v in the sense that

  ‖v − u‖₂ ≥ ‖v − w‖₂,  u ∈ U.

Proof.
Let u be any element of U. Then so is w − u. Since v − w is orthogonal to all elements of U, it is a fortiori also orthogonal to w − u. Thus, by Pythagoras,

  ‖v − u‖₂² = ‖(v − w) + (w − u)‖₂²
            = ‖v − w‖₂² + ‖w − u‖₂²
            ≥ ‖v − w‖₂².
Energy and Inner Products and Orthonormal Bases (1)

Let (φ₁, . . . , φ_d) be an orthonormal basis for U ⊂ L2. Then

  ‖u‖₂² = Σ_{ℓ=1}^{d} |⟨u, φ_ℓ⟩|²,  u ∈ U.

Proof.
Follows from the Pythagorean Theorem and

  u = Σ_{ℓ=1}^{d} ⟨u, φ_ℓ⟩ φ_ℓ,  u ∈ U.
Energy and Inner Products and Orthonormal Bases (2)
Let (φ₁, . . . , φ_d) be an orthonormal basis for U ⊂ L2. Then

  ‖v‖₂² ≥ Σ_{ℓ=1}^{d} |⟨v, φ_ℓ⟩|²,  v ∈ L2.

Proof.
Let w be the projection of v onto U. Then

  ‖v‖₂² = ‖(v − w) + w‖₂² = ‖v − w‖₂² + ‖w‖₂² ≥ ‖w‖₂².

And by the expression for w and the Pythagorean Theorem,

  ‖w‖₂² = ‖ Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ φ_ℓ ‖₂² = Σ_{ℓ=1}^{d} |⟨v, φ_ℓ⟩|².
Energy and Inner Products and Orthonormal Bases (3)
Let (φ₁, . . . , φ_d) be an orthonormal basis for U ⊂ L2. Then

  ⟨v, u⟩ = Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ ⟨u, φ_ℓ⟩*,  (v ∈ L2, u ∈ U).

Since u ∈ U,

  u = Σ_{ℓ=1}^{d} ⟨u, φ_ℓ⟩ φ_ℓ.

Consequently, using the sesquilinearity of the inner product,

  ⟨v, u⟩ = ⟨ v, Σ_{ℓ=1}^{d} ⟨u, φ_ℓ⟩ φ_ℓ ⟩ = Σ_{ℓ=1}^{d} ⟨u, φ_ℓ⟩* ⟨v, φ_ℓ⟩.
Does Every Finite-Dimensional Subspace Have an
Orthonormal Basis?

In general, no:

  {u ∈ L2 : u(t) = 0 whenever t ≠ 17}

is a one-dimensional subspace that does not.

If U is a finite-dimensional subspace of L2, then the following two statements are equivalent:
1. U has an orthonormal basis.
2. The only element of U of zero energy is the all-zero signal 0.
Gram-Schmidt

Input: a basis (u₁, . . . , u_d).

  φ₁ ≜ u₁ / ‖u₁‖₂,

  φ_ν = ( u_ν − Σ_{ℓ=1}^{ν−1} ⟨u_ν, φ_ℓ⟩ φ_ℓ ) / ‖ u_ν − Σ_{ℓ=1}^{ν−1} ⟨u_ν, φ_ℓ⟩ φ_ℓ ‖₂.
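A minimal numpy sketch of Gram-Schmidt on sampled signals (assuming numpy; the two test signals are illustrative), treating a signal as a vector and ⟨u, v⟩ as the Riemann sum Σ u·conj(v)·dt:

    import numpy as np

    def gram_schmidt(signals, dt):
        """Orthonormalize sampled signals (assumed linearly independent)."""
        phis = []
        for u in signals:
            v = u.astype(complex).copy()
            for phi in phis:
                v -= np.sum(v * np.conj(phi)) * dt * phi  # subtract projection
            v /= np.sqrt(np.sum(np.abs(v)**2) * dt)       # normalize energy
            phis.append(v)
        return phis

    t = np.linspace(-10, 10, 4001); dt = t[1] - t[0]
    basis = gram_schmidt([np.exp(-np.abs(t)), t*np.exp(-np.abs(t))], dt)
    print(np.sum(basis[0]*np.conj(basis[1]))*dt)   # ~0: orthonormal pair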

The Benefits of Orthonormal Bases

  u = Σ_{ℓ=1}^{d} ⟨u, φ_ℓ⟩ φ_ℓ,  u ∈ U;

  ‖u‖₂² = Σ_{ℓ=1}^{d} |⟨u, φ_ℓ⟩|²,  u ∈ U;

  ⟨v, u⟩ = Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ ⟨u, φ_ℓ⟩*,  (v ∈ L2, u ∈ U);

  w = Σ_{ℓ=1}^{d} ⟨v, φ_ℓ⟩ φ_ℓ.
Complete Orthonormal System (CONS)
Definition: . . . , φ₋₁, φ₀, φ₁, . . . is a complete orthonormal system (CONS) for the linear subspace U ⊆ L2 if:
1. φ_ℓ ∈ U, ℓ ∈ Z.
2. ⟨φ_ℓ, φ_ℓ′⟩ = I{ℓ = ℓ′}, ℓ, ℓ′ ∈ Z.
3. ‖u‖₂² = Σ_{ℓ=−∞}^{∞} |⟨u, φ_ℓ⟩|², u ∈ U.

Note: Orthonormality suffices for

  ‖u‖₂² ≥ Σ_{ℓ=−L}^{L} |⟨u, φ_ℓ⟩|²,  u ∈ U,

and hence, letting L → ∞,

  ‖u‖₂² ≥ Σ_{ℓ=−∞}^{∞} |⟨u, φ_ℓ⟩|²,  u ∈ U.
If {φ_ℓ} ⊂ U are orthonormal, then the following are equivalent:
• ∀u ∈ U and ∀ε > 0 there exists some positive integer L(ε) and coefficients α₋L(ε), . . . , α_L(ε) ∈ C s.t.

  ‖ u − Σ_{ℓ=−L(ε)}^{L(ε)} α_ℓ φ_ℓ ‖₂ < ε.  (3)

• For every u ∈ U

  lim_{L→∞} ‖ u − Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ ‖₂ = 0.  (4)

• For every u ∈ U

  ‖u‖₂² = Σ_{ℓ=−∞}^{∞} |⟨u, φ_ℓ⟩|².  (5)

• For every u, v ∈ U

  ⟨u, v⟩ = Σ_{ℓ=−∞}^{∞} ⟨u, φ_ℓ⟩ ⟨v, φ_ℓ⟩*.  (6)
• ∀u ∈ U and ∀ε > 0 there exists some positive integer L(ε) and coefficients α₋L(ε), . . . , α_L(ε) ∈ C s.t.

  ‖ u − Σ_{ℓ=−L(ε)}^{L(ε)} α_ℓ φ_ℓ ‖₂ < ε.

• For every u ∈ U

  lim_{L→∞} ‖ u − Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ ‖₂ = 0.

One direction is obvious, and the other holds because

  Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ

is the closest element to u in span(φ₋L, . . . , φ_L).
  lim_{L→∞} ‖ u − Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ ‖₂ = 0

implies

  ‖u‖₂² = Σ_{ℓ=−∞}^{∞} |⟨u, φ_ℓ⟩|²

because

  | ‖u‖₂ − ‖v‖₂ | ≤ ‖u + v‖₂ ≤ ‖u‖₂ + ‖v‖₂,  u, v ∈ L2.

(The distance → 0 only if the lengths have the same limit.) Conversely,

  ⟨ u − Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ, φ_ℓ′ ⟩ = ⟨u, φ_ℓ′⟩ I{|ℓ′| > L},  (ℓ′ ∈ Z, u ∈ L2),

so

  ‖ u − Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ ‖₂² = Σ_{|ℓ|>L} |⟨u, φ_ℓ⟩|² → 0,  u ∈ U.

  ‖u‖₂² = Σ_{ℓ=−∞}^{∞} |⟨u, φ_ℓ⟩|²,  u ∈ U,

is implied by

  ⟨u, v⟩ = Σ_{ℓ=−∞}^{∞} ⟨u, φ_ℓ⟩ ⟨v, φ_ℓ⟩*,  u, v ∈ U.

Conversely, the former implies

  lim_{L→∞} ‖u − u_L‖₂ = 0,  lim_{L→∞} ‖v − v_L‖₂ = 0,  (7)

where

  u_L ≜ Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ,  v_L ≜ Σ_{ℓ=−L}^{L} ⟨v, φ_ℓ⟩ φ_ℓ.

Expand

  ⟨u, v⟩ = ⟨(u − u_L) + u_L, (v − v_L) + v_L⟩,

where

  ⟨u_L, v_L⟩ = Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ ⟨v, φ_ℓ⟩*.

The cross-terms tend to zero by (7) and Cauchy-Schwarz.
Fourier Series

Let S be positive. The bi-infinite sequence of functions defined for every η ∈ Z by

  s ↦ (1/√S) e^{i2πηs/S} I{−S/2 ≤ s < S/2},  s ∈ R,

forms a CONS for the subspace of square-integrable functions that vanish outside the interval [−S/2, S/2).

See Theorem A.3.3 in the appendix.
The Frequency Domain

For W > 0 define the linear subspace

  V = {g ∈ L2 : g(f) = 0 whenever |f| > W}.

The functions defined for every integer ℓ by

  f ↦ (1/√(2W)) e^{iπℓf/W} I{|f| ≤ W}

form a CONS for V.
Energy-Limited Signals that Are Bandlimited to W Hz

The signal x is an energy-limited signal that is bandlimited to W Hz if, and only if, x = ǧ for some function g : f ↦ g(f) satisfying

  g(f) = 0,  |f| > W,

and

  ∫_{−W}^{W} |g(f)|² df < ∞.

Thus, if

  V = {g ∈ L2 : g(f) = 0 whenever |f| > W},

then V̌ is the set of all energy-limited signals that are bandlimited to W Hz.
If {ψ_ℓ} is a CONS for V, then {ψ̌_ℓ} is a CONS for V̌
Let {ψ_ℓ} be a CONS for V. Orthonormality follows from Parseval:

  ⟨ψ̌_ℓ, ψ̌_ℓ′⟩ = ⟨ψ_ℓ, ψ_ℓ′⟩,  ℓ, ℓ′ ∈ Z.

We need to verify that for every x ∈ V̌,

  Σ_{ℓ=−∞}^{∞} |⟨x, ψ̌_ℓ⟩|² = ‖x‖₂².

Since x is in V̌, there exists some g ∈ V s.t. x = ǧ. Then

  Σ_{ℓ=−∞}^{∞} |⟨ǧ, ψ̌_ℓ⟩|² = Σ_{ℓ=−∞}^{∞} |⟨g, ψ_ℓ⟩|² = ‖g‖₂² = ‖ǧ‖₂²,  g ∈ V.
A CONS for V̌
The sequence of signals that are defined for every integer ℓ by

  t ↦ √(2W) sinc(2Wt + ℓ)

forms a CONS for the space of energy-limited signals that are bandlimited to W Hz. Indeed,

  ψ_ℓ : f ↦ (1/√(2W)) e^{iπℓf/W} I{|f| ≤ W}

form a CONS for the subspace V, hence their IFTs form a CONS for V̌:

  ψ̌_ℓ(t) = ∫_{−∞}^{∞} ψ_ℓ(f) e^{i2πft} df
          = ∫_{−W}^{W} (1/√(2W)) e^{iπℓf/W} e^{i2πft} df
          = √(2W) sinc(2Wt + ℓ).
  ⟨x, t ↦ √(2W) sinc(2Wt + ℓ)⟩ = (1/√(2W)) x(−ℓ/(2W)),  ℓ ∈ Z.

Let g ∈ V be such that x = ǧ, i.e.,

  x(t) = ∫_{−W}^{W} g(f) e^{i2πft} df,  t ∈ R.

Then

  ⟨x, t ↦ √(2W) sinc(2Wt + ℓ)⟩ = ⟨x, ψ̌_ℓ⟩
    = ⟨ǧ, ψ̌_ℓ⟩
    = ⟨g, ψ_ℓ⟩
    = ∫_{−W}^{W} g(f) ( (1/√(2W)) e^{iπℓf/W} )* df
    = (1/√(2W)) ∫_{−W}^{W} g(f) e^{−iπℓf/W} df
    = (1/√(2W)) x(−ℓ/(2W)),  ℓ ∈ Z.
If {φ_ℓ} ⊂ U are orthonormal, then the following are equivalent:
• ∀u ∈ U and ∀ε > 0 there exists some positive integer L(ε) and coefficients α₋L(ε), . . . , α_L(ε) ∈ C s.t.

  ‖ u − Σ_{ℓ=−L(ε)}^{L(ε)} α_ℓ φ_ℓ ‖₂ < ε.  (8)

• For every u ∈ U

  lim_{L→∞} ‖ u − Σ_{ℓ=−L}^{L} ⟨u, φ_ℓ⟩ φ_ℓ ‖₂ = 0.  (9)

• For every u ∈ U

  ‖u‖₂² = Σ_{ℓ=−∞}^{∞} |⟨u, φ_ℓ⟩|².  (10)

• For every u, v ∈ U

  ⟨u, v⟩ = Σ_{ℓ=−∞}^{∞} ⟨u, φ_ℓ⟩ ⟨v, φ_ℓ⟩*.  (11)
Applying this with φ_ℓ = ψ̌_ℓ:

Theorem (L2-Sampling Theorem)
Let x be an energy-limited signal that is bandlimited to W Hz, and let

  T = 1/(2W).

Let . . . , x(−2T), x(−T), x(0), x(T), x(2T), . . . be the samples of x. Then

  lim_{L→∞} ∫_{−∞}^{∞} | x(t) − Σ_{ℓ=−L}^{L} x(−ℓT) sinc(t/T + ℓ) |² dt = 0,

  ∫_{−∞}^{∞} |x(t)|² dt = T Σ_{ℓ=−∞}^{∞} |x(ℓT)|².

If y is another energy-limited signal that is bandlimited to W Hz, then

  ⟨x, y⟩ = T Σ_{ℓ=−∞}^{∞} x(ℓT) y*(ℓT).
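A numerical sketch of the reconstruction (assuming numpy; the test signal and the truncation of the bi-infinite sum are illustrative): a signal bandlimited to W Hz is rebuilt from its samples x(ℓT), T = 1/(2W).

    import numpy as np

    W = 4.0; T = 1/(2*W)
    x = lambda t: np.sinc(2*W*t/3)**2      # bandlimited to 2W/3 <= W Hz

    ell = np.arange(-400, 401)             # truncate the bi-infinite sum
    t = np.linspace(-2, 2, 1001)
    x_rec = (x(ell*T)[None, :] * np.sinc(t[:, None]/T - ell[None, :])).sum(1)
    print(np.max(np.abs(x_rec - x(t))))    # small; shrinks as L grows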
Pointwise Sampling Theorem
If the signal x can be represented as

  x(t) = ∫_{−W}^{W} g(f) e^{i2πft} df,  t ∈ R,

for some function g satisfying

  ∫_{−W}^{W} |g(f)| df < ∞,

and if 0 < T ≤ 1/(2W), then for every t ∈ R

  x(t) = lim_{L→∞} Σ_{ℓ=−L}^{L} x(−ℓT) sinc(t/T + ℓ).

A special case is when x is an energy-limited signal that is bandlimited to W Hz.
Table 8.1: The duality between the Sampling Theorem and the Fourier Series Representation.

V̌ — the energy-limited signals that are bandlimited to W Hz; generic element x : t ↦ x(t); CONS . . . , ψ̌₋₁, ψ̌₀, ψ̌₁, . . . with ψ̌_ℓ(t) = √(2W) sinc(2Wt + ℓ); inner product ⟨x, ψ̌_ℓ⟩ = ∫_{−∞}^{∞} x(t) √(2W) sinc(2Wt + ℓ) dt = (1/√(2W)) x(−ℓ/(2W)); the expansion lim_{L→∞} ‖x − Σ_{ℓ=−L}^{L} ⟨x, ψ̌_ℓ⟩ ψ̌_ℓ‖₂ = 0 is the Sampling Theorem.

V — the energy-limited functions that vanish outside the interval [−W, W); generic element g : f ↦ g(f); CONS . . . , ψ₋₁, ψ₀, ψ₁, . . . with ψ_ℓ(f) = (1/√(2W)) e^{iπℓf/W} I{−W ≤ f < W}; inner product ⟨g, ψ_ℓ⟩ = ∫_{−W}^{W} g(f) (1/√(2W)) e^{−iπℓf/W} df = g’s ℓ-th Fourier Series coefficient (≜ c_ℓ); the expansion lim_{L→∞} ‖g − Σ_{ℓ=−L}^{L} ⟨g, ψ_ℓ⟩ ψ_ℓ‖₂ = 0 is the Fourier Series representation.
The Sampling Theorem also holds for
complex signals.

An Isomorphism

If {α_ℓ} is a bi-infinite square-summable sequence, then there exists an energy-limited signal u that is bandlimited to W Hz such that its samples are given by

  u(ℓT) = α_ℓ,  ℓ ∈ Z.
Next Week

• Sampling of real passband signals (Chapter 9).


• Mapping bits to waveforms (Chapter 10).
• A glimpse at stochastic processes (Chapter 12).

Thank you!

Communication and Detection Theory: Lecture 4

Amos Lapidoth
ETH Zurich

March 14, 2017

Sampling Real Passband Signals

Today

• Sampling real passband signals (Chapter 9).


• Mapping bits to waveforms (Chapter 10).
• A glimpse at stochastic processes (Chapter 12).

Bandwidth vs. Bandwidth-around-fc

[Figure: a passband spectrum ŷ(f) occupying fc − W/2 ≤ |f| ≤ fc + W/2.]

The bandwidth is fc + W/2; the bandwidth around fc is W.

A direct application of the Sampling Theorem would require

  2 (fc + W/2)  (real) samples/sec.
The Holy Grail

2W real samples per second, or W complex samples per second.

• The sampling rate should not depend on fc.
• Separation of carrier-selection and sampling circuits.
Recall

• If xPB is a real passband signal whose bandwidth around fc is


W, then its baseband representation xBB (w.r.t. fc ) is a
complex signal of bandwidth W/2.
• The Sampling Theorem applies also to complex signals.
• xPB can be recovered from xBB and fc .

The Solution
Sampling:
• From xPB generate xBB.
• Sample xBB at its Nyquist rate:

  2 × (bandwidth of xBB) = 2 × (W/2) = W  complex samples/sec.

Reconstruction:
• From the samples of xBB reconstruct xBB.
• From xBB (and fc) reconstruct xPB:

  xPB(t) = 2 Re(e^{i2πfc t} xBB(t)),  t ∈ R.
[Figure: as in Lecture 2, x̂PB(f), x̂PB(f + fc), g₀(f), and x̂BB(f) on [−W/2, W/2].]
Complex Sampling
We seek the samples of xBB:

  xBB(ℓ/W),  ℓ ∈ Z.

Recalling that

  xBB = (t ↦ e^{−i2πfc t} xPB(t)) ⋆ LPF_{Wc},

where W/2 ≤ Wc ≤ 2fc − W/2,

  xBB(ℓ/W) = ( (t ↦ e^{−i2πfc t} xPB(t)) ⋆ LPF_{Wc} )(ℓ/W)
           = ( (t ↦ xPB(t) cos(2πfc t)) ⋆ LPF_{Wc} )(ℓ/W)
             − i ( (t ↦ xPB(t) sin(2πfc t)) ⋆ LPF_{Wc} )(ℓ/W),  ℓ ∈ Z.
Complex Sampling

[Block diagram: as before, xPB(t) is mixed with cos(2πfc t) and −sin(2πfc t), lowpass filtered with cutoff W/2 ≤ Wc ≤ 2fc − W/2, and each branch is sampled at times ℓ/W to give Re(xBB(ℓ/W)) and Im(xBB(ℓ/W)).]
Reconstruction
By the (Pointwise) Sampling Theorem,

  xBB(t) = Σ_{ℓ=−∞}^{∞} xBB(ℓ/W) sinc(Wt − ℓ),  t ∈ R.

Consequently,

  xPB(t) = 2 Re( e^{i2πfc t} Σ_{ℓ=−∞}^{∞} xBB(ℓ/W) sinc(Wt − ℓ) ),  t ∈ R.

Since sinc(·) is real,

  xPB(t) = 2 Σ_{ℓ=−∞}^{∞} Re(xBB(ℓ/W)) sinc(Wt − ℓ) cos(2πfc t)
           − 2 Σ_{ℓ=−∞}^{∞} Im(xBB(ℓ/W)) sinc(Wt − ℓ) sin(2πfc t),  t ∈ R.
Convergence in L2 (1)
  lim_{L→∞} ‖ t ↦ xBB(t) − Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ) ‖₂² = 0.

We’ll show shortly that

  t ↦ xBB(t) − Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ)

is the baseband representation of the real passband signal

  t ↦ xPB(t) − 2 Re( e^{i2πfc t} Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ) ).

So the energy in the latter is twice the energy of the former and thus also converges to zero.
Convergence in L2 (2)

Thus,

  lim_{L→∞} ‖ t ↦ xPB(t) − 2 Re( e^{i2πfc t} Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ) ) ‖₂ = 0.
Convergence in L2 (3)

To deliver what we owe, recall that:

  “if xPB(t) = 2 Re(z(t) e^{i2πfc t}) and z is bandlimited to W/2 Hz, then z equals xBB.” (Lecture 2)

Hence,

  t ↦ xBB(t) − Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ)

is the baseband representation of

  t ↦ xPB(t) − 2 Re( e^{i2πfc t} Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ) ).
The Sampling Theorem for Real Passband Signals
Let xPB be a real energy-limited passband signal that is bandlimited to W Hz around the carrier frequency fc. Let xBB(ℓ/W) denote the time-ℓ/W sample of xBB.
1. xPB can be reconstructed from the samples in the L2 sense:

  lim_{L→∞} ∫_{−∞}^{∞} ( xPB(t) − 2 Re( e^{i2πfc t} Σ_{ℓ=−L}^{L} xBB(ℓ/W) sinc(Wt − ℓ) ) )² dt = 0.

2. ‖xPB‖₂² can be computed from the complex samples:

  ‖xPB‖₂² = (2/W) Σ_{ℓ=−∞}^{∞} |xBB(ℓ/W)|².

3. If yPB is another real energy-limited passband signal that is bandlimited to W Hz around fc, then

  ⟨xPB, yPB⟩ = (2/W) Re( Σ_{ℓ=−∞}^{∞} xBB(ℓ/W) yBB*(ℓ/W) ).
Mapping Bits to Waveforms (Modulation)
• Data bits have no physical attributes.
• Upper-case letters because they are random.
• The modulator maps them to waveforms.
• The modulator’s output is a stochastic process.

Mapping a single bit:

  X(t) = { x₀(t) if D = 0,
           x₁(t) if D = 1,     t ∈ R.

For example,

  x₀(t) = { A e^{−t/T} if t/T ≥ 0,
            0          otherwise,     t ∈ R,

and

  x₁(t) = { A if 0 ≤ t/T ≤ 1,
            0 otherwise,     t ∈ R.
Stochastic Process

A probability space is a triple (Ω, F, P ), where


• the elements of Ω are the “outcomes,”
• the elements of F are the “events,”
• and P (·) assigns a probability to every event.

A SP is a function of time and “luck/draw”, i.e., of time and the


results of all the random experiments in the system.

X: Ω × R → R
(ω, t) 7→ X(ω, t).

Sample Paths

Fixing a draw ω ∈ Ω, the SP becomes a function of time:


sample-path, or trajectory, or sample-path realization, or sample
function

X(ω, ·) : R → R
t 7→ X(ω, t).

A Stochastic Process and its Sample-Paths

  X(t) = { x₀(t) if D = 0,
           x₁(t) if D = 1,     t ∈ R,

where

  x₀(t) = { A e^{−t/T} if t/T ≥ 0,
            0          otherwise,     t ∈ R,

  x₁(t) = { A if 0 ≤ t/T ≤ 1,
            0 otherwise,     t ∈ R,

has two sample-paths: x₀(·) and x₁(·).
Time-t Sample

Fixing an epoch t ∈ R and viewing the SP as a function of “luck”


only, we obtain a random variable: the value of the process at
time t or the time-t sample of the process

X(·, t) : Ω → R
ω 7→ X(ω, t).

A Formal Definition


A stochastic process X(t), t ∈ T is an indexed family of random
variables that are defined on a common probability space
(Ω, F, P ).

• T = R ⇒ continuous-time stochastic process.


• T = Z ⇒ discrete-time stochastic process.

From Bits to Real Numbers

k—number of data bits transmitted during the system’s lifetime.

The data bits transmitted are

D1 , D2 , . . . , Dk .

The mapping
ϕ : {0, 1}k → Rn
(d1 , . . . , dk ) 7→ (x1 , . . . , xn )

maps k-tuples of data bits to n-tuples of real numbers.

We only consider mappings ϕ(·) that are one-to-one (injective).

Examples (1)
• n = k, i.e., one bit per real symbol, and

  X_j = { +1 if D_j = 0,
          −1 if D_j = 1,     j = 1, . . . , k.

• k is even; the data bits {D_j} are broken into pairs

  (D₁, D₂), (D₃, D₄), . . . , (D_{k−1}, D_k);

and each pair is mapped to a single real number:

  (D_{2j−1}, D_{2j}) ↦ { +3 if D_{2j−1} = D_{2j} = 0,
                         +1 if D_{2j−1} = 0 and D_{2j} = 1,
                         −1 if D_{2j−1} = 1 and D_{2j} = 0,
                         −3 if D_{2j−1} = D_{2j} = 1,     j = 1, . . . , k/2.

Here n = k/2, and the rate is two bits per real symbol.
Examples (2)

• Each data bit D_j produces two real numbers by repetition:

  D_j ↦ { (+1, +1) if D_j = 0,
          (−1, −1) if D_j = 1,     j = 1, . . . , k.

Here n = 2k, and the rate is half a bit per real symbol.
Block-Mode Mapping of Bits to Real Numbers

D₁, . . . , D_K │ D_{K+1}, . . . , D_{2K} │ · · · │ D_{k−K+1}, . . . , D_k
   ↓ enc(·)          ↓ enc(·)                      ↓ enc(·)
X₁, . . . , X_N │ X_{N+1}, . . . , X_{2N} │ · · · │ X_{n−N+1}, . . . , X_n

enc : {0, 1}^K → R^N is a (K, N) binary-to-reals block encoder of rate

  K/N  bits per real symbol.

Always assumed one-to-one.
Zero Padding

Apply the (K, N) encoder in block-mode to

  D₁, . . . , D_k, 0, . . . , 0    (k′ − k zeros appended),

where

  k′ = ⌈k/K⌉ K.
From Real Numbers to Waveforms with Linear Modulation

We map D₁, . . . , D_k to the real numbers X₁, . . . , X_n and produce the waveform

  X(t) = A Σ_{ℓ=1}^{n} X_ℓ g_ℓ(t),  t ∈ R.

X is a scaled-by-A linear combination of the tuple (g₁, . . . , g_n) with the coefficients X₁, . . . , X_n:

  X = A Σ_{ℓ=1}^{n} X_ℓ g_ℓ.
The Energy in the Transmitted Waveform
The transmitted energy is a random variable!

  ‖X‖₂² = ∫_{−∞}^{∞} X²(t) dt
        = ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{n} X_ℓ g_ℓ(t) )² dt
        = ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{n} X_ℓ g_ℓ(t) ) ( A Σ_{ℓ′=1}^{n} X_ℓ′ g_ℓ′(t) ) dt
        = A² Σ_{ℓ=1}^{n} Σ_{ℓ′=1}^{n} X_ℓ X_ℓ′ ⟨g_ℓ, g_ℓ′⟩.

  ‖X‖₂² = A² Σ_{ℓ=1}^{n} X_ℓ²,  when {g_ℓ} are orthonormal.
Gram-Schmidt to the Rescue

There is no loss in generality in assuming that (g₁, . . . , g_n) is orthonormal.
Recovering the Symbols with a Matched Filter

  ϕ : (D₁, . . . , D_k) ↦ (X₁, . . . , X_n),

  X(t) = A Σ_{ℓ=1}^{n} X_ℓ φ_ℓ(t),  t ∈ R,  (φ₁, . . . , φ_n) orthonormal.

How can we recover the symbols?

  X_ℓ = (1/A) ⟨X, φ_ℓ⟩,  ℓ = 1, . . . , n.

This requires n inner-product computations.
Computing Inner Products with a Matched Filter


  ⟨u, φ⟩ = (u ⋆ φ⃖*)(0),  u, φ ∈ L2.

More generally, if g : t ↦ φ(t − t₀), then

  ⟨u, g⟩ = ∫_{−∞}^{∞} u(t) φ*(t − t₀) dt = (u ⋆ φ⃖*)(t₀).

We express the time-t₀ output of the matched filter as:

  (u ⋆ φ⃖*)(t₀) = ∫_{−∞}^{∞} u(τ) φ⃖*(t₀ − τ) dτ
               = ∫_{−∞}^{∞} u(τ) φ*(τ − t₀) dτ.
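A discrete sketch (assuming numpy; the unit pulse and shift times are illustrative): the inner products ⟨u, t ↦ φ(t − t_j)⟩ are samples of the matched-filter output, computed here as a correlation.

    import numpy as np

    dt = 0.01
    t = np.arange(0, 10, dt)
    phi = np.where(t < 1, 1.0, 0.0)               # a unit pulse on [0, 1)
    u = np.roll(phi, 300) + 0.5*np.roll(phi, 500) # shifts of phi (t_j = 3, 5 s)

    # np.correlate conjugates its second argument, so this is (u * mirrored phi*)
    corr = np.correlate(u, phi, mode='full') * dt
    # sampling the output at the shift lags recovers the inner products
    print(corr[len(phi)-1 + 300], corr[len(phi)-1 + 500])   # ~1.0 and ~0.5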

Computing Many Inner Products Is sometimes Easy

If g₁, . . . , g_J ∈ L2 are all time shifts of the same signal φ,

  g_j : t ↦ φ(t − t_j),  j = 1, . . . , J,

and if u ∈ L2 is arbitrary, then all J inner products ⟨u, g_j⟩, j = 1, . . . , J, can be computed using one filter by feeding u to a matched filter for φ and sampling the output at the appropriate times t₁, . . . , t_J:

  ⟨u, g_j⟩ = (u ⋆ φ⃖*)(t_j),  j = 1, . . . , J.
Back to Linear Modulation
  X(t) = A Σ_{ℓ=1}^{n} X_ℓ φ_ℓ(t),  t ∈ R,  (φ₁, . . . , φ_n) orthonormal.

Suppose now that

  φ_ℓ(t) = φ(t − ℓTs),  (ℓ ∈ {1, . . . , n}, t ∈ R).

Then we can compute all the inner products using one circuit!

  X_ℓ = (1/A) ∫_{−∞}^{∞} X(τ) φ_ℓ(τ) dτ
      = (1/A) ∫_{−∞}^{∞} X(τ) φ(τ − ℓTs) dτ
      = (1/A) ∫_{−∞}^{∞} X(τ) φ⃖(ℓTs − τ) dτ
      = (1/A) (X ⋆ φ⃖)(ℓTs),  ℓ = 1, . . . , n.
Recovering the Symbols Using One MF

[Block diagram: X(·) is fed to the matched filter φ⃖ and sampled at times ℓTs, yielding A X_ℓ.]

This motivates

  X(t) = A Σ_{ℓ=1}^{n} X_ℓ φ(t − ℓTs),  t ∈ R.
Pulse Amplitude Modulation
The bits D₁, . . . , D_k are mapped to the symbols X₁, . . . , X_n, and

  X(t) = A Σ_{ℓ=1}^{n} X_ℓ g(t − ℓTs),  t ∈ R.

• A ≥ 0 is a scaling factor.
• g : R → R is the pulse shape.
• Ts > 0 is the baud period.
• 1/Ts is the baud rate, in real symbols per second.

If the time shifts of g are orthonormal (we then write φ), the rate is 1/Ts real dimensions per second.

[Portrait: J.M.E. Baudot (1845–1903).]
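A small numpy sketch of PAM (the bits, amplitudes, and the shift-orthonormal sinc pulse are illustrative choices): map bits to ±1 symbols and superpose time-shifted pulses.

    import numpy as np

    A, Ts, fs = 1.0, 1.0, 100.0                 # amplitude, baud period, sim rate
    bits = np.array([0, 1, 1, 0, 1])
    X_l = 1 - 2*bits                            # antipodal: 0 -> +1, 1 -> -1

    t = np.arange(0, (len(X_l)+4)*Ts, 1/fs)
    g = lambda tau: np.sinc(tau/Ts)/np.sqrt(Ts) # shift-orthonormal pulse shape
    # X(t) = A sum_l X_l g(t - l Ts)
    X = A * sum(x * g(t - (l+1)*Ts) for l, x in enumerate(X_l))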
The Constellation

The constellation of ϕ(·) is denoted X. It contains x iff for some choice of the binary k-tuple (d₁, . . . , d_k) and for some ℓ ∈ {1, . . . , n} the ℓ-th component of ϕ(d₁, . . . , d_k) is equal to x.

For example,

  {−5, −3, −1, +1, +3, +5}  or  {−2, −1, +1, +2}.
Constellation Parameters

The minimum distance of a constellation X is

  δ ≜ min_{x, x′ ∈ X, x ≠ x′} |x − x′|.

The second moment of X is

  (1/#X) Σ_{x ∈ X} x².
Uncoded Transmission

The transmission is uncoded or the system is uncoded if the range of ϕ equals X^n,

  {ϕ(d) : d ∈ {0, 1}^k} = X^n,

i.e., if every sequence in X^n can be produced by the encoder by feeding it the appropriate data sequence.
Examples:
1. Uncoded: antipodal signaling D_j ↦ 1 − 2D_j.
2. Coded: repetition coding 0 ↦ (+1, +1) and 1 ↦ (−1, −1).
Next Week

• PAM and Nyquist’s Criterion (Chapter 11).


• Energy and Power in PAM (Chapter 14).

Reading Assignment:
• PAM Implementation Considerations (Section 10.12).
• Stationary Discrete-Time SP (Chapter 13).

Thank you!

Communication and Detection Theory: Lecture 5

Amos Lapidoth
ETH Zurich

March 21, 2017

Nyquist’s Criterion

Nyquist’s Criterion

• Nyquist’s Criterion.
• The Self-Similarity Function.
• Excess Bandwidth.
• Band-edge symmetry.

PAM with a Shift-Orthonormal Pulse Shape
  X(t) = A Σ_{ℓ=1}^{n} X_ℓ φ(t − ℓTs),  t ∈ R,

where

  ∫_{−∞}^{∞} φ(t − ℓTs) φ(t − ℓ′Ts) dt = I{ℓ′ = ℓ},  ℓ, ℓ′ ∈ {1, . . . , n}.

• Energy:

  ‖X‖₂² = A² Σ_{ℓ=1}^{n} X_ℓ².

• Recovering X_ℓ is easy:

  X_ℓ = A⁻¹ ⟨X, t ↦ φ(t − ℓTs)⟩ = A⁻¹ (X ⋆ φ⃖)(ℓTs),  ℓ = 1, . . . , n.

One matched filter computes all the inner products.
Massaging the Constraint

• Instead of

  ∫_{−∞}^{∞} φ(t − ℓTs) φ(t − ℓ′Ts) dt = I{ℓ′ = ℓ},  ℓ, ℓ′ ∈ {1, . . . , n},

we’ll require

  ∫_{−∞}^{∞} φ(t − ℓTs) φ(t − ℓ′Ts) dt = I{ℓ′ = ℓ},  ℓ, ℓ′ ∈ Z.

• And we’ll solve the complex case:

  ∫_{−∞}^{∞} φ(t − ℓTs) φ*(t − ℓ′Ts) dt = I{ℓ′ = ℓ},  ℓ, ℓ′ ∈ Z.
The Time Shifts Are Orthogonal when they don’t Overlap

The Self-Similarity Function Rvv of v ∈ L2

  Rvv : τ ↦ ∫_{−∞}^{∞} v(t + τ) v*(t) dt,  τ ∈ R.

[Figure: the overlap between v*(t) and the shifted v(t + τ).]
The Self-Similarity Function Rvv of v ∈ L2
  Rvv : τ ↦ ∫_{−∞}^{∞} v(t + τ) v*(t) dt,  τ ∈ R.

1. Value at zero:  Rvv(0) = ∫_{−∞}^{∞} |v(t)|² dt.
2. Maximum at zero:  |Rvv(τ)| ≤ Rvv(0),  τ ∈ R.
3. Conjugate symmetry:  Rvv(−τ) = Rvv*(τ),  τ ∈ R.
4. Integral representation:  Rvv(τ) = ∫_{−∞}^{∞} |v̂(f)|² e^{i2πfτ} df,  τ ∈ R.
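A discrete sketch of the self-similarity function (assuming numpy; the complex Gaussian test signal is illustrative), approximating the integral by np.correlate:

    import numpy as np

    dt = 0.01
    t = np.arange(-5, 5, dt)
    v = np.exp(-t**2) * np.exp(1j*2*np.pi*t)       # some complex L2 signal

    R = np.correlate(v, v, 'full') * dt            # R[k] ~ R_vv(tau_k)
    tau0 = len(v) - 1                              # index of tau = 0
    print(np.isclose(R[tau0], np.sum(np.abs(v)**2)*dt))  # R(0) = ||v||^2: True
    print(np.max(np.abs(R)) <= np.abs(R[tau0]) + 1e-12)  # max at tau = 0: True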
Some Proofs

  Rvv : τ ↦ ∫_{−∞}^{∞} v(t + τ) v*(t) dt,  τ ∈ R.

1. Rvv(0) = ‖v‖₂² follows by substituting τ = 0.
2. |Rvv(τ)| ≤ Rvv(0) follows from Cauchy-Schwarz:

  |Rvv(τ)| = |⟨t ↦ v(t + τ), v⟩| ≤ ‖t ↦ v(t + τ)‖₂ ‖v‖₂ = ‖v‖₂².
3. Conjugate symmetry follows by substituting s ≜ t + τ in

  Rvv(τ) = ∫_{−∞}^{∞} v(t + τ) v*(t) dt
         = ∫_{−∞}^{∞} v(s) v*(s − τ) ds
         = ( ∫_{−∞}^{∞} v(s − τ) v*(s) ds )*
         = Rvv*(−τ),  τ ∈ R.

4. The integral representation follows from Parseval:

  Rvv(τ) = ∫_{−∞}^{∞} v(t + τ) v*(t) dt
         = ⟨t ↦ v(t + τ), t ↦ v(t)⟩
         = ⟨f ↦ e^{i2πfτ} v̂(f), f ↦ v̂(f)⟩
         = ∫_{−∞}^{∞} e^{i2πfτ} |v̂(f)|² df,  τ ∈ R.
More Properties of Rvv
5. Uniform continuity: Rvv is uniformly continuous. (The IFT of an integrable function is uniformly continuous.)
6. Convolution representation:

  Rvv(τ) = (v ⋆ v⃖*)(τ),  τ ∈ R:

  Rvv(τ) = ∫_{−∞}^{∞} v(t + τ) v*(t) dt
         = ∫_{−∞}^{∞} v(s) v*(s − τ) ds
         = ∫_{−∞}^{∞} v(s) v⃖*(τ − s) ds
         = (v ⋆ v⃖*)(τ).
Shift-Orthonormality and the Self-Similarity Func.
Let φ be energy-limited. The shift-orthonormality condition

  ∫_{−∞}^{∞} φ(t − ℓTs) φ*(t − ℓ′Ts) dt = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ Z,

is equivalent to the condition

  Rφφ(ℓTs) = I{ℓ = 0},  ℓ ∈ Z.

Proof: Recall

  Rvv : τ ↦ ∫_{−∞}^{∞} v(t + τ) v*(t) dt,  τ ∈ R.

Substituting s ≜ t − ℓ′Ts,

  ∫_{−∞}^{∞} φ(t − ℓTs) φ*(t − ℓ′Ts) dt = ∫_{−∞}^{∞} φ(s + (ℓ′ − ℓ)Ts) φ*(s) ds = Rφφ((ℓ′ − ℓ)Ts).
Nyquist Pulse
We say that v : R → C is a Nyquist Pulse of parameter Ts if

  v(ℓTs) = I{ℓ = 0},  ℓ ∈ Z.

• Can we characterize Nyquist pulses in the frequency domain?
• Under what conditions on φ̂ is Rφφ a Nyquist Pulse of parameter Ts?

[Portrait: Harry Nyquist (1889–1976).]
Nyquist’s Criterion
Let Ts > 0 be given, and assume v = ǧ for some integrable g : f ↦ g(f). Then v is a Nyquist Pulse of parameter Ts iff

  lim_{J→∞} ∫_{−1/(2Ts)}^{1/(2Ts)} | Ts − Σ_{j=−J}^{J} g(f + j/Ts) | df = 0.

“Equivalently” (ignoring pointwise convergence issues),

  Σ_{j=−∞}^{∞} g(f + j/Ts) = Ts,  −1/(2Ts) ≤ f ≤ 1/(2Ts),

which is equivalent (by periodicity) to

  Σ_{j=−∞}^{∞} g(f + j/Ts) = Ts,  f ∈ R.
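A numerical sketch of the criterion (assuming numpy; the rectangular g, i.e. v(t) = sinc(t/Ts), is an illustrative choice): the aliased spectrum Σ_j g(f + j/Ts) is identically Ts.

    import numpy as np

    Ts = 2.0
    g = lambda f: Ts * (np.abs(f) <= 1/(2*Ts))    # FT of sinc(t/Ts)

    # interior grid (the interval endpoints are a measure-zero set)
    f = np.linspace(-1/(2*Ts), 1/(2*Ts), 1001)[1:-1]
    alias = sum(g(f + j/Ts) for j in range(-5, 6))
    print(np.allclose(alias, Ts))                 # True: v(l Ts) = I{l = 0}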

[Figure: g(f) of height Ts on [−1/(2Ts), 1/(2Ts)]; its translates g(f + 1/Ts) and g(f − 1/Ts); and the sum Σ_j g(f + j/Ts) = Ts, flat over all f.]
Proof of Nyquist’s Criterion (1)

We’ll show that v(−ℓTs) is the ℓ-th Fourier Series Coefficient of

  f ↦ (1/√Ts) Σ_{j=−∞}^{∞} g(f + j/Ts),  −1/(2Ts) ≤ f ≤ 1/(2Ts).

• A function is (indistinguishable from) a constant iff all but its zeroth Fourier Series coefficients are zero.
• The zeroth Fourier Series coefficient of the constant c with respect to the interval [−1/(2Ts), +1/(2Ts)] is c/√Ts.
Proof of Nyquist’s Criterion (2)
  v(−ℓTs) = ∫_{−∞}^{∞} g(f) e^{−i2πfℓTs} df
          = Σ_{j=−∞}^{∞} ∫_{j/Ts − 1/(2Ts)}^{j/Ts + 1/(2Ts)} g(f) e^{−i2πfℓTs} df
          = Σ_{j=−∞}^{∞} ∫_{−1/(2Ts)}^{1/(2Ts)} g(f̃ + j/Ts) e^{−i2π(f̃ + j/Ts)ℓTs} df̃
          = Σ_{j=−∞}^{∞} ∫_{−1/(2Ts)}^{1/(2Ts)} g(f̃ + j/Ts) e^{−i2πf̃ℓTs} df̃
          = ∫_{−1/(2Ts)}^{1/(2Ts)} Σ_{j=−∞}^{∞} g(f̃ + j/Ts) e^{−i2πf̃ℓTs} df̃
          = ∫_{−1/(2Ts)}^{1/(2Ts)} ( (1/√Ts) Σ_{j=−∞}^{∞} g(f̃ + j/Ts) ) √Ts e^{−i2πf̃ℓTs} df̃.
Characterization of Shift-Orthonormal Pulses
Let φ : R → C be energy-limited and Ts > 0. The condition

  ∫_{−∞}^{∞} φ(t − ℓTs) φ*(t − ℓ′Ts) dt = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ Z,

is equivalent to the condition

  Σ_{j=−∞}^{∞} |φ̂(f + j/Ts)|² ≡ Ts.

Proof: The shift-orthonormality is equivalent to

  Rφφ(ℓTs) = I{ℓ = 0},  ℓ ∈ Z,

and

  Rφφ(τ) = ∫_{−∞}^{∞} |φ̂(f)|² e^{i2πfτ} df,  τ ∈ R.
Minimum Bandwidth of Shift-Orthonormal Pulses
Let Ts > 0 be fixed, and let φ be an energy-limited signal that is bandlimited to W Hz. If the time shifts of φ by integer multiples of Ts are orthonormal, then

  W ≥ 1/(2Ts).

Equality is achieved if

  φ̂(f) = √Ts I{|f| ≤ 1/(2Ts)},  f ∈ R,

and, in particular, by the sinc(·) pulse

  φ(t) = (1/√Ts) sinc(t/Ts),  t ∈ R,

or any time-shift thereof.
[Figure: φ̂(f) on [−W, W]; |φ̂(f)|²; the translates |φ̂(f − 1/Ts)|² and |φ̂(f + 1/Ts)|²; and the sum |φ̂(f + 1/Ts)|² + |φ̂(f)|² + |φ̂(f − 1/Ts)|², examined on [−1/(2Ts), 1/(2Ts)].]
Excess Bandwidth

The excess bandwidth in percent of a signal φ relative to Ts > 0 is

  100% × ( (bandwidth of φ) / (1/(2Ts)) − 1 ).
Band-Edge Symmetry
Assume Ts > 0 and φ a real energy-limited signal that is bandlimited to W Hz, where W < 1/Ts, so φ is of excess bandwidth smaller than 100%. Then φ is shift-orthonormal iff

  |φ̂(1/(2Ts) − f)|² + |φ̂(1/(2Ts) + f)|² ≡ Ts,  0 < f ≤ 1/(2Ts).

[Figure: band-edge symmetry of |φ̂(f)|²: the curve |φ̂(f′ + 1/(2Ts))|² − Ts/2 is antisymmetric about the band edge 1/(2Ts).]
[Figure: as before, g(f), its translates g(f ± 1/Ts), and their constant sum Ts.]
Sketch of Proof (1)

[Figure: as above, the band-edge symmetry of |φ̂(f)|² around f = 1/(2Ts).]
Sketch of Proof (2)
• φ real ⇒ |φ̂(f)| symmetric ⇒ it suffices to consider f ≥ 0.
• Excess bandwidth < 100% ⇒ only two terms contribute to the sum (for f > 0):

  |φ̂(f)|² + |φ̂(f − 1/Ts)|² ≡ Ts,  0 ≤ f < 1/(2Ts).

Substituting f′ ≜ 1/(2Ts) − f leads to the condition

  |φ̂(1/(2Ts) − f′)|² + |φ̂(−f′ − 1/(2Ts))|² ≡ Ts,  0 < f′ ≤ 1/(2Ts),

which, in view of the symmetry of |φ̂(·)|, is equivalent to

  |φ̂(1/(2Ts) − f′)|² + |φ̂(f′ + 1/(2Ts))|² ≡ Ts,  0 < f′ ≤ 1/(2Ts).
Raised-Cosine (1)

[Figure: the raised-cosine spectrum |φ̂(f)|², flat at Ts out to (1−β)/(2Ts) and rolling off to zero at (1+β)/(2Ts).]

  |φ̂(f)|² = { Ts                                              if 0 ≤ |f| ≤ (1−β)/(2Ts),
              (Ts/2) ( 1 + cos( (πTs/β)(|f| − (1−β)/(2Ts)) ) ) if (1−β)/(2Ts) < |f| ≤ (1+β)/(2Ts),
              0                                               if |f| > (1+β)/(2Ts).
Raised-Cosine (2)

√
 1−β

 Ts if 0 ≤ |f | ≤ 2Ts ,
q r  
Ts πTs 1−β 1−β 1+β
φ̂(f ) = 1 + cos β (|f | − 2Ts ) if < |f | ≤ 2Ts ,


2 2Ts

0 1+β
if |f | > 2Ts ,

 sin ((1−β)π Tt )
cos (1 + β)π Tts + 4β Tt
s

φ(t) = √ s
, t ∈ R.
π Ts 1 − (4β Tts )2
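A numerical sketch (assuming numpy; grid sizes and the roll-off β = 0.5 are illustrative): build φ from the spectrum φ̂ above by an inverse FFT and check shift orthonormality, Rφφ(ℓTs) = I{ℓ = 0}.

    import numpy as np

    Ts, beta, fs = 1.0, 0.5, 64.0                 # baud period, roll-off, grid
    f = np.fft.fftfreq(4096, 1/fs)
    af = np.abs(f)
    f1, f2 = (1-beta)/(2*Ts), (1+beta)/(2*Ts)
    mag2 = np.where(af <= f1, Ts,
           np.where(af <= f2, Ts/2*(1+np.cos(np.pi*Ts/beta*(af-f1))), 0.0))
    phi = np.fft.ifft(np.sqrt(mag2)).real * fs    # phi = IFT of phi_hat
    phi = np.roll(phi, 2048)                      # center the pulse in time

    R = np.correlate(phi, phi, 'full') / fs       # self-similarity function
    mid = len(R)//2                               # lag tau = 0
    print([round(R[mid + int(l*Ts*fs)], 3) for l in range(-3, 4)])
    # -> [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]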

[Figure: the pulse φ(t) and its self-similarity function Rφφ(τ), which vanishes at the nonzero multiples of Ts.]
A refresher on discrete-time stochastic processes.
Read Chapter 13!

Stationary Processes

A discrete-time SP (X_ν) is stationary (or strict-sense stationary, or strongly stationary) if for every n ∈ N and all integers η, η′ the tuples

  (X_η, . . . , X_{η+n−1})  and  (X_{η′}, . . . , X_{η′+n−1})

have the same law. In particular,

  ((X_ν, ν ∈ Z) stationary) ⟹ (X_ν has the same law as X₁, ν ∈ Z),

and

  ((X_ν, ν ∈ Z) stationary) ⟹ ((X_ν, X_ν′) has the same law as (X_{η+ν}, X_{η+ν′}), ν, ν′, η ∈ Z).
Wide-Sense Stationary Processes

We say that a discrete-time SP Xν , ν ∈ Z is wide-sense
stationary (WSS) if the following three conditions are satisfied:

Var[Xν] < ∞, ν ∈ Z. (12a)

E[Xν] = E[X1], ν ∈ Z. (12b)

E[Xν′ Xν] = E[Xη+ν′ Xη+ν], ν, ν′, η ∈ Z. (12c)

Choosing ν = ν′ we obtain

( (Xν, ν ∈ Z) WSS )  ⇒  ( Var[Xν] = Var[X1], ν ∈ Z ).

c
Lecture 5, Amos Lapidoth 2017
Stationarity and Wide-Sense Stationarity
Every finite-variance discrete-time stationary SP is WSS.

( (Xν, ν ∈ Z) stationary )  ⇒  ( Xν =ᴸ X1, ν ∈ Z ),

so

( (Xν, ν ∈ Z) stationary )  ⇒  ( E[Xν] = E[X1], ν ∈ Z ).

( (Xν, ν ∈ Z) stationary )
  ⇒  ( (Xν, Xν′) =ᴸ (Xη+ν, Xη+ν′), ν, ν′, η ∈ Z ),

so

( (Xν, ν ∈ Z) stationary )
  ⇒  ( E[Xν Xν′] = E[Xη+ν Xη+ν′], ν, ν′, η ∈ Z ),
c
Lecture 5, Amos Lapidoth 2017
The Autocovariance Function

The autocovariance function KXX : Z → R of a WSS discrete-time
SP (Xν) is defined by

KXX(η) ≜ Cov[Xν+η, Xν], η ∈ Z.

Key Properties:

KXX(η) = KXX(−η), η ∈ Z.

Σ_{ν=1}^{n} Σ_{ν′=1}^{n} αν αν′ KXX(ν − ν′) ≥ 0,  α1, …, αn ∈ R.

c
Lecture 5, Amos Lapidoth 2017
Proof of Key Properties

KXX(η) = Cov[Xν+η, Xν]
       = Cov[Xν̃, Xν̃−η]
       = Cov[Xν̃−η, Xν̃]
       = KXX(−η), η ∈ Z,

Σ_{ν=1}^{n} Σ_{ν′=1}^{n} αν αν′ KXX(ν − ν′) = Σ_{ν=1}^{n} Σ_{ν′=1}^{n} αν αν′ Cov[Xν, Xν′]
  = Cov[ Σ_{ν=1}^{n} αν Xν, Σ_{ν′=1}^{n} αν′ Xν′ ]
  = Var[ Σ_{ν=1}^{n} αν Xν ]
  ≥ 0.
c
Lecture 5, Amos Lapidoth 2017
The Power Spectral Density Function


We say that the discrete-time WSS SP (Xν) is of
power spectral density SXX if SXX : [−1/2, 1/2) → R is
nonnegative, symmetric, integrable, and

KXX(η) = ∫_{−1/2}^{1/2} SXX(θ) e^{i2πηθ} dθ,  η ∈ Z. (13)

• Two PSDs of the same SP must be indistinguishable.

• If we only impose (13), then SXX(θ) can be negative only on a set of Lebesgue
measure zero, and it is indistinguishable from a symmetric
function.

c
Lecture 5, Amos Lapidoth 2017
PSD when KXX Is Absolutely Summable
If

Σ_{η=−∞}^{∞} |KXX(η)| < ∞,

then the function

S(θ) = Σ_{η=−∞}^{∞} KXX(η) e^{−i2πηθ},  θ ∈ [−1/2, 1/2]

is continuous, symmetric, nonnegative, and satisfies

∫_{−1/2}^{1/2} S(θ) e^{i2πηθ} dθ = KXX(η),  η ∈ Z.

Consequently, S(·) is a PSD for KXX .
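A quick numeric illustration (a sketch; the geometric autocovariance KXX(η) = a^{|η|} is an assumed example, not from the slides): truncate the sum defining S(θ), then check that integrating S(θ) e^{i2πηθ} recovers KXX(η).

```python
import numpy as np

a = 0.6
etas = np.arange(-200, 201)      # truncation of the (absolutely summable) sum
K = a ** np.abs(etas)

theta = np.linspace(-0.5, 0.5, 4001)
S = (K[:, None] * np.exp(-2j * np.pi * etas[:, None] * theta[None, :])).sum(axis=0).real

for eta in [0, 1, 3]:
    K_back = np.trapz(S * np.cos(2 * np.pi * eta * theta), theta)  # S is symmetric
    print(eta, round(K_back, 4), round(a ** eta, 4))               # the two values agree
```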

c
Lecture 5, Amos Lapidoth 2017
Next Week

Energy and Power in PAM (Chapter 14).

Thank you!

c
Lecture 5, Amos Lapidoth 2017
Communication and Detection Theory: Lecture 6

Amos Lapidoth
ETH Zurich

March 28, 2017

Energy and Power in PAM

c
Lecture 6, Amos Lapidoth 2017
Today

• Energy in PAM.
• Defining power in PAM.
• Zero-mean signals for additive noise channels.
• The power when:
• The symbols form a centered WSS discrete-time SP.
• Bi-infinite block-mode.
• The pulse shape is shift orthonormal.

c
Lecture 6, Amos Lapidoth 2017
Pulse Amplitude Modulation

Mapping the bits to symbols,

ϕ : {0, 1}k → Rn
(d1 , . . . , dk ) 7→ (x1 , . . . , xn ),

and mapping the symbols to waveform


X(t) = A Σ_{ℓ=1}^{n} Xℓ g(t − ℓTs),  t ∈ R.

• k—number of data bits sent over the system’s lifetime.


• ϕ(·) is one-to-one (injective).

c
Lecture 6, Amos Lapidoth 2017
Block-Mode Mapping of Bits to Real Numbers

D1 , D2 , . . . , DK , DK+1 , . . . , D2K , , Dk−K+1 , . . . , Dk


enc(·) enc(·) enc(·)

X1 , X2 , ... , XN , XN+1 , ... , X2N , , Xn−N+1 , ... , Xn

enc(D1 , . . . , DK ) enc(DK+1 , . . . , D2K ) enc(Dk−K+1 , . . . , Dk )

enc : {0, 1}^K → R^N is a (K, N) binary-to-reals block encoder of rate

K/N  [bit / real symbol].

Always assumed one-to-one.


c
Lecture 6, Amos Lapidoth 2017
Zero Padding

D1 , D2 , . . . , DK , DK+1 , . . . , D2K , , Dk0 −K+1 , . . . , Dk , 0, . . . , 0


enc(·) enc(·) enc(·)

X1 , X2 , ... , XN ,XN+1 , ... , X2N , , Xn0 −N+1 , ... , Xn0

enc(D1 , . . . , DK ) enc(DK+1 , . . . , D2K ) enc(Dk−K+1 , . . . , Dk , 0, . . . , 0)

Apply the (K, N) encoder in block-mode to

D1, …, Dk, 0, …, 0   (k′ − k zeros appended),

where

k′ = ⌈k/K⌉ K.
c
Lecture 6, Amos Lapidoth 2017
Energy in Transmitting a Single Block (1)

K IID random data bits D1, …, DK are mapped by
enc : {0, 1}^K → R^N to N real numbers (X1, …, XN), and

X(t) = A Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs),  t ∈ R.

The energy,

ω ↦ ∫_{−∞}^{∞} X²(ω, t) dt,

is a RV whose expectation—the expected energy—is

E ≜ E[ ∫_{−∞}^{∞} X²(t) dt ].

c
Lecture 6, Amos Lapidoth 2017
E = E[ ∫_{−∞}^{∞} X²(t) dt ]

  = A² E[ ∫_{−∞}^{∞} ( Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ]

  = A² E[ ∫_{−∞}^{∞} ( Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) ) ( Σ_{ℓ′=1}^{N} Xℓ′ g(t − ℓ′Ts) ) dt ]

  = A² E[ ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} Xℓ Xℓ′ g(t − ℓTs) g(t − ℓ′Ts) dt ]

  = A² ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] g(t − ℓTs) g(t − ℓ′Ts) dt

  = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] ∫_{−∞}^{∞} g(t − ℓTs) g(t − ℓ′Ts) dt

  = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts ).
c
Lecture 6, Amos Lapidoth 2017
Energy in Transmitting a Single Block (3)
• Since

Rgg(τ) = ∫_{−∞}^{∞} |ĝ(f)|² e^{i2πfτ} df,  τ ∈ R,

we can also express the energy as

E = A² ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|² df.

• The energy per bit is

Eb ≜ E/K  [energy/bit].

• The energy per real symbol is

Es ≜ E/N  [energy/real symbol].
c
Lecture 6, Amos Lapidoth 2017
Energy in Transmitting a Single Block (4)
The expression

E = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts )

sometimes simplifies:

E = A² ‖g‖₂² Σ_{ℓ=1}^{N} E[Xℓ²],   ( the time shifts t ↦ g(t − ℓTs) orthogonal ).

E = A² ‖g‖₂² Σ_{ℓ=1}^{N} E[Xℓ²],   ( (Xℓ) zero-mean & uncorrelated ).
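A Monte-Carlo sketch of the simplified formula (all values assumed for illustration: N = 4 IID N(0,1) symbols, A = 2, and the unit-energy rectangular pulse, whose shifts are orthogonal), comparing A²‖g‖₂² Σ E[Xℓ²] with a direct numeric estimate of E[∫ X²(t) dt]:

```python
import numpy as np

rng = np.random.default_rng(0)
A, Ts, N = 2.0, 1.0, 4
t = np.arange(-2.0, 8.0, 1e-3)
g = lambda t: ((t >= 0) & (t < Ts)).astype(float)   # unit-energy rectangle

E_formula = A**2 * 1.0 * N                          # A^2 ||g||^2 sum_l E[X_l^2] = 16

acc, trials = 0.0, 2000
for _ in range(trials):
    X = rng.standard_normal(N)
    x = A * sum(X[l] * g(t - (l + 1) * Ts) for l in range(N))
    acc += np.trapz(x**2, t)
print(E_formula, round(acc / trials, 2))            # both close to 16
```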

c
Lecture 6, Amos Lapidoth 2017
Defining Power


The power P in the SP (X(t), t ∈ R) is

P ≜ lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ].

c
Lecture 6, Amos Lapidoth 2017
Gordian Knot

Over its lifetime, the system will transmit finite energy!

So with this definition, the power is zero.

c
Lecture 6, Amos Lapidoth 2017
The Alexandrian Solution

We “pretend” to send infinitely many symbols



X(t) = A Σ_{ℓ=−∞}^{∞} Xℓ g(t − ℓTs),  t ∈ R.

But new questions arise:


• Does this converge?
• How are the infinitely-many symbols generated?
c
Lecture 6, Amos Lapidoth 2017
Convergence

We shall assume

• Bounded Symbols:

|Xℓ| ≤ γ, ℓ ∈ Z.

• The pulse shape decays faster than 1/t:

|g(t)| ≤ β / (1 + |t/Ts|^{1+α}),  t ∈ R

for some

α, β > 0.

c
Lecture 6, Amos Lapidoth 2017
Generating Infinitely Many Symbols


• Just assume (Xν) is a WSS SP.
• Bi-Infinite Block Encoding.
• Shift-orthonormal pulse shape.

c
Lecture 6, Amos Lapidoth 2017
Bi-Infinite Block Encoding

D−K+1 , . . . , D0 , D1 , . . . , DK , DK+1 , · · · , D2K


enc(·) enc(·) enc(·)

, X−N+1 , . . . , X0 , X1 , ... , XN ,XN+1 , · · · , X2N ,

enc(D−K+1 , . . . , D0 ) enc(D1 , . . . , DK ) enc(DK+1 , . . . , D2K )

c
Lecture 6, Amos Lapidoth 2017
Zero-Mean Signals for Additive-Noise Channels

(Block diagrams: in the first system, TX1 sends X over an additive-noise channel and RX1 observes Y = X + N. In the second, the mean c is subtracted at the transmitter (channel input X − c) and added back at the receiver, which again observes X + N.)
How should we choose c(·)?


c
Lecture 6, Amos Lapidoth 2017
Subtracting the Mean (1)
 
E[(W − c)²] ≥ Var[W],  c ∈ R,

with equality iff

c = E[W].

E[(W − c)²]
  = E[ ( (W − E[W]) + (E[W] − c) )² ]
  = E[(W − E[W])²] + 2 E[W − E[W]] (E[W] − c) + (E[W] − c)²   (middle term = 0)
  = E[(W − E[W])²] + (E[W] − c)²
  ≥ E[(W − E[W])²]
  = Var[W],

with equality iff c = E[W]. (Huygens-Steiner)


c
Lecture 6, Amos Lapidoth 2017
Subtracting the Mean (2)

To minimize

(1/(2T)) ∫_{−T}^{T} E[ (X(t) − c(t))² ] dt,

we minimize the integrand, i.e., we choose c(t) to minimize

E[ (X(t) − c(t))² ],

and thus choose

c(t) = E[X(t)],  t ∈ R.

The transmitted signal X − c is then centered!

c
Lecture 6, Amos Lapidoth 2017

The Power when X` Is Zero-Mean and WSS
We ignore how the symbols were generated and assume

E[Xℓ] = 0, ℓ ∈ Z,

E[Xℓ Xℓ+m] = KXX(m),  ℓ, m ∈ Z.

The former guarantees a zero-mean transmitted waveform:

E[X(t)] = E[ A Σ_{ℓ=−∞}^{∞} Xℓ g(t − ℓTs) ]
        = A Σ_{ℓ=−∞}^{∞} E[Xℓ] g(t − ℓTs)
        = 0,  t ∈ R.

c
Lecture 6, Amos Lapidoth 2017
E[ ∫_τ^{τ+Ts} X²(t) dt ] = A² E[ ∫_τ^{τ+Ts} ( Σ_{ℓ=−∞}^{∞} Xℓ g(t − ℓTs) )² dt ]

  = A² E[ ∫_τ^{τ+Ts} Σ_ℓ Σ_{ℓ′} Xℓ Xℓ′ g(t − ℓTs) g(t − ℓ′Ts) dt ]

  = A² ∫_τ^{τ+Ts} Σ_ℓ Σ_{ℓ′} E[Xℓ Xℓ′] g(t − ℓTs) g(t − ℓ′Ts) dt

  = A² ∫_τ^{τ+Ts} Σ_ℓ Σ_m E[Xℓ Xℓ+m] g(t − ℓTs) g( t − (ℓ + m)Ts ) dt

  = A² ∫_τ^{τ+Ts} Σ_m KXX(m) Σ_ℓ g(t − ℓTs) g( t − (ℓ + m)Ts ) dt

  = A² Σ_m KXX(m) Σ_ℓ ∫_{τ−ℓTs}^{τ+Ts−ℓTs} g(t′) g(t′ − mTs) dt′

  = A² Σ_m KXX(m) ∫_{−∞}^{∞} g(t′) g(t′ − mTs) dt′

  = A² Σ_m KXX(m) Rgg(mTs).
c
Lecture 6, Amos Lapidoth 2017

The Power when X` Is Zero-Mean and WSS

Since [−T, +T) contains ⌊2T/Ts⌋ disjoint intervals of the form
[τ, τ + Ts), and since it is contained in the union of ⌈2T/Ts⌉ such
intervals,

⌊2T/Ts⌋ E[ ∫_τ^{τ+Ts} X²(t) dt ] ≤ E[ ∫_{−T}^{T} X²(t) dt ] ≤ ⌈2T/Ts⌉ E[ ∫_τ^{τ+Ts} X²(t) dt ].

We now divide by 2T and study the limit.

c
Lecture 6, Amos Lapidoth 2017
The Sandwich Theorem

(Images: a salami sandwich; John Montagu, 4th Earl of Sandwich.)

If the sequence {an} is sandwiched between {bn} and {cn},

bn ≤ an ≤ cn,  n = 1, 2, 3, …,

and if {bn} and {cn} converge to the same limit,
then {an} also converges, and to that same limit.

(a.k.a. Two Carabinieri Theorem.)

c
Lecture 6, Amos Lapidoth 2017
First Application of the Sandwich Theorem

Using

ξ − 1 < ⌊ξ⌋ ≤ ξ,  ξ ∈ R,

we obtain

(1/(2T)) (2T/Ts) − 1/(2T)  <  (1/(2T)) ⌊2T/Ts⌋  ≤  (1/(2T)) (2T/Ts),

where both bounding terms tend to 1/Ts. Consequently,

lim_{T→∞} (1/(2T)) ⌊2T/Ts⌋ = 1/Ts,  Ts > 0.

c
Lecture 6, Amos Lapidoth 2017
Second Application of the Sandwich Theorem

Using

ξ ≤ ⌈ξ⌉ < ξ + 1,  ξ ∈ R,

we obtain

(1/(2T)) (2T/Ts)  ≤  (1/(2T)) ⌈2T/Ts⌉  <  (1/(2T)) (2T/Ts) + 1/(2T).

Consequently,

lim_{T→∞} (1/(2T)) ⌈2T/Ts⌉ = 1/Ts,  Ts > 0.

c
Lecture 6, Amos Lapidoth 2017
Third Application of the Sandwich Theorem
⌊2T/Ts⌋ E[ ∫_τ^{τ+Ts} X²(t) dt ] ≤ E[ ∫_{−T}^{T} X²(t) dt ] ≤ ⌈2T/Ts⌉ E[ ∫_τ^{τ+Ts} X²(t) dt ].

Dividing by 2T,

(1/(2T)) ⌊2T/Ts⌋ E[ ∫_τ^{τ+Ts} X²(t) dt ]    (the prefactor → 1/Ts)
  ≤ (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] ≤
(1/(2T)) ⌈2T/Ts⌉ E[ ∫_τ^{τ+Ts} X²(t) dt ]    (the prefactor → 1/Ts).

Hence, by the Sandwich Theorem,

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (1/Ts) E[ ∫_τ^{τ+Ts} X²(t) dt ].
c
Lecture 6, Amos Lapidoth 2017

The Power when X` Is Zero-Mean and WSS
From

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (1/Ts) E[ ∫_τ^{τ+Ts} X²(t) dt ],

and

E[ ∫_τ^{τ+Ts} X²(t) dt ] = A² Σ_{m=−∞}^{∞} KXX(m) Rgg(mTs),

we conclude

P = (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) Rgg(mTs).

c
Lecture 6, Amos Lapidoth 2017
Special Cases and Different Forms
Since

P = (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) Rgg(mTs),

P = (A²/Ts) ‖g‖₂² σX²,   ( (Xℓ) centered, variance σX², uncorrelated ).

Also,

P = (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) ∫_{−∞}^{∞} |ĝ(f)|² e^{i2πfmTs} df
  = (A²/Ts) ∫_{−∞}^{∞} Σ_{m=−∞}^{∞} KXX(m) e^{i2πfmTs} |ĝ(f)|² df.
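A sanity check (a sketch with assumed values): for IID, zero-mean, unit-variance symbols and a unit-energy rectangular pulse, the formula collapses to P = A²/Ts; the empirical time average of X²(t) agrees.

```python
import numpy as np

rng = np.random.default_rng(1)
A, Ts, L, dt = 1.5, 1.0, 5000, 1e-2
t = np.arange(0.0, L * Ts, dt)
X = rng.standard_normal(L)          # zero-mean, unit-variance, uncorrelated

# With the rectangular pulse the waveform is piecewise constant:
# X(t) = A X_l on [l Ts, (l+1) Ts).
x = A * X[np.floor(t / Ts).astype(int)]
print(A**2 / Ts, round(np.mean(x**2), 3))   # formula vs. empirical time average
```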

c
Lecture 6, Amos Lapidoth 2017
The Power in Bi-Infinite Block-Mode (1)

D−K+1 , . . . , D0 , D1 , . . . , DK , DK+1 , · · · , D2K


enc(·) enc(·) enc(·)

, X−N+1 , . . . , X0 , X1 , ... , XN ,XN+1 , · · · , X2N ,

enc(D−K+1 , . . . , D0 ) enc(D1 , . . . , DK ) enc(DK+1 , . . . , D2K )


Dν ≜ (DνK+1, …, DνK+K),  ν ∈ Z.

Xν ≜ enc(Dν),  ν ∈ Z.

(XνN+1, …, XνN+N) = Xν,  ν ∈ Z.
c
Lecture 6, Amos Lapidoth 2017
The Power in Bi-Infinite Block-Mode (2)

• If the data bits are IID random bits, and
• if enc(D) is zero mean whenever D comprises IID random bits,

then

P = (1/(NTs)) E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ].

Thus,

P = Es/Ts.

Stop and smell the roses.

c
Lecture 6, Amos Lapidoth 2017
E = E[ ∫_{−∞}^{∞} X²(t) dt ]

  = A² E[ ∫_{−∞}^{∞} ( Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ]

  = A² E[ ∫_{−∞}^{∞} ( Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) ) ( Σ_{ℓ′=1}^{N} Xℓ′ g(t − ℓ′Ts) ) dt ]

  = A² ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] g(t − ℓTs) g(t − ℓ′Ts) dt

  = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] ∫_{−∞}^{∞} g(t − ℓTs) g(t − ℓ′Ts) dt

  = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts ),
c
Lecture 6, Amos Lapidoth 2017
The Power in Bi-Infinite Block-Mode (3)

E = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts ),

so

P = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts ),

or

P = (A²/(NTs)) ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|² df.

c
Lecture 6, Amos Lapidoth 2017
The Power in Bi-Infinite Block-Mode (4)
To derive the power we express X(·) as

X(t) = A Σ_{ℓ=−∞}^{∞} Xℓ g(t − ℓTs)
     = A Σ_{ν=−∞}^{∞} Σ_{η=1}^{N} XνN+η g( t − (νN + η)Ts )
     = A Σ_{ν=−∞}^{∞} u( Xν, t − νNTs ),  t ∈ R,

where

u : (x1, …, xN, t) ↦ Σ_{η=1}^{N} xη g(t − ηTs).

c
Lecture 6, Amos Lapidoth 2017
The Power in Bi-Infinite Block-Mode (5)
• Because the law of Dν does not depend on ν, neither does
the law of Xν (= enc(Dν)):

Xν =ᴸ Xν′,  ν, ν′ ∈ Z.

• The assumption that enc(D) is of zero mean whenever
D ∼ U({0, 1}^K) implies

E[ u(Xν, t) ] = 0,  ν ∈ Z, t ∈ R.

• Since the data bits are IID, so are (Dν, ν ∈ Z), and hence
(Xν, ν ∈ Z) are also IID. Since the independence of Xν and
Xν′ implies the independence of u(Xν, t) and u(Xν′, t′),

E[ u(Xν, t) u(Xν′, t′) ] = 0,  t, t′ ∈ R, ν ≠ ν′, ν, ν′ ∈ Z.

c
Lecture 6, Amos Lapidoth 2017
E[ ∫_τ^{τ+NTs} X²(t) dt ] = E[ ∫_τ^{τ+NTs} ( A Σ_{ν=−∞}^{∞} u(Xν, t − νNTs) )² dt ]

  = A² ∫_τ^{τ+NTs} Σ_ν Σ_{ν′} E[ u(Xν, t − νNTs) u(Xν′, t − ν′NTs) ] dt

  = A² ∫_τ^{τ+NTs} Σ_ν E[ u²(Xν, t − νNTs) ] dt

  = A² ∫_τ^{τ+NTs} Σ_ν E[ u²(X0, t − νNTs) ] dt

  = A² Σ_ν ∫_{τ−νNTs}^{τ−(ν−1)NTs} E[ u²(X0, t′) ] dt′

  = A² ∫_{−∞}^{∞} E[ u²(X0, t′) ] dt′

  = E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Xℓ g(t′ − ℓTs) )² dt′ ],  τ ∈ R.

c
Lecture 6, Amos Lapidoth 2017
The Power in Bi-Infinite Block-Mode (7)
There are ⌊2T/(NTs)⌋ disjoint length-NTs half-open intervals
contained in the interval [−T, T); and ⌈2T/(NTs)⌉ such intervals
suffice to cover the interval [−T, T), so

⌊2T/(NTs)⌋ E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ]
  ≤ E[ ∫_{−T}^{T} X²(t) dt ] ≤
⌈2T/(NTs)⌉ E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ].

Dividing by 2T, letting T → ∞, and using the Sandwich Theorem,

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (1/(NTs)) E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ].

c
Lecture 6, Amos Lapidoth 2017
Time Shifts of Pulse Shape Are Orthonormal

X(t) = A Σ_{ℓ=−∞}^{∞} Xℓ φ(t − ℓTs),  t ∈ R,

∫_{−∞}^{∞} φ(t − ℓTs) φ(t − ℓ′Ts) dt = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ Z.

Assume

|φ(t)| ≤ β / (1 + |t/Ts|^{1+α}),  t ∈ R

for some α, β > 0, and that |Xℓ| ≤ γ, ℓ ∈ Z. Then

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (A²/Ts) lim_{L→∞} (1/(2L+1)) Σ_{ℓ=−L}^{L} E[Xℓ²]

whenever the limit on the RHS exists.
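A small simulation sketch of the theorem (assumed values: Ts = A = 1, unit-energy sinc pulse, bounded ±1 symbols), comparing the time-averaged power with the symbol-averaged second moment:

```python
import numpy as np

rng = np.random.default_rng(2)
L = 200
X = rng.choice([-1.0, 1.0], size=2 * L + 1)   # bounded symbols, |X_l| <= 1
t = np.arange(-float(L), float(L), 1e-2)

x = np.zeros_like(t)
for l, Xl in zip(range(-L, L + 1), X):
    x += Xl * np.sinc(t - l)                  # phi(t) = sinc(t), Ts = 1

P_time = np.trapz(x**2, t) / (2 * L)          # (1/2T) int_{-T}^{T} X^2(t) dt
P_symb = np.mean(X**2)                        # here exactly 1
print(round(P_time, 3), P_symb)               # approx. equal (edge effects aside)
```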


c
Lecture 6, Amos Lapidoth 2017
Some Comments

The theorem is cool:


1. Except for boundedness, there are no statistical assumptions
on the symbols.
2. Beautifully connects power in continuous-time with power in
discrete-time.
But
1. Does not hold for general pulse shapes.
2. The proof is a pain.

c
Lecture 6, Amos Lapidoth 2017
Some Intuition (1)
We focus on the case Ts = 1. We need to study

E[ ∫_{−T}^{T} X²(t) dt ] = E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=−∞}^{∞} Xℓ φ(t − ℓTs) )² I{|t| ≤ T} dt ].

Define

φℓ : t ↦ φ(t − ℓ)

and its “windowed version”

φℓ,w : t ↦ φ(t − ℓ) I{|t| ≤ T},

so

∫_{−T}^{T} X²(t) dt = A² ‖ Σ_{ℓ=−∞}^{∞} Xℓ φℓ,w ‖₂².

The windowed time-shifts {φℓ,w} are not orthogonal. . .
c
Lecture 6, Amos Lapidoth 2017
Some Intuition (2)
For fixed (large) ν and all T > ν, define

X0 = Σ_{|ℓ|≤T−ν} Xℓ φℓ,w,
X1 = Σ_{T−ν<|ℓ|≤T+ν} Xℓ φℓ,w,
X2 = Σ_{T+ν<|ℓ|<∞} Xℓ φℓ,w.

We seek

E[ ‖X0 + X1 + X2‖₂² ].

• The terms in X0 are “nearly orthogonal” (for ν large).


• Only 4ν (bounded) terms in X1 —many but independent of T.
• Many terms in X2 , but very small (by the decay condition).
c
Lecture 6, Amos Lapidoth 2017
Some Intuition (3)

(Figure: the time axis marked at −T − ν, −T, −T + ν, T − ν, T, T + ν.)

c
Lecture 6, Amos Lapidoth 2017
Some Intuition (4)
(1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (1/(2T)) E[ ‖X0 + X1 + X2‖₂² ]

  ≈ (1/(2T)) E[ ‖X0‖₂² ]

  = (1/(2T)) A² E[ ‖ Σ_{|ℓ|≤T−ν} Xℓ φℓ,w ‖₂² ]

  ≈ (1/(2T)) A² E[ ‖ Σ_{|ℓ|≤T−ν} Xℓ φℓ ‖₂² ]

  = (1/(2T)) A² Σ_{ℓ=−(T−ν)}^{T−ν} E[Xℓ²]

  = A² · (2(T−ν)+1)/(2T) · (1/(2(T−ν)+1)) Σ_{ℓ=−(T−ν)}^{T−ν} E[Xℓ²],

where (2(T−ν)+1)/(2T) → 1.
c
Lecture 6, Amos Lapidoth 2017
Recap (1)
• Energy in transmitting a single block:

E = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts ),

Eb ≜ E/K  [energy/bit],

Es ≜ E/N  [energy/real symbol].

• Issues related to defining power as

P ≜ lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ].

c
Lecture 6, Amos Lapidoth 2017
Recap (2)

• Zero-mean signals for additive noise channels.


• The Sandwich Theorem.

• The power when (Xℓ) is zero-mean and WSS:

P = (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) Rgg(mTs)
  = (A²/Ts) ∫_{−∞}^{∞} Σ_{m=−∞}^{∞} KXX(m) e^{i2πfmTs} |ĝ(f)|² df.

c
Lecture 6, Amos Lapidoth 2017
Recap (3)

• The power in bi-infinite block-mode:

P = (1/(NTs)) E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Xℓ g(t − ℓTs) )² dt ]
  = Es/Ts.

• The power when the pulse shape is shift-orthonormal:

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (A²/Ts) lim_{L→∞} (1/(2L+1)) Σ_{ℓ=−L}^{L} E[Xℓ²].

c
Lecture 6, Amos Lapidoth 2017
Next Week

Operational Power Spectral Density (Chapter 15).

Thank you!

c
Lecture 6, Amos Lapidoth 2017
Communication and Detection Theory: Lecture 7

Amos Lapidoth
ETH Zurich

April 4, 2017

The Operational Power Spectral Density

c
Lecture 7, Amos Lapidoth 2017
Today

• Defining the Operational Power Spectral Density.


• Computing the OPSD for PAM signals.
• The bandwidth of a SP.
• The bandwidth of PAM.

c
Lecture 7, Amos Lapidoth 2017
What Are the Issues?

• Traditionally defined only for WSS SPs.


• PAM signals are typically not WSS.
• We would like a general definition.
• The result should be useful.

c
Lecture 7, Amos Lapidoth 2017
Two Approaches to Definitions

1. How is the quantity computed?


• The Fourier Transform of x is

x̂(f) = ∫_{−∞}^{∞} x(t) e^{−i2πft} dt,  f ∈ R.

• The derivative of y(·) at ξ is

lim_{h→0} ( y(ξ + h) − y(ξ) ) / h.
2. How is the quantity used:
• A map’s coloring number is the minimum number of colors
that suffice to color the countries under the restriction that no
two countries sharing a border have the same color.

c
Lecture 7, Amos Lapidoth 2017
The Preservation-of-Sweat Law

1. If you give an explicit “formula” for the quantity


=⇒ must work to explain why it is useful.
2. If you define a quantity by how it is used
=⇒ must work to show how to compute it.

c
Lecture 7, Amos Lapidoth 2017
An Example: Charge Density
ϱ(x, y, z) is

lim_{∆↓0}  ( Charge in the box {(x′, y′, z′) : |x − x′| ≤ ∆/2, |y − y′| ≤ ∆/2, |z − z′| ≤ ∆/2} )
          / ( Volume of that box ),

i.e.,

lim_{∆↓0}  ( Charge in the box {(x′, y′, z′) : |x − x′| ≤ ∆/2, |y − y′| ≤ ∆/2, |z − z′| ≤ ∆/2} ) / ∆³.

ϱ(·) is the charge density if for every region D ⊂ R³

Charge in D = ∫_{(x,y,z)∈D} ϱ(x, y, z) dx dy dz,  D ⊂ R³.

c
Lecture 7, Amos Lapidoth 2017
Pros and Cons of the Second Approach

• The motivation comes first.


• No need for a general formula for the quantity.

• Does such a quantity exist?


• Is it unique?
• If not, does it matter? Can we be more explicit?

c
Lecture 7, Amos Lapidoth 2017
The Definition of Charge Density Revisited
ϱ(·) is the charge density if for every region D ⊂ R³

Charge in D = ∫_{(x,y,z)∈D} ϱ(x, y, z) dx dy dz,  D ⊂ R³.

• Does such a function exist? Not if there are point charges. . .


• Is it unique? Two such functions can differ on a null set.
• Is such a function necessarily nonnegative? No, but if a
function like this exists, then so does one that is nonnegative.
• So let us add nonnegativity to the definition.

This is the approach we’ll adopt.

c
Lecture 7, Amos Lapidoth 2017
Some Etymology
Operational Power Spectral Density

function                        quantity of interest    per unit of
charge (spatial) density        charge                  space
mass (spatial) density          mass                    space
mass line density               mass                    length
probability density             probability             unit of X
power spectral density          power                   frequency (Hz)

This suggests something like

Power of X in D = ∫_{f∈D} SXX(f) df,  D ⊂ R,

i.e.,

Power of X in D = ∫_{all frequencies} I{f ∈ D} SXX(f) df,  D ⊂ R.

But what does this mean?

c
Lecture 7, Amos Lapidoth 2017
Some Hand-waving

Imagine a filter of frequency response

ĥ(f) = I{f ∈ D},

and think of the power of X(·) in the frequencies D as
the average power of X ⋆ h.

We extend the requirement to more general filters, but “nice”:

Power of X ⋆ h = ∫_{all frequencies} |ĥ(f)|² SXX(f) df,  h “nice.”

c
Lecture 7, Amos Lapidoth 2017
Uniqueness

For real filters:

Power of X ⋆ h = ∫_{−∞}^{∞} |ĥ(f)|² SXX(f) df,  h real and “nice”
              = ∫_{0}^{∞} ( |ĥ(f)|² SXX(f) + |ĥ(−f)|² SXX(−f) ) df
              = ∫_{0}^{∞} |ĥ(f)|² ( SXX(f) + SXX(−f) ) df.

Thus, if SXX satisfies the requirement and

S̃(f) + S̃(−f) = SXX(f) + SXX(−f),  f ∈ R,

then S̃(·) also satisfies the requirement. No uniqueness!

c
Lecture 7, Amos Lapidoth 2017
Insisting on Symmetry
Suppose we have found some S(·) satisfying

Power of X ⋆ h = ∫_{−∞}^{∞} |ĥ(f)|² S(f) df,  h real and “nice.”

Define

S̃(f) = (1/2) ( S(f) + S(−f) ),  f ∈ R.

Then

S̃(f) + S̃(−f) = S(f) + S(−f),  f ∈ R,

so

Power of X ⋆ h = ∫_{−∞}^{∞} |ĥ(f)|² S̃(f) df,  h real and “nice,”

and S̃(·) is symmetric.
c
Lecture 7, Amos Lapidoth 2017
The Definition of the Operational PSD


The continuous-time real SP (X(t)) is of operational power
spectral density SXX if it is a measurable SP; SXX : R → R is
integrable and symmetric; and for every stable real filter of impulse
response h ∈ L1,

Power in X ⋆ h = ∫_{−∞}^{∞} SXX(f) |ĥ(f)|² df.

c
Lecture 7, Amos Lapidoth 2017
Uniqueness


If both SXX and S′XX(·) are operational PSDs for (X(t)), then the
set of frequencies at which they differ is of Lebesgue measure zero.

(Corollary 15.3.6)

c
Lecture 7, Amos Lapidoth 2017
Nonnegativity


If X(t) is of operational PSD SXX , then SXX must be
nonnegative except possibly on a set of frequencies of Lebesgue
measure zero.

(Corollary 15.3.3)

c
Lecture 7, Amos Lapidoth 2017
Filtering PAM Signals

Passing a PAM signal of pulse shape g through a stable filter of


impulse response h is tantamount to changing its pulse shape from
g to g ? h:
( (σ ↦ A Σ_ℓ Xℓ g(σ − ℓTs)) ⋆ h )(t) = A Σ_ℓ Xℓ (g ⋆ h)(t − ℓTs),  t ∈ R.

If you know how to compute the power in PAM, then you also
know how to compute the power in a filtered PAM (see the numeric sketch below).
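A minimal numeric sketch of the identity above (grid, pulse, and filter are all assumed for illustration): filtering the PAM waveform equals PAM with the filtered pulse.

```python
import numpy as np

rng = np.random.default_rng(3)
dt, Ts = 1e-2, 1.0
t = np.arange(0, 30, dt)
g = np.where((t >= 0) & (t < Ts), 1.0, 0.0)   # pulse, sampled on the grid
h = np.exp(-t) * dt                            # impulse response (dt-scaled so
                                               # discrete convolution approximates the integral)
X = rng.standard_normal(8)

def pam(pulse):
    out = np.zeros_like(t)
    for l, Xl in enumerate(X, start=1):
        shift = int(l * Ts / dt)
        out[shift:] += Xl * pulse[: len(t) - shift]
    return out

lhs = np.convolve(pam(g), h)[: len(t)]         # (PAM with g) * h
rhs = pam(np.convolve(g, h)[: len(t)])         # PAM with (g * h)
print(np.allclose(lhs, rhs))                   # True
```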

c
Lecture 7, Amos Lapidoth 2017
Filtering a PAM Signal—Proof
• Convolution is linear:

(αg1 + βg2) ⋆ h = α(g1 ⋆ h) + β(g2 ⋆ h).

• It commutes with the shift:

(time-shifted g) ⋆ h = time-shift of (g ⋆ h).

(X ⋆ h)(t) = ( (σ ↦ A Σ_{ℓ=−∞}^{∞} Xℓ g(σ − ℓTs)) ⋆ h )(t)
           = A Σ_{ℓ=−∞}^{∞} Xℓ ∫_{−∞}^{∞} h(s) g(t − s − ℓTs) ds
           = A Σ_{ℓ=−∞}^{∞} Xℓ (g ⋆ h)(t − ℓTs),  t ∈ R.

c
Lecture 7, Amos Lapidoth 2017

X` Is Centered and WSS


P = (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) Rgg(mTs)
  = (A²/Ts) ∫_{−∞}^{∞} Σ_{m=−∞}^{∞} KXX(m) e^{i2πfmTs} |ĝ(f)|² df.

For the power in X ⋆ h we replace g with g ⋆ h and use

(g ⋆ h)^(f) = ĝ(f) ĥ(f),  f ∈ R:

Power in X ⋆ h = ∫_{−∞}^{∞} [ (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) e^{i2πfmTs} |ĝ(f)|² ] |ĥ(f)|² df,

where the bracketed factor is SXX(f).

But we must verify symmetry!
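To see a concrete SXX(f) (a sketch; the geometric KXX(m) = 0.5^{|m|} and the unit-energy rectangular pulse are assumed examples), evaluate the bracketed expression above on a frequency grid and check nonnegativity and symmetry:

```python
import numpy as np

A, Ts = 1.0, 1.0
m = np.arange(-100, 101)
K = 0.5 ** np.abs(m)                                      # assumed K_XX

f = np.linspace(-3, 3, 1201)
g_hat2 = (Ts * np.sinc(f * Ts)) ** 2                      # |g^(f)|^2 of the rectangle
S = (A**2 / Ts) * (K[:, None]
    * np.exp(2j * np.pi * f[None, :] * m[:, None] * Ts)).sum(axis=0).real * g_hat2

print(S.min() >= -1e-12, np.allclose(S, S[::-1]))         # nonnegative and symmetric
```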


c
Lecture 7, Amos Lapidoth 2017
Verifying Symmetry
• |ĝ(−f)|² = |ĝ(f)|² (g is real).
• KXX(−m) = KXX(m) (autocovariance of a real DT SP).

Σ_{m=−∞}^{∞} KXX(m) e^{i2π(−f)mTs} |ĝ(−f)|²
  = Σ_{m=−∞}^{∞} KXX(m) e^{i2π(−f)mTs} |ĝ(f)|²
  = Σ_{m′=−∞}^{∞} KXX(−m′) e^{i2π(−f)(−m′)Ts} |ĝ(f)|²
  = Σ_{m′=−∞}^{∞} KXX(m′) e^{i2πfm′Ts} |ĝ(f)|²
  = Σ_{m=−∞}^{∞} KXX(m) e^{i2πfmTs} |ĝ(f)|².

c
Lecture 7, Amos Lapidoth 2017
Bi-Infinite Block Mode

P = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] Rgg( (ℓ − ℓ′)Ts )
  = (A²/(NTs)) ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|² df.

For the power in X ⋆ h we replace g with g ⋆ h and use

(g ⋆ h)^(f) = ĝ(f) ĥ(f),  f ∈ R:

Power in X ⋆ h = ∫_{−∞}^{∞} [ (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|² ] |ĥ(f)|² df,

where the bracketed factor is SXX(f).

But we must still verify symmetry. . .


c
Lecture 7, Amos Lapidoth 2017
Verifying Symmetry
SXX(f) = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|²,

whose summands we denote aℓ,ℓ′ ≜ E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts}. Use

Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} aℓ,ℓ′ = Σ_{ℓ=1}^{N} aℓ,ℓ + Σ_{ℓ=2}^{N} Σ_{ℓ′=1}^{ℓ−1} ( aℓ,ℓ′ + aℓ′,ℓ )

and E[Xℓ Xℓ′] = E[Xℓ′ Xℓ]:

aℓ,ℓ′ + aℓ′,ℓ = E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} + E[Xℓ′ Xℓ] e^{i2πf(ℓ′−ℓ)Ts}
             = 2 E[Xℓ Xℓ′] cos( 2πf(ℓ − ℓ′)Ts ).

So SXX(f) equals

(A²/(NTs)) ( Σ_{ℓ=1}^{N} E[Xℓ²] + 2 Σ_{ℓ=2}^{N} Σ_{ℓ′=1}^{ℓ−1} E[Xℓ Xℓ′] cos( 2πf(ℓ−ℓ′)Ts ) ) |ĝ(f)|²,

which is symmetric in f.
c
Lecture 7, Amos Lapidoth 2017
Haven’t We Forgotten Something?

What about when the time shifts of the pulse shape by integer
multiples of Ts are orthonormal?
lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (A²/Ts) lim_{L→∞} (1/(2L+1)) Σ_{ℓ=−L}^{L} E[Xℓ²].

This property isn’t preserved under filtering: φ ⋆ h need not have it.

c
Lecture 7, Amos Lapidoth 2017
The Bandwidth of a SP


We say that a SP (X(t)) of operational PSD SXX is
bandlimited to W Hz if, except on a set of frequencies of Lebesgue
measure zero, SXX(f) is zero whenever |f| > W.

The smallest W to which (X(t)) is bandlimited is the bandwidth of
(X(t)).

c
Lecture 7, Amos Lapidoth 2017
The Bandwidth of PAM

Assume bi-infinite block-mode and


A > 0,  Σ_{ℓ=1}^{N} E[Xℓ²] > 0,

so X is not deterministically zero.

The bandwidth of X is the bandwidth of the pulse shape g.

c
Lecture 7, Amos Lapidoth 2017
Proof (1)

It cannot be larger because

SXX(f) = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|²,

so

( ĝ(f) = 0 )  ⇒  ( SXX(f) = 0 ).

c
Lecture 7, Amos Lapidoth 2017
Proof (2)
SXX(f) = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|².

There could be frequencies where SXX(f) is zero but ĝ(f) is not,
i.e., the zeros of

σ(f) ≜ (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts}      (set m = ℓ − ℓ′)

     = Σ_{m=−N+1}^{N−1} γm e^{i2πfmTs}

     = Σ_{m=−N+1}^{N−1} γm z^m |_{z=e^{i2πfTs}},

γm = (A²/(NTs)) Σ_{ℓ=max{1,m+1}}^{min{N,N+m}} E[Xℓ Xℓ−m],  m ∈ {−N+1, …, N−1}.
c
Lecture 7, Amos Lapidoth 2017
Proof (3)
SXX(f) is zero while ĝ(f) is not only if e^{i2πfTs} is a root of

z ↦ Σ_{m=−N+1}^{N−1} γm z^m.

Since e^{i2πfTs} is nonzero, we can multiply by z^{N−1}. Thus, σ(f) is
zero iff e^{i2πfTs} is a root of

z ↦ Σ_{ν=0}^{2N−2} γ_{ν−N+1} z^ν.

Our assumptions guarantee that γ0 > 0, so the polynomial is
nonzero. Hence, it has at most 2N − 2 distinct roots. Denote
those of unit magnitude

e^{iθ1}, …, e^{iθd},  d ≤ 2N − 2 and θ1, …, θd ∈ [−π, π).


c
Lecture 7, Amos Lapidoth 2017
Proof (4)

SXX(f) is zero while ĝ(f) is not only if e^{i2πfTs} ∈ {e^{iθ1}, …, e^{iθd}}.

( e^{i2πfTs} = e^{iθ} )  ⇐⇒  ( f = θ/(2πTs) + η/Ts, η ∈ Z ).

Thus, SXX(f) is zero while ĝ(f) is not only if f is in the set

{ θ1/(2πTs) + η/Ts : η ∈ Z } ∪ ··· ∪ { θd/(2πTs) + η/Ts : η ∈ Z }.

This set is countable, so the bandwidth of X cannot be smaller
than the bandwidth of g.

(If g is bandlimited, then there can be at most a finite number of
frequencies at which SXX is zero and ĝ is not.)

c
Lecture 7, Amos Lapidoth 2017
Recap (1)


• (X(t)) is of operational PSD SXX if SXX : R → R is
symmetric and for every stable real filter

Power in X ⋆ h = ∫_{−∞}^{∞} SXX(f) |ĥ(f)|² df.

• Two such functions must be equal (outside a null set).
• Any such function must be nonnegative (outside a null set).
• Passing a PAM signal of pulse shape g through a stable filter
of impulse response h is tantamount to changing its pulse
shape from g to g ⋆ h.

c
Lecture 7, Amos Lapidoth 2017
Recap (2)


• If (Xℓ) is WSS and centered:

SXX(f) = (A²/Ts) Σ_{m=−∞}^{∞} KXX(m) e^{i2πfmTs} |ĝ(f)|².

• In bi-infinite block-mode with enc(·):

SXX(f) = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Xℓ Xℓ′] e^{i2πf(ℓ−ℓ′)Ts} |ĝ(f)|².

• No analogous result if we only assume shift-orthonormality.

c
Lecture 7, Amos Lapidoth 2017
Recap (3)


• A SP (X(t)) of operational PSD SXX is bandlimited to W Hz
if, except on a set of frequencies of Lebesgue measure zero,
SXX(f) is zero whenever |f| > W.
• The smallest W to which (X(t)) is bandlimited is called the
bandwidth of (X(t)).
• The bandwidth of a nonzero PAM signal equals the
bandwidth of its pulse shape.

c
Lecture 7, Amos Lapidoth 2017
Next Week

Quadrature Amplitude Modulation (Chapter 16).

Thank you!

c
Lecture 7, Amos Lapidoth 2017
Communication and Detection Theory: Lecture 8

Amos Lapidoth
ETH Zurich

April 11, 2017

Quadrature Amplitude Modulation (QAM)

c
Lecture 8, Amos Lapidoth 2017
Today

• Linear passband communication.


• Quadrature Amplitude Modulation (QAM).
• Bandwidth around fc .
• Orthogonality.
• Spectral efficiency.
• Constellations.
• Symbol recovery in the absence of noise.
• A glimpse at complex random variables.

c
Lecture 8, Amos Lapidoth 2017
Passband Communication

The transmitted signals must be bandlimited to W Hz around the


carrier frequency fc .

We assume throughout

fc > W/2.

c
Lecture 8, Amos Lapidoth 2017
The Good-Old Baseband
The pulse shape

t ↦ √(2W) sinc(2Wt)

is of bandwidth W, and its time shifts by integer multiples of
1/(2W) are orthonormal. By using it with PAM we can send
symbols arriving at rate

Rs  [real symbol/second]

as the coefficients in a linear combination of orthonormal signals
whose bandwidth does not exceed

Rs/2  [Hz].

For each 1 Hz at baseband we obtain 2 real dimensions per second:
our spectral efficiency is

2  [real dimension/sec] / [baseband Hz].
c
Lecture 8, Amos Lapidoth 2017
Objective
Transmit real symbols arriving at rate Rs [real symbol/second] as
the coefficients in a linear combination of orthonormal passband
signals occupying a bandwidth of W Hz around the carrier
frequency fc, where W equals Rs/2:

2  [real dimension/sec] / [passband Hz].

Since real symbols at rate Rs [real symbol/second] can be viewed
as complex symbols at rate Rs/2 [complex symbol/second],

1  [complex dimension/sec] / [passband Hz].

And don’t make things too carrier dependent.

c
Lecture 8, Amos Lapidoth 2017
The PAM Solution—Not Great

Find a pulse shape φ that is bandlimited to W Hz around fc and
that satisfies the Nyquist criterion

Σ_{j=−∞}^{∞} |φ̂(f + j/Ts)|² ≡ Ts.

Why is this not so great?

• Can achieve the spectral efficiency only if

4 fc Ts is an odd integer.

• The choice of the pulse shape depends on fc.

c
Lecture 8, Amos Lapidoth 2017
QAM in a Nutshell

The baseband representation of the transmitted signal is PAM with


complex symbols and (possibly) complex pulse shapes.

c
Lecture 8, Amos Lapidoth 2017
The QAM Signal
• Map the bits to complex symbols

ϕ : {0, 1}^k → C^n.

• The rate is

k/n  [bit / complex symbol].

• The baseband representation of the transmitted signal is

XBB(t) = A Σ_{ℓ=1}^{n} Cℓ g(t − ℓTs),  t ∈ R.

• The transmitted signal is

XPB(t) = 2 Re( A Σ_{ℓ=1}^{n} Cℓ g(t − ℓTs) e^{i2πfc t} ),  t ∈ R.

c
Lecture 8, Amos Lapidoth 2017
Alternative Representation
Using

Re(wz) = Re(w) Re(z) − Im(w) Im(z),  Im(z) = −Re(iz),

with w = Cℓ:

XPB(t) = √2 A Σ_{ℓ=1}^{n} Re(Cℓ) · 2 Re( (1/√2) g(t − ℓTs) e^{i2πfc t} )
       + √2 A Σ_{ℓ=1}^{n} Im(Cℓ) · 2 Re( i (1/√2) g(t − ℓTs) e^{i2πfc t} ),  t ∈ R,

where the first passband factor is gI,ℓ(t), of baseband representation
gI,ℓ,BB(t) = (1/√2) g(t − ℓTs), and the second is gQ,ℓ(t), of baseband
representation gQ,ℓ,BB(t) = i (1/√2) g(t − ℓTs).

c
Lecture 8, Amos Lapidoth 2017
If the Pulse Shape is Real:

XPB(t) = 2A Σ_{ℓ=1}^{n} Re(Cℓ) g(t − ℓTs) cos(2πfc t)
       − 2A Σ_{ℓ=1}^{n} Im(Cℓ) g(t − ℓTs) sin(2πfc t),  g real.

When g is real, the QAM signal is the sum of:

• the result of feeding {Re(Cℓ)} to a PAM modulator of pulse
shape g and multiplying the result by cos(2πfc t), and
• the result of feeding {Im(Cℓ)} to a PAM modulator of pulse
shape g and multiplying the result by −sin(2πfc t)
(see the sketch below).
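A runnable sketch of this decomposition (A, Ts, fc, the rectangular pulse, and the 4-QAM symbols are all illustrative assumptions, not from the slides):

```python
import numpy as np

A, Ts, fc = 1.0, 1.0, 10.0        # fc well above the bandwidth of g
dt = 1e-3
t = np.arange(0, 12, dt)
g = lambda t: np.where((t >= 0) & (t < Ts), 1.0, 0.0)   # a simple real pulse

C = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])        # 4-QAM symbols

pam_I = A * sum(C.real[l] * g(t - (l + 1) * Ts) for l in range(len(C)))
pam_Q = A * sum(C.imag[l] * g(t - (l + 1) * Ts) for l in range(len(C)))
x_pb = 2 * pam_I * np.cos(2 * np.pi * fc * t) - 2 * pam_Q * np.sin(2 * np.pi * fc * t)

# Equivalently, 2 Re{ A sum_l C_l g(t - l Ts) e^{i 2 pi fc t} }:
x_alt = 2 * np.real(A * sum(C[l] * g(t - (l + 1) * Ts) for l in range(len(C)))
                    * np.exp(2j * np.pi * fc * t))
print(np.allclose(x_pb, x_alt))   # True
```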

c
Lecture 8, Amos Lapidoth 2017
QAM Modulator with a Real Pulse Shape
(Block diagram: the symbol stream {Cℓ} is split into Re(·) and Im(·); each branch feeds a PAM modulator with pulse shape g; the I branch is multiplied by cos(2πfc t) and the Q branch by −sin(2πfc t) (obtained via a 90° phase shift), and the two products are summed to give xPB(t)/2.)

c
Lecture 8, Amos Lapidoth 2017
Bandwidth Considerations

• The bandwidth of xPB around fc is twice the bandwidth
of xBB.
• The bandwidth of xBB (if nonzero) is the bandwidth of g.

The bandwidth of a QAM signal around the carrier frequency is
twice the bandwidth of its pulse shape.

We multiplied the PAM signal by a carrier.

c
Lecture 8, Amos Lapidoth 2017
Orthogonality Considerations (1)
If the pulse shape φ satisfies

∫_{−∞}^{∞} φ(t − ℓTs) φ*(t − ℓ′Ts) dt = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ Z,

then the QAM signal XPB(·) can be expressed as

XPB = √2 A Σ_{ℓ=1}^{n} Re(Cℓ) ψI,ℓ + √2 A Σ_{ℓ=1}^{n} Im(Cℓ) ψQ,ℓ,

where

…, ψI,−1, ψQ,−1, ψI,0, ψQ,0, ψI,1, ψQ,1, …

are orthonormal functions:

ψI,ℓ : t ↦ 2 Re( (1/√2) φ(t − ℓTs) e^{i2πfc t} ),  ℓ ∈ Z,

ψQ,ℓ : t ↦ 2 Re( i (1/√2) φ(t − ℓTs) e^{i2πfc t} ),  ℓ ∈ Z.
c
Lecture 8, Amos Lapidoth 2017
Orthogonality Considerations (2)
XPB(t) = √2 A Σ_{ℓ=1}^{n} Re(Cℓ) ψI,ℓ(t) + √2 A Σ_{ℓ=1}^{n} Im(Cℓ) ψQ,ℓ(t),  t ∈ R,

with ψI,ℓ,BB(t) = (1/√2) φ(t − ℓTs) and ψQ,ℓ,BB(t) = i (1/√2) φ(t − ℓTs).

Recall (Theorem 7.6.10)

⟨xPB, yPB⟩ = 2 Re( ⟨xBB, yBB⟩ ),

so xPB and yPB are orthogonal iff ⟨xBB, yBB⟩ is purely imaginary.
c
Lecture 8, Amos Lapidoth 2017
Orthogonality Considerations (3)



⟨ψI,ℓ, ψI,ℓ′⟩ = 2 Re( ⟨ψI,ℓ,BB, ψI,ℓ′,BB⟩ )
            = 2 Re( ⟨t ↦ (1/√2) φ(t − ℓTs), t ↦ (1/√2) φ(t − ℓ′Ts)⟩ )
            = Re( I{ℓ = ℓ′} )
            = I{ℓ = ℓ′},

⟨ψQ,ℓ, ψQ,ℓ′⟩ = 2 Re( ⟨ψQ,ℓ,BB, ψQ,ℓ′,BB⟩ )
            = 2 Re( ⟨t ↦ i(1/√2) φ(t − ℓTs), t ↦ i(1/√2) φ(t − ℓ′Ts)⟩ )
            = Re( i i* I{ℓ = ℓ′} )
            = I{ℓ = ℓ′},

⟨ψI,ℓ, ψQ,ℓ′⟩ = 2 Re( ⟨t ↦ (1/√2) φ(t − ℓTs), t ↦ i(1/√2) φ(t − ℓ′Ts)⟩ )
            = Re( i* I{ℓ = ℓ′} ) = 0.
c
Lecture 8, Amos Lapidoth 2017
Spectral Efficiency

• Choose φ of bandwidth W/2, e.g., φ : t ↦ √W sinc(Wt).
• The QAM signal is then of bandwidth W around fc.
• To satisfy Nyquist,

Ts ≥ 1/W.

• We are then sending complex symbols at rate 1/Ts, i.e., W
complex symbols per second.
• This corresponds to 2W real symbols per second.
• By orthogonality, we achieve

2  [real dimension/sec] / [passband Hz].

c
Lecture 8, Amos Lapidoth 2017
Mission Accomplished

QAM with the bandwidth-W/2, unit-energy pulse
shape t ↦ √W sinc(Wt) transmits a sequence of real
symbols arriving at a rate of 2W real symbols per sec-
ond as the coefficients in a linear combination of or-
thogonal signals, with the resulting waveform being
bandlimited to W Hz around the carrier frequency fc.
It thus achieves a spectral efficiency of

2 [real dimension/sec] / [passband Hz]  =  1 [complex dimension/sec] / [passband Hz].

c
Lecture 8, Amos Lapidoth 2017
QAM Constellations

The constellation C is the smallest subset of C s.t.

Ci ∈ C, i = 1, . . . , n.

c
Lecture 8, Amos Lapidoth 2017
(Figure: four example constellations: 4-QAM, 16-QAM, 8-PSK, and 32-QAM.)

c
Lecture 8, Amos Lapidoth 2017
The Constellation’s Parameters

The minimum distance of C is

δ ≜ min_{c,c′∈C, c≠c′} |c − c′|.

The second moment of C is

(1/#C) Σ_{c∈C} |c|².
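For instance (a sketch; the 16-QAM grid {a + ib : a, b ∈ {−3, −1, 1, 3}} is the usual example, assumed here), both parameters can be computed directly:

```python
import numpy as np
from itertools import product

C = np.array([a + 1j * b for a, b in product([-3, -1, 1, 3], repeat=2)])  # 16-QAM

delta = min(abs(c - cp) for c in C for cp in C if c != cp)   # minimum distance
second_moment = np.mean(np.abs(C) ** 2)                      # (1/#C) sum |c|^2
print(delta, second_moment)                                  # 2.0 and 10.0
```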

c
Lecture 8, Amos Lapidoth 2017
Recovering the Symbols

If the time shifts of φ by integer multiples of Ts are orthonormal,

XPB = √2 A Σ_{ℓ=1}^{n} Re(Cℓ) ψI,ℓ + √2 A Σ_{ℓ=1}^{n} Im(Cℓ) ψQ,ℓ,

where …, ψI,−1, ψQ,−1, ψI,0, ψQ,0, ψI,1, ψQ,1, … are orthonormal.

Hence

Re(Cℓ) = (1/(√2 A)) ⟨XPB, ψI,ℓ⟩,  ℓ ∈ {1, …, n},

Im(Cℓ) = (1/(√2 A)) ⟨XPB, ψQ,ℓ⟩,  ℓ ∈ {1, …, n}.

c
Lecture 8, Amos Lapidoth 2017
Computing hr, ψI,` i and hr, ψQ,` i (1)
More generally, we’ll compute

⟨r, gI,ℓ⟩,  ⟨r, gQ,ℓ⟩,

where, as before,

XPB(t) = √2 A Σ_{ℓ=1}^{n} Re(Cℓ) gI,ℓ(t) + √2 A Σ_{ℓ=1}^{n} Im(Cℓ) gQ,ℓ(t),  t ∈ R,

with gI,ℓ,BB(t) = (1/√2) g(t − ℓTs) and gQ,ℓ,BB(t) = i (1/√2) g(t − ℓTs).

Both gI,ℓ and gQ,ℓ are bandlimited to W Hz around fc.


c
Lecture 8, Amos Lapidoth 2017
Computing hr, ψI,` i and hr, ψQ,` i (2)
With XPB expanded as on the previous slide: since gI,ℓ and gQ,ℓ are
bandlimited to W Hz around fc,

⟨r, gI,ℓ⟩ = ⟨s, gI,ℓ⟩,
⟨r, gQ,ℓ⟩ = ⟨s, gQ,ℓ⟩,

where

s = r ⋆ BPF_{W,fc}.
c
Lecture 8, Amos Lapidoth 2017
Computing hr, ψI,` i and hr, ψQ,` i (3)

⟨r, gI,ℓ⟩ = ⟨s, gI,ℓ⟩,
⟨r, gQ,ℓ⟩ = ⟨s, gQ,ℓ⟩,

where

s = r ⋆ BPF_{W,fc}.

Denoting the baseband representation of s by sBB,

⟨r, gI,ℓ⟩ = ⟨s, gI,ℓ⟩
         = 2 Re( ⟨sBB, gI,ℓ,BB⟩ )
         = √2 Re( ⟨sBB, t ↦ g(t − ℓTs)⟩ ).

⟨r, gQ,ℓ⟩ = ⟨s, gQ,ℓ⟩
         = 2 Re( ⟨sBB, gQ,ℓ,BB⟩ )
         = √2 Re( ⟨sBB, t ↦ i g(t − ℓTs)⟩ )
         = √2 Im( ⟨sBB, t ↦ g(t − ℓTs)⟩ ).
c
Lecture 8, Amos Lapidoth 2017
Bandpass Filtering and Baseband Conversion

(Block diagram: r(t) is passed through BPF_{W,fc} to give s(t); s(t) is multiplied by cos(2πfc t) and, via a 90° phase shift, by −sin(2πfc t); each product is lowpass-filtered with cutoff Wc, where W/2 ≤ Wc ≤ 2fc − W/2, yielding Re(sBB) and Im(sBB).)

c
Lecture 8, Amos Lapidoth 2017
⟨r, gI,ℓ⟩ = √2 Re( ⟨sBB, t ↦ g(t − ℓTs)⟩ ),
⟨r, gQ,ℓ⟩ = √2 Im( ⟨sBB, t ↦ g(t − ℓTs)⟩ )

can be computed with real operations:

⟨r, gI,ℓ⟩ = √2 Re( ∫_{−∞}^{∞} sBB(t) g*(t − ℓTs) dt )
         = √2 Re( ∫_{−∞}^{∞} sBB(t) Re( g(t − ℓTs) ) dt )
         + √2 Im( ∫_{−∞}^{∞} sBB(t) Im( g(t − ℓTs) ) dt ),

and

⟨r, gQ,ℓ⟩ = √2 Im( ∫_{−∞}^{∞} sBB(t) g*(t − ℓTs) dt )
         = √2 Im( ∫_{−∞}^{∞} sBB(t) Re( g(t − ℓTs) ) dt )
         − √2 Re( ∫_{−∞}^{∞} sBB(t) Im( g(t − ℓTs) ) dt ).
c
Lecture 8, Amos Lapidoth 2017
Computing ⟨r, ψI,ℓ⟩ and ⟨r, ψQ,ℓ⟩: Real Pulse Shape

⟨r, gI,ℓ⟩ = √2 Re( ∫_{−∞}^{∞} sBB(t) g*(t − ℓTs) dt )
         = √2 Re( ∫_{−∞}^{∞} sBB(t) Re( g(t − ℓTs) ) dt )
         + √2 Im( ∫_{−∞}^{∞} sBB(t) Im( g(t − ℓTs) ) dt )
         = √2 Re( ∫_{−∞}^{∞} sBB(t) g(t − ℓTs) dt ),

⟨r, gQ,ℓ⟩ = √2 Im( ∫_{−∞}^{∞} sBB(t) g*(t − ℓTs) dt )
         = √2 Im( ∫_{−∞}^{∞} sBB(t) Re( g(t − ℓTs) ) dt )
         − √2 Re( ∫_{−∞}^{∞} sBB(t) Im( g(t − ℓTs) ) dt )
         = √2 Im( ∫_{−∞}^{∞} sBB(t) g(t − ℓTs) dt ).

c
Lecture 8, Amos Lapidoth 2017
Bandpass Filtering and Baseband Conversion

(Block diagram: r(t) is passed through BPF_{W,fc} to give s(t); s(t) is multiplied by cos(2πfc t) and, via a 90° phase shift, by −sin(2πfc t); each product is lowpass-filtered with cutoff Wc, where W/2 ≤ Wc ≤ 2fc − W/2, yielding Re(sBB) and Im(sBB).)

c
Lecture 8, Amos Lapidoth 2017
Matched Filtering in Baseband (g Real)

(Diagram: Re(sBB) and Im(sBB) are each passed through a filter matched to g and sampled at the times ℓTs, producing (1/√2)⟨r, gI,ℓ⟩ and (1/√2)⟨r, gQ,ℓ⟩.)

This circuit does not depend on fc (see the sketch below).
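A noiseless baseband sketch of the recovery (the unit-energy sinc pulse and the 4-QAM symbols are illustrative assumptions): correlating sBB with the shifted pulse returns A·Cℓ, up to truncation error.

```python
import numpy as np

rng = np.random.default_rng(4)
A, Ts, dt = 1.0, 1.0, 1e-2
t = np.arange(-50, 60, dt)
phi = lambda t: np.sinc(t / Ts) / np.sqrt(Ts)   # unit-energy, shift-orthonormal

C = rng.choice([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j], size=5)
s_bb = A * sum(C[l] * phi(t - (l + 1) * Ts) for l in range(len(C)))

for l in range(len(C)):
    inner = np.trapz(s_bb * phi(t - (l + 1) * Ts), t)   # <s_BB, phi(.-lTs)>, phi real
    print(C[l], np.round(inner / A, 2))                 # approximately C_l
```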

c
Lecture 8, Amos Lapidoth 2017
Filtering QAM Signals

Let us recall our discussion (Section 7.6.7) of

(xPB ⋆ h)_BB,
when
• xPB is an integrable signal that is bandlimited to W Hz
around fc , and
• h ∈ L1 is a real stable filter.

c
Lecture 8, Amos Lapidoth 2017
(Figure: x̂PB(f), occupying bandwidth W around ±fc; the filter response ĥ(f); and the product x̂PB(f) ĥ(f).)

c
Lecture 8, Amos Lapidoth 2017
(Figure: ĥ(f) near fc over a bandwidth W, and the corresponding baseband mapping, supported on [−W/2, W/2].)

c
Lecture 8, Amos Lapidoth 2017
The frequency response of the real impulse response h ∈ L1
with respect to the bandwidth W around the carrier
frequency fc is the mapping

f ↦ ĥ(f + fc) I{|f| ≤ W/2}.

The FT of

(xPB ⋆ h)_BB

is the product of x̂BB by the filter’s frequency response with
respect to the bandwidth W around the carrier frequency fc:

f ↦ x̂BB(f) ĥ(f + fc) I{|f| ≤ W/2}.

c
Lecture 8, Amos Lapidoth 2017
(Figure: x̂PB(f), occupying bandwidth W around ±fc; the filter response ĥ(f); and the product x̂PB(f) ĥ(f).)

c
Lecture 8, Amos Lapidoth 2017
(Figure: x̂BB(f), supported on [−W/2, W/2].)

c
Lecture 8, Amos Lapidoth 2017
Returning to Filtered QAM
The baseband representation of QAM is a complex PAM, so

X̂BB(f) = A Σ_{ℓ=1}^{n} Cℓ e^{−i2πfℓTs} ĝ(f),  f ∈ R.

The baseband representation of XPB ⋆ h is hence of FT

f ↦ A Σ_{ℓ=1}^{n} Cℓ e^{−i2πfℓTs} ĝ(f) ĥ(f + fc),  f ∈ R.

In the time domain,

(XPB ⋆ h)_BB(t) = A Σ_{ℓ=1}^{n} Cℓ p(t − ℓTs),

where

p(t) = ∫_{−∞}^{∞} ĝ(f) ĥ(f + fc) e^{i2πft} df,  t ∈ R.

(XPB ⋆ h)_BB is a complex PAM with g replaced by p.
c
Lecture 8, Amos Lapidoth 2017
Filtering a QAM Signal

Filtering a QAM signal xPB through h ∈ L1 is tantamount to
replacing its pulse shape g by the pulse shape p, where

p(t) = ∫_{−∞}^{∞} ĝ(f) ĥ(f + fc) e^{i2πft} df.

Note that p may be complex even if g is real.

c
Lecture 8, Amos Lapidoth 2017
Complex Random Variables

• A CRV maps experiment outcomes ω ∈ Ω to C.


• Its real and imaginary parts are (real) RVs.
• Any two real RVs X and Y can be used to construct the CRV

Z = X + iY.

• The distribution of a CRV is determined by the joint law of its


real and imaginary parts.
So far nothing more than a trivial data structure for pairs of real
random variables. . .

c
Lecture 8, Amos Lapidoth 2017
The Density of a CRV

The PDF fZ(·) of Z at z ∈ C is the joint PDF of the real pair
(Re(Z), Im(Z)) at (Re(z), Im(z)):

fZ(z) ≜ f_{Re(Z),Im(Z)}( Re(z), Im(z) ),  z ∈ C.

Thus,

fZ(z) = ∂²/∂x∂y Pr[ Re(Z) ≤ x, Im(Z) ≤ y ] |_{x=Re(z), y=Im(z)},  z ∈ C.

c
Lecture 8, Amos Lapidoth 2017
The Expectation of a CRV

E[Z] = E[Re(Z)] + i E[Im(Z)].

Thus,

Re(E[Z]) = E[Re(Z)],
Im(E[Z]) = E[Im(Z)].

Consequently, conjugation and expectation commute:

E[Z*] = (E[Z])*.

If g : C → C, then

E[g(Z)] = ∫_{z∈C} fZ(z) g(z) dz
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fZ(x + iy) Re( g(x + iy) ) dx dy
        + i ∫_{−∞}^{∞} ∫_{−∞}^{∞} fZ(x + iy) Im( g(x + iy) ) dx dy.
c
Lecture 8, Amos Lapidoth 2017
The Variance

Here we do not treat Z as a pair!

Var[Z] ≜ E[ |Z − E[Z]|² ]
       = E[ |Z|² ] − |E[Z]|²
       = Var[Re(Z)] + Var[Im(Z)].

Contrast with the covariance matrix of the pair (Re(Z), Im(Z)):

⎛ Var[Re(Z)]          Cov[Re(Z), Im(Z)] ⎞
⎝ Cov[Re(Z), Im(Z)]   Var[Im(Z)]        ⎠.

Var[Z] is the trace of the covariance matrix of (Re(Z), Im(Z)).

c
Lecture 8, Amos Lapidoth 2017
Proper CRV
A CRV Z is proper if it is zero-mean; of finite variance; and

E[Z²] = 0.

Since

E[Z²] = E[ Re(Z)² − Im(Z)² ] + i 2 E[ Re(Z) Im(Z) ],

the condition E[Z²] = 0 is equivalent to

E[Re(Z)²] = E[Im(Z)²]

and

E[Re(Z) Im(Z)] = 0.

Z is proper iff: Z is of zero mean; Re(Z) & Im(Z) have the same
finite variance; and Re(Z) & Im(Z) are uncorrelated.
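A one-line Monte-Carlo sketch (assumed example: X, Y IID standard Gaussians) of properness: E[Z] ≈ 0 and E[Z²] ≈ 0, while E[|Z|²] ≈ 2.

```python
import numpy as np

rng = np.random.default_rng(5)
Z = rng.standard_normal(10**6) + 1j * rng.standard_normal(10**6)
print(np.round(Z.mean(), 3),            # approx. 0   (zero mean)
      np.round((Z**2).mean(), 3),       # approx. 0   (E[Z^2] = 0: proper)
      np.round((np.abs(Z)**2).mean(), 3))  # approx. 2 (Var[Z])
```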
c
Lecture 8, Amos Lapidoth 2017

The Covariance Matrix of Re(Z), Im(Z)


In general, the covariance matrix of (Re(Z), Im(Z)) is

⎛ Var[Re(Z)]          Cov[Re(Z), Im(Z)] ⎞
⎝ Cov[Re(Z), Im(Z)]   Var[Im(Z)]        ⎠.

But if Z is proper,

⎛ (1/2) Var[Z]    0             ⎞
⎝ 0               (1/2) Var[Z]  ⎠.

c
Lecture 8, Amos Lapidoth 2017
The Covariance

Cov[Z, W] ≜ E[ (Z − E[Z]) (W − E[W])* ].

This is not a matrix!

c
Lecture 8, Amos Lapidoth 2017
Properties of the Covariance (1)
1. Conjugate Symmetry:

Cov[Z, W] = (Cov[W, Z])*.

2. Sesquilinearity:

Cov[αZ, W] = α Cov[Z, W],
Cov[Z1 + Z2, W] = Cov[Z1, W] + Cov[Z2, W],
Cov[Z, βW] = β* Cov[Z, W],
Cov[Z, W1 + W2] = Cov[Z, W1] + Cov[Z, W2],

and, more generally,

Cov[ Σ_{j=1}^{n} αj Zj, Σ_{j′=1}^{n′} βj′ Wj′ ] = Σ_{j=1}^{n} Σ_{j′=1}^{n′} αj βj′* Cov[Zj, Wj′].

c
Lecture 8, Amos Lapidoth 2017
Properties of the Covariance (2)

3. Relation with Variance:

Var[Z] = Cov[Z, Z].

4. Variance of Linear Functionals:

Var[ Σ_{j=1}^{n} αj Zj ] = Σ_{j=1}^{n} Σ_{j′=1}^{n} αj αj′* Cov[Zj, Zj′].

c
Lecture 8, Amos Lapidoth 2017
WSS Discrete-Time Complex Stochastic Processes


A discrete-time CSP (Zν) is wide-sense stationary if:

1. For every ν ∈ Z the CRV Zν is of finite variance.
2. The mean of Zν does not depend on ν.
3. E[Zν Zν′*] depends on ν and ν′ only via ν − ν′:

E[Zν Zν′*] = E[Zη+ν Zη+ν′*],  ν, ν′, η ∈ Z.

Note: we do not require that E[Zν′ Zν] (unconjugated) be
computable from ν − ν′; it may or may not be.

c
Lecture 8, Amos Lapidoth 2017
Autocovariance Function

KZZ(η) ≜ Cov[Zν+η, Zν]
       = E[ (Zν+η − E[Z1]) (Zν − E[Z1])* ],  η ∈ Z.

Key properties:

• KZZ is conjugate symmetric:

KZZ(−η) = (KZZ(η))*,  η ∈ Z.

• KZZ is a positive-definite function:

Σ_{ν=1}^{n} Σ_{ν′=1}^{n} αν αν′* KZZ(ν − ν′) ≥ 0,  α1, …, αn ∈ C.

c
Lecture 8, Amos Lapidoth 2017
The PSD of a Complex Discrete-Time SP


(Zν) is of power spectral density SZZ if

KZZ(η) = ∫_{−1/2}^{1/2} SZZ(θ) e^{i2πηθ} dθ,  η ∈ Z.

Note that SZZ need not be symmetric!

SZZ must be nonnegative outside a null set. By altering it on that
set, we can always assume that the PSD, if it exists, is nonnegative.

c
Lecture 8, Amos Lapidoth 2017
The PSD when the Autocovariance Function is Summable
If the autocovariance function KZZ is absolutely summable, i.e.,

Σ_{η=−∞}^{∞} |KZZ(η)| < ∞,

then the function

S(θ) = Σ_{η=−∞}^{∞} KZZ(η) e^{−i2πηθ},  θ ∈ [−1/2, 1/2]

is continuous, nonnegative, and

∫_{−1/2}^{1/2} S(θ) e^{i2πηθ} dθ = KZZ(η),  η ∈ Z.

c
Lecture 8, Amos Lapidoth 2017
The Intuition
The complex exponentials are orthonormal:

∫_{−1/2}^{1/2} e^{i2π(η−η′)θ} dθ = I{η = η′},  η, η′ ∈ Z.

Hence,

∫_{−1/2}^{1/2} S(θ) e^{i2πηθ} dθ = ∫_{−1/2}^{1/2} ( Σ_{η′=−∞}^{∞} KZZ(η′) e^{−i2πη′θ} ) e^{i2πηθ} dθ
  = Σ_{η′=−∞}^{∞} KZZ(η′) ∫_{−1/2}^{1/2} e^{−i2πη′θ} e^{i2πηθ} dθ
  = Σ_{η′=−∞}^{∞} KZZ(η′) ∫_{−1/2}^{1/2} e^{i2π(η−η′)θ} dθ
  = Σ_{η′=−∞}^{∞} KZZ(η′) I{η = η′}
  = KZZ(η),  η ∈ Z.
c
Lecture 8, Amos Lapidoth 2017
Next Week

Energy, Power, and Operational PSD of QAM (Chapter 18).

Please read Chapter 19 through Section 19.7.

Thank you!

c
Lecture 8, Amos Lapidoth 2017
Communication and Detection Theory: Lecture 9

Amos Lapidoth
ETH Zurich

April 25, 2017

Energy, Power, and Operational PSD of QAM

c
Lecture 9, Amos Lapidoth 2017
Today

• Energy in QAM.
• Power in QAM.
• Operational PSD of QAM.

c
Lecture 9, Amos Lapidoth 2017
Sending a Single Block
• K IID random bits D1, …, DK are transmitted.
• These bits are mapped by

enc : {0, 1}^K → C^N

to N complex symbols C1, …, CN.
• The transmitted signal is

X(t) = 2 Re( XBB(t) e^{i2πfc t} )
     = 2 Re( A Σ_{ℓ=1}^{N} Cℓ g(t − ℓTs) e^{i2πfc t} ),  t ∈ R,

where the baseband representation of the transmitted signal is

XBB(t) = A Σ_{ℓ=1}^{N} Cℓ g(t − ℓTs),  t ∈ R.

c
Lecture 9, Amos Lapidoth 2017
Assumptions

• D1 , . . . , DK are IID random bits.


• g is bandlimited to W/2 Hz.
• fc > W/2.

c
Lecture 9, Amos Lapidoth 2017
The Energy in a Single Block

We seek

E ≜ E[ ∫_{−∞}^{∞} X²(t) dt ].

Since XBB(·) is bandlimited to W/2 Hz, and since fc > W/2,

E = 2 E[ ∫_{−∞}^{∞} |XBB(t)|² dt ].

Calculate as in PAM, but with complex symbols and pulse shape:

• Use |w|² = w w*, w ∈ C, and
• swap summations, integrations, and expectations.

c
Lecture 9, Amos Lapidoth 2017
The Energy in Baseband
E[ ∫_{−∞}^{∞} |XBB(t)|² dt ] = E[ ∫_{−∞}^{∞} | A Σ_{ℓ=1}^{N} Cℓ g(t − ℓTs) |² dt ]

  = E[ ∫_{−∞}^{∞} ( A Σ_{ℓ=1}^{N} Cℓ g(t − ℓTs) ) ( A Σ_{ℓ′=1}^{N} Cℓ′ g(t − ℓ′Ts) )* dt ]

  = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] ∫_{−∞}^{∞} g(t − ℓTs) g*(t − ℓ′Ts) dt

  = A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] Rgg( (ℓ′ − ℓ)Ts ),

where Rgg is the self-similarity function of the pulse shape g:

Rgg(τ) = ∫_{−∞}^{∞} g(t + τ) g*(t) dt,  τ ∈ R.
c
Lecture 9, Amos Lapidoth 2017
Simplifications
This simplifies if {Cℓ} are of zero mean and uncorrelated,

E[ ∫_{−∞}^{∞} |XBB(t)|² dt ] = A² ‖g‖₂² Σ_{ℓ=1}^{N} E[ |Cℓ|² ],

( E[Cℓ Cℓ′*] = E[|Cℓ|²] I{ℓ = ℓ′},  ℓ, ℓ′ ∈ {1, …, N} ),

or if the time shifts of the pulse shape by integer multiples of Ts
are orthonormal,

E[ ∫_{−∞}^{∞} |XBB(t)|² dt ] = A² Σ_{ℓ=1}^{N} E[ |Cℓ|² ],

( ∫_{−∞}^{∞} g(t − ℓTs) g*(t − ℓ′Ts) dt = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ {1, …, N} ).

c
Lecture 9, Amos Lapidoth 2017
The Energy in XPB
Rgg(τ) = ∫_{−∞}^{∞} |ĝ(f)|² e^{i2πfτ} df,  τ ∈ R,

so

E = 2A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] Rgg( (ℓ′ − ℓ)Ts )
  = 2A² ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] e^{i2πf(ℓ′−ℓ)Ts} |ĝ(f)|² df.

Only expectations of the form E[Cℓ Cℓ′*] show up; not E[Cℓ Cℓ′].

We define the energy per bit Eb,

Eb ≜ E/K,

and the energy per complex symbol Es,

Es ≜ E/N.
c
Lecture 9, Amos Lapidoth 2017
Relating Power in Passband to Power in Baseband
• The energy in a passband signal is twice the energy in its
baseband representation.
• But power is trickier:

(1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] ≠ 2 (1/(2T)) E[ ∫_{−T}^{T} |XBB(t)|² dt ],

because t ↦ X(t) I{|t| ≤ T} is not bandlimited around fc.

Fortunately, equality holds in the limit:

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = 2 lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} |XBB(t)|² dt ].

The power in QAM is twice the power in its baseband
representation.

c
Lecture 9, Amos Lapidoth 2017

C` Is Zero-Mean and WSS (1)


Assume that (Cℓ) is a zero-mean WSS discrete-time CSP of
autocovariance function KCC:

E[Cℓ] = 0, ℓ ∈ Z,

E[Cℓ+m Cℓ*] = KCC(m),  m, ℓ ∈ Z.

We calculate

E[ ∫_τ^{τ+Ts} |XBB(t)|² dt ]

and show that it does not depend on τ.

c
Lecture 9, Amos Lapidoth 2017
E[ ∫_τ^{τ+Ts} |XBB(t)|² dt ] = A² E[ ∫_τ^{τ+Ts} | Σ_{ℓ=−∞}^{∞} Cℓ g(t − ℓTs) |² dt ]

  = A² ∫_τ^{τ+Ts} Σ_ℓ Σ_{ℓ′} E[Cℓ Cℓ′*] g(t − ℓTs) g*(t − ℓ′Ts) dt

  = A² ∫_τ^{τ+Ts} Σ_m Σ_{ℓ′} E[Cℓ′+m Cℓ′*] g( t − (ℓ′ + m)Ts ) g*(t − ℓ′Ts) dt

  = A² ∫_τ^{τ+Ts} Σ_m KCC(m) Σ_{ℓ′} g( t − (ℓ′ + m)Ts ) g*(t − ℓ′Ts) dt

  = A² Σ_m KCC(m) Σ_{ℓ′} ∫_{τ−ℓ′Ts}^{τ+Ts−ℓ′Ts} g(t′ − mTs) g*(t′) dt′

  = A² Σ_m KCC(m) ∫_{−∞}^{∞} g*(t′) g(t′ − mTs) dt′

  = A² Σ_m KCC(m) Rgg*(mTs).

c
Lecture 9, Amos Lapidoth 2017
We lower-bound the energy of XBB(·) in the interval [−T, +T] by

⌊2T/Ts⌋ E[ ∫_τ^{τ+Ts} |XBB(t)|² dt ]

and upper-bound it by

⌈2T/Ts⌉ E[ ∫_τ^{τ+Ts} |XBB(t)|² dt ],

to obtain (Sandwich Theorem)

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} |XBB(t)|² dt ] = (1/Ts) E[ ∫_τ^{τ+Ts} |XBB(t)|² dt ]
  = (A²/Ts) Σ_{m=−∞}^{∞} KCC(m) Rgg*(mTs).

c
Lecture 9, Amos Lapidoth 2017
The Power in Passband

Since the power in passband is twice the power in baseband:

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (2A²/Ts) Σ_{m=−∞}^{∞} KCC(m) Rgg*(mTs),

and

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (2A²/Ts) ∫_{−∞}^{∞} Σ_{m=−∞}^{∞} KCC(m) e^{−i2πfmTs} |ĝ(f)|² df.

c
Lecture 9, Amos Lapidoth 2017
The Power in QAM in Bi-Infinite Block-Mode
If enc(·) produces zero-mean symbols from IID random bits:

PBB = (1/(NTs)) E[ ∫_{−∞}^{∞} | A Σ_{ℓ=1}^{N} Cℓ g(t − ℓTs) |² dt ]
    = (A²/(NTs)) ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] e^{i2πf(ℓ′−ℓ)Ts} |ĝ(f)|² df.

Consequently,

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = Es/Ts,

where Es = E/N, and

E = 2A² Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] Rgg( (ℓ′ − ℓ)Ts )
  = 2A² ∫_{−∞}^{∞} Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] e^{i2πf(ℓ′−ℓ)Ts} |ĝ(f)|² df.

c
Lecture 9, Amos Lapidoth 2017
Time Shifts of Pulse Shape Are Orthonormal
Suppose

X(t) = 2 Re( A Σ_{ℓ=−∞}^{∞} Cℓ φ(t − ℓTs) e^{i2πfc t} ),  t ∈ R,

where φ is bandlimited to W/2 Hz and satisfies

∫_{−∞}^{∞} φ(t − ℓTs) φ*(t − ℓ′Ts) dt = I{ℓ = ℓ′},  ℓ, ℓ′ ∈ Z,

and fc > W/2 > 0. Then

lim_{T→∞} (1/(2T)) E[ ∫_{−T}^{T} X²(t) dt ] = (2A²/Ts) lim_{L→∞} (1/(2L+1)) Σ_{ℓ=−L}^{L} E[ |Cℓ|² ],

whenever the limit on the RHS exists.


c
Lecture 9, Amos Lapidoth 2017
The Operational PSD of a Complex Stochastic Process

We say that a CSP (Z(t)) is of operational power spectral density
SZZ if, for every integrable complex-valued function h,

Power in Z ⋆ h = ∫_{−∞}^{∞} SZZ(f) |ĥ(f)|² df.

We dropped the symmetry requirement. Nevertheless:

The operational PSD of a CSP is unique in the sense that if a CSP
is of two different operational power spectral densities, then the
two must be indistinguishable.

c
Lecture 9, Amos Lapidoth 2017
The Operational PSD of QAM

If XBB is of operational PSD SBB(·), then the operational PSD of
the QAM signal is

SXX(f) = SBB( |f| − fc ),  f ∈ R.

For a formal proof, see Section 18.6; intuition follows.

c
Lecture 9, Amos Lapidoth 2017
XBB Is Bandlimited to W/2 Hz

We argue that, because g is bandlimited to W/2 Hz,

SBB(f) = 0,  |f| > W/2.

More precisely, we’ll assume that XBB is of operational PSD
SBB(·) and show that it is also of operational PSD

f ↦ SBB(f) I{|f| ≤ W/2}.

c
Lecture 9, Amos Lapidoth 2017
Power in XBB ⋆ h = Power in ( (t ↦ A Σ_{ℓ∈Z} Cℓ g(t − ℓTs)) ⋆ h )
  = Power in t ↦ A Σ_{ℓ∈Z} Cℓ (g ⋆ h)(t − ℓTs)
  = Power in t ↦ A Σ_{ℓ∈Z} Cℓ ( (g ⋆ LPF_{W/2}) ⋆ h )(t − ℓTs)
  = Power in t ↦ A Σ_{ℓ∈Z} Cℓ ( g ⋆ (h ⋆ LPF_{W/2}) )(t − ℓTs)
  = Power in ( (t ↦ A Σ_{ℓ∈Z} Cℓ g(t − ℓTs)) ⋆ (h ⋆ LPF_{W/2}) )
  = ∫_{−∞}^{∞} SBB(f) | ĥ(f) I{|f| ≤ W/2} |² df
  = ∫_{−∞}^{∞} ( SBB(f) I{|f| ≤ W/2} ) |ĥ(f)|² df.

c
Lecture 9, Amos Lapidoth 2017
The Baseband Representation of X ? h

Loosely speaking, if h : R → R is integrable, then the baseband
representation of X ⋆ h is XBB ⋆ h′BB, where h′BB : R → C is the
baseband representation of h ⋆ BPF_{W,fc}:

ĥ′BB(f) = ĥ(f + fc) I{|f| ≤ W/2},  f ∈ R.

f ↦ ĥ(f + fc) I{|f| ≤ W/2} is the frequency response of h w.r.t.
the bandwidth W around fc.

c
Lecture 9, Amos Lapidoth 2017
(Figure: x̂PB(f), occupying bandwidth W around ±fc; the filter response ĥ(f); and the product x̂PB(f) ĥ(f).)

c
Lecture 9, Amos Lapidoth 2017
(Figure: ĥ(f) near fc over a bandwidth W, and the corresponding baseband response ĥ′BB(f), supported on [−W/2, W/2].)

c
Lecture 9, Amos Lapidoth 2017
Power in X ⋆ h
  = 2 · Power in XBB ⋆ h′BB
  = 2 ∫_{−∞}^{∞} SBB(f) |ĥ′BB(f)|² df
  = 2 ∫_{−∞}^{∞} SBB(f) | ĥ(f + fc) I{|f| ≤ W/2} |² df
  = 2 ∫_{−∞}^{∞} SBB(f) |ĥ(f + fc)|² df
  = 2 ∫_{−∞}^{∞} SBB(f̃ − fc) |ĥ(f̃)|² df̃
  = ∫_{−∞}^{∞} SBB(f̃ − fc) |ĥ(f̃)|² df̃ + ∫_{−∞}^{∞} SBB(f̃ − fc) |ĥ(−f̃)|² df̃
  = ∫_{−∞}^{∞} SBB(f̃ − fc) |ĥ(f̃)|² df̃ + ∫_{−∞}^{∞} SBB(−f′ − fc) |ĥ(f′)|² df′
  = ∫_{−∞}^{∞} ( SBB(f − fc) + SBB(−f − fc) ) |ĥ(f)|² df
  = ∫_{−∞}^{∞} SBB( |f| − fc ) |ĥ(f)|² df.
c
Lecture 9, Amos Lapidoth 2017
Computing SBB (·)
• To compute SBB(·) we need the power in XBB ⋆ h.
• Also for complex PAM, feeding XBB to a filter of impulse
response h is tantamount to changing its pulse shape from g
to g ⋆ h:

(X ⋆ h)(t) = ( (σ ↦ A Σ_{ℓ=−∞}^{∞} Xℓ g(σ − ℓTs)) ⋆ h )(t)
           = A Σ_{ℓ=−∞}^{∞} Xℓ ∫_{−∞}^{∞} h(s) g(t − s − ℓTs) ds
           = A Σ_{ℓ=−∞}^{∞} Xℓ (g ⋆ h)(t − ℓTs),  t ∈ R.

• If you know how to compute the power in a complex PAM,
you also know how to compute it for a filtered complex PAM.
c
Lecture 9, Amos Lapidoth 2017

C` Zero-Mean WSS and Bounded
We compute the operational PSD of XBB by replacing g with
g ⋆ h:

Power in XBB ⋆ h = ∫_{−∞}^{∞} (A²/Ts) Σ_{m=−∞}^{∞} KCC(m) e^{−i2πfmTs} |ĝ(f)|² |ĥ(f)|² df.

The operational PSD of XBB is thus

SBB(f) = (A²/Ts) Σ_{m=−∞}^{∞} KCC(m) e^{−i2πfmTs} |ĝ(f)|²,  f ∈ R.

Consequently,

SXX(f) = (A²/Ts) Σ_{m=−∞}^{∞} KCC(m) e^{−i2π(|f|−fc)mTs} |ĝ( |f| − fc )|²,  f ∈ R.

c
Lecture 9, Amos Lapidoth 2017

(Cℓ) Zero-Mean, Variance-σC², and Uncorrelated

In this case

(A²/Ts) Σ_{m=−∞}^{∞} KCC(m) e^{−i2π(|f|−fc)mTs} |ĝ( |f| − fc )|²,  f ∈ R

simplifies to

SXX(f) = (A²/Ts) σC² |ĝ( |f| − fc )|²,  f ∈ R.

c
Lecture 9, Amos Lapidoth 2017
(Figure 18.1: the relationship between the Fourier Transform of the pulse shape ĝ(f), |ĝ(f)|², and |ĝ(|f| − fc)|².)

c
Lecture 9, Amos Lapidoth 2017
The Operational PSD of QAM in Bi-Infinite Block-Mode
To compute the operational PSD of XBB, replace g with g ⋆ h:

Power in XBB ⋆ h
  = ∫_{−∞}^{∞} (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] e^{i2πf(ℓ′−ℓ)Ts} |ĝ(f)|² |ĥ(f)|² df.

Hence,

SBB(f) = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] e^{i2πf(ℓ′−ℓ)Ts} |ĝ(f)|²,  f ∈ R.

Consequently,

SXX(f) = (A²/(NTs)) Σ_{ℓ=1}^{N} Σ_{ℓ′=1}^{N} E[Cℓ Cℓ′*] e^{i2π(|f|−fc)(ℓ′−ℓ)Ts} |ĝ( |f| − fc )|².

c
Lecture 9, Amos Lapidoth 2017
You have all read Chapter 19.
But let’s quickly review the Q-function.

c
Lecture 9, Amos Lapidoth 2017
Standard Gaussian
fW(w) = (1/√(2π)) e^{−w²/2},  w ∈ R.

It is of zero mean and unit variance.

(Figure: the standard Gaussian density fW(w).)

c
Lecture 9, Amos Lapidoth 2017
Gaussian Random Variables
• X is a centered Gaussian if

X = aW

for some deterministic a ∈ R and for some standard


Gaussian W .
• X is Gaussian if
X = aW + b
for some deterministic a, b ∈ R and for some standard
Gaussian W .
• Note that a may be zero, in which case X is deterministic.
• There is only one mean-µ, variance-σ² Gaussian distribution,
N(µ, σ²).

c
Lecture 9, Amos Lapidoth 2017
Standardizing a Gaussian

• Any affine transformation of a Gaussian is Gaussian.


• There is only one zero-mean unit-variance Gaussian, the
Standard Gaussian.

• If X ∼ N µ, σ 2 with σ 2 > 0, then

X −µ
∼ N (0, 1)
σ
and is thus a standard Gaussian.

c
Lecture 9, Amos Lapidoth 2017
The Q-Function
The Q-function maps every α ∈ R to the probability that a
standard Gaussian exceeds it:

Q(α) ≜ (1/√(2π)) ∫_α^{∞} e^{−ξ²/2} dξ,  α ∈ R.

(Figure: Q(α) as the tail area of the standard Gaussian density to the right of α.)
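In code, Q is conveniently expressed through the complementary error function, via the standard identity Q(α) = ½ erfc(α/√2) — a minimal sketch:

```python
import math

def Q(alpha: float) -> float:
    """Tail probability of a standard Gaussian: Q(alpha) = 0.5 * erfc(alpha / sqrt(2))."""
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))

print(Q(0.0), Q(1.0))   # 0.5 and about 0.1587
```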
c
Lecture 9, Amos Lapidoth 2017
The Q-Function

(Figure: the graph of Q(α), decreasing through Q(0) = 1/2.)
c
Lecture 9, Amos Lapidoth 2017
The Q-Function and Intervals
The CDF of a Standard Gaussian:

FW(w) = Pr[W ≤ w]
      = 1 − Pr[W ≥ w]
      = 1 − Q(w),  w ∈ R.

More generally,

Pr[a ≤ W ≤ b] = Pr[W ≥ a] − Pr[W ≥ b]
             = Q(a) − Q(b),  a ≤ b.

If X ∼ N(µ, σ²) with σ > 0, then

Pr[a ≤ X ≤ b] = Pr[X ≥ a] − Pr[X ≥ b]
             = Pr[ (X − µ)/σ ≥ (a − µ)/σ ] − Pr[ (X − µ)/σ ≥ (b − µ)/σ ]
             = Q( (a − µ)/σ ) − Q( (b − µ)/σ ),  a ≤ b.
c
Lecture 9, Amos Lapidoth 2017
The Q-Function and Rays

Letting b → +∞, we obtain the probability of a half ray:

Pr[X ≥ a] = Q( (a − µ)/σ ),  σ > 0.

Letting a → −∞ we obtain

Pr[X ≤ b] = 1 − Q( (b − µ)/σ ),  σ > 0.

c
Lecture 9, Amos Lapidoth 2017
The Q-Function with Negative Arguments
• The standard Gaussian density is symmetric. Let
W ∼ N(0, 1).

Pr[W ≥ −α] = Pr[−W ≤ α]
           = Pr[W ≤ α]
           = 1 − Pr[W ≥ α],  α ∈ R.

Consequently,

Q(−α) = 1 − Q(α),  α ∈ R.

• Please use only nonnegative arguments to the Q-function!
• Q(0) = 1/2.

c
Lecture 9, Amos Lapidoth 2017
(Figure 19.4: The identity Q(α) + Q(−α) = 1.)
c
Lecture 9, Amos Lapidoth 2017
Linear Combinations of Independent Gaussians

Suppose Z1, …, ZJ are independent centered Gaussians,

Zj ∼ N(0, σj²),  j = 1, …, J.

Let

α1, …, αJ ∈ R

be deterministic constants. Then

Σ_{j=1}^{J} αj Zj ∼ N(0, σ²),  σ² = Σ_{j=1}^{J} αj² σj².

c
Lecture 9, Amos Lapidoth 2017
Next Week

Binary Hypothesis Testing (Chapter 20).

Thank you!

c
Lecture 9, Amos Lapidoth 2017
Communication and Detection Theory:
Lecture 10

Amos Lapidoth
ETH Zurich

May 2, 2017

Binary Hypothesis Testing

c
Lecture 10, Amos Lapidoth 2017
Today

• Binary Hypothesis Testing.

c
Lecture 10, Amos Lapidoth 2017
Guessing H
H takes on the values 0 and 1 according to the prior

π0 = Pr[H = 0],  π1 = Pr[H = 1].

We wish to guess H based on the observation Y,

Y = ( Y⁽¹⁾, …, Y⁽ᵈ⁾ )ᵀ.

Given the prior (π0, π1) and the conditional densities

fY|H=0(·),  fY|H=1(·),

we wish to design a guessing rule

φGuess : Rᵈ → {0, 1}

that maps the observed value yobs of Y to our guess of H.
c
Lecture 10, Amos Lapidoth 2017
The Probability of Error

• The probability of error associated with φGuess : Rᵈ → {0, 1} is

Pr(error) ≜ Pr[ φGuess(Y) ≠ H ].

• A guessing rule is optimal if no other guessing rule attains a


smaller probability of error.
• The probability of error associated with optimal guessing rules
is the optimal probability of error

p∗ (error).

1. How to find an optimal decision rule?


2. What is its performance?

c
Lecture 10, Amos Lapidoth 2017
Guessing in the Absence of Observables

• There are only two guessing rules: φ0 , which guesses


“H = 0,” and φ1 , which guesses “H = 1.”
• The probability of error associated with φ0 is π1 .
• The probability of error associated with φ1 is π0 .
• φ0 is optimal if π0 ≥ π1 .
• φ1 is optimal if π0 ≤ π1 .

Guess the value of H that has the highest a priori probability.

p∗ (error) = min{Pr[H = 0], Pr[H = 1]}.

Check by case!

c
Lecture 10, Amos Lapidoth 2017
The Joint Law of H and Y
We are typically given the prior (π0 , π1 ) and the conditionals

fY|H=0 (·), fY|H=1 (·).

The (unconditional) density of Y is

fY (y) = π0 fY|H=0 (y) + π1 fY|H=1 (y), y ∈ Rd .

The a posteriori probabilities are

Pr[H = 0 | Y = yobs ] , π0 fY|H=0 (yobs )/fY (yobs ) if fY (yobs ) > 0, and 1/2 otherwise;
Pr[H = 1 | Y = yobs ] , π1 fY|H=1 (yobs )/fY (yobs ) if fY (yobs ) > 0, and 1/2 otherwise.

c
Lecture 10, Amos Lapidoth 2017
Intuition
Suppose the observation is a scalar Y .

Pr[H = 0 | Y = yobs ] = lim_{δ↓0} Pr[H = 0, Y ∈ (yobs − δ, yobs + δ)] / Pr[Y ∈ (yobs − δ, yobs + δ)].

Now approximate

Pr[H = 0, Y ∈ (yobs − δ, yobs + δ)] = π0 ∫_{yobs−δ}^{yobs+δ} fY |H=0 (y) dy ≈ π0 2δ fY |H=0 (yobs ),   δ ≪ 1,
Pr[Y ∈ (yobs − δ, yobs + δ)] = ∫_{yobs−δ}^{yobs+δ} fY (y) dy ≈ 2δ fY (yobs ),   δ ≪ 1,

and the ratio is approximately π0 fY |H=0 (yobs )/fY (yobs ).

c
Lecture 10, Amos Lapidoth 2017
Advice

In a first reading of this chapter,


assume fY|H=0 (·) and fY|H=1 (·) are positive.
And assume π0 , π1 > 0.

c
Lecture 10, Amos Lapidoth 2017
Guessing after Observing Y—Heuristics
Having observed that Y = yobs , we associate with H the a
posteriori probabilities Pr[H = 0|Y = yobs ], Pr[H = 1|Y = yobs ].
φ∗Guess (yobs ) = 0 if Pr[H = 0 | Y = yobs ] ≥ Pr[H = 1 | Y = yobs ], and 1 otherwise,

i.e.,

φ∗Guess (yobs ) = 0 if π0 fY|H=0 (yobs ) ≥ π1 fY|H=1 (yobs ), and 1 otherwise.

The conditional error probability is

p∗ (error | Y = yobs ) = min{Pr[H = 0 | Y = yobs ], Pr[H = 1 | Y = yobs ]},

so

p∗ (error) = ∫_{Rd} min{Pr[H = 0 | Y = y], Pr[H = 1 | Y = y]} fY (y) dy
           = ∫_{Rd} min{π0 fY|H=0 (y), π1 fY|H=1 (y)} dy.
c
Lecture 10, Amos Lapidoth 2017
The Error
Let φGuess : Rd → {0, 1} be any guessing rule, and let

D = {y ∈ Rd : φGuess (y) = 0}.

Then

p(error | H = 0) = ∫_{y∉D} fY|H=0 (y) dy,
p(error | H = 1) = ∫_{y∈D} fY|H=1 (y) dy,

and

p(error) = π0 ∫_{y∉D} fY|H=0 (y) dy + π1 ∫_{y∈D} fY|H=1 (y) dy
         = ∫_{Rd} ( π0 fY|H=0 (y) I{y ∉ D} + π1 fY|H=1 (y) I{y ∈ D} ) dy.

c
Lecture 10, Amos Lapidoth 2017
The Main Result
If φ∗Guess guesses “H = 0” only when π0 fY|H=0 (yobs ) ≥ π1 fY|H=1 (yobs ),

( φ∗Guess (yobs ) = 0 ) =⇒ ( π0 fY|H=0 (yobs ) ≥ π1 fY|H=1 (yobs ) ),

and guesses “H = 1” only when π1 fY|H=1 (yobs ) ≥ π0 fY|H=0 (yobs ),

( φ∗Guess (yobs ) = 1 ) =⇒ ( π1 fY|H=1 (yobs ) ≥ π0 fY|H=0 (yobs ) ),

then φ∗Guess is optimal and

Pr[φ∗Guess (Y) ≠ H] = ∫_{Rd} min{π0 fY|H=0 (y), π1 fY|H=1 (y)} dy.

c
Lecture 10, Amos Lapidoth 2017
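A hedged Python sketch of the main result for a scalar observation: guess “H = 0” iff π0 fY|H=0 (y) ≥ π1 fY|H=1 (y), and approximate p∗(error) as the integral of the pointwise minimum on a grid. The two Gaussian densities and the prior below are illustrative stand-ins, not part of the lecture.

    import numpy as np
    from scipy.stats import norm

    pi0, pi1 = 0.3, 0.7
    f0 = norm(loc=+1.0, scale=1.0).pdf   # stand-in for fY|H=0
    f1 = norm(loc=-1.0, scale=1.0).pdf   # stand-in for fY|H=1

    def phi_map(y):
        # Guess 0 iff pi0*f0(y) >= pi1*f1(y) (ties guessed as 0).
        return np.where(pi0 * f0(y) >= pi1 * f1(y), 0, 1)

    # p*(error) = integral over R of min{pi0 f0, pi1 f1}, here on a fine grid.
    y = np.linspace(-12.0, 12.0, 200001)
    p_err = np.trapz(np.minimum(pi0 * f0(y), pi1 * f1(y)), y)
    print(p_err)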
Proof
Let φGuess : Rd → {0, 1} be any guessing rule, and D = {y ∈ Rd : φGuess (y) = 0}.
Then

Pr[φGuess (Y) ≠ H]
 = ∫_{Rd} ( π0 fY|H=0 (y) I{y ∉ D} + π1 fY|H=1 (y) I{y ∈ D} ) dy
 ≥ ∫_{Rd} min{π0 fY|H=0 (y), π1 fY|H=1 (y)} dy.

But φ∗Guess (·) achieves this lower bound! Indeed, with D∗ = {y ∈ Rd : φ∗Guess (y) = 0} we have

π0 fY|H=0 (y) I{y ∉ D∗ } + π1 fY|H=1 (y) I{y ∈ D∗ } = min{π0 fY|H=0 (y), π1 fY|H=1 (y)},   y ∈ Rd .
c
Lecture 10, Amos Lapidoth 2017
Randomized Guessing Rules

(Block diagram: a bias calculator maps yobs to b(yobs ); a random number generator draws Θ ∼ U([0, 1]); the rule guesses “H = 0” if Θ < b(yobs ) and “H = 1” if Θ ≥ b(yobs ).)

c
Lecture 10, Amos Lapidoth 2017
Randomized Guessing Rules Are not Better
Deterministic rules are randomized rules where b(yobs ) ∈ {0, 1}.

For a randomized rule, b(yobs ) is the probability of guessing “H = 0,” so

Pr[error | Y = yobs ]
 = b(yobs ) Pr[H = 1 | Y = yobs ] + (1 − b(yobs )) Pr[H = 0 | Y = yobs ]
 ≥ min{Pr[H = 1 | Y = yobs ], Pr[H = 0 | Y = yobs ]},

because the weighted average of Pr[H = 0 | Y = yobs ] and Pr[H = 1 | Y = yobs ] cannot be smaller than the minimum.

The minimum is achieved by a deterministic rule that guesses


“H = 0” iff π0 fY|H=0 (y) ≥ π1 fY|H=1 (y).

c
Lecture 10, Amos Lapidoth 2017
Alternative Proof
The randomized rule is a deterministic rule based on (Y, Θ)!

fY,Θ|H=0 (y, θ) = fY|H=0 (y) fΘ|Y=y,H=0 (θ)


= fY|H=0 (y) fΘ (θ)
= fY|H=0 (y) I{0 ≤ θ ≤ 1}.

Similarly,

fY,Θ|H=1 (y, θ) = fY|H=1 (y) I{0 ≤ θ ≤ 1}.

The rule ‘guess “H = 0” iff π0 fY|H=0 (y) ≥ π1 fY|H=1 (y)’ is


optimal for this setting too because it guesses “H = 0” only when
fY,Θ|H=0 (y, θ) ≥ fY,Θ|H=1 (y, θ) and it guesses “H = 1” only
when fY,Θ|H=1 (y, θ) ≥ fY,Θ|H=0 (y, θ).

c
Lecture 10, Amos Lapidoth 2017
The Maximum A Posteriori Rule

The MAP rule resolves ties at random:

φMAP (yobs )
 , 0 if Pr[H = 0 | Y = yobs ] > Pr[H = 1 | Y = yobs ],
   1 if Pr[H = 0 | Y = yobs ] < Pr[H = 1 | Y = yobs ],
   U({0, 1}) if Pr[H = 0 | Y = yobs ] = Pr[H = 1 | Y = yobs ],
 = 0 if π0 fY|H=0 (yobs ) > π1 fY|H=1 (yobs ),
   1 if π0 fY|H=0 (yobs ) < π1 fY|H=1 (yobs ),
   U({0, 1}) if π0 fY|H=0 (yobs ) = π1 fY|H=1 (yobs ).

c
Lecture 10, Amos Lapidoth 2017
The Likelihood-Ratio Function

LR : Rd → [0, ∞],

LR(y) , fY|H=0 (y)/fY|H=1 (y),   y ∈ Rd ,

using the convention α/0 = ∞ for α > 0, and 0/0 = 1.
Using this function (and assuming π0 , π1 , fY (yobs ) > 0),

φMAP (yobs ) = 0 if LR(yobs ) > π1 /π0 ,
              1 if LR(yobs ) < π1 /π0 ,
              U({0, 1}) if LR(yobs ) = π1 /π0 .

c
Lecture 10, Amos Lapidoth 2017
The Maximum-Likelihood Rule

• The ML rule ignores the prior.


• It is the MAP corresponding to a uniform prior.
• In general, it is suboptimal.


φML (yobs ) , 0 if fY|H=0 (yobs ) > fY|H=1 (yobs ),
             1 if fY|H=0 (yobs ) < fY|H=1 (yobs ),
             U({0, 1}) if fY|H=0 (yobs ) = fY|H=1 (yobs )
           = 0 if LR(yobs ) > 1,
             1 if LR(yobs ) < 1,
             U({0, 1}) if LR(yobs ) = 1.

c
Lecture 10, Amos Lapidoth 2017
The Bhattacharyya Bound
p∗ (error) = ∫_{Rd} min{π0 fY|H=0 (y), π1 fY|H=1 (y)} dy
           ≤ ∫_{Rd} √( π0 fY|H=0 (y) π1 fY|H=1 (y) ) dy
           = √(π0 π1 ) ∫_{Rd} √( fY|H=0 (y) fY|H=1 (y) ) dy
           ≤ (1/2) ∫_{Rd} √( fY|H=0 (y) fY|H=1 (y) ) dy,

where we have used

min{a, b} ≤ √(ab) ≤ (a + b)/2,   a, b ≥ 0.

Thus,

p∗ (error) ≤ (1/2) ∫_{Rd} √( fY|H=0 (y) fY|H=1 (y) ) dy.
c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (1)

H is uniform and

fY |H=0 (y) = (1/√(2πσ²)) e^(−(y−A)²/(2σ²)),   y ∈ R,
fY |H=1 (y) = (1/√(2πσ²)) e^(−(y+A)²/(2σ²)),   y ∈ R,

for some deterministic A, σ > 0.

Since the prior is uniform, the MAP and the ML rules both guess
“H = 0” or “H = 1” depending on whether LR(yobs ) is greater or
smaller than one.

c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (2)

LR(y) = fY |H=0 (y)/fY |H=1 (y)
      = e^(−(y−A)²/(2σ²)) / e^(−(y+A)²/(2σ²))
      = e^(4yA/(2σ²)),   y ∈ R.

LR(yobs ) > 1 ⇐⇒ e^(4yobs A/(2σ²)) > 1
            ⇐⇒ 4yobs A/(2σ²) > 0
            ⇐⇒ yobs > 0.

c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (3)

Likewise,

LR(yobs ) < 1 ⇐⇒ e^(4yobs A/(2σ²)) < 1
            ⇐⇒ 4yobs A/(2σ²) < 0
            ⇐⇒ yobs < 0.

The MAP and ML rules thus guess “H = 0,” if yobs > 0; they
guess “H = 1,” if yobs < 0; and they guess “H = 0” or “H = 1”
equiprobably, if yobs = 0 (i.e., in the case of a tie).

c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (4)

The probability of a tie is zero. Indeed, under both hypotheses, the


probability that the observed variable Y is exactly equal to zero is
zero:
     
Pr Y = 0 H = 0 = Pr Y = 0 H = 1 = Pr Y = 0 = 0.

Consequently, the way ties are resolved is immaterial.

c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (4)
pMAP (error|H = 1) = Pr[Y > 0 | H = 1] = Q(A/σ),

because, conditional on H = 1, the RV Y is N(−A, σ²), so the origin is A/σ standard deviations away (to the right).

pMAP (error|H = 0) = Pr[Y < 0 | H = 0] = Q(A/σ),

because, conditional on H = 0, the RV Y is N(A, σ²), and the origin is again A/σ standard deviations away (to the left).
By the symmetry of the setup, the two types of error are of equal probability.
c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (5)

Since

p∗ (error) = π0 pMAP (error|H = 0) + π1 pMAP (error|H = 1),

we conclude that

p∗ (error) = Q(A/σ).

c
Lecture 10, Amos Lapidoth 2017
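A quick Monte Carlo check of p∗(error) = Q(A/σ), assuming illustrative values A = 1 and σ = 0.8; the sign detector below is the MAP/ML rule just derived.

    import numpy as np
    from scipy.special import erfc

    A, sigma, n = 1.0, 0.8, 500000
    rng = np.random.default_rng(0)
    H = rng.integers(0, 2, n)                        # uniform prior on {0, 1}
    Y = np.where(H == 0, A, -A) + sigma * rng.standard_normal(n)
    guesses = np.where(Y > 0, 0, 1)                  # ties (Y == 0) have probability zero
    print((guesses != H).mean(), 0.5 * erfc(A / sigma / np.sqrt(2.0)))  # empirical vs. Q(A/sigma)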
(Figure: the densities fY |H=1 (·) and fY |H=0 (·), centered at −A and A, together with fY (·); the rule guesses “H = 1” for y < 0 and “H = 0” for y > 0, and the shaded tail is pMAP (error|H = 0).)
c
Lecture 10, Amos Lapidoth 2017
Testing the Mean of a Univariate Gaussian (6)
The Bhattacharyya Bound:

p∗ (error) ≤ (1/2) ∫_{−∞}^{∞} √( fY |H=0 (y) fY |H=1 (y) ) dy
          = (1/2) ∫_{−∞}^{∞} √( (1/√(2πσ²)) e^(−(y−A)²/(2σ²)) · (1/√(2πσ²)) e^(−(y+A)²/(2σ²)) ) dy
          = (1/2) e^(−A²/(2σ²)) ∫_{−∞}^{∞} (1/√(2πσ²)) e^(−y²/(2σ²)) dy
          = (1/2) e^(−A²/(2σ²)).

As an aside, we obtained

Q(α) ≤ (1/2) e^(−α²/2),   α ≥ 0.

c
Lecture 10, Amos Lapidoth 2017
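A short numerical comparison of the exact error probability Q(A/σ) with the Bhattacharyya bound e^(−A²/(2σ²))/2, for the same illustrative A and σ as above.

    import numpy as np
    from scipy.special import erfc

    def Q(a):
        return 0.5 * erfc(a / np.sqrt(2.0))

    A, sigma = 1.0, 0.8
    exact = Q(A / sigma)                           # p*(error) from the previous slides
    bound = 0.5 * np.exp(-A**2 / (2 * sigma**2))   # Bhattacharyya bound
    print(exact, bound)                            # exact <= bound, as Q(a) <= e^(-a^2/2)/2 predicts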
Deterministic Processing is Futile

(Block diagram: yobs is fed to a processor g(·); the output g(yobs ) is fed to a rule that guesses H based on g(yobs ).)

No rule based on g(yobs ) can outperform an optimal rule based


on yobs .

Computing g(yobs ) and then deciding based on the answer is a


special case of guessing based on yobs .

c
Lecture 10, Amos Lapidoth 2017
More General Processing
The processor generates Θ independently of (H, Y) and forms
g(Y, Θ).
This too is futile!
Cannot outperform an optimal rule based on (yobs , θobs ), where
fY,Θ|H=0 (yobs , θobs ) = fY|H=0 (yobs ) fΘ (θobs ),
fY,Θ|H=1 (yobs , θobs ) = fY|H=1 (yobs ) fΘ (θobs ).
But,

LR(yobs , θobs ) = fY,Θ|H=0 (yobs , θobs ) / fY,Θ|H=1 (yobs , θobs )
               = ( fY|H=0 (yobs ) fΘ (θobs ) ) / ( fY|H=1 (yobs ) fΘ (θobs ) )
               = fY|H=0 (yobs ) / fY|H=1 (yobs ),   fΘ (θobs ) ≠ 0
               = LR(yobs ),   fΘ (θobs ) ≠ 0.

c
Lecture 10, Amos Lapidoth 2017
Recall that X and Y are conditionally independent given Z,

X −− Z −− Y,

if

PX,Y |Z (x, y|z) = PX|Z (x|z) PY |Z (y|z),   PZ (z) > 0.

We say that Z is the result of processing Y with respect to H if H


and Z are conditionally independent given Y.

Processing the observables does not decrease the optimal


probability of error.

c
Lecture 10, Amos Lapidoth 2017
(Block diagram: a local Gaussian RV generator draws W ∼ N(0, δ²), independent of (Y, H); the sum yobs + W is fed to the MAP rule for testing N(α0 , σ² + δ²) vs. N(α1 , σ² + δ²) with prior (π0 , π1 ).)

This is a randomized rule for N(α0 , σ²) vs. N(α1 , σ²) that attains the optimal probability of error for N(α0 , σ² + δ²) vs. N(α1 , σ² + δ²).
c
Lecture 10, Amos Lapidoth 2017
Sufficient Statistics—an Example (1)
Let H have a uniform prior. We observe (Y1 , Y2 ). Conditional on H = 0, they are IID N(0, σ0²), whereas conditional on H = 1 they are IID N(0, σ1²), where σ0 > σ1 > 0. Thus,

fY1 ,Y2 |H=0 (y1 , y2 ) = (1/(2πσ0²)) exp( −(y1² + y2²)/(2σ0²) ),   y1 , y2 ∈ R,
fY1 ,Y2 |H=1 (y1 , y2 ) = (1/(2πσ1²)) exp( −(y1² + y2²)/(2σ1²) ),   y1 , y2 ∈ R.

LR(y1 , y2 ) = fY1 ,Y2 |H=0 (y1 , y2 ) / fY1 ,Y2 |H=1 (y1 , y2 )
            = (σ1²/σ0²) exp( (1/2)(1/σ1² − 1/σ0²)(y1² + y2²) ),   y1 , y2 ∈ R.
c
Lecture 10, Amos Lapidoth 2017
Sufficient Statistics—an Example (2)

  
LR(y1 , y2 ) > 1 ⇐⇒ exp( (1/2)(1/σ1² − 1/σ0²)(y1² + y2²) ) > σ0²/σ1²
              ⇐⇒ (1/2)(1/σ1² − 1/σ0²)(y1² + y2²) > ln(σ0²/σ1²)
              ⇐⇒ ((σ0² − σ1²)/(2σ0²σ1²))(y1² + y2²) > ln(σ0²/σ1²)
              ⇐⇒ y1² + y2² > (2σ0²σ1²/(σ0² − σ1²)) ln(σ0²/σ1²).

The ML/MAP compares Y12 + Y22 to a threshold.


To implement it, one need not observe Y1 and Y2 directly; it
suffices to observe
T , Y12 + Y22 .

c
Lecture 10, Amos Lapidoth 2017
Sufficient Statistics—an Example (3)

• Being the result of processing (Y1 , Y2 ) with respect to H, no


guess based on T can outperform an optimal guess based on
(Y1 , Y2 ).
• In this example, even though pre-processing the observations
to produce T = Y12 + Y22 is not reversible, basing one’s
decision on T incurs no loss in optimality.
This is all because LR(y1 , y2 ) is computable from y12 + y22 . In this
sense T = Y12 + Y22 forms a sufficient statistic for guessing H from
(Y1 , Y2 ).

c
Lecture 10, Amos Lapidoth 2017
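A minimal sketch of the resulting detector for the uniform-prior variance test above (the function name and interface are mine, not the book's):

    import numpy as np

    def guess_variance(y1, y2, sigma0, sigma1):
        # ML/MAP guess for IID N(0, sigma0^2) vs IID N(0, sigma1^2), sigma0 > sigma1 > 0.
        t = y1**2 + y2**2                                  # the sufficient statistic
        thresh = (2 * sigma0**2 * sigma1**2 / (sigma0**2 - sigma1**2)) \
                 * np.log(sigma0**2 / sigma1**2)
        return 0 if t > thresh else 1                      # t == thresh has probability zero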
Sufficient Statistics—Informal Definition
A mapping T : Rd → Rd′ forms a sufficient statistic for the densities fY|H=0 (·) and fY|H=1 (·) if the likelihood-ratio LR(yobs ) can be computed from T (yobs ) for every yobs in Rd .

Has nothing to do with the prior!

For technical reasons


• we require that LR(yobs ) be computable from T (yobs ) only
when fY|H=0 (yobs ) and fY|H=1 (yobs ) are not both zero;
• and we allow some Y0 ⊂ Rd of Lebesgue measure zero
containing observations where the computation of LR(yobs )
from T (yobs ) may fail.

c
Lecture 10, Amos Lapidoth 2017
Sufficient Statistics—Formal Definition
A mapping T : Rd → Rd′ forms a sufficient statistic for the densities fY|H=0 (·) and fY|H=1 (·) on Rd if it is Borel measurable and if there exists a set Y0 ⊂ Rd of Lebesgue measure zero and a Borel measurable function ζ : Rd′ → [0, ∞] such that for all yobs ∈ Rd satisfying

yobs ∉ Y0 and fY|H=0 (yobs ) + fY|H=1 (yobs ) > 0

we have

fY|H=0 (yobs ) / fY|H=1 (yobs ) = ζ( T (yobs ) ),

where on the LHS of the above we define a/0 to be +∞ whenever a > 0.

c
Lecture 10, Amos Lapidoth 2017
Basing the Decision on a Sufficient Statistic Is Optimal

If T : Rd → Rd′ is a sufficient statistic for the densities fY|H=0 (·) and fY|H=1 (·), then, for every prior of H, there exists a decision rule that guesses H based on T (Y) and which is as good as any optimal guessing rule based on Y.
Indeed, the rule

φT (T (yobs )) = 0 if ζ(T (yobs )) > π1 /π0 ,
                1 if ζ(T (yobs )) < π1 /π0 ,
                U({0, 1}) if ζ(T (yobs )) = π1 /π0

has the same performance as the MAP rule based on Y.

c
Lecture 10, Amos Lapidoth 2017
Computability of the a Posteriori Distribution—Informal
T : Rd → Rd′ is a sufficient statistic for fY|H=0 (·) and fY|H=1 (·) iff for every prior (π0 , π1 ) there exist functions

t ↦ ψm (π0 , π1 , t),   m = 0, 1,

such that the vector

( ψ0 (π0 , π1 , T (yobs )), ψ1 (π0 , π1 , T (yobs )) )ᵀ

equals

( Pr[H = 0 | Y = yobs ], Pr[H = 1 | Y = yobs ] )ᵀ,

where the above a posteriori distributions are computed for H of prior (π0 , π1 ) and for the conditional densities fY|H=0 (·) and fY|H=1 (·).
c
Lecture 10, Amos Lapidoth 2017
Informal Proof (1)
Suppose LR(yobs ) = ζ(T (yobs )) for every yobs , and let (π0 , π1 ) be any (nondegenerate) prior. Then,

Pr[H = 0 | Y = yobs ]
 = π0 fY|H=0 (yobs ) / ( π0 fY|H=0 (yobs ) + π1 fY|H=1 (yobs ) )
 = π0 LR(yobs ) / ( π0 LR(yobs ) + π1 )
 = π0 ζ(T (yobs )) / ( π0 ζ(T (yobs )) + π1 ).

And

Pr[H = 1 | Y = yobs ] = 1 − Pr[H = 0 | Y = yobs ]
                     = 1 − π0 ζ(T (yobs )) / ( π0 ζ(T (yobs )) + π1 ).
c
Lecture 10, Amos Lapidoth 2017
Informal Proof (2)
Suppose the a posteriori distribution is computable from (π0 , π1 , T (yobs )) for every prior and, a fortiori, for the uniform prior.

Pr[H = 0 | Y = yobs ] = π0 fY|H=0 (yobs ) / ( π0 fY|H=0 (yobs ) + π1 fY|H=1 (yobs ) ),
Pr[H = 1 | Y = yobs ] = π1 fY|H=1 (yobs ) / ( π0 fY|H=0 (yobs ) + π1 fY|H=1 (yobs ) ).

Substituting the uniform prior and dividing the equations,

Pr[H = 0 | Y = yobs ] / Pr[H = 1 | Y = yobs ] = LR(yobs )   (uniform prior).

So if the LHS is computable from T (yobs ) then so is the RHS.


c
Lecture 10, Amos Lapidoth 2017
After Identifying a Sufficient Statistic (1)

Method 1: Ignore this fact and use the MAP rule

φMAP (yobs ) = 0 if LR(yobs ) > π1 /π0 ,
              1 if LR(yobs ) < π1 /π0 ,
              U({0, 1}) if LR(yobs ) = π1 /π0 .

(Because T (Y) is sufficient, LR(yobs ) will be computable from T (yobs ), but who cares?)

c
Lecture 10, Amos Lapidoth 2017
After Identifying a Sufficient Statistic (2)

Method 2: Use the MAP rule for guessing H based on the new
d0 -dimensional observations tobs = T (yobs ). You’ll need the
conditional densities of T = T (Y) given H.
φGuess (T (yobs )) = 0 if π0 fT|H=0 (T (yobs )) > π1 fT|H=1 (T (yobs )),
                   1 if π0 fT|H=0 (T (yobs )) < π1 fT|H=1 (T (yobs )),

with ties being resolved at random.

c
Lecture 10, Amos Lapidoth 2017
Applying Method 2 in the Example
The squares of two IID centered Gaussians sum to an exponential:

fT |H=0 (t) = (1/(2σ0²)) exp( −t/(2σ0²) ),   t ≥ 0,
fT |H=1 (t) = (1/(2σ1²)) exp( −t/(2σ1²) ),   t ≥ 0.

So,

fT |H=0 (t)/fT |H=1 (t) = (σ1²/σ0²) exp( t (1/(2σ1²) − 1/(2σ0²)) ),   t ≥ 0,
ln( fT |H=0 (t)/fT |H=1 (t) ) = ln(σ1²/σ0²) + t (1/(2σ1²) − 1/(2σ0²)),   t ≥ 0.

We thus guess “H = 0” if the log likelihood-ratio is nonnegative,

t ≥ (2σ0²σ1²/(σ0² − σ1²)) ln(σ0²/σ1²)
⇐⇒ y1² + y2² ≥ (2σ0²σ1²/(σ0² − σ1²)) ln(σ0²/σ1²).
c
Lecture 10, Amos Lapidoth 2017
Multi-Dimensional Binary Gaussian Hypothesis Testing
H is of nondegenerate prior (π0 , π1 ). The observable is

Y = ( Y (1) , . . . , Y (J) )ᵀ,
H = 0 : Y (j) = s0 (j) + Z (j) ,   j = 1, 2, . . . , J,
H = 1 : Y (j) = s1 (j) + Z (j) ,   j = 1, 2, . . . , J,

where Z (1) , Z (2) , . . . , Z (J) are IID N(0, σ²) and

s0 = ( s0 (1) , . . . , s0 (J) )ᵀ,   s1 = ( s1 (1) , . . . , s1 (J) )ᵀ

are deterministic. The Euclidean inner product and norm in RJ are

hu, viE , Σ_{j=1}^J u (j) v (j) ,
kuk , √( hu, uiE ) = √( Σ_{j=1}^J (u (j) )² ).

c
Lecture 10, Amos Lapidoth 2017
The Likelihood Function

LR(y) = fY|H=0 (y) / fY|H=1 (y)
      = Π_{j=1}^J (1/√(2πσ²)) exp( −(y (j) − s0 (j) )²/(2σ²) ) / Π_{j=1}^J (1/√(2πσ²)) exp( −(y (j) − s1 (j) )²/(2σ²) )
      = Π_{j=1}^J exp( −(y (j) − s0 (j) )²/(2σ²) + (y (j) − s1 (j) )²/(2σ²) ),   y ∈ RJ .

c
Lecture 10, Amos Lapidoth 2017
The Log-Likelihood Function
LLR(y) = (1/(2σ²)) Σ_{j=1}^J ( (y (j) − s1 (j) )² − (y (j) − s0 (j) )² )
       = (1/σ²) ( hy, s0 − s1 iE + (ks1 k² − ks0 k²)/2 )
       = (1/σ²) ( hy, s0 − s1 iE − ( hs0 , s0 − s1 iE + hs1 , s0 − s1 iE )/2 )
       = (ks0 − s1 k/σ²) ( hy, φiE − ( hs0 , φiE + hs1 , φiE )/2 ),   y ∈ RJ ,

where

φ = (s0 − s1 )/ks0 − s1 k

is a unit-norm vector pointing from s1 to s0 .
c
Lecture 10, Amos Lapidoth 2017
Decision Rule
An optimal rule is to guess “H = 0” when LLR(y) ≥ ln(π1 /π0 ):

Guess “H = 0” if hy, φiE ≥ ( hs0 , φiE + hs1 , φiE )/2 + ( σ²/ks0 − s1 k ) ln(π1 /π0 ).

(Figure: the decision boundary is a hyperplane perpendicular to φ; it passes through the midpoint of s0 and s1 when π0 = π1 , and shifts toward s0 when π0 < π1 and toward s1 when π0 > π1 .)

c
Lecture 10, Amos Lapidoth 2017
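A hedged Python sketch of the resulting detector: project y onto φ and compare with the threshold derived above (function and variable names are mine).

    import numpy as np

    def guess_binary_gaussian(y, s0, s1, sigma, pi0, pi1):
        # MAP guess for Y = s_H + Z with Z IID N(0, sigma^2).
        d = np.linalg.norm(s0 - s1)
        phi = (s0 - s1) / d                               # unit vector from s1 toward s0
        thresh = 0.5 * (np.dot(s0, phi) + np.dot(s1, phi)) \
                 + (sigma**2 / d) * np.log(pi1 / pi0)
        return 0 if np.dot(y, phi) >= thresh else 1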
(Figure: an observation y and its projection onto the unit vector φ, drawn together with s0 and s1 .)
The projection of Y onto φ = (s0 − s1 )/ks0 − s1 k forms a


sufficient statistic for guessing H based on Y.
c
Lecture 10, Amos Lapidoth 2017
Error Probability Lemma

Suppose s0 , s1 ∈ RJ are deterministic and different. Let

Y = s0 + Z,   Z (1) , . . . , Z (J) ∼ IID N(0, σ²).

Then,

Pr[ kY − s1 k ≤ kY − s0 k ] = Q( ks0 − s1 k/(2σ) ).

• ks0 − s1 k/2 is half the Euclidean distance.
• ks0 − s1 k/(2σ) is half the distance measured in standard deviations of the noise.
• For a more general result see Lemma 20.14.1.

c
Lecture 10, Amos Lapidoth 2017
Error Probability Lemma

Pr[ kY − s1 k ≤ kY − s0 k ]
 = Pr[ kZ + s0 − s1 k ≤ kZk ]
 = Pr[ kZ + s0 − s1 k² ≤ kZk² ]
 = Pr[ kZk² + ks0 − s1 k² + 2 hZ, s0 − s1 iE ≤ kZk² ]
 = Pr[ −2 hZ, s0 − s1 iE ≥ ks0 − s1 k² ]
 = Pr[ 2 hZ, s0 − s1 iE ≥ ks0 − s1 k² ],

and the result follows because

hZ, s0 − s1 iE ∼ N( 0, ks0 − s1 k² σ² ).

c
Lecture 10, Amos Lapidoth 2017
Linear Combinations of Independent Gaussians

Suppose Z1 , . . . , ZJ are independent centered Gaussians

Zj ∼ N(0, σj²),   j = 1, . . . , J.

Let α1 , . . . , αJ ∈ R be deterministic constants. Then

Σ_{j=1}^J αj Zj ∼ N(0, σ²),   σ² = Σ_{j=1}^J αj² σj².

(Choose αj as the j-th component of s0 − s1 , and σj² as σ².)

c
Lecture 10, Amos Lapidoth 2017
For Our Problem
For a uniform prior,

Pr[error | H = 0] = Pr[error | H = 1] = Pr[error] = Q( ks0 − s1 k/(2σ) ).

More generally,

pMAP (error|H = 0) = Q( ks0 − s1 k/(2σ) + ( σ/ks0 − s1 k ) ln(π0 /π1 ) ),
pMAP (error|H = 1) = Q( ks0 − s1 k/(2σ) + ( σ/ks0 − s1 k ) ln(π1 /π0 ) ),

p∗ (error) = π0 Q( ks0 − s1 k/(2σ) + ( σ/ks0 − s1 k ) ln(π0 /π1 ) )
           + π1 Q( ks0 − s1 k/(2σ) + ( σ/ks0 − s1 k ) ln(π1 /π0 ) ).
c
Lecture 10, Amos Lapidoth 2017
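The formulas above, collected into one small Python helper (a sketch; the interface is mine):

    import numpy as np
    from scipy.special import erfc

    def Q(a):
        return 0.5 * erfc(a / np.sqrt(2.0))

    def error_probs(s0, s1, sigma, pi0, pi1):
        # Returns (pMAP(error|H=0), pMAP(error|H=1), p*(error)).
        d = np.linalg.norm(np.asarray(s0, float) - np.asarray(s1, float))
        p0 = Q(d / (2 * sigma) + (sigma / d) * np.log(pi0 / pi1))
        p1 = Q(d / (2 * sigma) + (sigma / d) * np.log(pi1 / pi0))
        return p0, p1, pi0 * p0 + pi1 * p1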
Random Parameter Not Observed—Nuisance Parameter
Instead of fY|H=0 (·) and fY|H=1 (·), we are given

fΘ (·), fY|Θ=θ,H=0 (·), fY|Θ=θ,H=1 (·),   with Θ independent of H.

fY|H=0 (yobs ) = ∫ fY,Θ|H=0 (yobs , θ) dθ
             = ∫ fY|Θ=θ,H=0 (yobs ) fΘ|H=0 (θ) dθ
             = ∫ fY|Θ=θ,H=0 (yobs ) fΘ (θ) dθ.

(Think about conditioning on H = 0 as specifying the law.)

fY|H=1 (yobs ) = ∫ fY|Θ=θ,H=1 (yobs ) fΘ (θ) dθ.

LR(yobs ) = ∫ fY|Θ=θ,H=0 (yobs ) fΘ (θ) dθ / ∫ fY|Θ=θ,H=1 (yobs ) fΘ (θ) dθ.

c
Lecture 10, Amos Lapidoth 2017
Random Parameter Observed
If Θ is observed, we merely view the observable as (Y, Θ).

LR(yobs , θobs ) = fY,Θ|H=0 (yobs , θobs ) / fY,Θ|H=1 (yobs , θobs ).
The twist is that, because Θ is independent of H,

fY,Θ|H=0 (yobs , θobs ) = fΘ|H=0 (θobs )fY|Θ=θobs ,H=0 (yobs )


= fΘ (θobs )fY|Θ=θobs ,H=0 (yobs ).

Likewise,

fY,Θ|H=1 (yobs , θobs ) = fΘ (θobs )fY|Θ=θobs ,H=1 (yobs ),

so

LR(yobs , θobs ) = fY|H=0,Θ=θobs (yobs ) / fY|H=1,Θ=θobs (yobs ).

c
Lecture 10, Amos Lapidoth 2017
Next Week

Multi-Hypothesis Testing (Chapter 21 & 22).

Thank you!

c
Lecture 10, Amos Lapidoth 2017
Communication and Detection Theory:
Lecture 11

Amos Lapidoth
ETH Zurich

May 9, 2017

Multi-Hypothesis Testing

c
Lecture 11, Amos Lapidoth 2017
Today

• A bit more on binary hypothesis testing.


• Multi-hypothesis testing.

c
Lecture 11, Amos Lapidoth 2017
Multiple Hypotheses
M takes value in the set M = {1, . . . , M}, where M ≥ 2,
according to the prior

πm = Pr[M = m], m ∈ M.

The prior is nondegenerate if

πm > 0, m ∈ M.

The observation Y is a d-dimensional random vector. Conditional


on M = m, its density is

fY|M =m (·), m ∈ M.

A guessing rule is a mapping

φGuess : Rd → M.

After observing that Y = yobs we guess that M is φGuess (yobs ).


c
Lecture 11, Amos Lapidoth 2017
Performance

The error probability associated with φGuess (·) is


 
Pr[ φGuess (Y) ≠ M ].

A rule is optimal if no rule achieves a lower probability of error.


The optimal error probability

p∗ (error)

is the probability of error associated with an optimal decision rule.

c
Lecture 11, Amos Lapidoth 2017
Guessing in the Absence of Observables
• Only M deterministic decision rules: φ1 , . . . , φM , where

φm guesses “M = m”.

• The probability of success of φm is πm .


• The guessing rule “guess m̃” is optimal iff

πm̃ = max_{m′∈M} πm′ .

• For an optimal guessing rule the probability of success is

p∗ (correct) = max_{m′∈M} πm′ ,

and the optimal error probability is thus

p∗ (error) = 1 − max_{m′∈M} πm′ .

c
Lecture 11, Amos Lapidoth 2017
The Joint Law of M and Y

In terms of the prior and the conditional densities


X
fY (y) = πm fY|M =m (y), y ∈ Rd .
m∈M

And, as in the binary case,


Pr[M = m | Y = yobs ] , πm fY|M=m (yobs )/fY (yobs ) if fY (yobs ) > 0, and 1/M otherwise.

c
Lecture 11, Amos Lapidoth 2017
Guessing in the Presence of Observables
• After observing that Y = yobs , we associate with each
m ∈ M the a posteriori probability Pr[M = m|Y = yobs ].
• We pick the message of highest a posteriori probability.
• A tie occurs when more than one outcome attains the highest
a posteriori probability. Any one of the maximum-achieving
messages will do.
• We thus guess “m̃” only if

Pr[M = m̃ | Y = yobs ] = max_{m′∈M} Pr[M = m′ | Y = yobs ].

• For this rule

p∗ (correct | Y = yobs ) = max_{m′∈M} Pr[M = m′ | Y = yobs ],
p∗ (error | Y = yobs ) = 1 − max_{m′∈M} Pr[M = m′ | Y = yobs ],
p∗ (error) = 1 − ∫_{Rd} ( max_{m′∈M} Pr[M = m′ | Y = y] ) fY (y) dy.
c
Lecture 11, Amos Lapidoth 2017
The Main Result

Consider the set of messages of maximal a posteriori probability

M̃(yobs )
 , { m̃ ∈ M : Pr[M = m̃ | Y = yobs ] = max_{m′∈M} Pr[M = m′ | Y = yobs ] }
 = { m̃ ∈ M : πm̃ fY|M=m̃ (yobs ) = max_{m′∈M} πm′ fY|M=m′ (yobs ) }.

Any guessing rule φ∗Guess : Rd → M that satisfies

φ∗Guess (yobs ) ∈ M̃(yobs ),   yobs ∈ Rd ,

is optimal.

c
Lecture 11, Amos Lapidoth 2017
Proof
Given any φGuess (·), define the disjoint sets

Dm = { yobs ∈ Rd : φGuess (yobs ) = m },   m ∈ M.

Pr(correct) = Σ_{m∈M} πm ∫_{Dm} fY|M=m (y) dy
            = Σ_{m∈M} πm ∫_{Rd} fY|M=m (y) I{y ∈ Dm } dy
            = ∫_{Rd} ( Σ_{m∈M} πm fY|M=m (y) I{y ∈ Dm } ) dy
            ≤ ∫_{Rd} max_{m∈M} { πm fY|M=m (y) } dy.

Equality is attained if

( y ∈ Dm̃ ) =⇒ ( πm̃ fY|M=m̃ (y) = max_{m′∈M} πm′ fY|M=m′ (y) ).
c
Lecture 11, Amos Lapidoth 2017
Randomized Rules, the MAP, and the ML Rules

• Randomization does not help.


• The Maximum A Posteriori rule picks uniformly at random an
element of M̃(yobs ). It is optimal.
• The Maximum-Likelihood rule ignores the prior. It picks
uniformly at random an element of

{ m̃ ∈ M : fY|M=m̃ (yobs ) = max_{m′∈M} fY|M=m′ (yobs ) }.

It is optimal when the prior is uniform.

c
Lecture 11, Amos Lapidoth 2017
Processing

Z is the result of processing Y with respect to M if

M −− Y −− Z.

If Z is the result of processing Y with respect to M , then no


decision rule based on Z can outperform an optimal decision rule
based on Y.

c
Lecture 11, Amos Lapidoth 2017
Multi-Hypothesis Testing for 2D Signals

• M is uniform over M = {1, . . . , M}.


• Y is two-dimensional of components Y (1) and Y (2) .
• Conditional on M = m, the random variables Y (1) and Y (2) are independent with Y (1) ∼ N(am , σ²) and Y (2) ∼ N(bm , σ²). Here σ² > 0.

fY (1) ,Y (2) |M =m (y (1) , y (2) ) = (1/(2πσ²)) exp( −( (y (1) − am )² + (y (2) − bm )² )/(2σ²) ).

c
Lecture 11, Amos Lapidoth 2017
8PSK
8PSK corresponds to M = 8 and

am = A cos(2πm/8),   bm = A sin(2πm/8),   m = 1, . . . , 8.

(Figure: the eight constellation points (a1 , b1 ), . . . , (a8 , b8 ) equally spaced on a circle of radius A.)

c
Lecture 11, Amos Lapidoth 2017
The “Nearest-Neighbor” Decoding Rule
Since M is uniform, the MAP picks an element of

argmax_{m′∈M} fY (1) ,Y (2) |M =m′ (y (1) , y (2) )
 = argmax_{m′∈M} (1/(2πσ²)) exp( −( (y (1) − am′ )² + (y (2) − bm′ )² )/(2σ²) )
 = argmax_{m′∈M} exp( −( (y (1) − am′ )² + (y (2) − bm′ )² )/(2σ²) )
 = argmax_{m′∈M} ( −( (y (1) − am′ )² + (y (2) − bm′ )² )/(2σ²) )
 = argmin_{m′∈M} ( (y (1) − am′ )² + (y (2) − bm′ )² )
 = argmin_{m′∈M} ky − sm′ k.

c
Lecture 11, Amos Lapidoth 2017
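A minimal nearest-neighbor decoder for 8PSK in Python (the constellation indexing follows the slides; A = 1 is an illustrative choice):

    import numpy as np

    A, M = 1.0, 8
    const = np.array([[A * np.cos(2 * np.pi * m / M),
                       A * np.sin(2 * np.pi * m / M)] for m in range(1, M + 1)])

    def nearest_neighbor(y):
        # Return the ML guess in {1, ..., 8} for an observation y in R^2.
        return 1 + int(np.argmin(np.linalg.norm(const - np.asarray(y), axis=1)))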
Nearest-Neighbor Decoding for 8PSK
(Figure: the nearest-neighbor decision regions for 8PSK; the wedge around (a1 , b1 ) is shaded.)
Observations in the shaded region lead the ML to guess “M = 1.”

c
Lecture 11, Amos Lapidoth 2017
Error Analysis for 8PSK

• By symmetry, it suffices to study pMAP (error|M = 4).


• Conditional on M = 4,

( Y (1) , Y (2) )ᵀ = (−A, 0)ᵀ + ( Z (1) , Z (2) )ᵀ,

where Z (1) , Z (2) are IID N(0, σ²):

fZ (1) ,Z (2) (z (1) , z (2) ) = (1/(2πσ²)) exp( −( (z (1) )² + (z (2) )² )/(2σ²) ).
• The contour lines of fY (1) ,Y (2) |M =4 (·) are circles centered
around the conditional mean (a4 , b4 ) = (−A, 0).
• We need to integrate this density over the complement of the
decoding region of 4.

c
Lecture 11, Amos Lapidoth 2017
(Figure: circular contour lines of the density fY (1) ,Y (2) |M =4 (·), centered at (a4 , b4 ) = (−A, 0); the shaded region corresponds to guessing “M = 4”.)

c
Lecture 11, Amos Lapidoth 2017
The Union-of-Events Bound
• The probability of the union of two disjoint events is the sum
of their probabilities.
• Given two not necessarily disjoint events V and W,
V ∪ W = W ∪ (V \ W),
so
Pr(V ∪ W) = Pr(W) + Pr(V \ W).
• To study Pr(V \ W), note that
V = (V \ W) ∪ (V ∩ W).
so
Pr(V \ W) = Pr(V) − Pr(V ∩ W).
Hence,
Pr(V ∪ W) = Pr(V) + Pr(W) − Pr(V ∩ W)
≤ Pr(V) + Pr(W).
c
Lecture 11, Amos Lapidoth 2017
The Union-of-Events Bound for Finite Collections of Events
If V1 , V2 , . . . , is a finite (or countably-infinite) collection of events,
then

Pr( ∪j Vj ) ≤ Σj Pr(Vj ).

The proof in the finite case is by induction:


Pr( ∪_{j=1}^n Vj ) = Pr( V1 ∪ ( ∪_{j=2}^n Vj ) )
                  ≤ Pr(V1 ) + Pr( ∪_{j=2}^n Vj )
                  ≤ Pr(V1 ) + Σ_{j=2}^n Pr(Vj )
                  = Σ_{j=1}^n Pr(Vj ).
c
Lecture 11, Amos Lapidoth 2017
Applications to Hypothesis Testing
Define for every m′ ≠ m the set Bm,m′ ⊂ Rd by

Bm,m′ = { y ∈ Rd : πm′ fY|M=m′ (y) ≥ πm fY|M=m (y) }.

Note:
y ∈ Bm,m′ does not imply that the MAP rule will guess m′.
(A third hypothesis might be a posteriori even more likely.)
y ∈ Bm,m′ does not imply that the MAP will not guess m.
(There could be a tie resolved in m’s favor.)

If m was not guessed by the MAP rule, then some m′ ≠ m must have had an a posteriori probability that is at least as high as that of m:

( m was not guessed ) =⇒ ( Y ∈ ∪_{m′≠m} Bm,m′ ).

c
Lecture 11, Amos Lapidoth 2017
( m was not guessed ) =⇒ ( Y ∈ ∪_{m′≠m} Bm,m′ )

implies that

Pr[ m was not guessed ] ≤ Pr[ Y ∈ ∪_{m′≠m} Bm,m′ ].

pMAP (error|M = m) ≤ Pr[ Y ∈ ∪_{m′≠m} Bm,m′ | M = m ]
                  = Pr[ ∪_{m′≠m} { ω ∈ Ω : Y(ω) ∈ Bm,m′ } | M = m ]
                  ≤ Σ_{m′≠m} Pr[ { ω ∈ Ω : Y(ω) ∈ Bm,m′ } | M = m ]
                  = Σ_{m′≠m} Pr[ Y ∈ Bm,m′ | M = m ]
                  = Σ_{m′≠m} ∫_{Bm,m′} fY|M=m (y) dy.
c
Lecture 11, Amos Lapidoth 2017
The Union-of-Events Bound in Hypothesis Testing
pMAP (error|M = m)
 ≤ Σ_{m′≠m} Pr[ Y ∈ Bm,m′ | M = m ]
 = Σ_{m′≠m} ∫_{Bm,m′} fY|M=m (y) dy
 = Σ_{m′≠m} Pr[ πm′ fY|M=m′ (Y) ≥ πm fY|M=m (Y) | M = m ],

where

Bm,m′ = { y ∈ Rd : πm′ fY|M=m′ (y) ≥ πm fY|M=m (y) }.

If ties occur with probability zero, then Pr[Y ∈ Bm,m′ | M = m] is the conditional probability of error of the MAP rule for

fY|M=m (·) vs. fY|M=m′ (·) with prior ( πm /(πm + πm′ ), πm′ /(πm + πm′ ) ).
c
Lecture 11, Amos Lapidoth 2017
The Union Bound for 8-PSK—pMAP (error|M = 4)
(Figure: the half-planes B4,3 and B4,5 around the constellation points 3, 4, 5, and their union B4,3 ∪ B4,5 .)

Here Bm,m′ comprises the vectors that are at least as close to (am′ , bm′ ) as to (am , bm ):

{ y ∈ R² : (y (1) − am′ )² + (y (2) − bm′ )² ≤ (y (1) − am )² + (y (2) − bm )² }.

Given M = 4, an error occurs only if Y is at least as close to


(a3 , b3 ) as to (a4 , b4 ), or if it is at least as close to (a5 , b5 ) as to
(a4 , b4 ), i.e., only if Y ∈ B4,3 ∪ B4,5 .
c
Lecture 11, Amos Lapidoth 2017
The Union Bound for 8-PSK—pMAP (error|M = 4)

(Figure: as above, the regions B4,3 , B4,5 , and their union B4,3 ∪ B4,5 .)

The events Y ∈ B4,5 and Y ∈ B4,3 are not mutually exclusive, but,

pMAP (error|M = 4) ≤ Pr[Y ∈ B4,3 ∪ B4,5 | M = 4]


≤ Pr[Y ∈ B4,3 |M = 4] + Pr[Y ∈ B4,5 | M = 4].

(The first inequality is an equality because, for this problem, the


probability of a tie is zero.)
c
Lecture 11, Amos Lapidoth 2017
(Figure: the regions B4,3 , B4,5 , and B4,3 ∪ B4,5 , as above.)

From our analysis of multi-dimensional binary hypothesis testing,

Pr[ Y ∈ B4,3 | M = 4 ] = Q( √( (a4 − a3 )² + (b4 − b3 )² )/(2σ) ) = Q( (A/σ) sin(π/8) ),
Pr[ Y ∈ B4,5 | M = 4 ] = Q( √( (a4 − a5 )² + (b4 − b5 )² )/(2σ) ) = Q( (A/σ) sin(π/8) ),

pMAP (error|M = 4) ≤ 2 Q( (A/σ) sin(π/8) ).

By symmetry,

p∗ (error) ≤ 2 Q( (A/σ) sin(π/8) ).
c
Lecture 11, Amos Lapidoth 2017
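A quick Monte Carlo sanity check of this union bound, conditioning on M = 4; the values of A, σ, and the sample size are illustrative.

    import numpy as np
    from scipy.special import erfc

    def Q(a):
        return 0.5 * erfc(a / np.sqrt(2.0))

    A, sigma, n = 1.0, 0.25, 200000
    rng = np.random.default_rng(1)
    const = np.array([[A * np.cos(2 * np.pi * m / 8),
                       A * np.sin(2 * np.pi * m / 8)] for m in range(1, 9)])
    y = const[3] + sigma * rng.standard_normal((n, 2))       # M = 4 is row index 3
    guesses = 1 + np.argmin(((y[:, None, :] - const[None, :, :])**2).sum(-1), axis=1)
    print((guesses != 4).mean(), 2 * Q((A / sigma) * np.sin(np.pi / 8)))  # estimate vs. bound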
Multi-Dimensional M-ary Gaussian Hypothesis Testing

• M in M = {1, . . . , M} with nondegenerate prior {πm }.


• The observation Y is a J-dimensional vector.
• Conditional on M = m, the components of Y are independent, with Y (j) ∼ N(sm (j) , σ²), where sm ∈ RJ is deterministic and σ² > 0:

fY|M=m (y) = Π_{j=1}^J (1/√(2πσ²)) exp( −(y (j) − sm (j) )²/(2σ²) ).

c
Lecture 11, Amos Lapidoth 2017
Optimal Guessing Rule
Having observed that Y = y, the MAP rule randomly picks an element from the set

M̃(y) = { m̃ ∈ M : πm̃ fY|M=m̃ (y) = max_{m′∈M} πm′ fY|M=m′ (y) }
      = { m̃ ∈ M : ln( πm̃ fY|M=m̃ (y) ) = max_{m′∈M} ln( πm′ fY|M=m′ (y) ) }.

Here

ln( πm fY|M=m (y) ) = ln πm − (J/2) ln(2πσ²) − (1/(2σ²)) Σ_{j=1}^J (y (j) − sm (j) )².

The term −(J/2) ln(2πσ²) is common to all m, so

M̃(y) = argmax_{m̃∈M} { ln πm̃ − Σ_{j=1}^J (y (j) − sm̃ (j) )²/(2σ²) }.

c
Lecture 11, Amos Lapidoth 2017
Optimal Rule for a Uniform Prior

When the prior is uniform, the expression for M̃(y) simplifies:

M̃(y) = argmax_{m̃∈M} { −Σ_{j=1}^J (y (j) − sm̃ (j) )²/(2σ²) }
      = argmin_{m̃∈M} Σ_{j=1}^J (y (j) − sm̃ (j) )²
      = argmin_{m̃∈M} ky − sm̃ k²
      = argmin_{m̃∈M} ky − sm̃ k,   M uniform.

This is the “nearest-neighbor rule”. No need to know σ 2 .

c
Lecture 11, Amos Lapidoth 2017
Uniform Prior and Equi-Norm Vectors
A further simplification arises when

ks1 k = ks2 k = · · · = ksM k.

In this case the nearest-neighbor rule coincides with the “highest correlation rule”

M̃(y) = argmax_{m̃∈M} Σ_{j=1}^J y (j) sm̃ (j) .

Indeed, starting from the nearest-neighbor rule, we note

ky − sm̃ k² = Σ_{j=1}^J (y (j) − sm̃ (j) )²
           = Σ_{j=1}^J (y (j) )² − 2 Σ_{j=1}^J y (j) sm̃ (j) + Σ_{j=1}^J (sm̃ (j) )²,

where the first term is (always) common to all m̃, and the last term is common by the equi-norm assumption.
c
Lecture 11, Amos Lapidoth 2017
Ties
If the mean vectors s1 , . . . , sM are distinct,

ksm′ − sm″ k > 0,   m′ ≠ m″,

then the probability of ties is zero. That is,


• the probability of observing a vector y for which # M̃(y) > 1
is zero;
• the probability that the observable Y will be such that the
MAP will require randomization is zero;
• with probability one the observed vector y is such that there
is a unique message of highest a posteriori probability.
To prove this we show that, irrespective of m, for all m′ ≠ m″,

Pr[ score of Message m′ = score of Message m″ | M = m ] = 0.

See Proposition 21.6.2 for the details.


c
Lecture 11, Amos Lapidoth 2017
The Union Bound

pMAP (error|M = m) ≤ Σ_{m′≠m} Pr[ πm′ fY|M=m′ (Y) ≥ πm fY|M=m (Y) | M = m ]
                  = Σ_{m′≠m} Q( ksm − sm′ k/(2σ) + ( σ/ksm − sm′ k ) ln(πm /πm′ ) ).

Thus,

p∗ (error) ≤ Σ_{m∈M} πm Σ_{m′≠m} Q( ksm − sm′ k/(2σ) + ( σ/ksm − sm′ k ) ln(πm /πm′ ) ).

For a uniform prior these simplify to:

c
Lecture 11, Amos Lapidoth 2017
The Union Bound for the Gaussian Problem with a
Uniform Prior

pMAP (error|M = m) ≤ Σ_{m′≠m} Q( ksm − sm′ k/(2σ) ),   M uniform,

p∗ (error) ≤ (1/M) Σ_{m∈M} Σ_{m′≠m} Q( ksm − sm′ k/(2σ) ),   M uniform.

c
Lecture 11, Amos Lapidoth 2017
A Lower Bound
If the score of Message m′ is higher than that of Message m, then the MAP decoder will surely not guess “M = m.”
(Whether it will guess “M = m′ ” depends on the other scores.)
Thus, for each message m′ ≠ m,

pMAP (error|M = m) ≥ Pr[ πm′ fY|M=m′ (Y) > πm fY|M=m (Y) | M = m ]
                  = Q( ksm − sm′ k/(2σ) + ( σ/ksm − sm′ k ) ln(πm /πm′ ) ).

To tighten the bound we maximize over m′:

pMAP (error|M = m) ≥ max_{m′∈M\{m}} Q( ksm − sm′ k/(2σ) + ( σ/ksm − sm′ k ) ln(πm /πm′ ) ),

p∗ (error) ≥ Σ_{m∈M} πm max_{m′∈M\{m}} Q( ksm − sm′ k/(2σ) + ( σ/ksm − sm′ k ) ln(πm /πm′ ) ).

This simplifies for a uniform prior:
c
Lecture 11, Amos Lapidoth 2017
A Lower Bound for the Gaussian Problem with a
Uniform Prior
When the prior is uniform,

pMAP (error|M = m) ≥ max_{m′∈M\{m}} Q( ksm − sm′ k/(2σ) ).

Since Q(·) is strictly decreasing,

pMAP (error|M = m) ≥ Q( min_{m′∈M\{m}} ksm − sm′ k/(2σ) ),   M uniform,

p∗ (error) ≥ (1/M) Σ_{m∈M} Q( min_{m′∈M\{m}} ksm − sm′ k/(2σ) ),   M uniform,

and, since at least one m attains the overall minimum distance,

p∗ (error) ≥ (1/M) Q( min_{m′≠m″} ksm′ − sm″ k/(2σ) ),   M uniform.

The minimum distance!
c
Lecture 11, Amos Lapidoth 2017
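A sketch that evaluates both the union upper bound and the averaged lower bound for an arbitrary constellation under a uniform prior (the interface is mine; S holds one mean vector per row):

    import numpy as np
    from scipy.special import erfc

    def Q(a):
        return 0.5 * erfc(a / np.sqrt(2.0))

    def bounds_uniform(S, sigma):
        # Lower and upper bounds on p*(error) for mean vectors S of shape (M, J).
        D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
        np.fill_diagonal(D, np.inf)                 # exclude m' = m; Q(inf) = 0
        upper = Q(D / (2 * sigma)).sum(axis=1).mean()
        lower = Q(D.min(axis=1) / (2 * sigma)).mean()
        return lower, upper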
Sufficient Statistic—Informal Definition
(Figure 22.1: a black box that, when fed any prior {πm } and T (yobs ) (but not the observation yobs directly), produces a probability vector

( ψ1 ({πm }, T (yobs )), . . . , ψM ({πm }, T (yobs )) )ᵀ

that is equal to

( Pr[M = 1 | Y = yobs ], . . . , Pr[M = M | Y = yobs ] )ᵀ

whenever both Σ_{m∈M} πm fY|M=m (yobs ) > 0 and yobs ∉ Y0 are satisfied.)

c
Lecture 11, Amos Lapidoth 2017
Technicalities

While the black box must always produce a probability vector, we


only require that this vector be the a posteriori distribution of M
given Y = yobs for observations yobs that satisfy
Σ_{m∈M} πm fY|M=m (yobs ) > 0

and that lie outside some prespecified null set Y0 ⊂ Rd .

The exception set Y0 is not allowed to depend on {πm }.

The black box need not indicate whether yobs is in Y0 and/or whether Σ_{m∈M} πm fY|M=m (yobs ) > 0.

c
Lecture 11, Amos Lapidoth 2017
Sufficient Statistic—Formal Definition
T : Rd → Rd′ forms a sufficient statistic for the densities fY|M=1 (·), . . . , fY|M=M (·) on Rd if it is Borel measurable and if for some Y0 ⊂ Rd of Lebesgue measure zero we have that for every prior {πm } there exist M Borel measurable functions from Rd′ to [0, 1],

T (yobs ) ↦ ψm ({πm }, T (yobs )),   m ∈ M,

such that the vector

( ψ1 ({πm }, T (yobs )), . . . , ψM ({πm }, T (yobs )) )ᵀ

is a probability vector and such that this probability vector equals

( Pr[M = 1 | Y = yobs ], . . . , Pr[M = M | Y = yobs ] )ᵀ

whenever both the condition yobs ∉ Y0 and the condition

Σ_{m=1}^M πm fY|M=m (yobs ) > 0
are satisfied.
c
Lecture 11, Amos Lapidoth 2017
Guessing Based on a Sufficient Statistic Is Optimal

The MAP is optimal, and it is computable from T (yobs ).

(Ignoring the technicalities.)

c
Lecture 11, Amos Lapidoth 2017
Sufficiency Implies Pairwise Sufficiency

Pr[M = m | Y = y] , πm fY|M=m (y)/fY (y) if fY (y) > 0, and 1/M otherwise,   m ∈ M, y ∈ Rd ,

and ignore the second case. For a uniform prior

Pr[M = m′ | Y = y] = M⁻¹ fY|M=m′ (y) / Σ_{m∈M} M⁻¹ fY|M=m (y),
Pr[M = m″ | Y = y] = M⁻¹ fY|M=m″ (y) / Σ_{m∈M} M⁻¹ fY|M=m (y).

Dividing,

Pr[M = m′ | Y = y] / Pr[M = m″ | Y = y] = fY|M=m′ (y) / fY|M=m″ (y),

so if the LHS is computable from T (y) then so is the RHS.
c
Lecture 11, Amos Lapidoth 2017
Pairwise Sufficiency Implies Sufficiency
Consider M densities {fY|M=m (·)}m∈M on Rd , and assume that T : Rd → Rd′ forms a sufficient statistic for every pair of densities fY|M=m′ (·), fY|M=m″ (·), where m′ ≠ m″ are both in M. Then T (·) is a sufficient statistic for the M densities {fY|M=m (·)}m∈M .

Pr[M = m | Y = y] = πm fY|M=m (y)/fY (y),   fY (y) > 0, m ∈ M, y ∈ Rd .

Consequently, for any prior,

Pr[M = m | Y = y] = πm fY|M=m (y) / fY (y)
                 = πm fY|M=m (y) / Σ_{m′∈M} πm′ fY|M=m′ (y)
                 = πm / Σ_{m′∈M} πm′ ( fY|M=m′ (y)/fY|M=m (y) ),

and each ratio fY|M=m′ (y)/fY|M=m (y) is computable from T (y) by pairwise sufficiency.

c
Lecture 11, Amos Lapidoth 2017
A Markov Condition

A Borel measurable function T : Rd → Rd′ forms a sufficient statistic for the M densities {fY|M=m (·)}m∈M if, and only if, for any prior {πm },

M −− T (Y) −− Y.

Intuition: Sufficiency is tantamount to the a posteriori distribution of M given Y being computable from T (·).
This is equivalent to the conditional distribution of M given (Y, T (Y)) being the same as given T (Y).

c
Lecture 11, Amos Lapidoth 2017
Simulating the Observables

• The condition

M −− T (Y) −− Y

is equivalent to the distribution of Y given (M, T (Y)) being the same as given T (Y).
• Stated differently, the distribution of Y given T (Y) under fY|M=m does not depend on m.
• If we generate Ỹ from T (Y) according to this conditional law, then Ỹ will be of the same conditional law given M = m as Y.
• We could then feed Ỹ to a decoder that was designed for Y and get the same performance!

c
Lecture 11, Amos Lapidoth 2017
Guessing Based on the Simulated Observables

(Figure 22.2: If T (Y) forms a sufficient statistic for guessing M based on Y, then, even though Y cannot typically be recovered from T (Y), the performance of any given detector based on Y can be achieved based on T (Y) and a local random number generator: using T (yobs ) and local randomness Θ, one produces a Ỹ whose conditional law given M = m is the same as that of Y, for each m ∈ M, and feeds Ỹ to the given detector.)
c
Lecture 11, Amos Lapidoth 2017
The Example Revisited (1)
Let H have a uniform prior. We observe (Y1 , Y2 ). Conditional on H = 0, they are IID N(0, σ0²), whereas conditional on H = 1 they are IID N(0, σ1²), where σ0 > σ1 > 0. Thus,

fY1 ,Y2 |H=0 (y1 , y2 ) = (1/(2πσ0²)) exp( −(y1² + y2²)/(2σ0²) ),   y1 , y2 ∈ R,
fY1 ,Y2 |H=1 (y1 , y2 ) = (1/(2πσ1²)) exp( −(y1² + y2²)/(2σ1²) ),   y1 , y2 ∈ R.

LR(y1 , y2 ) = fY1 ,Y2 |H=0 (y1 , y2 ) / fY1 ,Y2 |H=1 (y1 , y2 )
            = (σ1²/σ0²) exp( (1/2)(1/σ1² − 1/σ0²)(y1² + y2²) ),   y1 , y2 ∈ R.
c
Lecture 11, Amos Lapidoth 2017
The Example Revisited (2)

  
LR(y1 , y2 ) > 1 ⇐⇒ exp( (1/2)(1/σ1² − 1/σ0²)(y1² + y2²) ) > σ0²/σ1²
              ⇐⇒ (1/2)(1/σ1² − 1/σ0²)(y1² + y2²) > ln(σ0²/σ1²)
              ⇐⇒ ((σ0² − σ1²)/(2σ0²σ1²))(y1² + y2²) > ln(σ0²/σ1²)
              ⇐⇒ y1² + y2² > (2σ0²σ1²/(σ0² − σ1²)) ln(σ0²/σ1²).

The ML/MAP compares Y12 + Y22 to a threshold.


To implement it, one need not observe Y1 and Y2 directly; it
suffices to observe
T , Y12 + Y22 .

c
Lecture 11, Amos Lapidoth 2017
The Example Revisited (3)

• Being the result of processing (Y1 , Y2 ) with respect to H, no


guess based on T can outperform an optimal guess based on
(Y1 , Y2 ).
• In this example, even though pre-processing the observations
to produce T = Y12 + Y22 is not reversible, basing one’s
decision on T incurs no loss in optimality.
This is all because LR(y1 , y2 ) is computable from y12 + y22 . In this
sense T = Y12 + Y22 forms a sufficient statistic for guessing H from
(Y1 , Y2 ).

c
Lecture 11, Amos Lapidoth 2017
The Example Revisited—Simulating the Observables

From T (y1 , y2 ) = y1² + y2² we can generate Ỹ using Θ ∼ U ([0, 1)):

Ỹ1 = √(T (Y)) cos(2πΘ),
Ỹ2 = √(T (Y)) sin(2πΘ).

That is, after observing T (yobs ) = t, we generate (Ỹ1 , Ỹ2 ) uniformly over the tuples that are at radius √t from the origin.

c
Lecture 11, Amos Lapidoth 2017
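A two-line Python sketch of this simulator (Θ is drawn locally, as in the slide):

    import numpy as np

    rng = np.random.default_rng(2)

    def simulate_observables(t):
        # Given T = Y1^2 + Y2^2 = t, draw (Y1~, Y2~) uniformly on the circle of radius sqrt(t).
        theta = rng.uniform(0.0, 1.0)
        return np.sqrt(t) * np.cos(2 * np.pi * theta), np.sqrt(t) * np.sin(2 * np.pi * theta)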
Guessing Based on the Simulated Observables

(Figure 22.2, repeated: using T (yobs ) and local randomness Θ, one produces a Ỹ whose conditional law given M = m is the same as that of Y, for each m ∈ M, and feeds Ỹ to the given detector; in this way any detector designed for Y can be run based on T (Y) alone.)
c
Lecture 11, Amos Lapidoth 2017
Guessing whether M Lies in a Given Subset of M

Let K ⊂ M be a nonempty strict subset of M. Let {πm } be a prior under which Pr[M ∈ K] and Pr[M ∉ K] are both positive:

Pr[M ∈ K] = Σ_{m∈K} πm ,   Pr[M ∉ K] = Σ_{m∉K} πm .

The conditional densities of Y given M ∈ K and given M ∉ K are

fY|M∈K (y) = (1/Pr[M ∈ K]) Σ_{m∈K} πm fY|M=m (y),
fY|M∉K (y) = (1/Pr[M ∉ K]) Σ_{m∉K} πm fY|M=m (y).

c
Lecture 11, Amos Lapidoth 2017
Indeed

Pr[Y ∈ A | M ∈ K] = (1/Pr[M ∈ K]) Pr[M ∈ K, Y ∈ A]
                 = (1/Pr[M ∈ K]) Σ_{m∈K} Pr[M = m, Y ∈ A]
                 = (1/Pr[M ∈ K]) Σ_{m∈K} Pr[M = m] Pr[Y ∈ A | M = m]
                 = (1/Pr[M ∈ K]) Σ_{m∈K} πm ∫_A fY|M=m (y) dy
                 = ∫_A ( (1/Pr[M ∈ K]) Σ_{m∈K} πm fY|M=m (y) ) dy,

and the integrand is fY|M∈K (y).

c
Lecture 11, Amos Lapidoth 2017
Sufficiency and Testing whether M is in K
Let T : Rd → Rd′ form a sufficient statistic for the M densities {fY|M=m (·)}m∈M . Then T (·) is also sufficient for fY|M∈K (·), fY|M∉K (·).

fY|M∈K (y) = (1/Pr[M ∈ K]) Σ_{m∈K} πm fY|M=m (y),
fY|M∉K (y) = (1/Pr[M ∉ K]) Σ_{m∉K} πm fY|M=m (y),

so

fY|M∈K (y) / fY|M∉K (y)
 = ( Pr[M ∉ K]/Pr[M ∈ K] ) · Σ_{m∈K} πm fY|M=m (y) / Σ_{m∉K} πm fY|M=m (y)
 = ( Pr[M ∉ K]/Pr[M ∈ K] ) · Σ_{m∈K} πm ( fY|M=m (y)/fY|M=1 (y) ) / Σ_{m∉K} πm ( fY|M=m (y)/fY|M=1 (y) ),

and all the terms are computable from T (y) and M ’s PMF.
c
Lecture 11, Amos Lapidoth 2017
Next Week

The Multivariate Gaussian Distribution (Chapter 23).

Thank you!

c
Lecture 11, Amos Lapidoth 2017
Communication and Detection Theory:
Lecture 12

Amos Lapidoth
ETH Zurich

May 16, 2017

The Multivariate Gaussian Distribution

c
Lecture 12, Amos Lapidoth 2017
Today

• Sufficient Statistics in multi-hypothesis testing


• Gaussian Random Vectors

c
Lecture 12, Amos Lapidoth 2017
The Multivariate Gaussian Distribution (Chapter 23).

c
Lecture 12, Amos Lapidoth 2017
Definitions

• A random vector W is a standard Gaussian if its components are IID N (0, 1).
• A random n-vector X is a centered Gaussian if there exists some deterministic n × m matrix A such that

X =ᴸ AW,

where W is a standard Gaussian m-vector.
• A random n-vector X is Gaussian if there exists some deterministic n × m matrix A and some deterministic vector µ ∈ Rn such that

X =ᴸ AW + µ,

where W is a standard Gaussian m-vector.

(Here =ᴸ denotes equality of law.)

c
Lecture 12, Amos Lapidoth 2017
Orthogonal Matrix—Definition

An n × n real matrix U is orthogonal if

UUT = In .

This condition is equivalent to

UT U = In .

c
Lecture 12, Amos Lapidoth 2017
Writing U in terms of its columns,

U = ( ψ1 · · · ψn ),

the condition Uᵀ U = In is

In = ( ψ1ᵀ ; · · · ; ψnᵀ ) ( ψ1 · · · ψn )
   = [ ψ1ᵀψ1  ψ1ᵀψ2  · · ·  ψ1ᵀψn ;
       ψ2ᵀψ1  ψ2ᵀψ2  · · ·  ψ2ᵀψn ;
       · · ·
       ψnᵀψ1  ψnᵀψ2  · · ·  ψnᵀψn ].
c
Lecture 12, Amos Lapidoth 2017
The Columns of an Orthogonal Matrix

A square real matrix is orthogonal iff its columns are orthonormal.

ψνT ψν 0 = I{ν = ν 0 }, ν, ν 0 ∈ {1, . . . , n}.


Using the equivalent condition UUT = In

A square real matrix is orthogonal iff its rows are orthonormal.

c
Lecture 12, Amos Lapidoth 2017
2 × 2 Orthogonal Matrices

The 2 × 2 orthogonal matrices are

[ cos θ  −sin θ ;        [ cos θ   sin θ ;
  sin θ   cos θ ],         sin θ  −cos θ ].

The first corresponds to a rotation by θ and has determinant +1, and the second to a reflection followed by a rotation,

[ cos θ   sin θ ;    [ cos θ  −sin θ ;   [ 1   0 ;
  sin θ  −cos θ ]  =   sin θ   cos θ ]     0  −1 ],

and has determinant −1.

c
Lecture 12, Amos Lapidoth 2017
Eigenvectors of Symmetric Matrices (1)

• A matrix is symmetric if it equals its transpose.


• The vector ψ is an eigenvector of the matrix A corresponding
to the eigenvalue λ if it is nonzero and

Aψ = λψ.

• If A ∈ Rn×n is symmetric, then A has n (not necessarily


distinct) real eigenvalues λ1 , . . . , λn ∈ R with corresponding
orthonormal eigenvectors ψ1 , . . . , ψn ∈ Rn

ψνT ψν 0 = I{ν = ν 0 }, ν, ν 0 ∈ {1, . . . , n}.

c
Lecture 12, Amos Lapidoth 2017
Eigenvectors of Symmetric Matrices (2)

   
A ( ψ1 · · · ψn ) = ( Aψ1 · · · Aψn )

and

( ψ1 · · · ψn ) diag(λ1 , . . . , λn ) = ( λ1 ψ1 · · · λn ψn ).

c
Lecture 12, Amos Lapidoth 2017
Eigenvectors of Symmetric Matrices (3)
The eigen-vectors/values relation is thus AU = UΛ, where

U = ( ψ1 · · · ψn )   and   Λ = diag(λ1 , . . . , λn ).

Thus,

A = UΛU⁻¹.

The orthonormality of the eigenvectors is equivalent to U being orthogonal. So, alternatively,

A = UΛUᵀ.

c
Lecture 12, Amos Lapidoth 2017
Spectral Decomposition Theorem for Real Symmetric
Matrices

If A ∈ Rn×n is symmetric, then

A = UΛUT ,

where Λ ∈ Rn×n is a diagonal matrix whose diagonal elements are


the eigenvalues of A, and where U ∈ Rn×n is an orthogonal matrix
whose ν-th column is an eigenvector of A corresponding to the
eigenvalue in the ν-th position on the diagonal of Λ.

c
Lecture 12, Amos Lapidoth 2017
Positive Semidefinite Matrices
• K ∈ Rn×n is positive semidefinite or nonnegative definite, K ⪰ 0, if K is symmetric and

αᵀKα ≥ 0,   α ∈ Rn .

• K ∈ Rn×n is positive definite, K ≻ 0, if K is symmetric and

αᵀKα > 0,   α ≠ 0, α ∈ Rn .

c
Lecture 12, Amos Lapidoth 2017
Characterizing Positive Semidefinite Matrices

Let K be a real n × n matrix. Then the statement that K is positive


semidefinite is equivalent to each of the following statements:
1. K can be written in the form

K = ST S

for some S ∈ Rn×n .


2. K is symmetric and all its eigenvalues are nonnegative.
3. K can be expressed as

K = UΛUT ,

where Λ ∈ Rn×n is diagonal with nonnegative entries on the


diagonal and where U ∈ Rn×n is orthogonal.

c
Lecture 12, Amos Lapidoth 2017
Characterizing Positive Definite Matrices

Let K be a real n × n matrix. Then the statement that K is


positive definite is equivalent to each of the following statements.
1. K = ST S for some nonsingular S ∈ Rn×n .
2. K is symmetric and all its eigenvalues are positive.
3. K can be expressed as

K = UΛUT ,

where Λ ∈ Rn×n is diagonal with positive diagonal entries and


where U ∈ Rn×n is orthogonal.

c
Lecture 12, Amos Lapidoth 2017
Finding S Satisfying K = ST S
Given K ⪰ 0, how can we find a matrix S satisfying K = SᵀS?
There are many. E.g., find matrices U and Λ as above satisfying

K = UΛUᵀ.

Define

Λ^(1/2) = diag( √λ1 , . . . , √λn ).

Now choose

S = Λ^(1/2) Uᵀ.

Indeed, with this definition of S we have

SᵀS = ( Λ^(1/2) Uᵀ )ᵀ Λ^(1/2) Uᵀ = U Λ^(1/2) Λ^(1/2) Uᵀ = UΛUᵀ = K.
c
Lecture 12, Amos Lapidoth 2017
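A short numerical sketch of this construction using numpy's symmetric eigendecomposition (numpy.linalg.eigh); the clipping of tiny negative eigenvalues is a numerical safeguard, not part of the math.

    import numpy as np

    def sqrt_factor(K):
        # Return S with S.T @ S = K for a positive semidefinite K, via K = U Lambda U^T.
        lam, U = np.linalg.eigh(K)
        lam = np.clip(lam, 0.0, None)          # guard against round-off
        return np.diag(np.sqrt(lam)) @ U.T     # S = Lambda^(1/2) U^T

    K = np.array([[2.0, 1.0], [1.0, 2.0]])
    S = sqrt_factor(K)
    assert np.allclose(S.T @ S, K)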
Random Vectors

• A random n-vector over the probability space (Ω, F, P ) is a


(measurable) mapping from Ω to Rn .
• It can be viewed as an array of n random variables.
• Its density is the joint density of its components.
An n × m random matrix H is an n × m array of random variables
defined over a common probability space.

c
Lecture 12, Amos Lapidoth 2017
Expectations

The expectation E[X] of a random n-vector X = ( X (1) , . . . , X (n) )ᵀ is a vector whose components are the expectations of the corresponding components of X:

E[X] , ( E[X (1) ], . . . , E[X (n) ] )ᵀ.

The j-th element of E[X] is thus the expectation of the j-th component of X, namely, E[X (j) ]. Similarly, the expectation of a random matrix is the matrix of expectations.

c
Lecture 12, Amos Lapidoth 2017
The Covariance Matrix
The n × n covariance matrix KXX of the random n-vector X is

KXX , E[ (X − E[X]) (X − E[X])ᵀ ]

     = [ Var[X (1) ]          Cov[X (1) , X (2) ]   · · ·  Cov[X (1) , X (n) ] ;
         Cov[X (2) , X (1) ]  Var[X (2) ]           · · ·  Cov[X (2) , X (n) ] ;
         · · ·
         Cov[X (n) , X (1) ]  Cov[X (n) , X (2) ]   · · ·  Var[X (n) ] ].

c
Lecture 12, Amos Lapidoth 2017
The Covariance Matrix of a Subset of the Components

The r × r covariance matrix of

( X (j1 ) , X (j2 ) , . . . , X (jr ) )ᵀ,

where 1 ≤ j1 < j2 < · · · < jr ≤ n, is obtained from KXX by picking Rows and Columns j1 , . . . , jr . For example, if

KXX = [ 30 31  9  7 ;
        31 39 11 13 ;
         9 11  9 12 ;
         7 13 12 26 ],

then the covariance matrix of ( X (2) , X (4) )ᵀ is [ 39 13 ; 13 26 ].

c
Lecture 12, Amos Lapidoth 2017
Mean and Covariance under Linear Transformation
If H is a random matrix and A, B are deterministic matrices, then

E[AH] = A E[H],   E[HB] = E[H] B.

The transpose operation commutes with expectation:

E[Hᵀ] = (E[H])ᵀ.

As to the covariance matrix,

( Y = AX ) =⇒ ( KYY = A KXX Aᵀ ).

Indeed,

KYY , E[ (Y − E[Y])(Y − E[Y])ᵀ ]
    = E[ (AX − E[AX])(AX − E[AX])ᵀ ]
    = E[ A(X − E[X]) ( A(X − E[X]) )ᵀ ]
    = E[ A(X − E[X])(X − E[X])ᵀ Aᵀ ]
    = A E[ (X − E[X])(X − E[X])ᵀ ] Aᵀ
    = A KXX Aᵀ.
c
Lecture 12, Amos Lapidoth 2017
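An empirical check of KYY = A KXX Aᵀ with an illustrative A and standard-Gaussian X (so KXX = I):

    import numpy as np

    rng = np.random.default_rng(3)
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, -1.0]])
    X = rng.standard_normal((3, 100000))   # centered, K_XX ~= I_3
    Y = A @ X
    print(np.cov(Y))                       # empirical K_YY
    print(A @ A.T)                         # A K_XX A^T with K_XX = I_3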
Singular Covariance Matrices—Example
Let X be centered with covariance matrix

KXX = [ 3  5  7 ;
        5  9 13 ;
        7 13 19 ].

As we’ll see, because the columns of KXX satisfy

−(3, 5, 7)ᵀ + 2 (5, 9, 13)ᵀ − (7, 13, 19)ᵀ = 0,

it follows that

−X (1) + 2X (2) − X (3) = 0, with probability one.

Consequently, in manipulating X we can pick the two components ( X (2) , X (3) ), which are of nonsingular covariance matrix [ 9 13 ; 13 19 ], and keep track “on the side” of the fact that X (1) is equal, with probability one, to 2X (2) − X (3) .
c
Lecture 12, Amos Lapidoth 2017
Manipulating Singular Covariance Matrices

Let X be a centered random n-vector. Its `-th component X (`) is


a deterministic linear combination of X (`1 ) , . . . , X (`η ) iff the `-th
column of KXX is a linear combination of Columns `1 , . . . , `η .
(Proposition 23.4.1.)

Let X be a centered n-vector. Then:


• KXX is singular iff a component of X is a linear combination
of the other components.
• If Columns `1 , . . . , `d of KXX form a basis for the subspace of
Rn spanned by the columns of KXX , then every component of
X can be written as a linear combination of X (`1 ) , . . . , X (`d ) ,
T
and X (`1 ) , . . . , X (`d ) has a nonsingular covariance matrix.

c
Lecture 12, Amos Lapidoth 2017
The Characteristic Function

If X is a random n-vector, then its characteristic function ΦX (·) is a mapping from Rn to C that maps each vector ϖ = ( ϖ (1) , . . . , ϖ (n) )ᵀ in Rn to ΦX (ϖ), where

ΦX (ϖ) , E[ e^(iϖᵀX) ] = E[ exp( i Σ_{ℓ=1}^n ϖ (ℓ) X (ℓ) ) ],   ϖ ∈ Rn .

If X has the density fX (·), then

ΦX (ϖ) = ∫ · · · ∫ fX (x) e^( i Σ_{ℓ=1}^n ϖ (ℓ) x (ℓ) ) dx (1) · · · dx (n) .

c
Lecture 12, Amos Lapidoth 2017
Random Vectors of Identical Characteristic Functions
Have Identical Laws

Two random n-vectors are of equal distribution iff they have identical characteristic functions:

( X =ᴸ Y ) ⇐⇒ ( ΦX (ϖ) = ΦY (ϖ), ϖ ∈ Rn ).

c
Lecture 12, Amos Lapidoth 2017
Establishing Independence via the Characteristic Function
X and Y are independent iff

E[ e^(i(ϖ1 X+ϖ2 Y )) ] = E[ e^(iϖ1 X) ] E[ e^(iϖ2 Y ) ],   ϖ1 , ϖ2 ∈ R.   (16)

• Independence implies (16), because if X & Y are independent then so are e^(iϖ1 X) & e^(iϖ2 Y ), and hence

E[ e^(i(ϖ1 X+ϖ2 Y )) ] = E[ e^(iϖ1 X) e^(iϖ2 Y ) ] = E[ e^(iϖ1 X) ] E[ e^(iϖ2 Y ) ].

• To prove the reverse, let X ′ and Y ′ be independent with X ′ =ᴸ X and Y ′ =ᴸ Y . The c.f. of (X ′ , Y ′ ) is thus

(ϖ1 , ϖ2 ) ↦ E[ e^(iϖ1 X) ] E[ e^(iϖ2 Y ) ].

If (16) holds, then (X, Y ) and (X ′ , Y ′ ) have identical characteristic functions and hence identical laws. Hence, like (X ′ , Y ′ ), also the vector (X, Y ) has independent components.
c
Lecture 12, Amos Lapidoth 2017
A Standard Gaussian Vector

W is a standard Gaussian if its components are IID N (0, 1):

fW (w) = Π_{ℓ=1}^n (1/√(2π)) exp( −(w (ℓ) )²/2 )
       = (2π)^(−n/2) exp( −(1/2) Σ_{ℓ=1}^n (w (ℓ) )² )
       = (2π)^(−n/2) e^(−kwk²/2),   w ∈ Rn .

A standard Gaussian random variable can be viewed as a standard Gaussian 1-vector. Also,

E[W] = 0 and KWW = In .

c
Lecture 12, Amos Lapidoth 2017
Gaussian Random Vectors
A random n-vector X is Gaussian if for some positive integer m there exists an n × m matrix A; a standard Gaussian random m-vector W; and a deterministic vector µ ∈ Rn such that

X =ᴸ AW + µ.

Computing the expectation and covariance of both sides, we obtain

( X =ᴸ AW + µ and W standard ) =⇒ ( E[X] = µ and KXX = AAᵀ ).

The law of X does not determine A. Not even the number of its columns m. It only determines AAᵀ.
Every positive semidefinite matrix is the covariance matrix of some centered Gaussian random vector.
c
Lecture 12, Amos Lapidoth 2017
Examples and Basic Properties


1. Every N(µ, σ²) random variable, when viewed as a random 1-vector, is Gaussian.
Such a random variable has the same law as σW + µ, where W is a standard univariate Gaussian.
2. Every deterministic vector is a Gaussian vector.
Choose the matrix A as the all-zero matrix 0.
3. If the components of X are independent univariate Gaussians
(not necessarily of equal variance), then X is a Gaussian
vector.
Choose A to be an appropriate diagonal matrix.

c
Lecture 12, Amos Lapidoth 2017
The Definition of Independent Vectors

The random vectors

X = ( X (1) , . . . , X (nx ) )ᵀ   and   Y = ( Y (1) , . . . , Y (ny ) )ᵀ

are independent if, for every choice of ξ1 , . . . , ξnx ∈ R and η1 , . . . , ηny ∈ R,

Pr[ X (1) ≤ ξ1 , . . . , X (nx ) ≤ ξnx , Y (1) ≤ η1 , . . . , Y (ny ) ≤ ηny ]
 = Pr[ X (1) ≤ ξ1 , . . . , X (nx ) ≤ ξnx ] Pr[ Y (1) ≤ η1 , . . . , Y (ny ) ≤ ηny ].

c
Lecture 12, Amos Lapidoth 2017
Stacking Independent Gaussians Yields a Gaussian
Let A1 ∈ Rn1×m1 and µ1 ∈ Rn1 be such that X1 =ᴸ A1 W1 + µ1 , where W1 is a standard Gaussian m1-vector. Similarly, let A2 ∈ Rn2×m2 and µ2 ∈ Rn2 represent the n2-vector X2 . Let W1 & W2 be independent standard Gaussians.

[ A1 0 ; 0 A2 ] ( W1 ; W2 ) + ( µ1 ; µ2 ) = ( A1 W1 + µ1 ; A2 W2 + µ2 ) =ᴸ ( X1 ; X2 ),

where we have used that if X1 & X2 are independent, X1 =ᴸ X1′ , X2 =ᴸ X2′ , and X1′ & X2′ are independent, then

( X1 ; X2 ) =ᴸ ( X1′ ; X2′ ).
c
Lecture 12, Amos Lapidoth 2017
An Affine Transformation of a Gaussian Is a Gaussian

Let X be a Gaussian n-vector. If C ∈ Rν×n and if d ∈ Rν , then


the random ν-vector CX + d is Gaussian.

Indeed, if X =ᴸ AW + µ, then

CX + d =ᴸ C(AW + µ) + d = (CA)W + (Cµ + d),

so CX + d is Gaussian.

c
Lecture 12, Amos Lapidoth 2017
Permuting and Selecting Components
Permuting the components of a Gaussian vector results in a
Gaussian vector. Hence we speak of jointly Gaussian without
specifying the order.
Choose C as a permutation matrix, e.g.,

( X (3) , X (1) , X (2) )ᵀ = [ 0 0 1 ; 1 0 0 ; 0 1 0 ] ( X (1) , X (2) , X (3) )ᵀ.

Constructing a random p-vector from a Gaussian n-vector by picking p of its components (allowing for repetition) yields a Gaussian vector.
Picking is also an affine transformation, e.g.,

( X (3) , X (1) )ᵀ = [ 0 0 1 ; 1 0 0 ] ( X (1) , X (2) , X (3) )ᵀ.
c
Lecture 12, Amos Lapidoth 2017
Every Component of a Gaussian Vector is a Gaussian RV

• Picking a component of a Gaussian vector yields a Gaussian


1-vector.
• We need to show that the sole component of a Gaussian
1-vector is a Gaussian RV.
• Let X be such a 1-vector and let it be represented by the row matrix A and the scalar µ, so

X =ᴸ Σ_{ℓ=1}^m a (1,ℓ) W (ℓ) + µ.

• The RHS is Gaussian because a linear combination of the


independent univariate Gaussians W (1) , . . . , W (m) is Gaussian,
and adding a constant to a Gaussian results in a Gaussian.

c
Lecture 12, Amos Lapidoth 2017
The Mean and Covariance Determine the
Law of a Gaussian

We show that if X is Gaussian of mean µ and covariance KXX , then

ΦX (ϖ) = e^( −ϖᵀKXXϖ/2 + iϖᵀµ ),   ϖ ∈ Rn .

The c.f. is thus fully specified by the mean vector and the covariance matrix of X, and consequently so is the distribution.

c
Lecture 12, Amos Lapidoth 2017
Computing the Characteristic Function of
a Gaussian Vector
• We compute ΦX (·) when X is a Gaussian n-vector.
• We need to compute E[ e^(iϖᵀX) ] for every ϖ ∈ Rn .
• ϖᵀX is a Gaussian 1-vector, whose sole component is thus a Gaussian RV. Its mean is ϖᵀµ and its variance is ϖᵀKXXϖ:

ϖᵀX ∼ N( ϖᵀµ, ϖᵀKXXϖ ),   ϖ ∈ Rn .

• From the c.f. of the univariate Gaussian distribution (with the substitution ϖᵀµ for µ, the substitution ϖᵀKXXϖ for σ², and the substitution 1 for ϖ), we obtain

E[ e^(iϖᵀX) ] = e^( −ϖᵀKXXϖ/2 + iϖᵀµ ),   ϖ ∈ Rn .

c
Lecture 12, Amos Lapidoth 2017
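A Monte Carlo estimate of E[exp(i ϖᵀX)] can be compared against the closed form. A sketch — the matrix A, mean µ, and test frequency ϖ are arbitrary choices of mine:

    import numpy as np

    rng = np.random.default_rng(1)
    A = np.array([[1.0, 0.0], [2.0, 0.5]])
    mu = np.array([1.0, -1.0])
    K = A @ A.T                                    # covariance of X = AW + mu

    X = A @ rng.standard_normal((2, 200_000)) + mu[:, None]
    w = np.array([0.3, -0.7])                      # a test frequency vector

    mc = np.exp(1j * w @ X).mean()                 # Monte Carlo estimate
    cf = np.exp(-0.5 * w @ K @ w + 1j * w @ mu)    # closed-form c.f.
    print(mc, cf)                                  # the two should nearly agree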
There Is only One Gaussian Distribution of Given Mean
and Covariance
• Every positive semidefinite matrix is the covariance matrix of
some centered Gaussian random vector.
• The law of a Gaussian is determined by the mean and
covariance.
For every µ ∈ R^n and every positive semidefinite matrix K ∈ R^{n×n}
there exists one, and only one, Gaussian distribution of mean µ
and covariance matrix K. We denote it

    N(µ, K).

    X ∼ N(µ, K) =⇒ ΦX(ϖ) = exp(−½ ϖᵀKϖ + i ϖᵀµ),   ϖ ∈ R^n.

c
Lecture 12, Amos Lapidoth 2017
Jointly Gaussian Vectors

Two random vectors are said to be jointly Gaussian if the vector


that results when one is stacked on top of the other is Gaussian.

c
Lecture 12, Amos Lapidoth 2017
Independence between Jointly Gaussian Vectors
Suppose that X and Y are jointly Gaussian. Then they are
independent iff they are uncorrelated.
• Independence always implies uncorrelatedness.
• Suppose now that X and Y are centered, jointly Gaussian,
  and uncorrelated. Let X′ and Y′ be independent random
  vectors such that X′ =ᴸ X and Y′ =ᴸ Y.
• (X′; Y′) is Gaussian of covariance [KXX 0; 0 KYY], just like (X; Y)!
• The two are thus centered Gaussians of identical covariances,
  and hence of identical laws.
• Since X′ and Y′ are independent, so must X and Y also be.

c
Lecture 12, Amos Lapidoth 2017
More Generally

If the components of a Gaussian vector are uncorrelated, then they


are independent.

c
Lecture 12, Amos Lapidoth 2017
Pairwise Independence

The RVs X1, . . . , Xn are pairwise independent if for each pair of
distinct indices ν′, ν″ ∈ {1, . . . , n} and all ξν′, ξν″ ∈ R,

    Pr[Xν′ ≤ ξν′, Xν″ ≤ ξν″] = Pr[Xν′ ≤ ξν′] Pr[Xν″ ≤ ξν″].

The RVs X1, . . . , Xn are independent if for all ξ1, . . . , ξn ∈ R,

    Pr[Xj ≤ ξj for all j ∈ {1, . . . , n}] = ∏_{j=1}^{n} Pr[Xj ≤ ξj].

Independence implies pairwise independence, but the two are not


equivalent. However,

c
Lecture 12, Amos Lapidoth 2017
Pairwise Independence of Jointly Gaussians

If the components of a Gaussian random vector are pairwise
independent, then they are independent.

Pairwise independence implies a diagonal covariance matrix.

c
Lecture 12, Amos Lapidoth 2017
The Matrix A Can be Chosen Square
If X is a centered Gaussian n-vector, then there exists a
deterministic square n × n matrix A such that X =ᴸ AW, where
W is a standard Gaussian n-vector.

• Being a covariance matrix, KXX must be positive semidefinite.


• There thus exists some square S ∈ R^{n×n} such that

      KXX = SᵀS.

• Consider now the centered Gaussian SᵀW, where W is a
  standard Gaussian n-vector.
• Its covariance is SᵀS, which equals KXX.
• The Gaussian vectors ST W and X are both centered and have
identical covariance matrices. They are thus of equal law.

c
Lecture 12, Amos Lapidoth 2017
A Canonical Representation of a Centered Gaussian
We can generate any Gaussian by stretching a standard Gaussian
and rotating the result:
Let X be a centered Gaussian n-vector. Then
√ 
λ1 W (1)
L  .. 
X = UΛ1/2 W = U  . ,
√ (n)
λn W

where W is a standard Gaussian n-vector; U ∈ Rn×n is


orthogonal; Λ ∈ Rn×n is diagonal; the diagonal elements of Λ are
the eigenvalues of KXX ; and the j-th column of U is an
eigenvector corresponding to the eigenvalue of KXX that is equal
to the j-th diagonal element of Λ.
Proof: Choose U and Λ as in the spectral representation of KXX;
define S = Λ^{1/2} Uᵀ; and verify that KXX = SᵀS.
c
Lecture 12, Amos Lapidoth 2017
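A sketch of this recipe in NumPy, assuming a given positive semidefinite KXX (the example matrix is mine; numpy.linalg.eigh returns the diagonal of Λ and the matrix U):

    import numpy as np

    K = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]])        # an example covariance matrix

    lam, U = np.linalg.eigh(K)             # K = U diag(lam) U^T
    lam = np.clip(lam, 0.0, None)          # guard against tiny negative eigenvalues

    rng = np.random.default_rng(2)
    W = rng.standard_normal((3, 100_000))  # standard Gaussian n-vectors
    X = U @ (np.sqrt(lam)[:, None] * W)    # X = U Lambda^{1/2} W

    print(np.cov(X))                       # approx. K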
Transforming a Gaussian to a Standard Gaussian


If X ∼ N(µ, σ²) with σ ≠ 0, then (X − µ)/σ is standard. What
about vectors?
Suppose X ∼ N(µ, K), where µ ∈ R^n and K ≻ 0. Let Λ and U be
as in the spectral representation of K. Then

    Λ^{−1/2} Uᵀ (X − µ) ∼ N(0, In),

where Λ−1/2 is the diagonal matrix whose diagonal entries are the
reciprocals of the square roots of the diagonal elements of Λ.

Proof: Being the result of linearly transforming X, the vector is


Gaussian. Now check that its covariance is In .

c
Lecture 12, Amos Lapidoth 2017
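The whitening map Λ^{−1/2} Uᵀ (X − µ) is easy to check numerically. A sketch, assuming K ≻ 0 (X is generated via a Cholesky square root, which has the same law):

    import numpy as np

    K = np.array([[2.0, 1.0], [1.0, 2.0]])
    mu = np.array([3.0, -3.0])

    lam, U = np.linalg.eigh(K)
    rng = np.random.default_rng(3)
    X = np.linalg.cholesky(K) @ rng.standard_normal((2, 100_000)) + mu[:, None]

    Z = np.diag(1.0 / np.sqrt(lam)) @ U.T @ (X - mu[:, None])
    print(np.cov(Z))           # approx. the 2x2 identity matrix
    print(Z.mean(axis=1))      # approx. (0, 0)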
The Density of a Gaussian Vector (1)
If X ∼ N(0, K), then X =ᴸ BW, where

    B = U Λ^{1/2},

and U and Λ are as in the spectral representation. Then

    fX(x) = fW(B⁻¹x) / |det B|.

Since BBᵀ = K,

    |det B| = √(det B · det B)
            = √(det B · det Bᵀ)
            = √(det(BBᵀ))
            = √(det K).

We next use the density of the standard Gaussian and X =ᴸ BW:
c
Lecture 12, Amos Lapidoth 2017

    fX(x) = fW(B⁻¹x) / |det B|
          = exp(−½ (B⁻¹x)ᵀ(B⁻¹x)) / ((2π)^{n/2} |det B|)
          = exp(−½ xᵀ(B⁻¹)ᵀB⁻¹x) / ((2π)^{n/2} |det B|)
          = exp(−½ xᵀ(BBᵀ)⁻¹x) / ((2π)^{n/2} |det B|)
          = exp(−½ xᵀK⁻¹x) / ((2π)^{n/2} |det B|)
          = exp(−½ xᵀK⁻¹x) / ((2π)^{n/2} √(det K)).

Thus,
c
Lecture 12, Amos Lapidoth 2017

    fX(x) = exp(−½ xᵀK⁻¹x) / √((2π)ⁿ det K),   x ∈ R^n.

If X ∼ N(µ, K) where K ≻ 0, then

    fX(x) = exp(−½ (x − µ)ᵀK⁻¹(x − µ)) / √((2π)ⁿ det K),   x ∈ R^n.

c
Lecture 12, Amos Lapidoth 2017
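A quick numerical check of this density formula against scipy.stats.multivariate_normal (a sketch; µ, K, and the evaluation point are arbitrary):

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([1.0, 2.0])
    K = np.array([[2.0, 0.5], [0.5, 1.0]])
    x = np.array([0.3, 2.2])

    d = x - mu
    n = len(mu)
    f = np.exp(-0.5 * d @ np.linalg.inv(K) @ d) / np.sqrt((2 * np.pi) ** n * np.linalg.det(K))

    print(f)
    print(multivariate_normal(mean=mu, cov=K).pdf(x))   # should match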
Linear Functionals of Gaussian Vectors
• A linear functional on Rn is a linear mapping from Rn to R.
• All linear functionals are of the form

      x ↦ αᵀx.

  (The j-th component of α is the result of applying the linear
  functional to the vector ej.)

If X is a Gaussian n-vector and α ∈ R^n, then αᵀX is a Gaussian
RV.

    αᵀX is a random 1-vector (because it is the result of
    linearly transforming X), and all the components of a
    Gaussian vector are Gaussian RVs.

A random vector X is Gaussian iff every linear functional of X has


a univariate Gaussian distribution.

c
Lecture 12, Amos Lapidoth 2017
Proof

• To prove the remaining direction, we compute ΦX(ϖ).
• For every ϖ ∈ R^n the mapping x ↦ ϖᵀx is a linear
  functional. Consequently, our assumption that applying every
  linear functional to X yields a univariate Gaussian
  distribution implies

      ϖᵀX ∼ N(ϖᵀµ, ϖᵀKXX ϖ),   ϖ ∈ R^n.

• Using the c.f. of a univariate Gaussian, we compute

      E[exp(i ϖᵀX)] = exp(−½ ϖᵀKXX ϖ + i ϖᵀµ),   ϖ ∈ R^n.

• The c.f. of X is thus that of a Gaussian, so X is Gaussian.

c
Lecture 12, Amos Lapidoth 2017
Next Week

Continuous-Time Stochastic Processes and White Noise


(Chapter 25).

Thank you!

c
Lecture 12, Amos Lapidoth 2017
Communication and Detection Theory:
Lecture 13

Amos Lapidoth
ETH Zurich

May 23, 2017

Continuous-Time Stochastic Processes


and White Noise

c
Lecture 13, Amos Lapidoth 2017
Today

Continuous-Time Stochastic Processes:


• Definition
• FDDs
• Stationarity
• Gaussian SPs
• Linear functionals of Gaussian SP
• White noise w.r.t. some bandwidth
• Linear functionals of white noise
• Projecting white noise onto a finite-dimensional subspace

c
Lecture 13, Amos Lapidoth 2017
Notation

A continuous-time stochastic process (X(t), t ∈ R) is a family of
random variables that are defined on a common probability space
(Ω, F, P) and that are indexed by the reals:

    X : Ω × R → R,   (ω, t) ↦ X(ω, t).

• If t ∈ R is fixed, then X(t), or ω ↦ X(ω, t), or X(·, t) is the
  time-t sample of (X(t), t ∈ R), or the state at time t.
• If ω ∈ Ω is fixed, then X(ω, ·) or t ↦ X(ω, t) is a trajectory,
  sample-path, path, sample-function, or realization.

    ω ↦ X(ω, t)   time-t sample for a fixed t ∈ R (random variable)
    t ↦ X(ω, t)   trajectory for a fixed ω ∈ Ω (function of time)

c
Lecture 13, Amos Lapidoth 2017
The Finite-Dimensional Distributions

The FDDs of a continuous-time SP (X(t)) are the collection of all
joint distributions of

    (X(t1), . . . , X(tn)),

where
• n can be any positive integer and
• t1, . . . , tn ∈ R are arbitrary epochs.

To specify the FDDs of (X(t)) we must specify for every n ∈ N
and for every choice of the epochs t1, . . . , tn ∈ R the distribution of

    (X(t1), . . . , X(tn)).

c
Lecture 13, Amos Lapidoth 2017
Do the FDDs Tell Us Everything about a SP?

What is the probability that the sample-path is continuous?

    Pr{ω ∈ Ω : X(ω, ·) is continuous} = ?

This cannot be answered based on the FDDs alone!

The σ-algebra generated by (X(t)) is the set of events whose
probability can be computed from the FDDs of (X(t)) using only
the axioms of probability.

c
Lecture 13, Amos Lapidoth 2017
Independent Stochastic Processes

 
(X(t)) and (Y(t)) are independent stochastic processes if for
every n ∈ N and any choice of the epochs t1, . . . , tn ∈ R,

    (X(t1), . . . , X(tn)) and (Y(t1), . . . , Y(tn)) are independent.

c
Lecture 13, Amos Lapidoth 2017
Gaussian SP: Definition


(X(t)) is a Gaussian stochastic process if for every n ∈ N and
every choice of the epochs t1, . . . , tn ∈ R, the random vector
(X(t1), . . . , X(tn))ᵀ is Gaussian.

c
Lecture 13, Amos Lapidoth 2017

The FDDs of Gaussian SPs
If (X(t)) is a centered Gaussian SP, then its FDDs are determined
by the mapping

    (t1, t2) ↦ Cov[X(t1), X(t2)],   t1, t2 ∈ R.

Proof: Since (X(t)) is a Gaussian SP,

    (X(t1), . . . , X(tn))ᵀ

is Gaussian, and its law is specified by its mean (which is zero) and
its covariance matrix, which is determined by the mapping:

    [ Cov[X(t1), X(t1)]   Cov[X(t1), X(t2)]   · · ·   Cov[X(t1), X(tn)] ]
    [         ⋮                    ⋮            ⋱              ⋮        ]
    [ Cov[X(tn), X(t1)]   Cov[X(tn), X(t2)]   · · ·   Cov[X(tn), X(tn)] ]
c
Lecture 13, Amos Lapidoth 2017
Stationary Stochastic Processes

(X(t)) is stationary if all its time shifts have identical FDDs: for
every τ ∈ R, every n ∈ N, and all epochs t1, . . . , tn ∈ R,

    (X(t1 + τ), . . . , X(tn + τ)) =ᴸ (X(t1), . . . , X(tn)).

• Choosing n = 1: if (X(t)) is stationary, then all its samples
  have the same distribution:

      X(t) =ᴸ X(t + τ),   t, τ ∈ R.

• Choosing n = 2: if (X(t)) is stationary, then the joint
  distribution of any two of its samples depends on the elapsed
  time between them and not on the absolute time at which
  they are taken:

      (X(t1), X(t2)) =ᴸ (X(t1 + τ), X(t2 + τ)),   t1, t2, τ ∈ R.

c
Lecture 13, Amos Lapidoth 2017
Wide-Sense Stationary Stochastic Processes

(X(t), t ∈ R) is wide-sense stationary if
1. it is of finite variance;
2. its mean is constant:

       E[X(t)] = E[X(t + τ)],   t, τ ∈ R;

3. and the covariance between its samples satisfies

       Cov[X(t1), X(t2)] = Cov[X(t1 + τ), X(t2 + τ)],   t1, t2, τ ∈ R.

Every finite-variance stationary SP is also wide-sense stationary.
Indeed, if (X(t)) is stationary, then

    X(t) =ᴸ X(t + τ),   t, τ ∈ R,
    (X(t1), X(t2)) =ᴸ (X(t1 + τ), X(t2 + τ)),   t1, t2, τ ∈ R.
c
Lecture 13, Amos Lapidoth 2017
Autocovariance Function


The autocovariance function KXX : R → R of a WSS SP (X(t)) is

    KXX(τ) ≜ Cov[X(t + τ), X(t)]

(which does not depend on t because (X(t)) is WSS).

c
Lecture 13, Amos Lapidoth 2017
Stationary Gaussian Stochastic Processes

A Gaussian SP is stationary iff it is wide-sense stationary.

Proof: Every Gaussian SP is of finite variance, so if it is
additionally stationary, it must also be WSS.

Assume now that (X(t)) is WSS. We'll show

    (X(t1 + τ), . . . , X(tn + τ)) =ᴸ (X(t1), . . . , X(tn)).

Both vectors are Gaussian (because X is a Gaussian SP), so we
only need to show identical means and covariances. The mean
vectors are both (E[X(0)], . . . , E[X(0)])ᵀ (because X is WSS).
As to the covariance matrices:

c
Lecture 13, Amos Lapidoth 2017
The former's covariance is

    [ Cov[X(t1), X(t1)]   · · ·   Cov[X(t1), X(tn)] ]
    [         ⋮             ⋱             ⋮         ]
    [ Cov[X(tn), X(t1)]   · · ·   Cov[X(tn), X(tn)] ]

and the latter's is

    [ Cov[X(t1 + τ), X(t1 + τ)]   · · ·   Cov[X(t1 + τ), X(tn + τ)] ]
    [             ⋮                 ⋱                ⋮              ]
    [ Cov[X(tn + τ), X(t1 + τ)]   · · ·   Cov[X(tn + τ), X(tn + τ)] ]

They are identical by wide-sense stationarity.

c
Lecture 13, Amos Lapidoth 2017
The FDDs of a Stationary Gaussian SP
The FDDs of a centered stationary Gaussian SP are fully specified
by its autocovariance function.

Proof: Since (X(t)) is Gaussian, the vector

    (X(t1), . . . , X(tn))ᵀ

is Gaussian, and its law is thus determined by its mean and covariance.

• Its mean is 0 because (X(t)) is centered.
• The Row-j Column-ℓ entry of its covariance matrix is

      Cov[X(tj), X(tℓ)],

  which is KXX(tℓ − tj) and thus determined by KXX.

c
Lecture 13, Amos Lapidoth 2017
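This is exactly what one exploits to simulate a centered stationary Gaussian SP at finitely many epochs: build the covariance matrix from KXX and draw a Gaussian vector. A sketch with the assumed autocovariance KXX(τ) = e^{−|τ|} (any valid autocovariance would do):

    import numpy as np

    def kxx(tau):
        return np.exp(-np.abs(tau))            # assumed autocovariance

    t = np.linspace(0.0, 5.0, 64)              # epochs t1, ..., tn
    K = kxx(t[None, :] - t[:, None])           # K[j, l] = KXX(t_l - t_j)

    rng = np.random.default_rng(4)
    L = np.linalg.cholesky(K + 1e-10 * np.eye(len(t)))  # small jitter for safety
    path = L @ rng.standard_normal(len(t))     # one draw of (X(t1), ..., X(tn))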
The PSD of a Continuous-Time WSS SP


The WSS SP (X(t)) is of power spectral density (PSD) SXX if
SXX : R → R is nonnegative, symmetric, integrable, and its IFT is
the autocovariance function KXX of (X(t)):

    KXX(τ) = ∫_{−∞}^{∞} SXX(f) e^{i2πfτ} df,   τ ∈ R.

c
Lecture 13, Amos Lapidoth 2017
Remarks on the PSD


If KXX is continuous at the origin and integrable, then X(t) is of
PSD K̂XX (·). (Proposition 25.7.1.)

Every nonnegative, symmetric, integrable function is the PSD of


some stationary Gaussian SP whose autocovariance function is
continuous. (Proposition 25.7.3.)

c
Lecture 13, Amos Lapidoth 2017
The PSD and Operational PSD of a WSS SP


Let (X(t)) be a measurable, centered, WSS SP with continuous
autocovariance function KXX. Let S(·) be a nonnegative,
symmetric, integrable function. Then the following two conditions
are equivalent:
1. KXX is the Inverse Fourier Transform of S(·).
2. For every integrable h : R → R, the power in X ⋆ h is given by

       Power of X ⋆ h = ∫_{−∞}^{∞} S(f) |ĥ(f)|² df.

(Theorem 25.14.3)

c
Lecture 13, Amos Lapidoth 2017
The Average Power

We would like to discuss


    (1/T) ∫_{−T/2}^{T/2} X²(ω, t) dt   or   ω ↦ (1/T) ∫_{−T/2}^{T/2} X²(ω, t) dt,   ω ∈ Ω.

Some technicalities:
• For some ω ∈ Ω the mapping t ↦ X²(ω, t) might be
  ill-behaved.
• The mapping from ω to the result of the integral might not
be measurable.


These difficulties are eliminated if X(t) is measurable.

c
Lecture 13, Amos Lapidoth 2017

Power in a Centered WSS SP
If (X(t)) is a measurable, centered, WSS SP of autocovariance
function KXX, then for all a < b,

    ω ↦ (1/(b − a)) ∫_a^b X²(ω, t) dt

defines a RV (possibly taking on the value +∞) satisfying

    E[(1/(b − a)) ∫_a^b X²(t) dt] = KXX(0).

The power in (X(t)) is thus KXX(0).
Proof: Swapping integration and expectation we obtain

    E[∫_a^b X²(t) dt] = ∫_a^b E[X²(t)] dt
                      = ∫_a^b KXX(0) dt
                      = (b − a) KXX(0).
c
Lecture 13, Amos Lapidoth 2017
Linear Functionals

Let (X(t)) be WSS. We wish to study the RV

    ω ↦ ∫_{−∞}^{∞} X(ω, t) s(t) dt.

We focus on the mean and variance:

    E[∫_{−∞}^{∞} X(t) s(t) dt] = ∫_{−∞}^{∞} E[X(t)] s(t) dt
                               = E[X(0)] ∫_{−∞}^{∞} s(t) dt.

As to the variance:

c
Lecture 13, Amos Lapidoth 2017
Linear Functionals

We first consider the centered case:

    Var[∫ X(t) s(t) dt]
      = E[(∫ X(t) s(t) dt)²]
      = E[∫ X(t) s(t) dt · ∫ X(τ) s(τ) dτ]
      = E[∫∫ X(t) s(t) X(τ) s(τ) dt dτ]
      = ∫∫ s(t) s(τ) E[X(t) X(τ)] dt dτ
      = ∫∫ s(t) KXX(t − τ) s(τ) dt dτ

(all integrals over R).

This can be written in two forms:

c
Lecture 13, Amos Lapidoth 2017
Linear Functionals

    Var[∫ X(t) s(t) dt] = ∫∫ s(t) KXX(t − τ) s(τ) dt dτ
                        = ∫∫ s(σ + τ) KXX(σ) s(τ) dσ dτ
                        = ∫ KXX(σ) (∫ s(σ + τ) s(τ) dτ) dσ
                        = ∫ KXX(σ) Rss(σ) dσ.

Or

    Var[∫ X(t) s(t) dt] = ∫ SXX(f) |ŝ(f)|² df.

c
Lecture 13, Amos Lapidoth 2017
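The two time-domain forms can be checked against each other numerically. A sketch in which both KXX and s are illustrative assumptions (KXX(τ) = e^{−|τ|}, s the indicator of [0, 1]):

    import numpy as np

    dt = 0.01
    t = np.arange(-5.0, 5.0, dt)
    s = ((t >= 0) & (t <= 1)).astype(float)     # s = indicator of [0, 1]
    kxx = lambda tau: np.exp(-np.abs(tau))

    # Double-integral form: sum over (t, tau) of s(t) KXX(t - tau) s(tau).
    v1 = s @ kxx(t[None, :] - t[:, None]) @ s * dt * dt

    # Self-similarity form: integral of KXX(sigma) Rss(sigma).
    rss = np.correlate(s, s, mode="full") * dt  # Rss on a lag grid
    lags = np.arange(-len(t) + 1, len(t)) * dt
    v2 = np.sum(kxx(lags) * rss) * dt

    print(v1, v2)                               # the two should nearly agree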

What if X(t) is WSS but not Centered?

Consider the centered SP (X̃(t)):

    X̃(t) = X(t) − µ,   µ = E[X(t)].

    Var[∫ X(t) s(t) dt] = Var[∫ (X̃(t) + µ) s(t) dt]
                        = Var[∫ X̃(t) s(t) dt + µ ∫ s(t) dt]
                        = Var[∫ X̃(t) s(t) dt]
                        = ∫∫ s(t) K_X̃X̃(t − τ) s(τ) dt dτ
                        = ∫∫ s(t) KXX(t − τ) s(τ) dt dτ.

c
Lecture 13, Amos Lapidoth 2017
Linear Functionals of Gaussian Processes

If (X(t)) is stationary and Gaussian, then

    ∫_{−∞}^{∞} X(t) s(t) dt + Σ_{ν=1}^{n} αν X(tν)   is a Gaussian RV.

Here:
• s : R → R is deterministic and integrable;
• n is an arbitrary nonnegative integer;
• α1 , . . . , αn ∈ R are arbitrary coefficients; and
• the epochs t1 , . . . , tn ∈ R are arbitrary.

The mean and variance determine the distribution!

c
Lecture 13, Amos Lapidoth 2017
Some Intuition

Approximating the integral with a Riemann sum:

    ∫ X(t) s(t) dt + Σ_{ν=1}^{n} αν X(tν) ≈ δ Σ_{k=−K}^{K} s(δk) X(δk) + Σ_{ν=1}^{n} αν X(tν).

The RHS is a linear functional of the vector

    (X(−Kδ), . . . , X(Kδ), X(t1), . . . , X(tn))ᵀ,

which is Gaussian because (X(t)) is a Gaussian SP.
Being a linear functional of a Gaussian vector, the RHS is thus
Gaussian.

c
Lecture 13, Amos Lapidoth 2017
Computing the Mean

    E[∫ X(t) s(t) dt + Σ_{ν=1}^{n} αν X(tν)]
      = E[∫ X(t) s(t) dt] + Σ_{ν=1}^{n} αν E[X(tν)]
      = E[X(0)] (∫ s(t) dt + Σ_{ν=1}^{n} αν).

c
Lecture 13, Amos Lapidoth 2017
The Variance (1)
    Var[∫ X(t) s(t) dt + Σ_{ν=1}^{n} αν X(tν)] = Var[∫ X(t) s(t) dt]
      + Var[Σ_{ν=1}^{n} αν X(tν)] + 2 Σ_{ν=1}^{n} αν Cov[∫ X(t) s(t) dt, X(tν)].

We already saw

    Var[∫ X(t) s(t) dt] = ∫ KXX(σ) Rss(σ) dσ.

The bilinearity of the covariance yields

    Var[Σ_{ν=1}^{n} αν X(tν)] = Σ_{ν=1}^{n} Σ_{ν′=1}^{n} αν αν′ KXX(tν − tν′).

c
Lecture 13, Amos Lapidoth 2017
The Variance (2)
It remains to compute the covariance:

    E[X(tν) ∫ X(t) s(t) dt] = E[∫ X(t) X(tν) s(t) dt]
                            = ∫ s(t) E[X(t) X(tν)] dt
                            = ∫ s(t) KXX(t − tν) dt.

Combining all the terms we obtain

    Var[∫ X(t) s(t) dt + Σ_{ν=1}^{n} αν X(tν)] = ∫ KXX(σ) Rss(σ) dσ
      + Σ_{ν=1}^{n} Σ_{ν′=1}^{n} αν αν′ KXX(tν − tν′) + 2 Σ_{ν=1}^{n} αν ∫ s(t) KXX(t − tν) dt.

c
Lecture 13, Amos Lapidoth 2017
Linear Functionals of a Gaussian SP Are Jointly Gaussian
The m linear functionals

    ∫ X(t) s1(t) dt + Σ_{ν=1}^{n1} α1,ν X(t1,ν),   . . . ,
    ∫ X(t) sm(t) dt + Σ_{ν=1}^{nm} αm,ν X(tm,ν)

of a measurable, stationary, Gaussian SP (X(t)) are jointly
Gaussian.
Here:
• m ∈ N is the number of functionals;
• the m functions s1, . . . , sm are integrable;
• the coefficients {αj,ν} and the epochs {tj,ν} are deterministic
  real numbers for all j ∈ {1, . . . , m} and all ν ∈ {1, . . . , nj}.

All we need is the mean vector and covariance matrix.


c
Lecture 13, Amos Lapidoth 2017
Proof
We'll show that any linear combination of these m RVs is Gaussian:
For any choice of γ1, . . . , γm ∈ R the linear combination

    γ1 (∫ X(t) s1(t) dt + Σ_{ν=1}^{n1} α1,ν X(t1,ν)) + · · ·
      + γm (∫ X(t) sm(t) dt + Σ_{ν=1}^{nm} αm,ν X(tm,ν))

can also be written as a linear functional of (X(t)):

    ∫ X(t) (Σ_{j=1}^{m} γj sj(t)) dt + Σ_{j=1}^{m} Σ_{ν=1}^{nj} γj αj,ν X(tj,ν),

and is thus Gaussian.


c
Lecture 13, Amos Lapidoth 2017
Computing the Covariance Matrix (1)

"Z nj Z nk
#
∞ X ∞ X
Cov X(t) sj (t) dt + αj,ν X(tj,ν ), X(t) sk (t) dt + αk,ν 0 X(tk,ν 0 )
−∞ ν=1 −∞ ν 0 =1
Z ∞ Z ∞ 
= Cov X(t) sj (t) dt, X(t) sk (t) dt
−∞ −∞
nj
X  Z ∞ 
+ αj,ν Cov X(tj,ν ), X(t) sk (t) dt
ν=1 −∞
Xnk  Z ∞ 
+ αk,ν 0 Cov X(tk,ν 0 ), X(t) sj (t) dt
ν 0 =1 −∞
nj nk
X X  
+ αj,ν αk,ν 0 Cov X(tj,ν ), X(tk,ν 0 ) , j, k ∈ {1, . . . , m}.
ν=1 ν 0 =1

We have seen all the terms except the first:


c
Lecture 13, Amos Lapidoth 2017
Computing the Covariance Matrix (2)
    Cov[∫ X(t) sj(t) dt, ∫ X(t) sk(t) dt]
      = E[∫ X(t) sj(t) dt · ∫ X(τ) sk(τ) dτ]
      = E[∫∫ X(t) sj(t) X(τ) sk(τ) dt dτ]
      = ∫∫ E[X(t) X(τ)] sj(t) sk(τ) dt dτ
      = ∫∫ KXX(t − τ) sj(t) sk(τ) dt dτ
      = ∫ KXX(σ) (∫ sj(t) sk(t − σ) dt) dσ
      = ∫ KXX(σ) (∫ sj(t) ~sk(σ − t) dt) dσ
      = ∫ KXX(σ) (sj ⋆ ~sk)(σ) dσ,

where ~sk denotes the mirror image of sk, i.e., t ↦ sk(−t).
c
Lecture 13, Amos Lapidoth 2017
Computing the Covariance Matrix (3)
    Cov[∫ X(t) sj(t) dt, ∫ X(t) sk(t) dt] = ∫ KXX(σ) (sj ⋆ ~sk)(σ) dσ.

If (X(t)) is of PSD SXX, then we can rewrite this as

    Cov[∫ X(t) sj(t) dt, ∫ X(t) sk(t) dt] = ∫ SXX(f) ŝj(f) ŝk*(f) df,

because the FT of sj ⋆ ~sk is the product of the FT of sj and the
FT of ~sk, and because the FT of ~sk is f ↦ ŝk(−f), which,
because sk is real, is also given by f ↦ ŝk*(f).
(The covariances are summarized in Theorem 25.12.2.)
(The covariances are summarized in Theorem 25.12.2.)
c
Lecture 13, Amos Lapidoth 2017

White Noise

(N(t)) is white Gaussian noise of double-sided spectral density
N0/2 with respect to the bandwidth W if (N(t)) is a measurable,
stationary, centered, Gaussian SP that has a PSD SNN satisfying

    SNN(f) = N0/2,   f ∈ [−W, W].

[Figure: SNN(f) equals N0/2 over the band −W ≤ f ≤ W.]

c
Lecture 13, Amos Lapidoth 2017
Key Properties of White Gaussian Noise (1)
• If s is any integrable function that is bandlimited to W Hz, then

      ∫_{−∞}^{∞} N(t) s(t) dt ∼ N(0, (N0/2) ‖s‖2²).

• If s1, . . . , sm are integrable functions that are bandlimited to
  W Hz, then the m random variables

      ∫ N(t) s1(t) dt,   . . . ,   ∫ N(t) sm(t) dt

  are jointly Gaussian centered random variables of covariance matrix

      (N0/2) [ ⟨s1, s1⟩  ⟨s1, s2⟩  · · ·  ⟨s1, sm⟩ ]
             [ ⟨s2, s1⟩  ⟨s2, s2⟩  · · ·  ⟨s2, sm⟩ ]
             [    ⋮         ⋮       ⋱       ⋮     ]
             [ ⟨sm, s1⟩  ⟨sm, s2⟩  · · ·  ⟨sm, sm⟩ ].

c
Lecture 13, Amos Lapidoth 2017
Key Properties of White Gaussian Noise (2)
• If φ1, . . . , φm are integrable functions that are bandlimited to
  W Hz and are orthonormal, then

      (∫ N(t) φ1(t) dt, . . . , ∫ N(t) φm(t) dt) ∼ IID N(0, N0/2).

• If s is any integrable function that is bandlimited to W Hz,

      KNN ⋆ s = (N0/2) s.

• If s is an integrable function that is bandlimited to W Hz,
  then for every epoch t ∈ R,

      Cov[∫ N(σ) s(σ) dσ, N(t)] = (N0/2) s(t).

c
Lecture 13, Amos Lapidoth 2017
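The second property can be verified numerically by convolving a concrete KNN with a bandlimited s. A sketch, assuming the PSD is exactly N0/2 on [−W, W] and zero elsewhere — so that KNN(τ) = N0 W sinc(2Wτ) — and taking s(t) = sinc(2Bt) with B < W:

    import numpy as np

    N0, W, B = 2.0, 4.0, 1.0
    dt = 0.01
    t = np.arange(-30.0, 30.0 + dt, dt)           # odd number of samples, centered

    knn = N0 * W * np.sinc(2 * W * t)             # K_NN for a brick-wall PSD
    s = np.sinc(2 * B * t)                        # bandlimited to B < W Hz

    conv = np.convolve(knn, s, mode="same") * dt  # (K_NN * s)(t) on the grid
    mid = np.abs(t) < 5.0                         # stay clear of truncation edges
    print(np.max(np.abs(conv[mid] - (N0 / 2) * s[mid])))   # nearly zero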
Proof (1)

    Cov[∫ N(t) sj(t) dt, ∫ N(t) sk(t) dt]
      = ∫_{−∞}^{∞} SNN(f) ŝj(f) ŝk*(f) df
      = ∫_{−W}^{W} SNN(f) ŝj(f) ŝk*(f) df
      = (N0/2) ∫_{−W}^{W} ŝj(f) ŝk*(f) df
      = (N0/2) ⟨sj, sk⟩,   j, k ∈ {1, . . . , m}.

c
Lecture 13, Amos Lapidoth 2017
Proof (2)
    (KNN ⋆ s)(t) = ∫ s(τ) KNN(t − τ) dτ
                 = ∫ s(τ) ∫ SNN(f) e^{i2πf(t−τ)} df dτ
                 = ∫ SNN(f) e^{i2πft} ∫ s(τ) e^{−i2πfτ} dτ df
                 = ∫ SNN(f) ŝ(f) e^{i2πft} df
                 = ∫_{−W}^{W} SNN(f) ŝ(f) e^{i2πft} df
                 = (N0/2) ∫_{−W}^{W} ŝ(f) e^{i2πft} df
                 = (N0/2) s(t),   t ∈ R.
c
Lecture 13, Amos Lapidoth 2017
Proof (3)

    Cov[∫ N(σ) s(σ) dσ, N(t)] = ∫ SNN(f) ŝ(f) e^{i2πft} df
                              = ∫_{−W}^{W} SNN(f) ŝ(f) e^{i2πft} df
                              = (N0/2) ∫_{−W}^{W} ŝ(f) e^{i2πft} df
                              = (N0/2) s(t),   t ∈ R.

c
Lecture 13, Amos Lapidoth 2017
Projecting a SP
If X is a measurable WSS SP, and if φ1, . . . , φd ∈ L1 ∩ L2 are
orthonormal, then the projection of X onto span(φ1, . . . , φd) is
the SP

    (ω, t) ↦ Σ_{ℓ=1}^{d} ⟨X, φℓ⟩(ω) φℓ(t),

i.e.,

    Σ_{ℓ=1}^{d} ⟨X, φℓ⟩ φℓ.

For a given ω ∈ Ω, it is the projection of the sample-path X(ω, ·):

    Σ_{ℓ=1}^{d} (∫_{−∞}^{∞} X(ω, t) φℓ(t) dt) φℓ = Σ_{ℓ=1}^{d} ⟨t ↦ X(ω, t), φℓ⟩ φℓ.

c
Lecture 13, Amos Lapidoth 2017
Projecting White Noise


Let (N(t)) be white Gaussian noise of power spectral density N0/2
w.r.t. the bandwidth W, and let φ1, . . . , φd be orthonormal
integrable signals that are bandlimited to W Hz. Then

    Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ   and   N − Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ

are independent Gaussian stochastic processes.

c
Lecture 13, Amos Lapidoth 2017
A Small Detour (1)
Suppose

    N = g(N) + h(N),                                        (17a)

where

    g(N) and h(N) are independent.                          (17b)

Then

    N =ᴸ G + H                                              (18a)

whenever

    G =ᴸ g(N),   H =ᴸ h(N),   G and H are independent.      (18b)

One way to generate such G and H is to generate N and set
G = g(N) and H = h(N).

But here is another way.

c
Lecture 13, Amos Lapidoth 2017
A Small Detour (2)

• Generate N′ of the same law as N but independently of it.
• Set G = g(N) and H = h(N′).
In this case too

    N =ᴸ G + H.

Indeed,
• G and H are independent because N and N′ are independent.
• G =ᴸ g(N) and H =ᴸ h(N) because N′ =ᴸ N.

c
Lecture 13, Amos Lapidoth 2017
Simulating White Noise of a Given Projection

Let N be white Gaussian noise of double-sided power spectral
density N0/2 w.r.t. the bandwidth W. Let N′ be of the same law
as N but independent of it. Let φ1, . . . , φd be orthonormal
integrable signals that are bandlimited to W Hz. Then

    Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ + (N′ − Σ_{ℓ=1}^{d} ⟨N′, φℓ⟩ φℓ)

is a measurable SP of the same FDDs as N.

c
Lecture 13, Amos Lapidoth 2017
Projecting White Noise


Let (N(t)) be white Gaussian noise of power spectral density N0/2
w.r.t. the bandwidth W, and let φ1, . . . , φd be orthonormal
integrable signals that are bandlimited to W Hz. Then

    Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ   and   N − Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ

are independent Gaussian stochastic processes.

c
Lecture 13, Amos Lapidoth 2017
Define

    N1 ≜ Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ,   N2 ≜ N − Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ,

i.e.,

    N1(t) ≜ Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ(t),   N2(t) ≜ N(t) − Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ(t).

We need to show that for every n ∈ N and epochs t1, . . . , tn ∈ R,

    (N1(t1), . . . , N1(tn))ᵀ and (N2(t1), . . . , N2(tn))ᵀ

are independent Gaussian vectors. They are jointly Gaussian
because they are linear functionals of a Gaussian SP:

    N1(tν) = ⟨N, Σ_{ℓ=1}^{d} φℓ(tν) φℓ⟩ = ∫_{−∞}^{∞} N(t) Σ_{ℓ=1}^{d} φℓ(tν) φℓ(t) dt,

    N2(tν′) = ∫_{−∞}^{∞} N(t) (−Σ_{ℓ=1}^{d} φℓ(tν′) φℓ(t)) dt + N(tν′).

It thus remains to establish that they are uncorrelated.
c
Lecture 13, Amos Lapidoth 2017

Because (N(t)) is centered, N1(tν) and N2(tν′) are centered and

    Cov[N1(tν), N2(tν′)] = E[N1(tν) N2(tν′)].

It thus remains to establish

    E[N1(tν) N2(tν′)] = 0,   ν, ν′ ∈ {1, . . . , n}.

More generally,

    E[N1(t) N2(t′)]
      = E[(Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ(t)) (N(t′) − Σ_{ℓ′=1}^{d} ⟨N, φℓ′⟩ φℓ′(t′))]
      = Σ_{ℓ=1}^{d} φℓ(t) E[⟨N, φℓ⟩ N(t′)] − Σ_{ℓ=1}^{d} Σ_{ℓ′=1}^{d} φℓ(t) φℓ′(t′) E[⟨N, φℓ⟩⟨N, φℓ′⟩]
      = Σ_{ℓ=1}^{d} φℓ(t) φℓ(t′) (N0/2) − Σ_{ℓ=1}^{d} φℓ(t) φℓ(t′) (N0/2)
      = 0,   t, t′ ∈ R.
c
Lecture 13, Amos Lapidoth 2017
Next Week

Detection in White Noise (Chapter 26).

Thank you!

c
Lecture 13, Amos Lapidoth 2017
Communication and Detection Theory:
Lecture 14

Amos Lapidoth
ETH Zurich

May 30, 2017

Detection in White Noise

c
Lecture 14, Amos Lapidoth 2017
Signals in White Noise
The “message” M takes value in M = {1, . . . , M}, with prior

πm = Pr[M = m], m ∈ M.

The “observation” Y (t) is a continuous-time SP.
Conditional on M = m,

Y (t) = sm (t) + N (t), t ∈ R.

• The “mean signals” s1 , . . . , sM are real, deterministic,


integrable signals that are bandlimited to W Hz.

• The “noise” N (t) is independent of M and is white
Gaussian noise of double-sided spectral density N0 /2 w.r.t.
the bandwidth W.

Based on Y (t) we wish to guess M with the smallest possible
probability of error.
c
Lecture 14, Amos Lapidoth 2017
A Technicality

We only consider guessing rules whose performance is determined


by the FDDs.

(I.e., that
 are measurable w.r.t. the σ-algebra generated by
Y (t) .)

c
Lecture 14, Amos Lapidoth 2017
From a SP to a Random Vector

If (φ1, . . . , φd) is an orthonormal basis for span(s1, . . . , sM), then
to every decision rule based on Y (that is measurable w.r.t. the
σ-algebra generated by Y) there corresponds a (randomized)
decision rule based on

    T ≜ (⟨Y, φ1⟩, . . . , ⟨Y, φd⟩)ᵀ

of identical performance. Consequently,

no measurable decision rule based on Y can outperform an
optimal rule based on T.
c
Lecture 14, Amos Lapidoth 2017
Computing the Inner Products

[Figure: Y(t) is passed through a bank of matched filters
~φ1, . . . , ~φd; each filter output is sampled at t = 0 to produce
⟨Y, φ1⟩, . . . , ⟨Y, φd⟩, which feed the decision rule that outputs the
guess.]

c
Lecture 14, Amos Lapidoth 2017
More Generally
• span(φ1 , . . . , φd ) need not equal span(s1 , . . . , sM ): it suffices
that
span(φ1 , . . . , φd ) ⊇ span(s1 , . . . , sM ).
• The same holds for any vector S provided that T is
computable from S.
• If s̃1 , . . . , s̃n are integrable signals that are bandlimited to W
Hz and
span(s1 , . . . , sM ) ⊆ span(s̃1 , . . . , s̃n ),
  then it is optimal to base our guess on

      (⟨Y, s̃1⟩, . . . , ⟨Y, s̃n⟩)ᵀ.

  Indeed, T is computable from this vector because

      φℓ = Σ_{j=1}^{n} αℓ,j s̃j  =⇒  ⟨Y, φℓ⟩ = Σ_{j=1}^{n} αℓ,j ⟨Y, s̃j⟩.

c
Lecture 14, Amos Lapidoth 2017
The Conditional Law of T

Given M = m, what is the conditional law of

    T ≜ (⟨Y, φ1⟩, . . . , ⟨Y, φd⟩)ᵀ?

Given M = m, we have Y = sm + N, so

    T = (⟨sm, φ1⟩, . . . , ⟨sm, φd⟩)ᵀ + (⟨N, φ1⟩, . . . , ⟨N, φd⟩)ᵀ.

And since N is white w.r.t. W, and (φ1, . . . , φd) are orthonormal
and bandlimited to W Hz,

    (⟨N, φ1⟩, . . . , ⟨N, φd⟩) ∼ IID N(0, N0/2).

c
Lecture 14, Amos Lapidoth 2017
Key Properties of White Gaussian Noise (2)
• If φ1, . . . , φm are integrable functions that are bandlimited to
  W Hz and are orthonormal, then

      (∫ N(t) φ1(t) dt, . . . , ∫ N(t) φm(t) dt) ∼ IID N(0, N0/2).

• If s is any integrable function that is bandlimited to W Hz,

      KNN ⋆ s = (N0/2) s.

• If s is an integrable function that is bandlimited to W Hz,
  then for every epoch t ∈ R,

      Cov[∫ N(σ) s(σ) dσ, N(t)] = (N0/2) s(t).
c
Lecture 14, Amos Lapidoth 2017
Reduction to the Multi-Dimensional Multi-Hypothesis
Gaussian Problem

                                     Before                        Now
observed vector                      Y                             T
number of components of
the observed vector                  J                             d
variance of noise added to
each component                       σ²                            N0/2
number of hypotheses                 M                             M
conditional mean of
observation given M = m              (sm^(1), . . . , sm^(J))ᵀ     (⟨sm, φ1⟩, . . . , ⟨sm, φd⟩)ᵀ
sum of squared components
of mean vector                       Σ_{j=1}^{J} (sm^(j))²         Σ_{ℓ=1}^{d} ⟨sm, φℓ⟩² = ∫ sm²(t) dt

c
Lecture 14, Amos Lapidoth 2017
Optimal Rule Based on T

Picking uniformly at random from

    argmax_{m′∈M} { ln πm′ − (1/N0) Σ_{ℓ=1}^{d} (⟨Y, φℓ⟩ − ⟨sm′, φℓ⟩)² }

minimizes the probability of a guessing error.

If the prior is uniform:

c
Lecture 14, Amos Lapidoth 2017
Optimal Rule Based on T—Uniform Prior

If M is uniform, then this rule does not depend on the value of
N0. It picks uniformly at random from

    argmin_{m′∈M} Σ_{ℓ=1}^{d} (⟨Y, φℓ⟩ − ⟨sm′, φℓ⟩)².

c
Lecture 14, Amos Lapidoth 2017
What if s1 , . . . , sM all Have the same Energy?
Since (φ1, . . . , φd) is orthonormal,

    sm = Σ_{ℓ=1}^{d} ⟨sm, φℓ⟩ φℓ,   m ∈ M,

and

    ‖sm‖2² = Σ_{ℓ=1}^{d} ⟨sm, φℓ⟩²,   m ∈ M.

Consequently,

    ‖s1‖2 = ‖s2‖2 = · · · = ‖sM‖2
      =⇒ Σ_{ℓ=1}^{d} ⟨s1, φℓ⟩² = Σ_{ℓ=1}^{d} ⟨s2, φℓ⟩² = · · · = Σ_{ℓ=1}^{d} ⟨sM, φℓ⟩².

In this case the Euclidean norms of all mean vectors are equal!
c
Lecture 14, Amos Lapidoth 2017
Optimal Rule for a Uniform Prior and Equi-Energy Mean
Signals

If M has a uniform distribution and, in addition, the mean signals
are of equal energy, i.e.,

    ‖s1‖2 = ‖s2‖2 = · · · = ‖sM‖2,

then it is optimal to use the maximum-correlation rule

    argmax_{m′∈M} Σ_{ℓ=1}^{d} ⟨sm′, φℓ⟩⟨Y, φℓ⟩.

c
Lecture 14, Amos Lapidoth 2017
The Decision Rule without Reference to a Basis
    ln πm′ − (1/N0) Σ_{ℓ=1}^{d} (⟨Y, φℓ⟩ − ⟨sm′, φℓ⟩)²

can be expressed by opening the square as

    ln πm′ − (1/N0) Σ_{ℓ=1}^{d} ⟨Y, φℓ⟩² + (2/N0) Σ_{ℓ=1}^{d} ⟨Y, φℓ⟩⟨sm′, φℓ⟩
           − (1/N0) Σ_{ℓ=1}^{d} ⟨sm′, φℓ⟩².

The term −(1/N0) Σℓ ⟨Y, φℓ⟩² does not depend on m′, so we choose
at random a message in

    argmax_{m′∈M} { ln πm′ + (2/N0) Σ_{ℓ=1}^{d} ⟨Y, φℓ⟩⟨sm′, φℓ⟩ − (1/N0) Σ_{ℓ=1}^{d} ⟨sm′, φℓ⟩² },

where the middle sum equals ⟨Y, sm′⟩ and the last sum equals ‖sm′‖2².

c
Lecture 14, Amos Lapidoth 2017
    Σ_{ℓ=1}^{d} ⟨Y, φℓ⟩⟨sm, φℓ⟩ = ⟨Y, Σ_{ℓ=1}^{d} ⟨sm, φℓ⟩ φℓ⟩ = ⟨Y, sm⟩,   m ∈ M.

c
Lecture 14, Amos Lapidoth 2017
Optimal Rule

Pick at random an element of

    argmax_{m′∈M} { ln πm′ + (2/N0) (∫ Y(t) sm′(t) dt − ½ ∫ sm′²(t) dt) }.

For a uniform prior:

    argmax_{m′∈M} { ∫ Y(t) sm′(t) dt − ½ ∫ sm′²(t) dt }.

For a uniform prior and equi-energy mean signals:

    argmax_{m′∈M} ∫ Y(t) sm′(t) dt.

c
Lecture 14, Amos Lapidoth 2017
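A discrete-time sketch of this receiver (Riemann-sum inner products; the signals, prior, and noise scaling below are all illustrative choices, not from the lecture):

    import numpy as np

    rng = np.random.default_rng(5)
    dt = 0.01
    t = np.arange(0.0, 1.0, dt)

    S = np.stack([np.sin(2 * np.pi * 1 * t),      # mean signals s_1, s_2, s_3
                  np.sin(2 * np.pi * 2 * t),
                  np.sin(2 * np.pi * 3 * t)])
    prior = np.array([0.5, 0.25, 0.25])
    N0 = 0.5

    m = 1                                         # true message (0-indexed)
    # Discrete-time surrogate for white noise: variance (N0/2)/dt per sample,
    # so that Riemann-sum inner products have variance (N0/2)||s||^2.
    y = S[m] + rng.standard_normal(len(t)) * np.sqrt(N0 / (2 * dt))

    corr = (S @ y) * dt                           # <Y, s_m'> for every m'
    energy = (S * S).sum(axis=1) * dt             # ||s_m'||^2
    metric = np.log(prior) + (2 / N0) * (corr - energy / 2)
    print(np.argmax(metric))                      # the guess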
Performance Analysis
As in the Multi-Dimensional Gaussian Multi-Hypothesis Problem!

                                     Before                        Now
observed vector                      Y                             T
number of components of
the observed vector                  J                             d
variance of noise added to
each component                       σ²                            N0/2
number of hypotheses                 M                             M
conditional mean of
observation given M = m              (sm^(1), . . . , sm^(J))ᵀ     (⟨sm, φ1⟩, . . . , ⟨sm, φd⟩)ᵀ
sum of squared components
of mean vector                       Σ_{j=1}^{J} (sm^(j))²         Σ_{ℓ=1}^{d} ⟨sm, φℓ⟩² = ∫ sm²(t) dt

And note

    Σ_{ℓ=1}^{d} (⟨sm, φℓ⟩ − ⟨sm′, φℓ⟩)² = ‖sm − sm′‖2².

c
Lecture 14, Amos Lapidoth 2017
    pMAP(error|M = m) ≤ Σ_{m′≠m} Q( ‖sm − sm′‖2/√(2N0) + (√(N0/2)/‖sm − sm′‖2) ln(πm/πm′) )

    pMAP(error|M = m) ≤ Σ_{m′≠m} Q( √(‖sm − sm′‖2²/(2N0)) ),   M uniform

    pMAP(error|M = m) ≥ max_{m′≠m} Q( ‖sm − sm′‖2/√(2N0) + (√(N0/2)/‖sm − sm′‖2) ln(πm/πm′) )

    pMAP(error|M = m) ≥ max_{m′≠m} Q( √(‖sm − sm′‖2²/(2N0)) ),   M uniform

c
Lecture 14, Amos Lapidoth 2017
Antipodal Signaling
Consider the binary case with a uniform prior:

    s0 = −s1 = s,

where s is a nonzero integrable signal that is bandlimited to W Hz, and

    Es ≜ ‖s‖2².

Here span(s0, s1) is one-dimensional and is spanned by the
unit-norm signal

    φ = s/‖s‖2.

We guess based on

    T = ⟨Y, φ⟩.

Conditional on H = 0, we have T ∼ N(√Es, N0/2), whereas,
conditional on H = 1, we have T ∼ N(−√Es, N0/2).
How to guess H based on T we have already seen, with

    A = √Es,   σ² = N0/2.
c
Lecture 14, Amos Lapidoth 2017
[Figure: the conditional densities fY|H=1(y) and fY|H=0(y), centered
at −A and A; the region y < 0 is "Guess H = 1" and the region
y ≥ 0 is "Guess H = 0"; the tail of fY|H=0 to the left of 0 gives
pMAP(error|H = 0).]
c
Lecture 14, Amos Lapidoth 2017
Antipodal Signaling
It is optimal to guess "H = 0" if T ≥ 0 and to guess "H = 1" if
T < 0. That is,

    Guess "H = 0" if ∫_{−∞}^{∞} Y(t) s(t) dt ≥ 0.

Substituting

    √Es = A,   N0/2 = σ²

we obtain

    p*(error) = Q(√(2Es/N0)).

The distance is ‖s − (−s)‖2, i.e., 2‖s‖2. Half the distance is ‖s‖2,
i.e., √Es. Measured in standard deviations it is √Es/√(N0/2).
c
Lecture 14, Amos Lapidoth 2017
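A quick Monte Carlo check of p*(error) = Q(√(2Es/N0)), working directly with the sufficient statistic T ∼ N(±√Es, N0/2) — a sketch; the parameter values are arbitrary, and scipy's norm.sf plays the role of the Q-function:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(6)
    Es, N0, trials = 1.0, 0.5, 1_000_000

    bits = rng.integers(0, 2, trials)                  # H in {0, 1}
    mean = np.where(bits == 0, np.sqrt(Es), -np.sqrt(Es))
    T = mean + rng.standard_normal(trials) * np.sqrt(N0 / 2)
    guesses = (T < 0).astype(int)                      # guess "H = 1" iff T < 0

    print((guesses != bits).mean())                    # simulated error rate
    print(norm.sf(np.sqrt(2 * Es / N0)))               # Q(sqrt(2 Es / N0))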
General Binary Signaling (1)
Assume a uniform prior and mean signals s0 and s1.
We could find an orthonormal basis for span(s0, s1).
Instead, we subtract (s0 + s1)/2 from Y, so

    Ỹ(t) = Y(t) − ½(s0(t) + s1(t)),   t ∈ R.

Since Y can be recovered from Ỹ, we can guess based on Ỹ.
Conditional on H = 0,

    Ỹ = Y − (s0 + s1)/2 = s0 + N − (s0 + s1)/2 = (s0 − s1)/2 + N.

Conditional on H = 1,

    Ỹ = Y − (s0 + s1)/2 = s1 + N − (s0 + s1)/2 = −(s0 − s1)/2 + N.
c
Lecture 14, Amos Lapidoth 2017
General Binary Signaling (2)

Thus, the guessing problem given (Ỹ(t)) is the antipodal signaling
problem with

    s ≜ (s0 − s1)/2.

An optimal decision rule is to guess "H = 0" if
∫ Ỹ(t) (s0(t) − s1(t))/2 dt is nonnegative, i.e.,

    Guess "H = 0" if ∫ (Y(t) − (s0(t) + s1(t))/2) (s0(t) − s1(t))/2 dt ≥ 0.

    p*(error) = Q(√(‖s0 − s1‖2²/(2N0))).

Half the distance is ‖s0 − s1‖2/2, which in standard deviations is

    (‖s0 − s1‖2/2) / √(N0/2).
c
Lecture 14, Amos Lapidoth 2017
M-ary Orthogonal Keying

Suppose M is uniform, and the mean signals are orthogonal and of
equal energy Es > 0:

    ⟨sm′, sm″⟩ = Es I{m′ = m″},   m′, m″ ∈ M.

Since M is uniform, and since the mean signals are of equal
energy, the "max-correlation" rule is optimal:

    Guess "m" if ⟨Y, sm⟩ = max_{m′∈M} ⟨Y, sm′⟩,

with ties resolved by picking any message achieving the maximum.

c
Lecture 14, Amos Lapidoth 2017
The Probability of Error

Define

    T^(ℓ) = ∫_{−∞}^{∞} Y(t) (sℓ(t)/√Es) dt,   ℓ ∈ {1, . . . , M}.

We guess "M = m" if T^(m) = max_{m′∈M} T^(m′), with ties being
resolved at random among the components of T that are maximal.

The mean signals are distinct, and hence the probability of a tie is
zero, and

    pMAP(error|M = m)
      = Pr[max{T^(1), . . . , T^(m−1), T^(m+1), . . . , T^(M)} > T^(m) | M = m].

c
Lecture 14, Amos Lapidoth 2017
The Conditional Law of T

Conditional on M = m,
• the components of T are independent,
• with the m-th being N(√Es, N0/2), and
• the other components N(0, N0/2).

pMAP(error|M = m) is the probability that at least one of M − 1
IID N(0, N0/2) random variables exceeds the value of a
N(√Es, N0/2) random variable that is independent of them.
Consequently,

    pMAP(error|M = m) = pMAP(error|M = 1),   m ∈ M.

c
Lecture 14, Amos Lapidoth 2017
pMAP(error|M = 1)
  = Pr[max{T^(2), . . . , T^(M)} > T^(1) | M = 1]
  = 1 − Pr[max{T^(2), . . . , T^(M)} ≤ T^(1) | M = 1]
  = 1 − ∫ f_{T^(1)|M=1}(t) Pr[max{T^(2), . . . , T^(M)} ≤ t | M = 1, T^(1) = t] dt
  = 1 − ∫ f_{T^(1)|M=1}(t) Pr[max{T^(2), . . . , T^(M)} ≤ t | M = 1] dt
  = 1 − ∫ f_{T^(1)|M=1}(t) Pr[T^(2) ≤ t, . . . , T^(M) ≤ t | M = 1] dt
  = 1 − ∫ f_{T^(1)|M=1}(t) (Pr[T^(2) ≤ t | M = 1])^{M−1} dt
  = 1 − ∫ f_{T^(1)|M=1}(t) (1 − Q(t/√(N0/2)))^{M−1} dt
  = 1 − ∫ (1/√(πN0)) e^{−(t−√Es)²/N0} (1 − Q(t/√(N0/2)))^{M−1} dt
  = 1 − ∫ (1/√(2π)) e^{−τ²/2} (1 − Q(τ + √(2Es/N0)))^{M−1} dτ.
c
Lecture 14, Amos Lapidoth 2017
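The final integral is easy to evaluate numerically, and a Monte Carlo run over the (normalized) statistics T^(ℓ) agrees with it. A sketch — M and Es/N0 are arbitrary, and scipy's norm.sf is the Q-function:

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    M, EsN0 = 8, 4.0

    def integrand(tau):
        return norm.pdf(tau) * (1 - norm.sf(tau + np.sqrt(2 * EsN0))) ** (M - 1)

    p_err = 1 - quad(integrand, -10, 10)[0]

    # Monte Carlo: after dividing by sqrt(N0/2), T^(1) ~ N(sqrt(2 Es/N0), 1)
    # and the other components are IID N(0, 1).
    rng = np.random.default_rng(7)
    T = rng.standard_normal((200_000, M))
    T[:, 0] += np.sqrt(2 * EsN0)
    mc = (T[:, 1:].max(axis=1) > T[:, 0]).mean()

    print(p_err, mc)          # the two estimates should nearly agree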
Zero-Mean Signals for Additive-Noise Channels

[Figure: baseline system: TX1 maps the data {Dj} to X and sends it
over the additive-noise channel Y = X + N to RX1, which produces
the estimates {Dj est}. Modified system: TX2 subtracts c from X
before transmission and RX2 adds c back, so the channel input is
X − c while the decision circuitry again sees X + N.]
How should we choose c(·)?


c
Lecture 14, Amos Lapidoth 2017
Subtracting the Mean (1)
 
    E[(W − c)²] ≥ Var[W],   c ∈ R,

with equality iff

    c = E[W].

Indeed,

    E[(W − c)²]
      = E[((W − E[W]) + (E[W] − c))²]
      = E[(W − E[W])²] + 2 E[W − E[W]] (E[W] − c) + (E[W] − c)²
      = E[(W − E[W])²] + (E[W] − c)²          (the cross term is zero)
      ≥ E[(W − E[W])²]
      = Var[W],

with equality iff c = E[W]. (Huygens–Steiner)


c
Lecture 14, Amos Lapidoth 2017
Subtracting the Mean (2)

To minimize

    (1/2T) ∫_{−T}^{T} E[(X(t) − c(t))²] dt,

we minimize the integrand, i.e., we choose c(t) to minimize

    E[(X(t) − c(t))²],

and thus choose

    c(t) = E[X(t)],   t ∈ R.

The transmitted signal X − c is then centered!

c
Lecture 14, Amos Lapidoth 2017
The M-ary Simplex
Start from Orthogonal Keying and subtract the mean.
Let φ1, . . . , φM be orthonormal. Let φ̄ be their "center of gravity":

    φ̄ = (1/M) Σ_{m∈M} φm.

The M-ary Simplex is a scaled version of

    φ1 − φ̄, . . . , φM − φ̄.

[Figure: for M = 2, the orthonormal pair φ1, φ2 and their midpoint
φ̄; subtracting φ̄ yields two antipodal vectors.]
c
Lecture 14, Amos Lapidoth 2017
Constructing the 3-Ary Simplex

c
Lecture 14, Amos Lapidoth 2017
The Simplex: Inner Products and Energies
φ1, . . . , φM are orthonormal and

    φ̄ = (1/M) Σ_{m∈M} φm.

Consequently,

    ⟨φm′ − φ̄, φm″ − φ̄⟩
      = ⟨φm′, φm″⟩ − ⟨φm′, φ̄⟩ − ⟨φm″, φ̄⟩ + ⟨φ̄, φ̄⟩
      = I{m′ = m″} − (1/M)⟨φm′, Σm φm⟩ − (1/M)⟨φm″, Σm φm⟩ + (1/M²)‖Σm φm‖2²
      = I{m′ = m″} − 1/M − 1/M + M/M²
      = I{m′ = m″} − 1/M,   m′, m″ ∈ M.
c
Lecture 14, Amos Lapidoth 2017
Normalization
Since

    ‖φm − φ̄‖2² = 1 − 1/M = (M − 1)/M,

we define the energy-Es M-ary simplex constellation as

    sm = √(Es M/(M − 1)) (φm − φ̄),   m ∈ M,

with the result that

    ‖sm‖2² = Es and ⟨sm′, sm″⟩ = −Es/(M − 1),   m′ ≠ m″.

{sm} can be viewed as the result of subtracting the center of
gravity from orthogonal signals of energy Es M/(M − 1):

    sm = √(Es M/(M − 1)) φm − √(Es M/(M − 1)) φ̄,   m ∈ M.
c
Lecture 14, Amos Lapidoth 2017
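A sketch verifying these inner products numerically, starting for concreteness from the standard basis of R^M as the orthonormal "signals" (any orthonormal family would do):

    import numpy as np

    M, Es = 4, 2.0
    phi = np.eye(M)                                  # orthonormal rows
    phibar = phi.mean(axis=0)                        # center of gravity

    S = np.sqrt(Es * M / (M - 1)) * (phi - phibar)   # simplex constellation
    G = S @ S.T                                      # Gram matrix of {s_m}

    print(np.diag(G))        # all approx. Es
    print(G[0, 1])           # approx. -Es / (M - 1)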
The Probability of Error for the Simplex

Since {sm} can be viewed as the result of subtracting the center of
gravity from orthogonal signals of energy Es M/(M − 1), p*(error)
for the energy-Es simplex is the same as for orthogonal keying with
energy Es M/(M − 1):

    p*(error) = 1 − (1/√(2π)) ∫_{−∞}^{∞} e^{−τ²/2} (1 − Q(τ + √((M/(M − 1)) · 2Es/N0)))^{M−1} dτ.

c
Lecture 14, Amos Lapidoth 2017
From the Simplex to Orthogonal Keying
If ψ is of unit energy and orthogonal to {s1, . . . , sM}, then

    { sm + √(Es/(M − 1)) ψ },   m ∈ M,

are orthogonal, each of energy Es M/(M − 1).

[Figure: for M = 2, adding √(Es/(M − 1)) ψ, with ψ orthogonal to
the antipodal simplex signals s1, s2, yields two orthogonal signals.]
c
Lecture 14, Amos Lapidoth 2017
Decoding the Simplex

To decode, add √(Es/(M − 1)) ψ to Y and use a decoder for
orthogonal keying.

c
Lecture 14, Amos Lapidoth 2017
Bi-Orthogonal Keying

The 2κ mean signals are

    sν,u = +√Es φν and sν,d = −√Es φν,   ν ∈ {1, . . . , κ},

where (φ1, . . . , φκ) are orthonormal.

    ‖sν,u‖2 = ‖sν,d‖2 = √Es,   ν ∈ {1, . . . , κ}.
c
Lecture 14, Amos Lapidoth 2017
Optimal Guessing Rule
Since the prior is uniform and the mean signals of equal energy, we
should pick the message corresponding to the largest of

    ⟨Y, s1,u⟩, ⟨Y, s1,d⟩, . . . , ⟨Y, sκ,u⟩, ⟨Y, sκ,d⟩.

Since sν,u = −sν,d,

    max{⟨Y, sν,u⟩, ⟨Y, sν,d⟩} = |⟨Y, sν,u⟩| = √Es |⟨Y, φν⟩|,   ν ∈ {1, . . . , κ}.

We can also compare in pairs and then compare the κ results:

    max{⟨Y, s1,u⟩, ⟨Y, s1,d⟩, . . . , ⟨Y, sκ,u⟩, ⟨Y, sκ,d⟩}
      = max{max{⟨Y, s1,u⟩, ⟨Y, s1,d⟩}, . . . , max{⟨Y, sκ,u⟩, ⟨Y, sκ,d⟩}}.

Find which ν* in {1, . . . , κ} attains

    max_{ν∈{1,...,κ}} |⟨Y, φν⟩|

and then guess "sν*,u" if ⟨Y, φν*⟩ > 0 and guess "sν*,d"
otherwise.
c
Lecture 14, Amos Lapidoth 2017
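A sketch of this two-stage detector operating directly on the vector of inner products (⟨Y, φ1⟩, . . . , ⟨Y, φκ⟩), simulated under one transmitted signal (parameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(8)
    kappa, Es, N0 = 4, 1.0, 0.25

    nu_true, sign_true = 2, +1                  # transmitted signal: s_{nu,u}, nu=2 (0-indexed)
    T = rng.standard_normal(kappa) * np.sqrt(N0 / 2)   # <Y, phi_nu> under noise only
    T[nu_true] += sign_true * np.sqrt(Es)       # add the mean of the sent signal

    nu_star = np.argmax(np.abs(T))              # stage 1: largest |<Y, phi_nu>|
    guess = (nu_star, "u" if T[nu_star] > 0 else "d")  # stage 2: the sign
    print(guess)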
The Probability of Error
pMAP(correct|s1,u) (ties occur with probability zero)
  = Pr[−⟨Y, φ1⟩ ≤ ⟨Y, φ1⟩ and max_{2≤ν≤κ} |⟨Y, φν⟩| ≤ ⟨Y, φ1⟩ | s1,u]
  = Pr[⟨Y, φ1⟩ ≥ 0 and max_{2≤ν≤κ} |⟨Y, φν⟩| ≤ ⟨Y, φ1⟩ | s1,u]
  = ∫_0^∞ f_{⟨Y,φ1⟩|s1,u}(t) Pr[max_{2≤ν≤κ} |⟨Y, φν⟩| ≤ t | s1,u, ⟨Y, φ1⟩ = t] dt
  = ∫_0^∞ f_{⟨Y,φ1⟩|s1,u}(t) Pr[max_{2≤ν≤κ} |⟨Y, φν⟩| ≤ t | s1,u] dt
  = ∫_0^∞ f_{⟨Y,φ1⟩|s1,u}(t) (Pr[|⟨Y, φ2⟩| ≤ t | s1,u])^{κ−1} dt
  = ∫_0^∞ (1/√(πN0)) e^{−(t−√Es)²/N0} (1 − 2Q(t/√(N0/2)))^{κ−1} dt
  = ∫_{−√(2Es/N0)}^{∞} (2π)^{−1/2} e^{−τ²/2} (1 − 2Q(τ + √(2Es/N0)))^{κ−1} dτ.
c
Lecture 14, Amos Lapidoth 2017
Justification for the Red Terms

Conditional on s1,u being sent, ⟨Y, φ2⟩ ∼ N(0, N0/2), so

    Pr[|⟨Y, φ2⟩| ≤ t | s1,u]
      = Pr[|⟨Y, φ2⟩|/√(N0/2) ≤ t/√(N0/2) | s1,u]
      = 1 − Pr[|⟨Y, φ2⟩|/√(N0/2) ≥ t/√(N0/2) | s1,u]
      = 1 − Pr[⟨Y, φ2⟩/√(N0/2) ≥ t/√(N0/2) | s1,u]
          − Pr[⟨Y, φ2⟩/√(N0/2) ≤ −t/√(N0/2) | s1,u]
      = 1 − 2Q(t/√(N0/2)).

c
Lecture 14, Amos Lapidoth 2017
From a SP to a Random Vector

If (φ1, . . . , φd) is an orthonormal basis for span(s1, . . . , sM), then
to every decision rule based on Y (that is measurable w.r.t. the
σ-algebra generated by Y) there corresponds a randomized
decision rule based on

    T ≜ (⟨Y, φ1⟩, . . . , ⟨Y, φd⟩)ᵀ

of identical performance.

Consequently, no measurable decision rule based on Y can
outperform an optimal rule based on T.
c
Lecture 14, Amos Lapidoth 2017
A Toy Problem
We observe a pair (Y1, Y2).

    H = 0:  Y1 = s0 + N1,  Y2 = N2.
    H = 1:  Y1 = s1 + N1,  Y2 = N2.

Here

    N1 ∼ N(0, σ²),  N2 ∼ N(0, σ²)

are independent of H.
(Later s0, s1 will be waveforms and N1, N2 stochastic processes.)
• Can Y2 be discarded?
• Does (y1, y2) ↦ y1 form a sufficient statistic?

Not necessarily!

c
Lecture 14, Amos Lapidoth 2017
Can Y2 be Discarded?

If N1 and N2 are not independent, Y2 could be useful!

If N2 is equal to N1 , we can guess H error-free based on Y1 − Y2 !

But if N2 is independent of N1 , then Y2 can be discarded!

c
Lecture 14, Amos Lapidoth 2017
Discarding Y2 when N1 and N2 Are Independent

    LR(y1, y2) = fY1,Y2|H=0(y1, y2) / fY1,Y2|H=1(y1, y2)
               = (fY1|H=0(y1) fN2(y2)) / (fY1|H=1(y1) fN2(y2))
               = fN1(y1 − s0) / fN1(y1 − s1),

which is computable from y1.

Here is a proof that extends better to stochastic processes:

c
Lecture 14, Amos Lapidoth 2017
[Figure: a given rule for guessing H based on (Y1, Y2), fed with the
inputs y1 and y2.]

c
Lecture 14, Amos Lapidoth 2017
[Figure: top, the given rule fed with (y1, y2); bottom, the same rule
with only y1 available and the y2 input still to be supplied.]

c
Lecture 14, Amos Lapidoth 2017
[Figure: top, the given rule fed with (y1, y2); bottom, the same rule
fed with (y1, ỹ2), where ỹ2 is generated from local randomness with
Ỹ2 ∼ fN2(·), independently of everything else.]
c
Lecture 14, Amos Lapidoth 2017
Back to the Real Problem (1)
    Y1 ≜ Σ_{ℓ=1}^{d} ⟨Y, φℓ⟩ φℓ,   Y2 ≜ Y − Σ_{ℓ=1}^{d} ⟨Y, φℓ⟩ φℓ.

Since Y = Y1 + Y2, we can guess based on (Y1, Y2).
Conditional on M = m,

    Y1 = Σ_{ℓ=1}^{d} ⟨(sm + N), φℓ⟩ φℓ
       = Σ_{ℓ=1}^{d} ⟨sm, φℓ⟩ φℓ + Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ
       = sm + Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ,

and

    Y2 = N − Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ.
c
Lecture 14, Amos Lapidoth 2017
Back to the Real Problem (2)
Conditional on M = m,

    Y1 = sm + N1,   Y2 = N2,

where

    N1 = Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ,   N2 = N − Σ_{ℓ=1}^{d} ⟨N, φℓ⟩ φℓ.

Since N1 and N2 are independent, Y2 can be discarded!

And Y1 can be reconstructed from

    (⟨Y, φ1⟩, . . . , ⟨Y, φd⟩)ᵀ.

QED.
c
Lecture 14, Amos Lapidoth 2017
[Figure: two equivalent receivers. Top: (Y(t), t ∈ R) goes directly
into the decision device. Bottom: Y is projected onto φ1, . . . , φd;
the inner products ⟨Y, φ1⟩, . . . , ⟨Y, φd⟩ feed the decision device and
also reconstruct Σℓ ⟨Y, φℓ⟩ φℓ, to which N′ − Σℓ ⟨N′, φℓ⟩ φℓ is added,
where (N′(t)) is generated from local randomness with the same
FDDs as (N(t)). The resulting sum Y′(t) enters the same decision
device, and both receivers produce guesses of identical
performance.]
c
Lecture 14, Amos Lapidoth 2017
Thank you!

Kindly read Section 26.9; it has numerous useful examples.

c
Lecture 14, Amos Lapidoth 2017
