
Digital Modulation 1

– Lecture Notes –

Ingmar Land and Bernard H. Fleury

Navigation and Communications (NavCom)


Department of Electronic Systems
Aalborg University, DK

Version: February 5, 2007



Contents

I Basic Concepts 1

1 The Basic Constituents of a Digital Communication System


1.1 Discrete Information Sources . . . . . . . . . . . . 4
1.2 Considered System Model . . . . . . . . . . . . . . 6
1.3 The Digital Transmitter . . . . . . . . . . . . . . . 10
1.3.1 Waveform Look-up Table . . . . . . . . . . 10
1.3.2 Waveform Synthesis . . . . . . . . . . . . . 12
1.3.3 Canonical Decomposition . . . . . . . . . . 13
1.4 The Additive White Gaussian-Noise Channel . . . . 15
1.5 The Digital Receiver . . . . . . . . . . . . . . . . . 17
1.5.1 Bank of Correlators . . . . . . . . . . . . . 17
1.5.2 Canonical Decomposition . . . . . . . . . . 18
1.5.3 Analysis of the correlator outputs . . . . . . 20
1.5.4 Statistical Properties of the Noise Vector . . 23
1.6 The AWGN Vector Channel . . . . . . . . . . . . . 27
1.7 Vector Representation of a Digital Communication System 28
1.8 Signal-to-Noise Ratio for the AWGN Channel . . . 31

2 Optimum Decoding for the AWGN Channel 33


2.1 Optimality Criterion . . . . . . . . . . . . . . . . . 33
2.2 Decision Rules . . . . . . . . . . . . . . . . . . . . 34
2.3 Decision Regions . . . . . . . . . . . . . . . . . . . 36

Land, Fleury: Digital Modulation 1 NavCom



2.4 Optimum Decision Rules . . . . . . . . . . . . . . 37


2.5 MAP Symbol Decision Rule . . . . . . . . . . . . . 39
2.6 ML Symbol Decision Rule . . . . . . . . . . . . . . 41
2.7 ML Decision for the AWGN Channel . . . . . . . . 42
2.8 Symbol-Error Probability . . . . . . . . . . . . . . 44
2.9 Bit-Error Probability . . . . . . . . . . . . . . . . . 46
2.10 Gray Encoding . . . . . . . . . . . . . . . . . . . . 48

II Digital Communication Techniques 50

3 Pulse Amplitude Modulation (PAM) 51


3.1 BPAM . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1.1 Signaling Waveforms . . . . . . . . . . . . . 51
3.1.2 Decomposition of the Transmitter . . . . . . 52
3.1.3 ML Decoding . . . . . . . . . . . . . . . . . 53
3.1.4 Vector Representation of the Transceiver . . 54
3.1.5 Symbol-Error Probability . . . . . . . . . . 55
3.1.6 Bit-Error Probability . . . . . . . . . . . . . 57
3.2 MPAM . . . . . . . . . . . . . . . . . . . . . . . . 58
3.2.1 Signaling Waveforms . . . . . . . . . . . . . 58
3.2.2 Decomposition of the Transmitter . . . . . . 59
3.2.3 ML Decoding . . . . . . . . . . . . . . . . . 62
3.2.4 Vector Representation of the Transceiver . . 63
3.2.5 Symbol-Error Probability . . . . . . . . . . 64
3.2.6 Bit-Error Probability . . . . . . . . . . . . . 66


4 Pulse Position Modulation (PPM) 67


4.1 BPPM . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1.1 Signaling Waveforms . . . . . . . . . . . . . 67
4.1.2 Decomposition of the Transmitter . . . . . . 68
4.1.3 ML Decoding . . . . . . . . . . . . . . . . . 69
4.1.4 Vector Representation of the Transceiver . . 71
4.1.5 Symbol-Error Probability . . . . . . . . . . 72
4.1.6 Bit-Error Probability . . . . . . . . . . . . . 73
4.2 MPPM . . . . . . . . . . . . . . . . . . . . . . . . 74
4.2.1 Signaling Waveforms . . . . . . . . . . . . . 74
4.2.2 Decomposition of the Transmitter . . . . . . 75
4.2.3 ML Decoding . . . . . . . . . . . . . . . . . 77
4.2.4 Symbol-Error Probability . . . . . . . . . . 78
4.2.5 Bit-Error Probability . . . . . . . . . . . . . 80

5 Phase-Shift Keying (PSK) 85


5.1 Signaling Waveforms . . . . . . . . . . . . . . . . . 85
5.2 Signal Constellation . . . . . . . . . . . . . . . . . 87
5.3 ML Decoding . . . . . . . . . . . . . . . . . . . . . 90
5.4 Vector Representation of the MPSK Transceiver . . 93
5.5 Symbol-Error Probability . . . . . . . . . . . . . . 94
5.6 Bit-Error Probability . . . . . . . . . . . . . . . . . 97

6 Offset Quadrature Phase-Shift Keying (OQPSK) 100


6.1 Signaling Scheme . . . . . . . . . . . . . . . . . . . 100
6.2 Transceiver . . . . . . . . . . . . . . . . . . . . . . 101
6.3 Phase Transitions for QPSK and OQPSK . . . . . 102


7 Frequency-Shift Keying (FSK) 104


7.1 BFSK with Coherent Demodulation . . . . . . . . . 104
7.1.1 Signal Waveforms . . . . . . . . . . . . . . 104
7.1.2 Signal Constellation . . . . . . . . . . . . . 106
7.1.3 ML Decoding . . . . . . . . . . . . . . . . . 107
7.1.4 Vector Representation of the Transceiver . . 108
7.1.5 Error Probabilities . . . . . . . . . . . . . . 109
7.2 MFSK with Coherent Demodulation . . . . . . . . 110
7.2.1 Signal Waveforms . . . . . . . . . . . . . . 110
7.2.2 Signal Constellation . . . . . . . . . . . . . 111
7.2.3 Vector Representation of the Transceiver . . 112
7.2.4 Error Probabilities . . . . . . . . . . . . . . 113
7.3 BFSK with Noncoherent Demodulation . . . . . . . 114
7.3.1 Signal Waveforms . . . . . . . . . . . . . . 114
7.3.2 Signal Constellation . . . . . . . . . . . . . 115
7.3.3 Noncoherent Demodulator . . . . . . . . . . 116
7.3.4 Conditional Probability of Received Vector . 117
7.3.5 ML Decoding . . . . . . . . . . . . . . . . . 119
7.4 MFSK with Noncoherent Demodulation . . . . . . 121
7.4.1 Signal Waveforms . . . . . . . . . . . . . . 121
7.4.2 Signal Constellation . . . . . . . . . . . . . 122
7.4.3 Conditional Probability of Received Vector . 123
7.4.4 ML Decision Rule . . . . . . . . . . . . . . 123
7.4.5 Optimal Noncoherent Receiver for MFSK . . 124
7.4.6 Symbol-Error Probability . . . . . . . . . . 125
7.4.7 Bit-Error Probability . . . . . . . . . . . . . 129


8 Minimum-Shift Keying (MSK) 130


8.1 Motivation . . . . . . . . . . . . . . . . . . . . . . 130
8.2 Transmit Signal . . . . . . . . . . . . . . . . . . . 132
8.3 Signal Constellation . . . . . . . . . . . . . . . . . 135
8.4 A Finite-State Machine . . . . . . . . . . . . . . . 138
8.5 Vector Encoder . . . . . . . . . . . . . . . . . . . 140
8.6 Demodulator . . . . . . . . . . . . . . . . . . . . . 144
8.7 Conditional Probability Density of Received Sequence 145
8.8 ML Sequence Estimation . . . . . . . . . . . . . . 146
8.9 Canonical Decomposition of MSK . . . . . . . . . . 149
8.10 The Viterbi Algorithm . . . . . . . . . . . . . . . . 150
8.11 Simplified ML Decoder . . . . . . . . . . . . . . . 155
8.12 Bit-Error Probability . . . . . . . . . . . . . . . . . 160
8.13 Differential Encoding and Differential Decoding . . 161
8.14 Precoded MSK . . . . . . . . . . . . . . . . . . . . 162
8.15 Gaussian MSK . . . . . . . . . . . . . . . . . . . . 166

9 Quadrature Amplitude Modulation 169

10 BPAM with Bandlimited Waveforms 170

III Appendix 171


A Geometrical Representation of Signals 172


A.1 Sets of Orthonormal Functions . . . . . . . . . . . 172
A.2 Signal Space Spanned by an Orthonormal Set of Functions 173
A.3 Vector Representation of Elements in the Vector Space 174
A.4 Vector Isomorphism . . . . . . . . . . . . . . . . . 175
A.5 Scalar Product and Isometry . . . . . . . . . . . . 176
A.6 Waveform Synthesis . . . . . . . . . . . . . . . . . 178
A.7 Implementation of the Scalar Product . . . . . . . . 179
A.8 Computation of the Vector Representation . . . . . 181
A.9 Gram-Schmidt Orthonormalization . . . . . . . . . 182

B Orthogonal Transformation of a White Gaussian Noise Vector


B.1 The Two-Dimensional Case . . . . . . . . . . . . . 185
B.2 The General Case . . . . . . . . . . . . . . . . . . 186

C The Shannon Limit 188


C.1 Capacity of the Band-limited AWGN Channel . . . 188
C.2 Capacity of the Band-Unlimited AWGN Channel . 191



Part I
Basic Concepts


1 The Basic Constituents of a Digital Communication System

Block diagram of a digital communication system:

[Block diagram: the binary information source (Information Source → Source Encoder, output Uk at rate 1/Tb) feeds the digital transmitter (Vector Encoder → Waveform Modulator, output X(t) at rate 1/T), which transmits over the waveform channel. The digital receiver (Waveform Demodulator → Vector Decoder, input Y (t)) delivers Ûk at rate 1/Tb to the binary information sink (Source Decoder → Information Sink).]

• Time duration of one binary symbol (bit) Uk : Tb

• Time duration of one waveform X(t): T

• Waveform channel may introduce


- distortion,
- interference,
- noise

Goal: The signal of the information source and the signal at the information sink should be as similar as possible according to some measure.


Some properties:

• Information source: may be analog or digital

• Source encoder: may perform sampling, quantization and


compression; generates one binary symbol (bit) Uk ∈ {0, 1}
per time interval Tb

• Waveform modulator: generates one waveform x(t) per time


interval T

• Vector decoder: generates one binary symbol (bit) Ûk ∈


{0, 1} (estimates of Uk ) per time interval Tb

• Source decoder: reconstructs the original source signal


1.1 Discrete Information Sources


A discrete memoryless source (DMS) is a source that generates
a sequence U1, U2, . . . of independent and identically distributed
(i.i.d.) discrete random variables, called an i.i.d. sequence.
[Figure: DMS emitting the sequence U1, U2, . . .]

Properties of the output sequence:

discrete U1, U2, . . . ∈ {a1, a2, . . . , aQ}.

memoryless U1, U2, . . . are statistically independent.

stationary U1, U2, . . . are identically distributed.

Notice: The symbol alphabet {a1, a2, . . . , aQ} has cardinality Q.


The value of Q may be infinite, the elements of the alphabet only
have to be countable.
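As a sketch, such a source can be simulated by i.i.d. draws (Python with numpy assumed; the alphabet and probabilities below are illustrative, not from the text):

```python
import numpy as np

def dms(alphabet, probs, n, rng=None):
    """Draw n i.i.d. symbols from a discrete memoryless source."""
    rng = rng or np.random.default_rng(0)
    assert abs(sum(probs) - 1.0) < 1e-12  # probabilities must sum to one
    return rng.choice(alphabet, size=n, p=probs)

symbols = dms(['a', 'b', 'c'], [0.5, 0.25, 0.25], 10000)
# By the law of large numbers, empirical frequencies approach pU
freq_a = np.mean(symbols == 'a')
```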

Example: Discrete sources

(i) Output of the PC keyboard, SMS


(usually not memoryless).

(ii) Compressed file


(near memoryless).

(iii) The numbers drawn on a roulette table in a casino


(ought to be memoryless, but may not. . . ).


A DMS is statistically completely described by the probabilities


pU (aq ) = Pr(U = aq )
of the symbols aq , q = 1, 2, . . . , Q. Notice that

(i) pU (aq ) ≥ 0 for all q = 1, 2, . . . , Q, and

(ii) ∑_{q=1}^{Q} pU (aq ) = 1.

As the sequence is i.i.d., we have Pr(Ui = aq ) = Pr(Uj = aq ).


For convenience, we may write pU (a) shortly as p(a).

Binary Symmetric Source

A binary memoryless source is a DMS with a binary symbol al-


phabet.

Remark:
Commonly used binary symbol alphabets are F2 := {0, 1} and
B := {−1, +1}. In the following, we will use F2.

A binary symmetric source (BSS) is a binary memoryless source


with equiprobable symbols,

pU (0) = pU (1) = 1/2.

Example: Binary Symmetric Source


All length-K sequences of a BSS have the same probability

pU1U2...UK (u1 u2 . . . uK ) = ∏_{i=1}^{K} pU (ui) = (1/2)^K.


1.2 Considered System Model


This course focuses on the transmitter and the receiver. Therefore,
we replace the information source by a BSS

[Figure: the information source and source encoder are replaced by a BSS emitting U1, U2, . . .]

and the information sink by a binary (information) sink.

[Figure: the source decoder and information sink are replaced by a binary sink receiving Û1, Û2, . . .]


Some Remarks
(i) There is no loss of generality resulting from these substitutions.
Indeed it can be demonstrated within Shannon’s Information The-
ory that an efficient source encoder converts the output of an in-
formation source into a random binary independent and uniformly
distributed (i.u.d.) sequence. Thus, the output of a perfect source
encoder looks like the output of a BSS.
(ii) Our main concern is the design of communication systems for
reliable transmission of the output symbols of a BSS. We will not
address the various methods for efficient source encoding.


Digital communication system considered in this course:

[Block diagram: BSS → U (bit rate 1/Tb) → Digital Transmitter → X(t) (rate 1/T) → Waveform Channel → Y (t) → Digital Receiver → Û → Binary Sink.]

• Source: U = [U1, . . . , UK ], Ui ∈ {0, 1}

• Transmitter: X(t) = x(t, u)

• Sink: Û = [Û1, . . . , ÛK ], Ûi ∈ {0, 1}

• Bit rate: 1/Tb

• Rate (of waveforms): 1/T = 1/(KTb )

• Bit error probability

  Pb = (1/K) ∑_{k=1}^{K} Pr(Uk ≠ Ûk )
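As an illustration of this definition, Pb can be estimated empirically by counting bit disagreements. The sketch below replaces the entire transceiver chain by a hypothetical channel that flips each bit independently with probability 0.1 (numpy assumed; the flip model is not part of the system above):

```python
import numpy as np

rng = np.random.default_rng(1)
K, n_blocks, p_flip = 4, 50000, 0.1
U = rng.integers(0, 2, size=(n_blocks, K))        # BSS output blocks
# Hypothetical channel abstraction: flip each bit with prob. p_flip
U_hat = U ^ (rng.random((n_blocks, K)) < p_flip)
# Empirical bit-error probability: average fraction of disagreeing bits
Pb_est = np.mean(U != U_hat)
```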

Objective: Design of efficient digital communication systems


Efficiency means small bit error probability and high bit rate 1/Tb.


Constraints and limitations:

• limited power

• limited bandwidth

• impairments (distortion, interference, noise) of the channel

Design goals:

• “good” waveforms

• low-complexity transmitters

• low-complexity receivers


1.3 The Digital Transmitter

1.3.1 Waveform Look-up Table

Example: 4PPM (Pulse-Position Modulation)

Set of four different waveforms, S = {s1(t), s2 (t), s3 (t), s4 (t)}:

[Figure: s1(t), . . . , s4(t) are rectangular pulses of amplitude A on [0, T ], each occupying a different quarter of the signaling interval.]

Each waveform X(t) ∈ S is addressed by the vector [U1, U2] ∈ {0, 1}²:

[U1, U2] ↦ X(t).

The mapping may be implemented by a waveform look-up table:

4PPM Transmitter: [00] ↦ s1(t), [01] ↦ s2(t), [10] ↦ s3(t), [11] ↦ s4(t).

Example: [00011100] is transmitted as

[Figure: the waveform x(t) over [0, 4T ], consisting of s1(t), s2(t − T ), s4(t − 2T ), s1(t − 3T ).]


For an arbitrary digital transmitter, we have the following:

[Figure: the digital transmitter maps [U1, U2, . . . , UK ] to X(t) via a waveform look-up table.]

Input Binary vectors of length K from the input set U:


[U1, U2, . . . , UK ] ∈ U := {0, 1}K .
The “duration” of one binary symbol Uk is Tb.

Output Waveforms of duration T from the output set S:


x(t) ∈ S := {s1(t), s2 (t), . . . , sM (t)}.
Waveform duration T means that sm(t) = 0 for t ∉ [0, T ], m = 1, 2, . . . , M .

The look-up table maps each input vector [u1, . . . , uK ] ∈ U to one
waveform x(t) ∈ S. Thus the digital transmitter may be defined by
a mapping U → S with
[u1, . . . , uK ] ↦ x(t).
The mapping is one-to-one and onto, such that
M = 2^K.

Relation between signaling interval (waveform duration) T and bit


interval (“bit duration”) Tb:
T = K · Tb .


1.3.2 Waveform Synthesis

The set of waveforms,

S := {s1(t), s2 (t), . . . , sM (t)},

spans a vector space. Applying the Gram-Schmidt orthogonaliza-


tion procedure, we can find a set of orthonormal functions

Sψ := {ψ1(t), ψ2 (t), . . . , ψD (t)} with D ≤ M

such that the space spanned by Sψ contains the space spanned


by S.

Hence, each waveform sm(t) can be represented by a linear combination of the orthonormal functions:

sm(t) = ∑_{i=1}^{D} sm,i · ψi(t),

m = 1, 2, . . . , M . Each signal sm(t) can thus be geometrically


represented by the D-dimensional vector

sm = [sm,1 , sm,2 , . . . , sm,D ]T ∈ RD

with respect to the set Sψ .


Further details are given in Appendix A.
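The Gram-Schmidt procedure can be sketched numerically on sampled waveforms, approximating the integrals by sums over a fine time grid (the waveforms and grid below are illustrative; Appendix A gives the exact procedure):

```python
import numpy as np

def gram_schmidt(signals, dt):
    """Orthonormalize sampled waveforms; integrals approximated by sums."""
    basis = []
    for s in signals:
        r = s.astype(float).copy()
        for psi in basis:
            r -= np.sum(r * psi) * dt * psi   # subtract projection <r, psi> psi
        energy = np.sum(r**2) * dt
        if energy > 1e-12:                    # keep only new directions
            basis.append(r / np.sqrt(energy))
    return basis                              # D <= M orthonormal functions

# Two overlapping rectangular pulses on [0, T], T = 1
dt = 1e-3
t = np.arange(0, 1, dt)
s1 = np.where(t < 0.5, 1.0, 0.0)
s2 = np.ones_like(t)
basis = gram_schmidt([s1, s2], dt)
D = len(basis)                                # here D == 2
```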


1.3.3 Canonical Decomposition

The digital transmitter may be represented by a waveform look-up


table:
u = [u1, . . . , uK ] ↦ x(t),
where u ∈ U and x(t) ∈ S.
From the previous section, we know how to synthesize the wave-
form sm(t) from sm (see also Appendix A) with respect to a set of
basis functions
Sψ := {ψ1(t), ψ2(t), . . . , ψD (t)}.
Making use of this method, we can split the waveform look-up
table into a vector look-up table and a waveform synthesizer:
u = [u1, . . . , uK ] ↦ x = [x1, x2, . . . , xD ] ↦ x(t),
where
u ∈ U = {0, 1}K ,
x ∈ X = {s1, s2, . . . , sM } ⊂ RD ,
x(t) ∈ S = {s1(t), s2 (t), . . . , sM (t)}.

This splitting procedure leads to the sought canonical decomposi-


tion of a digital transmitter:

[Figure: the vector encoder (vector look-up table [0 . . . 0] ↦ s1, . . . , [1 . . . 1] ↦ sM) maps [U1, . . . , UK ] to [X1, . . . , XD ]; the waveform modulator multiplies each Xd by ψd(t) and sums, producing X(t).]

Example: 4PPM

The four orthonormal basis functions ψ1(t), ψ2(t), ψ3(t), ψ4(t) are rectangular pulses of amplitude 2/√T, each occupying one quarter of the interval [0, T ].

The canonical decomposition of the transmitter is the following, where √Es = A·√T/2.

Vector look-up table:

[00] ↦ [√Es, 0, 0, 0]T
[01] ↦ [0, √Es, 0, 0]T
[10] ↦ [0, 0, √Es, 0]T
[11] ↦ [0, 0, 0, √Es]T

The waveform modulator forms X(t) = X1ψ1(t) + X2ψ2(t) + X3ψ3(t) + X4ψ4(t).

Remarks:

- The signal energy of each basis function is equal to one:
  ∫ ψd²(t) dt = 1 (due to their construction).

- The basis functions are orthonormal (due to their construction).

- The value Es is chosen such that the synthesized signals have
  the same energy as the original signals, namely Es (compare
  Example 3).
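A minimal sketch of this canonical decomposition for 4PPM, using the look-up table and basis functions from above (the values A = T = 1 and the sampling grid are assumptions for the simulation):

```python
import numpy as np

A, T = 1.0, 1.0
Es = A**2 * T / 4                   # waveform energy, since sqrt(Es) = A*sqrt(T)/2
dt = 1e-3
t = np.arange(0, T, dt)

# Orthonormal basis: amplitude 2/sqrt(T) pulses, one per quarter of [0, T]
psi = np.array([np.where((t >= m * T / 4) & (t < (m + 1) * T / 4),
                         2 / np.sqrt(T), 0.0) for m in range(4)])

# Vector look-up table: [u1, u2] -> signal point sqrt(Es) * e_m
table = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}

def transmit(u1, u2):
    x = np.zeros(4)
    x[table[(u1, u2)]] = np.sqrt(Es)
    # Waveform modulator: x(t) = sum_d x_d * psi_d(t)
    return x @ psi

x_t = transmit(0, 1)                # pulse of amplitude A in the second quarter
```

The synthesized waveform has energy Es, matching the original signals as required by the remark above.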


1.4 The Additive White Gaussian-Noise Channel

[Figure: the noise W (t) is added to the transmitted signal X(t), giving Y (t).]

The additive white Gaussian noise (AWGN) channel model is widely used in communications. The transmitted signal X(t) is superimposed by a stochastic noise signal W (t), such that the received signal reads

Y (t) = X(t) + W (t).

The stochastic process W (t) is a stationary process with the


following properties:

(i) W (t) is a Gaussian process, i.e., for each time t, the samples v = w(t) are Gaussian distributed with zero mean and variance σ²:

  pV (v) = (1/√(2πσ²)) exp(−v²/(2σ²)).

(ii) W (t) has a flat power spectrum with height N0/2:

  SW (f ) = N0/2.

  (Therefore it is called “white”.)

(iii) W (t) has the autocorrelation function

  RW (τ ) = (N0/2) δ(τ ).


Remarks

1. Autocorrelation function and power spectrum:

   RW (τ ) = E[W (t)W (t + τ )],   SW (f ) = F{RW (τ )}.

2. For Gaussian processes, weak stationarity implies strong stationarity.

3. A WGN is an idealized process without physical reality:

• The process is so “wild” that its realizations are not


ordinary functions of time.
• Its power is infinite.

However, a WGN is a useful approximation of a noise with a flat power spectrum in the bandwidth B used by a communication system:

[Figure: noise power spectrum Snoise(f ) equal to N0/2 within bands of width B around ±f0.]

4. In satellite communications, W (t) is the thermal noise of the receiver front-end. In this case, N0/2 is proportional to the noise temperature.
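In a sampled simulation, the idealized white noise must be discretized. A common sketch maps the PSD N0/2 to a per-sample variance of (N0/2)/dt on a grid with spacing dt (this mapping is an assumption of the discretization, not part of the continuous-time model above):

```python
import numpy as np

rng = np.random.default_rng(42)
N0, dt, n = 2.0, 1e-2, 200000
# The delta autocorrelation (N0/2)*delta(tau) maps to i.i.d. samples
# with variance (N0/2)/dt on a grid with spacing dt (assumed mapping)
w = rng.normal(0.0, np.sqrt((N0 / 2) / dt), size=n)
var_est = np.var(w)     # should approach (N0/2)/dt
```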


1.5 The Digital Receiver


Consider a transmission system using the waveforms

S = {s1(t), s2 (t), . . . , sM (t)}

with sm(t) = 0 for t ∉ [0, T ], m = 1, 2, . . . , M , i.e., with duration T.
Assume transmission over an AWGN channel, such that

y(t) = x(t) + w(t),

x(t) ∈ S.

1.5.1 Bank of Correlators

The transmitted waveform may be recovered from the received waveform y(t) by correlating y(t) with all possible waveforms:

cm = ⟨y(t), sm(t)⟩,

m = 1, 2, . . . , M . Based on the correlations [c1, . . . , cM ], the waveform ŝ(t) with the highest correlation is chosen and the corresponding (estimated) source vector û = [û1, . . . , ûK ] is output.

[Figure: bank of M correlators, each computing cm = ∫₀ᵀ y(t)sm(t)dt; estimation of x̂(t) and table look-up yield [û1, . . . , ûK ].]

Disadvantage: High complexity.


1.5.2 Canonical Decomposition

Consider a set of orthonormal functions

Sψ = {ψ1(t), ψ2(t), . . . , ψD (t)}

obtained by applying the Gram-Schmidt procedure to s1(t), s2 (t), . . . , sM (t). Then,

sm(t) = ∑_{d=1}^{D} sm,d · ψd(t),

m = 1, 2, . . . , M . The vector

sm = [sm,1 , sm,2 , . . . , sm,D ]T

entirely determines sm(t) with respect to the orthonormal set Sψ .


The set Sψ spans the vector space

Sψ := { s(t) = ∑_{i=1}^{D} si ψi(t) : [s1, s2, . . . , sD ] ∈ RD }.

Basic Idea

• The received waveform y(t) may contain components outside


of Sψ . However, only the components inside Sψ are relevant,
as x(t) ∈ Sψ .

• Determine the vector representation y of the components


of y(t) that are inside Sψ .

• This vector y is sufficient for estimating the transmitted


waveform.


Canonical decomposition of the optimal receiver for the AWGN channel

Appendix A describes two ways to compute the vector representation of a waveform. Accordingly, the demodulator may be implemented in two ways, leading to the following two receiver structures.
Correlator-based Demodulator and Vector Decoder

[Figure: Y (t) is multiplied by each ψd(t) and integrated over [0, T ], giving Y1, . . . , YD; the vector decoder outputs [Û1, . . . , ÛK ].]
Matched-filter based Demodulator and Vector Decoder

[Figure: Y (t) passes through matched filters ψd(T − t), sampled at t = T, giving Y1, . . . , YD; the vector decoder outputs [Û1, . . . , ÛK ].]
The optimality of the receiver structures is shown in the following


sections.
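Both structures compute Yd = ∫₀ᵀ Y (t)ψd(t)dt. A correlator-based sketch on a sampling grid (the basis functions, signal point, and noise level below are illustrative):

```python
import numpy as np

dt = 1e-3
t = np.arange(0, 1, dt)
# Two orthonormal functions on [0, 1] (illustrative choice)
psi = np.array([np.where(t < 0.5, np.sqrt(2.0), 0.0),
                np.where(t >= 0.5, np.sqrt(2.0), 0.0)])

x_vec = np.array([3.0, -1.0])              # transmitted signal point
x_t = x_vec @ psi                          # waveform synthesis
rng = np.random.default_rng(7)
y_t = x_t + rng.normal(0.0, 0.1, t.size)   # received waveform (AWGN)

# Bank of correlators: Y_d = integral of y(t) * psi_d(t) dt
y_vec = psi @ y_t * dt                     # close to x_vec plus small noise
```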


1.5.3 Analysis of the correlator outputs

Assume that the waveform sm(t) is transmitted, i.e.,

x(t) = sm(t).

The received waveform is thus

y(t) = x(t) + w(t) = sm(t) + w(t).

The correlator outputs are

yd = ∫₀ᵀ y(t)ψd(t)dt
   = ∫₀ᵀ [sm(t) + w(t)] ψd(t)dt
   = ∫₀ᵀ sm(t)ψd(t)dt + ∫₀ᵀ w(t)ψd(t)dt
   = sm,d + wd,

d = 1, 2, . . . , D.
Using the notation

y = [y1, . . . , yD ]T,
w = [w1, . . . , wD ]T,  wd = ∫₀ᵀ w(t)ψd(t)dt,

we can recast the correlator outputs in the compact form

y = sm + w.

The waveform channel between x(t) = sm(t) and y(t) is thus transformed into a vector channel between x = sm and y.


Remark:
Since the vector decoder “sees” noisy observations of the vectors
sm, these vectors are also called signal points with respect to Sψ ,
and the set of signal points is called the signal constellation.

From Appendix A, we know that the vector sm entirely characterizes sm(t). The vector y, however, does not represent the complete waveform y(t); it represents only the waveform

y′(t) = ∑_{d=1}^{D} yd ψd(t)

which is the “part” of y(t) within the vector space spanned by the set Sψ of orthonormal functions, i.e.,

y′(t) ∈ Sψ.

The “remaining part” of y(t), i.e., the part outside of Sψ, reads

y°(t) = y(t) − y′(t)
      = y(t) − ∑_{d=1}^{D} yd ψd(t)
      = sm(t) + w(t) − ∑_{d=1}^{D} [sm,d + wd] ψd(t)
      = sm(t) − ∑_{d=1}^{D} sm,d ψd(t) + w(t) − ∑_{d=1}^{D} wd ψd(t)
      = w(t) − ∑_{d=1}^{D} wd ψd(t),

since sm(t) − ∑_{d=1}^{D} sm,d ψd(t) = 0.


Observations

• The waveform y°(t) contains no information about sm(t).

• The waveform y′(t) is completely represented by y.

Therefore, the receiver structures do not lead to a loss of informa-


tion, and so they are optimal.


1.5.4 Statistical Properties of the Noise Vector

The components of the noise vector

w = [w1, . . . , wD ]T

are given by

wd = ∫₀ᵀ w(t)ψd(t)dt,

d = 1, 2, . . . , D. The random variables Wd are Gaussian as they result from a linear transformation of the Gaussian process W (t). Hence the probability density function of Wd is of the form

pWd(wd) = (1/√(2πσd²)) exp(−(wd − μd)²/(2σd²)),

where the mean value μd and the variance σd² are given by

μd = E[Wd],   σd² = E[(Wd − μd)²].

These values are now determined.

Mean value

Using the definition of Wd and taking into account that the process W (t) has zero mean, we obtain

E[Wd] = E[∫₀ᵀ W (t)ψd(t)dt]
      = ∫₀ᵀ E[W (t)] ψd(t)dt
      = 0.

Thus, all random variables Wd have zero mean.


Variance

For computing the variance, we use μd = 0, the definition of Wd, and the autocorrelation function of W (t),

RW (τ ) = (N0/2) δ(τ ).

We obtain the following chain of equations:

E[(Wd − μd)²] = E[Wd²]
  = E[∫₀ᵀ W (t)ψd(t)dt ∫₀ᵀ W (t′)ψd(t′)dt′]
  = ∫₀ᵀ∫₀ᵀ E[W (t)W (t′)] ψd(t)ψd(t′) dt dt′
  = ∫₀ᵀ [∫₀ᵀ (N0/2) δ(t − t′) ψd(t′) dt′] ψd(t) dt
  = (N0/2) ∫₀ᵀ ψd²(t) dt = N0/2,

where E[W (t)W (t′)] = RW (t′ − t) and δ(t) ∗ ψd(t) = ψd(t) were used.

Distribution of the noise vector components

Thus we have μd = 0 and σd² = N0/2, and the distributions read

pWd(wd) = (1/√(πN0)) exp(−wd²/N0)

for d = 1, 2, . . . , D. Notice that the components Wd are all identically distributed.


As the Gaussian distribution is very important, the shorthand notation

A ∼ N(μ, σ²)

is often used. This notation means: the random variable A is Gaussian distributed with mean value μ and variance σ². Using this notation, we have for the components of the noise vector

Wd ∼ N(0, N0/2),

d = 1, 2, . . . , D.

Distribution of the noise vector

Using the same reasoning as before, we can conclude that W is a Gaussian noise vector, i.e.,

W ∼ N(μ, Σ),

where the expectation μ ∈ RD×1 and the covariance matrix Σ ∈ RD×D are given by

μ = E[W],   Σ = E[(W − μ)(W − μ)T].

From the previous considerations, we have immediately

μ = 0.

Using a similar approach as for computing the variances, the statistical independence of the components of W can be shown. That is,

E[Wi Wj] = (N0/2) δi,j,

where δi,j denotes the Kronecker symbol defined as

δi,j = 1 for i = j, and δi,j = 0 otherwise.


As a consequence, the covariance matrix of W is

Σ = (N0/2) I_D,

where I_D denotes the D × D identity matrix.

To sum up, the noise vector W is distributed as

W ∼ N(0, (N0/2) I_D).

As the components of W are statistically independent, the probability density function of W can be written as

pW(w) = ∏_{d=1}^{D} pWd(wd)
      = ∏_{d=1}^{D} (1/√(πN0)) exp(−wd²/N0)
      = (1/√(πN0))^D exp(−(1/N0) ∑_{d=1}^{D} wd²)
      = (πN0)^{−D/2} exp(−‖w‖²/N0)

with ‖w‖ denoting the Euclidean norm

‖w‖ = (∑_{d=1}^{D} wd²)^{1/2}

of the D-dimensional vector w = [w1, . . . , wD ]T.

The independence of the components of W gives rise to the AWGN vector channel.
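The derived properties, Wd ∼ N(0, N0/2) with uncorrelated components, can be checked empirically by projecting sampled white noise onto an orthonormal pair (the same grid discretization assumption as before; the basis functions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N0, dt = 2.0, 1e-2
t = np.arange(0, 1, dt)
psi = np.array([np.where(t < 0.5, np.sqrt(2.0), 0.0),
                np.where(t >= 0.5, np.sqrt(2.0), 0.0)])  # orthonormal pair

trials = 20000
# Sampled white noise: per-sample variance (N0/2)/dt (assumed mapping)
W_t = rng.normal(0.0, np.sqrt((N0 / 2) / dt), size=(trials, t.size))
W = W_t @ psi.T * dt        # rows: realizations of [W_1, W_2]

mean_est = W.mean(axis=0)   # should approach [0, 0]
cov_est = np.cov(W.T)       # should approach (N0/2) * I_2
```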


1.6 The AWGN Vector Channel


The additive white Gaussian noise vector channel is defined as a
channel with input vector

X = [X1, X2 , . . . , XD ]T,

output vector
Y = [Y1, Y2, . . . , YD ]T,
and the relation
Y = X + W,
where
W = [W1, W2, . . . , WD ]T
is a vector of D independent random variables distributed as

Wd ∼ N (0, N0/2).

[Figure: Y = X + W.]

(As nobody would have expected differently. . . )


1.7 Vector Representation of a Digital


Communication System
Consider a digital communication system transmitting over an AWGN channel. Let {s1(t), s2 (t), . . . , sM (t)}, sm(t) = 0 for t ∉ [0, T ], m = 1, 2, . . . , M , denote the waveforms. Let further

Sψ = {ψ1(t), ψ2(t), . . . , ψD (t)}

denote a set of orthonormal functions for these waveforms.


The canonical decompositions of the digital transmitter and the
digital receiver convert the block formed by the waveform modu-
lator, the AWGN channel, and the waveform demodulator into an
AWGN vector channel.
placements

Modulator, AWGN channel, demodulator

[Figure: X1, . . . , XD are modulated onto ψ1(t), . . . , ψD(t) and summed to X(t); W (t) is added; Y (t) is correlated with each ψd(t) over [0, T ], giving Y1, . . . , YD.]

AWGN vector channel

[Figure: [Y1, . . . , YD ] = [X1, . . . , XD ] + [W1, . . . , WD ].]


Using this equivalence, the communication system operating over a (waveform) AWGN channel can be represented as a communication system operating over an AWGN vector channel:

[Block diagram: BSS → U → Vector Encoder → X → AWGN Vector Channel → Y → Vector Decoder → Û → Binary Sink.]

• Source vector: U = [U1, . . . , UK ], Ui ∈ {0, 1}

• Transmit vector: X = [X1, . . . , XD ]

• Receive vector: Y = [Y1, . . . , YD ]

• Estimated source vector: Û = [Û1, . . . , ÛK ], Ûi ∈ {0, 1}

• AWGN vector: W = [W1, . . . , WD ], W ∼ N(0, (N0/2) I_D)

This is the vector representation of a digital communication system


transmitting over an AWGN channel.


For estimating the transmitted vector x, we are interested in the likelihood of x for the given received vector y:

pY|X(y|x) = pW(y − x) = (πN0)^{−D/2} exp(−‖y − x‖²/N0).

Symbolically, we may write

Y | X = sm ∼ N(sm, (N0/2) I_D).

Remark:
The likelihood of x is the conditional probability density function
of y given x, and it is regarded as a function of x.

Example: Signal buried in AWGN

Consider D = 3 and x = sm. Then the “Gaussian cloud” around sm may look like this:

[Figure: a spherical cloud of received points centered at sm in the (ψ1, ψ2, ψ3) coordinate system.]


1.8 Signal-to-Noise Ratio for the AWGN Channel

Consider a digital communication system using the set of waveforms

S = {s1(t), s2 (t), . . . , sM (t)},

sm(t) = 0 for t ∉ [0, T ], m = 1, 2, . . . , M , for transmission over an AWGN channel with power spectrum

SW (f ) = N0/2.

Let Esm denote the energy of waveform sm(t), and let Ēs denote the average waveform energy:

Esm = ∫₀ᵀ sm²(t) dt,   Ēs = (1/M) ∑_{m=1}^{M} Esm.

Thus, Ēs is the average energy per waveform x(t) and also the average energy per vector x (due to the one-to-one correspondence).

The average signal-to-noise ratio (SNR) per waveform is then defined as

γs = Ēs / N0.


On the other hand, the average energy per source bit uk is of interest. Since K source bits uk are transmitted per waveform, the average energy per bit is

Ēb = Ēs / K.

Accordingly, the average signal-to-noise ratio (SNR) per bit is then defined as

γb = Ēb / N0.

As K bits are transmitted per waveform, we have the relation

γb = γs / K.

Remark:
The energies and signal-to-noise ratios usually denote received values.
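As a numeric sketch, using the 4PPM example of Section 1.3.3 (equal-energy waveforms with Es = A²T/4 and K = 2 bits per waveform; the value of N0 is illustrative):

```python
import numpy as np

# 4PPM from Section 1.3.3: four equal-energy waveforms, K = 2 bits each
A, T, K, N0 = 1.0, 1.0, 2, 0.1
E_sm = np.full(4, A**2 * T / 4)   # per-waveform energies E_{s_m}
E_s = E_sm.mean()                 # average energy per waveform
gamma_s = E_s / N0                # SNR per waveform
E_b = E_s / K                     # average energy per source bit
gamma_b = E_b / N0                # SNR per bit; equals gamma_s / K
```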


2 Optimum Decoding for the AWGN Channel

2.1 Optimality Criterion


In the previous chapter, the vector representation of a digital
transceiver transmitting over an AWGN channel was derived.

[Block diagram: BSS → U → Vector Encoder → X → AWGN Vector Channel → Y → Vector Decoder → Û → Binary Sink.]

Nomenclature
u = [u1, . . . , uK ] ∈ U, x = [x1, . . . , xD ] ∈ X,
û = [û1, . . . , ûK ] ∈ U, y = [y1, . . . , yD ] ∈ RD ,
U = {0, 1}K , X = {s1, . . . , sM }.

U : set of all binary sequences of length K,


X : set of vector representations of the waveforms.

The set X is called the signal constellation, and its elements


are called signals, signal points, or modulation symbols.
For convenience, we may refer to uk as bits and to x as symbols.
(But we have to be careful not to cause ambiguity.)


As the vector representation implies no loss of information, recovering the transmitted block u = [u1, . . . , uK ] from the received waveform y(t) is equivalent to recovering u from the vector y = [y1, . . . , yD ].

Functionality of the vector encoder

enc : U → X, u ↦ x = enc(u)

Functionality of the vector decoder

decu : RD → U, y ↦ û = decu(y)

Optimality criterion

A vector decoder is called an optimal vector decoder if it minimizes the block error probability Pr(U ≠ Û).

2.2 Decision Rules


Since the encoder mapping is one-to-one, the functionality of the vector decoder can be decomposed into the following two operations:

(i) Modulation-symbol decision:

dec : RD → X, y ↦ x̂ = dec(y)

(ii) Encoder inverse:

enc⁻¹ : X → U, x̂ ↦ û = enc⁻¹(x̂)


Thus, we have the following chain of mappings:

y ↦ x̂ ↦ û,

where the first mapping is dec and the second is enc⁻¹.

Block Diagram

[Figure: U → Vector Encoder (enc) → X → AWGN vector channel (add W) → Y → Modulation-Symbol Decision (dec) → X̂ → Encoder Inverse (enc⁻¹) → Û; the last two blocks together form the vector decoder.]

A decision rule is a function which maps each observation y ∈ RD to a modulation symbol x ∈ X, i.e., to an element of the signal constellation X.

Example: Decision rule

[Figure: observations y(1), y(2), y(3) ∈ RD are each mapped to one of the signal points {s1, s2, . . . , sM } = X.]

2.3 Decision Regions


A decision rule dec : R^D → X defines a partition of R^D into
decision regions

    Dm := {y ∈ R^D : dec(y) = sm},

m = 1, 2, . . . , M .

Example: Decision regions

[Figure: the observation space R^D partitioned into decision
regions D1, D2, . . . , DM , one region per signal point sm ∈ X.]

Since the decision regions D1, D2, . . . , DM form a partition, they
have the following properties:

(i) They are disjoint:

    Di ∩ Dj = ∅ for i ≠ j.

(ii) They cover the whole observation space R^D :

    D1 ∪ D2 ∪ · · · ∪ DM = R^D .

Obviously, any partition of RD defines a decision rule as well.


2.4 Optimum Decision Rules


Optimality criterion for decision rules

Since the mapping enc : U → X of the vector encoder is one-to-one,
we have

    Û ≠ U  ⇔  X̂ ≠ X.

Therefore, the decision rule of an optimal vector decoder minimizes
the modulation-symbol error probability Pr(X̂ ≠ X).
Such a decision rule is called optimal.

Sufficient condition for an optimal decision rule

Let dec : R^D → X be a decision rule which minimizes the
a-posteriori probability of making a decision error,

    Pr(dec(y) ≠ X | Y = y),

for every observation y. Then dec is optimal.

Remark:
The a-posteriori probability of a decoding error is the probability
of the event {dec(y) ≠ X} conditioned on the event {Y = y}.

[Figure: the modulation-symbol decision dec maps the observation
Y = y to the estimate X̂ = dec(y) of X.]


Proof:
Let dec : R^D → X denote any decision rule, and let
decopt : R^D → X denote an optimal decision rule. Then we have

    Pr(dec(Y ) ≠ X)
        = ∫_{R^D} Pr(dec(Y ) ≠ X | Y = y) · pY (y) dy
        = ∫_{R^D} Pr(dec(y) ≠ X | Y = y) · pY (y) dy
    (?) ≥ ∫_{R^D} Pr(decopt(y) ≠ X | Y = y) · pY (y) dy
        = ∫_{R^D} Pr(decopt(Y ) ≠ X | Y = y) · pY (y) dy
        = Pr(decopt(Y ) ≠ X).

(?): by definition of decopt, we have

    Pr(dec(y) ≠ X | Y = y) ≥ Pr(decopt(y) ≠ X | Y = y)

for all y ∈ R^D.


2.5 MAP Symbol Decision Rule


Starting with the sufficient condition for an optimal decision rule
given above, we now derive a method for making an optimal decision.
The corresponding decision rule is called the maximum a-posteriori
(MAP) decision rule.

Consider an optimal decision rule

    dec : y ↦ x̂ = dec(y),

where the estimated modulation symbol is denoted by x̂.

As the decision rule is assumed to be optimal, the probability

    Pr(x̂ ≠ X | Y = y)

is minimal.

For any x ∈ X, we have

    Pr(X ≠ x | Y = y) = 1 − Pr(X = x | Y = y)
                      = 1 − pX|Y (x|y)
                      = 1 − pY |X (y|x) pX (x) / pY (y),

where Bayes' rule has been used in the last line. Notice that

    Pr(X = x | Y = y) = pX|Y (x|y)

is the a-posteriori probability of {X = x} given {Y = y}.


Using the above equalities, we can conclude that

    Pr(X ≠ x | Y = y) is minimal
        ⇔ pX|Y (x|y) is maximal
        ⇔ pY |X (y|x) pX (x) is maximal,

since pY (y) does not depend on x.

Thus, the estimated modulation symbol x̂ = dec(y) resulting from
the optimal decision rule satisfies the two equivalent conditions

    pX|Y (x̂|y) ≥ pX|Y (x|y)                    for all x ∈ X;
    pY |X (y|x̂) pX (x̂) ≥ pY |X (y|x) pX (x)    for all x ∈ X.

These two conditions may be used to define the optimal decision
rule.

Maximum a-posteriori (MAP) decision rule:

    x̂ = decMAP(y) := argmax pX|Y (x|y)
                      x∈X
                    = argmax pY |X (y|x) pX (x).
                      x∈X

The decision rule is called the MAP decision rule as the selected
modulation symbol x̂ maximizes the a-posteriori probability density
function pX|Y (x|y). Notice that the two formulations are
equivalent.
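As an illustration, the MAP rule can be sketched in a few lines of code. The 1-D constellation, the priors, and N0 below are illustrative assumptions, not values from the notes; the Gaussian likelihood is the AWGN form used in Section 2.7 below.

```python
import math

# Illustrative 1-D constellation, priors, and noise level -- assumptions
# for this sketch, not values from the notes.
X = [-1.0, +1.0]        # signal points s_m
prior = [0.25, 0.75]    # a-priori probabilities Pr(X = s_m)
N0 = 0.5                # noise spectral density

def dec_map(y):
    """MAP decision: maximize p_{Y|X}(y|x) * Pr(X = x) over x in X.
    For the AWGN vector channel the likelihood is proportional to
    exp(-|y - x|^2 / N0), so this metric is equivalent."""
    def metric(m):
        return math.exp(-abs(y - X[m]) ** 2 / N0) * prior[m]
    return X[max(range(len(X)), key=metric)]

# With unequal priors, the decision boundary shifts towards the less
# probable symbol: a slightly negative observation is still decided as +1.
print(dec_map(-0.05), dec_map(-1.0))
```

With equal priors the same code reduces to the ML rule of Section 2.6.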


2.6 ML Symbol Decision Rule


If the modulation symbols at the output of the vector encoder are
equiprobable, i.e., if

    Pr(X = sm) = 1/M

for m = 1, 2, . . . , M , the MAP decision rule reduces to the
following decision rule.

Maximum likelihood (ML) decision rule:

    x̂ = decML(y) := argmax pY |X (y|x).
                     x∈X

In this case, the selected modulation symbol x̂ maximizes the
likelihood function pY |X (y|x) of x.


2.7 ML Decision for the AWGN Channel


Using the canonical decomposition of the transmitter and of the
receiver, we obtain the AWGN vector channel

    Y = X + W,    W ∼ N (0, (N0/2) I_D),    X ∈ X, Y ∈ R^D.

The conditional probability density function of Y given
X = x ∈ X is

    pY |X (y|x) = (πN0)^{−D/2} exp(−‖y − x‖²/N0).

Since exp(·) is a monotonically increasing function, we obtain the
following formulation of the ML decision rule:

    x̂ = decML(y) = argmax pY |X (y|x)
                    x∈X
                  = argmax exp(−‖y − x‖²/N0)
                    x∈X
                  = argmin ‖y − x‖².
                    x∈X

Thus, the ML decision rule for the AWGN vector channel selects
the vector in the signal constellation X that is the closest one to the
observation y, where the distance measure is the squared Euclidean
distance.


Implementation of the ML decision rule

The squared Euclidean distance can be expanded as

    ‖y − x‖² = ‖y‖² − 2⟨y, x⟩ + ‖x‖².

The term ‖y‖² is irrelevant for the minimization with respect to x.
Thus we get the following equivalent formulation of the ML decision
rule:

    x̂ = decML(y) = argmin {−2⟨y, x⟩ + ‖x‖²}
                    x∈X
                  = argmax {2⟨y, x⟩ − ‖x‖²}
                    x∈X
                  = argmax {⟨y, x⟩ − ½ ‖x‖²}.
                    x∈X

Block diagram of the ML decision rule for the AWGN vector
channel:

[Figure: the observation y is correlated with each signal point
s1, . . . , sM ; from each correlator output the bias ½ ‖sm‖² is
subtracted, and the branch with the largest result is selected.]

Vector correlator: for a, b ∈ R^D,

    ⟨a, b⟩ = Σ_{d=1}^{D} a_d b_d ∈ R.


Remark:
Some signal constellations X have the property that ‖x‖² has the
same value for all x ∈ X. In these cases (but only then), the ML
decision rule simplifies further to
    x̂ = decML(y) = argmax ⟨y, x⟩.
                    x∈X
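The equivalence of the minimum-distance and correlator formulations can be checked numerically. The 2-D constellation below is an illustrative assumption, chosen only to exercise the two metrics:

```python
import random

# Hypothetical 2-D constellation (an assumption, not from the notes),
# used only to verify the algebraic equivalence numerically.
X = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

def dec_min_dist(y):
    # argmin over x of ||y - x||^2
    return min(X, key=lambda x: sum((yi - xi) ** 2 for yi, xi in zip(y, x)))

def dec_correlator(y):
    # argmax over x of <y, x> - 0.5 * ||x||^2
    return max(X, key=lambda x: sum(yi * xi for yi, xi in zip(y, x))
                                - 0.5 * sum(xi * xi for xi in x))

random.seed(0)
for _ in range(1000):
    y = (random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))
    assert dec_min_dist(y) == dec_correlator(y)
print("minimum-distance and correlator rules agree")
```

The agreement follows from ⟨y, x⟩ − ½‖x‖² = (‖y‖² − ‖y − x‖²)/2, which is strictly decreasing in the squared distance.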

2.8 Symbol-Error Probability


The error probability for modulation symbols X, or for short, the
symbol-error probability, is defined as

    Ps = Pr(X̂ ≠ X) = Pr(dec(Y ) ≠ X).

The symbol-error probability coincides with the block-error
probability:

    Ps = Pr(Û ≠ U ) = Pr([Û1, . . . , ÛK ] ≠ [U1, . . . , UK ]).

We compute Ps in two steps:

(i) Compute the conditional symbol-error probability given
X = x for every x ∈ X:

    Pr(X̂ ≠ X | X = x).

(ii) Average over the signal constellation:

    Pr(X̂ ≠ X) = Σ_{x∈X} Pr(X̂ ≠ X | X = x) · Pr(X = x).


Remember: Decision regions for the symbol decision rule:

[Figure: the observation space R^D partitioned into decision
regions D1, D2, . . . , DM , one region per signal point sm ∈ X
(as in Section 2.3).]

Assume that x = sm is transmitted. Then,

    Pr(X̂ ≠ X | X = sm) = Pr(X̂ ≠ sm | X = sm)
                        = Pr(Y ∉ Dm | X = sm)
                        = 1 − Pr(Y ∈ Dm | X = sm).

For the AWGN vector channel, the conditional probability density
function of Y given X = sm is

    pY |X (y|sm) = (πN0)^{−D/2} exp(−‖y − sm‖²/N0).

Hence,

    Pr(Y ∈ Dm | X = sm) = ∫_{Dm} pY |X (y|sm) dy
                        = (πN0)^{−D/2} ∫_{Dm} exp(−‖y − sm‖²/N0) dy.

The symbol-error probability is then obtained by averaging over
the signal constellation:

    Pr(X̂ ≠ X) = 1 − Σ_{m=1}^{M} Pr(Y ∈ Dm | X = sm) · Pr(X = sm).


If the modulation symbols are equiprobable, the formula simplifies
to

    Pr(X̂ ≠ X) = 1 − (1/M) Σ_{m=1}^{M} Pr(Y ∈ Dm | X = sm).
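The two-step computation above can also be approximated by simulation: draw equiprobable symbols, add Gaussian noise, and count ML decision errors. The 4-ary 1-D constellation and N0 below are illustrative assumptions:

```python
import math, random

# Monte-Carlo estimate of Ps for ML (nearest-neighbour) decisions on the
# AWGN vector channel.  The 4-ary 1-D constellation and N0 are
# illustrative assumptions, not values from the notes.
X = [-3.0, -1.0, 1.0, 3.0]
N0 = 2.0
sigma = math.sqrt(N0 / 2)          # per-dimension noise standard deviation

random.seed(1)
trials, errors = 20000, 0
for _ in range(trials):
    x = random.choice(X)                     # equiprobable symbols
    y = x + random.gauss(0.0, sigma)         # AWGN vector channel (D = 1)
    x_hat = min(X, key=lambda s: (y - s) ** 2)
    errors += (x_hat != x)
# empirical average over the signal constellation
print("estimated Ps:", errors / trials)
```

For these values the estimate should lie close to the closed-form MPAM result of Section 3.2.5, which gives roughly 0.24 here.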

2.9 Bit-Error Probability


The error probability for the source bits Uk , or for short, the
bit-error probability, is defined as

    Pb = (1/K) Σ_{k=1}^{K} Pr(Ûk ≠ Uk ),

i.e., as the average fraction of erroneous bits in the block
Û = [Û1, . . . , ÛK ].

Relation between the bit-error probability and the
symbol-error probability

Clearly, the relation between Pb and Ps depends on the encoding
function

    enc : u = [u1, . . . , uK ] ↦ x = enc(u).

The analytical derivation of the relation between Pb and Ps is
usually cumbersome. However, we can derive an upper bound
and a lower bound for Pb when Ps is given (and also vice versa).
These bounds are valid for any encoding function.

The lower bound provides a guideline for the design of a "good"
encoding function.


Lower bound:
If a symbol is erroneous, there is at least one erroneous bit in the
block of K bits. Therefore

    Pb ≥ (1/K) Ps .

Upper bound:
If a symbol is erroneous, there are at most K erroneous bits in the
block of K bits. Therefore

    Pb ≤ Ps .

Combination:
Combining the lower bound and the upper bound yields

    (1/K) Ps ≤ Pb ≤ Ps .

An efficient encoder should achieve the lower bound for the
bit-error probability, i.e.,

    Pb ≈ (1/K) Ps .

Remark:
The bounds can also be obtained by considering the union bound
of the events

    {Û1 ≠ U1}, {Û2 ≠ U2}, . . . , {ÛK ≠ UK },

i.e., by relating the probability

    Pr({Û1 ≠ U1} ∪ {Û2 ≠ U2} ∪ · · · ∪ {ÛK ≠ UK })

to the probabilities

    Pr({Û1 ≠ U1}), Pr({Û2 ≠ U2}), . . . , Pr({ÛK ≠ UK }).


2.10 Gray Encoding


In Gray encoding, blocks of bits u = [u1, . . . , uK ] which differ in
only one bit uk are assigned to neighboring vectors (modulation
symbols) x in the signal constellation X.

Example: 4PAM (Pulse Amplitude Modulation)

Signal constellation: X = {[−3], [−1], [+1], [+3]}.

    bit block:     00    01    11    10
    signal point:  s1    s2    s3    s4
                   −3    −1    +1    +3

Motivation
When a decision error occurs, it is very likely that the transmitted
symbol and the detected symbol are neighbors. When the bit
blocks of neighboring symbols differ only in one bit, there will be
only one bit error per symbol error.

Effect
Gray encoding achieves
    Pb ≈ (1/K) Ps

for high signal-to-noise ratios (SNR).
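A standard way to generate such a labeling is the binary-reflected Gray code (an assumption here; the notes do not prescribe a particular construction):

```python
# Binary-reflected Gray code -- a standard construction (an assumption;
# the notes do not prescribe a particular encoding function).
def gray(m):
    return m ^ (m >> 1)

K = 3
codes = [gray(m) for m in range(2 ** K)]
print([format(c, "03b") for c in codes])

# Labels of neighbouring signal points differ in exactly one bit:
for a, b in zip(codes, codes[1:]):
    assert bin(a ^ b).count("1") == 1
```

For K = 3 this reproduces the labeling 000, 001, 011, 010, 110, 111, 101, 100 used for 8PAM in Section 3.2.2.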


Sketch of the proof:

We consider the case K = 2, i.e., U = [U1, U2].

    Ps = Pr(X̂ ≠ X)
       = Pr([Û1, Û2] ≠ [U1, U2])
       = Pr({Û1 ≠ U1} ∪ {Û2 ≠ U2})
       = Pr(Û1 ≠ U1) + Pr(Û2 ≠ U2) − Pr({Û1 ≠ U1} ∩ {Û2 ≠ U2}).

The first two terms sum to 2 · Pb ; the intersection term is small
compared to the other two terms if the SNR is high. Hence

    Ps ≈ 2 · Pb .


Part II
Digital Communication Techniques


3 Pulse Amplitude Modulation (PAM)

In pulse amplitude modulation (PAM), the information is conveyed
by the amplitude of the transmitted waveforms.

3.1 BPAM
PAM with only two waveforms is called binary PAM (BPAM).

3.1.1 Signaling Waveforms


Set of two rectangular waveforms with amplitudes A and −A over
t ∈ [0, T ], S := {s1(t), s2(t)}:

[Figure: s1(t) = A and s2(t) = −A for t ∈ [0, T ], zero elsewhere.]

As |S| = 2, one bit is sufficient to address the waveforms (K = 1).
Thus, the BPAM transmitter performs the mapping

    U ↦ x(t)

with U ∈ U = {0, 1} and x(t) ∈ S.

Remark: In the general case, the basic waveform may also have
another shape.


3.1.2 Decomposition of the Transmitter

Gram-Schmidt procedure

    ψ1(t) = (1/√E1) s1(t)

with

    E1 = ∫_0^T s1(t)² dt = ∫_0^T A² dt = A² · T.

Thus

    ψ1(t) = 1/√T   for t ∈ [0, T ],
            0      otherwise.

Both s1(t) and s2(t) are a linear combination of ψ1(t):

    s1(t) = +√Es ψ1(t) = +A√T ψ1(t),
    s2(t) = −√Es ψ1(t) = −A√T ψ1(t),

where √Es = A√T and Es = E1 = E2. (The two waveforms
have the same energy.)

Signal constellation
Vector representation of the two waveforms:

    s1(t) ↔ s1 = +√Es ,
    s2(t) ↔ s2 = −√Es .

Thus the signal constellation is X = {−√Es , +√Es }.

[Figure: the two signal points s2 = −√Es and s1 = +√Es on the
ψ1-axis.]


Vector Encoder
The vector encoder is usually defined as

    x = enc(u) = +√Es   for u = 0,
                 −√Es   for u = 1.

The signal constellation comprises only two waveforms, and thus
per symbol and per bit, the average transmit energies and the
SNRs are the same:

    Eb = Es ,    γs = γb .

3.1.3 ML Decoding

ML symbol decision rule

    decML(y) = argmin ‖y − x‖² = +√Es   for y ≥ 0,
               x∈{−√Es,+√Es}     −√Es   for y < 0.

Decision regions

    D1 = {y ∈ R : y ≥ 0},    D2 = {y ∈ R : y < 0}.

[Figure: the real line split at 0 into D2 (containing s2 = −√Es)
and D1 (containing s1 = +√Es).]

Remark: The decision rule

    decML(y) = +√Es   for y > 0,
               −√Es   for y ≤ 0

is also an ML decision rule.


The mapping of the encoder inverse reads

    û = enc−1(x̂) = 0   for x̂ = +√Es ,
                    1   for x̂ = −√Es .

Combining the (modulation) symbol decision rule and the encoder
inverse yields the ML decoder for BPAM:

    û = 0   for y ≥ 0,
        1   for y < 0.

3.1.4 Vector Representation of the Transceiver

BPAM transceiver:

[Figure: the encoder maps U ∈ {0, 1} to X = ±√Es ; X modulates
ψ1(t) to form X(t); the channel adds W (t); the receiver computes
Y = (1/√T ) ∫_0^T Y (t) dt, i.e., it correlates Y (t) with ψ1(t); the
decoder maps D1 ↦ 0, D2 ↦ 1.]

Vector representation:

[Figure: U → encoder (0 ↦ +√Es , 1 ↦ −√Es ) → X;
Y = X + W with W ∼ N (0, N0/2); Y → decoder (D1 ↦ 0,
D2 ↦ 1) → Û.]


3.1.5 Symbol-Error Probability

Conditional probability densities

    pY |X (y | +√Es) = (1/√(πN0)) exp(−|y − √Es|²/N0),
    pY |X (y | −√Es) = (1/√(πN0)) exp(−|y + √Es|²/N0).

Conditional symbol-error probability

Assume that X = x = +√Es is transmitted.

    Pr(X̂ ≠ X | X = +√Es) = Pr(Y ∈ D2 | X = +√Es)
                          = Pr(Y < 0 | X = +√Es)
                          = (1/√(πN0)) ∫_{−∞}^{0} exp(−|y − √Es|²/N0) dy
                          = (1/√(2π)) ∫_{−∞}^{−√(2Es/N0)} exp(−z²/2) dz,

where the last equality results from substituting

    z = (y − √Es)/√(N0/2).

The Q-function is defined as

    Q(v) := ∫_v^∞ q(z) dz,    q(z) = (1/√(2π)) exp(−z²/2).


The function q(z) denotes a Gaussian pdf with zero mean and
variance 1; i.e., it may be interpreted as the pdf of a random
variable Z ∼ N (0, 1).
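Numerically, Q(·) is conveniently evaluated through the complementary error function via the standard identity Q(v) = ½ erfc(v/√2):

```python
import math

# Q(v) = 0.5 * erfc(v / sqrt(2)) -- a standard identity relating the
# Q-function to the complementary error function.
def Q(v):
    return 0.5 * math.erfc(v / math.sqrt(2))

print(Q(0.0))   # half the probability mass of N(0, 1) lies above 0
```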

Using this definition, the conditional symbol-error probabilities
may be written as

    Pr(X̂ ≠ X | X = +√Es) = Q(√(2Es/N0)),
    Pr(X̂ ≠ X | X = −√Es) = Q(√(2Es/N0)).

The second equation follows from symmetry.

Symbol-error probability
The symbol-error probability results from averaging over the
conditional symbol-error probabilities:

    Ps = Pr(X̂ ≠ X)
       = Pr(X = +√Es) · Pr(X̂ ≠ X | X = +√Es)
         + Pr(X = −√Es) · Pr(X̂ ≠ X | X = −√Es)
       = Q(√(2Es/N0)) = Q(√(2γs)),

as Pr(X = +√Es) = Pr(X = −√Es) = 1/2.

Alternative method to derive the conditional symbol-error
probability

Assume again that X = +√Es is transmitted. Instead of
considering the conditional pdf of Y , we directly use the pdf of
the white Gaussian noise W .


    Pr(X̂ ≠ X | X = +√Es) = Pr(Y < 0 | X = +√Es)
                          = Pr(√Es + W < 0 | X = +√Es)
                          = Pr(W < −√Es | X = +√Es)
                          = Pr(W < −√Es)
                          = Pr(W ′ < −√(2Es/N0))
                          = Q(√(2Es/N0)),

where we substituted W ′ = W/√(N0/2); notice that
W ′ ∼ N (0, 1). A plot of the symbol-error probability can be
found in [1, p. 412].

The initial event (here {Y < 0}) is transformed into an equivalent
event (here {W ′ < −√(2Es/N0)}), whose probability can easily
be calculated. This method is frequently applied to determine
symbol-error probabilities.

3.1.6 Bit-Error Probability

In BPAM, each waveform conveys one source bit. Accordingly,
each symbol error leads to exactly one bit error, and thus

    Pb = Pr(Û ≠ U ) = Pr(X̂ ≠ X) = Ps .

For the same reason, the transmit energy per symbol is equal
to the transmit energy per bit, and so the SNR per symbol is also
equal to the SNR per bit:

    Eb = Es ,    γs = γb .

Therefore, the bit-error probability (BEP) of BPAM results as

    Pb = Q(√(2γb)).

Notice: Pb ≈ 10^{−5} is achieved for γb ≈ 9.6 dB.
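The closed form can be checked against a simulation of the vector representation; Es = 1 and the 4 dB SNR below are arbitrary illustrative choices:

```python
import math, random

def Q(v):
    return 0.5 * math.erfc(v / math.sqrt(2))

# Simulated BPAM over the AWGN vector channel versus the closed form
# Pb = Q(sqrt(2*gamma_b)).  Es = 1 and the SNR are arbitrary choices.
gamma_b = 10 ** (4.0 / 10)          # 4 dB SNR per bit
Es = 1.0
N0 = Es / gamma_b
sigma = math.sqrt(N0 / 2)

random.seed(2)
trials, errors = 50000, 0
for _ in range(trials):
    u = random.randint(0, 1)
    x = math.sqrt(Es) if u == 0 else -math.sqrt(Es)   # vector encoder
    y = x + random.gauss(0.0, sigma)                  # AWGN vector channel
    u_hat = 0 if y >= 0 else 1                        # ML decoder
    errors += (u_hat != u)

print("simulated Pb      :", errors / trials)
print("Q(sqrt(2*gamma_b)):", Q(math.sqrt(2 * gamma_b)))
```

The two numbers should agree up to Monte-Carlo fluctuations.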


3.2 MPAM

In the general case of M -ary pulse amplitude modulation (MPAM)
with M = 2^K , input blocks of K bits modulate the amplitude of
a unique pulse.

3.2.1 Signaling Waveforms

Waveform encoder:

    u = [u1, . . . , uK ] ↦ x(t)
    U = {0, 1}^K → S = {s1(t), . . . , sM (t)},

where

    sm(t) = (2m − M − 1) A g(t)   for t ∈ [0, T ],
            0                     otherwise,

A ∈ R, and g(t) is a predefined pulse of duration T ,
g(t) = 0 for t ∉ [0, T ].

Values of the factors (2m − M − 1)A that weight g(t):

    −(M − 1)A, −(M − 3)A, . . . , −A, A, . . . , (M − 3)A, (M − 1)A.

Example: 4PAM with rectangular pulse.

Using the rectangular pulse

    g(t) = 1   for t ∈ [0, T ],
           0   otherwise,

the resulting waveforms are

    S = {−3Ag(t), −Ag(t), Ag(t), 3Ag(t)}.


3.2.2 Decomposition of the Transmitter

Gram-Schmidt procedure:
Orthonormal function

    ψ(t) = (1/√E1) s1(t) = (1/√Eg) g(t),

with E1 = ∫_0^T s1(t)² dt and Eg = ∫_0^T g(t)² dt.

Every waveform in S is a linear combination of (only) ψ(t):

    sm(t) = (2m − M − 1) A√Eg ψ(t) = (2m − M − 1) d ψ(t),

where d := A√Eg .

Signal constellation

    s1(t)     ↔ s1 = −(M − 1)d
    s2(t)     ↔ s2 = −(M − 3)d
      ···
    sm(t)     ↔ sm = (2m − M − 1)d
      ···
    sM−1(t)   ↔ sM−1 = (M − 3)d
    sM (t)    ↔ sM = (M − 1)d

    X = {s1, s2, . . . , sM }
      = {−(M − 1)d, −(M − 3)d, . . . , −d, d, . . . , (M − 3)d, (M − 1)d}.


Example: M = 2, 4, 8 (Gray labeling)

    M = 2:   0 ↔ s1 ,  1 ↔ s2 .
    M = 4:   00 ↔ s1 ,  01 ↔ s2 ,  11 ↔ s3 ,  10 ↔ s4 .
    M = 8:   000 ↔ s1 ,  001 ↔ s2 ,  011 ↔ s3 ,  010 ↔ s4 ,
             110 ↔ s5 ,  111 ↔ s6 ,  101 ↔ s7 ,  100 ↔ s8 .

Vector Encoder
The vector encoder

    enc : u = [u1, . . . , uK ] ↦ x = enc(u)
          U = {0, 1}^K → X = {−(M − 1)d, . . . , (M − 1)d}

is any Gray encoding function.

Energy and SNR

The source bits are generated by a BSS, and so the symbols of the
signal constellation X are equiprobable:

    Pr(X = sm) = 1/M

for all m = 1, 2, . . . , M .


The average symbol energy results as

    Es = E[X²] = (1/M) Σ_{m=1}^{M} (sm)²
       = (d²/M) Σ_{m=1}^{M} (2m − M − 1)²
       = (d²/M) · M (M² − 1)/3.

Thus,

    Es = (M² − 1)/3 · d²    ⇐⇒    d = √(3Es/(M² − 1)).

As M = 2^K , the average energy per transmitted source bit results
as

    Eb = Es/K = (M² − 1)/(3 log2 M ) · d²
        ⇐⇒    d = √(3 log2 M · Eb/(M² − 1)).

The SNR per symbol and the SNR per bit are related as

    γb = (1/K) γs = γs / log2 M .
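A quick numerical check of the relation d = √(3Es/(M² − 1)): building the constellation with this d indeed yields average symbol energy Es (the values of M and Es below are arbitrary):

```python
import math

# Numerical check: with d = sqrt(3*Es/(M^2 - 1)), the equiprobable MPAM
# constellation has average symbol energy Es (M and Es are arbitrary).
M, Es = 8, 1.0
d = math.sqrt(3 * Es / (M ** 2 - 1))
X = [(2 * m - M - 1) * d for m in range(1, M + 1)]
avg_energy = sum(s ** 2 for s in X) / M
print(avg_energy)
```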


3.2.3 ML Decoding

ML symbol decision rule

    decML(y) = argmin ‖y − x‖²
                x∈X

Decision regions
Example: Decision regions for M = 4:

    D1 = {y ≤ −2d},  D2 = {−2d < y ≤ 0},
    D3 = {0 < y ≤ 2d},  D4 = {y > 2d},

with signal points s1 = −3d, s2 = −d, s3 = d, s4 = 3d.

Starting from the special cases M = 2, 4, 8, the decision regions
can be inferred:

    D1 = {y ∈ R : y ≤ −(M − 2)d},
    D2 = {y ∈ R : −(M − 2)d < y ≤ −(M − 4)d},
      ···
    Dm = {y ∈ R : (2m − M − 2)d < y ≤ (2m − M )d},
      ···
    DM−1 = {y ∈ R : (M − 4)d < y ≤ (M − 2)d},
    DM = {y ∈ R : (M − 2)d < y}.

Accordingly, the ML symbol decision rule may also be written as

    decML(y) = sm   for y ∈ Dm .

ML decoder for MPAM

    û = enc−1(decML(y))


3.2.4 Vector Representation of the Transceiver

MPAM transceiver:

[Figure: the Gray encoder maps bit blocks um to symbols sm ; X
modulates ψ(t); the channel adds W (t); the receiver correlates
Y (t) with ψ(t); the decoder maps Dm ↦ um .]

Vector representation:

[Figure: U → Gray encoder (um ↦ sm ) → X; Y = X + W with
W ∼ N (0, N0/2); Y → decoder (Dm ↦ um ) → Û.]

Notice: um denotes the block of source bits corresponding to sm ,
i.e.,

    sm = enc(um),    um = enc−1(sm).


3.2.5 Symbol-Error Probability

Conditional symbol-error probability

Two cases must be considered:

1. The transmitted symbol is an edge symbol (a symbol at the
edge of the signal constellation):

    X ∈ {s1, sM }.

2. The transmitted symbol is not an edge symbol:

    X ∈ {s2, s3, . . . , sM−1}.

Case 1: X ∈ {s1, sM }
Assume first X = s1. Then

    Pr(X̂ ≠ X | X = s1) = Pr(Y ∉ D1 | X = s1)
                        = Pr(Y > s1 + d | X = s1)
                        = Pr(W > d)
                        = Pr(W/√(N0/2) > d/√(N0/2))
                        = Q(d/√(N0/2)).

Notice that W/√(N0/2) ∼ N (0, 1).
By symmetry,

    Pr(X̂ ≠ X | X = sM ) = Q(d/√(N0/2)).


Case 2: X ∈ {s2, . . . , sM−1}
Assume X = sm. Then

    Pr(X̂ ≠ X | X = sm) = Pr(Y ∉ Dm | X = sm)
                        = Pr(W < −d or W > d)
                        = Pr(W < −d) + Pr(W > d)
                        = Q(d/√(N0/2)) + Q(d/√(N0/2))
                        = 2 · Q(d/√(N0/2)).

Symbol-error probability
Averaging over the symbols, one obtains

    Ps = (1/M) Σ_{m=1}^{M} Pr(X̂ ≠ X | X = sm)
       = (2/M) Q(d/√(N0/2)) + (2(M − 2)/M) Q(d/√(N0/2))
       = 2 (M − 1)/M · Q(d/√(N0/2)).

Expressing the symbol-error probability as a function of the SNR
per symbol, γs, or the SNR per bit, γb, yields

    Ps = 2 (M − 1)/M · Q(√(6γs/(M² − 1)))
       = 2 (M − 1)/M · Q(√(6 log2 M · γb/(M² − 1))).


Plots of the symbol-error probabilities can be found in [1, p. 412].

Remark: When comparing BPAM and 4PAM for the same Ps,
the SNRs per bit, γb, differ by 10 log10(5/2) ≈ 4 dB. (Proof?)
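The 4 dB figure follows from equating the arguments of Q(·) in the two symbol-error formulas, 2γb for BPAM and 6 log2 M/(M² − 1) · γb for 4PAM, while ignoring the different prefactors (a high-SNR approximation):

```python
import math

# High-SNR comparison of BPAM and 4PAM at equal Ps: equate the
# Q-function arguments 2*gamma_b (BPAM) and
# 6*log2(M)/(M^2 - 1)*gamma_b (MPAM), ignoring the prefactors of Q(.).
M = 4
ratio = 2.0 / (6 * math.log2(M) / (M ** 2 - 1))   # = 5/2
print(10 * math.log10(ratio), "dB")
```

The printed value is 10 log10(5/2) ≈ 3.98 dB, i.e., roughly the 4 dB quoted above.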

3.2.6 Bit-Error Probability

When Gray encoding is used, we have the relation

    Pb ≈ Ps / log2 M
       = 2 (M − 1)/(M log2 M ) · Q(√(6 log2 M · γb/(M² − 1)))

for high SNR γb = Eb/N0.


4 Pulse Position Modulation (PPM)


In pulse position modulation (PPM), the information is conveyed
by the position of the transmitted waveforms.

4.1 BPPM
PPM with only two waveforms is called binary PPM (BPPM).

4.1.1 Signaling Waveforms


Set of two rectangular waveforms with amplitude A and duration
T /2 over t ∈ [0, T ], S := {s1(t), s2(t)}:

[Figure: s1(t) = A on [0, T /2] and zero elsewhere; s2(t) = A on
[T /2, T ] and zero elsewhere.]

As the cardinality of the set is |S| = 2, one bit is sufficient to
address each waveform, i.e., K = 1. Thus, the BPPM transmitter
performs the mapping

    U ↦ x(t)

with U ∈ U = {0, 1} and x(t) ∈ S.

Remark: In the general case, the basic waveform may also have
another shape.


4.1.2 Decomposition of the Transmitter

Gram-Schmidt procedure
The waveforms s1(t) and s2(t) are orthogonal and have the same
energy

    Es = ∫_0^T s1²(t) dt = ∫_0^T s2²(t) dt = A² · T /2.

Notice that √Es = A√(T /2).
The two orthonormal functions and the corresponding waveform
representations are

    ψ1(t) = (1/√Es) s1(t),    s1(t) = √Es · ψ1(t) + 0 · ψ2(t),
    ψ2(t) = (1/√Es) s2(t),    s2(t) = 0 · ψ1(t) + √Es · ψ2(t).

Signal constellation
Vector representation of the two waveforms:

    s1(t) ↔ s1 = (√Es , 0)^T,
    s2(t) ↔ s2 = (0, √Es )^T.

Thus the signal constellation is X = {(√Es , 0)^T, (0, √Es )^T}.

[Figure: the two orthogonal signal points on the ψ1- and ψ2-axes,
at distance √(2Es) from each other.]


Vector Encoder
The vector encoder is defined as

    x = enc(u) = (√Es , 0)^T   for u = 0,
                 (0, √Es )^T   for u = 1.

The signal constellation comprises only two waveforms, and thus
per (modulation) symbol and per (source) bit, the average transmit
energies and the SNRs are the same:

    Eb = Es ,    γs = γb .

4.1.3 ML Decoding

ML symbol decision rule

    decML(y) = argmin ‖y − x‖² = (√Es , 0)^T   for y1 ≥ y2,
               x∈{s1,s2}         (0, √Es )^T   for y1 < y2.

Notice: ‖y − s1‖² ≤ ‖y − s2‖² ⇔ y1 ≥ y2.

Decision regions

    D1 = {y ∈ R² : y1 ≥ y2},    D2 = {y ∈ R² : y1 < y2}.

[Figure: the (y1, y2)-plane split along the diagonal y1 = y2 ;
s1 lies in D1, s2 lies in D2.]


Combining the (modulation) symbol decision rule and the encoder
inverse yields the ML decoder for BPPM:

    û = enc−1(decML(y)) = 0   for y1 ≥ y2,
                          1   for y1 < y2.

[Figure: the ML decoder forms Z = Y1 − Y2 and outputs Û = 0
for Z ≥ 0 and Û = 1 for Z < 0.]

Notice that Z = Y1 − Y2 is then the decision variable.


4.1.4 Vector Representation of the Transceiver

BPPM transceiver for the waveform AWGN channel:

[Figure: the vector encoder maps U to (X1, X2) = (√Es , 0)^T or
(0, √Es )^T ; X1 and X2 modulate ψ1(t) and ψ2(t); the channel
adds W (t); the receiver computes Y1 = √(2/T ) ∫_0^{T /2} Y (t) dt
and Y2 = √(2/T ) ∫_{T /2}^{T} Y (t) dt; the vector decoder maps
D1 ↦ 0, D2 ↦ 1.]

Vector representation including the AWGN vector channel:

[Figure: U → vector encoder → (X1, X2); Ym = Xm + Wm with
W1, W2 ∼ N (0, N0/2) independent; (Y1, Y2) → vector decoder
(D1 ↦ 0, D2 ↦ 1) → Û.]


4.1.5 Symbol-Error Probability

Conditional symbol-error probability

Assume that X = s1 = (√Es , 0)^T is transmitted.

    Pr(X̂ ≠ X | X = s1) = Pr(Y ∈ D2 | X = s1)
                        = Pr(s1 + W ∈ D2 | X = s1)
                        = Pr(s1 + W ∈ D2).

The noise vector W may be represented in the coordinate system
(ξ1, ξ2) or in the rotated coordinate system (ξ1′, ξ2′). The latter is
more suited to computing the above probability, as only the noise
component along the ξ2′-axis is relevant.

[Figure: the coordinate system rotated by α = π/4 so that the
ξ1′-axis is parallel to the decision boundary y1 = y2 ; the distance
from s1 to the boundary is √(Es/2).]

The elements of the noise vector in the rotated coordinate system
have the same statistical properties as in the non-rotated
coordinate system (see Appendix B), i.e.,

    W1′ ∼ N (0, N0/2),    W2′ ∼ N (0, N0/2).


Thus, we obtain

    Pr(X̂ ≠ X | X = s1) = Pr(W2′ > √(Es/2))
                        = Pr(W2′/√(N0/2) > √(Es/N0))
                        = Q(√(Es/N0)) = Q(√γs).

By symmetry,

    Pr(X̂ ≠ X | X = s2) = Q(√γs).

Symbol-error probability
Averaging over the conditional symbol-error probabilities, we
obtain

    Ps = Pr(X̂ ≠ X) = Q(√γs).
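A simulation of the two-dimensional vector representation confirms Ps = Q(√γs) without the rotation argument; Es = 1 and the 7 dB SNR below are arbitrary illustrative choices:

```python
import math, random

def Q(v):
    return 0.5 * math.erfc(v / math.sqrt(2))

# Simulated BPPM over the 2-D AWGN vector channel, decided via the
# decision variable Z = Y1 - Y2, versus Ps = Q(sqrt(gamma_s)).
# Es = 1 and the SNR are arbitrary illustrative choices.
Es = 1.0
gamma_s = 10 ** (7.0 / 10)          # 7 dB SNR per symbol
N0 = Es / gamma_s
sigma = math.sqrt(N0 / 2)

random.seed(3)
trials, errors = 50000, 0
for _ in range(trials):
    u = random.randint(0, 1)
    x = (math.sqrt(Es), 0.0) if u == 0 else (0.0, math.sqrt(Es))
    y1 = x[0] + random.gauss(0.0, sigma)
    y2 = x[1] + random.gauss(0.0, sigma)
    u_hat = 0 if y1 - y2 >= 0 else 1
    errors += (u_hat != u)

print("simulated Ps    :", errors / trials)
print("Q(sqrt(gamma_s)):", Q(math.sqrt(gamma_s)))
```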

4.1.6 Bit-Error Probability

For BPPM, Eb = Es, γs = γb, and Pb = Ps (see above), and so the
bit-error probability (BEP) results as

    Pb = Pr(Û ≠ U ) = Q(√γb).

A plot of the BEP versus γb may be found in [1, p. 426].

Remark
Due to their signal constellations, BPPM is called orthogonal
signaling and BPAM is called antipodal signaling. Their BEPs are
given by

    Pb,BPPM = Q(√γb),    Pb,BPAM = Q(√(2γb)).

When comparing the SNR for a certain BEP, BPPM requires 3 dB
more than BPAM. For a plot, see [1, p. 409].
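The 3 dB gap is just the factor of two between the Q-function arguments γb and 2γb:

```python
import math

# Antipodal vs. orthogonal signalling: Q(sqrt(2*gamma_b)) vs. Q(sqrt(gamma_b)).
# Equal BEP requires doubling gamma_b for BPPM, i.e., a factor of 2 in SNR.
print(10 * math.log10(2.0), "dB")
```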


4.2 MPPM

In the general case of M -ary pulse position modulation (MPPM)
with M = 2^K , input blocks of K bits determine the position of a
unique pulse.

4.2.1 Signaling Waveforms

Waveform encoder

    u = [u1, . . . , uK ] ↦ x(t)
    U = {0, 1}^K → S = {s1(t), . . . , sM (t)},

where

    sm(t) = g(t − (m − 1)T /M ),

m = 1, 2, . . . , M , and g(t) is a predefined pulse of duration T /M ,
g(t) = 0 for t ∉ [0, T /M ].

The waveforms are orthogonal and have the same energy as g(t),

    Es = Eg = ∫_0^T g²(t) dt.

Example: (M = 4)
Consider a rectangular pulse g(t) with amplitude A and duration
T /4. Then, we have the set of four waveforms
S = {s1(t), s2(t), s3(t), s4(t)}:

[Figure: four rectangular pulses of amplitude A, each occupying
one of the four quarters of the interval [0, T ].]


4.2.2 Decomposition of the Transmitter

Gram-Schmidt procedure
As all waveforms are orthogonal, we have M orthonormal
functions:

    ψm(t) = (1/√Es) sm(t),

m = 1, 2, . . . , M .
The waveforms and the orthonormal functions are related as

    sm(t) = √Es · ψm(t),

m = 1, 2, . . . , M .

Signal constellation
Waveforms and their vector representations (vectors of length M ):

    s1(t) ↔ s1 = (√Es , 0, 0, . . . , 0)^T,
    s2(t) ↔ s2 = (0, √Es , 0, . . . , 0)^T,
      ...
    sM (t) ↔ sM = (0, 0, 0, . . . , √Es )^T.

Signal constellation:

    X = {s1, s2, . . . , sM } ⊂ R^M .

The modulation symbols x ∈ X are orthogonal, and they are
placed on corners of an M -dimensional hypercube.


Example: M = 3
The signal constellation

    X = {(√Es , 0, 0)^T, (0, √Es , 0)^T, (0, 0, √Es )^T} ⊂ R³

may be visualized using a cube with side length √Es .

Vector Encoder

    enc : u = [u1, . . . , uK ] ↦ x = enc(u)
          U = {0, 1}^K → X = {s1, s2, . . . , sM }

Energy and SNR

The symbol energy is Es, and as M = 2^K , the average energy per
transmitted source bit results as

    Eb = Es/K .

The SNR per symbol and the SNR per bit are related as

    γb = (1/K) γs = γs / log2 M .


4.2.3 ML Decoding

ML symbol decision rule

    decML(y) = argmin ‖y − x‖²
                x∈X
             = argmin {‖y‖² − 2⟨y, x⟩ + ‖x‖²}
                x∈X
             = argmax ⟨y, x⟩,
                x∈X

since ‖x‖² = Es is the same for all x ∈ X.

ML decoder for MPPM

    û = enc−1(decML(y))

Implementation of the ML detector using vector correlators:

[Figure: the observation y is correlated with each of s1, . . . , sM ;
the symbol x̂ with the largest correlation ⟨y, sm⟩ is selected and
mapped through enc−1 to û = [û1, . . . , ûK ].]


4.2.4 Symbol-Error Probability

Conditional symbol-error probability

Assume first that X = s1 is transmitted. Then

    Pr(X̂ ≠ X | X = s1) = 1 − Pr(X̂ = X | X = s1)
        = 1 − Pr(⟨y, s1⟩ ≥ ⟨y, sm⟩ for all m ≠ 1 | X = s1).

For X = s1 = (√Es , 0, . . . , 0)^T, the received vector is

    Y = s1 + W = (√Es , 0, . . . , 0)^T + (W1, . . . , WM )^T,

so that

    ⟨y, sm⟩ = Es + √Es · W1   for m = 1,
              √Es · Wm        for m = 2, . . . , M .

Therefore, the conditional symbol-error probability may be written
as follows:

    Pr(X̂ ≠ X | X = s1)
        = 1 − Pr(Es + √Es W1 ≥ √Es Wm for all m ≠ 1)
        = 1 − Pr(√(Es/(N0/2)) + W1/√(N0/2) ≥ Wm/√(N0/2)
                 for all m ≠ 1)
        = 1 − Pr(Wm′ ≤ √(2Es/N0) + W1′ for all m ≠ 1)

with W ′ = (W1′, . . . , WM′ )^T ∼ N (0, I), where Wm′ = Wm/√(N0/2),
i.e., the components Wm′ are independent and Gaussian distributed
with zero mean and variance 1.


The probability is now written using the pdf of W1′ :

    Pr(Wm′ ≤ √(2Es/N0) + W1′ for all m ≠ 1)
        = ∫_{−∞}^{+∞} Pr(Wm′ ≤ √(2Es/N0) + W1′ for all m ≠ 1 | W1′ = v)
                      · pW1′(v) dv.

The events corresponding to different values of m are independent,
and thus the conditional probability in the integrand equals

    Π_{m=2}^{M} Pr(Wm′ ≤ √(2Es/N0) + v) = [1 − Q(√(2γs) + v)]^{M−1},

since Pr(Wm′ ≤ a) = 1 − Q(a) for Wm′ ∼ N (0, 1).

Using this result and inserting pW1′(v), we obtain the conditional
symbol-error probability

    Pr(X̂ ≠ X | X = s1)
        = 1 − ∫_{−∞}^{+∞} [1 − Q(√(2γs) + v)]^{M−1} · (1/√(2π)) exp(−v²/2) dv.

Symbol-error probability
The above expression is independent of s1 and is thus valid for all
x ∈ X. Therefore, the average symbol-error probability results as

    Ps = 1 − ∫_{−∞}^{+∞} [1 − Q(√(2γs) + v)]^{M−1} · (1/√(2π)) exp(−v²/2) dv.
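The integral has no closed form, but it is easy to evaluate numerically; the sketch below uses the trapezoidal rule with truncated integration limits. As a sanity check, for M = 2 the result must reduce to the binary orthogonal (BPPM) value Q(√γs):

```python
import math

def Q(v):
    return 0.5 * math.erfc(v / math.sqrt(2))

def phi(v):
    # standard normal pdf
    return math.exp(-0.5 * v * v) / math.sqrt(2 * math.pi)

def ps_mppm(M, gamma_s, lo=-10.0, hi=10.0, n=8000):
    """Ps = 1 - integral of [1 - Q(sqrt(2*gamma_s) + v)]^(M-1) * phi(v) dv,
    evaluated by the trapezoidal rule with truncated limits."""
    a = math.sqrt(2 * gamma_s)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        v = lo + i * h
        f = (1 - Q(a + v)) ** (M - 1) * phi(v)
        total += f if 0 < i < n else 0.5 * f
    return 1 - h * total

# Sanity check: for M = 2 this reduces to the BPPM result Q(sqrt(gamma_s)).
gamma_s = 4.0
print(ps_mppm(2, gamma_s), Q(math.sqrt(gamma_s)))
```

As expected, Ps grows with M at fixed γs, since there are more competing correlator branches.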


4.2.5 Bit-Error Probability

Hamming distance
The Hamming distance dH(u, u′) between two length-K vectors
u and u′ is the number of positions in which they differ:

    dH(u, u′) = Σ_{k=1}^{K} δ(uk , uk′ ),

where the indicator function is defined as

    δ(u, u′) = 1   for u ≠ u′,
               0   for u = u′.

Example: 4PPM
Let the blocks of source bits and the modulation symbols be
associated as

    u1 = enc−1(s1) = [0, 0],
    u2 = enc−1(s2) = [0, 1],
    u3 = enc−1(s3) = [1, 0],
    u4 = enc−1(s4) = [1, 1].

Then we have for example

    dH(u1, u1) = 0,    dH(u1, u2) = 1,
    dH(u2, u3) = 2,    dH(u2, u4) = 1.


Conditional bit-error probability

Assume that X = s1, and let

    um = [um,1, . . . , um,K ] = enc−1(sm).

The conditional bit-error probability is defined as

    Pr(U ≠ Û | X = s1) = (1/K) Σ_{k=1}^{K} Pr(Uk ≠ Ûk | X = s1).

The argument of the sum may be expanded as

    Pr(Uk ≠ Ûk | X = s1)
        = Σ_{m=1}^{M} Pr(Uk ≠ Ûk | X̂ = sm, X = s1) · Pr(X̂ = sm | X = s1).

Since

    X = s1 ⇔ U = u1 ⇒ Uk = u1,k ,
    X̂ = sm ⇔ Û = um ⇒ Ûk = um,k ,

we have

    Pr(Uk ≠ Ûk | X̂ = sm, X = s1) = Pr(u1,k ≠ um,k | X̂ = sm, X = s1)
                                  = δ(u1,k , um,k ).

Using this result in the expression for the conditional bit-error
probability, we obtain

    Pr(U ≠ Û | X = s1)
        = (1/K) Σ_{k=1}^{K} Σ_{m=1}^{M} δ(u1,k , um,k ) Pr(X̂ = sm | X = s1)
        = (1/K) Σ_{m=1}^{M} dH(u1, um) · Pr(X̂ = sm | X = s1),

where the inner sum over k has been identified as dH(u1, um).


Due to the symmetry of the signal constellation, the probabilities
of making a specific decision error are identical, i.e.,

    Pr(X̂ = sm | X = sm′) = Ps/(M − 1) = Ps/(2^K − 1)

for all m ≠ m′. Using this in the above sum yields

    Pr(U ≠ Û | X = s1) = (1/K) · Ps/(2^K − 1) · Σ_{m=1}^{M} dH(u1, um).

Without loss of generality, we assume u1 = 0. The sum over the
Hamming distances then becomes the overall number of ones in all
bit blocks of length K.

Example:
Consider K = 3 and write the binary vectors of length 3 in the
columns of a matrix:

    [u1, . . . , u8] = [ 0 0 0 0 1 1 1 1 ]
                      [ 0 0 1 1 0 0 1 1 ]
                      [ 0 1 0 1 0 1 0 1 ]

In every row, half of the entries are ones. The overall number of
ones is thus 3 · 2^{3−1}.

Generalizing the example, the above sum can be shown to be

    Σ_{m=1}^{M} dH(0, um) = K · 2^{K−1}.
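The identity Σ_m dH(0, um) = K · 2^{K−1} is easy to confirm by brute force over all K-bit blocks (the value of K below is arbitrary):

```python
# Brute-force check of sum_m d_H(0, u_m) = K * 2**(K-1) over all K-bit
# blocks u_m.  d_H(0, u_m) is just the number of ones in u_m.
K = 5
total = sum(bin(m).count("1") for m in range(2 ** K))
print(total, K * 2 ** (K - 1))
```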


Applying this result, we obtain the conditional bit-error probability

    Pr(U ≠ Û | X = s1) = 2^{K−1}/(2^K − 1) · Ps ,

which is independent of s1 and thus the same for all transmitted
symbols X = sm, m = 1, . . . , M .

Average bit-error probability

From the above, we obtain

    Pr(U ≠ Û ) = 2^{K−1}/(2^K − 1) · Ps .

Plots of the BEPs may be found in [1, p. 426].

Notice that for large K,

    Pr(U ≠ Û ) ≈ (1/2) · Ps .

Due to the signal constellation, Gray mapping is not possible.


Remarks

(i) Provided that the SNR γb = Eb/N0 is larger than ln 2 (corresponding to −1.6 dB), the BEP of MPPM can be made arbitrarily small by selecting M large enough. The price to pay is an increasing decoding delay and an increasing required bandwidth.

Let the transmission rate be denoted by

R = K/T = log2(M)/T  [bit/sec].

Decoding delay:

T = (1/R) · log2 M.

Hence, for a fixed rate R, T → ∞ for M → ∞.

Required bandwidth:

B = Bg ≈ 1/(T /M) = M/T = R · M/log2(M),

where Bg denotes the bandwidth of the pulse g(t). Hence, for a fixed rate R, B → ∞ for M → ∞.
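The trade-off can be made concrete with a small sketch (Python; the rate value is an arbitrary example): for fixed R, both the decoding delay T = log2(M)/R and the bandwidth B ≈ R·M/log2(M) grow without bound in M.

```python
import math

def delay_and_bandwidth(M, R):
    """Decoding delay T and approximate required bandwidth B of MPPM
    with alphabet size M at fixed transmission rate R [bit/sec]."""
    K = math.log2(M)
    T = K / R           # T = log2(M) / R
    B = R * M / K       # B ~ M / T = R * M / log2(M)
    return T, B

R = 1000.0  # example rate
for M in (2, 4, 16, 256):
    T, B = delay_and_bandwidth(M, R)
    print(f"M = {M:3d}:  T = {T:.4f} s,  B = {B:.0f} Hz")
```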

(ii) The power of the MPPM signal is

P = Eb · K/T = Eb · R.

Hence, the condition γb > ln 2 is equivalent to the condition

R < (1/ln 2) · (P/N0) = C∞ .

The value C∞ denotes the capacity of the (infinite-bandwidth) AWGN channel (see Appendix C).


5 Phase-Shift Keying (PSK)


In phase-shift keying (PSK), the waveforms are sines with the same
frequency, and the information is conveyed by the phase of the
waveforms.

5.1 Signaling Waveforms


Set of sines sm(t) with the same frequency f0, duration T , and different phases φm:

sm(t) = √(2P) cos(2πf0 t + 2π(m−1)/M)  for t ∈ [0, T ],
sm(t) = 0  elsewhere,

m = 1, 2, . . . , M , where φm := 2π(m−1)/M . The phase can take the values

φm ∈ {0, 2π/M, 4π/M, . . . , 2π(M−1)/M}.

Hence, the digital information is embedded in the phase of a carrier.


The frequency f0 is called the carrier frequency.

For M = 2, this modulation scheme is called binary phase-shift keying (BPSK), and for M = 4, it is called quadrature phase-shift keying (QPSK).

All waveforms have the same energy

Es = ∫_{−∞}^{∞} s_m²(t) dt = P T.

(Notice: cos² x = (1 + cos 2x)/2.)
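The energy result can be checked by numerical integration; a NumPy sketch (the values of P , T , and L = f0·T are illustrative):

```python
import numpy as np

P, T, L = 1.0, 1e-3, 10   # power, symbol duration, f0 = L/T (illustrative values)
f0 = L / T
t = np.linspace(0.0, T, 100001)
dt = t[1] - t[0]

M = 4
for m in (1, 3):
    phi = 2 * np.pi * (m - 1) / M
    s = np.sqrt(2 * P) * np.cos(2 * np.pi * f0 * t + phi)
    Es = np.sum((s ** 2)[:-1]) * dt    # Riemann sum of s_m^2 over [0, T]
    assert abs(Es - P * T) < 1e-6      # Es = P*T, independent of the phase
print("Es = P*T verified")
```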


Usually, f0 ≫ 1/T . To simplify the theoretical analysis, we assume that f0 = L/T for some large L ∈ N.

BPSK

For M = 2, we obtain φm ∈ {0, π} and thus the two waveforms

s1(t) = √(2P) cos(2πf0 t),
s2(t) = √(2P) cos(2πf0 t + π) = −s1(t)

for t ∈ [0, T ].

QPSK

For M = 4, we obtain φm ∈ {0, π/2, π, 3π/2} and thus the waveforms

s1(t) = +√(2P) cos(2πf0 t),
s2(t) = −√(2P) sin(2πf0 t),
s3(t) = −√(2P) cos(2πf0 t),
s4(t) = +√(2P) sin(2πf0 t)

for t ∈ [0, T ].


5.2 Signal Constellation


BPSK

Orthonormal functions:

ψ1(t) = (1/√Es) s1(t) = √(2/T) cos(2πf0 t)

for t ∈ [0, T ].

Geometrical representation of the waveforms:

s1(t) = +√(P T) ψ1(t) ←→ s1 = +√(P T) = +√Es
s2(t) = −√(P T) ψ1(t) ←→ s2 = −√(P T) = −√Es

Signal constellation: the two signal points s2 = −√Es and s1 = +√Es lie on the ψ1-axis, symmetric about the origin.

Vector encoder:

x = enc(u) = s1 for u = 1,
             s2 for u = 0.


QPSK

Orthonormal functions:

ψ1(t) = (1/√Es) s1(t) = +√(2/T) cos(2πf0 t)
ψ2(t) = (1/√Es) s2(t) = −√(2/T) sin(2πf0 t)

for t ∈ [0, T ].

Geometrical representation of the waveforms:

s1(t) = +√Es ψ1(t) ←→ s1 = (+√Es, 0)T
s2(t) = +√Es ψ2(t) ←→ s2 = (0, +√Es)T
s3(t) = −√Es ψ1(t) ←→ s3 = (−√Es, 0)T
s4(t) = −√Es ψ2(t) ←→ s4 = (0, −√Es)T

with Es = P T .

Signal constellation: the four symbols s1, s2, s3, s4 lie on the ψ1- and ψ2-axes at distance √Es from the origin.

Vector encoder: any Gray encoder


General Case

The expression for sm(t) can be expanded as follows:

sm(t) = √(2P) cos(2πf0 t + φm)
      = √(2P) cos φm cos(2πf0 t) − √(2P) sin φm sin(2πf0 t),

where φm = 2π(m−1)/M , m = 1, 2, . . . , M .
(Notice: cos(x + y) = cos x cos y − sin x sin y.)

Orthonormal functions:

ψ1(t) = +√(2/T) cos(2πf0 t)
ψ2(t) = −√(2/T) sin(2πf0 t)

for t ∈ [0, T ].

Geometrical representation of the waveforms:

sm(t) = √Es cos φm · ψ1(t) + √Es sin φm · ψ2(t)
  ←→ sm = (√Es cos φm, √Es sin φm)T,

m = 1, 2, . . . , M . Remember: Es = P T .

Signal constellation:

X = { (√Es cos 2π(m−1)/M , √Es sin 2π(m−1)/M )T : m = 1, 2, . . . , M }
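The constellation set X can be generated directly from this formula; a NumPy sketch (Es is normalized to 1 by default, the function name is ours):

```python
import numpy as np

def mpsk_constellation(M, Es=1.0):
    """Signal vectors s_m = (sqrt(Es) cos(phi_m), sqrt(Es) sin(phi_m))
    with phi_m = 2*pi*(m-1)/M, m = 1, ..., M."""
    phi = 2 * np.pi * np.arange(M) / M
    return np.stack([np.sqrt(Es) * np.cos(phi),
                     np.sqrt(Es) * np.sin(phi)], axis=1)

X = mpsk_constellation(8)
assert np.allclose(np.linalg.norm(X, axis=1), 1.0)  # all symbols have energy Es
print(X.round(3))
```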

Example: 8PSK
For M = 8, we obtain φm = π(m−1)/4 , m = 1, 2, . . . , 8; the eight signal points are uniformly spaced on a circle of radius √Es.


5.3 ML Decoding
ML symbol decision rule:

decML(y) = argmin_{x∈X} ‖y − x‖²

ML decoder for MPSK:

û = [û1, û2, . . . , ûK ] = enc⁻¹(decML(y))

BPSK

BPSK has the same signal constellation as BPAM. Thus, the ML detectors for these two modulation schemes are the same. The decision regions are D1 = {y : y ≥ 0} (symbol s1, u = 1) and D2 = {y : y < 0} (symbol s2, u = 0).

Implementation of the ML detector: a threshold detector that maps y ≥ 0 → Û = 1 and y < 0 → Û = 0.


QPSK

Decision regions Dm for the symbols sm and corresponding bit blocks u = [u1, u2] (Gray encoding): s1 ↔ 00, s2 ↔ 01, s3 ↔ 11, s4 ↔ 10. The decision regions are the four quadrants rotated by π/4, i.e., Dm is the sector of points closest to sm.

Decision rule for the data bits:

û1 = 0 for y ∈ D1 ∪ D2 ⇔ y1 + y2 > 0,
û1 = 1 for y ∈ D3 ∪ D4 ⇔ y1 + y2 < 0;

û2 = 0 for y ∈ D1 ∪ D4 ⇔ y2 − y1 < 0,
û2 = 1 for y ∈ D2 ∪ D3 ⇔ y2 − y1 > 0.

Implementation of the ML decoder: compute Z1 = Y1 + Y2 and Z2 = Y2 − Y1 and decide

z1 > 0 → Û1 = 0,  z1 < 0 → Û1 = 1;
z2 < 0 → Û2 = 0,  z2 > 0 → Û2 = 1.


General Case

The decision regions for the general case can be determined in a straightforward way: Dm is the circular sector of angular width 2π/M centered on the direction of sm; the boundaries of D1, for example, lie at the angles ±π/M .


5.4 Vector Representation of the MPSK Transceiver

[Block diagram: the vector encoder (or channel encoder) maps U to (X1 , X2 ); X1 modulates √(2/T ) cos 2πf0 t and X2 modulates −√(2/T ) sin 2πf0 t, and the sum forms X(t); the channel adds W (t); the receiver correlates Y (t) with √(2/T ) cos 2πf0 t and with −√(2/T ) sin 2πf0 t over [0, T ] to obtain (Y1 , Y2 ), which the vector decoder maps to Û .]

AWGN vector channel:

Y1 = X1 + W1 ,
Y2 = X2 + W2 ,

with independent W1 , W2 ∼ N (0, N0/2).


5.5 Symbol-Error Probability


BPSK

Since BPSK has the same signal constellation as BPAM, the two modulation schemes have the same symbol-error probability:

Ps = Q(√(2Es/N0)) = Q(√(2γs)).
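For numerical work, Q(x) is conveniently expressed through the complementary error function, Q(x) = ½ erfc(x/√2); a small sketch (function names are ours):

```python
import math

def Q(x):
    """Gaussian tail function Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bpsk_Ps(gamma_s):
    """Symbol-error probability of BPSK: Ps = Q(sqrt(2*gamma_s))."""
    return Q(math.sqrt(2.0 * gamma_s))

assert abs(Q(0.0) - 0.5) < 1e-12
print(bpsk_Ps(10 ** 0.96))   # gamma_s at about 9.6 dB -> Ps on the order of 1e-5
```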

QPSK

Assume that X = s1 is transmitted. Then, the conditional symbol-error probability is given by

Pr(X̂ ≠ X|X = s1) = Pr(y ∉ D1|X = s1)
  = 1 − Pr(y ∈ D1|X = s1)
  = 1 − Pr(s1 + W ∈ D1).

For computing this probability, the noise vector is projected onto the coordinate system (ξ1′, ξ2′) rotated by π/4 (compare BPPM); the two decision boundaries of D1 then lie at distance √(Es/2) from s1.

[Figure: decision regions D1 , . . . , D4 with boundaries at angles ±π/4; the point s1 + w with noise components w1′ , w2′ in the rotated coordinate system (ξ1′, ξ2′).]


The probability may then be written as

Pr(s1 + W ∈ D1) = Pr(W1′ > −√(Es/2) and W2′ < √(Es/2))
  = Pr(W1′ > −√(Es/2)) · Pr(W2′ < √(Es/2))
  = Pr(W1′/√(N0/2) > −√(Es/N0)) · Pr(W2′/√(N0/2) < √(Es/N0))
  = (1 − Q(√γs)) (1 − Q(√γs))
  = (1 − Q(√γs))².

Hence,

Pr(X̂ ≠ X|X = s1) = 1 − (1 − Q(√γs))² = 2 Q(√γs) − Q²(√γs).

By symmetry, the four conditional symbol-error probabilities are identical, and thus the average symbol-error probability results as

Ps = 2 Q(√γs) − Q²(√γs).
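As a sanity check of this result, a Monte Carlo sketch (NumPy; SNR value, sample size, and seed are arbitrary choices) transmits s1 with Es = 1 and counts received vectors falling outside the sector D1 = {|∠y| < π/4}:

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)

def qpsk_ser_mc(gamma_s, n=200_000):
    """Monte Carlo estimate of the QPSK symbol-error probability at SNR gamma_s.
    Es is normalized to 1, so N0/2 = 1/(2*gamma_s) per noise component."""
    s1 = np.array([1.0, 0.0])                       # transmit s1 (enough, by symmetry)
    w = rng.normal(0.0, np.sqrt(0.5 / gamma_s), size=(n, 2))
    y = s1 + w
    err = np.abs(np.arctan2(y[:, 1], y[:, 0])) > np.pi / 4   # outside D1
    return err.mean()

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

gamma_s = 10.0
ps_formula = 2 * Q(sqrt(gamma_s)) - Q(sqrt(gamma_s)) ** 2
ps_mc = qpsk_ser_mc(gamma_s)
assert abs(ps_mc - ps_formula) < 0.2 * ps_formula + 1e-4
```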


General Case

A closed-form expression for Ps does not exist. It can be upper-bounded using a union-bound technique, or it can be computed numerically using an integral expression. (See literature.) A plot of Ps versus the SNR can be found in [1, p. 416].

Remark: Union Bound


For two random events A and B, we have

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) ≤ Pr(A) + Pr(B),

since −Pr(A ∩ B) ≤ 0. The expression on the right-hand side is called the union bound for the probability on the left-hand side. In a similar way, we obtain the union bound for multiple events:

Pr(A1 ∪ A2 ∪ · · · ∪ AM ) ≤ Pr(A1) + Pr(A2) + · · · + Pr(AM ).

Each of these events may denote the case that the received vector is within a certain decision region.


5.6 Bit-Error Probability


BPSK

Since Eb = Es , γs = γb , and Pb = Ps , the bit-error probability is given by

Pb = Q(√(2γb)).

QPSK

For Gray encoding, Pb can be computed directly.

Assume first that X = s1 is transmitted, which corresponds to [U1, U2] = [0, 0]. Then, the conditional bit-error probability is given by

Pr(Û ≠ U |X = s1)
  = (1/2) [Pr(Û1 ≠ U1|X = s1) + Pr(Û2 ≠ U2|X = s1)]
  = (1/2) [Pr(Û1 ≠ 0|X = s1) + Pr(Û2 ≠ 0|X = s1)]
  = (1/2) [Pr(Û1 = 1|X = s1) + Pr(Û2 = 1|X = s1)].

From the direct decision rule for the data bits, we have

Û1 = 1 ⇔ Y ∈ D3 ∪ D4 ,
Û2 = 1 ⇔ Y ∈ D2 ∪ D3 .


Thus,

Pr(Û ≠ U |X = s1)
  = (1/2) [Pr(Y ∈ D3 ∪ D4) + Pr(Y ∈ D2 ∪ D3)]
  = (1/2) [Pr(W1′ < −√(Es/2)) + Pr(W2′ > √(Es/2))]
  = (1/2) [Pr(W1′/√(N0/2) < −√(Es/N0)) + Pr(W2′/√(N0/2) > √(Es/N0))]
  = Q(√γs).

By symmetry, all four conditional bit-error probabilities are identical. Since two data bits are transmitted per modulation symbol (waveform) in QPSK, we have the relations Es = 2Eb and γs = 2γb. Therefore,

Pb = Q(√(2γb)).

(Remember: Pb ≈ 10⁻⁵ for γb ≈ 9.6 dB.)


Comparison of QPSK and BPSK

(i) QPSK has exactly the same bit-error probability as BPSK.

(ii) Relation between symbol duration T and (average) bit duration Tb = T /K:

BPSK: T = Tb ,
QPSK: T = 2Tb .

Hence, for a fixed symbol duration T , the bit rate of QPSK is twice that of BPSK.

General Case

Since Gray encoding is used, we have

Pb ≈ (1/log2 M) · Ps

for high SNR. The symbol-error probability may be determined using one of the methods mentioned above.


6 Offset Quadrature Phase-Shift Keying (OQPSK)

Offset QPSK is a variant of QPSK. It has the same performance as QPSK, but the requirements on the transmitter and receiver hardware are lower.

6.1 Signaling Scheme


Consider QPSK with the following “tilted” signal constellation and Gray encoding: the four symbols lie at distance √Es from the origin, with components ±√(Es/2) on each axis (i.e., at angles π/4, 3π/4, 5π/4, 7π/4), and the bit blocks 00, 01, 11, 10 are assigned Gray-encoded to neighboring symbols, with 00 at angle π/4.

The phase of QPSK can change by at most π. This is the case, for instance, when two consecutive 2-bit blocks are [00], [11] or [01], [10]. These phase changes put high requirements on the dynamic behavior of the RF part of the QPSK transmitter and receiver.

Phase changes of π can be avoided by staggering the quadrature branch of QPSK by T /2 = Tb. The resulting modulation scheme is called staggered QPSK or offset QPSK (OQPSK).


Without significant loss of generality, we assume

f0 = 2L/T for some L ≫ 1, L ∈ N.

Remark:
The ψ1 component of the modulation symbol is called the in-phase component (I), and the ψ2 component is called the quadrature component (Q).

6.2 Transceiver

We consider communication over an AWGN channel.

[Block diagram: as for QPSK, but with the quadrature branch offset by T /2. The vector encoder maps [U1, U2] to (X1, X2); X1 modulates √(2/T ) cos 2πf0 t over [0, T ), while X2 modulates −√(2/T ) sin 2πf0 t over [T /2, 3T /2); the channel adds W (t). Correspondingly, the receiver correlates Y (t) with √(2/T ) cos 2πf0 t over [0, T ] to obtain Y1 , and with −√(2/T ) sin 2πf0 t over [T /2, 3T /2] to obtain Y2 ; the vector decoder outputs [Û1, Û2].]


6.3 Phase Transitions for QPSK and OQPSK

Example: Transmission of three symbols
For the transmission of three symbols, we consider the data bits u = [u1, u2], the in-phase component (I) and the quadrature component (Q) of the modulation symbol, and the phase change ∆φ. For convenience, let a := √(Es/2).

QPSK (I and Q change simultaneously at multiples of T ):

u     00    11    01
I     +a    −a    −a
Q     +a    −a    +a
∆φ        +π   −π/2

OQPSK (entries per half-symbol interval Tb = T /2; I and Q change in a staggered manner, so at most one of them changes at a time):

u     00          11          01
I     +a   +a    −a   −a    −a   −a
Q     +a   +a    −a   −a    +a   +a
∆φ       +π/2  +π/2    0   −π/2


QPSK exhibits phase changes from {±π/2, ±π} at times that are multiples of the signaling interval T . In contrast to this, OQPSK exhibits only phase changes from {±π/2} at times that are multiples of the bit duration Tb = T /2.
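The difference in phase dynamics can be checked programmatically. The sketch below (Python; the mapping table and function names are ours) compares the largest phase step of QPSK with that of OQPSK, where I and Q are updated alternately:

```python
import numpy as np

# Tilted Gray-mapped QPSK: bits (u1, u2) -> symbol phase.
PHASE = {(0, 0): np.pi / 4, (0, 1): 3 * np.pi / 4,
         (1, 1): -3 * np.pi / 4, (1, 0): -np.pi / 4}

def wrap(dphi):
    """Wrap a phase difference into (-pi, pi]."""
    return (dphi + np.pi) % (2 * np.pi) - np.pi

def max_phase_step_qpsk(bits):
    """Largest |phase change| between consecutive QPSK symbols."""
    ph = [PHASE[b] for b in bits]
    return max(abs(wrap(b - a)) for a, b in zip(ph, ph[1:]))

def max_phase_step_oqpsk(bits):
    """OQPSK: I and Q are updated alternately (Q delayed by T/2),
    so only one of the two sign bits can change per half symbol."""
    steps = []
    i, q = bits[0]
    for u1, u2 in bits[1:]:
        a = PHASE[(i, q)]; i = u1     # update I first ...
        b = PHASE[(i, q)]; q = u2     # ... then Q half a symbol later
        c = PHASE[(i, q)]
        steps += [abs(wrap(b - a)), abs(wrap(c - b))]
    return max(steps)

bits = [(0, 0), (1, 1), (0, 1), (1, 0)]
assert np.isclose(max_phase_step_qpsk(bits), np.pi)       # QPSK: jumps of pi occur
assert max_phase_step_oqpsk(bits) <= np.pi / 2 + 1e-12    # OQPSK: at most pi/2
```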

Remarks

(i) OQPSK has the same bit-error probability as QPSK.

(ii) Staggering the quadrature component of QPSK by T /2 does


not prevent phase changes of π if the signaling constellation
of QPSK is not tilted. (Why?)


7 Frequency-Shift Keying (FSK)


In FSK, the waveforms are sines, and the information is conveyed
by the frequencies of the waveforms. FSK with two waveforms is
called binary FSK (BFSK), and FSK with M waveforms is called
M -ary FSK (MFSK).

7.1 BFSK with Coherent Demodulation


7.1.1 Signal Waveforms

Set S of two sine waveforms with different frequencies f1, f2 and (possibly) different but known phases φ1, φ2 ∈ [0, 2π):

s1(t) = √(2P) cos(2πf1 t + φ1),
s2(t) = √(2P) cos(2πf2 t + φ2).

Both waveforms are limited to the time interval t ∈ [0, T ]. Thus we have

S = {s1(t), s2(t)}.

Usually, f1, f2 ≫ 1/T . To simplify the theoretical analysis, we assume that

f1 = k1/T ,  f2 = k2/T ,

where k1, k2 ∈ N and k1, k2 ≫ 1.

Both waveforms have the same energy

Es = ∫ s1²(t) dt = ∫ s2²(t) dt = P T.


We assume the following BFSK transmitter:

{0, 1} → S,  u ↦ x(t)

with the mapping

0 ↦ s1(t),
1 ↦ s2(t).

The frequency separation

∆f = |f1 − f2|

is selected such that s1(t) and s2(t) are orthogonal, i.e.,

⟨s1(t), s2(t)⟩ = ∫₀ᵀ s1(t) s2(t) dt = 0.

The necessary condition for orthogonality (for arbitrary phases φ1, φ2) is

∆f = k/T

with k ∈ N. Notice that the minimum frequency separation is thus 1/T .
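The orthogonality condition can be verified by numerical integration (NumPy; the chosen frequencies and phases are arbitrary examples):

```python
import numpy as np

T = 1.0
t = np.linspace(0.0, T, 200001)
dt = t[1] - t[0]

def inner_product(k1, k2, phi1, phi2):
    """<s1, s2> for f1 = k1/T, f2 = k2/T and arbitrary known phases
    (unit amplitudes; a Riemann sum approximates the integral)."""
    s1 = np.cos(2 * np.pi * k1 / T * t + phi1)
    s2 = np.cos(2 * np.pi * k2 / T * t + phi2)
    return np.sum((s1 * s2)[:-1]) * dt

# Delta_f = k/T  ->  orthogonal, regardless of the phases:
assert abs(inner_product(10, 12, 0.3, 1.1)) < 1e-6
# Delta_f = 1/(2T) is in general NOT orthogonal for arbitrary phases:
assert abs(inner_product(10, 10.5, 0.3, 1.1)) > 1e-3
```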

The phases are assumed to be known to the receiver. Demodulation with phase information (known phases or estimated phases) is called coherent demodulation.


7.1.2 Signal Constellation

Orthonormal functions:

ψ1(t) = (1/√Es) s1(t) = √(2/T) cos(2πf1 t + φ1)
ψ2(t) = (1/√Es) s2(t) = √(2/T) cos(2πf2 t + φ2)

for t ∈ [0, T ].

Geometrical representation of the waveforms:

s1(t) = √Es · ψ1(t) + 0 · ψ2(t) ←→ s1 = (√Es, 0)T
s2(t) = 0 · ψ1(t) + √Es · ψ2(t) ←→ s2 = (0, √Es)T.

Signal constellation: s1 = (√Es, 0)T and s2 = (0, √Es)T; the two signal points are orthogonal and at distance √(2Es) from each other.

BFSK with coherent demodulation has the same signal constellation as BPPM. Accordingly, these two modulation methods are equivalent.


7.1.3 ML Decoding

ML symbol decision rule:

decML(y) = argmin_{x∈{s1,s2}} ‖y − x‖² = (√Es, 0)T for y1 ≥ y2 ,
                                         (0, √Es)T for y1 < y2 .

Decision regions:

D1 = {y ∈ R² : y1 ≥ y2},  D2 = {y ∈ R² : y1 < y2}.

The decision boundary is the line y1 = y2; D1 (u = 0) contains s1, and D2 (u = 1) contains s2.

ML decoder for BFSK:

û = enc⁻¹(decML(y)) = 0 for y1 ≥ y2 ,
                      1 for y1 < y2 .

Implementation of the ML decoder: compute Z = Y1 − Y2 and decide z ≥ 0 → Û = 0, z < 0 → Û = 1.


7.1.4 Vector Representation of the Transceiver

We assume BFSK over the AWGN channel and coherent demodulation.

[Block diagram: the vector encoder (or channel encoder) maps U to (X1, X2); X1 modulates √(2/T ) cos(2πf1 t + φ1) and X2 modulates √(2/T ) cos(2πf2 t + φ2) to form X(t); the channel adds W (t); the receiver correlates Y (t) with both basis functions over [0, T ] to obtain (Y1, Y2), which the vector decoder maps to Û .]

The dashed box forms a 2-dimensional AWGN vector channel with independent noise components

W1, W2 ∼ N (0, N0/2).

Remark: Notice the similarities to BPSK and BPPM.


7.1.5 Error Probabilities

As the signal constellation of coherently demodulated BFSK is the same as that of BPPM, the error probabilities are the same:

Ps = Q(√γs),
Pb = Q(√γb).

For the derivation, see the analysis of BPPM.


7.2 MFSK with Coherent Demodulation


7.2.1 Signal Waveforms

The signal waveforms are

sm(t) = √(2P) cos(2πfm t + φm) for t ∈ [0, T ],
sm(t) = 0 elsewhere,

m = 1, 2, . . . , M .

The parameters of the waveforms are as follows:

(a) frequencies fm = km/T for km ∈ N and km ≫ 1;

(b) frequency separations ∆fm,n = |fm − fn| = km,n/T , km,n ∈ N;

(c) phases φm ∈ [0, 2π);

m, n = 1, 2, . . . , M .
The phases are assumed to be known to the receiver. Thus, we consider coherent demodulation.

As in the binary case, all waveforms have the same energy

Es = ∫ s_m²(t) dt = P T.


7.2.2 Signal Constellation

Orthonormal functions:

ψm(t) = (1/√Es) sm(t) = √(2/T) cos(2πfm t + φm)

for t ∈ [0, T ], m = 1, 2, . . . , M .

Geometrical representation of the waveforms (vectors of M elements):

s1(t) = √Es · ψ1(t) ←→ s1 = (√Es, 0, . . . , 0)T
...
sM (t) = √Es · ψM (t) ←→ sM = (0, . . . , 0, √Es)T.

MFSK with coherent demodulation has the same signal constellation as MPPM. Accordingly, these two modulation methods are equivalent and have the same performance.

Vector encoder:

enc : u = [u1, . . . , uK ] ↦ x = enc(u),
U = {0, 1}^K → X = {s1, s2, . . . , sM }

with K = log2 M .


7.2.3 Vector Representation of the Transceiver

We assume MFSK over the AWGN channel and coherent demodulation.

[Block diagram: the vector encoder (or channel encoder) maps U to (X1, . . . , XM ); branch m modulates √(2/T ) cos(2πfm t + φm), and the sum forms X(t); the channel adds W (t); the receiver correlates Y (t) with each basis function over [0, T ] to obtain (Y1, . . . , YM ), which the vector decoder maps to Û .]

The dashed box forms an M -dimensional AWGN vector channel with mutually independent components

Wm ∼ N (0, N0/2),

m = 1, 2, . . . , M .


Remarks
The M waveforms sm(t) ∈ S are orthogonal. The receiver simply correlates the received waveform y(t) with each of the possibly transmitted waveforms sm(t) ∈ S and selects the one with the highest correlation as the estimate.

7.2.4 Error Probabilities

The signal constellation of coherently demodulated MFSK is the same as that of MPPM. Therefore, the symbol-error probability and the bit-error probability are the same. For the analysis, see MPPM.


7.3 BFSK with Noncoherent Demodulation


7.3.1 Signal Waveforms

The signal waveforms are

s1(t) = √(2P) cos(2πf1 t + φ1),
s2(t) = √(2P) cos(2πf2 t + φ2)

for t ∈ [0, T ) with

(i) frequencies fm = km/T for km ∈ N and km ≫ 1, m = 1, 2, and

(ii) frequency separation ∆f = |f1 − f2| = k/T for some k ∈ N.

As opposed to the case of coherent demodulation, the phases φ1 and φ2 are not known to the receiver. Instead, they are modeled as uniformly distributed random variables, i.e.,

pΦ1(φ1) = 1/(2π),  pΦ2(φ2) = 1/(2π)

for φ1, φ2 ∈ [0, 2π).

Demodulation without use of phase information is called noncoherent demodulation. As the phases need not be estimated, the noncoherent demodulator has a lower complexity than the coherent demodulator.


7.3.2 Signal Constellation

Using the same method as for the canonical decomposition of PSK, the orthonormal functions are obtained as

ψ1(t) = +√(2/T) cos(2πf1 t)
ψ2(t) = −√(2/T) sin(2πf1 t)
ψ3(t) = +√(2/T) cos(2πf2 t)
ψ4(t) = −√(2/T) sin(2πf2 t)

for t ∈ [0, T ].
(Remember: cos(x + y) = cos x cos y − sin x sin y.)

Geometrical representations of the waveforms:

s1(t) = √Es cos φ1 · ψ1(t) + √Es sin φ1 · ψ2(t)
  ←→ s1 = (√Es cos φ1, √Es sin φ1, 0, 0)T,
s2(t) = √Es cos φ2 · ψ3(t) + √Es sin φ2 · ψ4(t)
  ←→ s2 = (0, 0, √Es cos φ2, √Es sin φ2)T.

The signal constellation is thus

X = {s1, s2}.

The transmitted (modulation) symbols are the four-dimensional(!) vectors

X = (X1, X2, X3, X4)T ∈ X,

where (X1, X2) corresponds to s1 and (X3, X4) corresponds to s2.


7.3.3 Noncoherent Demodulator

As there are four orthonormal basis functions, the demodulator has four branches:

[Block diagram: Y (t) is correlated with √(2/T ) cos 2πf1 t, −√(2/T ) sin 2πf1 t, √(2/T ) cos 2πf2 t, and −√(2/T ) sin 2πf2 t over [0, T ], yielding Y1, Y2, Y3, Y4.]

Notice the similarity to PSK.

We introduce the vectors

Y_1 = (Y1, Y2)T,  Y_2 = (Y3, Y4)T

and

Y = (Y_1T, Y_2T)T = (Y1, Y2, Y3, Y4)T.

(Loosely speaking, Y_1 corresponds to s1 and Y_2 corresponds to s2.)


7.3.4 Conditional Probability of the Received Vector

Assume that the transmitted vector is

X = s1 = (√Es cos φ1, √Es sin φ1, 0, 0)T.

Then, the received vector is

Y = (√Es cos φ1 + W1, √Es sin φ1 + W2, W3, W4)T = (Y_1T, Y_2T)T.

The conditional probability density of Y given X = s1 and φ1 is

pY|X,Φ1(y|s1, φ1)
  = (1/(πN0))² · exp(−(1/N0) ‖y − s1‖²)
  = (1/(πN0))² · exp(−(1/N0) [(y1 − √Es cos φ1)² + (y2 − √Es sin φ1)² + y3² + y4²])
  = (1/(πN0))² · exp(−Es/N0) · exp(−(1/N0) [‖y_1‖² + ‖y_2‖²]) ·
      exp((2√Es/N0) (y1 cos φ1 + y2 sin φ1)).

With α1 = tan⁻¹(y2/y1), we may write

y1 cos φ1 + y2 sin φ1 = ‖y_1‖ (cos α1 cos φ1 + sin α1 sin φ1) = ‖y_1‖ cos(φ1 − α1).


The phase can be removed from the condition by averaging over it. The phase is uniformly distributed in [0, 2π), and thus

pY|X(y|s1)
  = ∫₀^{2π} pY|X,Φ1(y|s1, φ1) pΦ1(φ1) dφ1
  = (1/2π) ∫₀^{2π} pY|X,Φ1(y|s1, φ1) dφ1
  = (1/(πN0))² · exp(−Es/N0) · exp(−(1/N0) [‖y_1‖² + ‖y_2‖²]) ·
      (1/2π) ∫₀^{2π} exp((2√Es/N0) ‖y_1‖ cos(φ1 − α1)) dφ1 ,

where the last factor equals I0(2√Es ‖y_1‖/N0).

This function is the modified Bessel function of order 0, defined as

I0(v) := (1/2π) ∫₀^{2π} exp(v cos(φ − α)) dφ

for any α ∈ [0, 2π). It is monotonically increasing. A plot can be found in [1, p. 403].
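The integral definition can be checked against NumPy's built-in np.i0; note also that the value is independent of α, as the integrand is averaged over a full period:

```python
import numpy as np

def I0_numeric(v, alpha=0.7, n=20000):
    """I0(v) = (1/2pi) * integral over [0, 2pi) of exp(v*cos(phi - alpha)) dphi,
    approximated by the mean over a uniform grid (value independent of alpha)."""
    phi = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    return np.mean(np.exp(v * np.cos(phi - alpha)))

for v in (0.0, 1.0, 3.0):
    assert abs(I0_numeric(v) - np.i0(v)) < 1e-8   # np.i0: modified Bessel, order 0
assert I0_numeric(2.0) > I0_numeric(1.0)          # monotonically increasing
```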


Combining all the above results yields

pY|X(y|s1) = (1/(πN0))² · exp(−Es/N0) ·
  exp(−(1/N0) [‖y_1‖² + ‖y_2‖²]) · I0(2√Es ‖y_1‖/N0).

Using the same reasoning, we obtain

pY|X(y|s2) = (1/(πN0))² · exp(−Es/N0) ·
  exp(−(1/N0) [‖y_1‖² + ‖y_2‖²]) · I0(2√Es ‖y_2‖/N0).

7.3.5 ML Decoding

Remember the definitions

Y = (Y1, Y2, Y3, Y4)T = (Y_1T, Y_2T)T.

ML symbol decision rule:

x̂ = decML(y) = argmax_{x∈X} pY|X(y|x)

Let x̂ = sm̂. Then, the index of the ML vector is

m̂ = argmax_{m∈{1,2}} I0(2√Es ‖y_m‖/N0)
  = argmax_{m∈{1,2}} ‖y_m‖
  = argmax_{m∈{1,2}} ‖y_m‖²,

where the second equality holds because I0 is monotonically increasing. Notice that selecting x̂ and selecting m̂ are equivalent.


ML decoder:

û = 0 for x̂ = s1 ⇔ ‖y_1‖ ≥ ‖y_2‖,
û = 1 for x̂ = s2 ⇔ ‖y_1‖ < ‖y_2‖.

Optimal noncoherent receiver for BFSK:

[Block diagram: the four correlator outputs Y1, . . . , Y4 are squared and combined to ‖Y_1‖² = Y1² + Y2² and ‖Y_2‖² = Y3² + Y4²; the difference Z = ‖Y_1‖² − ‖Y_2‖² is fed to a threshold detector with z ≥ 0 → Û = 0 and z < 0 → Û = 1.]


7.4 MFSK with Noncoherent Demodulation

7.4.1 Signal Waveforms

The signal waveforms are

sm(t) = √(2P) cos(2πfm t + φm)

for t ∈ [0, T ), m = 1, 2, . . . , M , with

(i) fm = km/T for km ∈ N and km ≫ 1, m = 1, 2, . . . , M ,

(ii) ∆fm,n = |fm − fn| = km,n/T for some km,n ∈ N, m, n = 1, 2, . . . , M .

The phases φ1, φ2, . . . , φM are not known to the receiver. They are assumed to be uniformly distributed in [0, 2π).


7.4.2 Signal Constellation

Orthonormal functions:

ψ1(t) = +√(2/T) cos(2πf1 t)
ψ2(t) = −√(2/T) sin(2πf1 t)
...
ψ2m−1(t) = +√(2/T) cos(2πfm t)
ψ2m(t) = −√(2/T) sin(2πfm t)
...
ψ2M−1(t) = +√(2/T) cos(2πfM t)
ψ2M (t) = −√(2/T) sin(2πfM t)

for t ∈ [0, T ].

Signal vectors (of length 2M ):

s1 = (√Es cos φ1, √Es sin φ1, 0, . . . , 0)T
s2 = (0, 0, √Es cos φ2, √Es sin φ2, 0, . . . , 0)T
...
sm = (0, . . . , 0, √Es cos φm, √Es sin φm, 0, . . . , 0)T  (nonzero at positions 2m − 1 and 2m)
...
sM = (0, . . . , 0, √Es cos φM , √Es sin φM )T

Signal constellation:

X = {s1, s2, . . . , sM }.


7.4.3 Conditional Probability of the Received Vector

Using the same reasoning as for BFSK, we can show that

pY|X(y|sm) = (1/(πN0))^M · exp(−Es/N0) ·
  exp(−(1/N0) Σ_{m′=1}^{M} ‖y_{m′}‖²) · I0(2√Es ‖y_m‖/N0)

with the definition

Y = (Y1, Y2, Y3, Y4, . . . , Y2M−1, Y2M )T = (Y_1T, Y_2T, . . . , Y_MT)T,

where Y_m = (Y2m−1, Y2m)T.

7.4.4 ML Decision Rule

Original ML decision rule:

x̂ = decML(y) = argmax_{x∈X} pY|X(y|x)

Let x̂ = sm̂. Then, the original decision rule may equivalently be expressed as

m̂ = argmax_{m∈{1,...,M}} ‖y_m‖².


7.4.5 Optimal Noncoherent Receiver for MFSK

Remember: “noncoherent demodulation” means that no phase information is used for demodulation.

The optimal noncoherent receiver for MFSK over the AWGN channel is as follows:

[Block diagram: for each m = 1, . . . , M , the received signal Y (t) is correlated with √(2/T ) cos 2πfm t and −√(2/T ) sin 2πfm t over [0, T ]; the two outputs Y2m−1 and Y2m are squared and added to form ‖Y_m‖². The branch with the maximum value is selected, and the inverse encoder computes Û .]

Notice that Û = [Û1, . . . , ÛK ] with K = log2 M .


7.4.6 Symbol-Error Probability

Assume that X = s1 is transmitted. Then, the probability of a correct decision is

Pr(X̂ = X|X = s1) = Pr(‖Y_m‖² ≤ ‖Y_1‖² for all m ≠ 1 | X = s1).

Provided that X = s1 is transmitted, we have

Y_1 = (√Es cos φ1 + W1, √Es sin φ1 + W2)T,
Y_2 = (W3, W4)T = W_2,
...
Y_m = (W2m−1, W2m)T = W_m,
...
Y_M = (W2M−1, W2M )T = W_M .

Consequently,

Pr(X̂ = X|X = s1)
  = Pr(‖W_m‖² ≤ ‖Y_1‖² for all m ≠ 1 | X = s1)
  = Pr(Rm ≤ R1 for all m ≠ 1 | X = s1)
  = ∫₀^∞ Pr(Rm ≤ R1 for all m ≠ 1 | X = s1, R1 = r1) · pR1(r1) dr1 ,

where Rm := ‖W_m‖/√(N0/2) for m ≠ 1 and R1 := ‖Y_1‖/√(N0/2).


The probability density function of R1 is denoted by pR1(r1).

The probability of a correct decision can further be written as

Pr(X̂ = X|X = s1)
  = ∫₀^∞ Pr(Rm ≤ r1 for all m ≠ 1 | X = s1, R1 = r1) pR1(r1) dr1
  = ∫₀^∞ Pr(Rm ≤ r1 for all m ≠ 1) pR1(r1) dr1 .

The noise vectors W_2, . . . , W_M are mutually statistically independent, and thus also the values R2, . . . , RM are mutually statistically independent. Therefore,

Pr(X̂ = X|X = s1) = ∫₀^∞ Π_{m=2}^{M} Pr(Rm ≤ r1) pR1(r1) dr1
  = ∫₀^∞ [Pr(R2 ≤ r1)]^{M−1} pR1(r1) dr1 .

Notice that

Pr(Rm ≤ r1) = Pr(R2 ≤ r1)

for m = 2, 3, . . . , M , because R2, . . . , RM are identically distributed.


The conditional probability of a symbol error can thus be written as

Pr(X̂ ≠ X|X = s1) = 1 − Pr(X̂ = X|X = s1)
  = 1 − ∫₀^∞ [Pr(R2 ≤ r1)]^{M−1} pR1(r1) dr1 .

By symmetry, all conditional symbol-error probabilities are the same. Thus,

Ps = 1 − ∫₀^∞ [Pr(R2 ≤ r1)]^{M−1} pR1(r1) dr1 .

In the above expressions, we have the following distributions (see literature):

(i) R2 (and also R3, . . . , RM ) is Rayleigh distributed:

pR2(r2) = r2 e^{−r2²/2}.

Accordingly, its cumulative distribution function results as

Pr(R2 ≤ r1) = 1 − e^{−r1²/2}.

(ii) R1 is Rice distributed:

pR1(r1) = r1 · exp(−(r1² + 2γs)/2) · I0(√(2γs) r1).

Remark:
The Rayleigh distribution and the Rice distribution are often used to model mobile radio channels.


Using the first relation, we can expand the first term as

[Pr(R2 ≤ r1)]^{M−1} = (1 − e^{−r1²/2})^{M−1}
  = Σ_{m=0}^{M−1} (M−1 choose m) (−1)^m e^{−m r1²/2}.

After integrating over r1, we obtain the following closed-form solution for the symbol-error probability of MFSK with noncoherent demodulation:

Ps = Σ_{m=1}^{M−1} (−1)^{m+1} (M−1 choose m) · 1/(m+1) · exp(−(m/(m+1)) γs).

For M = 2, this sum simplifies to

Ps = (1/2) e^{−γs/2}.
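The closed-form sum is straightforward to implement (Python, using math.comb; the function name is ours); for M = 2 it must reduce to ½·exp(−γs/2):

```python
import math

def mfsk_noncoherent_Ps(M, gamma_s):
    """Closed-form symbol-error probability of noncoherent MFSK."""
    return sum((-1) ** (m + 1) * math.comb(M - 1, m) / (m + 1)
               * math.exp(-m / (m + 1) * gamma_s)
               for m in range(1, M))

# For M = 2 the sum collapses to (1/2) * exp(-gamma_s / 2):
for g in (0.5, 2.0, 8.0):
    assert abs(mfsk_noncoherent_Ps(2, g) - 0.5 * math.exp(-g / 2)) < 1e-15
print(mfsk_noncoherent_Ps(16, 8.0))
```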


7.4.7 Bit-Error Probability

BFSK
We have γb = γs and Pb = Ps. Therefore, the bit-error probability is given by

Pb = (1/2) e^{−γb/2}.

MFSK
For M > 2, we have

(i) γb = γs/K,

(ii) Pb = 2^{K−1}/(2^K − 1) · Ps with K = log2 M (see MPPM).

Therefore, the bit-error probability results as

Pb = 2^{K−1}/(2^K − 1) · Σ_{m=1}^{M−1} (−1)^{m+1} (M−1 choose m) · 1/(m+1) · exp(−(mK/(m+1)) γb).

Plots of the error probability can be found in [1, p. 433].


8 Minimum-Shift Keying (MSK)


MSK is a binary modulation scheme with coherent detection, and
it is similar to BFSK with coherent detection. The waveforms
are sines, the information is conveyed by the frequencies of the
waveforms, and the phases of the waveforms are chosen such that
there are no phase discontinuities. Thus, the power spectrum of
MSK is more compact than that of BFSK. The description of MSK
follows [2].

8.1 Motivation
Consider BFSK. The signal waveforms of BFSK are

U = 0 ↦ s1(t) = √(2P) cos(2πf1 t),
U = 1 ↦ s2(t) = √(2P) cos(2πf2 t),

for t ∈ [0, T ]. The minimum frequency separation such that s1(t) and s2(t) are orthogonal (assuming coherent detection) is

∆f = |f1 − f2| = 1/(2T).

In the following, we assume minimum frequency separation.

Without loss of generality, we assume that

f1 = k/T ,  f2 = k/T + 1/(2T) = (k + 1/2)/T ,

for k ∈ N and k ≫ 1, i.e., k is a large integer. Thus, we have the following properties:

waveform | number of periods in t ∈ [0, T )
s1(t)    | k
s2(t)    | k + 1/2


Example: Output of the BFSK Transmitter

Consider BFSK with f1 = 1/T and f2 = 3/(2T) such that the frequency separation ∆f = 1/(2T) is minimum. Assume the bit sequence

U[0,4] = [U0, U1, U2, U3, U4] = [0, 1, 1, 0, 0].

The resulting output signal x•(t) of the BFSK transmitter shows discontinuities.

The discontinuities produce high side-lobes in the power spectrum of X(t), which is either inefficient or problematic. This can be avoided by slightly modifying the modulation scheme.

We rewrite the signal waveforms as follows:

s1(t) = √(2P) cos(2πf1 t),
s2(t) = √(2P) cos(2πf1 t + 2π t/(2T)),

for t ∈ [0, T ]. (Remember that f2 = f1 + 1/(2T).) Thus, we can express the output signal as

x(t) = √(2P) cos(2πf1 t + φ•(t)),

t ∈ [0, T ), with the phase function

φ•(t) = 0 for s1(t),
φ•(t) = 2π t/(2T) for s2(t).

Notice that φ•(t) depends directly on the transmitted source bit U .


Example: Phase Function of BFSK

The phase function φ•(t) of the previous example shows discontinuities.

The discontinuities of the phase, and thus of x(t), can be avoided by making the phase continuous. This can be achieved by disabling the “reset to zero” of φ•(t), yielding the new phase function φ(t). Notice that this introduces memory in the transmitter.

Example: Modified BFSK Transmitter

(We continue the previous example.) The phase function φ(t) of the modified transmitter is continuous. Accordingly, also the signal x(t) generated by the modified transmitter is continuous.

FSK with the phase made continuous as described above is called continuous-phase FSK (CPFSK). Binary CPFSK with minimum frequency separation is termed minimum-shift keying (MSK).

8.2 Transmit Signal

The continuous-phase output signal of an MSK transmitter consists of the following waveforms (the expressions on the right-hand side are only used as abbreviations in the following):

s1(t) = +√(2P) cos(2πf1 t)  “f1”
s2(t) = −√(2P) cos(2πf1 t)  “−f1”
s3(t) = +√(2P) cos(2πf2 t)  “f2”
s4(t) = −√(2P) cos(2πf2 t)  “−f2”

t ∈ [0, T ].


The input bit Uj = 0 generates either s1(t) or s2(t); the input bit Uj = 1 generates either s3(t) or s4(t), which means a phase difference of π between the beginning and the end of the waveform.

Consider a bit sequence of length L,

u[0,L−1] = [u0, u1, . . . , uL−1].

According to the above considerations (see also the examples), the output signal of the MSK transmitter can be written as follows:

x(t) = √(2P) cos(2πf1 t + φ(t)),  φ(t) = Σ_{l=0}^{L−1} ul · 2π b(t − lT ),

t ∈ [0, LT ), with the phase pulse

b(τ ) = 0 for τ < 0,
b(τ ) = τ /(2T ) for 0 ≤ τ < T ,
b(τ ) = 1/2 for T ≤ τ .

Remark
The BFSK signal x(t) can be generated with the same expression as above, when the phase pulse

b•(τ ) = 0 for τ < 0,
b•(τ ) = τ /(2T ) for 0 ≤ τ < T ,
b•(τ ) = 0 for T ≤ τ

is used.
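The phase pulse and the resulting phase function are easy to implement; the sketch below (NumPy, T normalized to 1, function names are ours) checks that φ(t) is continuous and that every 1-bit advances the phase by π:

```python
import numpy as np

T = 1.0

def b(tau):
    """MSK phase pulse: 0 for tau < 0, tau/(2T) on [0, T), 1/2 afterwards."""
    return np.clip(tau / (2 * T), 0.0, 0.5)

def msk_phase(bits, t):
    """phi(t) = sum_l u_l * 2*pi * b(t - l*T)."""
    return sum(u * 2 * np.pi * b(t - l * T) for l, u in enumerate(bits))

t = np.linspace(0.0, 5 * T, 5001)
phi = msk_phase([0, 1, 1, 0, 0], t)
# Continuous phase: no jumps anywhere on the grid.
assert np.max(np.abs(np.diff(phi))) < 0.01
# Each 1-bit advances the phase by pi by the end of its interval.
assert np.isclose(phi[-1], 2 * np.pi)
```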


Plotting all possible trajectories of φ(t), we obtain the (tilted) phase tree of MSK.

[Figure: phase tree of φ(t) over t ∈ [0, 5T ]; a bit ul = 1 (branches labeled ±f2) increases the phase linearly by π over one symbol interval, a bit ul = 0 (branches labeled ±f1) leaves it unchanged.]

The physical phase is obtained by taking φ(t) modulo 2π. Plotting all possible trajectories, we obtain the (tilted) physical-phase trellis of MSK.

[Figure: physical-phase trellis of φ(t) mod 2π over t ∈ [0, 5T ], with the two states 0 and π at multiples of T .]


8.3 Signal Constellation


Signal waveforms:

    s1(t) = +√(2P) cos(2πf1t),
    s2(t) = −√(2P) cos(2πf1t),
    s3(t) = +√(2P) cos(2πf2t),
    s4(t) = −√(2P) cos(2πf2t),

t ∈ [0, T].

Orthonormal functions:

    ψ1(t) = √(2/T) cos(2πf1t),
    ψ2(t) = √(2/T) cos(2πf2t),

t ∈ [0, T].

Vector representation of the signal waveforms:

    s1(t) = +√Es ψ1(t)  ←→  s1 = (+√Es, 0)T
    s2(t) = −√Es ψ1(t)  ←→  s2 = (−√Es, 0)T = −s1
    s3(t) = +√Es ψ2(t)  ←→  s3 = (0, +√Es)T
    s4(t) = −√Es ψ2(t)  ←→  s4 = (0, −√Es)T = −s3

with Es = P T.
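The orthonormality of ψ1 and ψ2 can be checked numerically (a sketch, not part of the notes: T = 1 and the carrier choice f1 = 2/T, f2 = f1 + 1/(2T) are our illustrative assumptions):

```python
import numpy as np

# Assumed parameters (illustration only): T = 1 and carriers spaced
# by the minimum shift, f2 - f1 = 1/(2T).
T = 1.0
f1 = 2.0 / T
f2 = f1 + 1.0 / (2.0 * T)

n = 200000
dt = T / n
t = np.arange(n) * dt
psi1 = np.sqrt(2.0 / T) * np.cos(2.0 * np.pi * f1 * t)
psi2 = np.sqrt(2.0 / T) * np.cos(2.0 * np.pi * f2 * t)

# Inner products over [0, T], approximated by Riemann sums.
norm1 = np.sum(psi1 * psi1) * dt   # normalization: should be ~1
norm2 = np.sum(psi2 * psi2) * dt   # should be ~1
cross = np.sum(psi1 * psi2) * dt   # orthogonality: should be ~0
print(norm1, norm2, cross)
```

For this carrier choice both cross terms integrate to (almost) zero over one symbol interval, which is exactly why the two waveform families can serve as an orthonormal basis.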


Remark:
The modulation symbol at discrete time j is written as

    X j = (Xj,1, Xj,2)T.

For example, X j = s1 means Xj,1 = √Es and Xj,2 = 0.

Signal constellation:

[Figure: signal constellation in the (ψ1, ψ2)-plane; s1 and
s2 = −s1 lie on the ψ1-axis, s3 and s4 = −s3 lie on the ψ2-axis.]

Remark:
MSK and QPSK have the same signal constellation. However, the
two modulation schemes are not equivalent because their vector
encoders are different. This is discussed in the following.


Modulator:
The symbol X j = (Xj,1 , Xj,2 )T is transmitted in the time interval

t ∈ [jT, (j + 1)T ).

This time interval can also be written as

t = jT + τ, τ ∈ [0, T ).

The modulation for τ = t − jT ∈ [0, T), j = 0, 1, 2, . . ., is then
given by the mapping

    X j ↦ Xj(τ),

and it can be implemented by

[Figure: modulator block diagram realizing

    Xj(τ) = Xj,1 · √(2/T) cos(2πf1τ) + Xj,2 · √(2/T) cos(2πf2τ).]

Remark:
The index j is not necessary for describing the modulator. It is
only introduced to keep the same notation as used for the vector
encoder.
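The mapping above can be sketched in code (a minimal illustration; T = 1, P = 1, and the hypothetical carriers f1 = 2/T, f2 = f1 + 1/(2T) are assumptions, and the function name make_waveform is ours):

```python
import numpy as np

T = 1.0
f1 = 2.0 / T                 # assumed carrier (illustration only)
f2 = f1 + 1.0 / (2.0 * T)    # minimum shift: f2 - f1 = 1/(2T)

def make_waveform(x, n=1000):
    """Map a symbol vector x = (x1, x2) to samples of
    X(tau) = x1*sqrt(2/T)*cos(2*pi*f1*tau) + x2*sqrt(2/T)*cos(2*pi*f2*tau)
    on tau in [0, T)."""
    tau = np.arange(n) * (T / n)
    return (x[0] * np.sqrt(2 / T) * np.cos(2 * np.pi * f1 * tau)
            + x[1] * np.sqrt(2 / T) * np.cos(2 * np.pi * f2 * tau))

Es = 1.0                     # symbol energy Es = P*T with P = 1 (assumed)
s1 = (np.sqrt(Es), 0.0)      # vector representation of s1(t)
w = make_waveform(s1)
# s1 maps to +sqrt(2P) cos(2*pi*f1*tau), so w[0] = sqrt(2P) = sqrt(2).
print(w[0])
```

The energy of the generated waveform equals Es, as the vector representation promises.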


8.4 A Finite-State Machine


Consider the following device (which is a simple finite-state
machine):

[Figure: block diagram; input Uj and the register output Vj are
added modulo 2 to give Vj+1, which enters a delay element T whose
output is Vj.]

It consists of the following components:

(i) A modulo-2 adder:

    [Figure: adder with inputs Uj and Vj and output Vj+1.]

    It has the functionality

        Vj+1 = Uj ⊕ Vj = Uj + Vj mod 2.
(ii) A shift register clocked at times T ("every T seconds"):

    [Figure: delay element T with input Vj+1 and output Vj.]

(iii) The sequence {Uj} is a binary sequence with elements
in {0, 1}, and so is the sequence {Vj}.

The output value Vj describes the state of this device.
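This state machine takes only a few lines of Python to simulate (a sketch; the function name fsm_states is ours):

```python
def fsm_states(u):
    """Return the state sequence [V0, V1, ..., VL] of the mod-2
    accumulator for input bits u = [U0, ..., U_{L-1}], with V0 = 0."""
    v = [0]
    for uj in u:
        v.append(uj ^ v[-1])   # V_{j+1} = U_j XOR V_j
    return v

# The emphasized input sequence used in the trellis examples:
print(fsm_states([0, 1, 1, 0, 0]))   # → [0, 0, 1, 0, 0, 0]
```

Note that the device simply accumulates the input bits modulo 2: Vj is the parity of U0, . . . , Uj−1.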


As Vj ∈ {0, 1}, this device can be in two different states. All
possible transitions can be depicted in a state transition diagram:

[Figure: two states 0 and 1; input 0 leaves the state unchanged
(self-loops), input 1 toggles between the two states.]

The values in the circles denote the states v, and the labels of the
arrows denote the inputs u.
The state transitions and the associated input values can also be
depicted in a trellis diagram. The initial state is assumed to be
zero.

[Figure: trellis diagram over time indices 0, 1, . . . , 5 with states
0 and 1; the branches are labeled with the input bits.]

The time indices are written below the states.


The emphasized path corresponds to the input sequence

[U0, U1, U2, U3, U4] = [0, 1, 1, 0, 0].


8.5 Vector Encoder


According to the trellis of the physical phase, the initial phase at
the beginning of each symbol interval can take the two values 0
and π. These values determine the state of the encoder. We use
the following association:
phase 0 ↔ state V = 0,
phase π ↔ state V = 1.

The functionality of the encoder can be described by the following
state transition diagram:

[Figure: state transition diagram; state 0 has the self-loop 0/s1,
state 1 (phase π) has the self-loop 0/s2, and input 1 causes the
transitions 1/s3 (from 0 to 1) and 1/s4 (from 1 to 0).]

The transitions are labeled by u/x. The value u denotes the binary
source symbol at the encoder input, and x denotes the modulation
symbol at the encoder output.

Remember: s2 = −s1 and s4 = −s3.

Remark
The vector encoder is a finite-state machine as considered in the
previous section.
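The state diagram translates directly into code (an illustration, not from the notes; the modulation symbols are returned as labels s1, . . . , s4, and a properly terminated input sequence is assumed):

```python
def msk_vector_encode(u):
    """Map input bits u = [U0, ..., U_{L-1}] to modulation-symbol
    labels, starting in state V = 0.  Uj selects the frequency
    (0 -> s1/s2, 1 -> s3/s4), Vj selects the polarity."""
    table = {(0, 0): 's1',   # U=0, V=0: +f1
             (0, 1): 's2',   # U=0, V=1: -f1
             (1, 0): 's3',   # U=1, V=0: +f2
             (1, 1): 's4'}   # U=1, V=1: -f2
    v, out = 0, []
    for uj in u:
        out.append(table[(uj, v)])
        v ^= uj              # V_{j+1} = U_j XOR V_j
    return out, v            # final state (0 if u is terminated)

symbols, v_final = msk_vector_encode([0, 1, 1, 0, 0])
print(symbols, v_final)      # → ['s1', 's3', 's4', 's1', 's1'] 0
```

The output matches the emphasized path in the trellis diagram of this section.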


The temporal behavior of the encoder can be described by the
trellis diagram:

[Figure: trellis diagram over time indices 0, 1, . . . , 5 with states
0 and π; the branches are labeled 0/s1 (0→0), 1/s3 (0→π),
0/s2 (π→π), and 1/s4 (π→0).]

The emphasized path corresponds to the input sequence

    [U0, U1, U2, U3, U4] = [0, 1, 1, 0, 0].

General Conventions

(i) The input sequence of length L is written as

u = [u0, u1, . . . , uL−1 ].

(ii) The initial state is V0 = 0.

(iii) The final state is VL = 0.

The last bit UL−1 is selected such that the encoder is driven back
to state 0:

VL−1 = 0 −→ UL−1 = 0,
VL−1 = 1 −→ UL−1 = 1.

Therefore, UL−1 does not carry information.


Generation of the Encoder Outputs

Consider one trellis section:

[Figure: one trellis section between times j and j + 1, with the
branches 0/s1 (0→0), 1/s3 (0→π), 0/s2 (π→π), and 1/s4 (π→0).]

Remember: s2 = −s1 and s4 = −s3.

Observations:

(i) Input symbols → output vectors

        Uj = 0 −→ X j ∈ {s1, s2} = {s1, −s1},
        Uj = 1 −→ X j ∈ {s3, s4} = {s3, −s3}.

(ii) Encoder state → output vectors

        Vj = 0 ("0") −→ X j ∈ {s1, s3},
        Vj = 1 ("π") −→ X j ∈ {s2, s4} = {−s1, −s3}.

Using these observations, the above device can be supplemented to
obtain an implementation of the vector encoder for MSK.


Block Diagram of the Vector Encoder for MSK

[Figure: the finite-state machine of Section 8.4 (input Uj, state
Vj, Vj+1 = Uj ⊕ Vj) drives a look-up table that maps (Uj, Vj) to
the output vector X j = (Xj,1, Xj,2)T: (0,0) → s1, (0,1) → s2,
(1,0) → s3, (1,1) → s4.]

At time index j:

• input symbol Uj ,

• state Vj ,

• output symbol X j = (Xj,1 , Xj,2 )T,

• subsequent state Vj+1.

Notice:

(i) Uj controls the frequency of the transmitted waveform.

(ii) Vj controls the polarity of the transmitted waveform.


8.6 Demodulator
The received symbol Y j = (Yj,1 , Yj,2 )T is determined in the time
interval

t ∈ [jT, (j + 1)T ).

This time interval can also be written as

t = jT + τ, τ ∈ [0, T ).

The demodulation for τ = t − jT ∈ [0, T), j = 0, 1, 2, . . ., can
then be implemented as follows:
[Figure: correlator bank; the received signal Y(t) is multiplied by
√(2/T) cos(2πf1τ) and by √(2/T) cos(2πf2τ) and integrated over
one symbol interval:

    Yj,i = ∫0T √(2/T) cos(2πfi τ) Y(jT + τ) dτ,   i = 1, 2.]
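The correlator bank can be sketched numerically (an illustration; the integral is approximated by a Riemann sum, and the carriers are the same hypothetical f1 = 2/T, f2 = f1 + 1/(2T) used in the earlier sketches):

```python
import numpy as np

T, n = 1.0, 100000
f1 = 2.0 / T
f2 = f1 + 1.0 / (2.0 * T)
tau = np.arange(n) * (T / n)
psi1 = np.sqrt(2 / T) * np.cos(2 * np.pi * f1 * tau)
psi2 = np.sqrt(2 / T) * np.cos(2 * np.pi * f2 * tau)

def demodulate(y):
    """Correlate one received symbol interval y (samples of Y(jT+tau))
    with psi1 and psi2: Y_{j,i} ~ integral of psi_i(tau) y(tau) dtau."""
    dt = T / n
    return np.array([np.sum(psi1 * y) * dt, np.sum(psi2 * y) * dt])

# Noise-free check: transmitting s1(t) = sqrt(Es)*psi1(t) with Es = 1
# should give the received vector (sqrt(Es), 0) = (1, 0).
Es = 1.0
y = np.sqrt(Es) * psi1
print(demodulate(y))
```

In the noise-free case the demodulator recovers the vector representation of the transmitted waveform exactly, as the canonical decomposition predicts.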


8.7 Conditional Probability Density of Received Sequence

Sequence of (two-dimensional) received symbols:

    [Y 0, . . . , Y L−1] = [X 0 + W 0, . . . , X L−1 + W L−1].

The vector [W 0, . . . , W L−1] is a white Gaussian noise sequence
with the properties:

(i) W 0, . . . , W L−1 are independent;

(ii) each (two-dimensional) noise vector is distributed as

        W j = (Wj,1, Wj,2)T ∼ N( (0, 0)T, diag(N0/2, N0/2) ).
Illustration of the AWGN vector channel:

[Figure: AWGN vector channel; Yj,1 = Xj,1 + Wj,1 and
Yj,2 = Xj,2 + Wj,2.]


Conditional probability of y[0,L−1] = [y 0, . . . , y L−1] given that
x[0,L−1] = [x0, . . . , xL−1] was transmitted:

    p(y 0, . . . , y L−1 | x0, . . . , xL−1)
        = ∏_{j=0}^{L−1} p(y j | xj)
        = ∏_{j=0}^{L−1} (1/(πN0)) · exp( −(1/N0) ‖y j − xj‖² )
        = (1/(πN0))^L · exp( −(1/N0) Σ_{j=0}^{L−1} ‖y j − xj‖² ).

Notice that this is the likelihood function of x[0,L−1] for a given
received sequence y[0,L−1].

8.8 ML Sequence Estimation

The trellis of the vector encoder has the following two properties:

(i) Each input sequence u[0,L−1] = [u0, . . . , uL−1] uniquely
determines a path of length L in the trellis, and vice versa.

(ii) Each path in the trellis uniquely determines an output
sequence x[0,L−1] = [x0, . . . , xL−1].

A sequence x[0,L−1] of length L generated by the vector encoder is
called an admissible sequence if the initial and the final state of
the encoder are zero, i.e., if V0 = VL = 0. The set of admissible
sequences is denoted by A.


Using these definitions, the ML decoding rule is the following:

    Search the trellis for an admissible sequence that
    maximizes the likelihood function.

Or in mathematical terms:

    x̂[0,L−1] = argmax_{x[0,L−1] ∈ A} p(y[0,L−1] | x[0,L−1]).

Using the relation

    ‖y j − xj‖² = ‖y j‖² − 2⟨y j, xj⟩ + ‖xj‖²,

where ‖y j‖² is independent of xj and ‖xj‖² = Es, and the
expression for the likelihood function, the rule for ML sequence
estimation (MLSE) can equivalently be formulated as follows:

    x̂[0,L−1] = argmax_{x[0,L−1] ∈ A} exp( −(1/N0) Σ_{j=0}^{L−1} ‖y j − xj‖² )
             = argmin_{x[0,L−1] ∈ A} Σ_{j=0}^{L−1} ‖y j − xj‖²
             = argmax_{x[0,L−1] ∈ A} Σ_{j=0}^{L−1} ⟨y j, xj⟩.

Hence, the ML estimate of the transmitted sequence is a sequence
in A that maximizes the sum Σ_{j=0}^{L−1} ⟨y j, xj⟩.
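For very short sequences the MLSE rule can be evaluated by exhaustive search (a reference sketch only, not a practical decoder; the admissible set is enumerated via the vector encoder of Section 8.5, with symbol vectors normalized to Es = 1):

```python
import itertools
import numpy as np

S = {'s1': np.array([1.0, 0.0]), 's2': np.array([-1.0, 0.0]),
     's3': np.array([0.0, 1.0]), 's4': np.array([0.0, -1.0])}

def encode(u):
    """Vector encoder of Section 8.5: Uj selects s1/s3, Vj the sign."""
    v, out = 0, []
    for uj in u:
        out.append(S[('s1', 's3')[uj]] * (1 - 2 * v))
        v ^= uj
    return out, v

def mlse_bruteforce(y):
    """Maximize sum_j <y_j, x_j> over all admissible sequences
    (those ending in state V_L = 0)."""
    best, best_u = -np.inf, None
    for u in itertools.product([0, 1], repeat=len(y)):
        x, v_final = encode(u)
        if v_final != 0:
            continue                       # not admissible
        metric = sum(float(np.dot(yj, xj)) for yj, xj in zip(y, x))
        if metric > best:
            best, best_u = metric, u
    return list(best_u)

# Noise-free check: the transmitted admissible sequence is recovered.
u = [0, 1, 1, 0, 0]
x, _ = encode(u)
print(mlse_bruteforce(x))   # → [0, 1, 1, 0, 0]
```

The Viterbi algorithm of Section 8.10 computes the same maximizer without enumerating all sequences.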


8.9 Canonical Decomposition of MSK

[Figure: canonical decomposition of MSK as one block diagram:
the vector encoder (finite-state machine with look-up table for
s1, . . . , s4), the modulator with carriers √(2/T) cos(2πf1τ) and
√(2/T) cos(2πf2τ) producing X(t), the AWGN channel adding
W(t), the correlator-bank demodulator producing Yj,1 and Yj,2
from Y(t), and maximum-likelihood sequence decoding producing
the bit estimates Ûj.]

8.10 The Viterbi Algorithm

Given Problem:

(i) The admissible sequences

        x[0,L−1] = [x0, . . . , xL−1] ∈ A

    can be represented in a trellis.

(ii) The value to be maximized (optimization criterion) can be
written as the sum

        Σ_{j=0}^{L−1} ⟨y j, xj⟩.

    (Remember: ⟨y j, xj⟩ = yj,1 · xj,1 + yj,2 · xj,2.)

The Viterbi Algorithm solves such an optimization problem in
a very efficient way.

Branch Metrics and Path Metrics

The branches of the trellis are re-labeled as follows:

[Figure: a branch from state vj to state vj+1, originally labeled
uj/xj, is re-labeled with the branch metric ⟨y j, xj⟩.]

The new labels are called branch metrics. They may be interpreted
as a gain or a profit made when going along this branch.


When labeling all branches with their branch metrics, we obtain
the following trellis diagram:

[Figure: trellis over time indices 0, 1, . . . , 5; at time j, the
branch 0→0 carries the metric ⟨y j, s1⟩, the branch 0→π the
metric ⟨y j, s3⟩, the branch π→π the metric ⟨y j, s2⟩, and the
branch π→0 the metric ⟨y j, s4⟩.]

The total gain or profit achieved by going along a path in the
trellis is called the path metric. The path corresponding to the
sequence

    x[0,L−1] = [x0, . . . , xL−1]

has the path metric

    Σ_{j=0}^{L−1} ⟨y j, xj⟩.

Notice: The branch metric and the path metric are not necessarily
a mathematical metric.

Hence, the ML estimate x̂[0,L−1] of X [0,L−1] is an admissible
sequence (∈ A) that maximizes the path metric. The corresponding
path is called an optimum path.

There are 2^{L−1} admissible paths of length L (one free bit per
time index, the last bit being fixed by the termination). The
Viterbi algorithm is a sequential method that determines an
optimum path without computing the path metrics of all these
paths.


Survivors
The Viterbi algorithm relies on the following property of optimum
paths:

    A path of length L is optimum if and only if its
    subpaths at any depth n = 1, 2, . . . , L − 1 are also
    optimum.

Example: Optimality of Subpaths

Consider the following trellis. Assume that the thick path (solid
line) is optimum.

[Figure: two-state trellis over time indices 0, . . . , 5 with a solid
optimum path and a dashed competing path ending at depth 3.]

The dashed path ending at depth 3 cannot have a metric that is
larger than that of the subpath obtained by pruning the optimum
path at depth 3.                                              □

For each state in each depth n, there are several subpaths ending
in this state. The one with the largest metric (“the best path”) is
called the survivor at this state in this depth.


From the above considerations, we have the following important
results:

• All subpaths of an optimum path are survivors.

• Therefore, only survivors need to be considered for determining
an optimum path.

Accordingly, non-survivor paths can be discarded without loss of
optimality. (Hence the name "survivor".)

Viterbi Algorithm

1. Start at the beginning of the trellis (depth 0 and state 0 by
convention).

2. Increase the depth by 1. Extend all previous survivors to the
new depth and determine the new survivor for each state.

3. If the end of the trellis is reached (depth L and state 0 by
convention), the survivor is an optimum path. The associated
sequence is the ML sequence.

Remarks

(i) The complexity of computing the path metrics of all paths is
exponential in the sequence length. (Exhaustive search.)

(ii) The complexity of the Viterbi algorithm is linear in the
sequence length.

(iii) The complexity of the Viterbi algorithm is exponential in the
number of shift registers in the vector encoder.
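The three steps above can be sketched for the two-state MSK trellis (an illustration, not from the notes; states 0 and 1 stand for the phases 0 and π, the symbol vectors assume Es = 1, and metric ties are broken arbitrarily):

```python
import numpy as np

# Branch table: (state, input bit) -> (next state, symbol vector).
BRANCH = {(0, 0): (0, np.array([ 1.0,  0.0])),   # 0/s1
          (0, 1): (1, np.array([ 0.0,  1.0])),   # 1/s3
          (1, 0): (1, np.array([-1.0,  0.0])),   # 0/s2
          (1, 1): (0, np.array([ 0.0, -1.0]))}   # 1/s4

def viterbi(y):
    """ML sequence estimation over received vectors y[0..L-1],
    maximizing the path metric sum_j <y_j, x_j>; V0 = VL = 0."""
    metric = {0: 0.0, 1: -np.inf}        # start in state 0
    paths = {0: [], 1: []}
    for yj in y:
        new_metric = {0: -np.inf, 1: -np.inf}
        new_paths = {}
        for v in (0, 1):
            for u in (0, 1):
                v_next, x = BRANCH[(v, u)]
                m = metric[v] + float(np.dot(yj, x))
                if m > new_metric[v_next]:       # keep only the survivor
                    new_metric[v_next] = m
                    new_paths[v_next] = paths[v] + [u]
        metric, paths = new_metric, new_paths
    return paths[0]                       # terminate in state 0

# Noise-free check with the emphasized sequence of Section 8.5:
u, x, v = [0, 1, 1, 0, 0], [], 0
for uj in u:
    v, xv = BRANCH[(v, uj)]
    x.append(xv)
print(viterbi(x))   # → [0, 1, 1, 0, 0]
```

Per trellis section only four branch metrics are computed and two survivors are kept, which is the linear-in-L complexity stated in remark (ii).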


Example: Viterbi Algorithm

Consider the case L = 5. This corresponds to the following trellis.

[Figure: two-state trellis over time indices 0, . . . , 5.]

The Viterbi algorithm may be illustrated for a randomly drawn
received sequence y[0,L−1]. (The receiver must be capable of
handling every received sequence.)                            □


8.11 Simplified ML Decoder

The MSK trellis has a special structure, and this leads to a special
behavior of the Viterbi algorithm. This can be exploited to derive
a simpler optimal decoder.

Observation
At depth n = 2, the two survivors (solid line and dashed line) will
look as depicted in (a) or (b), but never as in (c) or (d).

[Figure: four two-step trellis sketches (a)-(d); in (a) and (b) the
two survivors pass through the same state at depth 1, in (c) and
(d) they pass through different states.]

Therefore the final estimate of the state V̂1 is already decided at
time j = 1. This is proved below.

The general result for the given MSK transmitter follows by
induction:

    The two survivors at any depth n are identical for j < n
    and differ only in the current state.

Therefore, final decisions on any state V̂n (value of the vector-
encoder state at time n) can already be made at time n.


We now prove that the Viterbi algorithm makes the final decision
about state V̂1 (state at time j = 1) already at time 1.

Proof
The inner product is defined as

    ⟨y j, sm⟩ = yj,1 · sm,1 + yj,2 · sm,2.

Due to the signal constellation,

    s2 = −s1,   s4 = −s3.

Therefore,

    ⟨y j, s2⟩ = ⟨y j, −s1⟩ = −⟨y j, s1⟩,
    ⟨y j, s4⟩ = ⟨y j, −s3⟩ = −⟨y j, s3⟩.

Using these relations, the branches can equivalently be labeled as

[Figure: the trellis of Section 8.10 with the branch metrics
rewritten using only s1 and s3: the branches 0→0 carry ⟨y j, s1⟩,
the branches 0→π carry ⟨y j, s3⟩, the branches π→π carry
−⟨y j, s1⟩, and the branches π→0 carry −⟨y j, s3⟩.]


Consider the conditions for selecting the survivors, while making
use of these relations:

• Survivor at state v2 = 0 (time j = 1):

  [Figure: the two paths ending in state 0 at depth 2, with metrics
  ⟨y 0, s3⟩ − ⟨y 1, s3⟩ (upper path, via state π) and
  ⟨y 0, s1⟩ + ⟨y 1, s1⟩ (lower path, via state 0).]

    ⟨y 0, s3⟩ − ⟨y 1, s3⟩ > ⟨y 0, s1⟩ + ⟨y 1, s1⟩ ⇒ upper path,
    ⟨y 0, s3⟩ − ⟨y 1, s3⟩ < ⟨y 0, s1⟩ + ⟨y 1, s1⟩ ⇒ lower path.

• Survivor at state v2 = 1 ("π") (time j = 1):

  [Figure: the two paths ending in state π at depth 2, with metrics
  ⟨y 0, s3⟩ − ⟨y 1, s1⟩ (upper path, via state π) and
  ⟨y 0, s1⟩ + ⟨y 1, s3⟩ (lower path, via state 0).]

    ⟨y 0, s3⟩ − ⟨y 1, s1⟩ > ⟨y 0, s1⟩ + ⟨y 1, s3⟩ ⇒ upper path,
    ⟨y 0, s3⟩ − ⟨y 1, s1⟩ < ⟨y 0, s1⟩ + ⟨y 1, s3⟩ ⇒ lower path.

Both conditions rearrange to the same inequality,

    ⟨y 0, s3⟩ − ⟨y 0, s1⟩ > ⟨y 1, s1⟩ + ⟨y 1, s3⟩,

so either the two upper paths or the two lower paths are the
survivors. Therefore, the two survivors go through the same
state V̂1.


ML State Sequence
By induction, we obtain the following optimal decision rule for
state Vj based on y j−1 and y j:

    V̂j = 0  if ⟨y j−1, s3⟩ − ⟨y j, s3⟩ < ⟨y j−1, s1⟩ + ⟨y j, s1⟩,
    V̂j = 1  if ⟨y j−1, s3⟩ − ⟨y j, s3⟩ > ⟨y j−1, s1⟩ + ⟨y j, s1⟩,

j = 1, 2, . . . , L − 1. (Remember: VL = 0 by convention.)

This decision rule can be simplified. The modulation symbols are

    s1 = (+√Es, 0)T,   s3 = (0, +√Es)T.

Thus, the inner products can be written as

    ⟨y j, s1⟩ = yj,1 · √Es,   ⟨y j, s3⟩ = yj,2 · √Es.

Substituting this in the decision rule and dividing by √Es, we
obtain the following simplified (and still optimal) decision rule:

    V̂j = 0  if yj−1,2 − yj,2 < yj−1,1 + yj,1,
    V̂j = 1  if yj−1,2 − yj,2 > yj−1,1 + yj,1,

j = 1, 2, . . . , L − 1.
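The simplified rule is only a handful of comparisons per symbol (a sketch; the function name decide_states and the tie-breaking choice z = 0 → 0 are ours):

```python
def decide_states(y, L):
    """Optimal state decisions for MSK: for j = 1, ..., L-1,
    V_hat_j = 0 iff y_{j-1,2} - y_{j,2} < y_{j-1,1} + y_{j,1}.
    y is a list of received vectors (y_{j,1}, y_{j,2});
    V_hat_0 = V_hat_L = 0 by convention."""
    v = [0]
    for j in range(1, L):
        z = (y[j - 1][1] - y[j][1]) - (y[j - 1][0] + y[j][0])
        v.append(1 if z > 0 else 0)
    v.append(0)
    return v

# Noise-free example with Es = 1: the symbols s1, s3, s4, s1, s1
# (the encoded sequence for input bits [0, 1, 1, 0, 0]) give the
# state sequence [0, 0, 1, 0, 0, 0].
y = [(1, 0), (0, 1), (0, -1), (1, 0), (1, 0)]
print(decide_states(y, 5))   # → [0, 0, 1, 0, 0, 0]
```

Unlike the Viterbi algorithm, each state decision depends only on two consecutive received vectors, which is exactly the simplification this section derives.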


The following device implements this state decision rule:

[Figure: Yj,1 and Yj,2 are each passed through a delay T; the
combination Zj = (Yj−1,1 + Yj,1) − (Yj−1,2 − Yj,2) is formed, and
the decision is V̂j = 0 for zj > 0 and V̂j = 1 for zj < 0.]

Vector-Encoder Inverse
Given the sequence of state estimates {V̂j}, the sequence of bit
estimates {Ûj} can be computed according to

    Ûj = V̂j ⊕ V̂j+1,

j = 0, 1, . . . , L − 1. (Compare vector encoder.)

The following device implements this operation:

[Figure: delay element T; from V̂j and the delayed V̂j−1 the bit
estimate Ûj−1 = V̂j−1 ⊕ V̂j is formed.]

The time indices are j = 0, 1, . . . , L − 1, and V̂L = 0 as VL = 0
by convention. Remember that UL−1 does not convey information.


8.12 Bit-Error Probability

The BER performance of MSK is about 2 dB worse than that of
BPSK and QPSK. This means, when comparing the SNR Eb/N0
required to achieve a certain bit-error probability (BEP), MSK
needs about 2 dB more than BPSK and QPSK. This can be
improved by slightly modifying the vector encoder.

The trellis of the vector encoder has the following properties:

(i) Two different paths differ in at least one state.

(ii) Two different paths differ in at least two bits.

(See trellis of vector encoder in Section 8.5.)

The most likely error event at high SNR is that the transmitted
path and the estimated path differ in one state. This error event
leads to two bit errors.

Idea for improvement:

• The state sequence should be equal to the bit sequence, such
that the most likely error event leads to only one bit error.

• This can be achieved by using a pre-coder before the vector
encoder.

The resulting modulation scheme is called precoded MSK.


8.13 Differential Encoding and Differential Decoding

The two terms "differential encoder" and "differential decoder"
are mainly used for historical reasons. Both may be used as
encoders.
In the following, bj ∈ {0, 1} denote the bits before encoding, and
cj ∈ {0, 1} denote the bits after encoding.

Differential Encoder (DE)

[Figure: bj and the fed-back register output cj are added modulo 2
to give cj+1, which enters the delay element T.]

Operation:

    cj+1 = bj ⊕ cj,

j = 0, 1, 2, . . . . Initialization: c0 = 0.

Differential Decoder (DD)

[Figure: bj and its delayed copy bj−1 are added modulo 2 to
give cj.]

Operation:

    cj = bj ⊕ bj−1,

j = 0, 1, 2, . . . . Initialization: b−1 = 0.
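Both mappings, and the fact that they invert each other, can be sketched as follows (an illustration; the function names are ours):

```python
def diff_encode(b):
    """DE: c_{j+1} = b_j XOR c_j with c_0 = 0; returns [c_1, ..., c_L]."""
    c, out = 0, []
    for bj in b:
        c ^= bj
        out.append(c)
    return out

def diff_decode(b):
    """DD: c_j = b_j XOR b_{j-1} with b_{-1} = 0."""
    prev, out = 0, []
    for bj in b:
        out.append(bj ^ prev)
        prev = bj
    return out

bits = [0, 1, 1, 0, 1, 0, 0, 1]
print(diff_decode(diff_encode(bits)) == bits)   # → True
```

The DE accumulates bits modulo 2 and the DD takes modulo-2 differences, so applying one after the other restores the original sequence; this is the compensation property used for precoded MSK in the next section.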


8.14 Precoded MSK
The DD(!) is used as a pre-coder and is placed before the vector
encoder. The DE(!) is placed behind the vector-encoder inverse.

[Figure: block diagram; Uj → DD → MSK Tx → AWGN channel
(noise W(t)) → MSK Rx / DE → Ûj−1.]

Vector Representation of Precoded MSK

[Figure: the pre-coded bit U′j = Uj ⊕ Vj drives the vector encoder
of Section 8.5 with its look-up table for s1, . . . , s4; the encoder
state is now Vj = Uj−1.]

Initialization: V0 = 0. Notice: U′j = Uj ⊕ Vj = Uj ⊕ Uj−1.

[Figure: the receiver device of Section 8.11 computing Zj from
Yj−1 and Yj; the decision zj > 0 → 0, zj < 0 → 1 now directly
yields V̂j = Ûj−1.]

Notice: The vector-encoder inverse and the DE compensate each
other and can be removed.


Trellis of Precoded MSK

[Figure: two-state trellis over time indices 0, . . . , 5; the branches
are labeled 0/s1 (0→0), 1/s3 (0→1), 1/−s1 (1→1), and
0/−s3 (1→0).]

The thick path corresponds to the bit sequence

    U[0,4] = [0, 1, 1, 0, 0].

Bit-Error Probability of Precoded MSK

The bit-error probability of precoded MSK is

    Pb = Q(√(2γb)),   γb = Eb/N0.

Sketch of the Proof:

The bit-error probability is given as

    Pb = lim_{L→∞} (1/L) Σ_{j=0}^{L−1} Pr(Ûj ≠ Uj).

All terms in this sum can be shown to be identical, and thus

    Pb = Pr(Û0 ≠ U0).

Due to the precoding, we have U0 = V1, and therefore

    Pb = Pr(V̂1 ≠ V1).


This expression can be written using conditional probabilities:

    Pb = Σ_{v=0}^{1} Σ_{u=0}^{1} Pr(V̂1 ≠ V1 | V1 = v, U1 = u) · Pr(V1 = v, U1 = u).

(Usual trick.) These conditional probabilities can be shown to be
all identical, and thus

    Pb = Pr(V̂1 ≠ V1 | V1 = 0, U1 = 0).

As V1 = U0 (property of precoded MSK), we obtain

    Pb = Pr(V̂1 = 1 | U0 = 0, U1 = 0).

The ML decision rule is

    V̂1 = 0  if y0,2 − y1,2 < y0,1 + y1,1,
    V̂1 = 1  if y0,2 − y1,2 > y0,1 + y1,1.

Given the hypothesis U0 = U1 = 0, we have

    X 0 = X 1 = s1 = (√Es, 0)T,

and so

    Y0,1 = √Es + W0,1,   Y0,2 = W0,2,
    Y1,1 = √Es + W1,1,   Y1,2 = W1,2.

Remember:

    Y 0 = (Y0,1, Y0,2)T,   Y 1 = (Y1,1, Y1,2)T.


Therefore, the bit-error probability can be written as

    Pb = Pr(Y0,2 − Y1,2 > Y0,1 + Y1,1 | U0 = 0, U1 = 0)
       = Pr(W0,2 − W1,2 > 2√Es + W0,1 + W1,1)
       = Pr(W0,2 − W1,2 − W0,1 − W1,1 > 2√Es),

where W′ = W0,2 − W1,2 − W0,1 − W1,1 ∼ N(0, 2N0). Thus

    Pb = Pr( W′/√(2N0) > 2√Es/√(2N0) )
       = Q(√(2Es/N0)) = Q(√(2γs)).

As (precoded) MSK is a binary modulation scheme, we have
Es = Eb and γs = γb.

Thus, the bit-error probability of precoded MSK is

    Pb = Q(√(2γb)),

which is the same as that of BPSK and QPSK.
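The final step can be checked by a small Monte-Carlo simulation (a sketch; it draws the four noise components directly and compares the relative frequency of the error event with Q(√(2γb)) for the arbitrarily chosen operating point γb = 1, i.e. Es = N0 = 1):

```python
import math
import random

def Q(x):
    """Gaussian tail function Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

random.seed(0)
Es, N0 = 1.0, 1.0                  # gamma_b = Es/N0 = 1 (assumed)
sigma = math.sqrt(N0 / 2.0)        # per-component noise std deviation
n, errors = 200000, 0
for _ in range(n):
    w = [random.gauss(0.0, sigma) for _ in range(4)]
    # Error event: W_{0,2} - W_{1,2} - W_{0,1} - W_{1,1} > 2*sqrt(Es)
    if w[0] - w[1] - w[2] - w[3] > 2.0 * math.sqrt(Es):
        errors += 1
pb_sim = errors / n
pb_theory = Q(math.sqrt(2.0 * Es / N0))
print(pb_sim, pb_theory)
```

For γb = 1 the theoretical value is Q(√2) ≈ 0.079, and the simulated relative frequency lands close to it.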


8.15 Gaussian MSK

Gaussian MSK (GMSK) is a generalization of MSK, and it is used
in the GSM mobile phone system.

Motivation
The phase function of MSK is continuous, but not its derivative.
These "edges" in the transmit signal lead to some high-frequency
components. By avoiding these "edges", i.e., by smoothing the
phase function, the spectrum of the transmit signal can be made
more compact.

Concept
The output signal of the transmitter is generated similarly to that
of MSK:

    x(t) = √(2P) cos( 2πf1t + φ(t) ),
    φ(t) = Σ_{l=0}^{L−1} ul · 2π b(t − lT),

t ∈ [0, LT), with the phase pulse

    b(τ) = 0                          for τ < 0,
           monotonically increasing   for 0 ≤ τ < JT,
           1/2                        for JT ≤ τ,

for some integer J ≥ 1.

Land, Fleury: Digital Modulation 1 NavCom


167

Notice that there are two new degrees of freedom:

(i) The phase pulse may stretch over more than one symbol
duration T.

(ii) The phase pulse may be selected such that its derivative is
zero for t = 0 and t = JT.

Phase Pulse and Frequency Pulse

In GMSK, the phase pulse b(τ) is constructed via its derivative
c(τ), called the frequency pulse:

(i) The frequency pulse c(τ) is the convolution of a rectangular
pulse (as in MSK!) with a Gaussian pulse:

        c(τ) = rectangular pulse ∗ Gaussian pulse,

    where c(τ) = 0 for τ ∉ [0, JT), and ∫ c(τ) dτ = 1/2. The
    resulting pulse is

        c(τ) = (1/(2T)) [ Q( (2πB/√(ln 2)) (τ − T/2) )
                        − Q( (2πB/√(ln 2)) (τ + T/2) ) ],

    up to a time shift that centers the pulse in [0, JT); the factor
    1/(2T) normalizes the pulse such that ∫ c(τ) dτ = 1/2. The
    value B corresponds to a bandwidth and is a design
    parameter.

(ii) The phase pulse is the integral over the frequency pulse:

        b(τ) = ∫0τ c(τ′) dτ′.

    Due to the definition of the frequency pulse, we have
    b(τ) = 1/2 for τ ≥ JT.

The resulting phase pulse fulfills the conditions given above.
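The frequency pulse and its normalization can be checked numerically (a sketch; T = 1 and the GSM-like value BT = 0.3 are assumed, and the pulse is evaluated on a wide symmetric window instead of being shifted and truncated to [0, JT)):

```python
import math

def Qfunc(x):
    """Gaussian tail function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

T = 1.0
B = 0.3 / T                         # GSM-like choice: B*T = 0.3
a = 2.0 * math.pi * B / math.sqrt(math.log(2.0))

def c(tau):
    """GMSK frequency pulse (centered at 0, un-truncated), including
    the 1/(2T) factor so that its total integral is 1/2."""
    return (Qfunc(a * (tau - T / 2.0)) - Qfunc(a * (tau + T / 2.0))) / (2.0 * T)

# Riemann-sum integral of c over a wide window -> should be ~1/2,
# i.e. the phase pulse b(tau) saturates at 1/2.
n, lo, hi = 200001, -10.0, 10.0
d = (hi - lo) / (n - 1)
total = sum(c(lo + k * d) for k in range(n)) * d
print(total)
```

The pulse is strictly positive and bell-shaped; smaller BT makes it wider (more smoothing, more compact spectrum, but also more intersymbol memory).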


In GSM, the parameter B is chosen such that BT ≈ 0.3. The
length of the phase pulse is then J ≈ 4 symbol intervals.


9 Quadrature Amplitude Modulation


10 BPAM with Bandlimited Waveforms


Part III
Appendix


A Geometrical Representation of Signals

A.1 Sets of Orthonormal Functions

The D real-valued functions

    ψ1(t), ψ2(t), . . . , ψD(t)

form an orthonormal set if

    ∫−∞+∞ ψk(t) ψl(t) dt = 0 for k ≠ l (orthogonality),
                           1 for k = l (normalization).

Example: D = 3

[Figure: three orthonormal functions ψ1(t), ψ2(t), ψ3(t) on [0, T],
each piecewise constant with values ±1/√T.]                   □


A.2 Signal Space Spanned by an Orthonormal Set of Functions

Let ψ1(t), ψ2(t), . . . , ψD(t) be a set of orthonormal functions.
We denote by Sψ the set containing all linear combinations of
these functions:

    Sψ := { s(t) : s(t) = Σ_{i=1}^{D} si ψi(t);  s1, s2, . . . , sD ∈ ℝ }.

Example: (continued)
The following functions are elements of Sψ:

    s1(t) = ψ1(t) + ψ2(t),   s2(t) = ½ ψ1(t) − ψ2(t) + ½ ψ3(t).

[Figure: the waveforms s1(t) and s2(t), piecewise constant with
values between −2/√T and 2/√T.]                               □
The set Sψ is a vector space with respect to the following
operations:

• Addition:

      + : (s1(t), s2(t)) ↦ s1(t) + s2(t)

• Multiplication with a scalar:

      · : (a, s(t)) ↦ a · s(t)


A.3 Vector Representation of Elements in the Vector Space

With each element s(t) ∈ Sψ,

    s(t) = Σ_{i=1}^{D} si ψi(t),

we associate the D-dimensional real vector

    s = [s1, s2, . . . , sD]T ∈ ℝD.

Example: (continued)

    s1(t) = ψ1(t) + ψ2(t)               ↦ s1 = [1, 1, 0]T,
    s2(t) = ½ ψ1(t) − ψ2(t) + ½ ψ3(t)   ↦ s2 = [½, −1, ½]T.    □

Vectors in ℝ, ℝ², and ℝ³ can be represented geometrically by
points on the line, in the plane, and in the 3-dimensional
Euclidean space, respectively.

Example: (continued)

[Figure: the vectors s1 = [1, 1, 0]T and s2 = [½, −1, ½]T plotted
in the (ψ1, ψ2, ψ3) coordinate system.]

A.4 Vector Isomorphism

The mapping Sψ → ℝD,

    s(t) = Σ_{i=1}^{D} si ψi(t)  ↦  s = [s1, s2, . . . , sD]T,

is a (vector) isomorphism, i.e.:

(i) The mapping is bijective.

(ii) The mapping preserves the addition and the multiplication
operators:

    s1(t) + s2(t) ↦ s1 + s2,
    a s(t) ↦ a s.

Since the mapping Sψ → ℝD is an isomorphism, Sψ and ℝD are
"the same" or "equivalent" from an algebraic point of view.


A.5 Scalar Product and Isometry

Scalar product in Sψ:

    ⟨s1(t), s2(t)⟩ := ∫−∞+∞ s1(t) s2(t) dt

for s1(t), s2(t) ∈ Sψ.

Scalar product in ℝD:

    ⟨s1, s2⟩ := Σ_{i=1}^{D} s1,i s2,i

for s1, s2 ∈ ℝD.

Relation of scalar products:

Each element in Sψ is a linear combination of
ψ1(t), ψ2(t), . . . , ψD(t):

    s1(t) = Σ_{k=1}^{D} s1,k ψk(t),   s2(t) = Σ_{l=1}^{D} s2,l ψl(t).

Inserting the two sums into the definition of the scalar product,
we obtain

    ⟨s1(t), s2(t)⟩ = ∫ ( Σ_{k=1}^{D} s1,k ψk(t) ) ( Σ_{l=1}^{D} s2,l ψl(t) ) dt
                   = Σ_{k=1}^{D} Σ_{l=1}^{D} s1,k s2,l ∫ ψk(t) ψl(t) dt
                   = Σ_{k=1}^{D} s1,k s2,k = ⟨s1, s2⟩,

where the last step uses the orthonormality of the ψk(t). Hence
the mapping Sψ → ℝD preserves the scalar product, i.e., it is an
isometry.


Norms in Sψ:

    ‖s(t)‖² := ⟨s(t), s(t)⟩ = ∫ s²(t) dt = Es

The value Es is the energy of s(t).

Norms in ℝD:

    ‖s‖² := ⟨s, s⟩ = Σ_{i=1}^{D} si²

From the isometry and the definitions of the norms, we obtain the
Parseval relation

    ‖s(t)‖² = ‖s‖².


A.6 Waveform Synthesis

Let ψ1(t), ψ2(t), . . . , ψD(t) be a set of orthonormal functions
spanning the vector space Sψ. Every element s(t) ∈ Sψ can be
synthesized from its vector representation s = [s1, s2, . . . , sD]T
by one of the following devices.

Multiplicative-based Synthesizer

[Figure: each component sk multiplies ψk(t); the products are
summed to give s(t) = Σ_{k=1}^{D} sk ψk(t).]

Filter-bank based Synthesizer

[Figure: each component sk scales a Dirac impulse δ(t) that is fed
into a filter with impulse response ψk(t); the filter outputs are
summed:

    s(t) = Σ_{k=1}^{D} sk δ(t) ∗ ψk(t), with Dirac function δ(t).]


A.7 Implementation of the Scalar Product

Let s1(t) and s2(t) be time-limited to the interval [0, T]. Then

    ⟨s1(t), s2(t)⟩ := ∫0T s1(t) s2(t) dt.

The scalar product can be computed by the following two devices.

Correlator-based Implementation

[Figure: s1(t) and s2(t) are multiplied and integrated over [0, T]
to give ⟨s1(t), s2(t)⟩.]

Matched-filter based Implementation

We seek a linear filter that outputs the value ⟨s1(t), s2(t)⟩ at a
certain time, say t0.

[Figure: s1(t) enters a linear filter g(t) whose output is sampled
at time t0.]

We want

    ⟨s1(t), s2(t)⟩ = ∫0T s1(τ) s2(τ) dτ = ∫−∞+∞ s1(τ) g(t′ − τ) dτ |_{t′=t0}.

The second equality holds for

    s2(τ) = g(t0 − τ)  ⇔  g(t) = s2(t0 − t).


Example:

[Figure: a waveform s2(t) on [0, T] and its time-reversed, shifted
version s2(t0 − t).]

The filter is causal and thus realizable for a function time-limited
to the interval [0, T] if

    t0 ≥ T.

The value t0 is the time at which the value ⟨s1(t), s2(t)⟩ appears
at the filter output. Choosing the minimum value, i.e., t0 = T, we
obtain the following (realizable) implementation.

[Figure: s1(t) enters the matched filter s2(T − t); the output is
sampled at time T and equals ⟨s1(t), s2(t)⟩.]

The filter is matched to s2(t).


A.8 Computation of the Vector Representation

Let ψ1(t), ψ2(t), . . . , ψD(t) be a set of orthonormal functions
time-limited to [0, T] and spanning the vector space Sψ.

We seek a device that computes the vector representation s ∈ ℝD
for any function s(t) ∈ Sψ, i.e.,

    s = [s1, s2, . . . , sD]T such that s(t) = Σ_{k=1}^{D} sk ψk(t).

The components of s can be computed as

    sk = ⟨s(t), ψk(t)⟩,   k = 1, 2, . . . , D.

Correlator-based Implementation

[Figure: s(t) is correlated with each ψk(t) (multiplication and
integration over [0, T]) to give the components s1, . . . , sD.]

Matched-filter based Implementation

[Figure: s(t) is passed through the matched filters ψk(T − t),
k = 1, . . . , D, whose outputs are sampled at time T to give
s1, . . . , sD.]
Land, Fleury: Digital Modulation 1 NavCom


182

A.9 Gram-Schmidt Orthonormalization

Assume a set of M signal waveforms

    { s1(t), s2(t), . . . , sM(t) }

spanning a D-dimensional vector space. From these waveforms, we
can construct a set of D ≤ M orthonormal waveforms

    { ψ1(t), ψ2(t), . . . , ψD(t) }

by applying the following procedure.

Remember:

    ⟨s1(t), s2(t)⟩ = ∫ s1(t) s2(t) dt,
    ‖s(t)‖² := ⟨s(t), s(t)⟩ = Es.

Gram-Schmidt orthonormalization procedure

1. Take the first waveform s1(t) and normalize it to unit energy:

       ψ1(t) := s1(t) / √(‖s1(t)‖²).

2. Take the second waveform s2(t). Determine the projection of
s2(t) onto ψ1(t), and subtract this projection from s2(t):

       c2,1 = ⟨s2(t), ψ1(t)⟩,
       s′2(t) = s2(t) − c2,1 ψ1(t).

   Normalize s′2(t) to unit energy:

       ψ2(t) := s′2(t) / √(‖s′2(t)‖²).


3. Take the third waveform s3(t). Determine the projections of
s3(t) onto ψ1(t) and onto ψ2(t), and subtract these projections
from s3(t):

       c3,1 = ⟨s3(t), ψ1(t)⟩,
       c3,2 = ⟨s3(t), ψ2(t)⟩,
       s′3(t) = s3(t) − c3,1 ψ1(t) − c3,2 ψ2(t).

   Normalize s′3(t) to unit energy:

       ψ3(t) := s′3(t) / √(‖s′3(t)‖²).

4. Proceed in this way for all other waveforms sk(t) until k = D.
Notice that s′k(t) = 0 for k > D. (Why?)

The general rule is the following. For k = 1, 2, . . . , D,

    ck,i = ⟨sk(t), ψi(t)⟩,   i = 1, 2, . . . , k − 1,

    s′k(t) = sk(t) − Σ_{i=1}^{k−1} ck,i ψi(t),

    ψk(t) := s′k(t) / √(‖s′k(t)‖²).

Due to this procedure, the resulting waveforms ψk(t) are
orthogonal and normalized, as desired, and they form an
orthonormal basis of the D-dimensional vector space of the
waveforms si(t).

Remark: The set of orthonormal waveforms is not unique.
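The general rule translates directly into code operating on sampled waveforms (a sketch; the inner product is approximated by a sum times dt, and the tolerance for discarding linearly dependent waveforms is our choice):

```python
import numpy as np

def gram_schmidt(signals, dt, tol=1e-9):
    """Orthonormalize sampled waveforms (list of equal-length arrays)
    w.r.t. the inner product <a, b> = sum(a*b)*dt; returns the psi_k."""
    basis = []
    for s in signals:
        r = np.asarray(s, dtype=float).copy()
        for psi in basis:
            r -= np.sum(r * psi) * dt * psi      # subtract projections
        energy = np.sum(r * r) * dt
        if energy > tol:                          # skip dependent waveforms
            basis.append(r / np.sqrt(energy))
    return basis

# Example: three waveforms spanning only a 2-dimensional space.
dt = 0.01
t = np.arange(0.0, 1.0, dt)
s1 = np.ones_like(t)
s2 = np.where(t < 0.5, 1.0, -1.0)
s3 = s1 + s2                                      # dependent on s1, s2
psis = gram_schmidt([s1, s2, s3], dt)
print(len(psis))   # → 2
```

The dependent waveform s3 yields s′3(t) = 0 and is skipped, illustrating the remark that s′k(t) = 0 beyond the dimension D.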


Example: Orthonormalization

[Figure: four original signal waveforms s1(t), . . . , s4(t) on [0, 3],
piecewise constant with values in {−1, 0, 1, ±1/√2}; below them,
two different sets of orthonormal waveforms ψ1(t), ψ2(t), ψ3(t)
constructed from them.]                                       □


B Orthogonal Transformation of a White Gaussian Noise Vector

The statistical properties of a white Gaussian noise vector
(WGNV) are invariant with respect to orthonormal
transformations. This is first shown for a vector with two
dimensions, and then for a vector with an arbitrary number of
dimensions.

B.1 The Two-Dimensional Case

Let W denote a 2-dimensional WGNV, i.e.,

    W = (W1, W2)T ∼ N( (0, 0)T, diag(σ², σ²) ).

Consider the original coordinate system (ξ1, ξ2) and the rotated
coordinate system (ξ′1, ξ′2).

[Figure: a vector w with components (w1, w2) in the original
coordinate system and components (w′1, w′2) in a coordinate
system rotated by the angle α = π/4.]

Let [W′1, W′2]T denote the components of W expressed with
respect to the rotated coordinate system (ξ′1, ξ′2):

    W′1 = +W1 cos α + W2 sin α,
    W′2 = −W1 sin α + W2 cos α.

Writing this in matrix notation as

    ( W′1 )   (  cos α   sin α ) ( W1 )
    ( W′2 ) = ( −sin α   cos α ) ( W2 ),

the vector W′ can be seen to be an orthonormal transformation
of the vector W.

The resulting vector W′ = [W′1, W′2]T is also a two-dimensional
WGNV with the same covariance matrix as W = [W1, W2]T:

    W′ = (W′1, W′2)T ∼ N( (0, 0)T, diag(σ², σ²) ).

Proof: See problem.

B.2 The General Case

Let W = [W1, . . . , WD]T denote a D-dimensional WGNV with
zero mean and covariance matrix σ²I, i.e.,

    W ∼ N(0, σ²I).

(0 is a column vector of length D, and I is the D × D identity
matrix.) Let further V denote an orthonormal transformation
ℝD → ℝD, i.e., a linear transformation with the property

    V V T = V TV = I.

Then the transformed vector

    W′ = V W

is also a D-dimensional WGNV with the same covariance matrix
as W:

    W′ ∼ N(0, σ²I).


Proof

(i) A linear transformation of a Gaussian random vector is again a Gaussian random vector; thus W′ is Gaussian.

(ii) The mean value is

    E[W'] = E[V W] = V\, E[W] = 0.

(iii) The covariance matrix is

    E[W' W'^T] = E[V W W^T V^T]
               = V \underbrace{E[W W^T]}_{\sigma^2 I} V^T
               = \sigma^2 \underbrace{V V^T}_{I}
               = \sigma^2 I.

Comment
In the case D = 2, V takes the form

    V = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix}

for some α ∈ [0, 2π).
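The invariance can also be checked numerically. The following sketch (plain Python, variable names hypothetical) draws samples of a 2-dimensional WGNV, applies the rotation by α, and verifies that the empirical covariance matrix of W′ is still approximately σ²I:

```python
import math
import random

random.seed(1)
sigma = 1.5                  # noise standard deviation (assumed value)
alpha = math.pi / 4          # rotation angle; any alpha in [0, 2*pi) works
N = 100_000                  # number of noise samples

# Draw W = [W1, W2]^T with i.i.d. N(0, sigma^2) components
w1 = [random.gauss(0.0, sigma) for _ in range(N)]
w2 = [random.gauss(0.0, sigma) for _ in range(N)]

# Apply the orthonormal (rotation) transformation V
c, s = math.cos(alpha), math.sin(alpha)
w1p = [ c * a + s * b for a, b in zip(w1, w2)]
w2p = [-s * a + c * b for a, b in zip(w1, w2)]

def cov(x, y):
    # Empirical covariance; the true mean is 0, so it is not subtracted
    return sum(a * b for a, b in zip(x, y)) / len(x)

# The covariance matrix of W' should again be approximately sigma^2 * I
print(cov(w1p, w1p), cov(w2p, w2p), cov(w1p, w2p))
```

The diagonal entries come out close to σ² = 2.25 and the off-diagonal entry close to 0, as the proof above predicts.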


C The Shannon Limit


C.1 Capacity of the Band-limited AWGN Channel

Consider an AWGN channel with bandwidth W [1/sec]. The capacity of this channel is given by

    C_W = W \log_2\left( 1 + \frac{S}{W N_0} \right)  [bit/sec].
Consider further a digital transmission scheme operating at trans-
mission rate R [bit/sec] with limited transmit power S.

Channel Coding Theorem (C.E. Shannon, 1948):

Consider a binary random sequence generated by a BSS that is transmitted over an AWGN channel with bandwidth W using some transmission scheme.
Transmission with an arbitrarily small probability of er-
ror (using an appropriate coding/modulation scheme)
is possible provided that the transmission rate is less
than the channel capacity, i.e.,

Pb → 0 is possible if R < CW .

Conversely, error-free transmission is impossible if the rate is larger than the channel capacity, i.e.,

Pb → 0 is impossible if R > CW .


A coding/modulation scheme can be characterized by its spectral efficiency and its power efficiency. The spectral efficiency is the ratio R/W, and it has the unit bit/(sec · Hz). The power efficiency is the SNR γ_b = E_b/N_0 necessary to achieve a certain error probability (e.g., 10^{-5}).

Example: MPAM
The rate is given by R = (1/T) \log_2 M. The bandwidth required to transmit pulses of duration T is about W = 1/T. The spectral efficiency is thus

    \frac{R}{W} = \log_2 M  [bit/(sec · Hz)].

Error probabilities can be found in [1, p. 412, 435].

Example: MPPM
The rate is given by R = (1/T) \log_2 M. The bandwidth required to transmit pulses of duration T/M is about W = M/T. The spectral efficiency is thus

    \frac{R}{W} = \frac{\log_2 M}{M}  [bit/(sec · Hz)].

Error probabilities can be found in [1, p. 426, 435].
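The two spectral efficiencies are easy to compare side by side. This short sketch (helper names hypothetical) evaluates R/W for MPAM and MPPM over several alphabet sizes M, showing that MPAM gains efficiency with growing M while MPPM loses it:

```python
import math

def pam_spectral_efficiency(M):
    # MPAM: R = log2(M)/T, W ~ 1/T  ->  R/W = log2(M)
    return math.log2(M)

def ppm_spectral_efficiency(M):
    # MPPM: R = log2(M)/T, W ~ M/T  ->  R/W = log2(M)/M
    return math.log2(M) / M

for M in (2, 4, 8, 16):
    print(f"M = {M:2d}: MPAM R/W = {pam_spectral_efficiency(M):5.3f}, "
          f"MPPM R/W = {ppm_spectral_efficiency(M):5.3f}  [bit/(sec*Hz)]")
```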


A theoretical limit for the relation between the spectral efficiency


and the power efficiency (for error-free transmission) can be estab-
lished using the channel coding theorem.

The transmit power is given by S = R E_b. Combining this relation, the channel capacity, and the condition R < C_W, we obtain

    \frac{R}{W} < \log_2\left( 1 + \frac{R E_b}{W N_0} \right)
                = \log_2\left( 1 + \frac{R}{W}\, \gamma_b \right).

This can be rewritten as

    \gamma_b > \frac{2^{R/W} - 1}{R/W}.

Using the maximal rate for error-free transmission, R = C, we obtain the theoretical bound

    \gamma_b = \frac{2^{C/W} - 1}{C/W}.
The plot can be found in [1, pp. 587].

The lower limit of γ_b for R/W → 0 is called the Shannon limit for the AWGN channel:

    \gamma_b > \lim_{R/W \to 0} \frac{2^{R/W} - 1}{R/W} = \ln 2.

Notice that 10 log10(ln 2) ≈ −1.6 dB. Thus, error-free transmis-


sion over the AWGN channel is impossible if the SNR per bit is
lower than −1.6 dB.
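The bound and its limit are straightforward to evaluate numerically. The following sketch (function name hypothetical) computes the minimum γ_b in dB for a few spectral efficiencies and confirms that it approaches 10 log10(ln 2) ≈ −1.6 dB as R/W → 0:

```python
import math

def min_snr_per_bit(eta):
    """Minimum Eb/N0 (linear) for error-free transmission at
    spectral efficiency eta = R/W [bit/(sec*Hz)]."""
    return (2.0 ** eta - 1.0) / eta

for eta in (2.0, 1.0, 0.5, 0.1, 0.001):
    gamma_b = min_snr_per_bit(eta)
    print(f"R/W = {eta:6.3f}  ->  gamma_b >= {10 * math.log10(gamma_b):6.2f} dB")

# As R/W -> 0 the bound tends to ln 2, i.e. about -1.6 dB (the Shannon limit)
print(f"Shannon limit: {10 * math.log10(math.log(2)):.2f} dB")
```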


C.2 Capacity of the Band-Unlimited AWGN Channel

For the AWGN channel with unlimited bandwidth (but still limited transmit power), the capacity is given by

    C_\infty = \lim_{W \to \infty} W \log_2\left( 1 + \frac{S}{W N_0} \right)
             = \frac{1}{\ln 2} \frac{S}{N_0}.

The condition R < C_∞ then becomes equivalent to the condition γ_b > ln 2. (Remember: S = R E_b.)
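This limit can be observed by evaluating the band-limited capacity C_W for increasing bandwidth. The sketch below (function name hypothetical, S/N_0 value assumed) shows C_W approaching S/(N_0 ln 2):

```python
import math

def capacity(W, S_over_N0):
    """Capacity C_W = W * log2(1 + S/(W*N0)) of the band-limited AWGN channel."""
    return W * math.log2(1.0 + S_over_N0 / W)

S_over_N0 = 1000.0                 # S/N0 in 1/sec (assumed value)
for W in (1e3, 1e5, 1e7, 1e9):     # bandwidth in Hz
    print(f"W = {W:12.0f} Hz:  C_W = {capacity(W, S_over_N0):10.3f} bit/sec")

# Limiting capacity C_infinity = S / (N0 * ln 2)
print(f"C_inf = {S_over_N0 / math.log(2):.3f} bit/sec")
```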
