
Giulio Colavolpe

Lecture Notes on Advanced Digital Communications

October 2017
Contents

Foreword

1 Transmission systems with memory
   1.1 Introduction
   1.2 General model for a modulated signal
   1.3 Coded linear modulations
      1.3.1 Uncoded transmissions
      1.3.2 Pulse of duration at most T (short pulse)
      1.3.3 Uncoded transmissions with short pulse
   1.4 Exercises

2 Sequence detection
   2.1 MAP sequence detection strategy
   2.2 Detection through the Viterbi algorithm
      2.2.1 Implementation aspects for the Viterbi algorithm
   2.3 MAP sequence detection for linear modulations
      2.3.1 Uncoded transmission
      2.3.2 Absence of ISI
      2.3.3 Uncoded transmission and absence of ISI
      2.3.4 Considerations on the absence of ISI
   2.4 Whitened matched filter front end
   2.5 Performance of MAP sequence detectors
      2.5.1 Upper bound on the error probability
      2.5.2 Additive white Gaussian noise
      2.5.3 Linear modulations
      2.5.4 Lower bound on the error probability
   2.6 Exercises

3 Detection in the presence of unknown parameters
   3.1 The synchronization problem
   3.2 Stochastic parameter with known pdf
   3.3 Parameter modeled as deterministic and unknown
      3.3.1 Generalized likelihood criterion
      3.3.2 Synchronization
   3.4 Estimation techniques
      3.4.1 Bounds on the performance of an estimator
      3.4.2 DA estimator
      3.4.3 DD estimator
      3.4.4 NDA estimator
      3.4.5 SDD estimator
   3.5 A general technique to obtain a sufficient statistic
   3.6 Exercises

4 Codes in the signal space
   4.1 Continuous phase modulations
   4.2 Trellis coded modulations
   4.3 Exercises

5 MAP symbol detection strategy
   5.1 Minimization of the symbol error probability
   5.2 BCJR algorithm
   5.3 Soft-output Viterbi algorithm
   5.4 Computation of the information rate
   5.5 Mismatched detection
   5.6 Pragmatic capacity
   5.7 Exercises

6 Reduced-complexity and adaptive receivers
   6.1 Reduced-state sequence detection
   6.2 Adaptive equalization
      6.2.1 Linear equalization
      6.2.2 Minimum mean square error
      6.2.3 Stochastic gradient algorithm
      6.2.4 Decision-feedback equalization
      6.2.5 Notes on the performance
   6.3 Adaptive channel identification
   6.4 Channel shortening
   6.5 Exercises

7 Turbo codes and iterative decoding
   7.1 Turbo codes
   7.2 Iterative decoding
   7.3 EXIT chart analysis
   7.4 Exercises

8 Factor graphs and the sum-product algorithm
   8.1 Introduction

9 Codes for fading channels
   9.1 TCM for fading channels
      9.1.1 Decoding algorithms
      9.1.2 Error performance
   9.2 Bit-interleaved coded modulation (BICM)
      9.2.1 Code construction
      9.2.2 Decoding algorithms
      9.2.3 Error performance
   9.3 Space-time coding for frequency-flat fading channels
      9.3.1 System model for frequency-flat MIMO channels and main results on capacity
      9.3.2 ST codeword design criteria for slow fading channels
      9.3.3 ST codeword design criteria for fast fading channels
      9.3.4 First naïve scheme: delay diversity
      9.3.5 ST block codes
      9.3.6 ST trellis codes
      9.3.7 Layered space-time codes
      9.3.8 Multiplexing-diversity trade-off
      9.3.9 Concatenated codes for MIMO channels
      9.3.10 Unitary and differential ST codes
   9.4 ST coding for frequency-selective (FS) fading channels
      9.4.1 System model for FS MIMO fading channels
      9.4.2 Design criterion
      9.4.3 ST codes for single carrier systems
      9.4.4 ST codes for MIMO OFDM
   9.5 Massive MIMO systems

A Signal spaces
   A.1 Preliminary definitions
   A.2 Signal spaces and orthonormal bases
   A.3 Projection of a signal over a subspace and complete bases
   A.4 Discrete representation of a random process
   A.5 Extraction of the signal components

B Detection and estimation theory

C Elements of information theory
   C.1 Definitions for discrete random variables
   C.2 Capacity for the discrete memoryless channel
   C.3 Extension to the transmission of a sequence
   C.4 Extension to continuous random variables
   C.5 The vector Gaussian channel
   C.6 The bandlimited AWGN channel
   C.7 Extension to correlated processes and channels with memory
   C.8 Nonergodic channels

D Block and convolutional codes

E Bilateral Z-transform and some of its properties

Foreword

This book collects the notes of my lectures for the course of Digital Communications and is the result of twenty years of teaching and research in this field. Its structure is strictly related to the structure of our 2nd-level degree (laurea magistrale) in Communication Engineering at the University of Parma, taught in English since 2012. In fact, it is assumed that the students have some familiarity with the main concepts of Detection and Estimation and Information Theory and with binary block and convolutional codes (although a summary of the main ideas is reported in the appendices).
The book reflects my personal view and formulation of classical topics as well as advanced topics that are not present in textbooks, but it has clearly been influenced by the reading of references such as the books by Wozencraft and Jacobs (Principles of Communication Engineering, John Wiley & Sons, 1965), Van Trees (Detection, Estimation and Modulation Theory, Part 1, John Wiley & Sons, 1968), Lindsey and Simon (Telecommunication Systems Engineering, Prentice-Hall, 1973), Mengali and D'Andrea (Synchronization Techniques for Digital Receivers, Plenum Press, 1997), Proakis (Digital Communications, 2nd edition, McGraw-Hill, 1989), Benedetto and Biglieri (Principles of Digital Transmission with Wireless Applications, Kluwer, 1999), Simon, Hinedi, and Lindsey (Digital Communications Techniques, Prentice Hall, 1995), Mengali and Morelli (Trasmissione Numerica, McGraw-Hill, 2001), and clearly many journal papers.
I would like to thank all my students, whose questions allowed me to improve these notes, and all my collaborators (Alan Barbieri, Dario Fertonani, Tommaso Foggi, Aldo Cero, Amina Piemontese, Nicolò Mazzali, Andrea Modenini, and Alessandro Ugolini) for their detailed comments and suggestions for improvement.
Parma, September 2016.

Chapter 1

Transmission systems with memory
1.1 Introduction

Detection theory (see Appendix B for a brief overview) investigates the problem of one-shot transmissions. In this problem, at the transmission side a signal $s(t)$, belonging to the set of allowed signals $\{s^{(i)}(t)\}_{i=1}^{M}$ of duration $D$, is transmitted. These signals are in a one-to-one correspondence with messages $\{m^{(i)}\}_{i=1}^{M}$ whose a-priori probabilities $P(m^{(i)})$ are known at the receiver. We will denote by $m \in \{m^{(i)}\}_{i=1}^{M}$ the transmitted message, which corresponds to the transmitted signal $s(t)$. The task that the receiver has to perform is to take a decision $\hat{m}$ on the message that has, most likely, been transmitted, according to a given criterion. The detection strategy that minimizes the error probability $P(\hat{m} \neq m)$ turns out to be (see Appendix B)

$$\hat{m} = \operatorname*{argmax}_{m^{(i)}} P(m^{(i)} | r) \qquad (1.1)$$

where $r$ is the vector representation of the received signal $r(t)$ (or a sufficient statistic, see Appendix B). In the literature, this is known as the maximum a-posteriori probability (MAP) detection strategy.

In the case of a transmission over a channel that introduces additive white Gaussian noise (AWGN), the MAP detection strategy becomes (see Appendix B)¹

¹When a bandpass transmission is considered, denoting by $\{\tilde{s}^{(i)}(t)\}_{i=1}^{M}$ the complex envelopes of the transmitted signals and by $\tilde{r}(t)$ the complex envelope of the received signal, strategy (1.3) becomes
$$\hat{m} = \operatorname*{argmax}_{m^{(i)}} \left[ \Re \int_0^D \tilde{r}(t)\, \tilde{s}^{(i)*}(t)\, dt - \frac{1}{2} \int_0^D \bigl|\tilde{s}^{(i)}(t)\bigr|^2 dt + N_0 \ln P(m^{(i)}) \right] . \qquad (1.2)$$

$$\hat{m} = \operatorname*{argmax}_{m^{(i)}} \left[ \int_0^D r(t)\, s^{(i)}(t)\, dt - \frac{1}{2} \int_0^D \bigl[s^{(i)}(t)\bigr]^2 dt + \frac{N_0}{2} \ln P(m^{(i)}) \right] \qquad (1.3)$$

where $N_0$ is the one-sided power spectral density of the additive white Gaussian noise. In this case, the receiver has to observe the signal $r(t)$ in the interval $[0, D]$ to take a decision. In other words, it is required to observe the received signal on an interval of length equal to the duration of the signals $s^{(i)}(t)$ only.
This kind of system can be employed even when a large amount of data has to be transmitted, by simply repeating the transmission of the different $M$-ary messages. Since each $M$-ary symbol carries $\log_2 M$ information bits, and assuming that we transmit a new symbol every $T$ seconds ($T$ is the so-called signaling interval, or symbol time), it is possible to transmit $N$ bits by simply performing $N/\log_2 M$ $M$-ary transmissions, thus requiring a total time $T_N$ given by

$$T_N = \frac{N}{\log_2 M}\, T .$$
In these communication systems, every transmission act can be considered as independent of previous and following transmissions, provided that $T \geq D$ and the transmitted symbols can be assumed independent of each other. In fact, when these two conditions are satisfied, the decision on a given message is not influenced by the decisions taken on previous or subsequent messages. For this reason, these systems are called memoryless. When one of these two conditions is not satisfied, the communication system is said to have memory, and the decision on one message cannot be taken without considering previous or subsequent messages, as explained in the following examples.

Example 1.1. Let us suppose that a memoryless transmission system, designed to guarantee an error rate of $10^{-3}$, is available. This error rate is usually sufficient for a voice communication. Let us now assume that we want to employ the same system for a data communication, where a lower error rate is required, let's say $10^{-6}$. We could achieve this goal through the use of a channel encoder at the transmitter side and the corresponding

decoder at the receiver, according to the scheme shown in Fig. 1.1. The channel encoder employs codewords belonging to a given codebook. Since not all sequences of symbols are allowed (i.e., not all sequences of symbols belong to the codebook), the encoder operates by introducing correlation in the sequence of symbols being transmitted. This memory has an effect on the received signal $r(t)$, which, in a given signaling interval, will depend not only on the corresponding symbol but also on previous and future symbols, and on the decoding strategy, which has to take this correlation into consideration. ♦

Example 1.2. Let us suppose that the transmission channel is bandlimited. The channel will thus have an effect on the transmitted signals $\{s^{(i)}(t)\}_{i=1}^{M}$. In fact, even though we designed these signals in such a way that their duration is $D \leq T$, they will be broadened by the channel filtering and will interfere with signals transmitted in adjacent symbol intervals. Thus, memory will arise. We will say that intersymbol interference (ISI) occurs. We could avoid ISI by increasing, for a given channel bandwidth, the value of $T$, but this would slow down the transmission. ♦

[Figure 1.1: Communication system considered in Example 1.1: an encoder (ENC) and a decoder (DEC) are added around the preexistent communication system (TRANSM, CHAN, REC).]

Although these two examples are representative of the two main reasons that motivate the investigation of signals with memory, there are many other possible sources of memory such as, to cite a few, the presence of colored Gaussian noise or the dependence of the received signal on unknown stochastic parameters (such as the unknown phase of the transmit and receive oscillators, or the channel fading). This latter scenario will be considered in Chapter 3.

In order to investigate transmission systems with memory, we need to introduce a proper model for these signals that can describe the memory effects. The corresponding receivers will be more complex than those for memoryless systems and, in some cases, we will need to resort to suboptimal detectors with lower complexity and worse performance.

[Figure 1.2: Block diagram of a generic communication system: a source emitting symbols ak, a transmitter, a channel filter producing x(t) and adding the noise w(t) to give r(t), and a receiver providing the decisions âk.]

1.2 General model for a modulated signal

Let us consider the generic communication system shown in Fig. 1.2, where $a_k$ represents the information symbol (message) generated by the source during the $k$-th signaling interval of duration $T$, the transmission channel is modeled through a linear time-invariant filter that distorts the transmitted signal and also introduces the AWGN $w(t)$, and the receiver provides a decision $\hat{a}_k$ for each information symbol. Let us suppose that both the information symbols $a_k$ and the decisions $\hat{a}_k$ belong to the following $M$-ary alphabet³

$$\mathcal{A} = \{a^{(i)}\}_{i=1}^{M} = \{a^{(1)}, a^{(2)}, \ldots, a^{(M)}\} .$$

³Messages $\{m^{(i)}\}_{i=1}^{M}$ of the previous section will be called $\{a^{(i)}\}_{i=1}^{M}$ from now on.

We adopt the following notation: the superscript is used to enumerate the elements of the alphabet, whereas the subscript is used for the signaling interval. As an example, $a_2 = a^{(3)}$ means that the symbol generated by the source during the second signaling interval is the third element of the alphabet.
In order to describe a general model that can represent signals with memory, we will consider the signal $x(t)$ in Fig. 1.2, although this model holds for all signals except for the noise components, which have to be separately added. Let us first consider a transmission system without memory where the possible signals are $\{s_i(t)\}_{i=1}^{M}$. The dependence of the transmitted signal on the information symbol can be simply expressed as

$$s(t; a) = \begin{cases} s_1(t) & \text{if } a = a^{(1)} \\ s_2(t) & \text{if } a = a^{(2)} \\ \quad \vdots \\ s_M(t) & \text{if } a = a^{(M)} \end{cases} \qquad 0 < t < T . \qquad (1.4)$$

During the $k$th signaling interval, symbol $a_k$ is associated with signal $s(t - kT; a_k)$, which has support in the interval $[kT, (k+1)T]$. Signal $x(t)$ can be expressed as the superposition of all slices related to the different signaling intervals, i.e.,

$$x(t) = \sum_{k=0}^{K-1} s(t - kT; a_k) \qquad (1.5)$$

where we denoted by $K$ the total number of symbols to be transmitted. This number thus represents the transmission length, whereas $KT$ represents the duration. In case of a continuous transmission, the lower and upper limits for index $k$ can be $-\infty$ and $+\infty$, respectively.

[Figure 1.3: Set of transmitted signals s(t; 0) and s(t; 1) in Example 1.3.]

[Figure 1.4: Transmitted signal x(t) in Example 1.3 corresponding to the sequence 1100101.]

Example 1.3. A binary transmission system with information symbols $a_k \in \{0, 1\}$ employs the signals depicted in Fig. 1.3. The transmitted signal corresponding to the sequence 1100101 is depicted in Fig. 1.4. ♦
The signal model described in eqn. (1.5) can be simply modified in such a way that it can describe signals with memory. It is sufficient to introduce a new variable representing a proper "state" of the transmission system. Denoting by $\sigma_k$ this state during the $k$th signaling interval, signal $x(t)$ can be expressed as

$$x(t) = \sum_{k=0}^{K-1} s(t - kT; a_k, \sigma_k) . \qquad (1.6)$$

With respect to the model in (1.5), the signal during the signaling interval $[kT, (k+1)T]$ is now

$$x(t) = s(t - kT; a_k, \sigma_k) \qquad kT < t < (k+1)T$$

and depends not only on the information symbol $a_k$ but also on the system state $\sigma_k$. The system states belong to the finite alphabet

$$\Sigma = \{\sigma^{(\ell)}\}_{\ell=1}^{S} = \{\sigma^{(1)}, \sigma^{(2)}, \ldots, \sigma^{(S)}\}$$

having $S$ elements. The number of possible waveforms in each signaling interval is thus $MS$, and they have to be provided to properly define the model. In addition, for a complete definition we also need to specify the law regulating the evolution of the system states. Resorting to the classical finite-state machine model, it is sufficient to define a law governing the transitions between successive states, i.e., the "next state" as a function of the present state and information symbol:

$$\sigma_{k+1} = t(a_k, \sigma_k) \qquad (1.7)$$

where $t(a_k, \sigma_k)$ represents the state transition function.
Thus, the model can be completely specified by providing the $MS$ waveforms associated with all possible pairs $(a^{(i)}, \sigma^{(\ell)})$ and the function (1.7). We can do this by specifying:

A. a table that defines the waveforms $s(t; a^{(i)}, \sigma^{(\ell)})$ for each pair $(a^{(i)}, \sigma^{(\ell)})$, $i = 1, \ldots, M$ and $\ell = 1, \ldots, S$, such as, for example, that in Fig. 1.5;

B. a state transition table, or function, describing the evolution of the system states, i.e., the next state $t(a^{(i)}, \sigma^{(\ell)})$ when the present symbol and state are $a^{(i)}$ and $\sigma^{(\ell)}$, respectively. An example of such a table is also reported in Fig. 1.5.

[Figure 1.5: Tables defining the model of a signal with memory: a waveform table giving s(t; a^(i), σ^(ℓ)) for each pair, and a state transition table giving t(a^(i), σ^(ℓ)).]



An alternative description of the modulated signal is based on a state diagram, i.e., on an oriented graph where nodes represent the states and oriented edges represent the possible transitions among states. States $\sigma^{(\ell)}$ and $\sigma^{(m)}$ are connected through an edge, oriented from $\sigma^{(\ell)}$ to $\sigma^{(m)}$, if and only if there exists a symbol $a^{(i)}$ such that $t(a^{(i)}, \sigma^{(\ell)}) = \sigma^{(m)}$. In this case, the edge is labeled with that symbol and the corresponding waveform $s(t; a^{(i)}, \sigma^{(\ell)})$, as depicted in Fig. 1.6.

Example 1.4. Let us consider a modulator with memory having two states, denoted by $\sigma^+$ and $\sigma^-$, and binary input $a_k \in \{0, 1\}$. The modulator associates a null signal with symbol 0, independently of the current state, and signals $p(t)$ or $-p(t)$, of duration $T$, with symbol 1, depending on the current state ($p(t)$ when the current state is $\sigma^+$ and $-p(t)$ when the current state is $\sigma^-$). The modulator changes its state when the information symbol is 1. Output and transition tables are reported in Fig. 1.7 along with the state diagram. In Fig. 1.8 we report the modulated waveform $s(t)$ corresponding to a given information sequence $\{a_k\}$ under the assumption that the initial state is $\sigma_0 = \sigma^+$. Notice that the waveform $s(t)$ would have been different in case of an initial state $\sigma_0 = \sigma^-$. ♦
We can observe that, in general, once the initial state has been selected, the information sequence and the sequence of states are in a one-to-one correspondence.
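To make the model (1.6)-(1.7) concrete, the modulator of Example 1.4 can be simulated directly from its output and transition tables. The following Python sketch is purely illustrative: the sampling rate, the rectangular choice for p(t), and all names are assumptions, not part of the text.

```python
import numpy as np

# Minimal sketch of the modulator of Example 1.4, assuming a rectangular
# pulse p(t) of duration T sampled with ns samples per symbol interval.
ns = 8
p = np.ones(ns)  # illustrative choice for p(t)

# Output table s(t; a, sigma) and transition table t(a, sigma):
# symbol 0 -> null signal, state unchanged;
# symbol 1 -> +p(t) in state '+', -p(t) in state '-', and the state flips.
output = {(0, '+'): np.zeros(ns), (0, '-'): np.zeros(ns),
          (1, '+'): +p,           (1, '-'): -p}
next_state = {(0, '+'): '+', (0, '-'): '-',
              (1, '+'): '-', (1, '-'): '+'}

def modulate(bits, state='+'):
    """Build s(t) slice by slice according to the general model (1.6)."""
    slices = []
    for a in bits:
        slices.append(output[(a, state)])
        state = next_state[(a, state)]  # state update, eq. (1.7)
    return np.concatenate(slices)

s = modulate([0, 0, 1, 1, 0, 1, 0, 1, 1], state='+')
print(s.reshape(-1, ns)[:, 0])  # first sample of each symbol interval
```

The printed first samples, [0, 0, 1, -1, 0, 1, 0, -1, 1], reproduce the alternating-sign pattern of Fig. 1.8.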

1.3 Coded linear modulations

Coded linear modulations can be described through the general model (1.6). A coded linearly modulated signal can be expressed as⁴

$$x(t) = \sum_{n=0}^{K-1} c_n p(t - nT) \qquad (1.8)$$

⁴In case of passband transmissions, the right-hand side of (1.8) represents the complex envelope of the modulated signal.

[Figure 1.6: Portion of a state diagram: an edge from σ^(ℓ) to σ^(m) labeled with the symbol a^(i) and the corresponding waveform.]



[Figure 1.7: Tables and state diagram for Example 1.4: symbol 0 leaves the state unchanged and outputs a null signal; symbol 1 outputs p(t) from state σ⁺ and −p(t) from state σ⁻, and flips the state.]

[Figure 1.8: Waveform corresponding to the information sequence ak = 0 0 1 1 0 1 0 1 1 (with states σk = + + + − + + − − +) for Example 1.4.]

where the code symbols $\{c_n\}$ depend on the information symbols $\{a_n\}$ through some coding law (we will see some examples in Chapters 4, 7, 9, and Appendix D, but the interested reader can have a look at many textbooks such as [1, 2, 3, 4]) and $p(t)$ is the so-called shaping pulse, having finite energy. This modulation is called linear because if we have

$$x^{(i)}(t) = \sum_{n=0}^{K-1} c_n^{(i)} p(t - nT) \qquad i = 1, 2$$

then

$$x(t) = \sum_{n=0}^{K-1} \left[\alpha c_n^{(1)} + \beta c_n^{(2)}\right] p(t - nT) = \alpha x^{(1)}(t) + \beta x^{(2)}(t)$$

i.e., the dependence of the modulated signal on the sequence $\{c_n\}$ is linear although, in general, the dependence of the code symbols $\{c_n\}$ on the information symbols $\{a_n\}$ is not.

In the most general case, the relationship between coded and information symbols is still with memory and can be represented through a finite-state model

$$\begin{cases} c_n = u(a_n, \mu_n) \\ \mu_{n+1} = t(a_n, \mu_n) \end{cases} \qquad (1.9)$$

where $\mu_n$ denotes the encoder state (which belongs to an alphabet having cardinality $S_c$), whereas $u(a_n, \mu_n)$ and $t(a_n, \mu_n)$ represent the output and transition functions, respectively. This coding law associates sequences of information symbols $\{a_n\}$ with sequences of code symbols $\{c_n\}$ through a one-to-one correspondence (see Fig. 1.9). Again, the coding law can also be described through tables or a state diagram. Code symbols $c_n$ belong to an alphabet $\mathcal{C}$ which, in general, can have a larger cardinality than the alphabet $\mathcal{A}$ of the information symbols $a_n$.

[Figure 1.9: Encoder mapping information symbols an into code symbols cn.]

[Figure 1.10: Example of shaping pulse p(t) with support (0, (L+1)T), here with L = 3.]

As far as the shaping pulse $p(t)$ is concerned, it can have a duration larger than the signaling interval $T$. Let us suppose that $p(t)$ has finite duration and support in the interval $(0, (L+1)T)$, i.e., non-zero values for $0 < t < (L+1)T$ only, as shown in Fig. 1.10. When considering the interval $kT < t < (k+1)T$, we have at most $L+1$ non-zero terms of (1.8), as suggested by Fig. 1.11. Hence, in the interval $kT < t < (k+1)T$ we have

$$\begin{aligned} x(t) = s(t - kT; a_k, \sigma_k) &= \sum_{i=k-L}^{k} c_i\, p(t - iT) \\ &= c_k(a_k, \mu_k)\, p(t - kT) + c_{k-1}(a_{k-1}, \mu_{k-1})\, p[t - (k-1)T] \\ &\quad + \cdots + c_{k-L}(a_{k-L}, \mu_{k-L})\, p[t - (k-L)T] \end{aligned} \qquad (1.10)$$


[Figure 1.11: Pulses p(t − nT) in (1.8) having non-zero values in the interval kT < t < (k+1)T (here L = 3): ck−3 p[t − (k−3)T], ..., ck p[t − kT].]

where we assumed that $c_i = 0$ for $i < 0$ and $i \geq K$; otherwise, all these expressions are correctly defined only for $L \leq k \leq K-1$. We explicitly showed the dependence of the code symbols on the information symbols and the code states. From (1.10), we can conclude that the signal in the interval $kT < t < (k+1)T$ depends, in addition to the current information symbol $a_k$, on a system state $\sigma_k$ that can be defined as

$$\sigma_k = (a_{k-1}, a_{k-2}, \ldots, a_{k-L};\, \mu_{k-L}) = (\mu_k, \mu_{k-1}, \ldots, \mu_{k-L}) . \qquad (1.11)$$

These definitions are equivalent since the coding law associates, in a one-to-one correspondence, symbols $\{a_{k-L}, \ldots, a_{k-1}\}$ with states $\{\mu_{k-L}, \ldots, \mu_k\}$, once the initial state $\mu_{k-L}$ is fixed. The first definition of the state $\sigma_k$ also allows us to compute the number of possible system states as a function of the number of encoder states $S_c$, the duration of pulse $p(t)$ (which is related to the parameter $L$), and the cardinality $M$ of the information symbols. In fact, it is

$$S = M^L S_c . \qquad (1.12)$$

Notice that the state definition (1.11) and the coding law (1.9) allow us to easily compute the state transition function (1.7). We now consider a few special cases.

1.3.1 Uncoded transmissions

In this case, signal $x(t)$ can be expressed as

$$x(t) = \sum_{n=0}^{K-1} a_n p(t - nT) .$$

Based on (1.10), the system state is now defined as

$$\sigma_k = (a_{k-1}, a_{k-2}, \ldots, a_{k-L})$$

and the number of states is $M^L$. From the state definition, we can easily find the state transition function $t(a_k, \sigma_k)$. In fact, the next state is

$$\sigma_{k+1} = (a_k, a_{k-1}, \ldots, a_{k-L+1})$$

and can be simply obtained by discarding the oldest symbol in the definition of $\sigma_k$, shifting all other symbols to the right, and adding symbol $a_k$ on the left. For this reason, we will say that the system states form a shift register sequence.
[Figure 1.12: Example of short pulse p(t) with duration at most T.]

1.3.2 Pulse of duration at most T (short pulse)

Let us now suppose that the pulse $p(t)$ has a duration of at most one signaling interval, as shown in Fig. 1.12. In this case, $L = 0$, and the signal in the interval $kT < t < (k+1)T$ can be simply obtained from (1.10), i.e.,

$$x(t) = s(t - kT; a_k, \sigma_k) = c_k(a_k, \mu_k)\, p(t - kT) .$$

Clearly, the system state coincides with that of the encoder, i.e., $\sigma_k = \mu_k$. The number of states is thus $S = S_c$.



1.3.3 Uncoded transmissions with short pulse

This case combines the two previous special cases. In the generic interval $kT < t < (k+1)T$, signal $x(t)$ reads

$$x(t) = a_k\, p(t - kT) .$$

The signal is thus memoryless.
Remark 1.1. Conceptually, a linearly modulated signal can be obtained by generating a train of Dirac deltas modulated by the symbols $c_k$ and filtering it with a linear time-invariant filter having impulse response $p(t)$:

$$\left[\sum_k c_k \delta(t - kT)\right] \otimes p(t) = \sum_k c_k p(t - kT) .$$

The first block diagram in Fig. 1.13 represents this interpretation. From a practical point of view, a linear modulator is a device that receives the discrete-time signal $\{c_k\}$ and associates a pulse with each symbol. It is thus made of a pulse generator, as shown in the second block diagram of Fig. 1.13. For simplicity, we will also use the graphical representation shown in the third block diagram of Fig. 1.13. ♦
[Figure 1.13: Block diagrams representing a linear modulator: (1) a Dirac train Σk ck δ(t − kT) filtered by p(t); (2) a pulse generator driven by {ck}; (3) the simplified representation.]

Example 1.5. The modulator described in Example 1.4 is linear. In fact, we can express the transmitted signal as

$$x(t) = \sum_k c_k p(t - kT)$$

where the code symbols are $c_k \in \{0, \pm 1\}$. The functions $u(a_k, \mu_k)$ and $t(a_k, \mu_k)$ in (1.9) are described by the tables in Fig. 1.14. The state $\sigma_k$ defined in Example 1.4 and $\mu_k$ differ in their names only, and so do the corresponding output and transition functions. In fact, the code state is $\mu_k \in \{\pm 1\}$. Alternatively, we can analytically express $u(a_k, \mu_k)$ and $t(a_k, \mu_k)$ as

$$\mu_{k+1} = t(a_k, \mu_k) = \begin{cases} \mu_k & \text{if } a_k = 0 \\ -\mu_k & \text{if } a_k = 1 \end{cases}$$

$$c_k = u(a_k, \mu_k) = a_k \mu_k .$$

This code belongs to the family of line codes. It is used to shape the signal power spectral density and is called the alternate mark inversion (AMI) code. ♦

   u(ak, µk):            t(ak, µk):
            µ=+1  µ=−1            µ=+1  µ=−1
   ak = 0    0     0     ak = 0    +1    −1
   ak = 1   +1    −1     ak = 1    −1    +1

[Figure 1.14: Output and transition functions for Example 1.5.]

Example 1.6. Let us consider an $M$-ary phase shift keying ($M$-PSK) signal with a rectangular shaping pulse. The complex envelopes of the transmitted signals are

$$\tilde{s}_i(t) = \sqrt{2\frac{E_s}{T}}\, e^{j\frac{2\pi}{M}(i-1)}, \qquad 0 < t < T, \quad i = 1, \ldots, M .$$

Defining

$$a_k \in \left\{ e^{j\frac{2\pi}{M}(i-1)} \right\}_{i=1}^{M}$$

we can express the complex envelope as

$$\tilde{x}(t) = \sqrt{2\frac{E_s}{T}} \sum_{k=0}^{K-1} a_k p(t - kT) \qquad (1.13)$$

where $p(t)$ is the rectangular pulse shown in Fig. 1.15:

$$p(t) = \begin{cases} 1 & 0 < t < T \\ 0 & \text{otherwise} . \end{cases}$$

[Figure 1.15: Rectangular pulse p(t) of duration T in Example 1.6.]

Clearly, this signal is memoryless.

Let us now assume that the information symbols are differentially encoded, i.e., from the information symbols $a_k$ the code symbols $c_k$ are obtained as

$$c_k = a_k c_{k-1} \qquad (1.14)$$

where $c_k \in \left\{ e^{j\frac{2\pi}{M}(i-1)} \right\}_{i=1}^{M}$. Considering that $|c_{k-1}|^2 = 1$, from (1.14) we obtain

$$a_k = c_k c_{k-1}^*$$

showing that the information is associated with the phase difference of two consecutive code symbols. This explains the name of this coding rule. Its important property is that if $c_k$ and $c_{k-1}$ are both rotated by the same arbitrary quantity, the product $c_k c_{k-1}^*$ does not change.

[Figure 1.16: Modulator for a differentially encoded PSK: a differential encoder (a delay z⁻¹ and a multiplier producing ck = ak ck−1), a pulse generator p(t), and an amplifier with gain √(2Es/T).]

The encoding rule (1.14) can be described through a finite-state machine whose state is $\mu_k = c_{k-1}$. In particular, eqn. (1.9) simply becomes

$$\begin{cases} c_k = a_k c_{k-1} \\ \mu_{k+1} = c_k . \end{cases} \qquad (1.15)$$

The differentially encoded signal is simply obtained by substituting $a_k$ with $c_k$ in (1.13):

$$\tilde{x}(t) = \sqrt{2\frac{E_s}{T}} \sum_{k=0}^{K-1} c_k p(t - kT) .$$

The block diagram of this modulator is shown in Fig. 1.16, whereas the state diagram of the encoder is shown in Fig. 1.17 for a quaternary PSK (QPSK) modulation, i.e., when $M = 4$. ♦

[Figure 1.17: State diagram for a differential encoder (M = 4), with states µ^(0), ..., µ^(3) and edges labeled a^(i)/c^(j).]
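The encoding rule (1.14) and its rotation invariance can be checked numerically. A minimal sketch, assuming QPSK and the illustrative names below: the recovered symbols ak = ck c*k−1 are unaffected by a common phase rotation of all code symbols.

```python
import numpy as np

M = 4  # QPSK
alphabet = np.exp(2j * np.pi * np.arange(M) / M)

rng = np.random.default_rng(0)
a = alphabet[rng.integers(M, size=10)]   # information symbols

# Differential encoding (1.14): c_k = a_k c_{k-1}, with c_{-1} = 1 assumed.
c = np.ones(len(a) + 1, dtype=complex)
for k in range(len(a)):
    c[k + 1] = a[k] * c[k]

# Recovery: a_k = c_k c*_{k-1}; a common rotation theta cancels out.
theta = 0.7                              # arbitrary phase rotation
c_rot = c * np.exp(1j * theta)
a_hat = c_rot[1:] * np.conj(c_rot[:-1])
print(np.allclose(a_hat, a))             # True: rotation-invariant
```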



1.4 Exercises
Exercise 1.1. Try to figure out a possible application of the differential encoding rule described in Example 1.6.

Exercise 1.2. Let us consider the raised cosine function

$$G(f) = \begin{cases} T & \text{for } |f| \leq \frac{1-\alpha}{2T} \\ \frac{T}{2}\left[1 + \cos\frac{\pi T}{\alpha}\left(|f| - \frac{1-\alpha}{2T}\right)\right] & \text{for } \frac{1-\alpha}{2T} < |f| \leq \frac{1+\alpha}{2T} \\ 0 & \text{otherwise} \end{cases}$$

with vestigial symmetry around $1/2T$ and excess bandwidth factor (roll-off) $\alpha$.

• Show that the pulse $g(t) = \mathcal{F}^{-1}[G(f)]$ can be expressed as

$$g(t) = \frac{\cos(\pi\alpha t/T)}{1 - (2\alpha t/T)^2}\, \mathrm{sinc}\left(\frac{t}{T}\right) .$$

• Draw a qualitative graph of $g(t)$ for $\alpha = 0$, $0.5$, and $1$.

• Show that the function $P(f) = \sqrt{G(f)}$ (the root raised cosine function) can be expressed as

$$P(f) = \begin{cases} \sqrt{T} & \text{for } |f| \leq \frac{1-\alpha}{2T} \\ \sqrt{T} \cos\left[\frac{\pi T}{2\alpha}\left(|f| - \frac{1-\alpha}{2T}\right)\right] & \text{for } \frac{1-\alpha}{2T} < |f| \leq \frac{1+\alpha}{2T} \\ 0 & \text{otherwise} \end{cases}$$

and draw its graph.

• Show that the pulse $p(t) = \mathcal{F}^{-1}[P(f)]$ can be expressed as

$$p(t) = \frac{1}{\sqrt{T}}\, \frac{\sin[\pi(1-\alpha)t/T] + (4\alpha t/T)\cos[\pi(1+\alpha)t/T]}{\pi[1 - (4\alpha t/T)^2](t/T)} .$$

• Draw a qualitative graph of $p(t)$ for $\alpha = 0$, $0.5$, and $1$. (A numerical sketch of $g(t)$ and $p(t)$ follows below.)
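The following numerical sketch is a companion to this exercise, not a solution; the roll-off, grid, and all names are illustrative assumptions. It evaluates the two closed-form pulses above, checks the Nyquist property g(kT) = 0 for integer k ≠ 0, and verifies that g(t) is the autocorrelation of p(t); a small time offset sidesteps the removable 0/0 points of the formulas.

```python
import numpy as np

T, alpha = 1.0, 0.5
eps = 1e-9                               # sidesteps the removable 0/0 points

def rc_pulse(t):
    """Raised cosine g(t) from the exercise (np.sinc(x) = sin(pi x)/(pi x))."""
    t = t + eps
    return (np.cos(np.pi * alpha * t / T)
            / (1 - (2 * alpha * t / T) ** 2) * np.sinc(t / T))

def rrc_pulse(t):
    """Root raised cosine p(t) from the exercise."""
    t = t + eps
    num = (np.sin(np.pi * (1 - alpha) * t / T)
           + (4 * alpha * t / T) * np.cos(np.pi * (1 + alpha) * t / T))
    return num / (np.pi * (1 - (4 * alpha * t / T) ** 2) * (t / T) * np.sqrt(T))

# Nyquist property of the raised cosine: g(kT) = 0 for every integer k != 0.
k = np.arange(1, 8)
print(np.max(np.abs(rc_pulse(k * T))) < 1e-6)                 # True

# g(t) is the autocorrelation of p(t): compare sampled p(t) (x) p(-t) with g(t).
dt = T / 64
t = np.arange(-16 * T, 16 * T, dt)
p = rrc_pulse(t)
acf = np.convolve(p, p[::-1]) * dt
mid = len(t) - 1                                              # index of lag 0
lags = np.arange(-64, 65)
print(np.max(np.abs(acf[mid + lags] - rc_pulse(lags * dt))))  # small (grid truncation)
```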

Exercise 1.3. Let us consider the linearly modulated signal

$$s(t) = \sum_{k=-\infty}^{\infty} c_k p(t - kT)$$

whose symbols $\{c_k\}$ have known mean value and autocorrelation sequence:

$$E\{c_k\} = \eta \qquad E\{c_{k+m} c_k^*\} = R(m)$$

(it is thus a wide-sense stationary discrete-time process).

• Show that the signal $s(t)$ is a cyclostationary process with period $T$.

• Demonstrate that the power spectral density of the process $\bar{s}(t) = s(t - \tau)$, where $\tau$ is a random variable with uniform distribution in $(-\frac{T}{2}, \frac{T}{2})$ and independent of the sequence of symbols $\{c_k\}$, is given by

$$W_s(f) = \frac{W(f)}{T} |P(f)|^2$$

where

$$W(f) = \sum_{m=-\infty}^{\infty} R(m)\, e^{-j2\pi m f T}$$

and $P(f)$ is the Fourier transform of $p(t)$. (A numerical check of this formula is sketched after this exercise.)

• Compute the bandwidth of the signal $\bar{s}(t)$ when the shaping pulse $p(t)$ is:

  – a rectangular pulse having duration $T$;

  – a pulse with root raised cosine spectrum having roll-off $\alpha$.
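As a numerical companion to the second item above (a hedged sketch; the pulse, the symbol correlation, and all names are illustrative assumptions): with symbols ck = ak + 0.5 ak−1 built from i.i.d. ak ∈ {±1}, one has R(0) = 1.25, R(±1) = 0.5, and R(m) = 0 otherwise, so W(f) = 1.25 + cos(2πfT); the averaged periodogram of a long realization should approach W(f)|P(f)|²/T.

```python
import numpy as np

# Numerical check of W_s(f) = W(f)|P(f)|^2 / T with a rectangular pulse.
rng = np.random.default_rng(1)
ns, T = 8, 1.0                      # samples per symbol interval, symbol time
dt = T / ns
a = rng.choice([-1.0, 1.0], size=50000)
c = a[1:] + 0.5 * a[:-1]            # correlated symbols: R(0)=1.25, R(+-1)=0.5

x = np.zeros(len(c) * ns)
x[::ns] = c
x = np.convolve(x, np.ones(ns))[:len(c) * ns]   # rectangular p(t) of duration T

# Averaged periodogram over segments of N samples (a poor man's Welch estimate).
N = 512
segs = x[:len(x) // N * N].reshape(-1, N)
psd = np.mean(np.abs(np.fft.fft(segs, axis=1) * dt) ** 2, axis=0) / (N * dt)
f = np.fft.fftfreq(N, dt)

# Analytic prediction: the rectangular pulse has |P(f)| = T |sinc(f T)|.
W = 1.25 + np.cos(2 * np.pi * f * T)
pred = W * (T * np.sinc(f * T)) ** 2 / T
print(psd[1:6])                     # estimated values at low frequencies...
print(pred[1:6])                    # ...should match the prediction closely
```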

Exercise 1.4. Independent and equally likely symbols $a_k \in \{\pm 1\}$ are encoded according to the following rule:

$$c_k = a_k + a_{k-1} .$$

The code symbols $c_k \in \{-2, 0, 2\}$ are transmitted using a linear modulation with shaping pulse $p(t)$ having root raised cosine spectrum with roll-off 0. The transmitted signal is thus

$$x(t) = \sum_{k=-\infty}^{\infty} c_k p(t - kT) .$$

• Compute the power spectral density of the transmitted signal and draw its graph.

• Find a shaping pulse $p'(t)$ that allows one to express the signal in the form

$$x(t) = \sum_{k=-\infty}^{\infty} a_k p'(t - kT) .$$

• Draw the graph of $|P'(f)|$, where $P'(f) = \mathcal{F}[p'(t)]$ is the Fourier transform of $p'(t)$, and provide an interpretation of the relationship between the two expressions of the signal $x(t)$.
Chapter 2

Sequence detection

2.1 MAP sequence detection strategy


In this chapter, we will consider the problem of detecting the information symbols $\{a_k\}$ based on the noisy observation of a signal with memory. We will thus assume that the received signal has the expression

$$r(t) = y(t; \bar{a}, \bar{\sigma}_0) + w(t)$$

where

$$y(t; \bar{a}, \bar{\sigma}_0) = \sum_{k=0}^{K-1} s(t - kT; \bar{a}_k, \bar{\sigma}_k) \qquad (2.1)$$

is the modulated signal according to the general model (1.6) for signals with memory, and $w(t)$ represents the thermal noise, modeled as Gaussian and white. We used $\bar{a} = (\bar{a}_0, \bar{a}_1, \ldots, \bar{a}_{K-1})$ to denote the sequence that has really been transmitted, and we will use $\bar{\sigma}$ for the corresponding sequence of states. In $y(t; \bar{a}, \bar{\sigma}_0)$ we explicitly expressed the dependence on the transmitted sequence and the initial state only since, once the initial state $\bar{\sigma}_0$ has been chosen, all other states in $\bar{\sigma}$ are automatically determined by the sequence $\bar{a}$. One of the possible sequences that can be transmitted and the corresponding sequence of states will be denoted by $a$ and $\sigma$, respectively. We will assume that the receiver perfectly knows the signal model, i.e., the waveform $y(t; a, \sigma_0)$ associated with the pair $(a, \sigma_0)$, for all possible pairs, and the state transition function. The transmission system we are referring to is shown in Fig. 2.1. Given the initial state $\bar{\sigma}_0$, the received signal $r(t)$ depends on the entire sequence of information symbols $\bar{a}$.

[Figure 2.1: Transmission system under consideration: encoder and modulator (the system with memory), channel filter with additive noise w(t), and receiver.]

Since we have $M^K$ possible sequences of $K$ symbols belonging to the $M$-ary alphabet $\mathcal{A}$, we have $M^K$ possible waveforms $y(t; a, \sigma_0)$.¹

¹If $\sigma_0$ is unknown, we have to consider all possible waveforms $y(t; a, \sigma_0)$, which are $SM^K$ in number.
The problem of finding a detection strategy for the receiver in Fig. 2.1 can thus be formalized as a detection problem where the number of possible messages is $M^K$ and they are associated with signals of duration $KT$. The MAP detection strategy, described in Appendix B and recalled in the previous chapter, provides an optimal solution to this problem, optimal in the sense of minimizing the probability that an error occurs when taking a decision on the information sequence at the receiver. This strategy, known as the MAP sequence detection strategy, can be expressed as

$$\hat{a} = \operatorname*{argmax}_{a} P(a | r)$$

where $r$ is the vector collecting the components of the received signal on an orthogonal basis that are relevant for detection, and $\hat{a}$ denotes the detected sequence.
This criterion, which minimizes the sequence error probability, gives the same relevance to error events in which the detected sequence differs from the transmitted one by only one symbol over $K$, or two symbols, or a larger number of symbols (up to $K$). This criterion is conceptually justified in the case of packet transmissions with automatic repeat request (ARQ) where, independently of the number of wrong decisions within a packet, the receiver will ask for a retransmission.
An alternative criterion, which weights those error events in a different way, is the one minimizing the symbol error probability (MAP symbol detection strategy). This detection strategy can be expressed as

$$\hat{a}_k = \operatorname*{argmax}_{a_k} P(a_k | r) .$$

In some applications, this criterion could appear to be more suited to represent the real goals of a digital transmission system. The relevant detection strategy turns out to be more complex than the one minimizing the sequence error probability. In addition, in typical applications both criteria have practically the same performance since, as we will see, the dominant errors of the MAP sequence detection strategy are those where only a few symbols are erroneously detected (they are more frequent since they are associated with sequences that can easily be mistaken for each other). These dominant errors are often the only errors that occur for signal-to-noise ratio values of practical interest. In other words, the MAP sequence detection strategy, although based on the minimization of the sequence error probability, for reasons that are not directly related to it favors the choice of sequences with only a few symbol errors.
In this chapter, we will investigate the MAP sequence detection strategy
whereas the MAP symbol detection strategy will be discussed in Chapter 5.
Based on the results of the previous chapter, we can now formalize the MAP sequence detection strategy. In the case of a baseband transmission, the MAP sequence detection strategy is²

$$\hat{a} = \operatorname*{argmax}_{a} \left[ \int_0^{KT} r(t)\, y(t; a, \sigma_0)\, dt - \frac{1}{2} \int_0^{KT} y^2(t; a, \sigma_0)\, dt + \frac{N_0}{2} \ln P(a) \right] . \qquad (2.2)$$

²If the initial state is known, we have to substitute $\sigma_0$ with the known initial state $\bar{\sigma}_0$.

In the case of a passband transmission, assuming that $\tilde{r}(t)$ represents the complex envelope of the received signal and $\tilde{y}(t; a, \sigma_0)$ the complex envelope of a possible transmitted signal, the MAP sequence detection strategy becomes

$$\hat{a} = \operatorname*{argmax}_{a} \left[ \Re \int_0^{KT} \tilde{r}(t)\, \tilde{y}^*(t; a, \sigma_0)\, dt - \frac{1}{2} \int_0^{KT} |\tilde{y}(t; a, \sigma_0)|^2\, dt + N_0 \ln P(a) \right] . \qquad (2.3)$$
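Before turning to efficient implementations, it is instructive to spell out the brute-force version of this strategy. The sketch below is illustrative only: a toy discrete-time baseband model stands in for the integrals in (2.2), the symbols are assumed equally likely (so the prior term drops), and all names are assumptions. It scores every one of the M^K candidate sequences and picks the best; the cost grows as M^K, which is exactly the problem addressed in the next section.

```python
import numpy as np
from itertools import product

# Brute-force MAP (here ML, equal priors) sequence detection on a toy
# discrete-time model: r = y(a) + w, with y(a) = sum_k a_k h[. - k] (ISI taps h).
rng = np.random.default_rng(2)
h = np.array([1.0, 0.6, 0.3])            # illustrative channel/shaping taps
A = np.array([-1.0, 1.0])                # binary alphabet, M = 2
K = 8

a = rng.choice(A, size=K)                # transmitted sequence
r = np.convolve(a, h) + 0.5 * rng.standard_normal(K + len(h) - 1)

best, a_hat = -np.inf, None
for cand in product(A, repeat=K):        # M^K = 256 candidate sequences
    y = np.convolve(cand, h)
    metric = r @ y - 0.5 * y @ y         # correlation - energy/2, as in (2.2)
    if metric > best:
        best, a_hat = metric, np.array(cand)
print(a, a_hat, sep="\n")
```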

2.2 Detection through the Viterbi algorithm

Direct implementation of the strategy (2.2) would require $M^K$ correlators or matched filters (or $M^K S$, if $\sigma_0$ is unknown) and then a search over the $M^K$ possible sequences. Thus, the receiver complexity increases exponentially with the sequence length. As an example, if we consider the transmission of $K = 8$ quaternary symbols, the number of sequences, and thus of filters, becomes $M^K = 4^8 = 2^{16} = 65536$. From a practical point of view, it would be impossible to implement the receiver except for very small values of $K$. It is thus important to look for receiver architectures that are more effective than the direct search among the $M^K$ possible sequences (the brute-force approach).

Let us consider, for example, a baseband transmission and the detection strategy (2.2). The two integrals that appear in (2.2) can be computed as follows. As far as the first integral is concerned, i.e., the correlation between the received signal and the possible waveforms, we have

$$\int_0^{KT} r(t)\, y(t; a, \sigma_0)\, dt = \int_0^{KT} r(t) \sum_{k=0}^{K-1} s(t - kT; a_k, \sigma_k)\, dt = \sum_{k=0}^{K-1} \underbrace{\int_{kT}^{(k+1)T} r(t)\, s(t - kT; a_k, \sigma_k)\, dt}_{z_k(a_k, \sigma_k)} = \sum_{k=0}^{K-1} z_k(a_k, \sigma_k)$$

having exploited the fact that $s(t - kT; a_k, \sigma_k)$ has support in the interval $[kT, (k+1)T]$, and defined

$$z_k(a_k, \sigma_k) = \int_{kT}^{(k+1)T} r(t)\, s(t - kT; a_k, \sigma_k)\, dt .$$

The second integral, which represents the energy of the possible waveforms, becomes

$$\int_0^{KT} y^2(t; a, \sigma_0)\, dt = \int_0^{KT} \sum_{k=0}^{K-1} s(t - kT; a_k, \sigma_k) \sum_{n=0}^{K-1} s(t - nT; a_n, \sigma_n)\, dt = \sum_{k=0}^{K-1} \underbrace{\int_{kT}^{(k+1)T} s^2(t - kT; a_k, \sigma_k)\, dt}_{E(a_k, \sigma_k)} = \sum_{k=0}^{K-1} E(a_k, \sigma_k)$$

having exploited the fact that the product $s(t - kT; a_k, \sigma_k)\, s(t - nT; a_n, \sigma_n)$ is zero unless $n = k$, and defined

$$E(a_k, \sigma_k) = \int_{kT}^{(k+1)T} s^2(t - kT; a_k, \sigma_k)\, dt .$$

Notice that $E(a_k, \sigma_k)$ depends on $k$ only through $a_k$ and $\sigma_k$. Finally, assuming that the information symbols are independent of each other, the last term in

[Figure 2.2: Front end processing: zk(a^(ℓ), σ^(m)) obtained either as the correlation of r(t) with s(t − kT; a^(ℓ), σ^(m)) over [kT, (k+1)T], or as the output of a filter matched to s(t; a^(ℓ), σ^(m)) sampled at t = (k+1)T.]

(2.2) becomes

$$\frac{N_0}{2} \ln P(a) = \frac{N_0}{2} \ln \prod_{k=0}^{K-1} P(a_k) = \frac{N_0}{2} \sum_{k=0}^{K-1} \ln P(a_k) .$$

Strategy (2.2) can thus be expressed as

$$\hat{a} = \operatorname*{argmax}_{a} \sum_{k=0}^{K-1} \left[ z_k(a_k, \sigma_k) - \frac{1}{2} E(a_k, \sigma_k) + \frac{N_0}{2} \ln P(a_k) \right] = \operatorname*{argmax}_{a} \sum_{k=0}^{K-1} \lambda_k(a_k, \sigma_k) \qquad (2.4)$$

having defined

$$\lambda_k(a_k, \sigma_k) = z_k(a_k, \sigma_k) - \frac{1}{2} E(a_k, \sigma_k) + \frac{N_0}{2} \ln P(a_k) .$$

Remark 2.1. The terms $z_k(a^{(\ell)}, \sigma^{(m)})$ can be interpreted as the outputs, at time $(k+1)T$, of a bank of correlators or matched filters, according to the block diagrams in Fig. 2.2. The number of filters (or correlators) is $SM$, $S$ being the number of states. We thus have a number of filters or correlators which is independent of the transmission length. In fact, the same $SM$ filters can be reused in every symbol interval. The values $\{z_k(a^{(\ell)}, \sigma^{(m)})\}$ represent a sufficient statistic for detection. ♦

Remark 2.2. The terms $E(a^{(\ell)}, \sigma^{(m)})$ can be interpreted as the energies of the $MS$ possible waveforms $s(t; a^{(\ell)}, \sigma^{(m)})$ of duration $T$. They can be precomputed and stored in the receiver. ♦

Remark 2.3. We have significantly reduced the front end complexity. However, we still have to find the sequence $a$ which maximizes (2.4), i.e., which maximizes the sum of $K$ terms $\lambda_k(a_k, \sigma_k)$. For each $k$, we have $MS$ possible terms, one for each pair $(a^{(\ell)}, \sigma^{(m)})$. ♦

Remark 2.4. When the system is memoryless, the dependence on $\sigma_k$ disappears. As a consequence, (2.4) becomes

$$\hat{a} = \operatorname*{argmax}_{a} \sum_{k=0}^{K-1} \lambda_k(a_k)$$

which can be implemented by selecting, symbol by symbol, the value of $a_k$ maximizing $\lambda_k(a_k)$. Thus, the strategy can be equivalently expressed as

$$\hat{a}_k = \operatorname*{argmax}_{a_k} \lambda_k(a_k) .$$

We can conclude that when the system is memoryless, the receiver can take independent decisions on each symbol based on $\lambda_k(a_k)$ which, in turn, depends only on the slice of the received signal in the interval $kT < t < (k+1)T$. In this case, the MAP sequence and symbol detection strategies coincide. ♦

Remark 2.5. When the information symbols are equally likely, the term that depends on the a-priori symbol probabilities becomes irrelevant and can be discarded. In this case, the knowledge of the noise intensity becomes irrelevant too, and the MAP strategy coincides with the maximum likelihood (ML) strategy. ♦
Although we have simplified the front end complexity of the receiver, at first sight it seems that we still have to compute (2.4) for every possible sequence $a$ and look for the sequence providing the largest value. In order to find a way to simplify the receiver, we introduce a special graph, or diagram, describing the time evolution of a finite-state machine. This diagram can be obtained by adding the temporal dimension to the state diagram, i.e., by representing the system states for each discrete-time instant $k$, and the transitions between subsequent states.

Example 2.1. In an uncoded binary transmission system, the information symbols belong to the alphabet $\{\pm 1\}$. A linear modulation is employed with a shaping pulse of length $(L+1)T$, with $L = 2$. The system state is thus defined as

$$\sigma_k = (a_{k-1}, a_{k-2})$$

[Figure 2.3: State diagram for the system of Example 2.1, with states (±1, ±1) and transitions labeled by the new symbol.]

and we have $S = 4$ states. The state definition automatically provides the state transition law, as usual when the states form a shift register sequence. The state diagram is shown in Fig. 2.3. The evolution in time of the states of this system can be represented through paths in the diagram of Fig. 2.4. As an example, starting from the state $(-1, +1)$, the sequence $a_{k-1} = +1$, $a_k = -1$, $a_{k+1} = -1$ will determine the state evolution shown with a bold line in Fig. 2.4. ♦
These diagrams are made of sections, representing the transitions between subsequent states, that are periodically repeated. They are called trellis diagrams. We will see that they are essential in MAP sequence detection.

We said that there exists a one-to-one correspondence between the sequences of information symbols and, given the initial state, the sequences of states. Hence, the main characteristic of a trellis diagram is that there exists a one-to-one correspondence between the sequences of information symbols and the paths in the trellis diagram. In addition, every pair $(a_k, \sigma_k)$ identifies a given transition, or branch, on the trellis diagram. Hence, the term $\lambda_k(a_k, \sigma_k)$ can be interpreted as a cost, or metric, of the corresponding trellis branch. Based on this observation, the MAP sequence detection problem can be equivalently formulated as the search for the path on the trellis diagram having the largest metric from time 0 to time $K-1$.

Example 2.2. Let us consider the previous example. The information sequence $\{a_k\}_{k=0}^{5}$ is in a one-to-one correspondence with the sequence of states $\{\sigma_k\}_{k=2}^{6}$.

[Figure 2.4: Trellis diagram for the system of Example 2.1, with states σk = (ak−1, ak−2) ∈ {(+1,+1), (+1,−1), (−1,+1), (−1,−1)}. Starting from the state (−1, +1), the sequence ak−1 = +1, ak = −1, ak+1 = −1 determines the state evolution shown with a bold line.]

[Figure 2.5: Path corresponding to the information sequence {+1, −1, +1, +1, −1, +1} in Example 2.2, through the states σ2, ..., σ6.]

In this example, the initial state $\sigma_2$ is simply specified by the first two information symbols. Fig. 2.5 shows an example of the path associated with the information sequence $\{+1, -1, +1, +1, -1, +1\}$. ♦

[Figure 2.6: The optimal path (bold) including the state σ̄ at time n, and another path (dashed) including the same state.]
The number of possible paths in the trellis is $M^K$, as many as the information sequences. However, not all paths have to be considered in the search for the one with the largest metric. In fact, if we proceed in a smart way, we can consider a significantly lower number of paths. It is convenient to define a partial path (sequence) metric as the sum of the metrics of all branches of that path up to the $k$th discrete-time instant:

$$\Lambda_k(a_0, a_1, \ldots, a_k) = \sum_{i=0}^{k} \lambda_i(a_i, \sigma_i) .$$

This definition allows us to express the partial metric in a recursive way, as the sum of the partial metric at the previous step plus the metric of the last branch:

$$\Lambda_k(a_0, \ldots, a_k) = \Lambda_{k-1}(a_0, \ldots, a_{k-1}) + \lambda_k(a_k, \sigma_k) .$$
Let us now assume that the optimal path includes the state σ̄ at time $n$, as shown in bold in Fig. 2.6. Since, by definition, the path shown in bold is optimal, its metric

$$\Lambda_{K-1}(a_0, \ldots, a_{K-1}) = \Lambda_n(a_0, \ldots, a_n) + \sum_{i=n+1}^{K-1} \lambda_i(a_i, \sigma_i)$$

is the largest one. This implies that also the first term $\Lambda_n(a_0, \ldots, a_n)$ is the largest among the metrics of all paths ending in the state σ̄ at time $n$. In fact,

[Figure 2.7: Survivors' extension: the survivors at time k are extended to form the candidates from which the survivor of each state at time k+1 is selected.]

if we were able to find another path, for example the one denoted by a dashed line in Fig. 2.6, with a larger partial metric, it would be sufficient to substitute it for the first part of the bold path to obtain an overall path with a larger metric, thus contradicting the hypothesis that the bold path is the optimal one.

In general, we do not know the state included in the optimal path at time $n$; we only know that if it includes a given state σ̄ at time $n$, the partial path has the largest partial metric $\Lambda_n(a_0, \ldots, a_n)$. Since this is true for all states at time $n$, it is sufficient to consider, among all $M^n$ possible paths at time $n$, one path per state only, the one with the largest metric among all paths ending in that state, as a candidate to become the optimal path. This path is called the survivor of that state. Since it is sufficient to consider, at time $k$, only $S$ survivors, one for each state, we can denote the relevant path metric by $\Lambda_k(\sigma_k)$, thus highlighting the dependence on the state only.
Let us now assume that we know the survivors and the relevant metrics for all states $\sigma_k$ at time $k$. In order to find the survivors and the relevant metrics at the next time instant (i.e., for all states $\sigma_{k+1}$), it is sufficient to consider all possible ways of extending the $S$ survivors from time $k$ to time $k+1$. The number of these candidates is equal to the number of branches in a trellis section, i.e., $SM$. The survivor of each state $\sigma_{k+1}$ can be obtained by choosing, among the candidates that terminate in that state, the one with the largest metric, as shown in Fig. 2.7. Hence, the updating rule for the survivor metrics is

$$\Lambda_{k+1}(\sigma_{k+1}) = \max_{\sigma_k} \left[ \Lambda_k(\sigma_k) + \lambda_k(a_k, \sigma_k) \right]$$

where $a_k$, $\sigma_k$, and $\sigma_{k+1}$ are constrained by the state transition law.
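As a concrete illustration of this recursion, here is a minimal Viterbi sketch. It is illustrative only: a discrete-time ISI model with shift-register states, as in Example 2.1, stands in for the continuous-time front end; the Euclidean branch metric −(rk − yk)² stands in for λk(ak, σk) under equally likely symbols; and all names, taps, and the known initial state are assumptions.

```python
import numpy as np
from itertools import product

# Toy model: r_k = sum_i h_i a_{k-i} + w_k, state sigma_k = (a_{k-1}, ..., a_{k-L}).
rng = np.random.default_rng(2)
h = np.array([1.0, 0.6, 0.3])
L = len(h) - 1
A = (-1.0, 1.0)
K = 50
a = rng.choice(A, size=K)
a_ext = np.concatenate((np.ones(L), a))        # known initial state (+1, +1)
r = np.convolve(a_ext, h)[L:L + K] + 0.5 * rng.standard_normal(K)

states = list(product(A, repeat=L))            # S = M^L shift-register states
Lam = {s: (0.0 if s == (1.0,) * L else -np.inf) for s in states}
sur = {s: [] for s in states}                  # survivor of each state

for k in range(K):
    new_Lam = {s: -np.inf for s in states}
    new_sur = {s: [] for s in states}
    for s in states:
        if Lam[s] == -np.inf:                  # state not yet reachable
            continue
        for sym in A:                          # extend the survivor of s
            y = h[0] * sym + sum(h[i] * s[i - 1] for i in range(1, L + 1))
            metric = Lam[s] - (r[k] - y) ** 2  # partial metric update
            nxt = (sym,) + s[:-1]              # state transition law (shift)
            if metric > new_Lam[nxt]:          # keep only the best candidate
                new_Lam[nxt] = metric
                new_sur[nxt] = sur[s] + [sym]
    Lam, sur = new_Lam, new_sur

a_hat = np.array(sur[max(Lam, key=Lam.get)])   # survivor of the best final state
print(np.sum(a_hat != a), "symbol errors out of", K)
```

Note that this sketch stores the full survivors, so its memory grows with K; the finite decision delay D discussed in the next subsection removes this requirement.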


We thus described an algorithm that allows to implement the strategy
(2.4) in an efficient way since, at each discrete-time instant, it is sufficient to
select the S survivors and compute the relevant metrics, by using the survivor
2.2 – Detection through the Viterbi algorithm 31

C(a(1) , σ (1) )
t = (k + 1)T
λk (a(1) , σ (1) )
(1) (1)
s(T − t; a , σ )
C(a(2) , σ (1) )
t = (k + 1)T
λk (a(2) , σ (1) ) {âk }
r(t) (2) (1)
s(T − t; a , σ )
VA

C(a(M ) , σ (S) )
t = (k + 1)T
λk (a(M ) , σ (S) )
(M ) (S)
s(T − t; a , σ )

 N0

C(a(ℓ) , σ (m) ) = − 21 E a(ℓ) , σ (m) + 2
ln P a(ℓ)

Figure 2.8: Receiver architecture.

and the relevant metrics at the previous discrete-time instant and the branch
metrics of that trellis section. At every step, we need to always perform the
same operations. Thus the complexity will be linear in the transmission
length K. This algorithm is known in the literature as the Viterbi algorithm
(VA). A. J. Viterbi, who first proposed this algorithm, did not realize that
the algorithm was optimal [5]. The optimality was demonstrated by G. D.
Forney Jr. a few years later [6].
We can thus conclude that the structure of a MAP sequence detector is based on the following elements:

• a bank of filters (or correlators) matched to the $SM$ possible waveforms $s(t; a^{(\ell)}, \sigma^{(m)})$ composing the modulated signal; this bank of filters provides the $SM$ sequences $\{z_k(a^{(\ell)}, \sigma^{(m)})\}$ representing a sufficient statistic;

• a Viterbi processor implementing the strategy (2.4) and providing, at time $KT$, the MAP sequence.

The receiver architecture is shown in Fig. 2.8.

[Figure 2.8: Receiver architecture: a bank of matched filters s(T − t; a^(ℓ), σ^(m)) sampled at t = (k+1)T, each output added to the precomputed constant C(a^(ℓ), σ^(m)) = −½E(a^(ℓ), σ^(m)) + (N₀/2) ln P(a^(ℓ)) to form the branch metrics λk, feeding the Viterbi algorithm (VA) that outputs {âk}.]

2.2.1 Implementation aspects for the Viterbi algorithm


We said that, at each step, the VA needs to store in memory the S survivors
and the relevant metrics. Since at time k the length of each survivor is of k
32 Sequence detection

k−5 k

Figure 2.9: Survivors’ merge.

branches, the memory required to store the S survivors is of kS symbols. It


thus linearly increases with k. For a transmission of K symbols we thus need
a memory of KS symbols. In addition, we have to wait till the end of the
transmission before taking a decision. However, for a practical implementa-
tion, by introducing proper approximations we can both reduce the memory
size and take earlier decisions.
By considering the way survivors are computed, we can say that every
time two of them have a common state, they are also identically coincident
in all previous states. As a consequence, if at time k we go backward over
the survivors and we find that they originate from a common state (we will
say that there is a fusion or a merge), as shown in Fig. 2.9, then we can
identify the MAP sequence till that state. Fig. 2.9 shows an example where
the survivors at discrete-time index k merge 5 steps before.
The fusion depth is a random variable. Since the survivors tend to merge, the probability that this random variable takes large values is very small. If we choose a sufficiently large value $D$ for such a depth, almost surely we will have a merge between the discrete-time instants $k - D$ and $k$. Thus, we will be able to take a decision $\hat{a}_{k-D}$ that almost surely coincides with the MAP decision. Hence, we can take a decision with a decision delay $D$ without waiting for the transmission end. In addition, we can reduce the memory to $DS$ symbols, since we only need to store the last $D$ symbols of each survivor.

The condition that has to be satisfied so that this approximation does not affect the overall performance is that the probability that a merge does not occur in the window $(k - D, k)$ is one or more orders of magnitude lower than the target error probability of the system. If this condition is satisfied, the procedure we adopt in the (rare) cases when a merge does not occur in the window $(k - D, k)$ does not affect the system error probability, which turns out to be dominated by that of an ideal system with unlimited memory to store the survivors. In these cases, any procedure is adequate: we can choose the symbol corresponding to the survivor that is temporarily the winner, or we can go back along a survivor chosen at random, or always along the same survivor.

When the states form a shift register sequence of length $L$, i.e.,

$$\sigma_k = (a_{k-1}, a_{k-2}, \ldots, a_{k-L})$$

it has been observed that, by choosing $D$ between $2L$ and $5L$, the performance of the VA with decision delay $D$ coincides with that of the ideal case ($D = \infty$) for error probability values of practical interest.

2.3 MAP sequence detection for linear modulations

In the case of coded linear modulations, the MAP sequence detection strategy can be expressed in a simpler form. This time, we will assume a passband transmission. The complex envelope of the received signal reads

$$\tilde{r}(t) = \tilde{y}(t; \bar{a}, \bar{\sigma}_0) + \tilde{w}(t)$$

where the complex envelope of the signal component is

$$\tilde{y}(t; a, \sigma_0) = \sum_{k=0}^{K-1} c_k p(t - kT) \qquad (2.5)$$

and $\tilde{w}(t)$ is the complex envelope of the thermal noise. The shaping pulse $p(t)$ will be assumed to have support in $0 < t < (L+1)T$.

Hence, the modulated signal is zero outside the interval

$$T_0 = [0, (K+L)T] .$$

We can observe that, when we considered the model (1.6), we assumed that the signal had support in $[0, KT]$. With the model (2.5), we have border effects, related to the duration of the shaping pulse, that need to be considered.

Taking into account the different observation interval and using the expression (2.3) of the MAP detection strategy in terms of complex envelopes,

[Figure 2.10: Front end filter p*(−t) and sampler at t = kT producing xk from r̃(t).]

which we report here for convenience,

$$\hat{a} = \operatorname*{argmax}_{a} \left[ \Re \int_{T_0} \tilde{r}(t)\, \tilde{y}^*(t; a, \sigma_0)\, dt - \frac{1}{2} \int_{T_0} |\tilde{y}(t; a, \sigma_0)|^2\, dt + N_0 \ln P(a) \right] \qquad (2.6)$$

we have to find simplified versions of the two integrals by using the special form of the modulated signal. As far as the first integral is concerned, we can write

$$\int_{T_0} \tilde{r}(t) \sum_{k=0}^{K-1} c_k^* p^*(t - kT)\, dt = \sum_{k=0}^{K-1} c_k^* \underbrace{\int_{T_0} \tilde{r}(t)\, p^*(t - kT)\, dt}_{x_k} = \sum_{k=0}^{K-1} x_k c_k^* \qquad (2.7)$$

where we defined³

$$x_k = \int_{T_0} \tilde{r}(t)\, p^*(t - kT)\, dt = \int_{-\infty}^{+\infty} \tilde{r}(t)\, p^*(t - kT)\, dt = \left. \tilde{r}(t) \otimes p^*(-t) \right|_{t = kT} \qquad (2.8)$$

³The symbol ⊗ denotes convolution.

and exploited the fact that $p(t - kT)$ is zero outside $T_0$ for any $k = 0, 1, \ldots, K-1$. Eqn. (2.8) shows that $x_k$ can be interpreted as the sample, at time $t = kT$, of the output of a filter matched to the pulse $p(t)$, as shown in Fig. 2.10. This interpretation assumes the use of an anticausal filter. However, from a practical point of view, we can use a causal matched filter, i.e., a filter with impulse response $p^*[(L+1)T - t]$, thus introducing a delay of $L+1$ discrete-time instants on the sequence $\{x_k\}$.
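The whole front end of Fig. 2.10 can be emulated at a working sampling rate of ns samples per signaling interval, approximating the integral in (2.8) by a sum. A minimal sketch, where the pulse, the rate, and all names are illustrative assumptions:

```python
import numpy as np

# Sketch of the matched-filter front end (2.8). Illustrative pulse with
# support (0, 2T), i.e. L = 1; real symbols keep the example readable.
rng = np.random.default_rng(3)
ns, T = 8, 1.0
dt = T / ns
p = np.concatenate((np.ones(ns), 0.5 * np.ones(ns)))   # sampled p(t)

K = 6
c = rng.choice([-1.0, 1.0], size=K)         # code symbols
deltas = np.zeros(K * ns)
deltas[::ns] = c
r = np.convolve(deltas, p)                  # signal component of r~(t)
r = r + 0.1 * rng.standard_normal(len(r))   # additive noise

mf = np.convolve(r, np.conj(p[::-1])) * dt  # r~(t) (x) p*(-t), sampled
x = mf[len(p) - 1::ns][:K]                  # x_k: samples at t = kT
print(np.round(x, 2))  # approx 1.25 c_k + 0.5 (c_{k-1} + c_{k+1}) plus noise
```

For this pulse the samples exhibit exactly the ISI structure derived below, with g0 = 1.25 and g±1 = 0.5.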
Let us now consider the second integral in (2.6). With similar considerations we have
ˆ ˆ K−1
X K−1
X
2
|ỹ(t; a, σ0 )| dt = ck p(t − kT ) c∗m p∗ (t − mT ) dt
T0 T0 k=0 m=0
K−1
X K−1
X ˆ
= ck c∗m p(t − kT )p∗ (t − mT ) dt
k=0 m=0 T0
K−1
X K−1
X ˆ +∞
= ck c∗m p(t − kT )p∗ (t − mT ) dt
k=0 m=0 −∞
K−1
X K−1
X
= ck c∗m p(t) ⊗ p∗ (−t)|t=(m−k)T
k=0 m=0
| {z }
gm−k

where we used the substitution τ = t − kT, so that t − mT = τ − (m − k)T. We thus obtain

∫_{T0} |ỹ(t; a, σ0)|² dt = Σ_{k=0}^{K−1} Σ_{m=0}^{K−1} c_k c_m* g_{m−k}   (2.9)

where

g(t) = p(t) ⊗ p*(−t) = ∫_{−∞}^{+∞} p(τ) p*(τ − t) dτ ,   g_k = g(kT) .
We can observe that the signal at the matched filter (MF) output is

x(t) = r̃(t) ⊗ p*(−t) = Σ_{k=0}^{K−1} c_k p(t − kT) ⊗ p*(−t) + w̃(t) ⊗ p*(−t)
     = Σ_{k=0}^{K−1} c_k g(t − kT) + n(t)

where n(t) = w̃(t) ⊗ p*(−t) is the filtered noise.

Thus, this pulse g(t) is not only the autocorrelation of the finite-energy
pulse p(t), but can also be interpreted as the impulse response of the overall
transmission system up to the MF output. Similarly, the discrete-time pulse
gk can be interpreted as the impulse response of the overall discrete-time
system up to the sampler. Fig. 2.11 summarizes these interpretations. We
may observe that the discrete-time signal {xk} can be directly expressed as
the discrete convolution of the sequences {ck} and {gk}, that is

x_k = Σ_{i=0}^{K−1} c_i g_{k−i} + n_k   (2.10)

Figure 2.11: Original filters and discrete-time equivalent model.

Figure 2.12: Computation of g(t) as the autocorrelation of p(t).

where nk = n(kT ) represents the sequence of additive noise samples. Usually,


these noise samples are not independent, since the noise n(t) is colored by
the matched filter.
Let us now consider pulse g(t) and its duration. We assumed that pulse
p(t) has support in the interval [0, (L + 1)T ]. Hence, g(t) will be zero outside
the interval

−(L + 1)T < t < (L + 1)T

as exemplified by Fig. 2.12. In particular, the discrete-time pulse gk will be
zero for |k| > L. By changing the summation index, we can thus express
(2.10) as

x_k = Σ_{ℓ=−L}^{L} g_ℓ c_{k−ℓ} + n_k

where we again used the convention that code symbols ck are zero when k < 0
or k ≥ K.
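The discrete-time pulse {gℓ} can likewise be obtained numerically as the
sampled autocorrelation of p(t). A sketch under the same illustrative
assumptions as above (T normalized to sps samples, p longer than L·sps
samples); the commented check verifies the Hermitian symmetry g−k = gk*
derived just below:

    import numpy as np

    def discrete_time_pulse(p, sps, L):
        # g(t) = p(t) conv p*(-t): again a correlation (numpy conjugates
        # the second argument); g_full[len(p)-1] corresponds to g(0)
        g_full = np.correlate(p, p, mode="full")
        mid = len(p) - 1
        return g_full[mid - L * sps : mid + L * sps + 1 : sps]  # g_{-L}..g_L

    # Hermitian symmetry check (see the derivation below):
    # g = discrete_time_pulse(p, sps, L)
    # assert np.allclose(g[::-1], np.conj(g))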

Let us consider again the energy term (2.9). It is clearly a real quantity.⁴
This happens since pulse g(t) has Hermitian symmetry, being an autocorrelation
function:

g(−t) = ∫_{−∞}^{+∞} p(τ) p*(τ + t) dτ
      = ( ∫_{−∞}^{+∞} p*(τ) p(τ + t) dτ )*
      = ( ∫_{−∞}^{+∞} p*(τ′ − t) p(τ′) dτ′ )*
      = g*(t) .

Similarly, the Hermitian symmetry also holds for the corresponding discrete-time
pulse⁵

g_{−k} = g_k* .

Eqn. (2.9) can be interpreted as the sum of all elements of a Hermitian
matrix. In fact, by defining

A_{km} = c_k c_m* g_{m−k}

we have

A_{mk} = c_m c_k* g_{k−m} = (c_k c_m* g_{m−k})* = A_{km}* .

The energy of all possible signals (2.9) can thus be obtained by summing the
elements on the main diagonal plus two times the real part of all elements of
the lower triangular part of the matrix

A = [ A_{00}       A_{01}       . . .   A_{0,K−1}
      A_{10}       A_{11}       . . .   A_{1,K−1}
      . . .        . . .        . . .   . . .
      A_{K−1,0}    A_{K−1,1}    . . .   A_{K−1,K−1} ] .

⁴It is also non-negative, since g(t) and {gk} are an autocorrelation function and an
autocorrelation sequence, respectively.
⁵The symmetry of the discrete-time pulse {gℓ} is related to the assumption of using the
anticausal matched filter p*(−t). In a practical implementation, we will introduce a delay
to make the filter causal. Pulse {gℓ} will thus be symmetric with respect to this delay.

We thus obtain

Σ_{k=0}^{K−1} Σ_{m=0}^{K−1} c_k c_m* g_{m−k}
   = Σ_{k=0}^{K−1} |c_k|² g_0 + 2ℜ{ Σ_{k=1}^{K−1} Σ_{m=0}^{k−1} c_k c_m* g_{m−k} }
   = Σ_{k=0}^{K−1} |c_k|² g_0 + 2ℜ{ Σ_{k=1}^{K−1} Σ_{ℓ=1}^{k} c_k c_{k−ℓ}* g_{−ℓ} }
   = Σ_{k=0}^{K−1} |c_k|² g_0 + 2ℜ{ Σ_{k=0}^{K−1} Σ_{ℓ=1}^{L} c_k* c_{k−ℓ} g_ℓ }   (2.11)

having defined ℓ = k − m, exploited the fact that gℓ = 0 for ℓ > L (so that
the inner sum can be extended up to L for every k ≥ L), conjugated the
terms of the sum, and included the term with index k = 0 since its value is
zero (c−ℓ = 0 for ℓ > 0).
By using (2.7) and (2.11), strategy (2.6) can be expressed as

â = argmax_a Σ_{k=0}^{K−1} { ℜ{ x_k c_k* − (1/2)|c_k|² g_0 − Σ_{ℓ=1}^{L} c_k* c_{k−ℓ} g_ℓ } + N0 ln P(a_k) }   (2.12)
having exploited the fact that g0 is real. In (2.12), the code sequence {ck }
depends on the information sequence a. In particular, every code symbol ck
can be expressed as a function of ak and µk according to (1.9), i.e.,
ck = ck (ak , µk )
The state of the system is defined by (1.11), that we report here for conve-
nience
σk = (ak−1 , ak−2, . . . , ak−L ; µk−L ) .
Equivalently, we can also express ck = ck (ak , µk ) = ck (ak , σk ) since every
code symbol ck can be uniquely associated with the branch (ak , σk ). By
interpreting each term of (2.12) as a branch metric

λ_k(a_k, σ_k) = ℜ{ x_k c_k* − (1/2)|c_k|² g_0 − Σ_{ℓ=1}^{L} c_k* c_{k−ℓ} g_ℓ } + N0 ln P(a_k)   (2.13)

the strategy (2.12) can be implemented through the Viterbi algorithm.
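In code, the branch metric (2.13) is a direct transcription; the sketch below
assumes the code symbols on the branch are available as a list ordered from
ck back to ck−L (function and variable names are illustrative):

    import numpy as np

    def ungerboeck_metric(x_k, c, g, N0, logP_ak):
        # c = [c_k, c_{k-1}, ..., c_{k-L}] code symbols on the branch
        # g = [g_0, g_1, ..., g_L] discrete-time pulse at the MF output
        isi = sum(np.conj(c[0]) * c[l] * g[l] for l in range(1, len(g)))
        return np.real(x_k * np.conj(c[0])
                       - 0.5 * abs(c[0]) ** 2 * g[0]
                       - isi) + N0 * logP_ak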


By comparing (2.12) with (2.4), it is straightforward to show that the
following equalities hold

Σ_{k=0}^{K−1} z_k(a_k, σ_k) = (1/2) Σ_{k=0}^{K−1} ℜ{x_k c_k*}

Σ_{k=0}^{K−1} E(a_k, σ_k) = Σ_{k=0}^{K−1} { (1/2)|c_k|² g_0 + ℜ{ Σ_{ℓ=1}^{L} c_k* c_{k−ℓ} g_ℓ } } .

Figure 2.13: Receiver structure.

However, they do not hold term-by-term.


As said, every code symbol ck can be uniquely associated with the
branch (ak , σk ). The receiver front end is now composed of a single (complex)
matched filter. The receiver structure is shown in Fig. 2.13, where

V_k(a^{(i)}, σ^{(j)}) = −(1/2) |c_k(a^{(i)}, σ^{(j)})|² g_0 + N_0 ln P(a^{(i)})
                     − ℜ{ Σ_{ℓ=1}^{L} c_k*(a^{(i)}, σ^{(j)}) c_{k−ℓ}(a^{(i)}, σ^{(j)}) g_ℓ } .

Remark 2.6. In the derivation of (2.12), we assumed that p(t) has a limited
duration. Under this hypothesis, g(t) also has a limited duration, and so
does gk. However, the problem can be solved through the Viterbi algorithm
under looser conditions. In fact, it is sufficient that only gk has limited
duration. In other words, we do not need g(t) = 0 for |t| > (L + 1)T (g(t)
can also have infinite support); it is sufficient that gk = 0 for |k| > L. Under
this hypothesis, the MAP sequence strategy can be implemented as a search
of the optimal path on a trellis diagram, thus on a diagram with a finite
number of states.
When p(t) has infinite support, g(t) also has infinite support. In this
case, neither the modulated signal nor the useful signal at the MF output
can be represented by using a finite-state machine. However, if gk = 0
for |k| > L, the discrete-time signal at the sampler output, which is a
sufficient statistic for detection, has a finite memory. We will see later
a few examples of pulses p(t) having infinite duration but such that the
corresponding sequence gk satisfies gk = 0 for |k| > L. In these cases,
the observation interval that appears in (2.6) must be infinite (T0 → ∞). ♦

2.3.1 Uncoded transmission


When an uncoded transmission is considered, code symbols must be substi-
tuted with information symbols. The system state is given by

σk = (ak−1 , ak−2 , . . . , ak−L ) .

Thus, the number of states is S = M^L. The branch metrics (2.13) become


λ_k(a_k, σ_k) = ℜ{ x_k a_k* − (1/2)|a_k|² g_0 − Σ_{ℓ=1}^{L} a_k* a_{k−ℓ} g_ℓ } + N0 ln P(a_k) .

This expression shows that the branch metric λk depends on the information
symbols {ak , ak−1 , . . . , ak−L }, which define the pair (ak , σk ). In this case, the
system memory is related to the non-zero elements of the discrete-time pulse
gk , i.e., to the presence of intersymbol interference (ISI) in the discrete-time
signal xk .

2.3.2 Absence of ISI


Absence of ISI means that the following condition is met

g_k = { 1   k = 0
      { 0   k ≠ 0 .

In this case, the discrete-time signal {xk } has expression

xk = ck + nk

and clearly depends on one code symbol only. The state of the system will
coincide with the encoder state

σk = µk .

It is thus S = Sc and the branch metrics (2.13) become

λ_k(a_k, σ_k) = ℜ{x_k c_k*} − (1/2)|c_k|² + N0 ln P(a_k) .

The receiver trellis diagram thus coincides with that of the encoder.

Remark 2.7. Branch metrics (2.13) can also be expressed as

λ_k(a_k, σ_k) = ℜ{x_k c_k*} − (1/2)|c_k|² + N0 ln P(a_k) + (1/2)|x_k|² − (1/2)|x_k|²
             = −(1/2)|x_k − c_k|² + N0 ln P(a_k) + (1/2)|x_k|²
             ∼ −|x_k − c_k|² + 2N0 ln P(a_k) .

Strategy (2.12) thus becomes

â = argmax_a { −Σ_{k=0}^{K−1} |x_k − c_k|² + 2N0 ln P(a) }
  = argmin_a { Σ_{k=0}^{K−1} |x_k − c_k|² − 2N0 ln P(a) }   (2.14)

and corresponds to searching for the sequence {ak } whose code sequence
{ck } has, apart from the term depending on the a-priori probability of the
information sequence, the minimum square Euclidean distance from the
received sequence {xk }. This interpretation is related to the observation
that, for a linear modulation without ISI, the dimension of the signal space
is equal to the number of samples of the discrete-time signal at the sampler
output (see Exercise 2.1). ♦

Remark 2.8. We assumed that the code is such that a single code symbol
ck corresponds to a single information symbol ak (in other words the possible
redundancy is introduced by expanding the constellation cardinality). If
instead we have, for example, a convolutional code of rate 1/2, and thus a
pair of code symbols (c_k^{(1)}, c_k^{(2)}) corresponds to a single information symbol,
denoting by (x_k^{(1)}, x_k^{(2)}) the corresponding received samples, the branch metric
can be expressed as

λ_k(a_k, σ_k) = −|x_k^{(1)} − c_k^{(1)}|² − |x_k^{(2)} − c_k^{(2)}|² + 2N0 ln P(a_k) .

2.3.3 Uncoded transmission and absence of ISI


In this case, the discrete-time received signal {xk } is memoryless, i.e.,

xk = ak + nk .

Thus, the state is not defined and the trellis diagram degenerates into a single
state. Branch metrics (2.13) become

λ_k(a_k) = ℜ{x_k a_k*} − (1/2)|a_k|² + N0 ln P(a_k) .   (2.15)

The MAP sequence detection strategy becomes

â = argmax_a Σ_{k=0}^{K−1} λ_k(a_k)

and coincides with the MAP symbol detection strategy

â_k = argmax_{a_k} λ_k(a_k) .
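For this memoryless case the decision reduces to a one-line comparison. A
sketch with illustrative names, where priors holds the a-priori probabilities
P(ak):

    import math

    def map_symbol_decision(x_k, alphabet, priors, N0):
        # metric (2.15): Re{x_k a*} - |a|^2 / 2 + N0 ln P(a)
        def lam(a, P):
            return (x_k * a.conjugate()).real - 0.5 * abs(a) ** 2 + N0 * math.log(P)
        return max(zip(alphabet, priors), key=lambda ap: lam(*ap))[0]

    # example: 4-PSK with equally likely symbols
    # map_symbol_decision(0.9 + 0.2j, [1, 1j, -1, -1j], [0.25] * 4, 0.1)  -> 1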

2.3.4 Considerations on the absence of ISI


As said, absence of ISI means that the following condition is met

g_k = { 1   k = 0
      { 0   k ≠ 0

i.e., the samples of g(t) for t = kT are all zero except for k = 0. In the
frequency domain, this condition can be expressed as

Σ_{m=−∞}^{+∞} G(f − m/T) = constant   (2.16)

where G(f) is the Fourier transform of g(t). A class of functions G(f) that
satisfy this condition is that of the so-called raised cosine (RC) functions

G(f) = { T                                                   |f| < (1 − α)/(2T)
       { (T/2) { 1 + cos[ (πT/α) (|f| − (1 − α)/(2T)) ] }    (1 − α)/(2T) < |f| < (1 + α)/(2T)   (2.17)
       { 0                                                   otherwise
where parameter 0 ≤ α ≤ 1 is the excess bandwidth or roll-off factor. In
general, functions satisfying condition (2.16) are said to have a vestigial sym-
metry around f = 1/2T .
If we require the Nyquist condition to be met and take into account
that the optimal front end filter has to be matched to the shaping pulse p(t)
of the linearly modulated signal, we can choose p(t) such that the following
integral equation is satisfied

p(t) ⊗ p*(−t) = g(t)

with g(t) satisfying the Nyquist condition. This equation can be easily solved
by working in the frequency domain, i.e.,

P(f) P*(f) = G(f)  ⟹  |P(f)|² = G(f)

from which

|P(f)| = √G(f) .

Hence, it is sufficient to choose p(t) such that its amplitude spectrum is the
square root of a function having vestigial symmetry. The phase spectrum of
p(t) can be arbitrary since it will be perfectly compensated by the matched
filter, which has the opposite phase response. Hence, P(f) can have a root
raised cosine (RRC) spectrum.
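As a numerical illustration, an RRC pulse can be synthesized directly in the
frequency domain by taking the square root of the raised cosine (2.17) with
zero phase. The sketch below uses illustrative parameters (α > 0 and T
normalized to 1) and can be used to check that g(t) = p(t) ⊗ p*(−t)
approximately satisfies the Nyquist condition:

    import numpy as np

    def rrc_pulse(alpha, sps, nsym):
        # sample G(f) of (2.17) with T = 1 on an FFT grid and take the
        # square root; zero phase is allowed since the phase is arbitrary
        N = sps * nsym
        f = np.fft.fftfreq(N, d=1.0 / sps)          # frequencies in 1/T
        G = np.zeros(N)
        flat = np.abs(f) <= (1 - alpha) / 2
        roll = (~flat) & (np.abs(f) <= (1 + alpha) / 2)
        G[flat] = 1.0
        G[roll] = 0.5 * (1 + np.cos(np.pi / alpha
                                    * (np.abs(f[roll]) - (1 - alpha) / 2)))
        p = np.fft.fftshift(np.fft.ifft(np.sqrt(G)).real)
        return p / np.sqrt(np.sum(p ** 2))           # unit energy

    # p = rrc_pulse(0.3, 8, 32)
    # g = np.correlate(p, p, "full")[len(p) - 1 :: 8]
    # g[:4] is approximately [1, 0, 0, 0] (no ISI, up to truncation effects)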

2.4 Whitened matched filter front end


We said that the noise samples at the MF output are not independent. This
is because the MF usually colors the white noise at its input. The autocor-
relation function and the power spectral density (PSD) of the noise n(t) at
the MF output are

Rn (τ ) = 2N0 g(τ )
Sn (f ) = 2N0 |P (f )|2 . (2.18)

In fact, the autocorrelation function of a process y(t) at the output of a


filter with impulse response p(t) and input process x(t) with autocorrelation
function Rx (τ ) has expression

Ry (τ ) = Rx (τ ) ⊗ p(τ ) ⊗ p∗ (−τ ) .

Notice that, since w̃(t) is the complex envelope of a white noise process w(t)
with PSD N0 /2, its PSD is 2N0 .
If we consider the discrete-time noise process nk = n(kT ), its autocorrelation
sequence is

R_n(m) = E{n_{k+m} n_k*} = E{n[(k + m)T] n*(kT)} = R_n(mT) = 2N0 g_m .

We thus conclude that the discrete-time impulse response gk is also
proportional to the autocorrelation of the noise samples. In the particular
case of absence of ISI, the noise samples are uncorrelated and thus, being
Gaussian, independent.

An alternative approach to MAP sequence detection, which simplifies the
branch metrics in the presence of ISI at the price of a more complex receiver
front end, is based on the use of a discrete-time filter that whitens the
correlated noise samples. This whitening filter can be obtained through
a spectral factorization procedure that will be described in the following.
The PSD of a discrete-time random process is defined as the Fourier
transform of its autocorrelation sequence. The PSD of {nk } is thus

S_n(f) = 2N0 Σ_{ℓ=−L}^{L} g_ℓ e^{−j2πℓfT} .   (2.19)

Although we are using the same symbol, this function represents the PSD
of {nk } and not that of the continuous-time n(t) noise in (2.18). As known,
these two functions are related through the equation

S_n^{(d)}(f) = (1/T) Σ_{m=−∞}^{+∞} S_n^{(c)}(f − m/T)

where superscripts (d) and (c) denote the fact that we are referring to a
discrete- or a continuous-time process, respectively.
Let us now consider the bilateral Z-transform of the sequence {gℓ}, ℓ = −L, . . . , L:

G(z) = Σ_{ℓ=−L}^{L} g_ℓ z^{−ℓ} = z^{−L} Σ_{ℓ=−L}^{L} g_ℓ z^{L−ℓ} = z^{−L} Σ_{m=0}^{2L} g_{L−m} z^m   (2.20)

where we defined m = L − ℓ. This function has L poles for z = 0 and 2L
zeros in the complex plane. Clearly,

S_n(f) = 2N0 G(e^{j2πfT}) = 2N0 G(z)|_{z=e^{j2πfT}} .   (2.21)

The PSD is thus obtained by computing the function 2N0 G(z) on the unit
circle.
The Hermitian symmetry of the pulse {gℓ} implies a particular symmetry
of the function G(z). In fact, the Z-transform of g−ℓ is

Σ_{ℓ=−L}^{L} g_{−ℓ} z^{−ℓ} = Σ_{ℓ=−L}^{L} g_ℓ* z^{ℓ} = Σ_{ℓ=−L}^{L} g_ℓ* (1/z)^{−ℓ}
                         = ( Σ_{ℓ=−L}^{L} g_ℓ (1/z*)^{−ℓ} )* = G*(1/z*)

Figure 2.14: Zeros position on the complex plane.


and since g−ℓ = gℓ*, we have

G(z) = G*(1/z*) .

Thus, if ρ is a zero of G(z), we have

G(ρ) = G*(1/ρ*) = 0

and, hence, 1/ρ* is also a zero. This pair of zeros is located in the complex
plane as shown in Fig. 2.14. In particular, if ρ is inside the unit circle, 1/ρ*
is necessarily outside. Instead, if ρ is on the unit circle, it is a double zero.
Let us denote by z1 , . . . , zL all the zeros of G(z) inside the unit circle
(|zi| ≤ 1). The remaining L zeros are 1/z1*, . . . , 1/zL*, clearly located outside
the unit circle (|1/zi*| ≥ 1). Function G(z) can be factored in the following
way

G(z) = z^{−L} g_{−L} Π_{i=1}^{L} (z − z_i) Π_{i=1}^{L} (z − 1/z_i*)
     = (−1)^L g_{−L} Π_{i=1}^{L} (1/z_i*) · Π_{i=1}^{L} (1 − z_i z^{−1}) Π_{i=1}^{L} (1 − z_i* z)
     = α Π_{i=1}^{L} (1 − z_i z^{−1}) Π_{i=1}^{L} (1 − z_i* z)   (2.22)

having defined α = (−1)^L g_{−L} Π_{i=1}^{L} (1/z_i*).

It is possible to show that the constant α is real and positive. In fact, let us
consider the coefficients of the power z^{−L} in both (2.20) and (2.22). Since
they need to be equal, it is

g_L = g_{−L} Π_{i=1}^{L} (z_i / z_i*)

that means

g_{−L} Π_{i=1}^{L} (1/z_i*) = g_L Π_{i=1}^{L} (1/z_i) .

Remembering that g_L = g_{−L}*, we infer that both sides of this equation, and
thus the constant α, are real. In addition, it is

G(1) = α Π_{i=1}^{L} |1 − z_i|² > 0

since G(1) is proportional (through the positive constant 2N0) to the noise
PSD at frequency f = 0, which must be positive.
Now, defining

F(z) = √α Π_{i=1}^{L} (1 − z_i z^{−1}) = √α z^{−L} Π_{i=1}^{L} (z − z_i)

we have

F*(1/z*) = ( √α Π_{i=1}^{L} (1 − z_i z*) )* = √α Π_{i=1}^{L} (1 − z_i* z) .   (2.23)

We can thus factor G(z) as

G(z) = F(z) F*(1/z*)   (2.24)

where F(z) has L poles for z = 0 and L zeros inside or, at most, on the unit
circle, whereas F*(1/z*) has L zeros outside or, at most, on the unit circle.
By using (2.21) and (2.24), the noise PSD is

S_n(f) = 2N0 G(e^{j2πfT}) = 2N0 F(e^{j2πfT}) F*(e^{j2πfT}) = 2N0 |F(e^{j2πfT})|² .

Let us now consider the two discrete-time filters with transfer functions
1/F(z) and 1/F*(1/z*), respectively. If these filters have at their input the
noise sequence {nk}, the PSD of the noise at the output is, in both cases,

S_{w′}(f) = S_n(f) / |F(e^{j2πfT})|² = 2N0 ,   S_{w″}(f) = S_n(f) / |F*(e^{j2πfT})|² = 2N0 .

Figure 2.15: Possible whitening filters (WFs).

Hence, the noise at the output of any of these two filters is white. In Fig. 2.15,
we report the two filters and the related white noise processes {wk′ } and {wk′′ }.

The following considerations hold.

i. Poles of 1/F ∗ (1/z ∗ ) are the zeros of F ∗ (1/z ∗ ) and they are all outside
the unit circle. Thus, the region of convergence of 1/F ∗ (1/z ∗ ) contains
at least the unit circle. Hence, there exists a left-sided sequence, i.e.,
anticausal, that has 1/F ∗ (1/z ∗ ) as Z-transform (see Appendix E). It
will be the impulse response of this first filter. Since it is anticausal,
it can be implemented in an approximate way only, by truncating the
impulse response and introducing a proper delay to make it causal.

ii. Poles of 1/F (z) are the zeros of F (z) and they are all inside the unit
circle. The region of convergence is, at least, the region outside the
unit circle. Hence, there exists a right-sided sequence, i.e., causal, that
has this function as Z-transform.

Both filters can be used to whiten the noise, as shown in Fig. 2.15. We will
choose the first one, i.e., the anticausal WF, which can be implemented in
an approximate way.
Let us now consider the discrete-time equivalent model of the overall sys-
tem up to the sampler and put the WF at its output, as shown in Fig. 2.16(a).
By exploiting the factorization G(z) = F (z)F ∗ (1/z ∗ ), we can observe that the
system up to the output of the WF is equivalent to that shown in Fig. 2.16(b),
where {wk } is a white noise sequence whose samples have variance 2N0 . Since
F(z) = √α Π_{i=1}^{L} (1 − z_i z^{−1}) = Σ_{ℓ=0}^{L} f_ℓ z^{−ℓ}

the sequence {fℓ}, ℓ = 0, . . . , L, can be interpreted as the impulse response
of the discrete-time equivalent model with white noise of the overall system
represented in
Figure 2.16: (a) Whitening filter at the output of the discrete-time equivalent
model of the overall system. (b) Discrete-time equivalent model with white
noise of the overall system.

Fig. 2.16(b). The WF output can be expressed in the time domain as

y_k = Σ_{ℓ=0}^{L} f_ℓ c_{k−ℓ} + w_k .   (2.25)
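Numerically, the factorization can be carried out by rooting G(z), keeping
the zeros inside the unit circle, and rescaling. A sketch (assuming no double
zeros exactly on the unit circle, which would require the special handling
mentioned at the end of this section):

    import numpy as np

    def spectral_factor(g):
        # g = [g_0, g_1, ..., g_L]; g_{-l} = g_l* is implied.
        # Coefficients of z^L G(z), highest power of z first:
        poly = np.concatenate([np.conj(g[:0:-1]), g])
        zeros = np.roots(poly)
        inside = zeros[np.abs(zeros) <= 1.0]     # the zeros z_1, ..., z_L
        f = np.poly(inside)                      # taps of prod_i (1 - z_i z^{-1})
        # fix the gain: G(z) = F(z) F*(1/z*) implies sum_l |f_l|^2 = g_0
        return f * np.sqrt(np.real(g[0]) / np.sum(np.abs(f) ** 2))

    # check: np.convolve(f, np.conj(f)[::-1]) is approximately
    # [g_{-L}, ..., g_0, ..., g_L]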

According to the reversibility theorem [7], the WF output is a sufficient
statistic for MAP sequence detection since the WF is invertible. In fact, we can
come back to {xk } by simply filtering {yk } with filter F ∗ (1/z ∗ ).
We are now able to explain why we did not use 1/F (z) as WF. In this
case, the discrete-time equivalent model at the WF output would be that in
Fig. 2.17 with equivalent impulse response of the overall system that can be
obtained through the inverse transform of

F*(1/z*) = √α Π_{i=1}^{L} (1 − z_i* z) = Σ_{ℓ=0}^{L} f_ℓ* z^{ℓ} = Σ_{ℓ=−L}^{0} f_{−ℓ}* z^{−ℓ} .

This impulse response {f_{−ℓ}*}, ℓ = −L, . . . , 0, is anticausal. The solution described

by (2.25) has to be preferred since its equivalent impulse response has min-
imum phase, i.e., all its zeros are inside the unit circle. On the contrary,
the second solution has maximum phase, i.e., all its zeros are outside the
unit circle. The two solutions have the same amplitude response and differ in the phase

Figure 2.17: Discrete-time equivalent model at the WF output when filter
1/F(z) is used as WF.

Figure 2.18: Input stage of the receiver.

response only. It is possible to demonstrate that, for a given amplitude
response, minimum-phase filters are those whose impulse response has its
energy concentrated in its initial part. We will see that this property is
very important when techniques for complexity reduction are adopted.
In summary, the input stage of the receiver is that shown in Fig. 2.18
and the discrete-time equivalent model with white noise of the overall system
is represented by Fig. 2.16(b). The concatenation of the matched filter,
sampler, and WF is called whitened matched filter (WMF). Parameter L
denotes the number of code symbols that interfere with the present symbol ck
in this discrete-time equivalent model with white noise. It is called dispersion
length of the (equivalent) channel.
We said that when G(z) has zeros on the unit circle, they have double
multiplicity. One of these double zeros will thus become a pole on the unit
circle of the whitening filter, which will therefore be unstable. This case can
be handled with a more complex procedure that we now briefly mention. A
zero of G(z) on the unit circle represents a zero in the PSD (2.19) of the noise
{nk}. This zero appears because of a zero in the amplitude response |P(f)|
of the matched filter. By changing this MF we can remove this zero and thus
the corresponding pole of the whitening filter, thereby solving the mentioned
stability problems. In this case, the WMF can no longer be interpreted as the
cascade of the elements in Fig. 2.18, and this justifies the name introduced
on purpose for this filter.
Based on the discrete-time equivalent model with white noise shown in
Fig. 2.16(b) and described by (2.25), the sufficient statistic {yk } is a signal

with memory whose state is

σk = (ak−1 , ak−2 , . . . , ak−L , µk−L )

that coincides with that required for sequence detection based on signal {xk }.
The MAP sequence detection strategy can be obtained by simply observ-
ing that (2.25) corresponds to a vector channel with additive Gaussian noise
having independent and identically distributed samples. This vector channel
can be expressed as

y = s(a) + w   (2.26)

where the kth element of s(a) is

s_k(a) = Σ_{ℓ=0}^{L} f_ℓ c_{k−ℓ} .   (2.27)

Remembering that the code symbols are {ck}, k = 0, . . . , K − 1, the elements
of y relevant for detection are {yk}, k = 0, . . . , K − 1 + L. Using (1.1), the
detection strategy becomes

â = argmax_a P(a|y) = argmax_a f(y|a) P(a)
  = argmax_a { ln f(y|a) + ln P(a) }
  = argmin_a { ‖y − s(a)‖² − 2N0 ln P(a) }
  = argmin_a Σ_{k=0}^{K−1+L} { |y_k − s_k(a)|² − 2N0 ln P(a_k) }
  = argmin_a Σ_{k=0}^{K−1+L} { |y_k − Σ_{ℓ=0}^{L} f_ℓ c_{k−ℓ}|² − 2N0 ln P(a_k) } .

It can be implemented through the VA with branch metrics

λ_k(a_k, σ_k) = |y_k − Σ_{ℓ=0}^{L} f_ℓ c_{k−ℓ}|² − 2N0 ln P(a_k) .   (2.28)
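A compact Viterbi implementation based on (2.28) is sketched below in pure
Python, for an uncoded transmission (ck = ak) with equally likely symbols.
For brevity, the L symbols preceding the transmission are assumed known (a
training prefix) and termination is ignored; all names are illustrative:

    def viterbi_forney(y, f, alphabet, a_init):
        # y: WMF output samples, f: [f_0, ..., f_L], a_init: known
        # tuple (a_{-1}, ..., a_{-L}) defining the starting state
        L = len(f) - 1
        metric = {tuple(a_init): 0.0}     # state = (a_{k-1}, ..., a_{k-L})
        paths = {tuple(a_init): []}
        for yk in y:
            new_metric, new_paths = {}, {}
            for s, m in metric.items():
                for a in alphabet:
                    branch = (a,) + s     # (a_k, a_{k-1}, ..., a_{k-L})
                    # branch metric (2.28) without the (constant) prior
                    lam = abs(yk - sum(fl * al for fl, al in zip(f, branch))) ** 2
                    ns = branch[:L]       # next state
                    if ns not in new_metric or m + lam < new_metric[ns]:
                        new_metric[ns] = m + lam
                        new_paths[ns] = paths[s] + [a]
            metric, paths = new_metric, new_paths
        best = min(metric, key=metric.get)
        return paths[best]                # detected information sequence

    # example: binary symbols over a channel with taps f = [0.8, 0.6]
    # viterbi_forney(y, [0.8, 0.6], [+1, -1], (+1,))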

We can obtain an alternative formulation of this strategy, for example by
removing the term ‖y‖², which is irrelevant. In this way, we obtain

â = argmax_a { ℜ{y^T s*(a)} − (1/2)‖s(a)‖² + N0 ln P(a) }
  = argmax_a Σ_{k=0}^{K−1+L} { ℜ{ y_k Σ_{ℓ=0}^{L} f_ℓ* c_{k−ℓ}* } − (1/2)|Σ_{ℓ=0}^{L} f_ℓ c_{k−ℓ}|² + N0 ln P(a_k) }

whose branch metrics are⁶

λ_k(a_k, σ_k) = ℜ{ y_k Σ_{ℓ=0}^{L} f_ℓ* c_{k−ℓ}* } − (1/2)|Σ_{ℓ=0}^{L} f_ℓ c_{k−ℓ}|² + N0 ln P(a_k) .   (2.29)

This second approach, which makes use of a WMF, was proposed in
1972 by G. D. Forney [8]. Two years later, in 1974, G. Ungerboeck published
in [9] the approach we discussed in Section 2.3. These two solutions are
both optimal. In addition, they work on the same trellis diagram, thus
they have the same complexity. The Ungerboeck approach makes use of a
simpler front end but employs the branch metrics (2.13), which are slightly
more complex. The Forney approach employs a more complex front end
filter but uses the simpler branch metrics (2.29). As a conclusion, these two
solutions are perfectly equivalent, at least as long as we do not consider some
aspects related to complexity reduction that will be discussed in Chapter 6.⁷
In the case of absence of ISI, both approaches perfectly coincide. This can
be verified by considering that, in the absence of ISI, gℓ = 0 for ℓ ≠ 0 in
(2.13) and fℓ = 0 for ℓ ≠ 0 in (2.29).
The MAP sequence detection strategy is often applied to the case of
equally likely information symbols. In this case, it coincides with the maxi-
mum likelihood strategy. For this reason, it is often referred to as maximum
likelihood sequence detection (MLSD) and the metric is often called likelihood
function.

2.5 Performance of MAP sequence detectors


We said that the MAP sequence detection strategy minimizes the sequence
error probability (instead of the symbol error probability). We also said
that the adoption of this criterion is often preferred for its simplicity more
than for theoretical reasons. Nevertheless, we may still have interest in the
computation of the symbol or the bit error probability for the MAP sequence
detection strategy.
As discussed, each sequence of information symbols is in a one-to-one cor-
respondence with a path on the trellis. Hence, a detection error is equivalent
to the choice of a wrong path on the trellis, such as that shown in Fig. 2.19.
Under typical operating conditions, characterized by low error probabilities,
the detected path will differ from the correct path only in some occasional
departures: the two paths locally diverge and then converge again on the
correct path. These departures are called error events and represent the
basis for the performance analysis of these receivers.

⁶Remember that the VA can be used to either maximize or minimize a metric.
⁷The effect of complexity reduction on these two strategies is still a subject of
investigation.

Figure 2.19: Error events.

Figure 2.20: Error event of length H.
We say that at discrete time k an error event of length H begins if the
correct and detected paths differ in the states σk+1 , . . . , σk+H , as shown in
Fig. 2.20. The duration H of the error event thus represents the number of
wrong states in the detected path.

Example 2.3. Let us consider a coded linear modulation affected by ISI.


The system state is defined as

σk = (ak−1 , . . . , ak−L , µk−L )

where µk is the encoder state at time k and L the channel dispersion length.
Let us suppose that at time k an error event of length H begins. The state
definition assures that symbols ak−L , . . . , ak−1 are correctly detected. The
first wrong state is σk+1 , defined as

σk+1 = (ak , ak−1, . . . , ak−L+1 , µk−L+1) .



In order to have an error event beginning at time k, symbol ak must be


wrong. Since the error event has duration H, the next correct state will be
σk+H+1 , defined as
σk+H+1 = (ak+H , ak+H−1 , . . . , ak+H−L+1, µk+H−L+1)
and thus all information symbols ak+H , . . . ak+H−L+1 will be correct. The last
wrong state is
σk+H = (ak+H−1 , . . . , ak+H−L+1 , ak+H−L , µk+H−L ) .
Since σk+H+1 is correct, all information symbols ai with k + H − L + 1 ≤
i ≤ k + H − 1 will be correct. Thus, since σk+H is wrong either ak+H−L or
µk+H−L must be wrong. We can conclude that:
• in order to have an error event starting at time k, symbol ak must be
wrong;
• the further symbols that could be wrong are
ak+1 , . . . , ak+H−L ;

• this means that we can have an error event only if H ≥ L. When
the system is uncoded, we can have H = L. Otherwise, it will be
H > L. In other words, an error needs at least L time instants to
propagate out of the system state in the case of an uncoded transmission,
or more than L in the case of a coded transmission. ♦

Example 2.4. With reference to the previous example, let us consider an


uncoded system with L = 2. The system state will be
σk = (ak−1 , ak−2 ) .
Let us also suppose that information symbols are binary, i.e., ak ∈ {±1}.
Fig. 2.21 shows some examples of error events of duration H = 2, 3, 4 begin-
ning at time k and deviating from the correct path that corresponds to the
sequence of all symbols equal to 1. Notice that in the case “H = 4 second”
we have ak wrong, ak+1 correct, and ak+2 wrong. We can have similar error
events departing from any correct path. ♦
There are different equivalent ways to represent the error events. In the
following, in order to simplify the notation, we will not use the overline to
denote the transmitted sequence, since there is no longer any need to distinguish
between the transmitted and the generic sequences. Three possible
alternatives are:

Figure 2.21: Error events in Example 2.4.

• the pair given by the correct and detected sequences of states

(σ, σ̂)

that are constrained by the state transition function;

• the pair given by the correct and the detected information sequences

(a, â)

both arbitrary;

• the pair given by the correct information sequence and the error sequence

(a, e)

where the error sequence is defined as e = â − a, i.e., ek = âk − ak .

Unless otherwise specified, we will consider this latter notation. Notice that,
while in the second notation all sequences a and â are possible, in the third
one, given the error sequence e, not all information sequences a are possible.
Based on the previous examples (2.3 and 2.4), in the case of linear
modulation an error event beginning at time k and of duration H is
characterized by the error sequence

. . . , 0, ek , . . . , ek+H−L , 0, . . .

in which certainly ek ≠ 0. In the uncoded case, the information symbol
ak+H−L is necessarily detected in a wrong way (and thus ek+H−L ≠ 0) because
otherwise the error event would converge prematurely. We cannot draw the
same conclusion in the coded case because the two paths can differ in the
encoder state µk+H−L .
When we have an error event, we will have one, or more, symbol (or bit)
errors. We will now link the symbol and bit error probabilities to the
probability that an error event happens. Let us denote an error event
starting at time i by

ε_i = (a, e)

where

e = (. . . , 0, e_i , . . . , e_{i+H−L} , 0, . . . ) .

In addition, we denote by ε_i^{(j)} a specific error event ε_i in which the superscript
j specifies the pair of sequences (a, e), hence the “shape” of the error event.
As an example, ε_i^{(j)} and ε_k^{(j)} have the same shape, but a different beginning
time.
By using the law of total probability, we have

P_s = P(â_k ≠ a_k) = Σ_j Σ_i P(â_k ≠ a_k | ε_i^{(j)}) P(ε_i^{(j)}) .

The first summation is extended to all possible shapes, the second to all
beginnings of the error events producing an error at time k. Let us define
the indicator function

q_m[ε_i^{(j)}] = 1 if ε_i^{(j)} produces an error at time i + m, i.e., m instants
              after its beginning, and q_m[ε_i^{(j)}] = 0 otherwise.

As an example, for a linear modulation, it is q_0[ε_k^{(j)}] = 1, since an error event
that begins at time k is characterized by e_k ≠ 0; in addition, it is q_{−1}[ε_k^{(j)}] = 0

since at time k − 1 the error event has not started yet; q_1[ε_k^{(j)}] can be either
1 or 0 depending on whether e_{k+1} ≠ 0 or e_{k+1} = 0. Thus, it is

P(â_k ≠ a_k | ε_i^{(j)}) = q_{k−i}[ε_i^{(j)}]

and hence

P(â_k ≠ a_k) = Σ_i Σ_j q_{k−i}[ε_i^{(j)}] P(ε_i^{(j)}) .

The probability of the error event ε_i^{(j)} rigorously depends on the beginning
instant i, in addition to the shape j. However, if the transmission is
sufficiently long, we may reasonably imagine that it becomes independent of
i, except for an initial transient. Under this steady-state condition, we have

P(â_k ≠ a_k) = Σ_j P(ε^{(j)}) Σ_i q_{k−i}[ε_i^{(j)}] .

The second summation represents the number of wrong symbols corresponding
to the error event with shape j. By defining

w(ε^{(j)}) = Σ_i q_{k−i}[ε_i^{(j)}] = Σ_{i=−∞}^{k} q_{k−i}[ε_i^{(j)}] = Σ_{m=0}^{∞} q_m[ε_{k−m}^{(j)}]

which is independent of k, we have

P(â_k ≠ a_k) = Σ_j w(ε^{(j)}) P(ε^{(j)})   (2.30)

where the sum has to be extended over all error events beginning at any
given time k. The error probability is thus equal to the average number of
wrong symbols over all error events beginning at a given instant.
With alternative notation, the error probability (2.30) can be expressed
in the form

P(â_k ≠ a_k) = Σ_e Σ_{a∈A(e)} w(e) P(a, e)
             = Σ_e w(e) Σ_{a∈A(e)} P(e | a) P(a)   (2.31)

where

• the sum is extended to all error sequences beginning at time k;


• A(e) is the set of information sequences a compatible with the error
sequence e, i.e., such that âk = ak + ek belongs to the alphabet of the
information symbols; as an example, for the binary alphabet {±1}, if
ak = 1 then ek ∈ {0, −2}, whereas if ek = −2 then ak = 1;
• P (e | a) is the probability that, if the sequence a is transmitted, the
error sequence e occurs; it is thus the probability of the error event
(e, a) given that sequence a is transmitted;
• P (a) is the a-priori probability of sequence a.

2.5.1 Upper bound on the error probability


Denoting by Λ(a) the sequence metric that has to be maximized, the error
probability P(e | a) can be upper bounded with the pairwise error probability
(PEP), i.e.,

P(e | a) ≤ P[Λ(a + e) > Λ(a) | a] .

This relation allows us to obtain an upper bound on the symbol error
probability based on the PEPs. In fact,

P_s ≤ Σ_e w(e) Σ_{a∈A(e)} P[Λ(a + e) > Λ(a) | a] P(a) .

In the particular case where P[Λ(a + e) > Λ(a) | a] does not depend on a but
only on e, we will say that the uniform error property (UEP) holds. In this
case, we obtain

P_s ≤ Σ_e w(e) P[Λ(a + e) > Λ(a) | a] P[a ∈ A(e)]   (2.32)

where {a ∈ A(e)} represents the event that, given the sequence e, a generic
information sequence a is compatible with it.
In a similar way, we may obtain an upper bound on the bit error probability.
By denoting with b(e, a) the number of bit errors corresponding to
the error event (e, a), similarly to (2.31) we have

P_b = (1/log₂ M) Σ_e Σ_{a∈A(e)} b(e, a) P(e | a) P(a) .

We can observe that, whereas in (2.31) it is obvious that w(e) is independent
of a by definition, a similar property does not hold for b(e, a). The factor
1/log₂ M can be explained by observing that Ps can be interpreted as
the ratio between the average number of symbol errors and the number of
transmitted symbols. By substituting w(e) with b(e, a) we obtain the ratio
between the average number of bit errors and the number of transmitted
symbols. Thus, it has to be divided by the number of bits per symbol.
If the UEP holds, we have

P_b ≤ (1/log₂ M) Σ_e P[Λ(a + e) > Λ(a) | a] Σ_{a∈A(e)} b(e, a) P(a) .

If, in addition, it is b(e, a) = b(e), i.e., b(e, a) is independent of a, as when
proper symmetry conditions hold (e.g., in the case of a QPSK transmission
with Gray mapping), we obtain

P_b ≤ (1/log₂ M) Σ_e b(e) P[Λ(a + e) > Λ(a) | a] P(a ∈ A(e)) .   (2.33)

The derived upper bounds on Ps and Pb are based on the computation of the
PEPs, that are usually easy to compute since they are related to a binary
signaling system. In the next section, we will discuss the computation of the
PEP in the specific case of an AWGN channel.

2.5.2 Additive white Gaussian noise


We will evaluate the PEPs in the case of an AWGN channel. Since the PEPs
are related to a binary signaling problem, they depend on the Euclidean
distance through the function Q(·). In fact, it is (see Appendix B)

P[Λ(a + e) > Λ(a) | a] = Q( d(e, a) / √(2N₀) )   (2.34)

where d(e, a) denotes the Euclidean distance between s(t, a) and s(t, a + e).
Under the assumption that the UEP holds, we can consider the subset
E_min of all possible error sequences defined as

E_min = {e : d(e) = d_min} .

Thus, (2.32) becomes

P_s ≤ Σ_{e∈E_min} w(e) Q( d_min / √(2N₀) ) P[a ∈ A(e)]
      + Σ_{e∉E_min} w(e) Q( d(e) / √(2N₀) ) P[a ∈ A(e)] .   (2.35)

For high values of the signal-to-noise ratio (SNR), the first term will take
into account the dominant errors whereas the second term can be neglected,
thus obtaining

P_s ≲ [ Σ_{e∈E_min} w(e) P[a ∈ A(e)] ] Q( d_min / √(2N₀) )   (2.36)

where the symbol ≲ means that the upper bound is approximate and the
term within square brackets is the multiplicity of the minimum distance
error events; by denoting this term with K_s, for high SNR values we obtain

P_s ≲ K_s Q( d_min / √(2N₀) ) .
Similarly, with reference to (2.33) we have

P_b ≤ (1/log₂ M) Σ_{e∈E_min} b(e) Q( d_min / √(2N₀) ) P[a ∈ A(e)] + other terms   (2.37)

where the “other terms” are related to error events with a larger distance, thus
infinitesimals of higher order with respect to the signal-to-noise ratio (SNR).
Asymptotically, we have

P_b ≲ K_b Q( d_min / √(2N₀) )   (2.38)

where

K_b = (1/log₂ M) Σ_{e∈E_min} b(e) P[a ∈ A(e)]

represents the multiplicity coefficient of the dominant errors in the bit error
probability.
Figure 2.22: Multiplicity of the error events and distance spectrum.

We can thus conclude that, for high SNR values, bit and symbol error
probabilities solely depend on the multiplicity of the dominant errors, i.e.,
those having minimum distance, and on the value that the Q function takes
at d_min/√(2N₀). In general, in order to compute the upper bound we could
define a multiplicity distribution or distance spectrum, i.e., a diagram rep-
resenting both the values of all distances related to the error events and
their corresponding multiplicities, as shown in Fig. 2.22 where we assumed
d1 ≤ d2 ≤ . . . . In particular, the upper bound (2.35) can be expressed as

P_s ≤ K_s Q( d_min / √(2N₀) ) + Σ_{d_i} K_i Q( d_i / √(2N₀) )

where {d_i} represents the set of all possible distances between pairs of signals
(excluding d_min) and K_i represents the multiplicity of the error events having
distance d_i, i.e.,

K_i = Σ_{e∈E_i} w(e) P[a ∈ A(e)]

having denoted with Ei the set of all possible error sequences having distance
di . At high SNR values the dominant term is that related to the minimum
distance whereas for lower SNR values we could need to take into account
some terms with distance larger than dmin . Similar considerations hold for
the bit error probability.
It is thus clear that the evaluation of the distance d(a, e) (d(e) in case of
UEP) is crucial. In the case of the AWGN channel, it is
d²(e, a) = ‖s(t, a) − s(t, a + e)‖² = ∫_{−∞}^{+∞} |s(t, a) − s(t, a + e)|² dt .

2.5.3 Linear modulations


Let us consider a passband linear modulation and let us express the
transmitted signal as

s(t) = √2 ℜ{ Σ_k c_k p(t − kT) e^{jω₀t} }

where p(t) has unit energy.⁸

⁸The coefficient √2 and the case of a pulse p(t) having energy E_p ≠ 1 can be taken
into account by properly normalizing the constellation of symbols c_k.

The square distance d²(e, a) can be expressed as

d²(e, a) = ‖s(t, â) − s(t, a)‖²
         = (1/2) ‖s̃(t, â) − s̃(t, a)‖²
         = ∫_{−∞}^{+∞} | Σ_k ĉ_k p(t − kT) − Σ_k c_k p(t − kT) |² dt
         = Σ_i Σ_j (c_i − ĉ_i)(c_j − ĉ_j)* g_{j−i}

where clearly

g_{j−i} = ∫_{−∞}^{+∞} p(t − iT) p*(t − jT) dt .

In the case of an uncoded transmission, it is ck = ak and, therefore,

d²(e, a) = Σ_i Σ_j e_i e_j* g_{j−i} = d²(e)

and we can thus conclude that the UEP holds. If we now assume that at
time k an error event of duration H begins, we can write

d²(e, a) = Σ_{i=k}^{k+H−L} Σ_{j=k}^{k+H−L} (c_i − ĉ_i)(c_j − ĉ_j)* g_{j−i} .

After the initial transient, the result does not depend on k. Hence, assuming
k = 0 we have

d²(e, a) = Σ_{i=0}^{H−L} Σ_{j=0}^{H−L} (c_i − ĉ_i)(c_j − ĉ_j)* g_{j−i} .

Finally, in the case of absence of ISI, it is L = 0 and

g_k = { 1   for k = 0
      { 0   otherwise

and the square distance is thus

d²(e, a) = Σ_{i=0}^{H} |c_i − ĉ_i|² .   (2.39)

Remark 2.9. Bit and symbol error probabilities are usually expressed as
a function either of the ratio ES/N0 or of the ratio Eb/N0, ES and Eb being
the mean energy per information symbol and per bit, respectively. Thus, we
have to compute ES (clearly Eb = ES/log₂ M) in the scenario at hand. The
mean energy per information symbol is

E_S = P_s T

where P_s is the mean signal power and can be computed as half the power
of the complex envelope:

P_s = P_s̃ / 2 .

Since the complex envelope of the signal is

s̃(t) = √2 Σ_k c_k p(t − kT)

its PSD is (see Exercise 1.3)

W_s̃(f) = 2 (W_c(f)/T) |P(f)|²

where W_c(f) is the Fourier transform of the autocorrelation sequence
R_c(m) = E{c_{k+m} c_k*} of the code symbols. We thus have

E_S = P_s̃ T / 2 = ∫_{−∞}^{∞} W_c(f) |P(f)|² df .

In the case of uncorrelated symbols with zero mean, it is W_c(f) = C₂, having
defined C₂ = E{|c_k|²}. It thus follows that

E_S = C₂ ∫_{−∞}^{∞} |P(f)|² df = C₂

since we assumed that the pulse has unit energy. This expression for ES can
be directly employed in the expression of symbol and bit error probabilities.

Let us now consider the discrete-time equivalent model with white noise of
the system, reported in Fig. 2.23. The sufficient statistic yk can be expressed
as

y_k = Σ_{ℓ=0}^{L} f_ℓ c_{k−ℓ} + w_k

where the signal component is denoted by s_k(a) = Σ_{ℓ=0}^{L} f_ℓ c_{k−ℓ}.

It is possible to demonstrate that the square distance between continuous-time
signals is equal to the square distance between the sequences {sk(a)}
(see Exercise 2.9). We can thus write

d²(a, e) = Σ_k |s_k(a) − s_k(a + e)|²

that, using the expression of s_k(a), becomes

d²(a, e) = Σ_k | Σ_{ℓ=0}^{L} f_ℓ (c_{k−ℓ} − ĉ_{k−ℓ}) |² .

The square distance can thus be expressed through the energy of the error
sequence of code symbols filtered by the impulse response of the equivalent
discrete-time model with white noise. In particular, in the case of an uncoded
transmission, we have

d²(e) = Σ_k | Σ_{ℓ=0}^{L} f_ℓ e_{k−ℓ} |²

which allows us to conclude that, as already discussed, for an uncoded
linear modulation the UEP holds.
Considering an error event beginning at time n and of duration H, the
previous expressions read

d²(e, a) = Σ_{k=n}^{n+H} | Σ_{ℓ=0}^{L} f_ℓ (ĉ_{k−ℓ} − c_{k−ℓ}) |²   (2.40)

in the general case and

d²(e) = Σ_{k=n}^{n+H} | Σ_{ℓ=0}^{L} f_ℓ e_{k−ℓ} |²   (2.41)

in the case of an uncoded transmission.

Figure 2.23: Discrete-time equivalent model with white noise of the system.

Remark 2.10. Let us consider a channel with L = 1, i.e., with non-zero
coefficients f₀ and f₁, and an uncoded transmission. Using (2.41), we can
find a lower bound on the square distance of an error event of duration H as

d²(e) = Σ_{k=n}^{n+H} |f₀ e_k + f₁ e_{k−1}|²
      ≥ |f₀ e_n + f₁ e_{n−1}|² + |f₀ e_{n+H} + f₁ e_{n+H−1}|²
      = |f₀|² |e_n|² + |f₁|² |e_{n+H−1}|²
      ≥ ε² (|f₀|² + |f₁|²) = ε²   (2.42)

in which:

• we neglected the (non-negative) terms of the summation corresponding
to the values of the index k = n + 1, . . . , n + H − 1;

• we exploited the fact that e_{n−1} = 0, since the error event begins at time
n, and e_{n+H} = 0, since it ends at time n + H;

• in the last lower bound we exploited the fact that, for sure, e_n ≠ 0 and
e_{n+H−1} ≠ 0, and defined

ε = min{|e_n|} = min{|e_{n+H−1}|}

(ε is the minimum distance between two points of the employed constellation);

• we also used the fact that |f₀|² + |f₁|² = g₀ = E_p = 1.


The lower bound (2.42) holds for any error event. In particular, an error
event of duration H = 1 involves only one symbol error (e_n = e_{n+H−1} for
H = 1). When the error e_n has the minimum modulus, we exactly obtain a
square distance equal to the lower bound (2.42). We can thus conclude that
for a channel with L = 1 it holds

d²_min = ε² .   (2.43)

Eqn. (2.43) has an interesting interpretation. If we transmit an isolated
information symbol, instead of a sequence, the error probability, or a bound
(upper or lower) on it, will depend on the minimum distance between two
possible signals. This distance is exactly given by (2.43). We can thus
conclude that, in the case of a transmission over an ISI channel with L = 1,
the minimum distance in sequence detection coincides with that of an
isolated transmission, thus not affected by ISI. The asymptotic performance
depends on the minimum distance. Thus, provided that L = 1, ISI does not
degrade the asymptotic performance of the MAP sequence detector. This
result is peculiar to the case L = 1. In fact, if we evaluate dmin for different
channels with L > 1 and compare the result with the minimum distance
related to an isolated transmission, we will observe a degradation due to ISI.
Table 2.1 shows the degradation of dmin for some channels with L ≥ 2, whose
discrete-time equivalent model with white noise is also reported. ♦

Remark 2.11 (faster-than-Nyquist signaling). Let us consider an uncoded
linear modulation, with a shaping pulse p(t) having RRC spectrum,
transmitted over an AWGN channel. At the MF output we will then have a
pulse g(t) whose Fourier transform is given by (2.17). Thus, if we adopt a
symbol time T, samples at discrete-time instants kT will not be affected by
ISI. In order to increase the transmission efficiency (the number of bits
transmitted per unit of time), in [10] Mazo proposed the use of a smaller symbol
time τT, where 0 < τ < 1 is a design parameter called time-compression
factor. In this case, at the MF output we have to sample at discrete-time
instants kτT, and we will have (intentional) ISI. However, Mazo demonstrated
that if τ ≥ 0.802, for the case of RRC pulses with roll-off zero, there is no
reduction of the minimum Euclidean distance with respect to the Nyquist case.
So the asymptotic performance will remain the same as in the absence
of ISI. Later, in [11], Liveris and Georghiades found the minimum values of τ
for different values of the employed roll-off, whereas in [12] Rusek and Anderson
extended the approach to frequency-division-multiplexed systems where
interference is also introduced in the frequency domain by packing the
adjacent carriers closer. Notice that, in order for these techniques to be effective, a
receiver able to cope with the (possibly very large) interference has to be
employed. The computational complexity of such a receiver may be extremely
large, and no hints are given by those papers regarding the optimization of τ in
the more practical scenario where a suboptimal reduced-complexity receiver
is employed. ♦

L    10 log₁₀[d²_min/(d²_min)_no ISI]    |f₀|, |f₁|, . . . , |f_L|
2    −2.3 dB                             0.50, 0.71, 0.50
3    −4.2 dB                             0.38, 0.60, 0.60, 0.38
4    −5.7 dB                             0.29, 0.50, 0.58, 0.50, 0.29
5    −7.0 dB                             0.23, 0.42, 0.52, 0.42, 0.23

Table 2.1: Degradation due to the ISI in the case of some specific ISI channels.
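The entries of Table 2.1 can be checked numerically: for binary antipodal
symbols (ak ∈ {±1}, so ek ∈ {0, ±2} and ε = 2), an exhaustive search over
short error sequences filtered by {fℓ} gives the minimum square distance
(2.41). A sketch with an illustrative truncation of the event length:

    from itertools import product

    def dmin2(f, max_len=8):
        # exhaustive search over error sequences e_k in {0, +2, -2} with
        # nonzero endpoints; events longer than max_len are missed
        L = len(f) - 1
        best = float("inf")
        for n in range(1, max_len + 1):
            for e in product((0, 2, -2), repeat=n):
                if e[0] == 0 or e[-1] == 0:
                    continue
                # energy of the error sequence filtered by {f_l}, eq. (2.41)
                d2 = sum(abs(sum(f[l] * e[k - l] for l in range(L + 1)
                                 if 0 <= k - l < n)) ** 2
                         for k in range(n + L))
                best = min(best, d2)
        return best

    # second row of Table 2.1:
    # f = [0.50, 0.71, 0.50]
    # 10 * math.log10(dmin2(f) / 4.0) is about -2.3 dB (4 = eps^2 without ISI)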

Let us now evaluate the term P(a ∈ A(e)) appearing in the upper bound
expression when the UEP holds. Since the information symbols are independent,
we have

P(a ∈ A(e)) = Π_{k=n}^{n+H−L} P(a_k + e_k ∈ A)

in which A is the alphabet of the information symbols. We will consider the
case of an M-ary pulse amplitude modulation (PAM), i.e., a linear modulation
with alphabet

A = {±1, ±3, . . . , ±(M − 1)}

from which

e_k ∈ {0, ±2, . . . , ±2(M − 1)} .

• When e_k = 0, a_k + e_k belongs to the alphabet no matter the value of
a_k, and thus P(a_k + e_k ∈ A) = 1.

• When e_k = +2, it must be a_k ≠ M − 1, otherwise a_k + e_k ∉ A. We
thus have

P(a_k + e_k ∈ A) = P(a_k ≠ M − 1) = (M − 1)/M = 1 − |e_k|/(2M) .

• Similarly, when e_k = −2, it must be a_k ≠ −(M − 1) and thus

P(a_k + e_k ∈ A) = P(a_k ≠ −(M − 1)) = (M − 1)/M = 1 − |e_k|/(2M) .

• By proceeding in a similar way, when e_k = ±4, it must be a_k ≠ ±(M − 3)
and a_k ≠ ±(M − 1), and thus

P(a_k + e_k ∈ A) = (M − 2)/M = 1 − |e_k|/(2M) .

• In general, we can conclude that

P(a_k + e_k ∈ A) = 1 − |e_k|/(2M) .

In the PAM case, we thus obtain

P(a ∈ A(e)) = Π_{k=n}^{n+H−L} ( 1 − |e_k|/(2M) ) .
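As a sketch, this product translates directly into code (illustrative helper
name):

    def p_compatible(e, M):
        # P(a in A(e)) for M-PAM with independent, equally likely symbols
        prob = 1.0
        for ek in e:
            prob *= 1.0 - abs(ek) / (2.0 * M)
        return prob

    # p_compatible([2, -2], 4) -> (3/4)**2 = 0.5625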

We can proceed in a similar way for different alphabets.
In an uncoded system with L = 1, the asymptotic behavior is given by
Eqn. (2.36), which we report here for convenience:

P_s ≲ K_s Q( d_min / √(2N₀) )

with

K_s = Σ_{e∈E_min} w(e) P[a ∈ A(e)] .

The error events having minimum distance belong to the set of those with
H = 1, as already said. When e ∈ E_min, we thus have w(e) = 1 and

K_s = Σ_{e∈E_min} P(a_k + e_k ∈ A) .

This coefficient coincides with the analogous coefficient related to an isolated
transmission. The previous considerations on the fact that ISI with L = 1 does
not degrade the asymptotic performance are thus confirmed, in the sense that
the multiplicity of the dominant errors is still the same. Notice that the two
cases (ISI and isolated transmissions) differ in the errors with distance greater
than the minimum one, which are present in larger number in the case of
sequence detection. Thus, that conclusion only holds asymptotically.

2.5.4 Lower bound on the error probability


In order to derive a lower bound on the error probability, let us still consider
(2.31), which we now express by representing the error event using the pair
(a, â), i.e.,

P_s = Σ_a Σ_{â≠a} w(â − a) P(â | a) P(a) .

Since w(â − a) ≥ 1 for any error event, we can lower bound P_s as

P_s ≥ Σ_a P(a) Σ_{â≠a} P(â | a) .

In the second summation, we can consider only one term, thus obtaining
a further lower bound. In order to have the tighter lower bound, we will
pick the largest possible term, again that corresponding to the minimum
distance, i.e.,

Σ_{â≠a} P(â | a) ≥ Q( d_min(a) / √(2N₀) )

where d_min(a) denotes the minimum distance among all those of paths
different from a. We thus have

P_s ≥ Σ_a P(a) Q( d_min(a) / √(2N₀) )

that can be further lower bounded by limiting the summation to those
sequences a that have at least one sequence at minimum distance. Hence

P_s ≥ Σ_{a∈A_min} P(a) Q( d_min / √(2N₀) ) = Q( d_min / √(2N₀) ) P(a ∈ A_min)

where A_min is the subset of all sequences a that have at least one sequence at
minimum distance with error event starting at time k. The term P(a ∈ A_min)
is the probability that, by picking a sequence at random, it has at least one
sequence at minimum distance.
We can conclude by observing that the error probability satisfies

K_s′ Q( d_min / √(2N₀) ) ≤ P_s ≲ K_s Q( d_min / √(2N₀) )

in which

K_s = Σ_{e∈E_min} w(e) P(a ∈ A(e))
K_s′ = P(a ∈ A_min)

and the symbol ≲ means that the upper bound holds only asymptotically for
high values of the SNR, and is thus approximate since we neglected the error
events having a distance larger than the minimum one. This approximation
is however asymptotically tight (i.e., exact). When K_s′ and K_s are close
enough, we can evaluate, in an accurate way, the probability P_s. Similar
results can be derived for the bit error probability.

Example 2.5. For the PAM constellation and an uncoded transmission
with L = 1, we have

K_s = Σ_{e∈E_min} P(a_k + e_k ∈ A) = Σ_{e∈E_min} ( 1 − 1/M ) = 2 ( 1 − 1/M ) .

In fact, H = 1, ε = min|e_k| = 2, and the number of error sequences (or,
equivalently, error symbols in this case, since H = 1) at minimum distance
is 2, that is e_k = ±2. In addition we have

K_s′ = P(a ∈ A_min) = 1

since every symbol has at least one symbol at minimum distance. We thus
obtain

Q( d_min / √(2N₀) ) ≤ P_s ≲ 2 ( 1 − 1/M ) Q( d_min / √(2N₀) ) .

In particular, the approximate upper bound exactly coincides with the
asymptotic symbol error probability for a PAM transmission in the absence
of ISI.
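Numerically, the two bounds of this example are immediate to evaluate; a
sketch with illustrative values (d_min = ε = 2 for the normalized PAM
constellation considered here):

    import math

    def Q(x):
        # Gaussian tail function
        return 0.5 * math.erfc(x / math.sqrt(2.0))

    def ps_bounds(M, N0, dmin=2.0):
        q = Q(dmin / math.sqrt(2.0 * N0))
        return q, 2.0 * (1.0 - 1.0 / M) * q   # (lower, approximate upper)

    # ps_bounds(4, 0.25) -> lower and approximate upper bounds on P_s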


2.6 Exercises
Exercise 2.1. Let us consider an M-ary transmission system employing a
linear modulation. The complex envelope of the transmitted signal is
s̃(t) = Σ_{k=0}^{K−1} c_k p(t − kT) .

In general, symbols ck and pulse p(t) are complex and the noise is additive,
white, and Gaussian. Let us also assume that the condition for the absence
of ISI holds, i.e.,

g_k = { 1   for k = 0
      { 0   otherwise

where g_k = g(kT) and g(t) = p(t) ⊗ p*(−t).
• Demonstrate that an orthonormal basis of the signal space is
{p(t − kT)}, k = 0, . . . , K − 1.

• Compute the images of the possible signals on the previous basis.


• Derive the MAP sequence detection strategy by using the previous
results.

Exercise 2.2. Let us consider a baseband linear modulation transmit-


ted over an AWGN channel. The information symbols are binary, i.e.,
ak ∈ {−1, +1}, and are equally likely. Let us assume that the discrete-time
equivalent pulse at the MF output is

g_k = { 1     for k = 0
      { 0.1   for k = ±1
      { 0     otherwise .
• Show the state and the trellis diagram required for MAP sequence
detection.
• Compute the MAP information sequence under the assumption that
the samples at the MF output are (K = 8)
k  =  0     1     2      3      4      5      6     7
xk =  1.2   1.2   −0.6   −0.1   −0.2   −1.2   −0.9  0.5

• For the particular sequence just computed, is it possible to take a de-


cision with a delay D (âk−D ) without waiting for the end of the trans-
mission? What value of D has to be chosen?

Exercise 2.3. Let us consider a QPSK signal. The complex envelope of


the transmitted signal is
s̃(t) = Σ_{k=0}^{K−1} c_k p(t − kT)

where symbols ck are differentially encoded according to the law ck = ak ck−1 .


Information and code symbols belong to the alphabet {±1, ±j}. Let us
suppose that the thermal noise is additive, white, and Gaussian, that the
information symbols are equally likely, and that the condition for the absence
of ISI holds.

• Show the trellis diagram of the detector.

• Prove that the survivors at time k + 1 can be obtained by extending


the survivor at time k having the largest metric.

• Prove that it is possible to implement detection by working in a symbol-


by-symbol fashion.

• Discuss the effect on detection of a phase rotation of the received signal


of a multiple of π/2.

Exercise 2.4. Let us suppose that the discrete-time equivalent pulse at the
MF output {gℓ }Lℓ=−L , representative of a transmission system employing a
linear modulation, is real.

• Prove that if ρ is a zero of the Z-transform G(z) of {gℓ}, then ρ*, 1/ρ,
and 1/ρ* are also zeros of G(z).

• Find the general condition that p(t) has to satisfy to have g(t) real.

Exercise 2.5. Let us assume that the shaping pulse p(t) of a linear modu-
lation has duration T and unit energy. This pulse is distorted by the channel
so that the shaping pulse at the receiver results to be

h(t) = p(t) + αp(t − T )

where α is a constant.

• Compute the discrete-time equivalent pulse {gk } at the MF output.

• Compute the transfer function of the WF.



• Compute the discrete-time equivalent pulse {fℓ } at the WF output.

• Define the state required to represent the memory of the signal at the
WF output.

• Describe the MAP sequence detection strategy based on the WF out-


put.

Exercise 2.6. Repeat the previous exercise when pulse p(t) has a RRC
Fourier transform.

Exercise 2.7. With reference to Exercise 1.4, describe the MAP sequence
detection strategy.

Exercise 2.8. Let us consider a coded linear modulation that can be ex-
pressed in the form
s(t) = √2 ℜ{ Σ_k c_k p(t − kT) e^{jω₀t} }

where p(t) is a pulse having unit energy.

• Prove that the square distance of a generic error event starting at time
n, having duration H, and characterized by the error sequence {ek }
and the information sequence {ak }, can be expressed as

d²(e, a) = Σ_{i=n}^{n+H−L} Σ_{j=n}^{n+H−L} (ĉ_i − c_i)(ĉ_j − c_j)* g_{j−i}

where L is the channel dispersion length, {ck } is the code sequence as-
sociated with the information sequence {ak }, {ĉk } is the code sequence
associated with the information sequence {ak + ek }, and {gk } is the
discrete-time impulse response at the MF output.

• In the absence of coding, show that the UEP holds.

• In the case of absence of ISI, show that d2 (e, a) is equal to the square
distance of the code sequences.

Exercise 2.9. Following Exercise 2.8, let us consider a coded linear mod-
ulation. Prove that the square distance of a generic error event starting at
time n, having duration H, and characterized by the error sequence {ek } and
the information sequence {ak }, can be expressed as
d²(e, a) = Σ_{k=n}^{n+H} | Σ_{ℓ=0}^{L} f_ℓ (ĉ_{k−ℓ} − c_{k−ℓ}) |²

where L is the channel dispersion length, {ck } is the code sequence associated
with the information sequence {ak }, {ĉk } is the code sequence associated with
the information sequence {ak + ek }, and {fℓ } is the discrete-time impulse
response at the WF output.

Exercise 2.10. Let us consider a baseband linear modulation transmit-


ted over an AWGN channel. The information symbols are binary, i.e.,
ak ∈ {−1, +1}, and are equally likely. Let us assume that the discrete-time
equivalent pulse at the MF output is

g_k = { 1     for k = 0
      { 0.1   for k = ±1
      { 0     otherwise .

MAP sequence detection is implemented through the Viterbi algorithm.


• Identify the possible error events.

• Compute dmin .

• Compute a lower and an upper bound of the bit error probability.

Exercise 2.11. With reference to Exercises 1.4 and 2.7, compute the per-
formance of the MAP sequence detector.

Exercise 2.12. Let us consider an M-ary transmission system employing


a linear modulation whose complex envelope has expression

s(t) = Σ_{k=0}^{K−1} c_k p(t − kT)

where c_k ∈ {e^{j2πm/M} : m = 0, 1, . . . , M − 1} (M-PSK transmission). This
signal is transmitted over an AWGN channel and assume that the condition
for the absence of ISI holds.

• Prove that the noncoherent sequence detection strategy can be expressed in the form

â = argmax_a | Σ_{k=0}^{K−1} x_k c_k^* |

where {x_k} is the sequence of samples at the output of a filter matched to the shaping pulse p(t).

• Describe the receiver structure.

• Prove that the detection strategy can be expressed in a recursive form as

â = argmax_a Σ_{n=0}^{K−1} { | Σ_{k=0}^{n} x_k c_k^* | − | Σ_{k=0}^{n−1} x_k c_k^* | } .

• Is it possible to implement this strategy by working over a trellis?

Exercise 2.13. Repeat the previous exercise in the case of an M-ary quadrature amplitude modulation (M-QAM).
Chapter 3

Detection in the presence of unknown parameters

3.1 The synchronization problem

In transmission systems of practical interest, the receiver has to cope not only with thermal noise but also with the presence of unknown parameters. In other words, detection has to be performed in the presence of one or more unknown parameters, possibly time-varying. These parameters can be, for example, the phase and frequency of the carrier employed at the transmitter, the channel attenuation, or the propagation delay. The aim of this chapter is to illustrate the general principles employed in the design of detectors for these scenarios. Without loss of generality and with the aim of illustrating the general design principles, we will consider the case of a single unknown parameter θ, possibly time-varying. We will consider, for illustration purposes, a transmission employing a linear modulation with a shaping pulse satisfying the Nyquist condition for the absence of ISI on the following channels.

i. A channel which introduces a constant (real and positive) attenuation. The samples at the matched filter (MF) output represent a sufficient statistic. They can be expressed as¹

rk = θck + wk , k = 0, 1, . . . , K − 1 (3.1)

where the discrete-time noise process {w_k} is white as a consequence of the fact that the condition for the absence of ISI is satisfied. The variance of both the real and imaginary components of w_k is σ² = N_0.

ii. The noncoherent channel. This channel introduces a phase shift unknown to the receiver, constant for the whole transmission, and uniformly distributed in [0, 2π). It is straightforward to show that the samples at the MF output still represent a sufficient statistic. They can be expressed as

r_k = c_k e^{jθ} + w_k ,  k = 0, 1, . . . , K − 1 .  (3.2)

iii. The time-varying noncoherent channel. In this case, it is not easy to find a sufficient statistic. At the end of this chapter we will describe a possible general technique. We will make here the hypothesis that the channel phase varies in time so slowly that it can be considered constant over a few signaling intervals. In this case, the samples at the MF output can still be considered as a sufficient statistic and can be expressed as

r_k = c_k e^{jθ_k} + w_k ,  k = 0, 1, . . . , K − 1 .  (3.3)

As far as the variations in time of θk are concerned, they will be better


specified in the following.

iv. The channel that introduces a time-varying frequency-flat Rayleigh fad-


ing. Even in this case, since it is more difficult to find a sufficient
statistic, we will assume that fading is slowly varying and that the
samples at the MF output are still a sufficient statistic. They can be
expressed as

rk = θk ck + wk , k = 0, 1, . . . , K − 1 (3.4)
¹ For all considered channels, samples at the MF output will be considered as an example of (exact or approximate) sufficient statistic characterized by discrete-time white noise. In this chapter, they will be denoted as r_k instead of x_k. This is due to the fact that, in the absence of ISI, an orthonormal basis for the set of transmitted signals is {p(t − kT)/√E_p}_{k=0}^{K−1} (see Exercise 2.1). Thus, these samples also represent the components of r(t) over this orthonormal basis and, as such, they will be denoted as {r_k}. We will use the notation r_{k_1}^{k_2} = [r_{k_1}, r_{k_1+1}, . . . , r_{k_2}]^T. According to it, it is thus r = r_0^{K−1}.

where θ_k is a complex Gaussian process with mean zero. Its autocorrelation R_θ(m) = E{θ_{k+m} θ_k^*} is generally described by Clarke's isotropic scattering model [13], i.e., R_θ(m) = J_0(2πB_D T m), where J_0(x) is the zeroth-order Bessel function of the first kind, defined as

J_0(x) = (1/2π) ∫_0^{2π} e^{jx cos θ} dθ

and B_D T is the Doppler bandwidth normalized to the signaling frequency.

When considering a transmission over a channel that introduces an unknown


parameter, we can consider two cases, depending on the way this parameter
is modeled. In fact, it can be modeled as:

a. random variable with known probability density function (pdf) f (θ);

b. deterministic and unknown.

3.2 Stochastic parameter with known pdf


This model is employed when some a-priori information about the unknown
parameter is available, through its pdf f (θ). In this case, the optimal detec-
tion strategy is perfectly defined and takes the form [7]2
â = argmax_a f(r|a) = argmax_a ∫ f(r|a, θ) f(θ|a) dθ .  (3.5)

Usually, the unknown parameter θ is independent of the sequence of symbols a, so that f(θ|a) = f(θ).
The integral in (3.5), which allows us to compute the conditional pdf f(r|a), may not be available in closed form, or the resulting strategy may be difficult to implement. The latter case occurs for the noncoherent channel.
² In the following, we will assume that the information symbols are independent and equally likely. Under this assumption, in the MAP sequence detection strategy we can discard the contribution of the a-priori sequence probability P(a). MAP sequence detection becomes equivalent to maximum likelihood detection.

Example 3.1 For the noncoherent channel, pdf f(r|a, θ) is

f(r|a, θ) = (1/2πN_0)^K exp{ −(1/2N_0) Σ_{k=0}^{K−1} | r_k − c_k e^{jθ} |² }
          = (1/2πN_0)^K exp{ −(1/2N_0) Σ_{k=0}^{K−1} (|r_k|² + |c_k|²) } · exp{ (1/N_0) ℜ[ e^{−jθ} Σ_{k=0}^{K−1} r_k c_k^* ] }
          = (1/2πN_0)^K exp{ −(1/2N_0) Σ_{k=0}^{K−1} (|r_k|² + |c_k|²) } · exp{ (1/N_0) | Σ_{k=0}^{K−1} r_k c_k^* | cos[θ − φ(r, c)] }  (3.6)

having defined φ(r, c) = arg[ Σ_{k=0}^{K−1} r_k c_k^* ]. Hence, since f(θ|a) = f(θ) = 1/2π,
we obtain
f(r|a) = (1/2π) ∫_0^{2π} f(r|a, θ) dθ  (3.7)
       = (1/2πN_0)^K exp{ −(1/2N_0) Σ_{k=0}^{K−1} (|r_k|² + |c_k|²) } I_0( (1/N_0) | Σ_{k=0}^{K−1} r_k c_k^* | )

where I_0(x) is the zeroth-order modified Bessel function of the first kind, defined as

I_0(x) = (1/2π) ∫_0^{2π} e^{x cos θ} dθ = J_0(jx) .
Neglecting terms that are irrelevant for detection, strategy (3.5) becomes

â = argmax_a [ I_0( (1/N_0) | Σ_{k=0}^{K−1} r_k c_k^* | ) exp{ −(1/2N_0) Σ_{k=0}^{K−1} |c_k|² } ]
  = argmax_a [ ln I_0( (1/N_0) | Σ_{k=0}^{K−1} r_k c_k^* | ) − (1/2N_0) Σ_{k=0}^{K−1} |c_k|² ] .  (3.8)

Excluding approximate implementations (see Exercises 2.12 and 2.13), this strategy cannot be implemented by using the Viterbi algorithm (VA), but only through techniques based on a search over a tree (and not over a trellis), whose complexity certainly does not increase linearly with K.
3.2 – Stochastic parameter with known pdf 79

By observing (3.8), one can easily understand that two coded sequences c₁ and c₂ satisfying the condition c₁ = e^{jφ} c₂, where φ is any multiple of the angle of symmetry of the employed constellation, have the same metric. These sequences are thus indistinguishable (noncoherently catastrophic sequences). For such a channel, it is thus required to use an encoder having a codebook (the set of codewords) without codewords satisfying that condition (noncoherently noncatastrophic codes) [14]. Differential encoding is an example of a noncoherently noncatastrophic code. Another way to obtain a noncoherently noncatastrophic code is through the serial concatenation of a differential encoder and a rotationally invariant code, that is, a code such that, for every codeword c, all sequences obtained through a rotation by any multiple of the angle of symmetry of the employed constellation are still codewords. ♦
An approximate strategy that allows us to obtain receivers that can be implemented with a complexity which is linear in K is the so-called truncated-memory strategy [15]. The optimal strategy (3.5) is based on the pdf f(r|a). This pdf can be expressed, by using the chain rule, as

f(r|a) = f(r|c) = Π_{k=0}^{K−1} f(r_k | r_0^{k−1}, c) .

On the other hand, it is clear that, in the presence of a causal system, the pdf f(r_k | r_0^{k−1}, c) will depend on the coded symbols transmitted up to the discrete-time instant k and not on future symbols. Thus, we can write

f(r|a) = Π_{k=0}^{K−1} f(r_k | r_0^{k−1}, c_0^k) .

If the following condition is satisfied

f(r_k | r_0^{k−1}, c_0^k) = f(r_k | r_0^{k−1}, c_{k−C}^k)  (3.9)

i.e., the pdf depends on the previous C coded symbols only, we have a system with finite memory. Taking the logarithm, detection strategy (3.5) thus becomes

â = argmax_a ln f(r|a) = argmax_a Σ_{k=0}^{K−1} ln f(r_k | r_0^{k−1}, c_{k−C}^k) .

This strategy can be implemented through the Viterbi algorithm with branch metric λ_k(a_k, σ_k) = ln f(r_k | r_0^{k−1}, c_{k−C}^k). The trellis state is thus defined as

σ_k = (a_{k−1}, a_{k−2}, . . . , a_{k−C}, μ_{k−C})  (3.10)

where μ_k is the encoder state. The number of states is thus S = S_c M^C, where S_c is the number of states of the employed encoder.
Condition (3.9) is hardly ever satisfied in practical systems. In particular, it is not satisfied in the case of noncoherent and flat fading channels. Thus, the search for the optimal sequence according to strategy (3.5) cannot be implemented through a search over a trellis by using the Viterbi algorithm. In order to obtain receivers with complexity linear in K, we can resort to the following approximation

f(r_k | r_0^{k−1}, c_0^k) ≃ f(r_k | r_{k−R}^{k−1}, c_{k−C}^k) ,  (3.11)

i.e., we can approximate f(r_k | r_0^{k−1}, c_0^k) with a pdf that takes into account the dependence on the last R received samples and the last C ≥ R symbols. This approach of memory truncation is intuitive in the case of a time-varying channel parameter: the present received sample tends to be uncorrelated with past ones, and the larger the distance, the lower the correlation. With this approximation, the detection strategy becomes

â = argmax_a ln f(r|a) ≃ argmax_a Σ_{k=0}^{K−1} ln f(r_k | r_{k−R}^{k−1}, c_{k−C}^k)

that can still be implemented through the Viterbi algorithm with branch metric λ_k(a_k, σ_k) = ln f(r_k | r_{k−R}^{k−1}, c_{k−C}^k), and trellis state σ_k still defined by (3.10). In the following two examples, we show how to compute the pdf f(r_k | r_{k−R}^{k−1}, c_{k−C}^k) appearing in the branch metric expression.

Example 3.2 For the noncoherent channel, it is R = C. Pdf f(r_k | r_{k−C}^{k−1}, c_{k−C}^k) can also be expressed as

f(r_k | r_{k−C}^{k−1}, c_{k−C}^k) = f(r_{k−C}^k | c_{k−C}^k) / f(r_{k−C}^{k−1} | c_{k−C}^{k−1}) .

Pdfs f(r_{k−C}^k | c_{k−C}^k) and f(r_{k−C}^{k−1} | c_{k−C}^{k−1}) both have the expression given by (3.7). It is thus

f(r_k | r_{k−C}^{k−1}, c_{k−C}^k) = (1/2πN_0) exp{ −(1/2N_0) (|r_k|² + |c_k|²) } · I_0( (1/N_0) | Σ_{ℓ=0}^{C} r_{k−ℓ} c_{k−ℓ}^* | ) / I_0( (1/N_0) | Σ_{ℓ=1}^{C} r_{k−ℓ} c_{k−ℓ}^* | ) .

The branch metric is

ln f(r_k | r_{k−C}^{k−1}, c_{k−C}^k) = const. + ln I_0( (1/N_0) | Σ_{ℓ=0}^{C} r_{k−ℓ} c_{k−ℓ}^* | ) − ln I_0( (1/N_0) | Σ_{ℓ=1}^{C} r_{k−ℓ} c_{k−ℓ}^* | ) − (1/2N_0) |c_k|² .

Figure 3.1: Performance of the strategy based on the branch metric (3.12)
for a differentially encoded QPSK modulation.

By using the approximation ln I_0(x) ≃ x, which is accurate for x ≫ 1, we obtain the branch metric [14]

ln f(r_k | r_{k−C}^{k−1}, c_{k−C}^k) = const. + | Σ_{ℓ=0}^{C} r_{k−ℓ} c_{k−ℓ}^* | − | Σ_{ℓ=1}^{C} r_{k−ℓ} c_{k−ℓ}^* | − (1/2) |c_k|² .  (3.12)
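As an illustration, a possible implementation of the branch metric (3.12) is sketched below (a hypothetical helper, not from the notes; it assumes the arrays collect the last C + 1 received samples and the hypothesized code symbols associated with a trellis branch):

import numpy as np

def branch_metric(r_window, c_window):
    """r_window, c_window: arrays [r_{k-C}, ..., r_k] and [c_{k-C}, ..., c_k]."""
    s_full = np.abs(np.vdot(c_window, r_window))            # |sum_{l=0}^{C} r_{k-l} c_{k-l}^*|
    s_past = np.abs(np.vdot(c_window[:-1], r_window[:-1]))  # terms l = 1..C only
    return s_full - s_past - 0.5 * np.abs(c_window[-1]) ** 2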

In Figs. 3.1-3.3, some examples of performance, in terms of bit error ratio (BER) versus the signal-to-noise ratio, are reported. Fig. 3.1 refers to a differentially encoded QPSK modulation, Fig. 3.2 to a 16-QAM with quadrant differential encoding, whereas Fig. 3.3 refers to a couple of trellis coded modulations (TCMs) with 16-QAM (see chapter 4). In this latter case, both the considered TCMs have 8 states. The first one is a non-rotationally invariant (NRI) code which is noncoherently noncatastrophic. The second one is a 90° rotationally invariant (RI) code serially concatenated with an outer

Figure 3.2: Performance of the strategy based on the branch metric (3.12)
for a 16-QAM modulation with quadrant differential encoding.

differential encoder. The considered receivers employ techniques for complexity reduction, described in chapter 6, that allow the number of trellis states employed by the detector to be chosen independently of C. Curves denoted by D&S refer to a strategy called multiple-symbol differential detection, proposed in the literature by Divsalar and Simon [16, 17, 18]. ♦

Example 3.3 For the flat fading channel, it is R = C. We have to find a way to express the pdf f(r_k | r_{k−C}^{k−1}, c_{k−C}^k). We start by first solving the following problem. Let us assume that the sequence of coded symbols is known. In this case, we can try to predict the received sample r_k by using the previous C samples. Given the coded sequence, the sequence {r_k} is Gaussian and the optimal estimator according to the minimum mean square error (MMSE) criterion is linear. This optimal predictor is

r̂_k = Σ_{ℓ=1}^{C} q_ℓ r_{k−ℓ}


Figure 3.3: Performance of the strategy based on the branch metric (3.12)
for a 8-state trellis coded modulations (TCMs) with 16-QAM.

where, in general, the prediction coefficients q_ℓ depend on the coded symbols. These coefficients can be computed by using the orthogonality principle:

E{(r̂_k − r_k) r_{k−m}^*} = 0 ,  m = 1, 2, . . . , C  ⇒
E{r̂_k r_{k−m}^*} = E{r_k r_{k−m}^*} ,  m = 1, 2, . . . , C  ⇒
Σ_{ℓ=1}^{C} q_ℓ E{r_{k−ℓ} r_{k−m}^*} = E{r_k r_{k−m}^*} ,  m = 1, 2, . . . , C .

When computing the autocorrelation function of the received samples, we have to account for the fact that the coded symbols are assumed known. We thus obtain

Σ_{ℓ=1, ℓ≠m}^{C} q_ℓ c_{k−ℓ} c_{k−m}^* R_θ(m − ℓ) + q_m [ |c_{k−m}|² R_θ(0) + 2N_0 ] = c_k c_{k−m}^* R_θ(m) .

These equations (for m = 1, 2, . . . , C) can be expressed in the form

Σ_{ℓ=1, ℓ≠m}^{C} q_ℓ (c_{k−ℓ}/c_k) R_θ(m − ℓ) + q_m (c_{k−m}/c_k) [ R_θ(0) + 2N_0/|c_{k−m}|² ] = R_θ(m)

or, equivalently, as

Σ_{ℓ=1, ℓ≠m}^{C} p_ℓ R_θ(m − ℓ) + p_m [ R_θ(0) + 2N_0/|c_{k−m}|² ] = R_θ(m)  (3.13)

having defined p_ℓ = q_ℓ c_{k−ℓ}/c_k. Notice that the linear system (3.13), having C equations that allow us to compute the coefficients p_ℓ, depends on the coded symbols through the terms |c_{k−m}|². In the case of a PSK constellation (for which |c_{k−m}|² = const.), this dependence disappears and thus the coefficients p_ℓ are independent of the coded sequence. As a function of the coefficients p_ℓ, the optimal predictor is

r̂_k = c_k Σ_{ℓ=1}^{C} p_ℓ (r_{k−ℓ}/c_{k−ℓ}) .
The mean square value of the prediction error results to be

σ_e² = E{|r̂_k − r_k|²} = E{[r_k − r̂_k] r_k^*} = 2N_0 + |c_k|² [ R_θ(0) − Σ_{ℓ=1}^{C} p_ℓ R_θ(−ℓ) ]

which, again, is independent of the coded symbols in the case of a PSK constellation. We can now compute the pdf f(r_k | r_{k−C}^{k−1}, c_{k−C}^k). In fact, we can express

r_k = r̂_k + e_k

where the discrete-time process e_k (the prediction error) is orthogonal to the previous received samples and, since the problem is Gaussian, is independent of them. Hence, given the symbols and the previous received samples, r_k is a complex Gaussian random variable with mean r̂_k and variance σ_e² (the variance per component being σ_e²/2):

f(r_k | r_{k−C}^{k−1}, c_{k−C}^k) = (1/πσ_e²) exp{ −(1/σ_e²) | r_k − c_k Σ_{ℓ=1}^{C} p_ℓ (r_{k−ℓ}/c_{k−ℓ}) |² } .

As said, for a PSK modulation the coefficients p_ℓ and the mean square prediction error σ_e² are independent of the coded symbols. Otherwise, they depend on |c_{k−1}|², |c_{k−2}|², . . . , |c_{k−C}|². A detection strategy based on linear prediction was first proposed in [19, 20, 21, 22]. The reader can also refer to [23, 4] for a thorough analysis. ♦
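For the PSK case, where the system (3.13) does not depend on the code sequence, the prediction coefficients and the prediction error power can be computed as in the following sketch (an illustration, not from the notes; assumptions: Clarke autocorrelation R_θ(m) = J_0(2πB_D T m) and known N_0; solve_toeplitz exploits the Toeplitz structure of (3.13)):

import numpy as np
from scipy.special import j0
from scipy.linalg import solve_toeplitz

def predictor(C, BdT, N0):
    R = j0(2 * np.pi * BdT * np.arange(C + 1))  # R_theta(0..C), real for Clarke's model
    # System (3.13) with |c_k|^2 = 1:  sum_l p_l R(m-l) + 2*N0*p_m = R(m),  m = 1..C
    col = R[:C].copy()
    col[0] += 2 * N0                            # Toeplitz matrix R(m-l) + 2*N0*I
    p = solve_toeplitz((col, col), R[1:C + 1])
    sigma_e2 = 2 * N0 + R[0] - p @ R[1:C + 1]   # mean square prediction error
    return p, sigma_e2

p, s2 = predictor(C=2, BdT=0.01, N0=0.1)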


Figure 3.4: Optimal receiver in case of perfect knowledge of the channel phase (θ is its value).

3.3 Parameter modeled as deterministic and unknown

In this case, the existence of an optimal strategy is conditioned on the existence of a uniformly most powerful (UMP) test [7]. Given the received vector, which depends on θ and a, a UMP test exists when the following condition is satisfied. Let us consider the optimal strategy that we obtain when parameter θ is perfectly known (and let θ be its value). This strategy is clearly³

â = argmax_a f(r|a, θ) .  (3.14)

Let us denote by R_θ the receiver implementing this strategy. If the receiver does not change for any possible value of θ, it is obvious that the knowledge of θ is irrelevant and R_θ is the optimal receiver also under the assumption that θ is unknown.

Example 3.4 Since for the noncoherent channel it is

â = argmax_a f(r|a, θ) = argmin_a Σ_{k=0}^{K−1} | r_k − c_k e^{jθ} |²
  = argmin_a Σ_{k=0}^{K−1} | r_k e^{−jθ} − c_k |²

the receiver R_θ implementing this strategy is shown in Fig. 3.4. The receiver changes when the value of θ changes. Thus, the UMP test does not exist. ♦

Example 3.5 Let us consider an uncoded binary transmission with sym-


bols ak ∈ {±1} over a channel that introduces a positive attenuation θ. In
³ In this case, the notation f(r|a, θ) is improper because θ is not a random variable.

this case, the optimal receiver is a symbol-by-symbol receiver which compares the MF output with the threshold zero, no matter the value of θ. The performance will depend on the value of θ, but the receiver structure will not. In this case, the UMP test does exist. ♦
When the UMP test does not exist, we can resort to one of the following
heuristic strategies:
1. the adoption of the generalized likelihood (GL) criterion;
2. the use of an estimate of the unknown parameter in place of the true value
(synchronization).

3.3.1 Generalized likelihood criterion


This criterion is based on a search for the joint maximum, with respect to
θ and a, of the joint likelihood function, i.e., the pdf of the received samples
given both symbols and the parameter value. In other words, the strategy is
the following:
(â, θ̂) = argmax_{(a,θ)} f(r|a, θ) .  (3.15)

This strategy also provides, as a by-product, an estimate of the unknown parameter.

Example 3.6 Let us consider again the noncoherent channel. In this case, the joint likelihood function can be expressed in the form (3.6), reported here for convenience:

f(r|a, θ) = (1/2πN_0)^K exp{ −(1/2N_0) Σ_{k=0}^{K−1} |r_k|² } · exp{ −(1/2N_0) Σ_{k=0}^{K−1} |c_k|² + (1/N_0) | Σ_{k=0}^{K−1} r_k c_k^* | cos[θ − φ(r, c)] } .

It is thus clear that it can be jointly maximized by choosing

â = argmax_a { | Σ_{k=0}^{K−1} r_k c_k^* | − (1/2) Σ_{k=0}^{K−1} |c_k|² }  (3.16)

θ̂ = φ(r, ĉ) = arg[ Σ_{k=0}^{K−1} r_k ĉ_k^* ]  (3.17)

that is, we can maximize the joint likelihood function by neglecting the cosine term, and then choosing θ properly such that the cosine is maximized. Notice

Figure 3.5: Synchronization approach.

that strategies (3.8) and (3.16) are very similar. The latter can be obtained
from the former by adopting the approximation ln I0 (x) ≃ x. ♦

3.3.2 Synchronization
The approach commonly used in the receiver design is that based on synchro-
nization. According to it, an estimate θ̂ of the unknown parameter is found
from the received signal according to some technique (as better specified in
the following). This estimate is then used in place of the true value of the
parameter, neglecting a possible residual error. In other words, the adopted
detection strategy will be

â = argmax_a f(r|a, θ̂) .  (3.18)

This approach is described in Fig. 3.5(a). Fig. 3.5(b), instead, refers to the
particular case of a noncoherent channel.

3.4 Estimation techniques


We will consider here maximum likelihood (ML) estimators only. These estimators can be classified based on the assumption made about the knowledge of the symbols. In particular, we will consider the following:
1. data-aided (DA) estimator (θ̂DA );

2. decision-directed (DD) estimator (θ̂DD );

3. soft-decision-directed (SDD) estimator (θ̂SDD );

4. non-data-aided (NDA) estimator (θ̂N DA ).


We will consider both open-loop (OL) and closed-loop (CL) estimators. We
will see that for each OL estimator, it is possible to define a corresponding CL

estimator and vice versa. An OL estimator will employ a given number L of received samples to provide an estimate. It can thus be employed for packet transmissions where the channel parameter remains constant over the packet duration (its coherence time is larger than the packet duration). On the other hand, a CL estimator will update its estimate whenever a new received sample becomes available. Feedback is thus essential in the case of a time-varying parameter. Before discussing the different estimators, we will mention a couple of lower bounds on the performance of any unbiased estimator, that is, the Cramér-Rao bound (CRB) and the modified CRB (MCRB).

3.4.1 Bounds on the performance of an estimator


The Cramér-Rao bound represents the lower limit to the variance of any unbiased estimator (that is, of any estimator whose mean value corresponds to the true value to be estimated). The estimator being unbiased, its variance coincides with the mean square estimation error. The useful component of the received signal will depend on the parameter to be estimated but also on other parameters and, among them, on data. We will denote by u a vector containing, among all other parameters, those that cannot be considered known at the receiver, whereas we will denote by v those that can be considered known (for example, because previously estimated). The CRB for parameter θ can be expressed as [7]

var(θ̂) = E{[θ̂ − θ]²} ≥ CRB(θ)

where

CRB(θ) = −1 / E_r{ ∂² ln f(r|θ, v) / ∂θ² } = 1 / E_r{ [ ∂ ln f(r|θ, v) / ∂θ ]² } .  (3.19)

In this equation, the expectation must be computed with respect to r, whereas the derivatives of ln f(r|θ, v) have to be computed in correspondence of the true value of θ. See [7] for a proof. The pdf f(r|θ, v) can be computed as

f(r|θ, v) = ∫ f(r|θ, v, u) f(u|θ, v) du .  (3.20)

The application of this bound to practical synchronization problems is


often impossible. In fact, the integral in (3.20) can hardly be computed
in closed form and also the expectation that appears in (3.19) is often of
difficult computation. We can thus resort to the MCRB, which is of simpler

computation but looser. The MCRB for parameter θ can be expressed as [24]

var(θ̂) = E{[θ̂ − θ]²} ≥ MCRB(θ)

where

MCRB(θ) = −1 / E_{r,u}{ ∂² ln f(r|θ, v, u) / ∂θ² } = 1 / E_{r,u}{ [ ∂ ln f(r|θ, v, u) / ∂θ ]² } .  (3.21)

The two bounds coincide when vector u contains no elements. In general, it is

CRB(θ) ≥ MCRB(θ) .
Proof. It is sufficient to show that

E_r{ [ ∂ ln f(r|θ, v) / ∂θ ]² } ≤ E_{r,u}{ [ ∂ ln f(r|θ, v, u) / ∂θ ]² } .

Let us start with the observation that

E_r{ [ ∂ ln f(r|θ, v) / ∂θ ]² } = ∫ [ ∂ ln f(r|θ, v) / ∂θ ]² f(r|θ, v) dr

and since

∂ ln f(r|θ, v) / ∂θ = (1/f(r|θ, v)) ∂f(r|θ, v) / ∂θ

we obtain

E_r{ [ ∂ ln f(r|θ, v) / ∂θ ]² } = ∫ [ ∂f(r|θ, v) / ∂θ ]² (1/f(r|θ, v)) dr .  (3.22)

From (3.20), we have

f(r|θ, v) = ∫ f(r|θ, v, u) f(u|θ, v) du

and, assuming that u is independent of θ, we obtain

∂f(r|θ, v)/∂θ = ∫ [ ∂f(r|θ, v, u)/∂θ ] f(u|v) du
             = ∫ [ ∂ ln f(r|θ, v, u)/∂θ ] f(r|θ, v, u) f(u|v) du
             = ∫ [ ∂ ln f(r|θ, v, u)/∂θ ] f(r, u|θ, v) du
             = ∫ [ (∂ ln f(r|θ, v, u)/∂θ) √f(r, u|θ, v) ] √f(r, u|θ, v) du .

Thus, by using the Schwartz inequality,

[ ∂f(r|θ, v)/∂θ ]² = ( ∫ [ (∂ ln f(r|θ, v, u)/∂θ) √f(r, u|θ, v) ] √f(r, u|θ, v) du )²
                  ≤ ∫ [ ∂ ln f(r|θ, v, u)/∂θ ]² f(r, u|θ, v) du · ∫ f(r, u|θ, v) du
                  = f(r|θ, v) ∫ [ ∂ ln f(r|θ, v, u)/∂θ ]² f(r, u|θ, v) du

and, substituting in (3.22), we have

E_r{ [ ∂ ln f(r|θ, v)/∂θ ]² } ≤ ∬ [ ∂ ln f(r|θ, v, u)/∂θ ]² f(r, u|θ, v) dr du = E_{r,u}{ [ ∂ ln f(r|θ, v, u)/∂θ ]² } .

In the case of an estimation problem for a channel with additive white Gaussian noise, i.e., under the assumption that the received samples have the expression

r_k = s_k(θ, v, u) + w_k

the expression of the MCRB can be greatly simplified. Assuming that vector r has L elements, it is

f(r|θ, v, u) = (1/2πN_0)^L exp{ −(1/2N_0) Σ_k | r_k − s_k(θ, v, u) |² }

and thus we obtain

∂ ln f(r|θ, v, u)/∂θ = (1/2N_0) Σ_k { [r_k − s_k(θ, v, u)]^* ∂s_k(θ, v, u)/∂θ + [r_k − s_k(θ, v, u)] ∂s_k^*(θ, v, u)/∂θ }
                    = (1/N_0) Σ_k ℜ{ [r_k − s_k(θ, v, u)] ∂s_k^*(θ, v, u)/∂θ }

and

∂² ln f(r|θ, v, u)/∂θ² = (1/N_0) Σ_k ( ℜ{ [r_k − s_k(θ, v, u)] ∂²s_k^*(θ, v, u)/∂θ² } − | ∂s_k(θ, v, u)/∂θ |² ) .

By averaging this expression with respect to r we obtain

E_r{ ∂² ln f(r|θ, v, u)/∂θ² } = −(1/N_0) Σ_k | ∂s_k(θ, v, u)/∂θ |²

and finally

MCRB(θ) = N_0 / E_u{ Σ_k | ∂s_k(θ, v, u)/∂θ |² } .  (3.23)

3.4.2 DA estimator
In packet transmissions, a field of known data is usually available for synchronization purposes. This field can be placed at the beginning (preamble) or in the middle of a packet (midamble). In other cases, we have several fields of known data distributed along the packet. These known symbols are often called pilot symbols, and they can be employed for synchronization. Assuming that we have available L consecutive known symbols, in particular symbols a_0^{L−1}, the ML-DA estimator based on them and on the corresponding received samples is

θ̂_DA = argmax_θ f(r_0^{L−1} | a_0^{L−1}, θ) .  (3.24)

Example 3.7 Let us consider the case of the noncoherent time-varying channel where the channel phase has a linear behavior

θ_k = 2πF kT + φ .

In other words, we are assuming that, in addition to the unknown phase φ, we also have an unknown frequency offset F that represents the error between the carrier frequency employed at the transmitter and that of the local oscillator used at the receiver. We would like to find a DA estimator of F under the assumption that φ is unknown and uniformly distributed in [0, 2π). In other words, this time the parameter to be estimated is F, and φ is a parameter that cannot be considered as known at the receiver. Being

f(r_0^{L−1} | a_0^{L−1}, F) = (1/2πN_0)^L exp{ −(1/2N_0) Σ_{k=0}^{L−1} (|r_k|² + |a_k|²) } · I_0( (1/N_0) | Σ_{k=0}^{L−1} r_k a_k^* e^{−j2πF kT} | )  (3.25)


it is

F̂_DA = argmax_F f(r_0^{L−1} | a_0^{L−1}, F) = argmax_F | Σ_{k=0}^{L−1} r_k a_k^* e^{−j2πF kT} | .

The quantity Σ_{k=0}^{L−1} r_k a_k^* e^{−j2πF kT} is the Fourier transform of the discrete-time sequence r_k a_k^*. Its maximum value can be found through a two-step procedure. A coarse search is performed first. It is based on the use of the fast Fourier transform (FFT), which provides some samples of the Fourier transform of the sequence r_k a_k^*. Then, a fine search is performed by interpolating some of the samples previously computed and looking for the maximum of the interpolated function. This algorithm was originally proposed by Rife and Boorstyn [25]. It can be verified, through computer simulations, that this estimator is able to reach the CRB for high values of the signal-to-noise power ratio (SNR). For low SNR values, a significant divergence from the CRB occurs. This phenomenon is due to a few estimates that are completely wrong (the so-called outliers) and significantly worsen the performance. ♦
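A possible sketch of the two-step procedure follows (not the exact interpolation of [25]: the fine search is implemented here with a generic bounded scalar optimizer; function and parameter names are illustrative):

import numpy as np
from scipy.optimize import minimize_scalar

def da_freq_estimate(r, a, T, nfft=4096):
    z = r * np.conj(a)                                     # sequence r_k a_k^*
    L = len(z)
    spec = np.abs(np.fft.fft(z, nfft))
    f_coarse = np.fft.fftfreq(nfft, d=T)[np.argmax(spec)]  # coarse search: FFT peak
    # fine search: maximize |sum_k z_k exp(-j 2 pi F k T)| around the coarse peak
    cost = lambda F: -np.abs(np.sum(z * np.exp(-2j * np.pi * F * T * np.arange(L))))
    res = minimize_scalar(cost, bounds=(f_coarse - 1 / (nfft * T), f_coarse + 1 / (nfft * T)), method='bounded')
    return res.x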

Example 3.8 The CRB in the case of the previous example (DA frequency estimation in the presence of an unknown phase) is not available in closed form. This is due to the pdf (3.25) that, plugged into (3.19), makes the computation of the expectation impossible in closed form. However, it is possible, once the symbol constellation and the value of L have been selected, to compute it numerically through a Monte Carlo simulation. The computation of the MCRB is, however, much simpler. Let us assume that the received samples employed for the estimation are those for k_0 ≤ k ≤ k_0 + L − 1. By using (3.23), which takes the form

MCRB(F) = N_0 / E_θ{ Σ_k | ∂s_k(F, a, θ)/∂F |² }

where

s_k(F, a, θ) = a_k e^{j(2πF kT + θ)}

we easily obtain

MCRB(F) = N_0 / ( 4π²T² Σ_{k=k_0}^{k_0+L−1} k² |a_k|² ) .  (3.26)

This result depends on the value of k_0. Since we are interested in the tightest possible lower bound, we can choose the value of k_0 giving the largest

possible value of (3.26) and thus the lowest possible value of the denominator. Assuming that the symbols in the preamble are always the same when changing k_0 and that L is odd, the lowest possible value is obtained when k_0 = −(L−1)/2. It is thus

MCRB(F) = N_0 / ( 4π²T² Σ_{k=−(L−1)/2}^{(L−1)/2} k² |a_k|² ) .

This expression for the MCRB(F) also allows us to optimize the symbols a_{−(L−1)/2}^{(L−1)/2} in the preamble. In fact, they have to be selected in such a way that the term Σ_{k=−(L−1)/2}^{(L−1)/2} k² |a_k|² is maximized. For M-PSK signals, being |a_k| = 1 and considering that Σ_{k=1}^{n} k² = n(n+1)(2n+1)/6, we obtain

MCRB(F) = 3N_0 / ( π²T² (L² − 1) L ) .
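The closed-form expression can be checked numerically; the short sketch below (an illustration under the stated M-PSK assumptions, |a_k| = 1 and L odd, not part of the original notes) also confirms that a one-sided preamble (k_0 = 0) yields a smaller value of (3.26):

import numpy as np

def mcrb_f(N0, T, k0, L):
    # MCRB(F) = N0 / (4 pi^2 T^2 sum_k k^2 |a_k|^2), with |a_k| = 1
    k = np.arange(k0, k0 + L).astype(float)
    return N0 / (4 * np.pi**2 * T**2 * np.sum(k ** 2))

N0, T, L = 0.5, 1.0, 65
sym = mcrb_f(N0, T, -(L - 1) // 2, L)                 # symmetric placement
closed = 3 * N0 / (np.pi**2 * T**2 * (L**2 - 1) * L)  # closed form above
assert np.isclose(sym, closed)
assert mcrb_f(N0, T, 0, L) < sym                       # one-sided preamble gives a smaller value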

Example 3.9 Let us now consider phase estimation. The channel model is given by (3.2), where now the symbols are obviously uncoded. The pdf f(r_0^{L−1} | a_0^{L−1}, θ) can be expressed as (see (3.6))

f(r_0^{L−1} | a_0^{L−1}, θ) = (1/2πN_0)^L exp{ −(1/2N_0) Σ_{k=0}^{L−1} (|r_k|² + |a_k|²) } · exp{ (1/N_0) | Σ_{k=0}^{L−1} r_k a_k^* | cos[θ − φ(r, a)] }

and, thus, the ML-DA estimate results to be

θ̂_DA = φ(r, a) = arg[ Σ_{k=0}^{L−1} r_k a_k^* ] .

We clearly obtained an OL estimator. The estimate can be used to demodulate the whole packet, provided that the channel phase varies slowly enough to be considered constant during the packet duration. Let us investigate the performance of this estimator, considering the case of an M-PSK modulation with |a_k| = 1. Defining r = Σ_{k=0}^{L−1} r_k a_k^* = L e^{jθ} + Σ_{k=0}^{L−1} w_k a_k^*, the random variable w = Σ_{k=0}^{L−1} w_k a_k^* is Gaussian with mean zero and variance 2N_0 L. Since it is θ̂_DA = arg[r], let us try to compute the argument of the


Figure 3.6: Phasor diagram required to compute θr .

complex number r. By representing all quantities on the complex plane (see Fig. 3.6(a), where we defined θ_r = arg[r] and θ_w = arg[w], and Fig. 3.6(b), which represents the previous phasor diagram after a rotation by θ), we can write

θ̂_DA = θ + arctan[ |w| sin(θ_w − θ) / ( L + |w| cos(θ_w − θ) ) ] .

Taking into account that the random variable θ_w − θ is statistically equivalent to θ_w and assuming that the SNR is very high, we can approximate

θ̂_DA ≃ θ + ℑ[w]/L .

Under this high-SNR condition, we have

E{θ̂_DA} = θ
var{θ̂_DA} = E{ [θ̂_DA − θ]² } = N_0 / L .

The estimator is thus unbiased and, as shown in the following, it reaches the CRB (computed in the next example), at least under the high-SNR condition. For low values of the SNR, the estimator becomes biased and we can no longer compare it with the CRB, which holds for unbiased estimators. ♦
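The high-SNR analysis can be verified by simulation. The following Monte Carlo sketch (BPSK pilots assumed; names illustrative, not from the notes) checks that the estimate is unbiased with variance close to N_0/L, the bound derived in the next example:

import numpy as np

rng = np.random.default_rng(1)
L, N0, theta, trials = 64, 0.05, 0.3, 20000
a = rng.choice([-1.0, 1.0], (trials, L))                       # known BPSK pilots
w = np.sqrt(N0) * (rng.standard_normal((trials, L)) + 1j * rng.standard_normal((trials, L)))
r = a * np.exp(1j * theta) + w                                 # model (3.2)
th_hat = np.angle(np.sum(r * np.conj(a), axis=1))              # arg(sum_k r_k a_k^*)
print(th_hat.mean(), th_hat.var(), N0 / L)                     # mean ~ theta, var ~ N0/L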

Example 3.10 Let us compute the CRB for the case of the previous example, that is, for the DA phase estimate when L received samples are observed. In this case, v = a, whereas u is an empty set. The CRB and the MCRB thus coincide and we can employ (3.23), which now takes the form

CRB(θ) = MCRB(θ) = N_0 / Σ_k | ∂s_k(a, θ)/∂θ |²

where

s_k(a, θ) = a_k e^{jθ} .

It thus results

CRB(θ) = MCRB(θ) = N_0 / Σ_k |a_k|²

which, in the case of M-PSK signals, becomes

CRB(θ) = MCRB(θ) = N_0 / L .

Example 3.11 We can obtain a CL version of the previous estimator. This estimator, instead of waiting for the availability of L received samples before processing them to provide an estimate, updates the previously provided estimate as soon as a new received sample becomes available.
Let us alternatively consider a scenario in which several pilot fields are interleaved with data fields. A CL estimator can be turned on when the received samples corresponding to pilot fields arrive. It will provide an updated estimate that can be used in the next data field until the next pilot field. In this way, the possible variations of the channel phase can be tracked. In other words, this estimator is tailored for a channel whose phase varies in time as described by (3.3). Let us define

Λ(θ) = ln f(r_0^{L−1} | a_0^{L−1}, θ)
     = L ln(1/2πN_0) − (1/2N_0) Σ_{k=0}^{L−1} (|r_k|² + |a_k|²) + (1/N_0) ℜ[ e^{−jθ} Σ_{k=0}^{L−1} r_k a_k^* ]

whose derivative with respect to θ is

dΛ(θ)/dθ = (1/N_0) ℜ[ −j e^{−jθ} Σ_{k=0}^{L−1} r_k a_k^* ] = (1/N_0) ℑ[ e^{−jθ} Σ_{k=0}^{L−1} r_k a_k^* ] .

Considering the kth term of this derivative, we can recursively update the phase estimate (according to the gradient algorithm) as

θ̂_{k+1} = θ̂_k + α ℑ[ e^{−jθ̂_k} r_k a_k^* ]  (3.27)

in which α is a positive coefficient (the so-called step-size) that selects the amount of the adjustment in the estimate update. The larger the value of α, the larger the estimator's capability to track the phase variations of the

Figure 3.7: PLL block diagram.

channel. However, once the steady state is reached, a larger value of α will produce larger fluctuations around the equilibrium point, thus worsening the steady-state performance. The value of α has thus to be chosen as a compromise between these two conflicting needs, or reduced once the steady-state condition is reached.
The recursive equation (3.27) defines a 1st-order phase-locked loop (PLL), whose block diagram is shown in Fig. 3.7. In this figure, we denoted by e_k the error signal e_k = ℑ[ e^{−jθ̂_k} r_k a_k^* ], whereas LUT is a look-up table that provides e^{−jθ̂_k} from the estimate θ̂_k.


In order to investigate the performance of this CL estimator, we can
employ its equivalent model. Let us cut the feedback connection and compute,
when the loop in open, the quantity

S(φ) = E{ek |φk = φ}

where we defined φk = θk − θ̂k . In other words, we fix the difference between


the channel phase and the estimate and compute the mean value of the error
signal ek . This quantity will allow us to characterize the behavior of the
phase discriminator and is called S-curve since, as we will see, it looks like
a rotated “S”. It has an important role in the PLL equivalent model. In this
scenario, it results
n h i o
S(φ) = E{ek |φk = φ} = E ℑ e−θ̂k rk a∗k |θk − θ̂k = φ
n h i o
= E ℑ e−θ̂k (ak eθk + wk )a∗k |θk − θ̂k = φ = E{|ak |2 } sin φ .

The S-curve is shown in Fig. 3.8. This figure shows a stable equilibrium point
for φ = 0 and two unstable points for φ = ±π. When the PLL is in one of
these two unstable equilibrium points, it can get “trapped” in it for a long time

Figure 3.8: S-curve for a data-aided PLL.


νk
θk φk
S(φ)

θ̂k ek

αz −1
1−z −1

Figure 3.9: Equivalent model for a PLL.

until noise or a channel phase variation perturbs this condition; in this case the PLL can move toward a stable equilibrium point. This phenomenon is called hang-up and can significantly increase the acquisition time. The existence of the stable equilibrium point at φ = 0 means that the estimator is unbiased.
The PLL equivalent model is shown in Fig. 3.9. In this figure we denoted by ν_k the component of the error signal having mean zero. We can thus write

e_k = S(θ_k − θ̂_k) + ν_k .
In order to compute the mean square estimation error, we first have to compute the power spectral density of the random process ν_k. Let us consider an M-ary PSK transmission with |a_k| = 1. It is

ν_k = e_k − S(θ_k − θ̂_k) = ℑ[ w_k a_k^* e^{−jθ̂_k} ] .

Symbols a_k have unit modulus, so ν_k is statistically equivalent to the imaginary part of w_k. Hence, it is a discrete-time white process with samples having power N_0.


Figure 3.10: Linearized equivalent model for a PLL.

At the steady state, when the algorithm has reached a stable equilibrium point, we can use the PLL linearized equivalent model, obtained by substituting the memoryless nonlinear block S(φ) with its linear approximation around φ = 0. In other words, by defining

A = dS(φ)/dφ |_{φ=0} ,

block S(φ) can be replaced by a linear block with input-output characteristic Aφ (Fig. 3.10). For an M-ary PSK transmission with |a_k| = 1, we have A = 1.
With reference to Fig. 3.10, the PLL transfer function is defined as

H(z) = Φ(z)/V(z) |_{θ_k=0} = −F(z) / ( 1 + A F(z) )

where Φ(z) and V(z) are the Z-transforms of φ_k and ν_k, respectively, and

F(z) = α z^{−1} / (1 − z^{−1}) .

Thus, it is

H(z) = −α / ( z − (1 − Aα) ) .  (3.28)

The mean square estimation error can thus be computed taking into account that the power spectral density of φ_k is N_0 |H(e^{j2πfT})|². Hence

var[θ̂_k] = E{[θ_k − θ̂_k]²} = E{φ_k²} = N_0 T ∫_{−1/2T}^{1/2T} |H(e^{j2πfT})|² df
         = N_0 ∫_{−1/2}^{1/2} |H(e^{j2πfT})|² d(fT) = 2N_0 B_eq T |H(1)|²

having defined the PLL equivalent bandwidth as

B_eq T = (1/|H(1)|²) ∫_0^{1/2} |H(e^{j2πfT})|² d(fT) .  (3.29)

Considering that |H(1)|² = 1/A², we can easily compute the integral appearing in (3.29) using the Parseval equality as

∫_0^{1/2} |H(e^{j2πfT})|² d(fT) = (1/2) ∫_{−1/2}^{1/2} |H(e^{j2πfT})|² d(fT) = (1/2) Σ_n |h_n|²

where h_n is the inverse transform of (3.28), and thus

h_n = −α (1 − Aα)^{n−1} u_{n−1}

having denoted by u_n the discrete-time unit step function. Hence

B_eq T = (1/|H(1)|²) ∫_0^{1/2} |H(e^{j2πfT})|² d(fT) = αA / ( 2(2 − αA) ) .

The mean square estimation error, in this case, coincides with the CRB under the assumption that

L = A² / (2 B_eq T) .

In general, it can be demonstrated that we need to make this substitution in order to obtain the CRB for a CL estimator starting from the corresponding CRB for OL estimators [26]. ♦
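The closed-form equivalent bandwidth can be verified numerically from the frequency response of (3.28), as in the following sketch (illustrative values of A and α, not from the notes):

import numpy as np

A, alpha = 1.0, 0.1
fT = np.linspace(0, 0.5, 200001)
H = -alpha / (np.exp(2j * np.pi * fT) - (1 - A * alpha))   # H(e^{j 2 pi f T}) from (3.28)
BeqT = np.trapz(np.abs(H) ** 2, fT) / np.abs(H[0]) ** 2    # definition (3.29); H(1) is the value at f = 0
print(BeqT, alpha * A / (2 * (2 - alpha * A)))             # the two values agree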

3.4.3 DD estimator
After an initial training period, sufficient to obtain an estimate accurate enough to allow the detector to produce reliable decisions, we can track the parameter variations by employing the same CL DA estimator where the known symbols are substituted by the decisions provided by the detector. The resulting estimator is called DD. The ML-DD estimator can be mathematically described as

θ̂_DD = argmax_θ f(r|â, θ) .  (3.30)

Sometimes, as for example in the case of CL phase estimation, the training phase is not necessary and we can immediately employ a DD algorithm,

although accepting a longer acquisition time (a longer time to reach a stable equilibrium point).
Particular attention must be paid when the receiver employs the VA. In this case, in fact, this algorithm will provide its decisions with a delay of possibly a few symbols (often more than ten). When the parameter to be estimated changes rapidly, we risk updating the estimate by employing an error signal which is barely correlated with the actual parameter value. In other words, the decisions provided by the VA carry information on our parameter that can be very “old” with respect to its present value. A first technique to solve this problem is the use of preliminary decisions. According to this technique, the VA is “augmented” to provide, in addition to reliable decisions released with a large delay, also preliminary decisions released with a very small delay (a few symbols at most). This delay is properly optimized to maximize the performance. These preliminary decisions are employed for estimation purposes only and are taken on the survivor with the largest partial metric.
An alternative technique, which provides a better performance but also has a larger complexity, is per-survivor processing (PSP) [27]. In this case, multiple estimators are employed, one for each trellis state. Every estimator is employed to compute the metrics for the branches extending from the corresponding survivor and, for each of them, the error signal is computed by using the zero-delay decisions taken on that survivor. An intuitive explanation is the following. At every discrete-time instant we have no idea which survivor will be the winner at the end. By computing an estimate for each survivor, we will be sure, however, that the winning survivor will employ an estimate updated by using its own decisions, which will also be the final ones.

Example 3.12 Let us consider a CL-DD phase estimator. By employing preliminary decisions, the phase estimate is updated according to the following recursion

θ̂_{k+1} = θ̂_k + α ℑ[ e^{−jθ̂_k} r_{k−d} ĉ̂_{k−d}^* ] .  (3.31)

In (3.31), we denoted by ĉ̂_{k−d} the preliminary decisions with delay d < D, where D is the decision depth of the VA. The block diagram of the corresponding receiver is shown in Fig. 3.11.
When employing the PSP technique, we will have an estimator for each
trellis state σk . The estimate θ̂k+1 (σk+1 ) related to the survivor of state σk+1
will be updated according to this recursion


Figure 3.11: DD-PLL based on preliminary decisions.



Figure 3.12: DD PLL based on per-survivor processing.

θ̂_{k+1}(σ_{k+1}) = θ̂_k(σ_k) + α ℑ[ e^{−jθ̂_k(σ_k)} r_k c̆_k^*(σ_k) ]  (3.32)

where c̆_k(σ_k) is the decision taken on the survivor of state σ_k, having assumed that the survivor of state σ_{k+1} at time k + 1 has been obtained by extending the survivor of state σ_k at time k. The estimate θ̂_{k+1}(σ_{k+1}) will be employed to compensate for the phase rotation introduced by the channel into the received samples used to compute the metrics of the branches at the output of state σ_{k+1}. The receiver block diagram is shown in Fig. 3.12 for a VA operating over a trellis with only two states. ♦

Example 3.13 Let us consider again the CL-DD phase estimator, this time
for an uncoded transmission without ISI. A symbol-by-symbol detector can
be employed and we thus have no problems related to the decision delay. The

Figure 3.13: QPSK constellation.

phase estimate is updated according to the recursion

θ̂_{k+1} = θ̂_k + α ℑ[ e^{−jθ̂_k} r_k â_k^* ] .  (3.33)

Let us compute the S-curve. We have

S(φ) = E{e_k | φ_k = φ} = E{ ℑ[ e^{−jθ̂_k} r_k â_k^* ] }
     = E{ ℑ[ e^{−jθ̂_k} (a_k e^{jθ_k} + w_k) â_k^* ] } = E{ ℑ[ a_k â_k^* e^{jφ} ] } .  (3.34)

Assuming absence of noise, the decision â_k will depend, in a deterministic way, on the phase error φ only. As an example, in the case of a QPSK transmission, it is (see Fig. 3.13)

â_k = a_k  for −π/4 < φ < π/4
â_k = a_k e^{jπ/2} = j a_k  for π/4 < φ < 3π/4
â_k = a_k e^{jπ} = −a_k  for 3π/4 < φ < 5π/4
â_k = a_k e^{−jπ/2} = −j a_k  for −3π/4 < φ < −π/4

and, by plugging these into (3.34), we obtain the S-curve

S(φ) = sin(φ)  for −π/4 < φ < π/4
S(φ) = sin(φ − π/2)  for π/4 < φ < 3π/4
S(φ) = −sin(φ)  for 3π/4 < φ < 5π/4
S(φ) = sin(φ + π/2)  for −3π/4 < φ < −π/4

shown in Fig. 3.14. This figure shows that the S-curve is periodic with a


Figure 3.14: S-curve for a DD-PLL and QPSK modulation.

period which is related to the angle of rotational symmetry of the employed constellation. In this case, we have stable equilibrium points corresponding to the values 0, ±π/2, ±π. In fact, the PLL will try to introduce a rotation that places the received samples onto the original constellation points and, thus, it has no way to distinguish phase errors that are multiples of the angle of rotational symmetry of the employed constellation. Thus, in order to obtain correct decisions even in the presence of those errors, differential encoding must be introduced. ♦

3.4.4 NDA estimator

The NDA estimator will perform the estimate with no information about the data, assumed independent and equally likely. The ML-NDA estimator based on L received samples is thus

θ̂_NDA = argmax_θ f(r_0^{L−1} | θ)

where, assuming that we employ an M-ary alphabet A,

f(r_0^{L−1} | θ) = Σ_{a_0∈A} Σ_{a_1∈A} · · · Σ_{a_{L−1}∈A} (1/M)^L f(r_0^{L−1} | a_0^{L−1}, θ) .

Example 3.14 Let us consider the OL NDA phase estimator for M-PSK
modulation. It is based on the pdf

f(r_0^{L−1} | θ) = Σ_{a_0∈A} Σ_{a_1∈A} · · · Σ_{a_{L−1}∈A} (1/M)^L Π_{ℓ=0}^{L−1} f(r_ℓ | a_ℓ, θ)
               = Π_{ℓ=0}^{L−1} Σ_{a∈A} (1/M) f(r_ℓ | a, θ)
               = Π_{ℓ=0}^{L−1} f(r_ℓ | θ)  (3.35)

where

f(r_ℓ | θ) = Σ_{a_ℓ} (1/M) f(r_ℓ | a_ℓ, θ) = (1/M) Σ_{m=0}^{M−1} f(r_ℓ | a_ℓ = e^{j2πm/M}, θ)
          = (1/2πN_0) (1/M) Σ_{m=0}^{M−1} exp{ −(1/2N_0) | r_ℓ − e^{j2πm/M} e^{jθ} |² }
          = [ exp{ −(1/2N_0) (|r_ℓ|² + 1) } / 2πN_0 ] Σ_{m=0}^{M−1} (1/M) exp{ (1/N_0) ℜ[ r_ℓ e^{−j2πm/M} e^{−jθ} ] } .

We can thus write

θ̂_NDA = argmax_θ f(r_0^{L−1} | θ) = argmax_θ ln f(r_0^{L−1} | θ)
      = argmax_θ Σ_{ℓ=0}^{L−1} ln Σ_{m=0}^{M−1} (1/M) f(r_ℓ | a_ℓ = e^{j2πm/M}, θ)
      = argmax_θ Σ_{ℓ=0}^{L−1} ln Σ_{m=0}^{M−1} (1/M) exp{ (1/N_0) ℜ[ r_ℓ e^{−j2πm/M} e^{−jθ} ] }
      = argmax_θ Σ_{ℓ=0}^{L−1} T(r_ℓ, θ)

having defined

T(r_ℓ, θ) = ln Σ_{m=0}^{M−1} (1/M) exp{ (1/N_0) ℜ[ r_ℓ e^{−j2πm/M} e^{−jθ} ] }
         = ln Σ_{m=0}^{M−1} (1/M) exp{ (1/2N_0) ( r_ℓ e^{−j2πm/M} e^{−jθ} + r_ℓ^* e^{j2πm/M} e^{jθ} ) } .

By using the Taylor series for the exponential and Newton's binomial expansion, we obtain

T(r_ℓ, θ) = ln Σ_{m=0}^{M−1} (1/M) Σ_{p=0}^{∞} (1/p!) (1/2N_0)^p ( r_ℓ e^{−j2πm/M} e^{−jθ} + r_ℓ^* e^{j2πm/M} e^{jθ} )^p
         = ln Σ_{m=0}^{M−1} (1/M) Σ_{p=0}^{∞} (1/p!) (1/2N_0)^p Σ_{q=0}^{p} C(p, q) r_ℓ^q e^{−j2πmq/M} e^{−jθq} (r_ℓ^*)^{p−q} e^{j2πm(p−q)/M} e^{jθ(p−q)}
         = ln Σ_{p=0}^{∞} (1/p!) (1/2N_0)^p Σ_{q=0}^{p} C(p, q) r_ℓ^q (r_ℓ^*)^{p−q} e^{jθ(p−2q)} A(p − 2q)

where C(p, q) denotes the binomial coefficient and we defined

A(p − 2q) = (1/M) Σ_{m=0}^{M−1} e^{j2πm(p−2q)/M} = (1/M) ( 1 − e^{j2π(p−2q)} ) / ( 1 − e^{j2π(p−2q)/M} ) .

This term is always zero except when p − 2q is a multiple of M, in which case it takes on the value 1. Taking into account that q ≤ p, we can consider the following cases.

• p − 2q = 0. This value can be obtained by choosing (p, q) = (0, 0), (p, q) = (2, 1), (p, q) = (4, 2), and so on. The corresponding terms in the double summation appearing in T(r_ℓ, θ) do not depend on θ. We will assume that we are working at low signal-to-noise ratio (SNR). This is reasonable because any estimator is expected to work well at high SNR, and thus the goal is to obtain an estimator with a good performance at low SNR values. Under these conditions, terms corresponding to larger values of p are attenuated more, being multiplied by the smaller coefficient (1/p!)(1/2N_0)^p. Thus, we will keep only the term corresponding to the pair (p, q) = (0, 0), which takes on the value 1.

• p − 2q = M. This value can be obtained by choosing (p, q) = (M, 0), (p, q) = (M + 2, 1), (p, q) = (M + 4, 2), and so on. In this case too, we will keep only the term corresponding to the lowest value of p, i.e., that corresponding to the pair (p, q) = (M, 0).

• p − 2q = −M. This value can be obtained by choosing (p, q) = (M, M), (p, q) = (M + 2, M + 1), (p, q) = (M + 4, M + 2), and so on. In this case too, we will keep only the term corresponding to the lowest value of p, i.e., that corresponding to the pair (p, q) = (M, M).
For the same reason, we will not consider values of p − 2q corresponding to other integer multiples of M, which would give higher values of p. The resulting approximated expression of T(r_ℓ, θ) is thus

T(r_ℓ, θ) ≃ ln{ 1 + (1/M!) (1/2N_0)^M (r_ℓ^*)^M e^{jMθ} + (1/M!) (1/2N_0)^M r_ℓ^M e^{−jMθ} }
         = ln{ 1 + 2 (1/M!) (1/2N_0)^M ℜ[ r_ℓ^M e^{−jMθ} ] }

and, by introducing the further approximation ln(1 + x) ≃ x, we have

T(r_ℓ, θ) ≃ 2 (1/M!) (1/2N_0)^M ℜ[ r_ℓ^M e^{−jMθ} ] .

The NDA estimate results to be

θ̂_NDA = argmax_θ Σ_{ℓ=0}^{L−1} T(r_ℓ, θ) ≃ argmax_θ ℜ[ e^{−jMθ} Σ_{ℓ=0}^{L−1} r_ℓ^M ]
      = argmax_θ ℜ[ e^{−jMθ} | Σ_{ℓ=0}^{L−1} r_ℓ^M | exp{ j arg( Σ_{ℓ=0}^{L−1} r_ℓ^M ) } ]
      = argmax_θ | Σ_{ℓ=0}^{L−1} r_ℓ^M | cos[ arg( Σ_{ℓ=0}^{L−1} r_ℓ^M ) − Mθ ]
      = (1/M) arg( Σ_{ℓ=0}^{L−1} r_ℓ^M ) .

This estimator is called the Mth power estimator. By raising the received samples to the power M, we remove the modulation. It is however clear that, with this estimator, we can only estimate phase values in the range [−π/M, π/M). Differential encoding is a viable solution for this problem too.
By using computer simulations, it can be demonstrated that this estimator reaches the CRB for medium/high SNR values (although it has been

derived under the assumption of low SNR). Its performance at low SNR values can be improved by resorting to a generalization proposed by A. J. Viterbi and A. M. Viterbi [28]. This generalization can be described as follows. The Mth power estimator computes, for each received sample,

r_ℓ^M = |r_ℓ|^M e^{jM arg(r_ℓ)} .

In the Viterbi&Viterbi algorithm, the phase estimate is computed as

θ̂_NDA = (1/M) arg( Σ_{ℓ=0}^{L−1} F(|r_ℓ|) e^{jM arg(r_ℓ)} )

where F(|r_ℓ|) is a proper function of |r_ℓ|. Obviously, when choosing F(|r_ℓ|) = |r_ℓ|^M we obtain again the Mth power estimator. A better performance can be obtained with F(|r_ℓ|) = |r_ℓ|² or F(|r_ℓ|) = 1, depending on the employed modulation and the operating SNR. ♦
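A compact sketch of the Viterbi&Viterbi estimator follows (an illustration, not from the notes; F(x) = x² is chosen as an example, and with F(x) = x^M it reduces to the Mth power estimator; the estimate is only defined modulo 2π/M):

import numpy as np

def vv_phase_estimate(r, M, F=lambda x: x ** 2):
    # (1/M) arg( sum_l F(|r_l|) exp(j M arg(r_l)) )
    return np.angle(np.sum(F(np.abs(r)) * np.exp(1j * M * np.angle(r)))) / M

rng = np.random.default_rng(3)
M, L, N0, theta = 4, 256, 0.2, 0.1          # theta inside [-pi/M, pi/M)
a = np.exp(2j * np.pi * rng.integers(0, M, L) / M)
r = a * np.exp(1j * theta) + np.sqrt(N0) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
print(vv_phase_estimate(r, M), theta)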

3.4.5 SDD estimator

In modern communication systems based on the use of powerful codes to be decoded iteratively, several detection/decoding iterations are typically performed. After each iteration, the decoder provides to the detection/synchronization algorithm a new estimate of the a-priori probabilities of the coded symbols. We will denote by {P̂(c_k)}_{c_k∈C}, where C is the constellation of the coded symbols, the estimates of the a-priori probabilities of symbol c_k available after a generic iteration (the so-called soft decisions). By employing these soft decisions, we can update the estimate of the unknown parameter by employing an NDA algorithm in which the symbols are considered independent⁴ but not equally likely, and distributed according to the probabilities {P̂(c_k)}_{c_k∈C}. The resulting SDD algorithm is thus mathematically defined as

θ̂_SDD = argmax_θ f̂(r_0^{L−1} | θ)

having defined

f̂(r_0^{L−1} | θ) = Σ_{c_0∈C} Σ_{c_1∈C} · · · Σ_{c_{L−1}∈C} P̂(c_0) P̂(c_1) · · · P̂(c_{L−1}) f(r_0^{L−1} | c_0^{L−1}, θ)

⁴ The validity of this assumption is usually guaranteed by the presence of an interleaver placed between the detector and the decoder. We will see that this interleaver is required to allow iterative detection and decoding.

that represents the estimate of the pdf of the received samples given the parameter value only, computed from the soft decisions.
We may notice that, before the first iteration, when the soft decisions are not available, or when they are not yet reliable, the SDD estimator is an NDA estimator. As the iterations go on and the soft decisions become more reliable, the SDD estimator tends to a DA estimator.

Example 3.15 For the case of phase estimation, through manipulations similar to those leading to (3.35), we obtain

f̂(r_0^{L−1} | θ) = Π_{ℓ=0}^{L−1} f̂(r_ℓ | θ)

where

f̂(r_ℓ | θ) = Σ_{c_ℓ∈C} P̂(c_ℓ) f(r_ℓ | c_ℓ, θ)  (3.36)

with

f(r_ℓ | c_ℓ, θ) = (1/2πN_0) exp{ −(1/2N_0) | r_ℓ − c_ℓ e^{jθ} |² } .
The SDD phase estimate becomes

θ̂_SDD = argmax_θ f̂(r_0^{L−1} | θ) = argmax_θ ln f̂(r_0^{L−1} | θ) = argmax_θ Σ_{ℓ=0}^{L−1} ln f̂(r_ℓ | θ)
      = argmax_θ Σ_{ℓ=0}^{L−1} ln Σ_{c_ℓ∈C} P̂(c_ℓ) exp{ −(1/2N_0) | r_ℓ − c_ℓ e^{jθ} |² } .

An exact closed-form expression for θ̂_SDD is not available. A commonly employed approximation is the following. Pdf (3.36) is a linear combination of Gaussian pdfs (a Gaussian mixture). It can be approximated with a Gaussian pdf with the same mean and the same variance.⁵ Being

E{r_ℓ | θ} = e^{jθ} E{c_ℓ} = e^{jθ} γ_ℓ
var{r_ℓ | θ} = E{|r_ℓ|² | θ} − |γ_ℓ|² = E{|c_ℓ|²} + 2N_0 − |γ_ℓ|² = β_ℓ + 2N_0 − |γ_ℓ|²
⁵ This approximation corresponds to the use of the Gaussian pdf in the sense of the minimum Kullback-Leibler distance; see Appendix C.

having defined

γ_ℓ = E{c_ℓ} = Σ_{c_ℓ∈C} P̂(c_ℓ) c_ℓ
β_ℓ = E{|c_ℓ|²} = Σ_{c_ℓ∈C} P̂(c_ℓ) |c_ℓ|²

we thus use the approximation [29]

f̂(r_ℓ | θ) = [ 1 / π(β_ℓ + 2N_0 − |γ_ℓ|²) ] exp{ − | r_ℓ − γ_ℓ e^{jθ} |² / (β_ℓ + 2N_0 − |γ_ℓ|²) }
           = [ 1 / π(β_ℓ + 2N_0 − |γ_ℓ|²) ] exp{ − (|r_ℓ|² + |γ_ℓ|²) / (β_ℓ + 2N_0 − |γ_ℓ|²) } · exp{ 2 ℜ[ r_ℓ γ_ℓ^* e^{−jθ} ] / (β_ℓ + 2N_0 − |γ_ℓ|²) }  (3.37)

from which we obtain

θ̂_SDD = argmax_θ Σ_{ℓ=0}^{L−1} ln f̂(r_ℓ | θ) = arg[ Σ_{ℓ=0}^{L−1} r_ℓ γ_ℓ^* / (β_ℓ + 2N_0 − |γ_ℓ|²) ] .

In the case of CL estimation, we have

θ̂_{ℓ+1} = θ̂_ℓ + α (d ln f̂(r_ℓ|θ)/dθ)|_{θ=θ̂_ℓ} = θ̂_ℓ + α (1/f̂(r_ℓ|θ)) (df̂(r_ℓ|θ)/dθ)|_{θ=θ̂_ℓ}
        = θ̂_ℓ + α ( Σ_{c_ℓ∈C} P̂(c_ℓ) ℑ[ r_ℓ e^{−jθ̂_ℓ} c_ℓ^* ] exp{ (1/N_0) ℜ[ r_ℓ e^{−jθ̂_ℓ} c_ℓ^* ] } ) / ( Σ_{c_ℓ∈C} P̂(c_ℓ) exp{ (1/N_0) ℜ[ r_ℓ e^{−jθ̂_ℓ} c_ℓ^* ] } )

or, by using approximation (3.37),

θ̂_{ℓ+1} = θ̂_ℓ + α ℑ[ r_ℓ e^{−jθ̂_ℓ} γ_ℓ^* / (β_ℓ + 2N_0 − |γ_ℓ|²) ]

which has a simpler implementation. ♦
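The simplified CL-SDD update lends itself to a compact implementation. The following sketch (a hypothetical helper, not from the notes; soft_probs denotes the pmf {P̂(c_ℓ)} over the constellation points in const) computes γ_ℓ, β_ℓ and the update:

import numpy as np

def sdd_update(th_hat, r_l, const, soft_probs, N0, alpha):
    gamma = np.sum(soft_probs * const)               # gamma_l = E{c_l}
    beta = np.sum(soft_probs * np.abs(const) ** 2)   # beta_l  = E{|c_l|^2}
    denom = beta + 2 * N0 - np.abs(gamma) ** 2
    return th_hat + alpha * np.imag(r_l * np.exp(-1j * th_hat) * np.conj(gamma) / denom)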

Example 3.16 We have so far considered estimators based on the ML criterion only. However, when statistical information is available, we can also use a Bayesian estimator. Still with reference to SDD phase estimation, let us consider the case of a channel phase modeled according to the Wiener model:

θ_ℓ = θ_{ℓ−1} + Δ_ℓ

where {Δ_ℓ} is a discrete-time white Gaussian process with mean zero and variance σ_Δ². It is thus⁶

f(θ_ℓ | θ_{ℓ−1}) = (1/√(2πσ_Δ²)) exp{ −(θ_ℓ − θ_{ℓ−1})² / 2σ_Δ² }

and, in addition,

f(θ_0^{L−1}) = f(θ_0) Π_{ℓ=1}^{L−1} f(θ_ℓ | θ_{ℓ−1})

with f(θ_0) = 1/2π. We can thus resort to MAP estimation:
θ̂_SDD^{(MAP)} = argmax_{θ_0^{L−1}} f̂(r_0^{L−1} | θ_0^{L−1}) f(θ_0^{L−1})
             = argmax_{θ_0^{L−1}} [ ln f̂(r_0^{L−1} | θ_0^{L−1}) + ln f(θ_0^{L−1}) ]
             = argmax_{θ_0^{L−1}} [ Σ_{ℓ=0}^{L−1} ln f̂(r_ℓ | θ_ℓ) + Σ_{ℓ=1}^{L−1} ln f(θ_ℓ | θ_{ℓ−1}) + ln f(θ_0) ] .

Let us define

Γ(θ_0^{L−1}) = Σ_{ℓ=0}^{L−1} γ(r_ℓ, θ_ℓ) + Σ_{ℓ=1}^{L−1} ln f(θ_ℓ | θ_{ℓ−1})

where⁷

γ(r_ℓ, θ_ℓ) = ln f̂(r_ℓ | θ_ℓ) .

The MAP estimate can be obtained through the equation

∇Γ(θ_0^{L−1}) |_{θ=θ̂} = 0 .

It is

∂Γ/∂θ_ℓ |_{θ=θ̂} = ∂γ(r_ℓ, θ_ℓ)/∂θ_ℓ |_{θ_ℓ=θ̂_ℓ} − (1/σ_Δ²) θ̂_ℓ + (1/σ_Δ²) θ̂_{ℓ−1} − (1/σ_Δ²) θ̂_ℓ + (1/σ_Δ²) θ̂_{ℓ+1} = 0  (3.38)
⁶ Notice that, since the channel phase is defined modulo 2π, the pdf f(θ_{k+1} | θ_k) can be approximated as Gaussian only when σ_Δ ≪ 2π.
⁷ As far as γ(r_ℓ, θ_ℓ) is concerned, we can use either the exact expression or the approximation (3.37).

and thus we have to find a way to solve this nonlinear system of equations. To simplify the notation, let us denote

δ(r_ℓ, θ̂_ℓ) = ∂γ(r_ℓ, θ_ℓ)/∂θ_ℓ |_{θ_ℓ=θ̂_ℓ} .

An approximate way to solve equation (3.38) is to resort to the following forward and backward recursions

θ̂_ℓ^{(f)} = θ̂_{ℓ−1}^{(f)} + (σ_Δ²/2) δ(r_ℓ, θ̂_{ℓ−1}^{(f)})
θ̂_ℓ^{(b)} = θ̂_{ℓ+1}^{(b)} + (σ_Δ²/2) δ(r_ℓ, θ̂_{ℓ+1}^{(b)})

obtaining, at the end, the following approximate MAP estimate

θ̂_ℓ = ( θ̂_ℓ^{(f)} + θ̂_ℓ^{(b)} ) / 2 .  ♦

3.5 A general technique to obtain a sufficient statistic

At the end of this chapter, we would like to introduce a possible general technique to obtain a sufficient statistic in the case of transmissions over a time-varying channel [30]. We will assume that the useful signal, after propagation over the channel, which can modify and widen its spectrum, has a complex envelope of limited bandwidth B and is also affected by additive white Gaussian noise.
We will first filter the complex envelope of the received signal r(t) with a filter having bandwidth B_v ≥ B and frequency response

H_BL(f) = 1 for |f| ≤ B_v ,  H_BL(f) = 0 for |f| > B_v ,

thus obtaining the signal r_BL(t). It is composed of a useful component, which is clearly unmodified, since the filter acts on the noise only, and of a noise with limited bandwidth B_v. This signal is clearly a sufficient statistic for detection since the operated filtering is a reversible transformation. In fact, this transformation can be reverted by simply adding to r_BL(t) a proper
detection since the operated filtering is a reversible transformation. In fact,
this transformation can be reverted by simply adding to rBL (t) a proper

Figure 3.15: Squared magnitude of the front end filter H(f ).

Gaussian process with constant power spectral density for |f | > Bv and
independent of the original noise affecting r(t). Signal rBL (t), having limited
bandwidth Bv , can be now sampled at the Nyquist frequency fc = 1/Tc =
2Bv and the resulting samples {rBL (kTc )} represent a discrete-time sufficient
statistic.8 Notice that the discrete-time additive noise affecting {rBL (kTc )}
is white.
This solution assumes the use of an ideal front end filter HBL (f ). Actually,
we can obtain another sufficient statistic which is statistically equivalent to
the previous one, by simply employing a filter whose frequency response H(f )
is flat over the signal bandwidth and whose square magnitude |H(f )|2 has
edges with vestigial symmetry around frequency 1/2Tc , as shown in Fig. 3.15.
In this way, in fact, the effect of the filter on the useful signal is clearly the
same as that of HBL (f ). As far as the noise is concerned, since
2
1
|H(f )| + H(f − ) = const.
2
Tc

the noise after sampling with frequency fc = 1/Tc has certainly a power
spectral density which is constant as in the previous case. A possible filter
H(f ) satisfying the above mentioned conditions is a filter with a root raised
cosine frequency response with roll-off δ, bandwidth 1+δ
2Tc
and such that 1−δ
2Tc
>
B.
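As a numerical sanity check of the vestigial symmetry condition above, the following sketch (with hypothetical parameter values) builds |H(f)|² as a raised cosine in frequency—the squared magnitude of a root raised cosine filter—and verifies that |H(f)|² + |H(f − 1/T_c)|² is constant:

```python
import numpy as np

def rrc_mag2(f, Tc, delta):
    """|H(f)|^2 of a root raised cosine: a raised cosine in frequency."""
    f = np.abs(f)
    f1, f2 = (1 - delta) / (2 * Tc), (1 + delta) / (2 * Tc)
    H2 = np.zeros_like(f)
    H2[f <= f1] = 1.0
    edge = (f > f1) & (f <= f2)
    H2[edge] = 0.5 * (1 + np.cos(np.pi * Tc / delta * (f[edge] - f1)))
    return H2

Tc, delta = 1.0, 0.3                      # illustrative values
f = np.linspace(0, 1 / (2 * Tc), 1000)
s = rrc_mag2(f, Tc, delta) + rrc_mag2(f - 1 / Tc, Tc, delta)
print(np.allclose(s, 1.0))               # vestigial symmetry -> True
```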

^8 In general, we will have more than one sample per symbol interval.

3.6 Exercises
Exercise 3.1. Let us consider a coded BPSK signal over an AWGN channel
that also introduces an unknown phase modeled as a discrete random variable
that takes the values {0, π} with the same probability. Compute the metric
of the optimal MAP sequence detector. Is the use of differential encoding
necessary?

Exercise 3.2. Let us consider a coded PSK signal over an AWGN channel
that also introduces a positive attenuation, unknown at the receiver and
modeled as a random variable with exponential distribution. Describe in
detail the structure of the optimal MAP sequence detector. In particular,
state whether a UMP test exists.

Exercise 3.3. Let us consider a transmission over a flat fading channel.


Let the samples at the output of the front end filter be

rk = fk ck + wk

where the symbols c_k ∈ {±1, ±j}, possibly coded, thus belong to a QPSK constellation, the fading process f_k has autocorrelation R_f(m) = α^{|m|}, and the w_k are complex noise samples with variance σ² per component. Compute the branch metrics of a receiver based on linear prediction with prediction order C = 2, specifying the expression of the prediction coefficients and of the prediction error.

Exercise 3.4. Let us consider an uncoded M-PSK transmission. Samples


at the matched filter (MF) output have the expression
\[
x_k = a_k e^{j\theta} + w_k .
\]

• Design a closed-loop data-aided phase synchronizer based on a 1st-order


PLL and compute the S-curve in the absence of noise.

• Assuming that the samples at the MF output have the expression
\[
x_k = a_k e^{j(2\pi \nu k T + \theta)} + w_k
\]
i.e., that an uncompensated frequency offset is also present, prove that, still in the absence of noise, the first-order PLL designed at the previous step reaches a stable equilibrium point with a non-zero phase error, that is, in the steady state θ̂_k = 2πνkT + θ − ε, where ε is the phase error.

Exercise 3.5. Let us consider a BPSK transmission and the M-th power
estimator. Design a closed-loop implementation for this estimator. In par-
ticular:

• compute the error signal and draw the PLL structure;

• compute and plot the S-curve.


Chapter 4

Codes in the signal space

4.1 Continuous phase modulations


Having introduced the general model (1.6) for modulated signals with mem-
ory, we are now able to describe an important class of phase modulations
characterized by a continuous phase and a constant envelope, known as con-
tinuous phase modulations (CPMs) [31]. Due to their constant envelope,
they do not require the final stages of radio-frequency power amplifiers to be linear, thus allowing the use of low-cost amplifiers in a strongly saturated regime. Because the phase is continuous and, as we will see, constrained to
have highly structured variations, these modulation formats have spectral
characteristics that make them appealing for applications which require, for
the above mentioned reasons, a signal with a constant envelope. In par-
ticular, some of the modulations in this class have, at the same time, high
spectral and energy efficiencies [32, 33]. However, the continuity and the
structure of phase variations introduce a memory in the modulated signal.
The complex envelope of a CPM signal is
r ( " #)
2Es X
s̃(t; a) = exp  2πh ai q(t − iT ) + θ (4.1)
T i

where Es is the energy per symbol, T is the signaling interval, h is a constant


value called modulation index, ak are the information symbols belonging to
the alphabet {±1, . . . , ±(M − 1)}, q(t) is the so-called phase-smoothing re-
sponse, also improperly known as phase pulse, which is continuous and such
that
q(t) = 0 for t < 0
1
q(t) = for t > LT
2

115
116 Codes in the signal space

Figure 4.1: Examples of frequency pulse and phase-smoothing response.

and θ is the initial phase. The integer parameter L is known as the correlation length of the CPM signal. The phase-smoothing response can be expressed as the integral of a frequency pulse g(t) defined as
\[
g(t) = \frac{dq(t)}{dt}
\]
and having support in the interval [0, LT]. The frequency pulse satisfies the following normalization condition
\[
\int_0^{LT} g(t)\, dt = \frac{1}{2} .
\]
Hence, it results that
\[
q(t) = \int_{-\infty}^{t} g(\tau)\, d\tau .
\]
Fig. 4.1 reports an example of a frequency pulse, with the corresponding phase-smoothing response.
A CPM signal can be equivalently expressed as follows
\[
\tilde{s}(t;\mathbf{a}) = \sqrt{\frac{2E_s}{T}} \exp\left\{ j \left[ 2\pi h \int_{-\infty}^{t} \sum_i a_i\, g(\tau - iT)\, d\tau + \theta \right] \right\}
\]
where it is possible to recognize the instantaneous frequency
\[
h \sum_i a_i\, g(t - iT) .
\]

Hence, a CPM signal can be interpreted as a frequency modulation of a linearly modulated signal with shaping pulse g(t). These observations allow us to design a CPM modulator as reported in Fig. 4.2. The frequency modulator (FM) shown in the figure responds to a modulating signal x(t) with the signal
\[
y(t) = \sqrt{\frac{2E_s}{T}} \cos\left( 2\pi f_0 t + 2\pi h \int_{-\infty}^{t} x(\tau)\, d\tau + \theta \right) .
\]
Hence, it results
\[
s(t;\mathbf{a}) = \sqrt{\frac{2E_s}{T}} \cos\left[ 2\pi f_0 t + 2\pi h \int_{-\infty}^{t} \sum_i a_i\, g(\tau - iT)\, d\tau + \theta \right] . \tag{4.2}
\]

Let us now consider the slice of signal in the generic interval [kT, (k+1)T]. Remembering the general model (1.6) for modulated signals, we have
\[
\tilde{s}(t;\mathbf{a}) = \tilde{s}(t - kT; a_k, \sigma_k) = \sqrt{\frac{2E_s}{T}} \exp\left\{ j \left[ 2\pi h \sum_i a_i\, q(t - iT) + \theta \right] \right\} . \tag{4.3}
\]
The limits of the summation in (4.3) have not been specified because they can extend from −∞ to ∞, if we imagine a signal of infinite duration, or from 0 to K − 1, if we consider the transmission of a finite number of symbols. In any case, in (4.3) the summation can be stopped at the k-th term, because all following terms do not contribute to that slice. In fact, in the interval [kT, (k+1)T] we have that q[t − (k+1)T] = 0, q[t − (k+2)T] = 0, etc., as is also clear from Fig. 4.3. If we remember that the frequency pulse has support in the interval [0, LT], and hence that the phase-smoothing response is constant for t > LT, the signal phase can be expressed by means of the sum of three contributions (besides the initial phase θ). By referring, for example, to a transmission of finite duration, the phase of the complex envelope can be expressed as

Figure 4.2: CPM modulator.

Figure 4.3: A few terms in the summation appearing in (4.3).

\[
\phi(t; a_k, \sigma_k) = 2\pi h \sum_{i=0}^{k-L} a_i\, \frac{1}{2} + 2\pi h \sum_{i=k-L+1}^{k-1} a_i\, q(t - iT) + 2\pi h\, a_k\, q(t - kT) + \theta, \qquad kT \le t < (k+1)T .
\]
The first term depends on “old” symbols, whose pulse q(t) has already reached the final value 1/2. This term is called the phase state
\[
\varphi_k = \pi h \sum_{i=0}^{k-L} a_i \mod 2\pi \tag{4.4}
\]
in which, in general, we have
\[
\theta \bmod 2\pi = \theta - 2\pi \left\lfloor \frac{\theta}{2\pi} \right\rfloor
\]
where ⌊x⌋ denotes the greatest integer less than or equal to x. The second term depends on the more recent L − 1 symbols a_{k−L+1}, . . . , a_{k−1}. This set of symbols defines the so-called correlative state and, together with the phase state, contributes to the definition of the modulator state at time instant kT, i.e.,
\[
\sigma_k = ( \underbrace{a_{k-1}, a_{k-2}, \ldots, a_{k-L+1}}_{\text{correlative state}} ;\; \underbrace{\varphi_k}_{\text{phase state}} ) .
\]

Given the present symbol a_k and the state σ_k, we can thus compute the signal phase φ(t; a_k, σ_k), and then the signal itself as
\[
\tilde{s}(t - kT; a_k, \sigma_k) = \sqrt{\frac{2E_s}{T}} \exp\left\{ j\, \phi(t; a_k, \sigma_k) \right\}, \qquad kT \le t < (k+1)T .
\]
At time instant t = (k+1)T, that is, at the end of the present signaling interval, the successive modulator state becomes
\[
\sigma_{k+1} = (a_k, a_{k-1}, \ldots, a_{k-L+2}; \varphi_{k+1})
\]
where the new correlative state is obtained by a simple shift, and the new phase state is given by
\[
\varphi_{k+1} = (\varphi_k + \pi h\, a_{k-L+1}) \bmod 2\pi .
\]
It takes into account the contribution of the pulse q(t) that has reached the final value 1/2 at the end of the interval [kT, (k+1)T].
To evaluate the number of states of the modulator, let us first observe that the number of correlative states is M^{L−1}. The number of phase states can, in principle, be very large, or even infinite if we start the sum in (4.4) from −∞. However, if we consider that the phase state must necessarily belong to [0, 2π) and we remember that the symbols belong to the alphabet {±1, ±3, . . . , ±(M − 1)}, it is possible to verify that the number of phase states is finite if the modulation index is rational, i.e.,
\[
h = \frac{n}{p}
\]
where n and p are relatively prime integers. If n is even, the possible values of the phase state are
\[
\varphi_k \in \left\{ 0,\; \pi \frac{n}{p},\; \pi \frac{2n}{p},\; \ldots,\; \pi \frac{(p-1)n}{p} \right\} \qquad n \text{ even}
\]
and the number of different phase states is p. If n is odd, the possible values of the phase state are
\[
\varphi_k \in \left\{ 0,\; \pi \frac{n}{p},\; \pi \frac{2n}{p},\; \ldots,\; \pi \frac{(2p-1)n}{p} \right\} \qquad n \text{ odd}
\]
and the number of different phase states is 2p. However, the phase state can only take p of them at even instants, and the remaining p at odd instants. Hence, the overall number of states of a CPM modulator is in any case pM^{L−1}.

Remark 4.1. A CPM signal can then be represented by using the model for signals with memory described in Chapter 1. In other words, a CPM signal can be expressed as the cascade of an encoder (actually a system with memory, described by the state transition table) and a memoryless modulator (described by the waveform table), concentrating the memory source in the encoder [34]. Hence, it is not necessary to investigate MAP sequence detection for CPMs, since it can be implemented as described in Chapter 2. The optimal receiver employs a bank of M^{L−1} matched filters and operates on a trellis with pM^{L−1} states, as discussed in Exercise 4.1. ♦

Remark 4.2. It is often convenient to adopt an integer representation of the phase state and of the information symbols, which allows us to work with a new phase state that always belongs to the same alphabet. In fact, if we define
\begin{align}
a_k &= 2\bar{a}_k - (M-1) \tag{4.5} \\
\varphi_k &= -\pi h (M-1) k + 2\pi h \bar{\varphi}_k \tag{4.6}
\end{align}
we have that ā_k ∈ {0, 1, . . . , M − 1}, φ̄_k ∈ {0, 1, . . . , p − 1}, and moreover the integer φ̄_k can be recursively updated using the expression
\[
\bar{\varphi}_{k+1} = \left[ \bar{\varphi}_k + \bar{a}_{k-L+1} \right] \bmod p . \tag{4.7}
\]
With this new notation, it is possible to express the phase φ(t; a_k, σ_k) for kT ≤ t < (k+1)T as
\[
\phi(t; a_k, \sigma_k) = 2\pi h \bar{\varphi}_k - \pi h (M-1) k + 2\pi h \sum_{i=k-L+1}^{k-1} \left[ 2\bar{a}_i - (M-1) \right] q(t - iT) + 2\pi h \left[ 2\bar{a}_k - (M-1) \right] q(t - kT) + \theta,
\]
thus simplifying the notation, because the new phase state φ̄_k takes on values belonging to the same alphabet, independently of the instant k (even or odd). ♦
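A minimal sketch of the recursion (4.7) may help fix ideas; it assumes, as an illustrative convention, that the "past" symbols ā_k with k < 0 are zero, and the function name is hypothetical.

```python
import numpy as np

def phase_state_evolution(a_bar, p, L):
    """Integer phase states phi_bar[k], updated as in (4.7)."""
    K = len(a_bar)
    phi_bar = np.zeros(K + 1, dtype=int)
    for k in range(K):
        idx = k - L + 1                 # symbol entering the phase state
        inc = a_bar[idx] if idx >= 0 else 0   # assumption: zero past symbols
        phi_bar[k + 1] = (phi_bar[k] + inc) % p
    return phi_bar

# Example: binary 1REC with h = n/p = 1/2, i.e., p = 2, L = 1 (MSK-like)
a = np.random.randint(0, 2, size=10)    # a_bar in {0, ..., M-1}
print(phase_state_evolution(a, p=2, L=1))
```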
CPMs can be classified in two categories:

• full response, when L = 1;

• partial response, when L > 1.

An important case is that of full response CPMs with a linear phase-smoothing response, i.e., with a rectangular frequency pulse, as reported in Fig. 4.4. In this case, the phase can be expressed as
\[
\phi(t; a_k, \sigma_k) = \pi h \sum_{i=-\infty}^{k-1} a_i + 2\pi h\, a_k\, q(t - kT) + \theta = \pi h \sum_{i=-\infty}^{k-1} a_i + 2\pi h\, a_k\, \frac{t - kT}{2T} + \theta, \qquad kT < t < (k+1)T .
\]

The modulated signal, in this interval, results to be
\begin{align*}
s(t;\mathbf{a}) &= \Re\left\{ \tilde{s}(t;\mathbf{a})\, e^{j 2\pi f_0 t} \right\} \\
&= \sqrt{\frac{2E_s}{T}} \cos\left[ 2\pi f_0 t + \pi h \sum_{i=-\infty}^{k-1} a_i + 2\pi h\, a_k\, \frac{t - kT}{2T} + \theta \right] \\
&= \sqrt{\frac{2E_s}{T}} \cos\left[ 2\pi \left( f_0 + \frac{h}{2T}\, a_k \right) t - \pi h k a_k + \pi h \sum_{i=-\infty}^{k-1} a_i + \theta \right] .
\end{align*}
We can then conclude that this is a frequency shift keying (FSK) modulation. Since the phase is continuous, these full response CPMs with a linear phase-smoothing response are also called continuous phase FSK (CPFSK). In the special case of a binary modulation with h = 1/2, since the information symbols belong to the alphabet {±1}, the possible frequency values are f_0 ± 1/4T. The difference between these frequency values is 1/2T, that is, the minimum ensuring the orthogonality of the signals over a signaling interval (for coherent demodulation). For this reason, a binary CPFSK with h = 1/2 is also called

Figure 4.4: Frequency pulse and phase-smoothing response for CPFSK modulations.

Figure 4.5: Phase tree for an MSK modulation.

minimum shift keying (MSK). In this case, the possible phase states are
\[
\varphi_k \in \left\{ 0, \frac{\pi}{2}, \pi, \frac{3\pi}{2} \right\} .
\]
Notice that in full response CPMs the correlative state disappears and the modulator state coincides with the phase state.
A way to represent the phase evolution in a CPM is through the so-called phase trees. Let us assume that the initial phase is θ = 0, that the phase state is φ_0 = 0, and consider, for the sake of simplicity, the MSK modulation. In the interval 0 < t < T, the phase has expression
\[
\phi(t; a_0, \varphi_0) = \pi a_0\, \frac{t}{2T}, \qquad a_0 = \pm 1 .
\]
In the next interval, T < t < 2T, it has expression
\[
\phi(t; a_1, \varphi_1) = \varphi_1 + \pi a_1\, \frac{t - T}{2T}, \qquad a_1 = \pm 1
\]
where φ_1 = a_0 π/2. If we proceed in this way, we can build the phase tree of an MSK, as reported in Fig. 4.5. Notice that at time instants kT, with k even, the possible phase state values are 0 and π, while when k is odd the possible phase state values are π/2 and 3π/2. Moreover, we can always write
\[
\varphi_k = (\varphi_{k-1} + \pi h\, a_{k-L}) \bmod 2\pi .
\]
We can observe that, because the phase-smoothing response is linear, in a signaling interval the slope of the phase is the same as the frequency of the signal, which is constant during that interval.
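The nodes of the phase tree of Fig. 4.5 can be enumerated with a few lines of code. The sketch below assumes φ_0 = 0 and θ = 0, and prints the (unwrapped) phases reached at the instants kT, in degrees:

```python
import numpy as np
from itertools import product

h, K = 0.5, 4                              # MSK: h = 1/2, four intervals
nodes = set()
for a in product([+1, -1], repeat=K):      # all information sequences
    phi = 0.0
    for k, ak in enumerate(a):
        phi += np.pi * h * ak              # phase accumulated at (k+1)T
        nodes.add((k + 1, int(round(np.degrees(phi)))))
for k in range(1, K + 1):
    print(k, sorted(deg for kk, deg in nodes if kk == k))
# k odd: odd multiples of 90 degrees; k even: multiples of 180 degrees
```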
Let us now consider an example of a phase tree for a partial response CPM. We assume M = 2, h = 1/2, L = 2, and a linear phase-smoothing response, as in Fig. 4.6. The phase states {φ_k} again belong to the alphabet {0, π/2, π, 3π/2}, while the modulator states are defined as
\[
\sigma_k = (a_{k-1}, \varphi_k) .
\]
In this case, the phase during the interval kT < t < (k+1)T has expression
\[
\phi(t; a_k, \sigma_k) = \frac{\pi}{2} \sum_{i=-\infty}^{k-2} a_i + \pi a_{k-1}\, \frac{t - (k-1)T}{4T} + \pi a_k\, \frac{t - kT}{4T} + \theta .
\]
If we assume φ_0 = 0, a_{−1} = 1, and θ = −π/4, we can express the behavior of the phase in the generic interval kT < t < (k+1)T as
\[
\phi(t; a_k, \sigma_k) = \varphi_k + \frac{\pi}{2}\, a_{k-1}\, \frac{t - (k-1)T}{2T} + \frac{\pi}{2}\, a_k\, \frac{t - kT}{2T} + \theta
\]
where
\[
\varphi_k = \left( \varphi_{k-1} + a_{k-2}\, \frac{\pi}{2} \right) \bmod 2\pi .
\]
We can then build the tree diagram in successive steps.

• k = 0: it is clearly ϕ0 = 0 and the diagram starts as in Fig. 4.7.

Figure 4.6: Phase-smoothing response for the second considered case.

Figure 4.7: Phase tree in [0, T ].

• k = 1: we obtain ϕ1 = π/2, hence the diagram evolves as in Fig. 4.8.


• k = 2: we have ϕ2 = π/2 + a0 π/2 and the diagram evolves as shown in
Fig. 4.9.
We can now draw the tree diagram at a generic step, which results as in
Fig. 4.10. By proceeding similarly, we can obtain the complete tree diagram
reported in Fig. 4.11.
The interpretation, previously introduced for the MSK modulation, for
which the slope of branches corresponds to the signal frequency, still holds.
However, in this case there are 3 possible frequency values although, because
of the constraint introduced by the correlation length of g(t), at each node
of the tree only 2 of the 3 frequency values are possible, depending on the
previous evolution of the phase.
Finally, notice that, in general, phase trajectories are in a one-to-one
correspondence with information sequences, given an initial phase state.
CPMs with a rectangular frequency pulse of length LT are denoted as L-REC (RECtangular). The signal phase is clearly continuous, but the phase derivative has discontinuities at the points kT. Besides rectangular frequency pulses, in general with length LT, a class of pulses that makes the phase trajectories “softer”, because the phase derivative is also continuous, is that of raised cosine pulses. Fig. 4.12 shows one of these frequency pulses with length LT. These CPMs are denoted as L-RC (Raised Cosine).

Figure 4.8: Phase tree in [0, 2T ].



Figure 4.9: Phase tree in [0, 3T ].

Figure 4.10: Phase tree in the generic interval.


Figure 4.11: Complete phase tree.

As a further example of CPM, we can mention the Gaussian MSK (GMSK)


format, adopted by the GSM standard. The modulation format of the 2nd
generation European mobile wireless systems is a partial response CPM ob-
tained by filtering the MSK rectangular pulse with a filter whose frequency
response is Gaussian. Hence, for a GMSK the correlation length is theoreti-
cally infinite, although, in practice, we may consider it as finite.
Everything presented so far can be generalized to the case of the so called
multi-h CPMs, i.e., formats whose modulation index h is changed periodically
in time, or to the case of information symbols belonging to an odd M-ary
alphabet, i.e., {0, ±1, ±2, . . . }.

Remark 4.3. A CPM signal can be decomposed as the sum of a finite number of linearly modulated signals. This decomposition, originally proposed by Laurent [35] for binary CPMs, has then been extended to M-ary CPMs by Mengali and Morelli [36]. According to this decomposition, it is possible to express the complex envelope (4.1) of a CPM signal as
\[
\tilde{s}(t;\mathbf{a}) = \sum_{m=0}^{F-1} \sum_k \alpha_{m,k}\, p_m(t - kT) \tag{4.8}
\]

Figure 4.12: L-RC frequency pulse, g(t) = \frac{1}{2LT} \left[ 1 - \cos\left( \frac{2\pi t}{LT} \right) \right] for 0 ≤ t ≤ LT.

where F = (M − 1)\, 2^{(L−1)\log_2 M} is the number of linearly modulated signals

that compose the CPM signal, {p_m(t)} are the corresponding shaping pulses, and {α_{m,k}} are the corresponding symbols (also known as pseudo-symbols). The expressions of the pulses {p_m(t)} can be found in closed form from the expression of the phase response q(t) of the CPM and from the value of the modulation index. Similarly, the symbols {α_{m,k}} can be expressed as a function of the information symbols {a_k} and the modulation index. For instance, the symbol α_{0,k} can be expressed as
\[
\alpha_{0,k} = \exp\left\{ j \pi h \sum_{i=0}^{k} a_i \right\} = \alpha_{0,k-1} \exp\left\{ j \pi h a_k \right\} .
\]
Notice, in particular, that the argument of the complex exponential is identical to the definition of the phase state of a CPM. This fact tells us that α_{0,k} can only take p values. Besides this, the recursive definition of α_{0,k} tells us that CPM signals have a sort of intrinsic differential encoding.
An important property of this decomposition is that a great amount of the signal power is concentrated in the first M − 1 linearly modulated signals, the so-called principal components. In other words, the approximation
\[
\tilde{s}(t;\mathbf{a}) \simeq \sum_{m=0}^{M-2} \sum_k \alpha_{m,k}\, p_m(t - kT) \tag{4.9}
\]
is generally characterized by an excellent accuracy. At the receiver, it is then possible to adopt this approximation to design simplified receivers. Before we briefly describe these receivers, we recall an important property of the symbols {α_{m,k}}_{m=0}^{M−2} of the principal components. It can be proved that these symbols can all be expressed as a function of a_k and α_{0,k−1} only. If we replace the approximation (4.9) in (2.6) and take into account the fact that the envelope of CPM signals is constant, we obtain, after steps similar to those derived in

Section 2.3, that the branch metric of the MAP sequence detection receiver can be expressed as
\[
\lambda(a_k, \alpha_{0,k-1}) = \Re\left[ \sum_{m=0}^{M-2} x_{m,k}\, \alpha_{m,k}^* \right] + N_0 \ln P(a_k) \tag{4.10}
\]
where we defined
\[
x_{m,k} = \left. r(t) \otimes p_m(-t) \right|_{t = kT}
\]
i.e., x_{m,k} is the output of a filter matched to the m-th component, and we have expressed the branch metric as a function of a_k and α_{0,k−1} only, having taken advantage of the previously mentioned property of the principal pseudo-symbols. A simplified receiver is then made of a bank of M − 1 matched filters and a Viterbi detector that operates on a trellis with p states, with branch metrics (4.10), thus significantly reducing the complexity. ♦
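As an illustration for the binary case (M = 2), where only the principal component m = 0 survives in (4.9), the following sketch implements the pseudo-symbol recursion together with the branch metric (4.10); the matched filter output x_{0,k} is taken as an input, and the conjugation convention and function name are assumptions of this sketch.

```python
import numpy as np

def laurent_branch_metric(x0_k, a_k, alpha0_prev, h, N0, P_a=0.5):
    """Branch metric (4.10) restricted to the principal component (M = 2)."""
    alpha0_k = alpha0_prev * np.exp(1j * np.pi * h * a_k)   # pseudo-symbol update
    metric = np.real(x0_k * np.conj(alpha0_k)) + N0 * np.log(P_a)
    return metric, alpha0_k
```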

4.2 Trellis coded modulations


In this section, we will consider again coded linear modulations according to the scheme in Fig. 4.13. In particular, we will refer to codes for error correction, which allow the error probability to be reduced for a given transmitted power and level of noise. These codes operate by introducing a certain degree of redundancy in the transmitted signal that can be exploited by a proper decoder. Usually, they introduce redundancy by associating the information sequence with a coded sequence composed of a larger number of symbols. If the code associates k information bits with n coded bits (with n > k), it is said to have a code rate k/n. Examples of such codes are represented by “traditional” block and convolutional codes (see Appendix D). Block codes associate a codeword of n bits with each block of k information bits. Convolutional codes operate in a continuous fashion on the information sequence, by generating a code sequence that can be interpreted as the result of a discrete binary convolution (i.e., in modulo 2 arithmetic, or in general in a finite field of order q) between the information sequence and proper weights, called code generators. As for block codes, n coded bits are generated every k information bits, but this time the n code bits at the encoder output not only
Figure 4.13: Block diagram for a coded linear modulation.

depend on the k input bits at the same instant but also on previous information bits. For a given receiver complexity, for example a given number of encoder states for a convolutional encoder, we can look for good codes, i.e., for codes having the largest possible minimum Hamming distance, which characterizes the dominant errors and thus the asymptotic performance.
Let us now assume that the source generates the information bits at a given rate. If we encode the information by using an encoder with rate k/n, due to the redundancy insertion we need to increase the signaling frequency of our transmission system by a factor n/k. This means that the bandwidth of the transmitted signal will be expanded by a factor n/k, i.e., by the inverse of the code rate. In other words, the redundancy is introduced in the frequency domain. As a reward, the code will provide an energy gain in the sense that, for a given error probability, we can transmit a lower power. Error-correcting coding is thus a valuable tool for transmission systems in which power is a limited resource, since it allows power to be saved at the expense of an increase in bandwidth and in complexity (at both transmitter and receiver).
Let us now consider a transmission system for which the bandwidth is
a limited resource and cannot be increased. The bandwidth increase can
be avoided by enlarging the signal set, that is, by employing a higher-order
constellation, to compensate for the redundancy introduced by the code.
This possibility is discussed in the following example.

Example 4.1. Let us consider an uncoded QPSK transmission. QPSK symbols are transmitted with a signaling interval T and each carries 2 information bits. In the absence of ISI, a bit error rate of 10^{−6} is obtained for E_b/N_0 = 10.5 dB. If we want to improve the system performance, we may encode the information bits through a rate-2/3 binary code. Each QPSK symbol will now carry 2 coded bits, and thus every 3 QPSK symbols (6 coded bits) we will transmit 4 information bits. Every QPSK symbol will thus carry 4/3 information bits and, hence, to match the information rate of the source, the symbol interval should be reduced to 2T/3, thus expanding the bandwidth by a factor 3/2. This bandwidth increase can be avoided by adopting an 8-PSK constellation instead of the original QPSK. In this way, every 8-PSK symbol will carry 3 code bits and thus 2 information bits, as in the uncoded QPSK case.
We would like to compare the first uncoded QPSK system with the second coded 8-PSK system. Fig. 4.14 shows these two systems and the relevant constellations. In the figure, “S/P CONV” is a serial-to-parallel converter, whereas “MAP” denotes the constellation mapping. In the absence of encoding, the power loss of the 8-PSK with respect to the QPSK is about 3.5 dB, for the same error probability. We could conclude that the coded system can guarantee a gain only if a very powerful code, with a gain much larger than the intrinsic penalty of 3.5 dB, is employed. Since codes with a gain of at least 3.5 dB are very complex, it seems that the price to be paid is a very high decoding complexity. ♦
This point of view implicitly assumes that modulation and coding are designed separately. We will see that a change of perspective is required. In fact, coding and modulation must be designed jointly while the receiver, instead of performing demodulation and decoding in two separate steps, must combine the two operations into one. In this way, the parameter governing the system performance will no longer be the minimum Hamming distance but, at least on AWGN channels in the absence of ISI, the minimum Euclidean distance between the transmitted sequences. The idea behind coded modulations is thus that coding must have the goal of maximizing the minimum Euclidean distance d_{min} between any possible pair of sequences.
Trellis-coded modulation (TCM) is indeed a technique based on the combination of coding and modulation to increase the efficiency in bandlimited environments. It was originally described in the seminal work of G. Ungerboeck and I. Csajka [37] and clearly formalized by G. Ungerboeck in 1982 [38] with reference to the AWGN channel. Before going into the details of this technique, we have to introduce the concept of set partitioning. It is a way to partition the original constellation that we employ on the channel into subsets whose elements have increasing Euclidean distance. The following two examples show how this partitioning can be implemented.

Example 4.2 Let us consider an 8-PSK constellation. The constellation symbols all lie on the circle of radius √E_S. With iterative partitioning we can obtain subsets whose elements are characterized by an increasing Euclidean distance, as reported in Fig. 4.15. ♦

Figure 4.14: Systems to be compared: an uncoded system using QPSK (minimum distance d = √(2E_S)), and a rate-2/3 coded system based on the 8-PSK constellation (minimum distance d = 2√E_S sin(π/8)).

Figure 4.15: Set partitioning for the 8-PSK constellation (intra-subset minimum distances d_0 = 2√E_S sin(π/8) for the full set A, d_1 = √(2E_S) for the subsets B_0, B_1, and d_2 = 2√E_S for the subsets C_0, . . . , C_3, with final subsets D_0, . . . , D_7).

Example 4.3 Let us consider a 16-QAM constellation. Proceeding as in the previous example, we obtain the partitioning reported in Fig. 4.16. In this case, every partitioning step increases the minimum distance of the points of every subset by a factor √2. ♦
It is possible to immediately identify the constellation points belonging to the different subsets by adopting a proper mapping rule, the so-called mapping by set partitioning. As an example, let us consider Fig. 4.15. Constellation points {D_0, D_2, D_4, D_6} belonging to the subset B_0 are associated with triplets of bits whose least significant bit (LSB) is zero, whereas points {D_1, D_3, D_5, D_7} belonging to subset B_1 are associated with triplets of bits whose LSB is one. The LSB thus allows one to identify which first-level subset (B_0 or B_1) the point belongs to. Similarly, the two LSBs of the triplet allow one to identify which second-level subset (C_0, C_1, C_2, or C_3) the point belongs to. This mapping is useful when implementing the encoder and the decoder.

Figure 4.16: Set partitioning for the 16-QAM constellation.
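The growth of the intra-subset minimum distance under set partitioning is easy to verify numerically. The following sketch performs the 8-PSK partitioning of Fig. 4.15, splitting by the successive bits of the point index, and prints the minimum intra-subset distance at each level (E_S = 1 is assumed):

```python
import numpy as np
from itertools import combinations

pts = np.exp(2j * np.pi * np.arange(8) / 8)        # 8-PSK points, E_S = 1

def min_dist(subset):
    return min(abs(p - q) for p, q in combinations(subset, 2))

level = [np.arange(8)]                             # start from the full set A
for depth in range(3):
    print(depth, min(min_dist(pts[s]) for s in level))
    # split each subset according to bit 'depth' of the point index
    level = [s[((s >> depth) & 1) == par] for s in level for par in (0, 1)]
# prints d0 = 2 sin(pi/8) ~ 0.765, d1 = sqrt(2) ~ 1.414, d2 = 2
```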

We will now discuss how set partitioning can be used in the design of a coded modulation. In the most general case, the block diagram of the encoder/modulator is shown in Fig. 4.17. Among the k information bits at the encoder input, k_1 are coded, by using an encoder of rate k_1/n, obtaining n code bits, whereas the remaining k_2 = k − k_1 bits are left uncoded. Hence, the overall encoder has rate k/(k_2 + n). The group of n bits at the output of the binary encoder is used to select one of 2^n possible subsets in a proper way, as discussed in the following, whereas the group of k_2 uncoded bits is used to select one of the 2^{k_2} points of a subset. In fact, the information related to the subset needs a higher protection, whereas the information related to the point within the subset is intrinsically more protected, since the points within a subset are at the largest possible distance, according to the set partitioning principle. It is also clear that it is not necessary to carry the set partitioning down to subsets containing one point only. In fact, it is sufficient to stop at a partitioning level such that we have 2^n subsets of 2^{k_2} points each.

Figure 4.17: Conceptual structure of a coded modulation for AWGN channels.

TCMs are based on this idea. To be more precise, in TCMs the code rate is k/(k + 1) (or, equivalently, the employed binary code has rate k_1/(k_1 + 1)) and the binary encoder in the block diagram of Fig. 4.17 can be a convolutional encoder (and thus linear) or a non-linear encoder defined as a finite-state machine with a time-invariant trellis.^1 The number of states of the overall encoder is given by the number of states of the binary encoder, since it is the only source of memory.
The encoder is completely defined once we specify the output and the
state transition functions. The state transition function, that specifies the
next state as a function of the current state and the input symbol, can be
specified by simply providing the encoder trellis. The output function, that
specifies the code symbol given the current state and the input symbol, can be
specified by labeling each trellis branch with the corresponding code symbol.
In TCM encoding, the labeling of the trellis branches with the constellation
points is made by trying to satisfy the following empirical rules provided by
Ungerboeck:

(a) all signal points should occur with the same frequency;

(b) parallel transitions, when present, must be assigned to signal points


belonging to the same subset, in such a way that they are separated by
the maximum Euclidean distance;

(c) transitions originating from or merging into the same state are assigned
to subsets that belong to a same subset of lower order.
^1 It is possible to show that block codes can also be described by using a trellis, although it is time-varying.

Figure 4.18: (a) Trellis of the binary encoder. (b) Trellis of the overall encoder.

Rule (a) guarantees that the trellis code has a regular structure. Rule (b)
represents a formalization of the concept previously expressed that the n
bits at the output of the binary encoder must select the subset whereas the
uncoded bits select the point within the subset. Finally, rule (c) is important
to improve the overall performance of the code by a proper assignment of
the subsets to the coded bits. In order to understand this, let us consider
the following design.

Example 4.4 Let us assume that we want to design a 4-state TCM encoder to be employed with the 8-PSK constellation. Since we need 3 bits to select the points of a constellation of cardinality 8, it follows that k_2 + n = 3. Since, as said, in TCM the code rate is k/(k + 1), we will have k = 2, and thus there are 2 information bits at the input of the scheme in Fig. 4.17. Let us denote these bits by a_m^{(1)} and a_m^{(2)}. With reference to Fig. 4.17, let us suppose that we choose k_1 = k_2 = 1 (a different choice is considered in Exercise 4.2). Thus, it is n = 2. The binary encoder must select one of 2^n = 4 subsets, i.e., one of the second-level subsets C_0, C_1, C_2, and C_3, whereas the uncoded bit will select the point within these subsets.
The state µ_m of the binary encoder is defined as µ_m = (a_{m−1}^{(1)}, a_{m−2}^{(1)}). The corresponding trellis is shown in Fig. 4.18(a). The trellis of the overall encoder, shown in Fig. 4.18(b), is built by taking into account the further input bit a_m^{(2)}, which originates parallel transitions. In fact, given the present state, the next state will be determined by a_m^{(1)} only, independently of the value of a_m^{(2)}. Rule (b) states that parallel transitions are associated with points of the same second-level subset, and thus, in the end, the binary encoder selects the second-level subset.

Figure 4.19: An error event for the designed TCM code.

Rule (c) is important to maximize the Euclidean distance for error events different from those related to parallel transitions. Let us consider Fig. 4.18(a). This figure shows a possible association of second-level subsets to the trellis branches that satisfies the three Ungerboeck rules. The corresponding association of the constellation points to the trellis branches of the overall encoder is shown in Fig. 4.18(b). Let us now find the minimum distance between pairs of code sequences that we can obtain with this coded scheme. We will compare it with that of the uncoded QPSK. The comparison is fair because both schemes carry 2 bits per signaling interval and thus have, for the same signaling frequency, the same bandwidth.
The uncoded QPSK scheme employs, equivalently, the points of subset B_0 or B_1, whose distance is d_1 = √(2E_S). Let us now consider the coded scheme with the 8-PSK constellation. The distance between symbols both belonging to the same second-level subset, thus between symbols associated with parallel transitions, is d_2 = 2√E_S. However, we must also consider the distance between code sequences related to longer error events. Let us consider, for example, the pair of paths in the trellis related to the symbols (D_0, D_0, D_0) and (D_2, D_1, D_2), which correspond, in terms of subsets, to the sequences (C_0, C_0, C_0) and (C_2, C_1, C_2), as reported in Fig. 4.19. The squared distance between the signals corresponding to those paths is given by (2.39), reported here for convenience:
\[
d^2(\mathbf{e}, \mathbf{a}) = \sum_m |c_m - \hat{c}_m|^2 .
\]
We thus have
\[
d^2(D_0, D_2) + d^2(D_0, D_1) + d^2(D_0, D_2) = 2 d_1^2 + d_0^2 = \left[ 2 \cdot 2 + 0.765^2 \right] E_S \simeq 4.585\, E_S
\]

which turns out to be larger than the squared distance associated with parallel transitions, which is 4E_S. It is easy to understand that this result is due to the fulfillment of rule (c). It is also easy to verify that no other error event can have a lower distance. We can thus conclude that the minimum distance is 2√E_S. By comparing this distance with that related to an uncoded QPSK, which is √(2E_S), we can notice that their ratio is √2. Hence, the coded modulation allows a gain of 3 dB with respect to an uncoded QPSK, and this is obtained with a simple 4-state code.
An intuitive interpretation of this result is the following. The code operates at the subset level by introducing a correlation in the code sequence in such a way that points of different subsets cannot be confused unless further errors in adjacent instants occur. The uncoded bits are instead used to select points within a subset, which are intrinsically at the maximum distance and thus more protected. In this case, we can say that the encoder does a very good job, since the minimum distance is related to parallel transitions and thus those are the most frequent errors.
The minimum distance is able to predict the asymptotic coding gain of
these schemes. Fig. 4.20 reports the bit error ratio (BER) of the designed 4-
state code obtained through a computer simulation. The performance of the
uncoded system is also shown for comparison along with the performance of
a more complex 8-state TCM still employing the 8-PSK constellation. It can
be noticed that this latter coding scheme provides a further gain, although
very limited (3.6 dB of asymptotic gain with respect to the uncoded QPSK
system). The described 4-state TCM scheme is optimal in the sense that there exist no other 4-state codes having a larger minimum distance. The 8-
state code has no parallel transitions since, otherwise, the minimum distance
would remain that associated with them and thus no gain would be obtained
with respect to the 4-state code.
Detection and decoding of these TCM schemes must be performed jointly through a search on a trellis diagram by using the Viterbi algorithm. Denoting by µ^{(i)} a generic encoder state, the branch metrics to be used to decode the designed 4-state TCM code using the 8-PSK constellation are
\[
\lambda_\ell(a^{(i)}, \mu^{(i)}) = \min_{c \in C(a^{(i)}, \mu^{(i)})} \left| x_\ell - c \right|^2 \tag{4.11}
\]
where C(a^{(i)}, µ^{(i)}) denotes the second-level subset associated with the transition corresponding to the pair (a^{(i)}, µ^{(i)}). The minimum operation immediately identifies the symbol in C(a^{(i)}, µ^{(i)}) and thus the most significant bit (MSB). This symbol will then be selected in case the winning survivor contains that branch. Hence, the decision on this bit is equivalent to that of an uncoded system. ♦
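A minimal sketch of the branch metric (4.11) for this 4-state scheme: each second-level subset C_i contains two antipodal 8-PSK points, and the inner minimization also identifies which of the two points, i.e., the uncoded bit, would be decided (E_S = 1 and the function names are illustrative assumptions):

```python
import numpy as np

pts = np.exp(2j * np.pi * np.arange(8) / 8)           # 8-PSK points D_0..D_7
subsets = {i: pts[[i, i + 4]] for i in range(4)}      # C_0..C_3, antipodal pairs

def branch_metric(x, subset_idx):
    """Eq. (4.11): squared distance to the closest point of subset C_i."""
    d2 = np.abs(x - subsets[subset_idx]) ** 2
    return d2.min(), d2.argmin()   # metric, and index of the surviving point
```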
Figure 4.20: BER performance for an uncoded QPSK and two TCM schemes (4-state and 8-state TCM-8PSK), as a function of E_b/N_0.

An alternative method for the design of TCMs was developed by Calderbank and Sloane [39] and Forney [40]. An extension of TCMs, which we will use in Chapter 9, is represented by multidimensional TCMs and multiple TCMs, where each trellis branch is associated with more than one symbol, which are then transmitted sequentially on the channel. For a comprehensive treatment of TCMs, the reader can refer to [41].

4.3 Exercises

Exercise 4.1 In the interval [kT, (k+1)T] the complex envelope of an M-ary CPM is
\[
\tilde{s}(t) = \sqrt{\frac{2E_s}{T}}\, e^{j[\theta_k(t) + \varphi_k]}
\]
where
\[
\theta_k(t) = 2\pi h \sum_{i=k-L+1}^{k} a_i\, q(t - iT)
\]
is called the phase branch and depends on the correlative state
\[
\omega_k = (a_{k-1}, a_{k-2}, \ldots, a_{k-L+1})
\]
whereas
\[
\varphi_k = \pi h \sum_{i=0}^{k-L} a_i \mod 2\pi
\]
is the phase state, h = n/p is the modulation index, and L is the correlation length. The number of correlative states and of phase states is M^{L−1} and p, respectively. Let us assume that the information symbols are equally likely.

• Prove that the detection strategy can be expressed as
\[
\hat{\mathbf{a}} = \operatorname*{argmax}_{\mathbf{a}} \sum_{k=0}^{K-1} z_k(a_k, \sigma_k)
\]
(i.e., that the terms E(a_k, σ_k) are irrelevant).

• Denoting by θ^{(m,l)}(t) and φ^{(l)} the phase branches and phase states associated with the trellis branch (a^{(m)}, σ^{(l)}), m = 1, . . . , M and l = 1, 2, . . . , pM^{L−1}, show that
\[
z_k(a^{(m)}, \sigma^{(l)}) = \int_{kT}^{(k+1)T} \Big\{ \cos\varphi^{(l)} \left[ r_c(t) \cos\theta^{(m,l)}(t) + r_s(t) \sin\theta^{(m,l)}(t) \right] + \sin\varphi^{(l)} \left[ r_s(t) \cos\theta^{(m,l)}(t) - r_c(t) \sin\theta^{(m,l)}(t) \right] \Big\}\, dt .
\]

• Plot the receiver block diagram.


4.3 – Exercises 139

(c(3) , c(2) , c(1) )


(2) (3)
ak ck 011
100 010
(1)
ak T T 101 001
(2)
ck 110
(1)
000
ck 111
k1 = k2 = 1 n=2

Figure 4.21: Implementation of the designed TCM encoder.

Exercise 4.2 We would like to design a 4-state TCM encoder with rate 2/3 to be employed with the 8-PSK constellation. Denoting by a_k^{(1)} and a_k^{(2)} the bits at the encoder input,

A. design a code based on a trellis whose state is defined as µ_k = (a_{k−1}^{(1)}, a_{k−2}^{(1)});

B. show that the trellis has parallel transitions and compute the minimum Euclidean distance of this code;

C. verify that the encoder can be implemented as shown in Fig. 4.21, i.e., by employing a convolutional encoder and a mapper that associates the triplets of coded bits (c_k^{(3)}, c_k^{(2)}, c_k^{(1)}) with points of the 8-PSK constellation, as shown in the figure (mapping by set partitioning);

D. design a code based on a trellis whose state is defined as µ_k = (a_{k−1}^{(1)}, a_{k−1}^{(2)});

E. show that the encoder trellis has no parallel transitions and compute the minimum Euclidean distance of this code.

Exercise 4.3 We would like to design a 4-state TCM encoder with rate 3/4 to be employed with the 16-QAM constellation. Denoting by a_k^{(1)}, a_k^{(2)}, and a_k^{(3)} the bits at the encoder input, design the code and compute its minimum Euclidean distance.
Chapter 5

MAP symbol detection strategy

5.1 Minimization of the symbol error probability

In Chapter 2, we described the MAP sequence detection strategy and we said that it is commonly used since its complexity is lower than that of the MAP symbol detection strategy. A possible implementation of this latter strategy, which minimizes the symbol error probability, has been known since the paper by Chang and Hancock [42], although more efficient implementations have been proposed more recently [43, 44]. However, their use has become popular only after the invention of turbo codes and iterative decoding. This is not due to the need, in the decoding of turbo codes, of minimizing the symbol error probability, but to the fact that MAP symbol detection algorithms also provide, as a byproduct, an estimate of the reliability of the decisions, which is fundamental in iterative decoding.

5.2 BCJR algorithm

We will now describe the most famous implementation of the MAP symbol detection strategy, known as the BCJR algorithm from the initials of the authors who proposed it (Bahl, Cocke, Jelinek, and Raviv [43]). We remember that the MAP symbol detection strategy can be expressed as
\[
\hat{a}_k = \operatorname*{argmax}_{a_k} P(a_k|\mathbf{r})
\]
where r is the vector representation of the received signal. We will describe it with reference to the case of coded linear modulations in the presence of ISI, and we will consider the discrete-time equivalent model with white noise

Figure 5.1: Discrete-time equivalent model with white noise of the channel.

of the channel, reported in Fig. 5.1.^1 In fact, since the samples y_k, which can be expressed as (see Chapter 2)
\[
y_k = \sum_{l=0}^{L} f_l\, c_{k-l} + w_k, \qquad k = 0, 1, \ldots, K-1 \tag{5.1}
\]

represent a sufficient statistic to compute the a-posteriori probability P(a|r) of a sequence a, they are also a sufficient statistic for the computation of the a-posteriori probabilities (APPs) {P(a_k|r)}, for any k, since these can be obtained from P(a|r) through a marginalization.
We will thus describe how to compute the APPs P(a_k|y) = P(a_k|r), having defined y = (y_0, y_1, . . . , y_{K−1}). To simplify the notation, we will also define y_{k_1}^{k_2} = (y_{k_1}, y_{k_1+1}, . . . , y_{k_2}). The MAP symbol detection algorithm will operate according to the following strategy:
\[
\hat{a}_k = \operatorname*{argmax}_{a_k} P(a_k|\mathbf{y}) = \operatorname*{argmax}_{a_k} \frac{f(\mathbf{y}|a_k)\, P(a_k)}{f(\mathbf{y})} = \operatorname*{argmax}_{a_k} f(\mathbf{y}|a_k)\, P(a_k) . \tag{5.2}
\]
Thus, we have to compute the probabilities P(a_k|y) or, equivalently, the probability density functions (pdfs) f(y|a_k).
The brute-force computation of P(a_k|y) through the marginalization
\[
P(a_k|\mathbf{y}) = \sum_{a_0 \in \mathcal{A}} \cdots \sum_{a_{k-1} \in \mathcal{A}} \; \sum_{a_{k+1} \in \mathcal{A}} \cdots \sum_{a_{K-1} \in \mathcal{A}} P(\mathbf{a}|\mathbf{y})
\]
clearly has a complexity which is exponential in the number of transmitted symbols K. Proceeding in a smarter way, we can significantly reduce the complexity. Before going into the details, let us introduce some definitions. As usual, we will denote by
\[
\sigma_k = (a_{k-1}, \ldots, a_{k-L}, \mu_{k-L})
\]

^1 The extension to the Ungerboeck model is reported in [45].

the system state. The pair (a_k, σ_k) allows us to univocally identify the symbols (c_k, c_{k−1}, . . . , c_{k−L}). We can thus compute the probability density function
\[
\gamma_k(a_k, \sigma_k) = f(y_k|a_k, \sigma_k) = \frac{1}{2\pi N_0} \exp\left\{ -\frac{1}{2N_0} \left| y_k - \sum_{l=0}^{L} f_l\, c_{k-l} \right|^2 \right\} .
\]

The code symbols (c_k, c_{k−1}, . . . , c_{k−L}) can also be univocally identified by the pair represented by the state σ_{k+1} and the information symbol a_{k−J} at some previous instant k − J, where J depends on the employed encoder, if any. As an example, in the absence of coding we have a_{k−J} = a_{k−L}, being
\[
(a_k, \sigma_k) = ( \underbrace{a_k, a_{k-1}, \ldots, a_{k-L+1}}_{\sigma_{k+1}},\; \underbrace{a_{k-L}}_{a_{k-J}} ) .
\]
Hence, the pair (a_k, σ_k) is in a one-to-one correspondence with the pair represented by σ_{k+1} and a past symbol. We will denote this pair by (a_{k−J}^-, σ_{k+1}^+). Similarly, given the pair (a_{k−J}, σ_{k+1}), the corresponding pair will be denoted by (a_k^+, σ_k^-).
k , σk ).
We are now able to compute the probability density function f(y_0^{K−1}|a_k) as
\begin{align*}
f(\mathbf{y}|a_k) &= f(y_0^{k-1}, y_k, y_{k+1}^{K-1} | a_k) \\
&= \sum_{\sigma_k} f(y_0^{k-1}, y_k, y_{k+1}^{K-1} | a_k, \sigma_k)\, P(\sigma_k|a_k) \\
&= \sum_{\sigma_k} f(y_{k+1}^{K-1} | y_0^{k-1}, y_k, a_k, \sigma_k)\, f(y_0^{k-1} | y_k, a_k, \sigma_k)\, f(y_k | a_k, \sigma_k)\, P(\sigma_k) \\
&= \sum_{\sigma_k} f(y_{k+1}^{K-1} | y_0^{k-1}, y_k, a_{k-J}^-, \sigma_{k+1}^+)\, f(y_0^{k-1} | y_k, a_k, \sigma_k)\, f(y_k | a_k, \sigma_k)\, P(\sigma_k) \\
&= \sum_{\sigma_k} f(y_{k+1}^{K-1} | \sigma_{k+1}^+)\, f(y_0^{k-1} | \sigma_k)\, f(y_k | a_k, \sigma_k)\, P(\sigma_k)
\end{align*}
having exploited the fact that σ_k and a_k are independent. By defining

\begin{align}
\alpha_k(\sigma_k) &= f(y_0^{k-1}|\sigma_k)\, P(\sigma_k) \tag{5.3} \\
\beta_{k+1}(\sigma_{k+1}) &= f(y_{k+1}^{K-1}|\sigma_{k+1}) \tag{5.4}
\end{align}
we can write
\[
f(\mathbf{y}|a_k) = \sum_{\sigma_k} \alpha_k(\sigma_k)\, \gamma_k(a_k, \sigma_k)\, \beta_{k+1}(\sigma_{k+1}^+) . \tag{5.5}
\]

Remark 5.1. Quantities α_k(σ_k), β_{k+1}(σ_{k+1}), and γ_k(a_k, σ_k) can be arbitrarily normalized. In fact, multiplying them by arbitrary constants, independent of the information symbols, does not modify the final decisions. ♦
Quantities α_k(σ_k) and β_{k+1}(σ_{k+1}) can be computed, for each system state, through a forward and a backward recursion, respectively. As far as the computation of α_k(σ_k) is concerned, we have
\begin{align*}
\alpha_{k+1}(\sigma_{k+1}) &= f(y_0^{k-1}, y_k|\sigma_{k+1})\, P(\sigma_{k+1}) \\
&= \sum_{a_{k-J}} f(y_0^{k-1}, y_k|a_{k-J}, \sigma_{k+1})\, P(a_{k-J}, \sigma_{k+1}) \\
&= \sum_{a_{k-J}} f(y_0^{k-1}, y_k|a_k^+, \sigma_k^-)\, P(a_k^+, \sigma_k^-) \\
&= \sum_{a_{k-J}} f(y_0^{k-1}, y_k|a_k^+, \sigma_k^-)\, P(\sigma_k^-|a_k^+)\, P(a_k^+) \\
&= \sum_{a_{k-J}} f(y_0^{k-1}|y_k, a_k^+, \sigma_k^-)\, P(\sigma_k^-)\, f(y_k|a_k^+, \sigma_k^-)\, P(a_k^+) \\
&= \sum_{a_{k-J}} \alpha_k(\sigma_k^-)\, \gamma_k(a_k^+, \sigma_k^-)\, P(a_k^+) \tag{5.6}
\end{align*}
having exploited the identities
\begin{align*}
P\{\sigma_k^-|a_k^+\} &= P\{\sigma_k^-\} \\
f(y_0^{k-1}|y_k, a_k^+, \sigma_k^-) &= f(y_0^{k-1}|\sigma_k^-) .
\end{align*}

Similarly, β_k(σ_k) can be computed as
\begin{align*}
\beta_k(\sigma_k) &= f(y_k, y_{k+1}^{K-1}|\sigma_k) \\
&= \sum_{a_k} f(y_k, y_{k+1}^{K-1}|a_k, \sigma_k)\, P(a_k) \\
&= \sum_{a_k} f(y_{k+1}^{K-1}|y_k, a_k, \sigma_k)\, f(y_k|a_k, \sigma_k)\, P(a_k) \\
&= \sum_{a_k} f(y_{k+1}^{K-1}|y_k, a_{k-J}^-, \sigma_{k+1}^+)\, \gamma_k(a_k, \sigma_k)\, P(a_k) \\
&= \sum_{a_k} \beta_{k+1}(\sigma_{k+1}^+)\, \gamma_k(a_k, \sigma_k)\, P(a_k) . \tag{5.7}
\end{align*}

The initial value for each recursion can be computed starting from the initial and final states of the encoder, if known at the receiver. As an example, let us consider a coded transmission on a channel without ISI, for which σ_k = µ_k. If the initial encoder state µ̄_0 is known, it will be
\[
\alpha_0(\mu_0) = \begin{cases} 1 & \text{if } \mu_0 = \bar{\mu}_0 \\ 0 & \text{otherwise} . \end{cases}
\]
The final encoder state is typically unknown to the receiver. So, the backward recursion is typically initialized as
\[
\beta_K(\mu_K) = \frac{1}{S} \quad \text{for any } \mu_K .
\]
The algorithm thus proceeds as follows:

1. Through the forward recursion (5.6) and the backward recursion (5.7), the metrics α_k(σ_k) and β_k(σ_k) are computed for every value of k and every state σ_k.

2. These quantities, together with the pdfs γ_k(a_k, σ_k), are then employed to compute the pdfs f(y|a_k) through (5.5).

3. The APPs P(a_k|y) that are needed for the final decisions (5.2) are finally computed.
It is clear that the algorithm does not work in real time (unless proper approximations are introduced [46]). However, it can be employed in burst transmissions. The algorithm provides, as a byproduct, the APPs of the symbols a_k, which can be considered as estimates of the reliability of the possible decisions. For this reason, the BCJR belongs to the family of soft-input soft-output (SISO) algorithms. Typically, in the case of binary information symbols, the log-likelihood ratio (LLR), defined as
\[
\ell_k = \ln \frac{P(a_k = 0|\mathbf{y})}{P(a_k = 1|\mathbf{y})} \tag{5.8}
\]
is computed. Its sign provides the decision on the bit a_k, whereas its amplitude is an estimate of the reliability of the decision: the larger its value, the higher the reliability. The LLR is usually called a soft decision.
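To make the previous steps concrete, the following sketch (not part of the original text) puts the forward recursion (5.6), the backward recursion (5.7), and the completion (5.5) together for uncoded binary (±1) symbols over a real-valued version of the model (5.1) with noise variance N_0—the text's model is complex; the real-valued toy merely shortens the code, and L ≥ 1 is assumed. Metrics are normalized at each step, as allowed by Remark 5.1.

```python
import numpy as np
from itertools import product

def bcjr_llr(y, f, N0):
    """BCJR LLRs for uncoded +-1 symbols on a real ISI channel (L >= 1)."""
    L = len(f) - 1
    A = (+1, -1)
    states = list(product(A, repeat=L))        # sigma_k = (a_{k-1},...,a_{k-L})
    S, K = len(states), len(y)

    def gamma(k, a, s):                        # gamma_k(a_k, sigma_k)
        mean = f[0] * a + sum(f[l] * s[l - 1] for l in range(1, L + 1))
        return np.exp(-(y[k] - mean) ** 2 / (2 * N0))

    nxt = {(a, s): states.index((a,) + s[:-1]) for a in A for s in states}
    alpha = np.zeros((K + 1, S)); alpha[0] = 1.0 / S
    for k in range(K):                         # forward recursion (5.6)
        for j, s in enumerate(states):
            for a in A:
                alpha[k + 1, nxt[a, s]] += alpha[k, j] * gamma(k, a, s) * 0.5
        alpha[k + 1] /= alpha[k + 1].sum()     # normalization (Remark 5.1)
    beta = np.zeros((K + 1, S)); beta[K] = 1.0 / S
    for k in range(K - 1, -1, -1):             # backward recursion (5.7)
        for j, s in enumerate(states):
            for a in A:
                beta[k, j] += beta[k + 1, nxt[a, s]] * gamma(k, a, s) * 0.5
        beta[k] /= beta[k].sum()
    llr = np.empty(K)                          # completion (5.5) + LLR (5.8)
    for k in range(K):
        p = {a: sum(alpha[k, j] * gamma(k, a, s) * beta[k + 1, nxt[a, s]]
                    for j, s in enumerate(states)) for a in A}
        llr[k] = np.log(p[+1] / p[-1])         # sign gives the decision on a_k
    return llr
```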

Remark 5.2. In order to avoid numerical instability problems and to reduce the complexity, it is more convenient to implement the BCJR algorithm in the logarithmic domain [47]. We will illustrate the main ideas with reference to the forward recursion (5.6). The key point is the effective computation of ln(e^{x_1} + e^{x_2}). Let us assume that x_1 > x_2. It is thus
\[
\ln\left( e^{x_1} + e^{x_2} \right) = \ln\left[ e^{x_1} \left( 1 + e^{x_2 - x_1} \right) \right] = x_1 + \ln\left( 1 + e^{x_2 - x_1} \right)
\]
where e^{x_2−x_1} is certainly lower than 1. When x_2 > x_1, we can reverse the roles of x_1 and x_2. Thus, we can write
\[
\ln\left( e^{x_1} + e^{x_2} \right) = \max(x_1, x_2) + \ln\left( 1 + e^{-|x_2 - x_1|} \right) .
\]
In the following, we will define
\[
x_1 \uplus x_2 = \max(x_1, x_2) + \ln\left( 1 + e^{-|x_2 - x_1|} \right) . \tag{5.9}
\]
On the other hand, when ln(e^{x_1} + e^{x_2} + e^{x_3}) has to be computed, we can operate recursively by first computing
\[
x = \ln\left( e^{x_1} + e^{x_2} \right) = x_1 \uplus x_2 .
\]
Since e^x = e^{x_1} + e^{x_2}, it is
\[
\ln\left( e^{x_1} + e^{x_2} + e^{x_3} \right) = \ln\left( e^{x} + e^{x_3} \right) = x \uplus x_3 = x_1 \uplus x_2 \uplus x_3 = \biguplus_{i=1}^{3} x_i .
\]

Let us come back to the forward recursion, reported here for convenience
\[
\alpha_{k+1}(\sigma_{k+1}) = \sum_{a_{k-J}} \alpha_k(\sigma_k^-)\, \gamma_k(a_k^+, \sigma_k^-)\, P(a_k^+) .
\]
By defining
\begin{align*}
\mathring{\alpha}_k(\sigma_k) &= \ln \alpha_k(\sigma_k) \\
\mathring{\gamma}_k(a_k, \sigma_k) &= \ln \gamma_k(a_k, \sigma_k)
\end{align*}
we can write
\begin{align*}
\mathring{\alpha}_{k+1}(\sigma_{k+1}) &= \ln\left[ \sum_{a_{k-J}} \alpha_k(\sigma_k^-)\, \gamma_k(a_k^+, \sigma_k^-)\, P(a_k^+) \right] \\
&= \ln\left[ \sum_{a_{k-J}} e^{\mathring{\alpha}_k(\sigma_k^-) + \mathring{\gamma}_k(a_k^+, \sigma_k^-) + \ln P(a_k^+)} \right] \\
&= \biguplus_{a_{k-J}} \left[ \mathring{\alpha}_k(\sigma_k^-) + \mathring{\gamma}_k(a_k^+, \sigma_k^-) + \ln P(a_k^+) \right] . \tag{5.10}
\end{align*}
In the logarithmic domain, γ̊_k(a_k^+, σ_k^-) + ln P(a_k^+) is exactly the branch metric (2.28) used by the Viterbi algorithm implementing MAP sequence detection.
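A direct implementation of the operator (5.9) and of its recursive use, computing ln(e^{x_1} + · · · + e^{x_n}) in a numerically stable way, is immediate:

```python
import numpy as np

def maxstar2(x1, x2):
    """Pairwise operator (5.9)."""
    return max(x1, x2) + np.log1p(np.exp(-abs(x2 - x1)))

def maxstar(xs):
    """Recursive extension: ln(sum(exp(xs)))."""
    acc = xs[0]
    for x in xs[1:]:
        acc = maxstar2(acc, x)
    return acc

xs = [-1.0, 2.5, 0.3]
print(maxstar(xs), np.log(np.sum(np.exp(xs))))   # identical values
```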

Remark 5.3. Both recursions and the final completion have exactly the same complexity. In fact, in the two recursions we have to compute S quantities through a sum (or through the new operator ⊎) involving M terms. In the completion, we instead have to compute M quantities through a sum involving S terms. If we approximate
\[
x_1 \uplus x_2 \simeq \max(x_1, x_2) \tag{5.11}
\]
the forward recursion in the logarithmic domain has exactly the same complexity as the Viterbi algorithm. The BCJR algorithm thus has a complexity which is roughly 3 times that of the Viterbi algorithm. ♦

Remark 5.4. By introducing the indicator function I(a_k, σ_k, σ_{k+1}), defined as
\[
I(a_k, \sigma_k, \sigma_{k+1}) = \begin{cases} 1 & \text{if } \sigma_{k+1} = \sigma_{k+1}^+ \\ 0 & \text{otherwise} \end{cases}
\]
both recursions and the final completion can be expressed in an equivalent way as
\begin{align*}
f(\mathbf{y}|a_k)\, P(a_k) &= P(a_k) \sum_{\sigma_k} \sum_{\sigma_{k+1}} \alpha_k(\sigma_k)\, \gamma_k(a_k, \sigma_k)\, \beta_{k+1}(\sigma_{k+1})\, I(a_k, \sigma_k, \sigma_{k+1}) \\
\alpha_{k+1}(\sigma_{k+1}) &= \sum_{a_k} \sum_{\sigma_k} \alpha_k(\sigma_k)\, \gamma_k(a_k, \sigma_k)\, P(a_k)\, I(a_k, \sigma_k, \sigma_{k+1}) \\
\beta_k(\sigma_k) &= \sum_{a_k} \sum_{\sigma_{k+1}} \beta_{k+1}(\sigma_{k+1})\, \gamma_k(a_k, \sigma_k)\, P(a_k)\, I(a_k, \sigma_k, \sigma_{k+1}) .
\end{align*}

Example 5.1. We said that the MAP sequence and symbol detection algorithms have quite similar performance. This can be observed in Fig. 5.2, where the BER performance of both the Viterbi and BCJR algorithms is shown with reference to a rate-1/2 convolutional code with 16 states. ♦
The algorithm can also be derived for the Ungerboeck model. However, the probabilistic derivation cannot be used, and it is necessary to resort to the framework based on factor graphs and the sum-product algorithm described in Chapter 8 [45].

5.3 Soft-output Viterbi algorithm

It is possible to “enrich” the Viterbi algorithm in such a way that it can also provide an estimate of the reliability of each decision. In this way, we obtain an

Figure 5.2: Performance of the BCJR algorithm (BER vs E_b/N_0 for the uncoded system and for the coded system decoded with the BCJR and Viterbi algorithms).


algorithm providing soft decisions and based on the MAP sequence detection criterion. It is called the soft-output Viterbi algorithm (SOVA) [48, 49].
With reference to Fig. 5.3, which represents the trellis diagram of a decoder for a 4-state binary code, the idea behind SOVA can be explained as follows. Let Λ_k^{(m)} be the partial metric of a generic path m at time k. Since Λ_k^{(m)} derives from a logarithmic likelihood function, it will be (assuming that Λ is a metric that has to be minimized)
\[
P\{\text{path } m \text{ is correct}\} \propto e^{-\Lambda_k^{(m)}}, \qquad m = 1, 2 .
\]
When Λ_k^{(1)} < Λ_k^{(2)}, the Viterbi algorithm will select path 1. Thus, the probability that the Viterbi algorithm chose the wrong path at state σ_k is
\[
P_{\sigma_k} = \frac{e^{-\Lambda_k^{(2)}}}{e^{-\Lambda_k^{(1)}} + e^{-\Lambda_k^{(2)}}} = \frac{1}{1 + e^{\Lambda_k^{(2)} - \Lambda_k^{(1)}}} .
\]
With probability P_{σ_k}, which depends on the difference between the metrics of the two paths, the Viterbi algorithm will make an error in those positions in which the two paths differ. Based on this principle, i.e., on the observation of the difference between the metrics of different paths ending in the same state, the probabilities of the single bits for each state and each time instant are properly updated.
5.4 Computation of the information rate

An interesting problem, related to the analysis and the design of a communication system, is the evaluation of the ultimate performance limit imposed

Figure 5.3: Trellis of the Viterbi decoder for a rate-1/2 convolutional code with 4 states.

by a given channel, to be interpreted as an ideal benchmark for any practical system over the same channel. In particular, an important performance limit is given by the information rate. The definition of information rate is provided in Appendix C. In particular, if we have a channel with finite memory, with input x and output y, the information rate is defined as
\[
i(x; y) = \lim_{N \to \infty} \frac{1}{N}\, E\left\{ \log_2 \frac{f(y_0^{N-1}|x_0^{N-1})}{f(y_0^{N-1})} \right\} \quad \text{(bits/channel use)} . \tag{5.12}
\]
We can express
\[
i(x; y) = h(y) - h(y|x)
\]
where
\begin{align*}
h(y) &= -\lim_{N \to \infty} \frac{1}{N}\, E\left\{ \log_2 f(y_0^{N-1}) \right\} \\
h(y|x) &= -\lim_{N \to \infty} \frac{1}{N}\, E\left\{ \log_2 f(y_0^{N-1}|x_0^{N-1}) \right\}
\end{align*}
are the entropy rate and the conditional entropy rate of the channel output.
The information rate will clearly change when changing the distribution of the input symbols. However, we are not looking here for the input distribution providing the maximum of the information rate, since we are interested in the case where we are constrained to use a specific input distribution (in particular, that corresponding to independent and uniformly distributed input symbols belonging to a given constellation). One of the key results of information theory is that an error-free communication is, in principle, possible when the rate R of the employed code does not exceed the information rate i(x; y) [50, 51]. Notice that, when the employed code is based on the use of a binary code with rate R_c whose coded bits are mapped onto an M-ary constellation, the rate R of the overall code (in bits/channel use) is given by the product of the rate of the binary code and the number of bits per modulation symbol, i.e., R = R_c log_2 M.
In most cases of interest, it is unfortunately not possible to analytically compute the information rate i(x; y). On the other hand, the complexity of the direct numerical computation of
\[
i_N(x; y) = \frac{1}{N}\, E\left\{ \log_2 \frac{f(y_0^{N-1}|x_0^{N-1})}{f(y_0^{N-1})} \right\}
\]
is exponential in N, and the sequence i_1, i_2, i_3, . . . converges rather slowly even for very simple cases. However, there exists a simulation-based recursive algorithm, described in [52, 53, 54, 55], that can provide an accurate numerical estimate of the information rate and that only requires the availability of the optimal MAP symbol detection algorithm for that channel. In [55], the sequence x_0^{N−1} is allowed to be Markovian and the general case of a channel with finite memory is considered. Without loss of generality, we will consider here the case of a channel with ISI described by the Forney model (5.1), i.e.,
\[
y_k = \sum_{\ell=0}^{L} f_\ell\, x_{k-\ell} + w_k \tag{5.13}
\]
and independent and uniformly distributed input symbols belonging to a given constellation, although the generalization to a Markovian input sequence and to any channel with finite memory is rather straightforward.
The channel output y0N −1 is a stationary ergodic finite-state Markov pro-
cess.2 Hence, the strong law of large numbers assures that
-\frac{1}{N} \log_2 f(y_0^{N-1}) \longrightarrow -\lim_{N \to \infty} \frac{1}{N} E\left\{ \log_2 f(y_0^{N-1}) \right\} = h(y)   (5.14)

with probability one. An estimate of h(y) can be thus obtained by a single


long simulation in which, according to the statistics of the source and the
channel, a sequence x0N −1 of modulation symbols and the corresponding se-
quence y0N −1 of received samples are generated—the choice of the value of N
will be clarified later. The pdf f (y0N −1) can be then computed by using the
forward recursion of the BCJR algorithm with input y0N −1 . In fact, from the
definition (5.3), it is

f(y_0^{N-1}) = \sum_{\sigma_N} \alpha_N(\sigma_N)

where in this case


σk = (xk−1 , xk−2 , . . . , xk−L )
and an estimate of h(y) can be then computed according to (5.14). The
conditional entropy can be often computed in closed form. As an example,
in the case of the channel model (5.13), it is
f(y_0^{N-1} | x_0^{N-1}) = \prod_{k=0}^{N-1} f(y_k | x_0^{N-1})

where

f(y_k | x_0^{N-1}) = \frac{1}{2\pi N_0} \exp\left\{ -\frac{1}{2N_0} \left| y_k - \sum_{\ell=0}^{L} f_\ell x_{k-\ell} \right|^2 \right\}

2
These are the same assumptions we used for the derivation of the BCJR algorithm.

and thus

h(y|x) = -E\left\{ \log_2 f(y_k | x_0^{N-1}) \right\}
       = -\int f(y_k | x_0^{N-1}) \log_2 f(y_k | x_0^{N-1}) \, dy_k
       = -\int f(z) \log_2 f(z) \, dz

where

f(z) = \frac{1}{2\pi N_0} \exp\left( -\frac{1}{2N_0} |z|^2 \right) .

Hence

h(y|x) = \log_2 e + \log_2(2\pi N_0) = \log_2(2\pi e N_0) .
However, when a closed-form expression is not available, we can again
resort to a numerical simulation. In fact, if we define

\mu_k(\sigma_k) = f(y_0^{k-1} | \sigma_k, x_0^{k-1}) \, P(\sigma_k | x_0^{k-1})

we can obtain f (y0N −1 |x0N −1 ) as


f(y_0^{N-1} | x_0^{N-1}) = \sum_{\sigma_N} \mu_N(\sigma_N) .

Again, µN (σN ) can be obtained through the following iterative procedure


similar to the forward recursion of the BCJR algorithm:

\mu_{k+1}(\sigma_{k+1}) = f(y_0^k | \sigma_{k+1}, x_0^k) P(\sigma_{k+1} | x_0^k)
  = \sum_{\sigma_k} f(y_0^k | \sigma_{k+1}, \sigma_k, x_0^k) P(\sigma_{k+1}, \sigma_k | x_0^k)
  = \sum_{\sigma_k} f(y_0^{k-1}, y_k | \sigma_{k+1}, \sigma_k, x_0^{k-1}, x_k) P(\sigma_{k+1} | \sigma_k, x_0^k) P(\sigma_k | x_0^{k-1}, x_k)
  = \sum_{\sigma_k} f(y_0^{k-1} | y_k, \sigma_{k+1}, \sigma_k, x_0^{k-1}, x_k) f(y_k | \sigma_{k+1}, \sigma_k, x_0^k) P(\sigma_{k+1} | \sigma_k, x_0^k) P(\sigma_k | x_0^{k-1})
  = \sum_{\sigma_k} f(y_0^{k-1} | \sigma_k, x_0^{k-1}) f(y_k | \sigma_k, x_0^k) P(\sigma_{k+1} | \sigma_k, x_0^k) P(\sigma_k | x_0^{k-1})
  = \sum_{\sigma_k} \mu_k(\sigma_k) f(y_k | \sigma_k, x_0^k) P(\sigma_{k+1} | \sigma_k, x_0^k) .

Hence, the simulation-based estimation of the ultimate achievable information
rate yields

i(x; y) \simeq \frac{1}{N} E\left\{ \log_2 \frac{\sum_{\sigma_N} \mu_N(\sigma_N)}{\sum_{\sigma_N} \alpha_N(\sigma_N)} \right\}   (5.15)

As discussed at the end of Section 5.2, a simple way for avoiding problems
of numerical stability consists of properly scaling the metrics after each step
of the recursions, and further improvements are obtained by implementing
the algorithm in the logarithmic domain [47]. In this case, the additional
constraint of preserving, at each time epoch k, the ratio between the terms
µk (σk ) and αk (σk ) must be accounted for. An alternative way is the modifi-
cation of the two recursions as follows:
\alpha_{k+1}(\sigma_{k+1}) = \lambda_{k+1} \sum_{a_{k-L}} \alpha_k(\sigma_k^-) \gamma_k(x_k^+, \sigma_k^-) P(x_k^+)

\mu_{k+1}(\sigma_{k+1}) = \delta_{k+1} \sum_{\sigma_k} \mu_k(\sigma_k) f(y_k | \sigma_k, x_0^k) P(\sigma_{k+1} | \sigma_k, x_0^k)

where {λk } and {δk } are positive scale factors. If these scale factors are
chosen such that
\sum_{\sigma_{k+1}} \alpha_{k+1}(\sigma_{k+1}) = 1

\sum_{\sigma_{k+1}} \mu_{k+1}(\sigma_{k+1}) = 1

then
\frac{1}{N} \sum_{k=0}^{N} \log_2 \lambda_k = h(y)

\frac{1}{N} \sum_{k=0}^{N} \log_2 \delta_k = h(y|x) .

The choice of the value of N is critical for the estimation procedure.
The value of N should be large enough that the estimation based on (5.15)
converges to the true value given by the limit in (5.12), according to some
suitable confidence criterion. A pragmatic approach for the choice of the
value of N, explained in [55], is recalled in the following. Let us assume that
NG is a guess on a suitable value of N, and let us run some tens of simulations,
each with a different seed for the random generator and with N = NG. Then,
if all simulation-based estimations match the desired accuracy (for instance,
if the maximum and the minimum outcomes differ by less than 0.05 bits per
channel use), output their average as the final estimate; otherwise, increase
the value of NG and repeat the procedure. Although the minimum value of N
providing a given target accuracy strongly depends on the system parameters,
it is unusual that the required value of N is larger than 10^7 symbols.
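As a concrete illustration, the following Python sketch (the function name and
all numerical values are ours, purely illustrative, and not taken from
[52, 53, 54, 55]) estimates h(y) for i.u.d. BPSK symbols over the ISI channel
(5.13) through the scaled forward recursion, and then obtains i(x; y) using the
closed-form conditional entropy rate derived above. A practical implementation
would vectorize the trellis update; this version favors readability.

import numpy as np
import itertools

def estimate_information_rate(f, N0, N=10_000, seed=0):
    rng = np.random.default_rng(seed)
    L = len(f) - 1
    x = rng.choice([-1.0, 1.0], size=N + L)            # i.u.d. BPSK symbols
    w = rng.normal(scale=np.sqrt(N0), size=(N, 2)) @ [1, 1j]
    y = np.convolve(x, f)[L:N + L] + w                 # received samples (5.13)

    # Trellis: state sigma_k = (x_{k-1}, ..., x_{k-L}); uniform initial
    # distribution (the edge effect is negligible for large N)
    states = list(itertools.product([-1.0, 1.0], repeat=L))
    idx = {s: i for i, s in enumerate(states)}
    alpha = np.full(len(states), 1.0 / len(states))    # scaled forward metrics
    log2_lambda_sum = 0.0                              # accumulates log2(lambda_k)

    for k in range(N):
        new_alpha = np.zeros(len(states))
        for i, s in enumerate(states):
            for xk in (-1.0, 1.0):                     # branch with input x_k
                mean = f[0] * xk + np.dot(f[1:], s)
                gamma = np.exp(-abs(y[k] - mean) ** 2 / (2 * N0)) / (2 * np.pi * N0)
                j = idx[(xk,) + s[:-1]] if L > 0 else 0
                new_alpha[j] += 0.5 * alpha[i] * gamma  # P(x_k) = 1/2
        scale = new_alpha.sum()                        # lambda_{k+1} = 1/scale
        alpha = new_alpha / scale                      # normalization step
        log2_lambda_sum -= np.log2(scale)

    h_y = log2_lambda_sum / N                          # estimate of h(y)
    h_y_given_x = np.log2(2 * np.pi * np.e * N0)       # closed form derived above
    return h_y - h_y_given_x

print(estimate_information_rate(f=[0.5, 0.5, 0.5, 0.5], N0=0.25))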

5.5 Mismatched detection


The computed information rate is achievable by using the BCJR algorithm in
the receiver, clearly provided that a proper powerful channel code is employed
and that detection and decoding are performed jointly. For this reason, it is
often called achievable information rate. A practical alternative to the use
of joint detection and decoding can be the adoption of iterative detection
and decoding that will be described in Chapters 7 and 8. As we will see
later, in this case, the BCJR has to perform detection only, but it has also
to exchange soft-outputs with the decoder in an iterative fashion. Thus,
the complexity will certainly be lower than in the case when joint
detection and decoding has to be performed. Nevertheless, it can still be
significant. In addition, the channel could have infinite memory, or the
optimal MAP symbol detection algorithm, required for the computation of
the information rate, could be unavailable. In these cases, we can resort to
mismatched detection [56].
Mismatched detection is based on the following simple concept, high-
lighted in Fig. 5.4. As we said, the technique described in the previous sec-
tion allows the computation of the achievable information rate for any channel with
memory, provided that the optimal MAP symbol detector for this channel
is available. This detector has to process a sequence y_0^{N-1} of received samples
generated according to the statistics of the actual channel. However, if,
in place of the optimal detector, the same sequence y_0^{N-1} generated by the
actual channel is processed by a detector which is optimal not for the actual
channel but for another channel (the so-called auxiliary channel), we
obtain a lower bound i'(x; y) on the information rate i(x; y). In other words,
the sequences x0N −1 and y0N −1 should be generated according to the statis-
tics of the actual source and the actual channel, while the metrics µk (σk )
and αk (σk ) should be computed according to the statistics of the auxiliary
channel.3 Clearly, the more similar the auxiliary channel to the actual chan-
nel, the tighter the lower bound. An important property of this lower bound
3
Formally, the claims above are valid under the assumptions that the actual system
and the assumed one share the same channel-input domain and the same channel-output
domain [56].

[Figure 5.4: Mismatched detection: the sequence y produced by the actual
channel with memory feeds both the optimal detector, yielding i(x; y), and the
optimal detector for the auxiliary channel, yielding i'(x; y) ≤ i(x; y).]

is that it is achievable by the considered optimal receiver for the auxiliary


channel [56].
The proof that i′ (x; y) ≤ i(x; y) is quite simple. Let us denote by
q(y0N −1 |x0N −1 ) the channel law of the auxiliary channel in order to distinguish
it from f (y0N −1 |x0N −1 ) which describes the actual channel. In the following,
we will omit subscripts and superscripts to simplify the notation. So we will
denote by q(y|x) the channel law of the auxiliary channel and by f (y|x) that
of the actual channel. Let us also define
q_P(y) = \sum_x P(x) q(y|x)

where the subscript P means that the average over x is with respect to the
actual statistics P (x), and

Q_P(x|y) = \frac{P(x) q(y|x)}{q_P(y)}

is the stochastic inverse of channel q(y|x). For what we said about the way
i′ (x; y) is computed, it is

i'(x; y) = \sum_x \int P(x) f(y|x) \log_2 \frac{q(y|x)}{q_P(y)} \, dy

i.e., the expectation of \log_2 \frac{q(y|x)}{q_P(y)}, which is provided by the optimal receiver
for the auxiliary channel, is performed based on the actual joint distribution
P(x)f(y|x), since x and y are generated according to the statistics of the
actual source and the actual channel.

[Figure 5.5: Scheme considered for the pragmatic capacity computation: the
bits {b_k} are mapped onto the symbols {x_k}, transmitted over the channel
with memory, and the received samples {y_k} feed a SISO detector producing
the decisions {b̂_k} and the reliabilities {ℓ_k}.]

With these definitions, it is


i(x; y) - i'(x; y) = \sum_x \int P(x) f(y|x) \left[ \log_2 \frac{f(y|x)}{f(y)} - \log_2 \frac{q(y|x)}{q_P(y)} \right] dy
  = \sum_x \int P(x) f(y|x) \log_2 \frac{f(y|x) q_P(y)}{f(y) q(y|x)} \, dy
  = \sum_x \int P(x) f(y|x) \log_2 \frac{f(y|x) P(x) q_P(y)}{f(y) P(x) q(y|x)} \, dy
  = \sum_x \int P(x) f(y|x) \log_2 \frac{f(y|x) P(x)}{f(y) Q_P(x|y)} \, dy
  = D\left( f(y|x) P(x) \,\|\, f(y) Q_P(x|y) \right) \ge 0

where D(f(y|x)P(x) || f(y)Q_P(x|y)) is the Kullback-Leibler distance between
the joint pdfs f(y|x)P(x) and f(y)Q_P(x|y) and is therefore nonnegative (see
Appendix C).
Unfortunately, we cannot measure the information rate of suboptimal
detectors that are not optimal for a mismatched (auxiliary) channel law.

5.6 Pragmatic capacity


The performance limits described in the previous section are related to the
case of adoption of optimal joint detection and decoding (or, as surrogate, of
iterative detection and decoding). However, in some applications the com-
plexity of joint or iterative detection and decoding can be prohibitive due to
the high complexity of the detector. Thus, we could be interested in the rate
achievable by a detector without any information from the decoder. This
is commonly referred to as pragmatic (constrained) capacity or BCJR-once
rate, since the BCJR (or, in general, the algorithm used for detection) is run
only once [57, 58].
Let us consider the scheme shown in Fig. 5.5. A sequence of N log2 M
bits {bk } is mapped onto a sequence of N M-ary symbols {xk }. These latter
symbols are then transmitted over a channel with memory and a sequence
{yk } of received samples is obtained. These sequences will be also denoted as

b, x, and y, respectively. At the receiver, a SISO detector provides, for each


input bit bk , a decision b̂k and the corresponding reliability, expressed for
example through the LLR ℓk . If the SISO detector implements the optimal
MAP symbol detector, it will provide the exact LLR given by

\ell_k = \ln \frac{P(b_k = 0 | y)}{P(b_k = 1 | y)} .   (5.16)
Otherwise, an approximation of it will be obtained. The pragmatic capacity
is defined as I(bk ; ℓk ), i.e., as the mutual information of the channel having bk
as input and ℓk as output. The pragmatic capacity can be used to compute
an achievable lower bound on the information rate i(x; y). In fact, it is

\frac{1}{\log_2 M} I(b_k; \ell_k) \le i(x; y) .

The proof is quite simple. For sure, it is I(b_k; \ell_k) \le I(b_k; y) by the data
processing inequality (C.12). The equality holds when the employed SISO
detector is the optimal MAP symbol detector since in this case the ℓk is a
sufficient statistic for the detection of bit bk . Thus, for a sufficiently large
value of N
i(x; y) = \frac{1}{N} I(x; y) = \frac{1}{N} I(b; y)
        = \frac{1}{N} \sum_{k=0}^{N \log_2 M - 1} I(b_k; y | b_{k-1}, b_{k-2}, \ldots, b_0)
        \ge \log_2 M \; I(b_k; y)
        \ge \log_2 M \; I(b_k; \ell_k)

having exploited the chain rule for the mutual information and (C.10), since
bits {bk } are independent.
As far as the computation of the pragmatic capacity is concerned, from
(5.16) we have that4

P(b_k = 0 | y) = \frac{e^{\ell_k}}{1 + e^{\ell_k}} = \frac{1}{1 + e^{-\ell_k}}

P(b_k = 1 | y) = \frac{1}{1 + e^{\ell_k}} .
4
We are assuming now that the optimal MAP symbol detector is available and so ℓk is
the true LLR.

By defining

f(x) = \log_2(1 + e^{-x})

it is

f(\ell_k) = -\log_2 P(b_k = 0 | y)
f(-\ell_k) = -\log_2 P(b_k = 1 | y)

and thus

f(\ell_k (1 - 2 b_k)) = -\log_2 P(b_k | y) = \log_2 \frac{1}{P(b_k | y)} .
Hence,

I(b_k; \ell_k) = H(b_k) - H(b_k | \ell_k)
             = 1 - E\left\{ \log_2 \frac{1}{P(b_k | y)} \right\}
             \simeq 1 - \frac{1}{N \log_2 M} \sum_{k=0}^{N \log_2 M - 1} f(\ell_k (1 - 2 b_k)) .   (5.17)

The procedure is thus the following (a sketch of the last step is given below):

1. generate a "long" sequence {b_k};

2. map it into the sequence x and generate the channel output y
   according to the channel statistics;

3. compute the output of the SISO detector;

4. use the exact or approximate LLRs in (5.17).
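The following Python sketch (the function name and all numerical values are
ours, purely illustrative) implements step 4 above. As an example of use, it is
fed with exact LLRs for a memoryless BPSK/AWGN channel, for which the optimal
SISO detector is symbol-by-symbol and the LLR is known in closed form.

import numpy as np

def pragmatic_capacity(bits, llrs):
    """Estimate I(b_k; l_k) through (5.17); bits and llrs have length N*log2(M)."""
    # f(x) = log2(1 + exp(-x)); np.logaddexp avoids overflow for large |x|
    f = np.logaddexp(0.0, -llrs * (1 - 2 * bits)) / np.log(2)
    return 1.0 - f.mean()

rng = np.random.default_rng(0)
sigma2 = 0.5                                    # noise variance per sample
bits = rng.integers(0, 2, size=200_000)
r = (1 - 2 * bits) + rng.normal(scale=np.sqrt(sigma2), size=bits.size)
llrs = 2 * r / sigma2                           # exact LLR for the b=0 -> +1 mapping
print(pragmatic_capacity(bits, llrs))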



5.7 Exercises
Exercise 5.1 Let us consider the implementation of the BCJR algorithm
to perform detection in the case of differentially encoded PSK signals trans-
mitted over a channel without ISI and with unknown initial code symbol.
Demonstrate that the MAP symbol detection strategy becomes a symbol-
by-symbol detection strategy with decision rule given by
\hat{a}_n = \arg\max_{a_n} \sum_{c_{n-1}} \exp\left\{ \frac{1}{\sigma^2} \Re\left[ c_{n-1}^* (x_n a_n^* + x_{n-1}) \right] \right\}

where an are the information symbols, cn the differentially encoded symbols,


and xn the output of the matched filter sampled at time nT .
Hint: considering that, in this case, \mu_n = c_{n-1} and that \gamma_n(a_n, \mu_n) =
\gamma_n(c_n) \propto \exp\left\{ -\frac{1}{2\sigma^2} |x_n - c_n|^2 \right\}, demonstrate that

P(a_n | x) \propto \sum_{c_{n-1}} \gamma_n(c_{n-1} a_n) \gamma_{n-1}(c_{n-1})

by demonstrating that

\alpha_{n+1}(c_n) \propto \gamma_n(c_n)

\beta_n(c_n) \propto \frac{1}{M} .

Exercise 5.2 Consider the implementation of the BCJR algorithm in the


logarithmic domain and the approximation of (5.9) as

x1 ⊎ x2 = max(x1 , x2 ) .

Demonstrate that, in this case, the forward recursion becomes exactly the
Viterbi algorithm.
Chapter 6

Reduced-complexity and adaptive receivers

6.1 Reduced-state sequence detection


MAP sequence detection algorithms have a complexity that, in the first in-
stance, depends on the number of trellis states at the receiver. As an example,
in the case of a coded linear modulation, the number of trellis states is1

S = S_c M^L

where Sc is the number of encoder states, M is the cardinality of the informa-


tion symbols, and L is the channel dispersion length. Thus, S exponentially
depends on L and this makes the receiver complexity unaffordable unless all
parameters take small values. In particular, parameter L must be small and
thus it seems that MAP sequence detection can be adopted only for mildly
dispersive channels. However, in many applications the channel has a long
impulse response, although its significant samples are often few. In these
cases, we can employ approximate sequence detection algorithms that search
a trellis diagram with a much smaller number of states. In other words, the
trellis employed by the approximate receiver is different from that employed
by the optimal one. If the reduced trellis represents a significant portion of
the system memory, we expect the reduced-complexity algorithm to have a
performance close to that of the ideal one.
A first class of reduced-complexity algorithms employs a trellis diagram
whose state is obtained by arbitrarily truncating the channel dispersion
1
We will describe these techniques for complexity reduction with reference to coded
linear modulations transmitted over a channel with ISI, although they can be also applied
to the receivers described in Chapter 3.


length to a value Lr < L. Remembering the state definition

σk = (ak−1 , . . . , ak−L , µk−L )

we can define the new state as

ωk = (ak−1 , . . . , ak−Lr , µk−Lr ) . (6.1)

In this way, the number of trellis states is reduced to S = S_c M^{L_r}. We will
consider the Forney approach to MAP sequence detection, i.e., the receiver
based on the output of the whitened matched filter, with branch metrics
given by (2.29), which we report here for the case of equally likely symbols
\lambda_k(a_k, \sigma_k) = \left| y_k - \sum_{\ell=0}^{L} f_\ell c_{k-\ell} \right|^2 .   (6.2)
In these branch metrics, the term \sum_{\ell=0}^{L} f_\ell c_{k-\ell} appears. It depends on

\{c_k, c_{k-1}, \ldots, c_{k-L}\} .

Whereas in the full-complexity trellis these symbols are associated with the
considered branch, in the reduced trellis, a branch is associated with symbols
{ck , ck−1 , . . . , ck−Lr } only. We can express these metrics as
\lambda_k(a_k, \sigma_k) = \left| y_k - \sum_{\ell=0}^{L_r} f_\ell c_{k-\ell} - \sum_{\ell=L_r+1}^{L} f_\ell c_{k-\ell} \right|^2 .   (6.3)

Symbols appearing in the first summation are thus associated with the branch
of the reduced trellis we are considering. The problem is then to determine
the symbols in the second summation.
Before considering the possible solutions, let us assume that the symbols
appearing in the second summation are known. In this case, the second
summation can be perfectly evaluated and this will correspond to the ideal
cancellation of some ISI terms affecting the channel. In fact, by denoting
with {ck } the transmitted sequence, we have
\underbrace{\sum_{\ell=0}^{L} f_\ell c_{k-\ell} + w_k}_{y_k} - \sum_{\ell=L_r+1}^{L} f_\ell c_{k-\ell} = \sum_{\ell=0}^{L_r} f_\ell c_{k-\ell} + w_k .

The algorithm performance will thus correspond to that of the channel with
truncated pulse \{f_\ell\}_{\ell=0}^{L_r}.

[Figure 6.1: Reduced-complexity receiver with preliminary decisions: the WMF
output y_k is cleaned of the residual ISI term \sum_{\ell=L_r+1}^{L} f_\ell \hat{\hat{c}}_{k-\ell},
reconstructed through a tapped delay line with taps f_{L_r+1}, ..., f_L fed by
the preliminary decisions \hat{\hat{c}}_{k-L_r-1} of the VA.]

Clearly, symbols \{c_i\}_{i=k-L}^{k-L_r-1} are unknown to the receiver. However, we
can proceed in an approximate way by employing the decision-feedback tech-
nique. It is based on the use, in the second summation, of the decisions
as if they were the true symbols. The resulting scheme is shown in Fig. 6.1,
where we denoted by {ĉˆk } the sequence of detected code symbols used for ISI
cancellation. Notice that these symbols are obtained with a delay lower than
the decision delay D of the VA. For this reason, we called them preliminary
decisions (see Chapter 3).
A problem with this reduced-complexity receiver is that preliminary de-
cisions provided by the VA are less reliable, thus cancellation of ISI symbols
not included in the trellis definition is performed with decisions of poor qual-
ity—the higher the complexity reduction, the lower the delay of preliminary
decisions. A more effective solution consists in evaluating the second sum-
mation in (6.3) by considering the evolution of each survivor, i.e., by using
the following branch metrics
\lambda_k(a_k, \omega_k) = \left| y_k - \sum_{\ell=0}^{L_r} f_\ell c_{k-\ell} - \sum_{\ell=L_r+1}^{L} f_\ell \breve{c}_{k-\ell}(\omega_k) \right|^2 .   (6.4)

[Figure 6.2: In PSP, the branch metric computation depends, as far as the
residual ISI is concerned, on the history of the survivors that those branches
extend.]

Here, c̆k−ℓ (ωk ) denotes the code symbol ck−ℓ associated with the survivor of
state ωk . This time, the second summation takes different values on different
trellis branches, depending on the history of the survivor that those branches
extend, i.e., according to the PSP principle, already described in Chapter 3,
and highlighted in Fig. 6.2. It clearly provides a better performance than
the previous technique. Its intuitive explanation is the following: although
the receiver does not know, at time k, which survivor will result to be the
winner, this latter survivor will be extended for sure with decisions having
the best possible quality. The performance of this technique based on the
PSP principle is shown in Fig. 6.3 in terms of bit error ratio (BER) versus
the signal-to-noise ratio, for the case of a BPSK transmission over a channel
with impulse response characterized by f0 = f1 = f2 = f3 = 0.5 (L = 3).
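A minimal Python sketch of this reduced-state detection with PSP is reported
below for an uncoded BPSK transmission (all names and parameters are ours,
purely illustrative; for simplicity, decisions are released with zero delay,
whereas a practical VA would trace back with delay D).

import numpy as np
import itertools

def rssd_psp(y, f, Lr):
    """Viterbi on the truncated trellis (6.1); residual ISI via PSP, eq. (6.4)."""
    L = len(f) - 1
    states = list(itertools.product([-1.0, 1.0], repeat=Lr))
    idx = {s: i for i, s in enumerate(states)}
    metric = np.zeros(len(states))
    # survivor history (a_{k-1}, ..., a_{k-L}) per state; arbitrary initial fill
    hist = [list(s) + [1.0] * (L - Lr) for s in states]
    decisions = []
    for yk in y:
        new_metric = np.full(len(states), np.inf)
        new_hist = [None] * len(states)
        for i, s in enumerate(states):
            # residual ISI reconstructed from the survivor of state s
            tail = sum(f[l] * hist[i][l - 1] for l in range(Lr + 1, L + 1))
            for ak in (-1.0, 1.0):
                head = f[0] * ak + sum(f[l] * s[l - 1] for l in range(1, Lr + 1))
                lam = abs(yk - head - tail) ** 2
                j = idx[(ak,) + s[:-1]] if Lr > 0 else 0
                if metric[i] + lam < new_metric[j]:
                    new_metric[j] = metric[i] + lam
                    new_hist[j] = [ak] + hist[i][:-1]
        metric, hist = new_metric, new_hist
        decisions.append(hist[int(np.argmin(metric))][0])  # zero-delay decision
    return np.array(decisions)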
This class of reduced-complexity receivers, whose state is defined by trun-
cation, is a particular case of a more general class. Before describing it, we
remember that in the full-complexity trellis, the state can be equivalently
defined as

σk = (µk , ck−1, . . . , ck−L ) . (6.5)

In fact, the knowledge of c_{k-1}, . . . , c_{k-L} can be extracted from σ_k, whereas the
pair (a_k, µ_k) allows the computation of c_k. With reference to the state definition
(6.5), a reduced-complexity trellis can be obtained by defining a new state
where code symbols {ck−i } are substituted by subsets of the constellation
they belong to. In other words, the state specifies only the subsets the
symbols {c_{k-i}} belong to.
For a formal definition, let us call M ′ the cardinality of the alphabet

[Figure 6.3: Performance of reduced-complexity receivers based on PSP: BER
versus Eb/N0 [dB] for the full-complexity receiver and for Lr = 2, 1, 0.]

of symbols {ck },2 and let us define L partitions Ω(i), i = 1, . . . , L, of the


alphabet of code symbols. Let Ji = card{Ω(i)}, the number of elements of
partition Ω(i) (clearly 1 ≤ Ji ≤ M ′ ). As an example, if Ji = 1, partition
Ω(i) has a single element (the entire constellation). Instead, when Ji = M ′ ,
then Ω(i) coincides with the alphabet of code symbols. In the intermediate
cases, i.e., 1 < Ji < M ′ , the elements of Ω(i) are subsets of the original
constellation. Let us now define In (i) ∈ Ω(i) the subset symbol cn belongs
to, and define the following reduced state

ωk = (µk , Ik−1(1), Ik−2(2), . . . , Ik−L (L)) .

This state groups together all states σk defined by (6.5) having symbol ck−i
belonging to the same Ik−i (i), for i = 1, . . . , L.
In order to correctly define the state, it is required that, given the state ωk
and the subset Ik (1) symbol ck belongs to, the next state ωk+1 is univocally
determined. In fact, since

ωk+1 = (µk+1; Ik (1), Ik−1(2), . . . , Ik−L+1 (L))


2
We called M the cardinality of the alphabet of the information symbols {ak }. In the
case of a TCM code based on a binary encoder having rate k1 /n, it is M ′ /M = 2n−k1 .

[Figure 6.4: Original 8-PSK constellation and the relevant partitions B0, B1
and C0, C1, C2, C3 for Example 6.1.]

partitions must be such that Ω(i) is a further partition of Ω(i + 1). In this
way, Ik−i (i) univocally determines Ik−i (i + 1) of the next state. In other
words the partition depths must satisfy the condition
J1 ≥ J2 ≥ · · · ≥ JL .

Example 6.1 Let us consider an 8-PSK constellation, an uncoded system
with L = 2, and the partitions B and C reported in Fig. 6.4. The full-
complexity state is defined as

\sigma_k = (a_{k-1}, a_{k-2}) .

The number of states is thus S = M^L = 8^2 = 64. We can define a "partial"
state by truncating the memory. Considering, as an example, the case of
L_r = 1, we obtain the following state definition

\omega'_k = a_{k-1}

and thus the number of states of the reduced trellis is S' = M^{L_r} = 8^1 = 8.
We can define another partial state through partitioning by defining

\Omega(1) = \{C_0, C_1, C_2, C_3\}   (J_1 = 4)
\Omega(2) = \{B_0, B_1\}   (J_2 = 2)
\omega''_k = (I_{k-1}(1), I_{k-2}(2))

where I_{k-1}(1) ∈ Ω(1) is one of the subsets C whereas I_{k-2}(2) ∈ Ω(2) is one of
the subsets B. In this case, the number of states is S'' = 4 · 2 = 8. Although
the number of states is the same, the receivers corresponding to the defined
trellis can have a different performance.
Notice that this second approach is more general. In fact, if we choose
(J1 , J2 ) = (8, 1) , we obtain exactly the state ωk′ by noticing that Ik−2 (2)
becomes irrelevant, since Ω(2) is composed of one element only, and we can
equivalently define the state as

ωk′ = Ik−1 (1) = ak−1 .

In addition, this second approach allows the definition of trellises with a larger
variety of states. As an example, we can have a 16-state trellis (by choosing
(J_1, J_2) = (8, 2) or (J_1, J_2) = (4, 4)), which cannot be obtained by truncation.

If Lr is the last value of index i such that Ji > 1, we have

\omega_k = (\mu_k; I_{k-1}(1), \ldots, I_{k-L_r}(L_r)) .

By defining

J_i = \begin{cases} M' & \text{for } i = 1, \ldots, L_r \\ 1 & \text{for } i = L_r + 1, \ldots, L \end{cases}
we obtain exactly the state (6.1) defined by truncating the memory. It is
thus clear that the second class of reduced-complexity algorithms includes
the first one as a special case. The number of states is
S = \begin{cases} \prod_{i=1}^{L_r} J_i & \text{for an uncoded system} \\ S_c \prod_{i=1}^{L_r} \frac{J_i}{2^{n-k_1}} & \text{for a TCM system.} \end{cases}

As far as the branch metrics to be employed on the reduced trellis are


concerned, they can be obtained by extracting from the branch the infor-
mation on the subset and supplementing this information using preliminary
decisions or PSP, i.e.,
\lambda_k(a_k, \omega_k) = \left| y_k - f_0 c_k - \sum_{\ell=1}^{L} f_\ell \breve{c}_{k-\ell}(\omega_k) \right|^2

where c̆k−ℓ (ωk ) denotes the code symbol at time k − ℓ associated with the
survivor of state ωk .
This technique has an interesting performance (i.e., a limited performance
loss) when set partitioning follows Ungerboeck's rule already discussed

in the previous chapter, i.e., when symbols within each subset have a large
relative distance. In addition, the channel impulse response \{f_\ell\}_{\ell=0}^{L} must
be minimum phase, so that the energy is concentrated at low values of
\ell, whose corresponding symbols are better represented on the reduced trellis.
The described technique based on set partitioning and PSP is called
reduced-state sequence detection (RSSD) and has been proposed by three
independent groups of researchers [59, 60, 61]. In practice it consists in
building a reduced trellis that is then processed with full complexity. It can
be also extended to the BCJR algorithm described in Chapter 5 [62]. Other
techniques are available to perform detection on the original trellis but ex-
ploring only a fraction of it (see, for example, [63, 64]). At the end of this
chapter, we will describe an alternative technique, which still builds a re-
duced trellis that is then processed with full complexity, and that works on
the Ungerboeck metrics.

6.2 Adaptive equalization


In the previous section we discussed trellis diagrams describing only a
portion of the channel memory and handling the residual ISI through can-
cellation. This approach can be pushed to the limit by reducing the number
of states to its minimum value, i.e., the number of states of the encoder or,
in the case of an uncoded transmission, to a single state (and thus the trellis
will collapse). In the following, we will refer to this latter case. The branch
metrics (6.3) become, when Lr = 0,
\lambda_k(a_k) = \left| y_k - f_0 a_k - \sum_{\ell=1}^{L} f_\ell \hat{a}_{k-\ell} \right|^2   (6.6)

where {âk−ℓ } represent the previous decisions. The minimization of (6.6) can
be performed in a symbol-by-symbol fashion as

âk = argmin λk (ak )


ak

since we now have a one-state trellis. The resulting scheme becomes that of
a decision-feedback equalizer (DFE) shown in Fig. 6.5. The name equalizer
means that ISI on the signal is eliminated, or reduced, before performing
symbol-by-symbol detection.
In the case of a coded modulation, the lowest complexity can be achieved
when the receiver state coincides with that of the encoder, i.e., ωk = µk . In
this case, ISI cancellation can be implemented based on preliminary decisions

[Figure 6.5: Decision-feedback equalizer as a special case of the MAP sequence
detection algorithm: the WMF output y_k is cleaned by a feedback filter, with
taps f_1, ..., f_L fed by past decisions â_k, before symbol-by-symbol detection.]

[Figure 6.6: DFE for a coded system: a bank of feedback filters, driven by the
per-survivor symbols \breve{c}_{k-1}(\sigma_k), supports the VA.]



[Figure 6.7: Structure of the decision-feedback equalizer considered here: a
front-end filter sampled at t = kT feeds a feedforward TDL with taps c_0, ...,
c_{N-1}; the detected symbols drive a feedback TDL with taps p_1, ..., p_{N'},
whose output is added to form z_k at the detector input.]

or PSP. With reference to this latter case, the receiver block diagram is shown
in Fig. 6.6, where a bank of feedback filters is present, one for each survivor.
The DFE, which we obtained as a special case of the MAP sequence detec-
tion algorithm with complexity reduction pushed to the limit, has historically
been proposed well before MAP sequence detection. In fact, in the presence
of ISI, the most intuitive, and presently also the most used, detection strategy
is the symbol-by-symbol one with channel equalization used in the attempt
to remove the ISI [65, 66, 67, 68, 69, 70, 71]. Thus, we will now investi-
gate equalization techniques. To this aim, let us consider a receiver model
characterized by special constraints on the input and feedback filters. In
particular, we will suppose that these filters are discrete-time finite impulse
response (FIR) filters. In the literature, they are often called tapped delay-
line (TDL) filters (with a finite number of taps). The receiver structure
is shown in Fig. 6.7. In this DFE, in addition to the feedforward filter, a
feedback filter is also present. The analog front end filter can be a matched
filter or an approximation of it in the case of an unknown channel. The
feedforward filter is characterized by N taps and the relevant coefficients will
be called {ci }, whereas the feedback filter has N ′ taps and coefficients {pi }.3
In a structure like this, we need to optimize coefficients {ci } and {pi }
according to some criterion. Excluding the minimization of the symbol error
3
From now on, we will consider an uncoded transmission. As a consequence, there is
no confusion between the code symbols and the equalizer’s taps.

probability Ps , which is hard to implement since it does not admit a closed-


form solution, a criterion that tries to cope with both ISI and thermal noise
and that can be easily handled from a mathematical point of view, is the
minimization of the mean square error (MSE) between the sample zk at the
input of the symbol-by-symbol detector and the corresponding transmitted
symbol, i.e.,

E\{ |z_k - a_{k-d}|^2 \}
where d is a proper delay that takes into account the presence of the feedfor-
ward filter and has to be optimized.
We will start investigating the special case when only the feedforward
filter is employed. It corresponds to the case N ′ = 0 in the scheme of Fig. 6.7.
This time, the equalizer is a simple linear filter and for this reason it is called
linear in order to distinguish it from the DFE equalizer where the detector is
included in the feedback loop and, hence, the feedback of detected symbols
makes the structure nonlinear.

6.2.1 Linear equalization


When N ′ = 0 in the scheme of Fig. 6.7, we have a linear equalizer. We will
adopt a vector notation. We define an input vector as
 
x_k = [x_k, x_{k-1}, \ldots, x_{k-N+1}]^T

and a vector collecting the equalizer taps as

c = [c_0, c_1, \ldots, c_{N-1}]^T .
We can thus express the signal yk at the equalizer output as a scalar product:
y_k = \sum_{i=0}^{N-1} c_i x_{k-i} = x_k^T c = c^T x_k .

Let us consider the MSE

E(c) = E\{ |y_k - a_{k-d}|^2 \}
     = E\{ |y_k|^2 \} + E\{ |a_{k-d}|^2 \} - 2 E\{ \Re[y_k^* a_{k-d}] \}

The first term can be expressed as

E\{ |y_k|^2 \} = E\{ y_k^* y_k \} = c^H E\{ x_k^* x_k^T \} c = c^H A c

where (\cdot)^H denotes the Hermitian (transpose conjugate) of a matrix. The
N \times N matrix A is called channel correlation matrix and is defined as

A = E\{ x_k^* x_k^T \} .

Notice that A = A^H, i.e., it is a Hermitian matrix. Assuming that E\{a_k\} = 0,
and defining

E\{ |a_{k-d}|^2 \} = \sigma_a^2

we finally have

E\{ y_k^* a_{k-d} \} = c^H E\{ x_k^* a_{k-d} \} = c^H b

having defined the channel vector

b = E\{ x_k^* a_{k-d} \}

of dimension N. The MSE can be thus expressed through the quadratic form

E(c) = c^H A c - 2\Re\{c^H b\} + \sigma_a^2 .

6.2.2 Minimum mean square error


The MSE E(c) is a real function (a paraboloid) of a complex vector. Since
this function is quadratic, it has a single minimum provided that A is positive
definite. Indeed, this is the case since
  
c^H A c = E\{ |y_k|^2 \} = E\{ |s_k + n_k|^2 \} = E\{ |s_k|^2 \} + \sigma_n^2

where we called s_k the signal component of y_k. Since \sigma_n^2 > 0, then c^H A c > 0
for all c \neq 0. In the absence of noise, the matrix is positive definite too. In fact,
we would have a positive semi-definite matrix only if there existed a vector c \neq 0
such that

E\{ |s_k|^2 \} = E\{ |c^T x_k|^2 \} = 0 .

This can happen only when c^T x_k is zero with probability one, i.e., when x_k is
linearly dependent on x_{k-1}, \ldots, x_{k-N+1} with probability one, which is clearly
impossible.

In order to find the minimum of E(c), we have to equate its gradient to
zero. Since E(c) is a real function of complex variables, its gradient is defined
as4

\nabla_c E = \nabla_{c_R} E + j \nabla_{c_I} E   (6.7)

having denoted by c_R and c_I the real and imaginary components of vector
c. We can verify that

\nabla_c (c^H A c) = 2 A c

and

\nabla_c (2\Re\{c^H b\}) = 2 b .

Thus, equating the gradient to zero we obtain5

\nabla_c E = 2 (A c - b) = 0

where 0 is the zero vector of length N. The optimal vector is thus

c_0 = A^{-1} b .   (6.8)

We can evaluate the minimum mean square error as

E_0 = E(c_0) = c_0^H A c_0 - 2\Re\{c_0^H b\} + \sigma_a^2
    = b^H (A^{-1})^H A A^{-1} b - 2\Re\{b^H (A^{-1})^H b\} + \sigma_a^2
    = b^H A^{-1} b - 2 b^H A^{-1} b + \sigma_a^2
    = \sigma_a^2 - b^H \underbrace{A^{-1} b}_{c_0}

having exploited the Hermitian symmetry of matrix A, which also implies
that (A^{-1})^H = A^{-1}, and the fact that b^H A^{-1} b is real since it is equal to its
complex conjugate.
4
A real function of a complex variable is indeed a function of two real variables. Its
gradient is thus a vector having as components the partial derivatives with respect to these
two variables. By interpreting this vector as a complex number, we obtain (6.7).
5
The same equation can be obtained by using the orthogonality principle (see Ap-
pendix B). In fact, by interpreting yk as a minimum mean square estimate of ak−d , the
orthogonality principle states that the error must be orthogonal to data. Thus, we obtain

E{x∗k (yk − ak−d )} = 0

and hence, remembering that yk = xTk c and the definitions of A and b, we have

(Ac − b) = 0 .
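As a numerical illustration, the following Python sketch (channel and all
parameter values are ours, purely illustrative) builds A and b for a known
discrete-time channel \{h_k\} with white noise, using the closed-form components
derived in Exercise 6.3, and then computes the optimal taps (6.8) and the
minimum MSE E_0.

import numpy as np

def mmse_linear_equalizer(h, N, d, sigma_a2=1.0, sigma_n2=0.1):
    L = len(h) - 1
    # A_ij = sigma_a^2 sum_n h*_n h_{n+i-j} + sigma_n^2 delta_{i-j} (white noise)
    A = np.zeros((N, N), dtype=complex)
    for i in range(N):
        for j in range(N):
            A[i, j] = sigma_a2 * sum(np.conj(h[n]) * h[n + i - j]
                                     for n in range(L + 1) if 0 <= n + i - j <= L)
            if i == j:
                A[i, j] += sigma_n2
    # b_i = sigma_a^2 h*_{d-i}
    b = np.array([sigma_a2 * np.conj(h[d - i]) if 0 <= d - i <= L else 0.0
                  for i in range(N)], dtype=complex)
    c0 = np.linalg.solve(A, b)                  # optimal taps (6.8)
    E0 = sigma_a2 - np.real(np.conj(b) @ c0)    # minimum MSE E_0
    return c0, E0

c0, E0 = mmse_linear_equalizer(h=[1.0, 0.5], N=5, d=2)
print(c0, E0)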

[Figure 6.8: Contour plot of the MSE on the plane (c_1, c_2): elliptic level
curves with the iterates c^{(k)}, c^{(k+1)} moving toward the optimum c_0.]

This solution has both a theoretical and an implementation value when


the channel characteristics (i.e., matrix A and vector b) are perfectly known
at the receiver. However, it often happens that the receiver does not know
the channel either because it is described through a stochastic model, or
because it changes with time. Under these conditions, a receiver that would
like to compute c0 through (6.8), should estimate A first and then invert it,
or directly estimate A−1 .

6.2.3 Stochastic gradient algorithm


An alternative solution is represented by the use of the gradient method (also
called steepest descent algorithm) that tries to find the minimum MSE by
iteratively adjusting the vector of taps as

c^{(k+1)} = c^{(k)} - \frac{1}{2} \alpha \left. \nabla_c E \right|_{c = c^{(k)}}

where α is a proper updating constant (the step size). In order to understand


this algorithm, let us consider the simple case of an equalizer with N = 2
real taps. We can thus draw the contour plot of the MSE E(c) on the plane
(c_1, c_2). It is clearly represented by ellipses since the function has a quadratic
dependence on c_1 and c_2 and the matrix A is positive definite. Fig. 6.8 shows this
contour plot. At every iteration, the weights c^{(k)} are modified in the opposite
direction with respect to the gradient computed in c^{(k)}, and the algorithm
will converge to the optimal point c_0.
In order to implement this algorithm we need

\nabla_c E = 2 (A c - b)
         = 2 \left( E\{ x_k^* x_k^T \} c - E\{ x_k^* a_{k-d} \} \right)
         = 2 E\{ x_k^* (y_k - a_{k-d}) \}
         = 2 E\{ x_k^* e_k \}

having defined
ek = yk − ak−d
All quantities appearing in this expression are available at the receiver. In
fact:
• vector xk is contained in the TDL;

• yk is the sample at the equalizer output;

• ak−d is known if a training sequence is available (data-aided mode) or


can be approximated by âk−d (decision-directed mode).
We could estimate the gradient, which is an average of random quantities
whose realizations are available at the receiver, through a temporal mean,
under an ergodic assumption. However, a simpler unbiased estimator of the
gradient can be obtained by simply removing the expectation. The resulting
algorithm is thus
c^{(k+1)} = c^{(k)} - \alpha x_k^* e_k   (6.9)
and is called stochastic gradient algorithm. It is also known as least mean
square (LMS) algorithm. As said, sequence {ak−d} can be a proper training
sequence during convergence and then, when the system is in its steady-
state and the decisions are sufficiently reliable, it can be substituted with
the decisions themselves, allowing the tracking of the channel variations,
provided they are not too fast.6 This receiver is thus adaptive since it is able
to follow the channel variations. Its block diagram is shown in Fig. 6.9.
The step size α in (6.9) has a large influence on the performance of the
adaptive equalizer. It has to be optimized taking into account the oppo-
site needs of a stable control (α small) and of a system able to follow the
channel variations (α large). In particular, the stochastic gradient algorithm
requires step-size values much smaller than the non-stochastic algorithm,
since the recursion (6.9) needs to average in time the random oscillations of
the term x∗k ek .
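The following Python sketch (channel, step size, and all other values are ours,
purely illustrative) shows the LMS recursion (6.9) in data-aided mode for
real-valued BPSK signals; in decision-directed mode, a[k - d] would simply be
replaced by the decision â_{k-d}.

import numpy as np

rng = np.random.default_rng(1)
h = np.array([1.0, 0.4, 0.2])          # assumed channel impulse response
N, d, alpha = 9, 4, 0.02               # equalizer length, delay, step size

a = rng.choice([-1.0, 1.0], size=20_000)                 # training symbols
x = np.convolve(a, h)[:a.size] + 0.05 * rng.normal(size=a.size)

c = np.zeros(N)                        # tap vector c^(k)
for k in range(N, a.size):
    xk = x[k:k - N:-1]                 # TDL content (x_k, ..., x_{k-N+1})
    yk = c @ xk                        # equalizer output
    ek = yk - a[k - d]                 # error against the training symbol
    c = c - alpha * ek * xk            # LMS update (real signals, so x* = x)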

6.2.4 Decision-feedback equalization


Let us come back to the scheme in Fig. 6.7 and assume N ′ ≥ 1, i.e., let us
consider the DFE. This scheme is clearly non-linear. It can be linearized, thus
6
It is not always possible to use a training sequence to drive the equalizer convergence.
For this reason, in the literature many algorithms have been proposed for blind equal-
ization, i.e., to help the equalizer to converge towards a fairly good configuration such
that sufficiently reliable decisions are provided before switching in decision-directed mode
[72, 73, 74, 75].

[Figure 6.9: Block diagram of an adaptive equalizer: the equalizer output y_k
is compared with a_{k-d} (training mode) or \hat{a}_{k-d} (tracking mode) to form
the error e_k driving the tap adjustment.]

simplifying the analysis, by approximating the sequence of decisions {âk−d }


with the transmitted sequence {ak−d }. Under this hypothesis, by defining
 
a_k = [a_{k-d-1}, a_{k-d-2}, \ldots, a_{k-d-N'}]^T

and

p = [p_1, p_2, \ldots, p_{N'}]^T

we have

z_k = x_k^T c + a_k^T p = u_k^T v

where we also defined

u_k = [x_k^T, a_k^T]^T , \qquad v = [c^T, p^T]^T .

It is thus possible to analyze the linearized DFE as a linear equalizer. In


particular, the MSE is

E(v) = v^H U v - 2\Re\{v^H w\} + \sigma_a^2

where we defined

U = E\{ u_k^* u_k^T \} = E\left\{ \begin{bmatrix} x_k^* \\ a_k^* \end{bmatrix} [x_k^T, a_k^T] \right\} = \begin{bmatrix} A & G \\ G^H & \sigma_a^2 I \end{bmatrix}

and

G = E\{ x_k^* a_k^T \}

w = E\{ u_k^* a_{k-d} \} = \begin{bmatrix} b \\ 0 \end{bmatrix} .

Unlike the previous case, matrix U is not guaranteed to be invertible. When
it is invertible, we have the solution

v_0 = U^{-1} w .

Since the gradient of the MSE is

\nabla_v E = 2 (U v - w) = 2 E\{ u_k^* (z_k - a_{k-d}) \}

the stochastic gradient algorithm, in this case, becomes

v^{(k+1)} = v^{(k)} - \alpha e_k u_k^*

with

e_k = z_k - a_{k-d} .

It can be split into two equations related to the taps of the forward and
backward filters, respectively:

c^{(k+1)} = c^{(k)} - \alpha e_k x_k^*
p^{(k+1)} = p^{(k)} - \alpha e_k a_k^*

where \alpha is still the step size.
where α is still the step-size.



6.2.5 Notes on the performance

We saw that it is quite easy to compute the minimum MSE corresponding


to the optimal equalizer configuration. This minimum MSE is not directly
related to the symbol error probability but its use is justified by the fact that
the minimization of the MSE is easy to handle from a mathematical point of
view.
For a given channel, it is possible to evaluate in closed-form the MSE
performance of a linear equalizer having an infinite number of taps (N → ∞)
and compare it with that of a DFE equalizer, still with an infinite length
(N → ∞, N' → ∞). This latter equalizer always turns out to have a minimum
MSE lower than or at most equal to that of the former. An intuitive explanation
is the following: a linear equalizer of infinite length, when the noise tends
to vanish, will try to implement a transfer function which is the inverse of
that of the discrete-time overall channel up to the sampler. If the frequency
response of the channel has a spectral null at some frequency, the equalizer
will try to synthesize a frequency response with a high gain at that frequency,
thus emphasizing the noise at its output. The DFE of infinite length, instead,
will play a role similar to that of the whitening filter. It will not increase
the noise but will only try to equalize the channel phase, making the impulse
response at its output causal, in such a way that the feedback filter can cancel
the interferers.
In the case of equalizers of finite length, the performance depends on
parameters N, N ′ , d, and on the channel characteristics. In this case, gen-
eral results are not available. However, many years of computer simulations
demonstrated that a DFE has typically a better performance than a linear
equalizer for the same number of overall taps.
The investigation we made on the DFE was based on its linearization
under the assumption of an ideal feedback. In typical operating conditions,
at a low symbol error rate, the performance of the real system will be close
to that of the linearized system. A decision error could, however, generate
further errors when passing through the feedback filter. This phenomenon is
called error propagation and can degrade the DFE performance. The analysis
of the error propagation is made difficult by the presence of the non-linear
element (the detector) and requires the use of a Markov chain. Typically, the
error propagation has limited effects that can usually be neglected, especially
at high signal-to-noise ratio values. In any case, error propagation is not
catastrophic, in the sense that every error can produce other errors, but its
effects tend to disappear, bringing the system back to normal operating conditions.

[Figure 6.10: Communication system using a linear equalizer: the symbols a_k
pass through the channel h_k, the noise n_k is added, and the adaptive equalizer
EQ, driven by the error e_k (training or tracking mode), precedes the detector.]

[Figure 6.11: Communication system with channel identification: the received
samples x_k are compared with the output of an identifier filter c_k driven by
the (training or detected) symbols, and the error e_k adjusts the taps.]

6.3 Adaptive channel identification


We now consider the dual problem with respect to adaptive equalization, i.e.,
the channel identification. By observing Fig. 6.10, that represents the scheme
of an adaptive linear equalizer, we can observe that the taps are selected in
such a way that

C(z) \simeq \frac{1}{H(z)}
where H(z) is the channel transfer function and C(z) is the transfer function
of the inverse filter that has to compensate for the channel effects. Hence,
the equalizer task is the identification of the inverse filter C(z).
Let us now consider the block diagram in Fig. 6.11. At the receiver,

we now need to estimate the discrete-time channel impulse response {hk }


under the assumption that a proper detector is available, able to provide
reliable decisions. We thus compare the received signal x_k with the output
of a discrete-time identifier filter with impulse response \{c_k\}_{k=0}^{N-1} and having
the information symbols \{a_k\} at its input. We still use the minimization of the
mean square error as the optimization criterion, so that we obtain

C(z) \simeq H(z) .

Defining the error e_k = x_k - c^T a_k, its mean square value is

E(c) = E\{ |e_k|^2 \} = E\{ |c^T a_k - x_k|^2 \}
     = E\{ |c^T a_k|^2 \} - 2\Re\{ c^H E\{a_k^* x_k\} \} + E\{ |x_k|^2 \}
     = c^H \underbrace{E\{a_k^* a_k^T\}}_{A} c - 2\Re\{ c^H \underbrace{E\{a_k^* x_k\}}_{b} \} + \underbrace{E\{|x_k|^2\}}_{\sigma_x^2}
     = c^H A c - 2\Re\{c^H b\} + \sigma_x^2

having defined

c = [c_0, c_1, \ldots, c_{N-1}]^T , \qquad a_k = [a_k, a_{k-1}, \ldots, a_{k-N+1}]^T .

Assuming that the information symbols have mean zero and are uncorre-
lated, matrix A is diagonal and b contains the samples hk . Hence, we cannot
simply compute the optimal solution as

c0 = A−1 b (6.10)

since in this case, although A is diagonal and perfectly known, b is unknown.


We can thus resort to the gradient algorithm. It is

\nabla_c E = 2 (A c - b)
         = 2 \left( E\{a_k^* a_k^T\} c - E\{a_k^* x_k\} \right)
         = 2 E\{ a_k^* (a_k^T c - x_k) \}
         = -2 E\{ a_k^* e_k \} .

An unbiased estimate of the gradient can be obtained by removing the
expectation:

\widehat{\nabla}_c E = -2 a_k^* e_k

[Figure 6.12: Adaptive MAP sequence detection based on channel identification:
the received samples y_k, delayed by d, are compared with the output of an
identifier filter \hat{f}^{(k)} driven by the preliminary decisions of the VA, and
the error e_k adjusts the taps.]

and, thus, the stochastic gradient algorithm will read

c^{(k+1)} = c^{(k)} - \frac{1}{2} \alpha \left. \nabla_c E \right|_{c = c^{(k)}} = c^{(k)} + \alpha a_k^* e_k .
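A short Python sketch of this recursion follows (all values are ours, purely
illustrative; real BPSK symbols, so the conjugate is trivial): the identifier
taps converge to the unknown channel response, i.e., C(z) ≃ H(z).

import numpy as np

rng = np.random.default_rng(3)
h = np.array([0.8, 0.5, -0.3])         # unknown channel to be identified
a = rng.choice([-1.0, 1.0], size=10_000)
x = np.convolve(a, h)[:a.size] + 0.02 * rng.normal(size=a.size)

N, alpha = 3, 0.05
c = np.zeros(N)
for k in range(N, a.size):
    ak = a[k:k - N:-1]                 # (a_k, ..., a_{k-N+1})
    ek = x[k] - c @ ak                 # e_k = x_k - c^T a_k
    c = c + alpha * ek * ak            # stochastic gradient update
print(c)                               # approaches h = [0.8, 0.5, -0.3]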

This result can be employed to perform adaptive MAP sequence detec-


tion. Let us consider the MAP sequence detection strategy according to the
Forney model, thus with branch metrics
\lambda_k(a_k, \sigma_k) = \left| y_k - \sum_{\ell=0}^{L} f_\ell c_{k-\ell} \right|^2 .

These branch metrics can be evaluated only if the receiver perfectly knows
the channel impulse response \{f_\ell\}_{\ell=0}^{L}. In other words, the receiver has to first
identify the discrete-time equivalent model with white noise of the channel.
Let us consider Fig. 6.12, where the generic discrete-time impulse response
\{h_k\} is substituted by the discrete-time equivalent model with white noise
\{f_\ell\}_{\ell=0}^{L}, and the generic detector is implemented through the Viterbi algo-
rithm. The vector collecting the identified weights will be denoted by \hat{f}^{(k)}
to emphasize that it represents the estimate of the equivalent channel model
\{f_\ell\}. In the scheme of Fig. 6.12 we used preliminary decisions with delay d
and, as a consequence, samples {yk } are also delayed by the same amount
before the comparison with the output of the identifier filter. In addition, we

will assume that the number of weights of this filter is exactly L + 1 which
is the number of samples of the equivalent channel model.
The estimated channel impulse response can be updated by using the
stochastic gradient algorithm as

f̂ (k+1) = f̂ (k) + αek−d a∗k−d .

The presence of the delay d implies that we are estimating an “old” version
of the channel. By increasing d up to the decision delay D of the VA we will
have a better estimate of the channel provided that it is slowly-varying. For
a fast channel, we can reduce d but, in this way, we will have decisions with
a low reliability. We can thus resort to the following techniques:
A. we can use a large delay and predict f (k) based on previous identifica-
tions;

B. we can employ PSP.


In this latter case, we can define an error sequence for each transition as

ek (ak , σk ) = yk − f̂ (k)T (σk )ak (ak , σk )

where f̂ (k) (σk ) is a per-survivor channel estimate. The branch metrics thus
become
λk (ak , σk ) = | ek (ak , σk ) |2
and, by using the VA, we obtain

\Lambda_{k+1}(\sigma_{k+1}) = \min_{\sigma_k} \left[ \Lambda_k(\sigma_k) + \lambda_k(a_k, \sigma_k) \right]   (6.11)
The per-survivor channel estimates can be updated according to the following


iterative procedure

f̂ (k+1) (σk+1 ) = f̂ (k) (σk ) + αa∗k (ak , σk )ek (ak , σk )

for the pairs (ak , σk ) satisfying (6.11) (i.e., along the transitions extending
the survivors).
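A minimal Python sketch of one trellis step of this PSP-based scheme is
reported below (uncoded BPSK, all names and parameters ours, purely
illustrative): each state carries its own channel estimate, updated only along
the transition that extends its survivor.

import numpy as np
import itertools

def psp_step(y_k, metric, f_hat, hist, states, idx, alpha):
    """One step of the VA with per-survivor LMS channel estimation."""
    best = {j: (np.inf, None, None) for j in range(len(states))}
    for i, s in enumerate(states):
        for ak in (-1.0, 1.0):
            a_vec = np.array((ak,) + s)          # (a_k, ..., a_{k-L})
            e = y_k - f_hat[i] @ a_vec           # per-survivor error e_k(a_k, sigma_k)
            lam = abs(e) ** 2                    # branch metric
            j = idx[(ak,) + s[:-1]]              # next state
            if metric[i] + lam < best[j][0]:
                # LMS update along the extending transition only
                best[j] = (metric[i] + lam,
                           f_hat[i] + alpha * np.conj(a_vec) * e,
                           [ak] + hist[i])
    metric = np.array([best[j][0] for j in best])
    f_hat = [best[j][1] for j in best]
    hist = [best[j][2] for j in best]
    return metric, f_hat, hist

# Usage sketch: states = list(itertools.product([-1.0, 1.0], repeat=L)) and
# idx = {s: i for i, s in enumerate(states)}; initialize metric to zeros,
# f_hat to a common initial estimate (e.g., np.zeros(L + 1)) and hist to
# empty lists, then call psp_step for each sample and trace back with delay D.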

6.4 Channel shortening


Another technique for complexity reduction, which is based on the Unger-
boeck metrics, will now be described. It is known under the name of channel
shortening (CS) technique and can be used to reduce the complexity of both
the Viterbi and the BCJR algorithm.

The history of CS receivers starts in the early 1970s with the work of
Falconer and Magee [76], which originated further research on the topic [77,
78, 79, 80, 81, 82]. The work of Rusek and Prlja [83] generalized the previous
works by proposing a technique to design the optimal CS receiver, from an
information theoretic point of view, for a generic linear channel.
In this section, we briefly review the main results of [83] for the special
case of ISI channels. Let us consider the Forney model
y_k = \sum_{\ell=0}^{L} f_\ell x_{k-\ell} + w_k , \qquad k = 0, 1, \ldots, K - 1

where, this time, we denoted by {xk } the transmitted symbols (and not the
samples at the matched filter output, as in Chapter 2). We can collect all
transmitted symbols and received samples into vectors x = (x0 , x1 , . . . , xK−1 )T
and y = (y0 , y1, . . . , yK−1)T , respectively, and write

y = Fx + w

where F is a K \times K Toeplitz matrix whose elements are F_{i,j} = f_{i-j} (with
f_\ell = 0 for \ell < 0 or \ell > L), i.e., the banded lower-triangular matrix

F = \begin{pmatrix}
f_0 & 0 & \cdots & \cdots & \cdots & 0 \\
f_1 & f_0 & 0 & & & \vdots \\
\vdots & \ddots & \ddots & \ddots & & \vdots \\
f_L & \cdots & f_1 & f_0 & 0 & \vdots \\
0 & f_L & \cdots & f_1 & f_0 & 0 \\
0 & 0 & f_L & \cdots & f_1 & f_0
\end{pmatrix}

and w = (w0 , w1 , . . . , wK−1)T . We will use the notation

F = Toeplitz({fℓ })

to specify that matrix F is a Toeplitz matrix obtained from the elements


of the sequence {fℓ }. With this matrix notation, the channel law can be
expressed as
f(y|x) = \left( \frac{1}{2\pi N_0} \right)^K \exp\left\{ -\frac{\| y - F x \|^2}{2 N_0} \right\}
       = \left( \frac{1}{2\pi N_0} \right)^K \exp\left\{ -\frac{y^H y - 2\Re\{y^H F x\} + x^H G x}{2 N_0} \right\}   (6.12)

having defined G = FH F. Vector FH y collects the samples at the MF output


whereas the Toeplitz matrix G has elements Gi,j = gi−j , where {gℓ } is the
discrete-time equivalent model at the MF output, already defined in Chap-
ter 2. To achieve the desired complexity reduction, (6.12) can be replaced
by the mismatched channel law
q(y|x) = \left( \frac{1}{2\pi N_r} \right)^K \exp\left\{ -\frac{y^H y - 2\Re\{y^H F_r x\} + x^H G_r x}{2 N_r} \right\}   (6.13)

where the Toeplitz matrices Fr and Gr , and the mismatched noise density
Nr are subject to optimization.7 By removing the terms irrelevant for the de-
tection process, namely those that do not depend on the transmitted symbols
x, the mismatched (auxiliary) channel law (6.13) can be redefined as
 
q(y|x) = \exp\left\{ 2\Re\{y^H F_r x\} - x^H G_r x \right\}   (6.14)

where, without loss of generality, Nr has been absorbed into the design of Fr
and Gr . We can notice from (6.14) that the need for trellis processing arises
only from the matrix Gr . In other words, when Gr is diagonal, the optimal
receiver for the auxiliary channel becomes a symbol-by-symbol detector. In
order to obtain an optimal receiver with a limited number of states, Gr must
be constrained such that [G_r]_{i,j} = 0 for |i - j| > L_r, where L_r is the desired
length of the resulting shortened channel response. To achieve an effective
complexity reduction, Lr must be selected to be lower than the actual channel
memory L.
In [83], the matrices F_r and G_r are designed to maximize the lower
bound on the achievable information rate of the channel based on mismatched
detection assuming Gaussian inputs. Toeplitz matrices Fr and Gr can be
defined from two discrete sequences {fℓr } and {gℓr }, i.e.,8

Fr = Toeplitz({fℓr })
Gr = Toeplitz({gℓr }) .

Sequence {fℓr } is the impulse response of the CS filter whereas {gℓr } is the
equivalent channel response after the CS filter. They can be obtained through
the following steps [83]. Let F_r(e^{j2\pi fT}) and G_r(e^{j2\pi fT}) be the Fourier trans-
forms of \{f_\ell^r\} and \{g_\ell^r\}, respectively. It can be demonstrated that, for an
ISI channel with impulse response {fℓ } and a receiver trellis characterized by
7
As demonstrated in [84], (6.13) is a valid channel law although it is not necessarily a
valid pdf.
8
Sequence {gℓr } is such that gℓr = 0 for |ℓ| > Lr .

Gr (ej2πf T ), with minf Gr (ej2πf T ) > −1, the optimal CS filter can be obtained
as [83]
F H (ej2πf T )
F r (ej2πf T ) = (Gr (ej2πf T ) + 1) , (6.15)
|F (ej2πf T )|2 + 2N0
where F (ej2πf T ) is the Fourier transform of the actual channel response {fℓ }.
Notably, the filter (6.15) can be seen as the cascade of an MMSE filter (see
Appendix B), that does not depend on the reduced channel memory Lr ,
followed by a filter with transfer function Gr (ej2πf T ) + 1. When the memory
Lr is equal to zero, (6.15) reduces to a classical MMSE filter.
We now have to compute the optimal response Gr (ej2πf T ). This can be
done through the following steps (refer to [83] for details):

A. Compute

   B(e^{j2\pi fT}) = \frac{2 N_0}{|F(e^{j2\pi fT})|^2 + 2 N_0}   (6.16)

   and its inverse Fourier transform \{b_\ell\}_{\ell=-L_r}^{L_r}.
B. Define the vector b = [b1 , . . . , bLr ] and the matrix B as the Toeplitz
matrix of dimension Lr × Lr formed from the vector [b0 , . . . , bLr −1 ] as
 
   B = \begin{pmatrix}
   b_0 & b_1 & \cdots & b_{L_r-1} \\
   b_1 & b_0 & \cdots & b_{L_r-2} \\
   \vdots & \vdots & \ddots & \vdots \\
   b_{L_r-1} & b_{L_r-2} & \cdots & b_0
   \end{pmatrix} .

C. Compute the real-valued scalar c = b_0 - b B^{-1} b^H.

D. Define the vector u = \frac{1}{\sqrt{c}} [1, -b B^{-1}].

E. Finally, compute the optimal reduced channel response \{g_\ell^r\} as

   g_\ell^r = \sum_{i=\max(0,\ell)}^{\min(L_r, L_r+\ell)} u_i u_{i-\ell}^* - \delta_\ell , \qquad \ell = -L_r, \ldots, L_r ,   (6.17)

   where \delta_\ell is the Kronecker delta function, and its Fourier transform
   G_r(e^{j2\pi fT}).

The authors of [83] also provided a closed-form expression for the lower bound
of the achievable information rate of the channel, achievable with the optimal
detector for the considered reduced trellis, under the assumption of Gaussian
distributed input symbols. This lower bound can be computed as

i'(x; y) = \log_2 \frac{1}{\sqrt{c}} .

[Figure 6.13: Block diagram of the receiver using the CS technique: the WMF
output, sampled at t = kT, feeds the CS filter \{f_\ell^r\} and then the VA or
BCJR detector.]
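The following Python sketch (all parameter values are ours, purely
illustrative) implements steps A-E on a dense FFT grid and also evaluates the
CS filter (6.15) and the lower bound above; it assumes a real channel
response, so that the \{b_\ell\} are real and B is symmetric.

import numpy as np
from scipy.linalg import toeplitz

def channel_shortening(f, N0, Lr, nfft=1024):
    F = np.fft.fft(f, nfft)                       # F(e^{j2 pi f T}) on a grid
    B_f = 2 * N0 / (np.abs(F) ** 2 + 2 * N0)      # step A, eq. (6.16)
    b = np.fft.ifft(B_f).real                     # b_l at lags 0, 1, ... (negative lags wrap)
    b_vec = b[1:Lr + 1]                           # [b_1, ..., b_Lr]
    B_mat = toeplitz(b[:Lr])                      # step B
    t = np.linalg.solve(B_mat, b_vec)             # B^{-1} applied to b
    c = b[0] - b_vec @ t                          # step C
    u = np.concatenate(([1.0], -t)) / np.sqrt(c)  # step D
    g = np.correlate(u, u, mode="full")           # step E, eq. (6.17)
    g[Lr] -= 1.0                                  # subtract the Kronecker delta
    # G_r on the grid (g_l placed circularly at lags -Lr..Lr), then (6.15)
    Gr = np.fft.fft(np.roll(np.pad(g, (0, nfft - g.size)), -Lr))
    Fr = np.conj(F) * (Gr + 1) / (np.abs(F) ** 2 + 2 * N0)
    return g, Fr, -0.5 * np.log2(c)               # bound i'(x;y) = log2(1/sqrt(c))

g, Fr, bound = channel_shortening(f=[0.5, 0.5, -0.5, -0.5], N0=0.25, Lr=1)
print(bound)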
Although the theoretical derivation of the CS filters assumed that the in-
put symbols have Gaussian distribution, the performance of the CS technique
is excellent even with discrete input alphabets. To demonstrate this fact, we
show in Fig. 6.14 the achievable information rate for a BPSK modulation
over the channel with response characterized by f_0 = f_1 = 0.5, f_2 = f_3 = -0.5
(L = 3). The figure compares the information rate of the channel, com-
puted by using the method described in Chapter 5, which employs a full-
complexity BCJR detector, working on a trellis with S = 2^L = 8 states,
with the achievable lower bounds obtained by using a CS receiver with re-
duced memory L_r. In other words, the reduced-complexity receiver is made
as in Fig. 6.13 and employs a discrete-time filter with impulse response \{f_\ell^r\}
(the channel shortener) and a Viterbi or a BCJR algorithm with Ungerboeck
metrics optimal for a channel with impulse response \{g_\ell^r\}.
The described procedure to compute the optimal CS filters assumes that
the channel is perfectly known at the receiver. In particular, the channel
response {fℓ } and the noise density N0 are required for the computation
of (6.15). If the channel parameters are unknown, the classical approach is to
estimate them by using the technique described in Section 6.3. Alternatively,
a fully-adaptive CS receiver is described in [85]. Finally, we mention that the
interaction between CS complexity reduction and decision feedback has been
recently addressed in [86, 87].

[Figure 6.14: Achievable information rate i'(x; y) [bits/ch. use] versus Es/N0
[dB] for the CS receiver with increasing complexity (Lr = 0, 1, 2, and full
complexity) on the channel with response [0.5, 0.5, -0.5, -0.5]. A BPSK
modulation has been considered.]

6.5 Exercises
Exercise 6.1 Consider the transmission of independent and uniformly dis-
tributed symbols belonging to a 16-QAM constellation on an ISI channel
with dispersion length L = 2.
• Define the trellis for MAP sequence detection.
• When adopting the RSSD technique, define the state of the possible
reduced trellises.

Exercise 6.2 With reference to Example 6.1, defining

Ω(1) = {C0 , C1, C2 , C3 } (J1 = 4)


Ω(2) = {B0 , B1 } (J2 = 2)
and the reduced state
ωk = (Ik−1 (1), Ik−2(2))
where Ik−1 (1) ∈ Ω(1) and Ik−2 (2) ∈ Ω(2), draw the reduced trellis.

Exercise 6.3 Consider a discrete-time channel with impulse response {hk }.


The received signal is thus

x_k = \sum_n a_n h_{k-n} + n_k .

Symbols {ak } are uncorrelated, have mean zero, and have mean square value
E{|ak |2 } = σa2 . Noise samples have mean zero and autocorrelation sequence
E{nk+ℓ n∗k } = σn2 ρℓ . The input vector is defined as

xk = (xk , xk−1 , . . . , xk−N +1 )T .
• Show that the channel autocorrelation matrix

  A = E\{x_k^* x_k^T\}

  has components

  A_{ij} = \sigma_a^2 \sum_n h_n^* h_{n+i-j} + \sigma_n^2 \rho_{i-j} , \qquad i, j = 0, 1, \ldots, N - 1 .

• Show that the channel vector


b = E{ak−d x∗k }
has components
bi = σa2 h∗d−i , i = 0, 1, . . . , N − 1 .

Exercise 6.4 Consider a baseband transmission of binary independent and


uniformly distributed symbols an ∈ {±1} over an AWGN channel. Transmit
and receive filters have RRC frequency response. The channel has frequency
response
C(f ) = 1 − be−j2πf T , |b| < 1 (b real)
where T denotes the symbol time, and also introduces AWGN with power
spectral density N0 /2. After the front end filter and the sampler, the receiver
employs a linear equalizer with 3 taps and a symbol-by-symbol detector with
threshold zero.

• Compute the mean square value at the equalizer input.

• Compute the equalizer taps when adopting the minimum MSE criterion
assuming a delay d = 0, and the corresponding minimum MSE.

Exercise 6.5 Consider a discrete-time channel with impulse response {hk }.


The received signal is thus

x_k = \sum_n a_n h_{k-n} + n_k .

Symbols {ak } are real, uncorrelated, have mean zero and mean square value
E{a2k } = σa2 . Samples {hk } are all zero except h0 = 1 and h1 = b. Thermal
noise can be considered as negligible.

• Compute the taps of a linear equalizer with N = 1 (2-tap equalizer)


that minimizes the MSE E{(yk − ak )2 }, where yk = c0 xk + c1 xk−1 is
the equalizer output.

• Compute the MSE Elin at the equalizer output and compare it with
the MSE at the equalizer input.

• Compute the equivalent impulse response {qk } at the equalizer output.

• Compute the taps of a DFE equalizer with output yk = c0 xk + c1 âk−1


under the assumption of ideal feedback.

• Compute the minimum MSE at its output and compare it with Elin .
Chapter 7

Turbo codes and iterative decoding

7.1 Turbo codes


After the rise of information theory, it was immediately clear that
increasing the codeword length, and thus, in some way, the encoder
complexity, would allow a better performance to be obtained. On the other
hand, optimal decoding of more complex codes is itself increasingly complex.
For this reason, scientists tried to find complex codes whose decoding could
be implemented in a simple, although suboptimal, way.
In his Ph.D. thesis [88], Forney proposed concatenated codes as a possible way to obtain complex codes that can be decoded in a simple way. Turbo codes, proposed in 1993 by Berrou, Glavieux, and Thitimajshima [89, 90], represent their evolution. With the introduction of turbo codes, it has been demonstrated that reliable transmissions at a rate close to the Shannon limit are practically possible. As we will see, the adjective "turbo", although used to qualify these codes, is rather related to the corresponding decoding, whose principle is similar to that of a turbo engine.
In order to describe turbo codes, we will make reference to their first example in the literature. A sequence of information bits is first encoded through a simple binary recursive systematic convolutional (RSC) encoder with rate 1/2 to produce a sequence of parity bits (in addition to the input sequence, the encoder being systematic). The same sequence of information bits is then permuted through a very long interleaver¹ and then encoded

¹In the figures, the interleaver will be represented by using the symbol Π. The inverse block, that restores the permuted sequence to its original order (the deinterleaver), will be represented by using the symbol Π⁻¹.

again with a second RSC encoder with rate 1/2 to obtain a second sequence of parity bits. The original information sequence and the parity-check sequences are then transmitted, as shown in Fig. 7.1. We can observe that the rate of the overall encoder is 1/3. A higher rate can be obtained through puncturing, i.e., by transmitting fewer parity bits. As an example, in the case of the original turbo code in [89, 90], a rate 1/2 is obtained by transmitting only odd bits of the first parity-check sequence and only even bits of the second one.

Figure 7.1: Encoder for a turbo code with rate 1/3.

The impressive performance of turbo codes is related to a specific property [91]. Let us consider a turbo code as a block code whose input sequence has a length given by the length of the interleaver and whose convolutional encoders are initialized before the arrival of an input sequence by resetting their memory elements. As for usual block codes, the asymptotic performance depends on the codewords having minimum Hamming weight and on their multiplicity. However, codewords with larger Hamming weight may have an influence on the performance for lower values of the signal-to-noise ratio, especially if they have a very large multiplicity. Before the advent of turbo codes, code design had been guided by the attempt to increase the minimum Hamming weight in order to improve the asymptotic performance. As Forney said in one of his lectures, "turbo codes, rather than attacking the minimum distance, attack multiplicities". In fact, by properly shaping the distance spectrum, it is possible to obtain a bit-error probability that decreases in a very steep way for low-medium signal-to-noise ratio values (the so-called waterfall region) and then, for higher values of the signal-to-noise ratio (and typically for bit-error probability values below 10⁻⁵), where the performance is governed by the minimum distance, starts decreasing in a very slow way (the so-called error floor, although the term floor is improper since there is no irreducible floor in the performance).
The two encoders in the scheme of Fig. 7.1 are called component encoders and are typically identical. As said, in the turbo code that first appeared in the literature, two RSC encoders were employed as component encoders. It was understood later that the systematic nature is not necessary, although it simplifies the decoder implementation [91]. On the contrary, it is fundamental to adopt recursive component encoders in order to obtain an interleaver gain, i.e., a turbo code whose performance improves when increasing the interleaver length.
Recursive codes are such that the code bits at time k not only depend on the information bits at the same instant and at the previous ν instants, where ν is the code constraint length, but on all previous bits, since the encoder has a structure with feedback connections. Starting from a non-recursive non-systematic convolutional encoder with rate 1/n, it is possible to obtain in a very simple way an RSC encoder with the same rate and characterized by the same codewords, and thus with the same minimum Hamming distance d_{H,min}. Obviously, for a given input sequence, the corresponding codeword will be different in the two cases. As an example, let us consider a non-recursive non-systematic convolutional encoder with rate 1/2. The two coded bits at time k can be expressed as

    c_k^{(1)} = \Big[ \sum_{i=0}^{\nu} g_i^{(1)} a_{k-i} \Big] \bmod 2    (7.1)
    c_k^{(2)} = \Big[ \sum_{i=0}^{\nu} g_i^{(2)} a_{k-i} \Big] \bmod 2    (7.2)

where the sums are modulo 2. The corresponding RSC encoder can be obtained by placing, at the input of the encoder shift register, having ν memory elements, not the information bit a_k but an auxiliary bit w_k. The coded bits can be expressed as a function of the information and auxiliary bits as

    c_k^{(1)} = a_k    (7.3)
    c_k^{(2)} = \Big[ \sum_{i=0}^{\nu} g_i^{(2)} w_{k-i} \Big] \bmod 2    (7.4)

whereas the recursive equation that expresses the sequence of auxiliary bits as a function of the information bits is

    w_k = \Big[ a_k + \sum_{i=1}^{\nu} g_i^{(1)} w_{k-i} \Big] \bmod 2 .    (7.5)

Another RSC encoder with the same minimum Hamming distance can be obtained by exchanging the roles of the coefficients g_i^{(1)} and g_i^{(2)}, i.e., by employing the coefficients g_i^{(2)} in the update of the auxiliary bits and the coefficients g_i^{(1)} to compute the sequence of non-systematic coded bits.
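As an illustration, the following Python sketch (our own, not taken from a specific standard code) implements the recursion (7.5) together with the outputs (7.3) and (7.4); the generator coefficient lists g1 and g2 are placeholders to be chosen by the user.

    def rsc_encode(a, g1, g2):
        """Rate-1/2 RSC encoder: return (systematic, parity) sequences.
        g1 = [g1_0, ..., g1_nu] drives the feedback (7.5), g2 the parity (7.4)."""
        nu = len(g1) - 1              # encoder memory
        w = [0] * nu                  # shift register: w_{k-1}, ..., w_{k-nu}
        c1, c2 = [], []
        for ak in a:
            # eq. (7.5): w_k = [a_k + sum_{i=1}^{nu} g1_i w_{k-i}] mod 2
            wk = (ak + sum(g1[i] * w[i - 1] for i in range(1, nu + 1))) % 2
            reg = [wk] + w            # w_k, w_{k-1}, ..., w_{k-nu}
            c1.append(ak)             # eq. (7.3): systematic bit
            c2.append(sum(g2[i] * reg[i] for i in range(nu + 1)) % 2)  # eq. (7.4)
            w = reg[:-1]              # shift the register
        return c1, c2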
Figure 7.2: The 16-state component encoder of the turbo code proposed in [89, 90].

The 16-state convolutional encoder with rate 1/2, employed in the original turbo code in [89, 90], is shown in Fig. 7.2, whereas Fig. 7.3 reports the 8-state encoder with rate 1/2 employed as component encoder of the turbo code in the UMTS (universal mobile telecommunications service) standard. The use of turbo codes is also considered in many other standards, such as the DVB (digital video broadcasting) and CCSDS (consultative committee for space data systems) standards. For the choice of good component codes, the reader can refer to [92].
Another fundamental component in the turbo code structure is the interleaver, which has to be non-uniform.² In practice, a non-uniform interleaver performs a random permutation. Hence, a pair of adjacent bits in the input sequence is separated, after the permutation, by a number of bits which is not always the same but depends on the position of the considered pair. The minimum distance of the turbo code depends on the interleaver.

²A uniform interleaver operates by writing the input bits in a matrix row by row and outputs them by reading column by column.

Figure 7.3: The 8-state component encoder of the turbo code of the UMTS standard.

Thus, the interleaver influences the asymptotic performance. However, its choice is not critical for low-medium values of the signal-to-noise ratio. Starting from the original non-uniform interleaver proposed in [89, 90], many other interleavers have been proposed in the literature. One of them assuring an excellent performance is the spread interleaver. Let us consider a block of M input bits. The integer numbers specifying the positions of the input bits after the permutation are randomly generated with the following constraint. Every integer is randomly generated and compared with the last S1 previously generated integers. If the distance from any of them is lower than a threshold S2, the generated integer is discarded and randomly generated again until the condition is satisfied. Parameters S1 and S2 must be larger than the memory of the component encoders. When they are identical, it is convenient to choose S1 = S2. Obviously, the time required to generate the interleaver (which is generated when designing the code and clearly known by both encoder and decoder) will increase with S1 and S2, and there is no guarantee that the generation can be accomplished successfully. It has been verified empirically that, by choosing

    S_1 = S_2 \simeq \sqrt{M/2}

it is possible to generate the interleaver in a reasonable amount of time.
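A minimal Python sketch of this generation procedure follows; the acceptance rule is the one described above, while the retry limit is our own practical addition, motivated by the fact that the generation is not guaranteed to succeed.

    import random

    def spread_interleaver(M, S1, S2, max_tries=1000000):
        """Randomly generate a spread interleaver (a permutation of 0..M-1)."""
        perm, tries = [], 0
        while len(perm) < M:
            candidate = random.randrange(M)
            # accept only integers at distance >= S2 from the last S1 accepted ones
            if candidate not in perm and \
               all(abs(candidate - p) >= S2 for p in perm[-S1:]):
                perm.append(candidate)
            tries += 1
            if tries > max_tries:     # give up and restart the whole generation
                perm, tries = [], 0
        return perm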
Many variations of the original idea behind turbo codes have been introduced later. As an example, Benedetto, Divsalar, Montorsi, and Pollara proposed the serial concatenation of convolutional codes [93], still through a non-uniform interleaver (see Fig. 7.4). Serially concatenated schemes have some interesting advantages with respect to turbo codes. In fact, for them the interleaver gain is larger, i.e., the bit-error probability goes as M⁻³, where M is the interleaver length, whereas for turbo codes it goes as M⁻¹ [91]. Serially and parallelly concatenated schemes have then been extended to larger constellations to obtain coded modulations with high spectral efficiency.

Figure 7.4: Serial concatenation.

7.2 Iterative decoding

Figure 7.5: Decoder for a turbo code with rate 1/3.

The presence of the interleaver in the scheme of Fig. 7.1 makes the memory of the turbo code very large, despite the use of simple component encoders. As a consequence, the optimal MAP sequence (or symbol) decoder would be characterized by a huge number of states and thus would be unfeasible. For this reason, it has been proposed to resort to a suboptimal iterative scheme whose complexity is much lower than that of the optimal decoder but, as empirically verified, whose performance is very close to that of the optimal one. The decoder for the turbo code with rate 1/3 of Fig. 7.1 is shown in Fig. 7.5. As said, its operating principle is similar to that of the turbo engine. The received samples corresponding to the information sequence and the first parity-check sequence are first decoded through a BCJR algorithm (or any other SISO decoder) matched to the first convolutional encoder. This decoder thus provides a sequence of soft decisions for every bit of the information sequence. These soft decisions are then interleaved and employed by a second BCJR algorithm designed for the second component encoder, in addition to the received samples corresponding to the second parity-check sequence. The soft decisions produced by the second decoder are then deinterleaved and fed back to the first component decoder for the next iteration, where the additional information provided by the second decoder is employed by the first decoder to produce more reliable soft decisions. This iterative process proceeds for 10 to 20 iterations, after which the final decisions are taken.
Both component decoders in the scheme of Fig. 7.5 are, as said, soft-output decoders. Hence, they additionally provide information about the reliability of their decisions. The basic principle of iterative decoding is, in fact, the following: each component decoder employs the "suggestions" of the other decoder to provide more and more reliable decisions. Let us see in detail how the reliability information is produced and employed.
In order to speed up the convergence of the iterative process, each decoder has to receive at its input information that it did not produce. For this reason, in [89, 90] the concept of extrinsic information has been introduced, to identify the part of the reliability information produced by a decoder that does not depend on the information that the decoder itself has received (see also [94]). Let us consider the generic component decoder, and let us assume that it operates according to the BCJR algorithm.³ Let us denote by ℓ_k^{e,in} the extrinsic information at its input and by ℓ_k^{e,out} that at its output, with reference to the information bit a_k. We used the same symbol already used in (5.8) to denote the log-likelihood ratio since every decoder indeed provides the log-likelihood ratio related to the information bit a_k, except for the subtraction needed to compute the extrinsic information. The extrinsic information at the input of the other decoder is then used as an estimate of the a-priori probability of the information bits. In other words, the other decoder assumes that

    \ell_k^{e,in} \simeq \ln \frac{P(a_k = 0)}{P(a_k = 1)}    (7.6)

³Similar considerations hold in case we employ the SOVA.

Figure 7.6: Performance of the turbo code proposed in [89, 90] (BER versus Eb/N0; curves for 1, 2, 3, 6, and 18 iterations and for uncoded transmission).

and thus

    P(a_k = 0) \simeq \frac{\exp\{\ell_k^{e,in}\}}{1 + \exp\{\ell_k^{e,in}\}}    (7.7)

    P(a_k = 1) = 1 - P(a_k = 0) \simeq \frac{1}{1 + \exp\{\ell_k^{e,in}\}} .    (7.8)

These estimates have to be considered as a suggestion that a decoder receives from the other one which, having already processed the channel output, is able to judge which bits have been transmitted more reliably. They are employed as a-priori probabilities in the recursions (5.6) and (5.7).
In order to understand the relationship between the extrinsic information at the input and that at the output, let us consider the soft output ℓ_k related to the information bit a_k as produced by the BCJR algorithm. It can be expressed as

Figure 7.7: Performance of the UMTS turbo code with an interleaver of length 1024 (BER versus Eb/N0; curves for 1, 3, 6, and 12 iterations).

    \ell_k = \ln \frac{P(a_k = 0 | \mathbf{y})}{P(a_k = 1 | \mathbf{y})} = \ln \frac{f(\mathbf{y} | a_k = 0) P(a_k = 0)}{f(\mathbf{y} | a_k = 1) P(a_k = 1)} = \ln \frac{f(\mathbf{y} | a_k = 0)}{f(\mathbf{y} | a_k = 1)} + \ell_k^{e,in} .    (7.9)

The first term on the right-hand side of (7.9) is the output extrinsic information, which can thus be obtained as

    \ell_k^{e,out} = \ell_k - \ell_k^{e,in} .    (7.10)
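As a simple illustration of this exchange, the following Python sketch (function names are ours) converts the incoming extrinsic information into the a-priori estimates (7.7)-(7.8) and computes the outgoing extrinsic information according to (7.10).

    import math

    def apriori_from_extrinsic(l_e_in):
        """A-priori estimates P(a_k = 0) and P(a_k = 1) from the input
        extrinsic LLR, eqs. (7.7)-(7.8), in a numerically stable form."""
        p0 = 1.0 / (1.0 + math.exp(-l_e_in))
        return p0, 1.0 - p0

    def extrinsic_out(l_k, l_e_in):
        """Eq. (7.10): remove the input extrinsic information from the
        a-posteriori LLR l_k produced by the BCJR algorithm."""
        return l_k - l_e_in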

As mentioned, the iterative decoding process converges very quickly. This can be observed in Figs. 7.6 and 7.7, where the performance, expressed in terms of BER as a function of the signal-to-noise ratio, is reported for the original turbo code with rate 1/2 proposed in [89, 90] and for the turbo code with rate 1/2 of the UMTS standard, respectively. In the former case, the component encoder is that shown in Fig. 7.2, the interleaver has length 65536 bits, and the number of performed iterations is 1, 2, 3, 6, and 18. In the latter case, the component encoder is that shown in Fig. 7.3, the interleaver has length 1024 bits, and the number of performed iterations is 1, 3, 6, and 12.
The iterative decoding principle [95] introduced for turbo codes has then been extended to the decoding of serially concatenated codes and to many scenarios other than decoding. In fact, it can be employed every time the transmission system includes multiple sources of memory separated by an interleaver. In the case of turbo codes, the sources of memory are the component encoders, but we can employ the turbo principle on ISI channels (the so-called turbo equalization) [96, 97, 98], on noncoherent channels [99], on fading channels [100], and many others.

7.3 EXIT chart analysis

The pragmatic capacity described in Section 5.6 can be extended to study the exchange of information in the iterative detection/decoding process and to predict the waterfall region. The resulting technique, called extrinsic information transfer chart (EXIT chart), is a valuable tool to better understand the convergence behavior of iterative detection/decoding schemes without resorting to long bit-error rate simulations. While good analytical bounding techniques have been found for moderate to high signal-to-noise ratio values [91, 93], the waterfall region is much more difficult to predict analytically, although much more interesting since, typically, the operating conditions lie in this region.
EXIT chart analysis, originally introduced by S. ten Brink [101], is a tool
based on computer simulations to describe the flow of the extrinsic infor-
mation through the soft-in/soft-out component decoders. Simulation results
suggest that the EXIT chart accurately predicts the convergence behavior of
the iterative decoder for large interleaving depth.
We will describe this technique with reference to turbo codes transmitted over a baseband AWGN channel with transmitted symbols belonging to the alphabet {±1}, although it can be easily extended to serially concatenated codes, to another class of powerful codes that will be described in Chapter 8 [102], and to any other scheme based on iterative detection and decoding. Before going into the details, we will first introduce a turbo decoding scheme equivalent to that in Fig. 7.5 and a slightly different notation. This scheme is shown in Fig. 7.8. First of all, we omitted the dependence on the discrete-time index k to simplify the notation. The scheme is now symmetric, in the sense that the second component decoder also receives the interleaved received samples {y_k^{(1)}} related to the information bits {a_k}.

Figure 7.8: Equivalent turbo decoder.

This has one main implication, as we will see in the following: the contribution of the received samples {y_k^{(1)}} has to be removed from the extrinsic information to avoid that each component decoder processes this contribution twice. We denoted by

    A_i = \ln \frac{P(a = 0)}{P(a = 1)} ,   D_i = \ln \frac{P(a = 0 | \mathbf{y})}{P(a = 1 | \mathbf{y})} ,   i = 1, 2

the input and output LLRs related to decoder i and by E_i the extrinsic information. Clearly A_2 = E_1 (A_1 = E_2) after proper interleaving (deinterleaving), since the extrinsic information on the systematic bits E_1 (E_2) is passed through the interleaver (deinterleaver) to become the a-priori input A_2 (A_1) of the second (first) decoder. We will also define

    Y = \ln \frac{f(y^{(1)} | a = 0)}{f(y^{(1)} | a = 1)} .

We said that we are considering a baseband transmission over an AWGN channel and that the transmitted symbols belong to the alphabet {±1}. It is thus

    y^{(1)} = x + w

where we will assume that x = 1 when a = 0 and x = -1 when a = 1, and w is a Gaussian random variable with mean zero and variance σ². Hence

 
    f(y^{(1)} | a) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{ -\frac{1}{2\sigma^2} \big( y^{(1)} - x \big)^2 \Big\}

and

    Y = \frac{2}{\sigma^2} y^{(1)} .    (7.11)

The extrinsic information can thus be computed as

    E_1 = D_1 - Y - A_1
    E_2 = D_2 - Y - A_2 .    (7.12)

Let us come back to eqn. (7.11). We can write

    Y = \frac{2}{\sigma^2} y^{(1)} = \frac{2}{\sigma^2} (x + w) = \mu_Y x + n_Y

with \mu_Y = 2/\sigma^2 and \sigma_Y^2 = E\{n_Y^2\} = 4/\sigma^2. Thus, the mean and variance of Y are connected by

    \mu_Y = \frac{\sigma_Y^2}{2} .
This relationship will turn out to be useful for modeling the a-priori knowledge later.
The parallel decoder of Fig. 7.8 has a symmetric arrangement. Thus, the situation for the second decoder with respect to A_2, D_2, and E_2 is essentially the same as for A_1, D_1, and E_1. Long sequence lengths ensure that tail effects (open/terminated trellises of convolutional codes) can be neglected. Hence, it is sufficient to focus on the first decoder for the remainder. To simplify notation, the decoder index "1" is omitted in the following. We will try to predict the behavior of the iterative decoder by looking at the input/output relations of the individual component decoders. Since an analytical treatment of the BCJR decoder is difficult, we will employ the following assumptions:

• For large interleavers, the a-priori values remain fairly uncorrelated from the respective channel observations over many iterations.

• The probability density functions of the extrinsic output values (which become the a-priori values for the next decoder) approach Gaussian-like distributions with an increasing number of iterations. A reason for this Gaussian-like behavior is that sums over many values are involved in the computation of the extrinsic information E, which typically leads to Gaussian-like distributions.

These two assumptions suggest that we can model the a-priori input to the constituent decoder by applying an independent Gaussian random variable with variance σ_A² and mean zero in conjunction with the known transmitted systematic bits, i.e., as

    A = \mu_A x + n_A .

Since A is supposed to be an LLR value based on a Gaussian distribution, as in the case of Y the mean value must fulfill the condition

    \mu_A = \frac{\sigma_A^2}{2} .
We can now measure the information content of the a-priori knowledge by computing the mutual information I_A = I(X; A) between the transmitted systematic bits and the LLR values A:

    I_A = \sum_{x = -1, 1} P(x) \int_{-\infty}^{\infty} f(A = \xi | x) \log_2 \frac{f(A = \xi | x)}{f(A = \xi)} \, d\xi
        = \sum_{x = -1, 1} P(x) \int_{-\infty}^{\infty} f(A = \xi | x) \log_2 \frac{2 f(A = \xi | x)}{f(A = \xi | x = 1) + f(A = \xi | x = -1)} \, d\xi

where

    f(A = \xi | x) = \frac{1}{\sqrt{2\pi\sigma_A^2}} \exp\Big\{ -\frac{1}{2\sigma_A^2} \Big( \xi - \frac{\sigma_A^2}{2} x \Big)^2 \Big\} .
It is thus

    I_A = \frac{1}{2} \sum_{x = -1, 1} \int_{-\infty}^{\infty} f(A = \xi | x) \log_2 \frac{2 e^{\frac{\xi}{2} x}}{e^{-\frac{\xi}{2}} + e^{\frac{\xi}{2}}} \, d\xi
        = \frac{1}{2} \int_{-\infty}^{\infty} f(A = \xi | x = 1) \log_2 \frac{2 e^{\frac{\xi}{2}}}{e^{-\frac{\xi}{2}} + e^{\frac{\xi}{2}}} \, d\xi
          + \frac{1}{2} \int_{-\infty}^{\infty} f(A = \xi | x = -1) \log_2 \frac{2 e^{-\frac{\xi}{2}}}{e^{-\frac{\xi}{2}} + e^{\frac{\xi}{2}}} \, d\xi
        = 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma_A^2}} \exp\Big\{ -\frac{1}{2\sigma_A^2} \Big( \xi - \frac{\sigma_A^2}{2} \Big)^2 \Big\} \log_2 \big( 1 + e^{-\xi} \big) \, d\xi .
The last result is a function of σ_A which is not available in closed form but can be numerically computed. It will be denoted as J(σ_A). It holds that

    \lim_{\sigma_A \to 0} J(\sigma_A) = 0
    \lim_{\sigma_A \to \infty} J(\sigma_A) = 1 .

In addition, this function is monotonically increasing [51] and thus invertible. Mutual information can also be used to quantify the extrinsic output, I_E = I(X; E). We thus have a technique to compute, by simulation, the mutual information I_E as a function of I_A and of the value of E_b/N_0 under consideration, through the following steps:

1. Given the value of I_A, we compute the relevant value σ_A = J⁻¹(I_A).

2. We then generate a long sequence of N input symbols {a_k}⁴ (and the corresponding systematic channel inputs {x_k}), the corresponding code sequence {c_k^{(2)}}, and the channel outputs {y_k^{(1)}} and {y_k^{(2)}} for the signal-to-noise ratio value E_b/N_0 under consideration.

3. We can generate the a-priori input knowledge {A_k} according to the model

    A_k = \frac{\sigma_A^2}{2} x_k + n_{A,k}

where {n_{A,k}} are independent and identically distributed Gaussian random variables with mean zero and variance σ_A².

4. We run the BCJR algorithm with input samples {y_k^{(1)}} and {y_k^{(2)}} and a-priori knowledge {A_k}, and compute the corresponding LLRs E_k that are used to compute I_E through (5.17), which we report for convenience:

    I_E \simeq 1 - \frac{1}{N} \sum_{k=0}^{N-1} f\big( E_k (1 - 2 a_k) \big)

with f(x) = \log_2(1 + e^{-x}).

⁴Sequence lengths of 10⁴ systematic bits can be sufficient.
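A minimal numerical sketch of the quantities involved in these steps is reported below; the Monte Carlo evaluation of J(σ_A), the bisection-based inverse, and all function names are our own choices.

    import numpy as np

    def J(sigma_a, n=200000, seed=0):
        """J(sigma_A): conditioned on x = +1, A is Gaussian with mean
        sigma_A^2/2 and variance sigma_A^2."""
        if sigma_a <= 0.0:
            return 0.0
        rng = np.random.default_rng(seed)
        xi = sigma_a**2 / 2.0 + sigma_a * rng.standard_normal(n)
        # I_A = 1 - E{log2(1 + e^{-xi})}, with a numerically stable log
        return 1.0 - np.mean(np.logaddexp(0.0, -xi)) / np.log(2.0)

    def J_inv(i_target, lo=1e-3, hi=20.0, tol=1e-4):
        """Invert J by bisection (J is monotonically increasing)."""
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if J(mid) < i_target else (lo, mid)
        return 0.5 * (lo + hi)

    def I_E_estimate(E, a):
        """Step 4 / eq. (5.17): E are the extrinsic LLRs, a the true bits."""
        s = np.asarray(E) * (1 - 2 * np.asarray(a))
        return 1.0 - np.mean(np.logaddexp(0.0, -s)) / np.log(2.0)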

Once we have computed the function I_E = T(I_A, E_b/N_0), we are ready to study the convergence behavior of the turbo decoder. Both decoder characteristics are plotted in a single diagram. However, for the transfer characteristic of the second decoder the axes are swapped. This diagram is referred to as an EXIT chart since the exchange of extrinsic information can be visualized as a decoding trajectory. An example is given in Fig. 7.9, where two values of the signal-to-noise ratio, namely 0.1 dB and 0.8 dB, are considered for a specific component code. The trajectory is a simulation result taken from the iterative decoder.

Figure 7.9: Example of EXIT chart for different values of Eb/N0. The output I_{E1} of the first decoder becomes the input I_{A2} of the second decoder (vertical axis), and the output I_{E2} of the second decoder becomes the input I_{A1} of the first decoder (horizontal axis).

For 0.1 dB the trajectory (lower left corner) gets stuck after two iterations since the two decoder characteristics intersect. For 0.8 dB the trajectory is able to pass through the bottleneck. After six passes through the decoder, increasing correlations of the extrinsic information start to show up and let the trajectory deviate from its expected zigzag path. For larger interleavers the trajectory stays on the characteristics for some more passes through the decoder.
EXIT charts are very useful to predict the waterfall region. They can also be used to design the component codes, by choosing those with the lowest waterfall threshold. The main advantage of the EXIT chart for the understanding of iterative decoding is that only simulations of the individual constituent decoders are needed to obtain the desired transfer characteristics. These can then be used in any combination in the EXIT chart to describe the behavior of the corresponding iterative decoder, asymptotically with respect to the interleaver size. No resource-intensive bit-error rate simulations of the iterative decoding scheme itself are required.

7.4 Exercises

Exercise 7.1 For the component encoders in Figs. 7.2 and 7.3, define the encoder state µ_k and plot the corresponding trellis diagram. Finally, consider the trellis branch associated with the pair (a_k, µ_k) and find the corresponding pair (a^-_{k-J}, µ^+_{k+1}).

Exercise 7.2 Write a computer program to randomly generate, given the


parameters M, S1 and S2 , a spread interleaver.

Exercise 7.3 Draw the block diagram of the iterative decoder corresponding to the serially concatenated scheme in Fig. 7.4. Show that, at least for one of the two component decoders, it is required to modify the BCJR algorithm in such a way that it provides, in addition to the a-posteriori probabilities of the information symbols, also those of the code symbols. Suggest how to modify the BCJR algorithm in this sense.
Chapter 8

Factor graphs and the sum-product algorithm

8.1 Introduction
Chapter 9

Codes for fading channels

9.1 TCM for fading channels


Since the invention of TCM, it has been universally accepted that modulation
and coding must be combined into a single entity to improve the performance.
The same paradigm has been thus applied to the case of transmissions over
fading channels where, as shown in the following, the performance depends
no more on the Euclidean distance between codewords.
Let us consider, as a possible TCM transmission scheme for a fading
channel, that shown in Fig. 9.1. The TCM encoder is modeled as the cascade
of a binary encoder (of rate k/m in the figure) plus a modulator (or mapper)
(1) (m)
that univocally associates the m-tuple (cℓ , . . . , cℓ ) with an M-ary, with
M = 2m , symbol cℓ of the adopted complex constellation A. Information
(1) (k)
bits are denoted in the following as (aℓ , . . . , aℓ ). With respect to the case
of a transmission over the AWGN channel, coded symbols are interleaved
before transmission (the interleaver is denoted by Π in the figure). This
interleaver, useless on the AWGN channel, is instead very important on a
channel with correlated fading to spread adjacent received samples affected
by deep fade, and thus to help the decoder.
Let {c_ℓ} denote the sequence of coded symbols after interleaving and adopt the following assumptions:

• one sample per symbol is sufficient for optimal decoding; we will denote by {r_ℓ} the received sample sequence and, after deinterleaving, the corresponding deinterleaved sequence;

• the channel model is

    r_\ell = \sqrt{\gamma} h_\ell c_\ell + n_\ell ,   \ell = 0, 1, \ldots, N-1    (9.1)


Figure 9.1: Possible transmission scheme adopting a coded modulation over a fading channel.

where the transmitted symbols are normalized such that E{|c_ℓ|²} = 1, h = {h_ℓ}_{ℓ=0}^{N-1} is a zero-mean discrete-time complex Gaussian process with autocorrelation function R_h(i) = E{h_{ℓ+i} h_ℓ^*}, normalized such that R_h(0) = E{|h_ℓ|²} = 1, {n_ℓ} is a complex discrete-time white Gaussian noise process with independent real and imaginary components, properly normalized such that E{|n_ℓ|²} = 1, γ = E_S/N_0 is the average received signal-to-noise ratio, and N is the codeword length. The channel is characterized by the conditional probability density function (pdf) f(r|c, h) and by the joint a-priori pdf f(h). With the above assumptions we have that

    f(\mathbf{r} | \mathbf{c}, \mathbf{h}) = \prod_{\ell=0}^{N-1} f(r_\ell | c_\ell, h_\ell) = \prod_{\ell=0}^{N-1} \frac{1}{\pi} \exp\big\{ - | r_\ell - \sqrt{\gamma} h_\ell c_\ell |^2 \big\}    (9.2)

whereas we will assume that the channel amplitude has a Rayleigh distribution. Defining ρ_ℓ = |h_ℓ|², this random variable has the exponential distribution

    f(\rho) = \begin{cases} e^{-\rho} & \text{for } \rho \ge 0 \\ 0 & \text{for } \rho < 0 \end{cases} ;    (9.3)

• the receiver perfectly knows the channel, i.e., it is equipped with an ideal estimator of the gains {h_ℓ};

• the interleaver is ideal. The fading samples, correlated before the deinterleaver, become independent after it. According to the notation adopted for code symbols and received samples, we will denote by h = {h_ℓ} the sequence of fading gains after deinterleaving.

Similarly to (9.2), it is

    f(\mathbf{r} | \mathbf{c}, \mathbf{h}) = \prod_{\ell=0}^{N-1} f(r_\ell | c_\ell, h_\ell) = \prod_{\ell=0}^{N-1} \frac{1}{\pi} \exp\big\{ - | r_\ell - \sqrt{\gamma} h_\ell c_\ell |^2 \big\} .    (9.4)
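As an illustration, a minimal Python sketch of the channel model (9.1)-(9.3) under ideal interleaving, i.e., with independent fading samples as in (9.4), is the following (all names are ours).

    import numpy as np

    def rayleigh_fading_channel(c, gamma, rng=None):
        """c: unit-energy complex symbols; gamma: average SNR Es/N0."""
        rng = rng or np.random.default_rng()
        N = len(c)
        # zero-mean complex Gaussian fading with E{|h|^2} = 1, so that
        # rho = |h|^2 is exponentially distributed as in (9.3)
        h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
        # complex AWGN normalized such that E{|n|^2} = 1
        n = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
        return np.sqrt(gamma) * h * np.asarray(c) + n, h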

Under these assumptions, let us compute the pairwise error probability. From (9.4) it follows that a maximum likelihood decoder that perfectly knows the fading gains will operate on the code trellis with the metric to be maximized equal to

    \Lambda(\mathbf{c}) = \ln f(\mathbf{r} | \mathbf{c}, \mathbf{h}) = \sum_{\ell=0}^{N-1} \ln f(r_\ell | c_\ell, h_\ell) .    (9.5)

It can thus employ the Viterbi algorithm with branch metrics

    \lambda_\ell = \ln f(r_\ell | c_\ell, h_\ell) \propto - | r_\ell - \sqrt{\gamma} h_\ell c_\ell |^2 .    (9.6)

Let us now consider two codewords c and ĉ stemming from the same state and merging after a given number of trellis steps. Thus, given a particular channel realization, the pairwise error probability, which represents the probability that the decoder chooses the sequence ĉ when c is the transmitted sequence and ĉ and c are the only two possible decoding outcomes, can be expressed as¹

    \Pr\{\Lambda(\hat{\mathbf{c}}) > \Lambda(\mathbf{c}) | \mathbf{c}, \mathbf{h}\} = Q\Big( \frac{d(\mathbf{s}, \hat{\mathbf{s}})}{\sqrt{2}} \Big) \le \frac{1}{2} \exp\Big\{ -\frac{d^2(\mathbf{s}, \hat{\mathbf{s}})}{4} \Big\}
      = \frac{1}{2} \exp\Big\{ -\frac{\gamma}{4} \sum_{\ell \in I} |h_\ell|^2 |\hat{c}_\ell - c_\ell|^2 \Big\}
      = \frac{1}{2} \prod_{\ell \in I} \exp\Big\{ -\frac{\gamma}{4} \rho_\ell |\hat{c}_\ell - c_\ell|^2 \Big\}

where d(s, ŝ) is the Euclidean distance between the discrete-time signals s and ŝ corresponding to the vectors c and ĉ, respectively, i.e., s_ℓ = \sqrt{\gamma} h_ℓ c_ℓ and ŝ_ℓ = \sqrt{\gamma} h_ℓ ĉ_ℓ, Λ(c) and Λ(ĉ) are the path metrics corresponding to c and ĉ, respectively, and I is the set of all ℓ such that |ĉ_ℓ - c_ℓ| ≠ 0.

¹The following inequality has been used:

    Q(x) \le \frac{1}{2} e^{-x^2/2} ,   x \ge 0 .

Tighter upper bounds could be derived as described in [103, 104].

By using (9.3) and the independence of the random variables {ρ_ℓ}, the average over the channel realizations is easily computed, obtaining the average pairwise error probability

    \Pr\{\Lambda(\hat{\mathbf{c}}) > \Lambda(\mathbf{c}) | \mathbf{c}\} = E_{\mathbf{h}} \big\{ \Pr\{\Lambda(\hat{\mathbf{c}}) > \Lambda(\mathbf{c}) | \mathbf{c}, \mathbf{h}\} \big\}

that, for simplicity, will be denoted by Pr{c → ĉ}:

    \Pr\{\mathbf{c} \to \hat{\mathbf{c}}\} \le \frac{1}{2} \prod_{\ell \in I} \int_0^{+\infty} \exp\Big\{ -\frac{\gamma}{4} \rho_\ell |\hat{c}_\ell - c_\ell|^2 \Big\} e^{-\rho_\ell} \, d\rho_\ell
      = \frac{1}{2} \prod_{\ell \in I} \int_0^{\infty} \exp\Big\{ -\rho_\ell \Big[ \frac{\gamma}{4} |\hat{c}_\ell - c_\ell|^2 + 1 \Big] \Big\} \, d\rho_\ell
      = \frac{1}{2} \prod_{\ell \in I} \frac{1}{\frac{\gamma}{4} |\hat{c}_\ell - c_\ell|^2 + 1}

which, asymptotically (for high signal-to-noise ratio values, i.e., for γ → +∞), becomes

    \Pr\{\mathbf{c} \to \hat{\mathbf{c}}\} \le \frac{1}{2} \Big( \frac{\gamma}{4} \Big)^{-|I|} \frac{1}{\prod_{\ell \in I} |\hat{c}_\ell - c_\ell|^2}

where |I| is the cardinality of the set I. If we represent this pairwise error probability as a function of the signal-to-noise ratio, the larger the value of |I|, the larger the rate of its decrease (the slope of the curve in a log-log plot). In addition, the larger the term \prod_{\ell \in I} |\hat{c}_\ell - c_\ell|^2, the lower the asymptotic pairwise error probability. The cardinality of the set I is the Hamming distance of the considered error event. Hence, on a Rayleigh fading channel, in order to improve the performance it is not the minimum Euclidean distance which plays an important role but rather the code minimum Hamming distance (the code diversity). As a secondary merit criterion, we should try to maximize the term \prod_{\ell \in I} |\hat{c}_\ell - c_\ell|^2 over error events with minimum Hamming distance. This term, usually called the coding gain, does not depend on the signal-to-noise ratio and displaces the error probability curve instead of changing its slope. The basic code design principles over fading channels with ideal interleaving are thus the following:

Code diversity criterion: The minimum code diversity |I| must be maximized.

Coding gain criterion: In order to obtain the maximum possible coding advantage, the coding gain \prod_{\ell \in I} |\hat{c}_\ell - c_\ell|^2 over error events having minimum diversity should be maximized.
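As a simple illustration of these two criteria, the following Python sketch (our own) computes, for a given pair of codewords, the diversity |I| and the corresponding coding gain.

    import numpy as np

    def diversity_and_coding_gain(c, c_hat, tol=1e-12):
        """Return |I| and prod_{l in I} |c_hat_l - c_l|^2 for two codewords."""
        d2 = np.abs(np.asarray(c_hat) - np.asarray(c)) ** 2
        I = d2 > tol              # positions where the two codewords differ
        return int(np.sum(I)), float(np.prod(d2[I]))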

Codes designed for the AWGN channel are thus suboptimal when employed on a fading channel, and a redesign is necessary. In particular, parallel transitions must be avoided since, when present, the code minimum Hamming distance turns out to be equal to one (and thus the asymptotic performance decreases only linearly with the signal-to-noise ratio).
Several authors worked on the design of codes for fading channels. We will now focus on a particular design procedure for constructing trellis codes with optimal performance on the Rician/Rayleigh fading channel [105]. Although this procedure applies to both conventional and multiple trellis codes (that is, codes characterized by multiple output symbols for each trellis transition), we will focus on the latter since their potential can be fully exploited on this channel. In fact, when multiple trellis codes are employed, we can again design a trellis diagram with parallel paths and still have an asymptotic performance that decreases faster than linearly with the signal-to-noise ratio. As an example, by using a multiple trellis code with L symbols associated with each trellis branch, it is possible to have a code diversity of L and also to design the code in such a way that the minimum value of the term \prod_{\ell \in I} |\hat{c}_\ell - c_\ell|^2 over error events with minimum Hamming distance is maximized. This is made possible by a specific set partitioning procedure, obviously different from that previously described, which will now be illustrated for the case of L = 2 and an M-PSK constellation.
an M-PSK constellation.
Let A denote the original M-PSK constellation and A ⊗ A the twofold Cartesian product of A with itself. Hence, each element of A ⊗ A is a pair of symbols belonging to the original constellation A. In the following, we will identify the PSK symbol e^{j2\pi i/M}, i = 0, 1, ..., M-1, through the integer i. At the first partition level, the set A ⊗ A is partitioned into M sets defined by the ordered Cartesian products A ⊗ B_i, i = 0, 1, ..., M-1, whose p-th element, p = 0, 1, ..., M-1, is the ordered pair (p, [pq + i]_M), where q ≤ M is a proper odd integer and [·]_M denotes a sum modulo M. As an example, for M = 8 and the integer q = 3, we obtain the following subsets:

   
0 0 0 1
 1 3   1 4 
   
 2 6   2 7 
   
 3 1   3 2 
A ⊗ B0 = 


 A ⊗ B1 = 



 4 4   4 5 
 5 7   5 0 
   
 6 2   6 3 
7 5 7 6

   
    A ⊗ B_2 = { (0,2), (1,5), (2,0), (3,3), (4,6), (5,1), (6,4), (7,7) }
    A ⊗ B_3 = { (0,3), (1,6), (2,1), (3,4), (4,7), (5,2), (6,5), (7,0) }
    A ⊗ B_4 = { (0,4), (1,7), (2,2), (3,5), (4,0), (5,3), (6,6), (7,1) }
    A ⊗ B_5 = { (0,5), (1,0), (2,3), (3,6), (4,1), (5,4), (6,7), (7,2) }
    A ⊗ B_6 = { (0,6), (1,1), (2,4), (3,7), (4,2), (5,5), (6,0), (7,3) }
    A ⊗ B_7 = { (0,7), (1,2), (2,5), (3,0), (4,3), (5,6), (6,1), (7,4) }
As can be observed, within any of the M partitions, each pair differs from all other pairs in both elements. Hence, when the pairs of a partition are employed to label the parallel transitions of a multiple trellis code, a code diversity of L = 2 is obtained.
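A minimal sketch of this first-level partitioning (our own illustration) is the following; for instance, partition_A_B(8, 3, 0) reproduces the pairs of A ⊗ B_0 listed above.

    def partition_A_B(M, q, i):
        """The set A x B_i: ordered pairs (p, [pq + i] mod M), p = 0..M-1."""
        return [(p, (p * q + i) % M) for p in range(M)]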
The parameter q is a key point in this partition method. In fact, its choice aims at maximizing the minimum value of the term \prod_{\ell \in I} |\hat{c}_\ell - c_\ell|^2 over parallel transitions. Before going into details, let us observe that the set B_{i+1} is simply a cyclic shift of the set B_i. Thus, since the term \prod_{\ell \in I} |\hat{c}_\ell - c_\ell|^2 is simply the product of the squared Euclidean distances between the corresponding symbols in the pair, the adopted set partitioning guarantees that the intradistance structure of each partition A ⊗ B_i is the same. Hence, it is sufficient to study the intradistance structure of the so-called generating set A ⊗ B_0. In other words, let us consider the pairs (p, [pq + i]_M) and (l, [lq + i]_M) of the set A ⊗ B_i. The product of the squared distances between

Table 9.1: Optimal values of q for different values of M.

     M    q*
     2    1
     4    1
     8    3
    16    7
    32    7, 9
    64    19, 27

these two pairs is

    \Big| e^{j2\pi \frac{p}{M}} - e^{j2\pi \frac{l}{M}} \Big|^2 \Big| e^{j2\pi \frac{pq+i}{M}} - e^{j2\pi \frac{lq+i}{M}} \Big|^2 = \Big| 1 - e^{j2\pi \frac{l-p}{M}} \Big|^2 \Big| 1 - e^{j2\pi \frac{(l-p)q}{M}} \Big|^2
      = 16 \sin^2\Big( \pi \frac{l-p}{M} \Big) \sin^2\Big( \pi q \frac{l-p}{M} \Big)

and, as mentioned, is independent of i. The value q* of q maximizing the minimum value of the product of these squared distances is thus

    q^* = \arg\max_{q \text{ odd}} \min_{p > 0} \Big[ 16 \sin^2\Big( \pi \frac{p}{M} \Big) \sin^2\Big( \pi q \frac{p}{M} \Big) \Big] .    (9.7)

It can be easily proven that M − q ∗ is also a valid solution. Table 9.1 reports
the optimal values of q for different values of M.
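A minimal Python sketch of the search (9.7) is the following (our own illustration); for M = 8 it returns q* = 3, in agreement with Table 9.1.

    import math

    def q_star(M):
        """Optimal odd q per (9.7): maximize the minimum pairwise product."""
        best_q, best_val = 1, -1.0
        for q in range(1, M, 2):          # odd candidates q <= M
            val = min(16 * math.sin(math.pi * p / M) ** 2
                         * math.sin(math.pi * q * p / M) ** 2
                      for p in range(1, M))
            if val > best_val:
                best_q, best_val = q, val
        return best_q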

The next possible step in this set-partitioning procedure consists in parti-


tioning each of the M sets A ⊗ Bi into two sets C0 ⊗ Di0 and C1 ⊗ Di1 , where
the first set contains the even elements and the second set the odd elements
of A ⊗ Bi . In the case of the example above, we obtain the following subsets:
   
    C_0 ⊗ D_{00} = { (0,0), (2,6), (4,4), (6,2) }    C_1 ⊗ D_{01} = { (1,3), (3,1), (5,7), (7,5) }
    C_0 ⊗ D_{10} = { (0,1), (2,7), (4,5), (6,3) }    C_1 ⊗ D_{11} = { (1,4), (3,2), (5,0), (7,6) }

   
    C_0 ⊗ D_{20} = { (0,2), (2,0), (4,6), (6,4) }    C_1 ⊗ D_{21} = { (1,5), (3,3), (5,1), (7,7) }
    C_0 ⊗ D_{30} = { (0,3), (2,1), (4,7), (6,5) }    C_1 ⊗ D_{31} = { (1,6), (3,4), (5,2), (7,0) }
    C_0 ⊗ D_{40} = { (0,4), (2,2), (4,0), (6,6) }    C_1 ⊗ D_{41} = { (1,7), (3,5), (5,3), (7,1) }
    C_0 ⊗ D_{50} = { (0,5), (2,3), (4,1), (6,7) }    C_1 ⊗ D_{51} = { (1,0), (3,6), (5,4), (7,2) }
    C_0 ⊗ D_{60} = { (0,6), (2,4), (4,2), (6,0) }    C_1 ⊗ D_{61} = { (1,1), (3,7), (5,5), (7,3) }
    C_0 ⊗ D_{70} = { (0,7), (2,5), (4,3), (6,1) }    C_1 ⊗ D_{71} = { (1,2), (3,0), (5,6), (7,4) }
Obviously, within each of these new sets, each pair is still distinct from all other pairs in both positions. However, it is in general no longer true that the minimum value of the term \prod_{\ell \in I} |\hat{c}_\ell - c_\ell|^2 is maximized within each set, unless in the previous step we used the value q* corresponding to M/2 instead of M. Hence, the choice of q* depends on the target partition level. The third and subsequent steps are identical in construction to the second step, i.e., we need to partition each set at the present level into two sets containing the alternate rows, with the sets of the previous levels obtained by using a value of q* computed as in (9.7) with M successively replaced by M/4, M/8, and so on.
This procedure can be easily generalized to the case of L multiple of two. As an example, in the case of L = 4 the sets belonging to the first partition level will be the M² sets A ⊗ B_i ⊗ A ⊗ B_p, i, p = 0, 1, ..., M-1.

Figure 9.2: Trellis diagram for the considered 2-state rate-4/6 multiple TCM.

When the number of sets required to fill the trellis is less than the number of sets generated at a particular partition level, only those having the largest interdistance must be chosen, as in the examples below. Let us now discuss, through a couple of practical examples, how to employ these sets in the code construction. The examples deal with two- and four-state rate-4/6 multiple (L = 2) trellis-coded 8-PSK modulations, respectively.

Example Let us consider a 2-state rate-4/6 multiple TCM using the 8-PSK as output constellation. Since L = 2, two 8-PSK symbols (hence 6 bits) are transmitted every 4 input information bits. The encoder trellis thus has 2⁴ = 16 branches leaving each state. Since there are only two states, each transition between states has eight parallel paths. The encoder trellis is shown in Fig. 9.2, where parallel paths are denoted by bold lines. For the properties of the set partitioning method just described, if we associate pairs from the sets A ⊗ B_i with parallel transitions, we are sure that a code diversity of 2 is obtained on them. In addition, it can be shown that the minimum value of the term \prod_{\ell \in I} |\hat{c}_\ell - c_\ell|^2 on parallel transitions is 2.
Let us now consider longer error events. Since it can be shown that this code is linear, without loss of generality we assume that the all-zero path is the correct one, and we consider an error event of length 2 such as that shown in Fig. 9.2. The four paths of length 2 that differ by a minimum of two symbols from the correct one are those corresponding to the coded symbols [0 2 0 4], [0 2 4 0], [2 0 0 4], and [2 0 4 0]. It can be easily shown that for them the value of the term \prod_{\ell \in I} |\hat{c}_\ell - c_\ell|^2 is 8, hence larger than that related to the parallel transitions, which thus dominate the asymptotic behavior of the code.
In this code construction, we used the sets A ⊗ B_i, i = 0, 2, 4, 6. Equivalently, the sets A ⊗ B_i, i = 1, 3, 5, 7 could have been employed.
sets A ⊗ Bi , i = 1, 3, 5, 7 could have been employed.


Example Let us now consider a 4-state rate-4/6 multiple TCM using the
8-PSK as output constellation. As before, being L = 2, two 8-PSK symbols
(hence 6 bits) are transmitted every 4 input information bits. The encoder
218 Codes for fading channels

trellis has thus 24 = 16 branches leaving each state. Since there are now four
states, and assuming a completely connected encoder trellis, as that shown in
Fig. 9.3, each transition between states has four parallel paths. The encoder
trellis is shown in Fig. 9.3, where parallel paths are denoted by bold lines.
For the properties of the set partition method just described, if we associate
pairs from sets C0 ⊗ Di0 and C1 ⊗ Di1 with parallel transition, we are sure
that a code diversity of 2 is obtained onQ them. In addition, it can be shown
that the minimum value of the term ℓ∈I |ĉℓ − cℓ |2 on parallel transitions
is 4, provided that sets C0 ⊗ Di0 and C1 ⊗ Di1 are obtained through the
procedure described above but with q = 1, which is the optimal value for
M/2 = 4. Even in this case, not all sets C0 ⊗ Di0 and C1 ⊗ Di1 are required,
since only the following eight sets, for simplicity denoted as Si in the figure,
can be used:    
    S_1 = { (0,0), (2,2), (4,4), (6,6) }    S_2 = { (1,5), (3,7), (5,1), (7,3) }
    S_3 = { (0,4), (2,6), (4,0), (6,2) }    S_4 = { (1,1), (3,3), (5,5), (7,7) }
    S_5 = { (0,2), (2,4), (4,6), (6,0) }    S_6 = { (1,7), (3,1), (5,3), (7,5) }
    S_7 = { (0,6), (2,0), (4,2), (6,4) }    S_8 = { (1,3), (3,5), (5,7), (7,1) }
Every other error event consisting of two or more branches has a Hamming distance greater than 2, regardless of which path is chosen as the correct path. Thus, the dominant term in the asymptotic symbol or bit error probability expressions corresponds again to the parallel paths.
9.1.1 Decoding algorithms

The conceptual scheme in Fig. 9.1 relies, as already mentioned, on the assumption that an ideal channel estimator is available.

Figure 9.3: Trellis diagram for the considered 4-state rate-4/6 multiple TCM.

The channel law is thus (9.4), where h_ℓ is assumed perfectly known at the receiver. In this case, as already discussed in Chapter 4, the optimal receiver jointly performs detection and decoding through a search on the trellis diagram of the overall encoder using the Viterbi algorithm, operating on the deinterleaved samples r_ℓ. Denoting by µ_ℓ the generic state of the overall encoder, by a_ℓ = (a_ℓ^{(1)}, ..., a_ℓ^{(k)}) the k-tuple of the encoder input bits, and by c_ℓ(a_ℓ, µ_ℓ), µ_{ℓ+1}(a_ℓ, µ_ℓ), and λ_ℓ(a_ℓ, µ_ℓ) the code symbol, the successive state, and the branch metric associated with the trellis transition from state µ_ℓ driven by a_ℓ, respectively, it is clearly

    \lambda_\ell(\mathbf{a}_\ell, \mu_\ell) = | r_\ell - \sqrt{\gamma} h_\ell c_\ell(\mathbf{a}_\ell, \mu_\ell) |^2 .

In the case of a multiple TCM with L code symbols per trellis branch, we will clearly have the sum of L terms of this form, one per code symbol.
In practical conditions, i.e., when this ideal channel estimator is not available, due to the presence of the interleaver the optimal decoder can hardly be implemented. Assuming that knowledge of the channel statistics is available at the receiver, a soft-input soft-output detection algorithm based on linear prediction (the MAP symbol detection version described in Chapter 8) can be used, and the relevant extrinsic information (in the logarithmic domain)

    \ln \frac{\Pr\{c_\ell | \mathbf{r}\}}{\Pr\{c_\ell\}} = \ln \Pr\{c_\ell | \mathbf{r}\} - \ln \Pr\{c_\ell\} = \ln f(\mathbf{r} | c_\ell)    (9.8)

deinterleaved and employed as input to the decoder, as in the decoding schemes for serially concatenated convolutional codes (see Chapter 7). In this case, the turbo principle [95] can be advocated and, by using a soft-output decoder, a few iterations between detector and decoder performed.

9.1.2 Error performance

Upper bounds on symbol and bit error probabilities can be computed through the technique described in Chapter 2, at least under the hypothesis that an ideal channel estimator is available. The asymptotic expressions can be obtained, as usual, by considering only those error events associated with the largest asymptotic pairwise error probabilities.
As a final remark, it is interesting to consider the effect of the interleaver/deinterleaver in the scheme of Fig. 9.1. In their absence, the assumption that the fading is independent from one sample to the next is no longer valid. When the fading is sufficiently slow as to be constant to a value |h| over the duration of an error event of minimum distance (quasi-static fading assumption), the bit error probability can be asymptotically approximated by

    P_b \lesssim K_b E_\rho \Big\{ Q\Big( |h| d_{min} \sqrt{\frac{\gamma}{2}} \Big) \Big\} \lesssim \frac{K_b}{2} E_\rho \Big\{ \exp\Big( -\frac{\gamma}{4} \rho d_{min}^2 \Big) \Big\}    (9.9)

where K_b is a proper constant (see Chapter 2) and d_{min} is the minimum Euclidean distance of the code. Performing the average over the Rayleigh probability density function, one obtains the average bit error probability

    P_b \lesssim \frac{K_b}{2} \frac{1}{1 + d_{min}^2 \frac{\gamma}{4}}

which can be approximated, for large values of the signal-to-noise ratio, by

    P_b \lesssim \frac{2 K_b}{\gamma d_{min}^2} .

Hence, in the absence of the interleaver/deinterleaver, independently of the employed trellis code, the code diversity is equal to 1.²
In these conditions, improvements can be obtained by resorting to receive diversity. Assuming n_R receive antennas, the corresponding received samples can be expressed, in vector notation, as

    \mathbf{r}_i = \sqrt{\gamma} h_i \mathbf{c} + \mathbf{n}_i ,   i = 1, 2, \ldots, n_R    (9.10)

r_i being a row vector collecting the samples received by antenna i, h_i the corresponding channel gain, and n_i the noise samples corresponding to antenna i, assumed independent of each other and independent of the noise samples related to the other antennas.

²It should be noticed that, for quasi-static channels, the union bound turns out to be loose [106]. We will see later how to solve this problem.

The coefficient γ now has the meaning of average signal-to-noise ratio per receive antenna. In addition, we assume that the channel gains {h_i} are independent of each other and perfectly known at the receiver. The optimal ML detection strategy is

    \hat{\mathbf{c}} = \arg\max_{\mathbf{c}} f(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_{n_R} | \mathbf{c}, h_1, h_2, \ldots, h_{n_R})
      = \arg\min_{\mathbf{c}} \sum_{i=1}^{n_R} \| \mathbf{r}_i - \sqrt{\gamma} h_i \mathbf{c} \|^2
      = \arg\max_{\mathbf{c}} \sum_{i=1}^{n_R} \Big[ \Re\big( \sqrt{\gamma} h_i^* \mathbf{r}_i \mathbf{c}^H \big) - \gamma |h_i|^2 \frac{\|\mathbf{c}\|^2}{2} \Big]
      = \arg\max_{\mathbf{c}} \Big[ \Re\Big( \sqrt{\gamma} \frac{\sum_{i=1}^{n_R} h_i^* \mathbf{r}_i}{\sum_{i=1}^{n_R} |h_i|^2} \mathbf{c}^H \Big) - \gamma \frac{\|\mathbf{c}\|^2}{2} \Big]
      = \arg\min_{\mathbf{c}} \| \mathbf{r} - \sqrt{\gamma} \mathbf{c} \|^2    (9.11)

having defined

    \mathbf{r} = \frac{\sum_{i=1}^{n_R} h_i^* \mathbf{r}_i}{\sum_{i=1}^{n_R} |h_i|^2} = \sqrt{\gamma} \mathbf{c} + \frac{\sum_{i=1}^{n_R} h_i^* \mathbf{n}_i}{\sum_{i=1}^{n_R} |h_i|^2} .    (9.12)
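As an illustration, the combining in (9.12) can be sketched as follows (names are ours).

    import numpy as np

    def mrc_combine(r_list, h_list):
        """Form the equivalent observation r of (9.12) from the n_R received
        row vectors r_i and the known channel gains h_i."""
        num = sum(np.conj(h) * np.asarray(r) for r, h in zip(r_list, h_list))
        den = sum(abs(h) ** 2 for h in h_list)
        return num / den   # sqrt(gamma)*c plus noise of variance 1/sum(rho_i)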

This latter expression can be considered as the definition of an equivalent channel whose noise variance is 1/\sum_{i=1}^{n_R} |h_i|^2 = 1/\sum_{i=1}^{n_R} \rho_i instead of 1. It is thus

    P_b \lesssim K_b E_{\rho_1, \rho_2, \ldots, \rho_{n_R}} \Big\{ Q\Big( d_{min} \sqrt{ \gamma \frac{\sum_{i=1}^{n_R} \rho_i}{2} } \Big) \Big\}
        \lesssim \frac{K_b}{2} E_{\rho_1, \rho_2, \ldots, \rho_{n_R}} \Big\{ \exp\Big( -d_{min}^2 \frac{\gamma}{4} \sum_{i=1}^{n_R} \rho_i \Big) \Big\}
        = \frac{K_b}{2} \prod_{i=1}^{n_R} E_{\rho_i} \Big\{ \exp\Big( -d_{min}^2 \frac{\gamma}{4} \rho_i \Big) \Big\}    (9.13)

and

    P_b \lesssim \frac{K_b}{2} \Big( \frac{1}{1 + d_{min}^2 \frac{\gamma}{4}} \Big)^{n_R}    (9.14)

which, asymptotically, becomes

    P_b \lesssim \frac{K_b}{2} \Big[ d_{min}^2 \frac{\gamma}{4} \Big]^{-n_R} .    (9.15)
The rate of decrease is thus proportional to the number of receive antennas. Space diversity can thus be employed instead of time diversity. We will see later that transmit antenna diversity can also be exploited through the adoption of properly designed space-time codes.

Figure 9.4: Transmission scheme employing BICM.

9.2 Bit-interleaved coded modulation (BICM)

As discussed, the paradigm that coding and modulation must be designed jointly was also adopted on fading channels, although in this case it is the code diversity that is the main parameter to be maximized, and not the code minimum Euclidean distance. A first deviation from this paradigm is represented by bit-interleaved coded modulation (BICM). This technique, originally proposed by Zehavi [107] in 1992, was further developed and analyzed by Caire, Taricco, and Biglieri [108] in 1998. According to this technique, coded modulations with very good performance over flat fading channels can be built by using off-the-shelf binary codes that are optimal in the sense of the free Hamming distance, and thus available in common textbooks.

9.2.1 Code construction

The scheme of a system employing BICM is shown in Fig. 9.4. The presence at the transmitter of parallel-to-serial (P/S) and serial-to-parallel (S/P) converters clarifies that the interleaver operates at the bit level. After interleaving, groups of m coded bits are mapped into M-ary (with M = 2^m) symbols {c_ℓ} belonging to a complex constellation A. In other words, by comparing the schemes in Figs. 9.1 and 9.4, one may observe that, at the transmitter side, the only difference is that now coded bits instead of coded symbols are interleaved.

The idea behind BICM is very simple. If we consider two codewords of the binary code having Hamming distance d_H, and thus differing in d_H bits, after the interleaver these bits will likely belong to the labels of d_H different coded symbols. Hence, with high probability, the corresponding M-ary codewords still have Hamming distance d_H. The use of a binary code optimal in the sense of the free Hamming distance (and of a proper interleaver) thus ensures that the code diversity is maximized. Such codes have been known since the early sixties, and thus an ad-hoc code design is not necessary. Obviously, the obtained coded schemes are not optimal since no attempt is made to maximize the minimum value of the term \prod_{\ell \in I} |\hat{c}_\ell - c_\ell|^2 over the set of codewords with minimum Hamming distance. However, since the code diversity is by far the most important parameter, these codes are expected to provide very good performance and to be practically optimal.

9.2.2 Decoding algorithms

The presence of the interleaver between encoder and mapper generates a new encoder with a much larger memory. Hence, the optimal decoder is unfeasible, even under the assumption of perfect knowledge of the channel. We can thus resort to the suboptimal receiver shown in Fig. 9.4. In this scheme, a demodulator computes the soft-outputs corresponding to each coded bit, which are then employed by the binary decoder. The turbo principle can be advocated and, by using a soft-output decoder, a few (optional) iterations between detector and decoder performed.
Going into the details, let us assume that, after the interleaver, bit c_ℓ^{(i)} becomes the j-th bit of the label of symbol c_p. We will thus write lab^{(j)}(c_p) = c_ℓ^{(i)}. The logarithm of the extrinsic information of the coded bit c_ℓ^{(i)} will be

    \ln f(\mathbf{r} | c_\ell^{(i)} = b) = \ln \sum_{c_p \in \mathcal{A}} f(\mathbf{r} | c_p) \Pr\{ c_p | \mathrm{lab}^{(j)}(c_p) = b \} ,   b = 0, 1    (9.16)

where

    \Pr\{ c_p | \mathrm{lab}^{(j)}(c_p) = b \} = \begin{cases} \frac{1}{2^{m-1}} & \text{if } \mathrm{lab}^{(j)}(c_p) = b \\ 0 & \text{otherwise.} \end{cases}

Considering the employed constellation A and the assigned mapping, let us denote by A_b^{(j)} the partition of the original constellation A obtained when the j-th bit of the mapping is set to b. In the case of the 8-PSK constellation with Gray mapping shown in Fig. 9.5-(a), the partitions A_0^{(1)} and A_1^{(1)} are shown in Fig. 9.5-(b) and Fig. 9.5-(c), respectively.

Figure 9.5: Constellation A and partitions A_0^{(1)} and A_1^{(1)}.

Hence,

    \ln f(\mathbf{r} | c_\ell^{(i)} = b) = \ln \frac{1}{2^{m-1}} \sum_{c_p \in \mathcal{A}_b^{(j)}} f(\mathbf{r} | c_p) \propto \ln \sum_{c_p \in \mathcal{A}_b^{(j)}} f(\mathbf{r} | c_p) ,   b = 0, 1 .    (9.17)

The extrinsic information is then permuted and employed by the binary decoder. As already mentioned above, optional iterations can be performed according to the turbo principle; this results in the so-called BICM with iterative decoding (BICM-ID), proposed by Li and Ritcey [109]. In this case, in (9.16) the probabilities Pr{c_p | lab^{(j)}(c_p) = b} will be those provided by the soft decoder in the case of soft feedback. They can be hard quantized to reduce the computational complexity.
When the channel statistics are known at the receiver, a soft-input soft-output detection algorithm based on linear prediction can be used to compute the extrinsic information on the coded symbols f(r|c_p). In the simpler case of a receiver that perfectly knows the channel, i.e., equipped with an ideal estimator of the gains {h_ℓ}, the extrinsic information of the coded bit c_ℓ^{(i)} to be computed becomes³

    \ln f(\mathbf{r} | c_\ell^{(i)} = b, \mathbf{h}) \propto \ln \sum_{c_p \in \mathcal{A}_b^{(j)}} f(\mathbf{r} | c_p, \mathbf{h}) ,   b = 0, 1    (9.18)

³The outer decoder will use, as branch metric,

    \lambda_\ell = \sum_{i=1}^{n} \ln f(\mathbf{r} | c_\ell^{(i)} = b, \mathbf{h})

i.e., the sum of the extrinsic information of the coded bits {c_ℓ^{(i)}}_{i=1}^{n} (cf. the case of a convolutional code transmitted over an AWGN channel, discussed in Remark 2.8).

where f(r|c_p, h) can be easily expressed as

    f(\mathbf{r} | c_p, \mathbf{h}) \propto f(r_p | c_p, h_p) \propto \exp\big\{ - | r_p - \sqrt{\gamma} h_p c_p |^2 \big\} .    (9.19)

Expression (9.18) can be simplified. In fact, by using the approximation (5.11), we have

    \ln f(\mathbf{r} | c_\ell^{(i)} = b, \mathbf{h}) \propto \max_{c_p \in \mathcal{A}_b^{(j)}} \Big[ - | r_p - \sqrt{\gamma} h_p c_p |^2 \Big] = - \min_{c_p \in \mathcal{A}_b^{(j)}} | r_p - \sqrt{\gamma} h_p c_p |^2 ,   b = 0, 1 .    (9.20)
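As an illustration, the max-log bit metric (9.20) can be sketched as follows; the constellation points and their m-bit labels are assumed given, and all names are ours.

    import numpy as np

    def bicm_bit_metric(rp, hp, gamma, constellation, labels, j, b):
        """ln f(r | j-th label bit = b, h), up to a constant, per (9.20)."""
        subset = [s for s, lab in zip(constellation, labels) if lab[j] == b]
        return -min(abs(rp - np.sqrt(gamma) * hp * s) ** 2 for s in subset)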

The employed mapping rule has a significant influence on the system performance. When iterative decoding is not employed, the Gray mapping usually gives the best performance [108]. In the case of BICM-ID, other mapping rules can provide a performance improvement when increasing the number of iterations whereas, usually, this improvement is very limited with the Gray mapping [109].

9.2.3 Error performance

Before giving some hints on the error performance of BICM, at least when iterative decoding is not employed, we introduce here the equivalent parallel channel model for BICM in the case of ideal interleaving. This model is shown in Fig. 9.6 and consists of a set of m parallel independent and memoryless binary-input channels connected to the encoder output by a random switch, which models ideal interleaving. Each channel corresponds to a position in the label of the symbols of A. For every coded bit c_ℓ^{(i)}, the switch selects, randomly and independently of other selections, a position index j, j = 1, 2, ..., m, and transmits c_ℓ^{(i)} on the j-th channel. The detector, which knows the sequence of switch positions, computes the bit metrics {ln f(r | c_ℓ^{(i)} = b)} that are used to compute the branch metrics of the code trellis over which the decoder operates. This model is the basis for the computation of the capacity of BICM with ideal interleaving. These results show that for 8-PSK and 16-QAM schemes, in the range of practical interest, the capacity loss of BICM with respect to the optimum approach is negligible if and only if Gray mapping is used [108].
Considering now the error performance, let us assume that the binary code employed in the BICM scheme of Fig. 9.4 is linear, so that it admits a (possibly time-varying, as in the case of block codes) trellis representation, and consider two codewords c and ĉ of this binary code stemming from the same state and merging after a given number of trellis steps.
226 Codes for fading channels

Figure 9.6: Equivalent parallel channel model for BICM in the case of ideal interleaving.

Our aim is to compute the pairwise error probability Pr{c → ĉ} that, however, may
depend on the pair (c, ĉ) rather than on their difference. This is because
the binary-input channels of the BICM equivalent parallel channel model in
Fig. 9.6 may be nonsymmetric, depending on the mapping and the signal
constellation A. In [108], a symmetrization procedure is thus described
which leaves the performance unmodified.
After symmetrization, the pairwise error probability will depend, besides
the channel and the type of detection, on the Hamming distance dH between
c and ĉ, the employed mapping µ, and the signal constellation A:

Pr{c → ĉ} = g(dH , µ, A) .

The usual bound on the bit error probability of binary codes can be computed
as

$$P_b \le \frac{1}{k} \sum_{d_H = 1}^{\infty} w_I(d_H)\, g(d_H, \mu, A) \qquad (9.21)$$

in the case of a convolutional code of rate $k/n$ and

$$P_b \le \frac{1}{k} \sum_{d_H = 1}^{n} w_I(d_H)\, g(d_H, \mu, A) \qquad (9.22)$$
in the case of a block code of rate $k/n$, where $w_I(d_H)$ is the input weight of error
events having Hamming distance $d_H$. In the original paper by Zehavi [107],
a Chernoff bound on the pairwise error probability was derived in closed
form for 8-PSK with Gray mapping and a receiver with perfect knowledge
of the channel. It cannot, however, be extended to other mappings or signal
constellations. A more general and very accurate upper bound is instead
derived in [108] based on the Bhattacharyya bound [110].

9.3 Space-time coding for frequency-flat fading channels

We now consider systems with multiple antennas at both transmitter and
receiver and codes designed for these applications. The use of multiple receive
antennas to provide diversity has been known for many decades. Only
recently has interest turned to the use of multiple transmit antennas. The
number of antennas at the transmitter and receiver ($n_T$ and $n_R$) depends on
the considered application. For cellular systems, the base station is typically
equipped with several antennas while the mobile terminal can have only one
or two antennas. Hence, in the uplink we have $n_R > n_T$, whereas in the
downlink it is $n_T > n_R$. On the other hand, in wireless local-area network
applications most nodes will have a similar number of antennas.

Multiple antennas can be used to increase data rates through multiplexing


or to improve performance through diversity. This is a fundamental trade-
off in multiple-antenna systems [111, 112, 113] and will be studied in detail
later. Multiplexing is obtained by exploiting the MIMO channel to obtain
independent signaling paths that can be used to increase the spectral effi-
ciency [114, 115, 116, 117]. This spectral efficiency increase often relies on
an accurate knowledge of the channel at the receiver, and sometimes at the
transmitter as well, and is obtained at the price of an increase in the receiver
processing (in addition to the cost of deploying multiple antennas). Diversity
is obtained by exploiting the independent fading gains that affect the same
signal and that can be averaged out to increase the reliability of the receiver
decisions.

We have seen that the adoption of a properly designed coded transmission
provides time diversity over fading channels. Time diversity, however, is not
available in systems with limited mobility. In this case, in fact, the channel
is quasi-static, meaning that the channel variations are very slow compared
with the duration of one codeword. We have already mentioned that, in this
case, receive antenna diversity can be employed. Space-time (ST) codes are
particularly important in such scenarios to also exploit transmit antenna
diversity and/or the potential increase of the system spectral efficiency related
to MIMO channels.

9.3.1 System model for frequency-flat MIMO channels and main results on capacity
At discrete time $\ell$, the received samples at the output of the $n_R$ receive
antennas are collected into an $n_R \times 1$ vector $r_\ell$ that can be expressed as

$$r_\ell = \sqrt{\gamma}\, H_\ell c_\ell + n_\ell, \qquad \ell = 1, \ldots, N \qquad (9.23)$$

where $c_\ell$ is the $n_T \times 1$ vector collecting the modulation symbols transmitted
in parallel by the $n_T$ transmit antennas, $n_\ell$ is an $n_R \times 1$ complex Gaussian noise
vector having independent real and imaginary components and representing
the thermal noise samples at the $n_R$ receive antennas, $H_\ell$ is the $n_R \times n_T$
matrix of the channel gains, its $(i,j)$-th element $h_{i,j}[\ell]$ representing the gain
from transmit antenna $j$ to receive antenna $i$ at discrete time $\ell$, and $\gamma$ is a
proper real coefficient.
Matrix $H_\ell$ will be assumed random with zero-mean independent and identically
distributed Gaussian entries having independent real and imaginary
components. Equivalently, we can say that each entry of $H_\ell$ has uniformly
distributed phase and Rayleigh-distributed magnitude. This choice models
a Rayleigh fading environment with enough separation within the receive
antennas and the transmit antennas such that the channel gains for each
transmit-receive antenna pair are independent. This assumption becomes
questionable as $n_T$ and/or $n_R$ increase. In fact, it relies on a separation of
the transmit and/or receive antennas by some multiple of the wavelength,
which cannot be obtained when a large number of antennas is packed in a
finite volume. Although the results on MIMO channel capacity that will
be briefly summarized here have been obtained under the assumption of a
Rayleigh fading environment, the results on code design criteria can be easily
generalized to the case of channel gains having Rician-distributed magnitude.
When considering the design of ST codes, we will also assume that transmitted
symbols belong to the $M$-ary complex constellation $A$, with unit average
energy, and that the noise vector and the channel matrix are such that⁴
$E[n_\ell n_\ell^H] = I_{n_R}$ (uncorrelated noise components on different antennas and
with unit variance) and $\frac{1}{n_T n_R}\mathrm{trace}(E[H_\ell H_\ell^H]) = 1$, i.e., also the elements of
$H_\ell$ have unit variance. Under these assumptions, $\gamma$ has the meaning of average
SNR per transmit antenna (or per transmitted symbol), i.e., $\gamma = E_s/N_0$,
and the average received SNR per receive antenna is given by $n_T \gamma$.
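As a concrete numerical illustration of the model (9.23) under these normalizations, the following sketch (all variable names are illustrative) draws one channel use with unit-energy QPSK symbols:

```python
import numpy as np

rng = np.random.default_rng(1)
nT, nR, gamma = 2, 4, 10.0          # antennas and SNR per transmit antenna

# i.i.d. CN(0,1) channel gains (Rayleigh fading, unit variance per entry)
H = (rng.normal(size=(nR, nT)) + 1j * rng.normal(size=(nR, nT))) / np.sqrt(2)

# unit-energy QPSK symbols transmitted in parallel by the nT antennas
c = np.exp(1j * np.pi * (2 * rng.integers(0, 4, size=nT) + 1) / 4)

# CN(0,1) noise so that E[n n^H] = I_{nR}
n = (rng.normal(size=nR) + 1j * rng.normal(size=nR)) / np.sqrt(2)

r = np.sqrt(gamma) * H @ c + n      # one channel use of (9.23)
print(r)
```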
A ST code with block length $N$ is a set $\mathcal{C}$ of $n_T \times N$ complex matrices
(codewords). Codeword matrices $C = [c_1, \ldots, c_N]$ are transmitted by
columns, in $N$ consecutive channel uses. The ST code spectral efficiency
is given by $\eta = \frac{1}{N}\log_2 |\mathcal{C}|$ bits per channel use. By definition, the average
information bit-energy over noise power spectral density ratio is given by
$E_b/N_0 = \gamma n_T / \eta$.

⁴ In the following, $I_n$ will denote an $n \times n$ identity matrix.
Eqn. (9.23) describes the general model for a time-varying frequency-flat
MIMO channel. When $N$ is much larger than the channel coherence time,
each codeword sees a large number of channel realizations. We can assume
that $\{H_\ell\}$ is an ergodic random process and the channel is consequently
ergodic. In scenarios characterized by limited mobility, the channel can
be assumed to be slow or quasi-static, i.e., each codeword sees only one
channel realization. In other words, $H_\ell = H$, $\ell = 1, 2, \ldots, N$. In this case,
the fading model is nonergodic. A different model for time-varying fading
channels has been introduced by Marzetta and Hochwald in [118]. They
considered a block-fading channel, constant for $L$ consecutive channel uses
and independent from block to block, modeling, as an example, a system
with quasi-static fading and frequency hopping every $L$ channel uses. This
case will be denoted as the block fading channel.
Different assumptions can be made on the knowledge of the channel gain
matrix at the transmitter and receiver. For a quasi-static channel, it is
generally assumed that $H$ is perfectly known at the receiver, since the channel
gains can be obtained fairly easily by sending a pilot sequence for channel
estimation (see [119, Section 3.9] or [120, Section 10.1]). On the contrary,
the assumption of perfect knowledge of the channel matrix at the transmitter
holds only if a delay-free, error-free feedback link from receiver to transmitter
exists, allowing the receiver to send back the estimated channel gains, or if
time-division duplexing is used, where each end can estimate the channel
from the incoming signal in the reverse direction (channel reciprocity). On a
block fading channel, instead, the assumption adopted in [118] is the absence
of knowledge of the channel gains at both transmitter and receiver.
The case of perfect knowledge of the channel gains at both transmitter
and receiver is of scarce interest in this section devoted to ST codes since, in
this case, through simple transmit precoding and receive filtering the MIMO
channel can be decomposed into a set of parallel and independent SISO
channels. Consider, for example, the quasi-static channel and the singular
value decomposition of matrix $H$ [121]

$$H = U \Sigma V^H \qquad (9.24)$$

where the $n_R \times n_R$ matrix $U$ and the $n_T \times n_T$ matrix $V$ are unitary matrices
and $\Sigma$ is an $n_R \times n_T$ diagonal matrix collecting the singular values $\{\sigma_i\}$ of
$H$.⁵ We can assume that the input vector $c_\ell$ is obtained from a vector $\check{c}_\ell$

⁵ The number of singular values is $R_H = \mathrm{rank}(H) \le \min(n_T, n_R)$. The case of $R_H = \min(n_T, n_R)$ is often referred to as a rich scattering environment.

through the linear transformation $c_\ell = V \check{c}_\ell$. At the receiver, vector $r_\ell$ is
in turn linearly transformed, obtaining the vector $\check{r}_\ell = U^H r_\ell$. Thus, the following
equivalent channel results:

$$\check{r}_\ell = U^H r_\ell = U^H (\sqrt{\gamma}\, H c_\ell + n_\ell) = U^H (\sqrt{\gamma}\, U \Sigma V^H c_\ell + n_\ell) = \sqrt{\gamma}\, \Sigma \check{c}_\ell + \check{n}_\ell \qquad (9.25)$$

having defined $\check{n}_\ell = U^H n_\ell$, which is statistically equivalent to $n_\ell$ since $U$
is a unitary matrix. We thus have a set of $R_H$ parallel independent channels,
each corresponding to a nonzero singular value of $H$, for which classical results
apply [122]. In particular, the optimal capacity-achieving power distribution
can be obtained through waterfilling (see Appendix C).⁶ Since these parallel
channels do not interfere with each other, the optimal demodulator complexity
is linear in $R_H$. Moreover, when independent data are sent over the
parallel channels, the MIMO channel can support $R_H$ times the data rate of
a SISO system. A multiplexing gain of $R_H$ is thus obtained, although the
performance over each channel depends on the corresponding gain $\sigma_i$. For
this reason, in the following we will consider the more interesting case of
knowledge of the channel gains at the receiver only.
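A minimal sketch of this SVD-based decomposition, assuming unit-energy QPSK inputs on the parallel channels (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
nT, nR, gamma = 3, 4, 10.0
H = (rng.normal(size=(nR, nT)) + 1j * rng.normal(size=(nR, nT))) / np.sqrt(2)

U, s, Vh = np.linalg.svd(H)                # H = U diag(s) V^H, as in (9.24)
c_check = np.exp(1j * np.pi * (2 * rng.integers(0, 4, size=nT) + 1) / 4)
c = Vh.conj().T @ c_check                  # transmit precoding c = V c_check
n = (rng.normal(size=nR) + 1j * rng.normal(size=nR)) / np.sqrt(2)
r = np.sqrt(gamma) * H @ c + n
r_check = U.conj().T @ r                   # receive filtering as in (9.25)

# the first R_H entries are parallel channels: sqrt(gamma)*s_i*c_check_i + noise
print(r_check[:len(s)] / (np.sqrt(gamma) * s))
```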
The case when each channel use, that is each transmission of one sym-
bol from each of the nT transmit antennas, corresponds to an independent
realization of Hℓ , has been studied in [117]. Even if the channel realization
is not known at the transmitter, it can be proved [117] that, in the asymp-
totic limit of a large number of transmit and receive antennas, the average
capacity of a MIMO channel still grows linearly with ξ = min(nT , nR ), as
long as the channel can be accurately estimated at the receiver. Moreover,
this linear growth of capacity with ξ is observed even for a small number of
antennas [124]. Similarly, for large values of the signal-to-noise ratio, capac-
ity also grows linearly with ξ.7 In particular, it grows as ξ log γ. In other
words, even in the absence of knowledge of the channel at the transmitter,
we can say that multiple antennas increase the capacity by a factor ξ as in
the case of independent parallel channels. This explains why ξ is often called
the number of degrees of freedom generated by the MIMO channel.
In the case of a quasi-static fading channel, when $H$ is chosen randomly
at the beginning of the transmission and remains fixed for all channel uses,

⁶ When the channel is time-varying, waterfilling across time should also be used [123].
⁷ It must be noted, however, that at very low signal-to-noise ratio values, transmit
antennas are not beneficial, since capacity only scales with the number of receive antennas
independently of the number of transmit antennas [117].

average capacity has no meaning (is strictly zero), as the channel is noner-
godic [125]. In this case, outage probability, defined as the probability that
the transmission rate exceeds the mutual information of the channel, must
be evaluated. The maximum rate that can be supported by the channel with
a given outage probability is the outage capacity (see Appendix C). As in the
case of ergodic channels, for a given outage probability, the outage capacity
increases linearly with ξ.
Finally, in the case of a block fading channel with coherence time of $L$
symbols and absence of knowledge of the channel at both transmitter and
receiver [118, 126], when $L \ge n_T + n_R$, at high SNR values the capacity (in
bits per channel use) can be approximated as

$$C \simeq \xi \left( 1 - \frac{\xi}{L} \right) \log_2 \gamma\,.$$

Hence, for $L \to \infty$ the capacity of the noncoherent MIMO channel approaches
that of the coherent channel. However, when $L < n_T + n_R$ the
capacity increases as $\varsigma \left( 1 - \frac{\varsigma}{L} \right) \log_2 \gamma$, where $\varsigma = \min(n_T, n_R, \lfloor L/2 \rfloor)$. As a
consequence, it is not convenient to have a number of transmit antennas
larger than $\lfloor L/2 \rfloor$, although, when fading is correlated, additional transmit
antennas do increase capacity [127].

9.3.2 ST codeword design criteria for slow fading channels
We now consider the case of a slow or quasi-static fading channel where
$H$ is random but constant over $N \gg \max\{n_T, n_R\}$ channel uses, and we
assume that the receiver knows $H$ perfectly, while the transmitter has no
knowledge of $H$. Having collected the codewords in proper matrices $\{C\}$,
we can similarly organize the corresponding received samples and the noise
samples in two $n_R \times N$ matrices $R$ and $N$, respectively, whose $\ell$-th columns
collect the $n_R$ received samples and noise samples at time $\ell$. Hence, we may
write

$$R = \sqrt{\gamma}\, H C + N\,. \qquad (9.26)$$
Under these assumptions, a maximum likelihood decoder will operate following
the decision rule

$$\hat{C} = \mathop{\mathrm{argmax}}_{C} f(R \mid H, C) \qquad (9.27)$$

and $f(R \mid H, C)$ is clearly a Gaussian joint probability density function. Hence,

$$\hat{C} = \mathop{\mathrm{argmin}}_{C} \sum_{\ell=1}^{N} \| r_\ell - \sqrt{\gamma}\, H c_\ell \|^2 = \mathop{\mathrm{argmin}}_{C} \| R - \sqrt{\gamma}\, H C \|^2 \qquad (9.28)$$

where $\|\cdot\|$ denotes the Frobenius norm of a matrix (i.e., the square root of
the sum of the squared magnitudes of its elements).
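A brute-force sketch of the ML rule (9.28), assuming a small illustrative codebook (the codebook and names are hypothetical, not an optimized ST code):

```python
import numpy as np

def ml_decode(R, H, codebook, gamma):
    """Exhaustive ML rule (9.28): minimize ||R - sqrt(gamma) H C||_F^2."""
    costs = [np.linalg.norm(R - np.sqrt(gamma) * H @ C) ** 2 for C in codebook]
    return codebook[int(np.argmin(costs))]

rng = np.random.default_rng(3)
nT, nR, N, gamma = 2, 2, 2, 10.0
H = (rng.normal(size=(nR, nT)) + 1j * rng.normal(size=(nR, nT))) / np.sqrt(2)
# toy codebook of four 2x2 BPSK matrices (illustrative only)
codebook = [np.array([[a, b], [b, a]], dtype=complex)
            for a in (-1, 1) for b in (-1, 1)]
C = codebook[2]
W = (rng.normal(size=(nR, N)) + 1j * rng.normal(size=(nR, N))) / np.sqrt(2)
R = np.sqrt(gamma) * H @ C + W
print(np.allclose(ml_decode(R, H, codebook, gamma), C))
```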
Hence, given a particular channel realization, the pairwise error probability
$\Pr\{ C \to \hat{C} \mid H \}$ can be computed as

$$\Pr\{ C \to \hat{C} \mid H \} = Q\left( \sqrt{\frac{\gamma}{2}}\, \| H (\hat{C} - C) \| \right) \le \frac{1}{2} \exp\left\{ -\frac{\gamma}{4} \| H (\hat{C} - C) \|^2 \right\}. \qquad (9.29)$$

Denoting by $h_i$, $i = 1, 2, \ldots, n_R$, the $i$-th row of matrix $H$ and defining
$A = (\hat{C} - C)(\hat{C} - C)^H$, we may write

$$\| H (\hat{C} - C) \|^2 = \mathrm{trace}\left[ H (\hat{C} - C)(\hat{C} - C)^H H^H \right] = \sum_{i=1}^{n_R} h_i A h_i^H\,. \qquad (9.30)$$

Matrix $A$ is a nonnegative-definite Hermitian matrix. Hence, it can be
diagonalized using a unitary matrix $U$, i.e., $A = U \Lambda U^H$, where $\Lambda$ is a diagonal
matrix whose elements are the (nonnegative) eigenvalues $\lambda_i$, $i = 1, 2, \ldots, n_T$,
of $A$. Thus

$$\| H (\hat{C} - C) \|^2 = \sum_{i=1}^{n_R} h_i U \Lambda U^H h_i^H\,. \qquad (9.31)$$

Since the components of $h_i$ are independent complex Gaussian random variables
and matrix $U$ is unitary, $h_i$ and $p_i = h_i U$ are statistically equivalent.
Hence, the elements of $p_i$ are still independent complex Gaussian random
variables with the same mean and variance as the elements of $h_i$. We will
denote by $p_{i,j}$ the $j$-th element of $p_i$. We thus have

$$\| H (\hat{C} - C) \|^2 = \sum_{i=1}^{n_R} \sum_{j=1}^{n_T} \lambda_j |p_{i,j}|^2\,. \qquad (9.32)$$

Let us denote by $\nu$ the number of nonzero eigenvalues of $A$, i.e., $\nu = \mathrm{rank}(A) \le n_T$
(assuming $N \ge n_T$, that is, codewords of length greater than or
equal to the number of transmit antennas). Assuming that the eigenvalues
of $A$ are ordered in such a way that $\lambda_i \ge \lambda_{i+1}$, we may write

$$\| H (\hat{C} - C) \|^2 = \sum_{i=1}^{n_R} \sum_{j=1}^{\nu} \lambda_j |p_{i,j}|^2 \qquad (9.33)$$

and

$$\Pr\{ C \to \hat{C} \mid H \} \le \frac{1}{2} \exp\left\{ -\frac{\gamma}{4} \sum_{i=1}^{n_R} \sum_{j=1}^{\nu} \lambda_j |p_{i,j}|^2 \right\} = \frac{1}{2} \prod_{i=1}^{n_R} \prod_{j=1}^{\nu} \exp\left\{ -\frac{\gamma}{4} \lambda_j \rho_{i,j} \right\} \qquad (9.34)$$

having defined the random variable $\rho_{i,j} = |p_{i,j}|^2$ whose probability density
function is

$$f(\rho) = \begin{cases} e^{-\rho} & \text{for } \rho \ge 0 \\ 0 & \text{for } \rho < 0\,. \end{cases} \qquad (9.35)$$
We may thus compute the average pairwise error probability⁸

$$\Pr\{ C \to \hat{C} \} = E_H\left\{ \Pr\{ C \to \hat{C} \mid H \} \right\} \le \frac{1}{2} \prod_{i=1}^{n_R} \prod_{j=1}^{\nu} \int \exp\left\{ -\frac{\gamma}{4} \lambda_j \rho_{i,j} \right\} f(\rho_{i,j})\, d\rho_{i,j} = \frac{1}{2} \prod_{i=1}^{n_R} \prod_{j=1}^{\nu} \frac{1}{1 + \frac{\gamma}{4} \lambda_j} = \frac{1}{2} \left[ \prod_{j=1}^{\nu} \left( 1 + \frac{\gamma}{4} \lambda_j \right) \right]^{-n_R} \qquad (9.36)$$

which, asymptotically (for $\gamma \to \infty$), becomes

$$\Pr\{ C \to \hat{C} \} \lesssim \frac{1}{2} \left[ \prod_{j=1}^{\nu} \lambda_j \right]^{-n_R} \left( \frac{\gamma}{4} \right)^{-\nu n_R} = \frac{1}{2} \left[ \prod_{j=1}^{\nu} \lambda_j \right]^{-n_R} \left( \frac{E_s}{4 N_0} \right)^{-\nu n_R}. \qquad (9.37)$$
By considering now upper bounds on the symbol or bit error probability, as
an example obtained through the technique described in Chapter 2, it is clear
that the total diversity order (the slope of the symbol or bit error probability
curve in a log-log plot) of the coded system is thus $\nu_{\min} n_R$ (having denoted
by $\nu_{\min}$ the minimum value of $\nu$). As a secondary merit criterion, we should
try to maximize the term $\prod_{j=1}^{\nu} \lambda_j$ on error events with minimum diversity.
By using a notation already employed for TCM over fading channels, we will
call this term the coding gain. It displaces the error probability curve instead
of changing its slope. Since $\nu = \mathrm{rank}(A)$ and $\prod_{j=1}^{\nu} \lambda_j = \det(A)$, the basic
code design principles for ST codes over slow frequency-flat Rayleigh fading
channels are the following [111]:
⁸ The extension to the case of a Rician fading channel is straightforward.

Rank criterion: The maximum diversity of $n_T n_R$ is achieved by ensuring
that the matrix⁹

$$A = (\hat{C} - C)(\hat{C} - C)^H$$

is full-rank for all pairs of distinct codewords $\hat{C}$ and $C$. Otherwise,
if the minimum rank of $A$ among all codeword pairs is $\nu_{\min} \le n_T$, a
diversity order $\nu_{\min} n_R$ is achieved.

Determinant criterion: In order to obtain the maximum possible coding
advantage, the minimum determinant of the matrices $A$ having minimum
rank should be maximized.

These design principles, also known as the Tarokh-Seshadri-Calderbank criteria,
are based on the asymptotic receiver performance. Let us consider (9.36)
and, in particular, the term in square brackets. Considering that $\sum_{j=1}^{\nu} \lambda_j = \mathrm{trace}(A)$,
it can be expressed as

$$\prod_{j=1}^{\nu} \left( 1 + \frac{\gamma}{4} \lambda_j \right) = 1 + \frac{\gamma}{4} \sum_{j=1}^{\nu} \lambda_j + \cdots + \left( \frac{\gamma}{4} \right)^{\nu} \prod_{j=1}^{\nu} \lambda_j = 1 + \frac{\gamma}{4} \mathrm{trace}(A) + \cdots + \left( \frac{\gamma}{4} \right)^{\nu} \det(A) \qquad (9.38)$$

stating that, for low values of $\gamma$ ($\gamma \ll 1$), the pairwise error probability is
governed essentially by $\mathrm{trace}(A)$ instead of by $\det(A)$. Note that

$$\mathrm{trace}(A) = \mathrm{trace}\left[ (\hat{C} - C)(\hat{C} - C)^H \right] = \| \hat{C} - C \|^2,$$

that is, the squared Euclidean distance between $C$ and $\hat{C}$. This is somehow
expected since for low values of the signal-to-noise ratio the performance
is governed by the additive noise rather than by the fading. Thus, the error
probability curve changes its behavior from a waterfall shape (for small values
of $\gamma$) to a linear shape (for high values of $\gamma$) where the performance is governed
by the above-mentioned rank-determinant design principles.
For large values of $\nu n_R$, let's say $\nu_{\min} n_R \ge 4$, this linear behavior is
observed for error probability values so small that a code design based on
the asymptotic behavior is highly suboptimal for error probability values of
interest [128, 129]. For these values, the pairwise error probability can be
obtained by examining the asymptotic behavior for $\nu n_R \to \infty$. Let us come

⁹ Note that $A$ and $\hat{C} - C$ have the same rank. Hence, this criterion could be equivalently
enunciated with reference to the matrix $\hat{C} - C$.

back to (9.33) and consider that, by the law of large numbers,¹⁰

$$\| H (\hat{C} - C) \|^2 \to n_R \sum_{j=1}^{\nu} \lambda_j = n_R\, \mathrm{trace}(A) = n_R \| \hat{C} - C \|^2\,.$$

Hence

$$\Pr\{ C \to \hat{C} \} \le \frac{1}{2} \exp\left\{ -\frac{\gamma\, n_R \| \hat{C} - C \|^2}{4} \right\}.$$
The following alternative code design principle thus results:

Euclidean distance criterion: When the matrix

$$A = (\hat{C} - C)(\hat{C} - C)^H$$

has rank at least 4 for all pairs of distinct codewords $\hat{C}$ and $C$, the
minimum trace of the matrices $A$, which is the minimum squared Euclidean
distance between $\hat{C}$ and $C$, should be maximized.

In practice, when $\nu n_R \ge 4$, an optimum code for AWGN channels, whose
codewords are properly formatted into $n_T \times N$ matrices, can be adopted.
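A small sketch evaluating the rank and determinant criteria for one codeword pair (the helper name `diversity_and_gain` is illustrative):

```python
import numpy as np

def diversity_and_gain(C, C_hat):
    """Evaluate the rank and determinant criteria for one codeword pair:
    returns nu = rank(A) and the product of the nonzero eigenvalues of
    A = (C_hat - C)(C_hat - C)^H (the coding gain term)."""
    D = C_hat - C
    eig = np.linalg.eigvalsh(D @ D.conj().T)   # real, nonnegative
    nz = eig[eig > 1e-10]
    return len(nz), float(np.prod(nz))

# two distinct Alamouti codewords: full rank (nu = 2) is expected
C     = np.array([[1, -1], [1,  1]], dtype=complex)
C_hat = np.array([[1,  1], [-1, 1]], dtype=complex)
print(diversity_and_gain(C, C_hat))
```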

9.3.3 ST codeword design criteria for fast fading channels
Let us now consider the case of a fast frequency-flat fading channel. Under
the assumption that the receiver perfectly knows the sequence of channel
matrices $\{H_\ell\}$, the decision rule will be

$$\hat{C} = \mathop{\mathrm{argmin}}_{C} \sum_{\ell=1}^{N} \| r_\ell - \sqrt{\gamma}\, H_\ell c_\ell \|^2 \qquad (9.39)$$

and the pairwise error probability given the channel realization is

$$\Pr\{ C \to \hat{C} \mid \{H_\ell\} \} = Q\left( \sqrt{\frac{\gamma}{2} \sum_{\ell=1}^{N} \| H_\ell (\hat{c}_\ell - c_\ell) \|^2} \right) \le \frac{1}{2} \exp\left\{ -\frac{\gamma}{4} \sum_{\ell=1}^{N} \| H_\ell (\hat{c}_\ell - c_\ell) \|^2 \right\} \qquad (9.40)$$

¹⁰ Other design criteria are discussed in [130, 131].

where we may express

$$\| H_\ell (\hat{c}_\ell - c_\ell) \|^2 = \mathrm{trace}\left[ H_\ell (\hat{c}_\ell - c_\ell)(\hat{c}_\ell - c_\ell)^H H_\ell^H \right].$$

Let us now consider the matrix $(\hat{c}_\ell - c_\ell)(\hat{c}_\ell - c_\ell)^H$. Being a nonnegative-definite
Hermitian matrix, it can be diagonalized using a unitary matrix $U_\ell$,
i.e., $(\hat{c}_\ell - c_\ell)(\hat{c}_\ell - c_\ell)^H = U_\ell \Lambda_\ell U_\ell^H$, where $\Lambda_\ell$ is a diagonal matrix whose
elements are the nonnegative eigenvalues of $(\hat{c}_\ell - c_\ell)(\hat{c}_\ell - c_\ell)^H$. However,
since $(\hat{c}_\ell - c_\ell)(\hat{c}_\ell - c_\ell)^H$ has rank one, only one nonzero eigenvalue results. Let
us denote by $\lambda_{1,\ell}$ this eigenvalue and by $u_{1,\ell}$ the corresponding eigenvector.
By using the property that the sum of the eigenvalues is equal to the trace
of the matrix, we have that

$$\lambda_{1,\ell} = \mathrm{trace}\left[ (\hat{c}_\ell - c_\ell)(\hat{c}_\ell - c_\ell)^H \right] = \| \hat{c}_\ell - c_\ell \|^2\,.$$

Hence

$$\| H_\ell (\hat{c}_\ell - c_\ell) \|^2 = \mathrm{trace}\left[ p_{1,\ell}\, \lambda_{1,\ell}\, p_{1,\ell}^H \right] = \| \hat{c}_\ell - c_\ell \|^2\, \| p_{1,\ell} \|^2 \qquad (9.41)$$

having defined $p_{1,\ell} = H_\ell u_{1,\ell}$, and

$$\Pr\{ C \to \hat{C} \mid \{H_\ell\} \} \le \frac{1}{2} \exp\left\{ -\frac{\gamma}{4} \sum_{\ell=1}^{N} \| \hat{c}_\ell - c_\ell \|^2 \| p_{1,\ell} \|^2 \right\} = \frac{1}{2} \prod_{\ell=1}^{N} \exp\left\{ -\frac{\gamma}{4} \| \hat{c}_\ell - c_\ell \|^2 \| p_{1,\ell} \|^2 \right\} = \frac{1}{2} \prod_{\ell \in I} \exp\left\{ -\frac{\gamma}{4} \| \hat{c}_\ell - c_\ell \|^2 \| p_{1,\ell} \|^2 \right\} \qquad (9.42)$$

where, as in Section 9.1, we denoted by $I$ the set of all $1 \le \ell \le N$ such that
$\| \hat{c}_\ell - c_\ell \| \ne 0$. Being $u_{1,\ell}$ an eigenvector, $p_{1,\ell}$ is statistically equivalent to
one column of $H_\ell$, i.e., its components are independent complex Gaussian
random variables with mean zero and unit variance.
Let us now assume that the channel coefficients $\{H_\ell\}$ are independent of
each other (fast fading or ideal channel interleaving). To compute the average
pairwise error probability we will consider the cases of small and large values
of $|I| n_R$. In the former case, the average pairwise error probability is easily
obtained as

$$\Pr\{ C \to \hat{C} \} \le \frac{1}{2} \prod_{\ell \in I} \left( 1 + \frac{\gamma}{4} \| \hat{c}_\ell - c_\ell \|^2 \right)^{-n_R} \qquad (9.43)$$

which, asymptotically (for high signal-to-noise ratio values), becomes

$$\Pr\{ C \to \hat{C} \} \lesssim \frac{1}{2} \left( \frac{\gamma}{4} \right)^{-|I| n_R} \left( \prod_{\ell \in I} \| \hat{c}_\ell - c_\ell \|^2 \right)^{-n_R}. \qquad (9.44)$$

Hence, the basic code design principles over fast frequency-flat fading channels
are the following:

Code diversity criterion: The minimum diversity $|I|$ between all pairs of
distinct codewords must be maximized.

Coding gain criterion: In order to obtain the maximum possible coding
advantage, the coding gain $\prod_{\ell \in I} \| \hat{c}_\ell - c_\ell \|^2$ over error events having
minimum diversity should be maximized.
For large values of $|I| n_R$, the average pairwise error probability can be
obtained by examining the asymptotic behavior for $|I| n_R \to \infty$. From (9.41)
and the law of large numbers

$$\| H_\ell (\hat{c}_\ell - c_\ell) \|^2 \to n_R \| \hat{c}_\ell - c_\ell \|^2 \qquad (9.45)$$

and

$$\Pr\{ C \to \hat{C} \} \le \frac{1}{2} \exp\left\{ -\frac{\gamma}{4}\, n_R \sum_{\ell \in I} \| \hat{c}_\ell - c_\ell \|^2 \right\}.$$

The following alternative code design principle thus results:

Euclidean distance criterion: For large values of the product between
the number of receive antennas $n_R$ and the minimum diversity $|I|$ between
all pairs of distinct codewords, the minimum Euclidean distance
$\sqrt{\sum_{\ell \in I} \| \hat{c}_\ell - c_\ell \|^2}$ between all pairs of distinct codewords should be
maximized.

9.3.4 First naïve scheme: delay diversity

One of the first proposed ST codes for quasi-static fading channels is the
delay diversity scheme [132]. This scheme employs a rate-$1/n_T$ repetition
code where each symbol is transmitted from a different antenna after being
delayed. In other words, assuming, as an example, $n_T = 2$, this scheme
transmits the same information from both antennas simultaneously but with
a delay of one symbol. The codewords are thus of the form

$$C = \begin{pmatrix} c_1 & c_2 & c_3 & \cdots \\ 0 & c_1 & c_2 & \cdots \end{pmatrix}.$$

Figure 9.7: System with receive diversity.

Although not optimized in the sense of the determinant criterion, it is easy to
verify that for all pairs of distinct codewords, the matrix $(\hat{C} - C)$ always
has rank $n_T$. Hence, the maximum diversity of $n_T n_R$ is obtained. This
is also intuitive since each symbol traverses $n_T n_R$ paths. This maximum
diversity is obtained at the cost of having a rate of only one symbol per
channel use. In practice, this scheme transforms the frequency-flat channel
into a channel with intersymbol interference (and hence a frequency-selective
channel). Optimal decoding may be performed by using the Viterbi algorithm
or through the suboptimal reduced-complexity schemes described in Section
9.3.7.
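A short sketch building a delay-diversity codeword and checking the rank property for one codeword pair (the helper name is illustrative):

```python
import numpy as np

def delay_diversity_codeword(c, nT):
    """Antenna n transmits the stream c delayed by n-1 symbols (zero-padded)."""
    C = np.zeros((nT, len(c) + nT - 1), dtype=complex)
    for n in range(nT):
        C[n, n:n + len(c)] = c
    return C

c = np.array([1, -1, 1, 1], dtype=complex)       # illustrative BPSK stream
C, C_hat = delay_diversity_codeword(c, 2), delay_diversity_codeword(-c, 2)
print(C)
print(np.linalg.matrix_rank(C_hat - C))          # full rank nT = 2
```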

9.3.5 ST block codes

These codes were introduced to provide transmit diversity for quasi-static
frequency-flat fading channels. When employed with multiple receive antennas,
receive diversity is also obtained in addition to transmit diversity. The
first and best-known ST block code is that proposed by Alamouti [133] for
the case of two transmit antennas.
Before describing it in detail, let us consider the case of an uncoded
system with receive diversity only ($n_T = 1$) shown in Fig. 9.7. This system
will be employed for a comparison with the systems described later. From
(9.23), the channel model becomes

$$r_\ell = \sqrt{\gamma}\, h_\ell c_\ell + n_\ell, \qquad \ell = 1, 2, \ldots, N$$

since the channel matrix is now a vector $h_\ell$ of $n_R$ components. The optimal
detection strategy, under the assumption of perfect knowledge of the channel

coefficients, can be easily derived (see also Section 9.1.2 for the case of a
coded transmission). In fact, the system being memoryless (uncoded system
and perfect channel knowledge), we have

$$\hat{c}_\ell = \mathop{\mathrm{argmin}}_{c_\ell} \| r_\ell - \sqrt{\gamma}\, h_\ell c_\ell \|^2 = \mathop{\mathrm{argmin}}_{c_\ell} \left| h_\ell^H r_\ell - \sqrt{\gamma}\, \| h_\ell \|^2 c_\ell \right|^2 = \mathop{\mathrm{argmax}}_{c_\ell} \left\{ \Re\left[ h_\ell^H r_\ell c_\ell^* \right] - \frac{\sqrt{\gamma}}{2} \| h_\ell \|^2 |c_\ell|^2 \right\}. \qquad (9.46)$$
It is thus clear, from this latter expression, that the optimal decision rule
linearly combines the received samples of the different antennas after co-phasing
and weighting them with their respective channel gains. Samples from antennas
experiencing better channel gains (and thus higher signal-to-noise ratio
values) are emphasized more than others, and this is intuitive since they are
more reliable. This detection strategy is commonly known as maximal-ratio
combining detection.
It is easy to verify that it is the same optimal strategy corresponding to
the following equivalent single-input single-output channel

$$\check{r}_\ell = h_\ell^H r_\ell = \sqrt{\gamma}\, \| h_\ell \|^2 c_\ell + \check{n}_\ell \qquad (9.47)$$
where, given the channel gains, the noise term $\check{n}_\ell$ is still Gaussian with
variance $\| h_\ell \|^2$. Under the hypothesis that the components of $h_\ell$ are
independent and identically distributed Gaussian random variables with mean
zero and unit variance (Rayleigh fading environment), the random variable
$\alpha_\ell = \gamma \| h_\ell \|^2$, representing the instantaneous signal-to-noise ratio, is chi-squared
distributed with $2 n_R$ degrees of freedom [134]. Its probability density
function is thus given by

$$f(\alpha) = \begin{cases} \dfrac{\alpha^{n_R - 1}\, e^{-\alpha/\gamma}}{\gamma^{n_R} (n_R - 1)!} & \text{for } \alpha \ge 0 \\ 0 & \text{for } \alpha < 0\,. \end{cases} \qquad (9.48)$$
The average symbol error probability can thus be easily computed. From
the equivalent channel model (9.47), considering, as an example, a BPSK
modulation whose bit error probability for a given value of the instantaneous
signal-to-noise ratio is $Q(\sqrt{2\alpha})$, we obtain the following expression for the
average bit error probability:

$$P_b = \int_{-\infty}^{\infty} Q(\sqrt{2\alpha})\, f(\alpha)\, d\alpha\,.$$

A closed-form expression for this probability exists and reads [135, p. 781]

$$P_b = \left[ \frac{1}{2} \left( 1 - \sqrt{\frac{\gamma}{1+\gamma}} \right) \right]^{n_R} \sum_{m=0}^{n_R - 1} \binom{n_R - 1 + m}{m} \left[ \frac{1}{2} \left( 1 + \sqrt{\frac{\gamma}{1+\gamma}} \right) \right]^{m}. \qquad (9.49)$$

Figure 9.8: Alamouti scheme with $n_T = 2$ and $n_R = 1$.

A simpler upper bound can be found by considering that $Q(\sqrt{2\alpha}) \le \frac{1}{2} e^{-\alpha}$
and hence

$$P_b \le \frac{1}{2} \int_{-\infty}^{\infty} e^{-\alpha} f(\alpha)\, d\alpha = \frac{1}{2} \frac{1}{(1+\gamma)^{n_R}} \qquad (9.50)$$

that, for $\gamma \to \infty$, gives

$$P_b \lesssim \frac{1}{2} \frac{1}{\gamma^{n_R}} \qquad (9.51)$$

which clearly shows that a diversity order of $n_R$ is achieved.
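A small sketch evaluating the closed form (9.49) against the bound (9.50) (function names are illustrative):

```python
import numpy as np
from math import comb

def mrc_bpsk_ber(gamma, nR):
    """Closed-form average BPSK bit error probability (9.49) with MRC over
    nR independent Rayleigh branches; gamma is the average SNR per branch."""
    mu = np.sqrt(gamma / (1.0 + gamma))
    return (0.5 * (1 - mu)) ** nR * sum(
        comb(nR - 1 + m, m) * (0.5 * (1 + mu)) ** m for m in range(nR))

def mrc_bpsk_ber_bound(gamma, nR):
    """Simple upper bound (9.50)."""
    return 0.5 / (1.0 + gamma) ** nR

for nR in (1, 2, 4):
    print(nR, mrc_bpsk_ber(10.0, nR), mrc_bpsk_ber_bound(10.0, nR))
```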
The motivation behind the Alamouti scheme is thus the following. In a
cellular system, the base station can be easily equipped with multiple antennas
with sufficient separation among them. Hence, the technique just
described can be conveniently adopted in the uplink. On the contrary, since
at the mobile terminal it is difficult to place multiple antennas, receive diversity
can hardly be employed. The aim of the scheme proposed by Alamouti
is thus to obtain transmit diversity when there are two transmit antennas.
ST block codes represent a generalization of the Alamouti scheme to the
case of $n_T > 2$. Although they provide full diversity, there is no coding
advantage provided by ST block codes.¹¹ However, optimal decoding can be
performed efficiently through simple linear processing of the samples at the
output of the receive antennas.

The Alamouti scheme. Let us consider for now the case of a channel
with $n_T = 2$ transmit antennas and $n_R = 1$ receive antenna shown in Fig. 9.8.
The codewords have length $N = 2$ and the channel, perfectly known at the
receiver, is assumed to remain the same over two consecutive time intervals
(quasi-static fading over $N = 2$ symbol intervals). In the two considered
symbol intervals, it will be described by the matrix

$$H = [h_{1,1}, h_{1,2}]\,.$$

¹¹ To achieve an additional coding gain, one should concatenate an outer code with an
inner ST block code [136, 137, 138].
The codeword matrices are of the form

$$C = \begin{pmatrix} c_1 & -c_2^* \\ c_2 & c_1^* \end{pmatrix}$$

meaning that, during the first interval, symbol $c_1$ is transmitted from the first
antenna and symbol $c_2$ from the second antenna whereas, during the second
interval, symbol $-c_2^*$ is transmitted from the first antenna and symbol $c_1^*$ from
the second antenna. A rate of one symbol per channel use is thus achieved.
The corresponding received samples in the two intervals are¹²

$$r[1] = \sqrt{\gamma}\,( h_{1,1} c_1 + h_{1,2} c_2 ) + n[1]$$
$$r[2] = \sqrt{\gamma}\,( -h_{1,1} c_2^* + h_{1,2} c_1^* ) + n[2]\,. \qquad (9.52)$$

If we thus consider the vector $\check{r} = (r[1], r[2]^*)^T$, it can be expressed as

$$\check{r} = \sqrt{\gamma}\, \check{H} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \check{n} \qquad (9.53)$$

where

$$\check{H} = \begin{pmatrix} h_{1,1} & h_{1,2} \\ h_{1,2}^* & -h_{1,1}^* \end{pmatrix}$$

and $\check{n} = (n[1], n[2]^*)^T$ is statistically equivalent to the vector $(n[1], n[2])^T$.
An alternative set of sufficient statistics is represented by

$$\tilde{r} = (\tilde{r}[1], \tilde{r}[2])^T = \check{H}^H \check{r} \qquad (9.54)$$

since it can be obtained through a linear transformation of the vector $\check{r}$. It is
easy to verify that $\check{H}^H \check{H} = (|h_{1,1}|^2 + |h_{1,2}|^2) I_2$. Defining $\tilde{n} = (\tilde{n}[1], \tilde{n}[2])^T = \check{H}^H \check{n}$, we have

$$\tilde{r}[1] = h_{1,1}^* r[1] + h_{1,2} r[2]^* = \sqrt{\gamma}\, \left( |h_{1,1}|^2 + |h_{1,2}|^2 \right) c_1 + \tilde{n}[1]$$
$$\tilde{r}[2] = h_{1,2}^* r[1] - h_{1,1} r[2]^* = \sqrt{\gamma}\, \left( |h_{1,1}|^2 + |h_{1,2}|^2 \right) c_2 + \tilde{n}[2]\,. \qquad (9.55)$$

As mentioned, the channel is assumed perfectly known at the receiver and,
given the channel coefficients, $\tilde{n}$ is still a Gaussian vector with uncorrelated
[being $\check{H}^H \check{H} = (|h_{1,1}|^2 + |h_{1,2}|^2) I_2$] and thus independent components having

¹² In order to avoid confusion between the time interval and the antenna index, in the
following the time index in received and noise samples will be kept in square brackets.

mean zero and variance $(|h_{1,1}|^2 + |h_{1,2}|^2)$. Decisions on symbols $c_1$ and $c_2$ can
thus be obtained by adopting the following symbol-by-symbol rules:

$$\hat{c}_1 = \mathop{\mathrm{argmin}}_{c_1} \left| \tilde{r}[1] - \sqrt{\gamma}\, \left( |h_{1,1}|^2 + |h_{1,2}|^2 \right) c_1 \right|^2$$
$$\hat{c}_2 = \mathop{\mathrm{argmin}}_{c_2} \left| \tilde{r}[2] - \sqrt{\gamma}\, \left( |h_{1,1}|^2 + |h_{1,2}|^2 \right) c_2 \right|^2. \qquad (9.56)$$

In other words, after a proper linear combining of the received samples,
detection of symbols $c_1$ and $c_2$ can be decoupled. For this reason, the Alamouti
scheme is called an orthogonal design.
This scheme can be generalized to the case of multiple receive antennas.
Denoting by $r_i[1]$ and $r_i[2]$ two consecutive samples at the output of antenna
$i$, $i = 1, 2, \ldots, n_R$, following the same steps as in the case $n_R = 1$, after linear
combining and normalization we have the following samples

$$\tilde{r}_i[1] = h_{i,1}^* r_i[1] + h_{i,2} r_i[2]^* = \sqrt{\gamma}\, \left( |h_{i,1}|^2 + |h_{i,2}|^2 \right) c_1 + \tilde{n}_i[1]$$
$$\tilde{r}_i[2] = h_{i,2}^* r_i[1] - h_{i,1} r_i[2]^* = \sqrt{\gamma}\, \left( |h_{i,1}|^2 + |h_{i,2}|^2 \right) c_2 + \tilde{n}_i[2] \qquad (9.57)$$

where $\tilde{n}_i[1]$ and $\tilde{n}_i[2]$ are independent Gaussian noise samples with variance
$(|h_{i,1}|^2 + |h_{i,2}|^2)$. Optimal decisions on symbols $c_1$ and $c_2$ can thus be obtained
through maximal-ratio combining. After straightforward manipulations, we
obtain

$$\hat{c}_1 = \mathop{\mathrm{argmax}}_{c_1} f(\tilde{r}_1[1], \tilde{r}_2[1], \ldots, \tilde{r}_{n_R}[1] \mid c_1, h_{1,1}, h_{1,2}, \ldots, h_{n_R,1}, h_{n_R,2}) = \mathop{\mathrm{argmax}}_{c_1} \prod_{i=1}^{n_R} f(\tilde{r}_i[1] \mid c_1, h_{i,1}, h_{i,2})$$
$$= \mathop{\mathrm{argmin}}_{c_1} \left\{ \sqrt{\gamma}\, |c_1|^2 \sum_{i=1}^{n_R} \left( |h_{i,1}|^2 + |h_{i,2}|^2 \right) - 2\, \Re\left[ c_1^* \sum_{i=1}^{n_R} \tilde{r}_i[1] \right] \right\}$$
$$= \mathop{\mathrm{argmin}}_{c_1} \left| \sum_{i=1}^{n_R} \left[ \tilde{r}_i[1] - \sqrt{\gamma}\, c_1 \left( |h_{i,1}|^2 + |h_{i,2}|^2 \right) \right] \right|^2 \qquad (9.58)$$

$$\hat{c}_2 = \mathop{\mathrm{argmax}}_{c_2} f(\tilde{r}_1[2], \tilde{r}_2[2], \ldots, \tilde{r}_{n_R}[2] \mid c_2, h_{1,1}, h_{1,2}, \ldots, h_{n_R,1}, h_{n_R,2}) = \mathop{\mathrm{argmax}}_{c_2} \prod_{i=1}^{n_R} f(\tilde{r}_i[2] \mid c_2, h_{i,1}, h_{i,2})$$
$$= \mathop{\mathrm{argmin}}_{c_2} \left| \sum_{i=1}^{n_R} \left[ \tilde{r}_i[2] - \sqrt{\gamma}\, c_2 \left( |h_{i,1}|^2 + |h_{i,2}|^2 \right) \right] \right|^2. \qquad (9.59)$$

Again, decisions are decoupled.



The performance analysis of this scheme is quite simple. From (9.58) and
(9.59), in fact, it is clear that a decision on symbol $c_\ell$, $\ell = 1, 2$, is obtained
from the following equivalent single-input single-output channel

$$\tilde{r}[\ell] = \sum_{i=1}^{n_R} \tilde{r}_i[\ell] = \sqrt{\gamma}\, c_\ell \sum_{i=1}^{n_R} \left( |h_{i,1}|^2 + |h_{i,2}|^2 \right) + \tilde{n}[\ell] \qquad (9.60)$$

having defined $\tilde{n}[\ell] = \sum_{i=1}^{n_R} \tilde{n}_i[\ell]$. Given the channel gains, samples $\{\tilde{n}[\ell]\}$
are jointly Gaussian, independent, and with variance $\sum_{i=1}^{n_R} \left( |h_{i,1}|^2 + |h_{i,2}|^2 \right)$.
Comparing (9.60) with (9.47), it is thus clear that the Alamouti scheme with
$n_T = 2$ transmit antennas and $n_R$ receive antennas is perfectly equivalent to
a scheme with $n_T = 1$ transmit antenna and $2 n_R$ receive antennas using
maximal-ratio combining, provided that the same value of $\gamma$ is employed,
i.e., provided that the same power per transmit antenna is spent (meaning
that for an equal overall transmitted power, the performance of the Alamouti
scheme exhibits a 3-dB degradation). It is thus also clear that the Alamouti
scheme achieves full diversity (diversity $2 n_R$). This can be easily verified by
considering two distinct codewords

$$C = \begin{pmatrix} c_1 & -c_2^* \\ c_2 & c_1^* \end{pmatrix}, \qquad \hat{C} = \begin{pmatrix} \hat{c}_1 & -\hat{c}_2^* \\ \hat{c}_2 & \hat{c}_1^* \end{pmatrix}$$

and computing the matrix

$$A = (\hat{C} - C)(\hat{C} - C)^H = \begin{pmatrix} |\hat{c}_1 - c_1|^2 + |\hat{c}_2 - c_2|^2 & 0 \\ 0 & |\hat{c}_1 - c_1|^2 + |\hat{c}_2 - c_2|^2 \end{pmatrix}$$

which clearly has full rank provided that $\hat{C} \ne C$.
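A minimal end-to-end sketch of the Alamouti scheme for $n_T = 2$, $n_R = 1$, following (9.52), (9.55), and (9.56) with QPSK (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
gamma = 10.0
qpsk = np.exp(1j * np.pi * (2 * np.arange(4) + 1) / 4)   # unit-energy QPSK
c1, c2 = rng.choice(qpsk, 2)
h = (rng.normal(size=2) + 1j * rng.normal(size=2)) / np.sqrt(2)   # h11, h12
w = (rng.normal(size=2) + 1j * rng.normal(size=2)) / np.sqrt(2)   # noise

# two received samples, as in (9.52)
r1 = np.sqrt(gamma) * (h[0] * c1 + h[1] * c2) + w[0]
r2 = np.sqrt(gamma) * (-h[0] * np.conj(c2) + h[1] * np.conj(c1)) + w[1]

# linear combining, as in (9.55)
g = abs(h[0]) ** 2 + abs(h[1]) ** 2
rt1 = np.conj(h[0]) * r1 + h[1] * np.conj(r2)
rt2 = np.conj(h[1]) * r1 - h[0] * np.conj(r2)

# decoupled symbol-by-symbol decisions, as in (9.56)
c1_hat = qpsk[np.argmin(np.abs(rt1 - np.sqrt(gamma) * g * qpsk))]
c2_hat = qpsk[np.argmin(np.abs(rt2 - np.sqrt(gamma) * g * qpsk))]
print(c1_hat == c1, c2_hat == c2)
```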

Orthogonal ST block codes. The Alamouti scheme has been designed for
$n_T = 2$ transmit antennas. Orthogonal ST block codes [139, 140] represent
an extension to the case $n_T > 2$.
In the general case of $n_T$ transmit antennas, in order to design a code
having a rate of 1 symbol per channel use and full diversity, we need to
design a set of $n_T \times n_T$ (square) matrices, with elements from the employed
constellation, whose rows are orthogonal to each other. This latter property,
in fact, will ensure that an optimal receiver can be designed based on
linear processing plus symbol-by-symbol detection. Unfortunately, it is not
always possible to find such an orthogonal design. For real constellations (as
an example, $M$-PAM), it exists for $n_T = 2$, 4, and 8 only. As an example, for
$n_T = 4$, the corresponding orthogonal design is that using codeword matrices

of the form

$$C = \begin{pmatrix} c_1 & -c_2 & -c_3 & -c_4 \\ c_2 & c_1 & c_4 & -c_3 \\ c_3 & -c_4 & c_1 & c_2 \\ c_4 & c_3 & -c_2 & c_1 \end{pmatrix}. \qquad (9.61)$$
It is easy to prove, proceeding as done for the Alamouti code, that it achieves
full diversity. It also has a rate of 1 symbol per channel use since 4 symbols
are transmitted in 4 time slots.
On the other hand, for complex constellations, there exists a unique orthogonal
design for $n_T = 2$ (that proposed by Alamouti). It is, however,
possible to find many orthogonal designs by removing some of the mentioned
constraints. As an example, for $n_T = 4$, the code having codewords

$$C = \begin{pmatrix} c_1 & -c_2 & -c_3 & -c_4 & c_1^* & -c_2^* & -c_3^* & -c_4^* \\ c_2 & c_1 & c_4 & -c_3 & c_2^* & c_1^* & c_4^* & -c_3^* \\ c_3 & -c_4 & c_1 & c_2 & c_3^* & -c_4^* & c_1^* & c_2^* \\ c_4 & c_3 & -c_2 & c_1 & c_4^* & c_3^* & -c_2^* & c_1^* \end{pmatrix} \qquad (9.62)$$

achieves full diversity [as can be easily proved by computing the matrix $A = (\hat{C} - C)(\hat{C} - C)^H$] but has a rate of 1/2 symbol per channel use since 4
symbols are transmitted in 8 time slots.
A mathematical framework to describe the general class of linear orthogonal
designs has been proposed in [141]. The $n_T \times N$ matrices $\{C\}$ describing
an orthogonal space-time block code and used to transmit $K$ symbols (thus
achieving a rate of $K/N$ symbols per channel use) can be expressed in the
form

$$C = \sum_{k=1}^{K} \left( c_k A_k + c_k^* B_k \right) \qquad (9.63)$$

where $A_k$ and $B_k$ are proper $n_T \times N$ matrices. That is, all elements of $C$ are
linear combinations of the symbols $\{c_k\}_{k=1}^{K}$ being transmitted and/or their
conjugates. As an example, the Alamouti code can be described by using
this framework with $n_T = N = K = 2$ and

$$A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad B_1 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad B_2 = \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix}.$$
Clearly, the matrices $\{C\}$ must satisfy the property that their rows are
orthogonal, that is, $C C^H$ is a diagonal matrix with strictly positive elements.
More precisely, the following condition must hold:

$$C C^H = \sum_{k=1}^{K} D_k |c_k|^2 \qquad (9.64)$$

where $D_k$ is a diagonal matrix with strictly positive elements. As demonstrated
in [141], this can be equivalently expressed in terms of the matrices $A_k$
and $B_k$ as

$$A_k A_m^H + B_m B_k^H = \delta_{k-m} D_k \qquad (9.65)$$
$$A_k B_m^H + A_m B_k^H = 0 \qquad (9.66)$$

where $\delta_{k-m}$ is the Kronecker delta. In the case of the Alamouti code, property
(9.64) and conditions (9.65) and (9.66) can be easily verified.
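The verification for the Alamouti matrices can be done mechanically; a short sketch (here $D_k = I_2$):

```python
import numpy as np

# Alamouti dispersion matrices in the framework (9.63)
A = [np.array([[1, 0], [0, 0]]), np.array([[0, 0], [1, 0]])]
B = [np.array([[0, 0], [0, 1]]), np.array([[0, -1], [0, 0]])]

# check (9.65) with D_k = I_2 and (9.66) for all pairs (k, m)
for k in range(2):
    for m in range(2):
        lhs65 = A[k] @ A[m].T + B[m] @ B[k].T   # real matrices: .T acts as conj-transpose
        lhs66 = A[k] @ B[m].T + A[m] @ B[k].T
        ok65 = np.array_equal(lhs65, (k == m) * np.eye(2))
        ok66 = not lhs66.any()
        print(k, m, ok65, ok66)
```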
This framework is very useful to describe the decoding algorithm. Considering
the samples $r_i[\ell]$, $\ell = 1, 2, \ldots, N$, received by antenna $i$, they can be
collected in a row vector

$$r_i = [r_i[1], r_i[2], \ldots, r_i[N]] \qquad (9.67)$$

that can be expressed as

$$r_i = \sqrt{\gamma}\, h_i C + n_i \qquad (9.68)$$
where $h_i$ is the $i$-th row of the channel matrix $H$ (supposed known at the
receiver and constant for $N$ consecutive samples) and $n_i$ is a row vector
collecting the noise samples at the output of antenna $i$. In other words, $h_i$
is a row vector that collects the channel gains from all transmit antennas to
receive antenna $i$. The detection strategy can be expressed in the form

$$\hat{C} = \mathop{\mathrm{argmin}}_{C} \sum_{i=1}^{n_R} \| r_i - \sqrt{\gamma}\, h_i C \|^2 = \mathop{\mathrm{argmin}}_{C} \sum_{i=1}^{n_R} \left[ r_i - \sqrt{\gamma}\, h_i C \right] \left[ r_i - \sqrt{\gamma}\, h_i C \right]^H = \mathop{\mathrm{argmin}}_{C} \sum_{i=1}^{n_R} \left\{ \gamma\, h_i C C^H h_i^H - 2 \sqrt{\gamma}\, \Re\left[ r_i C^H h_i^H \right] \right\}. \qquad (9.69)$$

Taking into account (9.63) and (9.64), we obtain

$$\hat{C} = \mathop{\mathrm{argmin}}_{C} \sum_{i=1}^{n_R} \sum_{k=1}^{K} \left\{ \gamma\, h_i D_k h_i^H |c_k|^2 - 2 \sqrt{\gamma}\, \Re\left[ r_i A_k^H h_i^H c_k^* + r_i B_k^H h_i^H c_k \right] \right\}$$
$$= \mathop{\mathrm{argmin}}_{C} \sum_{i=1}^{n_R} \sum_{k=1}^{K} \left\{ \gamma\, h_i D_k h_i^H |c_k|^2 - 2 \sqrt{\gamma}\, \Re\left[ \left( r_i A_k^H h_i^H + h_i B_k r_i^H \right) c_k^* \right] \right\}. \qquad (9.70)$$
It is thus clear that decisions on the symbols $\{c_k\}_{k=1}^{K}$ can be decoupled into
the following symbol-by-symbol rules:

$$\hat{c}_k = \mathop{\mathrm{argmin}}_{c_k} \sum_{i=1}^{n_R} \left\{ \gamma\, h_i D_k h_i^H |c_k|^2 - 2 \sqrt{\gamma}\, \Re\left[ \left( r_i A_k^H h_i^H + h_i B_k r_i^H \right) c_k^* \right] \right\} = \mathop{\mathrm{argmin}}_{c_k} \left| \tilde{r}_k - \sqrt{\gamma}\, \xi_k^2 c_k \right|^2, \qquad k = 1, 2, \ldots, K \qquad (9.71)$$

having defined

$$\tilde{r}_k = \sum_{i=1}^{n_R} \left( r_i A_k^H h_i^H + h_i B_k r_i^H \right) \qquad (9.72)$$
$$\xi_k^2 = \sum_{i=1}^{n_R} h_i D_k h_i^H\,. \qquad (9.73)$$

This detection strategy is that corresponding to an equivalent single-input
single-output channel. In fact, substituting (9.68) in (9.72), and using (9.63),
(9.65), and (9.66), we obtain

$$\tilde{r}_k = \sum_{i=1}^{n_R} \left( r_i A_k^H h_i^H + h_i B_k r_i^H \right) \qquad (9.74)$$
$$= \sum_{i=1}^{n_R} \left[ \left( \sqrt{\gamma}\, h_i C + n_i \right) A_k^H h_i^H + h_i B_k \left( \sqrt{\gamma}\, h_i C + n_i \right)^H \right]$$
$$= \sqrt{\gamma}\, c_k \sum_{i=1}^{n_R} h_i D_k h_i^H + \sum_{i=1}^{n_R} \left[ n_i A_k^H h_i^H + h_i B_k n_i^H \right]$$
$$= \sqrt{\gamma}\, c_k \xi_k^2 + \tilde{n}_k \qquad (9.75)$$

having defined

$$\tilde{n}_k = \sum_{i=1}^{n_R} \left( n_i A_k^H h_i^H + h_i B_k n_i^H \right)$$

whose variance, given $h_i$ and taking into account that the noise samples at
the output of antenna $i$ are uncorrelated and with unit variance, is $\xi_k^2$. Hence,
the detection strategy (9.71) can be considered as derived from the equivalent
single-input single-output channel model (9.75) and the performance analysis
carried out accordingly, as for the Alamouti scheme, easily verifying that these
schemes achieve full diversity. This can also be verified by considering two
distinct codewords

$$C = \sum_{k=1}^{K} \left( c_k A_k + c_k^* B_k \right), \qquad \hat{C} = \sum_{k=1}^{K} \left( \hat{c}_k A_k + \hat{c}_k^* B_k \right) \qquad (9.76)$$

and verifying that the matrix $(\hat{C} - C)(\hat{C} - C)^H = \sum_{k=1}^{K} |\hat{c}_k - c_k|^2 D_k$ has full
rank provided that $\hat{C} \ne C$.
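A compact sketch of the linear decoding (9.72), (9.73), and (9.71), using the Alamouti matrices as the dispersion set (names illustrative; $D_k = I_2$ here):

```python
import numpy as np

rng = np.random.default_rng(5)
gamma, nR, K = 10.0, 2, 2
qpsk = np.exp(1j * np.pi * (2 * np.arange(4) + 1) / 4)

# Alamouti as a linear design (9.63); here D_k = I_2 for k = 1, 2
A = [np.array([[1, 0], [0, 0]]), np.array([[0, 0], [1, 0]])]
B = [np.array([[0, 0], [0, 1]]), np.array([[0, -1], [0, 0]])]
D = [np.eye(2), np.eye(2)]

c = rng.choice(qpsk, K)
C = sum(c[k] * A[k] + np.conj(c[k]) * B[k] for k in range(K))   # (9.63)
H = (rng.normal(size=(nR, 2)) + 1j * rng.normal(size=(nR, 2))) / np.sqrt(2)
W = (rng.normal(size=(nR, 2)) + 1j * rng.normal(size=(nR, 2))) / np.sqrt(2)
R = np.sqrt(gamma) * H @ C + W                                  # rows r_i, (9.68)

c_hat = np.empty(K, dtype=complex)
for k in range(K):
    # soft statistic (9.72) and equivalent gain (9.73)
    rt = sum(R[i] @ A[k].conj().T @ H[i].conj() + H[i] @ B[k] @ R[i].conj()
             for i in range(nR))
    xi2 = sum((H[i] @ D[k] @ H[i].conj()).real for i in range(nR))
    # decoupled decision (9.71)
    c_hat[k] = qpsk[np.argmin(np.abs(rt - np.sqrt(gamma) * xi2 * qpsk))]
print(np.allclose(c_hat, c))
```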

Quasi-orthogonal ST block codes. As mentioned, for complex constellations,
the only orthogonal design is that proposed by Alamouti for $n_T = 2$.
It provides full diversity and a transmission rate of 1 symbol per channel use.
ST block codes with a rate of 1 symbol per channel use can be obtained, as
proposed in [142], by using Alamouti's orthogonal design as a building block
(or another orthogonal design in case of real constellations), but clearly giving
up the orthogonality of the resulting ST code and only providing partial
diversity.
To illustrate the main ideas behind this "quasi-orthogonal" design, let us
consider how to design a ST block code for $n_T = 4$ by properly employing
two Alamouti codewords, one denoted as $C_{12}$ for transmitting symbols $c_1$
and $c_2$, i.e.,

$$C_{12} = \begin{pmatrix} c_1 & -c_2^* \\ c_2 & c_1^* \end{pmatrix}$$

and a second one, denoted as $C_{34}$, for transmitting symbols $c_3$ and $c_4$, i.e.,

$$C_{34} = \begin{pmatrix} c_3 & -c_4^* \\ c_4 & c_3^* \end{pmatrix}.$$

The resulting codeword will be obtained through an orthogonal design involving
these two matrices:

$$C = \begin{pmatrix} C_{12} & -C_{34}^* \\ C_{34} & C_{12}^* \end{pmatrix} = \begin{pmatrix} c_1 & -c_2^* & -c_3^* & c_4 \\ c_2 & c_1^* & -c_4^* & -c_3 \\ c_3 & -c_4^* & c_1^* & -c_2 \\ c_4 & c_3^* & c_2^* & c_1 \end{pmatrix}.$$

It is easy to prove that this code does not achieve full diversity (it achieves
diversity 2nR when nR receive antennas are employed). Although not all its
rows are orthogonal, we can observe that the first and fourth columns are
orthogonal to the second and third ones. Hence, through a proper linear
processing it is possible to decouple the decisions on symbols c1 and c4 from
those on symbols c2 and c3 . The decisions on symbols c1 and c4 and those
on symbols c2 and c3 must be performed jointly, thus increasing the receiver
complexity with respect to that of orthogonal ST codes.

Linear dispersion codes. The same mathematical framework (9.63) employed
to describe linear orthogonal designs can be adopted to describe the
quasi-orthogonal ST block codes, but also another class of ST codes called
linear dispersion codes. These codes have a rate larger than one symbol per
channel use since, for them, $K > N$ is possible. Obviously, this time, constraints
(9.65) and (9.66) no longer hold and optimal decoding becomes prohibitive.
Suboptimal decoding techniques, such as those mentioned in the following
Section 9.3.7, can be adopted. Regarding the code design or, in other words,
the design of the matrices $A_k$ and $B_k$, in [143] a technique is proposed aimed at
maximizing the mutual information between the input and the output of the
channel.

9.3.6 ST trellis codes

Another important class of codes is represented by ST trellis codes, originally
proposed in [111]. They are the natural extension of TCMs to MIMO
channels, the only difference being that each trellis branch is labeled with a
vector of $n_T$ symbols that are transmitted in parallel by the $n_T$ transmit
antennas. More precisely, they are multiple TCMs whose trellis branches are
associated with $n_T$ symbols belonging to a given $M$-ary constellation, transmitted
in parallel over the $n_T$ transmit antennas instead of sequentially. A
memory is thus introduced with the aim of obtaining a coding advantage in
addition to code diversity, at the price of an increased decoding complexity.
In general, the code will have $S$ states. A rate of $\eta$ bits per channel
use is obtained when a trellis with $2^\eta$ branches departing from each state
is employed. We already discussed the different design criteria for the ST
codewords. ST trellis codes are designed accordingly. In the case of quasi-static
fading, for ST trellis codes with $n_T = 2$, the following two simple design
rules ensure full diversity, according to the rank criterion:

• Rule 1: Transitions departing from the same state differ in the second
symbol only.

• Rule 2: Transitions merging at the same state differ in the first symbol
only.

In fact, by following these rules, the error matrix assumes the form (for all
$(\hat{C}, C)$)

$$\hat{C} - C = \begin{pmatrix} \cdots & 0 & \cdots & \beta & \cdots \\ \cdots & \alpha & \cdots & 0 & \cdots \end{pmatrix}$$

with $\alpha$ and $\beta$ nonzero complex numbers. Thus, every such error matrix has
full rank and the ST code achieves full diversity. The maximization of the
minimum determinant of the matrices $A = (\hat{C} - C)(\hat{C} - C)^H$ having minimum
rank is a harder task. The code design is therefore performed through a
computer search [111] or through algebraic techniques [130, 144].

Figure 9.9: Trellis diagrams of two ST trellis codes with (a) $S = 4$ and (b)
$S = 8$ states. The QPSK symbol $e^{j 2\pi i/4}$, $i = 0, 1, 2, 3$, is specified through
the integer $i$.

For quasi-static channels, examples of two good ST trellis codes
using QPSK modulation, with a rate of 2 bits per channel use, $n_T = 2$,
and $S = 4$ and 8 states, respectively, are provided in Fig. 9.9. Other examples can
be found in [111, 130, 144, 145, 146].
The optimal detector is based on the Viterbi algorithm working on the
code trellis. Since the number of trellis branches departing from the same
state is 2η , the larger the rate η, the higher the receiver complexity. Similarly,
the larger the number of transmit antennas, the higher the receiver complex-
ity. Hence, for transmissions requiring very high spectral efficiency and/or a
large number of transmit antennas, other codes are more appropriate (such
as layered ST codes described below).
An upper bound on the error probability can be computed through the
technique described in Chapter 2. However, as mentioned, for quasi-static
channels this bound turns out to be loose, especially when the number of antennas
is limited. The reason is simple. In the bound computation, the same contribution
is accounted for many times. This does not represent a problem for the
AWGN channel since, in this case, the pairwise error probability terms decay
exponentially and a few dominant terms exist. On the quasi-static frequency-flat
fading channel, instead, when the available diversity is limited, pairwise
error probability terms decay very slowly and, as a consequence, the number
of dominant terms is not limited.
A possible solution is represented by the technique described in [106]
for the application to convolutional codes, and applied in [147] to ST trellis
codes. The idea is very simple. Let us assume that we are interested in
the computation of an upper bound $P_b^{(U)}$ on the bit error probability (the
same considerations hold for the symbol error probability or the frame error
probability). Up to now, the starting point was the computation of the
pairwise error probability given a channel realization $H$. It was then averaged
over the channel realizations and employed in the computation of an upper
bound on the bit error probability. We can instead apply the technique
described in Chapter 2 to compute an upper bound $P_b^{(U)}(H)$ on the bit error
probability given the channel realization, clip it to one if it exceeds unity,
and then perform the average over the channel realizations:

$$P_b = E_H\left\{ \min\left[ 1, P_b^{(U)}(H) \right] \right\}.$$

In other words, we are changing the order of the average and the summation
and, when the channel coefficients are so small that the pairwise error probability
terms become close to one, producing a bound given $H$ with a value
larger than 1, we trivially upper bound it with 1, and then average over the
channel statistics. The average cannot now be computed in closed form, but
Monte Carlo averaging must be adopted.

9.3.7 Layered space-time codes


BLAST architectures. ST block and trellis codes can achieve full diversity
(diversity $n_T n_R$) on quasi-static frequency-flat channels, thus representing
an effective way to combat the effects of fading. However, their
application is limited to transmissions with a small rate $\eta$. In fact, ST block
codes achieve a rate of at most $\log_2 M$ bits per channel use (where $M$ is the
cardinality of the employed constellation), whereas the complexity of ST trellis
codes limits their adoption to applications where a very limited number
of bits per channel use is required. It can thus be useful to trade diversity
against rate for those wireless applications requiring high data rates. Layered
ST (LST) architectures, originally proposed by Foschini [148], have been
developed for such a purpose and to handle a large number of antennas with
limited complexity.
The first and most effective proposed LST architecture is represented by
the diagonal Bell Laboratories layered space-time (DBLAST) scheme. We will
concentrate mainly on it, describing the encoding procedure and a few suboptimal
low-complexity decoding algorithms. Horizontal and vertical BLAST

Figure 9.10: DBLAST encoder.

(HBLAST and VBLAST) will also be mentioned, along with alternative LST
architectures such as multilayered ST codes [149], threaded ST codes [150],
and wrapped ST codes [151].
In a BLAST architecture, multiple independent coded streams are dis-
tributed throughout the transmission resource array in the so-called layers.
Since the complexity of the optimal decoder is impractical, the aim is to de-
sign the layering architecture and the associated signal processing so that the
receiver can efficiently separate the individual layers and decode each of them
effectively. In other words, low-complexity suboptimal decoding schemes
based on individual decoding of the component codes and on mitigation of
the mutual interference among component codewords can be adopted.
The block diagram of the DBLAST encoder is shown in Fig. 9.10 [148]. The
information bit stream is demultiplexed into $L$ parallel substreams. Each
substream is independently encoded and the code bits are mapped onto $M$-ary
symbols belonging to a constellation $A$. The resulting $L$ codewords are
collected in the row vectors $\{c^{(i)}\}_{i=1}^{L}$, of length $N' = n_T d$ symbols. These
row vectors are then broken into $n_T$ small subblocks of $d$ symbols each. We
will denote by $c_j^{(i)}$ the row vector representing the $j$-th subblock of codeword
$c^{(i)}$. These subblocks are cyclically assigned by the spatial interleaver to all
transmit antennas in such a way that the codewords share a balanced presence
over all $n_T$ antennas and none of the individual substreams is hostage to
the worst of the $n_T$ paths. In the case of $n_T = 4$, the transmitted $n_T \times N$
codeword matrices, with $N = d(L + n_T - 1)$, have the following structure:

$$C = \begin{pmatrix} c_1^{(1)} & c_1^{(2)} & c_1^{(3)} & c_1^{(4)} & c_1^{(5)} & \cdots & \cdots \\ 0 & c_2^{(1)} & c_2^{(2)} & c_2^{(3)} & c_2^{(4)} & c_2^{(5)} & \cdots \\ 0 & 0 & c_3^{(1)} & c_3^{(2)} & c_3^{(3)} & c_3^{(4)} & \cdots \\ 0 & 0 & 0 & c_4^{(1)} & c_4^{(2)} & c_4^{(3)} & \cdots \end{pmatrix} \qquad (9.77)$$

where the entries below the first diagonal layer are zeros. Symbols belonging
to layer $i$ are placed in the entries

$$\left\{ c_{n,\, k+(i+n-2)d} \;\middle|\; k = 1, 2, \ldots, d,\; n = 1, 2, \ldots, n_T \right\}$$

of matrix $C$.
Decoding is accomplished through a multiuser detection strategy based
on a combination of cancellation and suppression. Since each diagonal layer
constitutes a complete codeword, decoding is performed layer-by-layer, starting
from the first diagonal. The receiver generates a soft-decision statistic for
each symbol in this diagonal by suppressing the interference from the upper
diagonals. This can be obtained by projecting the received signal onto the
null space of the upper interference. These soft statistics are then used by
the corresponding channel decoder to decode the first codeword. The decoder
output is then fed back to cancel the contribution of the first diagonal to the
interference while decoding the next diagonal, and so forth. This is the
so-called zero-forcing (ZF) suppression strategy, which requires $n_R \ge n_T$.
Going into details, the receiver is obtained by a linear front end followed
by decision-feedback interference cancellation [148, 150, 151, 152]. Matrix $H$
is first decomposed using the so-called QR decomposition [153]:

$$H = QB \qquad (9.78)$$

where $Q$ is an $n_R \times n_T$ matrix with orthonormal columns (i.e., $Q^H Q = I_{n_T}$)
and $B$ is an $n_T \times n_T$ upper triangular matrix whose diagonal elements (when
$H$ is nonsingular, which occurs with probability 1 in the case of the Rayleigh
channel) are all positive. From the $n_R \times N$ matrix $R$ in (9.26) that collects
the received samples, the linear front end, defined by the $n_T \times n_R$ complex
matrix $Q^H$, produces the alternative sufficient statistic given by the matrix

$$V = Q^H R = \sqrt{\gamma}\, Q^H H C + Q^H N = \sqrt{\gamma}\, B C + \check{N} \qquad (9.79)$$
where $\check{N} = Q^H N$. Let us now consider the $(n, \ell)$-th element of $V$. Being $B$
an upper triangular matrix, it can be expressed as

$$v_{n,\ell} = \sqrt{\gamma} \sum_{k=n}^{n_T} b_{n,k} c_{k,\ell} + \check{n}_{n,\ell} = \sqrt{\gamma}\, b_{n,n} c_{n,\ell} + \sqrt{\gamma} \sum_{k=n+1}^{n_T} b_{n,k} c_{k,\ell} + \check{n}_{n,\ell}$$

where we assumed that $c_{n,\ell}$ is the symbol we would like to detect. As one
can observe, the interference of symbols $\{c_{k,\ell}\}_{k=1}^{n-1}$ has been removed. The
remaining symbols belong to lower layers. Hence, the samples

$$v_{n,\, k+(n-1)d}, \qquad k = 1, 2, \ldots, d, \quad n = 1, 2, \ldots, n_T \qquad (9.80)$$

can be used to decode layer 1, since no interference from other layers is present.
Once this layer has been decoded, the corresponding information bits at the
decoder output are encoded again and can thus be subtracted when decoding
the second layer. The process continues layer-by-layer, using the samples

$$\hat{v}_{n,\, k+(i+n-2)d} = v_{n,\, k+(i+n-2)d} - \sqrt{\gamma} \sum_{k'=n+1}^{n_T} b_{n,k'}\, \hat{c}_{k',\, k+(i+n-2)d}, \qquad k = 1, 2, \ldots, d, \quad n = 1, 2, \ldots, n_T \qquad (9.81)$$

where $\{\hat{c}_{k,\ell}\}$ are the decisions on code symbols already taken for the previous
layers, to decode layer $i$. Hence, the decisions needed in (9.81) are provided
by earlier decoded codewords (layers). The samples $\hat{v}_{n,\, k+(i+n-2)d}$ in (9.81)
can be expressed as

$$\hat{v}_{n,\, k+(i+n-2)d} = \sqrt{\gamma}\, b_{n,n} c_{n,\, k+(i+n-2)d} + \sqrt{\gamma} \sum_{k'=n+1}^{n_T} b_{n,k'} \left( c_{k',\, k+(i+n-2)d} - \hat{c}_{k',\, k+(i+n-2)d} \right) + \check{n}_{n,\, k+(i+n-2)d}$$

showing that, when previously decoded codewords are all correct, detection
is interference-free.
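The following sketch illustrates the ZF front end (9.78)-(9.79) and the successive cancellation (9.81) in a deliberately simplified, uncoded, per-vector form (symbol-by-symbol decisions instead of per-layer decoding; all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
nT = nR = 3
gamma = 100.0
bpsk = np.array([-1.0 + 0j, 1.0 + 0j])

H = (rng.normal(size=(nR, nT)) + 1j * rng.normal(size=(nR, nT))) / np.sqrt(2)
c = rng.choice(bpsk, nT)
n = (rng.normal(size=nR) + 1j * rng.normal(size=nR)) / np.sqrt(2)
r = np.sqrt(gamma) * H @ c + n

# ZF front end, (9.78)-(9.79): H = QB with B upper triangular
Q, B = np.linalg.qr(H)
v = Q.conj().T @ r

# detect from the last row upward, cancelling already-decided symbols (cf. (9.81))
c_hat = np.zeros(nT, dtype=complex)
for m in reversed(range(nT)):
    v_hat = v[m] - np.sqrt(gamma) * (B[m, m + 1:] @ c_hat[m + 1:])
    c_hat[m] = bpsk[np.argmin(np.abs(v_hat - np.sqrt(gamma) * B[m, m] * bpsk))]
print(np.allclose(c_hat, c))
```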
As mentioned, the ZF suppression strategy requires $n_R \ge n_T$. This
requirement can be relaxed, and a better performance obtained in the same
conditions, by using minimum mean-square error (MMSE) filtering [150, 151].
The linear front end is defined, in this case, by an $n_T \times n_R$ complex matrix
$F^H$ which produces the alternative sufficient statistic $V = F^H R$ whose $\ell$-th
column is

$$v_\ell = F^H r_\ell\,. \qquad (9.82)$$
We know that vector $r_\ell$ can be expressed as (see (9.23))

$$r_\ell = \sqrt{\gamma}\, H c_\ell + n_\ell = \sqrt{\gamma} \sum_{k=1}^{n_T} h_k c_{k,\ell} + n_\ell \qquad (9.83)$$

where $h_k$ is the $k$-th column of $H$. Hence

$$v_\ell = F^H r_\ell = F^H \left( \sqrt{\gamma}\, H c_\ell + n_\ell \right) = \sqrt{\gamma}\, G c_\ell + \check{n}_\ell = \sqrt{\gamma} \sum_{k=1}^{n_T} g_k c_{k,\ell} + \check{n}_\ell$$

having defined $\check{n}_\ell = F^H n_\ell$ and $G = F^H H$, whereas $g_k = F^H h_k$ is the $k$-th
column of $G$. Once layer $i$ has been detected, the corresponding
symbols can be cancelled. Let us assume that the symbols of layers up to $i$ to
be cancelled correspond, at discrete time $\ell$, to the symbols $\{c_{k,\ell}\}_{k=n+1}^{n_T}$. Hence,
after cancellation we have the vector

$$\hat{v}_\ell = v_\ell - \sqrt{\gamma} \sum_{k=n+1}^{n_T} g_k \hat{c}_{k,\ell} = F^H \left( r_\ell - \sqrt{\gamma} \sum_{k=n+1}^{n_T} h_k \hat{c}_{k,\ell} \right) = F^H \hat{r}_\ell \qquad (9.84)$$

having defined

$$\hat{r}_\ell = r_\ell - \sqrt{\gamma} \sum_{k=n+1}^{n_T} h_k \hat{c}_{k,\ell}\,.$$

Vectors $\hat{r}_\ell$ and $\hat{v}_\ell$ may be expressed, under the assumption of correct
decisions, as

$$\hat{r}_\ell = \sqrt{\gamma} \sum_{k=1}^{n} h_k c_{k,\ell} + n_\ell \qquad (9.85)$$
$$\hat{v}_\ell = \sqrt{\gamma} \sum_{k=1}^{n} g_k c_{k,\ell} + \check{n}_\ell\,. \qquad (9.86)$$

The $(n, \ell)$-th element of the $n_T \times N$ matrix $\hat{V} = (\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_N)$ can be
expressed as

$$\hat{v}_{n,\ell} = f_n^H \hat{r}_\ell \qquad (9.87)$$

where $f_n$ is the $n$-th column of $F$. Since $\hat{v}_{n,\ell}$ is employed as the soft statistic
associated with symbol $c_{n,\ell}$, column $f_n$ is selected as the one minimizing the
mean square error $E\{ | f_n^H \hat{r}_\ell - c_{n,\ell} |^2 \}$, which under the assumption of correct
decisions (i.e., under the assumption that (9.85) holds) can be easily computed
in closed form as [154, 155] (see also Appendix B)

$$f_n = \sqrt{\gamma} \left( I_{n_R} + \gamma \sum_{k=1}^{n} h_k h_k^H \right)^{-1} h_n\,.$$
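A direct numerical sketch of this closed form (the helper name `mmse_filter` and the 1-indexed argument convention are illustrative):

```python
import numpy as np

def mmse_filter(H, gamma, n):
    """MMSE column f_n (closed form above): after the symbols of layers
    n+1..nT have been cancelled, only the first n columns of H interfere."""
    Hn = H[:, :n]                                   # columns h_1 ... h_n
    M = np.eye(H.shape[0]) + gamma * Hn @ Hn.conj().T
    return np.sqrt(gamma) * np.linalg.solve(M, H[:, n - 1])

rng = np.random.default_rng(7)
H = (rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))) / np.sqrt(2)
print(mmse_filter(H, 10.0, 2))                      # filter for symbol c_2
```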

Notice that, this time, the interference of the upper layers is not removed
through filtering; rather, the joint effect of interference and noise is minimized
according to the MMSE criterion. Decoding is accomplished as for the ZF
strategy, by decoding a layer and cancelling it before decoding the next layer.
Other LST architectures have been conceived aiming at improving the
performance or reducing the overall receiver complexity. Horizontal BLAST
(HBLAST) has a structure very similar to DBLAST, the only difference being
the absence of the spatial interleaver. The corresponding encoder is shown in
Fig. 9.11. As one can observe, in this scheme the number of layers is equal to
the number of transmit antennas $n_T$. In other words, each layer is uniquely

Figure 9.11: HBLAST encoder.

associated with a transmit antenna. Decoding can again be accomplished
layer-by-layer. In the case of the ZF strategy, after the linear front end, layer
$n_T$ is decoded first by using the samples

$$v_{n_T,\ell} = \sqrt{\gamma}\, b_{n_T,n_T} c_{n_T,\ell} + \check{n}_{n_T,\ell}$$

of matrix $V$ in (9.79) and, in general, layer $j$ by using the samples

$$\hat{v}_{j,\ell} = v_{j,\ell} - \sqrt{\gamma} \sum_{k=j+1}^{n_T} b_{j,k} \hat{c}_{k,\ell} = \sqrt{\gamma}\, b_{j,j} c_{j,\ell} + \sqrt{\gamma} \sum_{k=j+1}^{n_T} b_{j,k} \left( c_{k,\ell} - \hat{c}_{k,\ell} \right) + \check{n}_{j,\ell}\,.$$

It can be noticed that different layers are decoded with different reliability.
In particular, the last detected layer has the highest reliability since, for it,
the contribution of all other layers has been cancelled. A way to mitigate
this problem is to sort the received sequences, starting detection from the one
with the highest power. This corresponds to sorting the columns of H according
to their squared norms. In the case of MMSE filtering, detection proceeds
as mentioned in the case of DBLAST, the only difference being the different
allocation of codewords in matrix C. In this case also, the received sequences
can be properly sorted.
Finally, in vertical BLAST (VBLAST) the different layers are not en-
coded. This simplifies the receiver structure, but cancellation becomes less
reliable. This scheme can be concatenated with an outer channel encoder,
possibly through an interleaver. In this case, iterative detection and decoding
can be performed based on the turbo principle.

Multilayered ST architecture. BLAST architectures can achieve a spectral
efficiency of up to η = Rc nT log2 M bits per channel use, where Rc is the rate
of the employed encoders. However, there is no attempt to maximize the code
diversity. Other layered architectures make it possible to trade spectral effi-
ciency for diversity, in an attempt to improve the system performance with
respect to BLAST architectures and the data rate with respect to ST block
and trellis codes. As an example, the multilayered ST architecture [149] is a
hybrid approach using both space-time channel codes and layered processing.
Transmit antennas are partitioned into small groups and independent space-
time block or trellis codes are employed for each group. The corresponding
codewords are then organized in layers and decoded using the techniques pre-
viously described, properly modified in order to perform group interference
cancellation.

Wrapped ST codes. The final layering scheme we describe is represented
by wrapped ST codes (WSTCs) [151], which represent a significant improve-
ment with respect to DBLAST. In this latter scheme, the codewords have
length N ′ = nT d. For given nT , a large delay d is thus needed in order to
have long codewords. If interleaving delay is an issue, the DBLAST scheme
is forced to work with a short component code block length N ′ . This might
pose a serious problem for using trellis codes with a large number of states. In
fact, the code memory might not be negligible with respect to N ′ thus yield-
ing a non-negligible rate loss due to trellis termination. In addition, in case
of block component codes, powerful codes cannot be used. A solution is rep-
resented by wrapped ST codes, that keep the simplicity of decision-feedback
interference mitigation while allowing for arbitrarily long component code-
words and small interleaving delay. In these schemes, a single encoder is em-
ployed. The corresponding codeword of length N ′ is diagonally interleaved,
through a ST formatter, in order to form the nT × N codeword matrix C,
with N = N ′ /nT + (nT − 1)d. The codeword matrix C is filled by wrapping
the codeword c around the matrix diagonals, from which the name of this
layering scheme arises, as illustrated by Fig. 9.12. We can write

C = F (c)

where the formatter F is defined such that the element cn,ℓ of the codeword
matrix C is related to the element ck of the codeword c by
cn,ℓ = c_{kn,ℓ} if 1 ≤ kn,ℓ ≤ N′ , and cn,ℓ = 0 otherwise (9.88)

Figure 9.12: Wrapped ST codeword for nT = 4, d = 2, and N′ = 72. The
entries in the array indicate the index of the symbols in the component
codeword; the triangles at the lower left and upper right of the array contain
zero symbols.

for 1 ≤ n ≤ nT and 1 ≤ ℓ ≤ N, where

kn,ℓ = [ℓ − 1 − (n − 1)d]nT + n . (9.89)

In this way, the interleaving delay d becomes a free parameter, independent
of the component codeword block length N′. As a limiting case, the inter-
leaving delay may also be d = 0, i.e., a vertical interleaver may be used. For
consistency with the case d > 0, where code symbols with lower index take
the lower positions in each column of the codeword matrix C (see Fig. 9.12),
the space-time formatter for d = 0 is defined by replacing (9.89) with

kn,ℓ = (ℓ − 1)nT + nT − n + 1 . (9.90)

When the component encoder is a trellis code of rate b/nT , the corresponding
wrapped ST code with d = 0 coincides with a standard ST trellis code. For
d > 0, the corresponding wrapped ST code can be seen as the concatenation
of a trellis code with delay-diversity. Because of the lower and upper triangles
of zero symbols in the codeword matrix in Fig. 9.12, there is an inherent rate
loss of (nT − 1)d/N which, however, is negligible if N ≫ nT d. Moreover, if
the transmission of a long sequence of codewords is envisaged, the codeword
matrices can be concatenated in order to fill the leading and trailing triangles
of zeros, so that no rate loss is incurred.
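The index mapping (9.88)-(9.90) translates directly into code. The following minimal sketch of the formatter (function name ours) reproduces the layout of Fig. 9.12:

import numpy as np

def wrap_format(c, nT, d):
    # Wrapped ST formatter F: map a codeword c of length Nprime into the
    # nT x N matrix C according to (9.88)-(9.90).
    Nprime = len(c)
    assert Nprime % nT == 0
    N = Nprime // nT + (nT - 1) * d
    C = np.zeros((nT, N), dtype=complex)
    for n in range(1, nT + 1):                 # 1-based indices as in the text
        for l in range(1, N + 1):
            if d > 0:
                k = (l - 1 - (n - 1) * d) * nT + n        # (9.89)
            else:
                k = (l - 1) * nT + nT - n + 1             # (9.90)
            if 1 <= k <= Nprime:                          # (9.88)
                C[n - 1, l - 1] = c[k - 1]
    return C

# Reproduces the layout of Fig. 9.12: nT = 4, d = 2, N' = 72
C = wrap_format(np.arange(1, 73), nT=4, d=2)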
The wrapped ST architecture has been designed such that when the com-
ponent codewords are produced by a trellis encoder, decoding can be imple-
mented efficiently by ZF or MMSE decision-feedback interference mitigation
coupled with Viterbi decoding, through the use of per-survivor processing.

Let us consider, as an example, the use of the ZF suppression strategy.
The extension to the case of MMSE filtering is straightforward: only the
linear filter needs to be modified. The decoder works on the trellis of the
component code and takes as observable the sequence of samples

zk = vn,ℓ − √γ Σ_{m=n+1}^{nT} bn,m c̆m,ℓ , k = 1, 2, . . . , N′

where vn,ℓ is the (n, ℓ) element of matrix V = Q^H R (or of matrix V = F^H R
in the case of MMSE filtering), bn,m is the (n, m) element of matrix B given in
(9.78), and 1 ≤ n ≤ nT , 1 ≤ ℓ ≤ N are the unique integers for which kn,ℓ = k.
From the index mapping (9.89) [or (9.90) for d = 0], we see that the elements
cm,ℓ , for m = n+1, . . . , nT , correspond either to zeros (for which no decision is
needed) or to symbols of the codeword with index k′ ≤ k − nT d + 1 (k′ ≤ k − 1
for d = 0). These decisions are found in the survivor history of the Viterbi
decoder, according to standard per-survivor processing.
A major advantage of WSTCs is that off-the-shelf component codes can
be employed, thus avoiding an ad hoc code search. A sensible criterion for
the design of the component code is, in fact, the maximization of the code
block diversity δ, defined by

δ = min_{c,ĉ : ĉ ≠ c} |{ j ∈ {1, . . . , nT } : wj ≠ 0 }| (9.91)

where wj is the squared Euclidean weight defined as

wj = Σ_{n=1}^{N} |ĉj,n − cj,n|² (9.92)

that is, to maximize the minimum number of non-zero rows in the matrix
difference Ĉ − C = F (ĉ) − F (c) over all pairs of distinct codeword matrices
Ĉ and C, which is strictly related to the rank diversity of a WSTC. The block-
diversity criterion has been investigated in [106, 156, 157] for the design
of trellis codes for cyclic interleaving and/or periodic puncturing, and codes
optimized in this sense are thus available. The relationship between the rank
diversity ν of a WSTC and the block diversity of its component code is the
following [151]:

ν ≤ δ ≤ 1 + ⌊nT (1 − Rc/log2 M)⌋ (9.93)
where Rc is the rate of the component code and M is the cardinality of
the employed constellation A. Moreover, there exist values of d for which
ν = δ [151]. Since it is known
from [111] that for any STC with spectral efficiency η = nT Rc the rank
diversity satisfies the inequality

ν ≤ 1 + ⌊nT (1 − Rc/log2 M)⌋ (9.94)
which is the same upper bound on block diversity given in (9.93), we conclude
that the wrapping construction incurs no loss of optimality in terms of rank
diversity (for an appropriate choice of the delay d). As a matter of fact, while
it is difficult to construct codes with rank diversity equal to the upper bound
(9.94), it is very easy to find trellis codes for which the upper bound (9.93) on
δ is met with equality, for several coding rates and values of nT . Examples of
these codes are tabulated in [156, 157]. Therefore, the wrapping construction
is a powerful tool to construct STCs with optimal rank diversity.
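For component codes with a small codebook, the block diversity (9.91) can be checked by exhaustive search. A minimal sketch (the input is a list of already-formatted nT × N codeword matrices F(c); function name ours):

import numpy as np
from itertools import combinations

def block_diversity(codeword_matrices, nT):
    # Block diversity (9.91): minimum, over distinct codeword pairs, of the
    # number of rows of F(c_hat) - F(c) with nonzero squared weight (9.92).
    delta = nT
    for Ca, Cb in combinations(codeword_matrices, 2):
        w = np.sum(np.abs(Ca - Cb) ** 2, axis=1)   # squared Euclidean row weights
        delta = min(delta, int(np.count_nonzero(w)))
    return delta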

9.3.8 Multiplexing-diversity trade-off


As mentioned, there are two types of gain a MIMO system can provide,
that is the diversity and the multiplexing gains. We focus our attention
on the quasi-static Rayleigh fading channel, with channel state information
available at the receiver only. The diversity gain D is mathematically defined
as the negative asymptotic slope of the error rate curve as a function of the
signal-to-noise ratio in a log-log scale:

D = − lim_{γ→∞} log Pe(γ) / log γ (9.95)
where Pe (γ) is the average error probability of the system, whereas the mul-
tiplexing gain r is defined as the asymptotic ratio between the data rate of a
specific MIMO scheme and the logarithm of the signal-to-noise ratio (which
is a measure of the capacity increase) [112]:
r = lim_{γ→∞} R(γ) / log γ (9.96)
where R(γ) is the data rate in bits/s/Hz.
The maximum diversity gain that a MIMO system can achieve is given,
as it is now clear, by nT nR . This maximum diversity gain is achieved by
some of the schemes previously described. As for the multiplexing gain,
it cannot exceed the number of degrees of freedom provided by the MIMO
channel, which is min(nT , nR ). It is also clear that, in order to have a nonzero
multiplexing gain, the considered scheme cannot have a constant data rate
but must provide a data rate that increases with the signal-to-noise ratio.
This can be achieved, as an example, by increasing the constellation size
with the signal-to-noise ratio.
It has been demonstrated in [112] that it is not possible to achieve both
full diversity and full multiplexing gains. For each r the optimal diversity gain
Do (r) is the maximum diversity gain that can be achieved by any scheme.
It is shown in [112] that, if the fading coherence time is greater than or equal
to nT + nR − 1, then

Do (r) = (nT − r)(nR − r) , 0 ≤ r ≤ min(nT , nR ) . (9.97)

Hence, when the diversity gain is nT nR , the multiplexing gain is zero whereas
when r = min(nT , nR ) the diversity gain is zero. For practical schemes, the
diversity/multiplexing trade-off function lies below the curve (9.97) and can
be used to compare different schemes and to interpret their behavior, as
shown in the examples that follow. As an example, for the Alamouti scheme,
the diversity/multiplexing tradeoff function is [112]

D(r) = max (2nR (1 − r), 0)

and reaches the upper bound (9.97) for r = 0 only. On the other hand, it
can be shown that BLAST schemes favor the multiplexing gain [112].
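Both (9.97) and the Alamouti trade-off curve are trivial to evaluate numerically. The sketch below compares them for a 2 × 2 system (the optimal curve is evaluated at integer multiplexing gains, where (9.97) holds exactly; for non-integer r it is the piecewise-linear interpolation):

def dmt_optimal(r, nT, nR):
    # Optimal diversity-multiplexing trade-off (9.97) at integer r.
    return (nT - r) * (nR - r)

def dmt_alamouti(r, nR):
    # Trade-off of the Alamouti scheme: D(r) = max(2 nR (1 - r), 0).
    return max(2 * nR * (1 - r), 0)

nT = nR = 2
for r in range(min(nT, nR) + 1):
    print(r, dmt_optimal(r, nT, nR), dmt_alamouti(r, nR))
# r = 0: both equal nT*nR = 4; r = 1: optimal 1, Alamouti 0.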

9.3.9 Concatenated codes for MIMO channels


Channel coding and space-time coding can be combined to achieve further
performance improvements. For a fast fading channel, i.e., when the channel
coherence time is relatively short and time interleaving is employed, the
adoption of an outer channel encoder can provide time diversity in addition
to antenna diversity. On the other hand, for a quasi-static fading channel,
an outer channel encoder can only provide a coding gain.
Different concatenated schemes are available in the literature. The inter-
ested readers can have a look at [158, 159, 160, 161, 162, 163, 164, 165, 166,
167, 168].

9.3.10 Unitary and differential ST codes


Up to now, we assumed perfect knowledge of the channel at the receiver.
This condition can be reached, when the channel coherence time is long
enough, through the use of pilot symbols periodically inserted to help the
receiver to obtain an accurate enough channel estimate. When the channel
changes frequently, absence of knowledge of the channel at both transmitter
and receiver must be assumed and one may resort to noncoherent detection.
In this case, proper ST codes need to be employed. In the following, we will
discuss unitary and differential ST codes.

Unitary ST codes. Before going into details, let us derive the metric for
the case of noncoherent detection assuming the block fading channel model
with coherence time of L symbol intervals.
By collecting L received vectors into an nR × L matrix R, we may write

R = √γ H C + N (9.98)

H being the nR × nT channel matrix, whereas the nT × L matrix C and
the nR × L matrix N collect the transmitted symbols and the noise samples
during L symbol intervals, respectively. Considering the samples ri [ℓ], ℓ =
1, 2, . . . , L, received by antenna i, they can be collected in a row vector

ri = [ri [1], ri [2], . . . , ri [L]] (9.99)

that can be expressed as

ri = √γ hi C + ni (9.100)
where hi is the i-th row of the channel matrix H and ni is a row vector col-
lecting the noise samples at the output of antenna i. Rows ri are independent
of each other. Hence, we may write
f(R|C) = Π_{i=1}^{nR} f(ri |C) .

Given C, the random variables in ri are jointly Gaussian with mean zero and
covariance matrix

Λ = E{ri^T ri*} = E{(√γ C^T hi^T + ni^T)(√γ hi C + ni)*} = IL + γ C^T C*

and thus

f(R|C) = Π_{i=1}^{nR} exp(−ri* Λ^{−1} ri^T) / (π^L det Λ)
       = (1 / (π^{L nR} (det Λ)^{nR})) exp{ −Σ_{i=1}^{nR} ri* Λ^{−1} ri^T }
       = exp{ −trace[Λ^{−1} R^T R*] } / (π^{L nR} (det Λ)^{nR}) .
For the block fading channel with coherence time of L symbols, it is shown
in [169] that, asymptotically, capacity is achieved when

C = VΦ

where Φ is an isotropically distributed13 nT × L matrix whose rows are or-
thonormal (hence Φ*Φ^T = InT) and V is an independent nT × nT real
non-negative diagonal matrix. When L ≫ nT , or when L > nT and the
signal-to-noise ratio is very high, capacity can be achieved by selecting

C = √L Φ . (9.101)
For this reason, the unitary ST codes proposed in [169] have a codebook com-
posed of codewords (9.101), with Φ belonging to a set of 2^{ηL} elements, where
η is the spectral efficiency in bits per channel use. Being H unknown at the
receiver, the maximum likelihood decoder will operate according to the fol-
lowing decision rule:

Φ̂ = argmax_Φ f(R|Φ) = argmax_Φ exp(−trace[Λ^{−1} R^T R*]) / (det Λ)^{nR}
where Λ = IL + γL Φ^T Φ*. By using the fact that Φ*Φ^T = InT , the following
properties

det(I + AB) = det(I + BA)
trace(AB) = trace(BA)

and the matrix inversion lemma

(A − BD^{−1} C)^{−1} = A^{−1} + A^{−1} B(D − CA^{−1} B)^{−1} CA^{−1}

we have

det Λ = det(IL + γL Φ^T Φ*) = det(InT + γL Φ* Φ^T) = (γL + 1)^{nT}

and

trace[Λ^{−1} R^T R*] = trace{ [IL + γL Φ^T Φ*]^{−1} R^T R* }
 = trace{ [IL − (γL/(1 + γL)) Φ^T Φ*] R^T R* }
 = trace{R^T R*} − (γL/(1 + γL)) trace{R* Φ^T Φ* R^T}
13 An nT × L isotropically distributed random matrix is a matrix whose probability
density function remains unchanged when it is right-multiplied by any deterministic nT ×
nT unitary matrix. This matrix is the nT × L counterpart of a complex scalar having unit
magnitude and uniformly distributed phase.
from which we obtain

Φ̂ = argmax_Φ trace{R* Φ^T Φ* R^T} = argmax_Φ trace{R Φ^H Φ R^H} . (9.102)
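Rule (9.102) can be implemented directly, since trace{R Φ^H Φ R^H} = ‖Φ R^H‖²_F. A minimal sketch, assuming a codebook of matrices Φ with orthonormal rows has already been constructed (e.g., with the algorithm of [169]):

import numpy as np

def unitary_st_detect(R, codebook):
    # Noncoherent ML detection (9.102): pick the Phi maximizing
    # trace(R Phi^H Phi R^H) = ||Phi R^H||_F^2.
    # R is nR x L; each Phi in `codebook` is nT x L with orthonormal rows.
    metrics = [np.linalg.norm(Phi @ R.conj().T) ** 2 for Phi in codebook]
    return int(np.argmax(metrics))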

An upper bound on the pairwise error probability can be found in [169].
It results that

Pr(Φ → Φ̂) ≤ (1/2) Π_{i=1}^{nT} [ 1 + (γL)²(1 − µi²) / (4(1 + γL)) ]^{−nR} (9.103)

where µi ≤ 1, i = 1, 2, . . . , nT , are the singular values of the nT × nT corre-
lation matrix Φ* Φ̂^T. Eqn. (9.103) can be used to design the set of matrices
{Φ} to be employed. An algorithm for the construction of this set is provided
in [169]. From (9.103), it also appears that the maximum diversity can be
nT nR , as in the case when H is known at the receiver. This maximum
diversity can be obtained when all singular values µi are strictly lower than
one.

Differential ST codes. A different approach is that pursued in [170],14
based on differential ST coding and differential ST detection, which represent
a generalization to MIMO systems of differential encoding and differential
detection in single antenna systems.
Let S be a group of L×L unitary matrices. Hence, being unitary, a matrix
S ∈ S is such that SH S = SSH = IL . In addition, being S a group, IL , the
multiplicative identity, belongs to S, the multiplication of two matrices in S
is a matrix in S, and the inverse of each element of S also belongs to S. Let
C0 be an nT × L matrix such that C0 C0^H = InT and having the property
that the matrix C0 S has all elements belonging to a given alphabet A, for
all S ∈ S. The set

C = {C0 S | S ∈ S}

represents the set of transmitted codewords. A matrix C ∈ C is clearly such
that CC^H = C0 SS^H C0^H = C0 C0^H = InT . In the case of a block fading
channel with coherence time 2L, matrix C0 is transmitted in the first L
symbol intervals whereas matrix C0 S, with S ∈ S, is transmitted in the
successive L symbol intervals. The resulting ST code has spectral efficiency
η = (1/(2L)) log2 |S| bits per channel use. When the channel changes continuously
14 A similar scheme has been proposed in [171], whereas an alternative scheme for the
case of two transmit antennas has been proposed in [172].
but can be considered approximately constant over 2L symbol intervals, the
following nT × L matrices are transmitted during NL symbol intervals:

Cℓ = C0 for ℓ = 0 , and Cℓ = Cℓ−1 Sℓ for ℓ = 1, 2, . . . , N − 1 .

In this case, the spectral efficiency is η = ((N − 1)/(NL)) log2 |S| bits per channel use.
As far as decoding is concerned, in the case of the block fading channel
with coherence time 2L, optimal decoding will be accomplished based on the
observation of 2L symbol intervals. In the case of a channel that changes
continuously, by collecting blocks of L received vectors into nR × L matrices
{Rℓ }, we may write

Rℓ = √γ Hℓ Cℓ + Nℓ (9.104)

Hℓ being the nR ×nT channel matrix corresponding to the ℓth block, whereas
the nR × L matrix Nℓ collects the noise samples during the L symbol inter-
vals of the ℓth block. In this case, optimal decoding must be accomplished
based on the observation of the whole sequence {Rℓ }. However, in order
to reduce the receiver complexity, as in the case of differential decoding for
single-antenna systems, decoding of Sℓ is accomplished by looking at pairs
of overlapping blocks of L symbol intervals at a time, i.e., Rℓ and Rℓ−1 . Let
us define the nR × 2L matrix

R′ℓ = [Rℓ−1 , Rℓ ] .

Assuming Hℓ = Hℓ−1 and defining C′ℓ = [Cℓ−1 , Cℓ ] and N′ℓ = [Nℓ−1 , Nℓ ], we
may write

R′ℓ = √γ Hℓ C′ℓ + N′ℓ .

Since matrices Cℓ are such that Cℓ Cℓ^H = InT , we also have C′ℓ C′ℓ^H = 2 InT .
Hence, when accomplishing detection based on a pair of blocks of L symbol
intervals, we may adopt the detection strategy (9.102), which now becomes

Ŝℓ = argmax_{Sℓ} trace{R′ℓ C′ℓ^H C′ℓ R′ℓ^H} . (9.105)

Under the additional assumption that L = nT (this is certainly possible since
we are not considering a block fading channel but a channel that changes
continuously), it also holds that Cℓ^H Cℓ = Cℓ Cℓ^H = InT . Hence, (9.105) becomes

Ŝℓ = argmax_{Sℓ} trace{R′ℓ C′ℓ^H C′ℓ R′ℓ^H}
   = argmax_{Sℓ} trace{Rℓ−1 Rℓ−1^H + Rℓ−1 Cℓ−1^H Cℓ Rℓ^H + Rℓ Cℓ^H Cℓ−1 Rℓ−1^H + Rℓ Rℓ^H}
   = argmax_{Sℓ} ℜ{trace[Rℓ−1 Sℓ Rℓ^H]} = argmax_{Sℓ} ℜ{trace[Sℓ Rℓ^H Rℓ−1]} . (9.106)
The performance analysis can be carried out in a way similar to the case
of unitary ST codes since both are based on unitary matrices and the same
metric. Details can be found in [170] along with the design criteria and the
optimal codes for two transmit antennas.
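The decision rule (9.106) only requires the two most recent received blocks. A minimal sketch, with an illustrative cyclic group of diagonal unitary matrices standing in for the optimized groups of [170]:

import numpy as np

def differential_st_detect(R_prev, R_curr, S_group):
    # Differential ST detection (9.106): choose the group element S_l
    # maximizing Re{trace(S_l R_l^H R_{l-1})}.
    A = R_curr.conj().T @ R_prev               # R_l^H R_{l-1}, an L x L matrix
    metrics = [np.real(np.trace(S @ A)) for S in S_group]
    return int(np.argmax(metrics))

# Illustrative group: cyclic diagonal unitary group of order 4 for L = nT = 2
S_group = [np.diag([1j ** k, (-1j) ** k]) for k in range(4)]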

Iterative schemes. The schemes previously described can be concatenated
with an outer channel code to improve the performance. As an example, the
concatenation of turbo codes and unitary ST codes with iterative decoding
at the receiver has been considered in [173] whereas the concatenation of
differential ST codes and an outer code through an interleaver has been
considered in [174, 175]. In this latter case, since a differential ST code is a
recursive code, this serial concatenation provides an interleaver gain when
iteratively decoded.

9.4 ST coding for frequency-selective (FS) fading channels
Up to now, we concentrated on flat fading channels. However, in wideband
wireless systems, the symbol period becomes smaller than the channel delay
spread and, hence, the transmitted signal sees a FS channel. An overview of
the main results on ST codes for these channels will be provided here with
reference to the case of quasi-static channels.

9.4.1 System model for FS MIMO fading channels


Assuming a channel with L taps and delays multiple of the symbol time, at
discrete time ℓ the received samples at the output of the nR receive antennas,
collected into an nR × 1 vector rℓ , can be expressed as
rℓ = √γ Σ_{l=0}^{L−1} H^{(l)} cℓ−l + nℓ , ℓ = 1, . . . , N (9.107)
where cℓ is the nT × 1 vector collecting the modulation symbols transmitted
in parallel by the nT transmit antennas, nℓ is an nR × 1 complex Gaussian noise
vector having independent real and imaginary components and representing
the thermal noise samples at the nR receive antennas, H(l) is a nR ×nT matrix
of the channel gains for the l-th path, γ is a proper real coefficient, and N
is the codeword length. We will assume a Rayleigh fading channel, i.e., each
entry of H(l) is modeled as a zero mean complex Gaussian random variable.
Different channel taps are usually assumed to be independent. The average
channel gains for different paths are determined from the power delay profile
of the wireless channel. As an example, for the exponential power delay
profile, the channel tap powers decay exponentially.
Collecting the received samples into an nR × N matrix R = [r1 , . . . , rN ] and
assuming that cℓ = 0nT for ℓ ≤ 0, we may write

R = √γ H̃ C̃ + N (9.108)

where N = [n1 , . . . , nN ] collects the noise samples (as in the case of a flat-
fading channel),

H̃ = [ H̃1 H̃2 · · · H̃nT ] (9.109)

with

        | h_{1,i}^{(0)}    h_{1,i}^{(1)}    · · ·   h_{1,i}^{(L−1)}  |
H̃i =   | h_{2,i}^{(0)}    h_{2,i}^{(1)}    · · ·   h_{2,i}^{(L−1)}  |      (9.110)
        | ...              ...              · · ·   ...              |
        | h_{nR,i}^{(0)}   h_{nR,i}^{(1)}   · · ·   h_{nR,i}^{(L−1)} |

H̃ being the nR × nT L equivalent channel matrix, whereas the nT L × N equivalent
matrix of the transmitted symbols C̃ takes the form

C̃ = [ C̃1 ; C̃2 ; . . . ; C̃nT ] (9.111)

with the L × N blocks C̃i stacked on top of each other and

        | ci,1   ci,2   ci,3   · · ·   ci,N−1    ci,N     |
        | 0      ci,1   ci,2   · · ·   ci,N−2    ci,N−1   |
C̃i =   | 0      0      ci,1   · · ·   ci,N−3    ci,N−2   |      (9.112)
        | ...                  · · ·   ...       ...      |
        | 0      0      0      · · ·   ci,N−L    ci,N−L+1 |

Matrix C̃i is related to the symbols transmitted by antenna i.
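Model (9.107) is easy to simulate directly, which is also a convenient way to check the equivalent flat-fading form (9.108) numerically. A minimal sketch with Rayleigh taps and an exponential power delay profile (the profile and its decay rate are illustrative assumptions):

import numpy as np

def fs_mimo_channel(C, H_taps, gamma):
    # Received samples according to (9.107):
    # r_l = sqrt(gamma) * sum_{l'} H^(l') c_{l - l'} + n_l, with c_l = 0 for l <= 0.
    nT, N = C.shape
    nR = H_taps[0].shape[0]
    R = np.zeros((nR, N), dtype=complex)
    for ell in range(N):
        for l, Hl in enumerate(H_taps):
            if ell - l >= 0:
                R[:, ell] += np.sqrt(gamma) * (Hl @ C[:, ell - l])
    noise = (np.random.randn(nR, N) + 1j * np.random.randn(nR, N)) / np.sqrt(2)
    return R + noise

# Rayleigh taps with an exponential power delay profile (illustrative choice)
L, nR, nT, N = 3, 2, 2, 50
pdp = np.exp(-np.arange(L)); pdp /= pdp.sum()
H_taps = [np.sqrt(p / 2) * (np.random.randn(nR, nT) + 1j * np.random.randn(nR, nT))
          for p in pdp]
C = np.sign(np.random.randn(nT, N)) + 0j       # BPSK symbols
R = fs_mimo_channel(C, H_taps, gamma=10.0)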

9.4.2 Design criterion


The channel model (9.108) states that our FS MIMO fading channel is equiv-
alent to a frequency-flat fading channel having LnT transmit antennas. Look-
ing at (9.112), we can say that for each antenna, we now have other L − 1
virtual antennas transmitting a delayed version of the same symbols (as in
the delay-diversity scheme described in Section 9.3.4). Assuming that matrix
H̃ has independent coefficients, by a proper design of the ST code a diversity
order of LnT nR can be achieved. This is not surprising since the multipath
propagation provides another form of diversity. The following criterion can
thus be stated.

Design criterion: The maximum diversity of LnT nR is achieved by ensur-
ing that the matrix A = (C̃′ − C̃)(C̃′ − C̃)^H is full-rank for all pairs
of distinct equivalent codeword matrices C̃′ and C̃. Otherwise, if the
minimum rank of A among all codeword pairs is νmin ≤ LnT , a diversity
order of νmin nR is achieved. In order to obtain the maximum possible
coding advantage, the minimum determinant of the matrices A having
minimum rank should be maximized.

9.4.3 ST codes for single carrier systems


A code achieving full diversity can be designed by extending the idea of
the delay-diversity scheme described in Section 9.3.4. In fact, if we look at the
rows of matrix C̃i in (9.112), they already are delayed versions of the first
row. Hence, it is sufficient to transmit, from antenna i, a delayed version
of the symbols transmitted by the first antenna, with a delay of L(i − 1)
symbols. The resulting blocks of the equivalent codeword matrix C̃ will be

        | c1   c2   c3   · · ·   cL     · · ·   cN−1     cN     |
        | 0    c1   c2   · · ·   cL−1   · · ·   cN−2     cN−1   |
C̃i =   | ...            · · ·   ...    · · ·   ...      ...    |
        | 0    0    0    · · ·   c1     · · ·   cN−L−1   cN−L   |

and matrix A will have full rank.


Although, in principle, space-time block codes could be designed for FS
fading channels, the main advantage of these codes, i.e., the simple linear
processing, will be lost due to the presence of ISI. ST trellis codes, layered
architectures, and concatenated schemes can also be extended to FS channels.
As an example, ST trellis codes employing BPSK and QPSK modulations
are described in [176]. The optimal decoder will operate, in this case, on
the equivalent trellis that takes into account both the code and the channel
trellis. As far as concatenated schemes are concerned, it must be taken into
account that the channel introduces memory. Hence, it can be employed in
place of an inner encoder and concatenated with an outer encoder through
an interleaver.
As in the case of single-input single-output channels, optimal detection
has an exponential complexity in the channel memory L. Therefore, taking
into account that the complexity is exponential in the number of transmit an-
tennas also, optimal detection is practically unfeasible. Reduced-complexity
detection schemes are thus required, such as linear or decision-feedback equal-
ization schemes [177], reduced-state sequence detection [178, 179, 180, 181],
or other schemes based on factor graphs (e.g., see [182, 183]).

9.4.4 ST codes for MIMO OFDM


ST codes for single carrier systems require the use, at the receiver, of so-
phisticated detection techniques. An alternative approach can be the use of
OFDM. In this case, according to Fig. 9.13, independent IDFTs are applied
to the symbols transmitted by each antenna. After a cyclic prefix is appended
to each sequence, they are transmitted on the FS MIMO channel. Each of
the nR receive antennas will receive the superposition of all transmitted sig-
nals. This composite signal undergoes cyclic prefix removal and DFT. The
resulting signals are then jointly demodulated and decoded. It is possible
to observe that MIMO-OFDM allows ISI to be perfectly removed (under the
assumption of quasi-static fading and perfect carrier synchronization), although
the interference from different transmit antennas must, obviously, be taken
into account [184].

9.5 Massive MIMO systems



Figure 9.13: MIMO-OFDM system (per-antenna IDFTs at the transmitter
after the MIMO encoder; per-antenna DFTs at the receiver, followed by joint
demodulation and decoding).


Appendix A

Signal spaces

In this appendix, the problem of the representation of deterministic and
random signals based on their expansion in a series of orthonormal functions
is summarized. The mathematical tools so developed allow us to establish
a one-to-one correspondence between a signal space and a proper real or
complex vector space. Therefore, their use is the key to turning problems
concerning the analysis of analog signals into equivalent vector problems,
leading to a simpler analysis and interpretation. In the following, we provide
a set of relevant definitions and basic results without proof. The reader can
refer to [7, 185] for further details.

A.1 Preliminary definitions


Let us consider the set of complex signals with finite energy and support (a, b).
We will denote this set as L2 (a, b). In this set, we define the inner product
between two signals x(t) and y(t) as
(x, y) = ∫_a^b x(t) y*(t) dt

where y*(t) is the complex conjugate of y(t). When the two signals are such

that (x, y) = 0, they are called orthogonal. The energy of x(t) is thus the
inner product of x(t) with itself:
Ex = (x, x) = ∫_a^b |x(t)|² dt .

We also define the norm of a signal x(t) as the square root of its energy:
‖x‖ = √Ex = (x, x)^{1/2} = ( ∫_a^b |x(t)|² dt )^{1/2} .


We finally define the distance between two signals as the norm of their dif-
ference
‖x − y‖ = √E_{x−y} = (x − y, x − y)^{1/2} = ( ∫_a^b |x(t) − y(t)|² dt )^{1/2} .

If ‖x − y‖ = 0, the signals x(t) and y(t) coincide almost everywhere. Generally
speaking, the distance ‖x − y‖ can be regarded as a measure of how different
the signals x(t) and y(t) are.
Let us consider the set of M signals {si(t)}_{i=1}^{M}. They are said to be
linearly independent when the condition

Σ_{i=1}^{M} ci si(t) = 0 ∀t ∈ (a, b)

can be satisfied if and only if (iff) all coefficients ci are zero. If, instead, there
exists a sequence {ci}_{i=1}^{M} with some ci ≠ 0 such that Σ_{i=1}^{M} ci si(t) = 0, then
the M signals are called linearly dependent, i.e., at least one signal can be
expressed as a linear combination of the others.

A.2 Signal spaces and orthonormal bases


Let us define the “space S generated by the M signals {si(t)}_{i=1}^{M}” as the set
of signals that can be expressed through their linear combination. When
si(t) ∈ L2(a, b), for i = 1, 2, . . . , M, this space S is a subspace of L2(a, b).
Clearly, the signals {si(t)}_{i=1}^{M} form a basis for the space they generate. This
basis is not unique and we can look for other bases that can represent the
subspace in a more effective way.
Given N functions {ϕi(t)}_{i=1}^{N}, they form an orthonormal basis of a signal
space S if they satisfy the following conditions:

1. ϕi(t) ∈ S;

2. (ϕi , ϕj) = 1 if i = j, and (ϕi , ϕj) = 0 if i ≠ j;

3. every element of space S can be expressed as a linear combination of
the functions {ϕi(t)}_{i=1}^{N}.

A subspace generated by M signals admits an infinite number of orthonor-
mal bases, all made of the same number N of elements. It is N ≤ M (it is
N = M iff the signals {si(t)}_{i=1}^{M} are linearly independent) and N is called the
dimension of subspace S. Given the subspace generated by the signals {si(t)}_{i=1}^{M},
an orthonormal basis can be found by means of the so-called Gram-Schmidt
orthogonalization procedure [185, 7].
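On sampled signals, the Gram-Schmidt procedure can be carried out numerically. A minimal sketch, with inner products approximated by Riemann sums on a common time grid of step dt:

import numpy as np

def gram_schmidt(signals, dt):
    # Orthonormalize a list of sampled signals s_i(t) (1-D arrays on a common
    # time grid with spacing dt). Inner product: (x, y) = sum x y* dt.
    basis = []
    for s in signals:
        r = s.astype(complex).copy()
        for phi in basis:                       # subtract projections on the basis
            r -= (np.sum(r * phi.conj()) * dt) * phi
        norm = np.sqrt(np.sum(np.abs(r) ** 2) * dt)
        if norm > 1e-10:                        # skip linearly dependent signals
            basis.append(r / norm)
    return basis

# Example: two overlapping rectangular pulses on (0, 1)
dt = 1e-3
t = np.arange(0.0, 1.0, dt)
s1 = (t < 0.5).astype(float)
s2 = np.ones_like(t)
phi1, phi2 = gram_schmidt([s1, s2], dt)        # N = M = 2: linearly independent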
Once we have an orthonormal basis {ϕi(t)}_{i=1}^{N} of the subspace S, we can
express every signal s(t) ∈ S as

s(t) = Σ_{i=1}^{N} si ϕi(t) . (A.1)

Coefficients si in (A.1) can be obtained in a very simple way. In fact, by
computing the inner product between s(t) and ϕj(t) we obtain

(s, ϕj) = ( Σ_{i=1}^{N} si ϕi , ϕj ) = Σ_{i=1}^{N} si (ϕi , ϕj) = sj .

Thus, coefficients si in (A.1) can be obtained as

si = (s, ϕi)    i = 1, 2, . . . , N .

We can thus define the column vector s

s = (s1 , s2 , . . . , sN )T (A.2)

which is called the image of s(t) ∈ S. It is easy to prove that representation (A.2)
of s(t) is unique. It allows us to represent a generic signal s(t) of S as a vector
in CN : the elements of this vector are the coefficients of the representation of
s(t) on the chosen orthonormal basis and are called components or coordinates
of s(t). We thus have a geometric or discrete representation of the signals in S.
In the case of real signals, similar considerations hold, with the exception
that the image will now be a vector in RN .
Let us now consider two generic signals x(t) and y(t), both in S. Their
inner product will be

(x, y) = ( Σ_{i=1}^{N} xi ϕi , Σ_{j=1}^{N} yj ϕj )
       = Σ_{i=1}^{N} Σ_{j=1}^{N} xi yj* (ϕi , ϕj) = Σ_{i=1}^{N} xi yi* = x^T y* .
Thus, the inner product of two signals is equal to the scalar product of the
corresponding images. If we now consider the energy of a generic signal, it
can be expressed as

Es = (s, s) = s^T s* = ‖s‖²

where ‖s‖ is the Euclidean norm of vector s. In addition, the norm of a signal
can be interpreted as the distance from the origin of the vector representing
the signal:

‖s(t)‖ = √Es = ‖s‖ .

Thus, the distance of two signals can be interpreted as the distance of the
corresponding images:

‖x(t) − y(t)‖ = ‖x − y‖ = ( Σ_{i=1}^{N} |xi − yi|² )^{1/2} .

Since the relationship between the signals in S and their images is linear,
it results that the image of x(t) + y(t) is x + y and the image of λx(t) is λx.
In general, the image of the linear combination of many signals is the linear
combination with the same coefficients of the corresponding images.

A.3 Projection of a signal over a subspace and complete bases
We said that a subspace S ⊂ L2 (a, b) generated by M signals has a finite
dimension N ≤ M. Every signal of S can thus be represented in an exact
way as a linear combination of the elements of any (orthonormal or not) basis
of S. However, there exist some subspaces of L2 (a, b) which have no finite
dimension. In this case, we will approximate the signals of these subspaces
as a linear combination of the elements of an orthonormal basis of finite
dimension N and we will discuss the convergence of this linear combination
when N grows.
Let us consider a generic signal x(t) ∈ L2 (a, b) not belonging to subspace
S: we will look for signal x̂(t) ∈ S approximating x(t) such that the distance
kx − x̂k is minimized. Clearly, when x(t) ∈ S, x̂(t) = x(t) and thus no
approximation is involved. It is easy to prove that, in general, signal x̂(t)
can be expressed as [7]
x̂(t) = Σ_{i=1}^{N} (x, ϕi) ϕi(t)
where {ϕi(t)}_{i=1}^{N} is any orthonormal basis of subspace S. This signal x̂(t) ∈
S also satisfies the important property that the error signal x(t) − x̂(t) is
orthogonal to any signal of S, i.e., [7]

(x − x̂, z) = 0 ∀z(t) ∈ S .

This result is called the orthogonality principle. Signal x̂(t) is called the
orthogonal projection of x(t) over S. The mean square value that we minimized
when computing x̂(t) can be expressed as
‖x − x̂‖² = ‖x‖² − ‖x̂‖² = ‖x‖² − Σ_{i=1}^{N} |xi|² (A.3)
i=1

where xi = (x, ϕi). Now let the number N of orthogonal functions increase,
so that the dimensionality of S increases. Equality (A.3) shows that, as N
gets larger, the energy of the error signal x(t) − x̂(t) diminishes, that is, the
approximation x̂(t) of x(t) becomes more accurate. This raises the following
question: if N → +∞, that is, if the orthonormal basis consists of an infinity
of functions, does the energy x(t) − x̂(t) tend to zero for any x(t) ∈ L2 (a, b)?
Generally speaking, the answer to this question is negative—in other words,
the fact that the basis consists of an infinite number of functions is not
sufficient to ensure that the equality
lim_{N→∞} ‖ x(t) − Σ_{i=1}^{N} (x, ϕi) ϕi(t) ‖² = 0 (A.4)

holds for any x(t) ∈ L2 (a, b). However, if this occurs, the basis is said to be
a complete orthonormal basis. Note that, if the basis is complete, (A.4) can
also be expressed as:
x(t) = Σ_{i=1}^{+∞} (x, ϕi) ϕi(t) (A.5)

which needs to be interpreted carefully. This result states only that the series
appearing on the right hand side of (A.5) converges to x(t) in quadratic mean,
as stated by (A.4). However, this convergence does not entail the pointwise
convergence (or the uniform convergence) of such a series to x(t) in any
instant of interval (a, b).

Example A.1 The Fourier basis defined as

ϕn(t) = (1/√T) e^{j2πnt/T}    n = 0, ±1, ±2, . . .

is complete for all signals with support in (−T/2, T/2) that can be represented
through a Fourier series, that is, those signals with a finite number of dis-
continuities and a finite number of minima and maxima. The coefficients of
this representation are the Fourier series coefficients. Notice that the sup-
port of these signals is not required to be (−T/2, T/2), but can be any interval
of length T . ♦

Example A.2 The following basis

ϕn(t) = √(2B) sinc(2B(t − n/(2B)))    n = 0, ±1, ±2, . . .

is complete for the subspace of L2(−∞, +∞) of signals with limited band-
width B. The coefficients for the representation of a signal of this subspace
over this basis are the signal samples. ♦
All the results mentioned above and related to bases of finite dimension
can be extended to all complete bases. For example, the inner product of two
signals can be computed as the scalar product of the corresponding complex
vectors:

(x, y) = ( Σ_{i=1}^{∞} xi ϕi , Σ_{j=1}^{∞} yj ϕj ) = Σ_{i=1}^{∞} Σ_{j=1}^{∞} xi yj* (ϕi , ϕj) = Σ_{i=1}^{∞} xi yi* . (A.6)

Similarly, it holds that

(x, x) = ‖x‖² = Ex = Σ_{i=1}^{∞} |xi|² (A.7)

and

‖x − y‖ = ( Σ_{i=1}^{∞} |xi − yi|² )^{1/2} . (A.8)

Equation (A.6) can be seen as a generalization of Parseval's theorem, which
holds for the Fourier basis.

A.4 Discrete representation of a random process


Let us now consider a generic random process n(t) whose realizations belong
to S ⊂ L2(a, b) with probability 1. Given an orthonormal basis {ϕi(t)}_{i=1}^{∞} of
S, let us suppose that we can represent the process realizations over this basis
with probability 1. Hence, the event {the realization cannot be represented}
has probability zero. Under this assumption, the following representation
holds with probability 1:

n(t) = Σ_{i=1}^{∞} ni ϕi(t)

where the process components {ni }, computed as

ni = (n, ϕi )

are clearly random variables, since they take a different value for each real-
ization of n(t).
We will consider convergence in quadratic mean:
lim_{N→∞} E{ | n(t) − Σ_{i=1}^{N} (n, ϕi)ϕi(t) |² } = 0    a ≤ t ≤ b . (A.9)

It means that there could exist some realizations for which the quantity in
curly brackets in (A.9) is not zero, but they have zero probability and do not
affect the average.
If we know the mean value and the autocovariance of the random process
n(t), we can compute the mean and the covariances of components {ni }. Let
us consider the mean and the autocovariance of the process whose expressions
are

η(t) = E{n(t)}
C(t1 , t2 ) = E{n(t1 )n∗ (t2 )} − E{n(t1 )}E{n∗ (t2 )}
= R(t1 , t2 ) − η(t1 )η ∗ (t2 )

having denoted by R(t1 , t2) the autocorrelation function of n(t). We thus
have

E{ni} = E{(n, ϕi)} = E{ ∫_a^b n(t)ϕi*(t) dt }
      = ∫_a^b E{n(t)} ϕi*(t) dt = ∫_a^b η(t)ϕi*(t) dt = (η, ϕi) .

As far as the covariance of the components ni and nj is concerned, whose
expression is

cov{ni , nj} = E{ni nj*} − E{ni}E{nj*}
we can first consider that

E{ni nj*} = E{ ∫_a^b n(t1)ϕi*(t1) dt1 ∫_a^b n*(t2)ϕj(t2) dt2 }
          = ∫_a^b ∫_a^b E{n(t1)n*(t2)} ϕi*(t1)ϕj(t2) dt1 dt2
          = ∫_a^b ∫_a^b R(t1 , t2)ϕi*(t1)ϕj(t2) dt1 dt2

and

E{ni}E{nj*} = ∫_a^b ∫_a^b η(t1)ϕi*(t1)η*(t2)ϕj(t2) dt1 dt2 .

Thus

cov{ni , nj} = ∫_a^b ∫_a^b [R(t1 , t2) − η(t1)η*(t2)] ϕi*(t1)ϕj(t2) dt1 dt2
             = ∫_a^b ∫_a^b C(t1 , t2)ϕi*(t1)ϕj(t2) dt1 dt2 .

In particular, the variance of ni can be computed as

σ²_{ni} = cov{ni , ni} = ∫_a^b ∫_a^b C(t1 , t2)ϕi*(t1)ϕi(t2) dt1 dt2 .

When a process is white, and thus has a covariance

C(t1 , t2) = qδ(t1 − t2)

all its components over any orthonormal basis are uncorrelated. In fact,

cov{ni , nj} = ∫_a^b ∫_a^b qδ(t1 − t2)ϕi*(t1)ϕj(t2) dt1 dt2
             = ∫_a^b qϕi*(t1)ϕj(t1) dt1 = qδij .

We can observe that the white noise cannot be represented over a basis since
its realizations do not have finite energy. However, if we project it on an
orthonormal basis we obtain uncorrelated components that, if the process is
also Gaussian, are also independent.
More in general, if a process is not white, we can ask whether there exists a
complete orthonormal basis such that all components of n(t) turn out to be un-
correlated. It can be proved that the functions {ϕi(t)} and the corresponding

constants {λi } that produce this condition are given by the Karhunen-Loève
theorem [7]. This theorem states that the signals {ϕi (t)} and the associ-
ated constants {λi } are the solutions of the homogeneous Fredholm integral
equation:
∫_a^b C(t1 , t2)ϕi(t2) dt2 = λi ϕi(t1)    a ≤ t1 ≤ b .

This is an integral form of an eigenequation, with a kernel C(t1 , t2 ), where


{ϕi (t)} are the normalized eigenfunctions and {λi } are the corresponding
eigenvalues. From the theory of integral equations it can be shown that, if
the kernel C(t1 , t2 ) is Hermitian in its arguments, that is:

C(t1 , t2 ) = C ∗ (t2 , t1 )

then the following properties hold:

P1. The eigenvalues are real ;

P2. The eigenfunctions associated with distinct eigenvalues are orthogonal ;

P3. If the kernel is square integrable, that is, if:


∫_a^b ∫_a^b |C(t1 , t2)|² dt1 dt2 < ∞

then each eigenvalue λi ≠ 0 has a finite number of associated orthogonal


eigenfunctions;

P4. If the kernel is positive definite, its eigenfunctions form a complete or-
thonormal set;

P5. If the kernel is nonnegative definite, it can be expanded as:



C(t1 , t2) = Σ_{i=1}^{∞} λi ϕi(t1)ϕi*(t2) (A.10)

a result known as Mercer’s theorem.

As far as the last point is concerned, it is worth noting that a random process
with a nondegenerate kernel will have an infinite number of eigenvalues and
will theoretically require the infinite expansion of (A.10). However, in many
cases of practical interest, the spectrum of eigenvalues will remain significant
for a finite number of eigenvalues, before decaying away to zero. Therefore,
only a finite number need be considered as significant for a given machine
accuracy.
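Numerically, the Fredholm eigenproblem can be approximated by sampling the kernel on a grid, which reduces it to an ordinary matrix eigenproblem. A minimal sketch with an illustrative exponential kernel (this kernel and the interval are our choices, not from the text):

import numpy as np

# Discretized Karhunen-Loeve expansion for the kernel C(t1, t2) = exp(-|t1 - t2|)
# on (a, b) = (0, 1); any Hermitian kernel can be treated the same way.
M, a, b = 200, 0.0, 1.0
t = np.linspace(a, b, M)
dt = t[1] - t[0]
C = np.exp(-np.abs(t[:, None] - t[None, :]))

# The Fredholm equation becomes the matrix eigenproblem (C dt) phi = lambda phi.
lam, Phi = np.linalg.eigh(C * dt)
lam, Phi = lam[::-1], Phi[:, ::-1] / np.sqrt(dt)   # descending; sum |phi|^2 dt = 1

# The eigenvalue spectrum decays quickly: only a few terms of the
# expansion (A.10) are significant for a given accuracy.
print(lam[:6])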
Let us now consider a generic real Gaussian process n(t). Its projec-
tion on a space with dimension N will provide components n1 , . . . , nN which
are jointly Gaussian random variables. In fact, they are obtained from the
original process through a linear transformation. In vector notation, we can
define
n = (n1 , n2 , . . . , nN )T
which is a random vector representing the image of this projection. We will
also define
η = E{n} = (η1 , η2 , . . . , ηN )T
as the mean vector and

C = E{nn^T} − η η^T
  = | σ²_{n1}         cov{n1 , n2}   . . .   cov{n1 , nN}  |
    | cov{n2 , n1}    σ²_{n2}        . . .   cov{n2 , nN}  |
    | ...             ...            . . .   ...           |
    | cov{nN , n1}    cov{nN , n2}   . . .   σ²_{nN}       |

the covariance matrix. The probability density function of vector n results
to be

f(n) = (1/√((2π)^N det C)) exp{ −(1/2)(n − η)^T C^{−1} (n − η) } .

If we use a Karhunen-Loève basis, the components of vector n turn out to be
uncorrelated and, being jointly Gaussian, also independent; the covariance
matrix is then diagonal.

A.5 Extraction of the signal components


We saw that the ith component of signal x(t) over the orthonormal basis
{ϕi (t)} can be found as
xi = (x, ϕi) = ∫_a^b x(t)ϕi*(t) dt .
From a practical point of view, we can extract xi by using the correlator
represented in Fig. A.1(a). As an alternative, we can see xi as the result of
a filtering operation. With this aim, we can extend the definition of ϕi(t) as

ϕi(t) = 0 for t < a and t > b .


We thus have

xi = ∫_a^b x(t)ϕi*(t) dt = ∫_{−∞}^{+∞} x(t)ϕi*(t) dt .

By defining

hi(t) = ϕi*(b − t)

and

yi(t) = x(t) ⊗ hi(t) = ∫_{−∞}^{+∞} x(τ)hi(t − τ) dτ = ∫_{−∞}^{+∞} x(τ)ϕi*(b − t + τ) dτ

we thus have

yi(b) = ∫_{−∞}^{+∞} x(τ)hi(b − τ) dτ = ∫_a^b x(τ)ϕi*(τ) dτ = xi .

Component xi can thus be obtained by sampling at time b the output of a


filter matched to ϕi (t), thus with impulse response ϕ∗i (b − t), as shown in
Fig. A.1(b).

Figure A.1: Correlator (a) and matched filter (b).
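On sampled signals, the equivalence between the two schemes of Fig. A.1 is easy to verify; a minimal numerical sketch (the signals chosen are illustrative):

import numpy as np

dt = 1e-3
t = np.arange(0.0, 1.0, dt)                 # support (a, b) = (0, 1)
phi = np.sqrt(2.0) * np.sin(2 * np.pi * t)  # unit-energy basis function
x = 3.0 * phi + 0.5 * np.sqrt(2.0) * np.sin(4 * np.pi * t)

# Correlator of Fig. A.1(a): x_i = integral of x(t) phi*(t) dt
xi_corr = np.sum(x * np.conj(phi)) * dt

# Matched filter of Fig. A.1(b): h_i(t) = phi*(b - t), output sampled at t = b
h = np.conj(phi[::-1])
y = np.convolve(x, h) * dt                  # y_i(t) on the grid
xi_mf = y[len(t) - 1]                       # sample at t = b

print(xi_corr, xi_mf)                       # both close to 3.0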


Appendix B

Detection and estimation theory

B.1

Appendix C

Elements of information theory

This appendix briefly summarizes a few concepts from information theory
that will be assumed as prior knowledge in the book. For more details, the
reader can refer to [50, 122, 51].

C.1 Definitions for discrete random variables


We first define the concept of entropy, which is a measure of the uncertainty
of a random variable. Let X be a discrete random variable with alphabet
X and probability mass function (pmf) P (x). We may regard this random
variable as an output of a discrete message source. The entropy H(X) of X
is defined by
H(X) = −Σ_{x∈X} P(x) log2 P(x) = E{ log2 (1/P(X)) } . (C.1)

The entropy is a measure of the amount of information required on the av-
erage to describe the random variable. According to (C.1), the entropy of X
can also be interpreted as the expected value of the random variable log2 (1/P(X)),
where X is drawn according to the probability mass function P (x).
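Definition (C.1) translates directly into code; a minimal sketch:

import numpy as np

def entropy(p):
    # Entropy (C.1), in bits, of a pmf given as an array of probabilities.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                    # 0 log 0 = 0 by convention
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))         # 1 bit: a fair binary source
print(entropy([0.9, 0.1]))         # about 0.469 bits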
The joint entropy H(X, Y ) of a pair of discrete random variables (X, Y ),
with alphabets X and Y, respectively, and with a joint distribution P (x, y),
is defined as
H(X, Y) = −Σ_{x∈X} Σ_{y∈Y} P(x, y) log2 P(x, y) = E{ log2 (1/P(X, Y)) } . (C.2)

We also define the conditional entropy of a random variable given an-
other as the expected value of the entropies of the conditional distributions,
averaged over the conditioning random variable, i.e.,


H(Y|X) = Σ_{x∈X} P(x) H(Y|X = x)
       = −Σ_{x∈X} P(x) Σ_{y∈Y} P(y|x) log2 P(y|x)
       = −Σ_{x∈X} Σ_{y∈Y} P(x, y) log2 P(y|x)
       = −E{log2 P(Y|X)} . (C.3)

It is possible to demonstrate that (chain rule) [51]

H(X, Y ) = H(X) + H(Y |X) = H(Y ) + H(X|Y ) . (C.4)

In addition, it is

H(X, Y ) ≤ H(X) + H(Y ) . (C.5)

The relative entropy or Kullback-Leibler distance, or divergence, is a mea-
sure of the distance between two distributions P(X) and Q(X). In statistics,
it arises as an expected logarithm of the likelihood ratio, i.e.,

D(P||Q) = Σ_{x∈X} P(x) log2 (P(x)/Q(x)) = E{ log2 (P(X)/Q(X)) } . (C.6)

The relative entropy D(P ||Q) is a measure of the inefficiency of assuming that
the distribution is Q when the true distribution is P . The relative entropy
is always nonnegative and is zero if and only if P = Q [51]. However, it is
not a true distance between distributions since it is not symmetric and does
not satisfy the triangle inequality. Nonetheless, it is often useful to think of
relative entropy as a “distance” between distributions.
We now introduce mutual information, which is a measure of the amount
of information that one random variable contains about another random
variable. In other words, it is the reduction in the uncertainty of one random
variable due to the knowledge of the other. It is defined as


I(X; Y) = Σ_{x∈X} Σ_{y∈Y} P(x, y) log2 [ P(x, y) / (P(x)P(y)) ]
        = D( P(x, y) || P(x)P(y) )
        = E{ log2 [ P(X, Y) / (P(X)P(Y)) ] }
        = E{ log2 [ P(Y|X) / P(Y) ] }
        = E{ log2 [ P(X|Y) / P(X) ] }
        = H(X) − H(X|Y)
        = H(Y) − H(Y|X) . (C.7)

Mutual information is nonnegative, being zero only when X and Y are inde-
pendent.
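The identities in (C.7) are easy to verify numerically. A minimal sketch computing I(X; Y) as the divergence D(P(x, y)||P(x)P(y)) for the joint pmf of a binary symmetric channel with uniform input (an illustrative choice):

import numpy as np

def kl_divergence(p, q):
    # Relative entropy D(P||Q) in bits, (C.6); assumes q > 0 wherever p > 0.
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def mutual_information(Pxy):
    # I(X;Y) = D(P(x,y) || P(x)P(y)), computed from a joint pmf matrix.
    Px = Pxy.sum(axis=1, keepdims=True)
    Py = Pxy.sum(axis=0, keepdims=True)
    return kl_divergence(Pxy.ravel(), (Px @ Py).ravel())

# Joint pmf of a binary symmetric channel with crossover 0.1, uniform input
Pxy = np.array([[0.45, 0.05],
                [0.05, 0.45]])
print(mutual_information(Pxy))   # = H(Y) - H(Y|X), about 0.531 bits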
Mutual information also satisfies a chain rule:
I(X1 , X2 , . . . , Xn ; Y) = Σ_{i=1}^{n} I(Xi ; Y | Xi−1 , Xi−2 , . . . , X1) .

We now prove another property which is often very useful. First of all,
we prove that
I(X; Y, Z) = I(X; Y |Z) + I(X; Z) . (C.8)
In fact

I(X; Y, Z) = E{ log2 [ P(Y, Z|X) / P(Y, Z) ] } = E{ log2 [ P(Y|X, Z)P(Z|X) / (P(Y|Z)P(Z)) ] }
           = E{ log2 [ P(Y|X, Z) / P(Y|Z) ] } + E{ log2 [ P(Z|X) / P(Z) ] }
           = I(X; Y|Z) + I(X; Z) .

Similarly,

I(X; Y, Z) = I(X; Z|Y) + I(X; Y) . (C.9)
Now, consider a random variable Y which is a stochastic transformation
of two independent random variables X and Z. In this case, it is

I(X; Y ) ≤ I(X; Y |Z) . (C.10)



The proof is quite simple. In fact, from (C.8), we have that

I(X; Y, Z) = I(X; Y |Z)

since I(X; Z) = 0 due to the independence of X and Z. On the other hand,
from (C.9), we have

I(X; Y, Z) ≥ I(X; Y)

since I(X; Z|Y) ≥ 0. Hence

I(X; Y |Z) ≥ I(X; Y ) .

C.2 Capacity for the discrete memoryless channel
Let us now consider a discrete memoryless channel (DMC) with X as input
and Y as output. A DMC is characterized by the following two properties:

• The alphabets of its input and output are discrete.

• It does not have memory, that is, it acts on each of its input data
independently of all other data.

The capacity of this DMC is defined as the maximum of the mutual infor-
mation I(X; Y ) over the set of input pmfs P (X):

C = max_{P(X)} I(X; Y) (bits/channel use) . (C.11)

The pmf P (X) that provides the maximum value of the mutual information
is called capacity-achieving distribution.

C.3 Extension to the transmission of a sequence


To this point, we have assumed a one-shot transmission, where a single input
symbol X produces a single output Y . To model real digital communication
systems that send sequences, and to look ahead toward coded systems, we
consider now the transmission of an N-tuple X = (X1 , X2 , . . . , XN ), which
produces a channel output Y = (Y1 , Y2 , . . . , YN ). We assume that inputs
{Xi } belong to the alphabet X , whereas outputs {Yi } belong to the alpha-
bet Y. We make the further strong assumption that the channel acts on
each input in an independent way, i.e., that the channel is memoryless. Al-
though this is not always true (see Chapter 1 for details), we will remove this
assumption later. The memoryless assumption means that
P(y|x) = Π_{i=1}^{N} P(yi |xi) .

The mutual information is now

I(X; Y) = E{ log2 [ P(Y|X) / P(Y) ] } = Σ_x Σ_y P(x, y) log2 [ P(y|x) / P(y) ] .

It is easy to prove that

I(X; Y) ≤ Σ_{i=1}^{N} I(Xi ; Yi)

with equality if and only if variables Yi are independent. This latter condition
holds if and only if inputs are independent, given the memoryless channel
model. In addition, given the definition (C.11), it is
I(X; Y) ≤ Σ_{i=1}^{N} I(Xi ; Yi) ≤ NC

where C is the capacity in bits/channel use, i.e., the capacity corresponding


to a one-shot transmission. So in order to have I(X; Y) = NC, it must be
P(x) = Π_{i=1}^{N} P(xi)

where P (xi ) achieves capacity in the sense of (C.11).


Another important relation for the mutual information is associated with
a cascade of channels. Let us assume that the output Y of the channel goes
at the input of another channel or a signal processor that produces the output
Z. It can be demonstrated that

I(X; Z) ≤ I(X; Y) . (C.12)

This is the so called data processing inequality that, in essence, states that
the average mutual information cannot be increased by further processing,
Figure C.1: Additive white Gaussian channel.

either deterministic or stochastic. This is, in some way, expected. In fact,
communication systems are made of processors such as quantizers, samplers,
encoders, and decoders. The inequality does not imply that they are neces-
sarily harmful, since they often simply manipulate data into another form to
preserve the information but, on the other hand, the additional processing
cannot increase the information transfer. In particular, when Z is a sufficient
statistic, it is I(X; Z) = I(X; Y) [51].

C.4 Extension to continuous random variables


All these definitions can be easily generalized to the case where X and/or Y are
continuous random variables with probability density functions (pdfs) f(x) and
f(y), respectively. In this case, integrals take the place of summations and
pdfs take the place of pmfs.
Let us consider a channel whose output Y can be expressed as

Y =X +Z

where Z is a Gaussian random variable with mean zero and variance σ 2 , and
is independent of X. This channel is shown in Fig. C.1. To make the problem
well posed, we introduce a power constraint on the input of the channel, i.e.,
we will assume that E{X 2 } ≤ PX . For this channel, the capacity-achieving
distribution is
1 − x
2

f (x) = √ e 2PX
2πPX

and the channel capacity results to be

C = (1/2) log2 (1 + PX/σ²) (bits/channel use) . (C.13)

C.5 The vector Gaussian channel


We now consider the vector Gaussian channel, i.e., a channel with input
X = (X1 , X2 , . . . , XN ) and output Y = (Y1 , Y2 , . . . , YN ) such that

Y i = Xi + Z i .

This model can be obtained from successive transmissions in time or through
simultaneous use of several frequencies, or channels, and so on. We assume
that each variable Zi is a zero-mean Gaussian variable with variance σi2 ,
i = 1, 2, . . . , N, and that they are all independent. A total energy constraint
on the input vector is imposed in the form
Σ_{i=1}^{N} E{Xi²} ≤ PX .

The mutual information I(X; Y) is maximized when the components of Y
are independent and Gaussian, implying that the input components in X
are independent Gaussian variables as well. The power has to be distributed
over the components according to the following rule:

E{Xi²} + σi² = A if A ≥ σi² ,    E{Xi²} = 0 if A < σi²

where A is such that the power constraint is satisfied. In other words, for
those components of the input vector that are allocated any power, the sum
of this power together with the noise variance must be constant according to
the so called water-pouring (or water-filling) technique, exemplified in Fig.
C.2.
The capacity resulting from this optimal allocation is
C = Σ_{i=1}^{N} (1/2) log2 (1 + E{Xi²}/σi²) (bits/vector channel use) . (C.14)
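The water level A can be found numerically, for example by bisection on the total-power constraint; a minimal sketch:

import numpy as np

def water_filling(noise_vars, P_total, iters=100):
    # Water-pouring power allocation: E{X_i^2} = max(A - sigma_i^2, 0), with
    # the level A chosen so that the allocated powers sum to P_total.
    noise_vars = np.asarray(noise_vars, dtype=float)
    lo, hi = noise_vars.min(), noise_vars.max() + P_total
    for _ in range(iters):                       # bisection on the level A
        A = 0.5 * (lo + hi)
        if np.sum(np.maximum(A - noise_vars, 0.0)) > P_total:
            hi = A
        else:
            lo = A
    powers = np.maximum(A - noise_vars, 0.0)
    capacity = 0.5 * np.sum(np.log2(1.0 + powers / noise_vars))   # (C.14)
    return powers, capacity

powers, C = water_filling([0.5, 1.0, 2.0, 4.0], P_total=4.0)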

C.6 The bandlimited AWGN channel


Let us now consider a baseband bandlimited AWGN channel, having band-
width B and ideal frequency response H(f ) = 1 over this bandwidth. The
received signal can be thus expressed as

Y (t) = X(t) + Z(t) (C.15)



Figure C.2: Illustration of the water-filling technique.

where X(t) is the channel input and Z(t) a bandlimited Gaussian noise pro-
cess with power spectral density N0 /2. We will impose a power constraint
on the input process, i.e., E{X 2 (t)} ≤ PX . A vector model for this channel
may be obtained by uniformly sampling Y (t) at a rate equal to 2B. We can
thus work on the equivalent vector channel

Y = X+Z (C.16)

where Y = (Y0 , Y1 , . . . ), X = (X0 , X1 , . . . ), and Z = (Z0 , Z1 , . . . ) are the
samples of Y(t), X(t), and Z(t), respectively. Variables Zi are uncorrelated
and, being jointly Gaussian, are also independent. In addition, they have
mean zero and variance σz2 = N0 B.
We can now use the results of the previous section. Since the noise is
the same over all components/samples, the power allocation will be uniform.
Let us consider the capacity CT related to an interval of duration T . The
channel capacity C in bits/second will be

C = lim_{T→∞} CT / T .

According to the sampling rate of 2B samples per second, in this interval
a number of samples N = 2BT will be obtained and thus, using (C.14),

CT = (N/2) log2 (1 + PX/(N0 B)) = BT log2 (1 + PX/(N0 B)) (bits/interval) .

The channel capacity in bits/second is thus

C = B log2 (1 + PX/(N0 B)) (bits/second) .
The channel capacity provides a fundamental limit on the maximum data
rate that can be transmitted over a channel with asymptotically small proba-
bility of error. Suppose we want to transmit binary information at a rate (en-
tropy) Rb bits/second and we model the bits as independent. If Rb does not
exceed channel capacity, there exists a code that allows essentially error-free
transmission [186]. This result applies with some variation to any channel.

C.7 Extension to correlated processes and channels with memory
Let us now consider a discrete-time stochastic process which, in practice, is
a sequence {Xi } of random variables. The entropy rate of this discrete-time
stochastic process is defined by
h(X) = lim_{n→∞} (1/n) H(X1 , X2 , . . . , Xn)

when this limit exists. Similarly, if we have two discrete-time processes {Xi }
and {Yi }, the information rate (IR) is defined as
i(X; Y) = lim_{N→∞} (1/N) I(X1 , X2 , . . . , XN ; Y1 , Y2 , . . . , YN) (bits/channel use)

when this limit exists.


Let us now consider a channel with input {Xi } and output {Yi }. If the
channel is memoryless and the input symbols Xi are independent, it is clearly
i(X; Y) = I(Xi ; Yi)
and the computation of the information rate is quite trivial, at least numer-
ically. When the channel has finite memory, a simulation-based technique,
described in Chapter 5, is available.
The channel capacity in bits/channel use is defined as the maximum of
the information rate over the set of all possible input distributions. However,
if we are constrained to use independent and uniformly distributed symbols
belonging to a specific constellation, it could make sense to simply compute
what is called constrained capacity, i.e., the IR constrained to that input
constellation.1
1 Constrained capacity is sometimes called i.u.d. information rate or in many other
ways.

C.8 Nonergodic channels


Let us consider a fast fading channel. Here, fast means that over the duration of the transmission we can expect the channel to pass through all possible states. This allows us to define an ergodic channel capacity as the expectation of the channel capacity over all possible channel states.
When the channel is very slowly varying, or when it is modeled as constant over the length of one or more data frames (a duration that may span many thousands of symbols) but changing randomly at the end of each block, we need to consider the concept of outage capacity [125]. It is very likely that there will be time intervals (or blocks) during which it is impossible to achieve a low error probability regardless of the signaling rate. Under such circumstances, the channel is said to be in outage.
As a result, we need to consider channel capacity when there is a nonzero
probability Pout that the channel is in outage, and therefore, unusable in the
sense that the desired data rate cannot be achieved with arbitrarily low error
rate. In fact, under such conditions, the channel capacity becomes a random
variable and may take on arbitrarily small values with nonzero probability,
so that arbitrarily low probability of error cannot be achieved regardless of
the codes that are chosen [125]. The outage probability, Pout , thus specifies
the probability of not achieving a given channel capacity. The maximum rate
that can be supported by the channel with a given outage probability is the
outage capacity.
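For instance (a sketch added here, not part of the original notes; the Rayleigh block-fading model and parameter values are assumptions), the outage probability of a target rate R can be estimated by simulating independent channel realizations:

# Sketch (assumed Rayleigh block-fading model): estimate the outage
# probability P_out = P{ log2(1 + |h|^2 SNR) < R } by Monte Carlo.
import numpy as np

def outage_probability(R, snr_db, n_blocks=1_000_000, seed=0):
    rng = np.random.default_rng(seed)
    snr = 10.0 ** (snr_db / 10.0)
    gain = rng.exponential(1.0, size=n_blocks)  # |h|^2, unit-mean exponential
    block_capacity = np.log2(1.0 + gain * snr)  # capacity of each block
    return np.mean(block_capacity < R)

# Outage probability of R = 1 bit/channel use at 10 dB average SNR
# (closed form for this model: 1 - exp(-(2**R - 1)/snr), about 0.095 here)
print(outage_probability(1.0, 10.0))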
Appendix D

Block and convolutional codes

This appendix briefly describes some essential aspects of traditional coding schemes, in particular block and convolutional codes, required to understand the modern coding schemes and signal space codes considered in Chapters 4, 7, and 9. Many significant aspects will be omitted here, since they are not important for our purposes. For a deeper understanding, the reader can refer to [1, 2, 3, 4].

Appendix E

Bilateral Z-transform and some of its properties

This appendix briefly describes the Z-transform and some of its properties that will be used in the derivation of the whitening filter described in Section 2.4.
The bilateral or two-sided Z-transform of a discrete-time (real or complex)
signal xn is the power series X(z) defined as

X(z) = Σ_{n=−∞}^{+∞} xn z^{−n} .

The independent variable z is complex. The region of convergence (ROC) is the set of points in the complex plane for which the Z-transform summation converges. The ROC is typically a circular band, i.e., a region bounded by two concentric circles:

ROC = {z : r1 < |z| < r2} .

When the ROC includes the unit circle, the Z-transform computed on it gives the Fourier transform of the discrete-time sequence, i.e.,

X(e^{j2πfT}) = Σ_{n=−∞}^{+∞} xn e^{−j2πnfT} .

The inverse Z-transform can be computed as

xn = (1/(2πj)) ∮_C X(z) z^{n−1} dz


where C is a counterclockwise closed path encircling the origin and lying entirely in the ROC. A special case of this contour integral occurs when C is the unit circle. This contour can be used when the ROC includes the unit circle. With this contour, the inverse Z-transform simplifies to the inverse discrete-time Fourier transform

xn = T ∫_{−1/(2T)}^{1/(2T)} X(e^{j2πfT}) e^{j2πnfT} df

since on the unit circle z = e^{j2πfT}.
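As a numerical check (a sketch added here, not part of the original notes; the pole value and T = 1 are assumptions), the integral above can be approximated by a Riemann sum on the unit circle, recovering for instance the causal sequence xn = a^n u[n] from X(z) = z/(z − a):

# Sketch (assumed example, T = 1): recover x_n from X(z) by numerically
# integrating X(e^{j 2 pi f}) e^{j 2 pi n f} over one period of f.
import numpy as np

a, M = 0.7, 4096                       # pole inside the unit circle; grid size
f = np.arange(M) / M                   # frequencies on [0, 1)
z = np.exp(2j * np.pi * f)             # points on the unit circle
X = z / (z - a)                        # Z-transform of x_n = a^n u[n]

for n in range(6):
    xn = np.mean(X * np.exp(2j * np.pi * n * f))  # Riemann sum of the integral
    print(n, round(xn.real, 6), a ** n)           # matches a^n for n >= 0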


Some of the properties of the Z-transform are reported in Table E.1. For
a proof, see for example [187].

Table E.1: Properties of the Z-transform.


Name                   Time domain    Z-domain         ROC
Linearity              a xn + b yn    aX(z) + bY(z)    Contains ROCx ∩ ROCy
Time shift             x_{n−k}        z^{−k} X(z)      ROCx, except z = 0 if k > 0 and z = ∞ if k < 0
Time reversal          x_{−n}         X(z^{−1})        1/r2 < |z| < 1/r1
Complex conjugation    x*_n           X*(z*)           ROCx
Convolution            xn ⊗ yn        X(z) Y(z)        Contains ROCx ∩ ROCy

Example E.1. Let us consider the Z-transform of the causal sequence

xn = a^n u[n]

where u[n] is the unit step function defined as

u[n] = { 1,  n ≥ 0
       { 0,  n < 0 .

By definition

X(z) = Σ_{n=0}^{∞} a^n z^{−n} = Σ_{n=0}^{∞} (a z^{−1})^n = 1/(1 − a z^{−1}) = z/(z − a)

provided that |a z^{−1}| < 1. The region of convergence is thus |z| > |a|. It contains the unit circle provided that |a| < 1, i.e., provided that the sequence is stable. This Z-transform has a zero at the origin of the complex plane and a pole at z = a. They are shown, along with the ROC, in Fig. E.1. ♦
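A quick numerical confirmation (a sketch added here, not part of the original notes; the values of a and z are arbitrary, with z chosen inside the ROC):

# Sketch (assumed values): the partial sums of sum_{n>=0} a^n z^{-n}
# converge to z/(z - a) whenever |z| > |a|.
import numpy as np

a = 0.6
z = 1.2 * np.exp(1j * 0.3)                   # a point in the ROC: |z| > |a|
partial = sum((a / z) ** n for n in range(200))
print(partial)           # numerically equal to the closed form below
print(z / (z - a))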

Figure E.1: ROC, pole and zero for the Z-transform of xn = a^n u[n] in the case |a| < 1. [Figure omitted in this extraction: it showed the z-plane with axes ℜ[z] and ℑ[z], the zero at the origin, the pole at z = a, and the shaded ROC |z| > |a|.]

Example E.2. Let us consider the Z-transform of the anticausal sequence

xn = −b^n u[−n − 1] .

Its Z-transform turns out to be

X(z) = − Σ_{n=−∞}^{−1} b^n z^{−n} = − Σ_{n=1}^{∞} b^{−n} z^{n} = 1 − Σ_{n=0}^{∞} (b^{−1} z)^n
     = 1 − 1/(1 − b^{−1} z) = z/(z − b)
provided that |b^{−1} z| < 1. The region of convergence is thus |z| < |b|. It contains the unit circle provided that |b| > 1, i.e., provided that the sequence is stable. This Z-transform has a zero at the origin of the complex plane and a pole at z = b. They are shown, along with the ROC, in Fig. E.2. When b equals the a of the previous example, we obtain exactly the same Z-transform; the only difference is the ROC. ♦

Example E.3. Let us consider the following Z-transform


X(z) = z/(z − a) + z/(z − b)

and let us assume that |a| < |b|. We have three possible ROCs. When the ROC is |z| > |b|, this X(z) corresponds to the causal sequence

xn = a^n u[n] + b^n u[n] .

Figure E.2: ROC, pole and zero for the Z-transform of xn = −b^n u[−n − 1] in the case |b| > 1. [Figure omitted in this extraction: it showed the z-plane with axes ℜ[z] and ℑ[z], the zero at the origin, the pole at z = b, and the shaded ROC |z| < |b|.]

When the ROC is |a| < |z| < |b|, X(z) corresponds to the two-sided sequence

xn = a^n u[n] − b^n u[−n − 1] .

Finally, when the ROC is |z| < |a|, this X(z) corresponds to the anticausal sequence

xn = −a^n u[−n − 1] − b^n u[−n − 1] .

From these examples we can conclude that the stability of a system can also be determined from the ROC alone: if the ROC contains the unit circle, then the system is stable. Likewise, from the knowledge of the ROC we can understand whether the sequence is causal or not.
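To see this concretely (a sketch added here, not part of the original notes; the pole values are assumptions), one can verify that the causal choice of Example E.3 reproduces X(z) when its defining series is evaluated at a point with |z| > |b|:

# Sketch (assumed values): for the ROC |z| > |b|, the causal sequence
# x_n = a^n u[n] + b^n u[n] has Z-transform z/(z - a) + z/(z - b).
import numpy as np

a, b = 0.5, 0.8
z = 1.5                                       # point with |z| > |b|
x = [a ** n + b ** n for n in range(300)]     # causal sequence samples
series = sum(xn * z ** (-n) for n, xn in enumerate(x))
print(series)                                 # agrees with the closed form
print(z / (z - a) + z / (z - b))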
Bibliography

[1] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. New York: McGraw-Hill, 1979.

[2] S. Benedetto and E. Biglieri, Principles of Digital Transmission: With


Wireless Applications. Norwell, MA, USA: Kluwer Academic Publish-
ers, 1999.

[3] J. Proakis and M. Salehi, Digital Communications, 5th ed. New York: McGraw-Hill, 2008.

[4] G. Vitetta, D. P. Taylor, G. Colavolpe, F. Pancaldi, and P. Martin,


Wireless Communications: Algorithmic Techniques. John Wiley &
Sons, 2013.

[5] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm,” IEEE Trans. Inform. Theory, vol. 13, pp. 260–269, Apr. 1967.

[6] G. D. Forney, Jr., “The Viterbi algorithm,” Proc. IEEE, vol. 61, pp.
268–278, Mar. 1973.

[7] H. L. Van Trees, Detection, Estimation, and Modulation Theory - Part


I. John Wiley & Sons, 1968.

[8] G. D. Forney, Jr., “Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference,” IEEE Trans. Inform. Theory, vol. 18, pp. 363–378, May 1972.

[9] G. Ungerboeck, “Adaptive maximum likelihood receiver for carrier-


modulated data-transmission systems,” IEEE Trans. Commun., vol.
com-22, pp. 624–636, May 1974.

[10] J. E. Mazo, “Faster-than-Nyquist signaling,” Bell System Tech. J.,


vol. 54, pp. 1450–1462, Oct. 1975.


[11] A. Liveris and C. N. Georghiades, “Exploiting faster-than-Nyquist signaling,” IEEE Trans. Commun., vol. 51, pp. 1502–1511, Sep. 2003.

[12] F. Rusek and J. B. Anderson, “The two dimensional Mazo limit,” in Proc. IEEE International Symposium on Information Theory, Adelaide, Australia, Sep. 2005, pp. 970–974.

[13] W. C. Jakes, Microwave Mobile Communications. New York: John


Wiley & Sons, 1974.

[14] G. Colavolpe and R. Raheli, “Noncoherent sequence detection,” IEEE


Trans. Commun., vol. 47, pp. 1376–1385, Sep. 1999.

[15] G. Ferrari, G. Colavolpe, and R. Raheli, Detection Algorithms for Wire-


less Communications. John Wiley & Sons, 2004.

[16] D. Divsalar and M. Simon, “Multiple-symbol differential detection of MPSK,” IEEE Trans. Commun., vol. 38, pp. 300–308, Mar. 1990.

[17] D. Divsalar, M. K. Simon, and M. Shahshahani, “The performance of


trellis-coded MDPSK with multiple symbol detection,” IEEE Trans.
Commun., vol. 38, pp. 1391–1403, Sep. 1990.

[18] D. Divsalar and M. K. Simon, “Maximum-likelihood differential de-


tection of uncoded and trellis coded amplitude phase modulation over
AWGN and fading channels - metrics and performance,” IEEE Trans.
Commun., vol. 42, pp. 76–89, Jan. 1994.

[19] J. Lodge and M. Moher, “Maximum likelihood estimation of CPM


signals transmitted over Rayleigh flat fading channels,” IEEE Trans.
Commun., vol. 38, pp. 787–794, Jun. 1990.

[20] D. Makrakis, P. T. Mathiopoulos, and D. Bouras, “Optimal decoding


of coded PSK and QAM signals in correlated fast fading channels and
AWGN: A combined envelope, multiple differential and coherent de-
tection approach,” IEEE Trans. Commun., vol. 42, pp. 63–75, January
1994.

[21] G. M. Vitetta and D. P. Taylor, “Maximum likelihood decoding of


uncoded and coded PSK signal sequences transmitted over Rayleigh
flat-fading channels,” IEEE Trans. Commun., vol. 43, no. 11, pp. 2750–
2758, Nov. 1995.

[22] X. Yu and S. Pasupathy, “Innovations-based MLSE for Rayleigh fading


channels,” IEEE Trans. Commun., vol. 43, pp. 1534–1544, February-
April 1995.

[23] ——, “Error performance of innovations-based MLSE for Rayleigh fad-


ing channel,” IEEE Trans. Veh. Tech., vol. 45, no. 4, pp. 631–642,
1996.

[24] A. N. D’Andrea, U. Mengali, and R. Reggiannini, “The modified


Cramer-Rao bound and its application to synchronization problems,”
IEEE Trans. Commun., vol. 42, pp. 1391–1399, Feb. 1994.

[25] D. C. Rife and R. R. Boorstyn, “Single tone parameter estimation from


discrete-time observations,” IEEE Trans. Inform. Theory, vol. 20, pp.
591–598, September 1974.

[26] U. Mengali and A. N. D’Andrea, Synchronization Techniques for Dig-


ital Receivers (Applications of Communications Theory). Plenum
Press, 1997.

[27] R. Raheli, A. Polydoros, and C. Tzou, “Per-survivor processing: A


general approach to MLSE in uncertain environments,” IEEE Trans.
Commun., vol. 43, pp. 354–364, February-April 1995.

[28] A. J. Viterbi and A. M. Viterbi, “Nonlinear estimation of PSK-


modulated carrier phase with application to burst digital transmission,”
IEEE Trans. Inform. Theory, vol. 29, pp. 543–551, July 1983.

[29] G. Colavolpe, A. Barbieri, and G. Caire, “Algorithms for iterative de-


coding in the presence of strong phase noise,” IEEE J. Select. Areas
Commun., vol. 23, no. 9, pp. 1748–1757, Sep. 2005.

[30] H. Meyr, M. Oerder, and A. Polydoros, “On sampling rate, analog


prefiltering, and sufficient statistics for digital receivers,” IEEE Trans.
Commun., vol. 42, pp. 3208–3214, Dec. 1994.

[31] J. B. Anderson, T. Aulin, and C.-E. W. Sundberg, Digital Phase Mod-


ulation. New York: Plenum Press, 1986.

[32] A. Barbieri, D. Fertonani, and G. Colavolpe, “Spectrally-efficient con-


tinuous phase modulations,” IEEE Trans. Wireless Commun., vol. 8,
pp. 1564–1572, Mar. 2009.

[33] A. Piemontese, A. Graell i Amat, and G. Colavolpe, “Frequency pack-


ing and multiuser detection for CPMs: how to improve the spectral
efficiency of DVB-RCS2 systems,” IEEE Wireless Commun. Letters,
vol. 2, no. 1, pp. 74–77, Feb. 2013.

[34] B. E. Rimoldi, “A decomposition approach to CPM,” IEEE Trans. In-


form. Theory, vol. 34, pp. 260–270, Mar. 1988.

[35] P. A. Laurent, “Exact and approximate construction of digital phase


modulations by superposition of amplitude modulated pulses (AMP),”
IEEE Trans. Commun., vol. 34, pp. 150–160, Feb. 1986.

[36] U. Mengali and M. Morelli, “Decomposition of M-ary CPM signals into


PAM waveforms,” IEEE Trans. Inform. Theory, vol. 41, pp. 1265–1275,
Sep. 1995.

[37] G. Ungerboeck and I. Csajka, “On improving data link performance by increasing the channel alphabet and introducing sequence coding,” in Proc. Int. Conf. Info. Theory, Ronneby, Sweden, Jun. 1976.

[38] G. Ungerboeck, “Channel coding with multilevel/phase signals,” IEEE


Trans. Inform. Theory, vol. 28, pp. 55–67, Jan. 1982.

[39] A. R. Calderbank and N. J. A. Sloane, “New trellis codes based on


lattices and cosets,” IEEE Trans. Inf. Theory, vol. 33, pp. 177–195,
Mar. 1987.

[40] G. D. Forney, Jr., “Coset codes - Part I: Introduction and geometrical


classification,” IEEE Trans. Inf. Theory, vol. 34, no. 5, pp. 1123–1151,
Sep. 1988.

[41] E. Biglieri, D. Divsalar, P. J. McLane, and M. K. Simon, Introduc-


tion to Trellis-Coded Modulation with Applications. New York, NY:
Macmillan Publishing Company, 1991.

[42] R. W. Chang and J. C. Hancock, “On receiver structures for channels


having memory,” IEEE Trans. Inform. Theory, vol. 12, pp. 463–468,
Oct. 1966.

[43] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding


of linear codes for minimizing symbol error rate,” IEEE Trans. In-
form. Theory, vol. 20, pp. 284–287, Mar. 1974.

[44] Y. Li, B. Vucetic, and Y. Sato, “Optimum soft-output detection for


channels with intersymbol interference,” IEEE Trans. Inform. Theory,
vol. 41, pp. 704–713, May 1995.

[45] G. Colavolpe and A. Barbieri, “On MAP symbol detection for ISI chan-
nels using the Ungerboeck observation model,” IEEE Commun. Letters,
vol. 9, no. 8, pp. 720–722, Aug. 2005.

[46] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “Soft-output


decoding algorithms for continuous decoding of parallel concatenated
convolutional codes,” in Proc. IEEE Intern. Conf. Commun., Dallas,
Texas, U.S.A., Jun. 1996, pp. 112–117.

[47] P. Robertson, E. Villebrun, and P. Hoeher, “Optimal and sub-optimal maximum a posteriori algorithms suitable for turbo decoding,” European Trans. Telecommun., vol. 8, no. 2, pp. 119–125, March/April 1997.

[48] G. Battail, “Pondération des symboles décodés par l’algorithme de Viterbi,” Annals of Telecommun., vol. 42, pp. 31–38, Jan. 1987 (in French).

[49] J. Hagenauer and P. Hoeher, “A Viterbi algorithm with soft-decision


outputs and its applications,” in Proc. IEEE Global Telecommun.
Conf., Dallas, TX, U.S.A., Nov. 1989, pp. 1680–1686.

[50] C. Shannon, “A mathematical theory of communication,” Bell System Tech. J., vol. 27, pp. 379–423, Jul. 1948.

[51] T. M. Cover and J. A. Thomas, Elements of Information Theory,


2nd ed. New York: John Wiley & Sons, 2006.

[52] D. Arnold and H.-A. Loeliger, “On the information rate of binary-input
channels with memory,” in Proc. IEEE Intern. Conf. Commun., vol. 9,
June 2001, pp. 2692–2695.

[53] V. Sharma and S. K. Singh, “Entropy and channel capacity in the regen-
erative setup with application to Markov channels,” in Proc. IEEE In-
ternational Symposium on Information Theory, Washington, DC, Jun.
2001, p. 283.

[54] H. D. Pfister, J. B. Soriaga, and P. H. Siegel, “On the achievable


information rates of finite-state ISI channels,” in Proc. IEEE Global
Telecommun. Conf., San Antonio, TX, 2001, pp. 2992–2996.

[55] D. M. Arnold, H.-A. Loeliger, P. O. Vontobel, A. Kavčić, and W. Zeng,


“Simulation-based computation of information rates for channels with
memory,” IEEE Trans. Inform. Theory, vol. 52, no. 8, pp. 3498–3508,
Aug. 2006.

[56] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai, “On information


rates for mismatched decoders,” IEEE Trans. Inform. Theory, vol. 40,
no. 6, pp. 1953–1967, Nov. 1994.

[57] A. Kavčić, X. Ma, and M. Mitzenmacher, “Binary intersymbol interfer-


ence channels: Gallager codes, density evolution, and code performance
bounds,” IEEE Trans. Inform. Theory, vol. 49, no. 7, pp. 1636–1652,
Jul. 2003.

[58] H. D. Pfister, J. B. Soriaga, and P. H. Siegel, “Determining and ap-


proaching achievable rates of binary intersymbol interference channels
using multistage decoding,” IEEE Trans. Inform. Theory, vol. 53, pp.
1416–1429, Apr. 2007.

[59] M. V. Eyuboğlu and S. U. Qureshi, “Reduced-state sequence estimation with set partitioning and decision feedback,” IEEE Trans. Commun., vol. 36, pp. 13–20, Jan. 1988.

[60] A. Duel-Hallen and C. Heegard, “Delayed decision-feedback sequence estimation,” IEEE Trans. Commun., vol. 37, pp. 428–436, May 1989.

[61] P. R. Chevillat and E. Eleftheriou, “Decoding of trellis-encoded signals in the presence of intersymbol interference and noise,” IEEE Trans. Commun., vol. 37, pp. 669–676, Jul. 1989.

[62] G. Colavolpe, G. Ferrari, and R. Raheli, “Reduced-state BCJR-type


algorithms,” IEEE J. Select. Areas Commun., vol. 19, no. 5, pp. 848–
859, May 2001.

[63] D. Fertonani, A. Barbieri, and G. Colavolpe, “Reduced-complexity


BCJR algorithm for turbo equalization,” IEEE Trans. Commun.,
vol. 55, no. 12, pp. 2279–2287, Dec. 2007.

[64] A. Prlja and J. B. Anderson, “Reduced-complexity receivers for


strongly narrowband intersymbol interference introduced by faster-
than-Nyquist signaling,” IEEE Trans. Commun., vol. 60, no. 9, pp.
2591–2601, 2012.

[65] R. W. Lucky, “Automatic equalization for digital communications,” Bell


System Tech. J., vol. 44, pp. 547–588, Apr. 1965.

[66] ——, “Techniques for adaptive equalization of digital communication systems,” Bell System Tech. J., vol. 45, pp. 255–286, Feb. 1966.

[67] M. E. Austin, “Decision-feedback equalization for digital communica-


tion over dispersive channels,” MIT Lincoln Lab., Tech. Rep., 1967.

[68] D. A. George, R. R. Bowen, and J. R. Storey, “An adaptive decision


feedback equalizer,” IEEE Trans. Commun. Technol., vol. 19, no. 3,
pp. 281–293, Jun. 1971.

[69] J. Salz, “Optimum mean-square decision feedback equalization,” Bell


Syst. Tech. J., vol. 52, pp. 1341–1373, Oct. 1973.

[70] C. A. Belfiore and J. H. Park, Jr., “Decision feedback equalization,” Proc. IEEE, vol. 67, no. 8, pp. 1143–1156, Aug. 1979.

[71] S. U. H. Qureshi, “Adaptive equalization,” Proc. IEEE, vol. 73, no. 9,


pp. 1349–1387, Sep. 1985.

[72] Y. Sato, “A method of self-recovering equalization for multilevel


amplitude-modulation systems,” IEEE Trans. Commun., vol. 23, pp.
679–682, Jun. 1975.

[73] D. Godard, “Self-recovering equalization and carrier tracking in two-


dimensional data communication systems,” IEEE Trans. Commun.,
vol. 28, pp. 1867–1875, Nov. 1980.

[74] A. Benveniste and M. Goursat, “Blind equalizers,” IEEE Trans. Com-


mun., vol. 32, pp. 871–883, Aug. 1984.

[75] G. Picchi and G. Prati, “Blind equalization and carrier recovery using
a ‘stop-and-go’ decision directed algorithm,” IEEE Trans. Commun.,
vol. 35, pp. 877–887, Sep. 1987.

[76] D. D. Falconer and F. Magee, “Adaptive channel memory truncation


for maximum likelihood sequence estimation,” Bell System Tech. J.,
vol. 52, no. 9, pp. 1541–1562, Nov. 1973.

[77] S. A. Fredricsson, “Joint optimization of transmitter and receiver fil-


ter in digital PAM systems with a Viterbi detector,” IEEE Trans. In-
form. Theory, vol. IT-22, no. 2, pp. 200–210, Mar. 1976.

[78] C. T. Beare, “The choice of the desired impulse response in combined


linear-Viterbi algorithm equalizers,” IEEE Trans. Commun., vol. 26,
no. 8, pp. 1301–1307, Aug. 1978.

[79] N. Sundstrom, O. Edfors, P. Ödling, H. Eriksson, T. Koski, and P. O.


Börjesson, “Combined linear-Viterbi equalizers - a comparative study
and a minimax design,” in Proc. Vehicular Tech. Conf., Stockholm,
Sweden, Jun. 1994, pp. 1263–1267.

[80] N. Al-Dhahir and J. M. Cioffi, “Efficiently computed reduced-parameter


input-aided MMSE equalizers for ML detection: A unified approach,”
IEEE Trans. Inform. Theory, vol. 42, pp. 903–915, Apr. 1996.

[81] S. A. Aldosari, S. A. Alshebeili, and A. M. Al-Sanie, “A new MSE


approach for combined linear-Viterbi equalizers,” in Proc. Vehicular
Tech. Conf., Tokyo, Japan, May 2000, pp. 1263–1267.

[82] U. L. Dang, W. H. Gerstacker, and D. T. M. Slock, “Maximum SINR


prefiltering for reduced state trellis based equalization,” in Proc. IEEE
Intern. Conf. Commun., Kyoto, Japan, Jun. 2011.

[83] F. Rusek and A. Prlja, “Optimal channel shortening for MIMO and ISI
channels,” IEEE Trans. Wireless Commun., vol. 11, no. 2, pp. 810–818,
Feb. 2012.

[84] F. Rusek and D. Fertonani, “Bounds on the information rate of inter-


symbol interference channels based on mismatched receivers,” IEEE
Trans. Inform. Theory, vol. 58, no. 3, pp. 1470–1482, Mar. 2012.

[85] A. Modenini, F. Rusek, and G. Colavolpe, “Adaptive rate-maximizing


channel-shortening for ISI channels,” IEEE Commun. Letters, vol. 19,
no. 12, pp. 2090–2093, Dec. 2015.

[86] S. Hu and F. Rusek, “On the design of reduced state demodulators with
interference cancellation for iterative receivers,” in Proc. 24th IEEE In-
tern. Symp. on Personal, Indoor, and Mobile Radio Comm. (PIMRC),
Hong Kong, China, Aug. 2015, pp. 981–985.

[87] ——, “On the design of channel shortening demodulators for


iterative receivers in MIMO and ISI channels,” available at
http://arxiv.org/abs/1506.07331.

[88] G. D. Forney, Jr., Concatenated Codes. Cambridge, Massachusetts:


MIT Press, 1966.

[89] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: turbo-codes,” in Proc. IEEE Intern. Conf. Commun., Geneva, Switzerland, May 1993, pp. 1064–1070.

[90] C. Berrou and A. Glavieux, “Near optimum error correcting coding


and decoding: turbo-codes,” IEEE Trans. Commun., vol. 44, no. 10,
pp. 1261–1271, October 1996.

[91] S. Benedetto and G. Montorsi, “Unveiling turbo codes: some results on


parallel concatenated coding schemes,” IEEE Trans. Inform. Theory,
vol. 42, no. 2, pp. 408–428, March 1996.

[92] S. Benedetto, R. Garello, and G. Montorsi, “A search for good convo-


lutional codes to be used in the construction of turbo codes,” IEEE
Trans. Commun., vol. 46, no. 9, pp. 1101–1105, Sep. 1998.

[93] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “Serial concate-


nation of interleaved codes: performance analysis, design, and iterative
decoding,” IEEE Trans. Inform. Theory, vol. 44, no. 3, pp. 909–926,
May 1998.

[94] G. Colavolpe, G. Ferrari, and R. Raheli, “Extrinsic information in it-


erative decoding: a unified view,” IEEE Trans. Commun., vol. 49, pp.
2088–2094, Dec. 2001.

[95] J. Hagenauer, “The turbo principle: Tutorial introduction & state of


the art,” in Proc. Int. Symp. Turbo Codes & Related Topics, Brest,
France, Sep. 1997, pp. 1–11.

[96] C. Douillard, M. Jezequel, C. Berrou, A. Picart, P. Didier, and


A. Glavieux, “Iterative correction of intersymbol interference: turbo-
equalization,” European Trans. Telecommun., vol. 6, no. 5, pp. 507–511,
September/October 1995.

[97] M. Tüchler, R. Koetter, and A. C. Singer, “Turbo equalization: Principles and new results,” IEEE Trans. Commun., vol. 50, pp. 754–767, May 2002.

[98] R. Koetter, A. C. Singer, and M. Tüchler, “Turbo equalization,” IEEE Sig. Proc. Mag., vol. 21, no. 1, pp. 67–80, Jan. 2004.

[99] G. Colavolpe, G. Ferrari, and R. Raheli, “Noncoherent iterative (turbo)


decoding,” IEEE Trans. Commun., vol. 48, no. 9, pp. 1488–1498, Sep.
2000.

[100] P. Hoeher and J. Lodge, ““Turbo DPSK”: Iterative differential PSK


demodulation and channel decoding,” IEEE Trans. Commun., vol. 47,
no. 6, pp. 837–843, Jun. 1999.
[101] S. ten Brink, “Convergence behavior of iteratively decoded parallel con-
catenated codes,” IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–
1737, Oct. 2001.
[102] S. ten Brink, G. Kramer, and A. Ashikhmin, “Design of low-density
parity-check codes for modulation and detection,” IEEE Trans. Com-
mun., vol. 52, pp. 670–678, Apr. 2004.
[103] P. O. Börjesson and C.-E. W. Sundberg, “Simple approximations of the error function Q(x) for communications applications,” IEEE Trans. Commun., vol. 27, no. 3, pp. 639–643, Mar. 1979.
[104] M. Chiani, D. Dardari, and M. K. Simon, “New exponential bounds
and approximations for the computation of error probability in fading
channels,” IEEE Trans. Wireless Commun., vol. 2, no. 4, pp. 840–845,
July 2003.
[105] D. Divsalar and M. K. Simon, “The design of trellis codes for fading
channels,” IEEE Trans. Commun., vol. 36, no. 9, pp. 1004–1021, Sep.
1988.
[106] E. Malkamaki and H. Leib, “Evaluating the performance of convolu-
tional codes over block fading channels,” IEEE Trans. Inform. Theory,
vol. 45, no. 5, pp. 1643–1646, July 1999.
[107] E. Zehavi, “8-PSK trellis codes for a Rayleigh channel,” IEEE Trans.
Commun., vol. 40, no. 5, pp. 873–884, May 1992.
[108] G. Caire, G. Taricco, and E. Biglieri, “Bit-interleaved coded modulation,” IEEE Trans. Inf. Theory, vol. 44, no. 3, pp. 927–946, May 1998.
[109] X. Li and J. A. Ritcey, “Bit-interleaved coded modulation with iterative
decoding,” IEEE Commun. Lett., vol. 1, no. 6, pp. 169–171, Nov. 1997.
[110] A. Viterbi and J. Omura, Principles of Digital Communication and
Coding. New York: McGraw-Hill, 1979.
[111] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for
high data rate wireless communication: Performance criterion and code
construction,” IEEE Trans. Inf. Theory, vol. 44, no. 2, pp. 744–765,
Mar. 1998.

[112] L. Zheng and D. N. C. Tse, “Diversity and multiplexing: a fundamen-


tal tradeoff in multiple-antenna channels,” IEEE Trans. Inf. Theory,
vol. 49, no. 5, pp. 1073–1096, May 2003.

[113] H. Lu and P. V. Kumar, “Rate-diversity tradeoff of space-time codes


with fixed alphabet and optimal constructions for PSK modulation,”
IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2747–2751, Oct. 2003.

[114] J. Winters, “On the capacity of radio communication systems with


diversity in a Rayleigh fading environment,” IEEE J. Sel. Areas Com-
mun., vol. 5, no. 5, pp. 871–878, June 1987.

[115] G. J. Foschini, “Layered space-time architecture for wireless commu-


nication in a fading environment when using multi-element antennas,”
Bell Labs. Tech. J., vol. 1, pp. 41–59, Autumn 1996.

[116] G. J. Foschini and M. J. Gans, “On limits of wireless communications in


a fading environment when using multiple antennas,” Wireless Personal
Commun., vol. 6, pp. 311–335, Feb. 1998.

[117] I. E. Telatar, “Capacity of multi-antenna Gaussian channels,” Eur.


Trans. Telecommun. (ETT), vol. 10, pp. 585–595, Nov. 1999.

[118] T. L. Marzetta and B. M. Hochwald, “Capacity of a mobile multiple-


antenna communication link in Rayleigh flat fading,” IEEE Trans. Inf.
Theory, vol. 45, no. 1, pp. 139–157, Jan. 1999.

[119] A. Paulraj, R. U. Nabar, and D. Gore, Introduction to Space-Time


Wireless Communications. Cambridge, UK: Cambridge University
Press, 2003.

[120] T. M. Duman and A. Ghrayeb, Coding for MIMO communication sys-


tems. Chichester, West Sussex, England: John Wiley & Sons, 2007.

[121] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed. Baltimore: The Johns Hopkins University Press, 1989.

[122] R. G. Gallager, Information Theory and Reliable Communication.


New York: John Wiley & Sons, 1968.

[123] S. A. Jafar and A. Goldsmith, “Transmitter optimization and opti-


mality of beamforming for multiple antenna systems,” IEEE Trans.
Wireless Commun., vol. 3, no. 4, pp. 1165–1175, July 2004.

[124] A. L. Moustakas, S. H. Simon, and A. M. Sengupta, “MIMO capacity


through correlated channels in the presence of correlated interferers and
noise: a (not so) large N analysis,” IEEE Trans. Inf. Theory, vol. 49,
no. 10, pp. 2545–2561, Oct. 2003.
[125] E. Biglieri, J. Proakis, and S. Shamai, “Fading channels: Information-
theoretic and communications aspects,” IEEE Trans. Inf. Theory,
vol. 44, no. 6, pp. 2619–2692, Oct. 1998.
[126] L. Zheng and D. Tse, “Communicating on the Grassmann manifold:
A geometric approach to the non-coherent multiple antenna channel,”
IEEE Trans. Inf. Theory, vol. 48, no. 2, pp. 359–383, Feb. 2002.
[127] S. A. Jafar and A. Goldsmith, “Multiple-antenna capacity in correlated
Rayleigh fading with channel covariance information,” IEEE Trans.
Wireless Commun., vol. 4, no. 3, pp. 990–997, May 2005.
[128] J. Ventura-Traveset, G. Caire, E. Biglieri, and G. Taricco, “Impact of diversity reception on fading channels with coded modulation - Part I: Coherent detection,” IEEE Trans. Commun., vol. 45, no. 5, pp. 563–572, May 1997.
[129] B. Vucetic and J. Yuan, Space-Time Coding. Chichester, West Sussex,
England: John Wiley & Sons, 2003.
[130] A. R. Hammons, Jr. and H. E. Gamal, “On the theory of space-time
codes for PSK modulation,” IEEE Trans. Inf. Theory, vol. 46, no. 2,
pp. 524–542, Mar. 2000.
[131] Z. Liu, X. Ma, and G. B. Giannakis, “Space-time coding and Kalman
filtering for time-selective fading channels,” IEEE Trans. Commun.,
vol. 50, no. 2, pp. 183–186, Feb. 2002.
[132] A. Wittneben, “A new bandwidth efficient transmit antenna modu-
lation diversity scheme for linear digital modulation,” in Proc. IEEE
1993 Int. Conf. Commun. (ICC’93), Geneva, Switzerland, May 23-26
1993, pp. 1630–1634.
[133] S. M. Alamouti, “A simple transmit diversity technique for wireless
communications,” IEEE J. Sel. Areas Commun., vol. 16, no. 8, pp.
1451–1458, Oct. 1998.
[134] A. Papoulis and S. U. Pillai, Probability, Random Variables and
Stochastic Processes, 4th ed. New York: McGraw-Hill Publishing
Co., 2002.

[135] J. G. Proakis, Digital Communications, 3rd ed. New York: McGraw-


Hill, 1995.

[136] S. Sandhu, R. Heath, and A. Paulraj, “Space-time block codes versus space-time trellis codes,” in Proc. IEEE 2001 Int. Conf. Commun. (ICC 2001), Helsinki, Finland, June 2001, pp. 1132–1136.

[137] M. J. Borran, M. Memarzadeh, and B. Aazhang, “Design of coded modulation schemes for orthogonal transmit diversity,” in Proc. 2001 IEEE Int. Symp. Inf. Theory (ISIT 2001), Washington, D.C. (USA), June 2001, p. 339.

[138] S. Siwamogsatham and M. P. Fitz, “Robust space-time coding for cor-


related Rayleigh fading channels,” IEEE Trans. Sig. Proc., vol. 50,
no. 10, pp. 2408–2416, Oct. 2002.

[139] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block


codes from orthogonal designs,” IEEE Trans. Inf. Theory, vol. 45, no. 5,
pp. 1456–1467, July 1999.

[140] ——, “Space-time block coding for wireless communications: Perfor-


mance results,” IEEE J. Sel. Areas Commun., vol. 17, no. 3, pp. 451–
460, Mar. 1999.

[141] C. Xu and K. S. Kwak, “On decoding algorithm and performance of


space-time block codes,” IEEE Trans. Wireless Commun., vol. 4, no. 3,
pp. 825–829, May 2005.

[142] H. Jafarkhani, “A quasi-orthogonal space-time block code,” IEEE


Trans. Commun., vol. 49, no. 1, pp. 1–4, Jan. 2001.

[143] B. Hassibi and B. M. Hochwald, “High-rate codes that are linear in


space and time,” IEEE Trans. Inf. Theory, vol. 48, no. 7, pp. 1804–
1824, July 2002.

[144] Y. Liu, M. P. Fitz, and O. Y. Takeshita, “A rank criterion for QAM


space-time codes,” IEEE Trans. Inf. Theory, vol. 48, no. 12, pp. 3062–
3079, Dec. 2002.

[145] R. S. Blum, “Some analytical tools for the design of space-time convo-
lutional codes,” IEEE Trans. Commun., vol. 50, no. 10, pp. 1593–1599,
Oct. 2002.

[146] Q. Yan and R. S. Blum, “Improved space-time convolutional codes for


quasi-static slow fading channels,” IEEE Trans. Wireless Commun.,
vol. 1, no. 4, pp. 563–571, Oct. 2002.

[147] A. Stefanov and T. M. Duman, “Performance bounds for space-time


trellis codes,” IEEE Trans. Inform. Theory, vol. 49, no. 9, pp. 2134–
2140, Sept. 2003.

[148] G. J. Foschini, “Layered space-time architecture for wireless commu-


nication in a fading environment when using multiple antennas,” Bell
Labs Tech. J., vol. 1, no. 2, pp. 41–59, Autumn 1996.

[149] V. Tarokh, A. Naguib, N. Seshadri, and A. Calderbank, “Combined array processing and space-time coding,” IEEE Trans. Inf. Theory, vol. 45, no. 4, pp. 1121–1128, May 1999.

[150] H. El Gamal and A. R. Hammons Jr., “A new approach to layered


space-time coding and signal processing,” IEEE Trans. Inf. Theory,
vol. 47, no. 6, pp. 2321–2334, Sept. 2001.

[151] G. Caire and G. Colavolpe, “On low-complexity space-time coding for


quasi-static channels,” IEEE Trans. Inf. Theory, vol. 49, no. 6, pp.
1400–1416, June 2003.

[152] D.-S. Shiu and J. Kahn, “Layered space-time codes for wireless communications using multiple transmit antennas,” in Proc. IEEE Intern. Conf. on Commun. (ICC ’99), Vancouver, Canada, June 1999.

[153] R. Horn and C. Johnson, Matrix Analysis. Cambridge: Cambridge


University Press, 1985.

[154] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: John Wiley & Sons, 1968.

[155] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation


Theory. Englewood Cliffs, NJ: Prentice Hall International, Inc., 1993.

[156] R. Knopp and P. Humblet, “On coding for block fading channels,” IEEE
Trans. Inform. Theory, vol. 46, no. 1, pp. 189–205, January 2000.

[157] R. Wesel, X. Liu, and W. Shi, “Trellis codes for periodic erasures,”
IEEE Trans. Commun., vol. 48, no. 6, pp. 938–947, June 2000.

[158] G. Bauch, “Concatenation of space-time block codes and “turbo”-


TCM,” in Proc. 1999 IEEE Int. Conf. Commun. (ICC 1999), vol. 6,
Vancouver, BC, Canada, June 1999, pp. 1202–1206.

[159] S. L. Ariyavisitakul, “Turbo space-time processing to improve wireless


channel capacity,” IEEE Trans. Commun., vol. 48, no. 8, pp. 1347–
1359, Aug. 2000.

[160] X. Lin and R. Blum, “Improved space-time codes using serial concate-
nation,” IEEE Commun. Letters, vol. 4, no. 7, pp. 221–223, Jul. 2000.

[161] H. J. Su and E. Geraniotis, “Space-time turbo codes with full antenna


diversity,” IEEE Trans. Commun., vol. 49, no. 1, pp. 47–57, Jan. 2001.

[162] D. Cui and A. M. Haimovich, “Performance of parallel concatenated


space-time codes,” IEEE Commun. Letters, vol. 5, no. 6, pp. 236–238,
June 2001.

[163] A. Stefanov and T. M. Duman, “Turbo-coded modulation for systems


with transmit and receive antenna diversity over block fading chan-
nels: System model, decoding approaches, and practical considera-
tions,” IEEE J. Sel. Areas Commun., vol. 19, no. 5, pp. 958–968, May
2001.

[164] Y. Gong and K. B. Letaief, “Concatenated space-time block coding with


trellis coded modulation in fading channels,” IEEE Trans. Wireless
Commun., vol. 1, no. 4, pp. 580–590, October 2002.

[165] T. H. Liew and L. Hanzo, “Space-time codes and concatenated channel codes for wireless communications,” Proc. IEEE, vol. 90, no. 2, pp. 187–219, Feb. 2002.

[166] V. Gulati and K. N. Narayanan, “Concatenated codes for fading chan-


nels based on recursive space-time trellis codes,” IEEE Trans. Wireless
Commun., vol. 2, no. 1, pp. 118–128, January 2003.

[167] Y. Li, B. Vucetic, Q. Zhang, and Y. Huang, “Assembled space-time


turbo trellis codes,” IEEE Trans. Veh. Tech., vol. 54, no. 5, pp. 1768–
1772, September 2005.

[168] G. Li, I. J. Fair, and W. A. Krzymien, “Low-density parity-check codes


for space-time wireless transmission,” IEEE Trans. Wireless Commun.,
vol. 5, no. 2, pp. 312–322, February 2006.

[169] B. M. Hochwald and T. L. Marzetta, “Unitary space-time modulation


for multiple antenna communications in Rayleigh flat fading,” IEEE
Trans. Inf. Theory, vol. 46, no. 2, pp. 543–564, Mar. 2000.
[170] B. L. Hughes, “Differential space-time modulation,” IEEE Trans. Inf.
Theory, vol. 46, no. 7, pp. 2567–2578, Nov. 2000.
[171] B. M. Hochwald and W. Sweldens, “Differential unitary space-time
modulation,” IEEE Trans. Commun., vol. 48, no. 12, pp. 2041–2052,
Dec. 2000.
[172] V. Tarokh and H. Jafarkhani, “A differential detection scheme for trans-
mit diversity,” IEEE J. Sel. Areas Commun., vol. 18, no. 7, pp. 1169–
1174, July 2000.
[173] I. Bahceci and T. M. Duman, “Combined turbo coding and unitary space-time modulation,” IEEE Trans. Commun., vol. 50, no. 8, pp. 1244–1249, Aug. 2002.
[174] L. H.-J. Lampe and R. Schober, “Bit-interleaved coded differential
space-time modulation,” IEEE Trans. Commun., vol. 50, no. 9, pp.
1429–1439, Sep. 2002.
[175] C. Schlegel and A. Grant, “Differential space-time turbo codes,” IEEE
Trans. Inform. Theory, vol. 49, no. 9, pp. 2298–2306, Sep. 2003.
[176] H. El Gamal, A. R. J. Hammons, Y. Liu, M. P. Fitz, and O. Y.
Takeshita, “On the design of space-time and space-frequency codes
for MIMO frequency-selective fading channels,” IEEE Trans. Inform.
Theory, vol. 49, no. 9, pp. 2277–2292, Sep. 2003.
[177] J. G. Proakis and M. Salehi, Digital Communications, 5th ed. New York: McGraw-Hill, 2008.
[178] T. Eyceoz and A. Duel-Hallen, “Simplified block adaptive diversity equalizer for cellular mobile radio,” IEEE Comm. Lett., vol. 1, pp. 15–19, Jan. 1997.
[179] M. V. Eyuboglu and S. U. H. Qureshi, “Reduced-state sequence es-
timation with set partitioning and decision feedback,” IEEE Trans.
Commun., vol. 36, pp. 13–20, Jan. 1988.
[180] ——, “Reduced-state sequence estimation for coded modulation on in-
tersymbol interference channels,” IEEE J. Sel. Areas Comm., vol. 7,
pp. 989–995, Aug. 1989.

[181] P. R. Chevillat and E. Eleftheriou, “Decoding of trellis-encoded signals


in the presence of intersymbol interference and noise,” IEEE Trans.
Commun., vol. 37, no. 7, pp. 669–676, July 1989.

[182] G. Colavolpe and G. Germi, “On the application of factor graphs and
the sum-product algorithm to ISI channels,” IEEE Trans. Commun.,
vol. 53, no. 5, pp. 818–825, May 2005.

[183] G. Colavolpe, D. Fertonani, and A. Piemontese, “SISO detection over


linear channels with linear complexity in the number of interferers,”
IEEE J. Sel. Topics Signal Proc., vol. 5, pp. 1475–1485, December
2011.

[184] H. Bölcskei, D. Gesbert, and A. J. Paulraj, “On the capacity of OFDM-


based spatial multiplexing systems,” IEEE Trans. Commun., vol. 50,
no. 2, pp. 225–234, Feb. 2002.

[185] J. M. Wozencraft and I. M. Jacobs, Principles of Communication En-


gineering. Waveland Press, 1990, (reprint of 1965 original from John
Wiley and Sons).

[186] C. E. Shannon, “A mathematical theory of communication,” Bell Syst.


Tech. J., vol. 27, pp. 379–423, July 1948.

[187] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing,


2nd ed. New Jersey: Prentice Hall, 1999.
