Contents
Foreword v
2 Sequence detection 21
2.1 MAP sequence detection strategy . . . . . . . . . . . . . . . . 21
2.2 Detection through the Viterbi algorithm . . . . . . . . . . . . 23
2.2.1 Implementation aspects for the Viterbi algorithm . . . 31
2.3 MAP sequence detection for linear modulations . . . . . . . . 33
2.3.1 Uncoded transmission . . . . . . . . . . . . . . . . . . 40
2.3.2 Absence of ISI . . . . . . . . . . . . . . . . . . . . . . . 40
2.3.3 Uncoded transmission and absence of ISI . . . . . . . 41
2.3.4 Considerations on the absence of ISI . . . . . . . . . . 42
2.4 Whitened matched filter front end . . . . . . . . . . . . . . . . 43
2.5 Performance of MAP sequence detectors . . . . . . . . . . . . 51
2.5.1 Upper bound on the error probability . . . . . . . . . . 57
2.5.2 Additive white Gaussian noise . . . . . . . . . . . . . . 58
2.5.3 Linear modulations . . . . . . . . . . . . . . . . . . . . 60
2.5.4 Lower bound on the error probability . . . . . . . . . . 67
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
This book collects the notes of my lectures for the course of Digital Communications and is the result of twenty years of teaching and research in this field. Its structure is strictly related to that of our 2nd-level degree (laurea magistrale) in Communication Engineering at the University of Parma, taught in English since 2012. In fact, it is assumed that the students have some familiarity with the main concepts of Detection and Estimation and Information Theory and with binary block and convolutional codes (although a summary of the main ideas is reported in the appendices).
The book reflects my personal view and formulation of classical topics as well as advanced topics that are not present in textbooks, but it has clearly been influenced by the reading of references such as the books by Wozencraft and Jacobs (Principles of Communication Engineering, John Wiley & Sons, 1965), Van Trees (Detection, Estimation and Modulation Theory, Part 1, John Wiley & Sons, 1968), Lindsey and Simon (Telecommunication Systems Engineering, Prentice-Hall, 1973), Mengali and D'Andrea (Synchronization Techniques for Digital Receivers, Plenum Press, 1997), Proakis (Digital Communications, 2nd edition, McGraw-Hill, 1989), Benedetto and Biglieri (Principles of Digital Transmission with Wireless Applications, Kluwer, 1999), Simon, Hinedi, and Lindsey (Digital Communication Techniques, Prentice Hall, 1995), Mengali and Morelli (Trasmissione Numerica, McGraw-Hill, 2001), and clearly many journal papers.
I would like to thank all my students, whose questions allowed me to improve these notes, and all my collaborators (Alan Barbieri, Dario Fertonani, Tommaso Foggi, Aldo Cero, Amina Piemontese, Nicolò Mazzali, Andrea Modenini, and Alessandro Ugolini) for their detailed comments and suggestions for improvements.
Parma, September 2016.
Chapter 1

Transmission systems with memory
1.1 Introduction
Detection theory (see Appendix B for a brief overview) investigates the problem of one-shot transmissions. In this problem, at the transmission side a signal s(t), belonging to the set of allowed signals {s^(i)(t)}_{i=1}^M of duration D, is transmitted. These signals are in a one-to-one correspondence with messages {m^(i)}_{i=1}^M whose a-priori probabilities P(m^(i)) are known at the receiver. We choose the message maximizing the a-posteriori probability, i.e.,

m̂ = argmax_{m^(i)} P(m^(i) | r)

where r is the vector representation of the received signal r(t) (or a sufficient statistic, see Appendix B). In the literature, this is known as the maximum a-posteriori probability (MAP) detection strategy.
In case of a transmission over a channel that introduces additive white Gaussian noise (AWGN), the MAP detection strategy becomes (see Appendix B)¹

m̂ = argmax_{m^(i)} { ∫₀^D r(t) s^(i)(t) dt − (1/2) ∫₀^D [s^(i)(t)]² dt + (N₀/2) ln P(m^(i)) } .   (1.3)

¹ When a bandpass transmission is considered, denoting by {s̃^(i)(t)}_{i=1}^M the complex envelopes of the transmitted signals and by r̃(t) the complex envelope of the received
where N₀ is the one-sided power spectral density of the additive white Gaussian noise. In this case, the receiver has to observe the signal r(t) in the interval [0, D] to take a decision. In other words, it is required to observe the received signal on an interval of length equal to the duration of the signals s^(i)(t) only.
This kind of system can be employed even when a large amount of data has to be transmitted, by simply repeating the transmission of the different M-ary messages. Since each M-ary symbol carries log₂ M information bits and assuming that we transmit a new symbol every T seconds,² it is possible to transmit N bits by simply performing N/log₂ M M-ary transmissions, thus requiring a total time T_N given by

T_N = (N / log₂ M) T .
In these communication systems, every transmission act can be considered independent of previous and following transmissions provided that T ≥ D and that the transmitted symbols can be assumed independent of each other. In fact, when these two conditions are satisfied, the decision on a given message is not influenced by the decisions taken on previous or subsequent messages. For this reason, these systems are called memoryless.
When one of these two conditions is not satisfied, the communication system is said to have memory, and the decision on one message cannot be taken without considering previous or subsequent messages, as explained in the following examples.
² T is the so-called signaling interval or symbol time.
decoder at the receiver, according to the scheme shown in Fig. 1.1. The channel encoder employs codewords belonging to a given codebook. Since not all sequences of symbols are allowed (i.e., not all sequences of symbols belong to the codebook), the encoder operates by introducing correlation in the sequence of symbols being transmitted. This memory has an effect on the received signal r(t), which, in a given signaling interval, will depend not only on the corresponding symbol but also on previous and future symbols, and on the decoding strategy, which has to take this correlation into consideration.
♦
[Figure 1.1: Block diagram of the transmission system: encoder (ENC), transmitter (TRANSM), channel (CHAN), receiver (REC), and decoder (DEC), with received signal r(t).]
Although these two examples are representative of the two main reasons that motivate the investigation of signals with memory, there are many other possible sources of memory such as, to cite a few of them, the presence of colored Gaussian noise or the dependence of the received signal on unknown stochastic parameters (such as the unknown phase of transmit and receive oscillators, or the channel fading). This latter scenario will be considered in Chapter 3.
In order to investigate transmission systems with memory, it is required to introduce a proper model for these signals that can describe the memory effects. The corresponding receivers will be more complex than those for memoryless systems and, in some cases, we will need to resort to suboptimal detectors with a lower complexity and a worse performance.
[Figure: Block diagram of the transmission system: the source produces symbols a_k, the transmitter produces x(t), the channel filter and the additive noise w(t) produce r(t), and the receiver delivers the decisions â_k.]
During the kth signaling interval, symbol ak is associated with signal s(t −
kT ; ak ), which has support in the interval [kT, (k + 1)T ]. Signal x(t) can be
expressed as the superposition of all slices related to the different signaling
³ Messages {m^(i)}_{i=1}^M of the previous section will be called {a^(i)}_{i=1}^M from now on.
intervals, i.e.,
x(t) = Σ_{k=0}^{K−1} s(t − kT; a_k)   (1.5)
[Figure: Elementary signals s(t; 0) and s(t; 1) of duration T, and the modulated signal x(t) corresponding to the information sequence 1 1 0 0 1 0 1.]
With respect to the model in (1.5), the signal during the signaling interval [kT, (k + 1)T] is now

s(t − kT; a_k, σ_k)   (1.6)

and depends not only on the information symbol a_k but also on the system state σ_k. The system states will belong to the finite alphabet
{σ^(1), σ^(2), . . . , σ^(S)} .

⁴ In case of passband transmissions, the right-hand side of (1.8) represents the complex envelope of the modulated signal.

[Figure 1.7: Tables of the transition function t(a^(i), σ^(ℓ)) and state diagram for Example 1.4, with states σ⁺ and σ⁻ and branches labeled a/s(t) (e.g., 0/0 and 1/p(t)).]
[Figure: Shaping pulse p(t) and signal s(t), with an example of information sequence a_k = 0 0 1 1 0 1 0 1 1 and corresponding state sequence σ_k = + + + − + + − − +.]
where code symbols {c_n} depend on the information symbols {a_n} through some coding law (we will see some examples in Chapters 4, 7, 9, and Appendix D, but the interested reader can have a look at many textbooks such as [1, 2, 3, 4]) and p(t) is the so-called shaping pulse, having finite energy. This modulation is called linear because if we have
x^(i)(t) = Σ_{n=0}^{K−1} c_n^(i) p(t − nT),   i = 1, 2

then

x(t) = Σ_{n=0}^{K−1} [α c_n^(1) + β c_n^(2)] p(t − nT) = α x^(1)(t) + β x^(2)(t)

i.e., the dependence of the modulated signal on the sequence {c_n} is linear although, in general, the dependence of the code symbols {c_n} on the information symbols {a_n} is not.
In the most general case, the relationship between coded and information symbols is still with memory and can be represented through a finite-state model

c_n = u(a_n, µ_n)
µ_{n+1} = t(a_n, µ_n)   (1.9)

where µ_n denotes the encoder state (which belongs to an alphabet having cardinality S_c) whereas u(a_n, µ_n) and t(a_n, µ_n) represent the output and transition functions, respectively. This coding law associates sequences of information symbols {a_n} with sequences of code symbols {c_n} through a one-to-one correspondence (see Fig. 1.9). Again, the coding law can also be described through tables or a state diagram. Code symbols c_n will belong to an alphabet C which, in general, can have a larger cardinality with respect to the alphabet A of information symbols a_n.
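The coding law (1.9) translates directly into a short routine. In this sketch, `u` and `t` are placeholder output and transition functions (a binary accumulator chosen purely for illustration; the text does not specify a particular code here):

```python
def encode(a, u, t, mu0):
    """Apply the coding law (1.9): c_n = u(a_n, mu_n), mu_{n+1} = t(a_n, mu_n)."""
    mu = mu0
    c = []
    for an in a:
        c.append(u(an, mu))   # output function
        mu = t(an, mu)        # transition function
    return c

# Illustrative choice: binary accumulator, c_n = a_n XOR mu_n, mu_{n+1} = c_n.
u = lambda an, mu: an ^ mu
t = lambda an, mu: an ^ mu

print(encode([1, 0, 1, 1, 0], u, t, mu0=0))  # → [1, 1, 0, 1, 1]
```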
[Figure 1.9: The encoder (ENC) maps information symbols a_n into code symbols c_n.]
[Figure: Shaping pulse p(t) of duration (L + 1)T = 4T (L = 3).]

[Figure 1.11: Pulses p(t − nT) in (1.8) having non-zero values in the interval kT < t < (k + 1)T (L = 3), from c_{k−3} p[t − (k − 3)T] to c_k p[t − kT].]
and we assumed that c_i = 0 for i < 0 and i > K; otherwise, all these expressions are correctly defined only for L ≤ k ≤ K. We explicitly showed the dependence of the code symbols on the information symbols and the code states. From (1.10), we can conclude that the signal in the interval kT < t < (k + 1)T depends, in addition to the current information symbol a_k, on a system state σ_k that can be defined as

σ_k = (a_{k−1}, a_{k−2}, . . . , a_{k−L}, µ_{k−L})   (1.11)

or, equivalently, as σ_k = (µ_k, µ_{k−1}, . . . , µ_{k−L}). These definitions are equivalent since the coding law associates, in a one-to-one correspondence, symbols {a_{k−L}, . . . , a_{k−1}} with states {µ_{k−L}, . . . , µ_k}, once the initial state µ_{k−L} is fixed. The first definition of state σ_k also allows us to compute the number of possible system states as a function of the number of encoder states S_c, the duration of pulse p(t), which is related to parameter L, and the cardinality M of the information symbols. In fact, it is

S = M^L S_c .   (1.12)

Notice that the state definition (1.11) and the coding law (1.9) allow us to easily compute the state transition function (1.7). We now consider a few special cases.
and the number of states is M^L. From the state definition, we can easily find the state transition function t(a_k, σ_k). In fact, the next state is

σ_{k+1} = (a_k, a_{k−1}, . . . , a_{k−L+1})

and can be simply obtained by discarding the oldest symbol in the definition of σ_k, moving all other symbols to the right, and adding symbol a_k on the left. For this reason, we will say that the system states form a shift register sequence.
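The shift-register behavior of the state can be made concrete with a few lines of code, assuming an uncoded system so that σ_k = (a_{k−1}, . . . , a_{k−L}):

```python
def next_state(sigma, ak):
    """t(a_k, sigma_k) for an uncoded system: drop the oldest symbol,
    shift the others to the right, and insert a_k on the left."""
    return (ak,) + sigma[:-1]

L = 2
sigma = (+1, -1)          # (a_{k-1}, a_{k-2})
sigma = next_state(sigma, -1)
print(sigma)              # → (-1, 1)

# Enumerating all states confirms S = M**L (here M = 2, L = 2):
states = {(a1, a2) for a1 in (+1, -1) for a2 in (+1, -1)}
print(len(states))        # → 4
```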
Clearly, the system state coincides with that of the encoder, i.e.,

σ_k = µ_k .

The first block diagram in Fig. 1.13 represents this interpretation. From a practical point of view, a linear modulator is a device that receives the discrete-time signal {c_k} and associates a pulse with each symbol. It is thus made of a pulse generator, as shown in the second block diagram of Fig. 1.13. For simplicity, we will also use the graphical representation shown in the third block diagram of Fig. 1.13. ♦
[Figure 1.13: Three equivalent block diagrams of a linear modulator: (1) the impulse train Σ_k c_k δ(t − kT) feeding a filter with impulse response p(t); (2) a pulse generator (PULSE GENER) driven by {c_k}; (3) a compact representation; all producing Σ_k c_k p(t − kT).]
where code symbols are c_k ∈ {0, ±1}. Functions u(a_k, µ_k) and t(a_k, µ_k) in (1.9) are described by the tables in Fig. 1.14. The state σ_k defined in Example 1.4 and µ_k differ only in their names, and so do u(a_k, µ_k) and t(a_k, µ_k). In fact, the code state µ_k ∈ {±1}. Alternatively, we can analytically express u(a_k, µ_k) and t(a_k, µ_k) as

µ_{k+1} = t(a_k, µ_k) = { µ_k   if a_k = 0
                        { −µ_k  if a_k = 1

c_k = u(a_k, µ_k) = a_k µ_k .

This code belongs to the family of line codes. It is used to shape the signal power spectral density and is called alternate mark inversion (AMI) code. ♦
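The AMI rule just described (c_k = a_k µ_k, with the state flipping at every 1) can be sketched as follows; the initial state µ₀ = +1 is an arbitrary choice:

```python
def ami_encode(bits, mu0=+1):
    """Alternate mark inversion: zeros map to 0, ones map to alternating +1/-1."""
    mu, out = mu0, []
    for ak in bits:
        out.append(ak * mu)   # c_k = u(a_k, mu_k) = a_k * mu_k
        if ak == 1:
            mu = -mu          # mu_{k+1} = t(a_k, mu_k) = -mu_k when a_k = 1
    return out

print(ami_encode([0, 1, 1, 0, 1, 0, 1, 1]))  # → [0, 1, -1, 0, 1, 0, -1, 1]
```

Note that successive non-zero marks alternate in sign, which is what shapes the power spectral density.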
           µ_k = +1   µ_k = −1            µ_k = +1   µ_k = −1
a_k = 0       0          0       a_k = 0     +1         −1
a_k = 1      +1         −1       a_k = 1     −1         +1

         u(a_k, µ_k)                     t(a_k, µ_k)

Figure 1.14: Output and transition functions for Example 1.5.
Example 1.6. Let us consider an M-ary phase shift keying (M-PSK) signal with a rectangular shaping pulse. The complex envelopes of the transmitted signals are

s̃_i(t) = √(2E_s/T) e^{j 2π(i−1)/M} ,   0 < t < T,   i = 1, . . . , M .

Defining

a_k ∈ { e^{j 2π(i−1)/M} }_{i=1}^M

we can express the complex envelope as

x̃(t) = √(2E_s/T) Σ_{k=0}^{K−1} a_k p(t − kT)   (1.13)
[Figure: Rectangular shaping pulse p(t) of duration T.]

[Figure 1.16: Block diagram of the modulator: a differential encoder (with delay element z^{−1}) maps a_k into c_k, which drives the shaping filter p(t), with amplitude √(2E_s/T), to produce x̃(t).]

Let us now differentially encode the information symbols and replace a_k with c_k in (1.13):

x̃(t) = √(2E_s/T) Σ_{k=0}^{K−1} c_k p(t − kT) .
The block diagram of this modulator is shown in Fig. 1.16, whereas the state
diagram of the encoder is shown in Fig. 1.17 for a quaternary PSK (QPSK)
modulation, i.e., when M = 4. ♦
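The delay element z^{−1} in Fig. 1.16 suggests a differential rule of the form c_k = a_k c_{k−1}; under that assumption (the exact rule is not spelled out in the surviving text above), the encoder can be sketched as:

```python
import cmath, math

M = 4                                  # QPSK
alphabet = [cmath.exp(2j * math.pi * i / M) for i in range(M)]

def diff_encode(a, c0=1 + 0j):
    """Assumed differential encoding: c_k = a_k * c_{k-1} (phase accumulation)."""
    c, prev = [], c0
    for ak in a:
        prev = ak * prev
        c.append(prev)
    return c

a = [alphabet[1], alphabet[3], alphabet[2]]   # phase steps pi/2, 3pi/2, pi
c = diff_encode(a)
# Each code symbol still lies on the unit-magnitude PSK constellation:
print(all(abs(abs(ck) - 1) < 1e-12 for ck in c))  # → True
```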
[Figure 1.17: State diagram of the differential encoder for QPSK: states µ^(0), µ^(1), µ^(2), µ^(3), with branches labeled a^(i)/c^(j).]
1.4 Exercises
Exercise 1.1. Try to figure out a possible application of the differential encoding rule described in Example 1.6.
whose symbols {ck } have known mean value and autocorrelation sequence:
E{ck } = η
E{ck+m c∗k } = R(m)
(it is thus a wide-sense stationary discrete-time process).
Exercise 1.4. Independent and equally likely symbols a_k ∈ {±1} are encoded according to the following rule

c_k = a_k + a_{k−1} .

• Compute the power spectral density of the transmitted signal and draw its graph.

• Find a shaping pulse p′(t) that allows to express the signal in the form

x(t) = Σ_{k=−∞}^{+∞} a_k p′(t − kT) .
• Draw the graph of |P′(f)|, where P′(f) = F[p′(t)] is the Fourier transform of p′(t), and provide an interpretation of the relationship between the two expressions of signal x(t).
Chapter 2
Sequence detection
The received signal can be expressed as

r(t) = y(t; ā, σ̄₀) + w(t)

where

y(t; a, σ₀) = Σ_{k=0}^{K−1} s(t − kT; a_k, σ_k)   (2.1)

is the modulated signal according to the general model for signals with memory (1.6) and w(t) represents the thermal noise, modeled as Gaussian and white. We used

ā = (ā₀, ā₁, . . . , ā_{K−1})

to denote the sequence that has really been transmitted. We will use σ̄ for the corresponding sequence of states. In y(t; ā, σ̄₀) we explicitly expressed the dependence on the transmitted sequence and the initial state only since, once the initial state σ̄₀ has been chosen, all other states in σ̄ are automatically determined by the sequence ā. One of the possible sequences that can be transmitted and the corresponding sequence of states will be denoted as a and σ, respectively. We will assume that the receiver perfectly knows the signal model, i.e., the waveform y(t; a, σ₀) associated with the pair (a, σ₀), for all possible pairs, and the state transition function. The transmission system we are referring to is shown in Fig. 2.1. Given the initial state σ̄₀, the received signal r(t) depends on the entire sequence of information symbols ā.
[Figure 2.1: Transmission system: the encoder (ENC) and modulator (MOD) map {a_k} into y(t); the channel filter and the additive noise w(t) produce r(t); the receiver (REC) delivers the decisions â_k.]
strategy turns out to be more complex than that minimizing the sequence error probability. In addition, in typical applications both criteria have practically the same performance since, as we will see, the dominant errors of the MAP sequence detection strategy are those where only a few symbols are erroneously detected (they are more frequent since they are associated with sequences that can easily be equivocated). These dominant errors are often the only errors that occur for signal-to-noise ratio values of practical interest. In other words, although the MAP sequence detection strategy is based on the minimization of the sequence error probability, it favors, for reasons that are not directly related to this minimization, the choice of sequences with only a few symbol errors. In this chapter, we will investigate the MAP sequence detection strategy, whereas the MAP symbol detection strategy will be discussed in Chapter 5.
Based on the results of the previous chapter, we can now formalize the MAP sequence detection strategy. In case of a baseband transmission, the MAP sequence detection strategy is thus²

â = argmax_a { ∫₀^{KT} r(t) y(t; a, σ₀) dt − (1/2) ∫₀^{KT} y²(t; a, σ₀) dt + (N₀/2) ln P(a) } .   (2.2)
In case of a passband transmission, assuming that r̃(t) represents the complex envelope of the received signal and ỹ(t; a, σ₀) the complex envelope of a possible signal that can be transmitted, the MAP sequence detection strategy becomes

â = argmax_a { ℜ[ ∫₀^{KT} r̃(t) ỹ*(t; a, σ₀) dt ] − (1/2) ∫₀^{KT} |ỹ(t; a, σ₀)|² dt + N₀ ln P(a) } .   (2.3)
having exploited the fact that s(t − kT; a_k, σ_k) has support in the interval [kT, (k + 1)T], and defined

z_k(a_k, σ_k) = ∫_{kT}^{(k+1)T} r(t) s(t − kT; a_k, σ_k) dt .
The second integral, which represents the energy of the possible waveforms, becomes

∫₀^{KT} y²(t; a, σ₀) dt = ∫₀^{KT} Σ_{k=0}^{K−1} Σ_{n=0}^{K−1} s(t − kT; a_k, σ_k) s(t − nT; a_n, σ_n) dt
                        = Σ_{k=0}^{K−1} ∫_{kT}^{(k+1)T} s²(t − kT; a_k, σ_k) dt
                        = Σ_{k=0}^{K−1} E(a_k, σ_k)

where E(a_k, σ_k) denotes the energy of the waveform s(t; a_k, σ_k).
[Figure 2.2: The correlations z_k(a^(ℓ), σ^(m)) obtained by sampling, at t = (k + 1)T, the outputs of a bank of filters with impulse responses s(T − t; a^(ℓ), σ^(m)) fed by r(t).]
When the information symbols are independent,

(N₀/2) ln P(a) = (N₀/2) ln Π_{k=0}^{K−1} P(a_k) = (N₀/2) Σ_{k=0}^{K−1} ln P(a_k)

and (2.2) becomes

â = argmax_a Σ_{k=0}^{K−1} [ z_k(a_k, σ_k) − (1/2) E(a_k, σ_k) + (N₀/2) ln P(a_k) ]
  = argmax_a Σ_{k=0}^{K−1} λ_k(a_k, σ_k)   (2.4)

having defined

λ_k(a_k, σ_k) = z_k(a_k, σ_k) − (1/2) E(a_k, σ_k) + (N₀/2) ln P(a_k) .
Remark 2.1. Terms zk (a(ℓ) , σ (m) ) can be interpreted as the output, at time
(k + 1)T of a bank of correlators or matched filters according to the block
diagrams in Fig. 2.2. The number of filters (or correlators) is SM, S being
the number of states. We thus have a number of filters or correlators which is
independent of the transmission length. In fact, the same SM filters can be
reused in any symbol interval. Values {zk (a(ℓ) , σ (m) )} represent a sufficient
statistic for detection. ♦
Remark 2.2. Terms E(a^(ℓ), σ^(m)) can be interpreted as the energies of the MS possible waveforms s(t; a^(ℓ), σ^(m)) of duration T. They can be precomputed and stored in the receiver. ♦
Remark 2.3. We have significantly reduced the front end complexity. However, we still have to find the sequence a which maximizes (2.4), i.e., which maximizes the sum of K terms λ_k(a_k, σ_k). For each k, we have MS possible terms, one for each pair (a^(ℓ), σ^(m)). ♦
We can conclude that when the system is memoryless, the receiver can take independent decisions on each symbol based on λ_k(a_k) which, in turn, depends only on the slice of the received signal in the interval kT < t < (k + 1)T. In this case, the MAP sequence and symbol detection strategies coincide. ♦
Remark 2.5. When the information symbols are equally likely, the term
that depends on the a-priori symbol probabilities becomes irrelevant and can
be discarded. In this case, the knowledge about the noise intensity becomes
irrelevant too and the MAP strategy coincides with the maximum likelihood
(ML) strategy. ♦
Although we simplified the front end complexity of the receiver, at first sight it seems that we still have to compute (2.4) for any possible sequence a and look for the sequence providing the largest value. In order to find some ways to simplify the receiver, we introduce a special graph or diagram describing the time evolution of a finite-state machine. This diagram can be obtained by adding the temporal dimension to the state diagram, i.e., by representing the system states for each discrete-time instant k, and the transitions between subsequent states.
[Figure 2.3: State diagram for the system of Example 2.1: states (+1, +1), (+1, −1), (−1, +1), (−1, −1), with transitions labeled by the information symbol ±1.]
[Figure 2.4: Trellis diagram for the system of Example 2.1, with states σ_k = (a_{k−1}, a_{k−2}) ∈ {(+1, +1), (+1, −1), (−1, +1), (−1, −1)} shown at times k − 1, k, k + 1, k + 2. Starting from the state (−1, +1), the sequence a_{k−1} = +1, a_k = −1, a_{k+1} = −1 will determine the state evolution shown with a bold line.]
[Figure 2.5: Trellis path for the system of Example 2.1, with states σ_k = (a_{k−1}, a_{k−2}), associated with the information sequence a_k = +1, −1, +1, +1, −1, +1 for k = 0, . . . , 5, passing through the states σ₂, σ₃, σ₄, σ₅, σ₆.]
[Figure 2.6: The optimal path that includes the state σ̄ at time n (bold line) and another path including the same state (dashed line).]
states {σ_k}_{k=2}^6. In this example, the initial state σ₂ is simply specified by the first two information symbols. Fig. 2.5 shows an example of a path associated with the information sequence {+1, −1, +1, +1, −1, +1}. ♦
The number of possible paths in the trellis is M^K, as many as the information sequences. However, not all paths have to be considered in the search for that with the largest metric. In fact, if we proceed in a smart way, we can consider a significantly lower number of paths. It is convenient to define a partial path (sequence) metric as the sum of the metrics of all branches of that path up to the kth discrete-time instant:

Λ_k(a₀, a₁, . . . , a_k) = Σ_{i=0}^{k} λ_i(a_i, σ_i) .
is the largest one. This implies that also the first term Λ_n(a₀, . . . , a_n) is the largest among all path metrics ending in the state σ̄ at time n. In fact,
[Figure 2.7: Survivors at time k and the candidate paths, obtained by extending them to time k + 1, among which the new survivor of each state is selected.]
if we were able to find another path, for example that denoted through a dashed line in Fig. 2.6, with a larger partial metric, it would be sufficient to substitute it for the first part of the bold path to obtain an overall path with a larger metric, thus contradicting the hypothesis that the bold path is the optimal one.
In general, we do not know the state included in the optimal path at time n; we only know that if it includes a given state σ̄ at time n, the partial path would have the largest partial metric Λ_n(a₀, . . . , a_n). Since this is true for all states at time n, it is sufficient to consider, among all M^n possible paths at time n, one path for each state only, that with the largest metric among all paths ending in that state, as a candidate to become the optimal path. This path is called the survivor of that state. Since it is sufficient to consider, at time k, S survivors only, one for each state, we can denote the relevant path metric as Λ_k(σ_k), thus highlighting the dependence on the state only.
Let us now assume that we know the survivors and the relevant metrics for all states σ_k at time k. In order to find the survivors and the relevant metrics at the next time instant (i.e., for all states σ_{k+1}), it is sufficient to consider all possible ways of extending the S survivors from time k to time k + 1. The number of these possible candidates is equal to the number of branches in a trellis section, i.e., SM. The survivor of each state σ_{k+1} can be obtained by choosing, among the candidates that terminate in that state, that with the largest metric, as shown in Fig. 2.7. Hence, the updating rule for the survivor metrics will be

Λ_{k+1}(σ_{k+1}) = max_{(a_k, σ_k) : t(a_k, σ_k) = σ_{k+1}} [ Λ_k(σ_k) + λ_k(a_k, σ_k) ] .
[Figure: Bank of SM filters with impulse responses s(T − t; a^(ℓ), σ^(m)), fed by r(t) and sampled at t = (k + 1)T; the samples, corrected by the constants C(a^(ℓ), σ^(m)), form the branch metrics λ_k(a^(ℓ), σ^(m)) feeding the Viterbi algorithm (VA), which delivers the decisions {â_k}. The constants are

C(a^(ℓ), σ^(m)) = −(1/2) E(a^(ℓ), σ^(m)) + (N₀/2) ln P(a^(ℓ)) .]
and the relevant metrics at the previous discrete-time instant and the branch metrics of that trellis section. At every step, we always perform the same operations, so the complexity is linear in the transmission length K. This algorithm is known in the literature as the Viterbi algorithm (VA). A. J. Viterbi, who first proposed this algorithm, did not realize that the algorithm was optimal [5]. The optimality was demonstrated by G. D. Forney Jr. a few years later [6].
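The survivor-update recursion described above can be sketched in a few lines. The trellis below (binary symbols, state equal to the previous symbol, and a hypothetical distance-based branch metric on a noiseless observation) is only an illustration of the bookkeeping, not of a particular modulation format:

```python
import math

def viterbi(states, symbols, trans, metric, K):
    """MAP sequence detection via the Viterbi algorithm.
    trans(a, s)     -> next state t(a, s)
    metric(k, a, s) -> branch metric lambda_k(a, s)."""
    Lambda = {s: 0.0 for s in states}          # survivor metrics Lambda_k(s)
    surv = {s: [] for s in states}             # survivor paths
    for k in range(K):
        newL = {s: -math.inf for s in states}
        newS = {}
        for s in states:
            for a in symbols:
                sn = trans(a, s)
                cand = Lambda[s] + metric(k, a, s)
                if cand > newL[sn]:            # keep the best extension per state
                    newL[sn], newS[sn] = cand, surv[s] + [a]
        Lambda, surv = newL, newS
    best = max(states, key=lambda s: Lambda[s])
    return surv[best]

# Illustrative setup: the state is the previous symbol, and the (hypothetical)
# branch metric rewards matching a noiseless observation sequence.
obs = [+1, -1, -1, +1]
trans = lambda a, s: a
metric = lambda k, a, s: -(obs[k] - a) ** 2
print(viterbi([+1, -1], [+1, -1], trans, metric, K=4))  # → [1, -1, -1, 1]
```

Note that the work per trellis section (SM candidate extensions) does not grow with K, which is exactly the linear-complexity property mentioned above.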
We can thus conclude that the structure of a MAP sequence detector is
based on the following elements:
• a bank of filters (or correlators) matched to all SM possible waveforms s(t; a^(ℓ), σ^(m)) composing the modulated signal; this bank of filters provides the SM sequences {z_k(a^(ℓ), σ^(m))} representing a sufficient statistic;
and w̃(t) is the complex envelope of the thermal noise. The shaping pulse p(t) will be assumed as having support in [0, (L + 1)T], so that the received signal has support in

T₀ = [0, (K + L)T] .

We can observe that, when we considered the model (1.6), we assumed that the signal had support in [0, KT]. With the model (2.5), we have border effects related to the duration of the shaping pulse that need to be considered.

Taking into account the different observation interval and using the expression (2.3) of the MAP detection strategy related to complex envelopes,
[Figure 2.10: The sample x_k obtained by sampling, at t = kT, the output x(t) of a filter with impulse response p*(−t) fed by r̃(t).]
where we defined³

x_k = ∫_{T₀} r̃(t) p*(t − kT) dt = ∫_{−∞}^{+∞} r̃(t) p*(t − kT) dt = r̃(t) ⊗ p*(−t)|_{t=kT}   (2.8)
and exploited the fact that p(t − kT) is zero outside T₀ for any k = 0, 1, . . . , K − 1. Eqn. (2.8) shows that x_k can be interpreted as the sample, at time t = kT, of the output of a filter matched to pulse p(t), as shown in Fig. 2.10. This interpretation assumes the use of an anticausal filter. However, from a practical point of view, we can use a causal matched filter, i.e., a filter with impulse response p*[(L + 1)T − t], thus introducing a delay of L + 1 discrete-time instants on the sequence {x_k}.
³ Symbol ⊗ denotes "convolution".

Let us now consider the second integral in (2.6). With similar considerations we have

∫_{T₀} |ỹ(t; a, σ₀)|² dt = ∫_{T₀} Σ_{k=0}^{K−1} c_k p(t − kT) Σ_{m=0}^{K−1} c*_m p*(t − mT) dt
                        = Σ_{k=0}^{K−1} Σ_{m=0}^{K−1} c_k c*_m ∫_{T₀} p(t − kT) p*(t − mT) dt
                        = Σ_{k=0}^{K−1} Σ_{m=0}^{K−1} c_k c*_m ∫_{−∞}^{+∞} p(t − kT) p*(t − mT) dt
                        = Σ_{k=0}^{K−1} Σ_{m=0}^{K−1} c_k c*_m p(t) ⊗ p*(−t)|_{t=(m−k)T}
                        = Σ_{k=0}^{K−1} Σ_{m=0}^{K−1} c_k c*_m g_{m−k}   (2.9)
where

g(t) = p(t) ⊗ p*(−t) = ∫_{−∞}^{+∞} p(τ) p*(τ − t) dτ

g_k = g(kT) .
We can observe that the signal at the matched filter (MF) output is

x(t) = r̃(t) ⊗ p*(−t) = Σ_{k=0}^{K−1} c_k p(t − kT) ⊗ p*(−t) + w̃(t) ⊗ p*(−t)
     = Σ_{k=0}^{K−1} c_k g(t − kT) + n(t)

where n(t) = w̃(t) ⊗ p*(−t) is the noise at the MF output.
Thus, this pulse g(t) is not only the autocorrelation of the finite energy pulse p(t), but can also be interpreted as the impulse response of the overall transmission system up to the MF output. Similarly, the discrete-time pulse g_k can be interpreted as the impulse response of the overall discrete-time system up to the sampler. Fig. 2.11 summarizes these interpretations. We may observe that the discrete-time signal {x_k} can be directly expressed as the discrete convolution of sequences {c_k} and {g_k}, that is

x_k = Σ_{i=0}^{K−1} c_i g_{k−i} + n_k   (2.10)
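The discrete-time model (2.10) is simply a convolution of the code sequence with {g_k} plus noise. A small noiseless sketch (the pulse values g_k and the symbols are illustrative):

```python
import numpy as np

g = np.array([0.2, 0.5, 1.0, 0.5, 0.2])   # g_{-2}..g_{2}: Hermitian (here real) ISI pulse
c = np.array([1.0, -1.0, -1.0, 1.0])      # code symbols, zero outside 0 <= k < K

# x_k = sum_i c_i g_{k-i}: full convolution, then keep the K samples aligned with c
full = np.convolve(c, g)
L = (len(g) - 1) // 2
x = full[L:L + len(c)]                    # samples x_0 .. x_{K-1}
print(np.allclose(x, [0.3, -0.8, -0.8, 0.3]))  # → True
```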
[Figure 2.11: Continuous-time model ({c_k} through the shaping filter p(t), noise w̃(t), matched filter p*(−t), and sampler at t = kT) and equivalent discrete-time model ({c_k} through the filter g_k plus noise {n_k}) producing the sequence {x_k}.]

[Figure: Overlap of pulses p(τ) and p(τ − LT), each of duration (L + 1)T.]
where we used again the convention that code symbols c_k are zero when k < 0 or k ≥ K.
Let us consider again the energy term (2.9). It is clearly a real quantity.⁴ This happens since pulse g(t) has Hermitian symmetry, being an autocorrelation function:

g(−t) = ∫_{−∞}^{+∞} p(τ) p*(τ + t) dτ
      = [ ∫_{−∞}^{+∞} p*(τ) p(τ + t) dτ ]*
      = [ ∫_{−∞}^{+∞} p*(τ′ − t) p(τ′) dτ′ ]*
      = g*(t) .

Similarly, the Hermitian symmetry also holds for the corresponding discrete-time pulse⁵

g_{−k} = g*_k .
In fact, defining the matrix elements A_{mk} = c_m c*_k g_{k−m}, it is

A_{mk} = c_m c*_k g_{k−m} = (c_k c*_m g_{m−k})* = A*_{km} .

The energy of all possible signals (2.9) can thus be obtained by summing the elements on the main diagonal plus two times the real part of all elements of the lower triangular part of the matrix

A = [ A_{0,0}     A_{0,1}     . . .   A_{0,K−1}
      A_{1,0}     A_{1,1}     . . .   A_{1,K−1}
      . . .
      A_{K−1,0}   A_{K−1,1}   . . .   A_{K−1,K−1} ] .
⁴ It is also non-negative, since g(t) and {g_k} are an autocorrelation function and an autocorrelation sequence, respectively.

⁵ The symmetry of the discrete-time pulse {g_ℓ} is related to the assumption of using the anticausal matched filter p*(−t). In a practical implementation, we will introduce a delay to make the filter causal. Pulse {g_ℓ} will thus be symmetric with respect to this delay.
We thus obtain

Σ_{k=0}^{K−1} Σ_{m=0}^{K−1} c_k c*_m g_{m−k} = Σ_{k=0}^{K−1} |c_k|² g₀ + 2ℜ{ Σ_{k=1}^{K−1} Σ_{m=0}^{k−1} c_k c*_m g_{m−k} }
  = Σ_{k=0}^{K−1} |c_k|² g₀ + 2ℜ{ Σ_{k=1}^{K−1} Σ_{ℓ=1}^{k} c_k c*_{k−ℓ} g_{−ℓ} }
  = Σ_{k=0}^{K−1} |c_k|² g₀ + 2ℜ{ Σ_{k=0}^{K−1} Σ_{ℓ=1}^{L} c*_k c_{k−ℓ} g_ℓ }   (2.11)
[Figure: Matched filter front end: r̃(t) filtered by p*(−t) and sampled at t = kT; the samples x_k enter, through the ℜ{·} operation, the computation of the branch metrics λ_k(a^(i), σ^(j)).]

V_k(a^(i), σ^(j)) = −(1/2) |c_k(a^(i), σ^(j))|² g₀ + N₀ ln P(a^(i)) − ℜ{ Σ_{ℓ=1}^{L} c*_k(a^(i), σ^(j)) c_{k−ℓ}(a^(i), σ^(j)) g_ℓ } .
Remark 2.6. In the derivation of (2.12), we assumed that p(t) had a limited duration. Under this hypothesis, g(t) is also limited, and so is g_k. However, the problem can be solved through the Viterbi algorithm under looser conditions. In fact, it is sufficient that g_k alone has limited duration. In other words, we do not need g(t) = 0 for |t| > (L + 1)T (it can also have infinite support); it is sufficient that g_k = 0 for |k| > L. Under this hypothesis, the MAP sequence strategy can be implemented as a search for the optimal path on a trellis diagram, thus on a diagram with a finite number of states.

When p(t) has an infinite support, g(t) also has an infinite support. In this case, both the modulated signal and the useful signal at the MF output cannot be represented by using a finite-state machine. However, if g_k = 0 for |k| > L, the discrete-time signal at the sampler output, which turns out to be a sufficient statistic for detection, has a finite memory. We will see later a few examples of pulses p(t) having infinite duration but such that the corresponding sequence g_k satisfies g_k = 0 for |k| > L. In these cases, the observation interval that appears in (2.6) must be infinite (T₀ → ∞). ♦
This expression shows that the branch metric λ_k depends on the information symbols {a_k, a_{k−1}, . . . , a_{k−L}}, which define the pair (a_k, σ_k). In this case, the system memory is related to the non-zero elements of the discrete-time pulse g_k, i.e., to the presence of intersymbol interference (ISI) in the discrete-time signal x_k.
x_k = c_k + n_k

and clearly depends on one code symbol only. The state of the system will coincide with the encoder state

σ_k = µ_k .

The branch metric becomes

λ_k(a_k, σ_k) = ℜ{x_k c*_k} − (1/2)|c_k|² + N₀ ln P(a_k) .

The receiver trellis diagram thus coincides with that of the encoder.
λ_k(a_k, σ_k) = ℜ{x_k c*_k} − (1/2)|c_k|² + N₀ ln P(a_k) + (1/2)|x_k|² − (1/2)|x_k|²
             = −(1/2)|x_k − c_k|² + N₀ ln P(a_k) + (1/2)|x_k|²
             ∼ −|x_k − c_k|² + 2N₀ ln P(a_k) .
and corresponds to a search for the sequence {a_k} corresponding to the code sequence {c_k} that, apart from the term depending on the a-priori probability of the information sequence, has the minimum squared Euclidean distance from the received sequence {x_k}. This interpretation is related to the observation that, for a linear modulation without ISI, the dimension of the signal space is equal to the number of samples of the discrete-time signal at the sampler output (see Exercise 2.1). ♦
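The equivalence between the correlation-based metric and the Euclidean-distance metric can be verified numerically. A sketch with unit-energy QPSK symbols and equally likely symbols (so the ln P(a_k) term is the same for all hypotheses and can be dropped); the received sample value is arbitrary:

```python
import numpy as np

constellation = np.array([1 + 0j, 1j, -1 + 0j, -1j])   # unit-energy symbols (QPSK)
x = 0.8 - 0.3j                                          # a received sample x_k

corr_metric = np.real(x * constellation.conj()) - 0.5 * np.abs(constellation) ** 2
dist_metric = -np.abs(x - constellation) ** 2

# Both metrics select the same symbol: they differ only by the constant |x|^2
# (after a factor of 2), which does not depend on the hypothesis.
print(np.argmax(corr_metric) == np.argmax(dist_metric))          # → True
print(np.allclose(2 * corr_metric - np.abs(x) ** 2, dist_metric))  # → True
```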
Remark 2.8. We assumed that the code is such that a single code symbol c_k corresponds to a single information symbol a_k (in other words, the possible redundancy is introduced by expanding the constellation cardinality). If instead we have, for example, a convolutional code of rate 1/2, and thus a pair of code symbols (c_k^(1), c_k^(2)) corresponds to a single information symbol, denoting by (x_k^(1), x_k^(2)) the corresponding received samples, the branch metric can be expressed as

λ_k(a_k, σ_k) = −|x_k^(1) − c_k^(1)|² − |x_k^(2) − c_k^(2)|² + 2N₀ ln P(a_k) . ♦
xk = ak + nk .
Thus, the state is not defined and the trellis diagram degenerates into a single state. Branch metrics (2.13) become

λ_k(a_k) = ℜ{x_k a*_k} − (1/2)|a_k|² + N₀ ln P(a_k) .   (2.15)

The MAP sequence detection strategy becomes

â = argmax_a Σ_{k=0}^{K−1} λ_k(a_k)
where G(f ) is the Fourier transform of g(t). A class of functions G(f ) that
satisfy those conditions is that of the so-called raised cosine (RC) functions
G(f ) =
   T                                                  for |f | < (1 − α)/2T
   (T /2) {1 + cos [(πT /α) (|f | − (1 − α)/2T )]}    for (1 − α)/2T < |f | < (1 + α)/2T        (2.17)
   0                                                  otherwise
where parameter 0 ≤ α ≤ 1 is the excess bandwidth or roll-off factor. In
general, functions satisfying condition (2.16) are said to have a vestigial sym-
metry around f = 1/2T .
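As a numerical sanity check (a sketch of mine, not part of the original text), the standard closed-form raised-cosine time-domain pulse can be sampled at t = kT to verify that the Nyquist condition g(kT ) = δk indeed holds:

```python
import numpy as np

# Closed-form raised-cosine time-domain pulse (standard result):
#   g(t) = sinc(t/T) * cos(pi*alpha*t/T) / (1 - (2*alpha*t/T)^2)
# For alpha = 0.35 the denominator never vanishes at t = kT, so the
# sampled values can be computed directly.
def rc_pulse(t, T=1.0, alpha=0.35):
    t = np.asarray(t, dtype=float)
    num = np.sinc(t / T) * np.cos(np.pi * alpha * t / T)
    den = 1.0 - (2.0 * alpha * t / T) ** 2
    return num / den

T = 1.0
k = np.arange(-5, 6)          # sample instants t = kT, k = -5..5
g_k = rc_pulse(k * T, T=T)    # Nyquist condition: g_0 = 1, g_k = 0 otherwise
```

For α = 0, G(f ) degenerates into the ideal brick-wall spectrum and the pulse into sinc(t/T ).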
If we require that the Nyquist condition be met and take into account
that the optimal front end filter has to be a filter matched to the shaping
pulse p(t) of the linearly modulated signal, we can choose p(t) such that the
following integral equation is satisfied

p(t) ⊗ p∗ (−t) = g(t)
with g(t) satisfying the Nyquist condition. This equation can be easily solved
by working in the frequency domain, i.e., |P (f )|2 = G(f ), from which

|P (f )| = √G(f ) .
Hence, it is sufficient to choose p(t) such that its amplitude spectrum is the
square root of a function having vestigial symmetry. The phase spectrum of
p(t) can be arbitrary since it will be perfectly compensated by the matched
filter which has opposite phase response. Hence, P (f ) can have a root raised
cosine (RRC) spectrum.
Notice that, since w̃(t) is the complex envelope of a white noise process w(t)
with PSD N0 /2, its PSD is 2N0 . Recalling that the autocorrelation of a
filtered process satisfies

Ry (τ ) = Rx (τ ) ⊗ p(τ ) ⊗ p∗ (−τ )

the autocorrelation of the noise n(t) at the matched filter output is

Rn (τ ) = 2N0 g(τ )

and the corresponding PSD is

Sn (f ) = 2N0 |P (f )|2 .                              (2.18)

If we consider the discrete-time noise process nk = n(kT ), its autocorre-
lation sequence is

Rn (m) = 2N0 gm

and we will denote its PSD by Sn (f ). Although we are using the same symbol, this function represents the PSD
of {nk } and not that of the continuous-time noise n(t) in (2.18). As known,
these two functions are related through the equation
Sn(d) (f ) = (1/T ) Σ_{m=−∞}^{+∞} Sn(c) (f − m/T )
where superscripts (d) and (c) denote the fact that we are referring to a
discrete- or a continuous-time process, respectively.
Let us now consider the bilateral Z-transform of the sequence {gℓ }_{ℓ=−L}^{L}

G(z) = Σ_{ℓ=−L}^{L} gℓ z−ℓ = z−L Σ_{ℓ=−L}^{L} gℓ z−ℓ+L = z−L Σ_{m=0}^{2L} gL−m z m .        (2.20)
The PSD is thus obtained by computing the function 2N0 G(z) on the unit
circle.
The Hermitian symmetry of the pulse {gℓ } implies a particular symmetry of
the function G(z). In fact, the Z-transform of g∗−ℓ is

Σ_{ℓ=−L}^{L} g∗−ℓ z−ℓ = Σ_{ℓ=−L}^{L} g∗ℓ z ℓ = [ Σ_{ℓ=−L}^{L} gℓ (1/z∗ )−ℓ ]∗ = G∗ (1/z∗ )
Figure 2.14: Location in the complex plane of the pair of zeros ρ and 1/ρ∗ of G(z).
and since gℓ = g∗−ℓ , we have

G(z) = G∗ (1/z∗ ) .

Thus, if ρ is a zero of G(z), we have

G(ρ) = G∗ (1/ρ∗ ) = 0
and, hence, also 1/ρ∗ is a zero. This pair of zeros is located in the complex
plane as shown in Fig. 2.14. In particular, if ρ is inside the unit circle, 1/ρ∗
is necessarily outside. Instead, if ρ is on the unit circle, it coincides with 1/ρ∗ and is thus a double zero.
Let us denote by z1 , . . . , zL , all zeros of G(z) inside the unit circle (|zi | ≤
1). The remaining L zeros are 1/z1∗ , . . . , 1/zL∗ , clearly located outside the unit
circle (|1/zi | ≥ 1). Function G(z) can be factored in the following way
G(z) = z−L g−L ∏_{i=1}^{L} (z − zi ) ∏_{i=1}^{L} (z − 1/zi∗ )
     = (−1)L g−L [ ∏_{i=1}^{L} 1/zi∗ ] ∏_{i=1}^{L} (1 − zi z−1 ) ∏_{i=1}^{L} (1 − zi∗ z)
     = α ∏_{i=1}^{L} (1 − zi z−1 ) ∏_{i=1}^{L} (1 − zi∗ z)                              (2.22)

having defined α = (−1)L g−L ∏_{i=1}^{L} 1/zi∗ .
The symmetry G(z) = G∗ (1/z∗ ) means that

g−L ∏_{i=1}^{L} 1/zi∗ = gL ∏_{i=1}^{L} 1/zi .

Remembering that gL = g∗−L , we infer that both sides of this equation, and
thus the constant α, are real. In addition, it is
G(1) = α ∏_{i=1}^{L} |1 − zi |2 > 0
since G(1) is proportional (through the positive constant 2N0 ) to the noise
PSD at frequency f = 0 that must be positive.
Now, defining

F (z) = √α ∏_{i=1}^{L} (1 − zi z−1 ) = √α z−L ∏_{i=1}^{L} (z − zi )
we have

F ∗ (1/z∗ ) = [ √α ∏_{i=1}^{L} (1 − zi z∗ ) ]∗ = √α ∏_{i=1}^{L} (1 − zi∗ z) .          (2.23)
Let us now consider the two discrete-time filters with transfer function 1/F (z)
and 1/F ∗ (1/z ∗ ), respectively. If these filters have at their input the noise
sequence {nk }, the PSD of the noise at the output is, in both cases,
Sn (f ) / |F (ej2πf T )|2 = 2N0

for both filters, since on the unit circle |F ∗ (1/z∗ )| = |F (z)|; the output PSDs Sw′ (f ) and Sw′′ (f ) thus both equal 2N0 .
Hence, the noise at the output of any of these two filters is white. In Fig. 2.15,
we report the two filters and the related white noise processes {wk′ } and {wk′′ }.
i. Poles of 1/F ∗ (1/z ∗ ) are the zeros of F ∗ (1/z ∗ ) and they are all outside
the unit circle. Thus, the region of convergence of 1/F ∗ (1/z ∗ ) contains
at least the unit circle. Hence, there exists a left-sided sequence, i.e.,
anticausal, that has 1/F ∗ (1/z ∗ ) as Z-transform (see Appendix E). It
will be the impulse response of this first filter. Since it is anticausal,
it can be implemented in an approximate way only, by truncating the
impulse response and introducing a proper delay to make it causal.
ii. Poles of 1/F (z) are the zeros of F (z) and they are all inside the unit
circle. The region of convergence is, at least, the region outside the
unit circle. Hence, there exists a right-sided sequence, i.e., causal, that
has this function as Z-transform.
Both filters can be used to whiten the noise, as shown in Fig. 2.15. We will
choose the first, anticausal, WF, which can be implemented in an approximate
way.
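The factorization G(z) = F (z)F ∗ (1/z∗ ) can also be carried out numerically from the roots of z^L G(z). The following sketch uses illustrative toy values (not from the book): g is built so that the answer F (z) = 1 + 0.5 z−1 is known in advance.

```python
import numpy as np

# g_l for l = -1, 0, 1 (Hermitian, real), built so that F(z) = 1 + 0.5 z^{-1}
g = np.array([0.5, 1.25, 0.5])
L = 1

# z^L G(z) is a polynomial of degree 2L with coefficients g_{-L}, ..., g_L
roots = np.roots(g)                        # roots of 0.5 z^2 + 1.25 z + 0.5
z_in = roots[np.abs(roots) < 1.0]          # the L zeros inside the unit circle

# alpha = (-1)^L g_{-L} prod(1/z_i*)  (real, as shown in the text)
alpha = ((-1.0) ** L) * g[0] * np.prod(1.0 / np.conj(z_in))
# F(z) = sqrt(alpha) z^{-L} prod(z - z_i): coefficients f_0, ..., f_L
f = np.sqrt(alpha.real) * np.poly(z_in).real

# check: convolving {f_l} with its conjugate time reversal gives back {g_l}
g_check = np.convolve(f, np.conj(f[::-1]))
```

The convolution check is exactly the statement G(z) = F (z)F ∗ (1/z∗ ) read in the coefficient domain.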
Let us now consider the discrete-time equivalent model of the overall sys-
tem up to the sampler and put the WF at its output, as shown in Fig. 2.16(a).
By exploiting the factorization G(z) = F (z)F ∗ (1/z ∗ ), we can observe that the
system up to the output of the WF is equivalent to that shown in Fig. 2.16(b),
where {wk } is a white noise sequence whose samples have variance 2N0 . Since
F (z) = √α ∏_{i=1}^{L} (1 − zi z−1 ) = Σ_{ℓ=0}^{L} fℓ z−ℓ
the sequence {fℓ }Lℓ=0 can be interpreted as the impulse response of the discrete-
time equivalent model with white noise of the overall system represented in
Figure 2.16: (a) Whitening filter at the output of the discrete-time equivalent
model of the overall system. (b) Discrete-time equivalent model with white
noise of the overall system.
This impulse response {f ∗−ℓ }_{ℓ=−L}^{0} will be anticausal. The solution described
by (2.25) has to be preferred since its equivalent impulse response has minimum
phase, i.e., all its zeros are inside the unit circle. On the contrary,
the second solution has maximum phase, i.e., all its zeros are outside the
unit circle. They have the same amplitude response and differ in their phase
response.
(Figure: the whitened matched filter front end — the received signal r̃(t) is filtered by the matched filter p∗ (−t), sampled at t = kT to produce {xk }, and filtered by the whitening filter 1/F ∗ (1/z∗ ) to produce the sufficient statistics {yk }.)
that coincides with that required for sequence detection based on signal {xk }.
The MAP sequence detection strategy can be obtained by simply observ-
ing that (2.25) corresponds to a vector channel with additive Gaussian noise
having independent and identically distributed samples. This vector channel
can be expressed as
y = s(a) + w (2.26)
where the kth element of s(a) is
sk (a) = Σ_{ℓ=0}^{L} fℓ ck−ℓ .                                        (2.27)
Remembering that the code symbols are {ck }_{k=0}^{K−1} , the elements of y relevant
for detection are {yk }_{k=0}^{K−1+L} . Using (1.1), the detection strategy becomes
This second approach, making use of a WMF, was proposed in
1972 by G. D. Forney [8]. Two years later, in 1974, G. Ungerboeck published
in [9] the approach we discussed in Section 2.3. These two solutions are
both optimal. In addition, they work on the same trellis diagram, and thus
have the same complexity. The Ungerboeck approach makes use of a
simpler front end but employs the branch metrics (2.13), which are slightly more
complex. The Forney approach employs a more complex front end filter but
uses simpler branch metrics (2.29). In conclusion, these two solutions are
perfectly equivalent, at least as long as we do not consider some aspects related
to complexity reduction that will be discussed in Chapter 6. In the case of
absence of ISI, the two approaches perfectly coincide. This can be verified by
considering that, in the absence of ISI, gℓ = 0 for ℓ ≠ 0 in (2.13) and fℓ = 0
for ℓ ≠ 0 in (2.29).
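As an illustration of the Forney formulation, here is a minimal sketch of MLSD through the Viterbi algorithm for uncoded BPSK on a hypothetical channel f = [1.0, 0.5] (L = 1, so the trellis state is the previous symbol); all names and numerical values are my own choices, not from the book:

```python
import numpy as np

f = np.array([1.0, 0.5])              # toy WMF impulse response, L = 1
A = [-1.0, 1.0]                       # BPSK alphabet
K = 200
rng = np.random.default_rng(0)
a = rng.choice(A, size=K)

# observation y_k = f0 a_k + f1 a_{k-1} + w_k, with a_{-1} = +1 known
a_pad = np.concatenate(([1.0], a))
y = f[0] * a_pad[1:] + f[1] * a_pad[:-1] + 0.1 * rng.standard_normal(K)

# Viterbi algorithm: state = previous symbol, metric = squared distance
metrics = {1.0: 0.0, -1.0: float("inf")}   # known initial state a_{-1} = +1
paths = {1.0: [], -1.0: []}
for yk in y:
    new_metrics, new_paths = {}, {}
    for ak in A:                           # next state
        cands = [(metrics[p] + (yk - f[0] * ak - f[1] * p) ** 2, p) for p in A]
        m, p_best = min(cands)
        new_metrics[ak] = m
        new_paths[ak] = paths[p_best] + [ak]
    metrics, paths = new_metrics, new_paths
a_hat = np.array(paths[min(metrics, key=metrics.get)])
n_errors = int(np.sum(a_hat != a))
```

At this noise level (standard deviation 0.1) the minimum-distance error events are extremely unlikely, so the detected sequence should coincide with the transmitted one.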
The MAP sequence detection strategy is often applied to the case of
equally likely information symbols. In this case, it coincides with the maxi-
mum likelihood strategy. For this reason, it is often referred to as maximum
likelihood sequence detection (MLSD) and the metric is often called likelihood
function.
Figure 2.20: Correct and detected paths in the trellis: the detected path occasionally departs from the correct path at time k and merges with it again at time k + H + 1.
again on the correct path. They are called error events and represent the
basis for the performance analysis of these receivers.
We say that an error event of length H begins at discrete time k if the correct
and detected paths differ in the states σk+1 , . . . , σk+H , as shown in Fig. 2.20.
The duration H of the error event thus represents the number of wrong states
in the detected path.
where µk is the encoder state at time k and L the channel dispersion length.
Let us suppose that at time k an error event of length H begins. The state
definition assures that the symbols ak−L , . . . , ak−1 are correctly detected. The
first wrong state is σk+1 .
Unless otherwise specified, we will consider this latter notation. Notice that,
while in the case of the second notation all sequences a and â are possible, in
the third case, given the error sequence e, not all information sequences a
are possible.
Based on the previous Examples 2.4 and 2.5, in the case of a linear modula-
tion an error event beginning at time k and of duration H is characterized
by the error sequence

. . . , 0, ek , . . . , ek+H−L , 0, . . .
The first summation is extended to all possible shapes, the second to all
beginnings of the error events producing an error at time k. Let us define
the indicator function

qm [εi(j) ] =  1   if εi(j) produces an error at time i + m, i.e., m instants after its beginning
              0   otherwise.
As an example, for a linear modulation, it is q0 [εk(j) ] = 1, since an error event
that begins at time k is characterized by ek ≠ 0; in addition, it is q−1 [εk(j) ] = 0,
since at time k − 1 the error event has not started yet; q1 [εk(j) ] can be either
1 or 0 depending on whether ek+1 ≠ 0 or ek+1 = 0. Thus, it is

P (âk ≠ ak | εi(j) ) = qk−i [εi(j) ]
The probability of the error event εi(j) rigorously depends on the begin-
ning instant i, in addition to the shape j. However, if the transmission is
sufficiently long, we may reasonably imagine that it becomes independent of
i, except for an initial transient. Under this steady-state condition, we have

P (âk ≠ ak ) = Σ_j Σ_i P (ε(j) ) qk−i [εi(j) ] .
w(ε(j) ) = Σ_i qk−i [εi(j) ] = Σ_{i=−∞}^{k} qk−i [εi(j) ] = Σ_{m=0}^{∞} qm [εk−m(j) ]
where the sum has to be extended over all error events beginning at any
given time k. The error probability is thus equal to the average number of
wrong symbols over all error events beginning at a given instant.
With alternative notation, the error probability (2.30) can be expressed
in the form
P (âk ≠ ak ) = Σ_e Σ_{a∈A(e)} w(e) P (a, e)
            = Σ_e w(e) Σ_{a∈A(e)} P (e | a) P (a)                      (2.31)
where
In the particular case that P [Λ(a + e) > Λ(a) | a] does not depend on a but
only on e, we will say that the uniform error property (UEP) holds. In this
case, we obtain
Ps ≤ Σ_e w(e) P [Λ(a + e) > Λ(a) | a] P [a ∈ A(e)]                    (2.32)
where {a ∈ A(e)} represents the event that, given the sequence e, a generic
information sequence a is compatible with it.
In a similar way, we may obtain an upper bound on the bit error prob-
ability. By denoting with b(e, a) the number of bit errors corresponding to
the error event (e, a), similarly to (2.31) we have
Pb = (1/ log2 M ) Σ_e Σ_{a∈A(e)} b(e, a) P (e | a) P (a) .
the ratio between the average number of symbol errors and the number of
transmitted symbols. By substituting w(e) with b(e, a) we obtain the ratio
between the average number of bit errors and the number of transmitted
symbols. Thus, it has to be divided by the number of bits per symbol.
If the UEP holds, we have
Pb ≤ (1/ log2 M ) Σ_e Σ_{a∈A(e)} P [Λ(a + e) > Λ(a) | a] b(e, a) P (a) .
The derived upper bounds on Ps and Pb are based on the computation of the
PEPs, that are usually easy to compute since they are related to a binary
signaling system. In the next section, we will discuss the computation of the
PEP in the specific case of an AWGN channel.
For high values of the signal-to-noise ratio (SNR), the first term will take
into account the dominant errors whereas the second term can be neglected,
thus obtaining
Ps ≲ [ Σ_{e∈Emin} w(e) P [a ∈ A(e)] ] Q( dmin / √(2N0) )              (2.36)

where the symbol ≲ means that the upper bound is approximate, and the term
within square brackets is the multiplicity of the minimum distance error
events; by denoting this term with Ks , for high SNR values we obtain

Ps ≲ Ks Q( dmin / √(2N0) ) .
Similarly, with reference to (2.33) we have

Pb ≤ (1/ log2 M ) Σ_{e∈Emin} b(e) Q( dmin / √(2N0) ) P [a ∈ A(e)] + other terms        (2.37)
where “other terms” are related to error events with a larger distance, and are thus
infinitesimals of higher order with respect to the SNR.
Asymptotically, we have

Pb ≲ Q( dmin / √(2N0) ) · (1/ log2 M ) Σ_{e∈Emin} b(e) P [a ∈ A(e)]                    (2.38)

where the last factor is denoted by Kb .
Figure 2.22: Multiplicity distribution (distance spectrum): multiplicities Ki versus the distances dmin , d1 , d2 , . . .
those having minimum distance, and on the value that the Q function takes
on dmin / √(2N0) . In general, in order to compute the upper bound we could
define a multiplicity distribution or distance spectrum, i.e., a diagram rep-
resenting both the values of all distances related to the error events and
their corresponding multiplicities, as shown in Fig. 2.22 where we assumed
d1 ≤ d2 ≤ . . . . In particular, the upper bound (2.35) can be expressed as
Ps ≤ Ks Q( dmin / √(2N0) ) + Σ_i Ki Q( di / √(2N0) )
where {di } represent the set of all possible distances between pairs of signals
(excluding dmin ) and Ki represents the multiplicity of the error events having
distance di , i.e.,
Ki = Σ_{e∈Ei } w(e) P [a ∈ A(e)]
having denoted with Ei the set of all possible error sequences having distance
di . At high SNR values the dominant term is that related to the minimum
distance, whereas for lower SNR values we may need to take into account
some terms with distance larger than dmin . Similar considerations hold for
the bit error probability.
It is thus clear that the evaluation of the distance d(a, e) (d(e) in case of
UEP) is crucial. In the case of the AWGN channel, it is
d2 (e, a) = ∥s(t, a) − s(t, a + e)∥2 = ∫_{−∞}^{+∞} |s(t, a) − s(t, a + e)|2 dt .
8 The coefficient √2 and the case of a pulse p(t) having energy Ep ≠ 1 can be taken
into account by properly normalizing the constellation of symbols ck .
where clearly

gj−i = ∫_{−∞}^{+∞} p(t − iT ) p∗ (t − jT ) dt .
and we can thus conclude that the UEP holds. If we now assume that at
time k an error event of duration H begins, we can write
d2 (e, a) = Σ_{i=k}^{k+H−L} Σ_{j=k}^{k+H−L} (ci − ĉi )(cj − ĉj )∗ gj−i .
After the initial transient, the result does not depend on k. Hence, assuming
k = 0 we have
d2 (e, a) = Σ_{i=0}^{H−L} Σ_{j=0}^{H−L} (ci − ĉi )(cj − ĉj )∗ gj−i .
Remark 2.9. Bit and symbol error probabilities are usually expressed as
a function either of the ratio ES /N0 or the ratio Eb /N0 , ES and Eb being
the mean energy per information symbol and per bit, respectively. Thus, we
have to compute ES (clearly Eb = ES / log2 M ) in the scenario at hand. The
mean energy per information symbol turns out to be

ES = Ps T
where Ps is the mean signal power and can be computed as half the power
of the complex envelope:
Ps = Ps̃ /2 .
Since the complex envelope of the signal is

s̃(t) = √2 Σ_k ck p(t − kT )

its PSD is

Ws̃ (f ) = (2/T ) Wc (f ) |P (f )|2

where Wc (f ) is the Fourier transform of the autocorrelation sequence Rc (m) =
E{ck+m c∗k } of the code symbols. We thus have

ES = Ps̃ T /2 = ∫_{−∞}^{∞} Wc (f ) |P (f )|2 df .
since we assumed that the pulse has unit energy. This expression for ES can
be directly employed in the expression of symbol and bit error probabilities.
♦
Let us now consider the discrete-time equivalent model with white noise of
the system, reported in Fig. 2.23. The sufficient statistic yk can be expressed
as
yk = Σ_{ℓ=0}^{L} fℓ ck−ℓ + wk = sk (a) + wk .
The square distance can be thus expressed through the energy of the error
sequence of code symbols filtered by the impulse response of the equivalent
discrete-time model with white noise. In particular, in the case of an uncoded
transmission, we have
d2 (e) = Σ_k | Σ_{ℓ=0}^{L} fℓ ek−ℓ |2
Figure 2.23: Discrete-time equivalent model with white noise of the system.
in which:
• we neglected the (non-negative) terms of the summation corresponding
to the values of the index k = n + 1, . . . , n + H − 1;
• we exploited the fact that en−1 = 0, since the error event begins at time
n, and en+H = 0, since it ends at time n + H;
• in the last lower bound we exploited the fact that, for sure, en ≠ 0 and
en+H−1 ≠ 0, and defined

d2min = ε2 .                                   (2.43)
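For an uncoded 2-PAM transmission, the square distance d2 (e) = Σk |Σℓ fℓ ek−ℓ |2 can also be evaluated by brute-force enumeration of short error events, whose symbols take values in {−2, 0, +2}. A sketch with an illustrative channel f = [1.0, 0.5] (my choice, not from the book):

```python
import numpy as np
from itertools import product

f = np.array([1.0, 0.5])                 # illustrative WMF impulse response

def d2(e):
    """Square distance of an error event: sum_k |sum_l f_l e_{k-l}|^2."""
    return float(np.sum(np.convolve(e, f) ** 2))

d2_min = float("inf")
for H in range(1, 6):                    # error events of length up to 5
    for e in product((-2.0, 0.0, 2.0), repeat=H):
        # an error event starts and ends with a nonzero error symbol
        if e[0] == 0.0 or e[-1] == 0.0:
            continue
        d2_min = min(d2_min, d2(np.array(e)))
```

For this channel the minimum is achieved by the single-error event e = (±2), giving d2min = 4 (f02 + f12 ) = 5.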
Let us now evaluate the term P (a ∈ A(e)) appearing in the upper bound
expression when UEP holds. Since the information symbols are independent,
we have
P (a ∈ A(e)) = ∏_{k=n}^{n+H−L} P (ak + ek ∈ A) .

For |ek | = 2 it is

P (ak + ek ∈ A) = P (ak ≠ −(M − 1)) = (M − 1)/M = 1 − |ek |/2M ,

for |ek | = 4 it is

P (ak + ek ∈ A) = (M − 2)/M = 1 − |ek |/2M

and, in general,

P (ak + ek ∈ A) = 1 − |ek |/2M .

Hence

P (a ∈ A(e)) = ∏_{k=n}^{n+H−L} ( 1 − |ek |/2M ) .
The error events having minimum distance belong to the set of those with
H = 1, as already said. When e ∈ Emin , we thus have w(e) = 1 and
Ks = Σ_{e∈Emin} P (ak + ek ∈ A) .
In the second summation, we can consider only one term, thus obtaining
a further lower bound. In order to have the tightest lower bound, we will
pick the largest possible term, again that corresponding to the minimum
distance, i.e.,
Σ_{â≠a} P (â | a) ≥ Q( dmin (a) / √(2N0) )
where dmin (a) denotes the minimum distance among all those of paths dif-
ferent from a. We thus have
Ps ≥ Σ_a P (a) Q( dmin (a) / √(2N0) )
that can be further lower bounded by limiting the summation to those se-
quences a that already have a sequence at minimum distance. Hence
Ps ≥ Σ_{a∈Amin} P (a) Q( dmin / √(2N0) ) = Q( dmin / √(2N0) ) P (a ∈ Amin )
where Amin is the subset of all sequences a that have at least one sequence at
minimum distance with error event starting at time k. The term P (a ∈ Amin )
is the probability that, by picking a sequence at random, it has at least one
sequence at minimum distance.
We can conclude by observing that the error probability satisfies
Ks′ Q( dmin / √(2N0) ) ≤ Ps ≲ Ks Q( dmin / √(2N0) )

in which

Ks = Σ_{e∈Emin} w(e) P (a ∈ A(e))

Ks′ = P (a ∈ Amin )
and the symbol ≲ means that the upper bound holds only asymptotically for
high values of the SNR, and is thus approximate since we neglected the error
events having a distance larger than the minimum one. This approximation
is however asymptotically tight (i.e., exact). When Ks′ and Ks are close
enough, we can evaluate, in an accurate way, the probability Ps . Similar
results can be derived for the bit error probability.
since every symbol has at least one symbol at minimum distance. We thus
obtain
Q( dmin / √(2N0) ) ≤ Ps ≲ 2 (1 − 1/M ) Q( dmin / √(2N0) ) .
In particular, the approximate upper bound exactly coincides with the asymp-
totic symbol error probability for a PAM transmission in the absence of ISI.
♦
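This asymptotic expression can be checked by simulation. The sketch below uses my own real-baseband normalization (rk = ak + nk with noise standard deviation σ per dimension, so the pairwise error probability is Q(dmin /2σ)); all values are illustrative:

```python
import math
import numpy as np

def Q(x):
    # Gaussian tail function Q(x) = 0.5 erfc(x / sqrt(2))
    return 0.5 * math.erfc(x / math.sqrt(2.0))

M, sigma = 4, 0.35
alphabet = np.array([-3.0, -1.0, 1.0, 3.0])      # 4-PAM, d_min = 2
rng = np.random.default_rng(1)
a = rng.choice(alphabet, size=400_000)
r = a + sigma * rng.standard_normal(a.size)

# minimum-distance (threshold) detection, symbol by symbol
a_hat = alphabet[np.argmin(np.abs(r[:, None] - alphabet[None, :]), axis=1)]
ser_sim = float(np.mean(a_hat != a))
ser_theory = 2.0 * (1.0 - 1.0 / M) * Q(1.0 / sigma)   # d_min/(2 sigma) = 1/sigma
```

For PAM with threshold detection the expression 2(1 − 1/M )Q(dmin /2σ) is in fact exact, so the simulated rate should match it up to statistical fluctuation.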
2.6 Exercises
Exercise 2.1. Let us consider an M-ary transmission system employing a
linear modulation. The complex envelope of the transmitted signal is
s̃(t) = Σ_{k=0}^{K−1} ck p(t − kT ) .
In general, symbols ck and pulse p(t) are complex and the noise is additive,
white, and Gaussian. Let us also assume that the condition for the absence
of ISI holds, i.e.,
gk =  1   for k = 0
      0   otherwise

where gk = g(kT ) and g(t) = p(t) ∗ p∗ (−t).
• Demonstrate that an orthonormal basis of the signal space is {p(t − kT )}_{k=0}^{K−1} .
Exercise 2.4. Let us suppose that the discrete-time equivalent pulse at the
MF output {gℓ }_{ℓ=−L}^{L} , representative of a transmission system employing a
linear modulation, is real.
• Find the general condition that p(t) has to satisfy to have g(t) real.
Exercise 2.5. Let us assume that the shaping pulse p(t) of a linear modu-
lation has duration T and unit energy. This pulse is distorted by the channel
so that the shaping pulse at the receiver turns out to be
where α is a constant.
• Define the state required to represent the memory of the signal at the
WF output.
Exercise 2.6. Repeat the previous exercise when pulse p(t) has a RRC
Fourier transform.
Exercise 2.7. With reference to Exercise 1.4, describe the MAP sequence
detection strategy.
Exercise 2.8. Let us consider a coded linear modulation that can be ex-
pressed in the form
s(t) = √2 ℜ { Σ_k ck p(t − kT ) ejω0 t }
• Prove that the square distance of a generic error event starting at time
n, having duration H, and characterized by the error sequence {ek }
and the information sequence {ak }, can be expressed as
d2 (e, a) = Σ_{i=n}^{n+H−L} Σ_{j=n}^{n+H−L} (ĉi − ci )(ĉj − cj )∗ gj−i
where L is the channel dispersion length, {ck } is the code sequence as-
sociated with the information sequence {ak }, {ĉk } is the code sequence
associated with the information sequence {ak + ek }, and {gk } is the
discrete-time impulse response at the MF output.
• In the case of absence of ISI, show that d2 (e, a) is equal to the square
distance of the code sequences.
Exercise 2.9. Following Exercise 2.8, let us consider a coded linear mod-
ulation. Prove that the square distance of a generic error event starting at
time n, having duration H, and characterized by the error sequence {ek } and
the information sequence {ak }, can be expressed as
d2 (e, a) = Σ_{k=n}^{n+H} | Σ_{ℓ=0}^{L} fℓ (ĉk−ℓ − ck−ℓ ) |2
where L is the channel dispersion length, {ck } is the code sequence associated
with the information sequence {ak }, {ĉk } is the code sequence associated with
the information sequence {ak + ek }, and {fℓ } is the discrete-time impulse
response at the WF output.
• Compute dmin .
Exercise 2.11. With reference to Exercises 1.4 and 2.7, compute the per-
formance of the MAP sequence detector.
Detection in the presence of unknown parameters
rk = θck + wk , k = 0, 1, . . . , K − 1 (3.1)
ii. The noncoherent channel. This channel introduces a phase shift unknown
to the receiver, constant for the whole transmission, and uniformly
distributed in [0, 2π). It is straightforward to show that the samples
at the MF output still represent a sufficient statistic. They can be
expressed as
rk = ck ejθ + wk ,    k = 0, 1, . . . , K − 1 .                        (3.2)
rk = ck ejθk + wk ,   k = 0, 1, . . . , K − 1 .                        (3.3)
rk = θk ck + wk ,     k = 0, 1, . . . , K − 1                          (3.4)
1 For all considered channels, samples at the MF output will be considered as an example
of (exact or approximate) sufficient statistic characterized by discrete-time white noise. In
this chapter, they will be denoted as rk instead of xk . This is due to the fact that,
in the absence of ISI, an orthonormal basis for the set of transmitted signals is
{p(t − kT )/√Ep }_{k=0}^{K−1} (see Exercise 2.1). Thus, these samples also represent the components of
r(t) over this orthonormal basis and, as such, they will be denoted as {rk }. We will use
the notation r_{k1}^{k2} = [rk1 , rk1+1 , . . . , rk2 ]T . According to it, it is thus r = r_0^{K−1} .
where I0 (x) is the zeroth-order modified Bessel function of the first kind,
defined as

I0 (x) = (1/2π) ∫_{0}^{2π} ex cos θ dθ .
Neglecting terms that are irrelevant for detection, strategy (3.5) becomes

â = argmax_a I0( (1/N0 ) | Σ_{k=0}^{K−1} rk c∗k | ) exp{ −(1/2N0 ) Σ_{k=0}^{K−1} |ck |2 }
  = argmax_a [ ln I0( (1/N0 ) | Σ_{k=0}^{K−1} rk c∗k | ) − (1/2N0 ) Σ_{k=0}^{K−1} |ck |2 ] .        (3.8)
By observing (3.8), one can easily understand that two coded sequences
c1 and c2 satisfying the condition c1 = ejφ c2 , where φ is any multiple
of the angle of symmetry of the employed constellation, have
the same metric. These sequences are thus indistinguishable (noncoherently
catastrophic sequences). For such a channel, it is thus required to use an
fying that condition (noncoherently noncatastrophic codes) [14]. Differential
encoding is an example noncoherently noncatastrophic code. Another way
to obtain a noncoherently noncatastrophic code is through serial concatena-
tion of a differential encoder and a rotationally invariant code, that is a code
such that, for every codeword c, all sequences obtained through a rotation of
any multiple of the angle of symmetry of the employed constellation is still
a codeword. ♦
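The indistinguishability of rotated code sequences can be verified directly on the metric (3.8). In this sketch (illustrative values; np.i0 is NumPy's zeroth-order modified Bessel function I0), a random QPSK sequence and its copy rotated by the symmetry angle π/2 yield the same noncoherent metric:

```python
import numpy as np

N0 = 0.5
rng = np.random.default_rng(2)
K = 16
c1 = np.exp(1j * (np.pi / 2) * rng.integers(0, 4, size=K))   # QPSK codeword
c2 = np.exp(1j * np.pi / 2) * c1                             # rotated by the symmetry angle
theta = 0.7                                                  # unknown channel phase
w = np.sqrt(N0) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
r = c1 * np.exp(1j * theta) + w

def noncoherent_metric(r, c, N0):
    # metric of (3.8): ln I0(|sum_k r_k c_k*| / N0) - sum_k |c_k|^2 / (2 N0)
    corr = np.abs(np.sum(r * np.conj(c)))
    return float(np.log(np.i0(corr / N0)) - np.sum(np.abs(c) ** 2) / (2.0 * N0))

m1 = noncoherent_metric(r, c1, N0)
m2 = noncoherent_metric(r, c2, N0)
```

The common rotation leaves |Σk rk c∗k | and the sequence energy unchanged, so the two metrics coincide up to rounding.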
An approximate strategy that allows one to obtain receivers whose implementation
complexity is linear in K is the so-called truncated-memory
strategy [15]. The optimal strategy (3.5) is based on the pdf f (r|a).
This pdf can be expressed, by using the chain rule, as

f (r|a) = f (r|c) = ∏_{k=0}^{K−1} f (rk | r_0^{k−1} , c) .
k=0
On the other hand, it is clear that, in the presence of a causal system, the pdf
f (rk | r_0^{k−1} , c) will depend on the coded symbols transmitted up to the discrete-time
instant k and not on future symbols. Thus, we can write
f (r|a) = ∏_{k=0}^{K−1} f (rk | r_0^{k−1} , c_0^{k} ) .
This strategy can be implemented through the Viterbi algorithm with branch
metric λk (ak , σk ) = ln f (rk | r_0^{k−1} , c_{k−C}^{k} ). The trellis state is thus defined as
σk = (ak−1 , ak−2 , . . . , ak−C , µk−C ) (3.10)
Example 3.2 For the noncoherent channel, it is R = C. The pdf f (rk | r_{k−C}^{k−1} , c_{k−C}^{k} )
can be also expressed as

f (rk | r_{k−C}^{k−1} , c_{k−C}^{k} ) = f (r_{k−C}^{k} | c_{k−C}^{k} ) / f (r_{k−C}^{k−1} | c_{k−C}^{k} ) = f (r_{k−C}^{k} | c_{k−C}^{k} ) / f (r_{k−C}^{k−1} | c_{k−C}^{k−1} ) .

The pdfs f (r_{k−C}^{k} | c_{k−C}^{k} ) and f (r_{k−C}^{k−1} | c_{k−C}^{k−1} ) both have the expression given by (3.7). It
is thus

f (rk | r_{k−C}^{k−1} , c_{k−C}^{k} ) = (1/2πN0 ) exp{ −(1/2N0 ) ( |rk |2 + |ck |2 ) } ·
    I0( (1/N0 ) | Σ_{ℓ=0}^{C} rk−ℓ c∗k−ℓ | ) / I0( (1/N0 ) | Σ_{ℓ=1}^{C} rk−ℓ c∗k−ℓ | ) .
ln f (rk | r_{k−C}^{k−1} , c_{k−C}^{k} ) = const. + | Σ_{ℓ=0}^{C} rk−ℓ c∗k−ℓ | − | Σ_{ℓ=1}^{C} rk−ℓ c∗k−ℓ | − (1/2)|ck |2 .        (3.12)
Figure 3.2: Performance of the strategy based on the branch metric (3.12)
for a 16-QAM modulation with quadrant differential encoding.
r̂k = Σ_{ℓ=1}^{C} qℓ rk−ℓ
Figure 3.3: Performance of the strategy based on the branch metric (3.12)
for 8-state trellis coded modulations (TCMs) with 16-QAM.
E{(r̂k − rk ) r∗k−m } = 0 ,   m = 1, 2, . . . , C   ⇒
E{r̂k r∗k−m } = E{rk r∗k−m } ,   m = 1, 2, . . . , C   ⇒
Σ_{ℓ=1}^{C} qℓ E{rk−ℓ r∗k−m } = E{rk r∗k−m } ,   m = 1, 2, . . . , C ,

that is

Σ_{ℓ=1, ℓ≠m}^{C} qℓ ck−ℓ c∗k−m Rθ (m − ℓ) + qm ( |ck−m |2 Rθ (0) + 2N0 ) = ck c∗k−m Rθ (m)
or, equivalently, as

Σ_{ℓ=1, ℓ≠m}^{C} pℓ Rθ (m − ℓ) + pm ( Rθ (0) + 2N0 /|ck−m |2 ) = Rθ (m)          (3.13)
having defined pℓ = qℓ ck−ℓ /ck . Notice that the linear system (3.13), whose C
equations allow computing the coefficients pℓ , depends on the coded symbols
through the terms |ck−m |2 . In the case of a PSK constellation (for which
|ck−m |2 = const.), this dependence disappears and thus the coefficients pℓ are
independent of the coded sequence. As a function of the coefficients pℓ , the
optimal predictor is

r̂k = ck Σ_{ℓ=1}^{C} pℓ rk−ℓ /ck−ℓ .
The mean square value of the prediction error turns out to be

σe2 = E{|r̂k − rk |2 } = E{[rk − r̂k ] r∗k } = 2N0 + |ck |2 [ Rθ (0) − Σ_{ℓ=1}^{C} pℓ Rθ (−ℓ) ] .
As said, for a PSK modulation the coefficients pℓ and the mean square prediction
error σe2 are independent of the coded symbols. Otherwise, they depend on
|ck−1 |2 , |ck−2 |2 , . . . , |ck−C |2 . A detection strategy based on linear prediction
was first proposed in [19, 20, 21, 22]. The reader can also refer to [23, 4]
for a thorough analysis. ♦
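For a PSK constellation, (3.13) reduces to a fixed C × C linear system. The sketch below assumes a Wiener-phase-noise autocorrelation Rθ (m) = e^{−σΔ²|m|/2}; this model and all numerical values are my own illustrative choices, not from the book:

```python
import numpy as np

C, N0, sigma2_delta = 4, 0.05, 0.01

def R_theta(m):
    # assumed phase-noise autocorrelation (Wiener model)
    return np.exp(-sigma2_delta * np.abs(m) / 2.0)

# (3.13) for PSK (|c_k|^2 = 1): sum_l p_l R(m-l) + p_m * 2 N0 = R(m), m = 1..C,
# i.e., (R + 2 N0 I) p = rho with R[m,l] = R_theta(m-l), rho[m] = R_theta(m)
m = np.arange(1, C + 1)
R = R_theta(m[:, None] - m[None, :]) + 2.0 * N0 * np.eye(C)
p = np.linalg.solve(R, R_theta(m))

# mean square prediction error for PSK: 2 N0 + R(0) - sum_l p_l R(-l)
sigma_e2 = 2.0 * N0 + R_theta(0) - float(p @ R_theta(-m))
```

As expected, σe² stays above the white-noise floor 2N0 : prediction can remove part of the phase-noise contribution but not the additive noise.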
â = argmax_a f (r|a, θ) = argmin_a Σ_{k=0}^{K−1} | rk − ck ejθ |2
  = argmin_a Σ_{k=0}^{K−1} | rk e−jθ − ck |2 ;
the receiver Rθ implementing this strategy is shown in Fig. 3.4. The receiver
changes when the value of θ changes. Thus, the UMP test does not exist. ♦
Example 3.6 Let us consider again the noncoherent channel. In this case,
the joint likelihood function can be expressed in the form (3.6) reported here
for convenience:
f (r|a, θ) = (1/2πN0 )^K exp{ −(1/2N0 ) Σ_{k=0}^{K−1} |rk |2 }
           · exp{ −(1/2N0 ) Σ_{k=0}^{K−1} |ck |2 + (1/N0 ) | Σ_{k=0}^{K−1} rk c∗k | cos [θ − φ(r, c)] } .
that is, we can maximize the joint likelihood function by neglecting the cosine
term, choosing then θ properly such that the cosine is maximized. Notice
that strategies (3.8) and (3.16) are very similar. The latter can be obtained
from the former by adopting the approximation ln I0 (x) ≃ x. ♦
3.3.2 Synchronization
The approach commonly used in the receiver design is that based on synchro-
nization. According to it, an estimate θ̂ of the unknown parameter is found
from the received signal according to some technique (as better specified in
the following). This estimate is then used in place of the true value of the
parameter, neglecting a possible residual error. In other words, the adopted
detection strategy will be

â = argmax_a f (r|a, θ̂) .

This approach is described in Fig. 3.5(a). Fig. 3.5(b), instead, refers to the
particular case of a noncoherent channel.
where

CRB(θ) = −1 / Er { ∂2 ln f (r|θ, v) / ∂θ2 } = 1 / Er { [ ∂ ln f (r|θ, v) / ∂θ ]2 } .       (3.19)
computation but more loose. The MCRB for parameter θ can be expressed
as [24]
var(θ̂) = E{[θ̂ − θ]2 } ≥ MCRB(θ)
where
MCRB(θ) = −1 / Er,u { ∂2 ln f (r|θ, v, u) / ∂θ2 } = 1 / Er,u { [ ∂ ln f (r|θ, v, u) / ∂θ ]2 } .   (3.21)
and finally

MCRB(θ) = N0 / Eu { Σ_k | ∂sk (θ, v, u) / ∂θ |2 } .                    (3.23)
3.4.2 DA estimator
In packet transmissions, a field of known data is usually available for syn-
chronization purposes. This field can be placed at the beginning (preamble)
or in the middle of a packet (midamble). In other cases, we have more fields
of known data distributed along the packet. These known symbols are often
called pilot symbols. They can thus be employed for synchronization. As-
suming that we have available L consecutive known symbols, in particular
the symbols a_0^{L−1} , the ML-DA estimator is based on them and on the corresponding
received samples.

In the case of frequency estimation, the phase evolves as

θk = 2πF kT + φ .
it is

F̂DA = argmax_F f (r_0^{L−1} | a_0^{L−1} , F ) = argmax_F | Σ_{k=0}^{L−1} rk a∗k e−j2πF kT | .

The quantity | Σ_{k=0}^{L−1} rk a∗k e−j2πF kT | is the modulus of the Fourier transform of the discrete-
time sequence rk a∗k . Its maximum value can be found through a two-step
procedure (a coarse search, e.g., through an FFT, followed by a fine search).
Example 3.8 The CRB in the case of the previous example (DA frequency
estimation in the presence of an unknown phase) is not available in closed
form. This is due to the pdf (3.25) that, plugged into (3.19), makes the com-
putation of the expectation impossible in closed form. However, it is possible,
once the symbol constellation and the value of L have been selected, to nu-
merically compute it through a Monte Carlo simulation. The computation
of the MCRB is, however, much simpler. Let us assume that the received
samples employed for the estimation are those for k0 ≤ k ≤ k0 + L − 1. By
using (3.23) that takes the form

MCRB(F ) = N0 / Eθ { Σ_k | ∂sk (F, a, θ) / ∂F |2 }

where

sk (F, a, θ) = ak ej(2πF kT +θ)

we easily obtain

MCRB(F ) = N0 / ( 4π2 T 2 Σ_{k=k0}^{k0+L−1} k 2 |ak |2 ) .            (3.26)
This result depends on the value of k0 . Since we are interested in the tight-
est possible lower bound, we can choose the value of k0 giving the largest
possible value of (3.26) and thus the lowest possible value of the denomina-
tor. Assuming that the symbols in the preamble are always the same when
changing k0 and that L is odd, the lowest possible value is obtained when
k0 = −(L − 1)/2. It is thus

MCRB(F ) = N0 / ( 4π2 T 2 Σ_{k=−(L−1)/2}^{(L−1)/2} k 2 |ak |2 ) .
L−1
This expression for the MCRB(F ) allows also to optimize symbols a−2L−1
2
in the preamble. In fact, they have to be selected is such a way the term
P L−1
2
k=− L−1
k 2 |ak |2 is maximized. For M-PSK signals, being |ak | = 1 and
2
P
considering that nk=1 k 2 = n(n+1)(2n+1)
6
, we obtain
3N0
MCRB(F ) = .
π 2 T 2 (L2 − 1)L
♦
Example 3.9 Let us now consider phase estimation. The channel model is given by (3.2), where now symbols are obviously uncoded. The pdf f(r_0^{L−1}|a_0^{L−1}, θ) can be expressed as (see (3.6))

f(r_0^{L−1}|a_0^{L−1}, θ) = (1/(2πN0))^L exp{ −(1/(2N0)) Σ_{k=0}^{L−1} [ |r_k|² + |a_k|² ] } · exp{ (1/N0) | Σ_{k=0}^{L−1} r_k a_k^* | cos[θ − φ(r, a)] }

where φ(r, a) is the argument of Σ_{k=0}^{L−1} r_k a_k^*. The pdf is maximized when the cosine equals one, so the DA estimate is θ̂_DA = φ(r, a). For M-PSK symbols (|a_k| = 1) we can write Σ_{k=0}^{L−1} r_k a_k^* = L e^{jθ} + w', where w' collects the noise contributions. Denoting by w = w' e^{−jθ} the noise term rotated back by θ, at high SNR we have

θ̂_DA ≃ θ + ℑ[w]/L .
Under this high-SNR condition, we have

E{θ̂_DA} = θ
var{θ̂_DA} = E{ [θ̂_DA − θ]² } = N0/L .

The estimator is thus unbiased and, as shown in the following, it reaches the CRB (computed in the next example) at least under the high-SNR condition. For low values of the SNR, the estimator becomes biased and we can no longer compare it with the CRB, which holds for unbiased estimators. ♦
Example 3.10 Let us compute the CRB for the case of the previous example, that is, for the DA phase estimate when L received samples are observed. In this case, v = a, whereas u is an empty set. The CRB and the MCRB thus coincide and we can employ (3.23), which now takes the form

CRB(θ) = MCRB(θ) = N0 / Σ_k |∂s_k(a, θ)/∂θ|²

where

s_k(a, θ) = a_k e^{jθ} .

It thus results

CRB(θ) = MCRB(θ) = N0 / Σ_k |a_k|²

which, in the case of M-PSK signals, becomes

CRB(θ) = MCRB(θ) = N0/L .
♦
Λ(θ) = ln f(r_0^{L−1}|a_0^{L−1}, θ) = L ln(1/(2πN0)) − (1/(2N0)) Σ_{k=0}^{L−1} [ |r_k|² + |a_k|² ] + (1/N0) ℜ[ e^{−jθ} Σ_{k=0}^{L−1} r_k a_k^* ] .

Its derivative with respect to θ is (1/N0) Σ_{k=0}^{L−1} ℑ[ e^{−jθ} r_k a_k^* ]. Considering the kth term of this derivative, we can recursively update the phase estimate (according to the gradient algorithm) as

θ̂_{k+1} = θ̂_k + α ℑ[ e^{−jθ̂_k} r_k a_k^* ]    (3.27)
[Figure: closed-loop DA phase estimator: r(t) is filtered by p*(−t), sampled at t = kT, rotated by e^{−jθ̂_k} (obtained through a look-up table, LUT), and the error signal is computed from the known symbols a_k.]
The S-curve is shown in Fig. 3.8. This figure shows a stable equilibrium point for φ = 0 and two unstable points for φ = ±π. When the PLL is in one of these two unstable equilibrium points, it can remain “trapped” there for a long time, until noise or a channel phase variation perturbs this condition; in this case, the PLL can move toward a stable equilibrium point. This phenomenon is called hang-up and can significantly increase the acquisition time. The existence of the stable equilibrium point for φ = 0 means that the estimator is unbiased.
The PLL equivalent model is shown in Fig. 3.9. In this figure we denoted by ν_k the component of the error signal having zero mean. We can thus write

e_k = S(θ_k − θ̂_k) + ν_k .

In order to compute the mean square estimation error, we first have to compute the power spectral density of the random process ν_k. Let us consider an M-ary PSK transmission with |a_k| = 1. In this case, ν_k is white with variance N0. Linearizing the S-curve around the stable equilibrium point as S(φ) ≃ Aφ, the Z-transform of the estimation error φ_k = θ_k − θ̂_k due to the noise results in

Φ(z) = H(z)V(z) ,   H(z) = −F(z)/(1 + AF(z))

where Φ(z) and V(z) are the Z-transforms of φ_k and ν_k, respectively, and

F(z) = αz⁻¹/(1 − z⁻¹)

is the transfer function of the loop filter. Thus, it is

H(z) = −α/(z − (1 − Aα)) .    (3.28)
The mean square estimation error can thus be computed taking into account that the power spectral density of φ_k is N0 |H(e^{j2πfT})|². Hence

var[θ̂_k] = E{ [θ_k − θ̂_k]² } = E{φ_k²} = N0 T ∫_{−1/(2T)}^{1/(2T)} |H(e^{j2πfT})|² df
= N0 ∫_{−1/2}^{1/2} |H(e^{j2πfT})|² d(fT) = 2N0 B_eq T |H(1)|²    (3.29)

where B_eq is the equivalent noise bandwidth of the loop. Considering that |H(1)|² = 1/A², we can easily compute the integral appearing in (3.29) by using the Parseval equality as

∫_{−1/2}^{1/2} |H(e^{j2πfT})|² d(fT) = Σ_n |h_n|²

where h_n is the impulse response corresponding to H(z).
3.4.3 DD estimator
After an initial training period, sufficient to obtain an estimate accurate enough to allow the detector to produce reliable decisions, we can track the parameter variations by employing the same CL DA estimator, where the known symbols are replaced by the decisions provided by the detector. The resulting estimator is called DD. The ML-DD estimator is thus described by the same mathematical expressions of the ML-DA estimator, with the known symbols replaced by the detected ones. Since the Viterbi detector provides its decisions with a delay d, in the closed-loop implementation the error signal becomes

e_k = ℑ{ r_{k−d} ĉ*_{k−d} e^{−jθ̂_k} }

where ĉ_{k−d} is the decision on the code symbol c_{k−d}.
In a per-survivor implementation, a separate estimate is associated with each trellis state and is updated along the corresponding survivor as

θ̂_{k+1}(σ_{k+1}) = θ̂_k(σ_k) + α ℑ[ e^{−jθ̂_k(σ_k)} r_k c̆_k^*(σ_k) ]    (3.32)
where c̆k (σk ) is the decision taken on the survivor of state σk , having assumed
that the survivor of the state σk+1 at time k + 1 has been obtained by ex-
tending the survivor of the state σk at time k. The estimate θ̂k+1 (σk+1 ) will
be employed to compensate for the phase rotation introduced by the channel
into the received samples used to compute the metrics of branches at the
output of state σk+1 . The receiver block diagram is shown in Fig. 3.12 when
the VA operates over a trellis with only two states. ♦
Example 3.13 Let us consider again the CL-DD phase estimator, this time for an uncoded transmission without ISI. A symbol-by-symbol detector can be employed and we thus have no problems related to the decision delay. The corresponding S-curve is shown in Fig. 3.14. This figure shows that the S-curve is periodic with a period shorter than 2π, a consequence of the rotational symmetry of the constellation (for M-PSK, the period is 2π/M). ♦

In the NDA case, the pdf of the received samples given the unknown parameter only is obtained by averaging over all possible data sequences:

f(r_0^{L−1}|θ) = Σ_{a_0∈A} Σ_{a_1∈A} ··· Σ_{a_{L−1}∈A} (1/M)^L f(r_0^{L−1}|a_0^{L−1}, θ) .
Example 3.14 Let us consider the OL NDA phase estimator for M-PSK modulation. It is based on the pdf

f(r_0^{L−1}|θ) = Σ_{a_0∈A} Σ_{a_1∈A} ··· Σ_{a_{L−1}∈A} (1/M)^L f(r_0^{L−1}|a_0^{L−1}, θ)
= Σ_{a_0∈A} Σ_{a_1∈A} ··· Σ_{a_{L−1}∈A} (1/M)^L ∏_{ℓ=0}^{L−1} f(r_ℓ|a_ℓ, θ)
= ∏_{ℓ=0}^{L−1} Σ_{a_ℓ∈A} (1/M) f(r_ℓ|a_ℓ, θ)
= ∏_{ℓ=0}^{L−1} f(r_ℓ|θ)    (3.35)

where

f(r_ℓ|θ) = Σ_{a_ℓ} (1/M) f(r_ℓ|a_ℓ, θ) = (1/M) Σ_{m=0}^{M−1} f(r_ℓ|a_ℓ = e^{j2πm/M}, θ)
= (1/(2πN0)) (1/M) Σ_{m=0}^{M−1} exp{ −(1/(2N0)) |r_ℓ − e^{j2πm/M} e^{jθ}|² }
= [ exp{ −(1/(2N0))(|r_ℓ|² + 1) } / (2πN0) ] Σ_{m=0}^{M−1} (1/M) exp{ (1/N0) ℜ[ r_ℓ e^{−j2πm/M} e^{−jθ} ] } .
having defined

T(r_ℓ, θ) = ln Σ_{m=0}^{M−1} (1/M) exp{ (1/N0) ℜ[ r_ℓ e^{−j2πm/M} e^{−jθ} ] }
= ln Σ_{m=0}^{M−1} (1/M) exp{ (1/(2N0)) [ r_ℓ e^{−j2πm/M} e^{−jθ} + r_ℓ^* e^{j2πm/M} e^{jθ} ] } .
By using the Taylor series for the exponential and Newton's binomial expansion, we obtain

T(r_ℓ, θ) = ln Σ_{m=0}^{M−1} (1/M) exp{ (1/(2N0)) [ r_ℓ e^{−j2πm/M} e^{−jθ} + r_ℓ^* e^{j2πm/M} e^{jθ} ] }
= ln { Σ_{m=0}^{M−1} (1/M) Σ_{p=0}^{∞} (1/p!) (1/(2N0))^p [ r_ℓ e^{−j2πm/M} e^{−jθ} + r_ℓ^* e^{j2πm/M} e^{jθ} ]^p }
= ln { Σ_{m=0}^{M−1} (1/M) Σ_{p=0}^{∞} (1/p!) (1/(2N0))^p Σ_{q=0}^{p} C(p,q) r_ℓ^q e^{−j2π(m/M)q} e^{−jθq} (r_ℓ^*)^{p−q} e^{j2π(m/M)(p−q)} e^{jθ(p−q)} }
= ln { Σ_{p=0}^{∞} (1/p!) (1/(2N0))^p Σ_{q=0}^{p} C(p,q) r_ℓ^q (r_ℓ^*)^{p−q} e^{jθ(p−2q)} A(p − 2q) }

where C(p,q) denotes the binomial coefficient, having defined

A(p − 2q) = (1/M) Σ_{m=0}^{M−1} e^{j2πm(p−2q)/M} = (1/M) (1 − e^{j2π(p−2q)}) / (1 − e^{j2π(p−2q)/M}) .

Since p − 2q is an integer, A(p − 2q) is different from zero (and equal to one) only when p − 2q is an integer multiple of M. Under a low-SNR assumption, the factor (1/(2N0))^p makes the terms with large p negligible, so for each admissible value of p − 2q we keep only the term with the lowest p:

• p − 2q = 0. The lowest value of p is obtained for (p, q) = (0, 0), giving the constant term 1.

• p − 2q = M. This value can be obtained by choosing (p, q) = (M, 0), (p, q) = (M + 2, 1), (p, q) = (M + 4, 2), and so on. We will keep only the term corresponding to the lower value of p, i.e., that corresponding to the pair (p, q) = (M, 0).

• p − 2q = −M. This value can be obtained by choosing (p, q) = (M, M), (p, q) = (M + 2, M + 1), (p, q) = (M + 4, M + 2), and so on. In this case too, we will keep only the term corresponding to the lower value of p, i.e., that corresponding to the pair (p, q) = (M, M).

For the same reason, we will not consider values of p − 2q corresponding to other integer multiples of M, that would give higher values of p. The resulting approximated expression of T(r_ℓ, θ) is thus
T(r_ℓ, θ) ≃ ln{ 1 + (1/M!) (1/(2N0))^M (r_ℓ^*)^M e^{jMθ} + (1/M!) (1/(2N0))^M r_ℓ^M e^{−jMθ} }
= ln{ 1 + 2ℜ[ (1/M!) (1/(2N0))^M r_ℓ^M e^{−jMθ} ] } .

This estimator is called the Mth power estimator. By raising the received samples to the power M, we are removing the modulation. It is however clear that, with this estimator, we can only estimate phase values in the range [−π/M, π/M).
Differential encoding is a viable solution for this problem too.
By using computer simulations, it can be demonstrated that this estimator reaches the CRB for medium/high SNR values (although it has been derived under the assumption of low SNR). Its performance for low SNR values can be improved by resorting to a generalization proposed by A. J. Viterbi and A. M. Viterbi [28]. This generalization can be described in this way. The Mth power estimator computes, for each received sample, the quantity r_ℓ^M = |r_ℓ|^M e^{jM arg(r_ℓ)}; in the generalized estimator, the amplitude |r_ℓ|^M is replaced by a suitable nonlinear function F(|r_ℓ|) of the sample magnitude.

In the SDD case, assuming the soft decisions {P̂(c_ℓ)} provided by the decoder to be independent of each other,⁴ the estimator is based on the pdf estimate

f̂(r_0^{L−1}|θ) = Σ_{c_0∈C} P̂(c_0) Σ_{c_1∈C} P̂(c_1) ··· Σ_{c_{L−1}∈C} P̂(c_{L−1}) f(r_0^{L−1}|c_0^{L−1}, θ)

that represents the estimate of the pdf of the received samples given the parameter value only, computed from the soft decisions.

We may notice that, before the first iteration, when the soft decisions are not available, or when they are not yet reliable, the SDD estimator is an NDA estimator. When the iterations go on and soft decisions become more reliable, the SDD estimator becomes a DA estimator.

⁴The validity of this assumption is usually guaranteed by the presence of an interleaver placed between the detector and the decoder. We will see that this interleaver is required to allow iterative detection and decoding.
Example 3.15 For the case of the phase estimate, through manipulations similar to those giving (3.35), we obtain

f̂(r_0^{L−1}|θ) = ∏_{ℓ=0}^{L−1} f̂(r_ℓ|θ)

where

f̂(r_ℓ|θ) = Σ_{c_ℓ∈C} P̂(c_ℓ) f(r_ℓ|c_ℓ, θ)    (3.36)

with

f(r_ℓ|c_ℓ, θ) = (1/(2πN0)) exp{ −(1/(2N0)) |r_ℓ − c_ℓ e^{jθ}|² } .

The SDD phase estimate becomes

θ̂_SDD = argmax_θ f̂(r_0^{L−1}|θ) = argmax_θ ln f̂(r_0^{L−1}|θ) = argmax_θ Σ_{ℓ=0}^{L−1} ln f̂(r_ℓ|θ)
= argmax_θ Σ_{ℓ=0}^{L−1} ln { Σ_{c_ℓ∈C} P̂(c_ℓ) exp[ −(1/(2N0)) |r_ℓ − c_ℓ e^{jθ}|² ] } .
having defined

γ_ℓ = E{c_ℓ} = Σ_{c_ℓ∈C} P̂(c_ℓ) c_ℓ
β_ℓ = E{|c_ℓ|²} = Σ_{c_ℓ∈C} P̂(c_ℓ) |c_ℓ|² .

In the closed-loop implementation, the gradient update of the estimate becomes

θ̂_{ℓ+1} = θ̂_ℓ + α { Σ_{c_ℓ∈C} P̂(c_ℓ) ℑ[ r_ℓ e^{−jθ̂_ℓ} c_ℓ^* ] exp{ (1/N0) ℜ[ r_ℓ e^{−jθ̂_ℓ} c_ℓ^* ] } } / { Σ_{c_ℓ∈C} P̂(c_ℓ) exp{ (1/N0) ℜ[ r_ℓ e^{−jθ̂_ℓ} c_ℓ^* ] } }
and, in addition,

f(θ_0^{L−1}) = f(θ_0) ∏_{ℓ=1}^{L−1} f(θ_ℓ|θ_{ℓ−1})

with f(θ_0) = 1/(2π). We can thus resort to MAP estimation:

θ̂_SDD^{(MAP)} = argmax_{θ_0^{L−1}} f̂(r_0^{L−1}|θ_0^{L−1}) f(θ_0^{L−1})
= argmax_{θ_0^{L−1}} { ln f̂(r_0^{L−1}|θ_0^{L−1}) + ln f(θ_0^{L−1}) }
= argmax_{θ_0^{L−1}} [ Σ_{ℓ=0}^{L−1} ln f̂(r_ℓ|θ_ℓ) + Σ_{ℓ=1}^{L−1} ln f(θ_ℓ|θ_{ℓ−1}) + ln f(θ_0) ] .

Let us define

Γ(θ_0^{L−1}) = Σ_{ℓ=0}^{L−1} γ(r_ℓ, θ_ℓ) + Σ_{ℓ=1}^{L−1} ln f(θ_ℓ|θ_{ℓ−1})

where⁷

γ(r_ℓ, θ_ℓ) = ln f̂(r_ℓ|θ_ℓ) .
The MAP estimate can be obtained through the equation

∇Γ(θ_0^{L−1})|_{θ=θ̂} = 0 .

It is

∂Γ/∂θ_ℓ |_{θ=θ̂} = ∂γ(r_ℓ, θ_ℓ)/∂θ_ℓ |_{θ_ℓ=θ̂_ℓ} − (1/σ_Δ²) θ̂_ℓ + (1/σ_Δ²) θ̂_{ℓ−1} − (1/σ_Δ²) θ̂_ℓ + (1/σ_Δ²) θ̂_{ℓ+1} = 0    (3.38)

⁶Notice that, since the channel phase is defined modulo 2π, the pdf f(θ_{k+1}|θ_k) can be approximated as Gaussian only when σ_Δ ≪ 2π.
⁷As far as γ(r_ℓ, θ_ℓ) is concerned, we can use either the exact expression or the approximation (3.37).
The resulting system of equations can be approximately solved by combining a forward and a backward recursion

θ̂_ℓ^{(f)} = θ̂_{ℓ−1}^{(f)} + (σ_Δ²/2) δ(r_ℓ, θ̂_{ℓ−1}^{(f)})
θ̂_ℓ^{(b)} = θ̂_{ℓ+1}^{(b)} + (σ_Δ²/2) δ(r_ℓ, θ̂_{ℓ+1}^{(b)})

obtaining, at the end, the following approximate MAP estimate

θ̂_ℓ = [ θ̂_ℓ^{(f)} + θ̂_ℓ^{(b)} ] / 2 .

♦
3.5 A general technique to obtain a sufficient statistic

[Figure 3.15: the signal spectrum, with bandwidth B, and the squared magnitude of the front end filter, flat over the signal band and with edges exhibiting vestigial symmetry around 1/(2T_c); the transition band extends from (1−δ)/(2T_c) to (1+δ)/(2T_c).]
Gaussian process with constant power spectral density for |f | > Bv and
independent of the original noise affecting r(t). Signal rBL (t), having limited
bandwidth Bv , can be now sampled at the Nyquist frequency fc = 1/Tc =
2Bv and the resulting samples {rBL (kTc )} represent a discrete-time sufficient
statistic.8 Notice that the discrete-time additive noise affecting {rBL (kTc )}
is white.
This solution assumes the use of an ideal front end filter H_BL(f). Actually, we can obtain another sufficient statistic, statistically equivalent to the previous one, by simply employing a filter whose frequency response H(f) is flat over the signal bandwidth and whose squared magnitude |H(f)|² has edges with vestigial symmetry around the frequency 1/(2T_c), as shown in Fig. 3.15. In this way, in fact, the effect of the filter on the useful signal is clearly the same as that of H_BL(f). As far as the noise is concerned, since

|H(f)|² + |H(f − 1/T_c)|² = const.

the noise after sampling with frequency f_c = 1/T_c certainly has a power spectral density which is constant, as in the previous case. A possible filter H(f) satisfying the above mentioned conditions is one with a root raised cosine frequency response with roll-off δ, bandwidth (1+δ)/(2T_c), and such that (1−δ)/(2T_c) > B.

⁸In general, we will have more than one sample per symbol interval.
3.6 Exercises
Exercise 3.1. Let us consider a coded BPSK signal over an AWGN channel
that also introduces an unknown phase modeled as a discrete random variable
that takes the values {0, π} with the same probability. Compute the metric
of the optimal MAP sequence detector. Is the use of differential encoding
necessary?
Exercise 3.2. Let us consider a coded PSK signal over an AWGN channel
that also introduces a positive attenuation, unknown at the receiver and
modeled as a random variable with exponential distribution. Describe in
detail the structure of the optimal MAP sequence detector. In particular,
state whether a UMP test exists.
Exercise 3.3. Let us consider transmission over a flat fading channel, with received samples

r_k = f_k c_k + w_k

where the symbols c_k ∈ {±1, ±j}, possibly coded, thus belong to a QPSK constellation, the fading process f_k has autocorrelation R_f(m) = α^{|m|}, and w_k are the complex noise samples, with variance σ² per component. Compute the branch metrics of a receiver based on linear prediction with C = 2, by specifying the expression of the prediction coefficients and of the prediction error.
Exercise 3.4. Let us consider the observation models

x_k = a_k e^{jθ} + w_k
x_k = a_k e^{j(2πνkT+θ)} + w_k .
Exercise 3.5. Let us consider a BPSK transmission and the M-th power
estimator. Design a closed-loop implementation for this estimator. In par-
ticular:
Codes in the signal space

[Figure: an example of frequency pulse g(t), with support [0, 2T], and the corresponding phase-smoothing response q(t), which reaches the final value 1/2 for t ≥ 2T.]
y(t) = √(2E_s/T) cos( 2πf_0 t + 2πh ∫_{−∞}^{t} x(τ) dτ + θ ) .

Hence, it results

s(t; a) = √(2E_s/T) cos( 2πf_0 t + 2πh ∫_{−∞}^{t} Σ_i a_i g(τ − iT) dτ + θ ) .    (4.2)

Let us now consider the slice of signal in the generic interval [kT, (k+1)T]. Remembering the general model (1.6) for modulated signals, we have

s̃(t; a) = s̃(t − kT; a_k, σ_k) = √(2E_s/T) exp{ j[ 2πh Σ_i a_i q(t − iT) + θ ] } .    (4.3)
The limits of the summation in (4.3) have not been specified because they
can extend from −∞ to ∞, if we imagine a signal of infinite duration, or from
0 to K − 1, if we consider the transmission of a finite number of symbols. In
any case, in (4.3) the summation can be stopped at the k-th term, because
all following terms do not contribute to that slice. In fact, in the interval
[kT, (k +1)T ] we have that q[t−(k +1)T ] = 0, q[t−(k +2)T ] = 0, etc., as also
clear from Fig. 4.3. If we remember that the frequency pulse has support in
the interval [0, LT ], and hence that the phase-smoothing response is constant
for t > LT , the signal phase can be expressed by means of the sum of three
contributions (besides the initial phase θ). By referring, for example, to a
transmission of finite duration, the phase of the complex envelope can be expressed as
[Figure: CPM modulator: the symbols {a_k} form the signal Σ_i a_i g(t − iT), which drives an FM modulator with index h to produce s(t; a); for L = 2, the pulse a_k q[t − kT] extends beyond the interval [kT, (k+1)T].]

φ(t; a_k, σ_k) = πh Σ_{i=0}^{k−L} a_i + 2πh Σ_{i=k−L+1}^{k−1} a_i q(t − iT) + 2πh a_k q(t − kT) + θ ,   kT ≤ t < (k+1)T .
The first term depends on “old” symbols, whose pulse q(t) has already reached the final value 1/2. This term is called phase state

ϕ_k = πh Σ_{i=0}^{k−L} a_i mod 2π .    (4.4)

The second term depends on the L − 1 most recent past symbols; this set of symbols defines the so-called correlative state and, together with the phase state, contributes to the definition of the modulator state at time instant kT, i.e.,

σ_k = (a_{k−1}, a_{k−2}, . . . , a_{k−L+1}; ϕ_k)

where the first component is the correlative state and the second one the phase state.
Given the present symbol a_k and the state σ_k, we can thus compute the signal phase φ(t; a_k, σ_k), and then the signal itself as

s̃(t − kT; a_k, σ_k) = √(2E_s/T) exp{ jφ(t; a_k, σ_k) } ,   kT ≤ t < (k+1)T .

At time instant t = (k+1)T, that is at the end of the present signaling interval, the successive modulator state becomes

σ_{k+1} = (a_k, a_{k−1}, . . . , a_{k−L+2}; ϕ_{k+1})

where the new correlative state is obtained by a simple shift, and the new phase state is given by

ϕ_{k+1} = (ϕ_k + πh a_{k−L+1}) mod 2π .

It takes into account the contribution of that q(t) which has reached the final value 1/2 at the end of the interval [kT, (k+1)T].
To evaluate the number of states of the modulator, let us first observe
that the number of correlative states is M^{L−1}. The number of phase states
can, in principle, be very large, or even infinite if we start the sum in (4.4)
from −∞. However, if we consider that the phase state must necessarily
belong to [0, 2π) and we remember that symbols belong to the alphabet
{±1, ±3, . . . , ±(M − 1)}, it is possible to verify that the number of phase
states is finite if the modulation index is rational, i.e.,

h = n/p

where n and p are relatively prime integers. If n is even, the possible values of the phase state are

ϕ_k ∈ { 0, π n/p, 2π n/p, . . . , (p−1)π n/p }   (n even)

and the number of different phase states is p. If n is odd, the possible values of the phase state are

ϕ_k ∈ { 0, π n/p, 2π n/p, . . . , (2p−1)π n/p }   (n odd)

and the number of different phase states is 2p. However, the phase state can only take p of them at even instants, and the remaining p at odd instants. Hence, the overall number of states of a CPM modulator results in any case to be pM^{L−1}.
Remark 4.1. A CPM signal can then be represented by using the model
for signals with memory described in Chapter 1. In other words, a CPM
signal can be expressed as the cascade of an encoder (actually a system with
memory, described by the state transition table) and a memoryless modu-
lator (described by the waveform table), concentrating in the encoder the
memory source [34]. Hence, it is not necessary to investigate MAP sequence
detection for CPMs, since it can be implemented as described in Chapter 2.
The optimal receiver employs a bank of M^{L−1} matched filters and operates on a trellis with pM^{L−1} states, as discussed in Exercise 4.1. ♦
By defining the symbols ā_k and the phase states ϕ̄_k through

a_k = 2ā_k − (M − 1)    (4.5)
ϕ_k = −πh(M − 1)k + 2πh ϕ̄_k    (4.6)

we have that ā_k ∈ {0, 1, . . . , M − 1} and ϕ̄_k ∈ {0, 1, . . . , p − 1}; moreover, the integer ϕ̄_k can be recursively updated using the expression

ϕ̄_{k+1} = (ϕ̄_k + ā_{k−L+1}) mod p .

With this new notation, it is possible to express the phase φ(t; a_k, σ_k) for kT ≤ t < (k+1)T as

φ(t; a_k, σ_k) = −πh(M − 1)k + 2πh ϕ̄_k + 2πh Σ_{i=k−L+1}^{k−1} [2ā_i − (M − 1)] q(t − iT) + 2πh[2ā_k − (M − 1)] q(t − kT) + θ ,

simplifying in this way the notation, because the new phase state ϕ̄_k takes on values belonging to the same alphabet, independently of the instant k (even or odd). ♦
CPMs can be classified in two categories:
• full response, when L = 1;
• partial response, when L > 1.
For a full response CPM with the linear phase-smoothing response of Fig. 4.4, the phase in the interval kT < t < (k+1)T is

φ(t; a_k, σ_k) = πh Σ_{i=−∞}^{k−1} a_i + 2πh a_k q(t − kT) + θ
= πh Σ_{i=−∞}^{k−1} a_i + 2πh a_k (t − kT)/(2T) + θ ,   kT < t < (k+1)T .

We can then conclude that this is a frequency shift keying (FSK) modulation. Since the phase is continuous, these full response CPMs with linear phase-smoothing response are also called continuous phase FSK (CPFSK). In the special case of a binary modulation with h = 1/2, since information symbols belong to the alphabet {±1}, the possible frequency values are f_0 ± 1/(4T). The difference between these frequency values is 1/(2T), that is, the minimum ensuring the orthogonality of the signals on a signaling interval (for coherent demodulation). For this reason, a binary CPFSK with h = 1/2 is also called
Figure 4.4: Frequency pulse and phase-smoothing response for CPFSK modulations.

[Figure 4.5: phase tree of the MSK, with phase values spanning [−2π, 2π] over the interval 0 ≤ t ≤ 5T.]

minimum shift keying (MSK). In this case, the possible phase states are

ϕ_k ∈ { 0, π/2, π, 3π/2 } .
Notice that in full response CPMs the correlative state disappears and the
modulator state coincides with the phase state.
A way to represent the phase evolution in a CPM is through the so-called phase trees. Let us assume that the initial phase is θ = 0, that the phase state is ϕ_0 = 0, and consider, for the sake of simplicity, the MSK modulation. In the interval 0 < t < T, the phase has expression

φ(t; a_0, ϕ_0) = πa_0 t/(2T) ,   a_0 = ±1 .

In the next interval, T < t < 2T, it has expression

φ(t; a_1, ϕ_1) = ϕ_1 + πa_1 (t − T)/(2T) ,   a_1 = ±1

where ϕ_1 = a_0 π/2. If we proceed in this way, we can build the phase tree of an MSK, as reported in Fig. 4.5. Notice that at time instants kT, with k even, the possible phase state values are 0 and π, while when k is odd the possible phase state values are π/2 and 3π/2. Moreover, we can always write

ϕ_k = (ϕ_{k−1} + πh a_{k−L}) mod 2π .
We can observe that, because the phase-smoothing response is linear, in a signaling interval the slope of the phase is constant, corresponding to the constant frequency of the signal during that interval.

Let us now consider an example of a phase tree for a partial response CPM. Let us assume M = 2, h = 1/2, L = 2, and a linear phase-smoothing response, as in Fig. 4.6. Phase states {ϕ_k} belong again to the alphabet {0, π/2, π, 3π/2}, while modulator states are defined as

σ_k = (a_{k−1}, ϕ_k) .

In this case, the phase during the interval kT < t < (k+1)T has expression

φ(t; a_k, σ_k) = (π/2) Σ_{i=−∞}^{k−2} a_i + πa_{k−1} [t − (k−1)T]/(4T) + πa_k (t − kT)/(4T) + θ
= ϕ_k + (π/2) a_{k−1} [t − (k−1)T]/(2T) + (π/2) a_k (t − kT)/(2T) + θ

where

ϕ_k = (ϕ_{k−1} + a_{k−2} π/2) mod 2π .

We can then build the tree diagram in successive steps.
[Figure 4.6: phase-smoothing response q(t) with L = 2, reaching the final value 1/2 at t = 2T, and the four phase branches in the interval [kT, (k+1)T] for the combinations a_{k−1} = ±1, a_k = ±1, departing from values between ϕ_k − π and ϕ_k.]

[Figure 4.7: phase tree of the partial response CPM with M = 2, h = 1/2, L = 2, with branches labeled by the symbols ±1.]
Remark 4.3. A CPM signal can be decomposed as the sum of a finite number of linearly modulated signals. This decomposition, originally proposed by Laurent [35] for binary CPMs, has then been extended to M-ary CPMs by Mengali and Morelli [36]. According to this decomposition, it is possible to express the complex envelope (4.1) of a CPM signal as

s̃(t; a) = Σ_{m=0}^{F−1} Σ_k α_{m,k} p_m(t − kT)    (4.8)

[Figure: raised cosine frequency pulse g(t) = (1/(2LT))[1 − cos(2πt/(LT))], with support [0, LT].]

where F is the number of linearly modulated signals that compose the CPM signal, {p_m(t)} are the corresponding shaping pulses, and {α_{m,k}} are the corresponding symbols (also known as pseudo-symbols). The expressions of the pulses {p_m(t)} can be found in closed form from the expression of the phase response q(t) of the CPM and from the value of the modulation index. Similarly, the symbols {α_{m,k}} can be expressed as a function of the information symbols {a_k} and of the modulation index. For instance, symbol α_{0,k} can be expressed as

α_{0,k} = exp{ jπh Σ_{i=0}^{k} a_i } = α_{0,k−1} exp{ jπh a_k } .
It can be shown, following the approach of Section 2.3, that the branch metric of the MAP sequence detection receiver can be expressed as

λ(a_k, α_{0,k−1}) = ℜ[ Σ_{m=0}^{M−2} x_{m,k} α*_{m,k} ] + N0 ln P(a_k)    (4.10)

where we defined

x_{m,k} = r(t) ⊗ p_m(−t)|_{t=kT} ,

i.e., x_{m,k} is the output of a filter matched to the m-th component, and we have expressed the branch metric as a function of a_k and α_{0,k−1} only, having taken advantage of the previously mentioned property of the principal pseudo-symbols. A simplified receiver is then made of a bank of M − 1 matched filters and a Viterbi detector that operates on a trellis with p states, with branch metrics (4.10), thus significantly reducing the complexity. ♦
depend on the k input bits at the same instant but also on previous information bits. For a given receiver complexity, for example a given number of encoder states for a convolutional encoder, we can look for good codes, i.e., for codes having the largest possible minimum Hamming distance, which characterizes the dominant errors and thus the asymptotic performance.

Let us now assume that the source generates the information bits at a given rate. If we encode the information by using an encoder with rate k/n, due to the redundancy insertion we need to increase the signaling frequency of our transmission system by a factor n/k. This means that the bandwidth of the transmitted signal will be expanded by a factor n/k, i.e., by the inverse of the code rate. In other words, the redundancy is introduced in the frequency domain. As a reward, the code will provide an energy gain in the sense that, for a given error probability, we can transmit at a lower power. Error-correcting coding is thus a valuable tool for transmission systems in which the power is a limited resource, since it allows power savings at the expense of an increase in bandwidth and in complexity (at both transmitter and receiver).
Let us now consider a transmission system for which the bandwidth is
a limited resource and cannot be increased. The bandwidth increase can
be avoided by enlarging the signal set, that is, by employing a higher-order
constellation, to compensate for the redundancy introduced by the code.
This possibility is discussed in the following example.
for the same error probability. We could conclude that the coded system can guarantee a gain only if a very powerful code, with a gain much larger than the intrinsic penalty of 3.5 dB, is employed. Since codes with a gain of at least 3.5 dB are very complex, it seems that the price to be paid is a very high decoding complexity. ♦
This point of view implicitly assumes that modulation and coding are sep-
arately designed. We will see that a change of perspective is required. In fact,
coding and modulation must be designed jointly while the receiver, instead of
performing demodulation and decoding in two separate steps, must combine
the two operations into one. In this way, the parameter governing the system
performance will be no more the minimum Hamming distance but, at least
on AWGN channels in the absence of ISI, the minimum Euclidean distance
between the transmitted sequences. The idea behind coded modulations is
thus that coding must have the goal of the maximization of the minimum
Euclidean distance dmin between any possible pair of sequences.
Trellis-coded modulation (TCM) is indeed a technique based on the com-
bination of coding and modulation to increase the efficiency in bandlimited
environments. It was originally described in the seminal work of G. Unger-
boeck and I. Csajka [37] and clearly formalized by G. Ungerboeck in 1982 [38]
with reference to the AWGN channel. Before going into the details of this
technique, we have to introduce the concept of set partitioning. It is a way
to partition the original constellation that we employ on the channel into subsets whose elements have increasing minimum Euclidean distance. The following two
examples show how this partitioning can be implemented.
[Figure: set partitioning of the 8-PSK constellation A (points labeled 000–111): at the first level, subsets B0 and B1; at the second level, subsets C0, C2, C1, C3; at the third level, one-point subsets D0–D7. The minimum intra-subset distances are d_0 = 2√(E_S) sin(π/8), d_1 = √(2E_S), and d_2 = 2√(E_S).]

[Figure: set partitioning of a 16-point constellation (points labeled 0000–1111): subsets B0 and B1 at the first level, C0, C2, C1, C3 at the second level, and D0–D7 at the third level.]
We will now discuss how the set partitioning can be used in the design of a coded modulation. In the most general case, the block diagram of the encoder/modulator is shown in Fig. 4.17. Among the k information bits at the encoder input, k_1 are coded, by using an encoder of rate k_1/n, obtaining n code bits, whereas the remaining k_2 = k − k_1 bits are left uncoded. Hence, the overall encoder has a rate k/(k_2 + n). The group of n bits at the output of the binary encoder is used to select one of 2^n possible subsets in a proper way, as discussed in the following, whereas the group of k_2 uncoded bits is used to select one of the 2^{k_2} points of a subset. In fact, the information related to the subset needs a higher protection, whereas the information related to the point within the subset is intrinsically more protected, since the points within a subset are at the largest possible distance, according to the set partitioning principle. It is also clear that it is not necessary to carry out the set partitioning until we obtain subsets of one point only. In fact, it is sufficient to stop at a partitioning level such that we have 2^n subsets of 2^{k_2} points each.
TCMs are based on this idea. To be more precise, in TCMs the code rate is k/(k + 1) (or equivalently, the employed binary code has rate k_1/(k_1 + 1)) and the binary encoder in the block diagram of Fig. 4.17 can be a convolutional encoder (and thus linear) or a non-linear encoder defined as a generic finite-state machine.¹

[Figure 4.17: encoder/modulator for a coded modulation: the k = k_1 + k_2 information bits are split; k_1 bits feed a rate-k_1/n binary encoder whose n output bits select the subset, while the k_2 uncoded bits select the point within the subset; the mapper outputs one of the 2^m constellation points, m = n + k_2.]

The assignment of subsets and points to the trellis branches must follow the three rules introduced by Ungerboeck:

(a) all signal points should occur with the same frequency;

(b) parallel transitions are assigned points belonging to the same subset;

(c) transitions originating from or merging into the same state are assigned to subsets that belong to a same subset of lower order.

¹It is possible to show that block codes can also be described by using a trellis, although it is time-varying.
Figure 4.18: (a) Trellis of the binary encoder. (b) Trellis of the overall encoder.
Rule (a) guarantees that the trellis code has a regular structure. Rule (b)
represents a formalization of the concept previously expressed that the n
bits at the output of the binary encoder must select the subset whereas the
uncoded bits select the point within the subset. Finally, rule (c) is important
to improve the overall performance of the code by a proper assignment of
the subsets to the coded bits. In order to understand this, let us consider
the following design.
Example 4.4 Let us assume that we want to design a 4-state TCM encoder
to be employed with the 8-PSK constellation. Since we need 3 bits to select
the points of a constellation of cardinality 8, it results k2 + n = 3. Since, as
said, in TCM the code rate is k/(k + 1), we will have k = 2 and thus the
information bits at the input of the scheme in Fig. 4.17 are 2. Let us denote these bits as a_m^{(1)} and a_m^{(2)}. With reference to Fig. 4.17, let us suppose that we choose k_1 = k_2 = 1 (a different choice is considered in Exercise 4.2). Thus, it is n = 2. The binary encoder must select one of 2^n = 4 subsets, i.e., one of the subsets C0, C1, C2, and C3 of second level, whereas the uncoded bit will select the point within these subsets.

The state µ_m of the binary encoder is defined as µ_m = (a_{m−1}^{(1)}, a_{m−2}^{(1)}); the corresponding trellis is shown in Fig. 4.18(a). The trellis of the overall encoder, shown in Fig. 4.18(b), is built by taking into account the further input bit a_m^{(2)}, which originates parallel transitions. In fact, given the present state, the next state will be determined by a_m^{(1)} only, independently of the value of a_m^{(2)}. Rule (b) states that parallel transitions are associated with points of the same subset of second level, and thus, at the end, the binary encoder selects the subset of second level.
[Figure 4.19: the two trellis paths associated with the symbol sequences (D0, D0, D0), i.e., points (0, 0, 0), and (D2, D1, D2), i.e., points (2, 1, 2).]

Rule (c) is important to maximize the Euclidean distance for error events different from those related to parallel transitions. Let us consider Fig. 4.18(a). This figure shows a possible association of the subsets of second level to the trellis branches that satisfies the three Ungerboeck rules. The corresponding association of the constellation points to the trellis branches of the overall encoder is shown in Fig. 4.18(b). Let us now find the minimum distance between pairs of code sequences that we can obtain with this coding scheme. We will compare it with that of the uncoded QPSK. The comparison is fair because both schemes carry 2 bits per signaling interval and thus have, for the same signaling frequency, the same bandwidth.
The uncoded QPSK scheme employs, equivalently, the points of subset B0 or B1, whose minimum distance is d_1 = √(2E_S). Let us now consider the coded scheme with the 8-PSK constellation. The distance between symbols belonging to the same subset of second level, thus between symbols associated with parallel transitions, is d_2 = 2√(E_S). However, we must also consider the distance between code sequences related to longer error events. Let us consider, for example, the pair of paths in the trellis related to symbols (D0, D0, D0) and (D2, D1, D2), that correspond, in terms of subsets, to sequences (C0, C0, C0) and (C2, C1, C2), as reported in Fig. 4.19. The square distance between the signals corresponding to those paths is given by (2.39), reported here for convenience:

d²(e, a) = Σ_m |c_m − ĉ_m|² .

We thus have

d² = d_1² + d_0² + d_1² = 2E_S + 4E_S sin²(π/8) + 2E_S ≃ 4.59 E_S

which results to be larger than that related to parallel transitions, which is d_2² = 4E_S.
It is easy to understand that this result is due to the fulfillment of rule (c).
It is also easy to verify that no other error event can have a lower distance. We can thus conclude that the minimum distance is 2√ES. By comparing this distance with that of uncoded QPSK, which is √(2ES), we notice that their ratio is √2. Hence, the coded modulation allows a gain of 3 dB with respect to uncoded QPSK, and this is obtained with a simple 4-state code.
An intuitive interpretation of this result is the following. The code operates at the subset level by introducing correlation in the code sequence in such a way that points of different subsets cannot be confused unless further errors occur in adjacent time instants. The uncoded bits are instead used to select points within a subset that are intrinsically at maximum distance and thus more protected. In this case, we can say that the encoder does a very good job, since the minimum distance is related to parallel transitions and thus those are the most frequent errors.
The minimum distance is able to predict the asymptotic coding gain of these schemes. Fig. 4.20 reports the bit error ratio (BER) of the designed 4-state code, obtained through a computer simulation. The performance of the uncoded system is also shown for comparison, along with that of a more complex 8-state TCM still employing the 8-PSK constellation. It can be noticed that this latter coding scheme provides a further, although very limited, gain (3.6 dB of asymptotic gain with respect to the uncoded QPSK system). The described 4-state TCM scheme is optimal in the sense that there exist no other 4-state codes with a larger minimum distance. The 8-state code has no parallel transitions since, otherwise, the minimum distance would remain that associated with them and thus no gain would be obtained with respect to the 4-state code.
Detection and decoding of these TCM schemes must be performed jointly through a search on a trellis diagram by using the Viterbi algorithm. Denoting by µ^(i) a generic encoder state, the branch metrics to be used to decode the designed 4-state TCM code with the 8-PSK constellation are

λ(a^(i), µ^(i)) = min_{c ∈ C(a^(i), µ^(i))} |y − c|²

where C(a^(i), µ^(i)) denotes the second-level subset associated with the transition corresponding to the pair (a^(i), µ^(i)). The minimum operation immediately identifies the symbol in C(a^(i), µ^(i)) and thus the most significant bit (MSB). This symbol will then be selected if the winning survivor contains that branch. Hence, the decision on this bit is that of an uncoded system. ♦
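As a sketch of this minimum operation (a hypothetical ISI-free received sample and the same second-level subsets as above), the snippet below returns both the branch metric and the index of the winning point, i.e., the tentative decision on the uncoded bit carried by the parallel transition:

```python
import numpy as np

ES = 1.0
psk8 = np.sqrt(ES) * np.exp(2j * np.pi * np.arange(8) / 8)
subsets = {i: psk8[[i, i + 4]] for i in range(4)}   # second-level subsets C_i

def branch_metric(y, subset):
    """min_{c in C} |y - c|², plus the index of the winning point
    (which fixes the uncoded bit on the parallel transition)."""
    d2 = np.abs(y - subset) ** 2
    j = int(np.argmin(d2))
    return float(d2[j]), j

# received sample close to point 4 = -sqrt(ES): within C0, point index 1 wins
lam, j = branch_metric(-0.9 + 0.05j, subsets[0])
print(lam, j)
```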
Figure 4.20: BER performance for an uncoded QPSK and two TCM schemes.
4.3 Exercises
Exercise 4.1 In the interval [kT, (k + 1)T] the complex envelope of an M-ary CPM signal is

s̃(t) = √(2Es/T) e^{j[θk(t) + ϕk]}

where

θk(t) = 2πh Σ_{i=k−L+1}^{k} a_i q(t − iT)

whereas

ϕk = [πh Σ_{i=0}^{k−L} a_i] mod 2π

is the phase state, h = n/p is the modulation index, and L is the correlation length. The number of correlative and phase states is M^{L−1} and p, respectively. Let us assume that the information symbols are equally likely.

• Show that the MAP sequence detection strategy can be expressed as

â = argmax_a Σ_{k=0}^{K−1} zk(ak, σk).

• Denoting by θ^(m,l)(t) and ϕ^(l) the phase branches and phase states associated with the trellis branch (a^(m), σ^(l)), m = 1, ..., M and l = 1, 2, ..., pM^{L−1}, show that

zk(a^(m), σ^(l)) = ∫_{kT}^{(k+1)T} { cos ϕ^(l) [rc(t) cos θ^(m,l)(t) + rs(t) sin θ^(m,l)(t)] + sin ϕ^(l) [rs(t) cos θ^(m,l)(t) − rc(t) sin θ^(m,l)(t)] } dt.
Exercise 4.2 We would like to design a 4-state TCM encoder with rate 2/3 to be employed with the 8-PSK constellation. Denoting by a_k^(1) and a_k^(2) the bits at the encoder input,

A. design a code based on a trellis whose state is defined as µk = (a_{k−1}^(1), a_{k−2}^(1));

B. show that the trellis has parallel transitions and compute the minimum Euclidean distance of this code;

C. verify that the encoder can be implemented as shown in Fig. 4.21, i.e., by employing a convolutional encoder and a mapper that associates the triplets of coded bits (c_k^(3), c_k^(2), c_k^(1)) with points of the 8-PSK constellation, as shown in the figure (mapping by set partitioning);

D. design a code based on a trellis whose state is defined as µk = (a_{k−1}^(1), a_{k−1}^(2));

E. show that the encoder trellis has no parallel transitions and compute the minimum Euclidean distance of this code.
Exercise 4.3 We would like to design a 4-state TCM encoder with rate 3/4 to be employed with the 16-QAM constellation. Denoting by a_k^(1), a_k^(2), and a_k^(3) the bits at the encoder input, design the code and compute its minimum Euclidean distance.
Chapter 5

MAP symbol detection strategy
Figure 5.1: Discrete-time equivalent model with white noise of the channel.
of the channel, reported in Fig. 5.1. In fact, samples yk can be expressed as (see Chapter 2)

yk = Σ_{l=0}^{L} f_l c_{k−l} + wk,   k = 0, 1, ..., K − 1.    (5.1)
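A minimal simulation of the discrete-time model (5.1), with hypothetical taps {f_l} and BPSK code symbols, may look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N0 = 8, 0.1
f = np.array([0.9, 0.4, 0.2])          # hypothetical channel taps, L = 2
c = rng.choice([-1.0, 1.0], size=K)    # BPSK code symbols

# y_k = sum_l f_l c_{k-l} + w_k  (c_k = 0 for k < 0), complex AWGN w_k
w = np.sqrt(N0) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
y = np.array([sum(f[l] * c[k - l] for l in range(len(f)) if k - l >= 0)
              for k in range(K)]) + w
print(y)
```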
Thus, we have to compute the probabilities P (ak |y) or, equivalently, the
probability density functions (pdfs) f (y|ak ).
The brute-force computation of P(ak|y) through the marginalization

P(ak|y) = Σ_{a0 ∈ A} ··· Σ_{a_{k−1} ∈ A} Σ_{a_{k+1} ∈ A} ··· Σ_{a_{K−1} ∈ A} P(a|y)
the system state. The pair (ak, σk) will allow us to univocally identify the symbols (ck, ck−1, ..., ck−L). We can thus compute the probability density function

γk(ak, σk) = f(yk | ak, σk) = (1/(2πN0)) exp{ −(1/(2N0)) | yk − Σ_{l=0}^{L} f_l c_{k−l} |² }.
Code symbols (ck, ck−1, ..., ck−L) can also be univocally identified by the pair represented by the state σk+1 and the information symbol ak−J at some previous instant k − J, where J depends on the employed encoder, if any. As an example, in the absence of coding we have ak−J = ak−L, being ck = ak. Hence, the pair (ak, σk) is in one-to-one correspondence with the pair represented by σk+1 and a past symbol. We will denote this pair by (a⁻_{k−J}, σ⁺_{k+1}). Similarly, given the pair (ak−J, σk+1), the corresponding pair will be denoted as (a⁺_k, σ⁻_k).
We are now able to compute the probability density function f(y_0^{K−1} | ak). In fact, we can write

f(y|ak) = Σ_{σk} αk(σk) γk(ak, σk) βk+1(σ⁺_{k+1}).    (5.5)
Remark 5.1. Quantities αk(σk), βk+1(σk+1), and γk(ak, σk) can be arbitrarily normalized. In fact, multiplying them by arbitrary constants, independent of the information symbols, does not modify the final decisions. ♦
Quantities αk(σk) and βk+1(σk+1) can be computed, for each system state, through a forward and a backward recursion, respectively. As far as the computation of αk(σk) is concerned, we have

P{σ⁻_k | a⁺_k} = P{σ⁻_k}

f(y_0^{k−1} | yk, a⁺_k, σ⁻_k) = f(y_0^{k−1} | σ⁻_k).
The initial state for each recursion can be computed starting from the initial
and final states of the encoder, if known at the receiver. As an example, let
5.2 – BCJR algorithm 145
where e^{x2−x1} is certainly lower than 1. When x2 > x1, we can reverse the roles of x1 and x2. Thus, we can write

ln(e^{x1} + e^{x2}) = max(x1, x2) + ln(1 + e^{−|x2−x1|}).

In the following, we will define

x1 ⊎ x2 = max(x1, x2) + ln(1 + e^{−|x2−x1|}).    (5.9)
On the other hand, when ln(e^{x1} + e^{x2} + e^{x3}) has to be computed, we can operate recursively by first computing

x = ln(e^{x1} + e^{x2}) = x1 ⊎ x2.

Since e^x = e^{x1} + e^{x2}, it is

ln(e^{x1} + e^{x2} + e^{x3}) = ln(e^x + e^{x3}) = x ⊎ x3 = x1 ⊎ x2 ⊎ x3 = ⊎_{i=1}^{3} xi.
Let us come back to the forward recursion, reported here for convenience:

αk+1(σk+1) = Σ_{ak−J} αk(σ⁻_k) γk(a⁺_k, σ⁻_k) P(a⁺_k).

By defining

α̊k(σk) = ln αk(σk)

γ̊k(ak, σk) = ln γk(ak, σk)

we can write

α̊k+1(σk+1) = ln Σ_{ak−J} αk(σ⁻_k) γk(a⁺_k, σ⁻_k) P(a⁺_k)
= ln Σ_{ak−J} exp{ α̊k(σ⁻_k) + γ̊k(a⁺_k, σ⁻_k) + ln P(a⁺_k) }
= ⊎_{ak−J} [ α̊k(σ⁻_k) + γ̊k(a⁺_k, σ⁻_k) + ln P(a⁺_k) ].    (5.10)
Remark 5.3. Both recursions and the final completion have exactly the same complexity. In fact, in the two recursions we have to compute S quantities through a sum (or through the new operator ⊎) involving M terms. In the completion, we instead have to compute M quantities through a sum involving S terms. If we approximate

x1 ⊎ x2 ≃ max(x1, x2)    (5.11)

the forward recursion in the logarithmic domain has exactly the same complexity as the Viterbi algorithm. The BCJR algorithm thus has a complexity roughly 3 times that of the Viterbi algorithm. ♦
Example 5.1. We said that MAP sequence and symbol detection algo-
rithms have a quite similar performance. This can be observed in Fig. 5.2
where the BER performance of both Viterbi and BCJR algorithms is shown
with reference to a rate-1/2 convolutional code with 16 states. ♦
The algorithm can also be derived for the Ungerboeck model. However, the probabilistic derivation cannot be used and it is necessary to resort to the framework based on factor graphs and the sum-product algorithm described in Chapter 8 [45].
[Figure 5.2: BER versus Eb/N0 of the uncoded system (simulation and theory) and of the coded system decoded with the BCJR and Viterbi algorithms.]
algorithm providing soft decisions and based on the MAP sequence detection
criterion. It is called soft-output Viterbi algorithm (SOVA) [48, 49].
With reference to Fig. 5.3, which represents the trellis diagram of a decoder for a 4-state binary code, the idea behind SOVA can be explained as follows. Let Λ^(m)_k be the partial metric of a generic path m at time k. Since Λ^(m)_k derives from a logarithmic likelihood function, it will be (assuming that Λ is a metric that has to be minimized)

P{path m is correct} ∝ e^{−Λ^(m)_k},   m = 1, 2.
When Λ^(1)_k < Λ^(2)_k, the Viterbi algorithm will select path 1. Thus, the probability that the Viterbi algorithm chooses the wrong path at state σk is

Pσk = e^{−Λ^(2)_k} / (e^{−Λ^(1)_k} + e^{−Λ^(2)_k}) = 1 / (1 + e^{Λ^(2)_k − Λ^(1)_k}).
With probability Pσk, which depends on the difference between the metrics of the two paths, the Viterbi algorithm will make an error in those positions where the two paths differ. Based on this principle, i.e., on the observation of the difference between the metrics of different paths ending in the same state, the probabilities of the single bits for each state and each time instant are properly updated.
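A small sketch of this reliability computation (metrics to be minimized, two paths merging in a state):

```python
import math

def p_wrong(L1, L2):
    """Probability that the VA discarded the correct path at a merge,
    given survivor metric L1 < competitor metric L2 (metrics minimized):
    P = 1 / (1 + exp(L2 - L1))."""
    return 1.0 / (1.0 + math.exp(L2 - L1))

# close metrics -> unreliable decision; metrics far apart -> reliable one
print(p_wrong(10.0, 10.2), p_wrong(10.0, 20.0))
```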
Figure 5.3: Trellis of the Viterbi decoder for a rate-1/2 convolutional code
with 4 states.
are the entropy rate and the conditional entropy rate of the channel output. The information rate will clearly change when changing the distribution of the input symbols. However, we are not looking here for the input distribution providing the maximum of the information rate, since we are interested in the case where we are constrained to use a specific input distribution (in particular, that corresponding to independent and uniformly distributed input symbols belonging to a given constellation). One of the key results of information theory is that error-free communication is, in principle, possible when the rate R of the employed code does not exceed the information rate i(x; y) [50, 51]. Notice that, when the employed code is based on a binary code with rate Rc whose coded bits are mapped onto an M-ary constellation, the rate R of the overall code (in bits/channel use) is given by the product of the rate of the binary code and the number of bits per modulation symbol, i.e., R = Rc log2 M.
In most cases of interest, it is unfortunately not possible to analytically compute the information rate i(x; y). On the other hand, the complexity of the direct numerical computation of

i_N(x; y) = (1/N) E{ log2 [ f(y_0^{N−1} | x_0^{N−1}) / f(y_0^{N−1}) ] }

is exponential in N, and the sequence i1, i2, i3, ... converges rather slowly even for very simple cases. However, there exists a simulation-based recursive algorithm, described in [52, 53, 54, 55], that can provide an accurate numerical estimate of the information rate and that only requires the availability of
5.4 – Computation of the information rate 151
the optimal MAP symbol detection algorithm for that channel. In [55], the sequence x_0^{N−1} is allowed to be Markovian and the general case of a channel with finite memory is considered. Without loss of generality, we will consider here the case of a channel with ISI described by the Forney model (5.1), i.e.,

yk = Σ_{ℓ=0}^{L} f_ℓ x_{k−ℓ} + wk    (5.13)
where²

f(yk | x_0^{N−1}) = (1/(2πN0)) exp{ −(1/(2N0)) | yk − Σ_{ℓ=0}^{L} f_ℓ x_{k−ℓ} |² }.

² These are the same assumptions we used for the derivation of the BCJR algorithm.
and thus

h(y|x) = −E{ log2 f(yk | x_0^{N−1}) }
= −∫ f(yk | x_0^{N−1}) log2 f(yk | x_0^{N−1}) dyk
= −∫ f(z) log2 f(z) dz

where

f(z) = (1/(2πN0)) exp{ −|z|²/(2N0) }.

Hence, being E{|z|²} = 2N0,

h(y|x) = log2(2πN0) + log2 e = log2(2πeN0).
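The expectation −E{log2 f(z)} for the density above can also be checked by Monte Carlo integration; the value log2(2πeN0) follows from E{|z|²} = 2N0:

```python
import numpy as np

rng = np.random.default_rng(1)
N0 = 0.25
n = 200_000

# z ~ complex Gaussian with pdf f(z) = 1/(2*pi*N0) exp(-|z|²/(2 N0))
z = np.sqrt(N0) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
log2_f = -np.log2(2 * np.pi * N0) - np.abs(z) ** 2 / (2 * N0) * np.log2(np.e)

h_mc = -log2_f.mean()                   # Monte Carlo estimate of h(y|x)
h_cf = np.log2(2 * np.pi * np.e * N0)   # closed form for this density
print(h_mc, h_cf)
```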
However, when a closed-form expression is not available, we can again
resort to a numerical simulation. In fact, if we define
As discussed at the end of Section 5.2, a simple way to avoid problems of numerical stability consists of properly scaling the metrics after each step of the recursions, and further improvements are obtained by implementing the algorithm in the logarithmic domain [47]. In this case, the additional constraint of preserving, at each time epoch k, the ratio between the terms µk(σk) and αk(σk) must be accounted for. An alternative way is the modification of the two recursions as follows:

αk+1(σk+1) = λk+1 Σ_{ak−L} αk(σ⁻_k) γk(x⁺_k, σ⁻_k) P(x⁺_k)

µk+1(σk+1) = δk+1 Σ_{σk} µk(σk) f(yk | σk, x_0^k) P(σk+1 | σk, x_0^k)

where {λk} and {δk} are positive scale factors. If these scale factors are chosen such that

Σ_{σk+1} αk+1(σk+1) = 1

Σ_{σk+1} µk+1(σk+1) = 1
then

(1/N) Σ_{k=0}^{N} log2 λk = h(y)

(1/N) Σ_{k=0}^{N} log2 δk = h(y|x).
if the maximum and the minimum outcomes differ by less than 0.05 bits per channel use) output their average as the final estimate; otherwise, increase the value of NG and repeat the procedure. Although the minimum value of N providing a given target accuracy strongly depends on the system parameters, it is unusual for the required value of N to be larger than 10⁷ symbols.
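The recipe can be sketched end to end. The following simplified example assumes real-valued BPSK over a hypothetical two-tap ISI channel (L = 1, real AWGN of variance σ², for which h(y|x) = ½ log2(2πeσ²)); it estimates h(y) from the scale factors λk of a normalized forward recursion and then the information rate:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 20_000
f = np.array([0.8, 0.6])      # hypothetical real ISI channel, L = 1
sigma2 = 0.5                  # real AWGN variance

x = rng.choice([-1.0, 1.0], size=N)
xp = np.concatenate(([1.0], x[:-1]))          # previous symbol (known start)
y = f[0] * x + f[1] * xp + np.sqrt(sigma2) * rng.standard_normal(N)

def gauss(yv, m):
    return np.exp(-(yv - m) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

states = [-1.0, 1.0]                          # state = previous symbol
alpha = np.array([0.0, 1.0])                  # start from state x_{-1} = +1
log2_lam = 0.0
for k in range(N):
    a_new = np.zeros(2)
    for j, xk in enumerate(states):           # next state is xk itself
        a_new[j] = sum(alpha[i] * gauss(y[k], f[0] * xk + f[1] * s) * 0.5
                       for i, s in enumerate(states))
    s_k = a_new.sum()                         # pre-normalization mass
    alpha = a_new / s_k                       # scaled so that sum(alpha) = 1
    log2_lam += -np.log2(s_k)                 # accumulates log2(lambda_k)

h_y = log2_lam / N                            # estimate of h(y)
h_y_given_x = 0.5 * np.log2(2 * np.pi * np.e * sigma2)
i_rate = h_y - h_y_given_x                    # bits per channel use
print(i_rate)
```

The running product of the normalizers reproduces f(y_0^{N−1}), so the accumulated log2 λk divided by N converges to h(y); subtracting the closed-form h(y|x) gives the information rate estimate.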
where the subscript P means that the average over x is with respect to the actual statistics P(x), and

Q_P(x|y) = P(x) q(y|x) / q_P(y)

is the stochastic inverse of the channel q(y|x). For what we said about the way i′(x; y) is computed, it is

i′(x; y) = Σ_x ∫ P(x) f(y|x) log2 [ q(y|x) / q_P(y) ] dy
ℓk = ln [ P(bk = 0|y) / P(bk = 1|y) ].    (5.16)

Otherwise, an approximation of it will be obtained. The pragmatic capacity is defined as I(bk; ℓk), i.e., as the mutual information of the channel having bk as input and ℓk as output. The pragmatic capacity can be used to compute an achievable lower bound on the information rate i(x; y). In fact, it is

I(bk; ℓk) ≤ (1/log2 M) i(x; y).
The proof is quite simple. For sure, it is I(bk; ℓk) ≤ I(bk; y) for the data processing inequality (C.12). The equality holds when the employed SISO detector is the optimal MAP symbol detector since, in this case, ℓk is a sufficient statistic for the detection of bit bk. Thus, for a sufficiently large value of N,

i(x; y) = (1/N) I(x; y) = (1/N) I(b; y)
= (1/N) Σ_{k=0}^{N log2 M − 1} I(bk; y | bk−1, bk−2, ..., b0)
≥ log2 M · I(bk; y)
≥ log2 M · I(bk; ℓk)

having exploited the chain rule for the mutual information and (C.10), since bits {bk} are independent.
As far as the computation of the pragmatic capacity is concerned, from (5.16) we have that⁴

P(bk = 0|y) = e^{ℓk} / (1 + e^{ℓk}) = 1 / (1 + e^{−ℓk})

P(bk = 1|y) = 1 / (1 + e^{ℓk}).

⁴ We are assuming now that the optimal MAP symbol detector is available and so ℓk is the true LLR.
By defining

f(x) = log2(1 + e^{−x})

it is

P(bk|y) = 1 / (1 + e^{−ℓk(1−2bk)})

and thus

f(ℓk(1 − 2bk)) = −log2 P(bk|y) = log2 [1/P(bk|y)].

Hence, being H(bk) = 1, the pragmatic capacity can be computed as

I(bk; ℓk) = 1 − E{f(ℓk(1 − 2bk))}.
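A sketch of this estimator for the simplest case (M = 2, BPSK over real AWGN, where the true LLR is ℓk = 2yk/σ²):

```python
import numpy as np

rng = np.random.default_rng(3)
N, sigma2 = 200_000, 0.5

b = rng.integers(0, 2, size=N)            # coded bits, equally likely
x = 1.0 - 2.0 * b                         # BPSK mapping
y = x + np.sqrt(sigma2) * rng.standard_normal(N)

llr = 2.0 * y / sigma2                    # true LLR ln P(b=0|y)/P(b=1|y)
f = lambda t: np.log2(1.0 + np.exp(-t))
I_prag = 1.0 - np.mean(f(llr * (1.0 - 2.0 * b)))   # 1 - E{f(l_k(1-2b_k))}
print(I_prag)
```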
5.7 Exercises
Exercise 5.1 Let us consider the implementation of the BCJR algorithm to perform detection in the case of differentially encoded PSK signals transmitted over a channel without ISI and with unknown initial code symbol. Demonstrate that the MAP symbol detection strategy becomes a symbol-by-symbol detection strategy with decision rule given by

ân = argmax_{an} Σ_{cn−1} exp{ (1/σ²) ℜ[ c*_{n−1} (xn a*_n + xn−1) ] }
by demonstrating that
x1 ⊎ x2 = max(x1 , x2 ) .
Demonstrate that, in this case, the forward recursion becomes exactly the
Viterbi algorithm.
Chapter 6

Reduced-complexity and adaptive receivers

S = Sc M^L
Whereas in the full-complexity trellis these symbols are associated with the considered branch, in the reduced trellis a branch is associated with symbols {ck, ck−1, ..., ck−Lr} only. We can express these metrics as

λk(ak, σk) = | yk − Σ_{ℓ=0}^{Lr} f_ℓ c_{k−ℓ} − Σ_{ℓ=Lr+1}^{L} f_ℓ c_{k−ℓ} |².    (6.3)

Symbols appearing in the first summation are thus associated with the branch of the reduced trellis we are considering. We have the problem of finding the symbols of the second summation.
Before considering the possible solutions, let us assume that the symbols appearing in the second summation are known. In this case, the second summation can be perfectly evaluated and this corresponds to the ideal cancellation of some ISI terms affecting the channel. In fact, by denoting with {ck} the transmitted sequence, we have

yk − Σ_{ℓ=Lr+1}^{L} f_ℓ c_{k−ℓ} = Σ_{ℓ=0}^{L} f_ℓ c_{k−ℓ} + wk − Σ_{ℓ=Lr+1}^{L} f_ℓ c_{k−ℓ} = Σ_{ℓ=0}^{Lr} f_ℓ c_{k−ℓ} + wk.

The algorithm performance will thus correspond to that of the channel with truncated pulse {f_ℓ}_{ℓ=0}^{Lr}.
6.1 – Reduced-state sequence detection 163
[Figure 6.1: r(t) is filtered by the WMF producing yk; the term Σ_{ℓ=Lr+1}^{L} fℓ ĉˆ_{k−ℓ}, built from the preliminary decisions ĉˆ_{k−Lr−1} of the VA through a delay line with taps f_{Lr+1}, ..., f_L, is subtracted from yk to form y′k, and the VA outputs â_{k−D}.]
Clearly, symbols {ci}_{i=k−L}^{k−Lr−1} are unknown to the receiver. However, we can proceed in an approximate way by employing the decision-feedback technique, based on the use, in the second summation, of the decisions as if they were the true symbols. The resulting scheme is shown in Fig. 6.1, where we denoted by {ĉˆk} the sequence of detected code symbols used for ISI cancellation. Notice that these symbols are obtained with a delay lower than the decision delay D of the VA. For this reason, we called them preliminary decisions (see Chapter 3).
A problem with this reduced-complexity receiver is that preliminary decisions provided by the VA are less reliable, and thus cancellation of the ISI symbols not included in the trellis definition is performed with decisions of poor quality: the higher the complexity reduction, the lower the delay of the preliminary decisions. A more effective solution consists in evaluating the second summation in (6.3) by considering the evolution of each survivor, i.e., by using the following branch metrics:

λk(ak, ωk) = | yk − Σ_{ℓ=0}^{Lr} f_ℓ c_{k−ℓ} − Σ_{ℓ=Lr+1}^{L} f_ℓ c̆_{k−ℓ}(ωk) |².    (6.4)
Figure 6.2: In PSP, the branch metric computation depends, as far as the residual ISI is concerned, on the history of the survivors that those branches extend.
Here, c̆_{k−ℓ}(ωk) denotes the code symbol c_{k−ℓ} associated with the survivor of state ωk. This time, the second summation takes different values on different trellis branches, depending on the history of the survivor that those branches extend, i.e., according to the PSP principle already described in Chapter 3 and highlighted in Fig. 6.2. It clearly provides a better performance than the previous technique. The intuitive explanation is the following: although the receiver does not know, at time k, which survivor will turn out to be the winner, this survivor will be extended for sure with decisions having the best possible quality. The performance of this technique based on the PSP principle is shown in Fig. 6.3 in terms of bit error ratio (BER) versus the signal-to-noise ratio, for the case of a BPSK transmission over a channel with impulse response characterized by f0 = f1 = f2 = f3 = 0.5 (L = 3).
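A sketch of the per-survivor branch metric (6.4) for a hypothetical four-tap channel with Lr = 1 and BPSK symbols (the survivor history is passed in explicitly here; a full receiver would read it from the survivor of ωk):

```python
import numpy as np

f = np.array([0.9, 0.5, 0.3, 0.2])    # hypothetical channel, L = 3
Lr = 1                                # reduced memory: state keeps c_{k-1} only

def branch_metric(y_k, branch, survivor_tail):
    """|y_k - sum_{l<=Lr} f_l c_{k-l} - sum_{l>Lr} f_l c(ω_k)_{k-l}|², eq. (6.4).

    branch        : (c_k, ..., c_{k-Lr}) symbols on the reduced-trellis branch
    survivor_tail : (c_{k-Lr-1}, ..., c_{k-L}) read from the survivor of ω_k
    """
    est = sum(f[l] * branch[l] for l in range(Lr + 1))
    est += sum(f[l] * survivor_tail[l - Lr - 1] for l in range(Lr + 1, len(f)))
    return abs(y_k - est) ** 2

# branch carries (c_k, c_{k-1}); the survivor supplies (c_{k-2}, c_{k-3})
print(branch_metric(1.3, (1.0, 1.0), (-1.0, 1.0)))
```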
This class of reduced-complexity receivers, whose state is defined by truncation, is a particular case of a more general class. Before describing it, we recall that in the full-complexity trellis the state can be equivalently defined as

σk = (µk; ck−1, ..., ck−L).    (6.5)

In fact, the knowledge of ck−1, ..., ck−L can be extracted from σk, whereas the pair (ak, µk) allows us to compute ck. With reference to the state definition (6.5), a reduced-complexity trellis can be obtained by defining a new state where code symbols {ck−i} are substituted by the subsets of the constellation they belong to. In other words, the state defines only the subsets that symbols {ck−i} belong to.
For a formal definition, let us call M ′ the cardinality of the alphabet
[Figure 6.3: BER versus Eb/N0 for the full-complexity receiver and the PSP-based reduced-state receivers with Lr = 2, Lr = 1, and Lr = 0.]
This state groups together all states σk defined by (6.5) having symbol ck−i belonging to the same Ik−i(i), for i = 1, ..., L.
In order to correctly define the state, it is required that, given the state ωk and the subset Ik(1) symbol ck belongs to, the next state ωk+1 is univocally determined. In fact, since
Figure 6.4: Original constellation and relevant partitions for Example 6.1.
partitions must be such that Ω(i) is a further partition of Ω(i + 1). In this way, Ik−i(i) univocally determines Ik−i(i + 1) of the next state. In other words, the partition depths must satisfy the condition

J1 ≥ J2 ≥ ··· ≥ JL.
where Ik−1(1) ∈ Ω(1) is one of the subsets C whereas Ik−2(2) ∈ Ω(2) is one of the subsets B. In this case, the number of states is S′′ = 4 · 2 = 8. Although the number of states is the same, the receivers corresponding to the defined trellises can have a different performance.
Notice that this second approach is more general. In fact, if we choose (J1, J2) = (8, 1), we obtain exactly the state ω′k by noticing that Ik−2(2) becomes irrelevant, since Ω(2) is composed of one element only, and we can equivalently define the state as

ωk = (µk; Ik−1(1)).

In addition, this second approach allows us to define trellises with a larger variety of states. As an example, we can have a 16-state trellis (by choosing (J1, J2) = (8, 2) or (J1, J2) = (4, 4)), which cannot be obtained by truncation. ♦
If Lr is the last value of the index i such that Ji > 1, we have

ωk = (µk; Ik−1(1), ..., Ik−Lr(Lr)).

By defining

Ji = M′ for i = 1, ..., Lr
Ji = 1 for i = Lr + 1, ..., L

we obtain exactly the state (6.1) defined by truncating the memory. It is thus clear that the second class of reduced-complexity algorithms includes the first one as a special case. The number of states is

S = Π_{i=1}^{Lr} Ji for an uncoded system

S = Sc Π_{i=1}^{Lr} Ji / 2^{n−k1} for a TCM system.
where c̆k−ℓ (ωk ) denotes the code symbol at time k − ℓ associated with the
survivor of state ωk .
This technique has an interesting performance (i.e., a limited performance loss) when set partitioning follows the Ungerboeck rules already discussed in the previous chapter, i.e., when symbols within each subset have a large relative distance. In addition, the channel impulse response {fℓ}_{ℓ=0}^{L} must be minimum phase, such that the energy is concentrated at low values of ℓ, whose corresponding symbols are better represented on the reduced trellis.
The described technique based on set partitioning and PSP is called reduced-state sequence detection (RSSD) and has been proposed by three independent groups of researchers [59, 60, 61]. In practice, it consists in building a reduced trellis that is then processed with full complexity. It can also be extended to the BCJR algorithm described in Chapter 5 [62]. Other techniques are available to perform detection on the original trellis while exploring only a fraction of it (see, for example, [63, 64]). At the end of this chapter, we will describe an alternative technique, which still builds a reduced trellis that is then processed with full complexity, and that works on the Ungerboeck metrics.
where {âk−ℓ} represent the previous decisions. The minimization of (6.6) can be performed in a symbol-by-symbol fashion, since we now have a one-state trellis. The resulting scheme becomes that of the decision-feedback equalizer (DFE) shown in Fig. 6.5. The name equalizer refers to the fact that the ISI on the signal is eliminated, or reduced, before performing symbol-by-symbol detection.
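A minimal sketch of this limiting case (BPSK over a hypothetical minimum-phase channel; past ISI is cancelled with previous decisions and the result is sliced symbol by symbol):

```python
import numpy as np

rng = np.random.default_rng(4)
f = np.array([1.0, 0.5, 0.3])             # hypothetical channel, L = 2
K = 2000
a = rng.choice([-1.0, 1.0], size=K)       # uncoded BPSK symbols
y = np.convolve(a, f)[:K] + 0.05 * rng.standard_normal(K)

# one-state receiver: cancel past ISI with previous decisions, then slice
a_hat = np.zeros(K)
for k in range(K):
    isi = sum(f[l] * a_hat[k - l] for l in range(1, len(f)) if k - l >= 0)
    a_hat[k] = np.sign(y[k] - isi)        # symbol-by-symbol decision

print(np.mean(a_hat != a))                # error rate (small at high SNR)
```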
In the case of a coded modulation, the lowest complexity is achieved when the receiver state coincides with that of the encoder, i.e., ωk = µk. In this case, ISI cancellation can be implemented based on preliminary decisions
6.2 – Adaptive equalization 169
[Figures 6.5–6.7: block diagrams of the DFE (WMF output yk minus the feedback-filter output, with taps f1, ..., fL, followed by a symbol-by-symbol detector), the PSP-based scheme with one feedback filter per survivor of the VA, and the TDL-based DFE in which the detector input zk is the feedforward filter output minus a feedback filter with taps p1, ..., pN′ fed by the past decisions â.]
or PSP. With reference to this latter case, the receiver block diagram is shown in Fig. 6.6, where a bank of feedback filters is present, one for each survivor.
The DFE, which we obtained as a special case of the MAP sequence detection algorithm with complexity reduction pushed to the limit, was historically proposed well before MAP sequence detection. In fact, in the presence of ISI, the most intuitive, and presently also the most used, detection strategy is the symbol-by-symbol one with channel equalization used in the attempt to remove the ISI [65, 66, 67, 68, 69, 70, 71]. Thus, we will now investigate equalization techniques. To this aim, let us consider a receiver model characterized by special constraints on the input and feedback filters. In particular, we will suppose that these filters are discrete-time finite impulse response (FIR) filters. In the literature, they are often called tapped delay-line (TDL) filters (with a finite number of taps). The receiver structure is shown in Fig. 6.7. In this DFE, in addition to the feedforward filter, a feedback filter is also present. The analog front-end filter can be a matched filter or an approximation of it in the case of an unknown channel. The feedforward filter is characterized by N taps whose coefficients will be called {ci}, whereas the feedback filter has N′ taps and coefficients {pi}.³
In a structure like this, we need to optimize the coefficients {ci} and {pi} according to some criterion. Excluding the minimization of the symbol error

³ From now on, we will consider an uncoded transmission. As a consequence, there is no confusion between the code symbols and the equalizer's taps.
b = E{x*_k ak−d}

of dimension N. The MSE can thus be expressed through the quadratic form

E(c) = c^H A c − 2ℜ{c^H b} + σ²_a.
where we called sk the signal component in yk. Since σ²_n > 0, it is c^H A c > 0 for all c ≠ 0. In the absence of noise, the matrix is positive definite too. In fact, we would have a positive semi-definite matrix only if there existed a vector c ≠ 0 such that

E{|sk|²} = E{|c^T xk|²} = 0.

This can happen only when c^T xk is zero with probability 1, i.e., when xk is linearly dependent on xk−1, ..., xk−N+1 with probability 1, and this is clearly impossible.
▽c E = 2(Ac − b) = 0

c0 = A⁻¹ b.    (6.8)

and hence, remembering that yk = x^T_k c and the definitions of A and b, we have

(Ac − b) = 0.
c^{(k+1)} = c^{(k)} − (1/2) α ▽c E |_{c = c^{(k)}}

where

▽c E = 2(Ac − b)
= 2 [ E{x*_k x^T_k} c − E{x*_k ak−d} ]
= 2 E{ x*_k (yk − ak−d) }
= 2 E{x*_k ek}

having defined

ek = yk − ak−d.
All quantities appearing in this expression are available at the receiver. In
fact:
• vector xk is contained in the TDL;
[Figure: adaptive equalizer; the taps are adjusted from the error ek, computed against known training symbols ak−d during the training phase and against the decisions âk−d during the tracking phase.]
and

p = (p1, p2, ..., pN′)^T

we have

zk = x^T_k c + a^T_k p = u^T_k v

where we defined

U = E{u*_k u^T_k} = E{ [x*_k ; a*_k] [x^T_k  a^T_k] } = [ A  G ; G^H  σ²_a I ]

and

G = E{x*_k a^T_k}

w = E{u*_k ak−d} = [b ; 0].

The tap vector minimizing the MSE is then

v0 = U⁻¹ w.
with

ek = zk − ak−d.

It can be split into two equations related to the taps of the forward and backward filters, respectively:

c^{(k+1)} = c^{(k)} − α ek x*_k

p^{(k+1)} = p^{(k)} − α ek a*_k
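These coupled stochastic-gradient updates can be sketched as follows (hypothetical real-valued channel and parameters; known training symbols are used in place of decisions, as in the training phase):

```python
import numpy as np

rng = np.random.default_rng(5)
h = np.array([1.0, 0.6, 0.3])          # hypothetical unknown channel
K, N, Np, alpha, d = 5000, 7, 3, 0.01, 3

a = rng.choice([-1.0, 1.0], size=K)    # training symbols (known at receiver)
x = np.convolve(a, h)[:K] + 0.05 * rng.standard_normal(K)

c = np.zeros(N)                        # feedforward taps {c_i}
p = np.zeros(Np)                       # feedback taps {p_i}
err = []
for k in range(N + d + Np, K):
    xk = x[k - N + 1:k + 1][::-1]      # feedforward TDL (newest sample first)
    ak = a[k - d - Np:k - d][::-1]     # past symbols feeding the feedback TDL
    e = xk @ c + ak @ p - a[k - d]     # e_k = z_k - a_{k-d}
    c -= alpha * e * xk                # c^(k+1) = c^(k) - alpha e_k x_k
    p -= alpha * e * ak                # p^(k+1) = p^(k) - alpha e_k a_k
    err.append(e * e)

mse_tail = float(np.mean(err[-500:]))  # residual MSE after convergence
print(mse_tail)
```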
[Figures: adaptive equalization of a generic discrete-time channel hk, and adaptive channel identification, where the taps ck of the identifier are adjusted from the error ek, using training symbols first and decisions during the tracking phase.]
C(z) ≃ H(z)

having defined

c = (c0, c1, ..., cN−1)^T,   ak = (ak, ak−1, ..., ak−N+1)^T.

Assuming that the information symbols have zero mean and are uncorrelated, matrix A is diagonal and b contains the samples hk. Hence, we can now simply compute the optimal solution as

c0 = A⁻¹ b    (6.10)
▽c E = 2(Ac − b)
= 2 [ E{a*_k a^T_k} c − E{a*_k xk} ]
= 2 E{ a*_k (a^T_k c − xk) }
= −2 E{a*_k ek}.
[Figure 6.12: the VA operates on the samples yk produced by the channel {fℓ}_{ℓ=0}^{L}; its preliminary decisions â_{k−d} feed the identifier filter {f̂ℓ}_{ℓ=0}^{L}, whose taps are adjusted from the error ek between the delayed samples and the filter output.]
These branch metrics can be evaluated only if the receiver perfectly knows the channel impulse response {fℓ}_{ℓ=0}^{L}. In other words, the receiver has to first identify the discrete-time equivalent model with white noise of the channel. Let us consider Fig. 6.12, where the generic discrete-time impulse response {hk} is substituted by the discrete-time equivalent model with white noise {fℓ}_{ℓ=0}^{L}, and the generic detector is implemented through the Viterbi algorithm. The vector collecting the identified weights will be denoted by f̂^{(k)} to emphasize that it represents the estimate of the equivalent channel model {fk}. In the scheme of Fig. 6.12 we used preliminary decisions with delay d and, as a consequence, samples {yk} are also delayed by the same amount before the comparison with the output of the identifier filter. In addition, we will assume that the number of weights of this filter is exactly L + 1, which is the number of samples of the equivalent channel model.
The estimated channel impulse response can be updated by using the stochastic gradient algorithm. The presence of the delay d implies that we are estimating an "old" version of the channel. By increasing d up to the decision delay D of the VA we will have a better estimate of the channel, provided that it is slowly varying. For a fast channel, we can reduce d but, in this way, we will have decisions with a low reliability. We can thus resort to the following techniques:

A. we can use a large delay and predict f^{(k)} based on previous identifications;
where f̂^{(k)}(σk) is a per-survivor channel estimate. The branch metrics thus become

λk(ak, σk) = |ek(ak, σk)|²

and, by using the VA, the updates are performed for the pairs (ak, σk) satisfying (6.11) (i.e., along the transitions extending the survivors).
The history of CS receivers starts in the early 1970s with the work of
Falconer and Magee [76], which originated further research on the topic [77,
78, 79, 80, 81, 82]. The work of Rusek and Prlja [83] generalized the previous
works by proposing a technique to design the optimal CS receiver, from an
information theoretic point of view, for a generic linear channel.
In this section, we briefly review the main results of [83] for the special case of ISI channels. Let us consider the Forney model

yk = Σ_{ℓ=0}^{L} f_ℓ x_{k−ℓ} + wk,   k = 0, 1, ..., K − 1

where, this time, we denoted by {xk} the transmitted symbols (and not the samples at the matched filter output, as in Chapter 2). We can collect all transmitted symbols and received samples into the vectors x = (x0, x1, ..., xK−1)^T and y = (y0, y1, ..., yK−1)^T, respectively, and write

y = Fx + w

with F = Toeplitz({fℓ}).
where the Toeplitz matrices Fr and Gr, and the mismatched noise density Nr, are subject to optimization.⁷ By removing the terms irrelevant for the detection process, namely those that do not depend on the transmitted symbols x, the mismatched (auxiliary) channel law (6.13) can be redefined as

q(y|x) = exp{ 2ℜ[y^H Fr x] − x^H Gr x }    (6.14)

where, without loss of generality, Nr has been absorbed into the design of Fr and Gr. We can notice from (6.14) that the need for trellis processing arises only from the matrix Gr. In other words, when Gr is diagonal, the optimal receiver for the auxiliary channel becomes a symbol-by-symbol detector. In order to obtain an optimal receiver with a limited number of states, Gr must be constrained such that [Gr]_{i,j} = 0 for |i − j| > Lr, where Lr is the desired length of the resulting shortened channel response. To achieve an effective complexity reduction, Lr must be selected lower than the actual channel memory L.
In [83], the matrices Fr and Gr are designed to maximize a lower bound on the achievable information rate of the channel based on mismatched detection, assuming Gaussian inputs. The Toeplitz matrices Fr and Gr can be defined from two discrete sequences {fℓr} and {gℓr}, i.e.,8
Fr = Toeplitz({fℓr })
Gr = Toeplitz({gℓr }) .
Sequence {fℓr } is the impulse response of the CS filter whereas {gℓr } is the
equivalent channel response after the CS filter. They can be obtained through
the following steps [83]. Let F (ej2πf T ) and G(ej2πf T ) be the Fourier trans-
forms of {fℓr } and {gℓr }, respectively. It can be demonstrated that, for an
ISI channel with impulse response {fℓ } and a receiver trellis characterized by
7 As demonstrated in [84], (6.13) is a valid channel law although it is not necessarily a valid pdf.
8 Sequence {gℓr} is such that gℓr = 0 for |ℓ| > Lr.
Gr(e^{j2πfT}), with min_f Gr(e^{j2πfT}) > −1, the optimal CS filter can be obtained as [83]

Fr(e^{j2πfT}) = (Gr(e^{j2πfT}) + 1) F∗(e^{j2πfT}) / ( |F(e^{j2πfT})|² + 2N0 ) ,   (6.15)
where F (ej2πf T ) is the Fourier transform of the actual channel response {fℓ }.
Notably, the filter (6.15) can be seen as the cascade of an MMSE filter (see
Appendix B), that does not depend on the reduced channel memory Lr ,
followed by a filter with transfer function Gr (ej2πf T ) + 1. When the memory
Lr is equal to zero, (6.15) reduces to a classical MMSE filter.
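As a sanity check of (6.15), the frequency responses involved can be evaluated numerically on a frequency grid. This is a minimal sketch in pure Python: the channel taps and noise level are arbitrary illustrative values, and Gr(e^{j2πfT}) is passed in as a generic function of f.

```python
import cmath
import math

def freq_response(taps, f):
    # F(e^{j2πfT}) with T normalized to 1: sum_l f_l e^{-j2πfl}
    return sum(t * cmath.exp(-2j * math.pi * f * l) for l, t in enumerate(taps))

def cs_filter(taps, gr, n0, f):
    # Eq. (6.15): Fr = (Gr + 1) F* / (|F|^2 + 2 N0); gr is Gr(e^{j2πfT}) as a function of f
    F = freq_response(taps, f)
    return (gr(f) + 1.0) * F.conjugate() / (abs(F) ** 2 + 2.0 * n0)

# Illustrative values (not from the text):
taps, n0 = [0.5, 0.5, -0.5, -0.5], 0.1

def mmse(f):
    # Classical MMSE filter, i.e., the Gr ≡ 0 (Lr = 0) special case of (6.15)
    F = freq_response(taps, f)
    return F.conjugate() / (abs(F) ** 2 + 2.0 * n0)
```

With gr returning 0 for every f, cs_filter coincides with mmse at every frequency, which is exactly the degenerate case mentioned above.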
We now have to compute the optimal response Gr (ej2πf T ). This can be
done through the following steps (refer to [83] for details):
A. Compute

B(e^{j2πfT}) = 2N0 / ( |F(e^{j2πfT})|² + 2N0 )   (6.16)

and its inverse Fourier transform {bℓ}, ℓ = −Lr, . . . , Lr.

B. Define the vector b = [b1, . . . , bLr] and the Lr × Lr Toeplitz matrix B formed from the vector [b0, . . . , bLr−1] as

B = [ b0      b1      . . .  bLr−1
      b1      b0      . . .  bLr−2
      ...                    ...
      bLr−1   bLr−2   . . .  b0   ] .
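Steps A and B above can be sketched numerically as follows. The inverse Fourier transform is approximated by a rectangle rule over one period of f; the channel taps, N0, and Lr are arbitrary illustrative values, not taken from the text.

```python
import cmath
import math

def b_spectrum(taps, n0, f):
    # Eq. (6.16): B(e^{j2πfT}) = 2 N0 / (|F(e^{j2πfT})|^2 + 2 N0)
    F = sum(t * cmath.exp(-2j * math.pi * f * l) for l, t in enumerate(taps))
    return 2.0 * n0 / (abs(F) ** 2 + 2.0 * n0)

def b_coefficients(taps, n0, lr, n=2048):
    # Numerical inverse Fourier transform of B: b_l for l = -Lr, ..., Lr
    b = []
    for l in range(-lr, lr + 1):
        acc = 0.0
        for m in range(n):                      # rectangle rule over one period
            f = m / n
            acc += (b_spectrum(taps, n0, f) * cmath.exp(2j * math.pi * f * l)).real
        b.append(acc / n)
    return b

def toeplitz(first_row):
    # Symmetric Toeplitz matrix with [B]_{ij} = b_{|i-j|}
    lr = len(first_row)
    return [[first_row[abs(i - j)] for j in range(lr)] for i in range(lr)]
```

For real channel taps, |F|² is an even function of f, so the computed b_ℓ are real and satisfy b_{−ℓ} = b_ℓ, and the resulting matrix B is symmetric, as expected from the display above.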
The authors of [83] also provided a closed-form expression for the lower bound on the information rate achievable with the optimal detector for the considered reduced trellis, under the assumption of Gaussian inputs.
[Figure: block diagram of the CS receiver — received signal r(t), WMF front end sampled at kT, CS filter {fℓr}, and VA/BCJR detector.]

Figure 6.14: Achievable information rate i′(x; y) [bits/ch. use] for the CS receiver with increasing complexity on the channel with response [0.5, 0.5, −0.5, −0.5]. A BPSK modulation has been considered.
6.5 Exercises
Exercise 6.1 Consider the transmission of independent and uniformly dis-
tributed symbols belonging to a 16-QAM constellation on an ISI channel
with dispersion length L = 2.
• Define the trellis for MAP sequence detection.
• When adopting the RSSD technique, define the state of the possible
reduced trellises.
Symbols {ak} are uncorrelated, have mean zero, and have mean square value E{|ak|²} = σa². Noise samples have mean zero and autocorrelation sequence E{n_{k+ℓ} nk∗} = σn² ρℓ. The input vector is defined as

xk ≜ (xk, x_{k−1}, . . . , x_{k−N+1})T .

• Show that the channel autocorrelation matrix A = E{xk∗ xkT} has components

Aij = σa² Σ_n hn∗ h_{n+i−j} + σn² ρ_{i−j} ,   i, j = 0, 1, . . . , N − 1 .

• Compute the equalizer taps when adopting the minimum MSE criterion assuming a delay d = 0, and the corresponding minimum MSE.
Symbols {ak } are real, uncorrelated, have mean zero and mean square value
E{ak²} = σa². Samples {hk} are all zero except h0 = 1 and h1 = b. Thermal noise can be considered negligible.
• Compute the MSE Elin at the equalizer output and compare it with
the MSE at the equalizer input.
• Compute the minimum MSE at its output and compare it with Elin .
Chapter 7

Turbo codes and iterative decoding
again with a second RSC encoder with rate 1/2 to obtain a second sequence of
parity bits. The original information sequence and the parity-check sequences
are then transmitted, as shown in Fig. 7.1. We can observe that the rate of
the overall encoder is 1/3. A higher rate can be obtained through puncturing,
i.e., by transmitting fewer parity bits. As an example, in the case of the original
turbo code in [89, 90], a rate 1/2 is obtained by transmitting only odd bits
of the first parity-check sequence and only even bits of the second one.
[Figure 7.1: Parallel concatenation: the information sequence ak produces the systematic sequence ck(1) and, through the two RSC encoders, the parity-check sequences ck(2) and ck(3).]
The bit-error probability decreases in a very steep way for low-medium signal-to-noise ratio values (the
so-called waterfall region) and then, for higher values of the signal-to-noise
ratio (and typically for bit-error probability values below 10−5 ), where the
performance is governed by the minimum distance, it starts decreasing in a
very slow way (the so-called error floor, although the term floor is improper
since there is no irreducible floor in the performance).
The two encoders in the scheme of Fig. 7.1 are called component encoders
and are typically identical. As said, in the first turbo code that appeared in the literature, two RSC encoders were employed as component encoders. It
was understood later that the systematic nature is not necessary, although it
simplifies the decoder implementation [91]. On the contrary, it is fundamental
to adopt recursive component encoders in order to obtain an interleaver gain,
i.e., a turbo code whose performance improves when increasing the interleaver
length.
Recursive codes are such that the code bits at time k not only depend on
the information bits at the same instant and at previous ν instants, where ν
is the code constraint length, but on all previous bits since the encoder has
a structure with feedback connections. Starting from a non-recursive non-
systematic convolutional encoder with rate 1/n, it is possible to obtain in a
very simple way a RSC encoder with the same rate and characterized by the
same codewords, and thus with the same minimum Hamming distance dH,min .
Obviously, for a given input sequence, the corresponding codeword will be
different in the two cases. As an example, let us consider a non-recursive
non-systematic convolutional encoder with rate 1/2. The two coded bits at
time k can be expressed as
" ν #
(1)
X (1)
ck = gi ak−i mod 2 (7.1)
" i=0
ν
#
(2)
X (2)
ck = gi ak−i mod 2 (7.2)
i=0
where the sums are modulo 2. The corresponding RSC encoder can be obtained by placing at the input of the encoder shift register, having ν memory elements, not the information bit ak but an auxiliary bit wk. The coded bits can be expressed as a function of the information and auxiliary bits as

ck(1) = ak   (7.3)

ck(2) = [ Σ_{i=0}^{ν} gi(2) w_{k−i} ] mod 2   (7.4)
whereas the recursive equation that expresses the sequence of auxiliary bits as a function of the information bits is

wk = [ ak + Σ_{i=1}^{ν} gi(1) w_{k−i} ] mod 2 .   (7.5)
Another RSC encoder with the same minimum Hamming distance can be obtained by exchanging the roles of the coefficients gi(1) and gi(2), i.e., by employing the coefficients gi(2) in the update of the auxiliary bits and the coefficients gi(1) to compute the sequence of non-systematic coded bits.
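The equivalence between the two encoders, Eqs. (7.1)-(7.2) versus Eqs. (7.3)-(7.5), can be checked exhaustively for short input sequences. This is a minimal sketch; the generator coefficients below are an arbitrary small example with g0(1) = 1 (needed for the recursion to be well defined), not the generators of any code discussed in the text.

```python
from itertools import product

def conv_encode(a, g1, g2):
    # Non-recursive non-systematic rate-1/2 encoder, Eqs. (7.1)-(7.2)
    nu = len(g1) - 1
    c1 = tuple(sum(g1[i] * a[k - i] for i in range(nu + 1) if k - i >= 0) % 2
               for k in range(len(a)))
    c2 = tuple(sum(g2[i] * a[k - i] for i in range(nu + 1) if k - i >= 0) % 2
               for k in range(len(a)))
    return c1, c2

def rsc_encode(a, g1, g2):
    # Recursive systematic encoder, Eqs. (7.3)-(7.5); requires g1[0] == 1
    nu = len(g1) - 1
    w = []
    for k, ak in enumerate(a):  # recursion (7.5) producing the auxiliary bits
        w.append((ak + sum(g1[i] * w[k - i]
                           for i in range(1, nu + 1) if k - i >= 0)) % 2)
    c2 = tuple(sum(g2[i] * w[k - i] for i in range(nu + 1) if k - i >= 0) % 2
               for k in range(len(a)))
    return tuple(a), c2

# Hypothetical small generators (g1[0] = 1), for illustration only:
g1, g2 = (1, 1, 1), (1, 0, 1)
nonrec = {conv_encode(a, g1, g2) for a in product((0, 1), repeat=6)}
rsc = {rsc_encode(a, g1, g2) for a in product((0, 1), repeat=6)}
# The two encoders generate exactly the same set of codewords
```

Since a = G1 w is a bijection between input and auxiliary sequences, the two codeword sets coincide, even though a given input sequence maps to different codewords in the two cases, exactly as stated above.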
Figure 7.2: The 16-state component encoder of the turbo code proposed in [89, 90].
The 16-state convolutional encoder with rate 1/2, employed in the original
turbo code in [89, 90], is shown in Fig. 7.2, whereas Fig. 7.3 reports the 8-
state encoder with rate 1/2 employed as component encoder of the turbo
code in the UMTS (universal mobile telecommunications system) standard.
The use of turbo codes is also considered in many other standards such as
DVB (digital video broadcasting) and CCSDS (consultative committee for
space data systems) standards. For the choice of good component codes, the
reader can refer to [92].
Another fundamental component in the turbo code structure is the inter-
leaver which has to be non-uniform.2 In practice, a non-uniform interleaver
performs a random permutation. Hence, a pair of adjacent bits in the input
sequence is separated, after the permutation, by a number of bits which is
not always the same but depends on the position of the considered pair. The
minimum distance of the turbo code depends on the interleaver. Thus, the
2 A uniform interleaver operates by writing the input bits in a matrix row by row and reading them out column by column.
Figure 7.3: The 8-state component encoder of the turbo code of the UMTS standard.
some interesting advantages with respect to turbo codes. In fact, for them the interleaver gain is larger, i.e., the bit-error probability goes as M⁻³, where M is the interleaver length, whereas for turbo codes it goes as M⁻¹ [91]. Serially and parallel concatenated schemes have then been extended to larger constellations to obtain coded modulations with high spectral efficiency.
[Figure 7.5: Iterative decoder for the rate-1/3 turbo code: two BCJR decoders, fed by the received samples yk(1), yk(2), yk(3), exchange the extrinsic information ℓk^{e,out,1} and ℓk^{e,out,2} through the interleaver Π and the deinterleaver Π⁻¹; final decisions âk.]
The presence of the interleaver in the scheme of Fig. 7.1 makes the mem-
ory of the turbo code very large, despite the use of simple component en-
coders. As a consequence, the optimal MAP sequence (or symbol) decoder
would be characterized by a huge number of states and thus it would be
unfeasible. For this reason, it has been proposed to resort to a suboptimal
iterative scheme whose complexity is much lower than that of the optimal
decoder but, as empirically verified, with a performance very close to that of
the optimal one. The decoder for the turbo code with rate 1/3 of Fig. 7.1
is shown in Fig. 7.5. As said, its operation principle is similar to that of
the turbo engine. The received samples corresponding to the information se-
quence and the first parity-check sequence are first decoded through a BCJR
algorithm (or any other SISO decoder) corresponding to the first convolu-
tional encoder. This decoder will thus provide a sequence of soft decisions
for every bit of the information sequence. These soft decisions are then
interleaved and employed by a second BCJR designed for the second com-
ponent encoder, in addition to the received samples corresponding to the
second parity-check sequence. The soft decisions produced by the second de-
coder are then deinterleaved and fed back to the first component decoder for
the next iteration, where the additional information provided by the second
decoder are employed by the first decoder to produce more reliable soft decisions. This iterative process will proceed for 10–20 iterations, after which
the final decisions will be taken.
Both component decoders in the scheme of Fig. 7.5 are, as said, soft-
output decoders. Hence, they additionally provide information about the
reliability of their decisions. The basic principle of the iterative decoding is,
in fact, the following: each component decoder employs the “suggestions” of
the other decoder to provide more and more reliable decisions. Let us see in
detail how the reliability information is produced and employed.
In order to speed up the convergence of the iterative process, each decoder
has to receive at its input an information that it did not produce. For this
reason, in [89, 90] the concept of extrinsic information has been introduced,
to identify the part of the reliability information produced by a decoder
that does not depend on the information that the decoder itself has received
(see also [94]). Let us consider the generic component decoder, and let us
assume that it operates according to the BCJR algorithm.3 Let us denote by ℓk^{e,in} the extrinsic information at its input and by ℓk^{e,out} that at its output, with reference to the information bit ak. We used the same symbol already used in (5.8) to denote the log-likelihood ratio since every decoder indeed provides the log-likelihood ratio related to the information bit ak, except for the subtraction needed to compute the extrinsic information. The extrinsic information at the input of the other decoder is then used as an estimate of the a-priori probability of the information bits. In other words, the other decoder assumes that

ℓk^{e,in} ≃ ln [ P(ak = 0) / P(ak = 1) ]   (7.6)

3 Similar considerations hold in case we employ the SOVA.
[Figure: BER vs. Eb/N0 [dB] for the turbo code: uncoded transmission (simulated and theoretical) and performance after 1, 2, 3, 6, and 18 iterations.]
and thus

P(ak = 0) ≃ exp{ℓk^{e,in}} / ( 1 + exp{ℓk^{e,in}} )   (7.7)

P(ak = 1) = 1 − P(ak = 0) ≃ 1 / ( 1 + exp{ℓk^{e,in}} ) .   (7.8)
[Figure: BER vs. Eb/N0 [dB] after 1, 3, 6, and 12 iterations.]
expressed as

ℓk = ln [ P(ak = 0|y) / P(ak = 1|y) ] = ln [ f(y|ak = 0) P(ak = 0) / ( f(y|ak = 1) P(ak = 1) ) ]
   = ln [ f(y|ak = 0) / f(y|ak = 1) ] + ℓk^{e,in} .   (7.9)
The first term of the right-hand side of (7.9) is the output extrinsic information, which can thus be obtained as

ℓk^{e,out} = ℓk − ℓk^{e,in} .   (7.10)
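Relations (7.7)-(7.8) and (7.10) amount to a few lines of arithmetic. The sketch below is numerically safe only for moderate LLR values (for large |ℓ| the exponential over- or underflows and a stabilized form would be needed).

```python
import math

def probs_from_llr(llr):
    """Eqs. (7.7)-(7.8): map the LLR l = ln P(a=0)/P(a=1) to probabilities."""
    e = math.exp(llr)
    p0 = e / (1.0 + e)
    return p0, 1.0 - p0

def extrinsic_out(llr_total, llr_ext_in):
    """Eq. (7.10): subtract the input extrinsic information from the full LLR."""
    return llr_total - llr_ext_in
```

Round-tripping confirms consistency: feeding the probabilities back into ln(p0/p1) recovers the original LLR, and the extrinsic output is just the a-posteriori LLR minus the a-priori contribution.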
[Figure 7.8: Parallel turbo decoder: BCJR 1 and BCJR 2, fed by y(1), y(2), y(3), with a-priori inputs A1, A2, a-posteriori outputs D1, D2, and extrinsic outputs E1, E2 exchanged through Π and Π⁻¹.]
main implication, as we will see in the following, that the contribution of the received samples {yk(1)} has to be removed from the extrinsic information to avoid that each component decoder processes this contribution twice. We denoted by

Ai = ln [ P(a = 0) / P(a = 1) ] ,   Di = ln [ P(a = 0|y) / P(a = 1|y) ] ,   i = 1, 2

the input and output LLRs related to decoder i, and by Ei the extrinsic information. Clearly A2 = E1 (A1 = E2) after proper interleaving (deinterleaving), since the extrinsic information on the systematic bits E1 (E2) is passed through the interleaver (deinterleaver) to become the a-priori input A2 (A1) of the second (first) decoder. We will also define

Y = ln [ f(y(1)|a = 0) / f(y(1)|a = 1) ] .
and w is a Gaussian random variable with mean zero and variance σ². Hence

f(y(1)|a) = (1/√(2πσ²)) exp{ −(1/(2σ²)) (y(1) − x)² }

and

Y = (2/σ²) y(1) .   (7.11)

The extrinsic information can thus be computed as

E1 = D1 − Y − A1
E2 = D2 − Y − A2 .   (7.12)
Y = (2/σ²) y(1) = (2/σ²)(x + w) = μY x + nY

with μY = 2/σ² and σY² = E{nY²} = 4/σ². Thus, the mean and variance of Y are connected by

μY = σY² / 2 .
This relationship will turn out to be useful for modeling the a-priori knowl-
edge later.
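The consistency condition μY = σY²/2 can be verified with a quick Monte Carlo experiment. This is a minimal sketch with the arbitrary choices σ = 1 and x = +1.

```python
import random

rng = random.Random(42)
sigma = 1.0        # channel noise standard deviation (illustrative value)
x = 1.0            # transmitted systematic bit mapped to +1
N = 200000

# Y = (2/sigma^2)(x + w), w ~ N(0, sigma^2)
samples = [(2.0 / sigma**2) * (x + rng.gauss(0.0, sigma)) for _ in range(N)]
mean_Y = sum(samples) / N
var_Y = sum((s - mean_Y) ** 2 for s in samples) / N
# expect mean_Y ≈ 2/sigma^2 = 2 and var_Y ≈ 4/sigma^2 = 4, i.e. mean_Y ≈ var_Y/2
```

The empirical mean and variance match μY = 2/σ² and σY² = 4/σ² to within the Monte Carlo error, confirming the relation used to model the a-priori knowledge.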
The parallel decoder of Fig. 7.8 has a symmetric arrangement. Thus, the
situation for the second decoder with respect to A2 , D2 , and E2 is essentially
the same as for A1 , D1 , and E1 . Long sequence lengths make sure that tail
effects (open/terminated trellises of convolutional codes) can be neglected.
Hence, it is sufficient to focus on the first decoder for the remainder. To
simplify notation the decoder index “1” is omitted in the following. We
will try to predict the behavior of the iterative decoder by looking at the
input/output relations of individual component decoders. Since analytical
treatment of the BCJR-decoder is difficult, we will employ the following
assumptions:
These two assumptions suggest that we can model the a-priori input to the
constituent decoder by applying an independent Gaussian random variable
with variance σA2 and mean zero in conjunction with the known transmitted
systematic bits, i.e., as
A = μA x + nA .

Since A is supposed to be an LLR based on a Gaussian distribution, as in the case of Y, the mean value must fulfill the condition

μA = σA² / 2 .
We can now measure the information content of the a-priori knowledge by computing the mutual information IA = I(X; A) between the transmitted systematic bits and the LLR values A:

IA = Σ_{x=−1,1} P(x) ∫_{−∞}^{∞} f(A = ξ|x) log2 [ f(A = ξ|x) / f(A = ξ) ] dξ
   = Σ_{x=−1,1} P(x) ∫_{−∞}^{∞} f(A = ξ|x) log2 [ 2 f(A = ξ|x) / ( f(A = ξ|x = 1) + f(A = ξ|x = −1) ) ] dξ

where

f(A = ξ|x) = (1/√(2πσA²)) exp{ −(1/(2σA²)) ( ξ − (σA²/2) x )² } .
It is thus

IA = (1/2) Σ_{x=−1,1} ∫_{−∞}^{∞} f(A = ξ|x) log2 [ 2 e^{ξx/2} / ( e^{−ξ/2} + e^{ξ/2} ) ] dξ
   = (1/2) ∫_{−∞}^{∞} f(A = ξ|x = 1) log2 [ 2 e^{ξ/2} / ( e^{−ξ/2} + e^{ξ/2} ) ] dξ
   + (1/2) ∫_{−∞}^{∞} f(A = ξ|x = −1) log2 [ 2 e^{−ξ/2} / ( e^{−ξ/2} + e^{ξ/2} ) ] dξ
   = 1 − ∫_{−∞}^{∞} (1/√(2πσA²)) exp{ −(1/(2σA²)) ( ξ − σA²/2 )² } log2 ( 1 + e^{−ξ} ) dξ .

The last result is a function of σA which is not available in closed form but can be numerically computed. It will be denoted as J(σA). It holds that

lim_{σA→0} J(σA) = 0 ,   lim_{σA→∞} J(σA) = 1 .
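The function J(σA) defined above can be computed numerically, e.g., with a midpoint rule over an interval covering the Gaussian mass. This is a minimal sketch; the integration range (±10 standard deviations around the mean σA²/2) and the number of points are arbitrary choices.

```python
import math

def J(sigma_a, n=20000):
    """Numerically evaluate J(sigma_A) = 1 - E[log2(1 + e^{-xi})],
    with xi ~ N(sigma_A^2/2, sigma_A^2)."""
    if sigma_a == 0.0:
        return 0.0
    mu = sigma_a ** 2 / 2.0
    lo, hi = mu - 10.0 * sigma_a, mu + 10.0 * sigma_a
    h = (hi - lo) / n
    acc = 0.0
    for i in range(n):
        xi = lo + (i + 0.5) * h          # midpoint rule
        pdf = math.exp(-(xi - mu) ** 2 / (2.0 * sigma_a ** 2)) \
              / math.sqrt(2.0 * math.pi * sigma_a ** 2)
        acc += pdf * math.log2(1.0 + math.exp(-xi)) * h
    return 1.0 - acc
```

The computed values respect the two limits stated above (J → 0 for σA → 0, J → 1 for σA → ∞) and J is monotonically increasing, which is what makes it usable as an invertible measure of a-priori information in EXIT analysis.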
[Figure: EXIT chart of the turbo decoder. Horizontal axis: output IE2 of the second decoder becomes input IA1 of the first decoder; vertical axis: output IE1 of the first decoder becomes input IA2 of the second decoder. Transfer characteristics of the two component decoders at Eb/N0 = 0.8 dB and 0.1 dB, with decoding trajectories.]
the iterative decoder. For 0.1 dB the trajectory (lower left corner) gets stuck
after two iterations since both decoder characteristics do intersect. For 0.8
dB the trajectory is able to pass through the bottleneck. After six passes
through the decoder, increasing correlations of extrinsic information start to
show up and let the trajectory deviate from its expected zigzag-path. For
larger interleavers the trajectory stays on the characteristics for some more
passes through the decoder.
EXIT charts are very useful to predict the waterfall region. They can also be used to design the component codes, by choosing those with the lowest waterfall threshold. The main advantage of the EXIT chart for the understanding of
iterative decoding is that only simulations of individual constituent decoders
are needed to obtain the desired transfer characteristics. These can then be
used in any combination in the EXIT chart to describe the behavior of the
corresponding iterative decoder, asymptotic with respect to the interleaver
size. No resource-intensive bit-error rate simulations of the iterative decoding
scheme itself are required.
7.4 Exercises
Exercise 7.1 For the component encoders in Figs. 7.2 and 7.3, define the
encoder state μk and plot the corresponding trellis diagram. Finally, consider the trellis branch associated with the pair (ak, μk) and find the corresponding pair (a⁻_{k−J}, μ⁺_{k+1}).
Exercise 7.3 Draw the block diagram of the iterative decoder correspond-
ing to the serially concatenated scheme in Fig. 7.4. Show that, at least for one
of the two component decoders, it is required to modify the BCJR algorithm
in such a way it provides, in addition to the a-posteriori probabilities of the
information symbols, also those of code symbols. Suggest how to modify the
BCJR algorithm in this sense.
Chapter 8

Factor graphs and the sum-product algorithm

8.1 Introduction
Chapter 9

Codes for fading channels

• one sample per symbol is sufficient for optimal decoding; we will denote by {r′ℓ} the received sample sequence and by {rℓ} the corresponding deinterleaved sequence;
[Figure: Transmission scheme over the fading channel: TCM encoder producing the code symbols ck, interleaver Q, modulator, channel f(r|c, h); at the receiver, deinterleaver Q⁻¹ and decoder producing âk.]
whereas we will assume that the channel amplitude has a Rayleigh distribution. Defining ρℓ = |hℓ|², this random variable has the exponential distribution

f(ρ) = e^{−ρ} for ρ ≥ 0 , and f(ρ) = 0 for ρ < 0 ;   (9.3)
Under these assumptions, let us compute the pairwise error probability. From
(9.4) it follows that a maximum likelihood decoder that perfectly knows the
fading gains will operate on the code trellis with metric to be maximized
equal to
Λ(c) = ln f(r|c, h) = Σ_{ℓ=0}^{N−1} ln f(rℓ|cℓ, hℓ) .   (9.5)
Let us now consider two codewords c and ĉ stemming from the same state and
merging after a given number of trellis steps. Thus, given a particular channel
realization, the pairwise error probability, which represents the probability
that the decoder chooses the sequence ĉ when c is the transmitted sequence
and ĉ and c are the only two possible decoding outcomes, can be expressed
as1

Pr{Λ(ĉ) > Λ(c)|c, h} = Q( d(s, ŝ)/√2 ) ≤ (1/2) exp{ −d²(s, ŝ)/4 }
  = (1/2) exp{ −(γ/4) Σ_{ℓ∈I} |hℓ|² |ĉℓ − cℓ|² }
  = (1/2) Π_{ℓ∈I} exp{ −(γ/4) ρℓ |ĉℓ − cℓ|² }

where d(s, ŝ) is the Euclidean distance between the discrete-time signals s and ŝ corresponding to the vectors c and ĉ, respectively, i.e., sℓ = √γ hℓ cℓ and ŝℓ = √γ hℓ ĉℓ, Λ(c) and Λ(ĉ) are the path metrics corresponding to c and ĉ, respectively, and I is the set of all ℓ such that |ĉℓ − cℓ| ≠ 0. By using (9.3)

1 The following inequality has been used: Q(x) ≤ (1/2) e^{−x²/2}, x ≥ 0. Tighter upper bounds could be derived as described in [103, 104].
and the independence of the random variables {ρℓ}, the average over the channel realizations is easily computed, obtaining the average pairwise error probability

Pr{Λ(ĉ) > Λ(c)|c} = E_h{ Pr{Λ(ĉ) > Λ(c)|c, h} }
  ≤ (1/2) Π_{ℓ∈I} ∫_0^{+∞} exp{ −(γ/4) ρℓ |ĉℓ − cℓ|² } e^{−ρℓ} dρℓ
  = (1/2) Π_{ℓ∈I} ∫_0^{+∞} exp{ −ρℓ [ (γ/4) |ĉℓ − cℓ|² + 1 ] } dρℓ
  = (1/2) Π_{ℓ∈I} 1 / ( (γ/4) |ĉℓ − cℓ|² + 1 )

which can be further upper bounded as

Pr{c → ĉ} ≤ (1/2) (γ/4)^{−|I|} Π_{ℓ∈I} 1/|ĉℓ − cℓ|² .
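The exact average pairwise error probability bound and its looser high-SNR form can be compared numerically. This is a minimal sketch; the squared distances passed in are arbitrary illustrative values, not taken from a specific code.

```python
def pep_exact_bound(gamma, sq_dists):
    """(1/2) prod_{l in I} 1 / ((gamma/4)|c_hat_l - c_l|^2 + 1), Rayleigh fading."""
    p = 0.5
    for d2 in sq_dists:
        p /= (gamma / 4.0) * d2 + 1.0
    return p

def pep_high_snr(gamma, sq_dists):
    """(1/2) (gamma/4)^{-|I|} / prod_{l in I} |c_hat_l - c_l|^2."""
    p = 0.5 * (gamma / 4.0) ** (-len(sq_dists))
    for d2 in sq_dists:
        p /= d2
    return p
```

As γ grows, the two expressions converge, and the slope of the exact bound on a log-log scale is set by |I|, the code diversity: adding one more nonzero-distance position multiplies the error probability by another factor that decays as γ⁻¹.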
Code diversity criterion: The minimum code diversity |I| must be max-
imized.
Codes designed for the AWGN channel are thus suboptimal when employed on a fading channel, and a redesign is necessary. In particular, parallel transitions must be avoided since, when present, the code minimum Hamming distance turns out to be equal to one (and thus the asymptotic error probability decreases only as the inverse of the signal-to-noise ratio).
Several authors worked on the design of codes for fading channels. We will
now focus on a particular design procedure for constructing trellis codes with
optimal performance on the Rician/Rayleigh fading channel [105]. Although
this procedure applies to both conventional and multiple trellis codes (that
is, codes characterized by multiple output symbols for each trellis transition),
we will focus on the latter since their potential can be fully exploited on this
channel. In fact, when multiple trellis codes are employed, we can again design a trellis diagram with parallel paths and still have an asymptotic performance that decreases faster than linearly with the signal-to-noise ratio. As an example, by using a multiple trellis code with L symbols associated with each trellis branch, it is possible to have a code diversity of L and also to design the code in such a way that the minimum value of the term Π_{ℓ∈I} |ĉℓ − cℓ|² over error events with minimum Hamming distance is maximized. This is made possible by a specific set partitioning procedure, obviously different from that previously described, which will now be illustrated for the case of L = 2 and an M-PSK constellation.
Let A denote the original M-PSK constellation and A ⊗ A the twofold
Cartesian product of A with itself. Hence, each element of A ⊗ A is a pair of
symbols belonging to the original constellation A. In the following, we will identify the PSK symbol e^{j2πi/M}, i = 0, 1, . . . , M − 1, with the integer i. At the first partition level, the set A ⊗ A is partitioned into M sets defined by the ordered Cartesian product A ⊗ Bi, i = 0, 1, . . . , M − 1, whose p-th element, p = 0, 1, . . . , M − 1, is the ordered pair (p, [pq + i]M), where q ≤ M is a proper odd integer and [·]M denotes reduction modulo M. As an example, for M = 8 and q = 3, we obtain the following subsets:
A ⊗ B0 = {(0,0), (1,3), (2,6), (3,1), (4,4), (5,7), (6,2), (7,5)}
A ⊗ B1 = {(0,1), (1,4), (2,7), (3,2), (4,5), (5,0), (6,3), (7,6)}
A ⊗ B2 = {(0,2), (1,5), (2,0), (3,3), (4,6), (5,1), (6,4), (7,7)}
A ⊗ B3 = {(0,3), (1,6), (2,1), (3,4), (4,7), (5,2), (6,5), (7,0)}
A ⊗ B4 = {(0,4), (1,7), (2,2), (3,5), (4,0), (5,3), (6,6), (7,1)}
A ⊗ B5 = {(0,5), (1,0), (2,3), (3,6), (4,1), (5,4), (6,7), (7,2)}
A ⊗ B6 = {(0,6), (1,1), (2,4), (3,7), (4,2), (5,5), (6,0), (7,3)}
A ⊗ B7 = {(0,7), (1,2), (2,5), (3,0), (4,3), (5,6), (6,1), (7,4)}
As it can be observed, within any of the M partitions, each pair differs
from all other pairs in both elements. Hence, when the pairs of a partition
are employed to label parallel transitions of a multiple trellis code, a code
diversity of L = 2 is obtained.
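The first-level partition can be generated directly from the rule (p, [pq + i]M), and the diversity property just stated can be checked by inspection. A minimal sketch:

```python
def partition_sets(M, q):
    """A ⊗ B_i for i = 0..M-1: the p-th pair of A ⊗ B_i is (p, [pq + i]_M)."""
    return [[(p, (p * q + i) % M) for p in range(M)] for i in range(M)]

sets = partition_sets(8, 3)   # M = 8, q = 3, as in the example above
# Within each A ⊗ B_i, any two pairs differ in BOTH components,
# so labeling parallel transitions with one set yields code diversity L = 2.
```

The check succeeds because q is odd and hence coprime with M = 8: p ≠ l implies q(p − l) ≢ 0 (mod M), so the second components also differ.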
The parameter q is a key point in this partition method. In fact, its choice aims at maximizing the minimum value of the term Π_{ℓ∈I} |ĉℓ − cℓ|² on parallel transitions. Before going into details, let us observe that the set Bi+1 is simply a cyclic shift of the set Bi. Thus, since the term Π_{ℓ∈I} |ĉℓ − cℓ|² is simply the product of the squared Euclidean distances between the corresponding symbols in the pair, the adopted set partitioning guarantees that the intradistance structure of each partition A ⊗ Bi is the same. Hence, it is sufficient to study the intradistance structure of the so-called generating set A ⊗ B0. In other words, let us consider the pairs (p, [pq + i]M) and (l, [lq + i]M) of the set A ⊗ Bi. The product of the square distances between
It can be easily proven that M − q ∗ is also a valid solution. Table 9.1 reports
the optimal values of q for different values of M.
C0 ⊗ D00 = {(0,0), (2,6), (4,4), (6,2)}   C1 ⊗ D01 = {(1,3), (3,1), (5,7), (7,5)}
C0 ⊗ D10 = {(0,1), (2,7), (4,5), (6,3)}   C1 ⊗ D11 = {(1,4), (3,2), (5,0), (7,6)}
C0 ⊗ D20 = {(0,2), (2,0), (4,6), (6,4)}   C1 ⊗ D21 = {(1,5), (3,3), (5,1), (7,7)}
C0 ⊗ D30 = {(0,3), (2,1), (4,7), (6,5)}   C1 ⊗ D31 = {(1,6), (3,4), (5,2), (7,0)}
C0 ⊗ D40 = {(0,4), (2,2), (4,0), (6,6)}   C1 ⊗ D41 = {(1,7), (3,5), (5,3), (7,1)}
C0 ⊗ D50 = {(0,5), (2,3), (4,1), (6,7)}   C1 ⊗ D51 = {(1,0), (3,6), (5,4), (7,2)}
C0 ⊗ D60 = {(0,6), (2,4), (4,2), (6,0)}   C1 ⊗ D61 = {(1,1), (3,7), (5,5), (7,3)}
C0 ⊗ D70 = {(0,7), (2,5), (4,3), (6,1)}   C1 ⊗ D71 = {(1,2), (3,0), (5,6), (7,4)}
Obviously, within each of these new sets, each pair is still distinct from all other pairs in both positions. However, it is in general no longer true that the minimum value of the term Π_{ℓ∈I} |ĉℓ − cℓ|² is maximized within each set, unless in the previous step we used the value q∗ corresponding to M/2 instead of M. Hence, the choice of q∗ depends on the target partition level.
The third and subsequent steps are identical in construction to the second
step, i.e., we need to partition each set in the present level into two sets
containing the alternate rows, with the set of the previous levels obtained
by using a value of q ∗ computed as in (9.7) with M successively replaced by
M/4, M/8, and so on.
This procedure can be easily generalized to the case where L is a multiple of two. As an example, in the case of L = 4 the sets belonging to the first partition level will be the M² sets A ⊗ Bi ⊗ A ⊗ Bp, i, p = 0, 1, . . . , M − 1.
[Figure: two-state trellis with transitions labeled by the sets A ⊗ B0, A ⊗ B2, A ⊗ B4, and A ⊗ B6.]

Figure 9.2: Trellis diagram for the considered 2-state rate-4/6 multiple TCM.
When the number of sets required to satisfy the trellis is less than the
number of sets generated on a particular partition level, only those having
largest interdistance must be chosen, as in the examples below. Let us now
discuss, through a couple of practical examples, how to employ these sets in
the code construction. The examples deal with two- and four-state rate-4/6
multiple (L = 2) trellis coded 8-PSK modulations, respectively.
Example Let us now consider a 4-state rate-4/6 multiple TCM using the
8-PSK as output constellation. As before, being L = 2, two 8-PSK symbols
(hence 6 bits) are transmitted every 4 input information bits. The encoder
trellis has thus 2⁴ = 16 branches leaving each state. Since there are now four states, and assuming a completely connected encoder trellis, each transition between states has four parallel paths. The encoder trellis is shown in Fig. 9.3, where parallel paths are denoted by bold lines. For the properties of the set partition method just described, if we associate pairs from the sets C0 ⊗ Di0 and C1 ⊗ Di1 with parallel transitions, we are sure that a code diversity of 2 is obtained on them. In addition, it can be shown that the minimum value of the term Π_{ℓ∈I} |ĉℓ − cℓ|² on parallel transitions is 4, provided that the sets C0 ⊗ Di0 and C1 ⊗ Di1 are obtained through the procedure described above but with q = 1, which is the optimal value for M/2 = 4. Even in this case, not all sets C0 ⊗ Di0 and C1 ⊗ Di1 are required; only the following eight sets, for simplicity denoted as Si in the figure, are used:
S1 = {(0,0), (2,2), (4,4), (6,6)}   S2 = {(1,5), (3,7), (5,1), (7,3)}
S3 = {(0,4), (2,6), (4,0), (6,2)}   S4 = {(1,1), (3,3), (5,5), (7,7)}
S5 = {(0,2), (2,4), (4,6), (6,0)}   S6 = {(1,7), (3,1), (5,3), (7,5)}
S7 = {(0,6), (2,0), (4,2), (6,4)}   S8 = {(1,3), (3,5), (5,7), (7,1)}
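The stated minimum product distance of 4 on parallel transitions can be verified numerically for the eight sets Si. This is a sketch: the sets below are copied from the text, and the squared 8-PSK distance 4 sin²(π(a − b)/8) follows from the definition of the constellation.

```python
import math

def sq_dist(a, b, M=8):
    """Squared Euclidean distance between M-PSK symbols a and b."""
    return 4.0 * math.sin(math.pi * (a - b) / M) ** 2

S = [
    [(0, 0), (2, 2), (4, 4), (6, 6)], [(1, 5), (3, 7), (5, 1), (7, 3)],
    [(0, 4), (2, 6), (4, 0), (6, 2)], [(1, 1), (3, 3), (5, 5), (7, 7)],
    [(0, 2), (2, 4), (4, 6), (6, 0)], [(1, 7), (3, 1), (5, 3), (7, 5)],
    [(0, 6), (2, 0), (4, 2), (6, 4)], [(1, 3), (3, 5), (5, 7), (7, 1)],
]

def min_product_distance(pairs):
    """Minimum of d^2(a1,b1) * d^2(a2,b2) over all distinct pairs in the set."""
    best = float("inf")
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            (a1, a2), (b1, b2) = pairs[i], pairs[j]
            best = min(best, sq_dist(a1, b1) * sq_dist(a2, b2))
    return best
```

Every set attains the same minimum, consistent with the claim that the intradistance structure is shared across the partitions.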
Every other error event consisting of two or more branches has a Hamming distance greater than 2, regardless of which path is chosen as the correct path. Thus, the dominant term in the asymptotic symbol or bit error probability expressions corresponds again to parallel paths.
[Figure: four-state trellis with transitions labeled by the sets S1, . . . , S8; parallel paths in bold.]

Figure 9.3: Trellis diagram for the considered 4-state rate-4/6 multiple TCM.
using the deinterleaved samples rℓ and working on the trellis of the overall
encoder with branch metrics. In case of a multiple TCM with L code symbols
per trellis branch, we will clearly have the sum of L terms of this form, one
per code symbol.
In practical conditions, i.e., when this ideal channel estimator is not avail-
able, due to the presence of the interleaver, the optimal decoder can hardly
be implemented. Assuming that a knowledge on the channel statistics is
available at the receiver, a soft-input soft-output detection algorithm based
on linear prediction (the MAP symbol detection version described in Chap-
ter 8) can be used and the relevant extrinsic information (in the logarithmic
domain)
ln [ Pr{cℓ|r} / Pr{cℓ} ] = ln Pr{cℓ|r} − ln Pr{cℓ} = ln f(r|cℓ)   (9.8)
is deinterleaved and employed as the input of the decoder, as in decoding schemes for serially concatenated convolutional codes (see Chapter 7). In this case, the turbo principle [95] can be invoked and, by using a soft-output decoder, a few iterations between detector and decoder can be performed.
ri being a row vector collecting the samples received by antenna i, hi the corresponding channel gain, and ni the noise samples corresponding to antenna i.2

2 It should be noticed that, for quasi-static channels, the union bound turns out to be loose [106]. We will see later how to solve this problem.
and

Pb ≲ (Kb/2) ( 1 + d²min γ/4 )^{−nR}   (9.14)

that, asymptotically, becomes

Pb ≲ (Kb/2) [ d²min γ/4 ]^{−nR} .   (9.15)
The rate of decrease is thus proportional to the number of receive antennas. Space diversity can thus be employed instead of time diversity. We will see later that transmit antenna diversity can also be exploited through the adoption of properly designed space-time codes.
[Figure: BICM encoder and decoder. Transmitter: binary encoder, P/S and S/P conversion, interleaver Q, mapper (m bits per symbol), channel. Receiver: demodulator with m soft outputs, deinterleaver Q⁻¹, S/P and P/S conversion, binary decoder with n soft inputs.]
The idea behind BICM is very simple. If we consider two codewords of the binary code having Hamming distance dH, and thus differing in dH bits, after the interleaver these bits will likely belong to the labels of dH different coded symbols. Hence, with high probability, the corresponding M-ary codewords still have Hamming distance dH. The use of a binary code that is optimal in the sense of the free Hamming distance (and of a proper interleaver) thus ensures that the code diversity is maximized. These codes have been known since the early sixties, and thus an ad-hoc code design is not necessary. Obviously, the obtained coded schemes are not optimal, since no attempt is made to maximize the minimum value of the term Π_{ℓ∈I} |ĉℓ − cℓ|² over the set of codewords with minimum Hamming distance. However, since the code diversity is by far the most important parameter, these codes are expected to provide a very good performance and to be practically optimal.
where

Pr{cp | lab(j)(cp) = b} = 1/2^{m−1} if lab(j)(cp) = b, and 0 otherwise.
[Figure: 8-PSK constellation with bit labels 001, 010, 011, . . .]

Figure 9.5: Constellation A and partitions A0(1) and A1(1).
3 The outer decoder will use, as branch metric,

λℓ = Σ_{i=1}^{n} ln f(r|cℓ(i) = b, h)

i.e., the sum of the extrinsic information of the coded bits {cℓ(i)}_{i=1}^{n} (cf. the case of a convolutional code transmitted over an AWGN channel, discussed in Remark 2.8).
[Figure: binary encoder followed by P/S conversion feeding m parallel binary-input channels, each producing ln f(r|cℓ(i) = b).]

Figure 9.6: Equivalent parallel channel model for BICM in the case of ideal interleaving.
same state and merging after a given number of trellis steps. Our aim is
to compute the pairwise error probability Pr{c → ĉ} that, however, may
depend on the pair (c, ĉ) rather than on their difference. This is because
the binary-input channels of the BICM equivalent parallel channel model in
Fig. 9.6 may be nonsymmetric, depending on the mapping and the signal
constellation $\mathcal{A}$. In [108], a symmetrization procedure which leaves the performance unchanged is thus described.
After symmetrization, the pairwise error probability will depend, besides
the channel and the type of detection, on the Hamming distance dH between
c and ĉ, the employed mapping µ, and the signal constellation A:
The usual bound on the bit error probability of binary codes can be computed as
$$P_b \le \frac{1}{k} \sum_{d_H=1}^{\infty} w_I(d_H)\, g(d_H, \mu, \mathcal{A}) \qquad (9.21)$$
in case of a block code of rate k/n, where wI (dH ) is the input weight of error
events having Hamming distance dH . In the original paper by Zehavi [107],
a Chernoff bound on the pairwise error probability has been derived in closed
form for 8-PSK with Gray mapping and a receiver with perfect knowledge
of the channel. It cannot, however, be extended to other mappings or signal
constellations. A more general and very accurate upper bound is instead
derived in [108] based on the Bhattacharyya bound [110].
9.3 – Space-time coding for frequency-flat fading channels 227
is given by $\eta = \frac{1}{N}\log_2|\mathcal{C}|$ bits per channel use. By definition, the average information bit-energy over noise power spectral density ratio is given by $E_b/N_0 = \gamma n_T/\eta$.
Eqn. (9.23) describes the general model for a time-varying frequency-flat
MIMO channel. When N is much larger than the channel coherence time,
each codeword sees a large number of channel realizations. We can assume
that {Hℓ } is an ergodic random process and the channel is consequently
ergodic. In scenarios characterized by a limited mobility, the channel can
be assumed to be slow or quasi-static, i.e., each codeword sees only one
channel realization. In other words, Hℓ = H, ℓ = 1, 2, . . . , N. In this case,
this fading model is nonergodic. A different model for time-varying fading
channels has been introduced by Marzetta and Hochwald in [118]. They
considered a block-fading channel constant for L consecutive channel uses
and independent from block to block, modeling, as an example, a system
with quasi-static fading and frequency hopping every L channel uses. This
case will be denoted as block fading channel.
Different assumptions can be made on the knowledge of the channel gain
matrix at the transmitter and receiver. For a quasi-static channel, it is
generally assumed that H is perfectly known at the receiver since the channel
gains can be obtained fairly easily by sending a pilot sequence for channel
estimation (see [119, Section 3.9] or [120, Section 10.1]). On the contrary,
the assumption of perfect knowledge of the channel matrix at the transmitter
holds only if a delay-free error-free feedback link from receiver to transmitter
exists, allowing the receiver to send back the estimated channel gains, or if
time-division duplexing is used, where each end can estimate the channel
from the incoming signal in the reverse direction (channel reciprocity). On
the contrary, on a block fading channel, the assumption adopted in [118] is of
absence of knowledge of the channel gains at both transmitter and receiver.
The case of perfect knowledge of the channel gains at both transmitter and receiver is of little interest in this section, devoted to ST codes, since, in this case, through simple transmit precoding and receive filtering the MIMO channel can be decomposed into a set of parallel and independent SISO channels. Consider, for example, the quasi-static channel and the singular value decomposition of matrix H as [121]
H = UΣVH (9.24)
average capacity has no meaning (is strictly zero), as the channel is noner-
godic [125]. In this case, outage probability, defined as the probability that
the transmission rate exceeds the mutual information of the channel, must
be evaluated. The maximum rate that can be supported by the channel with
a given outage probability is the outage capacity (see Appendix C). As in the
case of ergodic channels, for a given outage probability, the outage capacity
increases linearly with ξ.
Finally, in the case of a block fading channel with coherence time of L
symbols and absence of knowledge of the channel at both transmitter and
receiver [118, 126], when L ≥ γ + nR, at high SNR values the capacity (in bits per channel use) can be approximated as
$$C \simeq \xi\left(1 - \frac{\xi}{L}\right)\log_2\gamma\,.$$
Hence, for L → ∞ the capacity of the noncoherent MIMO channel approaches that of the coherent channel. However, when L < γ + nR the capacity increases as $\varsigma\left(1 - \frac{\varsigma}{L}\right)\log_2\gamma$, where $\varsigma = \min(n_T, n_R, \lfloor L/2 \rfloor)$. As a consequence, it is not convenient to have a number of transmit antennas larger than ⌊L/2⌋, although, when fading is correlated, additional transmit antennas do increase capacity [127].
where k · k denotes the Frobenius norm of a matrix (i.e., the square root of
the sum of the square magnitudes of its elements). Hence, given a particular channel realization, the pairwise error probability $\Pr\{C \to \hat{C}|H\}$ can be computed as
$$\Pr\{C \to \hat{C}|H\} = Q\left(\sqrt{\frac{\gamma}{2}}\,\|H(\hat{C}-C)\|\right) \le \frac{1}{2}\exp\left\{-\frac{\gamma}{4}\|H(\hat{C}-C)\|^2\right\}. \qquad (9.29)$$
$$\|H(\hat{C}-C)\|^2 = \mathrm{trace}\left[H(\hat{C}-C)(\hat{C}-C)^H H^H\right] = \sum_{i=1}^{n_R} h_i A h_i^H\,. \qquad (9.30)$$
$$\|H(\hat{C}-C)\|^2 = \sum_{i=1}^{n_R}\sum_{j=1}^{n_T} \lambda_j |p_{i,j}|^2 \qquad (9.32)$$
and
$$\Pr\{C \to \hat{C}|H\} \le \frac{1}{2}\exp\left\{-\frac{\gamma}{4}\sum_{i=1}^{n_R}\sum_{j=1}^{\nu}\lambda_j|p_{i,j}|^2\right\} = \frac{1}{2}\prod_{i=1}^{n_R}\prod_{j=1}^{\nu}\exp\left\{-\frac{\gamma}{4}\lambda_j\rho_{i,j}\right\} \qquad (9.34)$$
having defined the random variable $\rho_{i,j} = |p_{i,j}|^2$, whose probability density function is
$$f(\rho) = \begin{cases} e^{-\rho} & \text{for } \rho \ge 0 \\ 0 & \text{for } \rho < 0\,. \end{cases} \qquad (9.35)$$
We may thus compute the average pairwise error probability⁸
$$\Pr\{C \to \hat{C}\} = \mathrm{E}_H\left\{\Pr\{C \to \hat{C}|H\}\right\} \le \frac{1}{2}\prod_{i=1}^{n_R}\prod_{j=1}^{\nu}\int \exp\left\{-\frac{\gamma}{4}\lambda_j\rho_{i,j}\right\} f(\rho_{i,j})\,d\rho_{i,j} = \frac{1}{2}\prod_{i=1}^{n_R}\prod_{j=1}^{\nu}\frac{1}{1+\frac{\gamma}{4}\lambda_j} = \frac{1}{2}\left[\prod_{j=1}^{\nu}\left(1+\frac{\gamma}{4}\lambda_j\right)\right]^{-n_R} \qquad (9.36)$$
stating that, for low values of γ (γ ≪ 1), the pairwise error probability is governed essentially by trace(A) instead of by det(A). Note that
$$\mathrm{trace}(A) = \mathrm{trace}\left[(\hat{C}-C)(\hat{C}-C)^H\right] = \|\hat{C}-C\|^2$$
which is the squared Euclidean distance between C and Ĉ. This is somehow expected since, for low values of the signal-to-noise ratio, the performance is governed by the additive noise rather than by the fading. Thus, the error probability curve changes its behavior from a waterfall shape (for small values of γ) to a linear shape (for high values of γ), where the performance is governed by the above-mentioned rank-determinant design principles.
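The average in (9.36) can be checked numerically. The sketch below (the 2 × 2 difference matrix D is an arbitrary illustration, not a codeword pair of any specific code) averages the Chernoff bound (9.29) over i.i.d. Rayleigh channel realizations and compares the result with the closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
nR, gamma = 2, 2.0

# hypothetical 2x2 codeword difference matrix D = C_hat - C
D = np.array([[1.0 + 1.0j, -1.0], [0.5j, 2.0]])
A = D @ D.conj().T
lam = np.linalg.eigvalsh(A)                  # eigenvalues lambda_j of A

# right-hand side of (9.36)
closed = 0.5 * np.prod(1.0 / (1.0 + gamma * lam / 4.0)) ** nR

# Monte Carlo average of the Chernoff bound (9.29) over Rayleigh channels
trials = 200_000
H = (rng.standard_normal((trials, nR, 2)) +
     1j * rng.standard_normal((trials, nR, 2))) / np.sqrt(2)
sq = np.linalg.norm(H @ D, axis=(1, 2)) ** 2   # ||H(C_hat - C)||^2
mc = np.mean(0.5 * np.exp(-gamma / 4.0 * sq))

print(closed, mc)   # the two values should agree closely
```

The agreement reflects the fact that each $\rho_{i,j}$ in (9.34) is exponentially distributed, so each factor averages to $1/(1+\gamma\lambda_j/4)$.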
For large values of $\nu n_R$, say $\nu_{\min} n_R \ge 4$, this linear behavior is observed for error probability values so small that a code design based on the asymptotic behavior is highly suboptimal for error probability values of interest [128, 129]. For these values, the pairwise error probability can be obtained by examining the asymptotic behavior for $\nu n_R \to \infty$. Let us come back to (9.33) and consider that, by the law of large numbers,¹⁰
⁹Note that A and Ĉ − C have the same rank. Hence, this criterion could be equivalently enunciated with reference to the matrix Ĉ − C.
$$\|H(\hat{C}-C)\|^2 \to n_R \sum_{j=1}^{\nu}\lambda_j = n_R\,\mathrm{trace}(A) = n_R\|\hat{C}-C\|^2\,.$$
Hence
$$\Pr\{C \to \hat{C}\} \le \frac{1}{2}\exp\left\{-\frac{\gamma\, n_R \|\hat{C}-C\|^2}{4}\right\}\,.$$
The following alternative code design principle thus results: provided that matrix A has rank at least 4 for all pairs of distinct codewords Ĉ and C, the minimum trace of the matrices A, which is the minimum squared Euclidean distance between Ĉ and C, should be maximized.
¹⁰Other design criteria are discussed in [130, 131].
236 Codes for fading channels
Hence
$$\Pr\{C \to \hat{C}|H\} \le \frac{1}{2}\prod_{\ell=1}^{N}\exp\left\{-\frac{\gamma}{4}\|\hat{c}_\ell - c_\ell\|^2 \|p_{1,\ell}\|^2\right\} = \frac{1}{2}\prod_{\ell\in I}\exp\left\{-\frac{\gamma}{4}\|\hat{c}_\ell - c_\ell\|^2 \|p_{1,\ell}\|^2\right\} \qquad (9.42)$$
Hence, the basic code design principles over fast frequency-flat fading channels are the following:

Code diversity criterion: The minimum diversity |I| between all pairs of distinct codewords must be maximized.

and
$$\Pr\{C \to \hat{C}\} \le \frac{1}{2}\exp\left\{-\frac{\gamma}{4}\, n_R \sum_{\ell\in I}\|\hat{c}_\ell - c_\ell\|^2\right\}\,.$$
The minimum value of $\sum_{\ell\in I}\|\hat{c}_\ell - c_\ell\|^2$ between all pairs of distinct codewords should thus also be maximized.
[Figure: receive diversity scheme: the transmitter TX sends the symbols $c_\ell$ over $n_R$ independent channels with gains $h_1[\ell], h_2[\ell], \ldots, h_{n_R}[\ell]$; the receiver RX produces the decisions $\hat{c}_\ell$.]
coefficients, can be easily derived (see also Section 9.1.2 for the case of a coded transmission). In fact, the system being memoryless (uncoded system and perfect channel knowledge), we have
$$\hat{c}_\ell = \mathop{\mathrm{argmin}}_{c_\ell} \left\|r_\ell - \sqrt{\gamma}\,h_\ell c_\ell\right\|^2 = \mathop{\mathrm{argmin}}_{c_\ell} \left|h_\ell^H r_\ell - \sqrt{\gamma}\,\|h_\ell\|^2 c_\ell\right|^2 = \mathop{\mathrm{argmax}}_{c_\ell} \left\{\Re\left[h_\ell^H r_\ell c_\ell^*\right] - \frac{\sqrt{\gamma}}{2}\|h_\ell\|^2 |c_\ell|^2\right\}\,. \qquad (9.46)$$
It is thus clear, from this latter expression, that the optimal decision rule linearly combines the received samples of the different antennas after co-phasing and weighting them with their respective channel gains. Samples from antennas experiencing better channel gains (and thus higher signal-to-noise ratio values) are emphasized more than others, which is intuitive since they are more reliable. This detection strategy is commonly known as maximal-ratio combining detection.
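A minimal sketch of this detection rule, assuming a unit-energy QPSK constellation and i.i.d. Rayleigh gains (all parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
nR, gamma = 4, 10.0
qpsk = np.exp(1j * np.pi / 4 * np.array([1, 3, 5, 7]))   # unit-energy QPSK

def mrc_detect(r, h, gamma, constellation):
    """Metric of (9.46): co-phase and weight via h^H r, then decide."""
    z = h.conj() @ r
    metric = (z * constellation.conj()).real \
        - np.sqrt(gamma) / 2 * np.linalg.norm(h) ** 2 * np.abs(constellation) ** 2
    return constellation[np.argmax(metric)]

trials, errors = 2000, 0
for _ in range(trials):
    c = rng.choice(qpsk)
    h = (rng.standard_normal(nR) + 1j * rng.standard_normal(nR)) / np.sqrt(2)
    n = (rng.standard_normal(nR) + 1j * rng.standard_normal(nR)) / np.sqrt(2)
    r = np.sqrt(gamma) * h * c + n          # per-antenna observation
    errors += mrc_detect(r, h, gamma, qpsk) != c
print(errors / trials)   # low symbol error rate at this SNR
```

The combining step $z = h^H r$ is the only channel-dependent processing; the decision itself is then symbol-by-symbol.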
It is easy to verify that it is the same optimal strategy corresponding to the following equivalent single-input single-output channel
$$\check{r}_\ell = h_\ell^H r_\ell = \sqrt{\gamma}\,\|h_\ell\|^2 c_\ell + \check{n}_\ell \qquad (9.47)$$
where, given the channel gains, the noise term $\check{n}_\ell$ is still Gaussian with variance $\|h_\ell\|^2$. Under the hypothesis that the components of $h_\ell$ are independent and identically distributed Gaussian random variables with mean zero and unit variance (Rayleigh fading environment), the random variable $\alpha_\ell = \gamma\|h_\ell\|^2$, representing the instantaneous signal-to-noise ratio, is chi-squared distributed with $2n_R$ degrees of freedom [134]. Its probability density function is thus given by
$$f(\alpha) = \begin{cases} \dfrac{\alpha^{n_R-1} e^{-\alpha/\gamma}}{\gamma^{n_R}(n_R-1)!} & \text{for } \alpha \ge 0 \\ 0 & \text{for } \alpha < 0\,. \end{cases} \qquad (9.48)$$
The average symbol error probability can thus be easily computed. From the equivalent channel model (9.47), considering, as an example, a BPSK modulation whose bit error probability for a given value of the instantaneous signal-to-noise ratio is $Q(\sqrt{2\alpha})$, we obtain the following expression for the average bit error probability:
$$P_b = \int_{-\infty}^{\infty} Q(\sqrt{2\alpha}) f(\alpha)\,d\alpha\,.$$
A closed-form expression for this probability exists and reads [135, p. 781]
$$P_b = \left[\frac{1}{2}\left(1-\sqrt{\frac{\gamma}{1+\gamma}}\right)\right]^{n_R} \sum_{m=0}^{n_R-1}\binom{n_R-1+m}{m}\left[\frac{1}{2}\left(1+\sqrt{\frac{\gamma}{1+\gamma}}\right)\right]^m\,. \qquad (9.49)$$
[Figure 9.8: Alamouti scheme: the symbol pairs $(c_1, -c_2^*)$ and $(c_2, c_1^*)$ are sent from the two transmit antennas over channel gains $h_{1,1}$ and $h_{1,2}$; a linear combiner (LIN COMB) processes $r[1], r[2]$ to produce $\tilde{r}[1], \tilde{r}[2]$, feeding separate decision devices (DEC) for $\hat{c}_1$ and $\hat{c}_2$.]
A simpler upper bound can be found by considering that $Q(\sqrt{2\alpha}) \le \frac{1}{2}e^{-\alpha}$ and hence
$$P_b \le \frac{1}{2}\int_{-\infty}^{\infty} e^{-\alpha} f(\alpha)\,d\alpha = \frac{1}{2}\,\frac{1}{(1+\gamma)^{n_R}} \qquad (9.50)$$
which, for γ → ∞, gives
$$P_b \lesssim \frac{1}{2}\,\frac{1}{\gamma^{n_R}} \qquad (9.51)$$
clearly showing that a diversity order of $n_R$ is achieved.
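Both the closed form (9.49) and the bound (9.50) are easy to evaluate numerically; the sketch below compares them for a few diversity orders (the SNR value γ = 10 is illustrative):

```python
import math

def pb_mrc(gamma, nR):
    """Closed-form average BPSK bit error probability with nR-branch MRC (9.49)."""
    mu = math.sqrt(gamma / (1.0 + gamma))
    return (0.5 * (1.0 - mu)) ** nR * sum(
        math.comb(nR - 1 + m, m) * (0.5 * (1.0 + mu)) ** m for m in range(nR))

def pb_bound(gamma, nR):
    """Upper bound (9.50)."""
    return 0.5 / (1.0 + gamma) ** nR

for nR in (1, 2, 4):
    print(nR, pb_mrc(10.0, nR), pb_bound(10.0, nR))
```

As expected, the bound always lies above the exact expression, and both decay roughly as $\gamma^{-n_R}$, exposing the diversity order.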
The motivation behind the Alamouti scheme is thus the following. In a cellular system, the base station can easily be equipped with multiple antennas with sufficient separation among them. Hence, the technique just described can be conveniently adopted in the uplink. On the contrary, since it is difficult to place multiple antennas at the mobile terminal, receive diversity can hardly be employed. The aim of the scheme proposed by Alamouti is thus to obtain transmit diversity when there are two transmit antennas.
ST block codes represent a generalization of the Alamouti scheme to the
case of nT > 2. Although they provide full diversity, there is no coding
advantage provided by ST block codes.11 However, optimal decoding can be
performed efficiently through a simple linear processing of the samples at the
output of the receive antennas.
The Alamouti scheme. Let us consider for now the case of a channel
with nT = 2 transmit antennas and nR = 1 receive antenna shown in Fig. 9.8.
The codewords have length N = 2 and the channel, perfectly known at the
receiver, is assumed to remain the same over two consecutive time intervals
(quasi-static fading over N = 2 symbol intervals). In the two considered
symbol intervals, it will be described by the matrix
¹¹To achieve an additional coding gain, one should concatenate an outer code with an inner ST block code [136, 137, 138].
$$H = [h_{1,1}, h_{1,2}]\,.$$
The codeword matrices are of the form
$$C = \begin{bmatrix} c_1 & -c_2^* \\ c_2 & c_1^* \end{bmatrix}$$
meaning that, during the first interval, symbol c1 is transmitted from the first
antenna and symbol c2 from the second antenna whereas, during the second
interval, symbol −c∗2 is transmitted from the first antenna and symbol c∗1 from
the second antenna. A rate of one symbol per channel use is thus achieved.
The corresponding received samples in the two intervals are¹²
$$r[1] = \sqrt{\gamma}\,(h_{1,1}c_1 + h_{1,2}c_2) + n[1]$$
$$r[2] = \sqrt{\gamma}\,(-h_{1,1}c_2^* + h_{1,2}c_1^*) + n[2]\,. \qquad (9.52)$$
Collecting $r[1]$ and the conjugate $r[2]^*$ into the vector $\check{r} = (r[1], r[2]^*)^T$, we can write $\check{r} = \sqrt{\gamma}\,\check{H}(c_1, c_2)^T + \check{n}$, where
$$\check{H} = \begin{bmatrix} h_{1,1} & h_{1,2} \\ h_{1,2}^* & -h_{1,1}^* \end{bmatrix}$$
and $\check{n} = (n[1], n[2]^*)^T$ is statistically equivalent to the vector $(n[1], n[2])^T$.
An alternative set of sufficient statistics is given by
$$\tilde{r}[1] = h_{1,1}^*\, r[1] + h_{1,2}\, r[2]^* = \sqrt{\gamma}\left(|h_{1,1}|^2 + |h_{1,2}|^2\right)c_1 + \tilde{n}[1]$$
$$\tilde{r}[2] = h_{1,2}^*\, r[1] - h_{1,1}\, r[2]^* = \sqrt{\gamma}\left(|h_{1,1}|^2 + |h_{1,2}|^2\right)c_2 + \tilde{n}[2]$$
where $\tilde{n}[1]$ and $\tilde{n}[2]$ are independent Gaussian noise terms with mean zero and variance $(|h_{1,1}|^2 + |h_{1,2}|^2)$. Decisions on symbols $c_1$ and $c_2$ can thus be obtained by adopting the following symbol-by-symbol rules:
$$\hat{c}_1 = \mathop{\mathrm{argmin}}_{c_1}\left|\tilde{r}[1] - \sqrt{\gamma}\left(|h_{1,1}|^2+|h_{1,2}|^2\right)c_1\right|^2$$
$$\hat{c}_2 = \mathop{\mathrm{argmin}}_{c_2}\left|\tilde{r}[2] - \sqrt{\gamma}\left(|h_{1,1}|^2+|h_{1,2}|^2\right)c_2\right|^2\,. \qquad (9.56)$$
In other words, after a proper linear combining of the received samples, detection of symbols $c_1$ and $c_2$ can be decoupled. For this reason, the Alamouti scheme is called an orthogonal design.
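The decoupling produced by the linear combiner can be verified with a few lines of code. In this noiseless sketch (illustrative symbols and channel draws) the combined statistics reduce exactly to $\sqrt{\gamma}(|h_{1,1}|^2+|h_{1,2}|^2)c_1$ and $\sqrt{\gamma}(|h_{1,1}|^2+|h_{1,2}|^2)c_2$, so the symbol-by-symbol decisions (9.56) recover the transmitted pair:

```python
import numpy as np

rng = np.random.default_rng(3)
gamma = 1.0
qpsk = np.exp(1j * np.pi / 4 * np.array([1, 3, 5, 7]))

c1, c2 = rng.choice(qpsk, 2)
h11, h12 = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)

# received samples in the two intervals (9.52), noiseless to expose the algebra
r1 = np.sqrt(gamma) * (h11 * c1 + h12 * c2)
r2 = np.sqrt(gamma) * (-h11 * np.conj(c2) + h12 * np.conj(c1))

# linear combining
rt1 = np.conj(h11) * r1 + h12 * np.conj(r2)   # = sqrt(gamma) * g * c1
rt2 = np.conj(h12) * r1 - h11 * np.conj(r2)   # = sqrt(gamma) * g * c2
g = abs(h11) ** 2 + abs(h12) ** 2

# symbol-by-symbol decisions (9.56)
c1_hat = qpsk[np.argmin(np.abs(rt1 - np.sqrt(gamma) * g * qpsk))]
c2_hat = qpsk[np.argmin(np.abs(rt2 - np.sqrt(gamma) * g * qpsk))]
print(c1_hat == c1, c2_hat == c2)   # True True
```

The cross terms in $\tilde{r}[1]$ and $\tilde{r}[2]$ cancel identically, which is the algebraic content of the orthogonality of the design.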
This scheme can be generalized to the case of multiple receive antennas. Denoting by $r_i[1]$ and $r_i[2]$ two consecutive samples at the output of antenna $i$, $i = 1, 2, \ldots, n_R$, and following the same steps as in the case $n_R = 1$, after linear combining and normalization we have the samples
$$\tilde{r}_i[1] = h_{i,1}^*\, r_i[1] + h_{i,2}\, r_i[2]^* = \sqrt{\gamma}\left(|h_{i,1}|^2+|h_{i,2}|^2\right)c_1 + \tilde{n}_i[1]$$
$$\tilde{r}_i[2] = h_{i,2}^*\, r_i[1] - h_{i,1}\, r_i[2]^* = \sqrt{\gamma}\left(|h_{i,1}|^2+|h_{i,2}|^2\right)c_2 + \tilde{n}_i[2] \qquad (9.57)$$
where $\tilde{n}_i[1]$ and $\tilde{n}_i[2]$ are independent Gaussian noise samples with variance $(|h_{i,1}|^2 + |h_{i,2}|^2)$. Optimal decisions on symbols $c_1$ and $c_2$ can thus be obtained through maximal-ratio combining. After straightforward manipulations, we obtain
$$\hat{c}_1 = \mathop{\mathrm{argmax}}_{c_1} f(\tilde{r}_1[1], \tilde{r}_2[1], \ldots, \tilde{r}_{n_R}[1]\,|\,c_1, h_{1,1}, h_{1,2}, \ldots, h_{n_R,1}, h_{n_R,2}) = \mathop{\mathrm{argmax}}_{c_1} \prod_{i=1}^{n_R} f(\tilde{r}_i[1]\,|\,c_1, h_{i,1}, h_{i,2})$$
$$= \mathop{\mathrm{argmin}}_{c_1}\left\{\sqrt{\gamma}\,|c_1|^2 \sum_{i=1}^{n_R}\left(|h_{i,1}|^2+|h_{i,2}|^2\right) - 2\,\Re\left[c_1^*\sum_{i=1}^{n_R}\tilde{r}_i[1]\right]\right\}$$
$$= \mathop{\mathrm{argmin}}_{c_1}\left|\sum_{i=1}^{n_R}\left(\tilde{r}_i[1] - \sqrt{\gamma}\,c_1\left(|h_{i,1}|^2+|h_{i,2}|^2\right)\right)\right|^2 \qquad (9.58)$$
$$\hat{c}_2 = \mathop{\mathrm{argmax}}_{c_2} f(\tilde{r}_1[2], \tilde{r}_2[2], \ldots, \tilde{r}_{n_R}[2]\,|\,c_2, h_{1,1}, h_{1,2}, \ldots, h_{n_R,1}, h_{n_R,2}) = \mathop{\mathrm{argmax}}_{c_2} \prod_{i=1}^{n_R} f(\tilde{r}_i[2]\,|\,c_2, h_{i,1}, h_{i,2})$$
$$= \mathop{\mathrm{argmin}}_{c_2}\left|\sum_{i=1}^{n_R}\left(\tilde{r}_i[2] - \sqrt{\gamma}\,c_2\left(|h_{i,1}|^2+|h_{i,2}|^2\right)\right)\right|^2\,. \qquad (9.59)$$
The performance analysis of this scheme is quite simple. From (9.58) and
(9.59), in fact, it is clear that a decision on symbol cℓ , ℓ = 1, 2 is obtained
from the following equivalent single-input single-output channel
$$\tilde{r}[\ell] = \sum_{i=1}^{n_R}\tilde{r}_i[\ell] = \sqrt{\gamma}\,c_\ell\sum_{i=1}^{n_R}\left(|h_{i,1}|^2+|h_{i,2}|^2\right) + \tilde{n}[\ell] \qquad (9.60)$$
having defined $\tilde{n}[\ell] = \sum_{i=1}^{n_R}\tilde{n}_i[\ell]$. Given the channel gains, the samples $\{\tilde{n}[\ell]\}$ are jointly Gaussian, independent, and with variance $\sum_{i=1}^{n_R}(|h_{i,1}|^2+|h_{i,2}|^2)$.
Comparing (9.60) with (9.47), it is thus clear that the Alamouti scheme with
nT = 2 transmit antennas and nR receive antennas is perfectly equivalent to
a scheme with nT = 1 transmit antenna and 2nR receive antennas and using
maximal-ratio combining, provided that the same value of γ is employed,
i.e., provided that the same power per transmit antenna is spent (meaning
that for an equal overall transmitted power, the performance of the Alamouti
scheme exhibits a 3-dB degradation). It is thus also clear that the Alamouti
scheme achieves full diversity (diversity 2nR ). This can be easily verified by
considering two distinct codewords
$$C = \begin{bmatrix} c_1 & -c_2^* \\ c_2 & c_1^* \end{bmatrix}, \qquad \hat{C} = \begin{bmatrix} \hat{c}_1 & -\hat{c}_2^* \\ \hat{c}_2 & \hat{c}_1^* \end{bmatrix}$$
Orthogonal ST block codes. The Alamouti scheme has been designed for nT = 2 transmit antennas. Orthogonal ST block codes [139, 140] represent an extension to the case nT > 2.
In the general case of nT transmit antennas, in order to design a code having a rate of 1 symbol per channel use and full diversity, we need to design a set of nT × nT (square) matrices, with elements from the employed constellation, whose rows are orthogonal to each other. This latter property, in fact, ensures that an optimal receiver can be designed based on linear processing plus symbol-by-symbol detection. Unfortunately, it is not always possible to find such an orthogonal design. For real constellations (e.g., M-PAM), it exists only for nT = 2, 4, and 8. As an example, for nT = 4, the corresponding orthogonal design is that using codeword matrices
of the form
$$C = \begin{bmatrix} c_1 & -c_2 & -c_3 & -c_4 \\ c_2 & c_1 & c_4 & -c_3 \\ c_3 & -c_4 & c_1 & c_2 \\ c_4 & c_3 & -c_2 & c_1 \end{bmatrix}\,. \qquad (9.61)$$
It is easy to prove, proceeding as done for the Alamouti code, that it achieves
full diversity. It also has a rate of 1 symbol per channel use since 4 symbols
are transmitted in 4 time slots.
On the other hand, for complex constellations, there exists a unique or-
thogonal design for nT = 2 (that proposed by Alamouti). It is, however,
possible to find many orthogonal designs by removing some of the mentioned
constraints. As an example, for nT = 4, the code having codewords
$$C = \begin{bmatrix} c_1 & -c_2 & -c_3 & -c_4 & c_1^* & -c_2^* & -c_3^* & -c_4^* \\ c_2 & c_1 & c_4 & -c_3 & c_2^* & c_1^* & c_4^* & -c_3^* \\ c_3 & -c_4 & c_1 & c_2 & c_3^* & -c_4^* & c_1^* & c_2^* \\ c_4 & c_3 & -c_2 & c_1 & c_4^* & c_3^* & -c_2^* & c_1^* \end{bmatrix} \qquad (9.62)$$
achieves full diversity [as can be easily proved by computing matrix A =
(Ĉ − C)(Ĉ − C)H ] but has a rate of 1/2 symbol per channel use since 4
symbols are transmitted in 8 time slots.
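The full-diversity claim is easy to verify numerically: for the design (9.62), the matrix $A = (\hat{C}-C)(\hat{C}-C)^H$ turns out to be a scaled identity for any pair of distinct codewords. A sketch (QPSK symbols and the random draws are illustrative):

```python
import numpy as np

def codeword(c):
    """Codeword matrix of the rate-1/2 complex orthogonal design (9.62)."""
    c1, c2, c3, c4 = c
    M = np.array([[c1, -c2, -c3, -c4],
                  [c2,  c1,  c4, -c3],
                  [c3, -c4,  c1,  c2],
                  [c4,  c3, -c2,  c1]])
    return np.hstack([M, M.conj()])

rng = np.random.default_rng(4)
qpsk = np.exp(1j * np.pi / 4 * np.array([1, 3, 5, 7]))
c = rng.choice(qpsk, 4)
c_hat = rng.choice(qpsk, 4)
while np.array_equal(c, c_hat):          # ensure the codewords are distinct
    c_hat = rng.choice(qpsk, 4)

D = codeword(c_hat) - codeword(c)
A = D @ D.conj().T
print(np.linalg.matrix_rank(A))          # 4: full diversity
print(np.allclose(A, A[0, 0] * np.eye(4)))   # A is a scaled identity
```

Since the design is linear in the symbols and their conjugates, the difference of two codewords is itself a codeword of the design evaluated at the symbol differences, which is why A collapses to a multiple of the identity.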
A mathematical framework to describe the general class of linear orthog-
onal designs has been proposed in [141]. The nT ×N matrices {C} describing
an orthogonal space-time block code and used to transmit K symbols (thus
achieving a rate of K/N symbols per channel use) can be expressed in the
form
$$C = \sum_{k=1}^{K}\left(c_k A_k + c_k^* B_k\right) \qquad (9.63)$$
where $A_k$ and $B_k$ are proper $n_T \times N$ matrices. That is, all elements of C are linear combinations of the symbols $\{c_k\}_{k=1}^{K}$ being transmitted and/or their conjugates. As an example, the Alamouti code can be described within this framework with nT = N = K = 2 and
$$A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad B_1 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 0 & -1 \\ 0 & 0 \end{bmatrix}\,.$$
Clearly, the matrices {C} must satisfy the property that their rows are orthogonal, that is, $CC^H$ must be a diagonal matrix with strictly positive elements. More precisely, the following condition must hold:
$$CC^H = \sum_{k=1}^{K} D_k |c_k|^2 \qquad (9.64)$$
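For the Alamouti code, the dispersion matrices above reproduce the codeword matrix and satisfy (9.64) with $D_1 = D_2 = I$; a short check:

```python
import numpy as np

# dispersion matrices of the Alamouti code in the framework (9.63)
A1 = np.array([[1, 0], [0, 0]]); A2 = np.array([[0, 0], [1, 0]])
B1 = np.array([[0, 0], [0, 1]]); B2 = np.array([[0, -1], [0, 0]])

def codeword(c1, c2):
    return c1 * A1 + c2 * A2 + np.conj(c1) * B1 + np.conj(c2) * B2

c1, c2 = 1 + 1j, -1 + 2j                 # arbitrary test symbols
C = codeword(c1, c2)

# the Alamouti matrix [[c1, -c2*], [c2, c1*]] is recovered,
# and (9.64) holds with D1 = D2 = I
lhs = C @ C.conj().T
rhs = (abs(c1) ** 2 + abs(c2) ** 2) * np.eye(2)
print(np.allclose(lhs, rhs))             # True
```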
having defined
$$\tilde{r}_k = \sum_{i=1}^{n_R}\left(r_i A_k^H h_i^H + h_i B_k r_i^H\right) \qquad (9.72)$$
$$\xi_k^2 = \sum_{i=1}^{n_R} h_i D_k h_i^H\,. \qquad (9.73)$$
$$\tilde{r}_k = \sqrt{\gamma}\,c_k \sum_{i=1}^{n_R} h_i D_k h_i^H + \sum_{i=1}^{n_R}\left(n_i A_k^H h_i^H + h_i B_k n_i^H\right) = \sqrt{\gamma}\,c_k \xi_k^2 + \tilde{n}_k \qquad (9.75)$$
having defined
$$\tilde{n}_k = \sum_{i=1}^{n_R}\left(n_i A_k^H h_i^H + h_i B_k n_i^H\right)$$
whose variance, given the $h_i$ and taking into account that the noise samples at the output of antenna $i$ are uncorrelated and with unit variance, is $\xi_k^2$. Hence, the detection strategy (9.71) can be considered as derived from the equivalent single-input single-output channel model (9.75), and the performance analysis can be carried out as for the Alamouti scheme, easily verifying that these schemes achieve full diversity. This can also be verified by considering two distinct codewords
$$C = \sum_{k=1}^{K}\left(c_k A_k + c_k^* B_k\right), \qquad \hat{C} = \sum_{k=1}^{K}\left(\hat{c}_k A_k + \hat{c}_k^* B_k\right) \qquad (9.76)$$
and verifying that the matrix $(\hat{C}-C)(\hat{C}-C)^H = \sum_{k=1}^{K}|\hat{c}_k - c_k|^2 D_k$ has full rank provided that $\hat{C} \ne C$.
It is easy to prove that this code does not achieve full diversity (it achieves diversity 2nR when nR receive antennas are employed). Although not all of its rows are orthogonal, we can observe that the first and fourth columns are orthogonal to the second and third ones. Hence, through a proper linear processing it is possible to decouple the decisions on symbols c₁ and c₄ from those on symbols c₂ and c₃. The decisions on symbols c₁ and c₄, and those on symbols c₂ and c₃, must however be performed jointly, thus increasing the receiver complexity with respect to that of orthogonal ST codes.
(9.65) and (9.66) no longer hold and optimal decoding becomes prohibitive.
Suboptimal decoding techniques, such as those mentioned in the following
Section 9.3.7, can be adopted. Regarding the code design, i.e., the design of the matrices Aₖ and Bₖ, a technique aimed at maximizing the mutual information between the input and the output of the channel is proposed in [143].
• Rule 1: Transitions departing from the same state differ in the second symbol only.
• Rule 2: Transitions merging at the same state differ in the first symbol only.
In fact, by following these rules, the error matrix assumes the form (for all (Ĉ, C))
$$\hat{C} - C = \begin{bmatrix} \cdots & 0 & \cdots & \beta & \cdots \\ \cdots & \alpha & \cdots & 0 & \cdots \end{bmatrix}$$
with α and β nonzero complex numbers.
with α and β nonzero complex numbers. Thus, every such error matrix has
full rank and the ST code achieves full diversity. The maximization of the
minimum determinant of matrices A = (Ĉ − C)(Ĉ − C)H having minimum
rank is a harder task. The code design is therefore performed through a
computer search [111] or through algebraic techniques [130, 144].
[Figure: branch-label tables of the two trellis codes; in (a), the four branches leaving state i, i = 0, …, 3, carry the labels i0, i1, i2, i3.]
Figure 9.9: Trellis diagrams of two ST trellis codes with (a) S = 4 and (b) S = 8 states. The QPSK symbol $e^{j2\pi i/4}$, i = 0, 1, 2, 3, is specified through the integer i.
error probability terms decay very slowly and, as a consequence, the number
of dominant terms is not limited.
A possible solution is represented by the technique described in [106]
for the application to convolutional codes, and applied in [147] to ST trellis
codes. The idea is very simple. Let us assume that we are interested in the computation of an upper bound on the bit error probability $P_b^{(U)}$ (the same considerations hold for the symbol error probability or the frame error probability). Up to now, the starting point was the computation of the pairwise error probability given a channel realization H. This was then averaged over the channel realizations and employed in the computation of an upper bound on the bit error probability. We can instead apply the technique described in Chapter 2 to compute an upper bound on the bit error probability given the channel realization, $P_b^{(U)}(H)$, clip it to one if it exceeds unity, and then perform the average over the channel realizations:
$$P_b^{(U)} = \mathrm{E}_H\left\{\min\left[1, P_b^{(U)}(H)\right]\right\}\,.$$
In other words, we are changing the order of the average and the summation: when the channel coefficients are so small that the pairwise error probability terms become close to one, producing a conditional bound larger than 1, we trivially upper bound it by 1 and then average over the channel statistics. The average can no longer be computed in closed form, and Monte Carlo averaging must be adopted.
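A sketch of this Monte Carlo averaging on a toy conditional union bound (the distances and multiplicities below are illustrative numbers, not those of a real code):

```python
import numpy as np

rng = np.random.default_rng(6)
gamma, trials = 4.0, 100_000

# toy conditional union bound: a few Chernoff terms with growing
# multiplicities (illustrative, not a real distance spectrum)
dists = np.array([1.0, 2.0, 3.0])
mult = np.array([1.0, 4.0, 12.0])

h2 = rng.exponential(1.0, size=trials)            # |h|^2 under Rayleigh fading
pb_given_h = 0.5 * (mult * np.exp(-gamma / 4 * np.outer(h2, dists))).sum(axis=1)

naive = pb_given_h.mean()                         # average of the raw bound
clipped = np.minimum(1.0, pb_given_h).mean()      # E_H{min[1, Pb(H)]}
print(naive, clipped)                             # the clipped bound is tighter
```

For bad channel draws the raw conditional bound exceeds 1; clipping before averaging is what makes the final bound meaningful.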
[Figure 9.10: DBLAST encoder: the information bits are demultiplexed 1 : L into L parallel encoder/mapper branches, whose outputs feed a spatial interleaver driving the nT transmit antennas.]
(HBLAST and VBLAST) will also be mentioned, along with alternative LST architectures such as multilayered ST codes [149], threaded ST codes [150], and wrapped ST codes [151].
In a BLAST architecture, multiple independent coded streams are dis-
tributed throughout the transmission resource array in the so-called layers.
Since the complexity of the optimal decoder is impractical, the aim is to de-
sign the layering architecture and the associated signal processing so that the
receiver can efficiently separate the individual layers and decode each of them
effectively. In other words, low-complexity suboptimal decoding schemes
based on individual decoding of the component codes and on mitigation of
the mutual interference among component codewords can be adopted.
The block diagram of the DBLAST encoder is shown in Fig. 9.10 [148]. The information bit stream is demultiplexed into L parallel substreams. Each substream is independently encoded and the code bits are mapped onto M-ary symbols belonging to a constellation $\mathcal{A}$. The resulting L codewords are collected in the row vectors $\{c^{(i)}\}_{i=1}^{L}$, of length $N' = n_T d$ symbols. These row vectors are then broken into $n_T$ small subblocks of d symbols each. We will denote by $c_j^{(i)}$ the row vector representing the j-th subblock of codeword $c^{(i)}$. These subblocks are cyclically assigned by the spatial interleaver to the transmit antennas in such a way that the codewords share a balanced presence over all $n_T$ antennas and none of the individual substreams is hostage to the worst of the $n_T$ paths. In the case of $n_T = 4$, the transmitted $n_T \times N$ codeword matrices, with $N = d(L + n_T - 1)$, have the following structure:
$$C = \begin{bmatrix} c_1^{(1)} & c_1^{(2)} & c_1^{(3)} & c_1^{(4)} & c_1^{(5)} & \ldots & \ldots & \ldots & \ldots \\ 0 & c_2^{(1)} & c_2^{(2)} & c_2^{(3)} & c_2^{(4)} & c_2^{(5)} & \ldots & \ldots & \ldots \\ 0 & 0 & c_3^{(1)} & c_3^{(2)} & c_3^{(3)} & c_3^{(4)} & c_3^{(5)} & \ldots & \ldots \\ 0 & 0 & 0 & c_4^{(1)} & c_4^{(2)} & c_4^{(3)} & c_4^{(4)} & c_4^{(5)} & \ldots \end{bmatrix} \qquad (9.77)$$
where the entries below the first diagonal layer are zeros. Symbols belonging
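The diagonal layering of (9.77) is easy to reproduce in code. The sketch below (symbolic entries; the helper `dblast_matrix` and the parameters nT = 4, d = 2, L = 3 are illustrative) builds the nT × N codeword matrix from L row codewords:

```python
import numpy as np

def dblast_matrix(codewords, nT, d):
    """Arrange L row codewords (each of length nT*d) into the diagonally
    layered DBLAST codeword matrix of (9.77)."""
    L = len(codewords)
    N = d * (L + nT - 1)
    C = np.zeros((nT, N), dtype=object)
    for i, c in enumerate(codewords):      # layer i runs along a diagonal
        for j in range(nT):                # j-th subblock goes to antenna j
            C[j, (i + j) * d:(i + j + 1) * d] = c[j * d:(j + 1) * d]
    return C

nT, d, L = 4, 2, 3
# symbolic codewords: subblock j of layer i holds the labels "c{j}^{(i)}"
cws = [np.array([f"c{j + 1}^({i + 1})" for j in range(nT) for _ in range(d)],
                dtype=object) for i in range(L)]
C = dblast_matrix(cws, nT, d)
print(C.shape)            # (4, 12): N = d(L + nT - 1)
print(C[0, 0], C[1, 0])   # layer 1 on antenna 1; antenna 2 starts with a zero
```

The zero triangles at the two ends of the matrix are exactly the entries below (and beyond) the diagonal layers.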
where we assumed that $c_{n,\ell}$ is the symbol we would like to detect. As one can observe, the interference of the symbols $\{c_{k,\ell}\}_{k=1}^{n-1}$ has been removed. The remaining symbols belong to lower layers. Hence, the samples
$$v_{n,k+(n-1)d}\,, \qquad k = 1, 2, \ldots, d\,, \quad n = 1, 2, \ldots, n_T \qquad (9.80)$$
can be used to decode layer 1 since no interference from other layers is present.
Once this layer has been decoded, the corresponding information bits at the
decoder output are encoded again and can thus be subtracted when decoding
the second layer. The process will continue layer-by-layer by using samples
$$\hat{v}_{n,k+(i+n-2)d} = v_{n,k+(i+n-2)d} - \sqrt{\gamma}\sum_{k'=n+1}^{n_T} b_{n,k'}\,\hat{c}_{k',k+(i+n-2)d}\,, \qquad k = 1, \ldots, d\,, \quad n = 1, \ldots, n_T \qquad (9.81)$$
where $\{\hat{c}_{k,\ell}\}$ are the decisions on the code symbols already taken for the previous layers, to decode layer i. Hence, the decisions needed in (9.81) are provided by earlier decoded codewords (layers). The samples $\hat{v}_{n,k+(i+n-2)d}$ in (9.81) can be expressed as
$$\hat{v}_{n,k+(i+n-2)d} = \sqrt{\gamma}\,b_{n,n}\,c_{n,k+(i+n-2)d} + \sqrt{\gamma}\sum_{k'=n+1}^{n_T} b_{n,k'}\left(c_{k',k+(i+n-2)d} - \hat{c}_{k',k+(i+n-2)d}\right) + \check{n}_{n,k+(i+n-2)d}$$
showing that, when the previously decoded codewords are all correct, detection is interference-free.
As mentioned, the ZF suppression strategy requires nR ≥ nT. This requirement can be relaxed, and a better performance obtained under the same conditions, by using minimum mean-square error (MMSE) filtering [150, 151]. The linear front end is defined, in this case, by an $n_T \times n_R$ complex matrix $F^H$, which produces the alternative sufficient statistic $V = F^H R$, whose ℓ-th column is
$$v_\ell = F^H r_\ell\,. \qquad (9.82)$$
We know that the vector $r_\ell$ can be expressed as (see (9.23))
$$r_\ell = \sqrt{\gamma}\,H c_\ell + n_\ell = \sqrt{\gamma}\sum_{k=1}^{n_T} h_k c_{k,\ell} + n_\ell \qquad (9.83)$$
having defined
$$\hat{r}_\ell = r_\ell - \sqrt{\gamma}\sum_{k=n+1}^{n_T} h_k \hat{c}_{k,\ell}\,.$$
The vectors $\hat{r}_\ell$ and $\hat{v}_\ell$ may be expressed, under the assumption of correct decisions, as
$$\hat{r}_\ell = \sqrt{\gamma}\sum_{k=1}^{n} h_k c_{k,\ell} + n_\ell \qquad (9.85)$$
$$\hat{v}_\ell = \sqrt{\gamma}\sum_{k=1}^{n} g_k c_{k,\ell} + \check{n}_\ell\,. \qquad (9.86)$$
The (n, ℓ)-th element of the $n_T \times N$ matrix $\hat{V} = (\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_N)$ can be expressed as
$$\hat{v}_{n,\ell} = f_n^H \hat{r}_\ell \qquad (9.87)$$
where $f_n$ is the n-th column of F. Since $\hat{v}_{n,\ell}$ is employed as the soft statistic associated with symbol $c_{n,\ell}$, the column $f_n$ is selected as the one minimizing the mean square error $\mathrm{E}\{|f_n^H \hat{r}_\ell - c_{n,\ell}|^2\}$, which, under the assumption of correct decisions (i.e., under the assumption that (9.85) holds), can be easily computed in closed form as [154, 155] (see also Appendix B)
$$f_n = \sqrt{\gamma}\left(I_{n_R} + \gamma\sum_{k=1}^{n} h_k h_k^H\right)^{-1} h_n\,.$$
Notice that, this time, the interference of the upper layers is not removed through filtering; rather, the joint effect of interference and noise is minimized according to the MMSE criterion. Decoding is accomplished as for the ZF strategy, by decoding a layer and cancelling it before decoding the next layer.
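A sketch of the MMSE front-end computation for one layer (random i.i.d. Rayleigh channel, illustrative parameters; `mmse_filter` is a hypothetical helper implementing the closed form above). At high SNR the filter nearly nulls the residual interferers while keeping a nonzero gain on the wanted column:

```python
import numpy as np

def mmse_filter(H, gamma, n):
    """MMSE front-end column f_n (n is 1-based): only the first n columns
    of H remain after cancellation of the already-decoded layers (9.85)."""
    nR = H.shape[0]
    Hn = H[:, :n]
    R = np.eye(nR) + gamma * (Hn @ Hn.conj().T)   # I + gamma * sum_k h_k h_k^H
    return np.sqrt(gamma) * np.linalg.solve(R, H[:, n - 1])

rng = np.random.default_rng(5)
nT, nR, gamma = 3, 4, 100.0
H = (rng.standard_normal((nR, nT)) + 1j * rng.standard_normal((nR, nT))) / np.sqrt(2)

f = mmse_filter(H, gamma, nT)                    # filter for the last layer
sig = abs(f.conj() @ H[:, nT - 1])               # gain on the wanted symbol
leak = [abs(f.conj() @ H[:, k]) for k in range(nT - 1)]   # residual interference
print(sig, leak)   # at high SNR the leakage is much smaller than the gain
```

Unlike ZF, the filter remains well defined even when nR < nT, since the regularizing identity term keeps the matrix to be inverted nonsingular.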
Other LST architectures have been conceived with the aim of improving the performance or reducing the overall receiver complexity. Horizontal BLAST (HBLAST) has a structure very similar to DBLAST, the only difference being the absence of the spatial interleaver. The corresponding encoder is shown in Fig. 9.11. As one can observe, in this scheme the number of layers is equal to the number of transmit antennas nT. In other words, each layer is univocally associated with a given transmit antenna.
[Figure 9.11: HBLAST encoder: the information bits are demultiplexed 1 : nT into nT parallel encoder/mapper branches, one per transmit antenna.]
It can be noticed that different layers are decoded with different reliability. In particular, the last detected layer has the highest reliability since, for it, the contribution of all other layers has been cancelled. A way to overcome this problem is to sort the received sequences, starting detection from the one with the highest power. This corresponds to sorting the columns of H according to their squared norms. In the case of MMSE filtering, detection proceeds as described for DBLAST, the only difference being the different allocation of codewords in matrix C. In this case also, the received sequences can be properly sorted.
Finally, in vertical BLAST (VBLAST) the different layers are not en-
coded. This simplifies the receiver structure but a less reliable cancellation
can be performed. This scheme can be concatenated with an outer channel
encoder, possibly through an interleaver. In this case, iterative detection and
decoding can be performed based on the turbo principle.
$$C = \mathcal{F}(c)$$
where the formatter $\mathcal{F}$ is defined such that the element $c_{n,\ell}$ of the codeword matrix C is related to the element $c_k$ of the codeword c by
$$c_{n,\ell} = \begin{cases} c_{k_{n,\ell}} & \text{if } 1 \le k_{n,\ell} \le N' \\ 0 & \text{otherwise} \end{cases} \qquad (9.88)$$
[Figure 9.12: Codeword matrix of a wrapped ST code with nT = 4: the symbols of the codeword c are written along diagonals of width d, leaving leading and trailing triangles of zero symbols at the two ends of the nT × N matrix.]
When the component encoder is a trellis code of rate b/nT, the corresponding wrapped ST code with d = 0 coincides with a standard ST trellis code. For d > 0, the corresponding wrapped ST code can be seen as the concatenation of a trellis code with delay diversity. Because of the lower and upper triangles of zero symbols in the codeword matrix in Fig. 9.12, there is an inherent rate loss of (nT − 1)d/N which, however, is negligible if N ≫ nT d. Moreover, if the transmission of a long sequence of codewords is envisaged, the codeword matrices can be concatenated in order to fill the leading and trailing triangles of zeros, so that no rate loss is incurred.
The wrapped ST architecture has been designed such that when the com-
ponent codewords are produced by a trellis encoder, decoding can be imple-
mented efficiently by ZF or MMSE decision-feedback interference mitigation
coupled with Viterbi decoding, through the use of per-survivor processing.
that is, to maximize the minimum number of non-zero rows in the matrix difference $\hat{C} - C = \mathcal{F}(\hat{c}) - \mathcal{F}(c)$ over all pairs of distinct codeword matrices, which is strictly related to the rank diversity of a WSTC. The block-diversity criterion has been investigated in [106, 156, 157] for the design of trellis codes for cyclic interleaving and/or periodic puncturing, and codes optimized in this sense are thus available. The relationship between the rank diversity ν of a WSTC and the block diversity δ of its component code is the following [151]:
$$\nu \le \delta \le 1 + n_T\left(1 - \frac{R_c}{\log_2 M}\right) \qquad (9.93)$$
where Rc is the rate of the component code and M is the cardinality of the employed constellation $\mathcal{A}$. Moreover, there exist values of d for which
ν = δ [151]. Since it is known from [111] that for any STC with spectral efficiency η = nT Rc the rank diversity satisfies the inequality
$$\nu \le 1 + n_T\left(1 - \frac{R_c}{\log_2 M}\right) \qquad (9.94)$$
which is the same upper bound on block diversity given in (9.93), we conclude
that the wrapping construction incurs no loss of optimality in terms of rank
diversity (for an appropriate choice of the delay d). As a matter of fact, while
it is difficult to construct codes with rank diversity equal to the upper bound
(9.94), it is very easy to find trellis codes for which the upper bound (9.93) on
δ is met with equality, for several coding rates and values of nT . Examples of
these codes are tabulated in [156, 157]. Therefore, the wrapping construction
is a powerful tool to construct STCs with optimal rank diversity.
Hence, when the diversity gain is nT nR, the multiplexing gain is zero, whereas
when r = min(nT, nR) the diversity gain is zero. For practical schemes, the
diversity/multiplexing trade-off function lies below the curve (9.97) and can
be used to compare different schemes and to interpret their behavior, as
shown in the examples that follow. As an example, for the Alamouti scheme,
the diversity/multiplexing tradeoff function is [112]
and reaches the upper bound (9.97) for r = 0 only. On the other hand, it
can be shown that BLAST schemes favor the multiplexing gain [112].
Unitary ST codes. Before going into details, let us derive the metric for
the case of noncoherent detection assuming the block fading channel model
with coherence time of L symbol intervals.
By collecting L received vectors into an nR × L matrix R, we may write

R = √γ HC + N .    (9.98)

Given C, the random variables in r_i are jointly Gaussian with mean zero and
covariance matrix

Λ = E{r_i^T r_i^*} = E{(√γ C^T h_i^T + n_i^T)(√γ h_i C + n_i)^*} = I_L + γ C^T C^*

and thus

f(R|C) = ∏_{i=1}^{nR} exp(−r_i^* Λ^{−1} r_i^T) / (π^L det Λ)
       = (π^L det Λ)^{−nR} exp{ −Σ_{i=1}^{nR} r_i^* Λ^{−1} r_i^T } .
For the block fading channel with coherence time of L symbols, it is shown
in [169] that, asymptotically, capacity is achieved when
C = VΦ
where Φ is an isotropically distributed^13 nT × L matrix whose rows are
orthonormal (hence Φ^*Φ^T = I_{nT}) and V is an independent nT × nT real
non-negative diagonal matrix. When L ≫ nT, or when L > nT and the
signal-to-noise ratio is very high, capacity can be achieved by selecting

C = √L Φ .    (9.101)
For this reason, unitary ST codes proposed in [169] have a codebook com-
posed by codewords (9.101), with Φ belonging to a set of 2ηL elements, where
η is the spectral efficiency in bits per channel use. Since H is unknown at the
receiver, the maximum likelihood decoder operates according to the following
decision rule

Φ̂ = argmax_Φ f(R|Φ) = argmax_Φ exp(−trace{Λ^{−1} R^T R^*}) / (det Λ)^{nR}
where Λ = I_L + γLΦ^TΦ^*. By using the fact that Φ^*Φ^T = I_{nT}, the
following properties

det(I + AB) = det(I + BA)
trace(AB) = trace(BA)

and the matrix inversion lemma

(A − BD^{−1}C)^{−1} = A^{−1} + A^{−1}B(D − CA^{−1}B)^{−1}CA^{−1}

we have

det Λ = det(I_L + γLΦ^TΦ^*) = det(I_{nT} + γLΦ^*Φ^T) = (1 + γL)^{nT}

and

trace{Λ^{−1} R^T R^*} = trace{(I_L + γLΦ^TΦ^*)^{−1} R^T R^*}
                      = trace{(I_L − (γL/(1 + γL)) Φ^TΦ^*) R^T R^*}
                      = trace{R^T R^*} − (γL/(1 + γL)) trace{R^*Φ^TΦ^*R^T} .
^13 An nT × L isotropically distributed random matrix is a matrix whose probability
density function remains unchanged when it is right-multiplied by any deterministic
nT × nT unitary matrix. It is the nT × L counterpart of a complex scalar having unit
magnitude and uniformly distributed phase.
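To make the decision rule concrete: since det Λ does not depend on Φ, maximizing f(R|Φ) amounts to maximizing trace{R^*Φ^TΦ^*R^T}, i.e., the Frobenius norm ‖RΦ^H‖². The sketch below illustrates this with a small random unitary codebook; all parameter values and the codebook itself are illustrative assumptions, not the constructions of [169].

```python
import numpy as np

rng = np.random.default_rng(0)
nT, nR, L, gamma = 2, 2, 16, 100.0          # illustrative parameters

def random_unitary_rows(nT, L, rng):
    """nT x L matrix with orthonormal rows, drawn via a QR decomposition."""
    A = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
    Q, _ = np.linalg.qr(A)
    return Q[:nT, :]

codebook = [random_unitary_rows(nT, L, rng) for _ in range(4)]

# Transmit C = sqrt(L) * Phi over an unknown Rayleigh channel H, as in (9.98)
idx = 2
C = np.sqrt(L) * codebook[idx]
H = (rng.standard_normal((nR, nT)) + 1j * rng.standard_normal((nR, nT))) / np.sqrt(2)
N = (rng.standard_normal((nR, L)) + 1j * rng.standard_normal((nR, L))) / np.sqrt(2)
R = np.sqrt(gamma) * H @ C + N

# Noncoherent ML decision: maximize ||R Phi^H||_F^2 over the codebook
metrics = [np.linalg.norm(R @ Phi.conj().T, 'fro') ** 2 for Phi in codebook]
idx_hat = int(np.argmax(metrics))
```

Note that the receiver never uses H: the metric depends only on R and on the subspace spanned by the rows of each candidate Φ.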
In this case, the spectral efficiency is η = ((N − 1)/(NL)) log2 |S| bits per channel use.
As far as decoding is concerned, in the case of the block fading channel
with coherence time 2L, optimal decoding will be accomplished based on the
observation of 2L symbol intervals. In the case of a channel that changes
continuously, by collecting blocks of L received vectors into nR × L matrices
{Rℓ }, we may write
Rℓ = √γ Hℓ Cℓ + Nℓ    (9.104)
Hℓ being the nR ×nT channel matrix corresponding to the ℓth block, whereas
the nR × L matrix Nℓ collects the noise samples during the L symbol inter-
vals of the ℓth block. In this case, optimal decoding must be accomplished
based on the observation of the whole sequence {Rℓ }. However, in order
to reduce the receiver complexity, as in the case of differential decoding for
single-antenna systems, decoding of Sℓ is accomplished by looking at pairs
of overlapping blocks of L symbol intervals at a time, i.e., Rℓ and Rℓ−1 . Let
us define the nR × 2L matrix

R′ℓ = [Rℓ−1, Rℓ] .

Assuming Hℓ = Hℓ−1 and defining C′ℓ = [Cℓ−1, Cℓ] and N′ℓ = [Nℓ−1, Nℓ], we
may write

R′ℓ = √γ Hℓ C′ℓ + N′ℓ .

Since the matrices Cℓ are such that Cℓ Cℓ^H = I_{nT}, we also have
C′ℓ C′ℓ^H = 2I_{nT}. Hence, when accomplishing detection based on a pair of
blocks of L symbol intervals we may adopt the detection strategy (9.102), which
now becomes

Ŝℓ = argmax_{Sℓ} trace{ R′ℓ C′ℓ^H C′ℓ R′ℓ^H } .    (9.105)
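As an illustration of (9.105), assume the standard differential construction Cℓ = Sℓ Cℓ−1 with L = nT and unitary information matrices Sℓ. With Cℓ−1 = I and a diagonal cyclic group code (an assumed, illustrative choice), the metric reduces to maximizing Re trace{Rℓ−1 Sℓ Rℓ^H}:

```python
import numpy as np

rng = np.random.default_rng(1)
nT = nR = 2
gamma = 100.0

# Illustrative diagonal cyclic group code of unitary nT x nT matrices
Mc = 4
group = [np.diag(np.exp(2j * np.pi * m * np.array([1, 1]) / Mc)) for m in range(Mc)]

# Differential encoding (assumed): C_l = S_l C_{l-1}, with C_{l-1} = I here
m_true = 3
C_prev = np.eye(nT, dtype=complex)
C_cur = group[m_true] @ C_prev

H = (rng.standard_normal((nR, nT)) + 1j * rng.standard_normal((nR, nT))) / np.sqrt(2)
def noise():
    return (rng.standard_normal((nR, nT)) + 1j * rng.standard_normal((nR, nT))) / np.sqrt(2)

R_prev = np.sqrt(gamma) * H @ C_prev + noise()
R_cur = np.sqrt(gamma) * H @ C_cur + noise()

# Pairwise-block decision: the channel H is not needed at the receiver
metrics = [np.real(np.trace(R_prev @ S @ R_cur.conj().T)) for S in group]
m_hat = int(np.argmax(metrics))
```

Only two consecutive received blocks enter the metric, which is the receiver-complexity reduction the text refers to.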
are described in [176]. The optimal decoder will operate, in this case, on
the equivalent trellis that takes into account both the code and the channel
trellis. As far as concatenated schemes are concerned, it must be taken into
account that the channel introduces memory. Hence, the channel itself can be
employed in place of an inner encoder and concatenated with an outer encoder
through an interleaver.
As in the case of single-input single-output channels, optimal detection
has a complexity exponential in the channel memory L. Taking into account
that the complexity is also exponential in the number of transmit antennas,
optimal detection is practically unfeasible. Reduced-complexity detection
schemes are thus required, such as linear or decision-feedback equalization
schemes [177], reduced-state sequence detection [178, 179, 180, 181],
or other schemes based on factor graphs (e.g., see [182, 183]).
[Figure: block diagram of a MIMO-OFDM scheme. The MIMO encoder feeds one IDFT per transmit antenna; at the receiver, one DFT per antenna is followed by demodulation/decoding.]
Appendix A

Signal spaces
where y^*(t) is the complex conjugate of y(t). When the two signals are such
that (x, y) = 0, they are called orthogonal. The energy of x(t) is thus the
inner product of x(t) with itself:

Ex = (x, x) = ∫_a^b |x(t)|² dt .

We also define the norm of a signal x(t) as the square root of its energy:

‖x‖ = √Ex = (x, x)^{1/2} = ( ∫_a^b |x(t)|² dt )^{1/2} .
We finally define the distance between two signals as the norm of their
difference:

‖x − y‖ = √E_{x−y} = (x − y, x − y)^{1/2} = ( ∫_a^b |x(t) − y(t)|² dt )^{1/2} .

If ‖x − y‖ = 0, the signals x(t) and y(t) coincide almost everywhere. Generally
speaking, the distance ‖x − y‖ can be regarded as a measure of the degree of
dissimilarity between the signals x(t) and y(t).
Let us consider the set of M signals {si(t)}_{i=1}^M. They are said to be
linearly independent when the condition

Σ_{i=1}^M ci si(t) = 0    ∀t ∈ (a, b)

can be satisfied if and only if (iff) all coefficients ci are zero. If, instead,
there exists a sequence {ci}_{i=1}^M with some ci ≠ 0 such that
Σ_{i=1}^M ci si(t) = 0, then the M signals are called linearly dependent, i.e.,
at least one signal can be expressed as a linear combination of the others.
1. ϕi(t) ∈ S ;

2. (ϕi, ϕj) = 1 if i = j, and (ϕi, ϕj) = 0 if i ≠ j ;

3. every element of the space S can be expressed as a linear combination of
the functions {ϕi(t)}_{i=1}^N .
(N = M iff the signals {si(t)}_{i=1}^M are linearly independent) and N is
called the dimension of the subspace S. Given the subspace generated by the
signals {si(t)}_{i=1}^M, an orthonormal basis can be found by means of the
so-called Gram-Schmidt orthogonalization procedure [185, 7].
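As a sketch of how the Gram-Schmidt procedure operates on sampled signals, the fragment below approximates the inner product by a Riemann sum; the three example signals are assumptions chosen so that the third is linearly dependent on the first two, so the resulting basis has N = 2 < M = 3 elements.

```python
import numpy as np

# Gram-Schmidt orthogonalization on sampled signals; the inner product
# (x, y) = ∫ x(t) y*(t) dt is approximated by a Riemann sum with step dt.
def gram_schmidt(signals, dt, tol=1e-12):
    basis = []
    for s in signals:
        v = np.asarray(s, dtype=complex).copy()
        for phi in basis:
            v -= dt * np.sum(v * np.conj(phi)) * phi   # subtract projection on phi
        energy = dt * np.sum(np.abs(v) ** 2)
        if energy > tol:                               # discard dependent signals
            basis.append(v / np.sqrt(energy))
    return basis

# Example signals on (0, 1); s3 is linearly dependent on s1 and s2
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
dt = t[1] - t[0]
s1 = np.ones_like(t)
s2 = t.copy()
s3 = 2.0 * s1 - 3.0 * s2
basis = gram_schmidt([s1, s2, s3], dt)   # yields N = 2 orthonormal functions
```

Replacing s3 with, e.g., t² would instead yield N = 3 orthonormal functions.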
Once we have an orthonormal basis {ϕi(t)}_{i=1}^N of the subspace S, we can
express every signal s(t) ∈ S as

s(t) = Σ_{i=1}^N si ϕi(t) .    (A.1)
s = (s1 , s2 , . . . , sN )T (A.2)
Thus, the inner product of two signals is equal to the scalar product of the
corresponding images. If we now consider the energy of a generic signal, it
can be expressed as
Es = (s, s) = s^T s^* = ‖s‖²

where ‖s‖ is the Euclidean norm of the vector s. In addition, the norm of a
signal can be interpreted as the distance from the origin of the vector
representing the signal:

‖s(t)‖ = √Es = ‖s‖ .
Thus, the distance between two signals can be interpreted as the distance
between the corresponding images:

‖x(t) − y(t)‖ = ‖x − y‖ = ( Σ_{i=1}^N |xi − yi|² )^{1/2} .
Since the relationship between the signals in S and their images is linear,
it results that the image of x(t) + y(t) is x + y and the image of λx(t) is λx.
In general, the image of the linear combination of many signals is the linear
combination with the same coefficients of the corresponding images.
where {ϕi(t)}_{i=1}^N is any orthonormal basis of the subspace S. This signal
x̂(t) ∈ S also satisfies the important property that the error signal
x(t) − x̂(t) is orthogonal to any signal of S, i.e., [7]

(x − x̂, z) = 0    ∀z(t) ∈ S .
holds for any x(t) ∈ L²(a, b). When this occurs, the basis is said to be
a complete orthonormal basis. Note that, if the basis is complete, (A.4) can
also be expressed as

x(t) = Σ_{i=1}^{+∞} (x, ϕi) ϕi(t)    (A.5)

which needs to be interpreted carefully. This result states only that the series
appearing on the right-hand side of (A.5) converges to x(t) in quadratic mean,
as stated by (A.4). This convergence does not entail the pointwise convergence
(or the uniform convergence) of the series to x(t) at every instant of the
interval (a, b).
is complete for all signals with support in (−T/2, T/2) that can be represented
through a Fourier series, that is, those signals with a finite number of
discontinuities and a finite number of minima and maxima. The coefficients of
this representation are the Fourier series coefficients. Notice that the
support of these signals is not required to be (−T/2, T/2), but can be any
interval of length T. ♦
Similarly, it holds

(x, x) = ‖x‖² = Ex = Σ_{i=1}^∞ |xi|²    (A.7)

and

‖x − y‖ = ( Σ_{i=1}^∞ |xi − yi|² )^{1/2} .    (A.8)
ni = (n, ϕi)

are clearly random variables, since they take a different value for each
realization of n(t). We will consider convergence in quadratic mean:

lim_{N→∞} E{ |n(t) − Σ_{i=1}^N (n, ϕi) ϕi(t)|² } = 0 ,    a ≤ t ≤ b .    (A.9)
It means that there could exist some realizations for which the quantity in
curly brackets in (A.9) is not zero, but they have zero probability and do not
affect the average.
If we know the mean value and the autocovariance of the random process
n(t), we can compute the mean and the covariances of components {ni }. Let
us consider the mean and the autocovariance of the process, whose expressions
are

η(t) = E{n(t)}
C(t1, t2) = E{n(t1) n^*(t2)} − E{n(t1)} E{n^*(t2)} = R(t1, t2) − η(t1) η^*(t2)

and

E{ni} E{n^*j} = ∫_a^b ∫_a^b η(t1) ϕ^*i(t1) η^*(t2) ϕj(t2) dt1 dt2 .

Thus

cov{ni, nj} = ∫_a^b ∫_a^b [R(t1, t2) − η(t1) η^*(t2)] ϕ^*i(t1) ϕj(t2) dt1 dt2
            = ∫_a^b ∫_a^b C(t1, t2) ϕ^*i(t1) ϕj(t2) dt1 dt2 .
C(t1, t2) = q δ(t1 − t2)

We can observe that white noise cannot be represented over a basis, since
its realizations do not have finite energy. However, if we project it on an
orthonormal basis we obtain uncorrelated components that, if the process is
also Gaussian, are also independent.
More generally, if a process is not white, we can ask whether there exists a
complete orthonormal basis such that all components of n(t) turn out to be
uncorrelated. It can be proved that the functions {ϕi(t)} and the corresponding
constants {λi} that produce this condition are given by the Karhunen-Loève
theorem [7]. This theorem states that the signals {ϕi(t)} and the associated
constants {λi} are the solutions of the homogeneous Fredholm integral
equation:

∫_a^b C(t1, t2) ϕi(t2) dt2 = λi ϕi(t1) ,    a ≤ t1 ≤ b .
C(t1 , t2 ) = C ∗ (t2 , t1 )
P4. If the kernel is positive definite, its eigenfunctions form a complete or-
thonormal set;
As far as the last point is concerned, it is worth noting that a random process
with a nondegenerate kernel will have an infinite number of eigenvalues and
will theoretically require the infinite expansion of (A.10). However, in many
cases of practical interest, the spectrum of eigenvalues will remain significant
for a finite number of eigenvalues, before decaying away to zero. Therefore,
C = E{n n^T} − η η^T =

⎡ σ²_{n1}        cov{n1, n2}   . . .   cov{n1, nN} ⎤
⎢ cov{n2, n1}    σ²_{n2}       . . .   cov{n2, nN} ⎥
⎢ . . .          . . .         . . .   . . .       ⎥
⎣ cov{nN, n1}    cov{nN, n2}   . . .   σ²_{nN}     ⎦
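The Karhunen-Loève machinery can be explored numerically: discretizing the Fredholm integral equation on a uniform grid turns it into the symmetric matrix eigenproblem (C dt)φ = λφ. The exponential covariance kernel below is only an assumed example; it illustrates how the eigenvalue spectrum decays, so that a finite number of terms captures most of the process energy.

```python
import numpy as np

# Discretizing ∫ C(t1, t2) φ(t2) dt2 = λ φ(t1) on a uniform grid turns the
# Fredholm equation into the symmetric eigenproblem (C dt) φ = λ φ.
n = 400
t = np.linspace(0.0, 1.0, n)
dt = t[1] - t[0]
C = np.exp(-np.abs(t[:, None] - t[None, :]))         # assumed example kernel

lam, phi = np.linalg.eigh(C * dt)                    # ascending eigenvalues
order = np.argsort(lam)[::-1]
lam, phi = lam[order], phi[:, order] / np.sqrt(dt)   # normalize: dt * Σ φ_i² = 1

# A few eigenvalues capture most of the process energy (the trace of C dt)
frac = np.cumsum(lam) / np.sum(lam)
```

The columns of phi approximate the eigenfunctions {ϕi(t)} and are orthonormal under the same Riemann-sum inner product.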
We thus have

xi = ∫_a^b x(t) ϕ^*i(t) dt = ∫_{−∞}^{+∞} x(t) ϕ^*i(t) dt .

By defining

hi(t) = ϕ^*i(b − t)

and

yi(t) = x(t) ⊗ hi(t) = ∫_{−∞}^{+∞} x(τ) hi(t − τ) dτ
      = ∫_{−∞}^{+∞} x(τ) ϕ^*i(b − t + τ) dτ

we thus have

yi(b) = ∫_{−∞}^{+∞} x(τ) hi(b − τ) dτ = ∫_a^b x(τ) ϕ^*i(τ) dτ = xi .
[Figure B.1: two equivalent ways of computing xi: (a) a correlator, multiplying x(t) by ϕ^*i(t) and integrating over (a, b); (b) a filter with impulse response hi(t), sampled at t = b.]
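The equivalence between the two implementations can be checked numerically: filtering x(t) with hi(t) = ϕ^*i(b − t) and sampling at t = b reproduces the correlator output. The basis function and test signal below are illustrative assumptions.

```python
import numpy as np

# Check numerically that filtering x(t) with h_i(t) = φ_i*(b − t) and
# sampling at t = b reproduces the correlator output ∫ x(t) φ_i*(t) dt.
a, b, n = 0.0, 1.0, 2000
t = np.linspace(a, b, n, endpoint=False)
dt = t[1] - t[0]

phi = np.sqrt(2.0) * np.sin(2.0 * np.pi * t)   # unit-energy basis function (real)
x = 3.0 * phi + np.cos(2.0 * np.pi * t)        # assumed test signal

x_corr = dt * np.sum(x * phi)                  # correlator output (φ real)

h = phi[::-1]                                  # h(t) = φ*(b − t); φ real here
y = dt * np.convolve(x, h)                     # y(t) = x(t) ⊗ h(t)
x_mf = y[n - 1]                                # sample at the index of t = b
```

Since the test signal contains the basis function with coefficient 3 plus an orthogonal component, both implementations return xi = 3.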
Appendix C

Elements of information theory
H(Y|X) = Σ_{x∈X} P(x) H(Y|X = x)
       = − Σ_{x∈X} P(x) Σ_{y∈Y} P(y|x) log2 P(y|x)
       = − Σ_{x∈X} Σ_{y∈Y} P(x, y) log2 P(y|x)
In addition, it is

D(P||Q) = Σ_{x∈X} P(x) log2 (P(x)/Q(x)) = E{ log2 (P(X)/Q(X)) }    (C.6)
The relative entropy D(P ||Q) is a measure of the inefficiency of assuming that
the distribution is Q when the true distribution is P . The relative entropy
is always nonnegative and is zero if and only if P = Q [51]. However, it is
not a true distance between distributions since it is not symmetric and does
not satisfy the triangle inequality. Nonetheless, it is often useful to think of
relative entropy as a “distance” between distributions.
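Both quantities are straightforward to evaluate numerically. The sketch below computes H(Y|X) for a binary symmetric channel with crossover probability ε (an assumed example) and checks the stated properties of D(P||Q): nonnegativity, D = 0 iff P = Q, and asymmetry.

```python
import numpy as np

# H(Y|X) and D(P||Q) for a binary symmetric channel with crossover eps
# (an assumed example); pmfs are plain NumPy arrays.
eps = 0.1
Px = np.array([0.5, 0.5])
Pyx = np.array([[1.0 - eps, eps], [eps, 1.0 - eps]])   # P(y|x)
Pxy = Px[:, None] * Pyx                                # joint P(x, y)

H_Y_given_X = float(-np.sum(Pxy * np.log2(Pyx)))       # equals h(eps) here

def D(P, Q):
    """Relative entropy in bits, assuming P and Q strictly positive."""
    return float(np.sum(P * np.log2(P / Q)))

P = np.array([0.5, 0.5])
Q = np.array([0.9, 0.1])
```

For this channel H(Y|X) = h(0.1) ≈ 0.469 bits, and D(P||Q) ≠ D(Q||P), showing the lack of symmetry.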
We now introduce mutual information, which is a measure of the amount
of information that one random variable contains about another random
variable. In other words, it is the reduction in the uncertainty of one random
variable due to the knowledge of the other.
Mutual information is nonnegative, being zero only when X and Y are inde-
pendent.
Mutual information also satisfies a chain rule:
I(X1, X2, . . . , Xn; Y) = Σ_{i=1}^n I(Xi; Y | Xi−1, Xi−2, . . . , X1) .
We now prove another property which is often very useful. First of all,
we prove that
I(X; Y, Z) = I(X; Y |Z) + I(X; Z) . (C.8)
In fact

I(X; Y, Z) = E{ log2 (P(Y, Z|X)/P(Y, Z)) }
           = E{ log2 (P(Y|X, Z) P(Z|X) / (P(Y|Z) P(Z))) }
           = E{ log2 (P(Y|X, Z)/P(Y|Z)) } + E{ log2 (P(Z|X)/P(Z)) }
           = I(X; Y|Z) + I(X; Z) .
Similarly,

I(X; Y, Z) = I(X; Z|Y) + I(X; Y) .    (C.9)
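Identities (C.8) and (C.9) are easy to verify numerically on a randomly drawn joint pmf, expressing each mutual information in terms of entropies (a hedged sanity check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(3)
P = rng.random((2, 3, 2))                 # joint pmf P(x, y, z), small alphabets
P /= P.sum()

def H(p):
    """Entropy in bits of a (possibly multi-dimensional) pmf."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

Px = P.sum(axis=(1, 2)); Py = P.sum(axis=(0, 2)); Pz = P.sum(axis=(0, 1))
Pxy = P.sum(axis=2); Pxz = P.sum(axis=1); Pyz = P.sum(axis=0)

I_X_YZ = H(Px) + H(Pyz) - H(P)                       # I(X; Y,Z)
I_X_Z = H(Px) + H(Pz) - H(Pxz)                       # I(X; Z)
I_X_Y = H(Px) + H(Py) - H(Pxy)                       # I(X; Y)
I_X_Y_given_Z = H(Pxz) - H(Pz) + H(Pyz) - H(P)       # I(X; Y|Z)
I_X_Z_given_Y = H(Pxy) - H(Py) + H(Pyz) - H(P)       # I(X; Z|Y)
```

Both decompositions of I(X; Y, Z) agree to machine precision, and the quantity is nonnegative, as stated.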
Now, consider a random variable Y which is a stochastic transformation
of two independent random variables X and Z. In this case, it is
• It does not have memory, that is, it acts on each of its input data
independently of all other data.
The capacity of this DMC is defined as the maximum of the mutual
information I(X; Y) over the set of input pmfs P(X):

C = max_{P(x)} I(X; Y) .

The pmf P(X) that provides the maximum value of the mutual information
is called the capacity-achieving distribution.
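The maximization defining capacity can be carried out numerically; the sketch below uses the classical Blahut-Arimoto iteration (not covered in the text, used here only as an illustration) and recovers the well-known BSC capacity 1 − h(ε).

```python
import numpy as np

# Blahut-Arimoto iteration for the capacity of a DMC with transition
# matrix W[x, y] = P(y|x); a sketch, not an optimized implementation.
def capacity(W, n_iter=200):
    nx = W.shape[0]
    p = np.full(nx, 1.0 / nx)                      # start from the uniform pmf
    for _ in range(n_iter):
        q = p[:, None] * W
        q /= q.sum(axis=0, keepdims=True)          # posterior Q(x|y)
        r = np.exp(np.sum(W * np.log(q + 1e-300), axis=1))
        p = r / r.sum()                            # updated input pmf
    Py = p @ W                                     # output pmf under final p
    return float(np.sum(p[:, None] * W * np.log2(W / Py[None, :] + 1e-300)))

eps = 0.1
W_bsc = np.array([[1.0 - eps, eps], [eps, 1.0 - eps]])
C_bsc = capacity(W_bsc)            # ≈ 1 - h(0.1) for the BSC
```

For the BSC the uniform input is already the capacity-achieving distribution, so the iteration leaves it unchanged; for asymmetric channels the iteration moves p toward the optimizing pmf.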
each input in an independent way, i.e., that the channel is memoryless. Al-
though this is not always true (see Chapter 1 for details), we will remove this
assumption later. The memoryless assumption means that
P(y|x) = ∏_{i=1}^N P(yi|xi) .
with equality if and only if variables Yi are independent. This latter condition
holds if and only if inputs are independent, given the memoryless channel
model. In addition, given the definition (C.11), it is
I(X; Y) ≤ Σ_{i=1}^N I(Xi; Yi) ≤ NC

This is the so-called data processing inequality which, in essence, states that
the average mutual information cannot be increased by further processing.
Y =X +Z
where Z is a Gaussian random variable with mean zero and variance σ 2 , and
is independent of X. This channel is shown in Fig. C.1. To make the problem
well posed, we introduce a power constraint on the input of the channel, i.e.,
we will assume that E{X²} ≤ PX. For this channel, the capacity-achieving
distribution is

f(x) = (1/√(2πPX)) e^{−x²/(2PX)}
Y i = Xi + Z i .
where A is such that the power constraint is satisfied. In other words, for
those components of the input vector that are allocated any power, the sum
of this power together with the noise variance must be constant according to
the so called water-pouring (or water-filling) technique, exemplified in Fig.
C.2.
The capacity resulting from this optimal allocation is

C = Σ_{i=1}^N (1/2) log2( 1 + E{Xi²}/σi² )    (bits/vector channel use) .    (C.14)
Figure C.2: Illustration of the water-filling technique.
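The water level A can be found numerically, e.g., by bisection on the power constraint; the sketch below (with assumed noise variances and power budget) allocates power only to the best subchannels and then evaluates (C.14).

```python
import numpy as np

# Water-filling: find the level A with Σ_i max(A - σ_i², 0) = P_X by
# bisection, then evaluate (C.14); the numerical values are assumed examples.
def water_filling(sigma2, P, tol=1e-12):
    lo, hi = 0.0, max(sigma2) + P
    while hi - lo > tol:
        A = 0.5 * (lo + hi)
        used = sum(max(A - s, 0.0) for s in sigma2)
        lo, hi = (A, hi) if used < P else (lo, A)
    A = 0.5 * (lo + hi)
    powers = [max(A - s, 0.0) for s in sigma2]
    C = sum(0.5 * np.log2(1.0 + p / s) for p, s in zip(powers, sigma2))
    return powers, C

sigma2 = [0.5, 1.0, 2.0, 4.0]            # noise variances of the subchannels
P_X = 2.0                                 # total power budget
powers, C = water_filling(sigma2, P_X)    # here the level is A = 1.75
```

The two noisiest subchannels receive no power at all, which is exactly the behavior depicted in Fig. C.2.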
where X(t) is the channel input and Z(t) a bandlimited Gaussian noise pro-
cess with power spectral density N0 /2. We will impose a power constraint
on the input process, i.e., E{X 2 (t)} ≤ PX . A vector model for this channel
may be obtained by uniformly sampling Y (t) at a rate equal to 2B. We can
thus work on the equivalent vector channel
Y = X+Z (C.16)
C = lim_{T→∞} CT/T

when this limit exists. Similarly, if we have two discrete-time processes {Xi}
and {Yi}, the information rate (IR) is defined as

i(X; Y) = lim_{N→∞} (1/N) I(X1, X2, . . . , XN; Y1, Y2, . . . , YN)    (bits/channel use)
Appendix E

Bilateral Z-transform and some of its properties
This appendix briefly describes the Z-transform and some of its properties
that will be used in the derivation of the whitening filter described in Section
2.
The bilateral or two-sided Z-transform of a discrete-time (real or complex)
signal xn is the power series X(z) defined as
X(z) = Σ_{n=−∞}^{+∞} xn z^{−n} .
When the ROC includes the unit circle, the Z-transform computed on it
gives the Fourier transform of the discrete-time sequence, i.e.,

X(e^{j2πfT}) = Σ_{n=−∞}^{+∞} xn e^{−j2πnfT} .
xn = (1/(2πj)) ∮_C X(z) z^{n−1} dz
xn = a^n u[n]

By definition

X(z) = Σ_{n=0}^∞ a^n z^{−n} = Σ_{n=0}^∞ (a z^{−1})^n = 1/(1 − a z^{−1}) = z/(z − a)

provided that |a z^{−1}| < 1. The region of convergence is thus |z| > |a|. It
contains the unit circle provided that |a| < 1, i.e., provided that the
sequence is stable. This Z-transform has a zero at the origin of the complex
plane and a pole at z = a. They are shown, along with the ROC, in Fig. E.1. ♦
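A quick numerical sanity check of this example: inside the ROC the partial sums of the defining series converge to z/(z − a). The particular values of a and z below are assumptions.

```python
import numpy as np

# Inside the ROC |z| > |a|, the partial sums of Σ a^n z^(-n) converge to
# the closed form z / (z - a); a and z are assumed example values.
a = 0.5
n = np.arange(200)                       # |a|^n is negligible beyond this
x = a ** n                               # x_n = a^n u[n]

z = 0.8 * np.exp(0.3j)                   # |z| = 0.8 > |a| = 0.5, in the ROC
partial = np.sum(x * z ** (-n.astype(float)))
closed = z / (z - a)
```

Evaluating the same sum at a point with |z| < |a| would diverge as more terms are added, which is the ROC statement in numerical form.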
Figure E.1: ROC, pole and zero for the Z-transform of xn = a^n u[n] in the
case |a| < 1.
xn = −b^n u[−n − 1] .

xn = a^n u[n] + b^n u[n] .
Figure E.2: ROC, pole and zero for the Z-transform of xn = −b^n u[−n − 1] in
the case |b| > 1.
When the ROC is |a| < |z| < |b|, X(z) corresponds to the sequence

xn = a^n u[n] − b^n u[−n − 1] .

Finally, when the ROC is |z| < |a|, this X(z) corresponds to the anticausal
sequence

xn = −a^n u[−n − 1] − b^n u[−n − 1] .
♦
From these examples we can conclude that the stability of a system can also
be determined from knowledge of the ROC alone: if the ROC contains the unit
circle, then the system is stable. Likewise, from the knowledge of the ROC we
can understand whether the sequence is causal or not.
Bibliography
[6] G. D. Forney, Jr., “The Viterbi algorithm,” Proc. IEEE, vol. 61, pp.
268–278, Mar. 1973.
[45] G. Colavolpe and A. Barbieri, “On MAP symbol detection for ISI chan-
nels using the Ungerboeck observation model,” IEEE Commun. Letters,
vol. 9, no. 8, pp. 720–722, Aug. 2005.
[52] D. Arnold and H.-A. Loeliger, “On the information rate of binary-input
channels with memory,” in Proc. IEEE Intern. Conf. Commun., vol. 9,
June 2001, pp. 2692–2695.
[53] V. Sharma and S. K. Singh, “Entropy and channel capacity in the regen-
erative setup with application to Markov channels,” in Proc. IEEE In-
ternational Symposium on Information Theory, Washington, DC, Jun.
2001, p. 283.
[75] G. Picchi and G. Prati, “Blind equalization and carrier recovery using
a ‘stop-and-go’ decision directed algorithm,” IEEE Trans. Commun.,
vol. 35, pp. 877–887, Sep. 1987.
[83] F. Rusek and A. Prlja, “Optimal channel shortening for MIMO and ISI
channels,” IEEE Trans. Wireless Commun., vol. 11, no. 2, pp. 810–818,
Feb. 2012.
[86] S. Hu and F. Rusek, “On the design of reduced state demodulators with
interference cancellation for iterative receivers,” in Proc. 24th IEEE In-
tern. Symp. on Personal, Indoor, and Mobile Radio Comm. (PIMRC),
Hong Kong, China, Aug. 2015, pp. 981–985.
[145] R. S. Blum, “Some analytical tools for the design of space-time convo-
lutional codes,” IEEE Trans. Commun., vol. 50, no. 10, pp. 1593–1599,
Oct. 2002.
[152] D.-S. Shiu and J. Kahn, “Layered space-time codes for wireless com-
munications using multiple transmit antennas,” in Proc. IEEE Intern. Conf.
on Commun. (ICC ’99), Vancouver, Canada, June 1999.
[156] R. Knopp and P. Humblet, “On coding for block fading channels,” IEEE
Trans. Inform. Theory, vol. 46, no. 1, pp. 189–205, January 2000.
[157] R. Wesel, X. Liu, and W. Shi, “Trellis codes for periodic erasures,”
IEEE Trans. Commun., vol. 48, no. 6, pp. 938–947, June 2000.
[160] X. Lin and R. Blum, “Improved space-time codes using serial concate-
nation,” IEEE Commun. Letters, vol. 4, no. 7, pp. 221–223, Jul. 2000.
[182] G. Colavolpe and G. Germi, “On the application of factor graphs and
the sum-product algorithm to ISI channels,” IEEE Trans. Commun.,
vol. 53, no. 5, pp. 818–825, May 2005.