
**Soft-Output Decoding Algorithms in Iterative Decoding of Turbo Codes**

S. Benedetto,^a D. Divsalar,^b G. Montorsi,^a and F. Pollara^b

In this article, we present two versions of a simplified maximum a posteriori decoding algorithm. The algorithms work in a sliding window form, like the Viterbi algorithm, and can thus be used to decode continuously transmitted sequences obtained by parallel concatenated codes, without requiring code trellis termination. A heuristic explanation is also given of how to embed the maximum a posteriori algorithms into the iterative decoding of parallel concatenated codes (turbo codes). The performances of the two algorithms are compared on the basis of a powerful rate 1/3 parallel concatenated code. Basic circuits to implement the simplified a posteriori decoding algorithm using lookup tables, and two further approximations (linear and threshold), with a very small penalty, to eliminate the need for lookup tables are proposed.

I. Introduction and Motivations

The broad framework of this analysis encompasses digital transmission systems where the received signal is a sequence of wave forms whose correlation extends well beyond T, the signaling period. There can be many reasons for this correlation, such as coding, intersymbol interference, or correlated fading. It is well known [1] that the optimum receiver in such situations cannot perform its decisions on a symbol-by-symbol basis, so that deciding on a particular information symbol u_k involves processing a portion of the received signal T_d seconds long, with T_d > T. The decision rule can be either optimum with respect to a sequence of symbols, u_k^n = (u_k, u_{k+1}, · · · , u_{k+n-1}), or with respect to the individual symbol, u_k.

The most widely applied algorithm for the ﬁrst kind of decision rule is the Viterbi algorithm. In its

optimum formulation, it would require waiting for decisions until the whole sequence has been received.

In practical implementations, this drawback is overcome by anticipating decisions (single or in batches)

on a regular basis with a ﬁxed delay, D. A choice of D ﬁve to six times the memory of the received data

is widely recognized as a good compromise between performance, complexity, and decision delay.

Optimum symbol decision algorithms must base their decisions on the maximum a posteriori (MAP)

probability. They have been known since the early seventies [2,3], although much less popular than the

Viterbi algorithm and almost never applied in practical systems. There is a very good reason for this

neglect in that they yield performance in terms of symbol error probability only slightly superior to

the Viterbi algorithm, yet they present a much higher complexity. Only recently, the interest in these

^a Politecnico di Torino, Torino, Italy.

^b Communications Systems and Research Section.


algorithms has seen a revival in connection with the problem of decoding concatenated coding schemes.

Concatenated coding schemes (a class in which we include product codes, multilevel codes, generalized

concatenated codes, and serial and parallel concatenated codes) were ﬁrst proposed by Forney [4] as a

means of achieving large coding gains by combining two or more relatively simple “constituent” codes.

The resulting concatenated coding scheme is a powerful code endowed with a structure that permits an

easy decoding, like “stage decoding” [5] or “iterated stage decoding” [6].

To work properly, all these decoding algorithms cannot limit themselves to passing the symbols decoded

by the inner decoder to the outer decoder. They need to exchange some kind of soft information. Actually,

as proved by Forney [4], the optimum output of the inner decoder should be in the form of the sequence

of the probability distributions over the inner code alphabet conditioned on the received signal, the a

posteriori probability (APP) distribution. There have been several attempts to achieve, or at least to

approach, this goal. Some of them are based on modiﬁcations of the Viterbi algorithm so as to obtain, at

the decoder output, in addition to the “hard”-decoded symbols, some reliability information. This has led

to the concept of “augmented-output,” or the list-decoding Viterbi algorithm [7], and to the soft-output

Viterbi algorithm (SOVA) [8]. These solutions are clearly suboptimal, as they are unable to supply the

required APP. A diﬀerent approach consisted in revisiting the original symbol MAP decoding algorithms

[2,3] with the aim of simplifying them to a form suitable for implementation [9–12].

In this article, we are interested in soft-decoding algorithms as the main building block of iterative stage

decoding of parallel concatenated codes. This has become a “hot” topic for research after the successful

proposal of the so-called turbo codes [6]. They are (see Fig. 1) parallel concatenated convolutional codes

(PCCC) whose encoder is formed by two (or more) constituent systematic encoders joined through an

interleaver. The input information bits feed the ﬁrst encoder and, after having been interleaved by the

interleaver, enter the second encoder. The codeword of the parallel concatenated code consists of the

input bits to the ﬁrst encoder followed by the parity check bits of both encoders. Generalizations to more

than one interleaver are possible and fruitful [13].

[Figure: rate 1/3 PCCC encoder; an interleaver of length N feeds the second of two rate 1/2 systematic convolutional encoders, whose redundancy bits y_1 and y_2 accompany the systematic bits x.]

Fig. 1. Parallel concatenated convolutional code.

The suboptimal iterative decoder is modular and consists of a number of equal component blocks formed by concatenating soft decoders of the constituent codes (CC) separated by the interleavers used at the encoder side. By increasing the number of decoding modules and, thus, the number of decoding iterations, bit-error probabilities as low as 10^{-5} at E_b/N_0 = 0.0 dB for rate 1/4 PCCC have been shown by simulation [13]. A version of turbo codes employing two eight-state convolutional codes as constituent codes, an interleaver of 32 × 32 bits, and an iterative decoder performing two and one-half iterations with a complexity of the order of five times the maximum-likelihood (ML) Viterbi decoding of each constituent code is presently available on a chip yielding a measured bit-error probability of 0.9 × 10^{-6} at E_b/N_0 = 3 dB [14].


In recent articles [15,17], upper bounds to the ML bit-error probability of PCCCs have been proposed.

As a by-product, it has been shown by simulation that iterative decoding can approach quite closely the

ML performance. The iterative decoding algorithm was a simpliﬁcation of the algorithm proposed in [3],

whose regular steps and limited complexity seem quite suitable to very large-scale integration (VLSI)

implementation. Simpliﬁed versions of the algorithm [3] have been proposed and analyzed in [12] in the

context of a block decoding strategy that requires trellis termination after each block of bits. A similar simplification was also used in [16] for hardware implementation of the MAP algorithm.

In this article, we will describe two versions of a simpliﬁed MAP decoding algorithm that can be used

as building blocks of the iterative decoder to decode PCCCs. A distinctive feature of the algorithms is

that they work in a “sliding window” form, like the Viterbi algorithm, and thus can be used to decode

“continuously transmitted” PCCCs, without requiring trellis termination and a block-equivalent structure

of the code. The simplest among the two algorithms will be compared with the optimum block-decoding

algorithm proposed in [3]. The comparison will be given in terms of bit-error probability when the

algorithms are embedded into iterative decoding schemes for PCCCs. We will choose, for comparison,

a very powerful PCCC scheme suitable for deep-space applications [18–20] and, thus, working at a very

low signal-to-noise ratio.

II. System Context and Notations

As previously outlined, our ﬁnal aim is to ﬁnd suitable soft-output decoding algorithms for iterated

staged decoding of parallel concatenated codes employed in a continuous transmission. The core of such

algorithms is a procedure to derive the sequence of probability distributions over the information symbols’

alphabet based on the received signal and constrained on the code structure. Thus, we will start by this

procedure and only later will we extend the description to the more general setting.

Readers acquainted with the literature on soft-output decoding algorithms know that one burden in

understanding and comparing the diﬀerent algorithms is the spread and, sometimes, mess of notations

involved. For this reason, we will carefully deﬁne the system and notations and then stick consistently to

them for the description of all algorithms.

For the first part of the article, we will refer to the system of Fig. 2. The information sequence u, composed of symbols drawn from an alphabet U = {u_1, · · · , u_I} and emitted by the source, enters an encoder that generates code sequences c. Both source and code sequences are defined over a time index set K (a finite or infinite set of integers). Denoting the code alphabet C = {c_1, · · · , c_M}, the code C can be written as a subset of the Cartesian product of C by itself K times, i.e.,

$$C \subseteq C^K$$

The code symbols c_k (the index k will always refer to time throughout the article) enter the modulator, which performs a one-to-one mapping of them with its signals, or channel input symbols x_k, belonging to the set X = {x_1, · · · , x_M}.^1

The channel symbols x_k are transmitted over a stationary memoryless channel with output symbols y_k. The channel is characterized by the transition probability distribution (discrete or continuous, according to the channel model) P(y|x). The channel output sequence is fed to the symbol-by-symbol soft-output demodulator, which produces a sequence of probability distributions γ_k(c) over C conditioned on the received signal, according to the memoryless transformation

^1 For simplicity of notation, we have assumed that the cardinality of the modulator equals that of the code alphabet. In general, each coded symbol can be mapped in more than one channel symbol, as in the case of multilevel codes or trellis codes with parallel transitions. The extension is straightforward.


[Figure: source → encoder → modulator → memoryless channel → soft demodulator → soft decoder; the demodulator outputs P(x_k | y_k) and the soft decoder outputs P(x_k | y).]

Fig. 2. The transmission system.

$$\gamma_k(c) \triangleq P(x_k = x(c),\, y_k) = P(y_k \mid x_k = x(c))\, P_k(c) \triangleq \gamma_k(x) \qquad (1)$$

where we have assumed knowledge of the sequence of the a priori probability distributions of the channel input symbols (P_k(x) : k ∈ K) and made use of the one-to-one mapping C → X.

The sequence of probability distributions γ_k(c) obtained by the demodulator on a symbol-by-symbol basis is then supplied to the soft-output symbol decoder, which processes the distributions in order to obtain the probability distributions P_k(u|y). They are defined as

$$P_k(u \mid y) \triangleq P(u_k = u \mid y) \qquad (2)$$

The probability distributions P_k(u|y) are referred to in the literature as symbol-by-symbol a posteriori probabilities (APP) and represent the optimum symbol-by-symbol soft output.

From here on, we will limit ourselves to the case of time-invariant convolutional codes with N states,

use the following notations with reference to Fig. 3, and assume that the (integer) time instant we are

interested in is the kth:

(1) S_i is the generic state at time k, belonging to the set S = {S_1, · · · , S_N}

(2) S_i^-(u) is one of the precursors of S_i, and precisely the one defined by the information symbol u emitted during the transition S_i^-(u) → S_i.^2

(3) S_i^+(u) is one of the successors of S_i, and precisely the one defined by the information symbol u emitted during the transition S_i → S_i^+(u).

(4) To each transition in the trellis, a signal x is associated, which depends on the state from which the transition originates and on the information symbol u determining that transition. When necessary, we will make this dependence explicit by writing x(u, S_i) when the transition ends in S_i and x(S_i, u) when the transition originates from S_i.

III. The BCJR Algorithm

In this section, we will restate in our new notations, without derivation, the algorithm described

in [3], which is the optimum algorithm to produce the sequence of APP. We will call this algorithm the

^2 The state S_i and the symbol u uniquely specify the precursor S_i^-(u) in the case of the class of recursive convolutional encoders, like the ones we are interested in (when the largest degree of the feedback polynomial represents the memory of a convolutional encoder). The extension to the case of feed-forward encoders and other nonconventional recursive convolutional encoders is straightforward.


[Figure: three trellis sections at times k − 1, k, and k + 1 over the states S_1, · · · , S_N, showing the precursor S_i^-(u), the state S_i, and the successor S_i^+(u), with the labels x(u, S_i), c(u, S_i) on the transition into S_i and x(S_i, u), c(S_i, u) on the transition out of S_i.]

Fig. 3. The meaning of notations.

BCJR algorithm from the authors' initials.^3 We consider first the original version of the algorithm, which applies to the case of a finite index set K = {1, · · · , n} and requires the knowledge of the whole received sequence y = (y_1, · · · , y_n) to work. In the following, the notations u, c, x, and y will refer to sequences n-symbols long, and the integer time variable k will assume the values 1, · · · , n. As for the previous assumption, the encoder admits a trellis representation with N states, so that the code sequences c (and the corresponding transmitted signal sequences x) can be represented as paths in the trellis and uniquely associated with a state sequence s = (s_0, · · · , s_n) whose first and last states, s_0 and s_n, are assumed to be known by the decoder.^4

Defining the a posteriori transition probabilities from state S_i at time k as

$$\sigma_k(S_i, u) \triangleq P(u_k = u,\, s_{k-1} = S_i \mid y) \qquad (3)$$

the APP P_k(u|y) we want to compute can be obtained as

$$P_k(u \mid y) = \sum_{S_i} \sigma_k(S_i, u) \qquad (4)$$

Thus, the problem of evaluating the APP is equivalent to that of obtaining the a posteriori transition probabilities defined in Eq. (3). In [3], it was proven that they can be computed as

$$\sigma_k(S_i, u) = h_\sigma\, \alpha_{k-1}(S_i)\, \gamma_k(x(S_i, u))\, \beta_k(S_i^+(u)) \qquad (5)$$

where

^3 The algorithm is usually referred to in the recent literature as the "Bahl algorithm"; we prefer to credit all the authors: L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv.

^4 Lower-case s_k denotes the state of a sequence at time k, whereas upper-case S_i represents one particular state belonging to the set S.


• h_σ is such that $\sum_{S_i, u} \sigma_k(S_i, u) = 1$

• γ_k(x(S_i, u)) are the joint probabilities already defined in Eq. (1), i.e.,

$$\gamma_k(x) \triangleq P(y_k,\, x_k = x) = P(y_k \mid x_k = x) \cdot P(x_k = x) \qquad (6)$$

The γ's can be calculated from the knowledge of the a priori probabilities of the channel input symbols x and of the transition probabilities of the channel P(y_k | x_k = x). For each time k, there are M different values of γ to be computed, which are then associated to the trellis transitions to form a sort of branch metrics. This information is supplied by the symbol-by-symbol soft-output demodulator.

• α_k(S_i) are the probabilities of the states of the trellis at time k conditioned on the past received signals, namely,

$$\alpha_k(S_i) \triangleq P(s_k = S_i \mid y_1^k) \qquad (7)$$

where y_1^k denotes the sequence y_1, y_2, · · · , y_k. They can be obtained by the forward recursion^5

$$\alpha_k(S_i) = h_\alpha \sum_u \alpha_{k-1}(S_i^-(u))\, \gamma_k(x(u, S_i)) \qquad (8)$$

with h_α a constant determined through the constraint $\sum_{S_i} \alpha_k(S_i) = 1$, and where the recursion is initialized as

$$\alpha_0(S_i) = \begin{cases} 1 & \text{if } S_i = s_0 \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$

• β_k(S_i) are the probabilities of the trellis states at time k conditioned on the future received signals, P(s_k = S_i | y_{k+1}^n). They can be obtained by the backward recursion

$$\beta_k(S_i) = h_\beta \sum_u \beta_{k+1}(S_i^+(u))\, \gamma_{k+1}(x(S_i, u)) \qquad (10)$$

^5 For feed-forward encoders and nonconventional recursive convolutional encoders like G(D) = [1, (1 + D + D^2)/(1 + D)], in Eq. (8) the summation should be over all possible precursors S_i^-(u) that lead to the state S_i, and x(u, S_i) should be replaced by x(S_i^-(u), u). Such modifications are then also required for Eqs. (18) and (26). In Eqs. (22), (29), and (32), the maximum should be over all S_i^-(u) that lead to S_i, and c(u, S_i) should be replaced by c(S_i^-(u), u).


with h_β a constant obtainable through the constraint $\sum_{S_i} \beta_k(S_i) = 1$, and where the recursion is initialized as

$$\beta_n(S_i) = \begin{cases} 1 & \text{if } S_i = s_n \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$

We can now formulate the BCJR algorithm by the following steps:

(1) Initialize α_0 and β_n according to Eqs. (9) and (11).

(2) As soon as each term y_k of the sequence y is received, the demodulator supplies to the decoder the "branch metrics" γ_k of Eq. (6), and the decoder computes the probabilities α_k according to Eq. (8). The obtained values of α_k(S_i) as well as the γ_k are stored for all k, S_i, and x.

(3) When the entire sequence y has been received, the decoder recursively computes the probabilities β_k according to the recursion of Eq. (10) and uses them, together with the stored α's and γ's, to compute the a posteriori transition probabilities σ_k(S_i, u) according to Eq. (5) and, finally, the APP P_k(u|y) from Eq. (4).

A few comments on the computational complexity of the ﬁnite-sequence BCJR algorithm can be found

in [3].
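As an illustration only, the three steps above can be sketched in a few lines of Python for a toy two-state, binary-input trellis. The transition tables `next_state` and `signal` and the branch-metric values used in the example are invented for the sketch and are not the codes discussed in this article; normalization plays the role of the constants h_α, h_β, and h_σ.

```python
# Toy 2-state trellis: next_state[s][u] is the successor S_i^+(u), and
# signal[s][u] indexes the channel signal x(S_i, u) labeling the branch.
N_STATES = 2
next_state = [[0, 1], [0, 1]]
signal = [[0, 1], [2, 3]]

def normalized(v):
    s = sum(v)
    return [x / s for x in v]

def bcjr(gamma):
    """gamma[k][m] holds the branch metric gamma_{k+1}(x_m) supplied by the
    soft demodulator (0-indexed time).  Returns the APP P_k(u | y)."""
    n = len(gamma)
    alpha = [[0.0] * N_STATES for _ in range(n + 1)]
    beta = [[0.0] * N_STATES for _ in range(n + 1)]
    alpha[0][0] = 1.0                      # Eq. (9): known initial state s_0
    beta[n][0] = 1.0                       # Eq. (11): known final state s_n
    for k in range(n):                     # step (2): forward recursion, Eq. (8)
        for s in range(N_STATES):
            for u in range(2):
                alpha[k + 1][next_state[s][u]] += alpha[k][s] * gamma[k][signal[s][u]]
        alpha[k + 1] = normalized(alpha[k + 1])        # h_alpha
    app = []
    for k in range(n - 1, -1, -1):         # step (3): backward recursion, Eq. (10)
        for s in range(N_STATES):
            for u in range(2):
                beta[k][s] += beta[k + 1][next_state[s][u]] * gamma[k][signal[s][u]]
        beta[k] = normalized(beta[k])                  # h_beta
        p = [0.0, 0.0]                     # Eqs. (5) and (4): sigma summed over states
        for s in range(N_STATES):
            for u in range(2):
                p[u] += alpha[k][s] * gamma[k][signal[s][u]] * beta[k + 1][next_state[s][u]]
        app.append(normalized(p))          # h_sigma
    app.reverse()
    return app
```

Note how the backward pass cannot begin until all the γ's and α's have been stored, which is exactly the limitation addressed by the sliding window versions below.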

IV. The Sliding Window BCJR (SW-BCJR)

As the previous description made clear, the BCJR algorithm requires that the whole sequence has been received before starting the decoding process. In this aspect, it is similar to the Viterbi algorithm in its optimum version. To apply it in a PCCC, we need to subdivide the information sequence into blocks,^6 decode them by terminating the trellises of both CCs,^7 and then decode the received sequence block by block. Beyond the rigidity, this solution also reduces the overall code rate.

A more flexible decoding strategy is offered by a modification of the BCJR algorithm in which the decoder operates on a fixed memory span, and decisions are forced with a given delay D. We call this new, and suboptimal, algorithm the sliding window BCJR (SW-BCJR) algorithm. We will describe two versions of the sliding window BCJR algorithm that differ in the way they overcome the problem of initializing the backward recursion without having to wait for the entire sequence. We will describe the two algorithms using the previous step description, suitably modified. Of the previous assumptions, we retain only that of the knowledge of the initial state s_0, and thus assume the transmission of semi-infinite code sequences, where the time span K ranges from 1 to ∞.

^6 The presence of the interleaver naturally points toward a block length equal to the interleaver length.

^7 The termination of trellises in a PCCC has been considered a hard problem by several authors. As shown in [13], it is, indeed, quite an easy task.


A. The First Version of the Sliding Window BCJR Algorithm (SW1-BCJR)

Here are the steps:

(1) Initialize α_0 according to Eq. (9).

(2) Forward recursion at time k: Upon receiving y_k, the demodulator supplies to the decoder the M distinct branch metrics, and the decoder computes the probabilities α_k(S_i) according to Eqs. (6) and (8). The obtained values of α_k(S_i) are stored for all S_i, as well as the γ_k(x).

(3) Initialization of the backward recursion (k > D):

$$\beta_k(S_j) = \alpha_k(S_j), \quad \forall S_j \qquad (12)$$

(4) Backward recursion: It is performed according to Eq. (10) from time k − 1 back to time k − D.

(5) The a posteriori transition probabilities at time k − D are computed according to

$$\sigma_{k-D}(S_i, u) = h_\sigma\, \alpha_{k-D-1}(S_i)\, \gamma_{k-D}(x(S_i, u))\, \beta_{k-D}(S_i^+(u)) \qquad (13)$$

(6) The APP at time k − D is computed as

$$P_{k-D}(u \mid y) = \sum_{S_i} \sigma_{k-D}(S_i, u) \qquad (14)$$

For a convolutional code with parameters (k_0, n_0), number of states N, and cardinality of the code alphabet M = 2^{n_0}, the SW1-BCJR algorithm requires storage of N × D values of α's and M × D values of the probabilities γ_k(x) generated by the soft demodulator. Moreover, to update the α's and β's for each time instant, the algorithm needs to perform M × 2^{k_0} multiplications and N additions of 2^{k_0} numbers. To output the set of APP at each time instant, we need a D-times long backward recursion. Thus, the computational complexity requires overall

• (D + 1)M × 2^{k_0} multiplications

• (D + 1)M additions of 2^{k_0} numbers each

As a comparison,^8 the Viterbi algorithm would require, in the same situation, M × 2^{k_0} additions and M × 2^{k_0}-way comparisons, plus the trace-back operations, to get the decoded bits.

B. The Second, Simpliﬁed Version of the Sliding Window BCJR Algorithm (SW2-BCJR)

A simpliﬁcation of the sliding window BCJR that signiﬁcantly reduces the memory requirements

consists of the following steps:

^8 Though, indeed, not fair, as the Viterbi algorithm does not provide the information we need.


(1) Initialize α_0 according to Eq. (9).

(2) Forward recursion (k > D): If k > D, the probabilities α_{k-D-1}(S_i) are computed according to Eq. (8).

(3) Initialization of the backward recursion (k > D):

$$\beta_k(S_j) = \frac{1}{N}, \quad \forall S_j \qquad (15)$$

(4) Backward recursion (k > D): It is performed according to Eq. (10) from time k − 1 back to time k − D.

(5) The a posteriori transition probabilities at time k − D are computed according to

$$\sigma_{k-D}(S_i, u) = h_\sigma\, \alpha_{k-D-1}(S_i)\, \gamma_{k-D}(x(S_i, u))\, \beta_{k-D}(S_i^+(u)) \qquad (16)$$

(6) The APP at time k − D is computed as

$$P_{k-D}(u \mid y) = \sum_{S_i} \sigma_{k-D}(S_i, u) \qquad (17)$$

This version of the sliding window BCJR algorithm does not require storage of the N × D values of α's, as they are updated with a delay of D steps. As a consequence, only N values of α's and M × D values of the probabilities γ_k(x) generated by the soft demodulator must be stored. The computational complexity is the same as for the previous version of the algorithm. However, since the initialization of the β recursion is less accurate, a larger value of D should be set in order to obtain the same accuracy on the output values P_{k-D}(u|y). This observation will receive quantitative evidence in the section devoted to simulation results.
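The backward half of one output step, steps (3) through (6) of either version, can be sketched as a single function whose only difference between SW1 and SW2 is the initialization of β. This is an illustrative sketch under the same toy two-state trellis convention as before (a real decoder would also manage the circular buffers of stored quantities):

```python
def sw_backward_app(alphas, gammas, next_state, signal, init="sw1"):
    """One output step of a sliding-window BCJR (illustrative sketch).

    alphas[j][s] holds alpha_{k-D-1+j}(S_s) for j = 0..D+1, and gammas[j][m]
    holds the branch metrics gamma_{k-D+j}(x_m) for j = 0..D.  With
    init="sw1" the backward pass starts from beta_k = alpha_k, Eq. (12);
    with init="sw2" it starts from the uniform distribution 1/N, Eq. (15),
    which avoids storing the intermediate alphas but is less accurate, so a
    larger delay D is needed.  Returns the APP at time k-D, Eqs. (13)-(17).
    """
    D = len(gammas) - 1
    n_states = len(next_state)
    if init == "sw1":
        beta = list(alphas[-1])                        # Eq. (12)
    else:
        beta = [1.0 / n_states] * n_states             # Eq. (15)
    for j in range(D, 0, -1):                          # Eq. (10), back to k-D
        new = [0.0] * n_states
        for s in range(n_states):
            for u in range(2):
                new[s] += beta[next_state[s][u]] * gammas[j][signal[s][u]]
        t = sum(new)
        beta = [b / t for b in new]
    app = [0.0, 0.0]                                   # Eqs. (13)/(16) and (14)/(17)
    for s in range(n_states):
        for u in range(2):
            app[u] += alphas[0][s] * gammas[0][signal[s][u]] * beta[next_state[s][u]]
    t = sum(app)
    return [a / t for a in app]
```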

V. Additive Algorithms

A. The Log-BCJR

The BCJR algorithm and its sliding window versions have been stated in multiplicative form. Owing to the monotonicity of the logarithm function, they can be converted into an additive form by passing to logarithms. Let us define the following logarithmic quantities:

$$\Gamma_k(x) \triangleq \log[\gamma_k(x)]$$

$$A_k(S_i) \triangleq \log[\alpha_k(S_i)]$$

$$B_k(S_i) \triangleq \log[\beta_k(S_i)]$$

$$\Sigma_k(S_i, u) \triangleq \log[\sigma_k(S_i, u)]$$

These definitions lead to the following A and B recursions, derived from Eqs. (8), (10), and (5):

$$A_k(S_i) = \log\left[\sum_u \exp\left\{A_{k-1}(S_i^-(u)) + \Gamma_k(x(u, S_i))\right\}\right] + H_A \qquad (18)$$

$$B_k(S_i) = \log\left[\sum_u \exp\left\{\Gamma_{k+1}(x(S_i, u)) + B_{k+1}(S_i^+(u))\right\}\right] + H_B \qquad (19)$$

$$\Sigma_k(S_i, u) = A_{k-1}(S_i) + \Gamma_k(x(S_i, u)) + B_k(S_i^+(u)) + H_\Sigma \qquad (20)$$

with the following initializations:

$$A_0(S_i) = \begin{cases} 0 & \text{if } S_i = s_0 \\ -\infty & \text{otherwise} \end{cases}$$

$$B_n(S_i) = \begin{cases} 0 & \text{if } S_i = s_n \\ -\infty & \text{otherwise} \end{cases}$$

B. Simpliﬁed Versions of the Log-BCJR

The problem in the recursions defined for the log-BCJR consists of the evaluation of the logarithm of a sum of exponentials:

$$\log\left[\sum_i \exp\{A_i\}\right]$$

An accurate estimate of this expression can be obtained by extracting the term with the highest exponential,

$$A_M = \max_i A_i$$

so that

$$\log\left[\sum_i \exp\{A_i\}\right] = A_M + \log\left[1 + \sum_{A_i \neq A_M} \exp\{A_i - A_M\}\right] \qquad (21)$$

and by computing the second term of the right-hand side (RHS) of Eq. (21) using lookup tables. Further simplifications and the required circuits for implementation are discussed in the Appendix.
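For instance, the pairwise form of Eq. (21), log(e^a + e^b) = max(a, b) + log(1 + e^{−|a−b|}), can be sketched with a small precomputed correction table; the table length (8 entries) and quantization step (0.5) below are illustrative design choices, not the circuits of the Appendix:

```python
import math

# Correction term log(1 + e^{-d}) of Eq. (21), tabulated on a coarse grid.
STEP = 0.5
TABLE = [math.log1p(math.exp(-i * STEP)) for i in range(8)]

def max_star(a, b):
    """max(a, b) plus the tabulated correction; exact up to quantization."""
    m, d = max(a, b), abs(a - b)
    idx = int(d / STEP)
    # beyond the table the correction is negligible (the A_M >> A_i case)
    return m + (TABLE[idx] if idx < len(TABLE) else 0.0)

def log_sum_exp(values):
    """Fold max_star over all terms to estimate log(sum_i e^{A_i})."""
    acc = values[0]
    for v in values[1:]:
        acc = max_star(acc, v)
    return acc
```

Dropping the correction term entirely turns `max_star` into a plain `max`, which is exactly the approximation leading to the AL-BCJR recursions below.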


However, when A_M ≫ A_i, the second term can be neglected. This approximation leads to the additive logarithmic-BCJR (AL-BCJR) algorithm:

$$A_k(S_i) = \max_u \left[A_{k-1}(S_i^-(u)) + \Gamma_k(x(u, S_i))\right] + H_A \qquad (22)$$

$$B_k(S_i) = \max_u \left[B_{k+1}(S_i^+(u)) + \Gamma_{k+1}(x(S_i, u))\right] + H_B \qquad (23)$$

$$\Sigma_k(S_i, u) = A_{k-1}(S_i) + \Gamma_k(x(S_i, u)) + B_k(S_i^+(u)) + H_\Sigma \qquad (24)$$

with the same initialization as the log-BCJR.

Both versions of the SW-BCJR algorithm described in the previous section can be used, with obvious

modiﬁcations, to transform the block log-BCJR and the AL-BCJR into their sliding window versions,

leading to the SW-log-BCJR and the SWAL1-BCJR and SWAL2-BCJR algorithms.

VI. Explicit Algorithms for Some Particular Cases

In this section, we will make explicit the quantities considered in the previous algorithms’ descriptions

by making assumptions on the code type, modulation format, and channel.

A. Rate 1/n Binary Systematic Convolutional Encoder

In this section, we particularize the previous equations in the case of a rate 1/n binary systematic

encoder associated to n binary-pulse amplitude modulation (PAM) signals or binary phase shift keying

(PSK) signals.

The channel symbols x and the output symbols from the encoder can be represented as vectors of n binary components:

$$\tilde{c} \triangleq [c_1, \cdots, c_n], \quad c_i \in \{0, 1\}$$

$$\tilde{x} \triangleq [x_1, \cdots, x_n], \quad x_i \in \{A, -A\}$$

$$\tilde{x}_k = [x_{k1}, \cdots, x_{kn}]$$

$$\tilde{y}_k = [y_{k1}, \cdots, y_{kn}]$$

where the notations have been modified to show the vector nature of the symbols. The joint probabilities γ_k(x̃), over a memoryless channel, can be split as

$$\gamma_k(\tilde{x}) = \prod_{m=1}^{n} P(y_{km} \mid x_{km} = x_m)\, P(x_{km} = x_m) \qquad (25)$$

Since in this case the encoded symbols are n-tuple of binary symbols, it is useful to redeﬁne the input

probabilities, γ, in terms of the likelihood ratios:


$$\lambda_{km} \triangleq \frac{P(y_{km} \mid x_{km} = A)}{P(y_{km} \mid x_{km} = -A)}$$

$$\lambda^A_{km} \triangleq \frac{P(x_{km} = A)}{P(x_{km} = -A)}$$

so that, from Eq. (25),

$$\gamma_k(\tilde{x}) = \prod_{m=1}^{n} \frac{(\lambda_{km})^{c_m}}{1 + \lambda_{km}} \cdot \frac{(\lambda^A_{km})^{c_m}}{1 + \lambda^A_{km}} = h_\gamma \prod_{m=1}^{n} \left[\lambda_{km} \cdot \lambda^A_{km}\right]^{c_m}$$

where h_γ takes into account all terms independent of x̃.

The BCJR can be restated as follows:

$$\alpha_k(S_i) = h_\gamma h_\alpha \sum_u \alpha_{k-1}(S_i^-(u)) \prod_{m=1}^{n} \left[\lambda_{km} \cdot \lambda^A_{km}\right]^{c_m(u, S_i)} \qquad (26)$$

$$\beta_k(S_i) = h_\gamma h_\beta \sum_u \beta_{k+1}(S_i^+(u)) \prod_{m=1}^{n} \left[\lambda_{(k+1)m} \cdot \lambda^A_{(k+1)m}\right]^{c_m(S_i, u)} \qquad (27)$$

$$\sigma_k(S_i, u) = h_\gamma h_\sigma\, \alpha_{k-1}(S_i) \prod_{m=1}^{n} \left[\lambda_{km} \cdot \lambda^A_{km}\right]^{c_m(u, S_i)} \beta_k(S_i^+(u)) \qquad (28)$$

whereas its simplification, the AL-BCJR algorithm, becomes

$$A_k(S_i) = \max_u \left[A_{k-1}(S_i^-(u)) + \sum_{m=1}^{n} c_m(u, S_i)\left(\Lambda_{km} + \Lambda^A_{km}\right)\right] + H_A \qquad (29)$$

$$B_k(S_i) = \max_u \left[B_{k+1}(S_i^+(u)) + \sum_{m=1}^{n} c_m(S_i, u)\left(\Lambda_{km} + \Lambda^A_{km}\right)\right] + H_B \qquad (30)$$

$$\Sigma_k(S_i, u) = A_{k-1}(S_i) + \sum_{m=1}^{n} c_m(S_i, u)\left(\Lambda_{km} + \Lambda^A_{km}\right) + B_k(S_i^+(u)) + H_\Sigma \qquad (31)$$

where Λ stands for the logarithm of the corresponding quantity λ.

B. The Additive White Gaussian Noise Channel

When the channel is the additive white Gaussian noise (AWGN) channel, we obtain the explicit expression of the log-likelihood ratios Λ_ki as


$$\Lambda_{ki} = \log\left[\frac{P(y_{ki} \mid x_{ki} = A)}{P(y_{ki} \mid x_{ki} = -A)}\right] = \log\left[\frac{\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(y_{ki} - A)^2\right\}}{\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(y_{ki} + A)^2\right\}}\right] = \frac{2A}{\sigma^2}\, y_{ki}$$
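The cancellation of the common Gaussian factor and the expansion of the two squares can be checked numerically; the values of A, σ, and the received samples in the test are arbitrary illustrative choices:

```python
import math

def llr_from_densities(y, A, sigma):
    """Log-ratio of the two conditional Gaussian densities, before
    simplification; the 1/sqrt(2*pi*sigma^2) factors are kept to show
    that they cancel in the ratio."""
    scale = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    num = scale * math.exp(-((y - A) ** 2) / (2.0 * sigma ** 2))
    den = scale * math.exp(-((y + A) ** 2) / (2.0 * sigma ** 2))
    return math.log(num / den)

def llr_simplified(y, A, sigma):
    """The closed form obtained by expanding the squares: 2*A*y / sigma^2."""
    return 2.0 * A * y / sigma ** 2
```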

Hence, the AL-BCJR algorithm assumes the following form:

$$A_k(S_i) = \max_u \left[A_{k-1}(S_i^-(u)) + \sum_{m=1}^{n} c_m(u, S_i)\left(\frac{2A}{\sigma^2} y_{km} + \Lambda^A_{km}\right)\right] + H_A \qquad (32)$$

$$B_k(S_i) = \max_u \left[B_{k+1}(S_i^+(u)) + \sum_{m=1}^{n} c_m(S_i, u)\left(\frac{2A}{\sigma^2} y_{km} + \Lambda^A_{km}\right)\right] + H_B \qquad (33)$$

$$\Sigma_k(S_i, u) = A_{k-1}(S_i) + \sum_{m=1}^{n} c_m(S_i, u)\left(\frac{2A}{\sigma^2} y_{km} + \Lambda^A_{km}\right) + B_k(S_i^+(u)) \qquad (34)$$

In the examples presented in Section VIII, we will consider turbo codes with rate 1/2 component

convolutional codes transmitted as binary PAM or binary PSK over an AWGN channel.

VII. Iterative Decoding of Parallel Concatenated Convolutional Codes

In this section, we will show how the MAP algorithms previously described can be embedded into

the iterative decoding procedure of parallel concatenated codes. We will derive the iterative decoding

algorithm through suitable approximations performed on maximum-likelihood decoding. The description

will be based on the fairly general parallel concatenated code shown in Fig. 4, which employs three

encoders and three interleavers (denoted by π in the ﬁgure).

Let u_k be the binary random variable taking values in {0, 1}, representing the sequence of information bits u = (u_1, · · · , u_n). The optimum decision algorithm on the kth bit u_k is based on the conditional log-likelihood ratio L_k:

$$L_k = \log \frac{P(u_k = 1 \mid y)}{P(u_k = 0 \mid y)} = \log \frac{\sum_{u: u_k = 1} P(y \mid u) \prod_{j \neq k} P(u_j)}{\sum_{u: u_k = 0} P(y \mid u) \prod_{j \neq k} P(u_j)} + \log \frac{P(u_k = 1)}{P(u_k = 0)}$$

$$= \log \frac{\sum_{u: u_k = 1} P(y \mid x(u)) \prod_{j \neq k} P(u_j)}{\sum_{u: u_k = 0} P(y \mid x(u)) \prod_{j \neq k} P(u_j)} + \log \frac{P(u_k = 1)}{P(u_k = 0)} \qquad (35)$$

where, in Eq. (35), P(u_j) are the a priori probabilities.


[Figure: the information sequence u feeds three two-memory-cell (D D) convolutional encoders through the interleavers π_1, π_2, and π_3; the outputs are x_0 = u and the redundancy sequences x_1, x_2, and x_3.]

Fig. 4. Parallel concatenation of three convolutional codes.

If the rate k_o/n_o constituent code is not equivalent to a punctured rate 1/n_o code or if turbo trellis-coded modulation is used, we can first use the symbol MAP algorithm as described in the previous sections to compute the log-likelihood ratio of a symbol u = u_1, u_2, · · · , u_{k_o}, given the observation y, as

$$\lambda(u) = \log \frac{P(u \mid y)}{P(0 \mid y)}$$

where 0 corresponds to the all-zero symbol. Then we obtain the log-likelihood ratios of the jth bit within the symbol by

$$L(u_j) = \log \frac{\sum_{u: u_j = 1} e^{\lambda(u)}}{\sum_{u: u_j = 0} e^{\lambda(u)}}$$

In this way, the turbo decoder operates on bits, and bit, rather than symbol, interleaving is used.
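This symbol-to-bit marginalization can be sketched directly; the dictionary of symbol log-likelihoods in the test below is an invented example with k_o = 2:

```python
import math

def bit_llrs(symbol_llr, k0):
    """Bit log-likelihood ratios from symbol log-likelihoods.

    symbol_llr maps each k0-bit symbol (a tuple of 0/1 values) to
    lambda(u) = log P(u|y)/P(0|y), so the all-zero symbol maps to 0.
    Returns, for each bit position j,
    L(u_j) = log( sum_{u: u_j=1} e^{lambda(u)} / sum_{u: u_j=0} e^{lambda(u)} ).
    """
    llrs = []
    for j in range(k0):
        num = sum(math.exp(l) for u, l in symbol_llr.items() if u[j] == 1)
        den = sum(math.exp(l) for u, l in symbol_llr.items() if u[j] == 0)
        llrs.append(math.log(num / den))
    return llrs
```

When λ(u) is separable over the bits, the bit ratios reduce exactly to the per-bit coefficients, which is a convenient sanity check.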

To explain the basic decoding concept, we restrict ourselves to three codes, but extension to several codes is straightforward. In order to simplify the notation, consider the combination of the permuter (interleaver) and the constituent encoder connected to it as a block code with input u and outputs x_i, i = 0, 1, 2, 3 (x_0 = u), and the corresponding received sequences as y_i, i = 0, 1, 2, 3. The optimum bit decision metric on each bit is (for data with uniform a priori probabilities)

$$L_k = \log \frac{\sum_{u: u_k = 1} P(y_0 \mid u) P(y_1 \mid u) P(y_2 \mid u) P(y_3 \mid u)}{\sum_{u: u_k = 0} P(y_0 \mid u) P(y_1 \mid u) P(y_2 \mid u) P(y_3 \mid u)} \qquad (36)$$

but, in practice, we cannot compute Eq. (36) for large n because the permutations π_2, π_3 imply that y_2 and y_3 are no longer simple convolutional encodings of u. Suppose that we evaluate P(y_i | u), i = 0, 2, 3, in Eq. (36) using Bayes' rule and using the following approximation:


$$P(u \mid y_i) \approx \prod_{k=1}^{n} \tilde{P}_i(u_k) \qquad (37)$$

Note that P(u|y_i) is not separable in general. However, for i = 0, P(u|y_0) is separable; hence, Eq. (37) holds with equality. So we need an algorithm that approximates a nonseparable distribution P(u|y_i) = P with a separable distribution Π_{k=1}^{n} P̃_i(u_k) = Q. The best approximation can be obtained using the Kullback cross-entropy minimizer, which minimizes the cross-entropy H(Q, P) = E{log(Q/P)} between the input P and the output Q.
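The separable approximation can be made concrete with a toy example. It is a known fact that, among separable distributions Q, the product of the marginals of P minimizes the divergence D(P||Q) = E_P{log(P/Q)} (the negative of the H(Q, P) written above); the sketch below builds that product and scores it, with made-up numbers that are mine, not the article's.

```python
import math

def marginals(P, n):
    """Bitwise marginals P~_k(u_k = 1) of a joint distribution P over n-bit words u."""
    m = [0.0] * n
    for u, p in P.items():
        for k in range(n):
            if (u >> k) & 1:
                m[k] += p
    return m

def separable(P, n):
    """Separable approximation Q(u) = prod_k P~_k(u_k) built from the marginals."""
    m = marginals(P, n)
    Q = {}
    for u in P:
        q = 1.0
        for k in range(n):
            q *= m[k] if (u >> k) & 1 else (1.0 - m[k])
        Q[u] = q
    return Q

def kl(P, Q):
    """D(P||Q) = sum_u P(u) log(P(u)/Q(u)); zero exactly when Q matches P."""
    return sum(p * math.log(p / Q[u]) for u, p in P.items() if p > 0)

# Toy nonseparable joint distribution over 2-bit symbols (illustrative only).
P = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
Q = separable(P, 2)
gap = kl(P, Q)  # small but nonzero: P is not separable, so Q cannot match it exactly
```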

The MAP algorithm approximates a nonseparable distribution with a separable one; however, it is not clear how good it is compared with the Kullback cross-entropy minimizer. Here we use the MAP algorithm for such an approximation. In the iterative decoding, as the reliability of the {u_k} improves, intuitively one expects that the cross-entropy between the input and the output of the MAP algorithm will decrease, so that the approximation will improve. If such an approximation, i.e., Eq. (37), can be obtained, we can use it in Eq. (36) for i = 2 and i = 3 (by Bayes' rule) to complete the algorithm.

Define L̃_ik by

P̃_i(u_k) = e^{u_k L̃_ik} / (1 + e^{L̃_ik})    (38)
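Equation (38) is just the logistic mapping between a bit probability and its log-likelihood ratio; a minimal sketch of the mapping and its inverse:

```python
import math

def prob_from_llr(llr, u):
    """Eq. (38): P~(u_k) = e^(u_k * L~) / (1 + e^L~), for u in {0, 1}."""
    return math.exp(u * llr) / (1.0 + math.exp(llr))

def llr_from_prob(p1):
    """Inverse mapping: L~ = log(P~(1) / P~(0))."""
    return math.log(p1 / (1.0 - p1))

p1 = prob_from_llr(0.8, 1)   # probability of u_k = 1 for LLR 0.8
p0 = prob_from_llr(0.8, 0)   # probability of u_k = 0; p0 + p1 = 1
```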

where u_k ∈ {0, 1}. To obtain {P̃_i} or, equivalently, {L̃_ik}, we use Eqs. (37) and (38) for i = 0, 2, 3 (by Bayes' rule) to express Eq. (36) as

L_k = f(y_1, L̃_0, L̃_2, L̃_3, k) + L̃_0k + L̃_2k + L̃_3k    (39)

where L̃_0k = 2Ay_0k/σ² (for binary modulation) and

f(y_1, L̃_0, L̃_2, L̃_3, k) = log [ Σ_{u: u_k = 1} P(y_1|u) Π_{j≠k} e^{u_j(L̃_0j + L̃_2j + L̃_3j)} / Σ_{u: u_k = 0} P(y_1|u) Π_{j≠k} e^{u_j(L̃_0j + L̃_2j + L̃_3j)} ]    (40)

We can use Eqs. (37) and (38) again, but this time for i = 0, 1, 3, to express Eq. (36) as

L_k = f(y_2, L̃_0, L̃_1, L̃_3, k) + L̃_0k + L̃_1k + L̃_3k    (41)

and similarly,

L_k = f(y_3, L̃_0, L̃_1, L̃_2, k) + L̃_0k + L̃_1k + L̃_2k    (42)

A solution to Eqs. (39), (41), and (42) is

L̃_1k = f(y_1, L̃_0, L̃_2, L̃_3, k)

L̃_2k = f(y_2, L̃_0, L̃_1, L̃_3, k)    (43)

L̃_3k = f(y_3, L̃_0, L̃_1, L̃_2, k)


for k = 1, 2, · · · , n, provided that a solution to Eq. (43) does indeed exist. The ﬁnal decision is then based

on

L_k = L̃_0k + L̃_1k + L̃_2k + L̃_3k    (44)

which is passed through a hard limiter with zero threshold. We attempted to solve the nonlinear equations in Eq. (43) for L̃_1, L̃_2, and L̃_3 by using the iterative procedure

L̃_1k^(m+1) = α_1^(m) f(y_1, L̃_0, L̃_2^(m), L̃_3^(m), k)    (45)

for k = 1, 2, · · · , n, iterating on m. Similar recursions hold for L̃_2k^(m) and L̃_3k^(m).

We start the recursion with the initial condition L̃_1^(0) = L̃_2^(0) = L̃_3^(0) = L̃_0. For the computation of f(·), we can use any MAP algorithm as described in the previous sections, with permuters (direct and inverse) where needed; call this the basic decoder D_i, i = 1, 2, 3. The L̃_ik^(m), i = 1, 2, 3, represent the extrinsic information. The signal flow graph for extrinsic information is shown in Fig. 5 [13], which is a fully connected graph without self-loops. Parallel, serial, or hybrid implementations can be realized based on the signal flow graph of Fig. 5 (in this figure, y_0 is considered as part of y_1). Based on our equations, each node's output is equal to the internally generated reliability L minus the sum of all inputs to that node. The BCJR MAP algorithm always starts and ends at the all-zero state, since we always terminate the trellis as described in [13]. We assumed π_1 = I (the identity); however, any π_1 can be used.
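The node rule just stated (output = internally generated reliability minus the sum of the node's inputs) can be sketched in one line; the numbers below are toy values, and the total reliability would come from the MAP computation in a real decoder.

```python
def extrinsic_update(L_total, inputs):
    """Extrinsic output of one decoder node: its total reliability minus the
    sum of the extrinsic inputs it received, so no decoder is fed back its
    own contribution."""
    return L_total - sum(inputs)

# Toy numbers: decoder D1 produced total reliability 2.5 for some bit after
# receiving extrinsic values 0.7 (from D2) and 0.4 (from D3).
out = extrinsic_update(2.5, [0.7, 0.4])  # ≈ 1.4 passed on to D2 and D3
```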

[Figure: three decoder nodes D_1, D_2, and D_3 forming a fully connected graph without self-loops, exchanging the extrinsic values L̃_1, L̃_2, and L̃_3.]

Fig. 5. Signal flow graph for extrinsic information.

The overall decoder is composed of block decoders D_i connected in parallel, as in Fig. 6 (when the switches are in position P), which can be implemented as a pipeline or by feedback. A serial implementation is also shown in Fig. 6 (when the switches are in position S). Based on [13, Fig. 5], a serial implementation was proposed in [21]. For those applications where the systematic bits are not transmitted, or for parallel concatenated trellis codes with high-level modulation, we should set L̃_0 = 0. Even in the presence of systematic bits, if desired, one can set L̃_0 = 0 and consider y_0 as part of y_1. If the systematic bits are distributed among encoders, we use the same distribution of y_0 among the received observations for the MAP decoders.

At this point, a further approximation for iterative decoding is possible if one term corresponding to a sequence u dominates the other terms in the summations in the numerator and denominator of Eq. (40). Then the summations in Eq. (40) can be replaced by "maximum" operations with the same indices, i.e., replacing Σ_{u: u_k = i} with max_{u: u_k = i} for i = 0, 1. A similar approximation can be used for L̃_2k and L̃_3k in Eq. (43). This suboptimal decoder then corresponds to an iterative decoder that uses AL-BCJR rather than BCJR decoders. As discussed, such approximations have been used by replacing Σ with max in the log-BCJR algorithm to obtain the AL-BCJR. Clearly, all versions of SW-BCJR can replace the BCJR (MAP) decoders in Fig. 6.
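The "replace Σ with max" simplification is the standard max-log approximation, log Σ_i e^{x_i} ≈ max_i x_i, which is tight exactly when one term dominates. A quick numeric check with toy metric values:

```python
import math

def logsumexp(xs):
    """Exact log(sum of exponentials), computed stably."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

terms = [4.0, 1.0, -2.0]    # toy branch metrics; the first term dominates
exact = logsumexp(terms)
approx = max(terms)          # the AL-BCJR-style replacement of the summation
gap = exact - approx         # always >= 0, and small when one term dominates
```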

For turbo codes with only two constituent codes, Eq. (45) reduces to

[Figure: three MAP decoders D_1, D_2, D_3 (log-BCJR or SWL-BCJR), each preceded by its permuter π_i and followed by the inverse permuter π_i^{-1}, with delay lines and subtracters forming the extrinsic values L̃_1^(m), L̃_2^(m), L̃_3^(m); the channel term L̃_0 = (2A/σ²) y_0 and the summed reliabilities L feed a hard limiter producing the decoded bits, and P/S switches select the parallel or serial configuration.]

Fig. 6. Iterative decoder structure for three parallel concatenated codes.

L̃_1k^(m+1) = α_1^(m) f(y_1, L̃_0, L̃_2^(m), k)

L̃_2k^(m+1) = α_2^(m) f(y_2, L̃_0, L̃_1^(m), k)

for k = 1, 2, · · · , n, and m = 1, 2, · · ·, where, for each iteration, α_1^(m) and α_2^(m) can be optimized (simulated annealing) or set to 1 for simplicity. The decoding configuration for two codes is shown in Fig. 7. In this special case, since the paths in Fig. 7 are disjoint, the decoder structure can be reduced to a serial mode structure if desired. If we optimize α_1^(m) and α_2^(m), our method for two codes is similar to the decoding method proposed in [6], which requires estimates of the variances of L̃_1k and L̃_2k for each iteration in the presence of errors. It is interesting to note that the concept of extrinsic information introduced in [6] was also presented as "partial factor" in [22]. However, the effectiveness of turbo codes lies in the use of recursive convolutional codes and random permutations. This results in time-shift-varying codes resembling random codes.

In the results presented in the next section, we will use a parallel concatenated code with only two constituent codes.
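For two codes, the update pair above is a simple alternating fixed-point iteration. The sketch below wires it up with the α's set to 1; the two lambdas are placeholder "decoders" standing in for the real f(y_i, ...) of Eq. (40), which would require the full trellis computation.

```python
def turbo_two_code(f1, f2, L0, iterations=5):
    """Alternating extrinsic updates for two constituent codes (alphas = 1):
    L1 <- f1(L0, L2), then L2 <- f2(L0, L1); the decision statistic is
    L0 + L1 + L2, as in Eq. (44) restricted to two codes."""
    L1, L2 = 0.0, 0.0
    for _ in range(iterations):
        L1 = f1(L0, L2)
        L2 = f2(L0, L1)
    return L0 + L1 + L2

# Placeholder decoders: damped affine maps, chosen only so that the
# iteration visibly converges (here toward a decision statistic of 3.0).
decision = turbo_two_code(lambda l0, l2: 0.5 * (l0 + l2),
                          lambda l0, l1: 0.5 * (l0 + l1),
                          L0=1.0)
```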


[Figure: two MAP decoders D_1 and D_2 (log-BCJR or SWL-BCJR) with permuter π_2 and inverse π_2^{-1}, delay lines, and subtracters exchanging the extrinsic values L̃_1^(m) and L̃_2^(m); the channel term L̃_0 = (2A/σ²) y_0 is added to form the decoded bits.]

Fig. 7. Iterative decoder structure for two parallel concatenated codes.

VIII. Simulation Results

In this section, we present some simulation results obtained by applying the iterative decoding algorithm described in Section VII, which, in turn, uses the optimum BCJR and the suboptimal, but simpler, SWAL2-BCJR as embedded MAP algorithms. All simulations refer to a rate 1/3 PCCC with two equal, recursive convolutional constituent codes with 16 states and generator matrix

G(D) = [ 1, (1 + D + D³ + D⁴)/(1 + D³ + D⁴) ]

and an interleaver of length 16,384 designed according to the procedure described in [13], using an

S-random permutation with S = 40. Each simulation run examined at least 25,000,000 bits.
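The 16-state constituent encoder defined by G(D) above can be sketched directly from the polynomials (feedback 1 + D³ + D⁴, feedforward 1 + D + D³ + D⁴); the register layout and bit conventions below are mine, not the article's.

```python
def rsc_encode(bits):
    """Rate-1/2 recursive systematic encoder for
    G(D) = [1, (1 + D + D^3 + D^4)/(1 + D^3 + D^4)] (16 states).
    Returns (systematic, parity) bit lists."""
    r = [0, 0, 0, 0]                    # r[i] holds the feedback value delayed i+1 steps
    sys_out, par_out = [], []
    for u in bits:
        a = u ^ r[2] ^ r[3]             # feedback polynomial 1 + D^3 + D^4
        p = a ^ r[0] ^ r[2] ^ r[3]      # feedforward 1 + D + D^3 + D^4 acting on a
        sys_out.append(u)
        par_out.append(p)
        r = [a] + r[:3]                 # shift the register
    return sys_out, par_out

# Impulse response of the parity branch (infinite for a recursive encoder).
s, p = rsc_encode([1, 0, 0, 0, 0, 0])
```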

In Fig. 8, we plot the bit-error probabilities as a function of the number of iterations of the decoding procedure using the optimum block BCJR algorithm for various values of the signal-to-noise ratio. It can be seen that the decoding algorithm converges down to BER = 10⁻⁵ at signal-to-noise ratios of 0.2 dB with nine iterations. The same curves are plotted in Fig. 9 for the case of the suboptimum SWAL2-BCJR algorithm. In this case, 0.75 dB of signal-to-noise ratio is required for convergence to the same BER with the same number of iterations.

Fig. 8. Convergence of turbo coding: bit-error probability versus number of iterations for various E_b/N_0 using the SW2-BCJR algorithm.

Fig. 9. Convergence of turbo coding: bit-error probability versus number of iterations for various E_b/N_0 using the SWAL2-BCJR algorithm.

In Fig. 10, the bit-error probability versus the signal-to-noise ratio is plotted for a fixed number (5) of iterations of the decoding algorithm and for both the optimum BCJR and SWAL2-BCJR MAP decoding algorithms. It can be seen that the penalty incurred by the suboptimum algorithm amounts to about 0.5 dB. This figure is in agreement with a similar result obtained in [12], where all MAP algorithms were of the block type. The penalty is completely attributable to the approximation of the sum of exponentials described in Section V.B. To verify this, we used an SW2-BCJR and compared its results with those of the optimum block BCJR, obtaining the same results.

Finally, in Figs. 11 and 12, we plot the number of iterations needed to obtain a given bit-error probability versus the bit signal-to-noise ratio for the two algorithms. These curves provide information on the delay incurred to obtain a given reliability as a function of the bit signal-to-noise ratio.


Fig. 10. Bit-error probability as a function of the bit signal-to-noise ratio using the SW2-BCJR and SWAL2-BCJR algorithms with five iterations.

Fig. 11. Number of iterations to achieve several bit-error probabilities as a function of the bit signal-to-noise ratio using the SWAL2-BCJR algorithm.

IX. Conclusions

We have described two versions of a simpliﬁed maximum a posteriori decoding algorithm working in

a sliding window form, like the Viterbi algorithm. The algorithms can be used as a building block to

decode continuously transmitted sequences obtained by parallel concatenated codes, without requiring

code trellis termination. A heuristic explanation of how to embed the maximum a posteriori algorithms

into the iterative decoding of parallel concatenated codes was also presented. Finally, the performances

of the two algorithms were compared on the basis of a powerful rate 1/3 parallel concatenated code.


Fig. 12. Number of iterations to achieve several bit-error probabilities as a function of the bit signal-to-noise ratio using the SW2-BCJR algorithm.

Acknowledgment

The research in this article was partially carried out at the Politecnico di Torino,

Italy, under NATO Research Grant CRG 951208.

References

[1] S. Benedetto, E. Biglieri, and V. Castellani, Digital Transmission Theory, New York: Prentice-Hall, 1987.

[2] K. Abend and B. D. Fritchman, "Statistical Detection for Communication Channels With Intersymbol Interference," Proceedings of the IEEE, vol. 58, no. 5, pp. 779–785, May 1970.

[3] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate," IEEE Transactions on Information Theory, pp. 284–287, March 1974.

[4] G. D. Forney, Jr., Concatenated Codes, Cambridge, Massachusetts: Massachusetts Institute of Technology, 1966.

[5] V. V. Ginzburg, "Multidimensional Signals for a Continuous Channel," Probl. Peredachi Inform., vol. 20, no. 1, pp. 28–46, January 1984.

[6] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes," Proceedings of ICC'93, Geneva, Switzerland, pp. 1064–1070, May 1993.

[7] N. Seshadri and C.-E. W. Sundberg, "Generalized Viterbi Algorithms for Error Detection With Convolutional Codes," Proceedings of GLOBECOM'89, vol. 3, Dallas, Texas, pp. 43.3.1–43.3.5, November 1989.

[8] J. Hagenauer and P. Hoeher, "A Viterbi Algorithm With Soft-Decision Outputs and Its Applications," Proceedings of GLOBECOM'89, Dallas, Texas, pp. 47.1.1–47.1.7, November 1989.

[9] Y. Li, B. Vucetic, and Y. Sato, "Optimum Soft-Output Detection for Channels With Intersymbol Interference," IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 704–713, May 1995.

[10] S. S. Pietrobon and A. S. Barbulescu, "A Simplification of the Modified Bahl Algorithm for Systematic Convolutional Codes," Proceedings of ISITA'94, Sydney, Australia, pp. 1073–1077, November 1994.

[11] U. Hansson and T. Aulin, "Theoretical Performance Evaluation of Different Soft-Output Algorithms," Proceedings of ISITA'94, Sydney, Australia, pp. 875–880, November 1994.

[12] P. Robertson, E. Villebrun, and P. Hoeher, "A Comparison of Optimal and Sub-Optimal MAP Decoding Algorithms Operating in the Log Domain," Proceedings of ICC'95, Seattle, Washington, pp. 1009–1013, June 1995.

[13] D. Divsalar and F. Pollara, "Turbo Codes for PCS Applications," Proceedings of ICC'95, Seattle, Washington, pp. 54–59, June 1995.

[14] CAS 5093 Turbo-Code Codec, 3.7 ed., data sheet, Chateaubourg, France: Comatlas, August 1994.

[15] S. Benedetto and G. Montorsi, "Performance of Turbo Codes," Electronics Letters, vol. 31, no. 3, pp. 163–165, February 1995.

[16] S. S. Pietrobon, "Implementation and Performance of a Serial MAP Decoder for Use in an Iterative Turbo Decoder," Proceedings of ISIT'95, Whistler, British Columbia, Canada, p. 471, September 1995. Also http://audrey.levels.unisa.edu.au/itr-users/steven/turbo/ISIT95ovh2.ps.gz

[17] D. Divsalar, S. Dolinar, R. J. McEliece, and F. Pollara, "Transfer Function Bounds on the Performance of Turbo Codes," The Telecommunications and Data Acquisition Progress Report 42-122, April–June 1995, Jet Propulsion Laboratory, Pasadena, California, pp. 44–55, August 15, 1995. http://tda.jpl.nasa.gov/tda/progress report/42-122/122A.pdf

[18] S. Benedetto and G. Montorsi, "Design of Parallel Concatenated Convolutional Codes," to be published in IEEE Transactions on Communications, 1996.

[19] D. Divsalar and F. Pollara, "Multiple Turbo Codes," Proceedings of IEEE MILCOM'95, San Diego, California, November 5–8, 1995.

[20] D. Divsalar and F. Pollara, "On the Design of Turbo Codes," The Telecommunications and Data Acquisition Progress Report 42-123, July–September 1995, Jet Propulsion Laboratory, Pasadena, California, pp. 99–121, November 15, 1995. http://tda.jpl.nasa.gov/tda/progress report/42-123/123D.pdf

[21] S. A. Barbulescu, "Iterative Decoding of Turbo Codes and Other Concatenated Codes," Ph.D. Dissertation, University of South Australia, August 1995.

[22] J. Lodge, R. Young, P. Hoeher, and J. Hagenauer, "Separable MAP 'Filters' for the Decoding of Product and Concatenated Codes," Proceedings of ICC'93, Geneva, Switzerland, pp. 1740–1745, May 1993.


Appendix

Circuits to Implement the MAP Algorithm for Decoding

Rate 1/n Component Codes of a Turbo Code

In this appendix, we show the basic circuits required to implement a serial additive MAP algorithm for both the block log-BCJR and the SW-log-BCJR. Extension to a parallel implementation is straightforward. Figure A-1 shows the implementation⁹ of Eq. (18) for the forward recursion, using a lookup table for the evaluation of log(1 + e⁻ˣ); subtraction of max_j {A_k(S_j)} from A_k(S_i) is used for normalization, to prevent buffer overflow.¹⁰
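The Fig. A-1 datapath is one "max-star" (Jacobian logarithm) update per state; the following sketch spells out the update with the correction term log(1 + e⁻ˣ) that the lookup table would store, plus the max-based normalization. The two-state trellis and metric values are made up for illustration.

```python
import math

def max_star(a, b):
    """max*(a, b) = log(e^a + e^b) = max(a, b) + log(1 + e^(-|a - b|));
    the second term is what the small lookup table provides in hardware."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def forward_step(A_prev, gamma, pred):
    """One step of a log-domain forward recursion in the spirit of Eq. (18):
    A_k(S_i) = max* over the two incoming branches of
    A_{k-1}(predecessor) + branch metric, followed by normalization.
    pred[s] lists the two predecessor states of s; gamma[s] the two metrics."""
    A = [max_star(A_prev[pred[s][0]] + gamma[s][0],
                  A_prev[pred[s][1]] + gamma[s][1]) for s in range(len(A_prev))]
    m = max(A)
    return [a - m for a in A]       # subtract the max to prevent overflow

# Toy 2-state trellis step with made-up branch metrics.
A = forward_step([0.0, -1.0], gamma=[(0.5, -0.2), (-0.3, 0.1)],
                 pred=[(0, 1), (0, 1)])
```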

The circuit for maximization can be implemented simply by using a comparator and selector with feedback operation. Figure A-2 shows the implementation of Eq. (19) for the backward recursion, which is similar to Fig. A-1. A circuit for the computation of log(P_k(u|y)) from Eq. (4), using Eq. (20) for the final computation of the bit reliability, is shown in Fig. A-3. In this figure, switch 1 is in position 1 and switch 2 is open at the start of operation. The circuit accepts Σ_k(S_i, u) for i = 1; then switch 1 moves to position 2 for feedback operation. The circuit performs the operations for i = 1, 2, · · · , N. When the circuit accepts Σ_k(S_i, u) for i = N, switch 1 goes to position 1 and switch 2 is closed. This operation is done for u = 1 and u = 0. The difference between log(P_k(1|y)) and log(P_k(0|y)) represents the reliability value required for turbo decoding, i.e., the value of L_k in Eq. (35).
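The feedback loop of Fig. A-3 amounts to folding a max* accumulation over the N per-state terms Σ_k(S_i, u) for each value of u, and then taking the difference of the two accumulations; a sketch with made-up per-state values:

```python
import math

def max_star(a, b):
    """log(e^a + e^b), implemented as max plus a correction term."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def bit_reliability(sigma1, sigma0):
    """L_k = log P_k(1|y) - log P_k(0|y), each log-probability obtained by
    folding max* over the per-state terms Sigma_k(S_i, u), as the Fig. A-3
    feedback accumulator does state by state."""
    acc1 = sigma1[0]
    for s in sigma1[1:]:
        acc1 = max_star(acc1, s)
    acc0 = sigma0[0]
    for s in sigma0[1:]:
        acc0 = max_star(acc0, s)
    return acc1 - acc0

# Toy per-state terms for u = 1 and u = 0 (values made up).
L = bit_reliability([0.2, -1.0, 0.5, -0.7], [-0.4, 0.1, -1.2, 0.3])
```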

[Figure: per-state datapath with adders combining A_{k-1}(S_i(0)) and A_{k-1}(S_i(1)) with the branch metrics Γ_k(x(0, S_i)) and Γ_k(x(1, S_i)), a compare / select-1-of-2 pair, a lookup table for log(1 + e⁻ˣ), and a normalization stage producing A_k(S_i) − max_j {A_k(S_j)}.]

Fig. A-1. Basic structure for forward computation in the log-BCJR MAP algorithm.

⁹ For feed-forward and nonconventional recursive convolutional codes, the notations in Fig. A-1 should be changed according to Footnotes 2 and 5.

¹⁰ Simpler normalization can be achieved by monitoring the two most significant bits. When both of them are one, then we reset all the most significant bits to zero. This method increases the bit representation by an additional 2 bits.


[Figure: per-state datapath with adders combining B_{k+1}(S_i(0)) and B_{k+1}(S_i(1)) with the branch metrics Γ_{k+1}(x(S_i, 0)) and Γ_{k+1}(x(S_i, 1)), a compare / select-1-of-2 pair, a lookup table for log(1 + e⁻ˣ), and a normalization stage producing B_k(S_i) − max_j {B_k(S_j)}.]

Fig. A-2. Basic structure for backward computation in the log-BCJR MAP algorithm.

We propose two simplifications to be used for the computation of log(1 + e⁻ˣ) without using a lookup table.

Approximation 1: We used the approximation log(1 + e⁻ˣ) ≈ −ax + b for 0 < x < b/a, where b = log 2, and we selected a = 0.3 for the simulation. We observed about a 0.1-dB degradation compared with the full MAP algorithm for the code described in Section VIII. The parameter a should be optimized, and it may not necessarily be the same for the computation of Eq. (18), Eq. (19), and log(P_k(u|y)) from Eq. (4) using Eq. (20). We call this the "linear" approximation.

Approximation 2: We take

log(1 + e⁻ˣ) ≈ c if x < η, and 0 if x > η

We selected c = log 2 and the threshold η = 1.0 for our simulation. We observed about a 0.2-dB degradation compared with the full MAP algorithm for the code described in Section VIII. This threshold should be optimized for a given SNR, and it may not necessarily be the same for the computation of Eq. (18), Eq. (19), and log(P_k(u|y)) from Eq. (4) using Eq. (20). If we use this approximation, the log-BCJR algorithm can be built based on addition, comparison, and selection operations, without requiring a lookup table, which is similar to a Viterbi algorithm implementation. We call this the "threshold" approximation. At most, an 8- to 10-bit representation suffices for all operations (see also [12] and [16]).
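Both replacements of the lookup table are one-liners; the sketch below compares them with the exact correction term over a grid of x values, using the constants quoted above (a = 0.3, b = log 2, c = log 2, η = 1.0). The error figures it produces are only for this toy grid, not a BER claim.

```python
import math

def exact(x):
    """The exact correction term log(1 + e^-x), for x >= 0."""
    return math.log1p(math.exp(-x))

def linear(x, a=0.3, b=math.log(2)):
    """Approximation 1: -a*x + b on 0 < x < b/a, zero beyond that range."""
    return max(0.0, b - a * x)

def threshold(x, c=math.log(2), eta=1.0):
    """Approximation 2: the constant c below the threshold, 0 above it."""
    return c if x < eta else 0.0

grid = [x / 10 for x in range(0, 50)]
errs_lin = [abs(exact(x) - linear(x)) for x in grid]
errs_thr = [abs(exact(x) - threshold(x)) for x in grid]
# On this grid the linear rule tracks the exact term more closely than the
# threshold rule, matching the smaller degradation reported in the text.
```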


[Figure: feedback accumulator over the per-state terms {Σ_k(S_i, u)}, with switch 1 (positions 1 and 2), switch 2, an initial-value register, a compare / select-1-of-2 pair, and a lookup table for log(1 + e⁻ˣ), producing log P_k(u|y).]

Fig. A-3. Basic structure for bit reliability computation in the log-BCJR MAP algorithm.


algorithms has seen a revival in connection with the problem of decoding concatenated coding schemes. Concatenated coding schemes (a class in which we include product codes, multilevel codes, generalized concatenated codes, and serial and parallel concatenated codes) were first proposed by Forney [4] as a means of achieving large coding gains by combining two or more relatively simple "constituent" codes. The resulting concatenated coding scheme is a powerful code endowed with a structure that permits an easy decoding, like "stage decoding" [5] or "iterated stage decoding" [6].

To work properly, all these decoding algorithms cannot limit themselves to passing the symbols decoded by the inner decoder to the outer decoder. They need to exchange some kind of soft information. Actually, as proved by Forney [4], the optimum output of the inner decoder should be in the form of the sequence of the probability distributions over the inner code alphabet conditioned on the received signal, the a posteriori probability (APP) distribution.

There have been several attempts to achieve, or at least to approach, this goal. Some of them are based on modifications of the Viterbi algorithm so as to obtain, at the decoder output, in addition to the "hard"-decoded symbols, some reliability information. This has led to the concept of "augmented-output," or the list-decoding Viterbi algorithm [7], and to the soft-output Viterbi algorithm (SOVA) [8]. These solutions are clearly suboptimal, as they are unable to supply the required APP. A different approach consisted in revisiting the original symbol MAP decoding algorithms [2,3] with the aim of simplifying them to a form suitable for implementation [9–12].

In this article, we are interested in soft-decoding algorithms as the main building block of iterative stage decoding of parallel concatenated codes. This has become a "hot" topic for research after the successful proposal of the so-called turbo codes [6]. They are (see Fig. 1) parallel concatenated convolutional codes (PCCC) whose encoder is formed by two (or more) constituent systematic encoders joined through an interleaver. The input information bits feed the first encoder and, after having been interleaved by the interleaver, enter the second encoder. The codeword of the parallel concatenated code consists of the input bits to the first encoder followed by the parity check bits of both encoders. Generalizations to more than one interleaver are possible and fruitful [13].

[Figure: rate 1/3 PCCC encoder; the input x feeds the first rate 1/2 systematic convolutional encoder directly (redundancy bit y1) and, through an interleaver of length N, the second encoder (redundancy bit y2).]

Fig. 1. Parallel concatenated convolutional code.

The suboptimal iterative decoder is modular and consists of a number of equal component blocks formed by concatenating soft decoders of the constituent codes (CC) separated by the interleavers used at the encoder side. By increasing the number of decoding modules and, thus, the number of decoding iterations, bit-error probabilities as low as 10⁻⁵ at E_b/N_0 = 0.0 dB for rate 1/4 PCCC have been shown by simulation [13]. A version of turbo codes employing two eight-state convolutional codes as constituent codes, an interleaver of 32 × 32 bits, and an iterative decoder performing two and one-half iterations with a complexity of the order of five times the maximum-likelihood (ML) Viterbi decoding of each constituent code is presently available on a chip yielding a measured bit-error probability of 0.9 × 10⁻⁶ at E_b/N_0 = 3 dB [14].


In recent articles [15,17], upper bounds to the ML bit-error probability of PCCCs have been proposed. As a by-product, it has been shown by simulation that iterative decoding can approach quite closely the ML performance. The iterative decoding algorithm was a simplification of the algorithm proposed in [3], whose regular steps and limited complexity seem quite suitable to very large-scale integration (VLSI) implementation. Simplified versions of the algorithm [3] have been proposed and analyzed in [12] in the context of a block decoding strategy that requires trellis termination after each block of bits. Similar simplification also was used in [16] for hardware implementation of the MAP algorithm.

In this article, we will describe two versions of a simplified MAP decoding algorithm that can be used as building blocks of the iterative decoder to decode PCCCs. The core of such algorithms is a procedure to derive the sequence of probability distributions over the information symbols' alphabet based on the received signal and constrained on the code structure. A distinctive feature of the algorithms is that they work in a "sliding window" form, like the Viterbi algorithm, and thus can be used to decode "continuously transmitted" PCCCs, without requiring trellis termination and a block-equivalent structure of the code. The simplest among the two algorithms will be compared with the optimum block-decoding algorithm proposed in [3]. The comparison will be given in terms of bit-error probability when the algorithms are embedded into iterative decoding schemes for PCCCs. We will choose, for comparison, a very powerful PCCC scheme suitable for deep-space applications [18–20] and, thus, working at a very low signal-to-noise ratio.

Readers acquainted with the literature on soft-output decoding algorithms know that one burden in understanding and comparing the different algorithms is the spread and, sometimes, mess of notations involved. For this reason, we will carefully define the system and notations and then stick consistently to them for the description of all algorithms.

II. System Context and Notations

As previously outlined, our final aim is to find suitable soft-output decoding algorithms for iterated staged decoding of parallel concatenated codes employed in a continuous transmission; thus, we will start by this procedure and only later will we extend the description to the more general setting. For the first part of the article, we will refer to the system of Fig. 2.

The information sequence u, composed of symbols drawn from an alphabet U = {u_1, · · · , u_I} and emitted by the source, enters an encoder that generates code sequences c. Both source and code sequences are defined over a time index set K (a finite or infinite set of integers). Denoting the code alphabet C = {c_1, · · · , c_M}, the code C can be written as a subset of the Cartesian product of C by itself K times, i.e., C ⊆ C^K. The code symbols c_k (the index k will always refer to time throughout the article) enter the modulator, which performs a one-to-one mapping of them with its signals, or channel input symbols x_k, belonging to the set X = {x_1, · · · , x_M}.¹ The channel symbols x_k are transmitted over a stationary memoryless channel with output symbols y_k. The channel is characterized by the transition probability distribution (discrete or continuous, according to the channel model) P(y|x). The channel output sequence is fed to the symbol-by-symbol soft-output demodulator, which produces a sequence of probability distributions γ_k(c) over C conditioned on the received signal, according to the memoryless transformation

¹ For simplicity of notation, we have assumed that the cardinality of the modulator equals that of the code alphabet. In general, each coded symbol can be mapped in more than one channel symbol, as in the case of multilevel codes or trellis codes with parallel transitions. The extension is straightforward.

yk ) = P (yk |xk = x(c))Pk (c) = γk (x) (1) where we have assumed to know the sequence of the a priori probability distributions of the channel input symbols (Pk (x) : k ∈ K) and made use of the one-to-one mapping C → X. use the following notations with reference to Fig. The transmission system. and precisely the one deﬁned by the information + symbol u emitted during the transition Si → Si (u). From here on. 3. Si ) when the transition ends in Si and x(Si . 2. γk (c) = P (xk = x(c). and precisely the one deﬁned by the information − symbol u emitted during the transition Si (u ) → Si .SOURCE u U ENCODER c x MODULATOR C X MEMORYLESS CHANNEL y Y y Y SOFT DEMODULATOR P(xk |yk) SOFT DECODER P(xk |y) Fig.2 + (3) Si (u) is one of the successors of Si . They are deﬁned as Pk (u|y) = P (uk = u|y) (2) The probability distributions Pk (u|y) are referred to in the literature as symbol-by-symbol a posteriori probabilities (APP) and represent the optimum symbol-by-symbol soft output. we will limit ourselves to the case of time-invariant convolutional codes with N states. we will make this dependence explicit by writing x(u . · · · . 66 . SN } − (2) Si (u ) is one of the precursors of Si . The BCJR Algorithm In this section. (4) To each transition in the trellis. a signal x is associated. We will call this algorithm the 2 The − state Si and the symbol u uniquely specify the precursor Si (u ) in the case of the class of recursive convolutional encoders. The extension to the case of feed-forward encoders and other nonconventional recursive convolutional encoders is straightforward. belonging to the set S = {S1 . without derivation. which is the optimum algorithm to produce the sequence of APP. which processes the distributions in order to obtain the probability distributions Pk (u|y). III. we will restate in our new notations. The sequence of probability distributions γk (c) obtained by the modulator on a symbol-by-symbol basis is then supplied to the soft-output symbol decoder. 
When necessary. and assume that the (integer) time instant we are interested in is the kth: (1) Si is the generic state at time k. the algorithm described in [3]. u) when the transition originates from Si . which depends on the state from which the transition originates and on the information symbol u determining that transition. like the ones we are interested in (when the largest degree of feedback polynomial represents the memory of a convolutional encoder).

yn ) to work.S1 • S1 • S1 • – Si (u )• u u x (u . it was proven that the APP can be computed as + σk (Si . u))βk (Si (u)) (5) where 3 The algorithm is usually referred to in the recent literature as the “Bahl algorithm”. s0 and sn . Raviv. sn ) whose ﬁrst and last states. · · · .u ) c (Si . u) (4) Thus.4 Deﬁning the a posteriori transition probabilities from state Si at time k as σk (Si . J. · · · . Cocke. c. the encoder admits a trellis representation with N states. u) = P (uk = u. n} and requires the knowledge of the whole received sequence y = (y1 . whereas upper-case Si represents one particular state belonging to the set S. u) = hσ αk−1 (Si )γk (x(Si . · · · . are assumed to be known by the decoder. so that the code sequences c (and the corresponding transmitted signal sequences x) can be represented as paths in the trellis and uniquely associated with a state sequence s = (s0 .3 We consider ﬁrst the original version of the algorithm. • SN k+1 • BCJR algorithm from the authors’ initials. and the integer time variable k will assume the values 1. and y will refer to sequences n-symbols long. Bahl. the notations u.Si ) c (u . n. and J. which applies to the case of a ﬁnite index set K = {1. sk−1 = Si |y) (3) the APP P (u|y) we want to compute can be obtained as Pk (u|y) = Si σk (Si . sk denotes the states of a sequence at time k. In the following. 67 4 Lower-case . The meaning of notations. In [3]. the problem of evaluating the APP is equivalent to that of obtaining the a posteriori transition probabilities deﬁned in Eq. (3). · · · . F.u ) SN k–1 • SN k Fig. As for the previous assumption.Si ) • Si (u) + •Si x (Si . 3. Jelinek. R. x. we prefer to credit all the authors: L.

• hσ is a normalization constant such that Σ_{Si,u} σk(Si, u) = 1.

• γk(x) are the joint probabilities

γk(x) = P(yk, xk = x) = P(yk | xk = x) · P(xk = x)   (6)

The γ's can be calculated from the knowledge of the a priori probabilities of the channel input symbols x and of the transition probabilities of the channel P(yk | xk = x). This information is supplied by the symbol-by-symbol soft-output demodulator. The γ's are then associated to the trellis transitions to form a sort of branch metrics; for each time k, there are M different values of γ to be computed.

• αk(Si) are the probabilities of the states of the trellis at time k conditioned on the past received signals, i.e.,

αk(Si) = P(sk = Si | y1^k)   (7)

where y1^k denotes the sequence y1, y2, ···, yk. They can be obtained by the forward recursion5

αk(Si) = hα Σ_u αk−1(Si−(u)) γk(x(u, Si))   (8)

with hα a constant determined through the constraint Σ_{Si} αk(Si) = 1 and where the recursion is initialized as

α0(Si) = 1 if Si = s0, 0 otherwise   (9)

• βk(Si) are the probabilities of the trellis states at time k conditioned on the future received signals, P(sk = Si | y_{k+1}^n). They can be obtained by the backward recursion

βk(Si) = hβ Σ_u βk+1(Si+(u)) γk+1(x(Si, u))   (10)

5 For feed-forward encoders and nonconventional recursive convolutional encoders like G(D) = [1, (1 + D + D²)/(1 + D)], in Eq. (8) the summation should be over all possible precursors Si−(u) that lead to the state Si, x(u, Si) should be replaced by x(Si−(u), u), and c(u, Si) should be replaced by c(Si−(u), u). In Eqs. (22) and (29), the maximum should be over all Si−(u) that lead to Si. Such modifications are then also required for Eqs. (18), (26), and (32).

with hβ a constant obtainable through the constraint Σ_{Si} βk(Si) = 1 and where the recursion is initialized as

βn(Si) = 1 if Si = sn, 0 otherwise   (11)

We can now formulate the BCJR algorithm by the following steps:

(1) Initialize α0 and βn according to Eqs. (9) and (11).

(2) As soon as each term yk of the sequence y is received, the demodulator supplies to the decoder the "branch metrics" γk of Eq. (6), and the decoder computes the probabilities αk according to Eq. (8). The obtained values of αk(Si), as well as the γk, are stored for all k, Si, and x.

(3) When the entire sequence y has been received, the decoder recursively computes the probabilities βk according to the recursion of Eq. (10) and uses them, together with the stored α's and γ's, to compute the a posteriori transition probabilities σk(Si, u) according to Eq. (5) and, finally, the APP Pk(u|y) from Eq. (4).

A few comments on the computational complexity of the finite-sequence BCJR algorithm can be found in [3].

IV. The Sliding Window BCJR (SW-BCJR)

As the previous description made clear, the BCJR algorithm requires that the whole sequence has been received before starting the decoding process. In this aspect, it is similar to the Viterbi algorithm in its optimum version. To apply it in a PCCC, we need to subdivide the information sequence into blocks,6 decode them by terminating the trellises of both CCs,7 and then decode the received sequence block by block. Beyond the rigidity, this solution also reduces the overall code rate.

A more flexible decoding strategy is offered by a modification of the BCJR algorithm in which the decoder operates on a fixed memory span and decisions are forced with a given delay D. We call this new, and suboptimal, algorithm the sliding window BCJR (SW-BCJR) algorithm. We will describe two versions of the sliding window BCJR algorithm that differ in the way they overcome the problem of initializing the backward recursion without having to wait for the entire sequence. Of the previous assumptions, we retain only that of the knowledge of the initial state s0, and thus assume the transmission of semi-infinite code sequences, where the time span K ranges from 1 to ∞.

6 The presence of the interleaver naturally points toward a block length equal to the interleaver length.

7 The termination of trellises in a PCCC has been considered a hard problem by several authors. As shown in [13], it is, indeed, quite an easy task.
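As a concrete illustration of steps (1) through (3) of the BCJR algorithm formulated above, the following sketch runs the forward recursion of Eq. (8), the backward recursion of Eq. (10), and the combination of Eqs. (4) and (5) on a small toy trellis. The function names (`bcjr_app`, `awgn_gamma`), the table-driven trellis description, and the Gaussian channel density are our own illustrative assumptions, not the authors' implementation:

```python
import math

def awgn_gamma(y, x, prior, sigma2):
    # Branch metric of Eq. (6): gamma_k(x) = P(y_k | x_k = x) * P(x_k = x),
    # here with a Gaussian channel transition density (an assumption for this sketch).
    return math.exp(-(y - x) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2) * prior

def bcjr_app(next_state, gammas, n_states, s0=0, sn=0):
    """Steps (1)-(3) of the BCJR algorithm on a toy trellis.

    next_state[s][u] is the successor of state s under input bit u;
    gammas[k][s][u] is the branch metric gamma for that transition at step k.
    Returns the APPs P_k(u|y) of Eq. (4)."""
    n = len(gammas)
    # Forward recursion, Eqs. (8)-(9), normalized so that sum_S alpha_k(S) = 1.
    alpha = [[0.0] * n_states for _ in range(n + 1)]
    alpha[0][s0] = 1.0
    for k in range(1, n + 1):
        for s in range(n_states):
            for u in (0, 1):
                alpha[k][next_state[s][u]] += alpha[k - 1][s] * gammas[k - 1][s][u]
        tot = sum(alpha[k])
        alpha[k] = [a / tot for a in alpha[k]]
    # Backward recursion, Eqs. (10)-(11), similarly normalized.
    beta = [[0.0] * n_states for _ in range(n + 1)]
    beta[n][sn] = 1.0
    for k in range(n - 1, -1, -1):
        for s in range(n_states):
            beta[k][s] = sum(beta[k + 1][next_state[s][u]] * gammas[k][s][u] for u in (0, 1))
        tot = sum(beta[k])
        beta[k] = [b / tot for b in beta[k]]
    # A posteriori transition probabilities, Eq. (5), summed over states as in Eq. (4).
    apps = []
    for k in range(n):
        p = [0.0, 0.0]
        for s in range(n_states):
            for u in (0, 1):
                p[u] += alpha[k][s] * gammas[k][s][u] * beta[k + 1][next_state[s][u]]
        tot = p[0] + p[1]
        apps.append([p[0] / tot, p[1] / tot])
    return apps
```

Because every quantity is renormalized at each step, the constants hα, hβ, and hσ never need to be formed explicitly; this mirrors how the constraints Σα = 1 and Σβ = 1 are stated above.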
We will describe the two algorithms using the previous step description suitably modiﬁed. ﬁnally. (2) As soon as each term yk of the sequence y is received. In this aspect. this solution also reduces the overall code rate.with hβ a constant obtainable through the constraint βk (Si ) = 1 Si and where the recursion is initialized as βn (Si ) = 1 if Si = sn 0 otherwise (11) We can now formulate the BCJR algorithm by the following steps: (1) Initialize α0 and βn according to Eqs. (6). A more ﬂexible decoding strategy is oﬀered by a modiﬁcation of the BCJR algorithm in which the decoder operates on a ﬁxed memory span.

A. The First Version of the Sliding Window BCJR Algorithm (SW1-BCJR)

Here are the steps:

(1) Initialize α0 according to Eq. (9).

(2) Forward recursion at time k: Upon receiving yk, the demodulator supplies to the decoder the M distinct branch metrics, and the decoder computes the probabilities αk(Si) according to Eqs. (6) and (8). The obtained values of αk(Si) are stored for all Si, as well as the γk(x).

(3) Initialization of the backward recursion (k > D):

βk(Sj) = αk(Sj), ∀Sj   (12)

(4) Backward recursion: It is performed according to Eq. (10) from time k − 1 back to time k − D.

(5) The a posteriori transition probabilities at time k − D are computed according to

σk−D(Si, u) = hσ · αk−D−1(Si) γk−D(x(Si, u)) βk−D(Si+(u))   (13)

(6) The APP at time k − D is computed as

Pk−D(u|y) = Σ_{Si} σk−D(Si, u)   (14)

For a convolutional code with parameters (k0, n0), number of states N, and cardinality of the code alphabet M = 2^n0, the SW1-BCJR algorithm requires storage of N × D values of α's and M × D values of the probabilities γk(x) generated by the soft demodulator. Moreover, to update the α's and β's for each time instant, the algorithm needs to perform M × 2^k0 multiplications and N additions of 2^k0 numbers. To output the set of APP at each time instant, we need a D-times long backward recursion. Thus, the computational complexity requires overall

• (D + 1) M × 2^k0 multiplications

• (D + 1) M additions of 2^k0 numbers each

As a comparison, the Viterbi algorithm would require, in the same situation,8 M × 2^k0 additions and M × 2^k0-way comparisons, plus the trace-back operations, to get the decoded bits.

8 Though, indeed, the comparison is not fair, as the Viterbi algorithm does not provide the information we need.
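The backward-initialization step is the only place where the two sliding window versions differ, which the following sketch makes explicit: it runs the window recursion of Eq. (10) from time k − 1 back to k − D, starting either from a stored copy of αk (Eq. (12), SW1-BCJR) or, as in the simplified version described next, from the uniform vector 1/N (Eq. (15), SW2-BCJR). The function name and the table-driven trellis encoding are hypothetical:

```python
def window_beta(next_state, gammas, n_states, k, D, beta_init):
    # One sliding-window backward pass: starting from beta_init at time k
    # (SW1-BCJR: a copy of alpha_k, Eq. (12); SW2-BCJR: the uniform vector 1/N),
    # apply the recursion of Eq. (10) from time k-1 back to time k-D
    # and return beta_{k-D}, the vector used in Eqs. (13) and (16).
    beta = list(beta_init)
    for j in range(k - 1, k - D - 1, -1):
        new = [sum(beta[next_state[s][u]] * gammas[j][s][u] for u in (0, 1))
               for s in range(n_states)]
        tot = sum(new)
        beta = [b / tot for b in new]  # renormalize in place of the constant h_beta
    return beta
```

Only the current β vector is kept across steps, which is precisely why the simplified version can drop the N × D store of α values down to N.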

B. The Second, Simplified Version of the Sliding Window BCJR Algorithm (SW2-BCJR)

A simplification of the sliding window BCJR that significantly reduces the memory requirements consists of the following steps:

(1) Initialize α0 according to Eq. (9).

(2) Forward recursion (k > D): If k > D, the probabilities αk−D−1(Si) are computed according to Eq. (8).

(3) Initialization of the backward recursion (k > D):

βk(Sj) = 1/N, ∀Sj   (15)

(4) Backward recursion (k > D): It is performed according to Eq. (10) from time k − 1 back to time k − D.

(5) The a posteriori transition probabilities at time k − D are computed according to

σk−D(Si, u) = hσ · αk−D−1(Si) γk−D(x(Si, u)) βk−D(Si+(u))   (16)

(6) The APP at time k − D is computed as

Pk−D(u|y) = Σ_{Si} σk−D(Si, u)   (17)

This version of the sliding window BCJR algorithm does not require storage of the N × D values of α's, as they are updated with a delay of D steps. As a consequence, only N values of α's and M × D values of the probabilities γk(x) generated by the soft demodulator must be stored. The computational complexity is the same as that of the previous version of the algorithm. However, since the initialization of the β recursion is less accurate, a larger value of D should be set in order to obtain the same accuracy on the output values Pk−D(u|y). This observation will receive quantitative evidence in the section devoted to simulation results.

V. Additive Algorithms

A. The Log-BCJR

The BCJR algorithm and its sliding window versions have been stated in multiplicative form. Owing to the monotonicity of the logarithm function, they can be converted into an additive form by passing to the logarithms. Let us define the following logarithmic quantities:

Γk(x) = log[γk(x)]

Ak(Si) = log[αk(Si)]

Bk(Si) = log[βk(Si)]

Σk(Si, u) = log[σk(Si, u)]

These definitions lead to the following A and B recursions, derived from Eqs. (8), (10), and (5):

Ak(Si) = log Σ_u exp{Ak−1(Si−(u)) + Γk(x(u, Si))} + HA   (18)

Bk(Si) = log Σ_u exp{Γk+1(x(Si, u)) + Bk+1(Si+(u))} + HB   (19)

Σk(Si, u) = Ak−1(Si) + Γk(x(Si, u)) + Bk(Si+(u)) + HΣ   (20)

with the following initializations:

A0(Si) = 0 if Si = s0, −∞ otherwise

Bn(Si) = 0 if Si = sn, −∞ otherwise

B. Simplified Versions of the Log-BCJR

The problem in the recursions defined for the log-BCJR consists of the evaluation of the logarithm of a sum of exponentials:

log Σ_i exp{Ai}

An accurate estimate of this expression can be obtained by extracting the term with the highest exponential,

AM = max_i Ai

so that

log Σ_i exp{Ai} = AM + log[1 + Σ_{Ai≠AM} exp{Ai − AM}]   (21)

and by computing the second term of the right-hand side (RHS) of Eq. (21) using lookup tables. Further simplifications and the required circuits for implementation are discussed in the Appendix.
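Specialized to two terms, the identity in Eq. (21) becomes the familiar pairwise correction log(e^a + e^b) = max(a, b) + log(1 + e^−|a−b|); the code below evaluates it exactly (the correction term is what the lookup table would store) and also shows the variant obtained by dropping the correction altogether, which is the approximation applied next. The function names are ours:

```python
import math

def logsumexp2(a, b):
    # Eq. (21) for two terms: log(e^a + e^b) = max(a, b) + log(1 + e^{-|a-b|}).
    # The second term, log(1 + e^{-x}) with x = |a - b|, is the lookup-table entry.
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def maxonly(a, b):
    # Dropping the correction term entirely: the max-only approximation.
    return max(a, b)
```

The correction term is largest (log 2) when the two exponents are equal and vanishes as they separate, which is exactly the regime discussed in the next paragraph.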

However, when AM ≫ Ai for all Ai ≠ AM, the second term can be neglected. This approximation leads to the additive logarithmic-BCJR (AL-BCJR) algorithm:

Ak(Si) = max_u [Ak−1(Si−(u)) + Γk(x(u, Si))] + HA   (22)

Bk(Si) = max_u [Bk+1(Si+(u)) + Γk+1(x(Si, u))] + HB   (23)

Σk(Si, u) = Ak−1(Si) + Γk(x(Si, u)) + Bk(Si+(u)) + HΣ   (24)

with the same initializations as the log-BCJR. Both versions of the SW-BCJR algorithm described in the previous section can be applied, with obvious modifications, to transform the block log-BCJR and the AL-BCJR into their sliding window versions, leading to the SW-log-BCJR and to the SWAL1-BCJR and SWAL2-BCJR algorithms.

VI. Explicit Algorithms for Some Particular Cases

In this section, we will make explicit the quantities considered in the previous algorithms' descriptions by making assumptions on the code type, modulation format, and channel.

A. Rate 1/n Binary Systematic Convolutional Encoder

Here, we particularize the previous equations to the case of a rate 1/n binary systematic encoder associated to n binary-pulse amplitude modulation (PAM) signals or binary phase shift keying (PSK) signals, transmitted over a memoryless channel. The channel symbols x and the output symbols from the encoder can be represented as vectors of n binary components:

c̃ = [c1, ···, cn], ci ∈ {0, 1}

x̃ = [x1, ···, xn], xi ∈ {A, −A}

x̃k = [xk1, ···, xkn]

ỹk = [yk1, ···, ykn]

where the notations have been modified to show the vector nature of the symbols. Over a memoryless channel, the joint probabilities γk(x̃) can be split as

γk(x̃) = Π_{m=1}^{n} P(ykm | xkm = xm) P(xkm = xm)   (25)

Since in this case the encoded symbols are n-tuples of binary symbols, it is useful to redefine the input probabilities in terms of the likelihood ratios:

λkm = P(ykm | xkm = A) / P(ykm | xkm = −A)

λA_km = P(xkm = A) / P(xkm = −A)

so that, from Eq. (25),

γk(x̃) = Π_{m=1}^{n} [(λkm)^cm / (1 + λkm)] · [(λA_km)^cm / (1 + λA_km)] = hγ Π_{m=1}^{n} (λkm · λA_km)^cm

where hγ takes into account all terms independent of x̃. The BCJR can be restated as follows:

αk(Si) = hγ hα Σ_u αk−1(Si−(u)) Π_{m=1}^{n} (λkm · λA_km)^{cm(u, Si)}   (26)

βk(Si) = hγ hβ Σ_u βk+1(Si+(u)) Π_{m=1}^{n} (λ(k+1)m · λA_(k+1)m)^{cm(Si, u)}   (27)

σk(Si, u) = hγ hσ αk−1(Si) Π_{m=1}^{n} (λkm · λA_km)^{cm(Si, u)} βk(Si+(u))   (28)

whereas its simplification, the AL-BCJR algorithm, becomes

Ak(Si) = max_u [Ak−1(Si−(u)) + Σ_{m=1}^{n} cm(u, Si) (Λkm + ΛA_km)] + HA   (29)

Bk(Si) = max_u [Bk+1(Si+(u)) + Σ_{m=1}^{n} cm(Si, u) (Λ(k+1)m + ΛA_(k+1)m)] + HB   (30)

Σk(Si, u) = Ak−1(Si) + Σ_{m=1}^{n} cm(Si, u) (Λkm + ΛA_km) + Bk(Si+(u))   (31)

where Λ stands for the logarithm of the corresponding quantity λ.

B. The Additive White Gaussian Noise Channel

When the channel is the additive white Gaussian noise (AWGN) channel, we obtain the explicit expression of the log-likelihood ratios Λki as

Λki = log [P(yki | xki = A) / P(yki | xki = −A)] = log [ (1/√(2πσ²)) exp{−(yki − A)²/(2σ²)} / (1/√(2πσ²)) exp{−(yki + A)²/(2σ²)} ] = (2A/σ²) yki

Hence, the AL-BCJR algorithm assumes the following form:

Ak(Si) = max_u [Ak−1(Si−(u)) + Σ_{m=1}^{n} cm(u, Si) ((2A/σ²) ykm + ΛA_km)] + HA   (32)

Bk(Si) = max_u [Bk+1(Si+(u)) + Σ_{m=1}^{n} cm(Si, u) ((2A/σ²) ykm + ΛA_km)] + HB   (33)

Σk(Si, u) = Ak−1(Si) + Σ_{m=1}^{n} cm(Si, u) ((2A/σ²) ykm + ΛA_km) + Bk(Si+(u))   (34)

In the examples presented in Section VIII, we will consider turbo codes with rate 1/2 component convolutional codes transmitted as binary PAM or binary PSK over an AWGN channel.

VII. Iterative Decoding of Parallel Concatenated Convolutional Codes

In this section, we will show how the MAP algorithms previously described can be embedded into the iterative decoding procedure of parallel concatenated codes (turbo codes). The description will be based on the fairly general parallel concatenated code shown in Fig. 4, which employs three encoders and three interleavers (denoted by π in the figure). We will derive the iterative decoding algorithm through suitable approximations performed on maximum-likelihood decoding.

Let uk be the binary random variable taking values in {0, 1}, representing the sequence of information bits u = (u1, ···, un). The optimum decision algorithm on the kth bit uk is based on the conditional log-likelihood ratio Lk:

Lk = log [P(uk = 1|y) / P(uk = 0|y)]

   = log [Σ_{u: uk=1} P(y|u) Π_{j≠k} P(uj) / Σ_{u: uk=0} P(y|u) Π_{j≠k} P(uj)] + log [P(uk = 1) / P(uk = 0)]

   = log [Σ_{u: uk=1} P(y|x(u)) Π_{j≠k} P(uj) / Σ_{u: uk=0} P(y|x(u)) Π_{j≠k} P(uj)] + log [P(uk = 1) / P(uk = 0)]   (35)

where, in Eq. (35), P(uj) are the a priori probabilities.
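The closed form Λki = 2A yki/σ² can be checked numerically against the ratio of the two Gaussian densities from which it is derived; the following is a small sanity-check sketch (the function names are ours):

```python
import math

def channel_llr(y, A, sigma2):
    # Lambda = log[P(y|x=+A) / P(y|x=-A)] = 2*A*y / sigma^2 for binary PAM/PSK on AWGN.
    return 2.0 * A * y / sigma2

def channel_llr_explicit(y, A, sigma2):
    # The same quantity evaluated directly from the two Gaussian densities;
    # the 1/sqrt(2*pi*sigma^2) factors cancel in the ratio.
    def pdf(mu):
        return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
    return math.log(pdf(A) / pdf(-A))
```

The agreement of the two functions is just the algebraic cancellation shown above: the quadratic terms y² and A² are common to both exponents and drop out, leaving only the cross term.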

Fig. 4. Parallel concatenation of three convolutional codes: the information sequence u enters encoder 1 (through π1) and encoders 2 and 3 through the interleavers π2 and π3, producing the outputs x1, x2, and x3.

If the rate k0/n0 constituent code is not equivalent to a punctured rate 1/n0 code, or if turbo trellis-coded modulation is used, we can first use the symbol MAP algorithm as described in the previous sections to compute the log-likelihood ratio of a symbol u = u1, u2, ···, uk0, given the observation y, as

λ(u) = log [P(u|y) / P(0|y)]

where 0 corresponds to the all-zero symbol. Then we obtain the log-likelihood ratios of the jth bit within the symbol by

L(uj) = log [Σ_{u: uj=1} e^{λ(u)} / Σ_{u: uj=0} e^{λ(u)}]

In this way, the turbo decoder operates on bits, and bit, rather than symbol, interleaving is used.

To explain the basic decoding concept, we restrict ourselves to three codes, but the extension to several codes is straightforward. In order to simplify the notation, consider the combination of a permuter (interleaver) and the constituent encoder connected to it as a block code with input u and outputs xi, i = 0, 1, 2, 3 (x0 = u), and denote the corresponding received sequences by yi, i = 0, 1, 2, 3. The optimum bit decision metric on each bit is (for data with uniform a priori probabilities)

Lk = log [Σ_{u: uk=1} P(y0|u) P(y1|u) P(y2|u) P(y3|u) / Σ_{u: uk=0} P(y0|u) P(y1|u) P(y2|u) P(y3|u)]   (36)

but, in practice, we cannot compute Eq. (36) for large n because the permutations π2 and π3 imply that y2 and y3 are no longer simple convolutional encodings of u. Suppose that we evaluate P(yi|u), i = 0, 2, 3, in Eq. (36) using Bayes' rule and the following approximation:

P(u|yi) ≈ Π_{k=1}^{n} P̃i(uk)   (37)

Note that P(u|yi) is not separable in general. However, for i = 0, P(u|y0) is separable; hence, Eq. (37) holds with equality. So we need an algorithm that approximates a nonseparable distribution P(u|yi) = P with a separable distribution Π_{k=1}^{n} P̃i(uk) = Q. The best approximation can be obtained using the Kullback cross-entropy minimizer, which minimizes the cross-entropy H(Q, P) = E{log(Q/P)} between the input P and the output Q. Here we use the MAP algorithm for such an approximation. The MAP algorithm approximates a nonseparable distribution with a separable one; however, it is not clear how good it is compared with the Kullback cross-entropy minimizer. In the iterative decoding, as the reliability of the {uk} improves, intuitively one expects that the cross-entropy between the input and the output of the MAP algorithm will decrease, so that the approximation will improve.

Define L̃ik by

P̃i(uk) = e^{uk L̃ik} / (1 + e^{L̃ik})   (38)

where uk ∈ {0, 1}. If such an approximation, i.e., Eq. (37), can be obtained, we can use it in Eq. (36) for i = 2 and i = 3 (by Bayes' rule) to express Eq. (36) as

Lk = f(y1, L̃0, L̃2, L̃3, k) + L̃0k + L̃2k + L̃3k   (39)

where L̃0k = 2Ay0k/σ² (for binary modulation) and

f(y1, L̃0, L̃2, L̃3, k) = log [Σ_{u: uk=1} P(y1|u) Π_{j≠k} e^{uj(L̃0j + L̃2j + L̃3j)} / Σ_{u: uk=0} P(y1|u) Π_{j≠k} e^{uj(L̃0j + L̃2j + L̃3j)}]   (40)

We can use Eqs. (37) and (38) again, but this time for i = 0, 1, 3, to express Eq. (36) as

Lk = f(y2, L̃0, L̃1, L̃3, k) + L̃0k + L̃1k + L̃3k   (41)

and, similarly,

Lk = f(y3, L̃0, L̃1, L̃2, k) + L̃0k + L̃1k + L̃2k   (42)

A solution to Eqs. (39), (41), and (42) is

L̃1k = f(y1, L̃0, L̃2, L̃3, k)

L̃2k = f(y2, L̃0, L̃1, L̃3, k)

L̃3k = f(y3, L̃0, L̃1, L̃2, k)   (43)
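For very short blocks, Eq. (40) can be evaluated by brute-force enumeration over all 2^n information sequences, which is useful for checking a MAP implementation (a real decoder computes f(·) with the BCJR/SW-BCJR recursions instead). Here `logp_y_given_u` and `L_sum` (playing the role of L̃0j + L̃2j + L̃3j) are hypothetical stand-ins:

```python
import math
from itertools import product

def f_bruteforce(logp_y_given_u, L_sum, k):
    # Eq. (40) by exhaustive enumeration (feasible only for tiny n):
    #   f = log [ sum_{u: uk=1} P(y1|u) prod_{j != k} e^{uj * Lj}
    #           / sum_{u: uk=0} P(y1|u) prod_{j != k} e^{uj * Lj} ]
    # where L_sum[j] plays the role of L0j + L2j + L3j.
    n = len(L_sum)
    num = den = 0.0
    for u in product((0, 1), repeat=n):
        w = math.exp(logp_y_given_u(u) + sum(u[j] * L_sum[j] for j in range(n) if j != k))
        if u[k]:
            num += w
        else:
            den += w
    return math.log(num / den)
```

A useful property for testing: if the likelihood happens to be separable, log P(y1|u) = Σ_j uj gj, the sums over the other bits cancel in the ratio and f returns exactly gk, independently of `L_sum`, which matches the interpretation of f as the extrinsic information about bit k.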

for k = 1, 2, ···, n, provided that a solution to Eq. (43) does indeed exist. The L̃ik, i = 1, 2, 3, represent the extrinsic information. The final decision is then based on

Lk = L̃0k + L̃1k + L̃2k + L̃3k   (44)

which is passed through a hard limiter with zero threshold. We attempted to solve the nonlinear equations in Eq. (43) for L̃1, L̃2, and L̃3 by using the iterative procedure

L̃1k^(m+1) = α1^(m) f(y1, L̃0, L̃2^(m), L̃3^(m), k)   (45)

for k = 1, 2, ···, n, iterating on m. Similar recursions hold for L̃2k and L̃3k. We start the recursion with the initial condition L̃1^(0) = L̃2^(0) = L̃3^(0) = L̃0. For the computation of f(·), we can use any MAP algorithm as described in the previous sections, with permuters (direct and inverse) where needed; call this the basic decoder Di, i = 1, 2, 3.

The signal flow graph for extrinsic information is shown in Fig. 5 [13], which is a fully connected graph without self-loops. Based on [13, 5], each node's output is equal to the internally generated reliability L minus the sum of all inputs to that node. Parallel, serial, or hybrid implementations can be realized based on this signal flow graph (in Fig. 5, y0 is considered as part of y1). The overall decoder is composed of block decoders Di connected in parallel, as in Fig. 6 (when the switches are in position P), which can be implemented as a pipeline or by feedback.

Fig. 5. Signal flow graph for extrinsic information: three nodes D1, D2, and D3 exchanging L̃1, L̃2, and L̃3.

The BCJR MAP algorithm always starts and ends at the all-zero state since we always terminate the trellis as described in [13]. We assumed π1 = I (identity); clearly, any π1 can be used. If the systematic bits are distributed among encoders, we use the same distribution of y0 among the received observations for the MAP decoders. For those applications where the systematic bits are not transmitted, or for parallel concatenated trellis codes with high-level modulation, we should set L̃0 = 0. Even in the presence of systematic bits, if desired, one can set L̃0 = 0 and consider y0 as part of y1. Based on our equations, a serial implementation was proposed in [21]; all versions of SW-BCJR can replace the BCJR (MAP) decoders in Fig. 6.

At this point, a further approximation for iterative decoding is possible if one term corresponding to a sequence u dominates the other terms in the summations in the numerator and denominator of Eq. (40). Then the summations in Eq. (40) can be replaced by "maximum" operations with the same indices, i.e., replacing Σ_{u: uk=i} with max_{u: uk=i} for i = 0, 1. As discussed, such approximations have been used by replacing the log-sum with max in the log-BCJR algorithm to obtain the AL-BCJR. This suboptimal decoder then corresponds to an iterative decoder that uses AL-BCJR rather than BCJR decoders.
A serial implementation is also shown in Fig. serial. (45) reduces to 78 .e. For turbo codes with only two constituent codes. 5. as in Fig. A similar approximation can be used for L2k and L3k in Eq. (43) for L1 . We assumed π1 = I identity. The Lik . all versions of SW-BCJR can replace BCJR (MAP) decoders in Fig. which is a fully connected graph without self-loops. Even ˜ 0 = 0 and consider y0 as part of y1 . k) L1k 1 2 3 ˜ (m) ˜ (m) for k = 1. further approximation for iterative decoding is possible if one term corresponding to a sequence u dominates other terms in the summation in the numerator and denominator of Eq. Based on [13. L2 . we should set L0 = 0. This suboptimal decoder then corresponds to an iterative decoder that uses AL-BCJR rather than BCJR decoders. 5 (in this ﬁgure y0 is considered as part of y1 ). provided that a solution to Eq. The BCJR MAP algorithm always starts and ends at the all-zero state since we always terminate the trellis as described in [13].for k = 1. (45) D1 ~ L1 ~ L2 ~ L3 ~ L2 ~ L3 ~ L1 D2 D3 Fig. We attempted to solve the nonlinear equations ˜ ˜ ˜ in Eq.

Fig. 6. Iterative decoder structure for three parallel concatenated codes: each decoder Di (log-BCJR or SWL-BCJR) receives yi and the extrinsic values of the other decoders through the permuters πi and their inverses; the switches select the parallel (P) or serial (S) mode, and the decoded bits are obtained from the summed reliabilities.

For turbo codes with only two constituent codes, Eq. (45) reduces to

L̃1k^(m+1) = α1^(m) f(y1, L̃0, L̃2^(m), k)

L̃2k^(m+1) = α2^(m) f(y2, L̃0, L̃1^(m), k)

for k = 1, 2, ···, n and m = 1, 2, ···, where, for each iteration, α1^(m) and α2^(m) can be optimized (simulated annealing) or set to 1 for simplicity. The decoding configuration for two codes is shown in Fig. 7. In this special case, since the paths in Fig. 7 are disjointed, the decoder structure can be reduced to a serial mode structure if desired. If we optimize α1^(m) and α2^(m), our method for two codes is similar to the decoding method proposed in [6], which requires estimates of the variances of L̃1k and L̃2k for each iteration in the presence of errors. It is interesting to note that the concept of extrinsic information introduced in [6] was also presented as a "partial factor" in [22]. However, the effectiveness of turbo codes lies in the use of recursive convolutional codes and random permutations; this results in time-shift-varying codes resembling random codes. In the results presented in the next section, we will use a parallel concatenated code with only two constituent codes.
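The two-code recursion and the final hard decision can be organized as in the sketch below, where `f1` and `f2` are stand-ins for the MAP (or SW-BCJR) computation of f(·); all names are illustrative rather than a prescribed implementation:

```python
def iterate_two_codes(L0, f1, f2, iters=5, alpha1=1.0, alpha2=1.0):
    # Two-code reduction of the recursion in Eq. (45):
    #   L1_k^(m+1) = alpha1 * f(y1, L0, L2^(m), k)
    #   L2_k^(m+1) = alpha2 * f(y2, L0, L1^(m+1), k)
    # f1/f2 are callables (L0, L_other, k) -> extrinsic value, standing in for
    # the MAP decoders D1 and D2; initial condition: extrinsic values start at L0.
    n = len(L0)
    L1, L2 = list(L0), list(L0)
    for _ in range(iters):
        L1 = [alpha1 * f1(L0, L2, k) for k in range(n)]
        L2 = [alpha2 * f2(L0, L1, k) for k in range(n)]
    # Final decision: hard limit on L0 + L1 + L2 (Eq. (44) restricted to two codes).
    return [1 if L0[k] + L1[k] + L2[k] > 0 else 0 for k in range(n)]
```

In a serial-mode realization, D2 consumes the freshly updated L̃1 of the same iteration, as written here; a parallel-mode realization would instead use the values from the previous iteration.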

Fig. 7. Iterative decoder structure for two parallel concatenated codes.

VIII. Simulation Results

In this section, we present some simulation results obtained by applying the iterative decoding algorithm described in Section VII, which, in turn, uses the optimum BCJR and the suboptimal, but simpler, SWAL2-BCJR as embedded MAP algorithms. All simulations refer to a rate 1/3 PCCC with two equal, recursive convolutional constituent codes with 16 states and generator matrix

G(D) = [1, (1 + D + D³ + D⁴)/(1 + D³ + D⁴)]

and an interleaver of length 16,384 designed according to the procedure described in [13], using an S-random permutation with S = 40. Each simulation run examined at least 25,000,000 bits.

In Fig. 8, we plot the bit-error probabilities as a function of the number of iterations of the decoding procedure using the optimum block BCJR algorithm for various values of the signal-to-noise ratio. It can be seen that the decoding algorithm converges down to BER = 10⁻⁵ at signal-to-noise ratios of 0.2 dB with nine iterations. The same curves are plotted in Fig. 9 for the case of the suboptimum SWAL2-BCJR algorithm. In this case, 0.75 dB of signal-to-noise ratio is required for convergence to the same BER with the same number of iterations.

In Fig. 10, the bit-error probability versus the signal-to-noise ratio is plotted for a fixed number (5) of iterations of the decoding algorithm and for both the optimum BCJR and the SWAL2-BCJR MAP decoding algorithms. It can be seen that the penalty incurred by the suboptimum algorithm amounts to about 0.5 dB. This figure is in agreement with a similar result obtained in [12], where all MAP

algorithms were of the block type. The penalty is completely attributable to the approximation of the sum of exponentials described in Section V. To verify this, we have used a SW2-BCJR and compared its results with those of the optimum block BCJR, obtaining the same results.

Finally, in Figs. 11 and 12, we plot the number of iterations needed to obtain a given bit-error probability versus the bit signal-to-noise ratio for the two algorithms. These curves provide information on the delay incurred to obtain a given reliability as a function of the bit signal-to-noise ratio.

Fig. 8. Convergence of turbo coding: bit-error probability versus number of iterations for various Eb/N0 using the SW2-BCJR algorithm.

Fig. 9. Convergence of turbo coding: bit-error probability versus number of iterations for various Eb/N0 using the SWAL2-BCJR algorithm.

Fig. 10. Bit-error probability as a function of the bit signal-to-noise ratio using the SW2-BCJR and SWAL2-BCJR algorithms with five iterations.

Fig. 11. Number of iterations to achieve several bit-error probabilities as a function of the bit signal-to-noise ratio using the SWAL2-BCJR algorithm.

IX. Conclusions

We have described two versions of a simplified maximum a posteriori decoding algorithm working in a sliding window form, like the Viterbi algorithm. The algorithms can be used as building blocks to decode continuously transmitted sequences obtained by parallel concatenated codes, without requiring code trellis termination. A heuristic explanation of how to embed the maximum a posteriori algorithms into the iterative decoding of parallel concatenated codes was also presented. Finally, the performances of the two algorithms were compared on the basis of a powerful rate 1/3 parallel concatenated code.

Fig. 12. Number of iterations to achieve several bit-error probabilities as a function of the bit signal-to-noise ratio using the SW2-BCJR algorithm.

Acknowledgment

The research described in this article was partially carried out at the Politecnico di Torino, Italy, under NATO Research Grant CRG 951208.

References

[1] S. Benedetto, E. Biglieri, and V. Castellani, Digital Transmission Theory, New York: Prentice-Hall, 1987.

[2] K. Abend and B. D. Fritchman, "Statistical Detection for Communication Channels With Intersymbol Interference," Proceedings of the IEEE, vol. 58, no. 5, pp. 779–785, May 1970.

[3] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate," IEEE Transactions on Information Theory, vol. 20, pp. 284–287, March 1974.

[4] G. D. Forney, Jr., Concatenated Codes, Cambridge, Massachusetts: Massachusetts Institute of Technology, 1966.

[5] V. V. Ginzburg, "Multidimensional Signals for a Continuous Channel," Probl. Peredachi Inform., vol. 20, no. 1, pp. 28–46, January 1984.

[6] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes," Proceedings of ICC'93, Geneva, Switzerland, pp. 1064–1070, May 1993.

[7] N. Seshadri and C.-E. W. Sundberg, "Generalized Viterbi Algorithms for Error Detection With Convolutional Codes," Proceedings of GLOBECOM'89, Dallas, Texas, November 1989.

[8] J. Hagenauer and P. Hoeher, "A Viterbi Algorithm With Soft-Decision Outputs and Its Applications," Proceedings of GLOBECOM'89, Dallas, Texas, pp. 47.1.1–47.1.7, November 1989.

[9] Y. Li, B. Vucetic, and Y. Sato, "Optimum Soft-Output Detection for Channels With Intersymbol Interference," IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 704–713, May 1995.

[10] S. S. Pietrobon and A. S. Barbulescu, "A Simplification of the Modified Bahl Algorithm for Systematic Convolutional Codes," Proceedings of ISITA'94, Sydney, Australia, pp. 1073–1077, November 1994.

[11] U. Hansson and T. Aulin, "Theoretical Performance Evaluation of Different Soft-Output Algorithms," Proceedings of ISITA'94, Sydney, Australia, pp. 875–880, November 1994.

[12] P. Robertson, E. Villebrun, and P. Hoeher, "A Comparison of Optimal and Sub-Optimal MAP Decoding Algorithms Operating in the Log Domain," Proceedings of ICC'95, Seattle, Washington, pp. 1009–1013, June 1995.

[13] D. Divsalar and F. Pollara, "Multiple Turbo Codes," Proceedings of IEEE MILCOM'95, San Diego, California, November 5–8, 1995.

[14] CAS 5093 Turbo-Code Codec, data sheet, 3.7 ed., Chateaubourg, France: Comatlas, August 1995.

[15] S. A. Barbulescu, "Iterative Decoding of Turbo Codes and Other Concatenated Codes," Ph.D. Dissertation, University of South Australia, 1995.

[16] S. S. Pietrobon, "Implementation and Performance of a Serial MAP Decoder for Use in an Iterative Turbo Decoder," Proceedings of ISIT'95, Whistler, British Columbia, Canada, p. 471, September 1995. Also http://audrey.levels.unisa.edu.au/itr-users/steven/turbo/ISIT95ovh2.ps.gz

[17] D. Divsalar and F. Pollara, "Turbo Codes for PCS Applications," Proceedings of ICC'95, Seattle, Washington, pp. 54–59, June 1995.

[18] S. Benedetto and G. Montorsi, "Performance of Turbo Codes," Electronics Letters, vol. 31, no. 3, pp. 163–165, February 1995.

[19] D. Divsalar, S. Dolinar, F. Pollara, and R. J. McEliece, "Transfer Function Bounds on the Performance of Turbo Codes," The Telecommunications and Data Acquisition Progress Report 42-122, April–June 1995, Jet Propulsion Laboratory, Pasadena, California, pp. 44–55, August 15, 1995. http://tda.jpl.nasa.gov/tda/progress_report/42-122/122A.pdf

[20] D. Divsalar and F. Pollara, "On the Design of Turbo Codes," The Telecommunications and Data Acquisition Progress Report 42-123, July–September 1995, Jet Propulsion Laboratory, Pasadena, California, pp. 99–121, November 15, 1995. http://tda.jpl.nasa.gov/tda/progress_report/42-123/123D.pdf

[21] S. Benedetto and G. Montorsi, "Design of Parallel Concatenated Convolutional Codes," to be published in IEEE Transactions on Communications, 1996.

[22] J. Lodge, R. Young, P. Hoeher, and J. Hagenauer, "Separable MAP 'Filters' for the Decoding of Product and Concatenated Codes," Proceedings of ICC'93, Geneva, Switzerland, pp. 1740–1745, May 1993.

Appendix

Circuits to Implement the MAP Algorithm for Decoding Rate 1/n Component Codes of a Turbo Code

In this appendix, we show the basic circuits required to implement a serial additive MAP algorithm for both the block log-BCJR and the SW-log-BCJR. Extension to a parallel implementation is straightforward.

Figure A-1 shows the implementation9 of Eq. (18) for the forward recursion, using a lookup table for the evaluation of log(1 + e−x). The circuit performs the operations for i = 1, 2, ···, N. The subtraction of maxj{Ak(Sj)} from Ak(Si) is used for normalization,10 to prevent buffer overflow. Normalization can also be achieved by monitoring the two most significant bits: when both of them are one, we reset all the most significant bits to zero. This method increases the bit representation by an additional 2 bits.

Figure A-2 shows the implementation of Eq. (19) for the backward recursion, which is similar to Fig. A-1. A circuit for the computation of log(Pk(u|y)) from Eq. (4) using Eq. (20), for the final computation of the bit reliability, is shown in Fig. A-3. When the circuit accepts Σk(Si, u) for i = 1, switch 1 is in position 1 and switch 2 is open at the start of operation; then switch 1 moves to position 2 for the feedback operation. When the circuit accepts Σk(Si, u) for i = N, switch 1 goes to position 1 and switch 2 is closed. This operation is done for u = 1 and u = 0. The difference between log(Pk(1|y)) and log(Pk(0|y)) represents the reliability value required for turbo decoding, i.e., the value of Lk in Eq. (35). The circuit for maximization can be implemented simply by using a comparator and a selector with a feedback operation.

Fig. A-1. Basic structure for forward computation in the log-BCJR MAP algorithm.

9 For feed-forward and nonconventional recursive convolutional codes, the notations in Fig. A-1 should be changed according to Footnotes 2 and 5.

10 Simpler […]

At most, an 8- to 10-bit representation suffices for all operations (see also [12] and [16]).

Fig. A-2. Basic structure for backward computation in the log-BCJR MAP algorithm.

We propose two simplifications to be used for the computation of log(1 + e^−x) without a lookup table.

Approximation 1: We use log(1 + e^−x) ≈ −ax + b for 0 < x < b/a, and 0 otherwise, where b = log(2). We call this the "linear" approximation. The parameter a should be optimized, and it may not necessarily be the same for the computation of Eq. (4) using Eq. (18), for Eq. (19), and for log(Pk(u|y)) from Eq. (20); we selected a = 0.3 for the simulation. If we use this approximation, the log-BCJR algorithm can be built based on addition, comparison, and selection operations, without requiring a lookup table. We observed about a 0.1-dB degradation compared with the full MAP algorithm for the code described in Section VIII.

Approximation 2: We take log(1 + e^−x) ≈ c if x < η, and 0 if x > η. We call this the "threshold" approximation. We selected c = log(2) and the threshold η = 1.0 for our simulation. This threshold should be optimized for a given SNR, and it may not necessarily be the same for the computation of Eq. (4) using Eq. (18), Eq. (19), and log(Pk(u|y)) from Eq. (20). We observed about a 0.2-dB degradation compared with the full MAP algorithm for the code described in Section VIII.
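The two approximations can be compared numerically against the exact correction term. The sketch below, using the parameter values quoted above (a = 0.3, b = c = log 2, η = 1.0), is illustrative only; the grid and error measure are our own choices, not part of the paper.

```python
import math

LOG2 = math.log(2.0)

def correction_exact(x):
    """Exact lookup-table term log(1 + e^-x), for x >= 0."""
    return math.log1p(math.exp(-x))

def correction_linear(x, a=0.3, b=LOG2):
    """'Linear' approximation: -a*x + b on 0 < x < b/a, and 0 beyond."""
    return max(0.0, -a * x + b)

def correction_threshold(x, c=LOG2, eta=1.0):
    """'Threshold' approximation: c below the threshold, 0 above it."""
    return c if x < eta else 0.0

# Worst-case absolute error of each approximation on a grid of x values.
grid = [0.01 * k for k in range(1, 1001)]
err_lin = max(abs(correction_exact(x) - correction_linear(x)) for x in grid)
err_thr = max(abs(correction_exact(x) - correction_threshold(x)) for x in grid)
```

The linear form tracks the exact curve more closely than the two-level threshold form, which is consistent with the smaller BER degradation (0.1 dB versus 0.2 dB) reported above.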

Fig. A-3. Basic structure for bit reliability computation in the log-BCJR MAP algorithm.
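The serial feedback accumulation of Fig. A-3 amounts to folding the max* operation over the Σk(Si, u) terms for each u and then differencing, since Eq. (20) gives log(Pk(u|y)) as the log-sum-exp of the Σk(Si, u). A minimal sketch, with the Σ values taken as placeholder inputs:

```python
import math
from functools import reduce

def max_star(a, b):
    """log(e^a + e^b): compare-select plus the log(1 + e^-|a-b|) correction."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def bit_reliability(sigma1, sigma0):
    """L_k of Eq. (35): log P_k(1|y) - log P_k(0|y), where each
    log P_k(u|y) is the max* fold over Sigma_k(S_i, u), i = 1..N.
    The serial fold mirrors switch 1's feedback path in Fig. A-3."""
    logp1 = reduce(max_star, sigma1)
    logp0 = reduce(max_star, sigma0)
    return logp1 - logp0
```

The sign of the returned value gives the hard decision on u_k, and its magnitude is the soft reliability passed to the companion decoder in the iterative scheme.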
