The UMTS Turbo Code and An Efficient Decoder Implementation Suitable For Software-Defined Radios
This paper provides a description of the turbo code used by the UMTS third-generation cellular
standard, as standardized by the Third-Generation Partnership Project (3GPP), and proposes an
efficient decoder suitable for insertion into software-defined radio architectures or for use in computer simulations. Because the decoder is implemented in software, rather than hardware, single-precision floating-point arithmetic is assumed and a variable number of decoder iterations is not
only possible but desirable. Three twists on the well-known log-MAP decoding algorithm are proposed: (1) a linear approximation of the correction function used by the max* operator, which
reduces complexity with only a negligible loss in BER performance; (2) a method for normalizing
the backward recursion that yields a 12.5% savings in memory usage; and (3) a simple method for
halting the decoder iterations based only on the log-likelihood ratios.
KEY WORDS: Coding; turbo codes; WCDMA (UMTS); 3GPP; software-defined radio (SDR).
1. INTRODUCTION
1068-9605/01/1000-0203/0 © 2002 Plenum Publishing Corporation
computation of the max* operator and the dynamic halting of the decoder iterations. Simple, but effective, solutions to both of these problems are proposed and illustrated through simulation.
In the description of the algorithm, we have assumed
that the reader has a working knowledge of the Viterbi
algorithm [14]. Information on the Viterbi algorithm can
be found in a tutorial paper by Forney [15] or in most
books on coding theory (e.g., [16]) and communications
theory (e.g., [17]).
We recommend that the decoder described in this
paper be implemented using single-precision floating-point arithmetic on an architecture with approximately
200 kilobytes of memory available for use by the turbo
codec. Because mobile handsets tend to be memory limited and cannot tolerate the power inefficiencies of floating-point arithmetic, this may limit the direct application
of the proposed algorithm to only base stations. Readers
interested in fixed-point implementation issues are referred to [18], while those interested in minimizing
memory usage should consider the sliding-window algorithm described in [19] and [20].
The remainder of this paper is organized as follows:
Section 2 provides an overview of the UMTS turbo code,
and Section 3 discusses the channel model and how to
normalize the inputs to the decoder. The next three sections describe the decoder, with Section 4 describing the
algorithm at the highest hierarchical level, Section 5 discussing the so-called max* operator, and Section 6 describing the proposed log-domain implementation of the
MAP algorithm. Simulation results are given in Section
7 for two representative frame sizes (640 and 5114 bits)
in both additive white Gaussian noise (AWGN) and fully
interleaved Rayleigh flat-fading. Section 8 describes a
simple, but effective, method for halting the decoder
iterations early, and Section 9 concludes the paper.
ln( P[Sk = 1 | Yk] / P[Sk = 0 | Yk] ) = ln( fY(Yk | Sk = 1) / fY(Yk | Sk = 0) )   (1)
4. DECODER ARCHITECTURE
The architecture of the decoder is shown in Fig. 2. As indicated by the presence of a feedback path, the decoder operates in an iterative manner. Each full iteration consists of two half-iterations, one for each constituent RSC code. The timing of the decoder is such that RSC decoder #1 operates during the first half-iteration, and RSC decoder #2 operates during the second half-iteration. The operation of the RSC decoders is described in Section 6.
The value w(Xk), 1 ≤ k ≤ K, is the extrinsic information produced by decoder #2 and introduced to the input of decoder #1. Prior to the first iteration, w(Xk) is initialized to all zeros (since decoder #2 has not yet acted on the data). After each complete iteration, the values of w(Xk) will be updated to reflect beliefs regarding the data propagated from decoder #2 back to decoder #1. Note that because the two encoders have independent tails, only information regarding the actual data bits is passed between decoders. Thus, w(Xk) is not defined for K + 1 ≤ k ≤ K + 3 (if it were defined it would simply be equal to zero after every iteration).
The extrinsic information must be taken into account by decoder #1. However, because of the way that the branch metrics are derived, it is sufficient to simply add w(Xk) to the received systematic LLR, R(Xk), which forms a new variable, denoted V1(Xk). For 1 ≤ k ≤ K, the input to RSC decoder #1 is both the combined systematic data and extrinsic information, V1(Xk), and the received parity bits in LLR form, R(Zk). For K + 1 ≤ k ≤ K + 3 no extrinsic information is available, and thus the
(2)

R(Xk) = (2 ak / σ²) Yk   (3)
Fig. 2. Proposed turbo decoder architecture.
input to RSC decoder #1 is the received and scaled upper encoder's tail bits, V1(Xk) = R(Xk), and the corresponding received and scaled parity bits, R(Zk). The output of RSC decoder #1 is the LLR Λ1(Xk), where 1 ≤ k ≤ K since the LLR of the tail bits is not shared with the other decoder.
By subtracting w(Xk) from Λ1(Xk), a new variable, denoted V2(Xk), is formed. Similar to V1(Xk), V2(Xk) contains the sum of the systematic channel LLR and the extrinsic information produced by decoder #1 (note, however, that the extrinsic information for RSC decoder #1 never has to be explicitly computed). For 1 ≤ k ≤ K, the input to decoder #2 is V2(X′k), which is the interleaved version of V2(Xk), and R(Z′k), which is the channel LLR corresponding to the second encoder's parity bits. For K + 1 ≤ k ≤ K + 3, the input to RSC decoder #2 is the received and scaled lower encoder's tail bits, V2(X′k) = R(X′k), and the corresponding received and scaled parity bits, R(Z′k). The output of RSC decoder #2 is the LLR Λ2(X′k), 1 ≤ k ≤ K, which is deinterleaved to form Λ2(Xk). The extrinsic information w(Xk) is then formed by subtracting V2(Xk) from Λ2(Xk) and is fed back for use during the next iteration by decoder #1.
Once the iterations have been completed, a hard bit decision is taken using Λ2(Xk), 1 ≤ k ≤ K, where X̂k = 1 when Λ2(Xk) > 0 and X̂k = 0 when Λ2(Xk) ≤ 0.
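The iterative schedule just described can be sketched in a few lines of Python. Here `siso_decode`, `interleave`, and `deinterleave` are assumed helper functions standing in for the log-MAP machinery of Section 6 and the UMTS interleaver; they are illustrative placeholders, not part of the paper.

```python
import numpy as np

def turbo_decode(r_x, r_z1, r_z2, interleave, deinterleave, siso_decode, n_iters=8):
    # Sketch of the schedule of Fig. 2 (data bits only; tail handling omitted).
    # r_x, r_z1, r_z2 are the received systematic and parity LLRs.
    K = len(r_x)
    w = np.zeros(K)                       # extrinsic info from decoder #2, all zeros at start
    llr2 = np.zeros(K)
    for _ in range(n_iters):
        v1 = r_x + w                      # V1 = systematic LLR + extrinsic
        llr1 = siso_decode(v1, r_z1)      # half-iteration 1: RSC decoder #1
        v2 = llr1 - w                     # V2 = Lambda1 - w
        llr2 = deinterleave(siso_decode(interleave(v2), r_z2))  # half-iteration 2
        w = llr2 - v2                     # new extrinsic, fed back to decoder #1
    return (llr2 > 0).astype(int)         # hard decision: 1 when Lambda2 > 0
```

Note that the extrinsic information for decoder #1 is never computed explicitly; it is carried implicitly inside `v2`, exactly as described above.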
5. THE MAX* OPERATOR
The RSC decoders in Fig. 2 are each executed using
a version of the classic MAP algorithm [21] implemented
in the log-domain [13]. As will be discussed in Section 6,
the algorithm is based on the Viterbi algorithm [14] with
two key modifications: First, the trellis must be swept
through not only in the forward direction but also in the
reverse direction, and second, the add-compare-select
(ACS) operation of the Viterbi algorithm is replaced with
the Jacobi logarithm, also known as the max* operator
[19]. Because the max* operator must be executed twice
for each node in the trellis during each half-iteration
(once for the forward sweep, and a second time for the reverse sweep), it constitutes a significant, and sometimes
dominant, portion of the overall decoder complexity. The
manner in which max* is implemented is critical to the performance and complexity of the decoder, and several
methods have been proposed for its computation. Below,
we consider four versions of the algorithm: log-MAP,
max-log-MAP, constant-log-MAP, and linear-log-MAP.
The only difference among these algorithms is the manner in which the max* operation is performed.
For log-MAP, the max* operator is the exact Jacobi logarithm:

max*(x, y) = ln(e^x + e^y) = max(x, y) + ln(1 + e^(−|y − x|))   (4)

For max-log-MAP, the correction term is simply dropped:

max*(x, y) ≈ max(x, y)   (5)

For constant-log-MAP, the correction function is approximated by a constant:

max*(x, y) ≈ max(x, y) + C,   if |y − x| ≤ T
max*(x, y) ≈ max(x, y),       if |y − x| > T   (6)

For linear-log-MAP, the correction function is approximated by a linear function of |y − x|:

max*(x, y) ≈ max(x, y) + a(|y − x| − T),   if |y − x| ≤ T
max*(x, y) ≈ max(x, y),                    if |y − x| > T   (7)
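The four variants can be written directly from (4)–(7). In this sketch, the constants C and T of the constant-log-MAP branch are placeholders (the paper's exact values were lost in extraction); the linear-log-MAP defaults are the constants derived later in (21) and (22).

```python
import math

def max_star_log_map(x, y):
    # Eq. (4): exact Jacobi logarithm, ln(e^x + e^y)
    return max(x, y) + math.log1p(math.exp(-abs(y - x)))

def max_star_max_log(x, y):
    # Eq. (5): correction term dropped entirely
    return max(x, y)

def max_star_constant_log(x, y, C=0.5, T=1.5):
    # Eq. (6): correction approximated by a constant
    # (C and T here are placeholder values, not the paper's)
    return max(x, y) + (C if abs(y - x) <= T else 0.0)

def max_star_linear_log(x, y, a=-0.24904181891710, T=2.50681640022001):
    # Eq. (7): correction approximated by a line; constants from (21)-(22)
    return max(x, y) + (a * (abs(y - x) - T) if abs(y - x) <= T else 0.0)
```

Using `math.log1p` keeps the exact correction numerically stable when |y − x| is large.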
(8)
Fig. 4. Trellis section for the RSC code used by the UMTS turbo code.
Solid lines indicate data 1 and dotted lines indicate data 0. Branch metrics are indicated.
β̃k(Si) = max*( βk+1(Sj1) + γk+1(Si, Sj1), βk+1(Sj2) + γk+1(Si, Sj2) )   (9)

where the tilde above β̃k(Si) indicates that the metric has not yet been normalized, and Sj1 and Sj2 are the two states at stage k + 1 in the trellis that are connected to state Si at stage k. After the calculation of β̃k(S0), the partial path metrics are normalized according to

βk(Si) = β̃k(Si) − β̃k(S0)   (10)
The backward metrics βk(Si) at stage k = 1 are never used and therefore do not need to be computed.
α̃k(Sj) = max*( αk−1(Si1) + γk(Si1, Sj), αk−1(Si2) + γk(Si2, Sj) )   (11)

where Si1 and Si2 are the two states at stage k − 1 that are connected to state Sj at stage k. After the calculation of α̃k(S0), the partial path metrics are normalized using

αk(Si) = α̃k(Si) − α̃k(S0)   (12)
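The normalization of (10) and (12) can be sketched as follows. Because the normalized state-0 metric is identically zero, it never needs to be stored; for the eight-state UMTS trellis only 7 of 8 metrics per stage are kept, which is the source of the 12.5% memory savings claimed in the abstract.

```python
def normalize(metrics):
    # Eqs. (10)/(12): subtract the state-0 metric from every partial path metric.
    # After normalization, metrics[0] is exactly 0.0 and need not be stored.
    m0 = metrics[0]
    return [m - m0 for m in metrics]
```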
λk(i, j) = αk−1(Si) + γk(Si, Sj) + βk(Sj)   (13)

The likelihood of data 1 (or 0) is then the Jacobi logarithm of the likelihood of all branches corresponding to data 1 (or 0), and thus:

Λ(Xk) = max* {λk(i, j) : (Si, Sj) with Xk = 1} − max* {λk(i, j) : (Si, Sj) with Xk = 0}   (14)

where max* is taken over all branches in each set.
Note that αk does not need to be computed for k = K (it is never used), although the LLR Λ(XK) must still be found.
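Equation (14) amounts to folding the max* operator over each of the two branch sets and taking the difference. The helper below accepts any of the max* variants of Section 5; the list-of-likelihoods interface is an assumption made for illustration, not the paper's data layout.

```python
from functools import reduce

def llr_from_branches(lik_data1, lik_data0, max_star):
    # Eq. (14): fold max* over the data-1 branches and the data-0 branches,
    # then take the difference of the two results.
    return reduce(max_star, lik_data1) - reduce(max_star, lik_data0)
```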
coded by all four algorithms. Thus, there must be a different reason for this phenomenon. We believe the reason for this discrepancy is as follows: although each of the two MAP decoders shown in Fig. 2 is optimal in terms of minimizing the local BER, the overall turbo decoder is not guaranteed to minimize the global BER. Thus, a slight random perturbation in the computed partial path metrics and corresponding LLR values could result in a perturbation in the BER. The error caused by the linear-log-MAP approximation to the Jacobi logarithm induces such a random perturbation both within the algorithm and into the BER curve. Note that this perturbation is very minor and the performance of the linear-log-MAP algorithm remains close to that of the full log-MAP.
Algorithm           AWGN, K = 640   AWGN, K = 5114   Fading, K = 640   Fading, K = 5114
max-log-MAP         1.532 dB        0.819 dB         2.916 dB          2.073 dB
constant-log-MAP    1.269 dB        0.440 dB         2.505 dB          1.557 dB
linear-log-MAP      1.220 dB        0.414 dB         2.500 dB          1.547 dB
log-MAP             1.235 dB        0.417 dB         2.488 dB          1.533 dB
This suggests that in a software implementation, perhaps the algorithm choice should be made adaptive (e.g., choose linear-log-MAP at low SNR and max-log-MAP at high SNR).
Algorithm           Throughput
max-log-MAP         366 kbps/iteration
constant-log-MAP    296 kbps/iteration
linear-log-MAP      262 kbps/iteration
log-MAP             51 kbps/iteration
min_{1 ≤ k ≤ K} |Λ2(Xk)| > T   (15)
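The stopping test of (15) is a one-liner: halt once even the least reliable bit decision is confident. The absolute-value bars on the LLRs were garbled in this copy and are restored here on the assumption that the test is on LLR magnitudes.

```python
def should_halt(llr2, threshold):
    # Eq. (15): stop iterating once every LLR magnitude exceeds the threshold,
    # i.e. once the minimum |Lambda2(Xk)| over 1 <= k <= K is above T.
    return min(abs(l) for l in llr2) > threshold
```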
Φ(a, T) = ∫_0^T [a(x − T) − ln(1 + e^(−x))]² dx + ∫_T^∞ [ln(1 + e^(−x))]² dx   (16)
This function is minimized by setting the partial derivatives with respect to a and T equal to zero. First, take the
partial with respect to a:
∂Φ(a, T)/∂a = ∫_0^T 2[a(x − T) − ln(1 + e^(−x))](x − T) dx
            = 2 ∫_0^T a(x − T)² dx − 2 ∫_0^T (x − T) ln(1 + e^(−x)) dx
            = (2/3) a T³ − 2 Σ_{n=1}^∞ ((−1)^(n+1)/n) ∫_0^T (x − T) e^(−nx) dx
            = (2/3) a T³ − 2 Σ_{n=1}^∞ ((−1)^(n+1)/n) [ (1/n²)(1 − e^(−nT)) − T/n ]
            = (2/3) a T³ + 2 K1 T − 2 K2 + 2 Σ_{n=1}^∞ ((−1)^(n+1)/n³) e^(−nT)   (17)

where the series expansion ln(1 + e^(−x)) = Σ_{n=1}^∞ ((−1)^(n+1)/n) e^(−nx) has been used,
and where

K1 = Σ_{n=1}^∞ (−1)^(n+1)/n²,   K2 = Σ_{n=1}^∞ (−1)^(n+1)/n³

Next, take the partial derivative with respect to T; the boundary terms produced by the variable limits of integration cancel, leaving

∂Φ(a, T)/∂T = ∫_0^T 2[a(x − T) − ln(1 + e^(−x))](−a) dx   (18)
Evaluating (18) term by term:

∂Φ(a, T)/∂T = a² T² + 2a Σ_{n=1}^∞ ((−1)^(n+1)/n) ∫_0^T e^(−nx) dx
            = a² T² + 2a Σ_{n=1}^∞ ((−1)^(n+1)/n²) (1 − e^(−nT))
            = a [ a T² + 2 K1 − 2 Σ_{n=1}^∞ ((−1)^(n+1)/n²) e^(−nT) ]   (19)
T = 2.50681640022001   (21)
Once the optimal value of T has been found, a can easily be found by setting either (17) or (19) equal to zero and solving for a, which results in

a = −0.24904181891710   (22)
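As a rough numerical cross-check of the reconstructed stationarity conditions, one can eliminate a via (19) and bisect on T. Because several signs and factors in (16)–(19) were garbled in this copy and have been reconstructed, only loose agreement with the quoted constants is asserted here, not digit-for-digit equality.

```python
import math

# Alternating-series constants K1, K2 (partial sums; the alternating-series
# truncation error is below the last term kept)
K1 = sum((-1) ** (n + 1) / n**2 for n in range(1, 20001))   # approx pi^2 / 12
K2 = sum((-1) ** (n + 1) / n**3 for n in range(1, 20001))

def S(T, p):
    # S_p(T) = sum_{n>=1} (-1)^{n+1} e^{-nT} / n^p; converges very fast for T > 0
    return sum((-1) ** (n + 1) * math.exp(-n * T) / n**p for n in range(1, 60))

def a_from_19(T):
    # Set the reconstructed Eq. (19) to zero and solve for a
    return 2.0 * (S(T, 2) - K1) / T**2

def g(T):
    # Reconstructed Eq. (17) with a eliminated via Eq. (19)
    a = a_from_19(T)
    return (2.0 / 3.0) * a * T**3 + 2.0 * K1 * T - 2.0 * K2 + 2.0 * S(T, 3)

lo, hi = 2.0, 3.0            # g changes sign on this bracket
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) <= 0.0:
        hi = mid
    else:
        lo = mid

T_opt = 0.5 * (lo + hi)
a_opt = a_from_19(T_opt)
print(T_opt, a_opt)
```

The root of g lands very close to the T of (21); the slope a obtained from the reconstructed (19) is of the same sign and magnitude as (22).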
REFERENCES
1. C. Berrou, A. Glavieux, and P. Thitimajshima, Near Shannon limit error-correcting coding and decoding: Turbo-codes (1), Proc. IEEE Int. Conf. on Commun. (Geneva, Switzerland), pp. 1064–1070, May 1993.
2. S. Benedetto and G. Montorsi, Unveiling turbo codes: Some results on parallel concatenated coding schemes, IEEE Trans. Inform. Theory, Vol. 42, pp. 409–428, Mar. 1996.
3. L. C. Perez, J. Seghers, and D. J. Costello, A distance spectrum interpretation of turbo codes, IEEE Trans. Inform. Theory, Vol. 42, pp. 1698–1709, Nov. 1996.
to attending graduate school at Virginia Tech, he was an electronics engineer at the United States Naval Research Laboratory, Washington,
DC, where he was engaged in the design and development of a spaceborne adaptive antenna array and a system for the collection and correlation of maritime ELINT signals.
Jian Sun received his B.S.E.E. in 1997 and M.S.E.E. in 2000, both
from Shanghai Jiaotong University (Shanghai, China). He is currently
pursuing a Ph.D. in the Lane Department of Computer Science and
Electrical Engineering at West Virginia University (Morgantown,
WV). His research interests are in wireless communications, wireless
networks, and DSP applications.