
Doctoral Thesis

Design and Implementation of High-Performance Radix-4 Turbo Decoder for Multiple 4G Standards

Kim, Ji-Hoon (김지훈, 金志勳)

School of Electrical Engineering & Computer Science, Division of Electrical Engineering

KAIST

2009


Advisor: Professor Park, In-Cheol

by Kim, Ji-Hoon

A thesis submitted to the faculty of KAIST in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Engineering in the School of Electrical Engineering and Computer Science, Division of Electrical Engineering

Daejeon, Korea
2009. 4. 29.

Approved by
Professor Park, In-Cheol
Advisor

The above thesis has been approved by the thesis examination committee as a doctoral thesis of the Korea Advanced Institute of Science and Technology (KAIST).

2009. 4. 29.

Committee Chair: 박인철 (Park, In-Cheol)
Committee Members: 이귀로, 정세영, 조성환, 허재혁

I dedicate this thesis to my beloved family.

DEE 20047150. 김지훈. Kim, Ji-Hoon. Design and Implementation of High-Performance Radix-4 Turbo Decoder for Multiple 4G Standards (Korean title: Design and Implementation of a High-Performance Multi-Standard Turbo Decoder for 4G Mobile Communications). School of Electrical Engineering & Computer Science, Division of Electrical Engineering. 2009. 85 p. Advisor: Prof. Park, In-Cheol. Text in English.

Abstract

Recently, turbo codes have been adopted for high-speed data transmission in 4G communication systems such as Mobile WiMAX (IEEE 802.16e standard) and 3GPP-LTE, in double-binary and single-binary form, respectively. In particular, the double-binary convolutional turbo code (CTC) offers clear advantages over the classical single-binary turbo code. However, compared with the classical single-binary turbo code, the nonbinary turbo code is much more complex to implement in hardware, and its decoding requires more memory, especially for storing the extrinsic information exchanged between the two soft-input soft-output (SISO) decoders. Additionally, because of its iterative decoding behavior, implementing a high-performance turbo decoder for next-generation mobile communication systems is challenging. Also, as the need to support multiple standards in a single mobile handheld device increases, the efficient implementation of the advanced channel decoder, which is the most area-consuming and computationally intensive block in the baseband modem, becomes more important.

To deal with these issues in resource-limited handheld systems, this dissertation presents solutions at three levels: algorithm, architecture, and implementation. As algorithmic solutions, two techniques are proposed that are especially suitable for nonbinary and high-radix single-binary turbo decoding. The first, an energy-efficient SISO decoder based on border metric encoding, eliminates the complex dummy calculation at the cost of a small memory that holds encoded border metrics. Because the border memory is small and infrequently accessed, the energy consumed for SISO decoding is greatly reduced. The second reduces the memory size required for double-binary turbo decoding through a new method that converts symbol-level extrinsic information to bit-level information and vice versa. By exchanging bit-level extrinsic information, the number of extrinsic information values to be exchanged in double-binary turbo decoding is reduced to the same amount as in single-binary turbo decoding. Since the extrinsic information memory accounts for a significant share of the total, the proposed method is effective in reducing the total memory size needed in a double-binary turbo decoder.

To verify the proposed algorithmic solutions, two chips have been implemented. The first chip contains a double-binary turbo decoder for the Mobile WiMAX standard with a dedicated hardware interleaver and is fabricated in a 0.13 μm CMOS process. The decoder is based on a time-multiplexing architecture consisting of a single optimized SISO decoder and a low-complexity hardware interleaver, and it provides up to 50 Mb/s at 200 MHz with a simple early stopping criterion that exploits the bit-level extrinsic information. The second chip presents a unified radix-4 turbo decoder architecture that supports both Mobile WiMAX and 3GPP-LTE. To reach a decoding rate of more than 100 Mb/s, it consists of eight retimed radix-4 SISO decoders and a dual-mode parallel hardware interleaver that supports both standards. The second chip delivers more than 400 Mb/s at 250 MHz with a simple early stopping criterion. Because the retiming technique yields a relatively high peak operating frequency, the supply voltage can be scaled down; in that case the chip achieves an energy efficiency of 0.34 nJ/bit/iteration while still delivering more than 100 Mb/s with eight fixed iterations.

Contents

CHAPTER 1 INTRODUCTION
  • 1.1 Motivation
  • 1.2 Previous Works
  • 1.3 Contributions

CHAPTER 2 BACKGROUNDS
  • 2.1 Digital Communication System
  • 2.2 Introduction to Turbo Codes
    • 2.2.1 Turbo Code Encoder Structure
    • 2.2.2 Turbo Decoding
    • 2.2.3 Decoding Algorithm for Turbo Codes
  • 2.3 Turbo Code in Mobile WiMAX
    • 2.3.1 Encoding
    • 2.3.2 Decoding
  • 2.4 Turbo Code in 3GPP-LTE
    • 2.4.1 Encoding
    • 2.4.2 Decoding

CHAPTER 3 BORDER METRIC ENCODING
  • 3.1 Radix-4 SISO Decoding
    • 3.1.1 Sliding Window for Nonbinary SISO Decoding
  • 3.2 Proposed Border Metric Encoding
  • 3.3 Experimental Results

CHAPTER 4 BIT-LEVEL EXTRINSIC INFORMATION EXCHANGE
  • 4.1 Extrinsic Information in Double-Binary Turbo Codes
    • 4.1.1 Symbol-level Extrinsic Information in Double-Binary Turbo Codes
    • 4.1.2 Memory Requirement in Double-Binary Turbo Decoder
  • 4.2 Proposed Bit-Level Extrinsic Information Exchange
    • 4.2.1 Bit-level Extrinsic Information for Double-Binary Turbo Codes
    • 4.2.2 Symbol-to-Bit Conversion of Extrinsic Information
    • 4.2.3 Bit-to-Symbol Conversion of Extrinsic Information
  • 4.3 Experimental Results
    • 4.3.1 Hardware Implementation

CHAPTER 5 A 50MBPS DOUBLE-BINARY CIRCULAR TURBO DECODER FOR MOBILE WIMAX
  • 5.1 Proposed Chip Architecture
    • 5.1.1 Low-Complexity SISO Decoder Design
    • 5.1.2 Bit-level Extrinsic Information Exchange
    • 5.1.3 Dedicated Hardware Interleaver
    • 5.1.4 Dedicated Double-Flow Hardware Interleaver
    • 5.1.5 Early Stopping Criterion
  • 5.2 Implementation Results

CHAPTER 6 A UNIFIED PARALLEL RADIX-4 TURBO DECODER FOR MOBILE WIMAX AND 3GPP-LTE
  • 6.1 Proposed Chip Architecture
    • 6.1.1 Parallel Turbo Decoding
    • 6.1.2 Unified Radix-4 SISO Decoder with Retiming
    • 6.1.3 Memory-Sharing with Bit-level Extrinsic Information
    • 6.1.4 Dual-Mode Hardware Interleaver
  • 6.2 Implementation Results

CHAPTER 7 CONCLUSIONS

REFERENCE

List of Figures

Figure 1.1: The Need for Supporting Multiple Standards
Figure 1.2: Research Overview
Figure 1.3: Proposed Solutions for Nonbinary CTC Decoder Implementation
Figure 2.1: Model of a digital communication system
Figure 2.2: Turbo code encoder structure
Figure 2.3: A turbo decoder structure
Figure 2.4: Double-binary CRSC constituent encoder used by WiMAX
Figure 2.5: A decoder for the WiMAX turbo code
Figure 2.6: A Turbo Encoder for 3GPP-LTE
Figure 2.7: A Trellis Diagram of a 3GPP-LTE Turbo Encoder
Figure 3.1: Trellis Diagrams
Figure 3.2: Sliding window diagrams
Figure 3.3: 3-bit border metric encoding function
Figure 3.4: BER performance comparison with 8 iterations for 4800-bit frame
Figure 3.5: BER performance of 1920-bit frame according to the number of iterations
Figure 4.1: Memory Requirements in Double-Binary Turbo Decoder
Figure 4.2: Block diagram of the proposed bit-level extrinsic information exchange
Figure 4.3: Proposed Bit-Level Extrinsic Information Exchange
Figure 4.4: Comparison of BER performance of 8 iterations for 1920-bit frame
Figure 4.5: Block diagram of the proposed double-binary turbo decoder
Figure 4.6: Block diagram and complexity of the proposed bit-to-symbol converter
Figure 5.1: Branch metric memory width comparison
Figure 5.2: Block diagram of the proposed two converters
Figure 5.3: Interleaving procedure for the WiMAX
Figure 5.4: Interleaver structure based on the incremental calculation
Figure 5.5: Need of LIFO for Interleaved Address
Figure 5.6: Double-flow hardware interleaver based on incremental calculation
Figure 5.7: Double-flow hardware interleaver based on incremental calculation
Figure 5.8: Block diagram of the proposed double-binary turbo decoder
Figure 5.9: Average number of iterations for the proposed turbo decoder
Figure 5.10: Comparison of BER performance for 1920-bit frame
Figure 5.11: Die photo of the proposed double-binary turbo decoder chip
Figure 6.1: Overall Unified Turbo Decoder Architecture with Time-Multiplexing
Figure 6.2: The Proposed Chip Architecture with Eight SISO Decoders
Figure 6.3: Add-Compare-Select (ACS) block with Retiming
Figure 6.4: Sliding Window with Register Retiming
Figure 6.5: Input Frame Memory Configurations
Figure 6.6: Dual-Mode Dedicated Hardware Interleaver
Figure 6.7: FER Performance and Average Iteration Number with Early Termination in an AWGN Channel
Figure 6.8: Memory Size Reduction in the Proposed Architecture
Figure 6.9: Micrograph of the Chip

List of Tables

Table 1.1 Differences between the 3GPP-LTE and Mobile WiMAX Turbo Codes
Table 3.1 Simulation environment
Table 3.2 Encoded values for border metrics
Table 3.3 Single-port SRAM size required for a SISO decoder
Table 3.4 Energy consumptions of SISO decoders
Table 4.1 Simulation environment
Table 4.2 Memory Configuration for one SISO Decoder
Table 4.3 Memory Configuration for the Extrinsic Information
Table 4.4 Single-port SRAM Size required for the Turbo Decoder
Table 5.1 CTC Interleaver Parameters for WiMAX
Table 5.2 Single-port SRAM Size Required for the Turbo Decoder
Table 6.1 Comparison of Decoder Implementation

List of Abbreviations

4G: 4th Generation
RSC: Recursive Systematic Convolutional
CTC: Convolutional Turbo Code
SISO: Soft-Input Soft-Output
LLR: Log Likelihood Ratio
ML: Maximum Likelihood
SOVA: Soft-Output Viterbi Algorithm
MAP: Maximum a posteriori
APP: a posteriori Probability
ECC: Error Correction Coding
FEC: Forward Error Correction
BPSK: Binary Phase Shift Keying
OFDMA: Orthogonal Frequency Division Multiple Access
NLOS: Non-Line-of-Sight
AWGN: Additive White Gaussian Noise
ARP: Almost Regular Permutation
QPP: Quadratic Polynomial Permutation

    Chapter 1 Introduction

The turbo code, introduced in 1993, is one of the most powerful forward error correction channel codes and provides near-optimal bit-error rates (BERs), that is, within 0.5 dB of Shannon's limit at a BER of 10^-5 [1]. Given this remarkable performance, turbo codes have been adopted in many standardized mobile radio systems. Recent advances in the convolutional turbo code (CTC) have attracted much interest in its applications. The conventional CTC suffers from a high error floor due to its relatively small minimum Hamming distance, and from performance degradation due to puncturing. The nonbinary CTC has recently emerged and appears to overcome many flaws of the classical single-binary CTC [2]. In addition, the concept of the tail-biting convolutional code has been applied to the CTC. The tail-biting code, also called the circular code, improves the spectral efficiency of the CTC because it eliminates the tail bits otherwise needed to terminate the encoder state. Recently, turbo codes have been adopted for high-speed data transmission in 4G mobile communication systems such as Mobile WiMAX (IEEE 802.16e standard) and 3GPP-LTE, in double-binary and single-binary form, respectively.

    1.1 Motivation

Little research has been dedicated to the hardware implementation of the double-binary turbo decoder, although previous work on classical single-binary turbo codes can be applied to nonbinary turbo codes [4]-[11]. Compared with the classical single-binary turbo code, the nonbinary turbo code is much more complex to implement in hardware, and its decoding requires more memory, especially for storing the extrinsic information exchanged between the two soft-input soft-output (SISO) decoders.

Figure 1.1: The Need for Supporting Multiple Standards

In addition, as the need to support multiple standards in a single handheld device increases, as shown in Figure 1.1, the efficient implementation of the advanced channel decoder, which is the most area-consuming and computationally intensive block in the baseband modem, becomes more important. Accordingly, a unified decoder architecture that supports multiple standards becomes necessary, since separate implementations for different standards require many hardware resources and therefore a large silicon area. Because the turbo codes adopted in 3GPP-LTE and Mobile WiMAX differ from each other, as summarized in Table 1.1, the efficient implementation of a unified turbo decoder that supports both 3GPP-LTE and Mobile WiMAX is important for future mobile handheld devices. Also, because of its iterative decoding behavior and long critical path, implementing a high-performance turbo decoder for next-generation mobile communication systems is challenging.

    1.2 Previous Works

Table 1.1 Differences between the 3GPP-LTE and Mobile WiMAX Turbo Codes

Standards | 3GPP-LTE | Mobile WiMAX
Type | Single-Binary | Double-Binary
RSC code: Constraint Length | 4 | 4
RSC code: Trellis Termination | Appending the bits that make both encoder states all zero and sending the resulting codes | Tail-Biting (Circular Coding)
Interleaver: Type | QPP Interleaver | ARP Interleaver
Interleaver: Frame size (N) | N = 40 + 8f, 0 ≤ f ≤ 59; N = 512 + 16f, 0 ≤ f ≤ 32; N = 1024 + 32f, 0 ≤ f ≤ 32; N = 2048 + 64f, 0 ≤ f ≤ 64 | 24, 36, 48, 72, 96, 108, 120, 144, 180, 192, 216, 240, 480, 960, 1440, 1920, 2400 (pairs)

There have been studies on double-binary turbo decoding that aim to lower the hardware complexity [12]-[14]. For double-binary SISO decoding based on the maximum a posteriori (MAP) algorithm [1], the constant log-MAP algorithm has been reported [12]: allowing a constant correction term in the log-MAP algorithm for double-binary SISO decoding yielded a performance improvement. Because of the tail-biting property, the initial values of the forward and backward metrics are not explicitly specified. A simple method to determine the initial state in circular turbo decoding is presented in [13]. It has been reported that using the information from the previous iteration gives better performance and lower computational complexity than the pre-computing method [15].

To reduce the large extrinsic information memory, two techniques have been introduced in [14]. The first, bitwise approximation of the extrinsic information, reduces the three extrinsic information values of double-binary turbo decoding to two by modifying the SISO decoding structure. However, it causes a severe BER performance degradation of more than about 0.5 dB. It is also well known that non-uniform quantization can be applied to reduce the extrinsic information memory size, since the extrinsic information does not need to be exact in decoding [4][5]. Exploiting this property, the second technique uses a block-scaling method in which a common shift index is shared by the three extrinsic information values. This method greatly reduces the extrinsic information memory size with negligible performance degradation, although the number of extrinsic information values is still three.

In addition, there have been several turbo decoder implementations for single-binary turbo codes [9][20]. To support multiple 3G standards such as CDMA2000 and W-CDMA, a programmable single-instruction multiple-data (SIMD) processor has been proposed for interleaving so that interleaved data can be provided at the speed of the hardware SISO decoder [20]. Compared with a ROM-based interleaver, which needs a large ROM to store all possible interleaving patterns, this approach achieves the small area, high performance, and low power consumption of hardware together with the flexibility and programmability of software needed to support multiple standards. Also, to support higher user data rates of up to 24 Mb/s, a radix-4 log-MAP turbo decoder for 3GPP-HSDPA has been introduced in [9]. Its log-MAP SISO decoder processes two received symbols per clock cycle using a windowed radix-4 architecture, doubling the throughput at a given clock rate compared with a similar radix-2 architecture.
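As a rough illustration of the 3GPP-LTE column of Table 1.1 and of the contrast between a ROM-based interleaver and computed addresses, the following sketch enumerates the frame sizes from the four ranges in the table and generates QPP interleaved addresses on the fly. The permutation formula pi(i) = (f1*i + f2*i^2) mod K and the parameter pair f1 = 3, f2 = 10 for K = 40 are taken from the 3GPP-LTE specification rather than from this chapter, and all function names are hypothetical. The additive generator is only an assumption about how such addresses might be produced without multipliers; it is not a description of any circuit in the cited works.

```python
def lte_frame_sizes():
    """Enumerate the 3GPP-LTE frame sizes from the four ranges of Table 1.1."""
    # (base, step, f_max) for N = base + step * f with 0 <= f <= f_max
    ranges = [(40, 8, 59), (512, 16, 32), (1024, 32, 32), (2048, 64, 64)]
    sizes = set()  # a set, because adjacent ranges share their end points
    for base, step, f_max in ranges:
        for f in range(f_max + 1):
            sizes.add(base + step * f)
    return sorted(sizes)


def qpp_direct(k, f1, f2):
    """Reference QPP permutation: pi(i) = (f1*i + f2*i^2) mod K."""
    return [(f1 * i + f2 * i * i) % k for i in range(k)]


def qpp_incremental(k, f1, f2):
    """Same addresses computed with additions only (illustrative assumption).

    pi(i+1) - pi(i) = f1 + f2*(2i + 1), so the increment g itself grows by
    2*f2 at every step and no multiplication is needed inside the loop.
    """
    addr = 0
    g = (f1 + f2) % k
    step = (2 * f2) % k
    out = []
    for _ in range(k):
        out.append(addr)
        addr = (addr + g) % k
        g = (g + step) % k
    return out


if __name__ == "__main__":
    sizes = lte_frame_sizes()
    print(len(sizes), sizes[0], sizes[-1])
    print(qpp_incremental(40, 3, 10)[:4])
```

Because the addresses are computed one after another, no table of interleaving patterns has to be stored for each frame size; only the two parameters per size are needed.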

    1.3 Contributions

The major contribution of this dissertation is to present algorithmic modifications for low-complexity hardware implementation, together with architectural solutions and several optimizations for high-performance turbo decoding that supports two 4G communication standards, as illustrated in Figure 1.2 and Figure 1.3. The contributions can be categorized as follows.

The first is an energy-efficient SISO decoding structure for nonbinary turbo decoders. With the border metric encoding scheme, the complex dummy calculation in nonbinary turbo decoding can be avoided at the cost of a small memory that holds encoded border metrics. Because the border memory is small and infrequently accessed, the energy consumed for SISO decoding is greatly reduced.

The second is the bit-level extrinsic information exchange. To reduce the memory size required for double-binary turbo decoding, a new method that converts symbol-level extrinsic information to bit-level information and vice versa is presented. By exchanging bit-level rather than symbol-level extrinsic information, the number of extrinsic information values to be exchanged in double-binary turbo decoding is reduced to the same amount as in single-binary turbo decoding. Unlike the bitwise approximation of the extrinsic information in [14], the proposed method does not require any modification to the conventional double-binary SISO decoder structure. It deals only with the symbol-to-bit and bit-to-symbol conversion of the extrinsic information for the double-binary turbo code. Since the extrinsic information memory accounts for a significant share of the total, the proposed method is effective in reducing the total memory size needed in a double-binary turbo decoder with negligible performance degradation.

Figure 1.2: Research Overview
(Panel labels recovered from the figure: Algorithm: Nonbinary Max-log-MAP, Border Metric Encoding, Bit-level Extrinsic Info., ARP/QPP Interleaving. Architecture and Implementation: Time-Multiplexing, Parallel Turbo Decoding, Unified SISO Decoding, Memory Sharing, 2 Chips in 130nm CMOS, Interconnect Issue, Speed / Area Tradeoff.)

The third is the complete decoder architecture of the double-binary circular turbo decoder for Mobile WiMAX. To lower the overall hardware complexity, a dedicated hardware interleaver is designed in addition to the above methods. By generating the interleaved addresses on the fly, the proposed turbo decoder achieves small area and low power consumption, since no large interleaver memory is needed. Also, to reduce the critical path delay, a retimed architecture for double-binary SISO decoding is presented. In addition, to avoid unnecessary iterations under good channel conditions, a simple early stopping criterion for the double-binary turbo decoder is presented. The proposed stopping criterion uses the sign values of the incoming bit-level extrinsic information and the hard-decision values.

Figure 1.3: Proposed Solutions for Nonbinary CTC Decoder Implementation
(Labels recovered from the figure: Border Metric Encoding, Branch Metric Optimization, Hardware Interleaver, loss ~ 0.15 dB, No Error Floor, Parallel Turbo Decoding, Register Retiming, Bit-level Extrinsic Info., Memory Sharing, for Radix-4 Processing w/o memory, WiMAX / 3GPP-LTE.)

Finally, to support multiple 4G mobile communication systems such as Mobile WiMAX and 3GPP-LTE, which require high-speed data transmission, a unified parallel radix-4 turbo decoder architecture is proposed.
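The idea of exchanging two bit-level values instead of three symbol-level values can be sketched as follows. This is a minimal illustration under stated assumptions, not the conversion developed later in the thesis: it assumes the three symbol-level values are log-domain metrics measured relative to symbol 00, uses a max-log style marginalization for the symbol-to-bit step, and treats the two bits as independent for the bit-to-symbol step. The function names and the argument order are hypothetical.

```python
def symbol_to_bit(l01, l10, l11):
    """Collapse three symbol-level values (relative to symbol 00) into two bit-level values.

    Assumed max-log marginalization: for each bit, compare the best symbol
    with that bit equal to 1 against the best symbol with that bit equal to 0.
    """
    l00 = 0  # reference symbol
    la = max(l10, l11) - max(l00, l01)  # first bit a: 1 versus 0
    lb = max(l01, l11) - max(l00, l10)  # second bit b: 1 versus 0
    return la, lb


def bit_to_symbol(la, lb):
    """Rebuild (l01, l10, l11) from two bit-level values, assuming independent bits."""
    return lb, la, la + lb


if __name__ == "__main__":
    stored = symbol_to_bit(1, 2, 3)      # two values are written to memory
    print(stored, bit_to_symbol(*stored))  # three values are rebuilt on read
```

Under these assumptions the round trip is exact whenever the symbol-level values are separable (l11 = l01 + l10) and is otherwise an approximation, which is consistent with the small performance loss the text attributes to the exchange.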

    Chapter 2 Backgrounds

The efficient design of a communication system that enables reliable high-speed service is challenging. Efficient design refers to the efficient use of the primary communication resources, namely, power and bandwidth. The reliability of such a system is usually measured by the signal-to-noise ratio (SNR) required to achieve a specific error rate. A bandwidth-efficient communication system that is as reliable as possible while using as low an SNR as possible is therefore desired. Error correction coding (ECC) is a technique that improves the reliability of communication over a noisy channel. The use of an appropriate ECC allows a communication system to operate at very low error rates with low to moderate SNR values, enabling reliable high-speed multimedia services over a noisy channel.
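To make "the SNR required to achieve a specific error rate" concrete, the sketch below computes that figure for one simple reference case. It assumes uncoded BPSK over an AWGN channel, for which the bit-error rate is 0.5 * erfc(sqrt(Eb/N0)); this reference case and the function names are illustrative choices, not taken from the text. For a target BER of 10^-5 the search returns roughly 9.6 dB, which is the kind of uncoded baseline that an error correcting code is meant to improve on.

```python
import math


def bpsk_ber(ebn0_db):
    """Theoretical bit-error rate of uncoded BPSK over an AWGN channel."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)  # convert dB to a linear ratio
    return 0.5 * math.erfc(math.sqrt(ebn0))


def required_ebn0_db(target_ber, lo=0.0, hi=20.0, tol=1e-6):
    """Bisect for the Eb/N0 (in dB) at which uncoded BPSK reaches target_ber.

    The BER falls monotonically with Eb/N0, so bisection on [lo, hi] works
    as long as the target lies between the BERs at the two end points.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bpsk_ber(mid) > target_ber:
            lo = mid  # still too many errors: more SNR is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(round(required_ebn0_db(1e-5), 2))
```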

    2.1 Digital Communication System

    The information source generates a message containing information that is to be transmitted to the receiver. In a digital communication system, shown in Figure 2.1, the outputs of the information source are converted into a sequence of bits. This sequence of bits might contain too much redundancy. Ideally, the source encoder removes redundancy and represents the source output sequence with as few bits as possible. Note that the redundancy in the source is different from the redundancy inserted intentionally by the error correcting code. The encrypter encodes the data for security purposes. Encryption is the most effective way to achieve data security. The three components, information source, source encoder and encrypter, can be seen as a single component called the source. The binary sequence is the output of the source. The number of bits the source generates per second is the data rate, in units of bits per second (bps or bits/s).

    Figure 2.1: Model of a digital communication system

    The primary goal of the channel encoder is to increase the reliability of transmission within the constraints of signal power, system bandwidth and computational complexity. This can be achieved by introducing structured redundancy into the transmitted signals. Channel coding is used in digital communication systems to correct transmission errors caused by noise, fading and interference. The channel encoder assigns to each message a longer message called a codeword. This usually results in either a lower data transmission rate or an increased channel bandwidth relative to an uncoded system. To make the communication system less vulnerable to channel impairments, the channel encoder generates codewords that are as different as possible from one another.

    Since the transmission medium is a waveform medium, the sequence of bits generated by the channel encoder cannot be transmitted directly through this medium. The main goals of modulation are not only to match the signal to the transmission medium, enable simultaneous transmission of a number of signals over the same physical medium and increase the data rate, but also to achieve this by the efficient use of the two primary resources of a communication system, namely transmitted power and channel bandwidth.

    A communication channel refers to the combination of the physical medium (copper wires, radio medium or optical fiber) and the electronic or optical devices (equalizers, amplifiers) that are part of the path followed by a signal, as shown in Figure 2.1. Channel noise, fading and interference corrupt the transmitted signal and cause errors in the received signal. This thesis considers only AWGN type channels, which ultimately limit system performance. Note that many interference sources and background noise can be modeled as AWGN due to the central limit theorem.

    At the receiving end of the communication system, the demodulator processes the channel-corrupted transmitted waveform and makes a hard or soft decision on each symbol. If the demodulator makes a hard decision, its output is a binary sequence and the subsequent channel decoding process is called hard-decision decoding. A hard decision in the demodulator results in some irreversible information loss. If the demodulator passes the soft output of the matched filter to the decoder, the subsequent channel decoding process is called soft-decision decoding. The channel decoder works separately from the modulator/demodulator and has the goal of estimating the output of the source encoder based on the encoder structure and a decoding algorithm. In general, with soft-decision decoding, approximately 2 dB and 6 dB of coding gain with respect to hard-decision decoding can be obtained in AWGN channels and fading AWGN channels, respectively. If encryption is used, the decrypter converts encrypted data back into its original form. The source decoder transforms the sequence at its input, based on the source encoding rule, into a sequence of data, which will be used by the information sink to construct an estimate of the message. These three components, decrypter, source decoder and information sink, can be represented as a single component called the sink, as far as the rest of the communication system is concerned. The binary sequence is the input to the sink.
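The information loss caused by a hard decision can be illustrated with a small sketch. It assumes a hypothetical three-fold repetition code over BPSK, which is not a code used in this thesis, and made-up matched-filter outputs. The hard-decision decoder takes a majority vote over per-symbol signs, while the soft-decision decoder sums the unquantized values, which is the maximum-likelihood rule for a repetition code on an AWGN channel.

```python
def hard_decision_decode(received):
    """Majority vote over per-symbol hard decisions (+1 or -1)."""
    votes = sum(1 if y >= 0 else -1 for y in received)
    return 1 if votes >= 0 else -1


def soft_decision_decode(received):
    """Decide on the sum of the unquantized matched-filter outputs."""
    return 1 if sum(received) >= 0 else -1


# Hypothetical matched-filter outputs for one BPSK symbol sent three times.
# Two samples are weakly positive, one is strongly negative.
received = [0.1, 0.2, -1.5]

hard = hard_decision_decode(received)
soft = soft_decision_decode(received)
print("hard decision:", hard, "soft decision:", soft)
```

The two decoders disagree on this input because the hard decision discards the reliability of each sample before the vote is taken.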


    2.2 Introduction to Turbo Codes

    It is well known from information theory that a random code of sufficient length is capable of approaching the "Shannon limit", provided one uses maximum likelihood (ML) decoding. Unfortunately, the complexity of ML decoding increases with the size of the codeword up to the point where decoding becomes impractical. Thus, practical decoding of long codes requires that the code possess some structure. Coding theorists have been trying to develop codes that combine two "seemingly" conflicting principles: (a) randomness, to achieve high coding gain and so approach the Shannon limit, and (b) structure, to make decoding practical. In 1993, Berrou et al. introduced a new coding scheme that combines these two seemingly conflicting principles in an elegant way. They introduced randomness through an interleaver and structure by employing parallel concatenated convolutional codes. These codes are called turbo codes and offer an excellent tradeoff between complexity and error correcting capability. Concatenated codes are very powerful error correcting codes that are capable of closely approaching the Shannon limit by using iterative decoding [1].

    Figure 2.2: Turbo code encoder structure. (a) General structure of turbo codes. (b) Typical structure of turbo codes.

    2.2.1 Turbo Code Encoder Structure

    A turbo code encoder consists of three building blocks: constituent encoders, interleavers and a puncturing unit. The constituent encoders are used in parallel and each interleaver scrambles the information symbols before feeding them into the corresponding constituent encoder. The puncturing unit is used to achieve higher code rates. In general, turbo codes can have more than two parallel constituent convolutional encoders, where each encoder is fed with a scrambled version of the information symbol $u$. Figure 2.2(a) shows the general architecture of turbo codes, where the outputs $u$ and $P_i$ ($i = 1, \ldots, F$) are known as the systematic part and the parity part, respectively. In practice, most applications use only two constituent encoders, where only the input to the second encoder is scrambled, as shown in Figure 2.2(b).

    2.2.1.1 The Constituent Encoders

    Turbo codes use recursive systematic convolutional (RSC) encoders. The use of recursive, or feedback, encoders prevents the encoders from being driven back to the all-zero state by zero symbols. Since $u$ is permuted before entering ENC2, it is likely that one of the RSC code outputs will have high weight. This discussion does not mean that turbo codes exhibit very high minimum distances. In fact, achieving high minimum distances requires the use of a well-designed interleaver of sufficient length. Finding such an interleaver is not trivial. The systematic part helps the iterative decoding to provide better convergence. Note that the systematic part prevents the turbo codes from being catastrophic if no data puncturing is involved. If the systematic part is punctured, two different input sequences can produce the same codeword, making the codes catastrophic. Since repetition codes are not good codes, the systematic part from only one of the constituent encoders is transmitted.
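The effect of the feedback path can be seen on a single nonzero input bit followed by zeros. The sketch below assumes a hypothetical 4-state code with feedback polynomial 1 + D + D^2 and feedforward polynomial 1 + D^2; these generators are chosen only for illustration and are not the constituent codes specified later in this chapter. A non-recursive encoder with taps 1 + D + D^2 is included for comparison.

```python
def rsc_encode(bits, state=(0, 0)):
    """Recursive encoder: feedback 1 + D + D^2, parity taps 1 + D^2 (assumed)."""
    s1, s2 = state
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2        # bit fed into the shift register after feedback
        parity.append(a ^ s2)  # parity output
        s1, s2 = a, s1         # shift the register
    return parity, (s1, s2)


def feedforward_encode(bits, state=(0, 0)):
    """Non-recursive encoder with taps 1 + D + D^2 (assumed), for comparison."""
    s1, s2 = state
    parity = []
    for u in bits:
        parity.append(u ^ s1 ^ s2)
        s1, s2 = u, s1
    return parity, (s1, s2)


impulse = [1, 0, 0, 0, 0, 0, 0]
rsc_parity, rsc_state = rsc_encode(impulse)
ff_parity, ff_state = feedforward_encode(impulse)
print("recursive:    ", rsc_parity, "final state", rsc_state)
print("non-recursive:", ff_parity, "final state", ff_state)
```

In this sketch the zero symbols flush the non-recursive register back to the all-zero state after two steps, whereas the recursive encoder keeps cycling through nonzero states and keeps producing nonzero parity bits.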

    2.2.1.2 Interleaving

    Interleaving refers to the process of permuting symbols in the information sequence before it is fed to the second constituent encoder. The primary function of the interleaver is the creation of a code with good distance properties. Note that interleaving alone cannot achieve good distance properties unless it is used together with recursive constituent encoders. De-interleaving acts on the interleaved information sequence and restores the sequence to its original order. Achieving good distance properties is a common criterion for interleaver design. This fits very well with the concept of maximum likelihood (ML) decoding. Unfortunately, turbo decoding is not guaranteed to perform ML decoding, because of the independence assumption made on the sequence to be decoded and on the probabilistic information (known as extrinsic information) passed between constituent decoders. This suggests an additional design criterion based on the correlation between the extrinsic information.
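As a minimal sketch of the two operations, the following code applies a permutation and then undoes it. The permutation `perm` is an arbitrary hypothetical example, not an interleaver from any standard discussed in this thesis.

```python
def interleave(seq, perm):
    """Output position i takes the symbol at input position perm[i]."""
    return [seq[i] for i in perm]


def deinterleave(seq, perm):
    """Restore the original order of a sequence produced by interleave()."""
    out = [None] * len(seq)
    for pos, i in enumerate(perm):
        out[i] = seq[pos]
    return out


perm = [3, 0, 6, 1, 4, 7, 2, 5]   # hypothetical permutation of 8 positions
u = list("abcdefgh")

scrambled = interleave(u, perm)
restored = deinterleave(scrambled, perm)
print(scrambled)
print(restored)
```

The same `perm` must be known at both the encoder and the decoder, since the decoder interleaves and de-interleaves the extrinsic information on every iteration.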


    2.2.1.3 Puncturing

    Puncturing refers to the process of removing certain bits from the codeword. The purpose of puncturing is to increase the overall code rate. It is common to puncture only the parity symbols of the first and second encoders.
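A short sketch of parity puncturing follows. It assumes a rate-1/3 mother code with one systematic and two parity streams, and an alternating pattern that keeps the first parity bit at even time indices and the second at odd ones. This pattern is a common illustration and is an assumption here, not a pattern taken from this thesis.

```python
def puncture(systematic, parity1, parity2):
    """Keep every systematic bit and alternate between the two parity streams."""
    out = []
    for k, u in enumerate(systematic):
        out.append(u)
        out.append(parity1[k] if k % 2 == 0 else parity2[k])
    return out


# Hypothetical encoder outputs for four information bits.
u = [1, 0, 1, 1]
p1 = [1, 1, 0, 0]
p2 = [0, 1, 1, 0]

tx = puncture(u, p1, p2)
rate = len(u) / len(tx)
print(tx, "code rate:", rate)
```

Without puncturing, 12 bits would be sent for 4 information bits (rate 1/3). Dropping half of the parity bits leaves 8 transmitted bits, which raises the rate to 1/2.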

    2.2.2 Turbo Decoding

    The iterative turbo decoding consists of two component decoders serially concatenated via an interleaver, identical to the one in the encoder, as shown in Figure 2.3.

    The first SISO decoder takes as input the received information sequence $y_k^{s}$ and the received parity sequence generated by the first encoder, $y_k^{p1}$. The decoder then produces extrinsic information, denoted as $L_1^e$, which is interleaved and used to produce an improved estimate of the a priori probabilities of the information sequence for the second decoder. The other two inputs to the second SISO decoder are the interleaved version of the received information sequence $y_k^{s}$ and the received parity sequence produced by the second encoder, $y_k^{p2}$. The second SISO decoder also produces extrinsic information, $L_2^e$, which is used to improve the estimate of the a priori probabilities for the information sequence at the input of the first SISO decoder. This iterative operation improves the decoder performance relative to a single pass through the serially concatenated decoders. The feedback loop is a distinguishing feature of this decoder, and the name turbo code is given with reference to the principle of the turbo engine. After a certain number of iterations, the soft outputs of both SISO decoders stop producing further performance improvements. The last stage of decoding then makes a hard decision after de-interleaving the log-likelihood ratio (LLR), denoted as $L_r$.

    2.2.3 Decoding Algorithm for Turbo Codes

    Turbo codes require SISO decoders to generate the extrinsic information and the LLR. Either the maximum a posteriori (MAP) algorithm [1] or the soft-output Viterbi algorithm (SOVA) can be used for the component decoders. MAP-based turbo decoders generally have much better performance than SOVA-based turbo decoders. In this work, we focus on the MAP algorithm.

    Figure 2.3: A turbo decoder structure

    2.2.3.1 MAP Algorithm

    Let $u = (u_1, u_2, \ldots, u_N)$ be a set of binary variables representing information bits, where $N$ denotes the frame size. In the systematic encoders, one of the outputs, $x^s = (x_1^s, x_2^s, \ldots, x_N^s)$, is identical to the information sequence $u$. The other is the parity information sequence output $x^p = (x_1^p, x_2^p, \ldots, x_N^p)$. The noisy versions of the outputs are $y^s = (y_1^s, y_2^s, \ldots, y_N^s)$ and $y^p = (y_1^p, y_2^p, \ldots, y_N^p)$. Let $R_1^N = (R_1, R_2, \ldots, R_k, \ldots, R_N)$ denote the received sequence, where $R_k = (y_k^s, y_k^p)$. We assume that binary phase shift keying (BPSK) modulation is used to map each binary symbol into a signal from the $\{+1, -1\}$ modulation signal set. The MAP decoder decides whether $u_k = +1$ or $u_k = -1$ depending on the following log-likelihood ratio (LLR):

    $$L_R(u_k) = \log \frac{P(u_k = +1 \mid R_1^N)}{P(u_k = -1 \mid R_1^N)} \tag{2.1}$$

    In the final operation, the decoder makes a hard decision by comparing $L_R(u_k)$ to a threshold equal to zero, as shown in expression (2.2).


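    $$\hat{u}_k = \begin{cases} 1, & \text{if } L_R(u_k) \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{2.2}$$

    Here the bit values 1 and 0 correspond to $u_k = +1$ and $u_k = -1$, respectively. We can compute the APPs in (2.1) as

    $$P(u_k = +1 \mid R_1^N) = \sum_{U^+} P(S_{k-1} = s', S_k = s \mid R_1^N) = \frac{\sum_{U^+} P(S_{k-1} = s', S_k = s, R_1^N)}{P(R_1^N)} \tag{2.3}$$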

    where $S_k$ is the encoder state at time $k$, $U^+$ is the set of pairs $(s', s)$ for the state transitions $(S_{k-1} = s') \to (S_k = s)$ which correspond to the event $u_k = +1$, and $U^-$ is similarly defined. Also

    $$P(u_k = -1 \mid R_1^N) = \sum_{U^-} P(S_{k-1} = s', S_k = s \mid R_1^N) = \frac{\sum_{U^-} P(S_{k-1} = s', S_k = s, R_1^N)}{P(R_1^N)} \tag{2.4}$$

    The log-likelihood ratio is then

    $$L_R(u_k) = \log \frac{\sum_{U^+} P(S_{k-1} = s', S_k = s, R_1^N)}{\sum_{U^-} P(S_{k-1} = s', S_k = s, R_1^N)} \tag{2.5}$$

    By several applications of Bayes' rule, we have

    $$\begin{aligned}
    P(s', s, R_1^N) &= P(s', s, R_1^{k-1}, R_k, R_{k+1}^N) \\
    &= P(R_{k+1}^N \mid s', s, R_1^{k-1}, R_k)\, P(s', s, R_1^{k-1}, R_k) \\
    &= P(R_{k+1}^N \mid s', s, R_1^{k-1}, R_k)\, P(s, R_k \mid s', R_1^{k-1})\, P(s', R_1^{k-1}) \\
    &= P(R_{k+1}^N \mid s)\, P(s, R_k \mid s')\, P(s', R_1^{k-1}) \\
    &= \beta_k(s)\, \gamma_k(s', s)\, \alpha_{k-1}(s')
    \end{aligned} \tag{2.6}$$

    The log-likelihood ratio can be written as

    $$L_R(u_k) = \ln \frac{\sum_{U^+} \alpha_{k-1}(s')\, \gamma_k(s', s)\, \beta_k(s)}{\sum_{U^-} \alpha_{k-1}(s')\, \gamma_k(s', s)\, \beta_k(s)} \tag{2.7}$$

    where $\alpha_{k-1}(s')$ is the forward metric, $\beta_k(s)$ is the backward metric and $\gamma_k(s', s)$ is the branch metric. They are defined as

    $$\alpha_k(s) = P(S_k = s, R_1^k) \tag{2.8}$$

    $$\gamma_k(s', s) = P(S_k = s, R_k \mid S_{k-1} = s') \tag{2.9}$$

    $$\beta_k(s) = P(R_{k+1}^N \mid S_k = s) \tag{2.10}$$
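    We can obtain $\alpha_k(s)$ defined in (2.8) as

    $$\begin{aligned}
    \alpha_k(s) &= P(s, R_1^k) \\
    &= \sum_{s'} P(s', s, R_1^k) \\
    &= \sum_{s'} P(s, R_k \mid s', R_1^{k-1})\, P(s', R_1^{k-1}) \\
    &= \sum_{s'} P(s, R_k \mid s')\, P(s', R_1^{k-1}) \\
    &= \sum_{s'} \gamma_k(s', s)\, \alpha_{k-1}(s')
    \end{aligned} \tag{2.11}$$

    We can obtain $\beta_k(s)$ defined in (2.10) as

    $$\begin{aligned}
    \beta_{k-1}(s') &= P(R_k^N \mid s') \\
    &= \sum_{s} P(R_k^N, s \mid s') \\
    &= \sum_{s} P(R_{k+1}^N \mid s', s, R_k)\, P(s, R_k \mid s') \\
    &= \sum_{s} P(R_{k+1}^N \mid s)\, P(s, R_k \mid s') \\
    &= \sum_{s} \beta_k(s)\, \gamma_k(s', s)
    \end{aligned} \tag{2.12}$$

    The recursion for the $\alpha_k(s)$ is initialized according to

    $$\alpha_0(s) = \begin{cases} 1, & s = 0 \\ 0, & s \ne 0 \end{cases} \tag{2.13}$$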

    which makes the reasonable assumption that the component encoder is initialized to the zero state. The recursion for the $\beta_k(s)$ is initialized according to

    $$\beta_N(s) = \begin{cases} 1, & s = 0 \\ 0, & s \ne 0 \end{cases} \tag{2.14}$$

    which assumes that “termination bits” have been appended at the end of the data word so that the component encoder is again in state zero at time N.
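The forward and backward recursions with their boundary conditions can be sketched as follows. The branch metrics `gamma` are arbitrary made-up numbers for a hypothetical 2-state, 3-step trellis, since their computation has not been derived at this point. As a sanity check, the sum over states of the product of the forward and backward metrics should be the same at every time index, because under the definitions above it equals $P(R_1^N)$.

```python
def forward_backward(gamma, num_states):
    """gamma[k-1][sp][s] is the branch metric of transition sp -> s at time k."""
    n = len(gamma)
    alpha = [[0.0] * num_states for _ in range(n + 1)]
    beta = [[0.0] * num_states for _ in range(n + 1)]
    alpha[0][0] = 1.0            # encoder starts in state zero
    beta[n][0] = 1.0             # encoder is terminated in state zero
    for k in range(1, n + 1):    # forward recursion
        for s in range(num_states):
            alpha[k][s] = sum(gamma[k - 1][sp][s] * alpha[k - 1][sp]
                              for sp in range(num_states))
    for k in range(n, 0, -1):    # backward recursion
        for sp in range(num_states):
            beta[k - 1][sp] = sum(beta[k][s] * gamma[k - 1][sp][s]
                                  for s in range(num_states))
    return alpha, beta


# Hypothetical branch metrics: gamma[step][previous state][next state].
gamma = [
    [[0.6, 0.1], [0.2, 0.3]],
    [[0.5, 0.4], [0.1, 0.7]],
    [[0.3, 0.2], [0.9, 0.1]],
]

alpha, beta = forward_backward(gamma, 2)
totals = [sum(a * b for a, b in zip(alpha[k], beta[k])) for k in range(len(gamma) + 1)]
print(totals)
```

In a fixed-point hardware implementation these products shrink quickly with the frame length, which is one motivation for moving the recursions to the logarithmic domain.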

    All that remains at this point is the computation of $\gamma_k(s', s) = P(s, R_k \mid s')$. Observe that $\gamma_k(s', s)$ may be written as

    $$\begin{aligned}
    \gamma_k(s', s) &= \frac{P(s', s)}{P(s')} \cdot \frac{P(s', s, R_k)}{P(s', s)} \\
    &= P(s \mid s')\, P(R_k \mid s', s) \\
    &= P(u_k)\, P(R_k \mid u_k)
    \end{aligned} \tag{2.15}$$

    where the event $u_k$ corresponds to the event $s' \to s$. Note that $P(s \mid s') = P(s' \to s) = 0$ if $s$ is not a valid state from state $s'$. Hence, $\gamma_k(s', s) = 0$ if $s' \to s$ is not valid and, otherwise,

    $$\gamma_k(s', s) = \frac{P(u_k)}{2\pi\sigma^2} \exp\left[-\frac{\lVert y_k - x_k \rVert^2}{2\sigma^2}\right] \tag{2.16}$$

    where it is assumed that the codes are transmitted on an AWGN channel, $\sigma^2$ is the noise variance, and $y_k = (y_k^s, y_k^p)$ and $x_k = (x_k^s, x_k^p)$ are the received and transmitted symbol pairs at time $k$.

    2.2.3.2 Max-log-MAP Algorithm

    In order to avoid the complexity of the multiplication and division operations in (2.7), (2.11), and (2.12), the computations are converted into the logarithmic domain. The metrics in the new domain, marked with a tilde, are defined as follows:

    $$\tilde{\alpha}_k(s) = \ln(\alpha_k(s)) \tag{2.17}$$

    $$\tilde{\beta}_k(s) = \ln(\beta_k(s)) \tag{2.18}$$

    $$\tilde{\gamma}_k(s', s) = \ln(\gamma_k(s', s)) \tag{2.19}$$

    The expression (2.17) is rewritten as

    $$\tilde{\alpha}_k(s) = \ln\Big(\sum_{s'} \alpha_{k-1}(s')\, \gamma_k(s', s)\Big) = \ln\Big(\sum_{s'} \exp\big(\tilde{\alpha}_{k-1}(s') + \tilde{\gamma}_k(s', s)\big)\Big) \tag{2.20}$$

    These log-domain forward metrics are initialized as

    $$\tilde{\alpha}_0(s) = \begin{cases} 0, & s = 0 \\ -\infty, & s \ne 0 \end{cases} \tag{2.21}$$

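    The expression (2.18) is rewritten as

    $$\tilde{\beta}_{k-1}(s') = \ln(\beta_{k-1}(s')) = \ln\Big(\sum_{s} \exp\big(\tilde{\beta}_k(s) + \tilde{\gamma}_k(s', s)\big)\Big) \tag{2.22}$$

    with initial conditions

    $$\tilde{\beta}_N(s) = \begin{cases} 0, & s = 0 \\ -\infty, & s \ne 0 \end{cases} \tag{2.23}$$

    under the assumption that the encoder has been terminated. As before, $L_R(u_k)$ is computed as

    $$\begin{aligned}
    L_R(u_k) &= \ln \frac{\sum_{U^+} \alpha_{k-1}(s')\, \gamma_k(s', s)\, \beta_k(s)}{\sum_{U^-} \alpha_{k-1}(s')\, \gamma_k(s', s)\, \beta_k(s)} \\
    &= \ln\Big[\sum_{U^+} \exp\big(\tilde{\alpha}_{k-1}(s') + \tilde{\gamma}_k(s', s) + \tilde{\beta}_k(s)\big)\Big] - \ln\Big[\sum_{U^-} \exp\big(\tilde{\alpha}_{k-1}(s') + \tilde{\gamma}_k(s', s) + \tilde{\beta}_k(s)\big)\Big]
    \end{aligned} \tag{2.24}$$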

    These expressions can be simplified by using the approximation

    $$\max(x, y) \approx \ln(e^x + e^y) \tag{2.25}$$
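The size of the error introduced by replacing $\ln(e^x + e^y)$ with $\max(x, y)$ can be checked numerically. The sketch below uses the standard identity $\ln(e^x + e^y) = \max(x, y) + \ln(1 + e^{-|x - y|})$, often called the Jacobian logarithm. The correction term is shown only to quantify the approximation; it is not part of the Max-log-MAP algorithm described here.

```python
import math


def max_star(x, y):
    """Exact ln(e^x + e^y), computed via the Jacobian logarithm."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def max_log(x, y):
    """Max-log approximation: the correction term is dropped."""
    return max(x, y)


for x, y in [(0.0, 0.0), (2.0, -1.0), (10.0, 0.0)]:
    exact = max_star(x, y)
    approx = max_log(x, y)
    print(f"x={x:5.1f} y={y:5.1f} exact={exact:.4f} approx={approx:.4f} "
          f"error={exact - approx:.4f}")
```

The error is largest, $\ln 2$, when the two arguments are equal, and decays quickly as they move apart, which is why the approximation costs little when one path clearly dominates.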

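    Given the max function, we may now rewrite (2.20), (2.22), and (2.24) as

    $$\tilde{\alpha}_k(s) = \max_{s'}\big[\tilde{\alpha}_{k-1}(s') + \tilde{\gamma}_k(s', s)\big] \tag{2.26}$$

    $$\tilde{\beta}_{k-1}(s') = \max_{s}\big[\tilde{\beta}_k(s) + \tilde{\gamma}_k(s', s)\big] \tag{2.27}$$

    $$L_R(u_k) = \max_{U^+}\big[\tilde{\alpha}_{k-1}(s') + \tilde{\gamma}_k(s', s) + \tilde{\beta}_k(s)\big] - \max_{U^-}\big[\tilde{\alpha}_{k-1}(s') + \tilde{\gamma}_k(s', s) + \tilde{\beta}_k(s)\big] \tag{2.28}$$

    As shown in the above operations, the multiplications in the MAP algorithm are replaced by additions in the Max-log-MAP algorithm, which results in the low complexity of Max-log-MAP. The calculation of $\tilde{\gamma}_k(s', s)$ will be given in Section 2.2.3.3.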

    2.2.3.3 Calculation of Branch Metrics and Extrinsic Information

    The extrinsic information takes the role of a priori information in the iterative decoding algorithm.

    $$L^e(u_k) = \ln\left(\frac{P(u_k = +1)}{P(u_k = -1)}\right) \tag{2.29}$$

    The a priori term $P(u_k)$ shows up in (2.16) in an expression for $\gamma_k(s', s)$. In the log-domain, (2.16) becomes

    $$\tilde{\gamma}_k(s', s) = \ln P(u_k) - \ln(2\pi\sigma^2) - \frac{\lVert y_k - x_k \rVert^2}{2\sigma^2} \tag{2.30}$$

    Now observe that we may write from (2.29)

    $$P(u_k) = \left(\frac{\exp[-L^e(u_k)/2]}{1 + \exp[-L^e(u_k)]}\right) \exp[u_k L^e(u_k)/2] = A_k \exp[u_k L^e(u_k)/2] \tag{2.31}$$

    where the first equality follows since it equals

    $$\frac{\sqrt{P_-/P_+}}{1 + P_-/P_+}\sqrt{P_+/P_-} = P_+ \ \text{ when } u_k = +1, \qquad \frac{\sqrt{P_-/P_+}}{1 + P_-/P_+}\sqrt{P_-/P_+} = P_- \ \text{ when } u_k = -1 \tag{2.32}$$

    where we have defined $P_+ \triangleq P(u_k = +1)$ and $P_- \triangleq P(u_k = -1)$ for convenience. Substitution of (2.31) into (2.30) yields

    $$\tilde{\gamma}_k(s', s) = \ln\left(\frac{A_k}{2\pi\sigma^2}\right) + \frac{u_k L^e(u_k)}{2} - \frac{\lVert y_k - x_k \rVert^2}{2\sigma^2} \tag{2.33}$$

    where we will see that the first term may be ignored. Thus, the extrinsic information received from a companion decoder is included in the computation through the branch metric $\tilde{\gamma}_k(s', s)$. The rest of the algorithm proceeds as before using equations (2.26), (2.27) and (2.28). Using the fact that

    $$\begin{aligned}
    \lVert y_k - x_k \rVert^2 &= (y_k^s - x_k^s)^2 + (y_k^p - x_k^p)^2 \\
    &= (y_k^s)^2 - 2 x_k^s y_k^s + (x_k^s)^2 + (y_k^p)^2 - 2 x_k^p y_k^p + (x_k^p)^2
    \end{aligned} \tag{2.34}$$

    and that only the terms dependent on $U^+$ or $U^-$, namely the cross terms $2 x_k^s y_k^s$ and $2 x_k^p y_k^p$, survive after the subtraction in (2.26), (2.27) and (2.28), (2.33) is rewritten as follows.

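    $$\tilde{\gamma}_k(s', s) = \frac{u_k L^e(u_k)}{2} + \frac{x_k^s y_k^s}{\sigma^2} + \frac{x_k^p y_k^p}{\sigma^2} \tag{2.35}$$

    Given $L_C \triangleq 2/\sigma^2$, we have

    $$\tilde{\gamma}_k(s', s) = \frac{u_k L^e(u_k)}{2} + \frac{L_C}{2} x_k^s y_k^s + \frac{L_C}{2} x_k^p y_k^p \tag{2.36}$$

    Upon substitution of (2.36) into (2.28), we have

    $$\begin{aligned}
    L_R(u_k) = {} & \max_{U^+}\Big[\tilde{\alpha}_{k-1}(s') + \frac{u_k L^e(u_k)}{2} + \frac{L_C}{2} x_k^s y_k^s + \frac{L_C}{2} x_k^p y_k^p + \tilde{\beta}_k(s)\Big] \\
    & - \max_{U^-}\Big[\tilde{\alpha}_{k-1}(s') + \frac{u_k L^e(u_k)}{2} + \frac{L_C}{2} x_k^s y_k^s + \frac{L_C}{2} x_k^p y_k^p + \tilde{\beta}_k(s)\Big]
    \end{aligned} \tag{2.37}$$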
    Now note that

    $$\frac{u_k L^e(u_k)}{2} + \frac{L_C}{2} x_k^s y_k^s = \frac{L^e(u_k)}{2} + \frac{L_C}{2} y_k^s$$

    under the first max() operation in (2.37), where $u_k = x_k^s = +1$, and

    $$\frac{u_k L^e(u_k)}{2} + \frac{L_C}{2} x_k^s y_k^s = -\frac{L^e(u_k)}{2} - \frac{L_C}{2} y_k^s$$

    under the second max() operation, where $u_k = x_k^s = -1$. Using the definition for max(), it is easy to see that these terms may be isolated out so that

    $$L_R(u_k) = L_C y_k^s + L^e(u_k) + \max_{U^+}\Big[\tilde{\alpha}_{k-1}(s') + \frac{L_C}{2} x_k^p y_k^p + \tilde{\beta}_k(s)\Big] - \max_{U^-}\Big[\tilde{\alpha}_{k-1}(s') + \frac{L_C}{2} x_k^p y_k^p + \tilde{\beta}_k(s)\Big] \tag{2.38}$$

    The interpretation of this new expression for $L_R(u_k)$ is that the first term is likelihood information received directly from the channel, the second is extrinsic likelihood information received from a companion decoder, and the third term ($\max_{U^+} - \max_{U^-}$) is extrinsic likelihood information to be passed to a companion decoder. Note that this third term is likelihood information gleaned from received parity not available to the companion decoder. Using the notation $L^{e,\mathrm{OUT}}(u_k)$ for the extrinsic information to be passed and $L^{e,\mathrm{IN}}(u_k)$ for the extrinsic information received, we have

    $$L_R(u_k) = L^{e,\mathrm{IN}}(u_k) + L_C y_k^s + L^{e,\mathrm{OUT}}(u_k) \tag{2.39}$$

    The extrinsic information which will be passed to the companion decoder is calculated as follows.

    $$L^{e,\mathrm{OUT}}(u_k) = L_R(u_k) - L_C y_k^s - L^{e,\mathrm{IN}}(u_k) \tag{2.40}$$
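The complete single-binary Max-log-MAP pass derived above can be sketched in a few dozen lines. The sketch makes several assumptions that are not taken from this thesis: a hypothetical 4-state RSC code with feedback 1 + D + D^2 and parity taps 1 + D^2, the mapping of bit 1 to +1 and bit 0 to -1, two tail bits for termination, and a made-up received frame with unit noise variance. The function names are illustrative only.

```python
NEG_INF = float("-inf")
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def step(state, u):
    """One trellis step of the assumed RSC code: returns (next_state, parity)."""
    s1, s2 = state
    a = u ^ s1 ^ s2                  # feedback 1 + D + D^2
    return (a, s1), a ^ s2           # parity taps 1 + D^2


def encode_terminated(bits):
    """Encode the message, then append two tail bits that return to state (0, 0)."""
    state, sys_bits, par_bits = (0, 0), [], []
    for u in list(bits) + [None, None]:
        if u is None:
            u = state[0] ^ state[1]  # cancels the feedback, so a zero is shifted in
        state, p = step(state, u)
        sys_bits.append(u)
        par_bits.append(p)
    return sys_bits, par_bits


def bpsk(bit):
    return 2 * bit - 1               # assumed mapping: 0 -> -1, 1 -> +1


def max_log_map(ys, yp, le_in, lc):
    """Return (LLR, outgoing extrinsic information) for one SISO decoding pass."""
    n = len(ys)

    def branch(k, sp, u):
        # Log-domain branch metric with a priori, systematic and parity terms.
        s, p = step(sp, u)
        g = 0.5 * bpsk(u) * le_in[k] + 0.5 * lc * (bpsk(u) * ys[k] + bpsk(p) * yp[k])
        return s, g

    alpha = [{s: NEG_INF for s in STATES} for _ in range(n + 1)]
    beta = [{s: NEG_INF for s in STATES} for _ in range(n + 1)]
    alpha[0][(0, 0)] = 0.0           # encoder starts in state zero
    beta[n][(0, 0)] = 0.0            # encoder is terminated in state zero
    for k in range(n):               # forward recursion
        for sp in STATES:
            for u in (0, 1):
                s, g = branch(k, sp, u)
                alpha[k + 1][s] = max(alpha[k + 1][s], alpha[k][sp] + g)
    for k in range(n - 1, -1, -1):   # backward recursion
        for sp in STATES:
            for u in (0, 1):
                s, g = branch(k, sp, u)
                beta[k][sp] = max(beta[k][sp], beta[k + 1][s] + g)
    llr, le_out = [], []
    for k in range(n):
        best = {0: NEG_INF, 1: NEG_INF}
        for sp in STATES:
            for u in (0, 1):
                s, g = branch(k, sp, u)
                best[u] = max(best[u], alpha[k][sp] + g + beta[k + 1][s])
        llr.append(best[1] - best[0])
        le_out.append(llr[k] - lc * ys[k] - le_in[k])   # extrinsic output
    return llr, le_out


message = [1, 0, 1, 1, 0, 0]
sys_bits, par_bits = encode_terminated(message)
lc = 2.0                             # L_C = 2 / sigma^2 with sigma^2 = 1
ys = [float(bpsk(b)) for b in sys_bits]
yp = [float(bpsk(b)) for b in par_bits]
ys[2] *= -0.2                        # one weak systematic sample with the wrong sign
llr, le_out = max_log_map(ys, yp, [0.0] * len(ys), lc)
decoded = [1 if value >= 0 else 0 for value in llr]
print(decoded == sys_bits)
```

In a full turbo decoder, `le_out` would be interleaved and supplied as `le_in` to the companion decoder. The sample with the wrong sign shows that the extrinsic output at that position is driven by the parity and neighbouring symbols rather than by the corrupted systematic value.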

    2.3 Turbo code in Mobile WiMAX

    Mobile WiMAX is a rapidly growing broadband wireless access technology based on the IEEE 802.16 standard [3]. It utilizes Orthogonal Frequency Division Multiple Access (OFDMA) as the radio access method for improved multipath performance in non-line-of-sight (NLOS) environments and promises to deliver high data rates over large areas to a large number of users in the near future. This addition to current broadband options such as DSL, cable, and Wi-Fi promises to rapidly provide broadband access to locations in the world's rural and developing areas where broadband is currently unavailable, as well as competing for urban market share. Recently, to improve system gain and NLOS coverage, the double-binary tail-biting convolutional turbo code (CTC) has been adopted in the IEEE 802.16 standard (WiMAX) because of its advantages over the classical single-binary turbo code:

    • Double-binary turbo codes double the decoding rates in a hardware implementation, because they allow memory access of two bits at each time instant. The reason for such decoding rates is the fact that the extrinsic information, which must be passed to the next decoder after interleaving or de-interleaving, represents two bits at each time instant. Doubling the decoding rates leads to a reduction in the latency of the decoder by one half.

    • Double-binary turbo codes reduce the sensitivity to puncturing. This can be explained as follows. Since the rate 1/2 double-binary recursive systematic convolutional (RSC) encoder produces two parity streams, most of the code rates can be obtained by simply ignoring one of these parity streams and puncturing the other (if necessary). Ignoring one of the two parity streams results in a new RSC encoder with a single parity stream. This single parity stream is less punctured compared to similar single-binary convolutional RSC encoders, which results in less sensitivity to puncturing.

    • Double-binary turbo codes reduce the correlation effects between component decoders, which leads to improved convergence [3].

    For practical purposes, it is important to reduce the computational complexity of turbo decoding. An approach for reducing the computational complexity of maximum a posteriori (MAP) decoding [1] has been introduced in the previous section for single-binary turbo codes, where only two branches enter and leave each state. In double-binary turbo codes, by contrast, four branches enter and leave each state.

    2.3.1 Encoding

    The CRSC constituent encoder used by WiMAX is shown in Figure 2.4. The encoder is fed blocks of $k$ message bits, which are grouped into $N = k/2$ couples. In Figure 2.4, $A$ represents the first bit of the couple, and $B$ represents the second bit. The two parity bits are denoted $W$ and $Y$. For ease of exposition, subscripts are left off the figure, but below a single subscript is used to denote the time index $k \in \{0, \ldots, N-1\}$ and an optional second subscript is used on the parity bits $W$ and $Y$ to indicate which of the two constituent encoders produced them. Let the vector $S_k = [S_{k,1}\ S_{k,2}\ S_{k,3}]^T$, $S_{k,m} \in \{0, 1\}$, denote the state of the encoder at time $k$. Note that although the inputs and outputs of the encoder are defined over GF(4), only binary values are stored within the shift register, and thus the encoder has just eight states. The encoder state at time $k+1$ is related to the state at time $k$ by

    $$S_{k+1} = G S_k + X_k \tag{2.41}$$

    where

    $$X_k = \begin{bmatrix} A_k + B_k \\ B_k \\ B_k \end{bmatrix}, \qquad G = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \tag{2.42}$$
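A minimal sketch of the state update over GF(2) follows. It takes $G$ to be the matrix with rows (1 0 1), (1 0 0), (0 1 0) and $X_k = [A_k + B_k,\ B_k,\ B_k]^T$; these values should be treated as assumptions to be checked against (2.42) and the standard. The input couples are made up, and the parity outputs $W$ and $Y$ are not modeled.

```python
G = [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]


def next_state(state, a, b):
    """State update over GF(2): G times the current state, plus the input vector."""
    x = [a ^ b, b, b]
    return [(sum(G[i][j] * state[j] for j in range(3)) + x[i]) % 2 for i in range(3)]


state = [0, 0, 0]
for a, b in [(1, 0), (0, 1), (1, 1)]:   # hypothetical input couples
    state = next_state(state, a, b)
    print((a, b), "->", state)

# Each couple (A, B) leads to a different successor, so four branches leave a state.
successors = {tuple(next_state([1, 0, 1], a, b)) for a in (0, 1) for b in (0, 1)}
print(len(successors))
```

Because the update is linear over GF(2), the final state after a block is the sum of a term that depends only on the initial state and a term that depends only on the data, which is the property the tail-biting procedure relies on.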

    Because of the tailbiting nature of the code, the block must be encoded twice by each

    28

    Figure 2.4: Double-binary CRSC constituent encoder used by WiMAX constituent encoder. During the first pass at

    Figure 2.4: Double-binary CRSC constituent encoder used by WiMAX

    constituent encoder. During the first pass at encoding, the encoder is initialized to the all- zeros state, S 0 = [0 0 0] T . After the block is encoded, the final state of the encoder S N is used to derive the circulation state

    $$S_c = (I + G^N)^{-1} S_N \tag{2.43}$$

    where the above operations are over GF(2). In practice, the circulation state $S_c$ can be found from $S_N$ by using a lookup table [3]. Once the circulation state is found, the data is encoded again. This time, the encoder is set to start in state $S_c$ and is guaranteed to also end in state $S_c$. The first encoder operates on the data in its natural order, yielding parity couples $\{W_{k,1}, Y_{k,1}\}$. The second encoder operates on the data after it has been interleaved.
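    The two-pass tailbiting procedure can be sketched as follows. This is an illustrative Python fragment, not the implementation used in this work: the helper names are hypothetical, `G` is the matrix assumed from (2.42), and the inverse in (2.43) is replaced by a search over the eight possible states, which plays the same role as the lookup table. With this `G`, $I + G^N$ is singular when $N$ is a multiple of 7, so the sketch raises an error in that case.

```python
G = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]  # assumed feedback matrix, GF(2)
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) % 2 for row in m]


def mat_mul(p, q):
    return [[sum(p[i][t] * q[t][j] for t in range(3)) % 2 for j in range(3)]
            for i in range(3)]


def run_encoder(state, couples):
    """Apply S_{k+1} = G S_k + X_k for every couple (A_k, B_k)."""
    for a, b in couples:
        x = [a ^ b, b, b]
        state = [g ^ xi for g, xi in zip(mat_vec(G, state), x)]
    return state


def circulation_state(couples):
    """First pass from the all-zeros state, then solve (I + G^N) S_c = S_N."""
    s_n = run_encoder([0, 0, 0], couples)
    g_n = I3
    for _ in range(len(couples)):
        g_n = mat_mul(g_n, G)
    m = [[(I3[i][j] + g_n[i][j]) % 2 for j in range(3)] for i in range(3)]
    states = [[(n >> 2) & 1, (n >> 1) & 1, n & 1] for n in range(8)]
    solutions = [s for s in states if mat_vec(m, s) == s_n]
    if len(solutions) != 1:
        raise ValueError("I + G^N is not invertible for this block length")
    return solutions[0]


# Example block of N = 24 couples (arbitrary data chosen for illustration).
example = [((3 * i + 1) % 2, (i // 2) % 2) for i in range(24)]
s_c = circulation_state(example)
```

    Running the second pass with `run_encoder(s_c, example)` returns `s_c` again, which is the circular property the text describes.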


    2.3.2 Decoding

    Decoding is complicated by the fact that the constituent codes are double-binary and circular. As with conventional turbo codes, decoding involves the iterative exchange of extrinsic information between the two component decoders. While decoding can be performed in the probability domain, the log domain is preferred since the low-complexity Max-log-MAP algorithm can then be applied. Unlike the decoder for a single-binary turbo code, which can represent each binary symbol as a single log-likelihood ratio, the decoder for a double-binary code requires three log-likelihood ratios. For example, the likelihood ratios for message couple $(A_k, B_k)$ can be represented in the form


    Figure 2.5: A decoder for the WiMAX turbo code

    $$\Lambda_{a,b}(A_k, B_k) = \log \frac{P(A_k = a, B_k = b)}{P(A_k = 0, B_k = 0)} \tag{2.44}$$

    where $(a, b)$ can be $(0, 1)$, $(1, 0)$, or $(1, 1)$. An iterative decoder that can be used to decode the WiMAX turbo code is shown in Figure 2.5. The goal of each of the two constituent decoders is to update the set of log-likelihood ratios associated with each message couple. In the figure and in the following discussion, $\Lambda^{(i)}_{a,b}(A_k, B_k)$ denotes the set of LLRs corresponding to the message couple at the input of the decoder and $\Lambda^{(o)}_{a,b}(A_k, B_k)$ is the set of LLRs at the output of the decoder. Each decoder is provided with $\Lambda^{(i)}_{a,b}(A_k, B_k)$ along with the received values of the parity bits generated by the corresponding encoder (in LLR form). Using these inputs and knowledge of the code constraints, it is able to produce the updated LLRs $\Lambda^{(o)}_{a,b}(A_k, B_k)$ at its output. As with single-binary turbo codes, extrinsic information is passed to the other constituent decoder instead of the raw LLRs. This prevents the positive feedback of previously resolved information. Extrinsic information is found by simply subtracting the appropriate input LLR from each output LLR, as indicated in Figure 2.5. The extrinsic information that is passed between the two decoders must be interleaved or de-interleaved so that it is in the proper sequence at the input of the other decoder.
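    To make the three-LLR representation and the extrinsic subtraction concrete, a small Python sketch is given below. The function names and the dictionary layout keyed by $(a, b)$ are assumptions made for illustration; only the ratio against the $(0, 0)$ couple and the output-minus-input subtraction come from the text.

```python
import math

NONZERO_COUPLES = [(0, 1), (1, 0), (1, 1)]


def couple_llrs(probs):
    """Three LLRs of one message couple, each relative to (A_k, B_k) = (0, 0).

    `probs` maps every couple (a, b) to its probability.
    """
    ref = probs[(0, 0)]
    return {ab: math.log(probs[ab] / ref) for ab in NONZERO_COUPLES}


def extrinsic(llr_out, llr_in):
    """Extrinsic information: output LLR minus the matching input LLR."""
    return {ab: llr_out[ab] - llr_in[ab] for ab in NONZERO_COUPLES}


# Hypothetical input and output distributions for a single couple.
llr_in = couple_llrs({(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1})
llr_out = couple_llrs({(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.1, (1, 1): 0.1})
ext = extrinsic(llr_out, llr_in)
```

    Only three values per couple need to be stored, since the LLR of the $(0, 0)$ couple is zero by definition.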


    2.3.2.1 Max-log-MAP Algorithm for Decoding

    The extension of the Max-log-MAP algorithm to the double-binary case is fairly straightforward. In double-binary turbo codes, the three log-likelihood ratio outputs of the k-th symbol are expressed as follows.

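    $$\Lambda_k(z) = \max_{(s_k \to s_{k+1},\, z)} \left[ \alpha_k(s_k) + \gamma_{k+1}(s_k \to s_{k+1}) + \beta_{k+1}(s_{k+1}) \right] - \max_{(s_k \to s_{k+1},\, 00)} \left[ \alpha_k(s_k) + \gamma_{k+1}(s_k \to s_{k+1}) + \beta_{k+1}(s_{k+1}) \right] \tag{2.45}$$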

    where $z \in \{01, 10, 11\}$, $s_k$ is the state of the encoder at time $k$, and $\alpha$, $\beta$, and $\gamma$ are the forward, backward, and branch metrics, respectively. The metrics are calculated as expressed in equations (2.46), (2.47) and (2.48), where $A$ is the set of states at time $k-1$ connected to state $s_k$, and $B$ is the set of states at time $k+1$ connected to state $s_k$.

     

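    $$\alpha_k(s_k) = \max_{s_{k-1} \in A} \left[ \alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1} \to s_k) \right] \tag{2.46}$$

    $$\beta_k(s_k) = \max_{s_{k+1} \in B} \left[ \beta_{k+1}(s_{k+1}) + \gamma_{k+1}(s_k \to s_{k+1}) \right] \tag{2.47}$$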
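    The forward recursion, the backward recursion, and the LLR output of (2.45) can be sketched in Python as below. Several points are assumptions made only for this illustration: the trellis is built from the matrix `G` assumed in (2.42), the branch metrics are supplied by the caller as `gamma[k][s][z]` (the metric of the branch leaving state `s` at step `k` with input couple `z = 2*A + B`, which corresponds to $\gamma_{k+1}(s_k \to s_{k+1})$ in the notation above), and the initial forward and final backward metrics are simply set to zero for all states rather than handled as a circular trellis.

```python
G = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]  # assumed feedback matrix, GF(2)
NEG_INF = float("-inf")


def next_state(s, z):
    """Successor of state s (bits S1 S2 S3, S1 as MSB) for couple z = 2*A + B."""
    bits = [(s >> 2) & 1, (s >> 1) & 1, s & 1]
    a, b = (z >> 1) & 1, z & 1
    x = [a ^ b, b, b]
    new = [(sum(g * v for g, v in zip(row, bits)) + xi) % 2
           for row, xi in zip(G, x)]
    return (new[0] << 2) | (new[1] << 1) | new[2]


NEXT = [[next_state(s, z) for z in range(4)] for s in range(8)]


def max_log_map(gamma):
    """Return, for each step k, the three LLRs for z = 01, 10, 11."""
    n = len(gamma)
    # Forward metrics: max over all branches entering each state.
    alpha = [[0.0] * 8] + [[NEG_INF] * 8 for _ in range(n)]
    for k in range(n):
        for s in range(8):
            for z in range(4):
                t = NEXT[s][z]
                alpha[k + 1][t] = max(alpha[k + 1][t],
                                      alpha[k][s] + gamma[k][s][z])
    # Backward metrics: max over all branches leaving each state.
    beta = [[NEG_INF] * 8 for _ in range(n)] + [[0.0] * 8]
    for k in range(n - 1, -1, -1):
        for s in range(8):
            beta[k][s] = max(beta[k + 1][NEXT[s][z]] + gamma[k][s][z]
                             for z in range(4))
    # LLR of each nonzero couple relative to the couple 00.
    llrs = []
    for k in range(n):
        best = [max(alpha[k][s] + gamma[k][s][z] + beta[k + 1][NEXT[s][z]]
                    for s in range(8)) for z in range(4)]
        llrs.append([best[z] - best[0] for z in (1, 2, 3)])
    return llrs
```

    Each state has four outgoing branches, so every maximization in the recursions compares four candidates, instead of the two needed for a single-binary code.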

     s )  ln  P ( y | x )  P u (
    s
    )
    ln
    P
    (
    y
    |
    x
    )
    P u
    (
    k
     1
    k
    k
    k
    L
    c
    (
    x
    s
    s
    s
    s
    p
    p
    1
    y
    1
    x
    2
    y
    2
    x
    1
    y
    1
    k
    k
    k
    k
    k
    k

    2

    z

    p

    x

    k

     

    y