This action might not be possible to undo. Are you sure you want to continue?
Broadband Receiver
Master thesis performed in
Computer Engineering
by
Yulin Huang
Report number: LiTHISYEX09/4103SE
Linköping Date May 2009
VLSI Implementation of Key Components in A Mobile
Broadband Receiver
Master thesis in
Computer Engineering
Department of Electrical Engineering
at Linköping Institute of Technology
by
Yulin Huang
LiTHISYEX09/4103SE
Supervisor: Di Wu
Linköpings Universitet
Examiner: Dake Liu
Linköpings Universitet
Linköping, May 27, 2009
Presentation Date
20090525
Publishing Date (Electronic version)
2009064
Department and Division
Department of Electrical Engineering
Computer Engineering
URL, Electronic Version
http://www.ep.liu.se
Publication Title
VLSI Implementation of Key Components in A Mobile Broadband Receiver
Author(s)
Yulin Huang
Abstract
Digital frontend and Turbo decoder are the two key components in the digital wireless communication
system. This thesis will discuss the implementation issues of both digital frontend and Turbo decoder.
The structure of digital frontend for multistandard radio supporting wireless standards such as IEEE
802.11n, WiMAX, 3GPP LTE is investigated in the thesis. A toptodown design methods. 802.11n
digital downconverter is designed from Matlab model to VHDL implementation. Both simulation and
FPGA prototyping are carried out.
As another significant part of the thesis, a parallel Turbo decoder is designed and implemented for 3GPP
LTE. The block size supported ranges from 40 to 6144 and the maximum number of iteration is eight.
The Turbo decoder will use eight parallel SISO units to reach a throughput up to 150Mits.
Keywords:
Digital frontend, 3GPP LTE, WiMAX, 802.11n, filter, Turbo, SISO decoder, MaxlogMAP, slidingwindow,
loglikelihood ratio, FPGA, hardware implementation.
Language
√ English
Other (specify below)
Number of Pages
71
Type of Publication
Licentiate thesis
√ Degree thesis
Thesis Clevel
Thesis Dlevel
Report
Other (specify below)
ISBN (Licentiate thesis)
ISRN: LiTHISYEX09/4103SE
Title of series (Licentiate thesis)
Series number/ISSN (Licentiate thesis)
I
Abstract
Digital frontend and Turbo decoder are the two key components in the digital wireless
communication system. This thesis will discuss the implementation issues of both digital
frontend and Turbo decoder.
The structure of digital frontend for multistandard radio supporting wireless standards
such as IEEE 802.11n, WiMAX, 3GPP LTE is investigated in the thesis. A toptodown
design methods. 802.11n digital downconverter is designed from Matlab model to VHDL
implementation. Both simulation and FPGA prototyping are carried out.
As another significant part of the thesis, a parallel Turbo decoder is designed and
implemented for 3GPP LTE. The block size supported ranges from 40 to 6144 and the
maximum number of iteration is eight. The Turbo decoder will use eight parallel SISO
units to reach a throughput up to 150Mits.
Keywords: Digital frontend, 3GPP LTE, WiMAX, 802.11n, filter, Turbo, SISO decoder,
MaxlogMAP, slidingwindow, loglikelihood ratio, FPGA, hardware implementation
II
III
Acknowledgements
I would like to thank my supervisor Di Wu and examiner Prof. Dake Liu for their
warmhearted help during the whole thesis. Their help was not only about solving
technical problems but also the methodology of doing scientific research.
I would also like to thank all my friends in Linköping for their help and the good time we
shared.
Finally, I would like to express the deepest gratitude to my parents for their unconditional
love and supporting me in everything. I love you as you love me.
Yulin Huang
Linköping, May 2009
IV
V
Contents
1 INTRODUCTION .................................................................................................................................... 1
1.1 BACKGROUND.................................................................................................................................. 1
1.1.1 RF.............................................................................................................................................. 1
1.1.2 ADC........................................................................................................................................... 2
1.1.3 DFE ........................................................................................................................................... 2
1.1.4 Baseband ................................................................................................................................... 2
1.2 MOTIVATION .................................................................................................................................... 3
1.3 GOAL ................................................................................................................................................. 4
1.4 THESIS ORGANIZATION................................................................................................................... 4
PART Ⅰ ⅠⅠ Ⅰ....................................................................................................................................................... 5
2 DIGITAL FRONT–END (DFE) .............................................................................................................. 5
2.1 INTRODUCTION................................................................................................................................. 5
2.2 DDC STRUCTURE: ............................................................................................................................ 6
2.3 DFE SPECIFICATION FOR DIFFERENT STANDARDS...................................................................... 7
2.3.1 Spectral Mask of WiMAX(802.16e): ........................................................................................ 7
2.3.2 Spectral Mask of 3GPP LTE: .................................................................................................... 8
2.3.3 Spectral Mask of 802.11n:....................................................................................................... 10
2.4 FILTER ARCHITECTURES: .............................................................................................................. 11
2.4.1 CIC .......................................................................................................................................... 11
2.4.2 Halfband .................................................................................................................................. 14
2.4.3 Performance Comparison: ....................................................................................................... 15
3 HARDWARE IMPLEMENTATION OF 802.11N DDC.................................................................... 24
PART Ⅱ ⅡⅡ Ⅱ..................................................................................................................................................... 30
4 CHANNEL CODES................................................................................................................................ 30
4.1 LINEAR BLOCK CODES .................................................................................................................. 30
4.2 CONVOLUTIONAL CODES.............................................................................................................. 30
5 TURBO CODES ..................................................................................................................................... 32
5.1 TURBO ENCODER: .......................................................................................................................... 32
5.2 3GPP LTE TURBO ENCODER......................................................................................................... 33
5.3 SISO DECODER............................................................................................................................... 36
6 ALGORITHM LEVEL DESIGN FOR TURBO DECODER............................................................. 37
6.1 MAP DECODING ALGORITHM....................................................................................................... 37
6.2 LOG MAP ....................................................................................................................................... 38
6.3 MAXLOGMAP .............................................................................................................................. 39
6.3.1 GAMA ........................................................................................................................................... 39
6.3.2 ALPHA .......................................................................................................................................... 42
6.3.3 BETA............................................................................................................................................. 44
6.3.4 LOGLIKELIHOOD RATIO............................................................................................................ 46
6.4 WINDOWING................................................................................................................................... 47
6.4.1 Serial Window......................................................................................................................... 47
6.4.2 Sliding Window....................................................................................................................... 48
6.4.3 Super Window......................................................................................................................... 48
6.5 SLIDING WINDOW.......................................................................................................................... 50
VI
7 TURBO DECODER HARDWARE IMPLEMENTATION AND SIMULATION RESULTS........ 52
7.1 PARALLEL WINDOW...................................................................................................................... 53
7.2 SISO ARCHITECTURE:.................................................................................................................... 54
7.3 TIME SCHEDULE FOR SISO UNIT:................................................................................................. 55
7.4 STATE METRIC ARCHITECTURE: .................................................................................................. 56
7.4.1 Gama: ...................................................................................................................................... 56
7.4.2 Alpha and Beta: ....................................................................................................................... 56
7.4.3 Loglikelihood Ratio (LLR)..................................................................................................... 58
7.5 SIMULATION RESULTS:.................................................................................................................. 60
PART Ⅲ ⅢⅢ Ⅲ................................................................................................................................................... 62
8 CONCLUSION AND FUTURE WORK.............................................................................................. 62
8.1 CONCLUSION .................................................................................................................................. 62
8.2 FUTURE WORK............................................................................................................................... 62
REFERENCE............................................................................................................................................. 64
APPENDIX A............................................................................................................................................. 67
APPENDIX B............................................................................................................................................. 69
VII
Glossary
VLSI verylargescale integration
BER bit error rate
3GPP 3
rd
Generation Partnership Project
LTE LongTerm Evolution
3G 3
rd
Generation technology
UMTS Universal Mobile Telecommunication System
WiMAX Worldwide Interoperability for Microwave Access
RSC recursive systematic convolution code
SNR signal to noise ratio
IF intermediate Frequency
DFE digital frontend
ADC analogy to digital converter
LNA low noise amplifier
FFT fast Fourier transform
OFDMA Orthogonal FrequencyDivision Multiple Access
EVM error vector magnitude
SRC sample rate converter
DDC digital down converter
AGC automatic gain control
CIC cascade integrator comb
WDF wave digital filter
FIR finite impulse response
SISO Softinput SoftOutput
MAP Maximum Aposteriori Probability
SMAP serial MAP
LLR loglikelihood ratio
LUT lookup table
VIII
1
1 Introduction
Wireless system has been widely used many fields, like mobile phone, Wireless Local
Area Network, satellite communication etc. It plays a very important role in our daily life.
We are enjoying the more and more advantage wireless communication service. Wireless
system develops faster and faster, especially after VLSI appeared, which becomes smaller,
faster, and cheaper.
1.1 Background
For a general wireless communication system, it normally contains Antenna, Radio
Frequency (RF) block, Analog to Digital Converter (ADC), Digital FrontEnd (DFE), and
baseband signal processing hardware.
Figure 1.1 Receiver Architecture of Wireless Communication Systems
1.1.1 RF
Radio Frequency (RF) is used to deal with the analog signal which is received from
antenna. RF has a local oscillator, and converts the high frequency to the intermediate
frequency.
Figure 1.2 A Typical RF Receiver Structure
2
1.1.2 ADC
ADC is used to convert analog signal to digital signal after RF. There are SigmaDelta
ADC, Flash ADC, and Successive Approximation Register ADC etc. For the wireless
system, it requires low power, high speed, and small area.
1.1.3 DFE
DFE acts as a bridge between the analog part and digital part in the wireless receiver. The
function of DFE usually includes automatic gain control, sample rate conversion, pulse
shaping, matched filtering etc. Generally speaking, it is mainly a block of digital filters.
DFE is usually considered to be simpler than the baseband part from the functional aspect.
However, it consumes a large portion of the silicon area in the receiver implementations.
1.1.4 Baseband
DFE is followed by baseband, and deals with the data according to different standards. The
basic structure is illustrated by the block diagram in the figure 1.3
Figure 1.3 Diagram of Baseband for Digital Communication System
The information source can not transmit directly due to the channel noise. To maximise the
information transmission rate, we can use source encoder to reduce the redundant part. We
use channel encoder and add some redundant information, which can be used to correct
errors. After the channel encoder, information bits cannot be sent directly. The modulation
will convert the encoded information bits to a signal which is suitable for transmission
channel.
3
Coding plays a very important role in digital communication. A good coding method can
improve BER performance and throughput significantly in digital communication
systems.
Nowadays people require high data rates connections and the transmission rate of wireless
systems is fast growing. 3G standards can support up to 2 Mb/s, and 4G standard will
support up to 100Mb/s. Turbo codes are used in a lot of standards like UMTS, WiMAX
and LTE, and it can reach a high throughput.
Software defined radio is widely used in wireless interface technologies, especially in
multiple wireless communication standards which will be implemented into a single
transceiver system. DFE is based on the concept of software defined radio, and it is the key
component of soft defined radios.
1.2 Motivation
Figure 1.4 Die Micrograph of DFE and Turbo Decoder (ref[1],ref[2]).
Table 1.1 Physical Characteristics (ref[1]).
The left side of figure 1.4 shows the die micro graph of DFE and WCDMA/HSDPA signal
processing (ref[1]), and the right side of the figure1.4 shows the die micro graph of
HSDPA Turbo Decoder (ref[2]). From ref[1] and ref[2], we can see DFE and Turbo
decoder consume more than 50% of the total silicon area (and also power) in broadband
receivers. It is always a great challenge to design and implement them at low costs with
4
sufficient performance. In this thesis, the implementation of DFE and Turbo decoder has
been investigated by looking at several emerged wireless broadband standards.
1.3 Goal
The goal of our thesis project is to research the DFE and turbo decoder. For the DFE part
we will include the specification for different standards, and compare different methods of
DFE designs. Finally we implement the 802.11n standard filter component which is
included in DFE hardware designing.
The turbo decoder part will discuss 3GPP LTE turbo decoder. Then a suitable algorithm
for our hardware implementation will be selected. Trade off between implement speed and
error performance is another important issue in this part. Maltab models will be built for
hardware implementation and used in the coming simulations.
1.4 Thesis Organization
The thesis contains three parts: DFE for Part Ⅰand Turbo decoder for part Ⅱ, conclusion
and future work for Part Ⅲ.
Part Ⅰis about DFE, which is including chapter 2 and chapter 3
Chapter 2 introduces Soft defined radio and DFE, and discusses design method, and
implement 802.11n filter.
Chapter 3 presents hardware implementation for 802.11n DDC
Part Ⅱis turbo decoder, which is including chapter4, 5, 6 and 7
Chapter 4 is the back ground about channel code
Chapter 5 explains 3GPP LTE turbo codes, and SISO decoder
Chapter 6 shows algorithm level design. Discuss the Maximum Aposteriori
Probability (MAP) decoding algorithm. And give some details about branch metrics,
forward metrics, reverse state metric, and loglikelihood rate.
Chapter 7 contains implementation of turbo decoder in hardware model
Part Ⅲ is the conclusion and future work for both Part I and Part II which is given in
chapter8.
Chapter 8 gives the conclusion and a discussion on future work.
5
Part Ⅰ
2 Digital Front–end (DFE)
2.1 Introduction
DFE acts as a bridge between baseband and ADC. Due to different sample rates and
channelization, DFE function is mainly used to convert sample rate and channelization.
Figure 2.1 shows the structure of wireless system with RF and antenna. To design DFE, we
need spectral mask for each DFE. For the whole wireless system, transmitter SNR
normally needs 120 dB, and receiver SNR needs 126 dB.
Figure 2.1 Digital Receiver Structure with Supported SNR.
Table 2.1 SNR supports for each component.
Component dB
anntenna 3 dB
RF amplifer 6 dB
Mixer 3 dB
Band pass filter 30 dB
IF amplifier 10 dB
AM demodulator 4 dB
Audio amplifier 6 dB
ADC 10 dB
6
2.2 DDC Structure:
DFE in the receiver chain is called as digital down converter (DDC). In the receiver part,
DDC contains automatic gain control (AGC) and filter.
AGC filter ADC
DDC
Figure 2.2 A DDC Structure
Filter:
Filter processes channelization and sample rate conversion (SRC). Channelization is used
to select channel of interest, and depends on its spectral mask. But receiver filter requires
higher spectral mask than transmitter. The received analog signal can be converted to
baseband digital signal according to different standards. The sample conversion is a
decimation filter. And sometimes it needs a fractional sample rate converter.
Automatic Gain Control:
AGC is used to control the output signal at a desired power level. Since the signal power in
the input side of the filter cascade is dominated by the interference, it is a dynamic value in
fixed range. Generally the AGC is a classical feedback structure.
7
2.3 DFE Specification for Different Standards
2.3.1 Spectral Mask of WiMAX(802.16e):
WiMAX DDC:
The whole system need for WiMAX SNR is 126 dB.
For error vector magnitude (EVM) which is require for correcting operation in baseband,
it needs 23 dB more.
In ref [3] channel bandwidth is defined as
( / )
u s used FFT
B F N N =
Eq.2.1
Stopband frequency can be defined as BW / 2
u
B
BW is the bandwidth
u
B is the useful bandwidth
s
F is the baseband sampling rate
used
N is used subcarrier number
FFT
N is FFT size
Compile from IEEE Std 802.162004, IEEE Std 802.16e2005, IEEE Std
802.162004/Cor1 2005(ref[6],ref[7]), table 2.2 lists the total number of subcarriers and
the number of used subcarriers for the various zone types for OFDMA.
Table 2.2 Summary of subcarrier parameter for Different OFMDA zone Type (ref[3])
Zone type
PUSC FUSC Optiomal FUSC
Total
subcarriers
2048 1024 512 128 2048 1024 512 128 2048 1024 512 128
Used
Subcarrier
s
1681 841 421 85 1703 852 427 107 1729 865 433 109
Guard
Subcarrier
s
(left,right)
184,18
3
92,91 46,45 22,21
173,17
2
87,86 43,42 11,10
160,15
9
80,79 40,39 10,9
Ratio of
used
Subcarrier
s
to total
subcarriers
0.8208
0.821
2
0.823
3
0.664
0
0.8315
0.831
1
0.834
0
0.835
9
0.8442
0.844
7
0.845
7
0.851
6
Choose 512pt FFT as an example, and assume sample frequency is 92.16 MHz.
8
WiMAX DDC specification：
126dB（3+6+3+30+10+4+6+10）+23dB= 77dB
According to the equation 2.1: ( / )
u s used FFT
B F N N =
The passband Fpass=
u
B /2, and the Stopband Fstop = BW
u
B /2.
The table 2.3 lists the spectral mask for 3.5MHz, 5.0MHz, 7.0MHz, 10.0MHz bandwidth
Bandwidths Input
sample
rate
output
sample rate
Fpass Fstop Apass Astop
3.5MHz 92.16MH
z
4 MHz 1.6914MH
z
1.8086MH
z
0.02dB 77dB
5.0MHz 92.16MH
z
5.6 MHz 2.3680MH
z
2.6320MH
z
0.02 dB 77dB
7.0MHz 92.16MH
z
8.0 MHz 3.3789MH
z
3.6211MH
z
0.015dB 77dB
10.0MHz 92.16MH
z
11.2 MHz 4.7305MH
z
5.2695MH
z
0.015dB 77dB
2.3.2 Spectral Mask of 3GPP LTE:
The passband and stopband calculating methods used in LTE and11n are different from
WiMAX. From the Xilinx WCDMA DFE reference design ref [6], the sample rate is
3.84MHz. According to the 3GPP LTE standard we can know that:
Fp = Fchip(1+α )/2
α is the roll off parameter which specified in 3GPP.
In 3GPP LTE standard, we also have this root raised cosine (RRC) filter roll off α =0.22.
We use this method to calculate the Fpass and Fstop. Finally we can have passband and
stopband.
Fpass= Fs(1+α )/4 (because the bandwidth is symmetric)
Fstop= BW – Fpass
3GPP LTE receiver：127 dB （3+6+3+30+10+4+6+10）+23dB = 78dB
The passband Fpass= Fs(1+α )/4, and Stopband Fstop = BW – Fpass
9
The table 2.4 lists the spectral mask for 3GPP LTE transmitter:
Bandwidt
hs
Input
sample
rate
output
sample
rate
Fpass Fstop Apass Astop
1.4MHz 92.16MH
z
1.92 MHz 0.5856
MHz
0.8144
MHz
0.02dB 78dB
3 MHz 92.16MH
z
3.84 MHz 1.1712
MHz
1.8288
MHz
0.02dB 78dB
5 MHz 92.16MH
z
7.68 MHz 2.3424
MHz
2.6576
MHz
0.02dB 78dB
10 MHz 92.16MH
z
15.36
MHz
4.6848
MHz
5.3152
MHz
0.015dB 78dB
15 MHz 92.16MH
z
23.04
MHz
7.0272
MHz
7.9728
MHz
0.015dB 78dB
20 MHz 92.16MH
z
30.72
MHz
9.3696
MHz
10.6304
MHz
0.015dB 78dB
10
2.3.3 Spectral Mask of 802.11n:
When transmitting in a 20 MHz channel, the transmitted spectrum shall have a 0 dBr (dB
relative to the maximum spectral density of the signal) bandwidth not exceeding 18 MHz,
–20 dBr at 11 MHz frequency offset, –28 dBr at 20 MHz frequency offset and –45 dBr at
30 MHz frequency offset and above (ref [7]) . The transmitted spectral density of the
transmitted signal shall fall within the spectral mask, as shown in Figure 2.3 (Transmit
spectral mask for 20 MHz transmission).
Figure 2.3 Transmit Spectral Masks for 20 MHz Channel (ref[7])
In the absence of other regulatory restrictions, when transmitting in a 40 MHz channel, the
transmitted spectrum shall have a 0 dBr bandwidth not exceeding 38 MHz, –20 dBr at 21
MHz frequency offset, 28 dBr at40 MHz offset and –45 dBr at 60 MHz frequency offset
and above (ref [7]). The transmitted spectral density of the transmitted signal shall fall
within the spectral mask, as shown in Figure 2.4 (Transmit spectral mask for a 40 MHz
channel).
Figure 2.4 Transmit Spectral Masks for A 40 MHz Channel (ref[7])
The transmit spectral mask for 20 MHz transmission in upper or lower 20 MHz channels
of a 40 MHz is the same mask as that used for the 40 MHz channel.
11
2.4 Filter Architectures:
Filter component is a very complex part. Normally it cannot be implemented in one filter,
but cascaded a group of filters. This part is very flexible so that there are lot choices to
reach the same spectral mask.
Ref[3] ref[8] ref[9] design methods can be referred in our case.
2.4.1 CIC
The design methods of ref[8],ref[9] is similar. The basic idea is using Cascaded Integrator
Comb (CIC) filter, with different ways to compensate the CIC droop. The difference
between ref[8] and ref[9] is whether the WDF is used or not.
CIC WDF compensation FIR FRSC Allpass channel FIR
WDF
Figure 2.5 CIC Solution Structure
Cascaded Integrator Comb (CIC) filter:
CIC filter is very efficient to decimate by an integer factor. The elements of CIC filter are
just adder and register, no multiplier.
Figure 2.6 shows the architecture of a CIC filter
1
Z
− 1
Z
−
1
Z
− 1
Z
−
R ↓
− −
−
Figure 2.6 Architecture for CIC Filter
The transfer function of Nth order CIC can be described by
1
1
( )
1
N
RM
CIC
z
H z
z
−
−
  −
=

−
\ ¹
Eq 2.7
Through changing delay M and R in the comb stage, we can adjust the decimation factor
and reconfigure the CIC filter. CIC filter structure just uses adders and delays elements. So
12
it can reduce the area and power consumptions. But the CIC filter always has passband
droop problem. It needs other filters to compensate CIC filter.
Wave Digital Filter (WDF)
WDE is infinite impulse response (IIR) filter. It can achieve high stopband attenuation
with fewer taps. WDF filter is built of N adaptors distributed over two branches.
The figure 2.7 is the structure for adapter. It has two inputs and two outputs. Usually
adapter can be use left of the figure 2.7 to present, and right of the figure 2.7 is the logic
structure.
Figure 2.7 Adapter for WDF Filter
First order lattice adaptor is connecting the out2 and in2 with inserted delay element. The
transfer function can be defined following:
0
1 0
0
1
( , )
A
z
H z
z
α
α
α
−
=
−
Eq2.8
Figure 2.8 Structure for One Order Lattice Adaptor
13
The second order lattice adaptor structure is as the figure 2.9. It serially connects two
adaptors. And transfer function can be defined as:
2
4 3 4
2 3 4 2
4 4 3
1 ( 1)
( , , )
( 1)
A
z z
H z
z z
α α α
α α
α α α
+ − −
=
− + − +
Eq2.9
Figure 2.9 Second Order Lattice Adaptor Structure
The transfer function of 7th order WDF sums of two branches built of first order and a
second order lattice adapter for
0
H and two second adaptors for
1
H .
0 1 0 2 3 4
( ) ( , ) ( , , )
A A
H z H z H z α α α =
1 2 1 2 2 5 6
( ) ( , , ) ( , , )
A A
H z H z H z α α α α =
7 0 1
1
( ) ( ( ) ( ))
2
WDF
H z H z H z = + Eq 2.10
Allpass Filter:
Allpass section is used for correcting group delay ripple which is introduced by WDF. This
all pass filter can be implemented by a first order and a second order lattice adaptor.
The transfer function can be defined by
1 2
( ) ( ) ( )
AP
H z H z H z =
14
Finite Impulse Responses Filter (FIR):
FIR filter is a pulse filter according to each communication standards specification. Also it
can be used to reduce passband ripple by the CIC droop, the WDF and the FRSC in the
passband. This filter can be also used as the channel filter.
Fractional Sample Rate Conversion： ：： ：
If the ADC or DAC is not just the integer times of the baseband, fractional sample rate
conversion filter can be used.
Advantage and Disadvantage of Using WDF:
Advantage:
WDF filter can support good stopband attenuation with fewer taps, so that compensation
FIR can use less taps when WDF is used.
Disadvantage:
WDF filter is IIR filter, and it has pole as the eq2.10. It maybe cause unstable. There is a
group delay problem when using WDF and it need allpass filter to correct.
In this thesis, CIC + compensation without WDF will be used in the following discussion.
2.4.2 Halfband
Ref[1] uses halfband filter instead of CIC filter, so there is not compensation filter any
more.
Figure 2.10 Halfband Solution Structure
Halfband Filter:
Halfband filter is a decimation filter in DDC. Usually, halfband has advantage passband
ripple, and stopband attenuation.
Fractional Sample Rate Conversion:
It is the same as the first and second method, duo to the ADC or DAC is maybe no just the
integer times of the baseband. In this case fractional filter will be introduced.
15
Finite Impulse Responses Filter (FIR):
FIR filter is channel filter and the pulse filtering according to each communication
standards specification.
2.4.3 Performance Comparison:
CIC solution we just consider CIC + compensation FIR here. The performance comparison
is between CIC+ compensation FIR solutions with halfband solution.
2.4.3.1 GSM DDC
Halfband solution is based on halfband filter. Compared ref [1] with ref [9], usually if the
ADC sample rate is more than 32 times of the baseband sample rate, CIC solution can be
used. Sometimes, the ADC sample rate is just 4 or 8 times of the baseband, so that one or
two halfband filter can complete the decimation.
GSM DDC:
Input sample rate = 69.333 MHz
Output sample rate =270.833 KHz
Figure 2.11 GSM CIC and FIR Compensation FIR Structure
And the spectrum mask for the GSM can be illustrated in the Figure2.12
Figure 2.12 GSM specifications (From ref[10])
16
For all the filter: Input word length =12
Input fractional length =11
Output word length =12
Coefficient word length =18
Table 2.5 CIC and FIR compensation specification:
Filter specification
CIC filter decimation factor D = 64
differential delay D_delay=1
Fs_in = 69.333e6;
FIR compensate Fs = 1.0833e6;
Apass = 0.01;
Astop = 60;
Aslope = 60;
Fpass = 80e3;
Fstop = 293e3;
Fstop = 293e3;
FIR channel N = 62;
Fs = 541.666kHz;
Fpass = 80kHz;
Fstop = 100e3;
Figure 2.13 CIC Filter Simulation Results.
To see the CIC droop, we can use the following command:
axis([0 .1 0.8 0]);
17
Figure 2.14 Zoom in CIC Filter Simulation Results
Figure 2.15 CIC Filter with FIR Compensate Filter Simulation Results.
Zoom in, and see 0 to 0.1 MHz. We can use the following command:
axis([0 .1 0.8 0.8]);
18
Figure 2.16 Zoom in CIC Filter with FIR Compensate Filter Simulation Results.
Figure 2.17 Final Solution Simulation Results
Table 2.6 CIC solution Complexity analysis:
Filter Number of adders Number of multipliers
CIC 10 0
FIR compensation 20 21
FIR channel 62 63
Cascade three filter 92 84
We use halfband solution instead of the CIC, FIR compensation solution. The final
channel filter can work as decimation filter, so that the halfband filter need decimate 128
times. And one halfband filter can decimate 2 times. In total, it needs 7 half band filters and
one FIR filter as the channel filter. The figure 2.18 illustrates the cascade structure filters.
1th HB 2th HB 7th HB 6th HB 5th HB 4th HB 3th HB Channel
Figure 2.18 Halfband Solution for WiMAX DDC
19
For all the filter: Input word length =12
Input fractional length =11
Output word length =12
Coefficient word length =18
Table 2.7 Halfband solution complexity analysis:
Filter specification
1th halfband Fs =69.333MHz
Fpass =80kHz
Fstop=Fs/2Fpass= 34.5865 MHz
Transition Width (TW) = 34.5065 MHz
Astop = 140 dB
2th halfband Fs = 34.6665 MHz
Fpass =80kHz
Fstop=Fs/2Fpass= 17.2533 MHz
Transition width = 17.1733 MHz
Astop = 115 dB
3th halfband Fs = 17.3333MHz
Fpass =80kHz
Fstop=Fs/2Fpass= 8.5867 MHz
Transition width = 8.5067 MHz
Astop = 110 dB
4th halfband Fs = 8.6667MHz
Fpass =80kHz
Fstop=Fs/2Fpass= 4.2534 MHz
Transition width = 4.1734 MHz
Astop = 105 dB
5th halfband Fs =4.3334MHz
Fpass =80kHz
Fstop=Fs/2Fpass= 2.0867MHz
Transition width = 2.0067 MHz
Astop = 100 dB
6th halfband Fs =2.1667MHz
Fpass =80kHz
Fstop=Fs/2Fpass= 1.0034 MHz
Transition width = 0.9234 MHz
Astop = 95 dB
7th halfband Fs = 1.0834 MHz
Fpass = 80 kHz
Fstop=Fs/2Fpass= 0.4617 MHz
Transition width = 0.3817 MHz
Astop = 90 dB
Channel FIR Fs = 541.666 kHz;
Fpass = 80e3;
20
Fstop = 100e3;
Astop =55 dB
Apass = 0.06 d B
The following Figure 2.25 shows the simulation results
Figure 2.19 CIC Solution Simulation Results.
Table 2.8 Halfband solution complexity analysis:.
Filter adder multiplexer
1th halfband 8 9
2th halfband 6 7
3th halfband 6 7
4th halfband 6 7
5th halfband 6 7
6th halfband 6 7
7th halfband 8 9
channel FIR 76 77
totally 122 130
To reach the same GSM specification:
CIC solution needs 92 adders and 84 multipliers
Halfband solution needs 122 adders and 130 multipliers.
Compare with two solutions, CIC solution can save 30 adders and 46 multipliers.
In this case, CIC solution is a better solution than halfband solution.
21
2.4.3.2 WiMAX DDC
WiMAX 10 MHz bandwidth
Halfband design:
Figure 2.20 Halfband Solution for WiMAX DDC
Table 2.9 Specification for each filter
Filter specification
First Halfband Fs =89.6 MHz
Fpass = 4.7359 MHz
Fstop = 40.0641 MHz
Tw = 35.3282 MHz
Astop = 102 dB
Second Halfband Fs = 44.8 MHz
Fpass = 4.7359 MHz
Fstop = 17.6641MHz
Tw =12.9282 MHz
Astop=84 dB
FIR channel Fs = 22.4 MHz
Fpass = 4.7359 MHz
Fstop = 5.2641 MHz
Apass = 0.02 MHz
Astop = 83 MHz
Table 2.10 halfband solution Complexity analysis:
Filter adder multiplexer
First Halfband 8 9
Second Halfband 8 9
FIR Channel 169 170
Total 185 188
22
Figure 2.21 Halfband Solution for WiMAX Simulation Result:
Table 2.11 specification of CIC solution
Filter specification
CIC Fs=89.6MHz
D = 2
FIR compensation Fs = 44.8e6;
Apass = 0.01;
Astop = 60;
Aslope = 30;
Fpass = 4.7359e6;
Fstop = 8e6;
FIR Fs = 22.4 MHz
Fpass = 4.7359 MHz
Fstop = 5.2641 MHz
Apass = 0.02 MHz
Astop = 83 MHz
Table 2.12 CIC solution Complexity analysis:
filter adder multiplexer
CIC 10 1(normalize)
FIR compensation 52 53
FIR channel 169 170
totally 231 224
23
Compare this two solution; halfband solution use 185 adders and 188 multipliers
CIC solution use 231 adders and 224 multipliers.
Figure 2.22 CIC Solution Simulation Results
From examples, we can have a conclusion that there is no unified structure for different
structures and different standards.
CIC solution can be used to achieve a better performance if the decimation factor is very
large and the passband is small. Otherwise halfband solution will be selected.
24
3 Hardware Implementation of 802.11n DDC
Hardware design:
Finally, 802.11n 20 MHz channel for FPGA prototyping
Specification for 802.11n:
Clock frequency = 200MHz
Fpass = 9MHZ Fstop = 11MHZ
Apass = 0.2 dB Astop = 40dB
Input = 40 MHz output 20 MHz
For this case, just a channel FIR filter needed.
Filter design:
This filter can use filterbuilder tool in matlab.
Simulation result:
Figure 3.1 802.11n DDC Simulation Results.
To implement hardware design, it need fixed point input data, and coefficient. Because
ADC is 12 bits, input data is 12 bits.
To define coefficient length, the table 2.15 shows simulation results for the different
coefficient length.
25
Table 2.15 filter parameter of different coefficient
coefficient length 12 14 16 18
parameter Fs=40 MHz
Fpass= 9 MHz
Fstop= 11MHz
Apass=0.03751
dB
Astop=39.7157
dB
Fs=40 MHz
Fpass= 9 MHz
Fstop= 11MHz
Apass=0.02396
dB
Astop=40.2114
dB
Fs=40 MHz
Fpass= 9 MHz
Fstop= 11MHz
Apass=0.020177
dB
Astop=40.2945
dB
Fs=40 MHz
Fpass= 9 MHz
Fstop= 11MHz
Apass=0.019417
dB
Astop=40.3144
dB
Finally, 18 bits coefficient can be used in this case.
FPGA clock frequency (200 MHz) is integer of the input sample rate. 200MHz/40MHz =5.
It means 5 clocks input one data. The output sample rate is 20 MHz, so 10 clock output one
data.
Usually a FIR filter transfer function can be defined as
0 1 2 ( 1)
0 1 2 1
( )
n n
n n
H z h z h z h z h z h z
− − − − −
−
= + + + + + K Eq.2.12
For a linear FIR filter transfer function, the coefficient is symmetric.
Take a 10 taps linear FIR filter for example
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
( ) H z h z h z h z h z h z h z h z h z h z h z
− − − − − − − − −
= + + + + + + + + + Eq2.13
The coefficient is symmetric:
0 9 1 8 2 7 3 6 4 5
, , , , h h h h h h h h h h = = = = =
The transfer function can be written as:
0 9 1 8 2 7 3 6 4 5
0 1 2 3 4
( ) ( ) ( ) ( ) ( ) ( ) H z h z z h z z h z z h z z h z z
− − − − − − − − −
= + + + + + + + + + Eq2.14
This function is called preadder; with this way it can reduce half of the multipliers.
In our design, filter length is 52 taps. After symmetry, the polyphase needs 26 taps.
Actually, the input is 5 clocks, so that the MAC time need reduced to less than 5 clocks.
For this reason, parallel MAC can be used.
Table 2.16 compare MAC and maximum cycle
Number of MAC Maximum cycle need for each MAC
2 13
3 9
4 7
5 6
6 5
7 4
8 4
9 3
10 2
26
From the table 2.15, we can find 7 Macs can satisfy the requirement for less than 4.
Figure 3.2 Hardware Architecture
The RAM works as module. The input data is written from top to bottom, if reach bottom,
it will be written from top to down. 7 ARMs need read 52 data from ram. Each ARM can
read two data, and give to preadder. Then MAC accumulates the value from preadder
multiplying its coefficient.
The DDC filter transfer function can be defined as:
H(z)= h1(x(1)+x(52))+h2(x(2)+x(51)+h3(x(3)+x(50))+ h4(x(4)+x(49))+ h5(x(5)+x(48))+h6(x(6)+x(47))+
h7(x(7)+x(46))+h8(x(8)+x(45))+h9(x(9)+x(44))+h10(x(10)+x(43))+h11(x(11)+x(42))+h12(x(2)+x(
41))+ h13(x(13)+x(40))+h14(x(14)+x(39))+ h15(x(15)+x(38))+h16(x(16)+x(37))+h17(x(17)
+x(36))+ h18(x(18)+x(35))+h19(x(1)+x(34))+ h20(x(10)+x(33))+h21(x(21)+x(32))+h22(x(22)
+x(31))+ h23(x(23)+x(30))+h24(x(24)+x(29))+ h25(x(25)+x(28))+ h26(x(26)+x(27)) Eq2.15
Appendix A shows the coefficient value and its corresponding ram value
To make sure if the filter works, the first easy way is to give an impulse, and compare its
output with the coefficient. It should be the same.
If the output is the same as the coefficient, a sine wave can be use in this case. Duo to
lowpass filter, the output should the same as the input if frequency of sine wave is in the
bandwidth.
Then it can realize FPGA prototyping. Because this filter is implemented by Xilinx system
generator, a good prototyping method JTAG hardware cosimulation can be introduced.
27
This method can be use to test FPGA using Matlab, specially, if the design needs
frequency spectrum analysis. It does not need frequency sweeper.
Hardware cosimulation can be divided to 3 steps:
Step 1: To know if the hardware works, a software DDC filter is designed for the
comparison.
Figure 3.3 Software Model for Filter
The component DAFIR v9_0 can get the DDC filter coefficient from Matlab workspace,
DAFIR component is the same function with the Hardware DDC filter.
Step 2: generate hardware model.
The system generator will generate the HDL code and invoke the ISE Foundation software
to generate the bitstream file automatically.
Figure 3.4 Generated Hardware Model
28
Step 3: Connect Hardware and software model. Perform JTAG Cosimulation.
Add the hardware model and connect it according the Figure 3.5.
Figure 3.5 Software and Hardware Model
Figure 3.6 Software Simulation
29
Figure 3.7 Hardware Simulation
Comparing two simulation results, we can see that the software and hardware
implementations have the same spectral mask.
30
Part Ⅱ
4 Channel Codes
There are two main categories of channel coding techniques which are linear block codes
and convolution codes. Normally the other codes are derived from these two main
categories, which are including serial concatenated codes and turbo codes. Coding gain is
used to measure the strength of an errorcontrol code, and can be defined as the reduction
in SNR over an uncoded system to achieve the same Bit Error Rate (BER) and Frame Error
Rate (FER).
4.1 Linear Block Codes
An (n,k) block encoder transforms a message of k bits into a message of n bits which is
called codeword. Codeword depends only on the current input message, and this is the
important feature of a block code. For a block codes, it is possible to compute the corrected
errors in each block. Amount of the redundancy can be determined by the code rate R =k/n.
In an (n, k) block code, there are 2
k
distinct message and also codeword. A linear
systematic block size code, such as Hamming code, BCH code, is often used in linear
block codes. The important feature of the linear block codes is that the message itself is
part of the codeword. LDPC is also a category of the linear block codes.
4.2 Convolutional Codes
Convolutional codes are widely used in digital radio, mobile phone, satellite
communication etc. Convolutional codes are first introduced by Elias ref[11], and
deepened by Forney ref[12]. It is best performance before turbo codes and LDPC code.
Convolutional codes can be defined as: (n,k,m)
k is the input bits
n is the output bits
m is the number of memory registers
The inputs of the memory registers are the information bits. Output encoded bits are
obtained from modulo2 addition of input information bits and the value of the memory
register. The memory registers work as shift registers.
Figure 4.1 shows a rate 1/2 nonsystematic and non recursive convolutional encoder.
At the time l, the input to encoder is
l
c , and the output is code block.
31
(1) (2)
( )
l l l
v v v =
D D
(1)
v
(2)
v
C
(2)
v
Figure 4.1 A Rate 1/2 Convolutional Encoder
The connections can be described by generator polynomials
2
1
( ) 1 g D D = +
2
2
( ) 1 g D D D = + +
In other words:
2 2
1 2
( ) [ ( ) ( )] [1 1 ] g D g D g D D D D = = + + +
32
5 Turbo Codes
Concatenated coding are known as Parallel Concatenated Convolution Codes (PCCC)
ref[13] and serial concatenated Convolutional codes ref[14] , are first proposed by Forney
ref[15] as a method to get a good tradeoff between gain and complexity.
Turbo codes has been first introduced in 1993 By Berrou, Gavieux and Thitimajshima, and
provide near optimal performance approaching the Shannon limit ref[16]. Turbo decodes
works as connecting two convolution codes and separating them by an interleaver. The
main difference between turbo codes and serial concatenated codes is that two identical
Recursive Systematic Convolutional (RSC) codes are connected in parallel in turbo codes.
Turbo codes can be considered as a refinement of the concatenated encoding structure adds
iterative algorithms for decoding.
5.1 Turbo Encoder:
A turbo encoder structure consists of two RSC encoders which are Encoder1 and Encoder2.
Encoder1 and Encoder2 are separated by a random interleaver. The two encoders can
operate on the same time. This structure is called parallel concatenated. A 1/3 turbo
encoder diagram is shown in figure 5.1. The N bit data block is first encoded by Encoder1.
The same data block is interleaved and encoded by Encoder2. The first output S0 is equal
to the input since the encoder is systematic. The second output is the first parity bit P0.
Encoder2 received interleaved input and generate the second parity bit P1. The main
purpose of interleave is to randomize burst error pattern so that it can be corrected by
decoding, and also increase the minimum distance of turbo code ref[17].
input
S0
P0
P2
Encoder1
Encoder2
Interleaver
Figure 5.1 1/3 Turbo Encoder Schematic
33
5.2 3GPP LTE Turbo Encoder
3GPP LTE encoder includes two 8state constituent encoders and one turbo code internal
interleaver. The coding rate of turbo encoder is 1/3.
The transfer function of the 8state constituent code for turbo encoder is:
1
2
( )
( ) 1,
( )
g D
G D
g D
=
Eq 5.1
Where
g0(D) = 1 + D2 + D3, Eq 5.2
g1(D) = 1 + D + D3. Eq 5.3
The initial value of the shift registers of the 8state constituent encoders shall be all zeros
when starting to encode the input bits. The output from the turbo encoder is
(0) (1) (2)
, ,
k k k
d d d
(
k k
x d =
) 0 (
,
k k
z d =
) 1 (
,
k k
z d ′ =
) 2 (
) for 1 ,..., 2 , 1 , 0 − = K k (ref[18]).K is the code block size from
40 to 6144 bits.
After all the information bits are encoded, we take the tail bits from the shit register
feedback, and this is called trellis termination. Tail bits are padded after the encoding of
information bits.
When the second constituent encoder is disabled, the first three tail bits can be used to
terminate the first constituent encoder. Upper switch in the figure 5.2 shows in low
position. Otherwise when the first constituent encoder is disabled, the last three tail bits
can be used to terminate the second constituent encoder. Lower switch in the figure5.2
shows in low position. The final trellis termination of output bits should be:
1 1 2 2 1 1 2 2
, , , , , , , , , , ,
K K K K K K K K K K K K
x z x z x z x z x z x z
+ + + + + + + +
′ ′ ′ ′ ′ ′ .
34
Figure 5.2 The Turbo Encoder of 3GPP LTE (ref[18])
The 8 states constituent encoder contains 3 registers. The input bits of the constituent
encoder are given to the left register. When each new input is coming, one parity bit will be
generated. These parity bits depend not only the on the present input bit, but also the three
previous input bits, which store in the shift registers.
We can use trellis diagram to present the encoder behaviour. k is the number of the input
data. The initial state starts with state=000. At the beginning three shift registers are all
zeros. Depends on the input is 0 or 1. The state goes to the next state and gets
corresponding parity bit. In the trellis diagram, the transition line is labelled with input and
output value. For example, k=2 state=100, the upper transition line labelled 0/1 stands for
input =0 and output =1. After the fourth slice of the trellis, the trellis diagram repeats them,
so that only one slice of the trellis is needed to define the entire trellis.
35
state 000 =
state 001 =
state 010 =
state 011 =
state 100 =
state 101 =
state 110 =
state 111 =
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
0 0 / 0 0 / 0 0 / 0 0 /
0 0 /
1 1 / 1 1 /
1 1 /
0 1 /
1 0 /
1 0 /
1 0 /
1 0 /
1 0 /
1 0 /
1 0 /
1 1 /
1 1 /
1 1 /
1 1 /
1 1 /
0 1 /
0 1 /
0 1 /
0 1 /
0 1 /
0 1 /
0 0 /
0 0 /
0 0 /
k 6 = k 1 = k 2 = k 3 = k 4 =
k 5 =
state 000 =
state 001 =
state 010 =
state 011 =
state 100 =
state 101 =
state 110 =
state 111 =
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
0 0 / 0 0 / 0 0 / 0 0 /
0 0 /
1 1 / 1 1 /
1 1 /
0 1 /
1 0 /
1 0 /
1 0 /
1 0 /
1 0 /
1 0 /
1 0 /
1 1 /
1 1 /
1 1 /
1 1 /
1 1 /
0 1 /
0 1 /
0 1 /
0 1 /
0 1 /
0 1 /
0 0 /
0 0 /
0 0 /
k 6 = k 1 = k 2 = k 3 = k 4 =
k 5 =
0 0 /
Figure 5.3 Trellis Diagram of One Constituent Encoder.
36
5.3 SISO Decoder
In this section the iterative decoding of turbo decoder will be described and the structure of
SISO decoder is shown in figure 5.4. In figure5.4 two decoder blocks correspond to the
two constituent decoders. The received signals are interfered by the channel noise. The
SISO decoder is used to correct errors and retrieve the original message. The SISO
algorithm of the two decoders will operate on soft input, which is the demodulator outputs
and the probability estimates.
The SISO decoder1 makes an estimate of the probability for each data bit. The three inputs
of the SISO decoder1 are systematic apriori information (
1 s
a λ ), systematic intrinsic
information (
1 s
i λ ), parity intrinsic information (
1 p
p λ ). SISO decoder1 calculate the
extrinsic systematic information (
1 s
e λ ) and the softoutput information (
1
( )
k
d Λ ) for each
systematic bit received. The extrinsic systematic information (
1 s
e λ ) from SISO decoder1
is interleaved and fed to the SISO decoder2 as apriori information
1 s
a λ , and also
interleaved systematic intrinsic information of SISO decoder1 (
1 s
i λ ) as the systematic
intrinsic information of SISO decoder2 (
2 s
i λ ). The same as SISO decoder1, SISO
decoder2 calculate the extrinsic systematic information (
2 s
e λ ) and the softoutput
information (
2
( )
k
d Λ ) for each systematic bit received. This schedule is called iteration and
will be continued until some stopping condition is met.
SISO Decoder1
interleaver
Deinterleaver
interleaver
Deinterleaver
SISO Decoder2
1 s
i λ
1 p
i λ
2 s
i λ
2 p
i λ
1 s
e λ
2 s
e λ
2 s
a λ 1 s
a λ
Demapper
0
1
( )
k
d Λ
ˆ
d
Figure 5.4 Turbo Decoder
37
6 Algorithm Level Design for Turbo Decoder
There are two main algorithms in the component of the SISO decoders. They are MAP
decoding and SOVA decoding. The MAP decoding algorithm is based on a posteriori
probabilities abilities (APP) probabilities. The SOVA decoding algorithm is based on ML
probabilities. Both of the algorithms use iterative technique to achieve decoding
performance. The MAP algorithm can output perform SOVA decoding by 0.5dB or more
(ref[19],ref[20]), so we choose MAP algorithm in our thesis.
6.1 MAP Decoding Algorithm
The BahlCockeJelinekRaviv (BCJR) ref[17] is optimal for estimating the a posteriori
probabilities abilities of the states and transitions of a Markov source observed through a
discrete memoryless channel. They show how the algorithm could be used for both block
codes and convolutional codes. The MAP algorithm checks very possible path through the
convolutional decoder trellis, so that it seems too complex for application in the most
systems. It is not widely used before the discovery of turbo codes. The MAP algorithm
provides not only the estimated bit sequence, but also the probabilities for each bit which is
has been decoded correctly. And the soft output can be used in the next iteration, and get
more accurate values.
.
The MAP algorithm ref[22]is rather complex according to large number of multiplications.
P. Robertson proposed a simplified MAP algorithm, LogMAP. In the LogMAP
algorithm all the calculation performs in log domain.
Turbo decoder calculates an accurate aposterioriprobability for the received block.
Finally it will make a hard decision by guessing the largest APP for parity bits after all
iterations finished.
38
6.2 Log MAP
The following equations show MAP algorithm which is used to calculate the soft
output, α value, β value, γ branch value and extrinsic information.
(1, ) 1,
1
( 0, ) 0,
1
1, (1, )
1
0, (0, )
1
( )
f m m m
k k k
f m m m
k k k
m m f m
k k k
m m
k m m f m
k k k
m
m
e
d In
e
α γ β
α γ β
α γ β
α γ β
+
+
+ +
+
+ +
+
Λ = =
∑ ∑
∑
∑
Eq 6.1
( , ) , ( , )
1 1
1 1
( , ) , ( , )
1 1
0 0
b j m j b j m
k k
m b j m j b j m
k k k
j j
In e
α γ
α α γ
− −
+
− −
= =
= =
∑ ∑
Eq 6.2
, ( , )
1
1 1
, ( , )
1
0 0
j m f j m
k k
m j m f j m
k k k
j j
In e
γ β
β γ β
+
+
+
= =
= =
∑ ∑
Eq 6.3
,
( ) ( , )
j m s s p
k k k k
j a i p m j i γ λ λ λ = • + + • Eq 6.4
( ) ( )
s s
k k k k
e d a i λ λ λ = Λ − + Eq 6.5
( )
s
k
e d λ extrinsic softoutput information of
k
d
m
k
α forward recursion state metric of state m in the trellis step k
m
k
β backward recursion state metric of state m in the trellis step k
b(j,m) if input is j and next state is m; b(j,m) is the current state.
f(j,m) if input is j and current state is m ,f(j,m) is the next state
, j m
k
γ branch metric if the state is m and received bit is j at the trellis step k
s
k
i λ Systematic intrinsic information of trellis step k
s
k
a λ Systematic apriori information of trellis step k
p
k
i λ parity intrinsic information of trellis step k
p(m,j) parity bit if the current state is m and systematic bit is j
The above equations are high complexity including log and multiplications. But it is can be
solved by Jacobean algorithm [20]
1
1
( ) max( , , ) ( 1, , )
n
x x
n n
In e e x x f x x +⋅ ⋅⋅ + = ⋅ ⋅ ⋅ + ⋅ ⋅ ⋅ Eq 6.6
( 1, , )
n
f x x ⋅ ⋅ ⋅ is the correction function. The maximizations values have some errors, but it
can be corrected by the correction function. This correction function can be implemented
by lookup table (LUT).
39
6.3 MaxlogMAP
The MaxlogMAP is least complex than logMAP algorithm. The figure 6.1 is the basic
element addcompareselect (ACS) hardware architecture. Compared with two
architectures, logMAP need more than one adder and lookuptable (LUT), If choose
MAXlogMAP, it can reduce more than 1/3 hardware. Usually, low complexity
architecture takes some disadvantage. It offers worse BER performance compared with
logMAP. Ref[23] shows a loss due to quantization of the correction function is not
visible.
1
1
( ) max( , , )
n
x x
n
In e e x x +⋅ ⋅ ⋅ + ≈ ⋅ ⋅ ⋅
+
+
+
+
+
+
+
LUT
MaxlogMAP
logMAP
Figure 6.1 Two Hardware Architecture
The MaxlogMAP algorithm includes the same parameter, but less complexity structure.
6.3.1 Gama
Start with the equation:
,
( ) ( , )
j m s s p
k k k k
j a i p m j i γ λ λ λ = • + + • Eq 6.7
s
k
a λ is the noisy received systematic bit and with code systematic bit j.
p
k
i λ is the
corresponding noisy received parity bit with code parity bit p(m,j). To see how good a
match between the pair of receptions and the codebit meaning of the trellis transition,
branch metrics can give a function in this case for each trellis transition.
40
Table 6.1 calculation of gama
, j m
k
γ
,
( ) ( , )
j m s s p
k k k k
j a i p m j i γ λ λ λ = • + + •
0,0
k
γ
0
1,0
k
γ
s s p
k k k
a i i λ λ λ + +
0,1
k
γ
0
1,1
k
γ
s s p
k k k
a i i λ λ λ + +
0,2
k
γ
p
k
i λ
1,2
k
γ
s s
k k
a i λ λ +
0,3
k
γ
p
k
i λ
1,3
k
γ
s s
k k
a i λ λ +
0,4
k
γ
p
k
i λ
1,4
k
γ
s s
k k
a i λ λ +
0,5
k
γ
p
k
i λ
1,5
k
γ
s s
k k
a i λ λ +
0,6
k
γ
0
1,6
k
γ
s s p
k k k
a i i λ λ λ + +
0,7
k
γ
0
1,7
k
γ
s s p
k k k
a i i λ λ λ + +
From the table we can find actually,
, j m
k
γ just has four kinds of values: 0,
s s
k k
a i λ λ +
p
k
i λ ,
s s
k k
a i λ λ + . It just need save three values:
s s
k k
a i λ λ + ,
p
k
i λ ,
s s p
k k k
a i i λ λ λ + +
in gama memory.
Gama will be used in three values calculation, which is alpha, beta, and LLR.
For example, to calculate alpha
1
( , ) , ( , )
1 1
0
m b j m j b j m
k k k
j
α α γ
− −
=
=
∑
Eq 6.8
, j m
k
γ needs sixteen value the same as the table 21.
But it just needs calculate two values
s s
k k
a i λ λ + , and
s s p
k k k
a i i λ λ λ + + , and it means gama
unity just need three adders.
p
k
i λ is the same as the input value
p
k
i λ .
41
j and p(j,m) just have four case j=0, p(j,m)=0 j=1, p(j,m)=0 j=0, p(j,m)=1 j=1, p(j,m)=1
So Gama value can be defined as gama00, gama01, gama10, gama11.
gama00 =0 (j=0, p(j,m)=0)
gama01=
p
k
i λ (j=0,p(j,m)=1)
gama10=
s s
k k
a i λ λ + (j=1,p(j,m)=0)
gama11=
s s p
k k k
a i i λ λ λ + + (j=1,p(j,m)=1)
But we just need to save gama01, gama10, gama11.
, ( , )
1
1 1
, ( , )
1
0 0
j m f j m
k k
m j m f j m
k k k
j j
In e
γ β
β γ β
+
+
+
= =
= =
∑ ∑
Eq 6.9
Beta calculation is similar with alpha, but beta values are calculated backwards through the
received soft input data.
42
6.3.2 Alpha
m
k
α can be expressed as the summation of all possible transition probabilities from the
time k1.b(j,m) is the state going backwards in tine from state m, via the previous branch
corresponding to the input j.
1
( , ) , ( , )
1 1
0
m b j m j b j m
k k k
j
α α γ
− −
=
=
∑
In the log domain, it can be expressed as:
( , ) , ( , )
1 1
1 1
( , ) , ( , )
1 1
0 0
b j m j b j m
k k
m b j m j b j m
k k k
j j
In e
α γ
α α γ
− −
+
− −
= =
= =
∑ ∑
Eq 6.10
According to the MaxlogMAP algorithm:
1
1
( ) max( , , )
n
x x
n
In e e x x +⋅ ⋅ ⋅ + ≈ ⋅ ⋅ ⋅
We can have the alpha as:
1
( , ) , ( , )
1 1
0
( )
m b j m j b j m
k k k
j
Max α α γ
− −
=
= + Eq 6.11
Gama depends on the systematic, parity, and extrinsic softoutput which is come from
previous iteration. This has been described is in the gama unity.
0
1 k
α
−
1
1 k
α
−
2
1 k
α
−
3
1 k
α
−
4
1 k
α
−
5
1 k
α
−
6
1 k
α
−
7
1 k
α
−
000
001
010
011
100
101
110
111
000
001
010
011
110
111
100
101
0
k
α
1
k
α
2
k
α
3
k
α
4
k
α
5
k
α
6
k
α
7
k
α
1/1
0/0
Figure 6.2 Calculation of Alpha
43
Figure 6.2 is an example of how to calculate alpha. The value
4
k
α at instance k is
calculated by taking the largest value from
0
1 k
α
−
and
1
1 k
α
−
. And each alpha value at instance
k1 should add the corresponding gama value.
Table 6.2 the following table is the detail which is used to calculate the
m
k
α
m
k
α
(0, ) 0, (0, ) (1, ) 1, (1, )
1 1 1 1
( , )
b m b m b m b m
k k k k
Max α γ α γ
− − − −
+ +
0
k
α
0 0,0 1 1,1
1 1 1 1
( , )
k k k k
Max α γ α γ
− − − −
+ +
1
k
α
3 0,3 2 1,2
1 1 1 1
( , )
k k k k
Max α γ α γ
− − − −
+ +
2
k
α
4 0,4 5 1,5
1 1 1 1
( , )
k k k k
Max α γ α γ
− − − −
+ +
3
k
α
7 0,7 6 1,6
1 1 1 1
( , )
k k k k
Max α γ α γ
− − − −
+ +
4
k
α
1 0,1 0 1,0
1 1 1 1
( , )
k k k k
Max α γ α γ
− − − −
+ +
5
k
α
2 0,2 3 1,3
1 1 1 1
( , )
k k k k
Max α γ α γ
− − − −
+ +
6
k
α
5 0,5 6 1,6
1 1 1 1
( , )
k k k k
Max α γ α γ
− − − −
+ +
7
k
α
6 0,6 7 1,7
1 1 1 1
( , )
k k k k
Max α γ α γ
− − − −
+ +
44
6.3.3 Beta
Beta is reverse state metric, and represents
m
k
β as the summation of all possible transition
probabilities from time k+1, and can be written as:
1
, ( , )
1
0
m j m f j m
k k k
j
β γ β
+
=
=
∑
f(j,m) is the next state, if input is j and current state is m
Use MaxlogMAP algorithm,
m
k
β can be derived as:
1
, ( , ) (0, ) 0, (0, ) (1, ) 1, (1, )
1
0
( , )
m j m f j m f m f m f m f m
k k k k k k k
j
Max β γ β β γ β γ
+
=
= = + +
∑
0
k
β
1
k
β
2
k
β
3
k
β
4
k
β
5
k
β
6
k
β
7
k
β
000
001
010
011
100
101
110
111
000
001
010
011
110
111
100
101
0
1 k
β
+
1
1 k
β
+
2
1 k
β
+
3
1 k
β
+
4
1 k
β
+
5
1 k
β
+
6
1 k
β
+
7
1 k
β
+
1/0
0/1
Figure 6.3 Calculation of Beta
Beta calculation is similar with alpha, but calculated in reversed order.
45
Table 6.3 the following is the table, which is the detail to calculate the beta
1
m
k
β
+
(0, ) 0, (0, ) (1, ) 1, (1, )
( , )
f m f m f m f m
k k k k
Max β γ β γ + +
0
1 k
β
+
0 0,0 4 1,4
( , )
k k k k
Max β γ β γ + +
1
1 k
β
+
4 0,4 0 1,0
( , )
k k k k
Max β γ β γ + +
2
1 k
β
+
5 0,5 1 1,1
( , )
k k k k
Max β γ β γ + +
3
1 k
β
+
1 0,1 5 1,5
( , )
k k k k
Max β γ β γ + +
4
1 k
β
+
2 0,2 6 1,6
( , )
k k k k
Max β γ β γ + +
5
1 k
β
+
6 0,6 2 1,2
( , )
k k k k
Max β γ β γ + +
6
1 k
β
+
7 0,7 3 1,3
( , )
k k k k
Max β γ β γ + +
7
1 k
β
+
3 0,3 7 1,7
( , )
k k k k
Max β γ β γ + +
46
6.3.4 Loglikelihood Ratio
Loglikelihood ratio (LLR) is used to find the output of the soft decision.
(1, ) 1,
1
( 0, ) 0,
1
1, (1, )
1
0, (0, )
1
( )
f m m m
k k k
f m m m
k k k
m m f m
k k k
m m
k m m f m
k k k
m
m
e
d In
e
α γ β
α γ β
α γ β
α γ β
+
+
+ +
+
+ +
+
Λ = =
∑ ∑
∑
∑
=
7 7
1, (1, ) 0, (0, )
1 1
0 0
( ) ( )
m m f m m m f m
k k k k k k
m m
Max Max α γ β α γ β
+ +
= =
+ + − + + Eq 6.15
The γ value, α value, and β value can be obtained from γ , α , β unit. The main
operation of LLR is comparison, addition, and subtraction. The sign bit of the LLR is use
to make a hard decision, and the magnitude can give a reliability estimate.
Extrinsic softoutput information
When using turbo decodes, the loglikelihood ratio can be iterated several times to
improve the reliability of hard decision. In ASIC design, extrinsic softoutput information
is fixed point data. If use LLR as apriori information, it will be easy to overflow. Extrinsic
softoutput information is obtained from the loglikelihood ratio by subtracting the
systematic information and the apriori information. Extrinsic softoutput information is
the value fed back to the next decoder as the apriori information after interleaving.
After all iterations complete, the decoded information bits can be retrieved by looking at
the sign bit of LLR. If it is positive the bit is one, and if it is negative the bit is a zero.
47
6.4 Windowing
According to 3GPP LTE standard, the block size range is from 40 to 6144.
The serial sequence of computation for MAP decoding:
Window size is block size.
In forward calculation for each α metric is computed, after all α values have been
calculated, the backward and LLR start, and for each &β Λ is computed from the end of
the block to the beginning. The figure 6.4 shows the schemes. The latency time for this
case is:
W s t
N N N + +
W
N window size
s
N sliding window size
t
N tail bits number
6.4.1 Serial Window
Figure 6.4 Serial Window Schemes
Figure 6.4 is serial MAP (SMAP), and the latency time is the longest. And it also needs to
save all the gama and alpha value, so that the memory area is rather large.
48
6.4.2 Sliding Window
In order to reduce the computational time and area, the sliding window has been
introduced in this case. The basic idea is the sequence of forward calculations is the same
as SMAP, but backward β calculation are separated every sliding window length v.
Figure 6.5 Sliding Window Schemes
6.4.3 Super Window
To reach a high through turbo decoding and super window can be used in this case.
Forward calculation is not the same as the SMAP. Alpha calculation are separated every
parallelism windows.
Figure 6.6 Super Window Schemes
49
The following is a comparison for throughput of different kinds of windows.
In our thesis, clock frequency is 300 MHz, and parallel window is 8, and sliding window length is
40.
The latency can be use the equation:
1
b
sl t
s clock
N
N N
N F
 
+ + •

\ ¹
Eq 6.16
And throughput can use the equations:
b
clock
b
sl t
s
N
F
N
N N
N
•
 
+ +

\ ¹
Eq 6.17
b
N block size
s
N SISO number
t
N tail bits number
sl
N sliding window size
clock
F clock frequency
Table 6.4 throughput and latency time for three kinds of windows
Kinds of window Throughput (Mbps) Latency time ns
Serial window 149.9634 0.0410
Sliding Window 266.5510 0.0231
Super window 2272.7 0.0027
50
6.5 Sliding Window
Implementing serial MAP window requires a very large memory to store the state metrics,
and also cost a long latency time. The introduced sliding window can decrease the memory
size and latency time.
In the serial MAP window, the backward recursion start calculates after forward recursion
complete. If the block size is 6144, and using the serial window, a memory size should be:
6144*8*8=393216 bits. Using super windowing a memory size can be: 40*8*8*4= 10240
bits.
The super windowing brings some initial problems for forward metric, and backward
metric. When using super windowing, forward metric is divided into several windows.
Except the first window, the others just can give approximate value for each window.
Forward metric initial problem is just caused by parallelism. Backward metrics initial
problem is caused not only parallelism but also sliding window.
There are two wellknow technique to solve the initial value problem:
Training calculation: producing initial value by training calculation as figure 6.7
Next Iteration Initialization: Using previous iteration value for the corresponding state
metric
Training calculation: Usually, start a few stages ahead before metric calculation. It can be
10 times the constraint length for realistic channel. This method increases the computation,
and also the latency and power consumption.
51
Figure 6.7 Sliding Window with Training Calculations
Next Iteration Initialisation: Turbo decoder works as iteration and next iteration is more
accurate than the previous iteration. But the previous state metrics in the previous
iterations are close to next state metrics in the next iteration at the same trellis. So the
previous iteration can be considered as the initial value for the next iteration. This needs
extra memory to store those state metrics, which can be used in the next iteration as the
initial value. The time scheme is showed in the figure 6.9.
Figure 6.9 Next Iteration Initialization Sliding Window
52
7 Turbo Decoder Hardware Implementation and Simulation Results
The Hardware architecture for turbo decoder is showed on figure7.1. The introduction for
turbo decoder contains two SISO decoders for two half iterations. Actually, the two SISO
do not work at the same time. It always waits until another computation complete, and the
two SISO decoders are same function and structure. One SISO unit can be used instead of
two SISO decoders.
The input buffer stores the received bits from the channel, including systematic
information bits and parity information bits. Both data memories are used to store the
extrinsic value from SISO unit. To implement the figure5.4, for the first SISO decoder,
SISO unit reads from Data memory 1 and writes to memory 2. For the second SISO
decoder, SISO unit reads from Data memory 2 and writes to memory 1.
Figure 7.1 Turbo Decoder Block Diagram
53
7.1 Parallel Window
To achieve high throughput, parallel window is used in turbo decoder. The block divided
into several windows and each window alpha is executed serially. Due to Parallel window
and sliding window, SISO unit need some memory to save the initializing alpha and beta
value for beginning and end of window. The size of Data Memory 1 and Data memory 2 do
not change, but change to several smaller memories.
For example:
The maximum block size in LTE 3GPP is 6144, and window number is four. Data memory
actually contains four small memory, and small memory size is 6144/4=2048.Parallel
memory needs parallel interleaver.
Figure 7.2 Parallel Window Block Diagram
54
7.2 SISO Architecture:
All the components of the architecture are used in the figure 7.3. γ unit , α unit , β unit,
LLR unit and extrinsic soft output unit will discuss and give details later. γ and α stack
work as first in and last out. Input and output need working at the same time, so it means
two memories needed it works as pingpong buffer.
Figure 7.3 SISO Architecture
55
7.3 Time Schedule for SISO Unit:
The following is an example for SISO decoder:
Block size = 2016
SISO number =8.
Sliding window size =64
Figure 7.4 Time Schedule for SISO Unit
56
7.4 State Metric Architecture:
7.4.1 Gama:
Gama unit is very simple. It just needs adders to calculate the gama00, gama01, gama10,
gama11. Gama00 is always zeros, and gama01 is the same value as
p
k
i λ
, so that gama unit
just needs two adders.
p
k
i λ
s
k
i λ
s
k
a λ
+
+
p
k
i λ gama01
gama10
gama11
Figure 7.5 Branch Metrics Unit (BMU)
7.4.2 Alpha and Beta:
From the alpha and beta equation, the basic operation is: addcompareselect (ACS) unit.
k 1 −
γ
1
( , ) , ( , )
1 1
0
( )
m b j m j b j m
k k k
j
Max α α γ
− −
=
= +
1
( , ) , ( , )
0
( )
m f j m j f j m
k k k
j
Max β β γ
=
= +
+
+
+
Figure 7.6 ACS Unit Structure
Alpha and beta unit is based on the ACS, but just different connection according to the
trellis.
57
Figure 7.7 α Unit
Beta unit is the same structure, but calculate in reversed order.
Figure 7.8 β Unit
58
7.4.3 Loglikelihood Ratio (LLR)
(0, ) 0, (0, ) (1, ) 1, (1, )
( , ) = + +
m f m f m f m f m
k k k k k
Max β β γ β γ Eq 7.3
(k) Λ =
7 7
1, (1, ) 0, (0, )
1 1
0 0
( ) ( )
m m f m m m f m
k k k k k k
m m
Max Max α γ β α γ β
+ +
= =
+ + − + + Eq 7.4
When calculate
m
k
β value,
(0, ) 0, (0, ) f m f m
k k
β γ + and
(1, ) 1, (1, ) f m f m
k k
β γ + have already been got.
(0, ) 0, (0, ) f m f m
k k
β γ + and
(1, ) 1, (1, ) f m f m
k k
β γ + are part of the
0, (0, )
1
m m f m
k k k
α γ β
+
+ + and
1, (1, )
1
m m f m
k k k
α γ β
+
+ +
We can define:
,0 (0, ) 0, (0, ) m f m f m
k k k
β β γ = + and
,1 (1, ) 1, (1, ) m f m f m
k k k
β β γ = +
So that loglikelihood rate can be expressed as
(k) Λ =
7 7
,1 ,0
0 0
( ) ( )
m m m m
k k k k
m m
Max Max α β α β
= =
+ − + Eq 7.5
With this method we can save 1/3 adders.
Loglikelihood rate architecture:
Figure 7.9 LLR Unit.
There is no feedback computation, and latency time is five. At the first level there is
sixteen adders, the pipeline is introduced in the structure to reduce area.
Extrinsic information:
( ) ( )
s s
k k k k
e d a i λ λ λ = Λ − + Eq 7.6
From the equation, two adders just need. But extrinsic information soft output is calculated
in reserved order.
s s
k k
a i λ λ + is the SISO input and normal order. To get reversed input, it
59
needs extra memory to save this value. To solve this problem, beta calculation is the same
need reversed order gama value, and gama10=
s s
k k
a i λ λ + , so that
s s
k k
a i λ λ + can be got from
beta unit.
60
7.5 Simulation Results:
In order to benchmark the parallel Turbo decoder, simulation is carried out based on a
3GPP LTE linklevel simulator. The simulator is compatible to 3GPP specification
defined in 36.211 and 36.212. The 3GPP SCME model [24] is used as the channel model.
In the simulation, 5000 subframes have been simulated. 64QAM modulation and 5/6
coding rate is used. Softoutput sphere decoder which achieves ML performance is used as
the detection method. No closeloop precoding is assumed. At most three retransmissions
are allowed in ChaseCombine (CC) based HARQ. Simulation results are depicted in
Figure 7.10, 7.11 and 7.12. The Turbo decoder runs for up to eight iterations with
earlystopping enabled.
Figure 7.10 Code/Uncoded Bit Error Rate for Parallel /Ideal Turbo Simulation
61
Figure 7.11 Code/Uncoded Frame Error Rate for Parallel /Ideal Turbo Simulation
Figure 7.12 Code/Uncoded Bit Throughput for Parallel /Ideal Turbo Simulation
62
Part Ⅲ
8 Conclusion and Future Work
8.1 Conclusion
In this thesis, it proposes the general DFE architecture and the DDC specification for 3GPP
LTE, WiMAX, and 802.11n with investigation. Compare filter architecture for DDC.
Finally 802.11n DDC filter is designed and implemented.
Hardware and software cosimulation is a good test method and has been introduced in the
DFE part.
For the turbo codes part, 3GPP LTE turbo decoder is designed from algorithm level to the
hardware level. Finally, a Turbo decoder with eight parallel Radix2 SISO units has been
implemented.
The DFE and turbo decoder are the two key components in the wireless digital systems.
Through the prestudy and design carried out in this thesis, the author had chance to learn
both the wireless system design methodology and hardware design skills, specially the
topdown design flow.
8.2 Future Work
For the DFE part, only preliminary work has been done in this thesis. As future work, more
functionality such as automatic gain control needs to be implemented. Furthermore, the
hardware implement in this thesis is only for 802.11n. To implement a flexible DFE
targeting multiple standards such as 3GPP LTE and WiMAX, a parameterized design
needs to be implemented and integrated.
Usually the output of DFE needs to be saved in a FIFO memory, which depends on the
design of the baseband part. The design tradeoff needs to be addressed in the future.
For the Turbo decoder part, a 3GPP LTE Radix2 Turbo decoder has been implemented. It
only supports 3GPP LTE. In order to support other standard such as WiMAX, a Radix4
implementation is needed which can be done as future work.
WiMAX decoder is duobinary turbo decoder, and the structure is similar with the 3GPP
LTE Radix4 turbo decoder. There are two main differences between WiMAX duobinary
turbo decoder and radix 4 turbo decoder:
63
1. Due to different standard, the encoder has different polynomial generator, so that the
trellis is different. It just needs change the connection that means just need to add some
multiplexers.
2. WiMAX duobinary turbo decoder trellis termination is closed as a circle with the start
state equal to the end state, but 3GPP LTE turbo decoder trellis termination is tail bits
with state 0 and end with state 0.
64
Reference
[1] “A 50mW HSDPA Baseband Receiver ASIC with Multimode Digital FrontEnd”,
Martelli, C. Reutemann, R. Benkeser, C. Qiuting Huang ,SolidState Circuits
Conference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International
[2] C. Benkeser, A. Burg, T. Cupaiuolo, Q. Huang, "A 58mW 1.2mm² HSDPA Turbo
Decoder ASIC in 0.13 μm CMOS", Digest of Technical Papers, IEEE International
SolidState Circuits Conference (ISSCC), pp. 264612, Feb 37, 2008
[3] Xilinx WiMAX DFE reference design, XAPP966C (v1.1) April 3, 2007
[4] IEEE Std 802.162004, IEEE Standard for Local and Metropolitan Area Networks:
Part 16: Air Interface for Fixed Broadband Wireless Access Systems.
[5] IEEE Std 802.16e2005 and IEEE Std 802.162004/Cor12005, IEEE Standard for
Local and metropolitan area networks: Part 16: Air Interface for Fixed Mobile
Broadband Wireless Access Systems, Amendment 2: Physical and Medium Access
Control Layers for Combined Fixed and Mobile Operation in Licensed Bands and
Corrigendum 1.
[6] Xilinx High Density WCDMA Digital Front End Reference Design (Xilinx
Confidential), XAPP921c (v2.1) April 3, 2007
[7] IEEE P802.11n™/D2.00, “Draft STANDARD for Information
TechnologyTelecommunications and information exchange between systemsLocal
and metropolitan area networksSpecific requirementsPart 11: Wireless LAN
Medium Access Control (MAC) and Physical Layer (PHY) specifications.”
[8] G. Hueber,L maurer, G Strasser, R. Stuhjberger,K. Chabrak, and R.Hagleager, “ The
design of a multimode/multisystem capable software radio receiver,”
[9] http://www.mathworks.com/products/filterdesign/demos.html?file=/proucts/demos/s
hipping/filterdesign/ddcfilterchaindemo.html
[10] http://www.mathworks.com/matlabcentral/fileexchange/1716
[11] P. Elias, Error –free coding, IRE Trans. Theory, vol. IT 4, 1954, pp.2937
65
[12] G.D. Forney,Jr., Convolutional Codes I: Algebraic Structure, IEEE Trans. Inform
Theory,vol.IT16,no.6, Nov. 1970 ,pp. 720738,and vol. IT 17,no3,May 1971,p.360
[13] S. Benedetto and G. Montorsi, “Unveiling turbocodes: Some results on parallel
concatenated coding schemes,” IEEE Trans. Inform. Theory, vol.42, pp.409428,
1996.
[14] G. D. Forney Jr., Concatenated Codes. Cambridge, MA: MIT press, 1966.
[15] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit errorcorrecting
coding and decoding: Turbo Codes,” in Proc. ICC, 1993, pp. 10641070.
[16] Perez L. C., Seghers J. and Costello D. J. Jr, “A Distance Spectrum Interpretation of
Turbo Codes,” IEEE Trans Information Theory, vol.42, no.6, Nov. 1996, pp.
16981709.
[17] L.R. Bahl, J. Cocke, F. Jelinek and J. Raviv  “Optimal Decoding of Linear Codes for
Minimizing Symbol Error Rate”, IEEE Transactions on Information Theory, Volume
20, Issue 2, Mar 1974 Pages: 284 – 287.
[18] 3GPP TS 36.212 V8.5.0 (200812) 3rd Generation Partnership Project; Technical
Specification Group Radio Access Network; Evolved Universal Terrestrial Radio
Access (EUTRA); Multiplexing and channel coding (Release 8)
[19] Pietrobon, S., “Implementation and Performance of Turbo/MAP Decoder,” Int’l.
J.Satellite Commun., vol. 15, JanFeb 1998, pp.2346.
[20] Robertson, P., Villeburn,E., and Hoeher, P., “A comparison of Optimal and
SubOptimal MAP Decoding Algorithms Operating in the log Domain,” Proc. Of ICC
’95, Seattle, Washington, UN 1995, pp 10091013.
[21] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit errorcorrecting
coding and decoding: Turbocodes,” in Proc. ICC’93, Geneve, Switzerland, May
1993, pp. 1064–1070.
[22] P.Robertson ,P. Hoeher ,and E. Vilebrun , “ Optimal and suboptimal maximum a
posteriori algorithms suitable for turbo decoding,” European Trans. Telecomm., vol.8,
no.2 Mar. Apr.1997.
66
[23] Patrick Robertson,Emmanuelle Villebrun and Peter Hoeher, “ A comparison of
Optimal and Sub –Optimal MAP Decoding Algorithms Operation in the Log
Domain” Communications, 1995. ICC '95 Seattle, 'Gateway to Globalization', 1995
IEEE International Conference on 2, on page(s): 10091013 vol.2.
[24] D. S. Baum, J. Salo, M. Milojevic, P. Kyoti, and J. Hansen, “MATLAB
implementation of the interim channel model for beyond3G systems (SCME)”, May
2005.
67
Appendix A
Coefficient for 802.11n DDC filter:
h1 = 0.0020599365234
h2 = 0.00057601928711
h3 = 0.0010108947754
h4 = 0.0016479492188
h5 = 0.0028839111328
h6 = 0.0045013427734
h7 = 0.0068283081055
h8 = 0.010028839111
h9 = 0.014629364014
h10 = 0.021575927734
h11 = 0.033279418945
h12 = 0.057586669922
h13 = 0.14438247681
h14 = 0.45468902588
h15 = 0.09334564209
h16 = 0.05207824707
h17 = 0.035419464111
h18 = 0.026027679443
h19 = 0.019779205322
h20 = 0.015243530273
h21 = 0.011726379395
h22 = 0.0089797973633
h23 = 0.0066986083984
h24 = 0.0050010681152
h25 = 0.0035057067871
h26 = 0.0039749145508
ARM1
coefficient h1 h15 h24 h10
From RAM1 x(0) x(14) x(28) x(43)
From RAM2 x(51) x(37) x(23) x(9)
ARM2
coefficient h2 h16 h23 h9
From RAM1 x(50) x(36) x(23) x(9)
From RAM2 x(1) x(15) x(29) x(43)
68
ARM3
coefficient h3 h17 h22 h8
From RAM1 x(2) x(16) x(30) x(44)
From RAM2 x(49) x(35) x(22) x(7)
ARM4
coefficient h4 h18 h21 h7
From RAM1 x(48) x(34) x(20) x(6)
From RAM2 x(3) x(17) x(31) x(45)
ARM5
coefficient h5 h19 h20 h6
From RAM1 x(4) x(18) x(32) x(46)
From RAM2 x(48) x(34) x(19) x(5)
ARM6
coefficient h11 h25 h11
From RAM1 x(10) x(24) x(38)
From RAM2 x(41) x(27) x(13)
ARM7
coefficient h2 h26 h13
From RAM1 x(40) x(26) x(12)
From RAM2 x(11) x(25) x(39)
Note: X(n) =
n
z
−
69
Appendix B
Standard parameters and detail
Block size f1 f2 Block size f1 f2
40 3 10 1120 67 140
48 7 12 1152 35 72
56 19 42 1184 19 74
64 7 16 1216 39 76
72 7 18 1248 19 78
80 11 20 1280 199 240
88 5 22 1312 21 82
96 11 24 1344 211 252
104 7 26 1376 21 86
112 41 84 1408 43 88
120 103 90 1440 149 60
128 15 32 1472 45 92
136 9 34 1504 49 846
144 17 108 1536 71 48
152 9 38 1568 13 28
160 21 120 1600 17 80
168 101 84 1632 25 102
176 21 44 1664 183 104
184 57 46 1696 55 954
192 23 48 1728 127 96
200 13 50 1760 27 110
208 27 52 1792 29 112
216 11 36 1824 29 114
224 27 56 1856 57 116
232 85 58 1888 45 354
240 29 60 1920 31 120
248 33 62 1952 59 610
256 15 32 1984 185 124
264 17 198 2016 113 420
70
272 33 68 2048 31 64
280 103 210 2112 17 66
288 19 36 2176 171 136
296 19 74 2240 209 420
304 37 76 2304 253 216
312 19 78 2368 367 444
320 21 120 2432 265 456
328 21 82 2496 181 468
336 115 84 2560 39 80
344 193 86 2624 27 164
352 21 44 2688 127 504
360 133 90 2752 143 172
368 81 46 2816 43 88
376 45 94 2880 29 300
384 23 48 2944 45 92
392 243 98 3008 157 188
400 151 40 3072 47 96
408 155 102 3136 13 28
416 25 52 3200 111 240
424 51 106 3264 443 204
432 47 72 3328 51 104
440 91 110 3392 51 212
448 29 168 3456 451 192
456 29 114 3520 257 220
464 247 58 3584 57 336
472 29 118 3648 313 228
480 89 180 3712 271 232
488 91 122 3776 179 236
496 157 62 3840 331 120
504 55 84 3904 363 244
512 31 64 3968 375 248
528 17 66 4032 127 168
544 35 68 4096 31 64
560 227 420 4160 33 130
576 65 96 4224 43 264
592 19 74 4288 33 134
608 37 76 4352 477 408
71
624 41 234 4416 35 138
640 39 80 4480 233 280
656 185 82 4544 357 142
672 43 252 4608 337 480
688 21 86 4672 37 146
704 155 44 4736 71 444
720 79 120 4800 71 120
736 139 92 4864 37 152
752 23 94 4928 39 462
768 217 48 4992 127 234
784 25 98 5056 39 158
800 17 80 5120 39 80
816 127 102 5184 31 96
832 25 52 5248 113 902
848 239 106 5312 41 166
864 17 48 5376 251 336
880 137 110 5440 43 170
896 215 112 5504 21 86
912 29 114 5568 43 174
928 15 58 5632 45 176
944 147 118 5696 45 178
960 29 60 5760 161 120
976 59 122 5824 89 182
992 65 124 5888 323 184
1008 55 84 5952 47 186
1024 31 64 6016 23 94
1056 17 66 6080 47 190
1088 171 204 6144 263 480
Table A.1 LTE interleaver parameters.
VLSI Implementation of Key Components in A Mobile Broadband Receiver
Master thesis in Computer Engineering Department of Electrical Engineering at Linköping Institute of Technology by
Yulin Huang
LiTHISYEX09/4103SE
Supervisor: Di Wu Linköpings Universitet Examiner: Dake Liu Linköpings Universitet Linköping, May 27, 2009
802. WiMAX. 3GPP LTE. FPGA. 802. The structure of digital frontend for multistandard radio supporting wireless standards such as IEEE 802. Turbo.11n digital downconverter is designed from Matlab model to VHDL implementation. A toptodown design methods. filter.ep. Electronic Version http://www.11n.se Publication Title Licentiate thesis Degree thesis Thesis Clevel Thesis Dlevel Report Other (specify below) VLSI Implementation of Key Components in A Mobile Broadband Receiver Author(s) Yulin Huang Abstract Digital frontend and Turbo decoder are the two key components in the digital wireless communication system. As another significant part of the thesis. WiMAX. The Turbo decoder will use eight parallel SISO units to reach a throughput up to 150Mits. 3GPP LTE is investigated in the thesis.11n.Presentation Date 20090525 Publishing Date (Electronic version) 2009064 Department and Division Department of Electrical Engineering Computer Engineering Language √ Type of Publication √ ISBN (Licentiate thesis) ISRN: LiTHISYEX09/4103SE Title of series (Licentiate thesis) Series number/ISSN (Licentiate thesis) English Other (specify below) Number of Pages 71 URL. This thesis will discuss the implementation issues of both digital frontend and Turbo decoder. loglikelihood ratio. slidingwindow. hardware implementation. Both simulation and FPGA prototyping are carried out. SISO decoder. The block size supported ranges from 40 to 6144 and the maximum number of iteration is eight. Keywords: Digital frontend. .liu. MaxlogMAP. a parallel Turbo decoder is designed and implemented for 3GPP LTE.
.
This thesis will discuss the implementation issues of both digital frontend and Turbo decoder. MaxlogMAP. loglikelihood ratio.11n digital downconverter is designed from Matlab model to VHDL implementation. WiMAX. Turbo. Both simulation and FPGA prototyping are carried out. The structure of digital frontend for multistandard radio supporting wireless standards such as IEEE 802.Abstract Digital frontend and Turbo decoder are the two key components in the digital wireless communication system. 3GPP LTE. FPGA. hardware implementation I . WiMAX. As another significant part of the thesis. a parallel Turbo decoder is designed and implemented for 3GPP LTE.11n. A toptodown design methods. Keywords: Digital frontend.11n. 802. 3GPP LTE is investigated in the thesis. SISO decoder. 802. slidingwindow. The Turbo decoder will use eight parallel SISO units to reach a throughput up to 150Mits. filter. The block size supported ranges from 40 to 6144 and the maximum number of iteration is eight.
II .
May 2009 III . Yulin Huang Linköping. Dake Liu for their warmhearted help during the whole thesis. I love you as you love me. Finally. I would also like to thank all my friends in Linköping for their help and the good time we shared.Acknowledgements I would like to thank my supervisor Di Wu and examiner Prof. I would like to express the deepest gratitude to my parents for their unconditional love and supporting me in everything. Their help was not only about solving technical problems but also the methodology of doing scientific research.
IV .
........................................................................... 6 2.........................................................................................4........................................................................5 SLIDING WINDOW .........4 THESIS ORGANIZATION ................... 11 2....................... 5 2.........................1 RF ........................................................................................2 Halfband ......................... 7 2.................................................................................. 48 6................................................4..................................................................................3 Spectral Mask of 802..........................................................................3........ 2 1................2 ALPHA ............................................1 Serial Window ...............1.................................................................................................................1 Spectral Mask of WiMAX(802....................................................................................................................................................................................... 4 PART Ⅰ ..... 36 6 ALGORITHM LEVEL DESIGN FOR TURBO DECODER...1 TURBO ENCODER: .2 ADC. 30 4............................................2 3GPP LTE TURBO ENCODER ..................................................................................................................................................................................... 30 4 CHANNEL CODES................................................................................... 37 6................................................................. 37 6...............4........................................................ 47 6.....................................................................3... 1 1.....................................................................................3....................... 4 1.. 39 6.....................3..................................................................... 30 5 TURBO CODES ......................................11n:..........16e): ..................... 39 6......... 42 6......... 48 6..........................................................................................................3 DFE ....................................................................3 BETA .........1.........................................................................................................2 LOG MAP ...................................................................... 44 6.............................11N DDC ...................4 Baseband ..................................................... 33 5................................................................................................................................................3 Performance Comparison: ......4.........4 WINDOWING .........................................1.................................. 30 4........................................................................................................3..............................................................2 CONVOLUTIONAL CODES ............................................................................ 14 2.............................1 INTRODUCTION............................................4......................... 15 3 HARDWARE IMPLEMENTATION OF 802...1 CIC .....1 LINEAR BLOCK CODES .........................................................3 MAXLOGMAP ..........................................................................................................................4 LOGLIKELIHOOD RATIO .................................4.........................................................................................2 DDC STRUCTURE: ..........................................................3 GOAL ..................................................................... 1 1................................................................................................................................................................................. 47 6......... 32 5.2 Sliding Window....................................................................................................... 46 6............................. 1 1...................................................2 Spectral Mask of 3GPP LTE: ........................................................................................................................................................................................ 50 V ..................................................... 38 6................................................................................................................................................................................................................................................. 7 2................................................................................................................. 32 5.............Contents 1 INTRODUCTION ..... 11 2................................................................................................................................................................................. 5 2 DIGITAL FRONT–END (DFE) ...................................................... 8 2........3........................................................ 5 2.................................................................................................................. 2 1.............................. 24 PART Ⅱ ..............1 BACKGROUND .................................................................................................. 3 1..........1 MAP DECODING ALGORITHM .......................2 MOTIVATION ................... 10 2....................3 SISO DECODER ...............................................1..1 GAMA .............................................3.............................................................................3 Super Window .....................................4 FILTER ARCHITECTURES: ..............3 DFE SPECIFICATION FOR DIFFERENT STANDARDS .......................... 2 1................
..............4......1 CONCLUSION ......................2 Alpha and Beta: ......................................................4...................................................................................................................................... 56 7.............................................1 PARALLEL WINDOW .............2 FUTURE WORK ............................... 62 8 CONCLUSION AND FUTURE WORK .................... 55 7............................ 56 7................... 60 PART Ⅲ .7 TURBO DECODER HARDWARE IMPLEMENTATION AND SIMULATION RESULTS. 52 7.................................................................................................................................................... 56 7........ 62 8....................................4 STATE METRIC ARCHITECTURE: ..........................................................................................2 SISO ARCHITECTURE:........... 62 REFERENCE.............................................................................................................. 64 APPENDIX A......... 67 APPENDIX B ............................................... 53 7....................5 SIMULATION RESULTS:..1 Gama: ..................... 54 7................................................................................................................................................................3 TIME SCHEDULE FOR SISO UNIT:.................................................................................4.................................................................................................... 69 VI .......................................................................................................................................................................................................................................... 58 7..................................3 Loglikelihood Ratio (LLR).................................................................................................................................. 62 8...................................................................................................................................................................................................................
Glossary VLSI BER 3GPP LTE 3G UMTS WiMAX RSC SNR IF DFE ADC LNA FFT OFDMA EVM SRC DDC AGC CIC WDF FIR SISO MAP SMAP LLR LUT verylargescale integration bit error rate 3rd Generation Partnership Project LongTerm Evolution 3rd Generation technology Universal Mobile Telecommunication System Worldwide Interoperability for Microwave Access recursive systematic convolution code signal to noise ratio intermediate Frequency digital frontend analogy to digital converter low noise amplifier fast Fourier transform Orthogonal FrequencyDivision Multiple Access error vector magnitude sample rate converter digital down converter automatic gain control cascade integrator comb wave digital filter finite impulse response Softinput SoftOutput Maximum Aposteriori Probability serial MAP loglikelihood ratio lookup table VII .
VIII .
We are enjoying the more and more advantage wireless communication service. it normally contains Antenna. which becomes smaller.1 Background For a general wireless communication system. Analog to Digital Converter (ADC).1. and cheaper.1 Introduction Wireless system has been widely used many fields. It plays a very important role in our daily life. like mobile phone. RF has a local oscillator. and converts the high frequency to the intermediate frequency.1 Receiver Architecture of Wireless Communication Systems 1.1 RF Radio Frequency (RF) is used to deal with the analog signal which is received from antenna. faster. and baseband signal processing hardware. satellite communication etc. Wireless system develops faster and faster. 1.2 A Typical RF Receiver Structure 1 . Wireless Local Area Network. Digital FrontEnd (DFE). Radio Frequency (RF) block. Figure 1. Figure 1. especially after VLSI appeared.
After the channel encoder. The function of DFE usually includes automatic gain control. The basic structure is illustrated by the block diagram in the figure 1. and deals with the data according to different standards. high speed. information bits cannot be sent directly. Generally speaking. it consumes a large portion of the silicon area in the receiver implementations. To maximise the information transmission rate.1. 2 . 1. DFE is usually considered to be simpler than the baseband part from the functional aspect. sample rate conversion. and small area. we can use source encoder to reduce the redundant part.3 DFE DFE acts as a bridge between the analog part and digital part in the wireless receiver. There are SigmaDelta ADC. it is mainly a block of digital filters. Flash ADC. 1. The modulation will convert the encoded information bits to a signal which is suitable for transmission channel.3 Figure 1.1. it requires low power.1.4 Baseband DFE is followed by baseband.2 ADC ADC is used to convert analog signal to digital signal after RF. pulse shaping.3 Diagram of Baseband for Digital Communication System The information source can not transmit directly due to the channel noise. matched filtering etc. We use channel encoder and add some redundant information. which can be used to correct errors. For the wireless system. and Successive Approximation Register ADC etc.1. However.
especially in multiple wireless communication standards which will be implemented into a single transceiver system.4 shows the die micro graph of DFE and WCDMA/HSDPA signal processing (ref[1]). we can see DFE and Turbo decoder consume more than 50% of the total silicon area (and also power) in broadband receivers.4 shows the die micro graph of HSDPA Turbo Decoder (ref[2]). Nowadays people require high data rates connections and the transmission rate of wireless systems is fast growing.ref[2]). and the right side of the figure1. 1. DFE is based on the concept of software defined radio. The left side of figure 1. A good coding method can improve BER performance and throughput significantly in digital communication systems. and it is the key component of soft defined radios. WiMAX and LTE. and 4G standard will support up to 100Mb/s. Software defined radio is widely used in wireless interface technologies. 3G standards can support up to 2 Mb/s. Table 1.Coding plays a very important role in digital communication. From ref[1] and ref[2].4 Die Micrograph of DFE and Turbo Decoder (ref[1]. Turbo codes are used in a lot of standards like UMTS. It is always a great challenge to design and implement them at low costs with 3 .1 Physical Characteristics (ref[1]). and it can reach a high throughput.2 Motivation Figure 1.
Trade off between implement speed and error performance is another important issue in this part.4 Thesis Organization The thesis contains three parts: DFE for Part Ⅰand Turbo decoder for part Ⅱ. and loglikelihood rate. forward metrics. the implementation of DFE and Turbo decoder has been investigated by looking at several emerged wireless broadband standards. Chapter 7 contains implementation of turbo decoder in hardware model Part Ⅲ is the conclusion and future work for both Part I and Part II which is given in chapter8. Chapter 8 gives the conclusion and a discussion on future work. 4 . and compare different methods of DFE designs. 6 and 7 Chapter 4 is the back ground about channel code Chapter 5 explains 3GPP LTE turbo codes. The turbo decoder part will discuss 3GPP LTE turbo decoder. For the DFE part we will include the specification for different standards. And give some details about branch metrics.3 Goal The goal of our thesis project is to research the DFE and turbo decoder.11n filter. 1. conclusion and future work for Part Ⅲ. reverse state metric. Finally we implement the 802. Chapter 3 presents hardware implementation for 802. 5. Then a suitable algorithm for our hardware implementation will be selected. 1. Discuss the Maximum Aposteriori Probability (MAP) decoding algorithm. Maltab models will be built for hardware implementation and used in the coming simulations. which is including chapter 2 and chapter 3 Chapter 2 introduces Soft defined radio and DFE. Part Ⅰis about DFE.sufficient performance.11n standard filter component which is included in DFE hardware designing. and SISO decoder Chapter 6 shows algorithm level design.11n DDC Part Ⅱis turbo decoder. In this thesis. which is including chapter4. and discusses design method. and implement 802.
we need spectral mask for each DFE. Figure 2.Part Ⅰ 2 Digital Front–end (DFE) 2. Due to different sample rates and channelization. Component dB anntenna RF amplifer Mixer Band pass filter IF amplifier AM demodulator Audio amplifier ADC 3 dB 6 dB 3 dB 30 dB 10 dB 4 dB 6 dB 10 dB 5 . and receiver SNR needs 126 dB.1 Digital Receiver Structure with Supported SNR.1 SNR supports for each component. DFE function is mainly used to convert sample rate and channelization. To design DFE.1 Introduction DFE acts as a bridge between baseband and ADC.1 shows the structure of wireless system with RF and antenna. Table 2. transmitter SNR normally needs 120 dB. Figure 2. For the whole wireless system.
6 . it is a dynamic value in fixed range. And sometimes it needs a fractional sample rate converter.2 DDC Structure: DFE in the receiver chain is called as digital down converter (DDC). In the receiver part. The sample conversion is a decimation filter.2. Channelization is used to select channel of interest. The received analog signal can be converted to baseband digital signal according to different standards. Since the signal power in the input side of the filter cascade is dominated by the interference. DDC contains automatic gain control (AGC) and filter. DDC ADC Figure 2. and depends on its spectral mask. Automatic Gain Control: AGC is used to control the output signal at a desired power level.2 A DDC Structure AGC filter Filter: Filter processes channelization and sample rate conversion (SRC). But receiver filter requires higher spectral mask than transmitter. Generally the AGC is a classical feedback structure.
2 lists the total number of subcarriers and the number of used subcarriers for the various zone types for OFDMA.823 3 0.45 22. IEEE Std 802. it needs 23 dB more.2 Summary of subcarrier parameter for Different OFMDA zone Type (ref[3]) PUSC FUSC Optiomal FUSC 2048 1681 184.834 0 0.86 43.17 2 1024 852 512 427 128 107 2048 1729 160. In ref [3] channel bandwidth is defined as Bu = Fs ( N used / N FFT ) Eq.18 3 1024 841 512 421 128 85 2048 1703 173.2.39 10.851 6 Choose 512pt FFT as an example.1 Spectral Mask of WiMAX(802.8442 0.8315 0.3.162004.21 87.835 9 0. IEEE Std 802.91 46. 7 .10 80.right) Ratio of used Subcarrier s to total subcarriers 92. For error vector magnitude (EVM) which is require for correcting operation in baseband. table 2.162004/Cor1 2005(ref[6].8208 0.42 11.16e2005.2. and assume sample frequency is 92.9 0. Table 2.844 7 0.821 2 0.16 MHz.1 Stopband frequency can be defined as BW.664 0 0.831 1 0.3 DFE Specification for Different Standards 2.79 40.16e): WiMAX DDC: The whole system need for WiMAX SNR is 126 dB.845 7 0.Bu / 2 BW Bu is the bandwidth is the useful bandwidth is the baseband sampling rate is used subcarrier number is FFT size Fs N used N FFT Compile from IEEE Std 802.ref[7]).15 9 1024 865 512 433 128 109 Zone type Total subcarriers Used Subcarrier s Guard Subcarrier s (left.
16MH 4 MHz 1. Finally we can have passband and stopband.2695MH 0.5MHz 92.6320MH 0.2 MHz 4. the sample rate is 3. From the Xilinx WCDMA DFE reference design ref [6].02dB 77dB z z z 5.02 dB 77dB z z z 7.3680MH 2. and the Stopband Fstop = BW.0MHz bandwidth Bandwidths Input output Fpass Fstop Apass Astop sample sample rate rate 3.Bu /2.16MH 11.015dB 77dB z z z 2. Fpass= Fs(1+ α )/4 (because the bandwidth is symmetric) Fstop= BW – Fpass 3GPP LTE receiver：127 dB （3+6+3+30+10+4+6+10）+23dB = 78dB The passband Fpass= Fs(1+ α )/4. We use this method to calculate the Fpass and Fstop. In 3GPP LTE standard.1: Bu = Fs ( N used / N FFT ) The passband Fpass= Bu /2.2 Spectral Mask of 3GPP LTE: The passband and stopband calculating methods used in LTE and11n are different from WiMAX.0 MHz 3.015dB 77dB z z z 10.3789MH 3.16MH 5. The table 2.0MHz 92.84MHz.7305MH 5. 7.0MHz 92.22.0MHz. we also have this root raised cosine (RRC) filter roll off α =0.WiMAX DDC specification： 126dB（3+6+3+30+10+4+6+10）+23dB= 77dB According to the equation 2.8086MH 0.16MH 8. According to the 3GPP LTE standard we can know that: Fp = Fchip(1+ α )/2 α is the roll off parameter which specified in 3GPP.6914MH 1.6211MH 0.0MHz 92.0MHz. 10.3.3 lists the spectral mask for 3. 5. and Stopband Fstop = BW – Fpass 8 .6 MHz 2.5MHz.
9728 0.02dB z MHz MHz 5 MHz 92.6848 5.3152 0.16MH 3.015dB z MHz MHz MHz 15 MHz 92.The table 2.1712 1.015dB z MHz MHz MHz 20 MHz 92.6576 0.0272 7.72 9.04 7.3424 2.02dB z MHz MHz 10 MHz 92.5856 0.16MH 15.92 MHz 0.16MH 1.8288 0.015dB z MHz MHz MHz Astop 78dB 78dB 78dB 78dB 78dB 78dB 9 .4 lists the spectral mask for 3GPP LTE transmitter: Bandwidt Input output Fpass Fstop Apass hs sample sample rate rate 1.84 MHz 1.4MHz 92.02dB z MHz MHz 3 MHz 92.16MH 23.6304 0.68 MHz 2.36 4.8144 0.16MH 7.3696 10.16MH 30.
3 Transmit Spectral Masks for 20 MHz Channel (ref[7]) In the absence of other regulatory restrictions. the transmitted spectrum shall have a 0 dBr (dB relative to the maximum spectral density of the signal) bandwidth not exceeding 18 MHz. as shown in Figure 2. 28 dBr at40 MHz offset and –45 dBr at 60 MHz frequency offset and above (ref [7]). The transmitted spectral density of the transmitted signal shall fall within the spectral mask. as shown in Figure 2. Figure 2.11n: When transmitting in a 20 MHz channel. The transmitted spectral density of the transmitted signal shall fall within the spectral mask. –28 dBr at 20 MHz frequency offset and –45 dBr at 30 MHz frequency offset and above (ref [7]) . –20 dBr at 21 MHz frequency offset.3 Spectral Mask of 802. the transmitted spectrum shall have a 0 dBr bandwidth not exceeding 38 MHz.3. when transmitting in a 40 MHz channel.3 (Transmit spectral mask for 20 MHz transmission).4 (Transmit spectral mask for a 40 MHz channel). Figure 2.2. –20 dBr at 11 MHz frequency offset. 10 .4 Transmit Spectral Masks for A 40 MHz Channel (ref[7]) The transmit spectral mask for 20 MHz transmission in upper or lower 20 MHz channels of a 40 MHz is the same mask as that used for the 40 MHz channel.
6 shows the architecture of a CIC filter Z −1 Z −1 ↓R Z −1 − Z −1 − − Figure 2.2. The elements of CIC filter are just adder and register.7 −1 1− z Through changing delay M and R in the comb stage.6 Architecture for CIC Filter The transfer function of Nth order CIC can be described by 1 − z − RM H CIC ( z ) = Eq 2. Ref[3] ref[8] ref[9] design methods can be referred in our case. The basic idea is using Cascaded Integrator Comb (CIC) filter. no multiplier.1 CIC The design methods of ref[8].5 CIC Solution Structure Cascaded Integrator Comb (CIC) filter: CIC filter is very efficient to decimate by an integer factor. This part is very flexible so that there are lot choices to reach the same spectral mask. Normally it cannot be implemented in one filter. 2. we can adjust the decimation factor and reconfigure the CIC filter. The difference between ref[8] and ref[9] is whether the WDF is used or not.ref[9] is similar. Figure 2.4 Filter Architectures: Filter component is a very complex part. CIC filter structure just uses adders and delays elements. WDF CIC WDF Allpass compensation FIR channel FIR FRSC Figure 2. but cascaded a group of filters.4. So N 11 . with different ways to compensate the CIC droop.
8 Structure for One Order Lattice Adaptor 12 .7 to present. But the CIC filter always has passband droop problem.7 is the structure for adapter. α 0 ) = z − α0 Figure 2.7 Adapter for WDF Filter First order lattice adaptor is connecting the out2 and in2 with inserted delay element.8 H A1 ( z . The transfer function can be defined following: 1 − α0 z Eq2. WDF filter is built of N adaptors distributed over two branches.it can reduce the area and power consumptions. The figure 2. Figure 2.7 is the logic structure. It needs other filters to compensate CIC filter. It has two inputs and two outputs. It can achieve high stopband attenuation with fewer taps. Usually adapter can be use left of the figure 2. Wave Digital Filter (WDF) WDE is infinite impulse response (IIR) filter. and right of the figure 2.
This all pass filter can be implemented by a first order and a second order lattice adaptor. And transfer function can be defined as: H A 2 ( z. α 2 ) H A 2 ( z.The second order lattice adaptor structure is as the figure 2. H 0 ( z ) = H A1 ( z. It serially connects two adaptors. α 4 ) = 1 + (α 4 − 1)α 3 z − α 4 z 2 −α 4 + (α 4 − 1)α 3 z + z 2 Eq2. α 6 ) 1 HWDF 7 ( z ) = ( H 0 ( z ) + H1 ( z )) Eq 2.9 Second Order Lattice Adaptor Structure The transfer function of 7th order WDF sums of two branches built of first order and a second order lattice adapter for H 0 and two second adaptors for H1 . α 3 .9 Figure 2. The transfer function can be defined by H AP ( z ) = H1 ( z ) H 2 ( z ) 13 . α 3 . α 4 ) H1 ( z ) = H A2 ( z . α 5 .9. α 0 ) H A2 ( z . α1 .10 2 Allpass Filter: Allpass section is used for correcting group delay ripple which is introduced by WDF.
Also it can be used to reduce passband ripple by the CIC droop.10. halfband has advantage passband ripple. There is a group delay problem when using WDF and it need allpass filter to correct.4. Figure 2. This filter can be also used as the channel filter. Disadvantage: WDF filter is IIR filter.10 Halfband Solution Structure Halfband Filter: Halfband filter is a decimation filter in DDC. 14 . so that compensation FIR can use less taps when WDF is used. and stopband attenuation. Fractional Sample Rate Conversion： ： If the ADC or DAC is not just the integer times of the baseband.2 Halfband Ref[1] uses halfband filter instead of CIC filter. and it has pole as the eq2. the WDF and the FRSC in the passband. Advantage and Disadvantage of Using WDF: Advantage: WDF filter can support good stopband attenuation with fewer taps. 2. Usually. In this case fractional filter will be introduced. so there is not compensation filter any more. In this thesis. Fractional Sample Rate Conversion: It is the same as the first and second method.Finite Impulse Responses Filter (FIR): FIR filter is a pulse filter according to each communication standards specification. CIC + compensation without WDF will be used in the following discussion. It maybe cause unstable. duo to the ADC or DAC is maybe no just the integer times of the baseband. fractional sample rate conversion filter can be used.
11 GSM CIC and FIR Compensation FIR Structure And the spectrum mask for the GSM can be illustrated in the Figure2. so that one or two halfband filter can complete the decimation. usually if the ADC sample rate is more than 32 times of the baseband sample rate.333 MHz Output sample rate =270. The performance comparison is between CIC+ compensation FIR solutions with halfband solution.Finite Impulse Responses Filter (FIR): FIR filter is channel filter and the pulse filtering according to each communication standards specification.3.3 Performance Comparison: CIC solution we just consider CIC + compensation FIR here.4.1 GSM DDC Halfband solution is based on halfband filter. 2. GSM DDC: Input sample rate = 69. Sometimes. CIC solution can be used. 2.12 GSM specifications (From ref[10]) 15 .12 Figure 2.833 KHz Figure 2. the ADC sample rate is just 4 or 8 times of the baseband.4. Compared ref [1] with ref [9].
Fpass = 80kHz. we can use the following command: axis([0 . To see the CIC droop. Fstop = 293e3.For all the filter: Input word length =12 Input fractional length =11 Output word length =12 Coefficient word length =18 Table 2.01.0833e6.333e6. Fstop = 100e3.1 0. Fpass = 80e3.5 CIC and FIR compensation specification: Filter specification CIC filter decimation factor D = 64 differential delay D_delay=1 Fs_in = 69. Figure 2.13 CIC Filter Simulation Results. FIR compensate Fs = 1. Fstop = 293e3. FIR channel N = 62. 16 .8 0]).666kHz. Astop = 60. Apass = 0. Fs = 541. Aslope = 60.
1 MHz.8 0. We can use the following command: axis([0 . Zoom in. 17 . and see 0 to 0.Figure 2.8]).14 Zoom in CIC Filter Simulation Results Figure 2.1 0.15 CIC Filter with FIR Compensate Filter Simulation Results.
Figure 2. it needs 7 half band filters and one FIR filter as the channel filter. so that the halfband filter need decimate 128 times. The final channel filter can work as decimation filter. 1th HB 2th HB 3th HB 4th HB 5th HB 6th HB 7th HB Channel Figure 2. Figure 2. In total. The figure 2.6 CIC solution Complexity analysis: Filter Number of adders CIC 10 FIR compensation 20 FIR channel 62 Cascade three filter 92 Number of multipliers 0 21 63 84 We use halfband solution instead of the CIC. And one halfband filter can decimate 2 times. FIR compensation solution.17 Final Solution Simulation Results Table 2.18 illustrates the cascade structure filters.16 Zoom in CIC Filter with FIR Compensate Filter Simulation Results.18 Halfband Solution for WiMAX DDC 18 .
0067 MHz Astop = 100 dB 6th halfband Fs =2.5067 MHz Astop = 110 dB 4th halfband Fs = 8.3817 MHz Astop = 90 dB Channel FIR Fs = 541.5865 MHz Transition Width (TW) = 34.6667MHz Fpass =80kHz Fstop=Fs/2Fpass= 4.6665 MHz Fpass =80kHz Fstop=Fs/2Fpass= 17.3333MHz Fpass =80kHz Fstop=Fs/2Fpass= 8.1733 MHz Astop = 115 dB 3th halfband Fs = 17.333MHz Fpass =80kHz Fstop=Fs/2Fpass= 34.0867MHz Transition width = 2.9234 MHz Astop = 95 dB 7th halfband Fs = 1.666 kHz.0834 MHz Fpass = 80 kHz Fstop=Fs/2Fpass= 0.5867 MHz Transition width = 8.5065 MHz Astop = 140 dB 2th halfband Fs = 34. 19 .For all the filter: Input word length =12 Input fractional length =11 Output word length =12 Coefficient word length =18 Table 2.2533 MHz Transition width = 17.7 Halfband solution complexity analysis: Filter specification 1th halfband Fs =69.3334MHz Fpass =80kHz Fstop=Fs/2Fpass= 2. Fpass = 80e3.0034 MHz Transition width = 0.1734 MHz Astop = 105 dB 5th halfband Fs =4.1667MHz Fpass =80kHz Fstop=Fs/2Fpass= 1.2534 MHz Transition width = 4.4617 MHz Transition width = 0.
25 shows the simulation results Figure 2. In this case.06 d B The following Figure 2.8 Halfband solution complexity analysis:. Table 2.19 CIC Solution Simulation Results.Fstop = 100e3. Filter adder 1th halfband 8 2th halfband 6 3th halfband 6 4th halfband 6 5th halfband 6 6th halfband 6 7th halfband 8 channel FIR 76 totally 122 multiplexer 9 7 7 7 7 7 9 77 130 To reach the same GSM specification: CIC solution needs 92 adders and 84 multipliers Halfband solution needs 122 adders and 130 multipliers. CIC solution is a better solution than halfband solution. Astop =55 dB Apass = 0. CIC solution can save 30 adders and 46 multipliers. 20 . Compare with two solutions.
9282 MHz Astop=84 dB Fs = 22.2 WiMAX DDC WiMAX 10 MHz bandwidth Halfband design: Figure 2.3.20 Halfband Solution for WiMAX DDC Table 2.7359 MHz Fstop = 17.02 MHz Astop = 83 MHz Table 2.9 Specification for each filter Filter First Halfband Second Halfband FIR channel specification Fs =89.2641 MHz Apass = 0.4 MHz Fpass = 4.7359 MHz Fstop = 40.8 MHz Fpass = 4.7359 MHz Fstop = 5.6 MHz Fpass = 4.4.2.6641MHz Tw =12.3282 MHz Astop = 102 dB Fs = 44.10 halfband solution Complexity analysis: Filter adder First Halfband 8 Second Halfband 8 FIR Channel 169 Total 185 multiplexer 9 9 170 188 21 .0641 MHz Tw = 35.
Fs = 22.01.2641 MHz Apass = 0.8e6. Apass = 0.12 CIC solution Complexity analysis: filter adder CIC 10 FIR compensation 52 FIR channel 169 totally 231 multiplexer 1(normalize) 53 170 224 22 . Astop = 60.11 specification of CIC solution Filter CIC FIR compensation FIR specification Fs=89.02 MHz Astop = 83 MHz Table 2.7359e6.4 MHz Fpass = 4. Fstop = 8e6. Fpass = 4.6MHz D=2 Fs = 44.Figure 2. Aslope = 30.7359 MHz Fstop = 5.21 Halfband Solution for WiMAX Simulation Result: Table 2.
Compare this two solution; halfband solution use 185 adders and 188 multipliers CIC solution use 231 adders and 224 multipliers.
Figure 2.22 CIC Solution Simulation Results From examples, we can have a conclusion that there is no unified structure for different structures and different standards. CIC solution can be used to achieve a better performance if the decimation factor is very large and the passband is small. Otherwise halfband solution will be selected.
23
3 Hardware Implementation of 802.11n DDC
Hardware design: Finally, 802.11n 20 MHz channel for FPGA prototyping Specification for 802.11n: Clock frequency = 200MHz Fpass = 9MHZ Fstop = 11MHZ Apass = 0.2 dB Astop = 40dB Input = 40 MHz output 20 MHz For this case, just a channel FIR filter needed. Filter design: This filter can use filterbuilder tool in matlab. Simulation result:
Figure 3.1 802.11n DDC Simulation Results. To implement hardware design, it need fixed point input data, and coefficient. Because ADC is 12 bits, input data is 12 bits. To define coefficient length, the table 2.15 shows simulation results for the different coefficient length.
24
Table 2.15 filter parameter of different coefficient
coefficient length 12 parameter Fs=40 MHz Fpass= 9 MHz Fstop= 11MHz Apass=0.03751 dB Astop=39.7157 dB 14 Fs=40 MHz Fpass= 9 MHz Fstop= 11MHz Apass=0.02396 dB Astop=40.2114 dB 16 Fs=40 MHz Fpass= 9 MHz Fstop= 11MHz Apass=0.020177 dB Astop=40.2945 dB 18 Fs=40 MHz Fpass= 9 MHz Fstop= 11MHz Apass=0.019417 dB Astop=40.3144 dB
Finally, 18 bits coefficient can be used in this case. FPGA clock frequency (200 MHz) is integer of the input sample rate. 200MHz/40MHz =5. It means 5 clocks input one data. The output sample rate is 20 MHz, so 10 clock output one data. Usually a FIR filter transfer function can be defined as H ( z ) = h0 z 0 + h1 z −1 + h2 z −2 + K + hn −1 z − ( n −1) + hn z − n Eq.2.12 For a linear FIR filter transfer function, the coefficient is symmetric. Take a 10 taps linear FIR filter for example H ( z ) = h0 z 0 + h1 z −1 + h2 z −2 + h3 z −3 + h4 z −4 + h5 z −5 + h6 z −6 + h7 z −7 + h8 z −8 + h9 z −9 Eq2.13 The coefficient is symmetric: h0 = h9 , h1 = h8 , h2 = h7 , h3 = h6 , h4 = h5 The transfer function can be written as: H ( z ) = h0 ( z 0 + z −9 ) + h1 ( z −1 + z −8 ) + h2 ( z −2 + z −7 ) + h3 ( z −3 + z −6 ) + h4 ( z −4 + z −5 ) Eq2.14 This function is called preadder; with this way it can reduce half of the multipliers. In our design, filter length is 52 taps. After symmetry, the polyphase needs 26 taps. Actually, the input is 5 clocks, so that the MAC time need reduced to less than 5 clocks. For this reason, parallel MAC can be used.
Table 2.16 compare MAC and maximum cycle Number of MAC Maximum cycle need for each MAC 2 13 3 9 4 7 5 6 6 5 7 4 8 4 9 3 10 2
25
From the table 2.15, we can find 7 Macs can satisfy the requirement for less than 4.
Figure 3.2 Hardware Architecture The RAM works as module. The input data is written from top to bottom, if reach bottom, it will be written from top to down. 7 ARMs need read 52 data from ram. Each ARM can read two data, and give to preadder. Then MAC accumulates the value from preadder multiplying its coefficient. The DDC filter transfer function can be defined as:
H(z)= h1(x(1)+x(52))+h2(x(2)+x(51)+h3(x(3)+x(50))+ h4(x(4)+x(49))+ h5(x(5)+x(48))+h6(x(6)+x(47))+ h7(x(7)+x(46))+h8(x(8)+x(45))+h9(x(9)+x(44))+h10(x(10)+x(43))+h11(x(11)+x(42))+h12(x(2)+x( 41))+ h13(x(13)+x(40))+h14(x(14)+x(39))+ h15(x(15)+x(38))+h16(x(16)+x(37))+h17(x(17) +x(36))+ h18(x(18)+x(35))+h19(x(1)+x(34))+ h20(x(10)+x(33))+h21(x(21)+x(32))+h22(x(22) +x(31))+ h23(x(23)+x(30))+h24(x(24)+x(29))+ h25(x(25)+x(28))+ h26(x(26)+x(27)) Eq2.15
Appendix A shows the coefficient value and its corresponding ram value To make sure if the filter works, the first easy way is to give an impulse, and compare its output with the coefficient. It should be the same. If the output is the same as the coefficient, a sine wave can be use in this case. Duo to lowpass filter, the output should the same as the input if frequency of sine wave is in the bandwidth. Then it can realize FPGA prototyping. Because this filter is implemented by Xilinx system generator, a good prototyping method JTAG hardware cosimulation can be introduced.
26
4 Generated Hardware Model 27 . Figure 3. if the design needs frequency spectrum analysis. The system generator will generate the HDL code and invoke the ISE Foundation software to generate the bitstream file automatically. It does not need frequency sweeper. a software DDC filter is designed for the comparison. Hardware cosimulation can be divided to 3 steps: Step 1: To know if the hardware works. specially. Step 2: generate hardware model. Figure 3. DAFIR component is the same function with the Hardware DDC filter.This method can be use to test FPGA using Matlab.3 Software Model for Filter The component DAFIR v9_0 can get the DDC filter coefficient from Matlab workspace.
Perform JTAG Cosimulation.5.Step 3: Connect Hardware and software model. Figure 3.5 Software and Hardware Model Figure 3.6 Software Simulation 28 . Add the hardware model and connect it according the Figure 3.
29 .7 Hardware Simulation Comparing two simulation results.Figure 3. we can see that the software and hardware implementations have the same spectral mask.
4.m) k is the input bits n is the output bits m is the number of memory registers The inputs of the memory registers are the information bits.k) block encoder transforms a message of k bits into a message of n bits which is called codeword. Codeword depends only on the current input message.k. In an (n. A linear systematic block size code. there are 2k distinct message and also codeword. and the output is code block. such as Hamming code. Normally the other codes are derived from these two main categories.Part Ⅱ 4 Channel Codes There are two main categories of channel coding techniques which are linear block codes and convolution codes. satellite communication etc. At the time l. It is best performance before turbo codes and LDPC code.1 shows a rate 1/2 nonsystematic and non recursive convolutional encoder. 4. is often used in linear block codes. and this is the important feature of a block code. k) block code.1 Linear Block Codes An (n. Figure 4. and can be defined as the reduction in SNR over an uncoded system to achieve the same Bit Error Rate (BER) and Frame Error Rate (FER). it is possible to compute the corrected errors in each block. Convolutional codes can be defined as: (n. mobile phone. The important feature of the linear block codes is that the message itself is part of the codeword. Convolutional codes are first introduced by Elias ref[11]. The memory registers work as shift registers. For a block codes. Coding gain is used to measure the strength of an errorcontrol code. LDPC is also a category of the linear block codes. Amount of the redundancy can be determined by the code rate R =k/n. Output encoded bits are obtained from modulo2 addition of input information bits and the value of the memory register.2 Convolutional Codes Convolutional codes are widely used in digital radio. 30 . the input to encoder is cl . and deepened by Forney ref[12]. BCH code. which are including serial concatenated codes and turbo codes.
vl = (vl(1) vl(2) ) v (1) C D D v (2) Figure 4.1 A Rate 1/2 Convolutional Encoder The connections can be described by generator polynomials g1 ( D ) = 1 + D 2 g2 ( D) = 1 + D + D 2 In other words: g ( D ) = [ g1 ( D) g 2 ( D )] = [1 + D 2 1 + D + D 2 ] v (2) 31 .
5. and provide near optimal performance approaching the Shannon limit ref[16].1 Turbo Encoder: A turbo encoder structure consists of two RSC encoders which are Encoder1 and Encoder2.1 1/3 Turbo Encoder Schematic 32 . This structure is called parallel concatenated. are first proposed by Forney ref[15] as a method to get a good tradeoff between gain and complexity. A 1/3 turbo encoder diagram is shown in figure 5. S0 P0 input Encoder1 Interleaver Encoder2 P2 Figure 5. Turbo codes can be considered as a refinement of the concatenated encoding structure adds iterative algorithms for decoding. The second output is the first parity bit P0. Encoder2 received interleaved input and generate the second parity bit P1. and also increase the minimum distance of turbo code ref[17].1. Turbo decodes works as connecting two convolution codes and separating them by an interleaver. The main difference between turbo codes and serial concatenated codes is that two identical Recursive Systematic Convolutional (RSC) codes are connected in parallel in turbo codes. The two encoders can operate on the same time. Gavieux and Thitimajshima. Turbo codes has been first introduced in 1993 By Berrou. Encoder1 and Encoder2 are separated by a random interleaver. The main purpose of interleave is to randomize burst error pattern so that it can be corrected by decoding. The N bit data block is first encoded by Encoder1. The same data block is interleaved and encoded by Encoder2. The first output S0 is equal to the input since the encoder is systematic.5 Turbo Codes Concatenated coding are known as Parallel Concatenated Convolution Codes (PCCC) ref[13] and serial concatenated Convolutional codes ref[14] .
K is the code block size from 40 to 6144 bits. d k(2) ′ ( d k(0) = x k . K − 1 (ref[18]).2 g1(D) = 1 + D + D3. Otherwise when the first constituent encoder is disabled. zK . xK . the last three tail bits can be used to terminate the second constituent encoder. Tail bits are padded after the encoding of information bits.2. 1 g2 ( D) Where g0(D) = 1 + D2 + D3.5. z K +1 . and this is called trellis termination.1 After all the information bits are encoded.. x K +1 . xK + 2 . zK . Eq 5.1. z K + 2 . z K + 2 . the first three tail bits can be used to terminate the first constituent encoder.. x K + 2 . d k(1) = z k .3 The initial value of the shift registers of the 8state constituent encoders shall be all zeros when starting to encode the input bits. we take the tail bits from the shit register feedback.. z K +1 . Upper switch in the figure 5.2 shows in low position.2 shows in low position.2 3GPP LTE Turbo Encoder 3GPP LTE encoder includes two 8state constituent encoders and one turbo code internal interleaver. Eq 5. The coding rate of turbo encoder is 1/3. d k(2) = z k ) for k = 0. d k(1) . The transfer function of the 8state constituent code for turbo encoder is: g ( D) G ( D ) = 1. Eq 5. 33 . Lower switch in the figure5. The output from the turbo encoder is d k(0) . When the second constituent encoder is disabled. xK +1 .. The final trellis termination of output bits should be: ′ ′ ′ ′ ′ ′ xK .
so that only one slice of the trellis is needed to define the entire trellis. We can use trellis diagram to present the encoder behaviour. At the beginning three shift registers are all zeros. After the fourth slice of the trellis. The state goes to the next state and gets corresponding parity bit. For example. When each new input is coming. one parity bit will be generated. the trellis diagram repeats them. These parity bits depend not only the on the present input bit. k=2 state=100. 34 .Figure 5. Depends on the input is 0 or 1. but also the three previous input bits.2 The Turbo Encoder of 3GPP LTE (ref[18]) The 8 states constituent encoder contains 3 registers. The initial state starts with state=000. the transition line is labelled with input and output value. The input bits of the constituent encoder are given to the left register. k is the number of the input data. In the trellis diagram. which store in the shift registers. the upper transition line labelled 0/1 stands for input =0 and output =1.
35 .3 Trellis Diagram of One Constituent Encoder.state = 000 state = 001 state = 010 state = 011 state = 100 state = 101 state = 110 state = 111 0 1 2 3 4 5 6 7 k= 1 0/ 0 0 1 2 3 4 5 6 7 k= 2 0/ 0 0 1 2 0/ 0 1/ 1 0/ 0 0 1/ 1 1/ 1 0 1 2 3 4 5 6 7 k= 5 0/ 0 0 1 2 3 4 5 6 7 k= 6 1/ 1 1/ 1 1/ 0 1 2 0/ 0 1 / 0 0/ 1 0/ 1 0/ 1 3 0/ 1 3 1/ 0 0/ 1 1/ 0 0/ 1 1/ 0 4 1/ 0 4 5 6 5 1/ 1 1/ 0 0/ 1 1/ 1 0/ 0 0/ 0 1/ 1 0/ 0 6 7 k= 3 7 k= 4 Figure 5.
The SISO algorithm of the two decoders will operate on soft input. In figure5. Deinterleaver λas1 λis1 λi p1 λes1 SISO Decoder1 interleaver λa s 2 SISO Decoder2 λe s 2 interleaver λi s 2 Λ(dk ) Deinterleaver λi p 2 Demapper 1 ˆ d 0 Figure 5. SISO decoder1 calculate the extrinsic systematic information ( λ e s1 ) and the softoutput information ( Λ (d k1 ) ) for each systematic bit received. systematic intrinsic information ( λ i s1 ). parity intrinsic information ( λ p p1 ). The extrinsic systematic information ( λ e s1 ) from SISO decoder1 is interleaved and fed to the SISO decoder2 as apriori information λ a s1 .5.4 two decoder blocks correspond to the two constituent decoders. The received signals are interfered by the channel noise.4 Turbo Decoder 36 . The same as SISO decoder1. The SISO decoder1 makes an estimate of the probability for each data bit. The SISO decoder is used to correct errors and retrieve the original message. SISO decoder2 calculate the extrinsic systematic information ( λ e s 2 ) and the softoutput information ( Λ ( d k 2 ) ) for each systematic bit received.3 SISO Decoder In this section the iterative decoding of turbo decoder will be described and the structure of SISO decoder is shown in figure 5.4. which is the demodulator outputs and the probability estimates. and also interleaved systematic intrinsic information of SISO decoder1 ( λ i s1 ) as the systematic intrinsic information of SISO decoder2 ( λi s 2 ). This schedule is called iteration and will be continued until some stopping condition is met. The three inputs of the SISO decoder1 are systematic apriori information ( λ a s1 ).
The MAP algorithm checks very possible path through the convolutional decoder trellis. Turbo decoder calculates an accurate aposterioriprobability for the received block. 6. so that it seems too complex for application in the most systems. The SOVA decoding algorithm is based on ML probabilities. They are MAP decoding and SOVA decoding. It is not widely used before the discovery of turbo codes.5dB or more (ref[19]. The MAP algorithm can output perform SOVA decoding by 0. Finally it will make a hard decision by guessing the largest APP for parity bits after all iterations finished. so we choose MAP algorithm in our thesis. The MAP algorithm ref[22]is rather complex according to large number of multiplications.1 MAP Decoding Algorithm The BahlCockeJelinekRaviv (BCJR) ref[17] is optimal for estimating the a posteriori probabilities abilities of the states and transitions of a Markov source observed through a discrete memoryless channel. LogMAP. And the soft output can be used in the next iteration. The MAP algorithm provides not only the estimated bit sequence. . and get more accurate values. Both of the algorithms use iterative technique to achieve decoding performance. They show how the algorithm could be used for both block codes and convolutional codes. but also the probabilities for each bit which is has been decoded correctly.6 Algorithm Level Design for Turbo Decoder There are two main algorithms in the component of the SISO decoders. 37 .ref[20]). Robertson proposed a simplified MAP algorithm. P. In the LogMAP algorithm all the calculation performs in log domain. The MAP decoding algorithm is based on a posteriori probabilities abilities (APP) probabilities.
m ) = In∑ eα 1 j =0 b ( j . ⋅⋅⋅.m = j • (λ aks + λ iks ) + p (m. ⋅⋅⋅. γ branch value and extrinsic information. m ) k +1 m 0.1 ( .m β kf+(1j . β value. but it can be corrected by the correction function.m ) . α value. But it is can be solved by Jacobean algorithm [20] In(e x1 + ⋅⋅⋅ + e xn ) = max( x1 .m) γ kj . b(j.m) is the next state branch metric if the state is m and received bit is j at the trellis step k Systematic intrinsic information of trellis step k Systematic apriori information of trellis step k parity intrinsic information of trellis step k parity bit if the current state is m and systematic bit is j The above equations are high complexity including log and multiplications.m f ( j .m ) 1 β f (0.m) f(j. xn ) + f ( x1.m λ iks λ aks λikp p(m.2 Log MAP The following equations show MAP algorithm which is used to calculate the soft output.j) if input is j and next state is m. j ) • λ ikp Eq 6. α km = ∑ α kb−1j .m k +γ k + β k +1 Eq 6.2 β km = ∑ γ kj . ⋅⋅⋅.m ) m 0.m) is the current state.6 f ( x1. xn ) Eq 6. 38 .5 λ e s (d k ) extrinsic softoutput information of d k α km forward recursion state metric of state m in the trellis step k m βk backward recursion state metric of state m in the trellis step k b(j.m )γ kj−b ( j . if input is j and current state is m .m ) k +γ k + β k +1 ∑ eα m f ( 0.m ) k + β k +1 Eq 6. m ) = In∑ eγ j =0 j =0 1 j . +γ kj−b ( j .3 γ kj . Λ(d k ∑α )= ∑α m m 1 j =0 1 m 1. This correction function can be implemented by lookup table (LUT). m k k γ β kf+(1.f(j.m ) k −1 1 Eq 6.m f (1. m k k γ = In 1 ∑ eα m m 1. xn ) is the correction function. The maximizations values have some errors.4 λ ek = Λ (d k ) − (λ aks + λ iks ) Eq 6.6.
It offers worse BER performance compared with logMAP. In(e x1 + ⋅⋅⋅ + e xn ) ≈ max( x1 . low complexity architecture takes some disadvantage. To see how good a match between the pair of receptions and the codebit meaning of the trellis transition. Usually. If choose MAXlogMAP. xn ) + + + + + + + LUT MaxlogMAP Figure 6. branch metrics can give a function in this case for each trellis transition.j).1 Two Hardware Architecture logMAP The MaxlogMAP algorithm includes the same parameter. 6. logMAP need more than one adder and lookuptable (LUT).3. but less complexity structure. The figure 6. 39 .1 is the basic element addcompareselect (ACS) hardware architecture. ⋅⋅⋅. j ) • λ ikp Eq 6. it can reduce more than 1/3 hardware.1 Gama Start with the equation: γ kj .6. λikp is the corresponding noisy received parity bit with code parity bit p(m. Ref[23] shows a loss due to quantization of the correction function is not visible.7 λ aks is the noisy received systematic bit and with code systematic bit j.m = j • (λ aks + λ iks ) + p (m. Compared with two architectures.3 MaxlogMAP The MaxlogMAP is least complex than logMAP algorithm.
m needs sixteen value the same as the table 21.5 1.7 γk λ aks + λ iks + λ ikp From the table we can find actually.m )γ kj−b ( j .6 1.1 γk λ aks + λ iks + λ ikp λikp λ aks + λ iks γ k0.3 γk λikp λ aks + λ iks γ k0.3 1. λ aks + λ iks + λ ikp in gama memory. 40 .5 γk λikp λ aks + λ iks 0 γ k0.m just has four kinds of values: 0.8 γ kj . γ kj . λikp . and it means gama unity just need three adders. and λ aks + λ iks + λ ikp .7 1. beta. j ) • λ ikp 0 λ aks + λ iks + λ ikp 0 γ k0. to calculate alpha ( . λikp is the same as the input value λikp .4 γk λikp λ aks + λ iks γ k0.0 1. λ aks + λ iks λikp . But it just needs calculate two values λ aks + λ iks .6 γk λ aks + λ iks + λ ikp 0 γ k0. and LLR.0 γk γ kj . λ aks + λ iks .2 1.m γ k0.1 1.1 calculation of gama γ kj . For example. α km = ∑ α kb−1j .Table 6. It just need save three values: λ aks + λ iks . which is alpha.m ) 1 j =0 1 Eq 6. Gama will be used in three values calculation.4 1.m = j • (λ aks + λiks ) + p (m.2 γk γ k0.
j and p(j.m)=1 So Gama value can be defined as gama00. m ) = In∑ eγ j =0 j =0 1 1 j .m)=1) But we just need to save gama01.p(j. gama10. β km = ∑ γ kj .m)=0) gama11= λ a + λ i + λ i (j=1. 41 .p(j.p(j.m f ( j .m) just have four case j=0. but beta values are calculated backwards through the received soft input data.m ) k + β k +1 Eq 6.m)=1 j=1. p(j. gama00 =0 gama01= λikp gama10= λ aks + λ iks s k s k p k (j=0. gama01.m)=0) (j=0. gama11.m)=0 j=0. p(j.9 Beta calculation is similar with alpha. gama10.m β kf+(1j .m)=0 j=1. p(j. gama11. p(j.m)=1) (j=1. p(j.
2 Calculation of Alpha 42 . ⋅⋅⋅.6. α km = Max(α kb−1j .b(j.m ) + γ kj−b ( j . and extrinsic softoutput which is come from previous iteration.m ) 1 j =0 1 In the log domain. α km = ∑ α kb−1j .m) is the state going backwards in tine from state m. α km = ∑ α kb−1j .2 Alpha α km can be expressed as the summation of all possible transition probabilities from the time k1.10 According to the MaxlogMAP algorithm: In(e x1 + ⋅⋅⋅ + e xn ) ≈ max( x1 .m ) = In∑ eα 1 j =0 j =0 1 1 b ( j . xn ) We can have the alpha as: ( .m )γ kj−b ( j .3.m )γ kj−b ( j . parity. it can be expressed as: ( . m ) ) Eq 6. ( .m ) k −1 1 Eq 6.m ) . via the previous branch corresponding to the input j. +γ kj−b ( j . α k0−1 1 α k −1 000 001 010 011 100 101 110 111 1/1 0/0 000 001 010 011 100 101 110 111 α k0 1 αk α k2−1 α k3−1 α k4−1 α k5−1 α k6−1 α k7−1 α k2 α k3 α k4 α k5 α k6 α k7 Figure 6.11 1 j =0 1 Gama depends on the systematic. This has been described is in the gama unity.
6 Max(α k7−1 + γ k0. (1.0 Max(α k −1 + γ k0. α k6−1 + γ k −1 ) − 1. α k5−1 + γ k −1 ) −1 1.1 Max(α k0−1 + γ k0.−b (0.7 . α k −1 + γ k −1 ) −1 3 1. The value α k4 at instance k is 1 calculated by taking the largest value from α k0−1 and α k −1 . α k6−1 + γ k −1 ) −1 α k2 α k3 α k4 α k5 1 1. And each alpha value at instance k1 should add the corresponding gama value.m ) .2 is an example of how to calculate alpha. α k2−1 + γ k −1 ) − 1.Figure 6.51 .5 Max(α k4−1 + γ k0.0 .6 Max(α k −1 + γ k0.2 the following table is the detail which is used to calculate the α km α km α k0 1 αk b (0.11 . Table 6. α kb−1 m ) + γ k −1(1.7 Max(α k6−1 + γ k0. α k0−1 + γ k −1 ) − 1.31 .3 Max(α k2−1 + γ k0.4 . α k3−1 + γ k −1 ) −1 α k6 α k7 5 1.2 Max(α k −1 + γ k0.m ) ) 1 1 1.6 .b Max(α k −1 m ) + γ k0. α k7−1 + γ k −1 ) −1 43 .2 . 1.
44 .m ) . f (0.m ) = Max ( β kf (0. m ) ) j =0 1 β k0 β k1 β k2 β k3 β k4 β k5 β k6 β k7 000 001 010 011 100 1/0 101 110 111 0/1 000 001 010 011 100 101 110 111 β k0+1 1 β k +1 β k2+1 β k3+1 β k4+1 β k5+1 β k6+1 β k7+1 Figure 6. β km can be derived as: 1.3.m) is the next state.m ) + γ k0. β kf (1. and can be written as: β km = ∑ γ kj .m β kf+(1j .3 Beta Beta is reverse state metric.m ) j =0 1 f(j. but calculated in reversed order. and represents β km as the summation of all possible transition probabilities from time k+1.3 Calculation of Beta Beta calculation is similar with alpha. if input is j and current state is m Use MaxlogMAP algorithm. β km = ∑ γ kj . m ) + γ k f (1.6.m β kf+(1j .
5 .7 Max( β k3 + γ k0. Max( β kf (0. β k + γ k ) 1 1.3 . which is the detail to calculate the beta β km+1 β k0+1 β k1+1 β k2+1 β k3+1 1. β k3 + γ k ) 1.5 Max( β k + γ k0.4 .4 Max( β k0 + γ k0.6 .1 Max( β k5 + γ k0.m ) + γ k0. β k6 + γ k ) 1.3 Max( β k7 + γ k0. f (0.6 Max( β k2 + γ k0.1 .2 Max( β k6 + γ k0.m ) + γ k f (1.0 Max( β k4 + γ k0. β k0 + γ k ) 1 1.3 the following is the table. β k7 + γ k ) 45 .2 . β k5 + γ k ) β k4+1 β k5+1 1.7 . β k2 + γ k ) β k6+1 β k7+1 1.m ) .Table 6.0 .m ) ) 1. β kf (1. β k4 + γ k ) 1.
and β value can be obtained from γ . In ASIC design.m + β kf+(0. it will be easy to overflow. m =0 m=0 46 . the decoded information bits can be retrieved by looking at the sign bit of LLR.4 Loglikelihood Ratio Loglikelihood ratio (LLR) is used to find the output of the soft decision.3.6. If use LLR as apriori information. the loglikelihood ratio can be iterated several times to improve the reliability of hard decision. m k k γ β kf+(1. Λ(d k ∑α )= ∑α m m m 1. α value. The sign bit of the LLR is use to make a hard decision. and the magnitude can give a reliability estimate.m ) 1 m 0. β unit. extrinsic softoutput information is fixed point data. m ) ) Eq 6.15 1 1 7 7 The γ value. and subtraction.m ) k + γ k + β k +1 ∑ eα m f ( 0.m ) 1 β kf+(0. addition. If it is positive the bit is one.m f (1. The main operation of LLR is comparison. α . and if it is negative the bit is a zero. Extrinsic softoutput information When using turbo decodes. m ) ) − Max(α km + γ k0. m k k γ = In ∑ eα m m 1.m k + γ k + β k +1 1. Extrinsic softoutput information is the value fed back to the next decoder as the apriori information after interleaving. = Max(α km + γ k m + β kf+(1.m ) m 0. Extrinsic softoutput information is obtained from the loglikelihood ratio by subtracting the systematic information and the apriori information. After all iterations complete.
In forward calculation for each α metric is computed. so that the memory area is rather large.6. after all α values have been calculated. and for each Λ &β is computed from the end of the block to the beginning.4 is serial MAP (SMAP). and the latency time is the longest.4. the block size range is from 40 to 6144.4 Serial Window Schemes Figure 6.4 Windowing According to 3GPP LTE standard. And it also needs to save all the gama and alpha value. The latency time for this case is: NW + N s + N t NW window size Ns Nt sliding window size tail bits number 6. The serial sequence of computation for MAP decoding: Window size is block size.1 Serial Window Figure 6.4 shows the schemes. The figure 6. 47 . the backward and LLR start.
4. Alpha calculation are separated every parallelism windows. the sliding window has been introduced in this case. Forward calculation is not the same as the SMAP.3 Super Window To reach a high through turbo decoding and super window can be used in this case.5 Sliding Window Schemes 6. Figure 6.6. Figure 6.6 Super Window Schemes 48 .4. The basic idea is the sequence of forward calculations is the same as SMAP. but backward β calculation are separated every sliding window length v.2 Sliding Window In order to reduce the computational time and area.
9634 0. and parallel window is 8.4 throughput and latency time for three kinds of windows Kinds of window Throughput (Mbps) Latency time ns Serial window 149.0410 Sliding Window 266. and sliding window length is 40.0231 Super window 2272.0027 49 . Nb 1 Eq 6. clock frequency is 300 MHz.The following is a comparison for throughput of different kinds of windows.16 + N sl + N t • Ns Fclock Nb And throughput can use the equations: • Fclock Eq 6.17 Nb + N sl + N t Ns The latency can be use the equation: Nb Ns Nt block size SISO number tail bits number sliding window size clock frequency N sl Fclock Table 6.7 0.5510 0. In our thesis.
There are two wellknow technique to solve the initial value problem: Training calculation: producing initial value by training calculation as figure 6. and also cost a long latency time. If the block size is 6144. and using the serial window. When using super windowing.7 Next Iteration Initialization: Using previous iteration value for the corresponding state metric Training calculation: Usually. Using super windowing a memory size can be: 40*8*8*4= 10240 bits. Forward metric initial problem is just caused by parallelism. the backward recursion start calculates after forward recursion complete. start a few stages ahead before metric calculation. forward metric is divided into several windows. It can be 10 times the constraint length for realistic channel. The introduced sliding window can decrease the memory size and latency time. and also the latency and power consumption. a memory size should be: 6144*8*8=393216 bits.5 Sliding Window Implementing serial MAP window requires a very large memory to store the state metrics. the others just can give approximate value for each window. and backward metric. Backward metrics initial problem is caused not only parallelism but also sliding window. The super windowing brings some initial problems for forward metric. This method increases the computation.6. In the serial MAP window. Except the first window. 50 .
7 Sliding Window with Training Calculations Next Iteration Initialisation: Turbo decoder works as iteration and next iteration is more accurate than the previous iteration. So the previous iteration can be considered as the initial value for the next iteration. But the previous state metrics in the previous iterations are close to next state metrics in the next iteration at the same trellis. which can be used in the next iteration as the initial value.9 Next Iteration Initialization Sliding Window 51 .9. Figure 6. This needs extra memory to store those state metrics. The time scheme is showed in the figure 6.Figure 6.
One SISO unit can be used instead of two SISO decoders.4. Actually.1 Turbo Decoder Block Diagram 52 . To implement the figure5. for the first SISO decoder. The introduction for turbo decoder contains two SISO decoders for two half iterations. It always waits until another computation complete. The input buffer stores the received bits from the channel. the two SISO do not work at the same time. and the two SISO decoders are same function and structure. Both data memories are used to store the extrinsic value from SISO unit. For the second SISO decoder.1.7 Turbo Decoder Hardware Implementation and Simulation Results The Hardware architecture for turbo decoder is showed on figure7. SISO unit reads from Data memory 2 and writes to memory 1. SISO unit reads from Data memory 1 and writes to memory 2. Figure 7. including systematic information bits and parity information bits.
Data memory actually contains four small memory. For example: The maximum block size in LTE 3GPP is 6144. The size of Data Memory 1 and Data memory 2 do not change.2 Parallel Window Block Diagram 53 .1 Parallel Window To achieve high throughput. The block divided into several windows and each window alpha is executed serially. SISO unit need some memory to save the initializing alpha and beta value for beginning and end of window. and window number is four.7. and small memory size is 6144/4=2048. Figure 7. parallel window is used in turbo decoder.Parallel memory needs parallel interleaver. Due to Parallel window and sliding window. but change to several smaller memories.
LLR unit and extrinsic soft output unit will discuss and give details later.3 SISO Architecture 54 . γ unit . Input and output need working at the same time.7. α unit .3. Figure 7.2 SISO Architecture: All the components of the architecture are used in the figure 7. β unit. γ and α stack work as first in and last out. so it means two memories needed it works as pingpong buffer.
4 Time Schedule for SISO Unit 55 .7. Sliding window size =64 Figure 7.3 Time Schedule for SISO Unit: The following is an example for SISO decoder: Block size = 2016 SISO number =8.
gama01. γ k −1 α km = Max(α kb−1j .m ) + γ kj−b ( j . ( . Gama00 is always zeros. and gama01 is the same value as λik . m ) ) 1 j =0 1 β km = Max( β kf ( j . the basic operation is: addcompareselect (ACS) unit. gama10. p gama11.4.7. so that gama unit just needs two adders.4 State Metric Architecture: 7.5 Branch Metrics Unit (BMU) 7. It just needs adders to calculate the gama00.1 Gama: Gama unit is very simple. 56 . f ( j . λikp λiks λaks gama01 + + gama10 λikp gama11 Figure 7.6 ACS Unit Structure Alpha and beta unit is based on the ACS.4. but just different connection according to the trellis.m ) + γ kj .2 Alpha and Beta: From the alpha and beta equation.m ) ) j =0 1 + + + Figure 7.
Figure 7. but calculate in reversed order.8 β Unit 57 . Figure 7.7 α Unit Beta unit is the same structure.
m ) + γ k0. m + β kf+(0. There is no feedback computation. λ aks + λ iks is the SISO input and normal order.m ) + γ k f (1.m ) .m ) + γ k0. + γ k0. β kf (0.m + β kf+(0. m ) and β kf (1. f (0.m ) and α km + γ k m + β kf+(1.4 1 1 m =0 m=0 7 7 When calculate β m k value.1 ) − Max(α km + β km .m ) are part of the 1.1 = β kf (1. two adders just need. Λ (k) = Max(α km + γ k m + β kf+(1.3 Loglikelihood Ratio (LLR) 1. Extrinsic information: λ ek = Λ(d k ) − (λ aks + λ iks ) Eq 7. m ) So that loglikelihood rate can be expressed as Λ (k) = Max(α km + β km . To get reversed input. Loglikelihood rate architecture: Figure 7. We can define: β km . At the first level there is sixteen adders.4. m ) ) Eq 7. f (0.m ) + γ k f (1.m ) 1 1 1. f (0.m ) ) Eq 7.m ) + γ k f (1.6 From the equation.m ) have already been got.0 = β kf (0. m ) and β kf (1.m ) + γ k0. 1.m ) + γ k f (1. m ) k 1. m ) and β km .5 m=0 m =0 7 7 With this method we can save 1/3 adders. But extrinsic information soft output is calculated in reserved order.9 LLR Unit. α km + γ k0. it 58 . f (0.m ) ) − Max(α km + γ k0. β kf (1.0 ) Eq 7. β km = Max( β kf (0.7. the pipeline is introduced in the structure to reduce area.3 1. β f (0. and latency time is five.
so that λ aks + λ iks can be got from beta unit. beta calculation is the same need reversed order gama value. and gama10= λ aks + λ iks . 59 .needs extra memory to save this value. To solve this problem.
212. 64QAM modulation and 5/6 coding rate is used.12. The simulator is compatible to 3GPP specification defined in 36. 5000 subframes have been simulated. In the simulation.5 Simulation Results: In order to benchmark the parallel Turbo decoder. The 3GPP SCME model [24] is used as the channel model.7. 7.10.11 and 7. No closeloop precoding is assumed. Simulation results are depicted in Figure 7.10 Code/Uncoded Bit Error Rate for Parallel /Ideal Turbo Simulation 60 .211 and 36. Figure 7. The Turbo decoder runs for up to eight iterations with earlystopping enabled. simulation is carried out based on a 3GPP LTE linklevel simulator. At most three retransmissions are allowed in ChaseCombine (CC) based HARQ. Softoutput sphere decoder which achieves ML performance is used as the detection method.
12 Code/Uncoded Bit Throughput for Parallel /Ideal Turbo Simulation 61 .Figure 7.11 Code/Uncoded Frame Error Rate for Parallel /Ideal Turbo Simulation Figure 7.
and the structure is similar with the 3GPP LTE Radix4 turbo decoder. The DFE and turbo decoder are the two key components in the wireless digital systems. which depends on the design of the baseband part. As future work. 3GPP LTE turbo decoder is designed from algorithm level to the hardware level. In order to support other standard such as WiMAX. a 3GPP LTE Radix2 Turbo decoder has been implemented.2 Future Work For the DFE part. Compare filter architecture for DDC. Through the prestudy and design carried out in this thesis.1 Conclusion In this thesis. specially the topdown design flow.11n DDC filter is designed and implemented. more functionality such as automatic gain control needs to be implemented. Finally 802. For the turbo codes part. 8. Furthermore.11n. a parameterized design needs to be implemented and integrated. Hardware and software cosimulation is a good test method and has been introduced in the DFE part. WiMAX. Usually the output of DFE needs to be saved in a FIFO memory. it proposes the general DFE architecture and the DDC specification for 3GPP LTE. There are two main differences between WiMAX duobinary turbo decoder and radix 4 turbo decoder: 62 . The design tradeoff needs to be addressed in the future. It only supports 3GPP LTE. For the Turbo decoder part. WiMAX decoder is duobinary turbo decoder. and 802. the hardware implement in this thesis is only for 802. To implement a flexible DFE targeting multiple standards such as 3GPP LTE and WiMAX. a Turbo decoder with eight parallel Radix2 SISO units has been implemented. only preliminary work has been done in this thesis. a Radix4 implementation is needed which can be done as future work. Finally.Part Ⅲ 8 Conclusion and Future Work 8. the author had chance to learn both the wireless system design methodology and hardware design skills.11n with investigation.
It just needs change the connection that means just need to add some multiplexers. the encoder has different polynomial generator. 63 .1. Due to different standard. 2. WiMAX duobinary turbo decoder trellis termination is closed as a circle with the start state equal to the end state. so that the trellis is different. but 3GPP LTE turbo decoder trellis termination is tail bits with state 0 and end with state 0.
Benkeser. A.00. C. Q.html [10] http://www.” [8] G.1) April 3. Martelli. "A 58mW 1.K. IEEE Standard for Local and Metropolitan Area Networks: Part 16: Air Interface for Fixed Broadband Wireless Access Systems. [6] Xilinx High Density WCDMA Digital Front End Reference Design (Xilinx Confidential).mathworks. XAPP921c (v2. 2007.SolidState Circuits Conference. 2007 [7] IEEE P802. 2008 [3] Xilinx WiMAX DFE reference design. Digest of Technical Papers. XAPP966C (v1. IRE Trans. and R. IEEE International SolidState Circuits Conference (ISSCC). “ The design of a multimode/multisystem capable software radio receiver.Reference [1] “A 50mW HSDPA Baseband Receiver ASIC with Multimode Digital FrontEnd”. Benkeser.L maurer.com/products/filterdesign/demos. [5] IEEE Std 802. T.16e2005 and IEEE Std 802. pp.11n™/D2. R.2mm² HSDPA Turbo Decoder ASIC in 0. IEEE Standard for Local and metropolitan area networks: Part 16: Air Interface for Fixed Mobile Broadband Wireless Access Systems. Burg. Chabrak. IEEE International [2] C. Error –free coding.2937 64 . pp. Reutemann. Qiuting Huang . Cupaiuolo.com/matlabcentral/fileexchange/1716 [11] P.162004. “Draft STANDARD for Information TechnologyTelecommunications and information exchange between systemsLocal and metropolitan area networksSpecific requirementsPart 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications. R.13 μm CMOS". Digest of Technical Papers.162004/Cor12005. Hueber.1) April 3.” [9] http://www. G Strasser. Huang.html?file=/proucts/demos/s hipping/filterdesign/ddcfilterchaindemo. Amendment 2: Physical and Medium Access Control Layers for Combined Fixed and Mobile Operation in Licensed Bands and Corrigendum 1. IT 4.Hagleager. Stuhjberger. Feb 37. 2007 [4] IEEE Std 802. Theory. C.mathworks. 264612. Elias. vol. 1954. ISSCC 2007.
and E. Multiplexing and channel coding (Release 8) [19] Pietrobon. 65 .“Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate”. 1966. pp. IEEE Trans.May 1971. 1996. “A comparison of Optimal and SubOptimal MAP Decoding Algorithms Operating in the log Domain. IEEE Transactions on Information Theory. pp 10091013. Thitimajshima. vol. Berrou.5.. Forney. and Hoeher. Forney Jr. Cambridge. Benedetto and G. C. Technical Specification Group Radio Access Network. Mar 1974 Pages: 284 – 287.. Inform Theory. no. no.” in Proc.Robertson . “Unveiling turbocodes: Some results on parallel concatenated coding schemes. Glavieux.no. [21] C..2 Mar. P. Hoeher . Vilebrun .. “Near Shannon limit errorcorrecting coding and decoding: Turbo Codes. J.42. Convolutional Codes I: Algebraic Structure. D. F. [15] C. Telecomm. Thitimajshima.” IEEE Trans. Switzerland. J. [18] 3GPP TS 36. Washington. Glavieux. pp. Bahl.1997. Seattle. Villeburn. A. 15.and vol.no3. [16] Perez L.8. IT 17.” in Proc. [17] L..” Proc. Seghers J. pp.6. May 1993.P. ICC.p. 1064–1070.42.R. Concatenated Codes. Raviv . Montorsi.360 [13] S.2346. Cocke..409428. Evolved Universal Terrestrial Radio Access (EUTRA)..D. Theory.Apr. [20] Robertson.pp. Inform.0 (200812) 3rd Generation Partnership Project. Jelinek and J. 1996. Nov. 1970 . “Implementation and Performance of Turbo/MAP Decoder..[12] G. vol.” European Trans.6. “ Optimal and suboptimal maximum a posteriori algorithms suitable for turbo decoding. vol. 720738. pp. A. ICC’93. Volume 20. Of ICC ’95. 10641070. S.Jr. Geneve. “A Distance Spectrum Interpretation of Turbo Codes. and P.E. MA: MIT press. Berrou.” Int’l.212 V8. JanFeb 1998. UN 1995. P. Issue 2. and Costello D. Nov.vol. 1993. “Near Shannon limit errorcorrecting coding and decoding: Turbocodes. pp.IT16.Satellite Commun. [14] G.. vol.” IEEE Trans Information Theory. 16981709. [22] P. Jr.. J. and P.
'Gateway to Globalization'. “ A comparison of Optimal and Sub –Optimal MAP Decoding Algorithms Operation in the Log Domain” Communications. P.[23] Patrick Robertson. J.Emmanuelle Villebrun and Peter Hoeher. 1995. Baum. on page(s): 10091013 vol. 66 . Salo. S. Hansen. ICC '95 Seattle. M. and J. [24] D. “MATLAB implementation of the interim channel model for beyond3G systems (SCME)”. Kyoti. 1995 IEEE International Conference on 2. Milojevic. May 2005.2.
0045013427734 0.Appendix A Coefficient for 802.021575927734 0.019779205322 0.033279418945 0.010028839111 0.0068283081055 0.45468902588 0.0020599365234 0.14438247681 0.11n DDC filter: h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 h11 h12 h13 h14 h15 h16 h17 h18 h19 h20 h21 h22 h23 h24 h25 h26 = = = = = = = = = = = = = = = = = = = = = = = = = = 0.05207824707 0.057586669922 0.035419464111 0.0089797973633 0.011726379395 0.0039749145508 ARM1 coefficient From RAM1 From RAM2 ARM2 coefficient From RAM1 From RAM2 h1 x(0) x(51) h15 x(14) x(37) h24 x(28) x(23) h10 x(43) x(9) h2 x(50) x(1) h16 x(36) x(15) h23 x(23) x(29) h9 x(9) x(43) 67 .0016479492188 0.09334564209 0.0050010681152 0.0028839111328 0.0010108947754 0.00057601928711 0.0066986083984 0.015243530273 0.026027679443 0.0035057067871 0.014629364014 0.
ARM3 coefficient From RAM1 From RAM2 h3 x(2) x(49) h17 x(16) x(35) h22 x(30) x(22) h8 x(44) x(7) ARM4 coefficient From RAM1 From RAM2 h4 x(48) x(3) h18 x(34) x(17) h21 x(20) x(31) h7 x(6) x(45) ARM5 coefficient From RAM1 From RAM2 ARM6 coefficient From RAM1 From RAM2 ARM7 coefficient From RAM1 From RAM2 Note: X(n) = h5 x(4) x(48) h19 x(18) x(34) h20 x(32) x(19) h6 x(46) x(5) h11 x(10) x(41) h25 x(24) x(27) h11 x(38) x(13) h2 x(40) x(11) h26 x(26) x(25) h13 x(12) x(39) z −n 68 .
Appendix B Standard parameters and detail Block size 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152 160 168 176 184 192 200 208 216 224 232 240 248 256 264 f1 3 7 19 7 7 11 5 11 7 41 103 15 9 17 9 21 101 21 57 23 13 27 11 27 85 29 33 15 17 f2 10 12 42 16 18 20 22 24 26 84 90 32 34 108 38 120 84 44 46 48 50 52 36 56 58 60 62 32 198 Block size 1120 1152 1184 1216 1248 1280 1312 1344 1376 1408 1440 1472 1504 1536 1568 1600 1632 1664 1696 1728 1760 1792 1824 1856 1888 1920 1952 1984 2016 f1 67 35 19 39 19 199 21 211 21 43 149 45 49 71 13 17 25 183 55 127 27 29 29 57 45 31 59 185 113 f2 140 72 74 76 78 240 82 252 86 88 60 92 846 48 28 80 102 104 954 96 110 112 114 116 354 120 610 124 420 69 .
272 280 288 296 304 312 320 328 336 344 352 360 368 376 384 392 400 408 416 424 432 440 448 456 464 472 480 488 496 504 512 528 544 560 576 592 608 33 103 19 19 37 19 21 21 115 193 21 133 81 45 23 243 151 155 25 51 47 91 29 29 247 29 89 91 157 55 31 17 35 227 65 19 37 68 210 36 74 76 78 120 82 84 86 44 90 46 94 48 98 40 102 52 106 72 110 168 114 58 118 180 122 62 84 64 66 68 420 96 74 76 2048 2112 2176 2240 2304 2368 2432 2496 2560 2624 2688 2752 2816 2880 2944 3008 3072 3136 3200 3264 3328 3392 3456 3520 3584 3648 3712 3776 3840 3904 3968 4032 4096 4160 4224 4288 4352 31 17 171 209 253 367 265 181 39 27 127 143 43 29 45 157 47 13 111 443 51 51 451 257 57 313 271 179 331 363 375 127 31 33 43 33 477 64 66 136 420 216 444 456 468 80 164 504 172 88 300 92 188 96 28 240 204 104 212 192 220 336 228 232 236 120 244 248 168 64 130 264 134 408 70 .
624 41 234 4416 640 39 80 4480 656 185 82 4544 672 43 252 4608 688 21 86 4672 704 155 44 4736 720 79 120 4800 736 139 92 4864 752 23 94 4928 768 217 48 4992 784 25 98 5056 800 17 80 5120 816 127 102 5184 832 25 52 5248 848 239 106 5312 864 17 48 5376 880 137 110 5440 896 215 112 5504 912 29 114 5568 928 15 58 5632 944 147 118 5696 960 29 60 5760 976 59 122 5824 992 65 124 5888 1008 55 84 5952 1024 31 64 6016 1056 17 66 6080 1088 171 204 6144 Table A.1 LTE interleaver parameters. 35 233 357 337 37 71 71 37 39 127 39 39 31 113 41 251 43 21 43 45 45 161 89 323 47 23 47 263 138 280 142 480 146 444 120 152 462 234 158 80 96 902 166 336 170 86 174 176 178 120 182 184 186 94 190 480 71 .
This action might not be possible to undo. Are you sure you want to continue?
We've moved you to where you read on your other device.
Get the full title to continue listening from where you left off, or restart the preview.