
Implementation Issues for Channel Estimation and Detection Algorithms for W-CDMA

Sridhar Rajagopal and Joseph Cavallaro ECE Dept.

Contents
Introduction
W-CDMA
Channel Estimation and Detection
DSP Implementation
ASIC Implementation
Other Current Projects
Future Work

The CDMA Research Group

We cover the entire (spread) spectrum!

Algorithms Implementation issues

Implementation Issues

Important because
  Real-time
  Low Power
  Mobility / Size

DSPs
  Signal Processing
  Communications

ASICs / FPGAs
  Speed / Size

W-CDMA

CDMA : Code Division Multiple Access


W-CDMA : Wideband CDMA (5 MHz)
Next Generation Communication Systems
Integrating Multimedia Capabilities

QoS / Multi-rate Services

Higher Data Rates: 2048, 384, 144 kbps

Uplink - Async, Multiuser


[Figure: uplink scenario with User 1, User 2 and the Base Station, showing the Direct Path, Reflected Paths and Noise + MAI]

Downlink - Sync, Single User

[Figure: downlink scenario with the Base Station, User 1 and User 2, showing the Direct Path, Reflected Paths and Noise + MAI]

Determining the Channel

Channel Estimation
Need to know the Channel for proper detection
Delays and Amplitudes : Multiuser/path

Send sequence of known bits (Pilot / Preamble)


2 types
  Code Multiplexed with Data
  Time Multiplexed with Data

Detection
Use knowledge of channel for detection of Data bits

W-CDMA Standards

Not fixed yet...


Uplink
  Channel Estimation - Time Multiplexed
  Multiuser Detection

Downlink
  Channel Estimation - Common Pilot
  Detection: Rake Receivers / Equalizers

Channel Estimation

Uplink
  Time Multiplexed
    Maximum Likelihood
    Subspace

Downlink
  Continuous
    LMS Based Adaptive

Multiuser Detection
Optimal
  MLSE (Viterbi)

Sub-optimal
  Linear
    MAI Whitening
    Decorrelating
    MMSE
  Interference Cancellation
    Serial (SIC)
    Parallel (PIC)
  Neural Network

Base-Station Receiver
[Block diagram: Antenna, Demodulator, Demux (Data and Pilot outputs), Multiuser Detector, Decoder and Channel Estimator; the Channel Estimator supplies Estimated Amplitudes & Delays to the Multiuser Detector]

CDMA Uplink System


[Block diagram: for each of Users 1, 2, ..., K, data bits d1, d2, ..., dK pass through a Channel Encoder and Spreading; with AWGN the received signal is R(t). Matched Filters produce y1, y2, ..., yK for the MultiUser Detector, followed by the Channel Decoder, giving d1', d2', ..., dK'. A Demux feeds the Channel Estimator.]

Maximum Likelihood Channel Estimation


Send a time-multiplexed Preamble (Pilot).
Channel properties extracted.
Compare with known pilot and estimate.
Keep estimate for remaining data bits (static).
Repeat preamble every frame, if no tracking.

The Maximum Likelihood Algorithm


Compute the correlation matrices $R_{rr}$, $R_{br}$ and $R_{bb}$.

Compute the channel estimate $Y = R_{br} R_{bb}^{-1}$.

Calculate the noise covariance matrix $K$.

Calculate the channel impulse response vector $z$.

Extract the amplitudes and delays using a least squares fit.
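The first two steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the shapes (received chip vectors of length N, pilot vectors of length M), the convention that $R_{br}$ is the N x M average of $r_i b_i^T$, and every function name are hypothetical and not taken from the slides. The noise covariance, impulse response and least squares steps are not covered.

```python
# Minimal sketch of the correlation and estimate steps, Y = R_br * inv(R_bb).
# All names and matrix shapes are assumptions made for illustration.

def outer_avg(xs, ys):
    """Return (1/L) * sum_i x_i y_i^T as a list-of-lists matrix."""
    count = len(xs)
    rows, cols = len(xs[0]), len(ys[0])
    out = [[0j] * cols for _ in range(rows)]
    for x, y in zip(xs, ys):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += x[i] * y[j]
    return [[v / count for v in row] for row in out]

def invert(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("R_bb is singular for these pilot bits")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def ml_channel_estimate(received, pilots):
    """Estimate Y from received vectors r_i and known pilot vectors b_i."""
    r_br = outer_avg(received, pilots)   # N x M, complex-real products
    r_bb = outer_avg(pilots, pilots)     # M x M, depends only on the pilot
    return matmul(r_br, invert(r_bb))

# Noise-free toy case: r_i = H b_i, so the estimate should reproduce H.
H = [[1 + 1j, 0.5], [0.2 - 0.3j, -1j], [0, 0.7 + 0.1j]]
pilots = [[1, 1], [1, -1], [1, 1]]
received = [[sum(H[n][m] * b[m] for m in range(2)) for n in range(3)]
            for b in pilots]
Y_est = ml_channel_estimate(received, pilots)
max_err = max(abs(Y_est[i][j] - H[i][j]) for i in range(3) for j in range(2))
```

Because $R_{bb}$ depends only on the known pilot bits, its inverse does not change with the received data.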

The ML Algorithm Complexity


Complex-Real Dot Product: $R_{br} = \frac{1}{L} \sum r \cdot b$

Complex-Real Matrix Product: $Y = R_{br} R_{bb}^{-1}$, with $R_{bb}^{-1}$ computed offline

Complex-Real Product: [equation in $U_k$, $R_k$ and $y$; not recoverable from the slide text]

Real Square roots. Solving quadratic equation for least squares fit.

Critical code : Matrix-vector / Dot Product


Assuming Unity Noise Covariance
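A complex-real dot product is cheaper than a general complex one because each term needs two real multiplies instead of four. A minimal sketch, with a hypothetical function name and made-up sample values:

```python
# Sketch of a complex-real dot product: complex samples times real pilot bits.
# The function name and the sample data are illustrative assumptions.

def complex_real_dot(samples, bits):
    """Accumulate real and imaginary parts separately (2 multiplies per term)."""
    acc_re = 0.0
    acc_im = 0.0
    for sample, bit in zip(samples, bits):
        acc_re += sample.real * bit
        acc_im += sample.imag * bit
    return complex(acc_re, acc_im)

r = [1 + 2j, 3 - 1j, -2 + 0.5j]
b = [1, -1, 1]
result = complex_real_dot(r, b)
```

With pilot bits restricted to +1 and -1, the multiplies reduce further to additions and subtractions.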

Differencing Multistage Multiuser Detection

Based on the principle of Parallel Interference Cancellation (PIC)

Cross-correlation information used to remove interference of other users

Repeated iterations for convergence

Differencing techniques to improve performance
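A conventional multistage PIC detector can be sketched as follows. This assumes a synchronous, real-valued BPSK model $y = R A d$ and the textbook update $d^{(l)} = \mathrm{sign}(y - (R - D) A d^{(l-1)})$, which is not spelled out on the slide; the function names and the toy numbers are hypothetical.

```python
# Sketch of conventional multistage parallel interference cancellation.
# Model, update rule and data are assumptions made for illustration.

def sign(v):
    return 1 if v >= 0 else -1

def pic_detect(y, R, amps, stages):
    """Start from matched-filter decisions and cancel the MAI `stages` times."""
    users = len(y)
    d = [sign(v) for v in y]
    for _ in range(stages):
        z = []
        for i in range(users):
            # Interference on user i estimated from the other users' decisions.
            mai = sum(R[i][j] * amps[j] * d[j] for j in range(users) if j != i)
            z.append(y[i] - mai)
        d = [sign(v) for v in z]
    return d

# Near-far toy case: user 2 is weak, so the matched filter alone gets it wrong.
R = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
amps = [2.0, 0.5, 1.0]
d_true = [1, -1, 1]
y = [sum(R[i][j] * amps[j] * d_true[j] for j in range(3)) for i in range(3)]

matched_filter = pic_detect(y, R, amps, 0)
one_stage = pic_detect(y, R, amps, 1)
three_stage = pic_detect(y, R, amps, 3)
```

In this example one cancellation stage already corrects the weak user's bit, and further stages leave the decisions unchanged.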

The Differencing Multistage Detector

Split the cross-correlation matrix into the lower, upper and diagonal matrices.

$R = D + S + S^T$

Calculate impulse response

$z^{(l)} = z^{(l-1)} - (S + S^T) A x^{(l-1)}$

where $x^{(l-1)} = d^{(l-1)} - d^{(l-2)}$, with $x_k \in \{0, +2, -2\}$

$x^{(l-1)}$ is called the differencing vector.
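The differencing update $z^{(l)} = z^{(l-1)} - (S + S^T) A x^{(l-1)}$ should give the same decisions as recomputing the full interference sum at every stage. The sketch below checks this on a toy case; the synchronous real-valued model, the function names and the data are assumptions, not the authors' code.

```python
# Sketch comparing conventional and differencing multistage detection.
# Model, names and data are illustrative assumptions.

def sign(v):
    return 1 if v >= 0 else -1

def off_diagonal(R):
    """Return S + S^T, i.e. R with its diagonal set to zero."""
    n = len(R)
    return [[R[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]

def conventional_pic(y, R, amps, stages):
    off, n = off_diagonal(R), len(y)
    d = [sign(v) for v in y]
    for _ in range(stages):
        z = [y[i] - sum(off[i][j] * amps[j] * d[j] for j in range(n))
             for i in range(n)]
        d = [sign(v) for v in z]
    return d

def differencing_pic(y, R, amps, stages):
    """First stage is computed in full; later stages only apply changes."""
    off, n = off_diagonal(R), len(y)
    d_prev = [sign(v) for v in y]
    z = [y[i] - sum(off[i][j] * amps[j] * d_prev[j] for j in range(n))
         for i in range(n)]
    d = [sign(v) for v in z]
    for _ in range(stages - 1):
        x = [a - b for a, b in zip(d, d_prev)]   # entries are 0, +2 or -2
        for j, xj in enumerate(x):
            if xj == 0:
                continue                          # unchanged bit: no work
            for i in range(n):
                z[i] -= off[i][j] * amps[j] * xj
        d_prev, d = d, [sign(v) for v in z]
    return d

R = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
amps = [2.0, 0.5, 1.0]
d_true = [1, -1, 1]
y = [sum(R[i][j] * amps[j] * d_true[j] for j in range(3)) for i in range(3)]

agree = all(conventional_pic(y, R, amps, s) == differencing_pic(y, R, amps, s)
            for s in (1, 2, 3))
final = differencing_pic(y, R, amps, 3)
```

Once the decisions stop changing, the differencing vector is all zeros and a stage costs no multiply-accumulates.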

Multistage Detector Complexity


Matrix Multiplication: $B = (S + S^T) A$
  Computed only once for one frame

Dot Product: $z_i^{(l+1)} = z_i^{(l)} - \sum_j B_{ij} x_j^{(l)}$
  Computed iteratively

Critical code: Dot Product
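The split between a once-per-frame matrix product and a per-stage dot product can be sketched as below. The helper names, the multiply-accumulate counter and the numbers are hypothetical; the point is only that zero entries of the differencing vector can be skipped.

```python
# Sketch: precompute B = (S + S^T) A once, then update z with sparse x.
# Names, data and the MAC counter are illustrative assumptions.

def precompute_b(R, amps):
    """B[i][j] = R[i][j] * amps[j] off the diagonal, 0 on the diagonal."""
    n = len(R)
    return [[R[i][j] * amps[j] if i != j else 0.0 for j in range(n)]
            for i in range(n)]

def stage_update(z, B, x):
    """Return (z - B x, number of multiply-accumulates actually performed)."""
    new_z = list(z)
    macs = 0
    for j, xj in enumerate(x):
        if xj == 0:
            continue                  # bit did not change between stages
        for i in range(len(z)):
            new_z[i] -= B[i][j] * xj
            macs += 1
    return new_z, macs

R = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
amps = [2.0, 0.5, 1.0]
B = precompute_b(R, amps)             # done once per frame

z_prev = [1.5, -0.5, 0.7]
x = [0, -2, 0]                        # only user 2 changed its decision
z_next, macs = stage_update(z_prev, B, x)
dense_macs = len(z_prev) * len(x)     # cost if zero entries were not skipped
```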

TI Tools Used

Evaluation Modules (EVM) for C6201 and C6701
  fixed and floating point DSPs
  64 KB each internal program & data memory
  256 KB SBSRAM, 8 MB SDRAM (external)

C Compiler ver 3.0 from Code Generation Tools

Code Composer ver 4.02 for profiling

DSP Implementation: Channel Estimation

Floating point implementation found more feasible due to matrix inversions and square roots.

Code optimized for the DSP

Use of specialized approximate instructions
  Approximate reciprocal square roots
  Approximate reciprocals

Use of Assembly Code for critical part.
  TI's C67 floating point benchmarks for Matrix-Vector Multiplication & Dot Product

Data Memory requirements for Channel Estimation
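Approximate reciprocal and reciprocal square-root instructions are typically used as seeds that a few Newton-Raphson steps then refine. The sketch below emulates a low-precision seed by truncating the mantissa to 8 bits; that precision, the iteration count and all function names are assumptions for illustration and do not describe the actual DSP instructions.

```python
# Sketch of Newton-Raphson refinement of approximate 1/v and 1/sqrt(v).
# The 8-bit seed emulation and all names are illustrative assumptions.
import math

def truncate_mantissa(value, bits=8):
    """Keep only `bits` bits of mantissa of a positive float."""
    mantissa, exponent = math.frexp(value)     # value = mantissa * 2**exponent
    scale = 1 << bits
    return math.ldexp(math.floor(mantissa * scale) / scale, exponent)

def approx_recip(v):
    return truncate_mantissa(1.0 / v)

def approx_rsqrt(v):
    return truncate_mantissa(1.0 / math.sqrt(v))

def refine_recip(v, x, iterations=2):
    """x <- x * (2 - v*x); the relative error is squared at each step."""
    for _ in range(iterations):
        x = x * (2.0 - v * x)
    return x

def refine_rsqrt(v, x, iterations=2):
    """x <- x * (1.5 - 0.5*v*x*x)."""
    for _ in range(iterations):
        x = x * (1.5 - 0.5 * v * x * x)
    return x

v = 3.0
seed_err = abs(approx_recip(v) * v - 1.0)
recip_err = abs(refine_recip(v, approx_recip(v)) * v - 1.0)
rsqrt_err = abs(refine_rsqrt(v, approx_rsqrt(v)) * math.sqrt(v) - 1.0)
```

The refinement uses only multiplies and subtracts, which is why it can be faster than a library division or square-root routine.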

Approximate Instructions & Assembly

Use of specialized instructions and assembly code on C6701 DSP

[Figure: execution time (in milliseconds) vs. number of users for C6701: Original, C6701: with Intrinsics and C6701: with Assembly; L = 150, P = 3, N = 31, SNR = 5 dB, SINR = -10 dB; annotations: 10% improvement, 100% improvement]

[Table: TMS320C67x DSP cycles for Approx. FP Reciprocal instruction, FP reciprocal function, Approx. FP Reciprocal Sq. root Instruction and FP Reciprocal Sq. root Instruction; cycle counts shown: 28, 1, 34]

Data Memory Requirements

Data to be placed in External memory

130

DSP Implementation: Multistage Detection


16-bit Fixed Point C Code

Code optimized for the DSP

Use of Assembly Code for critical part
  TI's C62 fixed point assembly benchmarks for Dot Product

Data memory requirements for Multistage Detection
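A 16-bit fixed-point dot product usually multiplies Q15 operands into a wider accumulator. The sketch below models that in Python; the Q15 format, the 32-bit clamp applied at the end (a simplification of real accumulator behavior) and the function names are assumptions for illustration only.

```python
# Sketch of a Q15 fixed-point dot product with a wide accumulator.
# Number format, clamping policy and names are illustrative assumptions.

def to_q15(x):
    """Quantize a float in [-1, 1) to a 16-bit Q15 integer with saturation."""
    v = int(round(x * 32768))
    return max(-32768, min(32767, v))

def q15_dot(a, b):
    """Sum of Q15 x Q15 products; the result is in Q30 format."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    # Clamp to the signed 32-bit range (simplified overflow handling).
    return max(-(1 << 31), min((1 << 31) - 1, acc))

def q30_to_float(acc):
    return acc / float(1 << 30)

a = [to_q15(v) for v in (0.5, -0.25, 0.125)]
b = [to_q15(v) for v in (0.5, 0.5, -0.5)]
acc = q15_dot(a, b)
value = q30_to_float(acc)
```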

Data Memory Requirements


Data can be placed completely in Internal memory

Flops Count

[Figure: number of flops (x 10^4) vs. total number of iterations (0 to 8), Conventional Method vs. Differencing Method; Users: K = 15, SNR = 6 dB; annotation: 2X speedup for a three-stage detector]

Real-Time Requirements

Real-Time capability by C6201 DSP

[Figure: max bit rate per user (kb/s) vs. number of users (8 to 14), Conventional Method vs. Differencing Method; SNR = 10 dB, Window Size = 12; annotation: 12 users, 150 kb/s]

Trends in Recent DSPs

More internal memory and higher clock speeds
  C6203: 512 KB data, 384 KB program, 250 MHz; useful for uplink channel estimation algorithms.

Specialized blocks in the DSP core
  Viterbi decoding in C54.

Lower voltage operation
  1.2 V in C5402; useful for reducing power consumption in the mobile.

ASIC Implementation
MOSIS Tiny-Chip (40-pin DIP)
8 synchronous users
12-bit fixed point implementation
6000 transistors
1.2 µm CMOS technology
190 kb/s for each user (@ 12.5 MHz)
3-stage cascade delay < 15 µs

Advantages of ASICs

Highly parallel instructions: 4 RISC IPC (instructions per cycle)
  accumulating while shifting, loading and storing
  recoding while loading

Application specific architecture
  faster I/O
  smaller on-chip memory
  smaller ALU

Chip (Single Stage) Architecture


[Block diagram: REG, SHIFT, RECODER and Control Logic blocks around a stored $(L + L^T) A$ matrix; inputs $z^{(l)}$, $d^{(l)}$ and $d^{(l-1)}$, output $z^{(l+1)}$; legend: Internal signals, External signals]

$z^{(l+1)} = z^{(l)} - (L + L^T) A x^{(l)}$, where $x^{(l)} = d^{(l)} - d^{(l-1)}$
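Because each entry of $x^{(l)} = d^{(l)} - d^{(l-1)}$ is 0, +2 or -2, the product with $x^{(l)}$ needs no multiplier: a one-bit left shift followed by an add or a subtract is enough. The sketch below illustrates that idea on integers; it is not the chip's actual logic, and the function name and values are hypothetical.

```python
# Sketch of a shift-and-accumulate stage update on fixed-point integers.
# This only illustrates the idea; names and data are assumptions.

def shift_accumulate_update(z, B, d_new, d_old):
    """Compute z - B x with x = d_new - d_old using shifts, adds, subtracts."""
    out = list(z)
    for j, (new_bit, old_bit) in enumerate(zip(d_new, d_old)):
        if new_bit == old_bit:
            continue                       # x_j = 0: column j is skipped
        for i in range(len(z)):
            doubled = B[i][j] << 1         # multiply by 2 as a left shift
            if new_bit > old_bit:
                out[i] -= doubled          # x_j = +2
            else:
                out[i] += doubled          # x_j = -2
    return out

z = [100, -40, 60]
B = [[0, 10, 8], [40, 0, 12], [16, 6, 0]]
d_old = [1, 1, 1]
d_new = [1, -1, 1]
z_next = shift_accumulate_update(z, B, d_new, d_old)

# Reference result using ordinary multiplication.
x = [a - b for a, b in zip(d_new, d_old)]
z_ref = [z[i] - sum(B[i][j] * x[j] for j in range(3)) for i in range(3)]
```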

Chip Layout
[Chip layout, 2.0 mm: Recoding logic, Soft Decisions, Cross-Correlation, 12-bit ALU]

The Actual Chip Photograph

3-stage Cascade Mode


[Block diagram: three detector chips cascaded between the Matched Filter Output and the Detector Output; each chip has Sin, Sout, Hin, Hout, Fin, Fout, Load and CLK 1/2 pins; shared signals: Hand Shaking, Load R, Clock, Output Valid]

System Timing

[Timing diagram: Load R, 1st Stage, 2nd Stage, 3rd Stage, Final Output; Interference Cancellation example values: 10100000, 00100000, 00100000]

Scalable ASIC Design


Features             Current Tiny-Chip                          Chip in the future
Users / precision    8 synchronous users, 12-bit fixed point    30 asynchronous users, 16-bit fixed point
Clock Rate           12.5 MHz                                   100 MHz
Internal registers   0.3 kb                                     8 kb
ALU                  12-bit partial carry look-ahead adder      Three 16-bit full carry look-ahead adders
Transistors          6K                                         100K
Output bandwidth     1.5 Mb/s                                   3.0 Mb/s
Design method        layout                                     VHDL synthesis

Xilinx FPGA XC4000: 500k gates, 96MHz

DSP-ASIC Comparison
8 users        DSP (C6201)                   ASIC (Tiny Chip)
Clock          200 MHz                       12.5 MHz
Precision      16-bit                        12-bit
Speed          300 kb/s/user                 190 kb/s/user
Complexity     ~10M transistors (0.25 µm)    6K transistors (1.2 µm)
Design Cycle   short                         long

TI's C54xx: General purpose DSP core + ASIC
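One way to read the comparison is throughput per MHz of clock. The short calculation below uses only the clock and speed figures quoted for the 8-user case; the metric itself and the variable names are an illustrative choice, not something stated on the slide.

```python
# Worked arithmetic: per-user throughput normalized by clock rate.
# The metric (kb/s per user per MHz) is an assumption chosen for illustration.

dsp_clock_mhz, dsp_rate_kbps = 200.0, 300.0      # DSP (C6201)
asic_clock_mhz, asic_rate_kbps = 12.5, 190.0     # ASIC (Tiny Chip)

dsp_per_mhz = dsp_rate_kbps / dsp_clock_mhz      # 1.5 kb/s/user per MHz
asic_per_mhz = asic_rate_kbps / asic_clock_mhz   # 15.2 kb/s/user per MHz
ratio = asic_per_mhz / dsp_per_mhz               # roughly 10x

asic_total_mbps = 8 * asic_rate_kbps / 1000.0    # 8 users, about 1.5 Mb/s
```

By this measure the ASIC does roughly ten times more per clock cycle, while the DSP reaches a higher absolute rate through its much faster clock.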

Other Current Projects

Simulation Testbed
  Entire Chain of Algorithms
  Simulink - RTW
  Rapid Prototyping
  Matlab to DSP

Copper Contest
  Implementation of Multistage Detector using 0.15 micron Copper Technology

Wireless LAN Project

Home Area Wireless LAN

High Speed Office Wireless LAN

Outdoor CDMA Cellular Network

Future Work
Fixed Point Implementations on DSPs/ASICs
Uplink & Downlink Algorithms

Approximations using Linear Algebra

Support Long Codes and Fading


Multistage Detector
  Execution time Predictability
  Increase Efficiency

GPP Comparisons: Praful, Partha, Dr. Adve
  Effect of DMA and Caches