Arquitectura Hardware
2004
Parameter          Value
Time slot length   2560 chips
N                  69 QPSK symbols
Q                  16 chips
Midamble           256 chips
I. INTRODUCTION
The Universal Mobile Telecommunication System (UMTS) using Time Division Duplex (TDD) is a Time Division - Code
Division Multiple Access (TD-CDMA) system expected to
provide high data rate, flexible, asymmetrical communications
for cellular systems. The classical rake receiver has shown
limited results because it suffers from Multiple
Access Interference (MAI). Recently, improved receivers that detect the bursts jointly have been introduced. They
have proved to be a good trade-off between Verdú's optimal
receiver [4] and the rake receiver in terms of both complexity
and performance. The joint detection algorithm is applied at the
output of a multi-user rake detector in order to decorrelate the
outputs of these detectors. The strategy relies on the joint detector
knowing the spreading codes and the channel
estimates of the other users, which lets it lower their influence
on the user of interest. Several joint detectors can be applied
according to the error minimisation criterion. In this paper,
we focus on the Zero Forcing Block Linear Estimator
(ZF-BLE), which aims at cancelling the MAI regardless of
the noise. The principle of the ZF-BLE joint detection will
TABLE I
UMTS/TDD BURST TYPE 2 PARAMETER VALUES
Fig. 1. [Figure: burst structure; surviving labels: Midamble, Data block 2, Guard, Q.N chips, N = 69, Burst 8]
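The chip budget of one time slot can be checked from the Table I values. The sketch below assumes, as the Fig. 1 labels suggest, that the burst holds two data blocks of Q.N chips each around the midamble and that whatever remains of the slot is the guard period; the variable names are illustrative only.

```python
# Burst type 2 parameters taken from Table I.
SLOT_CHIPS = 2560        # time slot length, in chips
N_SYMBOLS = 69           # QPSK symbols per data block
Q = 16                   # chips per symbol
MIDAMBLE_CHIPS = 256     # midamble length, in chips

# Assumption: two data blocks of Q.N chips each, one on each side of the midamble.
data_block_chips = Q * N_SYMBOLS
guard_chips = SLOT_CHIPS - 2 * data_block_chips - MIDAMBLE_CHIPS

print(data_block_chips, guard_chips)  # 1104 96
```

Under that assumption each data block spans 1104 chips and 96 chips are left for the guard period.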
$z = U^H \cdot y$ (4)
[...] (5)
Fig. 2. [Figure; surviving labels: K, (W-1).J, (Q+W-1).J, N.K]
B. ZF-BLE detector
The rake receiver computes the d estimates as follows:
$d_{rake} = A^H e = z$ (2)
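As a minimal numerical sketch of how the rake output feeds a zero-forcing estimate, the code below solves $A^H A \, d = z$ with a Cholesky factorisation $A^H A = U^H U$ followed by two triangular systems. The function name `zf_ble`, the random system matrix and the noiseless test vector are assumptions made for illustration; `np.linalg.solve` is used for brevity and does not exploit the triangular structure.

```python
import numpy as np

def zf_ble(A, e):
    """Zero-forcing block linear estimate of d from e = A d + noise."""
    z = A.conj().T @ e                   # rake output, as in equation (2)
    M = A.conj().T @ A                   # Hermitian, positive definite if A has full column rank
    L = np.linalg.cholesky(M)            # NumPy returns lower L with M = L L^H
    U = L.conj().T                       # so that M = U^H U
    y = np.linalg.solve(U.conj().T, z)   # lower triangular system U^H y = z
    return np.linalg.solve(U, y)         # upper triangular system U d = y

# Hypothetical toy system: 12 received samples, 4 unknown QPSK symbols.
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 4)) + 1j * rng.standard_normal((12, 4))
d_true = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
e = A @ d_true                           # noiseless received vector
d_hat = zf_ble(A, e)
print(np.allclose(d_hat, d_true))
```

Without noise the estimate matches the transmitted symbols, which is the zero-forcing property: the interference between users is cancelled exactly.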
$x_i^{(0)} = 0$
$x_i^{(k)} = x_i^{(k-1)} + a_{i,k} \, x_k$
$x_i = (b_i - x_i^{(i-1)}) / a_{i,i}$ (6)
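The recurrence can be illustrated in software with one function per cell type. This is a sequential sketch that assumes a lower triangular system; the names `mac`, `diag` and `solve_lower` are hypothetical, and the loop does not model the cycle-by-cycle timing of a systolic array.

```python
def mac(accu, a, x):
    """MAC cell: add one product term to the running sum."""
    return accu + a * x

def diag(b, accu, a):
    """DIAG cell: subtract the accumulated sum, then divide by the diagonal."""
    return (b - accu) / a

def solve_lower(A, b):
    """Solve A x = b by forward substitution, A being lower triangular."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        accu = 0.0                           # start of the recurrence for line i
        for k in range(i):
            accu = mac(accu, A[i][k], x[k])  # one product term per MAC step
        x[i] = diag(b[i], accu, A[i][i])     # final subtract and divide
    return x

A = [[2.0, 0.0, 0.0],
     [1.0, 4.0, 0.0],
     [3.0, 2.0, 5.0]]
b = [2.0, 9.0, 22.0]
print(solve_lower(A, b))  # [1.0, 2.0, 3.0]
```

Line i needs i - 1 product terms (counting lines from 1), which is why a band matrix bounds the number of MAC cells required.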
TABLE II
DATA STREAM FOR LINE 1, 2 AND 3 COMPUTATION
Fig. 3. [Figure; surviving labels: x, accu, accu + a.x, instant t, instant t+1]
Fig. 4. [Figure; surviving labels: (b - accu)/a, accu, instant t, instant t+1]
[...] $(Q + W - 1) / Q$ (7)
Note that the diagonal elements are positive and real. Then,
the non diagonal elements can be processed according to:
$u_{i,j} = \dfrac{m_{i,j} - \sum_{k=1}^{i-1} u_{k,i} \, u_{k,j}}{u_{i,i}}$ (9)

Data stream of Table II (the DIAG cell receives two values per instant):

Line 1 computation
t      MAC1    MAC2    DIAG            Result
t=1    0       0       u1,1  l1,2      x
t=2    0       0       u1,1  l1,3      u1,2
t=3    0       0       u1,1  l1,4      u1,3
t=4    0       0       0     0         u1,4

Line 2 computation
t      MAC1    MAC2    DIAG            Result
t=1    0       u1,3    0     0         x
t=2    0       u1,4    u2,2  l2,3      x
t=3    0       u1,5    u2,2  l2,4      u2,3
t=4    0       0       u2,2  l2,5      u2,4
t=5    0       0       0     0         u2,5

Line 3 computation
t      MAC1    MAC2    DIAG            Result
t=1    u1,4    0       0     0         x
t=2    u1,5    u2,4    0     0         x
t=3    u1,6    u2,5    u3,3  l3,4      x
t=4    0       u2,6    u3,3  l3,5      u3,4
t=5    0       0       u3,3  l3,6      u3,5
t=6    0       0       0     0         u3,6
Considering equation 9, it can be noticed that the computation of the non-diagonal elements is close to the equation
for linear system solving: a sum of product terms followed
by a subtract and divide operation. As for
linear system solving, the number of MAC cells equals the number
of product terms in $\sum_{k=1}^{i-1} u_{k,i} \, u_{k,j}$. However, since j > i
and since U is a p-wide band matrix, the number of MAC
cells can be reduced to p - 2. Then, for P = 2, only 22 MAC
cells are required. Therefore, 23 MAC cells will be used to
perform both Cholesky decomposition and triangular system
solving. The DIAG cell is almost the same as for linear system
solving, except that the backward stream is not fed with $u_{i,j}$
but with $u_{k,i}$.
The $u_{k,i}$ needed for the product terms are the same for all
elements of the ith line. These values are stored in each MAC
cell before the line elements are computed. During a second
step, the $u_{k,j}$, $l_{i,j}$ and $u_{i,i}$ coefficients are sent to the MAC
cells. Table II gives an example of the
computation of 3 lines of a matrix for which p = 4.
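The bound of p - 2 product terms can be checked numerically. The sketch below applies equation 9 to a band matrix and records the largest number of product terms any non-diagonal element needs. It assumes that p counts the main diagonal plus the p - 1 diagonals above it, and it conjugates $u_{k,i}$ so that complex data is handled; the function name `banded_cholesky`, the square-root diagonal step and the random test matrix are illustrative choices, not taken from the text.

```python
import numpy as np

def banded_cholesky(M, p):
    """Return upper triangular U with M = U^H U, and the largest number of
    product terms used for a non-diagonal element. M is assumed to have p
    non-zero diagonals on and above the main diagonal."""
    n = M.shape[0]
    U = np.zeros_like(M, dtype=complex)
    max_terms = 0
    for i in range(n):
        k_lo = max(0, i - (p - 1))       # u[k, i] is zero for k < i - (p - 1)
        s = sum(abs(U[k, i]) ** 2 for k in range(k_lo, i))
        U[i, i] = np.sqrt(M[i, i].real - s)   # diagonal element, real and positive
        for j in range(i + 1, min(n, i + p)):
            k_lo = max(0, j - (p - 1))   # u[k, j] is zero for k < j - (p - 1)
            terms = range(k_lo, i)
            max_terms = max(max_terms, len(terms))
            acc = sum(np.conj(U[k, i]) * U[k, j] for k in terms)
            U[i, j] = (M[i, j] - acc) / U[i, i]   # subtract and divide, equation 9
    return U, max_terms

# Hypothetical test matrix: build M from a known band factor U0 with p = 4.
p, n = 4, 8
rng = np.random.default_rng(1)
U0 = np.triu(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
U0 = U0 - np.triu(U0, p)                       # keep the main diagonal and p - 1 above it
U0[np.diag_indices(n)] = 1.0 + rng.random(n)   # real, positive diagonal
M = U0.conj().T @ U0
U, max_terms = banded_cholesky(M, p)
print(np.allclose(U, U0), max_terms)
```

For p = 4 no element needs more than 2 product terms, which matches the two MAC cells (MAC1 and MAC2) of the p = 4 example.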
Finally, the block diagram of the reconfigurable systolic
architecture that can perform both triangular system solving
and Cholesky decomposition is shown in figure 6.
V. PERFORMANCE ANALYSIS
Fig. 5.
Fig. 6. [Block diagram; surviving labels: control, mem, LUT, mac cell 1, 2, 3, ..., p-1, diag cell; signals accu Re/Im, back Re/Im, a Re/Im, b Re/Im, m Re/Im, l Re/Im, x Re/Im, diag Re, res]
$\dfrac{n^2/2 + O(n)}{2n - 1} = \dfrac{n}{4}$ when n is large (10)
This gain should be balanced against the complexity ratio,
which is in favour of the sequential solution. [9] suggests
computing an efficiency figure defined as the product of the
complexity and the processing time.
$g_{systlin} = \dfrac{C_{systlin}^{seq} \cdot T_{systlin}^{seq}}{C_{systlin}^{systol} \cdot T_{systlin}^{systol}} = \dfrac{n^2/2}{n \cdot (2n - 1)} = \dfrac{1}{4}$ when n is large (11)
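A quick numerical check shows how fast the two ratios approach their limits. The sketch below keeps only the leading $n^2/2$ term of the sequential time (the O(n) term is dropped) and assumes, as the denominator of equation (11) indicates, that the systolic solution costs n cells against one for the sequential solution. The function names are illustrative.

```python
def time_gain(n):
    """Sequential over systolic processing time, leading term only."""
    return (n * n / 2) / (2 * n - 1)

def efficiency_gain(n):
    """Ratio of complexity-time products: 1 cell against n cells."""
    return (1 * n * n / 2) / (n * (2 * n - 1))

for n in (8, 64, 512):
    # The first ratio tends to 1 (time gain close to n/4), the second to 1/4.
    print(n, time_gain(n) / (n / 4), efficiency_gain(n))
```

The time gain grows linearly with n, while the efficiency gain settles at 1/4 once the n cells of the systolic array are accounted for.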
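[Equations (12) and (13), recoverable fragments only; labels TS and SS]
$\ldots = (l - (p-2)) \cdot 2(p-1) + \ldots$
$\ldots = (p-1) \cdot 2l \ldots (p-2) \ldots$
$\sum_{i=1}^{p-2} i + (p-1)$ (12)
$\sum_{i=2}^{p-1} \ldots$ (13), with terms labelled "first line" and "lines 2 to p-1"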
p=24     % seq     T systol (cycles)    % systol    gain
         71 %      4412                 45.4 %      10
         29 %      5311                 54.6 %      3.4
         100 %     9723                 100 %       6.4
Fig. 7. [Plot: BER (1.0e-02 down to 1.0e-06) versus Eb/N0 (dB)]
                     MAC     Sqrt    DIAG    Master FSM    Slave FSM    Total
Number of cells      23      1       1       1             22
Multipliers          2       16      2       0             0
Slices               115     536     164     221           62
Total multipliers    46      16      2       0             0            64
Total slices         2645    536     164     221           1364         4930