
2005 IEEE 16th International Symposium on Personal, Indoor and Mobile Radio Communications

A Comparative Study of QRD-M Detection and Sphere Decoding for MIMO-OFDM Systems


Yongmei DAI*, Sumei SUN*†, Zhongding LEI*
*Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
Email: {sunsm, leizd}@i2r.a-star.edu.sg
†Department of Electrical and Computer Engineering, National University of Singapore

Abstract- We present a comparative study of two tree search based detection
algorithms, namely, the M-algorithm combined with QR decomposition (QRD-M) and
the sphere decoding (SD) algorithm, for multiple-input multiple-output (MIMO)
orthogonal frequency division multiplexing (OFDM) systems. First, we show that
node ordering before and during the tree search is important for both
algorithms. With appropriate ordering, QRD-M can improve detection performance
significantly and SD can reduce decoding complexity substantially. Then we
compare the implementation complexity of the two algorithms, in terms of the
number of nodes searched or the number of multiplications required to achieve
maximum likelihood detection performance. It is interesting to show that the
average complexity of SD is lower than that of QRD-M, whereas the worst-case
complexity of SD is much higher than that of QRD-M.

I. INTRODUCTION
Multiple-input multiple-output (MIMO) systems have been receiving growing
attention because the use of multiple transmit and receive antennas increases
the system capacity dramatically in rich scattering wireless channels [1]. As
the optimal maximum likelihood (ML) detection incurs prohibitively high
complexity and is not suitable for practical implementation, suboptimal
detection algorithms are usually employed. One of the most popular detection
algorithms is the nulling plus cancellation algorithm [2]. However, it suffers
significant diversity loss and power loss compared to ML detection.
Considerable effort has been put into the search for algorithms achieving ML
or near-ML performance with lower complexity. The M-algorithm combined with QR
decomposition (QRD-M) [3], [4] and sphere decoding (SD) [5]-[14] are possibly
the most promising algorithms. Both QRD-M and SD are tree-search based
algorithms. Specifically, QRD-M is a breadth-first and SD is a depth-first
tree search algorithm. QRD-M reduces complexity, as opposed to ML detection,
by keeping only a fixed number of candidates with the smallest accumulated
metrics at each level of the tree search, whereas SD reduces complexity by
searching through only those candidates falling inside a hypersphere.
In this paper, we present a comparative study of the QRD-M and SD algorithms
for MIMO orthogonal frequency division multiplexing (OFDM) systems. First, we
investigate the different node ordering schemes involved in the tree search
and show that ordering helps improve the system performance for QRD-M and
reduce the complexity for SD. We then compare the average and the worst-case
implementation complexities of the two algorithms, and show that although the
worst-case complexity of SD is much higher than that of QRD-M, the average
complexity of the former, with ordering, is lower than that of the latter.

II. SYSTEM MODEL AND DETECTION ALGORITHMS
The MIMO-OFDM system considered has $N_t$ transmit and $N_r$ receive antennas
($N_r \ge N_t$). Assuming perfect timing and frequency synchronisation, the
received signal at each subcarrier can be formulated as [15]

$$\mathbf{r} = \mathbf{H}\mathbf{s} + \boldsymbol{\eta}, \tag{1}$$

where $\mathbf{r} = (r_1, r_2, \ldots, r_{N_r})^T$, $r_j$ denotes the signal
received at antenna $j$ ($j = 1, 2, \ldots, N_r$) and superscript $T$ denotes
matrix transpose. $\mathbf{H}$ is the channel matrix at the subcarrier under
consideration, with its entry $h_{j,i}$ being the path gain from transmit
antenna $i$ ($i = 1, 2, \ldots, N_t$) to receive antenna $j$.
$\mathbf{s} = (s_1, s_2, \ldots, s_{N_t})^T$ and $s_i$ denotes the frequency
domain data transmitted at antenna $i$.
$\boldsymbol{\eta} = (\eta_1, \eta_2, \ldots, \eta_{N_r})^T$ and $\eta_j$ is
the complex additive white Gaussian noise (AWGN) with zero mean and variance
$N_0$.
With perfect knowledge of the channel state information $\mathbf{H}$, the ML
detection of (1) can be formulated as

$$\hat{\mathbf{s}}_{ml} = \arg\min_{\mathbf{s} \in \Omega^{N_t}} \|\mathbf{r} - \mathbf{H}\mathbf{s}\|^2, \tag{2}$$

where $\Omega$ denotes the modulation set and the ML detection requires an
exhaustive search over all candidates in $\Omega^{N_t}$ to find the optimal ML
solution.
To compare and discuss the QRD-M algorithm and the SD algorithm on a unified
platform, we first apply QR decomposition to the channel matrix $\mathbf{H}$
in Eqn. (1), i.e., $\mathbf{H} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is
an $N_r \times N_r$ unitary matrix and $\mathbf{R}$ is an $N_r \times N_t$
upper triangular matrix with $R_{i,j}$ ($j \ge i$) denoting its non-zero
elements and $R_{j,j}$ for each $j$ being a positive real number.
Left-multiplying (1) with $\mathbf{Q}^H$, we have

$$\mathbf{y} = \mathbf{Q}^H\mathbf{r} = \mathbf{R}\mathbf{s} + \mathbf{Q}^H\boldsymbol{\eta} = \mathbf{R}\mathbf{s} + \mathbf{v},$$

where $\mathbf{v}$ is still an AWGN vector with zero mean and covariance
$N_0\mathbf{I}$.

978-3-8007-2909-8/05/$20.00 ©2005 IEEE 186



Therefore, the ML detection problem (2) can be reformulated as

$$
\begin{aligned}
\hat{\mathbf{s}}_{ml} &= \arg\min_{\mathbf{s} \in \Omega^{N_t}} \|\mathbf{y} - \mathbf{R}\mathbf{s}\|^2 \\
&= \arg\min_{\mathbf{s} \in \Omega^{N_t}} \left\{ \sum_{j=1}^{N_t} \Big| y_j - \sum_{i=j}^{N_t} R_{j,i} s_i \Big|^2 + \sum_{k=N_t+1}^{N_r} |y_k|^2 \right\} \\
&= \arg\min_{\mathbf{s} \in \Omega^{N_t}} \left\{ \sum_{j=1}^{N_t} \Big| y_j - \sum_{i=j}^{N_t} R_{j,i} s_i \Big|^2 \right\} \\
&= \arg\min_{\mathbf{s} \in \Omega^{N_t}} \left\{ \|\mathbf{R}(\mathbf{s} - \tilde{\mathbf{s}})\|^2 \right\} \\
&= \arg\min_{\mathbf{s} \in \Omega^{N_t}} \left\{ \sum_{j=1}^{N_t} \Big| R_{j,j}(s_j - \tilde{s}_j) + \sum_{i=j+1}^{N_t} R_{j,i}(s_i - \tilde{s}_i) \Big|^2 \right\}
\end{aligned}
\tag{3}
$$

where $\tilde{\mathbf{s}} = \mathbf{H}^{\dagger}\mathbf{r} = \mathbf{R}^{\dagger}\mathbf{y}$
is the zero-forcing solution. The second equality above follows as the second
term in the second equation is not related to $\mathbf{s}$.

A. QRD-M Algorithm
The QRD-M algorithm [3][4] reduces the tree search complexity by keeping only
the $M$ branches with the smallest accumulated metric values at each step (or
level), instead of testing all the hypotheses in $\Omega^{N_t}$ according to
(3). For example, at the first step, only the $M$ out of $Q$ possible
$s_{N_t}$ with the smallest $|R_{N_t,N_t}(s_{N_t} - \tilde{s}_{N_t})|^2$ are
selected and stored, where $Q = |\Omega|$ is the constellation size. At the
second step, only the $M$ out of $MQ$ combinations of $s_{N_t-1}$ and
$s_{N_t}$ with the smallest accumulated metric
$|R_{N_t,N_t}(s_{N_t} - \tilde{s}_{N_t})|^2 + |R_{N_t-1,N_t-1}(s_{N_t-1} - \tilde{s}_{N_t-1}) + R_{N_t-1,N_t}(s_{N_t} - \tilde{s}_{N_t})|^2$
are retained. At the last step, the $\mathbf{s}$ with the smallest accumulated
metric value is chosen as the solution. Note that the QRD-M algorithm is
sub-optimal in nature unless $M = Q^{N_t}$. For small or medium values of $M$,
the complexity is substantially lower than that of ML detection. However, the
searched result is no longer guaranteed to be the ML solution. It can deviate
far from the ML solution, especially when the channel matrix $\mathbf{H}$ is
ill-conditioned and the search starts from the wrong node.

B. SD Algorithm
Unlike QRD-M, SD is an optimal algorithm achieving ML performance. Instead of
testing all the hypotheses in $\Omega^{N_t}$, SD examines only those
hypothetical points falling inside a hypersphere of radius $d$. For the
detection problem of (3), the hypotheses should meet the condition

$$\sum_{j=1}^{N_t} \Big| R_{j,j}(s_j - \tilde{s}_j) + \sum_{i=j+1}^{N_t} R_{j,i}(s_i - \tilde{s}_i) \Big|^2 < d^2 \tag{4}$$

The tree search starts from the condition
$|R_{N_t,N_t}(s_{N_t} - \tilde{s}_{N_t})|^2 < d^2$. The root nodes with
constellations meeting the condition in (4) are identified. Next, a more
stringent constraint
$|R_{N_t,N_t}(s_{N_t} - \tilde{s}_{N_t})|^2 + |R_{N_t-1,N_t-1}(s_{N_t-1} - \tilde{s}_{N_t-1}) + R_{N_t-1,N_t}(s_{N_t} - \tilde{s}_{N_t})|^2 < d^2$
is applied to nodes at the next level for each possible root node that has
been identified. This searching process proceeds until the end level. If no
valid symbol is found, the searching process restarts with an enlarged radius.
When a valid symbol is found, the searching process also restarts, but with a
smaller radius, until no symbol can be found. The last found symbol is the ML
solution.
It can be seen that the $N_t$-dimensional joint tree search problem in the
conventional ML detection has been reduced to $N_t$ one-dimensional searches,
with later stages correlated to all the previous stages. This is essentially a
depth-first tree search.

III. ORDERING IN QRD-M AND SD
In this section, we discuss the different ordering schemes in QRD-M and SD,
with the objective of improving the performance of QRD-M and reducing the
complexity of SD.

A. Ordering in QRD-M
There are two layers of ordering involved in the tree search of the QRD-M
algorithm:
1) First Layer Ordering: According to (3) and Section II-A, the tree search in
QRD-M starts with the root node $s_{N_t}$, which corresponds to the last
element of the signal vector $\mathbf{s}$, and proceeds in decreasing order of
the element index until $s_1$. In fact, the order of the signal elements can
easily be changed by permuting $\mathbf{r}$, $\mathbf{H}$, and $\mathbf{s}$ in
(1). The QRD-M algorithm is generally a sub-optimal detection and the tree
search from one level to another involves an interference cancellation step;
therefore, mapping different signal elements to the root node or to higher
level nodes will lead to different performance. The first layer ordering
should therefore try to select the nodes (or elements of the signal) that are
closer to the ML solution as the root node or as nodes close to the root node.
a) H-norm Ordering: The first layer ordering proposed in the original QRD-M
algorithm is based on the column norms of the channel matrix $\mathbf{H}$ [4],
or the channel gain (power) of the signal elements, referred to here as H-norm
ordering. In other words, the original QRD-M algorithm uses power-based
ordering. Noting that an interference cancellation process is involved during
the tree search of QRD-M, it is intuitive to search first those nodes
corresponding to the signal elements with the strongest power.
b) VB Ordering: As the channel columns are correlated, the signal element with
the most received power is not guaranteed to be the most reliable one (closest
to the ML solution), i.e., the one with the largest post-detection signal to
noise ratio (post-SNR). We therefore propose to investigate post-SNR based
ordering. It is essentially the same as the ordering process in the V-BLAST
detection in [2]. We therefore refer to it as VB ordering in this paper. In
the context of the V-BLAST detection [2], there is also a simplified ordering
scheme, which is based on the row norm of the pseudo-inverse of the channel
matrix, i.e., $\mathbf{H}^{\dagger}$. We investigate this ordering as well and
refer to it as Hinv ordering.
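The breadth-first search of Section II-A and the H-norm first layer ordering can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `h_norm_order`, `permute_columns` and `qrd_m` are hypothetical, matrices are plain lists of rows, R is taken as the square N_t x N_t upper triangular factor, and the branch metric is written in the equivalent form |y_j - sum_i R_{j,i} s_i|^2 from the third line of (3).

```python
def h_norm_order(H):
    """Column indices sorted by increasing column power, so that the strongest
    column comes last and is searched first (it becomes the root node)."""
    n_r, n_t = len(H), len(H[0])
    power = [sum(abs(H[j][i]) ** 2 for j in range(n_r)) for i in range(n_t)]
    return sorted(range(n_t), key=lambda i: power[i])


def permute_columns(H, order):
    """Reorder the columns of H; the entries of s are reordered the same way."""
    return [[row[i] for i in order] for row in H]


def qrd_m(R, y, constellation, M):
    """M-algorithm on y = R s + v with R upper triangular (n_t x n_t).

    Returns (s_hat, metric): the surviving candidate with the smallest
    accumulated metric and that metric.
    """
    n_t = len(R)
    survivors = [(0.0, ())]        # (accumulated metric, symbols decided so far)
    for level in range(n_t - 1, -1, -1):
        extended = []
        for metric, tail in survivors:
            # tail = (s[level+1], ..., s[n_t-1]); cancel their interference.
            interference = sum(R[level][level + 1 + k] * sym
                               for k, sym in enumerate(tail))
            for sym in constellation:
                branch = abs(y[level] - interference - R[level][level] * sym) ** 2
                extended.append((metric + branch, (sym,) + tail))
        # Second layer ordering: rank by exact accumulated metric, keep M.
        extended.sort(key=lambda item: item[0])
        survivors = extended[:M]
    best_metric, best = survivors[0]
    return list(best), best_metric
```

To apply the first layer ordering, one would compute `order = h_norm_order(H)`, decompose `permute_columns(H, order)` instead of H, run `qrd_m`, and map entry k of the result back to position `order[k]` of the original signal vector. Setting M to the full number of candidates turns the search into exhaustive ML detection.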


c) DiagR Ordering: According to (3) and Section II-A, the cost function is
closely related to $R_{i,j}$, apart from the interference cancellation process
$s_i - \tilde{s}_i$, $i = N_t, N_t - 1, \ldots, 1$. The value of $R_{i,j}$
will also affect the performance. The larger the value at the root level,
e.g., $R_{N_t,N_t}$, the more impact (provided other parameters are unchanged)
$|R_{N_t,N_t}(s_{N_t} - \tilde{s}_{N_t})|^2 < d^2$ has on the whole cost
function, and the fewer valid nodes can be found in the search at later stages
(levels), since the constraint for later stages is more stringent. Since a
fixed number $M$ of surviving branches, instead of all possible branches, are
retained at each stage of QRD-M, fewer possible nodes means less loss due to
branches being cut off. Therefore, permutation of $\mathbf{r}$, $\mathbf{H}$,
and $\mathbf{s}$ in (1) could be performed to obtain a decreasing $R_{i,i}$
from $i = N_t$ to $i = 1$ after QR decomposition.
2) Second Layer Ordering: The first layer ordering performs a permutation on
the channel matrix before the actual tree search process, whereas the second
layer ordering is performed during the tree search process.
Since the QRD-M algorithm keeps only a fixed number of $M$ branches at each
level of the tree search, it is intuitive that the $M$ branches with the
smallest accumulated metrics
$|R_{N_t,N_t}(s_{N_t} - \tilde{s}_{N_t})|^2 + |R_{N_t-1,N_t-1}(s_{N_t-1} - \tilde{s}_{N_t-1}) + R_{N_t-1,N_t}(s_{N_t} - \tilde{s}_{N_t})|^2 + \cdots$
should be the survivors. Therefore, the second layer ordering during the tree
search process is based on the exact metric calculation.

B. Ordering in SD
The original SD algorithm is not equipped with any ordering scheme, because
the BER performance of SD is ML and hence independent of ordering. That is to
say, different orderings will not affect the BER performance. However,
ordering can affect the implementation complexity significantly, as will be
illustrated later in the simulations. Similar to QRD-M, the SD algorithm can
also employ two layers of ordering.
1) First Layer Ordering: According to (3) and Section II-B, the permutation of
$\mathbf{r}$, $\mathbf{H}$, and $\mathbf{s}$ in (1) is closely related to the
cost function. Similar to QRD-M, the H-norm, VB, and DiagR ordering schemes
affect the number of valid points falling inside a specified sphere
differently at each tree searching stage. This leads to different
implementation complexity, since a better ordering scheme can reach the ML
solution with fewer nodes searched.
2) Second Layer Ordering: The complexity of the original SD algorithm depends
heavily on the choice of the initial radius $d$. If $d$ is too large, too many
points are found; if it is too small, no points are found, the radius has to
be increased and the search has to be restarted. The second layer ordering
based on the metric calculation, which is similar to the ordering in QRD-M,
makes the complexity insensitive to the initial radius and thus reduces the
complexity substantially. The SD algorithm with metric-based ordering (or the
second layer ordering) always starts the search from the constellation point
with the smallest branch metric, e.g., at the first step, the $s_{N_t}$
minimizing $|R_{N_t,N_t}(s_{N_t} - \tilde{s}_{N_t})|^2$ is chosen and at the
second step, the $s_{N_t-1}$ minimizing
$|R_{N_t-1,N_t-1}(s_{N_t-1} - \tilde{s}_{N_t-1}) + R_{N_t-1,N_t}(s_{N_t} - \tilde{s}_{N_t})|^2$
is chosen given $s_{N_t}$, and so on. Note that if the channel is well
conditioned, the first point found by the new algorithm is more likely to be
the ML solution. Thus the complexity can be greatly reduced. It is notable
again that the purpose of the first layer ordering is to make the first found
point closer to the ML solution.

[Figure: plot titled "VBLAST-OFDM 4x4 16QAM QRDM M=4,12"]
Fig. 1. Performance comparison of different ordering schemes for a 4 x 4
system

IV. PERFORMANCE AND COMPLEXITY
In this section, we study and compare the performance and complexity of the
QRD-M and SD algorithms based on various ordering methods through extensive
simulations.
Throughout the simulations, we consider the typical 4 x 4 and 8 x 8 MIMO-OFDM
systems with 64 subcarriers occupying 20 MHz bandwidth. The spatial domain
channels are assumed to be independent and identically distributed (i.i.d.)
with 16 Rayleigh-distributed taps decaying exponentially. The
root-mean-squared (RMS) delay spread is assumed to be 50 ns. All 64
subcarriers are used for data transmission and each packet of data consists of
one OFDM symbol from each antenna.

A. Performance of QRD-M
We first show in Fig. 1 the performance comparison of four different first
layer ordering schemes, i.e., H-norm ordering, DiagR ordering, Hinv ordering
and VB ordering, coupled with the second layer ordering, in a 16-QAM modulated
4 x 4 system. In the simulations, $M$ is set to 4 and 12 respectively. It can
be seen in the figure that ordering can improve the system performance
significantly and VB ordering performs the best among all the first layer
ordering schemes. For example, in the case of $M = 4$, VB ordering improves
the BER performance of the original H-norm ordering by about 4 dB. However,
when $M$ increases, the improvement becomes smaller, as QRD-M will finally
approach the ML performance with large $M$ regardless of the ordering.
Simulation results also show that the simplified Hinv ordering and VB ordering
have similar results in this case.
Fig. 2 shows the performance comparison of different ordering schemes for a
16-QAM modulated 8 x 8 system where


$M = 8$ and $M = 16$ are used respectively. We can see that VB ordering
performs slightly better than Hinv ordering in the system with a large number
of antennas. When $M = 8$, a similar observation as in Fig. 1 can be made.
When $M = 16$, however, DiagR ordering performs the best and its performance
is only 0.1 dB away from the ML performance. This implies that the first layer
ordering is dependent on the value of $M$. In practice, we have to employ the
ordering scheme which is optimized for each specific value of the parameter
$M$.

[Figure: plot titled "VBLAST-OFDM 8x8 16QAM QRDM M=8,16"]
Fig. 2. Performance comparison of different ordering schemes for a 8 x 8
system

B. Complexity of SD
Since it is difficult to derive a closed form expression for the
implementation complexity of SD, we resort to computer simulations and compare
the complexities in terms of the number of nodes searched and the number of
real multiplications performed based on (3). Here we assume that the
complexity of testing whether one point is inside a sphere is negligible.
Moreover, the manipulations on the channel matrix are also not included in the
calculation.
The initial radius $d$ is chosen as suggested in [11], following the rule

$$\|\mathbf{r} - \mathbf{H}\mathbf{s}\|^2 = \|\boldsymbol{\eta}\|^2 \approx N_0 E\{\chi^2_{N_r}\} = N_0 N_r < d^2, \tag{5}$$

where $E\{\cdot\}$ denotes the expectation operation. That is, $d$ is chosen
as $d^2 = K N_0 N_r$ where $K > 1$ is a scaling factor.
Fig. 3 and Fig. 4 show the complexity comparison between different ordering
schemes for a QPSK modulated 4 x 4 system, in terms of the mean number of
nodes searched and the mean number of real multiplications performed vs the
$K$ factor, respectively. The working SNR is set to 7 dB. Note that when the
SNR increases, the complexity will decrease. It can be seen from the figures
that the ordering based on the branch metric (the second layer ordering) has
the most significant effect on the complexity, which makes the algorithm
insensitive to the initial radius. For those orderings that do not take the
metric into consideration, the complexity surges as the radius increases.
H-norm ordering and DiagR ordering can help reduce the complexity a bit
provided the initial radius is chosen properly. When the initial radius is too
large, their complexities are even higher than in the case without ordering.
It can also be seen that DiagR ordering always leads to a lower complexity
than H-norm ordering. The mean complexity of the metric ordering algorithm is
reduced by around 25% when combined with DiagR or H-norm ordering.

[Figure: plot titled "VBLAST-OFDM 4x4 QPSK SNR=7dB 32000 samples"; horizontal
axis: K = [0.5, 1, 2, 3, 4, 5, 6, 7, 8, 10], initial radius = K*NR*No; curves:
no ordering, order metric, order H-norm, order H-norm+metric, order DiagR,
order DiagR+metric]
Fig. 3. Complexity comparison in terms of the mean number of nodes for a QPSK
modulated 4 x 4 system

[Figure: plot titled "VBLAST-OFDM 4x4 QPSK SNR=7dB 32000 samples"; horizontal
axis: K = [0.5, 1, 2, 3, 4, 5, 6, 7, 8, 10], initial radius = K*NR*No; curves:
no ordering, order metric, order H-norm, order H-norm+metric, order DiagR,
order DiagR+metric]
Fig. 4. Complexity comparison in terms of the mean number of real
multiplications for a QPSK modulated 4 x 4 system

Fig. 5 shows the comparison of the mean number of nodes searched between the
different ordering schemes for a 16-QAM modulated 4 x 4 system. Similar to the
QPSK modulated system, we found that the mean number of real multiplications
shows the same tendency for the various ordering schemes, hence it is not
reproduced here. The working SNR is 15 dB. As for QPSK, for a large range of
initial radii, H-norm ordering and DiagR ordering have lower complexity than
without ordering. Note that when the dimension increases, e.g., the 8 x 8
system vs the previous 4 x 4 system, the complexity reduction may become more
significant when ordering is applied.
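The interplay between the initial radius d^2 = K N_0 N_r and the metric-based (second layer) ordering can be illustrated with a small depth-first sketch in Python. Several points here are assumptions rather than the authors' procedure: the names `sphere_decode` and `detect` are hypothetical; the sphere is shrunk in place whenever a full-length candidate is found (a common variant) instead of restarting the search with a smaller radius; an empty sphere is handled by doubling d^2; every candidate whose branch metric is tested counts as one searched node; and N_0 > 0 is required. R is the square N_t x N_t upper triangular factor and the metric is written as |y_j - sum_i R_{j,i} s_i|^2.

```python
def sphere_decode(R, y, constellation, d2, metric_order=True):
    """Depth-first search for the s minimizing ||y - R s||^2 subject to the
    accumulated metric staying below d2. Returns (s_hat, nodes), with s_hat
    set to None when no candidate lies inside the sphere."""
    n_t = len(R)
    s = [0j] * n_t
    best = {"s": None, "d2": d2}
    count = {"nodes": 0}

    def search(level, acc):
        # Cancel the interference of the symbols already fixed at higher levels.
        interference = sum(R[level][i] * s[i] for i in range(level + 1, n_t))
        resid = y[level] - interference
        cands = [(abs(resid - R[level][level] * sym) ** 2, sym)
                 for sym in constellation]
        if metric_order:
            cands.sort(key=lambda c: c[0])   # smallest branch metric first
        for branch, sym in cands:
            count["nodes"] += 1
            total = acc + branch
            if total >= best["d2"]:
                if metric_order:
                    break                    # the remaining branches are worse
                continue
            s[level] = sym
            if level == 0:
                best["s"], best["d2"] = list(s), total   # shrink the sphere
            else:
                search(level - 1, total)

    search(n_t - 1, 0.0)
    return best["s"], count["nodes"]


def detect(R, y, constellation, n0, n_r, K=2.0, metric_order=True):
    """Run the search with initial radius d^2 = K * n0 * n_r, enlarging the
    radius (assumed policy: doubling d^2) until a point is found."""
    d2 = K * n0 * n_r
    total_nodes = 0
    while True:
        s_hat, nodes = sphere_decode(R, y, constellation, d2, metric_order)
        total_nodes += nodes
        if s_hat is not None:
            return s_hat, total_nodes
        d2 *= 2.0
```

Calling `detect` with `metric_order=False` visits the constellation points in their listed order, which gives a rough way to compare node counts with and without the second layer ordering for different values of K.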



[Figure: plot titled "VBLAST-OFDM 4x4 16QAM SNR=15dB 32000 samples";
horizontal axis: K = [0.5, 1, 2, 3, 4, 5, 6, 7, 8, 10], initial radius =
K*NR*No; curves: no ordering, order metric, order H-norm, order H-norm+metric,
order DiagR, order DiagR+metric]
Fig. 5. Complexity comparison in terms of the mean number of nodes for a
16-QAM modulated 4 x 4 system

C. Complexity comparison of QRD-M and SD
In this section, we further compare the implementation complexity of the QRD-M
and SD algorithms through computer simulations with similar setups as in the
previous sections. The comparison results are listed in Table I for QPSK and
Table II for 16-QAM modulated 4 x 4 systems, respectively. For the SD
algorithm, H-norm ordering coupled with the metric ordering is used as the
reference, since this ordering does not introduce much complexity as opposed
to DiagR ordering. To ensure a fair comparison, for the QRD-M algorithm,
$M = 4$ is used for QPSK modulation and $M = 16$ is used for 16-QAM
modulation, in which cases QRD-M has the same ML-achieving performance as the
SD algorithm.

TABLE I
COMPLEXITY COMPARISON FOR A QPSK MODULATED 4 x 4 SYSTEM

TABLE II
COMPLEXITY COMPARISON FOR 16QAM MODULATED 4 x 4 SYSTEM

                 mean # of    max # of     mean # of nodes          max # of
                 real "x"     real "x"                              nodes
ML               330880       330880       4369 (1+16+256+4096)     4369
QRD-M (M = 16)   3520         3520         49 (1+16+16+16)          49
SD               200          8248         19                       676

It is clear from the tables that SD always has the lowest average complexity.
Its complexity advantage is more obvious for 16-QAM modulation. The problem of
SD lies in that its worst-case complexity is significantly higher than the
average one.

V. CONCLUSIONS
A comparative study between the sphere decoding and the QRD-M algorithm for
MIMO-OFDM systems has been presented in this paper. It is shown that the
ordered tree search can reduce the complexity of sphere decoding or improve
the performance of the QRD-M algorithm significantly. As for the
implementation complexity, it is interesting to know that sphere decoding has
lower average complexity, but much higher worst-case complexity than QRD-M.

ACKNOWLEDGEMENT
The authors would like to thank Y-C Liang, Y. Wu, C. K. Ho, P. H. W. Fung,
Y. Li, and H. Fu for the helpful discussion.

REFERENCES
[1] G. J. Foschini and M. J. Gans, "On the limits of wireless communications
in a fading environment when using multiple antennas," Wireless Personal
Communications, pp. 315-335, March 1996.
[2] P. W. Wolniansky, G. J. Foschini, G. D. Golden and R. A. Valenzuela,
"V-BLAST: An architecture for realizing very high data rates over the
rich-scattering wireless channel," in IEEE ISSSE-98, (Pisa, Italy),
pp. 295-300, Sept. 1998.
[3] Kyeong Jin Kim and Ronald A. Iltis, "Joint Detection and Channel
Estimation Algorithms for QS-CDMA Signals Over Time-Varying Channels," IEEE
Trans. Commun., vol. 50, pp. 845-855, May 2002.
[4] Jiang Yue, Kyeong Jin Kim, G. D. Gibson and Ronald A. Iltis, "Channel
Estimation and Data Detection for MIMO-OFDM Systems," Global
Telecommunications Conference, vol. 22, pp. 581-585, Dec. 2003.
[5] E. Viterbo and J. Boutros, "A Universal Lattice Decoder for Fading
Channels," IEEE Trans. Inform. Theory, vol. 45, pp. 1639-1642, July 1999.
[6] Oussama Damen, Ammar Chkeif and Jean-Claude Belfiore, "Lattice Code
Decoder for Space Time Codes," IEEE Commun. Lett., vol. 4, pp. 161-163,
May 2000.
[7] Albert M. Chan and Inkyu Lee, "A New Reduced-Complexity Sphere Decoder for
Multiple Antenna Systems," IEEE International Conference on Communications,
vol. 1, pp. 460-464, May 2002.
[8] Babak Hassibi and Haris Vikalo, "On the expected complexity of integer
least-squares problems," IEEE International Conference on Acoustics, Speech
and Signal Processing, vol. 2, pp. 1497-1500, 2002.
[9] Joakim Jalden and Bjorn Ottersten, "An Exponential Lower Bound on the
Expected Complexity of Sphere Decoding," IEEE International Conference on
Acoustics, Speech and Signal Processing, May 2004.
[10] Erik Agrell, Thomas Eriksson, Alexander Vardy and Kenneth Zeger, "Closest
Point Search in Lattices," IEEE Trans. Inform. Theory, vol. 48,
pp. 2201-2214, Aug. 2002.
[11] Bertrand M. Hochwald, Stephan ten Brink, "Achieving Near-Capacity on a
Multiple-Antenna Channel," IEEE Trans. Commun., vol. 51, pp. 389-399,
March 2003.
[12] Mohamed Oussama Damen, Hesham El Gamal and Giuseppe Caire, "On
Maximum-Likelihood Detection and the Search for the Closest Lattice Point,"
IEEE Trans. Inform. Theory, vol. 49, pp. 2389-2402, Oct. 2003.
[13] Jijun Yin, Heung-No Lee, Mohin Ahmed, Bo Ryu and Lewis Peterson,
"Iterative MMSE-Sphere List Detection and Graph Decoding MIMO OFDM
Transceiver," IEEE Vehicular Technology Conference, May 2004.
[14] W. H. Mow, "Universal Lattice Decoding: Principle and Recent Advances,"
Wireless Communications and Mobile Computing, Special Issue on Coding and Its
Applications in Wireless CDMA Systems, vol. 3, pp. 553-569, Aug. 2003.
[15] Yan Wu, Sumei Sun, and Zhongding Lei, "Low complexity VBLAST OFDM
detection for WLAN," IEEE Commun. Lett., vol. 8, pp. 374-376, June 2004.
