A 74.8 mW Soft-Output Detector IC for 8×8 Spatial-Multiplexing MIMO Communications

Chun-Hao Liao, To-Ping Wang, and Tzi-Dar Chiueh, Senior Member, IEEE

Abstract: In this paper, VLSI implementation of a configurable, soft-output MIMO detector is presented. The proposed chip can support up to 8×8 64-QAM spatial multiplexing MIMO communications, which surpasses all reported MIMO detector ICs in antenna number and modulation order. Moreover, this chip provides configurable antenna number from 2×2 up to 8×8 and modulation order from QPSK to 64-QAM. Its outputs include bit-wise log likelihood ratios (LLRs) and a candidate list, making it compatible with powerful soft-input channel decoders and iterative decoding system. The MIMO detector adopts a novel sphere decoding algorithm with high decoding efficiency and superior error rate performance, called modified best-first with fast descent (MBF-FD). Moreover, a low-power pipelined quad-dual-heap (quad-DEAP) circuit for efficient node pool management and several circuit techniques are implemented in this chip. When this chip is configured as 4×4 64-QAM and 8×8 64-QAM soft-output MIMO detectors, it achieves average throughputs of 431.8 Mbps and 428.8 Mbps with only 58.2 mW and 74.8 mW respective power consumption and reaches 10⁻⁵ coded bit error rate (BER) at signal-to-noise ratio (SNR) of 24.2 dB and 22.6 dB, respectively.

Index Terms: Multiple-input multiple-output (MIMO) detection, soft-output sphere decoder, VLSI implementation.

Manuscript received February 02, 2009; revised July 06, 2009. Current version published February 05, 2010. This paper was approved by Associate Editor Bevan Baas. This work was supported in part by the National Science Council, Taiwan, R.O.C., under Grant NSC98-2752-M-002-002-PAE and NSC97-2219-E-002-011. The work of Chun-Hao Liao is also partially sponsored by the Institute for Integrated Signal Processing Systems, RWTH Aachen University, Aachen, Germany.

The authors are with the Graduate Institute of Electronics Engineering and the Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617 (e-mail: chiueh@cc.ee.ntu.edu.tw).

Digital Object Identifier 10.1109/JSSC.2009.2037292

0018-9200/$26.00 © 2010 IEEE

I. INTRODUCTION

Multiple-input multiple-output (MIMO) techniques have recently enjoyed high degree of popularity in wireless communications as they significantly enhance spectrum resource utilization [1]. In particular, a MIMO technique called spatial multiplexing can increase the data throughput almost linearly with the number of antennas [2]. Hence, the spatial multiplexing MIMO technique has been adopted in many current wireless communication standards. For example, the IEEE 802.11n wireless LAN standard adopts MIMO configurations with up to 4×4 spatial multiplexing, and the latest IEEE 802.16e mobile WiMAX standard also includes a 4-stream spatial multiplexing mode.

Systems with higher number of antennas are on the horizon. For instance, it was proposed that 8×8 spatial multiplexing may be necessary in the next-generation (4G) mobile communication standard to achieve peak spectrum efficiency [3]. Recently, in an amazing demonstration, 5 Gbps downlink wireless communication is achieved using spatial multiplexing on 12×12 antenna configuration and 100 MHz bandwidth, reaching a record-breaking 50 bps/Hz spectrum efficiency [4]. Moreover, along with the development of the RF front-end circuit technology, short-range data communication through the extremely high frequency (EHF) band is no longer infeasible. In this frequency band, antenna size can be shrunk to several millimeters, making MIMO systems with a large number of antennas practical even for portable devices.

One of the most challenging tasks in MIMO communication systems is data detection at the receiver when spatial multiplexing is applied. Multiple streams of signals, coupled with noise and channel fading, interfere with each other when traveling in space and are received by a plurality of antennas. The optimal detection solution mandates exhaustive search among the entire transmitted signal space and requires complexity that scales exponentially with the number of antennas. To reduce the search complexity, sphere decoding (SD) was proposed and it is capable of achieving optimal detection performance with much reduced complexity [5]. To further improve the error rate performance, the original hard-output sphere decoding has been modified to provide soft-valued outputs, making it applicable in iterative detection and decoding architectures to attain significantly enhanced detection performance [5]. The complexity of the hard-output and soft-output sphere decoding algorithms depends to a large extent on the adopted search method. Several previous research works proposed a variety of search algorithms, such as K-best [6]–[11], depth-first [5], [12], [13] etc. However, owing to the limitation in search scalability these algorithms are mainly applicable to MIMO systems with either fewer antenna elements or lower-order modulation.

In light of the trend in spatial-multiplexing MIMO communications toward higher-order modulation, more spatial streams and soft-valued output, we propose, in this paper, a configurable soft-output MIMO detector IC based on a novel complex-plane sphere decoding algorithm. In this IC, several architecture and circuit techniques are proposed and implemented to achieve the following advanced features:

• First MIMO detector IC supporting 8×8 64-QAM spatial multiplexing.
• Support for antenna configuration from 2×2 to 8×8 and modulation from QPSK to 64-QAM.
• Provision of soft-valued outputs and candidate list, making it compatible with soft-input error-correction-code (ECC) decoders and iterative detection and decoding system.

Authorized licensed use limited to: BANGALORE INSTITUTE OF TECHNOLOGY. Downloaded on March 30,2010 at 02:20:45 EDT from IEEE Xplore. Restrictions apply.

412 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 45, NO. 2, FEBRUARY 2010

• Novel modified best-first with fast descent (MBF-FD) MIMO detection algorithm enhancing detection efficiency and performance.
• Low-latency, pipelined quad-dual-heap (quad-DEAP) circuit facilitating node pool maintenance.
• Tabular enumeration scheme providing fast and efficient enumeration.
• Optimized node processing circuit enabling high clock rate and low power consumption.
• Average throughput of 431.8 Mbps with 58.2 mW in 4×4 64-QAM configuration and 428.82 Mbps with 74.8 mW in 8×8 64-QAM configuration.

The rest of this paper is organized as follows: after introducing the MIMO detection problem and the conventional sphere decoding algorithms in Section II, we will give the main idea of the MBF-FD algorithm for MIMO detection in Section III and expound on related simulation results and comparison with existing solutions. Section IV presents circuit design and implementation of the proposed IC, including hardware architecture and circuit techniques. Then, Section V reports the chip measurements and compares the proposed chip with several reported MIMO detector chips. Finally the paper is concluded in Section VI.

II. DETECTION IN SPATIAL-MULTIPLEXING MIMO SYSTEMS

Let us consider several spatial streams, each transmitting a fixed number of bits per symbol using QAM modulation over a MIMO system with multiple transmitting and receiving antennas. Denote the transmitted bits as a binary row vector, and let the QAM-mapped complex constellation symbol vector hold one complex symbol per stream. The received complex symbol vector is then given by

(1)

where the channel matrix is assumed known beforehand and the noise is a complex Gaussian vector. For simplicity, in the rest of the paper we assume […].

Hard-output MIMO detectors try to find the symbol vector (and correspondingly the binary vector) that maximizes the likelihood of the received vector. On the other hand, soft-output MIMO detectors compute the extrinsic bit-wise log-likelihood ratio (LLR) of each transmitted bit under the max-log maximum a posteriori (MAP) criterion according to [5]

(2)

(3)

The quantities involved are the extrinsic and a priori LLRs of each bit, the search space, the noise power spectral density, and the row vector that contains the a priori information about the transmitted bits. In an iterative detection and decoding system, the MIMO detector first computes the LLR outputs without the a priori information; the LLRs are then passed to a soft-in-soft-out ECC decoder, whose outcome will then be fed back to the MIMO detector as a priori information; and the iteration goes on.

Generally speaking, sphere decoders are very effective as soft-output MIMO detectors due to the efficient search strategy that confines the search space to include only the vectors whose costs are smaller than the sphere constraint. However, as searching the whole space under the sphere constraint in each iteration is still time-consuming, we adopted a compromise solution proposed in [5]. This scheme generates a candidate list during the first MIMO detection and afterwards confines the search to the solutions in that candidate list. In the candidate-list-based MIMO detectors, where the a priori information can be ignored, we can rewrite the cost function as

(4)

where the channel matrix is factored into a unitary matrix and an upper-triangular matrix. Since the latter is upper-triangular, the complex symbols in the symbol vector can be determined sequentially from bottom to top. The decoding can then be mapped into a search over a layered tree, with one layer per spatial stream and one branch per constellation point, whose leaf nodes correspond to the possible solution vectors. By expanding (4), we recursively define the partial cost of an intermediate node in a given layer, with its partial solution, as

(5)

where the terms are elements of the transformed received vector and of the upper-triangular matrix.

Several tree search schemes have been studied in the context of sphere decoding MIMO detection. Among them, the breadth-first, depth-first and best-first algorithms are the most popular. Breadth-first algorithms [6]–[11] are favored due to their regular memory arrangement and amenability to pipelined and parallel implementation. However, for systems with a larger number of antennas and/or higher modulation order, breadth-first algorithms tend to require enormous computational complexity to achieve acceptable performance. On the other hand, depth-first algorithms [5], [12], [13] have better search efficiency, although their tree traversing strategy still leaves room for improvement. In [14], a best-first algorithm is proposed and shown to be a better search method. This best-first algorithm maintains a pool of nodes to visit, which are not necessarily in the same sub-tree. When the current best node with the lowest partial cost has been visited and processed, the best-first algorithm starts from the next best node in the pool. Namely, it can hop within the tree without being restricted by the structure and connectivity of the tree and always looks into the most promising nodes.
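The pool-based traversal described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `R`, `y_tilde`, `partial_cost_increment`, and `best_first_search` are hypothetical, the pool is an unbounded binary heap, and the partial cost is assumed to take the usual sphere-decoding form, in which each layer adds the squared magnitude of the residual left after subtracting the already decided symbols weighted by the entries of the upper-triangular matrix. Under these assumptions the first leaf popped from the pool has the lowest total cost.

```python
import heapq
import itertools


def partial_cost_increment(R, y_tilde, layer, symbols):
    """Squared residual of row `layer`; `symbols` holds the decisions for rows layer..n-1."""
    acc = y_tilde[layer]
    for offset, s in enumerate(symbols):
        acc -= R[layer][layer + offset] * s
    return abs(acc) ** 2


def best_first_search(R, y_tilde, constellation):
    """Plain best-first tree search: always expand the pooled node with the lowest partial cost."""
    n = len(R)
    tie = itertools.count()  # unique tie-breaker, so the heap never compares complex symbols
    pool = [(0.0, next(tie), ())]  # (partial cost, tie-breaker, symbols for the bottom rows)
    while pool:
        cost, _, symbols = heapq.heappop(pool)
        if len(symbols) == n:
            return cost, list(symbols)  # first leaf popped is the lowest-cost full solution
        layer = n - 1 - len(symbols)  # detection proceeds from the bottom row upward
        for s in constellation:  # every child is evaluated and pushed into the pool
            child = (s,) + symbols
            step = partial_cost_increment(R, y_tilde, layer, child)
            heapq.heappush(pool, (cost + step, next(tie), child))
    raise ValueError("constellation must not be empty")
```

Note that every visited node pushes all of its children into the pool, which is the overhead that Section III sets out to reduce.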


LIAO et al.: A 74.8 mW SOFT-OUTPUT DETECTOR IC FOR 8 8 SPATIAL-MULTIPLEXING MIMO COMMUNICATIONS 413

Fig. 1. Operation of the best-first algorithms: (a) original best-first, (b) MBF, and (c) modified best-first with fast descent (MBF-FD).

III. A LOW-COMPLEXITY SEARCH ALGORITHM

A. Algorithm Description

By continuously starting the search from the current best node with the lowest partial cost, the aforementioned best-first algorithm avoids the limitation of adjacency in the tree suffered by the depth-first algorithm and thus achieves a better search efficiency. However, in the original best-first algorithm, the nodes are connected in a traditional multi-way tree: each node has one child per constellation point, and each child node can be reached only from its parent. So the individual partial costs of all child nodes must be evaluated before the search can move downward to the next level, as indicated in Fig. 1(a). Evaluation of all child nodes' partial costs often makes the best-first algorithm's efficiency less than desirable. What's worse is that in a tree with high degree, pushing in many nodes and removing only one parent node can quickly bloat the node pool with useless nodes.

In the modified best-first (MBF) algorithm [15], the original multi-way tree is converted into an equivalent binary tree, as illustrated in Fig. 1(b). When a node is visited, we can replace this node in the pool by only two new nodes: its best child node in the next layer and its best yet-to-visit sibling. Afterwards, the next best node in the sorted node pool is examined and visited, and so on. By adding these two nodes into the pool (and deleting the current node), the legacy of the current node is preserved, downward by its child node and horizontally by its sibling node. This procedure is similar to encoding a general ordered multi-way tree (e.g., 4-ary, 16-ary, or 64-ary) into a binary tree by a method called first-child/next-sibling binary tree [16]. The MBF algorithm greatly reduces the degree of a node by introducing horizontal connections and thus effectively decreases the complexity of child node evaluation in the original best-first algorithm. It also makes the node pool more efficient in capturing promising nodes for future visit.

Although the MBF algorithm successfully resolves the complexity and node pool issues of the traditional best-first algorithm, it still has the problem of spending too much time searching on higher layers and may not reach even one leaf node (for a full-length solution) under a time constraint. In order to reach more leaf nodes, the MBF algorithm is further modified to include the flavor of depth-first tree search. The final algorithm, called modified best-first with fast descent (MBF-FD) [15], continuously searches downward for the best child nodes and pushes best sibling nodes along the search path into the node pool until a leaf node is reached. Then a new forward search is started from the best node in the node pool. The MBF-FD algorithm preserves the benefits of the MBF algorithm while guaranteeing enough full-length solutions for soft-output MIMO detection. Fig. 1(c) illustrates the operation of the MBF-FD algorithm.

B. Simulation Results

We compare the proposed MBF-FD algorithm with the modified K-best Schnorr-Euchner (MKSE) algorithm [7] and the single tree search (STS) algorithm [13], which are popular breadth-first-based and depth-first-based algorithms, respectively. To make a fair comparison, we evaluate them in terms of the computational complexity measured in average number of required partial cost calculations (PCC) to reach a target coded bit error rate (BER) at a certain SNR. The data are coded in a rate […] systematic convolutional code with constraint length 3, and interleaved with a 128×72 row-in-column-out interleaver. A spatially uncorrelated Rayleigh channel matrix is assumed in each case and its elements are complex zero-mean Gaussian random variables with variance 0.5 per dimension. The sphere constraint is set to 2 in all algorithms initially, which, according to extensive simulations, leads to a fair search space reduction while maintaining good error rate performance. The different sphere decoding algorithms are compared under various run-time constraint settings, e.g., the maximum number of visited nodes in MBF-FD and STS, and the corresponding limit in MKSE.

Fig. 2 depicts the required average number of PCC and minimum SNR to achieve the target coded BER for each algorithm under different run-time constraints, where the SNR is defined as

(6)
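The simulation setup above can be made concrete with a short Python sketch. It rests on stated assumptions rather than details from the paper: "row-in-column-out" is taken to mean that coded bits are written row by row into a 128×72 array and read out column by column, 128 is taken to be the number of rows, and the function names and the use of Python's `random` module are illustrative choices.

```python
import random


def row_in_column_out(bits, rows=128, cols=72):
    """Write `bits` row by row into a rows x cols array, then read it column by column."""
    if len(bits) != rows * cols:
        raise ValueError("block length must equal rows * cols")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows=128, cols=72):
    """Inverse of row_in_column_out: put each element back at its row-major position."""
    if len(bits) != rows * cols:
        raise ValueError("block length must equal rows * cols")
    out = [None] * (rows * cols)
    k = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[k]
            k += 1
    return out


def rayleigh_channel(n_rx, n_tx, rng):
    """Spatially uncorrelated channel: i.i.d. complex Gaussian entries, variance 0.5 per dimension."""
    sigma = 0.5 ** 0.5  # standard deviation of each real and imaginary part
    return [[complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n_tx)]
            for _ in range(n_rx)]
```

With variance 0.5 per real dimension, each channel entry has unit average power, so the average received signal power per antenna scales with the number of transmit antennas.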


Fig. 2. Comparison of MBF-FD, STS, and MKSE algorithms under different run-time constraints.

The advantage of the MBF-FD algorithm over the STS and MKSE algorithms is quite obvious. In particular, when the target BER is mandated for channels with 17.5 dB SNR in the 4×4 16-QAM configuration, the average number of PCC required by the MBF-FD algorithm is only 41% of that of the STS algorithm and 13% of that of the MKSE algorithm. In the 8×8 64-QAM configuration this advantage is even more pronounced: for channels with 21.5 dB SNR the MBF-FD algorithm needs on average only 9.8% and 3.3% of the complexity of the STS algorithm and the MKSE algorithm, respectively.
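To make the PCC metric used in this comparison more tangible, the following Python sketch follows the MBF-FD description in Section III-A and counts one PCC per node it creates. It is a simplified illustration under several assumptions, not the authors' algorithm as implemented: the function name `mbf_fd` and all variable names are hypothetical, the node pool is an unbounded heap, siblings are enumerated by exact sorting instead of a table, the partial cost uses the usual triangular-system form, and the search prunes against the best leaf found so far, which is a hard-output simplification.

```python
import heapq
import itertools


def mbf_fd(R, y_tilde, constellation, max_visits=1000):
    """Return (best_cost, best_symbols, leaves, pcc) for an upper-triangular system."""
    n = len(R)
    tie = itertools.count()  # unique tie-breaker so heap entries never compare symbols
    stats = {"pcc": 0}
    best_cost, best_symbols, leaves = float("inf"), None, []

    def make_node(layer, decided, parent_cost, rank):
        """Build the rank-th best node in `layer`, given the symbols decided below it."""
        # Interference cancellation: remove the contribution of rows layer+1..n-1.
        residual = y_tilde[layer] - sum(
            R[layer][layer + 1 + k] * s for k, s in enumerate(decided))
        # Stand-in for hardware enumeration: order points by distance to the residual.
        order = sorted(constellation,
                       key=lambda c: abs(residual - R[layer][layer] * c))
        if rank >= len(order):
            return None  # no yet-to-visit sibling remains
        symbol = order[rank]
        stats["pcc"] += 1  # one partial cost calculation per created node
        cost = parent_cost + abs(residual - R[layer][layer] * symbol) ** 2
        return (cost, next(tie), layer, (symbol,) + decided, parent_cost, rank)

    pool = [make_node(n - 1, (), 0.0, 0)]
    visits = 0
    while pool and visits < max_visits:
        node = heapq.heappop(pool)
        if node[0] >= best_cost:
            break  # every pooled node is at least as costly as the best leaf
        while node is not None and node[0] < best_cost:
            cost, _, layer, symbols, parent_cost, rank = node
            visits += 1
            # Horizontal legacy: keep the next-best sibling in the pool.
            sibling = make_node(layer, symbols[1:], parent_cost, rank + 1)
            if sibling is not None and sibling[0] < best_cost:
                heapq.heappush(pool, sibling)
            if layer == 0:
                leaves.append((cost, list(symbols)))
                best_cost, best_symbols = cost, list(symbols)
                node = None  # leaf reached: restart from the best pooled node
            else:
                # Fast descent: move straight to the best child, bypassing the pool.
                node = make_node(layer - 1, symbols, cost, 0)
    return best_cost, best_symbols, leaves, stats["pcc"]
```

In this sketch each visited node creates at most two new nodes, a sibling and a best child, whereas a plain best-first search evaluates every child of every visited node.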

The proposed MIMO detector consists of three major parts for the implementation of the MBF-FD algorithm. First, a node pool holds the information of the nodes for future visit. Secondly, the node processing part performs the MBF-FD tree traversal. Finally, a third part generates the soft detection result and the candidate list output.

Fig. 3. An example of the DEAP structure.

A. Node Pool

Node pool, the most critical block in the proposed IC, maintains a group of nodes to visit in a way that the best and the worst nodes can be readily identified. The size of the node pool should be properly determined to guarantee satisfactory BER performance. From extensive simulation, we found that a node pool with about 30–40 nodes is sufficient. The best node (with minimum partial cost) is the next node to visit, while the worst node (with maximum partial cost) is to be removed when a new node is inserted and the pool is already full. The nodes come in and out so frequently that the efficiency of the node pool significantly affects the MIMO detector performance. A simple-minded design using two comparators to search for the best and the worst nodes is obviously not acceptable due to high circuit complexity and long delay. In the following, we will propose the pipelined quad-DEAP for implementing the node pool. Techniques that improve throughput, accuracy, power and complexity will also be presented.

Dual-heap (DEAP) [17], consisting of a minimum and a maximum heap¹ arranged in a back-to-back fashion, is a data structure dedicated to efficiently maintaining the minimum and the maximum among a group of numbers. Fig. 3 depicts an example of DEAP. Note that a leaf node in the minimum heap is less than or equal to the corresponding leaf node in the maximum heap. Upon retrieval of the minimum or maximum node, DEAP can be easily maintained through node exchanges propagating from one end to the other.

¹ A minimum (maximum) heap is a tree in which a parent node always holds a value less (greater) than the values that its child nodes hold.

With the above DEAP structure, we next present several techniques for efficient circuit implementation of DEAP. First, DEAP suffers from possible long latency which scales linearly with the number of layers due to propagation of node exchanges. To reduce the number of layers, we propose to use quad trees instead of binary trees. Fig. 4 depicts the adopted 6-layer quad-DEAP structure, which contains 42 nodes to guarantee satisfactory BER performance.

Moreover, an interlaced pipelining scheme is implemented in the node exchange operations to improve the node processing rate and circuit utilization, as illustrated in Fig. 4. To implement node exchanges, the pipelining stages operate in a period of two clock cycles. Specifically, in the first clock cycle, two root nodes in layers 1 and 6 update their values with the respective inputs if necessary, while nodes in layers 2, 3, 4, 5 that have been updated in the previous cycle compare with associated nodes in layers 3, 2, 5, 4 and exchange values whenever necessary. In the second clock cycle, similar node exchanges are performed between layers 1 and 2, layers 3 and 4, layers 5 and 6, respectively. Note that both upward and downward propagation of node exchanges are possible. In addition, these two types of propagation can happen simultaneously in the pipelining stages. Therefore, the circuit is designed to handle upward and downward node exchanges concurrently. Finally, the comparators are shared between the two phases (even-cycle phase and odd-cycle phase) to increase circuit utilization.

Although pipelining improves the node pool, it also introduces possible incoherence in the best node value when the node exchanges associated with a best node replacement are not completed in time, leading to degradation in error rate performance. To this end, we include a best node cache, which holds the best node of the pool while the quad-DEAP handles the other nodes in the pool. Armed with this cache, the aforementioned incoherence and possible BER degradation are avoided. Finally, we introduce two more low-power circuit techniques for the node pool. First, for the idle nodes which are not on the path of propagation, we turn off the associated circuits by clock gating. Second, when the node exchange procedure halts at a certain stage, the inputs of the comparators in the ensuing stages are frozen to minimize signal switching. A 38.2% saving in power is achieved by these techniques according to gate-level power simulation.

B. Node Processing

This part performs the main operations of MBF-FD tree traversal, including identifying the child and sibling nodes and calculating their partial costs. A dedicated pipelining strategy is introduced to cut down the possible long delay path. We partition the computation involved with a child node into three stages: the inter-antenna interference cancellation (IAIC) block first cancels the interference from the QAM symbols that have been decided in the previous layers; the child node processing (CNP) block then finds the best child node; and finally the partial cost calculation (PCC) block computes and accumulates the squared error. The operation involved with a sibling node is similarly


partitioned into the sibling node processing (SNP) block and PCC as well. With the above partitioning and pipelining, the clock speed of the proposed chip can be close to 200 MHz.

Fig. 5. Pipelining schedule of node processing.

Referring to Fig. 1, note that the nodes to be processed next depend on the decision of the current child node. Thus, we propose a pipelining schedule as shown in Fig. 5. The top three blocks refer to the processing of this child node. At a given time, the decision of the child node is already available though its partial cost is yet to be computed. Consequently, we parallelize the processing of the subsequent nodes with the PCC of the child node. Tree traversal for the following layers is similarly performed until a leaf node is reached. Note that in parallel another SNP block determines the best sibling of the best node retrieved from the node pool [see Fig. 1(c)].

The above scheduling has several advantages. First, only one set of IAIC, CNP, SNP, and PCC circuits is implemented. Next, since the tree is traversed sequentially, by adjusting the schedule this architecture can be configured to support different numbers of antennas (layers), different modulations, and run-time constraints. Also, the rate of node processing matches that of the node pool, i.e., two clock cycles per node, thus enhancing circuit utilization. We next introduce the circuit techniques adopted in these blocks.

1) Inter-Antenna Interference Cancellation: Assuming that the current node is in a given layer, IAIC computes the first two terms inside the square norm of (5):

(7)

To reduce the critical path delay, the associated terms inside the summation in (7) are computed and accumulated as early as possible, namely, during the processing of nodes in the preceding layers. Hence, the result of (7) can be computed with only one final multiplication and addition. Rearranging this calculation greatly facilitates design configurability over the number of antennas. For the proposed 8×8 MIMO detector IC, seven such IAIC units are implemented. In configurations with a smaller number of antennas, fewer IAIC units are needed and unused ones are simply turned off.

To further reduce the critical path delay, two more circuit techniques are adopted. First, as each decided symbol is a QAM constellation point and thus its real part (as well as imaginary part) has at most eight possible values, the multiplier uses a simplified radix-4 […]. Second, the complex multiplication and the following subtraction are integrated in one carry-save adder whose final addition is a multi-stage carry-select adder. Gate-level synthesis results show that the above techniques reduce the critical path by 38.2%.

2) Child Node Processing: Assuming the newly-popped-out node is in a given layer, CNP recursively finds the constellation point of the best child node in each subsequent layer, together with its corresponding difference,

(8)

(9)

where the layer index runs down to layer 0 and the quantization function converts its argument to the nearest constellation point. To avoid the division in (8), we adopt a search over all constellation points instead:

(10)

By the orthogonality between the real and imaginary parts, we can search the real and the imaginary part independently for the closest constellation point. In addition, as the signs of the real and imaginary parts of the closest point are identical to those of the interference-cancelled signal, we search only constellation points in the first quadrant. In summary, we compare the magnitudes of its real and imaginary parts with a set of reference values in parallel to find the real and imaginary parts of the closest point simultaneously. Concurrently, all possible combinations of the corresponding difference are computed and then selected by the results of the aforementioned comparison to reduce the path delay. Fig. 6 depicts the circuit diagram of the CNP block. From synthesis results, the proposed simplified child node search circuit saves 70.4% of area and 39.7% of circuit delay when compared with the straightforward implementation.

3) Sibling Node Processing and Tabular Enumeration: Finding the next sibling node requires sorting the yet-to-visit constellation points according to their partial costs, which can account for a significant portion of the complexity in tree-search MIMO detection hardware. To avoid that, we apply the tabular enumeration (TE) technique proposed in [15] for fast node order look-up. Fig. 7 illustrates how this technique works.

Fig. 7. Illustration of triangular partitions and node ordering table in tabular enumeration.

First, suppose the constellation point closest to the equalized and interference-cancelled signal has been found. The region around this constellation point is then divided into eight triangular sub-regions. For each sub-region, the most likely visiting order of all other constellation points is computed in advance and stored in a table. Extensive simulation indicates that TE introduces negligible BER degradation when compared to the exact enumeration order.

Direct implementation of TE requires eight tables for each constellation point, each with […] entries. To reduce the required storage, we further unify these tables into one by utilizing the symmetry in the eight sub-regions and the shift invariance property of the partial cost function. Fig. 7 shows the unified node order table, whose maximum offset supports up to 64-QAM. Note that the node order and the offset to the current closest point are listed inside and around the table, respectively. As there is only one table for all possible closest points, boundary check is necessary to skip those offsets that lead to points outside the constellation. For different QAM modulations, the same table can be reused by merely modifying the boundary. In sum, the unified table is implemented in only 1.76 K bits, which is 0.88% of the straightforward design. Although the unified table significantly reduces the storage, repeated table look-up to skip the invalid offsets can be a speed bottleneck. To prevent this, eight parallel boundary check units are implemented.

Fig. 8. Three possible cases of the second best sibling node assuming sub-region 0 is considered.

Note that, except for the sibling of the node retrieved from the node pool (see Fig. 1), all other sibling nodes in MBF-FD are always the second best among all nodes of the same parent. Therefore, we further propose simplified TE (STE) for processing these sibling nodes during fast descent. Assume, without loss of generality, that the difference falls in sub-region 0. Then there are only three possible cases of TE for these sibling nodes as illustrated in Fig. 8, and hence the table can be reduced to only three entries: (0, 2), (2, 0), (−2, 0). These entries are processed and boundary checked in parallel so that a sibling node can be found in one clock cycle. Fig. 9(a) and (b) show the circuit diagrams of the proposed TE and STE, respectively, where the index TN is the sub-region index and the flip block handles the symmetry processing of the offsets according to TN. Finally, the SNP circuit adopting STE is depicted in Fig. 10. Note that the first two bits of TN are simply the signs of the real and imaginary parts of the difference, while the third bit of TN needs one more comparison. To reduce the critical path, two STE blocks are implemented to process the two possible cases of the third bit. Moreover, many possible partial results for the difference are available from the CNP unit. These two techniques result in a 56.9% saving in critical path.

4) Partial Cost Computation: The PCC block squares the differences obtained in CNP and SNP blocks and accumulates the partial cost according to

(11)

To reduce complexity and shorten critical path, a special squarer is designed and its outcomes are fed to a carry-save adder that updates the partial cost according to (11).
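The node processing steps described in this subsection can be sketched in a few lines of Python: child search by independent slicing of the real and imaginary parts, boundary-checked sibling offsets, and cost accumulation. The sketch assumes an unnormalized QAM grid on odd integers (±1, ±3, ..., up to ±7 for 64-QAM) and a real, positive diagonal entry `gain` of the upper-triangular matrix; all function and parameter names are hypothetical, and the mapping from sub-region to table entries is not reproduced.

```python
def slice_axis(value, gain, max_level):
    """Nearest odd level in {±1, ..., ±max_level} to value / gain, without dividing."""
    magnitude = abs(value)  # search the positive half only; the sign is restored below
    level = 1
    for threshold in range(2, max_level, 2):  # decision thresholds 2, 4, ... between levels
        if magnitude > threshold * gain:  # scale the threshold instead of the sample
            level = threshold + 1
    return level if value >= 0 else -level


def nearest_point(residual, gain, max_level):
    """Closest constellation point, found independently on the real and imaginary axes."""
    return complex(slice_axis(residual.real, gain, max_level),
                   slice_axis(residual.imag, gain, max_level))


def valid_siblings(center, offsets, max_level):
    """Apply table offsets to the closest point and skip those outside the constellation."""
    points = []
    for d_re, d_im in offsets:
        re, im = center.real + d_re, center.imag + d_im
        if abs(re) <= max_level and abs(im) <= max_level:  # boundary check
            points.append(complex(re, im))
    return points


def accumulate_cost(parent_cost, difference):
    """Add the squared magnitude of the difference to the parent's partial cost."""
    return parent_cost + difference.real ** 2 + difference.imag ** 2
```

Reusing the same offsets for QPSK, 16-QAM, and 64-QAM only requires changing `max_level` (1, 3, or 7 under the grid assumed here), which mirrors the remark that the table can be reused by merely modifying the boundary.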


Fig. 9. Circuit diagram of (a) tabular enumeration (TE) and (b) simplified tabular enumeration (STE).

Fig. 10. Circuit diagram of sibling node processing (SNP) block adopting simplified tabular enumeration (STE) algorithm.

C. Soft-Output Generation and Candidate List

In the proposed MIMO detector IC, SOG generates the LLR values according to the max-log LLR criterion in (2), and these values are essential for the high-performance soft-input ECC decoders in the first iteration. Two register files, each with 48 registers, are used to store, for each bit, the costs of the two hypotheses, 0 and 1. This configuration thus supports up to 8 × 8 64-QAM MIMO detection. Each register is initialized with a maximum cost, which can be regarded as the initial sphere constraint of the MBF-FD algorithm. When a full-length solution is found, the SOG block updates the registers corresponding to the relevant hypotheses whenever the cost of the found full-length solution is smaller than the register contents. The LLR values are then obtained by computing cost differences in pairs of counter hypotheses. Note that for sphere decoders under a run-time constraint, the costs of counter hypotheses may not be found during the tree search. In our design, the initial sphere constraint is used as an approximation for the unavailable cost.

The candidate list block maintains a list of full-length solutions with lower costs during the search. As proposed in [18], a 4-layer binary heap with 15 entries is used in this work. In addition to the low-power design techniques used in quad-DEAP, clock gating turns off unused SOG and CL units when the detector IC is configured for a low antenna number and/or low-order QAM modulation. The candidate list is quite useful in iterative detection and decoding systems, where in later iterations a reduced search over the candidate list, rather than over the whole solution space, is sufficient.

D. Summary

To sum up, Fig. 11 depicts the block diagram of the proposed MBF-FD MIMO detector. Note that in the node pool, a side information memory (SIM) works together with the DEAP circuit to provide detailed information about the nodes for future visits. For node processing, the signals flow through IAIC, CNP, SNP and PCC. The outputs of the two SNP blocks are fed into the node pool. Finally, SOG and CL receive the full-length solutions, generate the soft-output LLR values, and maintain the list of candidate solutions.

TABLE I
SUMMARY OF CIRCUIT TECHNIQUES

Fig. 12. Chip microphotograph.

Significant savings in power, delay, and circuit complexity have been attained through several circuit techniques adopted in designing the proposed MIMO detector IC. Table I summarizes all the techniques used and their improvements in power reduction, clock speed-up, and circuit/storage complexity.
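The soft-output generation and candidate list of Section IV-C can be pictured with a small software model. This is a minimal sketch under stated assumptions, not the circuit itself: the class and method names are hypothetical, the LLR sign convention (positive favours bit value 1) and the absence of noise-variance scaling are assumptions, and Python's `heapq` stands in for the 15-entry binary heap. For 8 × 8 64-QAM the number of bits would be 8 × 6 = 48, matching the 48 registers per hypothesis.

```python
import heapq
from itertools import count


class SoftOutputGenerator:
    """Per-bit hypothesis cost registers with a max-log LLR readout (illustrative)."""

    def __init__(self, num_bits, initial_radius):
        # Two "register files": the lowest cost seen so far with bit i = 0 and
        # with bit i = 1. Every register starts at the initial sphere constraint.
        self.cost = [[initial_radius, initial_radius] for _ in range(num_bits)]

    def update(self, bits, cost):
        # A full-length solution supports exactly one hypothesis per bit.
        for i, b in enumerate(bits):
            if cost < self.cost[i][b]:
                self.cost[i][b] = cost

    def llrs(self):
        # Assumed convention: cost(bit = 0) - cost(bit = 1). A hypothesis never
        # reached keeps the initial radius as its approximate cost.
        return [c0 - c1 for c0, c1 in self.cost]


class CandidateList:
    """Keeps the `capacity` lowest-cost full-length solutions (illustrative)."""

    def __init__(self, capacity=15):
        self.capacity = capacity
        self._heap = []          # max-heap on cost, implemented with negated keys
        self._tie = count()      # tie-breaker so equal costs never compare bit tuples

    def offer(self, bits, cost):
        entry = (-cost, next(self._tie), tuple(bits))
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif cost < -self._heap[0][0]:
            # The new solution beats the current worst entry: replace it.
            heapq.heapreplace(self._heap, entry)

    def sorted_candidates(self):
        return sorted((-neg, bits) for neg, _, bits in self._heap)


if __name__ == "__main__":
    sog = SoftOutputGenerator(num_bits=4, initial_radius=100.0)
    cl = CandidateList(capacity=2)
    for bits, cost in [([0, 1, 1, 0], 3.0), ([0, 1, 0, 0], 5.0), ([1, 1, 1, 0], 4.0)]:
        sog.update(bits, cost)
        cl.offer(bits, cost)
    print(sog.llrs())
    print(cl.sorted_candidates())
```

In this model, a bit whose counter hypothesis was never reached during a truncated search gets an LLR whose magnitude is bounded by the initial radius, which mirrors the approximation described above.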

V. EXPERIMENTAL RESULTS

The proposed IC is fabricated in a 0.13-micron CMOS technology. To validate the feasibility of the proposed IC for high-speed MIMO receivers, two copies of the circuit in Fig. 11, each a processing element (PE), are integrated in this IC. Each PE can independently execute MBF-FD MIMO detection for a received signal vector. The core area of the IC is mm . Fig. 12 depicts the chip microphotograph. The maximum operating clock rates of the chip under different supply voltages are plotted in Fig. 13. At the nominal 1.3 V supply voltage, the chip can operate at up to 198 MHz, about 1% less than the post-simulation result. Fig. 14 depicts the power consumption of the IC when it is configured in four different modes and operating at the maximum frequencies under several supply voltages. As expected, more power is consumed when the detector IC operates with more antennas and/or higher-order QAM constellations.

Fig. 13. Maximum clock rate of the proposed IC versus supply voltage.

Fig. 14. Power consumption versus supply voltage of the proposed detector IC in four operation modes.

The throughput of the proposed IC is formulated as

(12)

where the terms are the clock rate, the average number of visited nodes, the number of PEs, and the average number of clock cycles to visit a node, which is 2.53 in the proposed IC. Operating at the maximum frequency and under good channel conditions, the proposed IC achieves 431.8 Mbps and 428.8 Mbps throughput in 4 × 4 64-QAM and 8 × 8 64-QAM configurations by constraining to 8 and 16, respectively. Specifically, these configurations can reach 10 coded BER at SNR of 24.2 dB and 22.6 dB. However, when the channel conditions are poor, the MIMO detector may need to visit more nodes and require longer run time to obtain more precise soft-output values for acceptable BER. As such, the achievable throughput can become lower.
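The dependence of throughput on the quantities listed for (12) can be sketched numerically. Since the body of (12) is not legible here, the model below is an assumption: bits per received vector (number of transmit antennas times bits per QAM symbol) multiplied by the number of PEs and the clock rate, divided by the cycles spent per vector (average visited nodes times cycles per node). The function and parameter names are hypothetical.

```python
import math


def throughput_bps(clock_hz, num_pe, num_tx, qam_order,
                   avg_visited_nodes, cycles_per_node):
    """Assumed throughput model: detected bits per second over all PEs."""
    # Bits carried by one received vector, e.g. 4 antennas x 6 bits for 64-QAM.
    bits_per_vector = num_tx * int(math.log2(qam_order))
    # Clock cycles one PE spends on one vector.
    cycles_per_vector = avg_visited_nodes * cycles_per_node
    return num_pe * bits_per_vector * clock_hz / cycles_per_vector


if __name__ == "__main__":
    rate = throughput_bps(clock_hz=198e6, num_pe=2, num_tx=4, qam_order=64,
                          avg_visited_nodes=8, cycles_per_node=2.53)
    print(f"{rate / 1e6:.1f} Mbps")
```

With the 198 MHz clock, two PEs, 2.53 cycles per node and exactly 8 visited nodes, this assumed model gives about 469.6 Mbps for 4 × 4 64-QAM, somewhat above the reported 431.8 Mbps, so the paper's exact expression or operating point presumably differs from the sketch. The qualitative behaviour still holds: throughput falls in inverse proportion to the number of visited nodes, which is why poor channel conditions lower it.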



TABLE II
COMPARISON OF SPHERE DECODING MIMO DETECTOR IMPLEMENTATIONS

Table II lists the overall performance of the proposed IC and several reported sphere decoding MIMO detector ICs. The proposed IC is the only one that supports a maximum of eight antennas and 64-QAM modulation. The detector IC in [8] supports 8 × 8 MIMO systems, but only for QPSK modulation. In contrast, the proposed IC, capable of providing 21 configurations from 2 × 2 to 8 × 8 and from QPSK to 64-QAM, is the most configurable among all reported ICs. Only one other implementation provides some degree of configurability, but with less flexibility in antenna number [9]. Moreover, the proposed chip is one of the very few chips that provide both soft LLR and candidate list output, which are indispensable for MIMO detectors in advanced iterative detection and decoding systems. Therefore, although its throughput is not the highest, satisfactory BER performance is guaranteed. In other words, if one is willing to sacrifice BER performance, the proposed IC can achieve even higher throughput by setting a smaller maximum number of visited nodes. Finally, the proposed IC has the best measured power performance. Note that the power consumption is normalized considering supply voltage and adopted technology using

(13)

VI. CONCLUSIONS

This paper presents the design of a novel configurable soft-output MIMO detector IC. From the algorithmic aspect, a new and efficient sphere decoding algorithm, the MBF-FD algorithm, is shown to be very effective in soft-output MIMO detection, especially when the antenna number or the modulation order is high. In terms of VLSI implementation, new hardware architectures are proposed for a better hardware design. These include the pipelined quad-DEAP and tabular enumeration. Several circuit techniques are adopted in the design of function blocks to further improve the performance of the IC. Measurement results show that the proposed IC outperforms all the other implementations in terms of normalized power. Moreover, the proposed IC is the first soft-output sphere decoding MIMO detector IC that can support up to 8 × 8 64-QAM MIMO systems. When the chip is configured in 4 × 4 64-QAM and 8 × 8 64-QAM and constraining to 8 and 16, its throughput can reach 431.8 Mbps and 428.8 Mbps, respectively. With such performance, the proposed IC is very competitive among all soft-output MIMO detectors.

The authors greatly appreciate the Chip Implementation Center (CIC) of Taiwan for the fabrication and measurement of the proposed chip. They also thank the anonymous reviewers for the valuable suggestions that greatly improved this paper.

REFERENCES

[1] G. J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas,” Bell Labs. Tech. J., vol. 1, no. 2, pp. 41–59, 1996.

[2] A. Paulraj, R. Nabar, and D. Gore, Introduction to Space-Time Wireless Communications. Cambridge, U.K.: Cambridge Univ. Press, 2003.

[3] 3GPP Technical Report 36.913: Requirements for Further Advancements for Evolved Universal Terrestrial Radio Access (E-UTRA) (LTE-Advanced) [Online]. Available: http://www.3gpp.org/FTP/Specs/html-info/36913.htm.

[4] NTT DoCoMo Press Release [Online]. Available: http://www.nttdocomo.com/pr/2007/001319.html.

[5] B. Hochwald and S. t. Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Communications, vol. 51, no. 3, pp. 389–399, 2003.

[6] K.-W. Wong, C.-Y. Tsui, R.-K. Cheng, and W.-H. Mow, “A VLSI architecture of a K-best lattice decoding algorithm for MIMO channels,” in Proc. ISCAS, 2002, vol. 3, pp. 273–276.

[7] Z. Guo and P. Nilsson, “Algorithm and implementation of the K-best sphere decoding for MIMO detection,” IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 491–503, 2006.

[8] G. Knagge, M. Bickerstaff, B. Ninness, S. R. Weller, and G. Woodward, “A VLSI 8 × 8 MIMO near-ML decoder engine,” in Proc. IEEE 2006 Workshop on Signal Processing Systems (SiPS), Oct. 2006.

[9] R. Shariat-Yazdi and T. Kwasniewski, “Configurable K-best MIMO detector architecture,” in Proc. 3rd ISCCSP, 2008, pp. 1565–1569.

[10] M. Wenk, M. Zellweger, A. Burg, N. Felber, and W. Fichtner, “K-best MIMO detection VLSI architectures achieving up to 424 Mbps,” in Proc. ISCAS, 2006, pp. 1151–1154.

[11] S. Chen, T. Zhang, and Y. Xin, “Relaxed K-best MIMO signal detector design and VLSI implementation,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 3, pp. 328–337, Mar. 2007.

[12] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bolcskei, “VLSI implementation of MIMO detection using the sphere decoding algorithm,” IEEE J. Solid-State Circuits, vol. 40, pp. 1566–1577, 2005.

[13] C. Studer, A. Burg, and H. Bolcskei, “Soft-output sphere decoding: Algorithms and VLSI implementation,” IEEE J. Sel. Areas Commun., vol. 26, no. 2, pp. 290–300, 2008.

[14] A. Murugan, H. Gamal, M. Damen, and G. Caire, “A unified framework for tree search decoding: Rediscovering the sequential decoder,” IEEE Trans. Information Theory, vol. 52, no. 3, pp. 933–953, 2006.

[15] T.-P. Wang, T.-H. Lee, and T.-D. Chiueh, “Low-complexity soft-output MIMO detection for iterative decoding using modified best-first tree search,” IEEE Trans. Wireless Commun., submitted for publication.

[16] D. E. Knuth, The Art of Computer Programming, 3rd ed. Reading, MA: Addison Wesley, 1997, vol. 1, Fundamental Algorithms.

[17] A. Carlsson, “The DEAP: A double-ended heap to implement double-ended priority queues,” Information Process. Lett., vol. 26, pp. 33–36, 1987.

[18] P. Salmela, J. Antikainen, O. Silven, and J. Takala, “Memory-based list updating for list sphere decoders,” in Proc. IEEE Workshop on Signal Processing Systems (SiPS), 2007, pp. 633–638.

[19] C. Hess, M. Wenk, A. Burg, P. Luethi, C. Studer, N. Felber, and W. Fichtner, “Reduced-complexity MIMO detector with close-to ML error rate performance,” in Proc. 17th ACM Great Lakes Symp. VLSI (GLSVLSI), 2007, pp. 200–203.

[20] M. Shabany and P. Gulak, “Scalable VLSI architecture for K-best lattice decoders,” in Proc. ISCAS, 2008, pp. 940–943.

Chun-Hao Liao was born in 1983. He received the B.S. degree in electrical engineering and the M.S. degree from the Graduate Institute of Electronics Engineering at National Taiwan University, Taipei, Taiwan, in 2006 and 2009, respectively. From 2008 to 2009, he also worked as a research assistant in the Institute for Integrated Signal Processing Systems at RWTH Aachen University, Aachen, Germany. His research interests include baseband signal processing of communication systems, VLSI design, LDPC codes, wireless channel enumerator, and sphere decoding algorithms.

To-Ping Wang was born in Taipei, Taiwan, in 1983. He received the B.S. degree in electrical engineering and the M.S. degree from the Graduate Institute of Electronics Engineering at National Taiwan University, Taipei, Taiwan, in 2005 and 2007, respectively. His research interests include baseband signal processing of communication systems, MIMO channel enumerator and sphere decoding algorithms.

Tzi-Dar Chiueh (S’87–M’90–SM’03) received the B.S. and Ph.D. in electrical engineering from National Taiwan University and California Institute of Technology in 1983 and 1989, respectively. He is now a Professor in the Department of Electrical Engineering and Graduate Institute of Electronics Engineering at National Taiwan University. His research interests include algorithm, architecture, and integrated circuits for baseband communication systems.

Dr. Chiueh has received the Acer Longterm Award 11 times and the Golden Silicon Award in 2002, 2005, 2007, and 2009. His teaching efforts were recognized five times by the Teaching Excellence Award from NTU. Prof. Chiueh was the recipient of the Outstanding Research Award from National Science Council, Taiwan in 2004–2007. In 2005, he received the Outstanding Electrical Engineering Professor from the Chinese Institute of Electrical Engineers (Taiwan), and was awarded the Himax Chair Professorship at NTU in 2006. In 2009, he received the Outstanding Industry Contribution Award from the Ministry of Economic Affairs, Taiwan.
