Abstract—This paper proposes a hardware architecture implementing the Improved MMSE-SQRD (ISQRD) detection algorithm, one of the low-complexity suboptimal detectors proposed in [1] for the High-rate Spatial Modulation (HR-SM) scheme introduced in [2]. The architecture is designed to maximize throughput and to reach a high clock frequency. An FPGA implementation is deployed on the Xilinx Virtex series and gives good performance. Finally, the simulation results are presented and compared with some related works.

Index Terms—MIMO system, High-rate Spatial Modulation, FPGA implementation, ISQRD, Improved MMSE-SQRD.

I. INTRODUCTION

Multiple-Input Multiple-Output (MIMO) wireless communication systems, which use multiple transmit and receive antennas, have been researched for many years. They were theoretically shown in [3], [4] to achieve higher spectral efficiency than conventional single-antenna systems. Many techniques have been developed to detect the transmitted signal. The maximum-likelihood (ML) detector gives optimal performance but has high computational complexity, so many suboptimal MIMO detection techniques have been studied to reduce detector complexity. First, linear detectors such as zero-forcing (ZF) and minimum mean square error (MMSE) have lower complexity than the ML detector. The ZF detector completely removes inter-symbol interference, but it works poorly unless the channel matrix is well conditioned, because of the correlation between the noises. The MMSE detector represents a trade-off between noise amplification and interference suppression, so it gives some improvement in performance. However, both ZF and MMSE detectors have to find the pseudo-inverse of a matrix, so their computational complexity remains high even when methods such as QR decomposition are used to avoid computing the pseudo-inverse directly. The problem is that they still make a full search for each signal received at each antenna. To solve this problem, various successive interference cancellation (SIC) detectors were devised; they decrease the complexity by detecting the signals successively, treating the previously detected ones as known. However, the error rate increases because of error propagation. In [5], the ZF-BLAST detector was developed for signal detection in the layered space-time system. This detector limits error propagation by detecting the signals in the optimal order, from the strongest signal to the weakest. An adaptation of the original ZF-BLAST to the MMSE criterion, called MMSE-BLAST, was presented in [6], and a version with lower complexity was introduced in [7]. This optimally ordered successive detection requires multiple calculations of pseudo-inverses, which is computationally expensive. A reduced-complexity detection algorithm using a sorted QR decomposition of the channel matrix, called MMSE-SQRD, was proposed in [8]. Nonetheless, it offers worse bit error rate (BER) performance than MMSE-BLAST because it does not perform exactly optimal-order detection. Another line of research studies iterative tree-search algorithms such as the sphere decoder or the K-Best detector, which reduce the computational complexity and bring the BER performance close to that of ML detection.

MIMO wireless communication systems can be classified into three main categories: space-time coding to achieve diversity gain, MIMO precoding with channel state information (CSI) available at the transmitter to achieve capacity gain, and layered space-time coding to exploit multiplexing gain. HR-SM, one of the spatial modulation schemes, was proposed in [2] and provides a substantial increase in spectral efficiency compared to the conventional spatial modulation (SM) scheme. In [1], the authors proposed three low-complexity suboptimal ordered detectors for the HR-SM scheme, called MSQRD, MBLAST and ISQRD; these are modified versions of the MMSE-SQRD and MMSE-BLAST detectors.

In this paper we focus on the ISQRD detector, which makes a full search over one signal layer and handles the remaining layers with the MMSE-SQRD algorithm. ISQRD is shown to achieve remarkably higher performance than MBLAST (compared in Figure 5), with lower complexity at 4-QAM modulation and acceptable complexity at 16-QAM. We propose a hardware architecture of the ISQRD detector for an HR-SM system with 4-QAM modulation and a 4×4 antenna configuration. In this architecture the detector is fully pipelined, so it can detect one vector y per clock cycle. The SQRD block uses sorted modified Gram-Schmidt QR decomposition and has a throughput of one output every 8 clock cycles. In particular, it can handle channel matrices arriving at any interval, as long as the interval is at least 8 clock cycles. This novel feature makes it flexible enough for many wireless interfaces. The FPGA implementation was deployed on a Xilinx Virtex-5; the synthesis result uses 26128 slice registers, 17197 slice LUTs
and 260 DSPs. The maximum frequency is 364.7 MHz, so the throughput of the design can reach 2.9 Gbps. These synthesis results are analyzed and compared with some related works.

The remainder of the paper is organized as follows. Section II introduces the system model and notation. Section III recalls the ISQRD algorithm. Section IV describes the hardware implementation. Section V reports the results, and Section VI concludes the paper.

Fig. 1. System model

TABLE I
ISQRD DETECTION ALGORITHM

Input: Y, H̄
Output: x, s̄
1. Decompose D_t = [H_t; (1/√Es) I_{nT−1}] using the MMSE-SQRD algorithm to get Q, R, and the permutation vector P.
2. Detection and cancellation:
   for m = 1 : M
      compute t_m = Y − H_1 × x_m and v = Q^H [t_m; 0]
      for k = nT − 1 : −1 : 1
         if (k == nT − 1)
            c_{m,k} = arg min_{c∈Ωx} ||v_k − c × r_{k,k}||²
         else
            for l = k + 1 : nT − 1
               v_k = v_k − r_{k,l} × c_{m,l}
            end
         end
      end
      compute d_m = ||t_m − H_t × c̄_{m,p}||
   end
3. Find m̂: m̂ = arg min_m d_m
4. Obtain the recovered modulated symbol x = x_m̂ and the recovered SC codeword s̄ = (1/x)[c̄]
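The search structure of Table I (a full search over the first layer, then successive cancellation over the remaining layers) can be illustrated with a small Python sketch. It is only a sketch under assumptions that are not taken from the paper: the received vector is modeled as Y = H_1 x + H_t c, the same alphabet `omega` is used for x and for every entry of c, the decision rule written for the last layer is applied to every layer after cancellation, a plain unsorted modified Gram-Schmidt QR stands in for MMSE-SQRD (so no permutation has to be undone), and the SC codeword recovery of step 4 is left out. The names `mgs_qr` and `isqrd_detect` are hypothetical.

```python
import math


def mgs_qr(cols):
    """Unsorted modified Gram-Schmidt on a list of columns -> (q_cols, R).

    Stand-in for MMSE-SQRD (assumption): no column sorting is done.
    """
    n = len(cols)
    q = [list(c) for c in cols]
    R = [[0j] * n for _ in range(n)]
    for i in range(n):
        r_ii = math.sqrt(sum(abs(x) ** 2 for x in q[i]))
        R[i][i] = complex(r_ii)
        q[i] = [x / r_ii for x in q[i]]
        for j in range(i + 1, n):
            R[i][j] = sum(a.conjugate() * b for a, b in zip(q[i], q[j]))
            q[j] = [b - R[i][j] * a for a, b in zip(q[i], q[j])]
    return q, R


def isqrd_detect(y, h1, ht_cols, omega, es):
    """Full search over the first layer, SIC over the remaining layers.

    y: received vector, h1: first channel column, ht_cols: remaining
    channel columns, omega: candidate symbols (assumed alphabet for x and
    for each entry of c), es: symbol energy of the MMSE extension.
    """
    n_l = len(ht_cols)
    # Step 1: QR of the extended matrix D_t = [H_t; I / sqrt(Es)].
    ext = [list(col) + [(1 / math.sqrt(es)) if r == j else 0.0
                        for r in range(n_l)]
           for j, col in enumerate(ht_cols)]
    q, R = mgs_qr(ext)
    best = None
    for x in omega:                                   # Step 2: every x_m
        t = [yi - hi * x for yi, hi in zip(y, h1)]    # t_m = Y - H_1 x_m
        t_ext = t + [0j] * n_l
        v = [sum(a.conjugate() * b for a, b in zip(qc, t_ext)) for qc in q]
        c = [0j] * n_l
        for k in range(n_l - 1, -1, -1):              # last layer first
            # Cancel the layers that are already decided, then decide c_k.
            z = v[k] - sum(R[k][l] * c[l] for l in range(k + 1, n_l))
            c[k] = min(omega, key=lambda s: abs(z - s * R[k][k]) ** 2)
        resid = [ti - sum(col[r] * ck for col, ck in zip(ht_cols, c))
                 for r, ti in enumerate(t)]
        d = math.sqrt(sum(abs(e) ** 2 for e in resid))    # d_m
        if best is None or d < best[0]:                   # Step 3: arg min
            best = (d, x, c)
    return best[1], best[2]
```

For 4-QAM one could call it with `omega = [1+1j, 1-1j, -1+1j, -1-1j]`; the function returns the candidate x with the smallest residual together with the layers detected for it.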
IV. FPGA IMPLEMENTATION

In this section we consider the architecture and implementation of the ISQRD detector for an HR-SM system with 4-QAM modulation and a 4×4 antenna configuration. Figure 2 shows the block diagram of the ISQRD detector. For this detector, we assume that the input signals are normalized and that the Es value in the D matrix is a known constant. The SQRD block performs three functions. First, it decomposes the D matrix into the Q and R matrices and the permutation vector P using the MMSE-SQRD algorithm. Second, it buffers the channel matrix H and applies the permutation vector P to it, outputting the first column, H1, and the Ht matrix. Third, it pushes each output at the time the detector needs it, which means its outputs are not all produced at the same time. The detector block performs the main part of the ISQRD algorithm. The Y buffer block delays the y signal in a FIFO buffer to synchronize it with the other input signals of the detector block. The following two parts detail the SQRD block and the detector block.

Fig. 2. ISQRD detector

A. SQRD block

Each term in the D matrix is a complex number with a real part and an imaginary part. Each part is represented by a 12-bit fixed-point number with 4 integer bits and 8 fraction bits. Blocks that accumulate values, such as the block computing the norm, use a more precise number representation to reduce the truncation error caused by the fixed-point type. For the FPGA implementation, the divider and square-root blocks use the available Xilinx library, which gives good speed and hardware usage. The nd (new data) signal indicates that new data has been fetched. The latency is 143 clock cycles, and the throughput can reach one H matrix every 8 clock cycles. In particular, the block can handle channel matrices arriving at any interval, as long as the interval is at least 8 clock cycles. For example, the H1 matrix comes at the beginning, the H2 matrix comes 8 clock cycles after H1, then the H3 matrix comes 9 clock cycles after H2, and so on. This feature makes the block flexible enough for many wireless interfaces. To support this feature, the hardware architecture is organized like a waterfall through which the signal flows. This costs additional hardware, in the form of many sub-state machines that synchronize the signals, but it is worthwhile. The block diagram of stage one, which performs the first loop (i = 1) of the algorithm in Table II, is shown in Figure 3. In this diagram, the main computing blocks are drawn in purple. The result of each computing block is held by storage blocks. A storage block is activated or reactivated by the nd (new data) signal and samples the stored signal cyclically every 8 clock cycles.

This block receives its input over 4 clock cycles, with a high level on nd (new data) signaling that new data has been fetched. As noted above, its latency is 143 clock cycles and it accepts one H matrix every 8 or more clock cycles, which makes it flexible enough for many wireless interfaces.

B. Detector block

The detector block performs the steps that follow the decomposition of the D matrix. It consists of 4 calculating blocks in parallel, each [...]

TABLE II
MMSE-SQRD BASED ON MGS ALGORITHM

Input: D
Output: Q, R, P
1. R = 0, Q = D, P = [1 : n'_T] (n'_T = nT − 1 for this system)
2. for k = 1 to n'_T
      norm(k) = ||q_k||²
3. end
4. for i = 1 to n'_T
   a. k = arg min_{k = i, ..., n'_T} norm(k)
   b. Swap columns k and i in Q, P, norm, R.
   c. r_ii = √norm(i)
   d. q_i = q_i / r_ii
   e. for k = i + 1 : n'_T
         r_ik = q_i^H × q_k
         q_k = q_k − r_ik × q_i
         norm(k) = norm(k) − |r_ik|²
   f. end
5. end
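The sorted modified Gram-Schmidt procedure of Table II can be sketched in software as follows. This is a floating-point illustration, not the hardware design. Two readings are assumptions: the arg min of step 4a is taken over the columns that have not been processed yet, and r_ii is the square root of the stored squared norm (in line with the square-root block mentioned for the SQRD block). The function name `mmse_sqrd_mgs` and the list-of-rows data layout are hypothetical, and the input is assumed to have full column rank.

```python
import math


def mmse_sqrd_mgs(D):
    """Sorted modified Gram-Schmidt QR, following the steps of Table II.

    D is a list of rows of (complex) numbers. Returns (Q, R, P) with Q and R
    as lists of rows and P a 0-based column permutation, so that column P[j]
    of D equals column j of Q*R up to rounding.
    """
    rows, n = len(D), len(D[0])
    # Step 1: R = 0, Q = D, P = identity; Q is kept as a list of columns.
    q = [[complex(D[r][c]) for r in range(rows)] for c in range(n)]
    R = [[0j] * n for _ in range(n)]
    P = list(range(n))
    # Steps 2-3: squared norm of every column.
    norm = [sum(abs(x) ** 2 for x in col) for col in q]

    for i in range(n):
        # Step 4a: remaining column with the smallest norm.
        k = min(range(i, n), key=lambda j: norm[j])
        # Step 4b: swap columns i and k in Q, P, norm and R.
        q[i], q[k] = q[k], q[i]
        P[i], P[k] = P[k], P[i]
        norm[i], norm[k] = norm[k], norm[i]
        for row in R:
            row[i], row[k] = row[k], row[i]
        # Steps 4c-4d: normalise column i.
        r_ii = math.sqrt(norm[i])
        R[i][i] = complex(r_ii)
        q[i] = [x / r_ii for x in q[i]]
        # Step 4e: remove the q_i component from the remaining columns.
        for j in range(i + 1, n):
            r_ij = sum(a.conjugate() * b for a, b in zip(q[i], q[j]))
            R[i][j] = r_ij
            q[j] = [b - r_ij * a for a, b in zip(q[i], q[j])]
            norm[j] -= abs(r_ij) ** 2
    Q = [[q[c][r] for c in range(n)] for r in range(rows)]
    return Q, R, P
```

For the MMSE variant, the input would be the extended matrix, that is, the rows of Ht followed by the rows of (1/√Es) I.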
Fig. 3. Stage 1 of SQRD diagram

[...] above, does its work in just one clock cycle. This is possible because the design is for 4-QAM, so the decision is very simple: it just takes the combination of the sign bits of the real part and the imaginary part of the previous term in the expression to decide c. This principle can be applied to M-QAM, but with more complexity. This block uses fixed-point numbers with 6 integer bits (including one sign bit) and 8 fraction bits, except for its input and the Euclidean distance representation. The distance is represented by a fixed-point number with 8 integer bits and 8 fraction bits. The delay of this block is 22 clock cycles and the throughput is one output per clock cycle.

V. EXPERIMENTAL RESULTS

This architecture was synthesized on a Xilinx Virtex-5 XC5VSX240T. The synthesis results are shown in Table III in comparison with some related works. Note that this comparison is not entirely fair, because our design is for the HR-SM system, which has a spectral efficiency of (2(nT − 1) + log2 M) bits/s/Hz, while the others are for other 4×4 MIMO systems. Furthermore, the works use different algorithms for the detector, so their BER performance is not equal.

TABLE III
IMPLEMENTATION OF SOME DETECTORS FOR 4×4 MIMO SYSTEM

In [11], the authors present a VLSI architecture for QR decomposition using the Givens rotation algorithm for 4×4 MIMO-OFDM systems. Its results were used to compute the detection throughput of a conventional MIMO system, which has a spectral efficiency of nT log2 M bits/s/Hz, where M is the modulation order and 64-QAM is assumed. The latency is not shown because it is not a complete detector. The work in [12] introduced an improved K-Best detector, a topic of recent research interest, which gives good BER performance. However, it seems to be appropriate only for systems with a small number of transmit antennas and a small constellation size. The work in [13] proposed a low-complexity hard-output MMSE-SIC detector for the general spatially multiplexed MIMO system. The highlights of that detector are its low hardware cost and short latency. This comparison shows the variety of detectors for MIMO systems. The highlight of our work is its high clock frequency, which gives a high detection throughput.
To assess the BER performance of the hardware implementation, a fixed-point model was built in MATLAB. The simulation result, called fixed-point-ISQRD, is shown in Figure 5 in comparison with some floating-point detectors. From this figure, we can see that the error caused by the fixed-point representation has a significant influence in the high-SNR region.

VI. CONCLUSION

In this paper we propose a hardware architecture implementing the ISQRD detection algorithm for the High-rate Spatial Modulation (HR-SM) scheme. The architecture is implemented on a Xilinx Virtex-5. The synthesis results show that it uses 26128 slice registers, 17197 slice LUTs and 260 DSPs (used for multipliers). The delay of the whole block is 165 clock cycles. The maximum frequency is 364.7 MHz; at that Fmax, the throughput of the design can reach 2.9 Gbps. These results were compared with some related works. The BER performance of the fixed-point and floating-point ISQRD detectors is shown in comparison with ML detection and Modified MMSE-SQRD (MSQRD) detection.

REFERENCES

[1] D. Nguyen, X. N. Tran, M. T. Do, V. D. Ngo, and M. T. Le, “Low-complexity detectors for high-rate spatial modulation.”
[2] T. P. Nguyen, M. T. Le, V. D. Ngo, X. N. Tran, and H.-W. Choi, “Spatial modulation for high-rate transmission systems,” May 2014.
[3] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Mar. 1998.
[4] E. Telatar, “Capacity of multi-antenna Gaussian channels,” 1999.
[5] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. Valenzuela, “V-BLAST: An architecture for realizing very high data rates over the rich-scattering wireless channel,” 1998.
[6] A. Benjebbour, H. Murata, and S. Yoshida, “Comparison of ordered successive receivers for space-time transmission,” Fall 2001.
[7] B. Hassibi, “An efficient square-root algorithm for BLAST,” June 2000.
[8] D. Wubben, R. Bohnke, V. Kuhn, and K.-D. Kammeyer, “MMSE extension of V-BLAST based on sorted QR decomposition,” Oct. 2003.
[9] P. Luethi, C. Studer, S. Duetsch, E. Zgraggen, H. Kaeslin, N. Felber, and W. Fichtner, “Gram-Schmidt-based QR decomposition for MIMO detection: VLSI implementation and comparison,” 2008.
[10] P. Luethi, A. Burg, S. Haene, D. Perels, N. Felber, and W. Fichtner, “VLSI implementation of a high-speed iterative sorted MMSE QR decomposition,” May 2007.
[11] Z.-Y. Huang and P.-Y. Tsai, “Efficient implementation of QR decomposition for gigabit MIMO-OFDM systems,” Oct. 2011.
[12] S. P. Nils Heidmann, Till Wiegand, “Architecture and FPGA-implementation of a high throughput K+-Best detector,” Mar. 2011.
[13] J.-Y. J. Tsung-Hsien Liu and Y.-S. Chu, “A low-cost MMSE-SIC detector for the MIMO system: Algorithm and hardware implementation,” Jan. 2011.