
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSAC.2019.2929404, IEEE Journal on Selected Areas in Communications


Adaptive Spatial Modulation MIMO Based on Machine Learning

Ping Yang, Senior Member, IEEE, Yue Xiao, Member, IEEE, Ming Xiao, Senior Member, IEEE, Yong Liang Guan, Senior Member, IEEE, Shaoqian Li, Fellow, IEEE, Wei Xiang, Senior Member, IEEE

Manuscript received November 28, 2018; revised May 07, 2019.
P. Yang, Y. Xiao and S. Li are with the National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China 611731, Sichuan, China (e-mail: yang.ping, xiaoyue@uestc.edu.cn, lsq@uestc.edu.cn).
M. Xiao is with the Information Science and Engineering (ISE) Department, School of Electrical Engineering, KTH, Sweden (e-mail: mingx@kth.se).
Y. L. Guan is with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore (e-mail: eylguan@ntu.edu.sg).
W. Xiang is with the College of Science and Engineering, James Cook University, Cairns, QLD 4878, Australia (e-mail: wei.xiang@jcu.edu.au).
The financial support of the National Science Foundation of China under Grant number 61876033 and 61671131, the Foundation Project of National Key Laboratory of Science and Technology on Communications under Grant 9140C020108140C02005, and the Fundamental Research Funds for the Central Universities (No. ZYGX2015KYQD003) is gratefully acknowledged.

Abstract—In this paper, we propose a novel framework of low-cost link adaptation for spatial modulation multiple-input multiple-output (SM-MIMO) systems based upon the machine learning paradigm. Specifically, we first convert the problems of transmit antenna selection (TAS) and power allocation (PA) in SM-MIMO to ones based upon data-driven prediction rather than conventional optimization-driven decisions. Then, supervised-learning classifiers (SLC), such as the K-nearest neighbors (KNN) and support vector machine (SVM) algorithms, are developed to obtain their statistically-consistent solutions. Moreover, for further comparison we integrate deep neural networks (DNN) with these adaptive SM-MIMO schemes, and propose a novel DNN-based multi-label classifier for TAS and PA parameter evaluation. Furthermore, we investigate the design of feature vectors for the SLC and DNN approaches and propose a novel feature vector generator to match the specific transmission mode of SM. As a further advance, our proposed approaches are extended to other adaptive index modulation (IM) schemes, e.g., adaptive modulation (AM) aided orthogonal frequency division multiplexing with IM (OFDM-IM). Our simulation results show that the SLC and DNN-based adaptive SM-MIMO systems outperform many conventional optimization-driven designs and are capable of achieving a near-optimal performance with a significantly lower complexity.

Index Terms—Index modulation, SM-MIMO, machine learning, neural network, link adaptation.

I. Introduction

Spatial modulation (SM) and its variants are competitive candidates for the next-generation wireless networks due to their appealing benefits such as low complexity and low-cost transceivers [1], [2]. A key feature of SM is that it conveys information bits by using the index of the transmit antenna (TA) (or the index of the specific transmit antenna set). In general, the SM signal is sparse in the spatial domain, which is conducive to increased robustness and simplified detection [3]. In addition, SM is capable of flexibly supporting diverse services and integrity requirements. Recently, SM and its variants have been adopted in a variety of systems, such as vehicular networks, Internet of Things (IoT), underwater acoustic (UWA) communications, and machine-to-machine (M2M) communications [4].

In spite of the above benefits, the applications of SM to future wireless networks still face many challenges, especially in automated and intelligent application scenarios. For instance, as shown in [5], the flexibility and versatility of SM facilitate its application in vehicular communications, but the conventional open-loop SM multiple-input multiple-output (MIMO) schemes are able to achieve only receiver diversity gains, which makes them sensitive to channel variations. To circumvent this constraint, as shown in [6]–[10], novel SM signal optimization approaches have been proposed recently with the objective of simultaneously achieving both transmit and receive diversity gains. Specifically, in [6]–[10] various link-adaptive techniques, such as power allocation (PA), transmit antenna selection (TAS), adaptive modulation (AM), phase rotation precoding and their combinations, were proposed for single-stream SM and its high-rate generalization. In general, these link-adaptive designs are formulated as optimization-driven decision problems, which are usually solved by high-complexity brute-force algorithms. Although this complexity load can be reduced by some specific techniques [11], the related reduction has limited applicability for practical communications systems.

In order to exploit the degrees of freedom of multi-domain resources more efficiently, SM needs to be improved by considering new technologies from other fields. In recent years, breakthroughs have been made in the fields of artificial intelligence and machine learning, such as deep learning neural networks (DNNs) and pattern recognition (PR) [12], [13]. In machine learning, an optimal solution, e.g., the optimal transmit mode, can be obtained by means of classification or neural network learning in lieu of complex theoretical analyses, by training the classifiers and DNNs using off-line data sets. These machine learning-based methods are excellent candidates to improve the design and optimization of complex and dynamic wireless communication systems. Specifically, the key issues behind synchronization, channel estimation, equalization, MIMO signal detection, iterative decoding, and multi-user detection in wireless communication systems are similar to the theoretical basis of machine learning [14], [15].

0733-8716 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
A. Related Work and Motivation

Motivated by the aforementioned appealing characteristics of machine learning techniques, a flurry of research activities on intelligent wireless communication system designs has been sparked off [16]–[35]. For instance, in [16], machine learning methods were employed for nonlinearity mitigation and channel parameter estimation for optical communication systems. In [17], [18], neural networks were introduced for adaptive channel equalization in complicated and fast-varying channels, in order to dynamically combat inter-symbol interference (ISI). As a further advance, a statistical boosting classifier [19] and a general DNN framework [20] were proposed for low-complexity MIMO signal detection. In [21], the authors exploited extreme learning machine (ELM), which is a simple version of the DNN with a fast learning speed, to reduce the minimum Bayes risk in physical-layer authentication. Moreover, in [22] ELM was further developed for blind MIMO signal classification in spatially correlated channels. In [23], the inherent parallel structure of the DNN was utilized for one-shot decoding of random and structured codes, such as polar codes. Moreover, machine learning has also been integrated into key technologies of 5th generation mobile networks (5G) and beyond, such as non-orthogonal multiple access (NOMA), massive MIMO and advanced wireless energy transfer techniques [24]–[29]. For example, in [24] DNN was adopted for direction-of-arrival (DOA) estimation and super-resolution channel estimation in massive MIMO systems. As shown in [24], the DNN is capable of learning the statistics of the channel model and capturing its sparsity features. In [28], an online adaptive machine learning approach was proposed for striking an elegant trade-off between the bit error ratio (BER) and complexity in 5G-NOMA systems. In [29], a deep auto-encoder was proposed for channel estimation in wireless energy transfer systems, where an autonomous channel learning scheme was utilized to tackle the phase ambiguity issue.

Aside from the aforementioned channel estimation, decoding and detection functions, machine learning techniques can also be used for designing learning-based link adaptation schemes, due to their suitability for classification and decision making. In [30], a reinforcement learning method based on support vector regression was proposed for online MIMO-orthogonal frequency-division multiplexing (MIMO-OFDM) systems with the objective of minimizing the memory cost. In [31], the authors presented a novel low-complexity adaptive approach to select the optimal modulation and coding scheme for multiuser MIMO-OFDM systems based on limited channel state information (CSI). In [32] and [33], machine learning-based antenna selection algorithms were proposed for spatial multiplexing systems, where the moduli of the channel matrix elements were employed as the feature vectors for the support vector machine, K-nearest neighbors (KNN) and naive-Bayes aided classifiers. The authors of [34] proposed a low-cost energy-efficient hybrid precoding architecture for mmWave massive MIMO systems, by applying the idea of cross-entropy optimization in machine learning.

The aforementioned contributions in machine learning-aided communication system designs have so far focused on conventional MIMO systems, e.g., spatial multiplexing MIMO. For SM-MIMO systems, some preliminary intelligent designs have also been proposed recently. Specifically, the authors of [35] applied K-means clustering (KMC) for blind detection in space shift keying (SSK) systems. To mitigate the error floor effect in [35], we proposed an improved KMC detector in [36], which selects the initial centroids based on a novel rule. Moreover, in [36] we proposed an affinity propagation detector based on belief propagation, to strike a trade-off between the imposed computational complexity and the attainable BER. These data-driven methods have been shown to be capable of outperforming conventional detectors. However, to the best of our knowledge, the potential benefits of machine learning techniques on the design of link adaptation schemes for SM-MIMO systems (e.g., the aforementioned TAS, PA and AM) have not been investigated in the literature.

B. Main Contributions

In this work, a comprehensive study is conducted to optimize the link adaptation process of SM-MIMO using machine learning techniques. The major contributions of this paper are four-fold:

1) We propose a novel framework that integrates machine learning techniques with adaptive SM-MIMO systems, where the link adaptation problems are converted to their data-driven prediction counterparts. By considering the TAS and PA problems in SM-MIMOs as examples, we first utilize supervised-learning classifiers (SLCs), such as the K-nearest neighbors (KNN) and SVM algorithms, to obtain solutions with reduced complexities compared to conventional optimization-driven solvers.

2) We propose a novel deep neural network based multi-label classifier for adaptive SM-MIMO parameter evaluation, e.g., selecting the optimal transmit antenna set and the PA matrix. Compared to SLCs, the bionic-like DNN is more universal and has different design principles. To the best of our knowledge, our work is the first to apply the DNN to link adaptation in SM-MIMO systems. The performances of these two types of machine learning methods are compared in terms of the BER and complexity.

3) We investigate the design of the feature vectors in the SLC and DNN approaches, based on the modulus and correlation of the channel matrix coefficients. In accordance with our analysis, we propose a novel feature vector generator (FVG) to match the specific transmission mode of SM. Compared to the conventional feature extraction method in [32] and [33], our method can achieve a better performance.

4) We extend our approaches to other adaptive index modulation (IM) schemes, e.g., the AM-aided orthogonal frequency division multiplexing with IM (OFDM-IM). We provide BER and complexity analyses for the proposed schemes with various configurations. Also, extensive simulation and comparative results are presented to demonstrate the efficiency and robustness of the proposed methods.

The organization of the paper is as follows. Section II introduces the concept along with the system model of
the TAS and PA-aided adaptive SM-MIMO systems. In Section III, the SLC and DNN methods for low-cost intelligent adaptive SM-MIMO designs are introduced, where the proposed feature extraction method is also described. In Section IV, the extension of the proposed machine learning methods to other adaptive IM systems is detailed. Our simulation results and performance comparisons are presented in Section V. Finally, Section VI concludes the paper.

Notation: Boldface capital and lowercase letters represent matrices and column vectors, respectively. Superscripts $(\cdot)^T$, $(\cdot)^H$ and $(\cdot)^{-1}$ indicate the transpose, Hermitian transpose and inverse of a matrix, respectively. $\|\mathbf{A}\|$ denotes the Frobenius norm of $\mathbf{A}$. $\mathrm{diag}\{\cdot\}$ refers to the diagonal operation. $|\cdot|$ returns the cardinality of a given set, or the magnitude of a complex quantity. $[\mathbf{A}]_{i,i}$ denotes the $i$-th diagonal element of matrix $\mathbf{A}$, and $\mathrm{tr}(\cdot)$ is taken to mean the trace operator (i.e., sum of the diagonal elements). $\Re\{x\}$ and $\Im\{x\}$ represent the real and imaginary parts of $x$, respectively. $\binom{n}{k}$ denotes the binomial coefficient and $\mathbf{I}_N$ represents the identity matrix of order $N$.

II. System Model of the TAS-aided and PA-aided Adaptive SM-MIMO Systems Based on Machine Learning

Consider an adaptive SM-MIMO communication system equipped with $N_t$ transmit and $N_r$ receive antennas, as shown in Fig. 1. In SM, the input bit stream is first split into a joint bit vector, which is then partitioned into two sub-vectors, namely the index bits and the amplitude and phase modulation (APM) bits (or PSK/QAM bits), which are used for selecting a unique transmit antenna index and mapped to a PSK/QAM symbol, respectively. More specifically, the total rate of SM is $b_1 + b_2$ bits per symbol and the first $b_1 = \log_2(N_t)$ bits (i.e., the index bits) are mapped to a spatial constellation point, which is an element of the set

$$\mathcal{S}_{\mathrm{spatial}} = \{\mathbf{e}_0, \mathbf{e}_1, \ldots, \mathbf{e}_q, \ldots, \mathbf{e}_{N_t-1}\}, \tag{1}$$

where $\mathbf{e}_q \in \mathbb{R}^{N_t \times 1}$, $q = 0, \ldots, N_t-1$, is the $q$-th column of the identity matrix $\mathbf{I}_{N_t}$, which denotes the activation of the $q$-th transmit antenna. Then, the last $b_2 = \log_2(M)$ bits (i.e., the QAM/PSK bits) are mapped to the Gray-coded QAM/PSK symbol $s_l$ selected from the signal codebook

$$\mathcal{S}_{\mathrm{signal}} = \{s_0, s_1, \ldots, s_l, \ldots, s_{M-1}\}, \tag{2}$$

where $s_l \in \mathbb{C}$, $l = 0, \ldots, M_{\mathrm{APM}}-1$, is a power normalized $M_{\mathrm{APM}}$-level modulated symbol. As shown in Fig. 1, the resulting SM symbol $\mathbf{x} = s_l\mathbf{e}_q \in \mathcal{X}$ is given by

$$\mathbf{x} = s_l\mathbf{e}_q = [0, \cdots, \underbrace{s_l}_{q\text{-th}}, \cdots, 0]^T \in \mathcal{X}, \tag{3}$$

where $\mathcal{X}$ is the Cartesian product of sets $\mathcal{S}_{\mathrm{signal}}$ and $\mathcal{S}_{\mathrm{spatial}}$.

At the receiver, the corresponding received signal vector is given by

$$\mathbf{y} = \mathbf{H}\mathbf{P}\mathbf{x} + \mathbf{n}, \tag{4}$$

where $\mathbf{y}$ is the $N_r \times 1$ received signal vector and $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ is the MIMO channel matrix. Here, $\mathbf{P} \in \mathbb{C}^{N_t \times N_t}$ represents the pre-processing matrix for link adaptation, e.g., the PA matrix in [11] and the transmit precoding matrix in [9]. $\mathbf{n} \in \mathbb{C}^{N_r \times 1}$ is the additive complex Gaussian noise vector with mean zero, variance $\sigma^2$ and i.i.d. elements. In this paper, the problems of TAS and PA are used as examples to elaborate on the framework of machine learning-aided adaptive SM-MIMO systems.

Fig. 1. System model of the adaptive SM-MIMO system based on machine learning techniques. [Block diagram: the input bits are split into index bits and PSK/QAM bits (joint bit vector b); antenna index selection yields e_q and digital modulation yields s_l; the SM signal x passes through SM signal optimization (PA, TAS) and the N_t x N_r channel to detection of y; a feedback link connects the adaptive unit (channel information, machine learning) to the transmitter.]

A. TAS-SM

Let the MIMO channel be $\mathbf{H} = [\mathbf{h}_1, \mathbf{h}_2, \cdots, \mathbf{h}_{N_t}] \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{N_r \times N_t})$, where $\mathbf{h}_1, \mathbf{h}_2, \cdots, \mathbf{h}_{N_t}$ are the column vectors of $\mathbf{H}$. In TAS-aided SM, based on the CSI the receiver first selects the best $L$ out of $N_t$ antennas for reverse link transmission. Then, the receiver sends this information to the transmitter via a feedback link. Let $S_k$ denote the $k$th possible transmit antenna subset, given by

$$\begin{aligned} S_1 &= \{1, 2, \cdots, L\}, \\ S_2 &= \{1, 2, \cdots, L-1, L+1\}, \\ &\ \ \vdots \\ S_U &= \{N_t-L+1, \cdots, N_t\}, \end{aligned} \tag{5}$$

which is an element of the TAS candidate set $\mathcal{S} = \{S_1, \cdots, S_U\}$. The total number of possible transmit antenna subsets is $U = \binom{N_t}{L}$. For the scenario where $S_k$ is selected and $\mathbf{P} = \mathbf{I}_{N_t}$, the signal observed at the $N_r$ receive antennas can be expressed as

$$\mathbf{y} = \mathbf{H}_k\mathbf{x} + \mathbf{n}, \tag{6}$$

where $\mathbf{H}_k$ is the $(N_r \times L)$-element TAS matrix corresponding to the selected transmit antenna set $S_k$.

B. PA-SM

Similar to the TAS technique, PA is another attractive link adaptation technique for SM-MIMO [9]–[11].
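
Before PA is detailed, the signal model of Section II can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`psk_codebook`, `sm_map`, `received`, `tas_subsets`) are hypothetical, the PSK labelling is natural rather than Gray-coded, $N_t$ and $M$ are assumed to be powers of two, and the noise term of (4) is omitted. The sketch maps $b_1 + b_2$ bits to $\mathbf{x} = s_l\mathbf{e}_q$ as in (3), forms $\mathbf{H}\mathbf{P}\mathbf{x}$ as in (4), and enumerates the TAS subsets in the order of (5).

```python
from itertools import combinations
from math import comb, pi
import cmath


def psk_codebook(m):
    """Unit-power M-PSK codebook standing in for S_signal in (2).

    Natural (not Gray) labelling is an assumption made to keep this short.
    """
    return [cmath.exp(2j * pi * l / m) for l in range(m)]


def bits_to_int(bits):
    """Interpret a list of 0/1 values as an unsigned integer (MSB first)."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value


def sm_map(bits, n_t, codebook):
    """Map b1 + b2 bits to the SM vector x = s_l * e_q of (3)."""
    b1 = n_t.bit_length() - 1            # log2(N_t), N_t a power of two
    b2 = len(codebook).bit_length() - 1  # log2(M), M a power of two
    if len(bits) != b1 + b2:
        raise ValueError("expected b1 + b2 bits")
    q = bits_to_int(bits[:b1])           # index bits select the antenna
    l = bits_to_int(bits[b1:])           # APM bits select the symbol
    x = [0j] * n_t
    x[q] = codebook[l]
    return x


def received(h, p_diag, x):
    """Noise-free part of (4): y = H P x with P = diag(p_diag)."""
    return [sum(h_ij * p * x_j for h_ij, p, x_j in zip(row, p_diag, x))
            for row in h]


def tas_subsets(n_t, l_sel):
    """Enumerate S_1, ..., S_U of (5) with 1-based antenna indices."""
    return list(combinations(range(1, n_t + 1), l_sel))


if __name__ == "__main__":
    x = sm_map([1, 0, 1], n_t=4, codebook=psk_codebook(2))
    print(x)  # only antenna q = 2 is active
    print(received([[1, 2, 3, 4], [5, 6, 7, 8]], [1, 1, 1, 1], x))
    subsets = tas_subsets(4, 2)
    print(subsets, len(subsets) == comb(4, 2))
```

Lexicographic enumeration is chosen because it reproduces the ordering of (5), so the position of a subset in the list can serve directly as its index $k$.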

Building on the adaptive SM model in (4), the PA process can be modeled by the PA matrix $\mathbf{P}$ given by

$$\mathbf{P} = \mathrm{diag}\{p_1, \cdots, p_i, \cdots, p_{N_t}\}, \tag{7}$$

where $p_i$ is the PA parameter of the $i$th transmit antenna. Here, we have $\sum_{i=1}^{N_t} p_i^2 = 1$ so that the transmit power is normalized. Similar to [38], we consider codebook-assisted PA in this paper, where a number of PA candidates are carefully designed in accordance with the setups and are available to both the transmitter and receiver before transmission. Let $\mathcal{P}$ be the codebook for PA-aided SM, given by

$$\mathcal{P} = \{\mathbf{P}_1, \cdots, \mathbf{P}_q, \cdots, \mathbf{P}_Q\}, \tag{8}$$

where $Q$ is the number of selection candidates.

C. Problem Statement

As noted in [6]–[11], in order to improve the BER performance of SM, the free distance (FD) $d_{\min}$ is usually optimized. For a given equivalent channel matrix $\bar{\mathbf{H}}$, its FD can be formulated as

$$d_{\min}(\bar{\mathbf{H}}) = \min_{\substack{\mathbf{x}_i, \mathbf{x}_j \in \mathcal{X} \\ \mathbf{x}_i \neq \mathbf{x}_j}} \left\|\bar{\mathbf{H}}(\mathbf{x}_i - \mathbf{x}_j)\right\|_F^2 = \min_{\mathbf{e}_{ij} \in \mathcal{E}} \left\|\bar{\mathbf{H}}\mathbf{e}_{ij}\right\|_F^2, \tag{9}$$

where $\mathbf{e}_{ij} = \mathbf{x}_i - \mathbf{x}_j$, $(i \neq j)$, denotes the error vector generated by the SM signal pair $\mathbf{x}_i$ and $\mathbf{x}_j$. Based on the models in (6) and (7), the equivalent channel matrices for TAS-aided SM and PA-aided SM systems are

$$\bar{\mathbf{H}} = \begin{cases} \mathbf{H}_u, & \text{TAS-aided SM}, \\ \mathbf{H}\mathbf{P}_q, & \text{PA-aided SM}. \end{cases} \tag{10}$$

Based on (9) and (10), the max-$d_{\min}$ based TAS-aided SM and PA-aided SM can be formulated as

$$\begin{cases} \hat{u} = \arg\max\limits_{u \in \{1, \cdots, U\}} \{d_{\min}(\mathbf{H}_u)\}, \\ \hat{q} = \arg\max\limits_{q \in \{1, \cdots, Q\}} \{d_{\min}(\mathbf{H}\mathbf{P}_q)\}. \end{cases} \tag{11}$$

In these designs, the transmit mode (e.g., the antenna set and the PA matrix) selection turns out to be a demanding process, and the global optimum is usually found by virtue of an exhaustive search over all possible equivalent channel matrices and all distinct error vectors, leading to an excessive complexity and feedback load, especially when the data rate is high. It is worth noting that this load can be reduced by some techniques proposed in [7], [11]. However, the load reduction is of limited applicability for practical communications systems.

In lieu of directly tackling the problems in (11) by using optimization-driven decisions, e.g., the methods in [7] and [11], in the next section, we will convert them to PR-based and deep learning-based classification problems, where the optimal solution can be obtained by low-complexity data-driven prediction instead of tedious calculations.

D. Receiver

In the receiver, based on the optimized equivalent channel matrix $\hat{\mathbf{H}}$, the single-stream based maximum-likelihood (ML) detector [4] is employed for jointly detecting both the active transmit antenna index $\hat{q}$ and the transmitted PSK/QAM symbol $\hat{s}_l$ as follows

$$\{\hat{q}, \hat{s}_l\} = \arg\min_{q \in \{1, \cdots, N_{\mathrm{SM}}\},\, s_l \in \mathcal{S}_{\mathrm{signal}}} \left| \bar{\mathbf{h}}_q^H\mathbf{y} / \|\bar{\mathbf{h}}_q\|_F^2 - s_l \right|^2, \tag{12}$$

where $\bar{\mathbf{h}}_q$ indicates the $q$th column of $\hat{\mathbf{H}}$. In (12), we have $N_{\mathrm{SM}} = N_t$ for PA-aided SM and $N_{\mathrm{SM}} = L$ for TAS-aided SM.

III. Proposed Machine Learning Methods for Intelligent Adaptive SM-MIMO

In order to efficiently identify the parameters of the TAS and PA problems in (11), in this section, the connection and conversion between the machine learning methods and these problems are first explained. Then, supervised-learning classifiers, such as the KNN and SVM algorithms, are adopted to solve our adaptive design problems. Moreover, we propose a novel DNN based multi-label classifier for adaptive SM parameter evaluation. Compared to the KNN and SVM, the DNN is well suited for nonlinear mapping and capable of capturing more implicit information.

A. Problem Conversion

Estimating the index of the transmit antenna set in TAS-aided SM and the index of the PA matrix in codebook-based SM can be viewed as a multi-class classification problem. Particularly, as shown in (11), the total numbers of classes in TAS and PA are $U = |\mathcal{S}|$ and $Q = |\mathcal{P}|$, respectively. Denote by $\mathbf{r}_m$ the one-hot representation of the prediction vector for the $m$th transmission, given by

$$\begin{cases} \mathbf{r}_m = [I(\hat{u} = 1), I(\hat{u} = 2), \cdots, I(\hat{u} = U)] & \text{for TAS}, \\ \mathbf{r}_m = [I(\hat{q} = 1), I(\hat{q} = 2), \cdots, I(\hat{q} = Q)] & \text{for PA}, \end{cases} \tag{13}$$

where $I(\cdot)$ denotes the indicator function. Therefore, the element of $\mathbf{r}_m$ corresponding to the selected index is 1, and all the other elements of $\mathbf{r}_m$ are zero. This means that only one of the indices is selected. The objective of the SLC and DNN is to predict this vector based on the available training data set.

Specifically, designing the SLC and DNN algorithms involves two phases, namely the training preprocessing and the learning system establishment. Since the TAS and PA problems given in (11) are similar optimization problems (e.g., they have the same optimization criteria), the specific procedures of the training preprocessing are similar, which include training data generation, feature vector extraction, key performance indicator (KPI) design and labeling.

1) Training Data Generation: Similar to [32] and [33], the $(N_r \times N_t)$-element channel matrices are used for the training. For example, $M$ channel matrices $\mathbf{H}_m$, $m = 1, \cdots, M$, are randomly generated to create a training data set as

$$\mathcal{H} = \{\mathbf{H}_1, \mathbf{H}_2, \cdots, \mathbf{H}_m, \cdots, \mathbf{H}_M\}. \tag{14}$$

2) Feature Vector Extraction: In general, the training performance depends on the feature provided. In adaptive SM designs, the channel gains will dominate the system performance. As a result, the moduli of the elements of the training channel matrices $\mathbf{H}_m$, $m = 1, \cdots, M$, are natural candidates for the features.
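
Before the feature vectors are specified, the max-$d_{\min}$ rule of (9)–(11), whose exhaustive search the learning methods are meant to replace, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' solver: the function names (`d_min`, `select_tas`, `select_pa`) are hypothetical, channel matrices are plain nested lists, each PA candidate is represented by its diagonal only, and ties are broken in favor of the lowest index.

```python
from itertools import combinations
from math import inf, sqrt


def d_min(h_eq, symbols):
    """Free distance (9) of single-stream SM over an equivalent channel.

    Every SM vector is x = s * e_q, so H x is column q of H scaled by s.
    """
    n_r, n_cols = len(h_eq), len(h_eq[0])
    points = [[h_eq[i][q] * s for i in range(n_r)]
              for q in range(n_cols) for s in symbols]
    best = inf
    for a, b in combinations(points, 2):  # all unordered pairs x_i != x_j
        dist = sum(abs(u - v) ** 2 for u, v in zip(a, b))
        best = min(best, dist)
    return best


def select_tas(h, l_sel, symbols):
    """Exhaustive max-d_min antenna selection, first line of (11).

    Returns (u_hat, subset, d_min) with a 1-based label and 1-based antennas.
    """
    n_t = len(h[0])
    best = (None, None, -inf)
    for u, cols in enumerate(combinations(range(n_t), l_sel), start=1):
        h_u = [[row[c] for c in cols] for row in h]  # keep selected columns
        d = d_min(h_u, symbols)
        if d > best[2]:
            best = (u, tuple(c + 1 for c in cols), d)
    return best


def select_pa(h, pa_codebook, symbols):
    """Exhaustive max-d_min PA selection, second line of (11).

    pa_codebook holds the diagonals of P_1, ..., P_Q; returns (q_hat, d_min).
    """
    best = (None, -inf)
    for q, p_diag in enumerate(pa_codebook, start=1):
        hp = [[h_ij * p for h_ij, p in zip(row, p_diag)] for row in h]
        d = d_min(hp, symbols)
        if d > best[1]:
            best = (q, d)
    return best


if __name__ == "__main__":
    bpsk = [1, -1]
    h = [[1, 0, 0.1], [0, 1, 0]]  # antenna 3 is weak and aligned with 1
    print(select_tas(h, 2, bpsk))
    pa = [[sqrt(0.5), sqrt(0.5)], [sqrt(0.9), sqrt(0.1)]]
    print(select_pa([[1, 0], [0, 1]], pa, bpsk))
```

The nested loops make the cost visible: every candidate requires a pass over all pairs of SM vectors, which is the load that the data-driven prediction is intended to avoid at run time.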

In [32] and [33], the moduli of the channel matrix elements are chosen as the feature vectors. This conventional FVG (CFVG) can be formulated as

$$\mathbf{f}^m_{\mathrm{CFVG}} = \left[ |h^m_{1,1}|, \cdots, |h^m_{1,N_t}|, \cdots, |h^m_{i,j}|, \cdots, |h^m_{N_r,1}|, \cdots, |h^m_{N_r,N_t}| \right], \tag{15}$$

where $h^m_{i,j}$ is the $(i,j)$-th complex value of $\mathbf{H}_m$ and its modulus is given by

$$|h^m_{i,j}| = \sqrt{\Re\{h^m_{i,j}\}^2 + \Im\{h^m_{i,j}\}^2}. \tag{16}$$

As shown in (15) and (16), in conventional FVG, only the modulus values $|h^m_{i,j}|$, $i = 1, \cdots, N_r$, $j = 1, \cdots, N_t$, of channel elements are utilized. Indeed, the real and imaginary parts of $\mathbf{H}_m$ contain more channel characteristics, since they are two separate components. Motivated by this observation, we utilize the modulus values of real and imaginary parts of matrix elements to generate the feature vectors. This novel FVG method, termed separate FVG (SFVG), can be expressed as

$$\mathbf{f}^m_{\mathrm{SFVG}} = \left[ |\Re\{h^m_{1,1}\}|, |\Im\{h^m_{1,1}\}|, \cdots, |\Re\{h^m_{i,j}\}|, |\Im\{h^m_{i,j}\}|, \cdots, |\Re\{h^m_{N_r,N_t}\}|, |\Im\{h^m_{N_r,N_t}\}| \right]. \tag{17}$$

Besides the channel gains, it is noted that the similarity (e.g., correlation) of two distinct column vectors of the channel matrix $\mathbf{H}_m$ is also an important feature. To be specific, similar column vectors of $\mathbf{H}_m$ cause detection ambiguity, particularly when detecting the transmit antenna index in SM-MIMO. Moreover, as shown in (12), the moduli of the column vectors of the channel matrix $\mathbf{H}_m$, rather than the moduli of its components, dominate the system performance. Hence, our design uses both the moduli of the column vectors of the channel matrix $\mathbf{H}_m$ and their correlation values to characterize the channel. For this purpose, we first compute $(\mathbf{H}_m)^H\mathbf{H}_m$ and construct a new type of feature vector, namely joint FVG (JFVG), as follows:

$$\mathbf{f}^m_{\mathrm{JFVG}} = \left[ |(\mathbf{h}^m_1)^H\mathbf{h}^m_1|, |(\mathbf{h}^m_1)^H\mathbf{h}^m_2|, \ldots, |(\mathbf{h}^m_i)^H\mathbf{h}^m_j|, \ldots, |(\mathbf{h}^m_{N_t})^H\mathbf{h}^m_{N_t}| \right], \tag{18}$$

where $J_{i,j} = (\mathbf{h}^m_i)^H\mathbf{h}^m_j$, $i, j = 1, \cdots, N_t$, are the elements of $(\mathbf{H}_m)^H\mathbf{H}_m$. Here, when we have $i = j$, $J_{i,j}$ represents the moduli of the column vectors of $\mathbf{H}_m$, while the other terms indicate their similarity (the correlation coefficient). Moreover, to avoid the bias issue in the learning process, we normalize these features in (15), (17) and (18) as follows

$$\begin{cases} \tilde{f}^m_{\mathrm{CFVG}}(n) = \left(f^m_{\mathrm{CFVG}}(n) - E[\mathbf{f}^m_{\mathrm{CFVG}}]\right) / \left(\max(\mathbf{f}^m_{\mathrm{CFVG}}) - \min(\mathbf{f}^m_{\mathrm{CFVG}})\right), \\ \tilde{f}^m_{\mathrm{SFVG}}(n) = \left(f^m_{\mathrm{SFVG}}(n) - E[\mathbf{f}^m_{\mathrm{SFVG}}]\right) / \left(\max(\mathbf{f}^m_{\mathrm{SFVG}}) - \min(\mathbf{f}^m_{\mathrm{SFVG}})\right), \\ \tilde{f}^m_{\mathrm{JFVG}}(n) = \left(f^m_{\mathrm{JFVG}}(n) - E[\mathbf{f}^m_{\mathrm{JFVG}}]\right) / \left(\max(\mathbf{f}^m_{\mathrm{JFVG}}) - \min(\mathbf{f}^m_{\mathrm{JFVG}})\right), \end{cases} \tag{19}$$

where $f^m_{\mathrm{CFVG}}(n)$, $f^m_{\mathrm{SFVG}}(n)$ and $f^m_{\mathrm{JFVG}}(n)$ are the $n$th elements of $\mathbf{f}^m_{\mathrm{CFVG}}$, $\mathbf{f}^m_{\mathrm{SFVG}}$ and $\mathbf{f}^m_{\mathrm{JFVG}}$, respectively.

3) KPI Design: The KPI is a performance metric to classify the training samples $\mathbf{H}_m$, $m = 1, \cdots, M$. In order to improve the BER performance, the achieved FD $d_{\min}(\bar{\mathbf{H}})$ of the possible candidates is adopted as the KPI.

4) Labeling: In our design, the TAS-aided SM and PA-aided SM have a similar labeling process, and thus we only elaborate on the labeling process for TAS-aided SM. More specifically, in labeling, the transmit antenna subsets $\{S_1, \cdots, S_U\}$ are mapped to their corresponding class numbers. For example, we assign label $u$ to the transmit antenna subset $S_u$, $u \in \{1, \ldots, U\}$. Based on this mapping rule, we can calculate the class labels for all training matrices $\mathbf{H}_m$, $m = 1, \cdots, M$, based on the following steps:

• Step 1: For the $m$th training sample $\mathbf{H}_m$, compute the KPI $d_{\min}$ for each possible TAS candidate $S_u$, $u \in \{1, \ldots, U\}$.
• Step 2: Find the TAS candidate $S_{u^*}$ with the maximum $d_{\min}$ and obtain its corresponding label $u^*$, $u^* \in \{1, \ldots, U\}$.
• Step 3: Repeat Steps 1 and 2 until all training samples find their labels, which are formulated as a class label vector $\mathbf{l} = [l_1, l_2, \cdots, l_M]^T$.

B. Proposed SLC Methods

In the second phase, the learning system is established based on the feature vectors yielded in (19) and their corresponding label vector. To be specific, in SLC, a supervised-learning classifier is developed, where the normalized feature vectors, e.g., $\mathbf{f}^m_{\mathrm{CFVG}}$, $\mathbf{f}^m_{\mathrm{SFVG}}$ or $\mathbf{f}^m_{\mathrm{JFVG}}$, and the class labels $l_1, l_2, \cdots, l_M$ of the training channel matrices $\mathbf{H}_m$, $m = 1, \cdots, M$, are employed as input to optimize the classifier parameters. For ease of exposition, we only use $\mathbf{f}^m_{\mathrm{JFVG}}$ as an example in the following description. When this learning process is completed, the achieved classifier can predict class labels for the new samples without class labels. Since the dimensions of the prediction vectors given in (13) are larger than 2, we employ a multi-class classifier, such as the $U$-class classifier for TAS-aided SM and the $Q$-class classifier for PA-aided SM. In this paper, two classic classifiers, namely the KNN and SVM, are utilized as follows:

1) KNN Classification Algorithm: The KNN is a popular classification method in data mining and statistics due to its simple implementation and excellent classification performance. As noted in [12], the KNN has different versions. In our design, the process of the KNN classification algorithm for learning models from training samples and then predicting test samples with the learned model is as follows:

• Learn the optimal value of $K$ from the samples $\mathbf{H}_m$, $m = 1, \cdots, M$. Specifically, an exhaustive search is performed in the range of $[1, \cdots, M/5]$ to find the value of $K$ that achieves the optimal performance.
• For a given new observation $\mathbf{H}_{\mathrm{new}}$, obtain its feature vector as $\mathbf{f}_{\mathrm{new}}$ using (19) and calculate the distance between the test sample $\mathbf{f}_{\mathrm{new}}$ and all training samples $\mathbf{f}^m_{\mathrm{JFVG}}$, $m = 1, \cdots, M$, so as to obtain its $K$ nearest neighbors $\mathbf{f}^n_{\mathrm{KNN}}$, $n = 1, \cdots, K$, and their corresponding labels. Here, the Euclidean distance is utilized as

$$d(\mathbf{f}_{\mathrm{new}}, \mathbf{f}^m_{\mathrm{JFVG}}) = \left\|\mathbf{f}^m_{\mathrm{JFVG}} - \mathbf{f}_{\mathrm{new}}\right\|_F^2. \tag{20}$$

• Implement KNN classification by assigning $\mathbf{H}_{\mathrm{new}}$ the label $l_{\mathrm{KNN}} \in [l_1, l_2, \cdots, l_M]$ by the majority rule on the labels of the selected nearest neighbors. Hence, only the $l_{\mathrm{KNN}}$-th element of the prediction vector $\mathbf{r}_m$ of (13) is nonzero.

2) SVM Classification Algorithm: The original SVM is a binary classifier that makes its decisions by constructing an optimal hyperplane that separates the two classes with the largest margin. In more detail, consider a given training set of feature-label pairs $(\mathbf{a}_i, t_i)$, $i = 1, \cdots, N$, where $\mathbf{a}_i \in \mathbb{R}^n$.
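
Before the SVM formulation is developed further, the feature generators of (15), (17) and (18), the normalization of (19) and the KNN rule built on (20) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, $E[\cdot]$ in (19) is taken as the mean over the elements of one feature vector, the JFVG keeps the moduli of all $N_t^2$ entries of $(\mathbf{H}_m)^H\mathbf{H}_m$, and voting ties go to the label of the nearer neighbor.

```python
from collections import Counter


def cfvg(h):
    """Conventional FVG (15): moduli of all channel entries, row by row."""
    return [abs(v) for row in h for v in row]


def sfvg(h):
    """Separate FVG (17): |Re| and |Im| of every channel entry."""
    return [part for row in h for v in row
            for part in (abs(v.real), abs(v.imag))]


def jfvg(h):
    """Joint FVG (18): moduli of the entries of H^H H (all N_t^2 kept)."""
    n_r, n_t = len(h), len(h[0])
    cols = [[h[i][j] for i in range(n_r)] for j in range(n_t)]
    return [abs(sum(a.conjugate() * b for a, b in zip(cols[i], cols[j])))
            for i in range(n_t) for j in range(n_t)]


def normalize(f):
    """Mean/range normalization of one feature vector, as in (19)."""
    mean = sum(f) / len(f)
    span = max(f) - min(f)
    if span == 0:  # constant vector: nothing to scale
        return [0.0] * len(f)
    return [(v - mean) / span for v in f]


def knn_predict(f_new, train_feats, labels, k):
    """Majority vote among the k nearest training vectors under (20)."""
    def dist(m):
        return sum((a - b) ** 2 for a, b in zip(train_feats[m], f_new))
    nearest = sorted(range(len(train_feats)), key=dist)[:k]
    votes = Counter(labels[m] for m in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    h = [[3 + 4j, 0], [0, 1j]]
    print(cfvg(h), sfvg(h), jfvg(h))
    print(normalize(jfvg(h)))
    feats = [[0, 0], [0, 1], [5, 5], [5, 6]]
    print(knn_predict([4, 4], feats, [1, 1, 2, 2], k=3))
```

In a full pipeline, the labels passed to `knn_predict` would come from Steps 1 to 3 of the labeling procedure, and $K$ would be chosen by the search over $[1, \cdots, M/5]$ described in the first KNN step.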

Pre-processing

Channel Matrix H Feature Extraction Modulo Operation

Feed-Forward DNN

Soft max activation

ReLU activation

ReLU activationn
Full connection

Full connection

Full connection
...
...
Output Input
Layer Layer

on
on
on

Fig. 2. Proposed feed-forward DNN based multi-class classifier for adaptive SM-MIMOs.

and $t_i \in \{-1, +1\}$, the SVM obtains the solution to the following optimization problem:

$$
\begin{aligned}
\min_{\mathbf{w}, b, \boldsymbol{\xi}} \quad & \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{N}\xi_i \\
\text{s.t.} \quad & t_i\left(\mathbf{w}^T\phi(\mathbf{a}_i) + b\right) \ge 1 - \xi_i, \\
& \xi_i \ge 0, \quad i = 1, \cdots, N,
\end{aligned}
\tag{21}
$$

where $\xi_i$ is the slack variable that penalizes the classification error of the $i$th sample, $\phi(\cdot)$ is the mapping function, $\mathbf{w}$ and $b$ are linear parameters, and $C$ is the regularization parameter$^1$.

As shown in (21), the SVM maps the training vectors $\mathbf{a}_i$ into a higher dimensional space in accordance with the function $\phi(\cdot)$, and then finds a linear separating hyperplane with the maximal margin in this space. For our multi-class problem, we employ the one-against-all concept for classification based upon (21). Specifically, we construct $R$ SVM models, where $R = U$ for TAS-aided SM and $R = Q$ for PA-aided SM, as follows:

• Based on the training feature-label pairs $(\mathbf{f}^m_{\text{JFVG}}, l_m)$, $m = 1, \cdots, M$, $l_m \in \{1, 2, \cdots, R\}$ obtained in 2) and 4) in Section III-A, we decompose the classification into $R$ two-class SVM problems, where the $r$th ($r \in \{1, \cdots, R\}$) SVM is trained by assigning positive labels to the samples in the $r$th class and negative labels to all other samples.

• For the $r$th 2-class SVM, we solve the following optimization problem:

$$
\begin{aligned}
\min_{\mathbf{w}^r, b^r, \boldsymbol{\xi}^r} \quad & \frac{1}{2}(\mathbf{w}^r)^T\mathbf{w}^r + C^r\sum_{m=1}^{M}\xi_m^r \\
\text{s.t.} \quad & (\mathbf{w}^r)^T\phi(\mathbf{f}^m_{\text{JFVG}}) + b^r \ge 1 - \xi_m^r, \quad \text{if } l_m = r, \\
& (\mathbf{w}^r)^T\phi(\mathbf{f}^m_{\text{JFVG}}) + b^r \le -1 + \xi_m^r, \quad \text{if } l_m \ne r, \\
& \xi_m^r \ge 0, \quad m = 1, \cdots, M, \ r \in \{1, \cdots, R\},
\end{aligned}
\tag{22}
$$

where $\xi_m^r$ is the slack variable, $C^r$ is the regularization parameter, while $\mathbf{w}^r$ and $b^r$ are the linear parameters for the $r$th SVM.

• For all legitimate $r \in \{1, \cdots, R\}$, we obtain the estimated parameters $\mathbf{w}^r$ and $b^r$, which are employed to formulate the prediction functions as:

$$
(\mathbf{w}^1)^T\phi(\mathbf{f}_{\text{new}}) + b^1, \ \cdots, \ (\mathbf{w}^R)^T\phi(\mathbf{f}_{\text{new}}) + b^R.
\tag{23}
$$

• For a given new observation $\mathbf{H}_{\text{new}}$, we obtain its feature vector $\mathbf{f}_{\text{new}}$ using (19) and predict its label by (23) as:

$$
l_{\text{SVM}} = \arg\max_{r = 1, \cdots, R} \left\{(\mathbf{w}^r)^T\phi(\mathbf{f}_{\text{new}}) + b^r\right\}.
\tag{24}
$$

1 As shown in [12], the SVM has many variations, such as the C-SVM and ν-SVM. In this paper, the C-SVM is employed.

C. The Proposed DNN Method

In this subsection, we propose a feed-forward DNN based multi-label classifier for adaptive SM-MIMO parameter prediction. Compared to SLC methods, the DNN is another important branch of machine learning, which is vaguely inspired by biological neural networks. In general, the DNN is capable of offering some attractive characteristics, such as little human intervention and easy implementation.

1) Main Framework of DNN: As shown in Fig. 2, the training preprocessing steps given in Section III-A 1)-4) are first employed to establish a DNN-based learning system. In the DNN, the main functions of these steps are: (a) the complex-valued channel matrices are converted to real-valued inputs, which facilitates the gradient-based optimization process in each layer; (b) the feature extraction process can effectively improve the convergence and the classification accuracy of the neural network, even though it is not strictly necessary in conventional DNN designs.

In Fig. 2, a feed-forward DNN with $G$ layers is utilized to describe the mapping of an input feature vector, e.g., $\mathbf{f}^m_{\text{JFVG}}$, to an output vector $\mathbf{r}_m$ given in (13) through $G$ iterative processing steps. More specifically, the mapping of the layers can be expressed as

$$
\mathbf{r}^l_m = f_l(\mathbf{r}^{l-1}_m; \boldsymbol{\theta}^l), \quad l = 1, \cdots, G,
\tag{25}
$$

where $f_l(\mathbf{r}^{l-1}_m; \boldsymbol{\theta}^l): \mathbb{R}^{N_{l-1} \times 1} \to \mathbb{R}^{N_l \times 1}$ denotes the mapping carried out by the $l$th layer, and $\boldsymbol{\theta}^l$, $l = 1, \cdots, G$, are the neural network parameter vectors. Here, the multi-layer network is initialized with $\mathbf{r}^0_m = \mathbf{f}^m_{\text{JFVG}}$. As a result, the

overall input-output mapping can be expressed by a chain of functions as follows

$$
\mathbf{r}^G_m = f_G\left(f_{G-1}\left(\cdots f_1(\mathbf{r}^0_m; \boldsymbol{\theta}^1) \cdots; \boldsymbol{\theta}^{G-1}\right); \boldsymbol{\theta}^G\right).
\tag{26}
$$

In the DNN, $f_l(\mathbf{r}^{l-1}_m; \boldsymbol{\theta}^l)$, $l = 1, \cdots, G$, are nonlinear mapping functions and have various types. As shown in Fig. 2, we consider the fully connected layers$^2$ as follows

$$
f_l(\mathbf{r}^{l-1}_m; \boldsymbol{\theta}^l) = \Delta(\mathbf{W}^l\mathbf{r}^{l-1}_m + \mathbf{b}^l), \quad l = 1, \cdots, G,
\tag{27}
$$

where $\boldsymbol{\theta}^l = \{\mathbf{W}^l, \mathbf{b}^l\}$, and $\mathbf{W}^l$ and $\mathbf{b}^l$ are the weight matrix and the bias vector of the $l$th layer, respectively. Moreover, $\Delta(\cdot)$ is an activation function, which introduces non-linearity to the DNN and facilitates the stacking of multiple layers. $\Delta(\cdot)$ is usually applied individually to each element of its input vector, i.e.,

$$
[\Delta(\mathbf{v})]_i = \Delta(v_i),
\tag{28}
$$

where $v_i$ is the $i$th element of $\mathbf{v}$ and $[\Delta(\mathbf{v})]_i$ is the $i$th element of $\Delta(\mathbf{v})$. This nonlinear function may be the ReLU function, the tanh function, the sigmoid function or the softmax function, depending on the application in question$^3$.

2 This means that the neurons of a layer are all connected to the neurons of its input layer.
3 More details of these activation functions can be found in Table II of [14].

More specifically, as shown in Fig. 2, we construct a DNN model consisting of $G - 1$ fully connected layers and one softmax layer, which are used to refine feature extraction and to generate the final classification result, respectively. In the following, we detail the training and prediction processes of the proposed DNN-based multi-class classifier for the TAS and PA problems in SM:

• Let $\Delta(\cdot)$ be the ReLU function given by $\Delta(u_i) = \max(0, u_i)$ and let the number of neurons in the $l$th layer be $D_l$, $l = 1, \cdots, G$. Based on the training feature-label pairs $(\mathbf{f}^m_{\text{JFVG}}, l_m)$, $m = 1, \cdots, M$, $l_m \in \{1, 2, \cdots, R\}$ obtained in 2) and 4) in Section III-A, we employ the feature vectors $\mathbf{f}^m_{\text{JFVG}} \in \mathbb{R}^{N_t^2 \times 1}$ as the input to the first layer and obtain its corresponding output as

$$
\mathbf{r}^1_m = f_1(\mathbf{r}^0_m; \boldsymbol{\theta}^1) = \max(\mathbf{W}^1\mathbf{r}^0_m + \mathbf{b}^1, 0) = \max(\mathbf{W}^1\mathbf{f}^m_{\text{JFVG}} + \mathbf{b}^1, 0),
\tag{29}
$$

where $\mathbf{r}^1_m \in \mathbb{R}^{D_1 \times 1}$, $\mathbf{W}^1 \in \mathbb{R}^{D_1 \times D_0}$, $D_0 = (N_t)^2$ and $\mathbf{b}^1 \in \mathbb{R}^{D_1 \times 1}$.

• For the $l$th layer ($1 < l < G$), we utilize the output of the $(l-1)$th layer as its input vector to refine the feature vector and generate the following output

$$
\mathbf{r}^l_m = \max(\mathbf{W}^l\mathbf{r}^{l-1}_m + \mathbf{b}^l, 0), \quad 1 < l < G,
\tag{30}
$$

where $\mathbf{r}^l_m \in \mathbb{R}^{D_l \times 1}$, $\mathbf{W}^l \in \mathbb{R}^{D_l \times D_{l-1}}$ and $\mathbf{b}^l \in \mathbb{R}^{D_l \times 1}$.

• For the $G$th (the last) layer, the softmax function $\Delta(u_i) = e^{u_i} / \sum_j e^{u_j}$ is applied to map each element of the output vector to the interval $[0, 1]$, and the resultant output vector can be expressed as

$$
\mathbf{r}^G_m = \text{softmax}(\mathbf{W}^G\mathbf{r}^{G-1}_m + \mathbf{b}^G).
\tag{31}
$$

Note that the size of the output $\mathbf{r}^G_m \in \mathbb{R}^{D_G \times 1}$ is equal to the number of selection candidates in adaptive SM schemes, e.g., $D_G = U = |\mathcal{S}|$ and $D_G = Q = |\mathcal{P}|$ for TAS-aided SM and PA-aided SM, respectively. Moreover, the value of the $k$th element of $\mathbf{r}^G_m$, namely $r^G_m(k)$, denotes the probability that $\mathbf{f}^m_{\text{JFVG}}$ (also $\mathbf{H}^m$) belongs to class $k$, $k \in \{1, 2, \cdots, D_G\}$.

• Based on the network built in (29)-(31), we train the DNN parameters $\{\mathbf{W}^l, \mathbf{b}^l\}$, $l = 1, \cdots, G$, based on the following sum loss function:

$$
G(\mathbf{W}^l, \mathbf{b}^l) = \frac{1}{M}\sum_{m=1}^{M} G_m(\bar{\mathbf{r}}^G_m, \mathbf{r}^G_m),
\tag{32}
$$

where $\bar{\mathbf{r}}^G_m$ is the desired output of the DNN when $\mathbf{f}^m_{\text{JFVG}}$ is used as the input vector. Note that $\bar{\mathbf{r}}^G_m$ can be generated by using (13) based on the desired label $l_m$. Here, $G_m(\bar{\mathbf{r}}^G_m, \mathbf{r}^G_m): \mathbb{R}^{D_G \times 1} \times \mathbb{R}^{D_G \times 1} \to \mathbb{R}$ is the cross-entropy based loss function given by

$$
G_m(\bar{\mathbf{r}}^G_m, \mathbf{r}^G_m) = -\sum_{k} \bar{r}^G_m(k)\log(r^G_m(k)).
\tag{33}
$$

• For a given new observation $\mathbf{H}_{\text{new}}$, obtain its feature vector $\mathbf{f}_{\text{new}}$ using (19) and predict its label $l_{\text{new}}$ as:

$$
\begin{aligned}
\mathbf{r}_{\text{new}} &= \text{softmax}\left(\mathbf{W}^G\max\left(\mathbf{W}^{G-1}\cdots\max(\mathbf{W}^1\mathbf{f}_{\text{new}} + \mathbf{b}^1, 0)\cdots + \mathbf{b}^{G-1}, 0\right) + \mathbf{b}^G\right), \\
r^{\max}_{\text{new}} &= \max_{k = 1, \ldots, R} r_{\text{new}}(k), \quad l_{\text{new}} = \arg\max_{k = 1, \ldots, R} r_{\text{new}}(k),
\end{aligned}
\tag{34}
$$

where $r^{\max}_{\text{new}}$ is the maximum value of $r_{\text{new}}(k)$, $k = 1, \ldots, R$, and $l_{\text{new}}$ is its corresponding index.

As shown in Fig. 2, the first $G - 1$ layers are employed to enhance the fitting of the DNN model, which can be viewed as refined feature extraction of the original feature $\mathbf{f}^m_{\text{JFVG}}$. By contrast, the last layer is utilized to generate the class label. In our design, we adopt the ReLU function as the activation function for neurons, since it has the benefits of sparsity and a reduced likelihood of vanishing gradients, which are conducive to faster learning for our classification problems. Moreover, as shown in (33), we utilize the log loss function to measure the performance of our DNN classification model, because it can increase the rate of convergence when the target prediction vectors are one-hot vectors as given in (13).

2) DNN Parameter Optimization Method: In order to efficiently identify the parameters $\{\mathbf{W}^l, \mathbf{b}^l\}$, $l = 1, \cdots, G$, of the proposed DNN-based classifier, a novel iterative algorithm is developed by integrating the simplified conjugate gradient (SCG) method [39] and the stochastic gradient descent (SGD) algorithm. Specifically, based on (32) and (33), we only use a random subset of training samples $(\mathbf{f}^m_{\text{JFVG}}, l_m)$, $m = 1, \cdots, M'$, $l_m \in \{1, 2, \cdots, R\}$, $M' \ll M$, in each iteration, and the loss function of (32) becomes

$$
\bar{G}(\mathbf{W}^l, \mathbf{b}^l) = \frac{1}{M'}\sum_{m=1}^{M'} G_m(\bar{\mathbf{r}}^G_m, \mathbf{r}^G_m) = -\frac{1}{M'}\sum_{m=1}^{M'}\sum_{k=1}^{D_G} \bar{r}^G_m(k)\log(r^G_m(k)).
\tag{35}
$$

Here, we need to consider the problem of evaluating $\nabla\bar{G}(\mathbf{W}^l, \mathbf{b}^l)$. For ease of exposition, in (29)-(30), we let

$$
\mathbf{a}^l = \underbrace{[\mathbf{W}^l, \mathbf{b}^l]}_{\tilde{\mathbf{W}}^l}\,\underbrace{[\mathbf{r}^{l-1}_m, 1]^T}_{\tilde{\mathbf{r}}^{l-1}_m} = \mathbf{W}^l\mathbf{r}^{l-1}_m + \mathbf{b}^l,
\tag{36}
$$

and only consider the gradient $\nabla G_m(\mathbf{W}^l, \mathbf{b}^l)$ for a single training sample $m$. Then, the error back propagation technique is employed to calculate the derivatives of the loss function with respect to the weight parameters $\tilde{\mathbf{W}}^l = [\mathbf{W}^l, \mathbf{b}^l]$, $l = 1, \cdots, G$, which can be expressed as

$$
\nabla G_m(\mathbf{W}^l, \mathbf{b}^l) = \nabla G_m(\tilde{\mathbf{W}}^l) = \frac{\partial G_m(\tilde{\mathbf{W}}^l)}{\partial \mathbf{a}^l}\frac{\partial \mathbf{a}^l}{\partial \tilde{\mathbf{W}}^l} = \frac{\partial G_m(\tilde{\mathbf{W}}^l)}{\partial \mathbf{a}^l}\tilde{\mathbf{r}}^{l-1}_m,
\tag{37}
$$

where the value of $\partial G_m(\tilde{\mathbf{W}}^l)/\partial \mathbf{a}^l$ depends on the activation and loss functions adopted. In our DNN design, assuming that $a^l(j)$ is the $j$th element of $\mathbf{a}^l$ and $r^l_m(j)$ is the $j$th element of $\mathbf{r}^l_m$, we first obtain the gradient $\partial G_m(\tilde{\mathbf{W}}^G)/\partial \mathbf{a}^G$ for the $G$th layer using the softmax function as follows

$$
\frac{\partial G_m(\tilde{\mathbf{W}}^G)}{\partial a^G(j)} = \frac{\partial G_m(\tilde{\mathbf{W}}^G)}{\partial r^G_m(j)}\frac{\partial r^G_m(j)}{\partial a^G(j)} = -\frac{\bar{r}^G_m(j)}{r^G_m(j)} \cdot \frac{e^{a^G(j)}\sum_i e^{a^G(i)} - e^{2a^G(j)}}{\left(\sum_i e^{a^G(i)}\right)^2}.
\tag{38}
$$

Then, the gradient $\partial G_m(\tilde{\mathbf{W}}^l)/\partial \mathbf{a}^l$ for $l = G - 1$ is derived as

$$
\frac{\partial G_m(\tilde{\mathbf{W}}^{G-1})}{\partial a^{G-1}(j)} = \sum_k \frac{\partial G_m(\tilde{\mathbf{W}}^G)}{\partial a^G(k)}\frac{\partial a^G(k)}{\partial a^{G-1}(j)}.
\tag{39}
$$

Here, the value of $\partial a^G(k)/\partial a^{G-1}(j)$ can be easily obtained based on the max function adopted in the $(G-1)$th layer. As shown in (39), the gradient $\partial G_m(\tilde{\mathbf{W}}^{G-1})/\partial a^{G-1}(j)$ of the $(G-1)$th layer depends on its counterpart $\partial G_m(\tilde{\mathbf{W}}^G)/\partial a^G(k)$ of the $G$th layer. For the remaining layers, the corresponding gradients can be computed similarly. Based on the gradient $\nabla G_m(\mathbf{W}^l, \mathbf{b}^l)$ obtained in (37), the accumulated gradient $\nabla\bar{G}(\mathbf{W}^l, \mathbf{b}^l)$ in (35) is given by

$$
\nabla\bar{G}(\mathbf{W}^l, \mathbf{b}^l) = \frac{1}{M'}\sum_{m=1}^{M'}\nabla G_m(\mathbf{W}^l, \mathbf{b}^l).
\tag{40}
$$

Given the above gradient, the problem of DNN parameter optimization can be solved iteratively by commencing from an appropriate initial point using the following SCG-SGD algorithm:

Algorithm 1 Proposed SCG-SGD algorithm.
1) Initialization: Given a starting value $[\mathbf{W}^{l,(1)}, \mathbf{b}^{l,(1)}]$, set the maximum number of iterations $I_{\text{all}}$, the step size $\mu > 0$ and the termination scalar $\beta > 0$; given the gradient of the initial DNN parameters $\mathbf{U}^{(1)} = [\mathbf{W}^{l,(1)}, \mathbf{b}^{l,(1)}]$ as $\mathbf{E}(1) = \nabla\bar{G}(\mathbf{U}^{(1)}) = \nabla\bar{G}(\mathbf{W}^l, \mathbf{b}^l)$, we set $n = 1$.
2) Loop: if $\|\nabla\bar{G}(\mathbf{W}^l, \mathbf{b}^l)\| < \beta$ or $n > I_{\text{all}}$, goto Stop.

$$
\begin{aligned}
\mathbf{U}^{(n+1)} &= \mathbf{U}^{(n)} - \mu\,\mathbf{E}(n)/\|\mathbf{E}(n)\|, & (41) \\
\alpha &= 1/\text{tr}\left(\mathbf{U}^{(n+1)}(\mathbf{U}^{(n+1)})^H\right), & (42) \\
\mathbf{U}^{(n+1)} &= \sqrt{\alpha}\,\mathbf{U}^{(n+1)}, & (43) \\
\varphi_l &= \|\nabla\bar{G}(\mathbf{U}^{(n+1)})\|^2 / \|\nabla\bar{G}(\mathbf{U}^{(n)})\|^2, & (44) \\
\mathbf{E}(n+1) &= \varphi_l\,\mathbf{E}(n) - \nabla\bar{G}(\mathbf{U}^{(n+1)}). & (45)
\end{aligned}
$$

$n = n + 1$, goto Loop.
3) Stop: Output $\mathbf{U}^{(n+1)}$ as the final solution.

In our SCG-SGD algorithm, small values of $M'$ help reduce the complexities of both the gradient computation and the weight updating, while the SCG-based steps given in (41)-(45) can provide a faster rate of convergence than several classical steepest gradient algorithms. Moreover, our algorithm requires only the first-order derivative of the loss function, which has a simple form and thus imposes a low complexity.

D. Comparison of SLC and DNN

Based on the design process detailed in Sections III-B and III-C, we compare the SLC-based and DNN-based classifiers as follows:

• The performance of SLC-based classifiers may be more sensitive to their parameters. Specifically, the proposed KNN classifier needs to search for the optimal value of $K$ for all test samples, and its output is not guaranteed to be accurate due to the ambiguity stemming from the majority rule. By contrast, the proposed SVM classifier determines the label of a test sample by the use of linear boundaries in high dimensional spaces, and it has to consider the trade-off between the smoothness (simplicity) of the boundaries and the training misclassification by adjusting the parameter $C^r$ in (22).

• The SVM selects a subset from a fixed set of basis functions and typically involves simple convex optimization to determine the model parameters. By contrast, the proposed DNN-based classifier also fixes the number of basis functions but uses their parametric forms, which allows adaptive operations during training. As a result, the DNN can exploit more implicit feature information, which will be verified by our simulation results.

E. Complexity Analysis of the Proposed Algorithms

In this subsection, we evaluate the prediction complexity of the proposed SLC- and DNN-based link adaptation algorithms, by considering only complex additions and multiplications. It is noted that the training complexity is excluded since the training can be carried out off-line. Moreover, as shown in Section III-A, different feature vectors $\mathbf{f}^m_{\text{CFVG}} \in \mathbb{R}^{N_t N_r \times 1}$, $\mathbf{f}^m_{\text{SFVG}} \in \mathbb{R}^{2N_t N_r \times 1}$ and $\mathbf{f}^m_{\text{JFVG}} \in \mathbb{R}^{N_t^2 \times 1}$ of various dimensions are considered as the input of the SLC and DNN classifiers, which may introduce unequal complexity orders for the prediction. Let the dimension of the input feature vector be $D_F$. For $\mathbf{f}^m_{\text{CFVG}}$, $\mathbf{f}^m_{\text{SFVG}}$ or $\mathbf{f}^m_{\text{JFVG}}$ given in (16)-(18), we have $D_F = N_t N_r$, $D_F = 2N_t N_r$ and $D_F = N_t^2$, respectively.

As shown in Section III-B, the complexity of the KNN-based classifier is attributed mainly to the calculation of the Euclidean distance in (20), which is approximately $O(M(3D_F - 1))$. For the SVM-based classifier given in (22)-(24), its complexity is dominated by computing the high dimensional vector $\phi(\mathbf{f}_{\text{new}})$ and the prediction functions $(\mathbf{w}^r)^T\phi(\mathbf{f}_{\text{new}}) + b^r$, $r = 1, \cdots, R$, in (23), which is approximately in the order of $O(D_F^2 + D_F)$.

As detailed in Section III-C, for a given new observation $\mathbf{H}_{\text{new}}$, the prediction complexity of the proposed DNN-based algorithm depends on the evaluation of its corresponding class label $l_{\text{new}}$ using (34), which is a function of the number of used layers $G$ and the numbers of neurons $D_l$ in these layers. To be specific, for the fully-connected layers, e.g., the 1st to $(G-1)$th layers, the

complexity of our design is dominated by computing $\mathbf{r}^l_m = \max(\mathbf{W}^l\mathbf{r}^{l-1}_m + \mathbf{b}^l, 0)$, $1 < l < G$. For the $l$th layer, it incurs a complexity of the order of $O(2D_lD_{l-1} - D_{l-1})$. Moreover, the complexity of the last layer depends on the calculation of $\mathbf{r}^G_m = \text{softmax}(\mathbf{W}^G\mathbf{r}^{G-1}_m + \mathbf{b}^G)$, where the complexity of evaluating $\mathbf{W}^G\mathbf{r}^{G-1}_m + \mathbf{b}^G$ is $2D_GD_{G-1} - D_{G-1}$ and the complexity of the softmax function is $2D_G - 1$. Hence, the overall complexity of the proposed DNN-based algorithm can be shown as

$$
\begin{aligned}
C_{\text{DNN}} &= O\left(\sum_{l=1}^{G}(2D_lD_{l-1} - D_{l-1}) + 2D_G - 1\right) \\
&= O\left(2D_1D_F - D_F + \sum_{l=2}^{G}(2D_lD_{l-1} - D_{l-1}) + 2D_G - 1\right).
\end{aligned}
\tag{46}
$$

Note that we have $D_0 = D_F$ in (46) since the input vector to the first DNN layer is the feature vector given in (19).

IV. Extension of SLC and DNN to other Adaptive IM systems

We now extend the proposed SLC and DNN-based algorithms to other adaptive IM applications, e.g., AM-aided OFDM-IM systems. In OFDM-IM, all $N$ subcarriers are divided into $R$ subblocks, which are the basic units for bit modulation with a length of $L_{\text{IM}} = N/R$. The detailed model of OFDM-IM can be found in [1], [37], [41]. In the AM process, the modulation orders assigned to active subcarriers adapt dynamically to the changing channel conditions. Denote by $\mathbf{T} = [T_1, T_2, \ldots, T_{\text{IM}}]^T$ one of the subblocks in OFDM-IM, where only $K_{\text{IM}}$ out of $L_{\text{IM}}$ subcarriers are activated during transmission. For a given transmit rate, there are several possible modulation order combinations. Let $\mathcal{R}$ be a set of AM combinations given as $\mathcal{R} = \{R_1, R_2, \cdots, R_j, \cdots, R_{U_C}\}$, where $R_j = [R^j_1, \cdots, R^j_n, \cdots, R^j_{K_{\text{IM}}}]$ with $R^j_n$ indicating the modulation order of the $n$th ($n = 1, 2, \cdots, K_{\text{IM}}$) active subcarrier of the $j$th AM combination. For example, the set of AM combinations for $(L_{\text{IM}}, K_{\text{IM}}) = (4, 3)$ OFDM-IM with a transmit rate of 2 bits/subcarrier is given by

$$
\left\{
\begin{bmatrix}\text{BPSK}\\ \text{8PSK}\\ \text{QPSK}\end{bmatrix},
\begin{bmatrix}\text{QPSK}\\ \text{BPSK}\\ \text{8PSK}\end{bmatrix},
\cdots,
\begin{bmatrix}\text{QPSK}\\ \text{QPSK}\\ \text{QPSK}\end{bmatrix}
\right\}.
\tag{47}
$$

In (47), there are $U_C = 7$ legitimate AM candidates. Similar to (9)-(11), for a given channel realization $\mathbf{H}$, the modulation order in AM-aided OFDM-IM can be optimized as follows

$$
\hat{j} = \arg\max_{R_j \in \mathcal{R}} d_{\min}(\mathbf{H}),
\tag{48}
$$

which can be converted into its data-driven prediction counterpart based on steps similar to those given in Section III.

V. Simulation Results

In this section, we provide simulation results for characterizing the proposed SLC and DNN based link adaptation algorithms for TAS- and PA-aided SM-MIMO systems. Here, for comparison, several conventional optimization-driven methods, such as the max-norm based TAS of [42], the max-dmin based TAS algorithm of [42], [43], and the max-dmin based PA algorithm of [11], are utilized as the benchmarks. As shown in Sections III-B and III-C, the trained SLC and DNN machine models can be employed for all SNRs. Moreover, we also investigate the performance of the proposed machine-learning based algorithms when they are extended for adaptive modulation in OFDM-IM. In our simulation, the parameters for the SVM and KNN algorithms, such as the penalty parameters $C^r$, $r = 1, \cdots, R$, the variances of the Gaussian radial basis kernel functions and the number of neighbors, are chosen by using iterative cross-validation with $M/10$ (10%) training samples. We generate $M = 2 \times 10^3$ training feature-label pairs and set the number of layers of the DNN to $G = 4$ (see footnote 4), where three of them are fully connected layers. The numbers of neurons in these layers are $D_F$, $2D_F$, $D_F$ and $D_G$, respectively. Note that the values of $D_F$ and $D_G$ are determined by the specific adaptive IM system and the feature extraction method, which are detailed in Table I.

4 Increasing the number of layers G for the DNN may achieve further performance improvements. In this paper, by considering the complexity imposed, similar to [14], we set G = 4. For the sake of achieving an improved DNN for adaptive SM-MIMOs, our future work will concentrate on this optimization problem.

A. BER Comparison of Different Feature Extraction Methods for TAS and PA Aided SM

In our machine learning-aided adaptive SM schemes, a key problem is to find the most discriminating features for classification. For this purpose, several feature extraction methods, i.e., the conventional FVG of (16), the separate FVG of (17) and the joint FVG of (18), are implemented and compared in our SLC and DNN based TAS-SM and PA-SM schemes. Note that in our figures the features generated by (16), (17) and (18) are denoted by ‘mod(H)’, ‘Re(H)+Im(H)’ and ‘mod(H^H H)’, respectively.

A first comparison is made in Fig. 3 in order to identify the better-performing approach between the proposed KNN and SVM-based classifiers for adaptive SM systems. Here, we investigate the antenna set optimization problem in TAS-SM and the parameter setup is Nt = 4, Nr = 2, L = 2 and QPSK. For the sake of simplification, we only consider two kinds of feature extraction methods, namely the conventional FVG of (16) and the joint FVG of (18), to generate the inputs for the KNN and SVM classifiers. As shown in Fig. 3, the proposed SLC-based TAS-SM schemes provide beneficial gains compared to the conventional SM scheme. Moreover, as expected, the proposed SVM method is preferable to the KNN method since it first employs a powerful high dimensional mapping and then carries out the classification. Furthermore, in Fig. 3 we can see that the proposed joint FVG is capable of producing better performance than the conventional one. This is mainly due to the fact that the joint FVG can deliver more useful information to the classifiers for identifying the channel state, as discussed in Section III-A.

In Figs. 4 and 5, we further compare the proposed SVM-based TAS-SM and the DNN-based TAS-SM schemes by employing different feature vectors, respectively. In Figs. 4 and 5, the parameter setup is Nt = 8, Nr = 2, L = 2 and QPSK. Besides the conventional FVG and the joint FVG, the separate FVG of (17) is also considered in

TABLE I
The values of parameters DF and DG for the proposed DNN-based algorithm, where U, Q and UC are the adaptive candidates in TAS-SM, PA-SM and AM-aided OFDM-IM, respectively.

Feature vector    TAS-aided SM-MIMO          PA-aided SM-MIMO           AM-aided OFDM-IM
f^m_CFVG          DF = Nt Nr,  DG = U        DF = Nt Nr,  DG = Q        DF = Nt Nr,  DG = UC
f^m_SFVG          DF = 2Nt Nr, DG = U        DF = 2Nt Nr, DG = Q        DF = 2Nt Nr, DG = UC
f^m_JFVG          DF = Nt^2,   DG = U        DF = Nt^2,   DG = Q        DF = Nt^2,   DG = UC
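
To make the link between Table I and the prediction complexity in (46) concrete, the following sketch computes DF for each feature vector, builds the four layer widths DF, 2DF, DF and DG used in the simulations, and counts the operations as the per-layer cost 2·D_l·D_(l-1) − D_(l-1) plus the softmax cost 2·DG − 1 stated in the text. The function names and the example value DG = 28 are assumptions for illustration; the number of candidates depends on the adaptive scheme considered.

```python
def feature_dim(kind, nt, nr):
    """Input dimension D_F for the three feature vectors listed in Table I."""
    dims = {"CFVG": nt * nr, "SFVG": 2 * nt * nr, "JFVG": nt * nt}
    return dims[kind]


def layer_widths(d_f, d_g):
    """Neuron counts of the G = 4 layers used in the simulations."""
    return [d_f, 2 * d_f, d_f, d_g]


def dnn_prediction_ops(d_f, widths):
    """Operation count following (46), with D_0 = D_F as the input size."""
    dims = [d_f] + list(widths)
    per_layer = sum(2 * dims[l] * dims[l - 1] - dims[l - 1]
                    for l in range(1, len(dims)))
    softmax_cost = 2 * dims[-1] - 1
    return per_layer + softmax_cost


# Example: Nt = 8, Nr = 2 with the joint FVG; D_G = 28 is a hypothetical value.
d_f = feature_dim("JFVG", 8, 2)
widths = layer_widths(d_f, 28)
ops = dnn_prediction_ops(d_f, widths)
```

For this example the widths are 64, 128, 64 and 28, and the count evaluates to 44279 operations; dropping the lower-order terms gives the order used for comparison with the other classifiers.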

[Fig. 3 plot: BER versus SNR (dB), 0 to 16 dB, for Nt = 4, Nr = 2, L = 2, QPSK. Curves: conventional SM, 2×2, QPSK; proposed KNN-based TAS-SM with mod(H) and with mod(H^H H); proposed SVM-based TAS-SM with mod(H) and with mod(H^H H).]

Fig. 3. BER comparison of the KNN- and SVM-based TAS-SM schemes for 3 bits/symbol with Nt = 4, Nr = 2, L = 2 and QPSK, where the conventional FVG and the joint FVG are considered.

[Fig. 5 plot: BER versus SNR (dB), 0 to 16 dB. Curves: conventional SM, 2×2, QPSK; proposed DNN-based TAS-SM with mod(H), with Re(H)+Im(H) and with mod(H^H H); optimal max-dmin based TAS-SM.]

Fig. 5. BER comparison of the proposed DNN-based TAS-SM schemes when different feature vectors are employed. Here, the parameter setup is Nt = 8, Nr = 2, L = 2 and QPSK.

these figures. Moreover, we also provide the performance of the conventional SM and the optimal max-dmin based TAS algorithm of [42], [43] as benchmarks. The main observations from Figs. 4 and 5 are: (1) the performance of the SVM- and DNN-based TAS-SM is strongly connected to the way features are extracted. When the number of classes is large and the adopted FVG is unsuitable (i.e., the conventional FVG), the SVM-type classifier may even perform worse than the conventional scheme without TAS, since it becomes hard to achieve an appropriate tradeoff between the smoothness of the boundaries and the training misclassification, as indicated in [12]; (2) similar to Fig. 3, the benefits of the proposed joint FVG are also visible in Figs. 4 and 5 for both SLC and DNN classifiers. Moreover, as expected in Section III, the proposed separate FVG is also preferable to the conventional FVG; (3) the DNN is more robust to the choice of feature extraction method, since the performance differences among these three FVGs are smaller than those of the SVM. This benefit may be due to the fact that the DNN itself can be regarded as a feature extraction method, which is capable of capturing somewhat implicit information; (4) by using the proposed FVG, both the SVM- and DNN-based TAS-SM schemes are capable of achieving an acceptable performance, but the performance of the DNN-based TAS-SM is closer to that of the optimal max-dmin based TAS-SM. As shown in Fig. 5, the performance gap between the proposed DNN-based TAS-SM and the exhaustive-search-based optimal one is only about 0.8 dB.

[Fig. 4 plot: BER versus SNR (dB), 0 to 16 dB, for Nt = 8, Nr = 2, L = 2, QPSK. Curves: conventional SM, 2×2, L = 2, QPSK; proposed SVM-based TAS-SM with mod(H), with Re(H)+Im(H) and with mod(H^H H); optimal max-dmin based TAS-SM.]

Fig. 4. BER comparison of the proposed SVM-based TAS-SM schemes by employing different feature vectors. To be specific, the conventional FVG of (16), the separate FVG of (17) and the joint FVG of (18) are considered. Here, the parameter setup is Nt = 8, Nr = 2, L = 2 and QPSK.

To further identify the benefits of the proposed joint FVG, in Fig. 6, other features of the channel matrix H, such as the phases of the elements of H, the moduli of the column vectors of H, and the covariance parameters of the column vectors of H, namely ‘phase(H)’, ‘abs(hi)’ and ‘cov(hi, hj)’, and their combinations are also considered as the inputs of the SVM classifier. Here, hi and hj are the ith and jth columns of the channel matrix H. It is clear from Fig. 6 that the joint FVG advocated outperforms the others. This is mainly due to the fact that our proposed FVG has jointly considered the most important features of SM-MIMO. To be specific, in SM-based MIMO transmissions, the indices of the column

vectors are exploited for conveying the information bits, where the modulus values of these vectors dominate the performance of the PSK/QAM bits while the correlation values of these vectors decide the performance of the index bits. Compared to other FVGs, in our design, these two values are efficiently exploited and jointly considered by using (18).

[Fig. 6 plot: BER versus SNR (dB), 0 to 16 dB, for Nt = 4, Nr = 2, L = 2, QPSK, SVM-based TAS-SM. Curves: feature phase(H); feature mod(H)+phase(H); feature mod(H)+cov(hi, hj); feature mod(H)+abs(hi); feature mod(H^H H).]

Fig. 6. The effects of the feature vectors on the BER performance for the SVM-based TAS-SM, where we have Nt = 4, Nr = 2, L = 2 and QPSK. Here, besides the features given in (16)-(18), other features of the channel matrix, such as the phases of its elements as well as the correlation and the modulus of its column vectors, are also adopted as the inputs of the SVM classifier.

[Fig. 7 plot: BER versus SNR (dB), 0 to 16 dB. Curve groups labeled 16×2, L = 2 and 8×2, L = 2 (twice). Curves: proposed SVM-based TAS-SM; proposed DNN-based TAS-SM.]

Fig. 7. BER comparison of the proposed SVM and DNN-based TAS-SM schemes under different setups. Here, three setups, namely the setup A of Nt = 8, Nr = 2, L = 2 and 16-QAM, the setup B of Nt = 16, Nr = 2, L = 2 and 16-QAM, and the setup C of Nt = 8, Nr = 2, L = 2 and QPSK, are utilized.

B. BER Comparison of SLC and DNN Algorithms for TAS and PA Aided SM

We observe from Figs. 4 and 5 that the proposed DNN-based algorithm provides a beneficial SNR gain over the SVM-based algorithm for TAS-SM with the setup of Nt = 8, Nr = 2, L = 2 and QPSK. For further comparison, in Fig. 7, we investigate their performance for a higher number of transmit antennas and a higher modulation order, such as Nt = 16 and 16-QAM. In our simulation, only the best performing feature extraction method, i.e., the joint FVG, is employed. Observe in Fig. 7 that the proposed DNN-based schemes still outperform the SVM-based schemes. This is not surprising, since the DNN and the SVM are both designed based on a fixed set of basis functions, but the DNN utilizes their parametric forms and allows adaptive operations during training. As a result, the DNN may have a stronger learning ability than the SVM for many applications, as also noted in [12].

Moreover, besides the TAS-SM, in Fig. 8 we further compare the BER performance of the DNN and the SVM in the PA-SM system with Nt = 2, Nr = 2 and QPSK. In Fig. 8, the performance curves of the conventional SM without PA and the optimal max-dmin based PA algorithm of [11] are added as benchmarks. Similar to the aforementioned comparison process in TAS-SM, in PA-SM we also first investigate the effect of the feature vectors on the BER performance of the machine learning algorithms. As shown in Fig. 8, the proposed joint FVG is still preferred, which is consistent with the result seen in Figs. 4 and 5 for TAS-SM schemes. Also, in PA-SM, the DNN-based algorithm outperforms its SVM-based counterpart and incurs only a negligible loss compared to the optimal max-dmin based PA algorithm.

In Figs. 9 and 10, the proposed SVM-based and DNN-based algorithms are compared to the conventional optimization-driven algorithms, such as the exhaustive-search max-dmin based PA and TAS algorithms, where only the joint FVG (the feature mod(H^H H)) is employed. Note that these maximum minimum distance algorithms are capable of achieving the optimal BER performance when FD is adopted as the KPI. Hence, their BERs serve as lower bounds on those of the machine-learning based algorithms. For further comparison, we also consider the conventional max-norm based TAS of [42] for TAS-SM in Fig. 9 and the identical-throughput PA-aided VBLAST for PA-SM in Fig. 10 as benchmarks. As shown in Figs. 9 and 10, the proposed machine learning-based algorithms exhibit an attractive BER performance. For example, for both TAS and PA, the DNN-based algorithms are capable of achieving almost the same performance as the optimal exhaustive-search based max-dmin algorithms, with an SNR loss of only about 0.5 dB and 0.2 dB in Figs. 9 and 10, respectively. Moreover, as shown in Fig. 10, the proposed DNN-based PA-SM scheme provides an SNR gain of about 1.4 dB over the PA-aided VBLAST of [44] at BER = 10^-4.

In Fig. 11, the performance of the proposed DNN-based algorithm in the AM-aided OFDM-IM system is investigated. In Fig. 11, the number of subcarriers is N = 1024, the subcarrier separation is 15 kHz, the cyclic prefix length is 64, and the modulation is QPSK. We use the Extended Vehicular A (EVA) channel model with 9 paths. In each basic unit of OFDM-IM, KIM = 3 out of LIM = 4 subcarriers are activated during transmission. For this setup, as shown in (47), we have UC = 7 legitimate AM candidates. As shown in Fig. 11, the trends of the DNN-based algorithm mentioned in TAS-SM and PA-SM systems are also visible in AM-aided OFDM-IM.

TABLE II
Complexity orders of different optimization algorithms for TAS-SM schemes.

TAS-SM scheme                   Complexity order                        Configuration 1           Configuration 2
                                                                        (8 × 2, L = 2, QPSK)      (16 × 2, L = 8, 64QAM)
Max-norm based TAS-SM           $O(N_t N_r)$                            16                        128
Exhaustive max-dmin TAS-SM      $O(\binom{N_t}{L} N_t^2 N_r^2 M_{APM}^2)$   1.15 × 10^5               5.39 × 10^10
Proposed KNN-based TAS-SM       $O(M D_F)$                              1.28 × 10^5               5.12 × 10^5
Proposed SVM-based TAS-SM       $O(D_F^2 + D_F)$                        4.16 × 10^3               6.55 × 10^4
Proposed DNN-based TAS-SM       $O(4D_F^2 + D_F)$                       1.62 × 10^4               2.62 × 10^5

$D_F$: the size of the input feature vector. For the proposed JFVG, we have $D_F = N_t^2$. $M$: the size of the training sample set.
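To make the entries of Table II easier to check, the following Python sketch evaluates the complexity-order expressions for the two configurations. It is only an illustration: the function name is hypothetical, and the training-set size M = 2000 is an assumption inferred from the KNN column (1.28 × 10^5 / 64), not a value stated in this section. The max-norm row is omitted. The outputs land close to the tabulated values; small differences (for example 16448 versus 1.62 × 10^4 for the DNN in configuration 1) presumably stem from rounding in the table.

```python
from math import comb

def tas_complexity(nt, nr, l, m_apm, m_train):
    """Evaluate the Table II complexity-order expressions for one setup."""
    d_f = nt ** 2  # feature-vector size of the JFVG, D_F = N_t^2
    return {
        "exhaustive max-dmin": comb(nt, l) * nt**2 * nr**2 * m_apm**2,
        "KNN": m_train * d_f,
        "SVM": d_f**2 + d_f,
        "DNN": 4 * d_f**2 + d_f,
    }

M_TRAIN = 2000  # assumed training-set size (inferred from the KNN column)

config1 = tas_complexity(nt=8, nr=2, l=2, m_apm=4, m_train=M_TRAIN)     # QPSK
config2 = tas_complexity(nt=16, nr=2, l=8, m_apm=64, m_train=M_TRAIN)   # 64QAM

for name in config1:
    print(f"{name:>20}: {config1[name]:.3g}   {config2[name]:.3g}")
```

The exhaustive-search term grows with the binomial coefficient and with the square of the constellation size, whereas the learning-based terms depend only on $D_F$ (and on $M$ for KNN), which is why the gap widens in configuration 2.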

[Figure 8: BER versus SNR (dB). Curves: Proposed SVM-based PA-SM, mod(H); Proposed SVM-based PA-SM, Re(H)+Im(H); Proposed SVM-based PA-SM, mod(H^H H); Proposed DNN-based PA-SM, mod(H^H H); Optimal max-dmin based PA-SM.]
Fig. 8. BER comparison of the proposed SVM and DNN-based PA-SM schemes with $N_t = 2$, $N_r = 2$, and QPSK. Moreover, similar to the TAS, we also investigate the effects of the feature vectors on the BER performance in the PA-aided adaptive SM.

[Figure 9: BER versus SNR (dB). Curves: Conventional SM, 2×2, QPSK; The norm-based TAS-SM of [42]; Proposed SVM-based TAS-SM; Proposed DNN-based TAS-SM; Optimal max-dmin based TAS-SM.]
Fig. 9. BER comparison of the proposed machine learning based TAS-SM and the conventional optimization-driven TAS-SM schemes, such as the max-norm based TAS-SM and the optimal max-dmin based TAS-SM. The parameter setup is $N_t = 4$, $N_r = 2$, $L = 2$ and QPSK.

[Figure 10: BER versus SNR (dB), with an inset around 14.4 to 15 dB. Curves: Conventional SM, 2×2, BPSK; Proposed DNN-based PA-SM; Optimal max-dmin based PA-SM; VBLAST, 2×2, BPSK; PA-aided VBLAST of [44].]
Fig. 10. BER comparison of the proposed machine learning based PA-SM and the conventional optimization-driven PA-SM schemes, such as the norm-based TAS-SM and the optimal max-dmin based TAS-SM. The parameter setup is $N_t = 2$, $N_r = 2$ and BPSK. For comparison, we also utilize the identical-throughput VBLAST and PA-aided VBLAST as benchmarkers.

[Figure 11: BER versus SNR (dB). Curves: Conventional OFDM-IM; Proposed DNN-based OFDM-IM with AM; Optimal max-dmin based OFDM-IM with AM.]
Fig. 11. BER comparison of the proposed DNN-based OFDM-IM with AM, the conventional OFDM-IM and the exhaustive-search max-dmin based OFDM-IM with AM. The parameter setup is $(L_{IM}, K_{IM}) = (4, 3)$ and QPSK.

C. Complexity Comparison for Different Optimization Algorithms for Adaptive SM Schemes
In Table II, the complexity orders of different optimization algorithms for adaptive SM schemes are compared. In Table II, we only consider the TAS-SM as an example. The complexity comparisons of different SLC and DNN algorithms for PA-SM systems are similar. For reference, the complexity orders of the max-norm based TAS-SM and of the max-dmin based TAS-SM can be found in Table I of [7]. In our simulation, we have considered four layers with the numbers of neurons $D_F$, $D_1 = 2D_F$, $D_2 = D_F$,


and $D_4 = D_G$, respectively. Based on (46), the complexity of the DNN-based TAS-SM is approximately on the order of $O(4D_F^2 + D_F)$.

As shown in Table II, the conventional optimization-driven TAS algorithms require exhaustive searches over all possible $\binom{N_t}{L}$ adaptive candidates, hence a higher complexity order is imposed compared to the proposed KNN, SVM and DNN algorithms. In Table II, we also provide the quantified complexity for some specific configurations. We can observe from Table II that for configuration 1 the complexity of the proposed DNN-based design is about 7 times smaller than that of the exhaustive-search based max-dmin design ($1.62 \times 10^4$ versus $1.15 \times 10^5$). This advantage becomes more apparent for higher values of $L$ and $M_{APM}$, such as in configuration 2. This is due to the fact that for a new channel matrix, we can utilize the off-line trained DNN for predicting the best candidate of TAS-SM instead of a tedious search. By taking into account both the BER and the complexity trends, we conclude that the proposed SLC and DNN based adaptive schemes are promising for intelligent MIMO communications. It should be noted that the complexity of the proposed SVM and DNN based algorithms is dominated by the size of the feature vectors $D_F$, which may be further reduced by exploiting the special characteristics of SM symbols and with the aid of advanced feature extraction techniques. This issue will be investigated in our future studies.

VI. Conclusion

In this paper, we proposed a novel framework for machine learning-aided adaptive SM-MIMO systems, where the TAS and PA problems were solved using data-driven prediction approaches. Two types of intelligent algorithms were proposed based upon the SLC and DNN concepts. Moreover, we studied the effect of feature extraction on the SLC- and DNN-based algorithms and proposed a novel feature vector generator by jointly considering the modulus and the correlation of the channel matrix coefficients. We also extended our machine learning algorithms to other adaptive index modulation schemes. Through the computational complexity and BER comparisons among the proposed and previously known schemes, we provided an insight into the potential benefits of integrating machine learning with index modulation.

Acknowledgments

The authors would like to thank Mr Feilong You, Pengfei Wang and Yucheng Liao of UESTC, and Prof Lajos Hanzo at University of Southampton for their contributions to this paper.

References

[1] M. Di Renzo, H. Haas, A. Ghrayeb, S. Sugiura, and L. Hanzo, “Spatial modulation for generalized MIMO: Challenges, opportunities and implementation,” Proc. IEEE, vol. 102, no. 1, pp. 56-103, Jan. 2014.
[2] E. Basar, M. Wen, R. Mesleh, M. Di Renzo, Y. Xiao, and H. Haas, “Index modulation techniques for next-generation wireless networks,” IEEE Access, vol. 5, pp. 16693-16746, Oct. 2017.
[3] P. Yang, Y. Xiao, Y. L. Guan, K. V. S. Hari, A. Chockalingam, S. Sugiura, H. Haas, M. Di Renzo, C. Masouros, Z. Liu, L. Xiao, S. Li, and L. Hanzo, “Single-carrier spatial modulation: A promising design for large-scale broadband antenna systems,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 1687-1716, Feb. 2016.
[4] S. Sugiura, T. Ishihara, and M. Nakao, “State-of-the-art design of index modulation in the space, time, and frequency domains: Benefits and fundamental limitations,” IEEE Access, vol. 5, pp. 21774-21790, Oct. 2017.
[5] P. Yang, Y. Xiao, Y. L. Guan, M. Di Renzo, S. Li, and L. Hanzo, “Multidomain index modulation for vehicular and railway communications: A survey of novel techniques,” IEEE Veh. Tech. Magazine, vol. 13, no. 3, pp. 124-134, Sep. 2018.
[6] M. Di Renzo and H. Haas, “Improving the performance of space shift keying (SSK) modulation via opportunistic power allocation,” IEEE Commun. Lett., vol. 14, no. 6, pp. 500-502, Jun. 2010.
[7] P. Yang, Y. Xiao, Y. L. Guan, S. Li, and L. Hanzo, “Transmit antenna selection for multiple-input multiple-output spatial modulation systems,” IEEE Trans. Commun., vol. 64, no. 5, pp. 2035-2048, Mar. 2016.
[8] M. Maleki, H. R. Bahrami, M. Kafashan, and N. H. Tran, “On the performance of spatial modulation: optimal constellation breakdown,” IEEE Trans. Commun., vol. 62, no. 1, pp. 144-157, Jan. 2014.
[9] P. Yang, Y. L. Guan, Y. Xiao, M. Di Renzo, S. Li, and L. Hanzo, “Transmit pre-coded spatial modulation: Maximizing the minimum Euclidean distance versus minimizing the bit error ratio,” IEEE Trans. Wireless Commun., vol. 15, no. 3, pp. 2054-2068, Nov. 2015.
[10] P. Cheng, Z. Chen, J. Zhang, Y. Li, and B. Vucetic, “A unified precoding scheme for generalized spatial modulation,” IEEE Trans. Commun., vol. 66, no. 6, pp. 2502-2514, Jun. 2018.
[11] P. Yang, Y. Xiao, B. Zhang, S. Li, M. El-Hajjar, and L. Hanzo, “Power allocation-aided spatial modulation for limited-feedback MIMO systems,” IEEE Trans. Veh. Technol., vol. 64, no. 5, pp. 2198-2203, May 2015.
[12] M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning. Cambridge, MA, USA: MIT Press, 2012.
[13] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[14] T. J. O'Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” arXiv preprint arXiv:1702.00832, 2017.
[15] C. Jiang, H. Zhang, Y. Ren, Z. Han, K. Chen, and L. Hanzo, “Machine learning paradigms for next-generation wireless networks,” IEEE Wireless Commun., vol. 24, no. 2, pp. 98-105, Dec. 2016.
[16] D. Zibar, M. Piels, R. Jones, and C. G. Schaeffer, “Machine learning techniques in optical communication,” Journal of Lightwave Tech., vol. 34, no. 6, pp. 1442-1452, Mar. 2016.
[17] M. Ibnkahla, “Applications of neural networks to digital communications-A survey,” Signal Processing, vol. 80, no. 7, pp. 1185-1215, Feb. 2000.
[18] K. Burse, R. N. Yadav, and S. C. Shrivastava, “Channel equalization using neural networks: A review,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 40, no. 3, pp. 352-357, May 2010.
[19] O. Dabeer, S. Nagaraja, and A. Chockalingam, “Boosting MMSE receivers using AdaBoost,” in 2013 IEEE 78th Vehicular Technology Conference (VTC Fall), Las Vegas, NV, 2013, pp. 1-5.
[20] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO detection,” in 2017 18th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Sapporo, Japan, 2017, pp. 1-5.
[21] N. Wang, T. Jiang, S. Lv, and L. Xiao, “Physical-layer authentication based on extreme learning machine,” IEEE Commun. Lett., vol. 21, no. 7, pp. 1557-1560, Jul. 2017.
[22] X. Liu, C. Zhao, P. Wang, Y. Zhang, and T. Yang, “Blind modulation classification algorithm based on machine learning for spatially correlated MIMO system,” IET Commun., vol. 11, no. 7, pp. 1000-1007, May 2017.
[23] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, “On deep learning-based channel decoding,” in 2017 51st Annual Conference on Information Sciences and Systems (CISS), Baltimore, MD, 2017, pp. 1-6.
[24] H. Huang, J. Yang, H. Huang, Y. Song, and G. Gui, “Deep learning for super-resolution channel estimation and DOA estimation based massive MIMO system,” IEEE Trans. Veh. Tech., vol. 67, no. 9, pp. 8549-8560, Sep. 2018.


[25] C. Wen, W. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Commun. Lett., in press, 2018, DOI: 10.1109/LWC.2018.2818160.
[26] Y. Yang, et al., “DECCO: Deep-learning enabled coverage and capacity optimization for massive MIMO systems,” IEEE Access, vol. 6, pp. 23361-23371, 2018.
[27] K. Kim, J. Lee, and J. Choi, “Deep learning based pilot allocation scheme (DL-PAS) for 5G massive MIMO system,” IEEE Commun. Lett., vol. 22, no. 4, pp. 828-831, Apr. 2018.
[28] D. A. Awan, R. L. G. Cavalcante, M. Yukawa, and S. Stanczak, “Detection for 5G-NOMA: An online adaptive machine learning approach,” in 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, 2018, pp. 1-6.
[29] J. Kang, C. Chun, and I. Kim, “Deep learning based channel estimation for wireless energy transfer,” IEEE Commun. Lett., in press, 2018, DOI: 10.1109/LCOMM.2018.2871442.
[30] S. Yun and C. Caramanis, “Reinforcement learning for link adaptation in MIMO-OFDM wireless systems,” in 2010 IEEE Global Telecommun. Conference, Miami, FL, 2010, pp. 1-5.
[31] A. Rico-Alvarino and R. W. Heath, “Learning-based adaptive transmission for limited feedback multiuser MIMO-OFDM,” IEEE Trans. Wireless Commun., vol. 13, no. 7, pp. 3806-3820, Jul. 2014.
[32] J. Joung, “Machine learning-based antenna selection in wireless communications,” IEEE Commun. Lett., vol. 20, no. 11, pp. 2241-2244, Nov. 2016.
[33] D. He, C. Liu, T. Q. S. Quek, and H. Wang, “Transmit antenna selection in MIMO wiretap channels: A machine learning approach,” IEEE Wireless Commun. Lett., vol. 7, no. 4, pp. 634-637, Aug. 2018.
[34] X. Gao, L. Dai, Y. Sun, S. Han, and I. Chih-Lin, “Machine learning inspired energy-efficient hybrid precoding for mmWave massive MIMO systems,” in 2017 IEEE International Conference on Communications (ICC), Paris, 2017, pp. 1-6.
[35] H. W. Liang, C. Wei Ho, and S. Y. Kuo, “Coding-aided k-Means clustering blind transceiver for space shift keying MIMO systems,” IEEE Trans. Wireless Commun., vol. 15, no. 1, pp. 103-115, Aug. 2016.
[36] L. You, P. Yang, Y. Xiao, et al., “Blind detection for spatial modulation systems based on clustering,” IEEE Commun. Lett., vol. 21, no. 11, pp. 2392-2395, Aug. 2017.
[37] M. Wen, et al., “Index modulated OFDM for underwater acoustic communications,” IEEE Commun. Mag., vol. 54, no. 5, pp. 132-137, May 2016.
[38] H. Noh, Y. Kim, J. Lee, and C. Lee, “Codebook design of generalized space shift keying for FDD massive MIMO systems in spatially correlated channels,” IEEE Trans. Veh. Technol., vol. 64, no. 2, pp. 513-523, Feb. 2015.
[39] S. Chen, N. Ahmad, and L. Hanzo, “Adaptive minimum bit-error rate beamforming,” IEEE Trans. Wireless Commun., vol. 4, no. 2, pp. 341-348, Apr. 2005.
[40] S. Zhang, et al., “Efficient kNN classification with different numbers of nearest neighbors,” IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 5, pp. 1774-1785, Apr. 2018.
[41] M. Wen, X. Cheng, and M. Ma, “On the achievable rate of OFDM with index modulation,” IEEE Trans. Signal Process., vol. 64, no. 8, pp. 1919-1932, Dec. 2015.
[42] R. Rajashekar, K. V. S. Hari, and L. Hanzo, “Antenna selection in spatial modulation systems,” IEEE Commun. Lett., vol. 17, no. 3, pp. 521-524, Mar. 2013.
[43] J. Zheng and J. Chen, “Further complexity reduction for antenna selection in spatial modulation systems,” IEEE Commun. Lett., vol. 19, no. 6, pp. 937-940, Jun. 2015.

Ping Yang [M’13, SM’16] received his Ph.D. degree from the University of Electronic Science and Technology of China, Chengdu, Sichuan, in 2013, where he is currently an associate professor. From 2012 to 2013, he was a visiting student at the School of Electronics and Computer Science, University of Southampton, United Kingdom. From 2014 to 2016, he was a research fellow at the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He has published and presented more than 80 papers in journals and conference proceedings. His research interests include multiple-input, multiple-output, orthogonal frequency division multiplexing, machine learning, and communication signal processing.

Yue Xiao received a Ph.D. degree in communication and information systems from the University of Electronic Science and Technology of China in 2007. He is now a full professor at the University of Electronic Science and Technology of China. He has published more than 80 international journal papers and been involved in several projects in the Chinese Beyond 3G Communication R&D Program. His research interests are in the area of wireless and mobile communications.

Ming Xiao (S’2002-M’2007-SM’2012) received Bachelor and Master degrees in Engineering from the University of Electronic Science and Technology of China, Chengdu, in 1997 and 2002, respectively. He received a Ph.D. degree from Chalmers University of Technology, Sweden, in November 2007. From 1997 to 1999, he worked as a network and software engineer at China Telecom. From 2000 to 2002, he also held a position in the Sichuan communications administration. Since November 2007, he has been with the department of information science and engineering, school of electrical engineering and computer science, Royal Institute of Technology, Sweden, where he is currently an Associate Professor. He was an Editor for IEEE Transactions on Communications (2012-2017) and IEEE Wireless Communications Letters (2012-2016), has served IEEE Communications Letters since 2012 (Senior Editor since Jan. 2015), and has been an Editor for IEEE Transactions on Wireless Communications since 2018. He was the lead Guest Editor for the IEEE JSAC Special Issue on “Millimeter Wave Communications for future mobile networks” in 2017.

Yong Liang Guan received his Ph.D. degree from the Imperial College of Science, Technology and Medicine, University of London, in 1997, and B.Eng. degree with first class honors from the National University of Singapore in 1991. He is now an associate professor at the School of Electrical and Electronic Engineering, Nanyang Technological University. His research interests include modulation, coding and signal processing for communication, information security and storage systems. The author's homepage is available online at http://www3.ntu.edu.sg/home/eylguan.


Shaoqian Li [F’16] (lsq@uestc.edu.cn) received his B.S.E. degree in communication technology from Northwest Institute of Telecommunication (Xidian University) in 1982 and M.S.E. degree in Communication System from UESTC in 1984. He is a Professor, Ph.D. supervisor, director of the National Key Lab of Communication, director of the School of Communication and Information Engineering, UESTC, and member of the National High Technology R&D Program (863 Program) Communications Group. His research includes wireless communication theory, anti-interference technology for wireless communications, spread-spectrum and frequency-hopping technology, mobile and personal communications.

Wei Xiang (S’00-M’04-SM’10) received the B. Eng. and M. Eng. degrees, both in electronic engineering, from the University of Electronic Science and Technology of China, Chengdu, China, in 1997 and 2000, respectively, and the Ph.D. degree in telecommunications engineering from the University of South Australia, Adelaide, Australia, in 2004. He is currently Foundation Professor and Head of Electronic Systems and Internet of Things Engineering in the College of Science and Engineering at James Cook University, Cairns, Australia. Between 2004 and 2015, he was with the School of Mechanical and Electrical Engineering, University of Southern Queensland, Toowoomba, Australia. He is an IET Fellow, a Fellow of Engineers Australia, and an Editor for IEEE Communications Letters. He was a co-recipient of three Best Paper Awards at 2015 WCSP, 2011 IEEE WCNC, and 2009 ICWMC.
