
This article has been accepted for inclusion in a future issue of this magazine.

Content is final as presented, with the exception of pagination.

Accepted from Open Call

Machine Learning Paradigms for Next-Generation Wireless Networks
Chunxiao Jiang, Haijun Zhang, Yong Ren, Zhu Han,
Kwang-Cheng Chen, and Lajos Hanzo

Chunxiao Jiang is with the Tsinghua Space Center.
Y. Ren is with Tsinghua University.
Haijun Zhang is with the University of Science and Technology Beijing, China.
Zhu Han is with the University of Houston.
Kwang-Cheng Chen is with the University of South Florida.
Lajos Hanzo is with the University of Southampton.

Digital Object Identifier: 10.1109/MWC.2016.1500356WC

Abstract

Next-generation wireless networks are expected to support extremely high data rates and radically new applications, which require a new wireless radio technology paradigm. The challenge is that of assisting the radio in intelligent adaptive learning and decision making, so that the diverse requirements of next-generation wireless networks can be satisfied. Machine learning is one of the most promising artificial intelligence tools, conceived to support smart radio terminals. Future smart 5G mobile terminals are expected to autonomously access the most meritorious spectral bands with the aid of sophisticated spectral efficiency learning and inference, in order to control the transmission power, while relying on energy efficiency learning/inference and simultaneously adjusting the transmission protocols with the aid of quality of service learning/inference. Hence we briefly review the rudimentary concepts of machine learning and propose their employment in the compelling applications of 5G networks, including cognitive radios, massive MIMOs, femto/small cells, heterogeneous networks, smart grid, energy harvesting, device-to-device communications, and so on. Our goal is to assist the readers in refining the motivation, problem formulation, and methodology of powerful machine learning algorithms in the context of future networks in order to tap into hitherto unexplored applications and services.

Introduction

Radical and sometimes even unorthodox next-generation networking concepts have received substantial attention in both the academic and the industrial communities. One of their driving forces is that of providing unprecedented data rates for supporting radical new applications. Specifically, next-generation networks are expected to learn the diverse and colorful characteristics of both the users’ ambience as well as human behavior, in order to autonomously determine the optimal system configurations. These smart mobile terminals have to rely on sophisticated learning and decision-making. Machine learning, as one of the most powerful artificial intelligence tools, constitutes a promising solution [1]. As shown in Fig. 1, we may envision an intelligent radio that is capable of autonomously accessing the available spectrum with the aid of learning, altruistically controlling transmission power for the sake of conserving energy as well as adjusting the transmission protocols.

Machine learning has found wide-ranging applications in image/audio processing, finance and economics, social behavior analysis, project management, and so on [2]. Explicitly, a machine learns the execution of a particular task T, with the goal of maintaining a specific performance metric P, based on a particular experience E, where the system aims to reliably improve its performance P while executing task T, again by exploiting its experience E. Depending on how we specify T, P, and E, the learning might also be referred to as data mining, autonomous discovery, database updating, programming by example, and so on [3]. Machine learning algorithms can be simply categorized as supervised and unsupervised learning, where the adjectives “supervised/unsupervised” indicate whether there are labeled samples in the database. Later, reinforcement learning emerged as a new category that was inspired by behavioral psychology. It is concerned with a certain form of reward/utility gleaned by an agent, which is connected to its environment via perception and action. The family of machine learning algorithms can also be categorized based on their similarity in terms of their functionality and structure, yielding regression algorithms, instance-based algorithms, regularization algorithms, decision tree algorithms, Bayesian algorithms, clustering algorithms, association rule based learning algorithms, artificial neural networks, deep learning algorithms, dimension reduction algorithms, ensemble algorithms, and so on. In this article, we will introduce the basic concept of machine learning algorithms and the corresponding applications according to the category of supervised, unsupervised, and reinforcement learning.

Machine learning can be widely used in modeling various technical problems of next-generation systems, such as large-scale MIMOs, device-to-device (D2D) networks, heterogeneous networks constituted by femtocells and small cells, and so on. Figure 2 portrays the family-tree of machine learning techniques and their potential applications in 5G. Against this background, we embark on investigating the family of learning techniques. Specifically, in the following sections we consider supervised learning, unsupervised learning, and

1536-1284/16/$25.00 © 2016 IEEE • IEEE Wireless Communications • Accepted for Publication



reinforcement learning. Each section consists of several subsections, discussing specific learning models, such as regression models and the k-nearest neighbor (KNN) algorithm, support vector machines (SVM) and Bayesian learning; k-means clustering, principal and independent component analysis; and partially observed Markov decision processes, Q-learning, and the multi-armed bandit technique. Each section commences with the introduction of the learning model and its applications in 5G networks. Finally, our conclusions are drawn.

[Figure 1 block labels: smart antenna, RF, ADC, DAC, radio learning module, learning algorithm, observations, utility and cost evaluation, action selection, control.]
Figure 1. Intelligent radio learning paradigm.

Supervised Learning in Wireless Communications

Regression Models, KNN and SVM: MIMO Channel and Energy Learning

Models: Regression analysis relies on a statistical process for estimating the relationships among variables. The goal of regression analysis is to predict the value of one or more continuous-valued estimation targets, given the value of a D-dimensional vector x of input variables. The estimation target is a function of the independent variables. In linear regression, the regression function is linear, while in logistic regression, it is a logistic function assuming a common sigmoid curve. The KNN and SVM algorithms are mainly utilized for classification of points/objects. In KNN, an object is classified into a specific category by a majority vote of the object’s neighbors, with the object being assigned to the class that is most common among its k nearest neighbors. The output may be constituted by a specific property of the object, such as for example the average of the values of its k nearest neighbors. By contrast, the SVM algorithm relies on nonlinear mapping, which transforms the original training data into a higher dimension where it becomes separable and then it searches for the optimal linear separating hyperplane that is capable of separating one class from another, again in this higher dimension. They correspond to non-linear classification methods relying on the family of kernel methods. It was shown that with the aid of an appropriate nonlinear mapping to a sufficiently high dimension, the data from two classes can always be separated by a hyperplane [3 p. 21, 185, 239, 349].

Applications: These models can be used for estimating or predicting radio parameters that are associated with specific users. For example, in massive MIMO systems associated with hundreds of antennas, both detection and channel estimation lead to high-dimensional search-problems, which can be addressed by the above-mentioned learning models. In order to generalize the SVM function for employment in data classification problems, its hierarchical version, referred to as H-SVM, was proposed in [4], where each hierarchical level consisted of a finite number of SVM classifiers. This regime was used for the estimation of the Gaussian channel’s noise level in a MIMO-aided wireless network having t transmit antennas and r receive antennas. By exploiting the training data, the H-SVM model was trained for the estimation of the channel noise statistics.

In heterogeneous networks constituted by diverse cells, handovers may be frequent, where both the KNN and SVM can be applied to finding the optimal handover solutions. At the application layer, these models can also be used for learning the mobile terminal’s specific usage pattern in diverse spatio-temporal and device contexts, as discussed in [5]. This may then be exploited for prediction of the configuration to be used in the location-specific interface. Given a set of contex-

Machine learning in 5G

Supervised learning
  - Regression model, KNN, SVM. Apps in 5G: massive MIMO channel estimation/detection; user location/behavior learning/classification.
  - Bayesian learning. Apps in 5G: massive MIMO channel estimation; spectrum sensing/detection and learning in CR.
Unsupervised learning
  - K-means clustering. Apps in 5G: small cell clustering; WiFi association; device-to-device user clustering; HetNet clustering.
  - PCA and ICA. Apps in 5G: spectrum sensing; anomaly/fault/intrusion detection; signal dimension reduction; smart grid user classification.
Reinforcement learning
  - MDP, POMDP, Q-learning, multi-armed bandit. Apps in 5G: decision making under unknown network conditions, resource competition in femto/small cell channel selection and spectrum sharing for device-to-device networks, energy modeling in energy harvesting; HetNet selection/association.

Technologies: massive MIMO, femto/small cells and heterogeneous networks (HetNets), cloud radio access networks, cognitive radio, full duplex, energy harvesting, etc.
Machine learning applications: channel estimation/detection, spectrum sensing/access, cell/user clustering, switch and handover among HetNets, signal dimension reduction, energy modeling, user behavior analysis, location prediction, intrusion/fault/anomaly detection, cell/channel selection association.

Figure 2. Radio learning architecture.
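
As a minimal sketch of the k-nearest-neighbor majority vote that appears in the supervised branch of Fig. 2, the following Python fragment classifies a query point by the most common label among its k closest training points under the Euclidean distance. The two-dimensional features, the labels "macro" and "femto", and the function name knn_classify are illustrative assumptions rather than data or code from the cited works.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train: list of (feature_vector, label) pairs; query: a feature vector.
    """
    # Sort the training samples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Majority vote over the labels of the k nearest neighbours.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features, e.g. (normalised signal strength, normalised user speed).
train = [
    ((0.9, 0.1), "macro"), ((0.8, 0.2), "macro"), ((0.7, 0.3), "macro"),
    ((0.2, 0.8), "femto"), ((0.1, 0.9), "femto"), ((0.3, 0.7), "femto"),
]

print(knn_classify(train, (0.75, 0.25)))  # the three nearest points are all "macro"
print(knn_classify(train, (0.15, 0.85)))  # the three nearest points are all "femto"
```

The choice of k trades noise robustness against locality; an odd k avoids ties in a two-class problem such as this one.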


Category | Learning techniques | Key characteristics | Application in 5G
Supervised learning | Regression models | • Estimate the variables’ relationships • Linear and logistic regression | Energy learning [5]
Supervised learning | K-nearest neighbor | • Majority vote of neighbors | Energy learning [5]
Supervised learning | Support vector machines | • Non-linear mapping to high dimension • Separate hyperplane classification | MIMO channel learning [4]
Supervised learning | Bayesian learning | • A posteriori distribution calculation • GM, EM, and HMM | • Massive MIMO learning [6] • Cognitive spectrum learning [7–9]
Table 1. Supervised machine learning algorithms.
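
Table 1 lists the Gaussian mixture (GM) model and the EM algorithm under Bayesian learning. As a rough illustration of how the "E" and "M" steps alternate, the sketch below fits a two-component one-dimensional Gaussian mixture. The sample values, the initialization (component means at the smallest and largest sample) and the function names are assumptions made for this example only; it is not the estimator used in the cited papers.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with the EM algorithm."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Assumed initialisation: extreme samples as means, equal weights.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E step: posterior probability (responsibility) of each component per sample.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(2)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M step: re-estimate weights, means and variances from the responsibilities.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # small floor guards against collapse
    return weights, means, variances


# Hypothetical samples drawn around two levels (0 and 10).
samples = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
weights, means, variances = em_gmm_1d(samples)
print(weights, means, variances)
```

With well-separated groups such as these, the responsibilities quickly become almost binary and the estimates settle near the per-group sample statistics.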
tual input cues, machine learning algorithms are capable of exploiting the user context learned for the sake of dynamically classifying the cues into a system state for the sake of saving energy, while maintaining a high level of user satisfaction. Donohoo et al. [5] also conducted experiments using five real user profiles, including the user-locations and energy consumption, but their data is not accessible to the public. The experiment showed that up to 90 percent successful energy demand prediction is possible with the aid of the KNN algorithms.

Bayesian Learning: Massive MIMO and Cognitive Radio

Models: The philosophy of Bayesian learning is to compute the a posteriori probability distribution of the target variables conditioned on its input signals and on all of the training instances. Some simple examples of generative models that may be learned with the aid of Bayesian techniques include, but are not limited to, the Gaussian mixture model (GM), expectation maximization (EM), and hidden Markov models (HMM) [3 p. 445]. GM is a model where each data point belongs to one of several clusters or groups, and the data points within each cluster are Gaussian distributed. EM is a generalization of maximum likelihood estimation, which iteratively finds the most likely solutions or parameters. It is characterized by two steps: the “E” step that chooses a function representing the lower bound of the likelihood, and the “M” step that finds the parameters maximizing the chosen function.

HMM is a tool designed for representing probability distributions of sequences of observations. It can be considered a generalization of a mixture-based model, where the hidden variables, which control the specific mixture of the component to be selected for each observation, are related to each other through a Markov process, rather than being independent of each other.

Applications: The Bayesian learning model may be readily invoked for spectral characteristic learning and estimation in next-generation networks. To address the pilot contamination problem encountered in massive MIMO systems, the authors of [6] estimated both the channel parameters of the desired links in a target cell as well as those of the interfering links of the adjacent cells, where channel estimation was carried out with the aid of sparse Bayesian learning techniques. Based on the observation of received signals, the channel component was first modeled by a GM, namely by a weighted sum of Gaussian distributions having different variances, and then estimated with the aid of the EM algorithm.

Another three closely related applications may be found in cognitive radio networks. In [7], a cooperative wideband spectrum sensing scheme based on the EM algorithm was proposed for the detection of a primary user (PU) supported by a multi-antenna assisted cognitive radio network. This iterative technique first created the log-likelihood function of both the unknown spectrum occupancy as well as of the channel information and of the noise in the “E” step. Then, it maximized the log-likelihood function for the sake of inferring the unknown information during the “M” step, which was carried out by jointly detecting both the PU signal as well as estimating the channel’s unknown frequency response and the noise variance of multiple subbands.

In contrast to [7], the authors in [8] constructed an HMM relying on a two-state hidden Markov process, where the PUs are present or absent and a two-state observation space, indicating whether the PUs are present or absent. Furthermore, the EM algorithm was invoked for finding the true channel parameters, such as the sojourn time of the available channels, the inactive states of the PUs, and the PUs’ signal strength. Finally, the third application of Bayesian learning was advocated in [9], where a tomography model, belonging to the Bayesian inference framework, was proposed for conceiving and statistically characterizing a range of techniques that are capable of extracting the prevalent parameters and traffic/interference patterns for employment in cognitive radio networks at both the link layer and network layer. The parameters collected included both the path-delay as well as the proportion of successful packet receptions, while the estimated parameter was the link’s successful transmission probability. The Bayesian estimators were derived for single/multiple transmissions in single/multiple path scenarios. In Table 1, we summarize the basic characteristics and applications of supervised machine learning algorithms.

Unsupervised Learning in Wireless Communications

K-Means Clustering: Heterogeneous Networks

Models: K-means clustering aims for partitioning n observations into k clusters, where each observation belongs to the closest cluster. It defines the centroid of a cluster as the center of

gravity, that is, the mean value of the points within the cluster. The clustering algorithm proceeds in an iterative manner, where an object is assigned to the specific cluster whose centroid is nearest to the object based on the Euclidean distance ‘similarity metric’, and then the in-cluster differences are minimized by iteratively updating the cluster-centroid, until ‘convergence’ is achieved. Explicitly, convergence is deemed to be achieved when the assignment becomes stable, that is, the clusters formed in the current round are the same as those formed in the previous round [3 p. 161, 317].

Applications: Clustering is a common problem in 5G networks, especially in heterogeneous scenarios associated with diverse cell sizes as well as WiFi and D2D networks. For example, the small cells have to be carefully clustered to avoid interference using coordinated multi-point transmission (CoMP), while the mobile users are clustered to obey an optimal offloading policy, the devices are clustered in D2D networks to achieve high energy efficiency, the WiFi users are clustered to maintain an optimal access point association, and so on. In [10], the authors considered a hybrid optical/wireless network scenario, in order to reduce the overall wireless tele-traffic by encouraging the utilization of the high-capacity optical infrastructure. A mixed integer programming (MIP) problem was formulated to jointly optimize both the gateway partitioning and the virtual-channel allocation based on classic k-means clustering, which was employed to partition the mesh access points (MAPs) into several groups. The proposed scheme commenced its operation from an initial gateway access point (GAP) set, which can be plucked by a random selection from the set of MAPs, or can be more astutely determined using a meritorious initialization criterion. Next, each MAP is assigned to its nearest GAP. If several eligible GAPs are in the vicinity, then the specific GAP that has a readily available virtual channel is chosen. Finally, by using the classic k-means clustering algorithm, the MAPs are divided into k groups associated with the closest GAPs.

Principal and Independent Component Analysis: Smart Grid and Cognitive Radio

Models: Principal component analysis (PCA) transforms a set of potentially correlated variables into a set of uncorrelated variables, referred to as the principal components, where the number of principal components is less than or equal to the number of original variables. Basically, the first principal component has the largest possible variance (i.e., accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components. The principal components are orthogonal, because they are the eigenvectors of the covariance matrix, which is symmetric. By contrast, independent component analysis (ICA) is a statistical technique conceived to reveal hidden factors that underlie sets of random variables, measurements, or signals. In the model, the data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed to be non-Gaussian and mutually independent, and they are referred to as the independent components of the observed data, which can be found by ICA [3 p. 115].

Applications: Both the PCA and ICA constitute powerful statistical signal processing techniques devised to recover statistically independent source signals from their linear mixtures. One of their major applications may be found in the area of anomaly-detection, fault-detection, and intrusion-detection problems of wireless networks, which rely on traffic monitoring. Furthermore, similar problems may also be solved in sensor networks, mesh networks, and so on. They can also be invoked for the physical layer signal dimension reduction of massive MIMO systems or to classify the primary users’ behaviors in cognitive radio networks. As a further example, in [11] PCA and ICA were applied in a smart grid scenario to recover the simultaneous wireless transmissions of smart utility meters installed in each home. At the power utility station, it was required to separate the signals received from all the smart meters before the signals can be decoded. The statistical properties of the signals were exploited to blindly separate them using ICA. This operation is capable of enhancing both the transmission efficiency by avoiding channel estimation in each frame, as well as data security by eliminating any wideband interference or jamming signals. More explicitly, a substantial security enhancement was achieved by a robust version of the PCA-based method, which exploited the sparse, low-rank nature of the auto-covariance matrices of the smart metering signal and of the wideband interferer, respectively, in order to confidently separate them prior to ICA processing. Another pertinent example is found in cognitive radio scenarios, where the so-called Boolean ICA relied on the Boolean mixing of OR, XOR, and other functions of binary signals [12]. It was also incorporated into the PU separation problem often encountered in cognitive radio networks for the sake of distinguishing and characterizing the activities of PUs in the context of collaborative spectrum sensing. Furthermore, the observations of the secondary users (SUs) were modeled as Boolean OR mixtures of the underlying binary PU sources. An iterative algorithm, called Binary ICA, was developed to determine the activities of the underlying latent signal sources, such as the PUs. It was demonstrated that given m monitors or SUs, the activities of up to (2^m - 1) distinct PUs can be inferred. In Table 2, we summarize the basic characteristics and applications of unsupervised machine learning algorithms.

Reinforcement Learning in Wireless Communications

Partially Observable Markov Decision Process: Energy Harvesting

Models: Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in specific situations, where the outcomes are partly random and partly under the control of a decision maker, as illustrated in Fig. 3a. At each time step, the process is in some state s, and the decision maker may opt for any


of the legitimate actions a that is available in state s. The process responds at the next time step by randomly moving into a new state s’, and giving the decision maker a corresponding reward U_a(s). The probability that the process moves into its new state s’ is influenced both by the specific action chosen, as well as by the system’s inherent transitions, formally described by the state transition probability P_a(s’|s, a). Given s and a, the state transition probability is conditionally independent of all previous states and actions, that is, the state transitions of an MDP process satisfy the fundamental Markov property. By contrast, a partially observable Markov decision process (POMDP) may be viewed as the generalization of an MDP, where the agent is unable to directly observe the underlying state transitions and hence only has partial knowledge, as shown in Fig. 3b. The agent has to keep track of both the probability distribution of the legitimate states, based on a set of observations, as well as of the observation probabilities and of the underlying MDP [3 p. 517].

Category | Learning techniques | Key characteristics | Application in 5G
Unsupervised learning | K-means clustering | • K partition clustering • Iterative updating algorithm | Heterogeneous networks [10]
Unsupervised learning | PCA | • Orthogonal transformation | Smart grid [11]
Unsupervised learning | ICA | • Reveal hidden independent factors | Spectrum learning in cognitive radio [12]
Table 2. Unsupervised machine learning algorithms.

Applications: The family of MDP/POMDP models constitutes ideal tools for supporting decision making in 5G networks, where the users may be regarded as agents and the network constitutes the environment. There are usually three steps associated with modeling a problem using MDP. The first step is to specify the system’s state space and the decision maker’s action space, as well as verifying the Markov property. The second step is that of constructing the state transition probabilities P_a(s’|s, a) formulated as the probability of traversing from state s to s’ under action a. The last step is to quantify both the decision maker’s immediate reward U_a(s) and its long-term reward using Bellman’s equation [13]. Then, a carefully constructed iterative algorithm may be conceived to identify the optimal action in each state.

Classical applications found in the literature include the network selection/association problems of heterogeneous networks (HetNets), channel sensing, and user access in cognitive radio networks, and so on. Furthermore, energy harvesting (EH) has also been extensively modeled using MDP/POMDP, where the limited battery and the time-variant channels are usually regarded as the environment, while the users’ channel selection or battery utilization are usually considered as the actions. For instance, in [13] the transmission power control problems of EH systems were investigated using the POMDP model, where the state space was defined by including the battery state, the channel state, the packet transmission/reception states, and an action by the node, which corresponded to sending a packet at a certain power level. The feedback messages implicitly provided the EH system with partial channel state information (CSI), which resulted in the corresponding POMDP formulation. Since finding exact solutions to the POMDP tends to be computationally intractable [13], a pair of computationally efficient suboptimal solutions, i.e. the maximum-likelihood heuristic policy and the voting heuristic policy, were explored.

Q-Learning: Femto/Small Cells

Models: Q-learning may be invoked to find an optimal action policy for any given (finite) Markov decision process, especially when the system model is unknown, as shown in Fig. 3c. It is a model-free reinforcement learning technique and as such it can be used in conjunction with MDP models. In such a case, the Q-learning model is also comprised of an agent, of the states S and of a set of actions A per state. By executing an action in a specific state, the agent gleans a reward and the goal is to maximize its accumulated reward. Such a reward is illustrated by a Q-function, where “Q” is initialized to be an (arbitrary) fixed value. Then, “Q” is updated in an iterative manner after the agent carries out an action and observes the resultant reward as well as the associated new state at each time-instant [3 p. 517].

[Figure 3 panels, each showing a system/environment with states S1 to S5, actions, and rewards. a) Known P(s'|s,a); V(s) = max U(s) + P(s'|s,a)U(s'). b) True P(s'|s,a), partially observed O(s'|s,a); V(s) = max U(s) + O(s'|s,a)U(s'). c) Unknown P(s'|s,a); observe, learn, rewards; Q = old value + learned value.]
Figure 3. Illustration of reinforcement learning: a) Markov decision process; b) partially observed Markov decision process; c) Q-learning.

Applications: Q-learning has also been extensively applied in heterogeneous networks, usually in conjunction with the aforementioned MDP models. In [14] the authors presented a heterogeneous fully distributed multi-objective strategy based on a reinforcement learning model constructed for the self-configuration/optimization of femtocells. The model was supposed to solve both the resource allocation and interference coordination problems in the downlink of femtocell networks. The main objectives of the learning process are two-fold: first, to acquire spectrum allocation awareness and to identify the availability of unused spectral slots for the provision of opportunistic access; second, to select sub-channels from the available spectrum pool and to configure the terminals supported by femtocells to operate under carefully constructed restrictions to avoid interference and to meet the quality of service (QoS) requirements. Another example is constituted by dense small cell networks regarding their cell outage management and compensation [15].

Category | Learning techniques | Key characteristics | Application in 5G
Reinforcement learning | MDP/POMDP | • Bellman equation maximization • Value iteration algorithm | Energy harvesting [13]
Reinforcement learning | Q-learning | • Unknown system transition model • Q-function maximization | Femto and small cells [14, 15]
Reinforcement learning | Multi-armed bandit | • Exploration vs. exploitation • Multi-armed bandit game | Device-to-device networks [16]
Table 3. Reinforcement machine learning algorithms.
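
To make the contrast between a known and an unknown transition model more concrete, the following sketch runs value iteration (using the model) and tabular Q-learning (using sampled transitions only) on the same toy problem and compares the resulting greedy policies. The three-level battery, the "harvest"/"transmit" actions, the deterministic dynamics, the discount factor of 0.9 and the learning rate of 0.5 are all assumptions chosen for brevity and reproducibility; they are not taken from [13] to [15].

```python
import random

GAMMA = 0.9                        # assumed discount factor
STATES = [0, 1, 2]                 # hypothetical battery levels
ACTIONS = ["harvest", "transmit"]


def step(state, action):
    """Toy deterministic environment: return (next_state, reward)."""
    if action == "harvest":
        return min(state + 1, 2), 0.0   # battery charges by one unit, capped at 2
    if state > 0:
        return state - 1, 1.0           # a packet is sent, one energy unit is spent
    return 0, 0.0                       # empty battery: nothing is sent


def value_iteration(tol=1e-9):
    """Known model: repeat the Bellman optimality update until it converges."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    while True:
        delta = 0.0
        for s in STATES:
            for a in ACTIONS:
                s_next, reward = step(s, a)
                target = reward + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
                delta = max(delta, abs(target - q[(s, a)]))
                q[(s, a)] = target
        if delta < tol:
            return q


def q_learning(n_steps=50000, alpha=0.5, seed=0):
    """Unknown model: the learner only sees (state, action, reward, next state)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    s = 0
    for _ in range(n_steps):
        a = rng.choice(ACTIONS)                  # purely exploratory behaviour
        s_next, reward = step(s, a)              # observed environment response
        learned = reward + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
        # Blend of the old value and the newly learned value.
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * learned
        s = s_next
    return q


def greedy_policy(q):
    """Pick, for every state, the action with the largest Q value."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


q_model = value_iteration()
q_sampled = q_learning()
print(greedy_policy(q_model))
print(greedy_policy(q_sampled))
```

In this toy setting both approaches arrive at the same policy (harvest when the battery is empty, transmit otherwise). A stochastic environment would additionally call for a decaying learning rate and a more deliberate exploration schedule.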
The system’s state was constituted by the specific allocation of users to the resource blocks of small cells, as well as by the channel quality, while the actions were constituted by the downlink power control actions, with the rewards being quantified in terms of signal-to-interference-plus-noise ratio (SINR) improvement. It was demonstrated that the compensation strategy based on the reinforcement learning model attained an exceptional performance improvement.

Multi-Armed Bandits: Device-to-Device Networks

Models: In practice, multi-armed bandits (MAB) have been used to model resource allocation problems operating under a fixed budget by carefully proportioning resources among competing projects, whose properties are only partially known at the time of resource allocation, but which may become better understood as time passes. Since the agent has no initial knowledge about the machines, the crucial trade-off they confront at each instance is between the “exploitation” of the specific machine that has the highest expected payoff and the “exploration” required to glean more information about the expected payoffs of the other machines.

The MAB problem may also be extended into a multi-player, multi-armed bandit game (MP-MAB), where the reward gleaned by any player depends on the specific decisions of other players. The key idea of the proposed approach is to enable each user to forecast the future actions of its opponents based on public knowledge and to proceed by best responding to the predicted joint action profile using some bandit strategy [3 p. 517].

Applications: The MAB and MP-MAB models, as a family of emerging signal processing tools, are capable of solving challenging resource

vice (D2D) communication system integrated into a cellular network, and another one in the context of emerging next-generation networks [16]. The selfish D2D users aimed to optimize their own performance by camping on the vacant cellular channels, whose statistics were unknown to the users. This distributed channel selection problem was in harmony with the typical MP-MAB settings, and thus it was modeled as an MP-MAB game. Specifically, every D2D user was modeled as a player of the MP-MAB game, while the channels were regarded as arms and choosing a channel corresponds to pulling an arm. The authors proposed a channel selection strategy consisting of two main blocks, namely the calibrated forecasting and the no-regret bandit learning strategies. In Table 3, we summarize the rudimentary characteristics and applications of reinforcement machine learning algorithms.

Future Research and Conclusions

A range of future research ideas on machine learning in 5G networks can be summarized as follows.

The family of supervised learning techniques relies on known models and labels that can support the estimation of unknown parameters. They can be utilized for massive MIMO channel estimation and data detection, spectrum sensing and white space detection in cognitive radio, as well as for adaptive filtering in signal processing for 5G communications. They can also be applied in higher-layer applications, such as inferring the mobile users’ locations and behaviors, which can assist the network operators to improve the quality of their services.

Unsupervised learning relies on the input data itself in a heuristic manner. It can be utilized for cell clustering in cooperative ultra-dense small-cell networks, for access point association in ubiqui-
allocation problems in wireless scenarios, where tous WiFi networks, for heterogeneous base sta-
either the channel conditions or some other tion clustering in HetNets, and for load-balancing
wireless environment parameters have to be in HetNets. It can also be applied in anomaly/
“explored,” while the known channels also have fault/intrusion detection and for the users’ behav-
to be “exploited” by a group of users. Gener- ior-classification.
ally, these models may be beneficially used in Reinforcement learning relies on a dynamic
multi-player adaptive decision making problems, iterative learning and decision-making process.
where selfish players infer an optimal joint action It can be utilized for inferring the mobil users’
profile from their successive interactions with a decision making under unknown network condi-
dynamic environment, and finally settle at some tions, for example during channel access under
equilibrium point. This problem has indeed been unknown channel availability conditions in spec-
encountered in many wireless networking sce- trum sharing, for distributed resource allocation
narios, with a compelling one being the channel under unknown resource quality conditions in
selection problem of a distributed device-to-de- femto/small-cell networks, and base station asso-

IEEE Wireless Communications • Accepted for Publication 7


This article has been accepted for inclusion in a future issue of this magazine. Content is final as presented, with the exception of pagination.
[2] M. J. Er and Y. Zhou, “Theory and Novel Applications of
The classes of super- ciation under the unknown energy status of the
Machine Learning,” InTech, 2009.
base stations in energy harvesting networks. [3] E. Alpaydm, Introduction to Machine Learning, 3rd ed., The
vised, unsupervised, Furthermore, computational intelligence para- MIT Press, Cambridge, Massachusetts, 2014.
and reinforcement digms, such as neural networks and neuro-fuzzy [4] P. Zhou, Y. Chang, and J. A. Copeland, “Determination of
methods, swarm intelligence algorithms such Wireless Networks Parameters through Parallel Hierarchical
Support Vector Machines,” IEEE Trans. Parallel Distrib. Syst.,
learning tools were as ant colony optimization, and evolutionary vol. 23, no. 3, Mar. 2012, pp. 505–12.
algorithms such as the competitive imperialist [5] B. K. Donohoo et al., “Context-Aware Energy Enhancements
investigated, along with algorithm, may also be applied to improve the for Smart Mobile Devices,” IEEE Trans. Mobile Comput., vol.
13, no. 8, Aug. 2014, pp. 1720–32.
the corresponding mod- performance of 5G networks. Among those com-
[6] C.-K. Wen et al., “Channel Estimation for Massive MIMO
pelling techniques, neural networks and deep Using Gaussian-Mixture Bayesian Learning,” IEEE Trans.
eling methodology and learning have recently become particularly pop- Wireless Commun., vol. 14, no. 3, Mar. 2015, pp. 1356–68.
possible future applica- ular. Generally, a neural network consists of a [7] K. W. Choi and E. Hossain, “Estimation of Primary User
number of neurons and weighted connections Parameters in Cognitive Radio Systems via Hidden Markov
Model,” IEEE Trans. Signal Process., vol. 61, no. 3, Feb. 2013,
tions in 5G networks. among them, where the neurons can be regarded pp. 782–95.
as variables and the weights can be viewed as [8] A. Assra, J. Yang, and B. Champagne, “An EM Approach
In a nutshell, machine parameters. The network should be appropriately for Cooperative Spectrum Sensing in Multi-Antenna CR
Networks,” to appear in IEEE Trans. Veh. Technol.; DOI:
learning is an exiting configured with the aid of learning techniques to
10.1109/TVT.2015.2408369, 2015.
ensure that the application of a set of inputs pro- [9] C.-K. Yu, K.-C. Chen, and S.-M. Cheng, “Cognitive Radio Net-
area for artificial intelli- duces the desired set of outputs. Explicitly, this can work Tomography,” IEEE Trans. Veh. Technol., vol. 59, no. 4,
gence aided networking be achieved by iteratively adjusting the weights of May 2010, pp. 1980–97.
the existing connections among all neuron pairs [10] M. Xia et al., “Optical and Wireless Hybrid Access Net-
works: Design and Optimization,” OSA/IEEE J. Opt. Com-
research! with the aid of learning based on the labeled data mun. Netw., vol. 4, no. 10, Oct. 2012, pp. 749–59.
for supervised learning or unlabeled data for unsu- [11] R. C. Qiu et al., “Cognitive Radio Network for the Smart
pervised learning. Neural networks have been Grid: Experimental System Architecture, Control Algorithms,
widely utilized for spectral white state estimation Security, and Microgrid Testbed,” IEEE Trans. Smart Grid, vol.
2, no. 4, Dec. 2011, pp. 724–40.
[17], prediction [18], and handoff decisions [19] [12] H. Nguyen et al., “Binary Inference for Primary User Sep-
in cognitive radio networks. Note that the algo- aration in Cognitive Radio Networks,” IEEE Trans. Wireless
rithms introduced in this article are only limited Commun., vol. 12, no. 4, Apr. 2013, pp. 1532–42.
samples of the machine learning field. There are [13] A. Aprem, C. R. Murthy, and N. B. Mehta, “Transmit Power
Control Policies for Energy Harvesting Sensors with Retrans-
many other algorithms that can also be applied missions,” IEEE J. Sel. Topics Signal Process., vol. 7, no. 5,
to the next-generation networks. For example, the Oct. 2013, pp. 895–906.
family of evolutionary algorithms, such as genetic [14] G. Alnwaimi, S. Vahid, and K. Moessner, “Dynamic Hetero-
algorithms can solve optimization problems by geneous Learning Games for Opportunistic Access in LTE-
Based Macro/Femtocell Deployments,” IEEE Trans. Wireless
mimicking a natural selection process, which can Commun., vol. 14, no. 4, Apr. 2015, pp. 2294–2308.
be utilized to solve resource allocation problems [15] O. Onireti et al., “A Cell Outage Management Framework
in HetNets [20]. By contrast, machine learning for Dense Heterogeneous Networks,” IEEE Trans. Veh. Tech-
relies on two phases, the training phase and the nol., vol. 65, no. 4, 2016, pp. 2097–2113; DOI: 10.1109/
TVT.2015.2431371.
testing phase, where the training phase imposes [16] S. Maghsudi and S. Stanczak, “Channel Selection for Net-
a much higher complexity than the testing phase. work-Assisted D2D Communication via No-Regret Bandit
Due to the energy constraints and computational Learning with Calibrated Forecasting,” IEEE Trans. Wireless
complexity constraints of mobil terminals, it is rec- Commun., vol. 14, no. 3, Mar. 2015, pp. 1309–22.
[17] K. Tsagkaris, A. Katidiotis, and P. Demestichas, “Neural
ommended to only implement the testing phase Network-Based Learning Schemes for Cognitive Radio Sys-
on shirt-pocket-sized mobile terminals. tems,” Computer Commun., vol. 31, no. 14, Sep. 2008, pp.
This article reviewed the benefits of artificial 3394–3404.
intelligence aided wireless systems equipped with [18] V. K. Tumuluru, P. Wang, and D. Niyato, “A Neural Net-
work Based Spectrum Prediction Scheme for Cognitive
machine learning. We introduced the major fami- Radio,” Proc. IEEE ICC, May 2010.
lies of machine learning algorithms and discussed [19] L. Giupponi and A. I. Perez-Neira, “Fuzzy-Based Spectrum
their applications in the context of next-generation Handoff in Cognitive Radio Networks,” Proc. CrownCom,
networks, including massive MIMOs, the smart May 2008.
[20] N. Sharma and A. S. Madhukumar, “Genetic Algorithm
grid, cognitive radios, heterogeneous networks, Aided Proportional Fair Resource Allocation in Multicast
femto/small cells, D2D networks, and so on. The OFDM Systems,” IEEE Trans. Broadcast., vol. 61, no. 1, Mar.
classes of supervised, unsupervised, and reinforce- 2015.
ment learning tools were investigated, along with
the corresponding modeling methodology and Biographies
possible future applications in 5G networks. In a Chunxiao Jiang [S’09-M’13-SM’15] received the B.S. in infor-
mation engineering from Beihang University in June 2008, and
nutshell, machine learning is an exiting area for the Ph.D. in electronic engineering from Tsinghua University
artificial intelligence aided networking research! in January 2013, both with the highest honors. From February
2013 to June 2016, Dr. Jiang was a postdoc in the Department
Acknowledgment of Electronic Engineering, Tsinghua University, during which he
visited the University of Maryland College Park and the Univer-
This research was supported by NSFC China under sity of Southampton. He is a recipient of the Best Paper Award
projects 61371079, 61471025, and 91338203, by from IEEE Globecom 2013 and the Best Student Paper Award
the Open Research Fund of the National Mobile from IEEE GlobalSIP 2015. Dr. Jiang became a Senior Member
Communications Research Laboratory, Southeast of IEEE in 2015. Currently, he is a research-track faculty in the
Tsinghua Space Center.
University (No. 2016D07), and also by a Postdoc-
toral Science Foundation funded project. H aijun Z hang (M’13) is a full professor at the University of
Science and Technology Beijing, China. From 2014 to 2016,
References he was a postdoctoral research fellow in the Department of
[1] M. van der Schaar and F. Fu, “Spectrum Access Games Electrical and Computer Engineering, the University of British
and Strategic Learning in Cognitive Radio Networks for Columbia (UBC), Vancouver, Canada. He received his Ph.D.
Delay-Critical Applications,” Proc. IEEE, vol. 97, no. 4, Apr. degree from Beijing University of Posts Telecommunications
2009, pp. 720–40. (BUPT). From September 2011 to September 2012, he visit-

8 IEEE Wireless Communications • Accepted for Publication


This article has been accepted for inclusion in a future issue of this magazine. Content is final as presented, with the exception of pagination.
ed the Centre for Telecommunications Research, King’s Col- Signal Processing in 2015, and several best paper awards at IEEE
lege London, London, UK, as a visiting research associate. Dr. conferences. Currently, he is an IEEE Communications Society
Zhang has published more than 70 papers and authored two Distinguished Lecturer.
books. He serves as an editor of the Journal of Network and
Computer Applications, Wireless Networks, Telecommunication K wang -C heng C hen [M’89-SM’94-F’07] is a professor with
Systems, and KSII Transactions on Internet and Information the Department of Electrical Engineering, University of South
Systems. He is serving or has served as the leading guest edi- Florida, after an academic career in Taiwan and an industrial
tor for IEEE Communications Magazine, IEEE Transactions on career in the U.S. He has contributed essential technology to
Emerging Topics in Computing, and ACM/Springer Mobile Net- various IEEE 802, Bluetooth, and LTE and LTE-A wireless stan-
works & Applications. He is serving or has served as general dards. In addition to service with IEEE journals and conferences,
co-chair of the 6th International Conference on Game Theory he founded and then chairs the TC on Social Networks of the
for Networks (GameNets’16) and 5GWN’17, the symposium IEEE Communications Society. Dr. Chen is an IEEE Fellow and
chair of the GameNets’14, track chair of the 15th IEEE Inter- has received a number of awards, such as the 2011 IEEE Com-
national Conference on Scalable Computing and Communi- Soc WTC Recognition Award, the 2014 IEEE Jack Neubauer
cations (ScalCom2015), and co-chair of the Workshop on 5G Memorial Award, and the 2014 IEEE ComSoc AP Outstanding
Ultra Dense Networks in ICC 2017. Paper Award. His recent research interests include wireless net-
works, social networks and network science, cybersecurity, and
Yong Ren [SM’16] received his B.S., M.S., and Ph.D. degrees data analytics.
in electronic engineering from Harbin Institute of Technology,
China, in 1984, 1987, and 1994, respectively. He worked as L ajos H anzo (http://www-mobile.ecs.soton.ac.uk) FREng,
a post doctor in the Department of Electronics Engineering, FIEEE, FIET, Fellow of EURASIP, DSc, received his degree in
Tsinghua University, China from 1995 to 1997. He is currently a electronics in 1976 and his doctorate in 1983. In 2009 he was
professor in the Department of Electronics Engineering and the awarded an honorary doctorate by the Technical University of
director of the Complexity Engineered Systems Lab (CESL) at Budapest, and in 2015 by the University of Edinburgh. During
Tsinghua University. He holds 12 patents, and has authored or his 38-year career in telecommunications he has held various
co-authored more than 100 technical papers on the behavior of research and academic posts in Hungary, Germany, and the
computer network, P2P networks, and cognitive networks. His UK. Since 1986 he has been with the School of Electronics
current research interests include complex systems theory and and Computer Science, University of Southampton, UK, where
its applications to the optimization and information sharing of he holds the chair in telecommunications. He has successfully
the Internet, Internet of Things and ubiquitous networks, cogni- supervised approximately 100 Ph.D. students, co-authored 20
tive networks, and cyber-physical systems. John Wiley/IEEE Press books on mobile radio communications
totalling in excess of 10,000 pages, published 1500+ research
Zhu Han [S’01-M’04-SM’09-F’14] received the B.S. degree in entries in IEEE Xplore, acted both as TPC and general chair of
electronic engineering from Tsinghua University in 1997, and IEEE conferences, presented keynote lectures, and has been
the M.S. and Ph.D. degrees in electrical and computer engi- awarded a number of distinctions. Currently he is directing a
neering from the University of Maryland, College Park, in 1999 60-member academic research team, working on a range of
and 2003, respectively. From 2000 to 2002 he was an R&D research projects in the field of wireless multimedia commu-
engineer at JDSU, Germantown, Maryland. From 2003 to 2006 nications sponsored by industry, the Engineering and Physical
he was a research associate at the University of Maryland. From Sciences Research Council (EPSRC) UK, the European Research
2006 to 2008 he was an assistant professor at Boise State Uni- Council’s Advanced Fellow Grant, and the Royal Society’s
versity, Idaho. Currently, he is a professor in the Electrical and Wolfson Research Merit Award. He is an enthusiastic support-
Computer Engineering Department, as well as in the Comput- er of industrial and academic liaison and he offers a range of
er Science Department at the University of Houston, Texas. industrial courses. He is also a governor of the IEEE VTS. From
His research interests include wireless resource allocation and 2008 to 2012 he was the editor-in-chief of the IEEE Press and a
management, wireless communications and networking, game chaired professor at Tsinghua University, Beijing. His research
theory, wireless multimedia, security, and smart grid commu- is funded by the European Research Council’s Senior Research
nication. He received an NSF Career Award in 2010, the Fred Fellow Grant. For further information on research in progress
W. Ellersick Prize of the IEEE Communication Society in 2011, and associated publications, please refer to http://www-mobile.
the EURASIP Best Paper Award for the Journal on Advances in ecs.soton.ac.uk Lajos has 24 000+ citations.
