Supervised Learning in Wireless Communications

Regression Models, KNN and SVM: MIMO Channel and Energy Learning

Models: Regression analysis relies on a statistical process for estimating the relationships among variables. The goal of regression analysis is to predict the value of one or more continuous-valued estimation targets, given the value of a D-dimensional vector x of input variables. The estimation target is a function of the independent variables. In linear regression, the regression function is linear, while in logistic regression it is a logistic function assuming a common sigmoid curve. The KNN and SVM algorithms are mainly utilized for the classification of points/objects. In KNN, an object is classified into a specific category by a majority vote of the object's neighbors, with the object being assigned to the class that is most common among its k nearest neighbors. The output may be constituted by a specific property of the object, such as, for example, the average of the values of its k nearest neighbors. By contrast, the SVM algorithm relies on a nonlinear mapping, which transforms the original training data into a higher dimension where it becomes separable, and then it searches for the optimal linear separating hyperplane that is capable of separating one class from another, again in this higher dimension. They correspond to non-linear classification methods relying on the family of kernel methods. It was shown that with the aid of an appropriate nonlinear mapping to a sufficiently high dimension, the data from two classes can always be separated by a hyperplane [3, pp. 21, 185, 239, 349].

Applications: These models can be used for estimating or predicting radio parameters that are associated with specific users. For example, in massive MIMO systems associated with hundreds of antennas, both detection and channel estimation lead to high-dimensional search problems, which can be addressed by the above-mentioned learning models. In order to generalize the SVM function for employment in data classification problems, its hierarchical version, referred to as H-SVM, was proposed in [4], where each hierarchical level consisted of a finite number of SVM classifiers. This regime was used for the estimation of the Gaussian channel's noise level in a MIMO-aided wireless network having t transmit antennas and r receive antennas. By exploiting the training data, the H-SVM model was trained for the estimation of the channel noise statistics.

In heterogeneous networks constituted by diverse cells, handovers may be frequent, where both the KNN and SVM can be applied to finding the optimal handover solutions. At the application layer, these models can also be used for learning the mobile terminal's specific usage pattern in diverse spatio-temporal and device contexts, as discussed in [5]. This may then be exploited for prediction of the configuration to be used in the location-specific interface. Given a set of contex-
[Figure: Machine learning in 5G.]
• Regression models, KNN, SVM. Apps in 5G: massive MIMO channel estimation/detection; user location/behavior learning/classification.
• Bayesian learning. Apps in 5G: massive MIMO channel estimation; spectrum sensing/detection and learning in CR.
• K-means clustering. Apps in 5G: small cell clustering; WiFi association; device-to-device user clustering; HetNet clustering.
• PCA and ICA. Apps in 5G: spectrum sensing; anomaly/fault/intrusion detection; signal dimension reduction; smart grid user classification.
• MDP, POMDP, Q-learning, multi-armed bandit. Apps in 5G: decision making under unknown network conditions; resource competition in femto/small cell channel selection and spectrum sharing for device-to-device networks; energy modeling in energy harvesting; HetNet selection/association.
Technologies: massive MIMO, femto/small cells and heterogeneous networks (HetNets), cloud radio access networks, cognitive radio, full duplex, energy harvesting, etc.
Machine learning applications: channel estimation/detection, spectrum sensing/access, cell/user clustering, switch and handover among HetNets, signal dimension reduction, energy modeling, user behavior analysis, location prediction, intrusion/fault/anomaly detection, cell/channel selection/association.
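To make the KNN majority-vote rule described above concrete, the following is a minimal, self-contained Python sketch; the toy samples, cluster positions, and the helper's name are our own illustrative choices rather than part of any cited scheme:

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by a majority vote of its k nearest
    labelled neighbours, using the Euclidean distance metric."""
    # Sort the labelled points by distance to the query point.
    neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy data: two clusters standing in for two user classes.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B")]
print(knn_classify(train, (0.15, 0.1)))   # near the first cluster -> "A"
print(knn_classify(train, (0.95, 1.0)))   # near the second cluster -> "B"
```

In a radio setting the feature vectors might instead collect measured user parameters, but the vote itself is unchanged.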
[Table (excerpt): supervised machine learning algorithms.]
Category | Learning techniques | Key characteristics | Application in 5G
Supervised learning | K-nearest neighbor | Majority vote of neighbors | Energy learning [5]
Supervised learning | Support vector machines | Non-linear mapping to high dimension; separate hyperplane classification | MIMO channel learning [4]
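The SVM's reliance on a nonlinear mapping to a higher dimension, noted above, can be made tangible with a small Python sketch: a 1-D data set that no single threshold can split becomes linearly separable after the mapping x -> (x, x^2). Both the mapping and the hand-picked hyperplane are purely illustrative assumptions; a real SVM would learn the maximum-margin hyperplane from the data:

```python
def feature_map(x):
    """Nonlinear mapping of a 1-D sample into 2-D: x -> (x, x**2)."""
    return (x, x * x)

def separable_by(w, b, points, labels):
    """Check whether the hyperplane w.z + b = 0 puts every mapped
    point on the side indicated by its +/-1 label."""
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))
    return all(lab * (dot(w, feature_map(x)) + b) > 0
               for x, lab in zip(points, labels))

# Class +1 sits at the extremes, class -1 in the middle:
# no single threshold on the real line separates them ...
xs     = [-1.5, -1.0, 0.0, 0.2, 1.0, 1.5]
labels = [+1,   +1,   -1,  -1,  +1,  +1]
# ... but after mapping to (x, x^2), the hyperplane z2 = 0.5,
# i.e. w = (0, 1), b = -0.5, separates the two classes.
print(separable_by((0.0, 1.0), -0.5, xs, labels))  # True
```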
gravity, that is, the mean value of the points within the cluster. The clustering algorithm proceeds in an iterative manner, where an object is assigned to the specific cluster whose centroid is nearest to the object based on the Euclidean distance 'similarity metric', and then the in-cluster differences are minimized by iteratively updating the cluster centroid, until 'convergence' is achieved. Explicitly, convergence is deemed to be achieved when the assignment becomes stable, that is, the clusters formed in the current round are the same as those formed in the previous round [3, pp. 161, 317].

Applications: Clustering is a common problem in 5G networks, especially in heterogeneous scenarios associated with diverse cell sizes as well as WiFi and D2D networks. For example, the small cells have to be carefully clustered to avoid interference using coordinated multi-point transmission (CoMP), while the mobile users are clustered to obey an optimal offloading policy, the devices are clustered in D2D networks to achieve high energy efficiency, the WiFi users are clustered to maintain an optimal access point association, and so on. In [10], the authors considered a hybrid optical/wireless network scenario, in order to reduce the overall wireless tele-traffic by encouraging the utilization of the high-capacity optical infrastructure. A mixed integer programming (MIP) problem was formulated to jointly optimize both the gateway partitioning and the virtual-channel allocation based on classic k-means clustering, which was employed to partition the mesh access points (MAPs) into several groups. The proposed scheme commenced its operation from an initial gateway access point (GAP) set, which can be plucked by a random selection from the set of MAPs, or can be more astutely determined using a meritorious initialization criterion. Next, each MAP is assigned to its nearest GAP. If several eligible GAPs are in the vicinity, then the specific GAP that has a readily available virtual channel is chosen. Finally, by using the classic k-means clustering algorithm, the MAPs are divided into k groups associated with the closest GAPs.

Principal and Independent Component Analysis: Smart Grid and Cognitive Radio

Models: Principal component analysis (PCA) transforms a set of potentially correlated variables into a set of uncorrelated variables, referred to as the principal components, where the number of principal components is less than or equal to the number of original variables. Basically, the first principal component has the largest possible variance (i.e., accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components. The principal components are orthogonal, because they are the eigenvectors of the covariance matrix, which is symmetric. By contrast, independent component analysis (ICA) is a statistical technique conceived to reveal hidden factors that underlie sets of random variables, measurements, or signals. In the model, the data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed to be non-Gaussian and mutually independent, and they are referred to as the independent components of the observed data, which can be found by ICA [3, p. 115].

Applications: Both the PCA and ICA constitute powerful statistical signal processing techniques devised to recover statistically independent source signals from their linear mixtures. One of their major applications may be found in the area of anomaly-detection, fault-detection, and intrusion-detection problems of wireless networks, which rely on traffic monitoring. Furthermore, similar problems may also be solved in sensor networks, mesh networks, and so on. They can also be invoked for the physical layer signal dimension reduction of massive MIMO systems or to classify the primary users' behaviors in cognitive radio networks. As a further example, in [11] PCA and ICA were applied in a smart grid scenario to recover the simultaneous wireless transmissions of smart utility meters installed in each home. At the power utility station, it was required to separate the signals received from all the smart meters before the signals can be decoded. The statistical properties of the signals were exploited to blindly separate them using ICA. This operation is capable of enhancing both the transmission efficiency, by avoiding channel estimation in each frame, as well as the data security, by eliminating any wideband interference or jamming signals. More explicitly, a substantial security enhancement was achieved by a robust version of the PCA-based method, which exploited the sparse, low-rank nature of the auto-covariance matrices of the smart metering signal and of the wideband interferer, respectively, in order to confidently separate them prior to ICA processing. Another pertinent example is found in cognitive radio scenarios, where the so-called Boolean ICA relied on the Boolean mixing of OR, XOR, and other functions of binary signals [12]. It was also incorporated into the PU separation problem often encountered in cognitive radio networks for the sake of distinguishing and characterizing the activities of PUs in the context of collaborative spectrum sensing. Furthermore, the observations of the secondary users (SUs) were modeled as Boolean OR mixtures of the underlying binary PU sources. An iterative algorithm, called Binary ICA, was developed to determine the activities of the underlying latent signal sources, such as the PUs. It was demonstrated that given m monitors or SUs, the activities of up to (2^m – 1) distinct PUs can be inferred. In Table 2, we summarize the basic characteristics and applications of unsupervised machine learning algorithms.

Reinforcement Learning in Wireless Communications

Partially Observable Markov Decision Process: Energy Harvesting

Models: Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in specific situations, where the outcomes are partly random and partly under the control of a decision maker, as illustrated in Fig. 3a. At each time step, the process is in some state s, and the decision maker may opt for any
of the legitimate actions a that is available in state s. The process responds at the next time step by randomly moving into a new state s', and giving the decision maker a corresponding reward Ua(s). The probability that the process moves into its new state s' is influenced both by the specific action chosen, as well as by the system's inherent transitions, formally described by the state transition probability Pa(s'|s, a). Given s and a, the state transition probability is conditionally independent of all previous states and actions; that is, the state transitions of an MDP satisfy the fundamental Markov property. By contrast, a partially observable Markov decision process (POMDP) may be viewed as the generalization of an MDP, where the agent is unable to directly observe the underlying state transitions and hence only has partial knowledge, as shown in Fig. 3b. The agent has to keep track of both the probability distribution of the legitimate states, based on a set of observations, as well as of the observation probabilities and of the underlying MDP [3, p. 517].

Applications: The family of MDP/POMDP models constitutes ideal tools for supporting decision making in 5G networks, where the users may be regarded as agents and the network constitutes the environment. There are usually three steps associated with modeling a problem using MDP. The first step is to specify the system's state space and the decision maker's action space, as well as verifying the Markov property. The second step is that of constructing the state transition probabilities Pa(s'|s, a), formulated as the probability of traversing from state s to s' under action a. The last step is to quantify both the decision maker's immediate reward Ua(s) and its long-term reward using Bellman's equation [13]. Then, a carefully constructed iterative algorithm may be conceived to identify the optimal action in each state.

Classical applications found in the literature include the network selection/association problems of heterogeneous networks (HetNets), channel sensing, and user access in cognitive radio networks, and so on. Furthermore, energy harvesting (EH) has also been extensively modeled using MDP/POMDP, where the limited battery and the time-variant channels are usually regarded as the environment, while the users' channel selection or battery utilization are usually considered as the actions. For instance, in [13] the transmission power control problems of EH systems were investigated using the POMDP model, where the state space was defined by including the battery state, the channel state, the packet transmission/reception states, and an action by the node, which corresponded to sending a packet at a certain power level. The feedback messages implicitly provided the EH system with partial channel state information (CSI), which resulted in the corresponding POMDP formulation. Since finding exact solutions to the POMDP tends to be computationally intractable [13], a pair of computationally efficient suboptimal solutions, i.e., the maximum-likelihood heuristic policy and the voting heuristic policy, were explored.

Table 2. Unsupervised machine learning algorithms.
Category | Learning techniques | Key characteristics | Application in 5G
Unsupervised learning | K-means clustering | K-partition clustering; iterative updating algorithm | Heterogeneous networks [10]
Unsupervised learning | PCA | Orthogonal transformation | Smart grid [11]
Unsupervised learning | ICA | Reveal hidden independent factors | Spectrum learning in cognitive radio [12]

Q-Learning: Femto/Small Cells

Models: Q-learning may be invoked to find an optimal action policy for any given (finite) Markov decision process, especially when the system model is unknown, as shown in Fig. 3c. It is a model-free reinforcement learning technique, and as such it can be used in conjunction with MDP models. In such a case, the Q-learning model is also comprised of an agent, of the states S, and of a set of actions A per state. By executing an action in a specific state, the agent gleans a reward, and the goal is to maximize its accumulated reward. Such a reward is illustrated by a Q-function, where "Q" is initialized to be an (arbitrary) fixed value. Then, "Q" is updated in an iterative manner after the agent carries out an action and observes the resultant reward as well as the associated new state at each time instant [3, p. 517].

Applications: Q-learning has also been extensively applied in heterogeneous networks, usually in conjunction with the aforementioned MDP models. In [14] the authors presented a heterogeneous fully distributed multi-objective strategy based on a reinforcement learning model con-
Figure 3. Illustration of reinforcement learning: a) Markov decision process; b) partially observed Markov decision process; c) Q-learning.
structed for the self-configuration/optimization of femtocells. The model was supposed to solve both the resource allocation and interference coordination problems in the downlink of femtocell networks. The main objectives of the learning process are two-fold: first, to acquire spectrum allocation awareness and to identify the availability of unused spectral slots for the provision of opportunistic access; second, to select sub-channels from the available spectrum pool and to configure the terminals supported by femtocells to operate under carefully constructed restrictions to avoid interference and to meet the quality of service (QoS) requirements. Another example is constituted by dense small cell networks regarding their cell outage management and compensation [15].

Table 3. Reinforcement machine learning algorithms.
Category | Learning techniques | Key characteristics | Application in 5G
Reinforcement learning | MDP/POMDP | Bellman equation maximization; value iteration algorithm | Energy harvesting [13]
Reinforcement learning | Q-learning | Unknown system transition model; Q-function maximization | Femto and small cells [14, 15]
Reinforcement learning | Multi-armed bandit | Exploration vs. exploitation; multi-armed bandit game | Device-to-device networks [16]
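The iterative Q-function update that underlies such schemes can be sketched in a few lines of Python on a toy chain MDP; the environment, reward, and parameter values below are invented purely for illustration and bear no relation to the femtocell models of [14, 15]:

```python
import random

def q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a toy chain MDP: states 0..n-1, actions
    0 = left / 1 = right, reward 1.0 only upon reaching the last state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy: mostly exploit the best known action,
            # occasionally explore a random one.
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: Q[s][x])
            s2 = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Update Q towards the observed reward plus the discounted
            # value of the best action in the next state.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# The learned greedy policy moves right in every non-terminal state.
print([max((0, 1), key=lambda a: Q[s][a]) for s in range(4)])
```

With the assumed parameters above, the greedy policy extracted from Q moves toward the rewarding terminal state everywhere, even though the agent was never given the transition model.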
The system's state was constituted by the specific allocation of users to the resource blocks of small cells, as well as by the channel quality, while the actions were constituted by the downlink power control actions, with the rewards being quantified in terms of signal-to-interference-plus-noise ratio (SINR) improvement. It was demonstrated that the compensation strategy based on the reinforcement learning model attained an exceptional performance improvement.

Multi-Armed Bandits: Device-to-Device Networks

Models: In practice, multi-armed bandits (MAB) have been used to model resource allocation problems operating under a fixed budget by carefully proportioning resources among competing projects, whose properties are only partially known at the time of resource allocation, but which may become better understood as time passes. Since the agent has no initial knowledge about the machines, the crucial trade-off it confronts at each instance is between the "exploitation" of the specific machine that has the highest expected payoff and the "exploration" required to glean more information about the expected payoffs of the other machines.

The MAB problem may also be extended into a multi-player, multi-armed bandit game (MP-MAB), where the reward gleaned by any player depends on the specific decisions of the other players. The key idea of the proposed approach is to enable each user to forecast the future actions of its opponents based on public knowledge, and to proceed by best responding to the predicted joint action profile using some bandit strategy [3, p. 517].

Applications: The MAB and MP-MAB models, as a family of emerging signal processing tools, are capable of solving challenging resource allocation problems in wireless scenarios, where either the channel conditions or some other wireless environment parameters have to be "explored," while the known channels also have to be "exploited" by a group of users. Generally, these models may be beneficially used in multi-player adaptive decision-making problems, where selfish players infer an optimal joint action profile from their successive interactions with a dynamic environment, and finally settle at some equilibrium point. This problem has indeed been encountered in many wireless networking scenarios, with a compelling one being the channel selection problem of a distributed device-to-device (D2D) communication system integrated into a cellular network, and another one in the context of emerging next-generation networks [16]. The selfish D2D users aimed to optimize their own performance by camping on the vacant cellular channels, whose statistics were unknown to the users. This distributed channel selection problem was in harmony with the typical MP-MAB settings, and thus it was modeled as an MP-MAB game. Specifically, every D2D user was modeled as a player of the MP-MAB game, while the channels were regarded as arms, and choosing a channel corresponds to pulling an arm. The authors proposed a channel selection strategy consisting of two main blocks, namely the calibrated forecasting and the no-regret bandit learning strategies. In Table 3, we summarize the rudimentary characteristics and applications of reinforcement machine learning algorithms.

Future Research and Conclusions

A range of future research ideas on machine learning in 5G networks can be summarized as follows.

The family of supervised learning techniques relies on known models and labels that can support the estimation of unknown parameters. They can be utilized for massive MIMO channel estimation and data detection, spectrum sensing and white space detection in cognitive radio, as well as for adaptive filtering in signal processing for 5G communications. They can also be applied in higher-layer applications, such as inferring the mobile users' locations and behaviors, which can assist the network operators in improving the quality of their services.

Unsupervised learning relies on the input data itself in a heuristic manner. It can be utilized for cell clustering in cooperative ultra-dense small-cell networks, for access point association in ubiquitous WiFi networks, for heterogeneous base station clustering in HetNets, and for load-balancing in HetNets. It can also be applied in anomaly/fault/intrusion detection and for the users' behavior classification.

Reinforcement learning relies on a dynamic iterative learning and decision-making process. It can be utilized for inferring the mobile users' decision making under unknown network conditions, for example during channel access under unknown channel availability conditions in spectrum sharing, for distributed resource allocation under unknown resource quality conditions in femto/small-cell networks, and base station asso-