
Artificial Intelligence Enabled Self-healing for Mobile Network Automation

Muhammad Zeeshan Asghar (1,2), Furqan Ahmed (3), Jyri Hämäläinen (2)
(1) Faculty of Information Technology, University of Jyväskylä, Finland
(2) Department of Communication and Networking, Aalto University, Finland
(3) Elisa Corporation, Finland

2021 IEEE Globecom Workshops (GC Wkshps) | 978-1-6654-2390-8/21/$31.00 ©2021 IEEE | DOI: 10.1109/GCWkshps52748.2021.9681937

Abstract: This paper presents an artificial intelligence enabled self-healing framework for cell outage detection and compensation in radio access networks. The developed framework consists of three modules, namely cell outage detection, cell outage compensation, and continuous optimization, which work in a closed loop to detect outages, trigger recovery actions, and optimize the network to minimize the impact of outages on user experience. The outage detection module is based on machine learning algorithms that detect anomalies in the network performance data. Likewise, the cell outage compensation module uses fuzzy logic to determine compensation actions after an outage cell has been detected. The continuous optimization module is tasked with making incremental improvements to the network configuration through a heuristic approach. The developed self-healing framework is validated using a test environment based on the ns-3 network simulator. Results show that the framework is capable of fully recovering from the outage in terms of accessibility and coverage. In addition, the cell edge reference signal received power is recovered by 45%, thereby significantly improving the network performance once the outage is detected.

Index Terms: Self-organizing networks, Artificial Intelligence, Self-healing, Network automation

I. INTRODUCTION

The growing complexity of mobile networks is the result of a multitude of factors including increased network density, a higher degree of heterogeneity, and a wide range of mobility scenarios. This trend is likely to continue in the future, especially for 5G and beyond, leading to new challenges in configuring, operating, and optimizing mobile networks in a dynamic manner [1]. In order to meet these challenges, a high degree of network intelligence coupled with fine-grained automation capabilities is needed. Due to recent advances in the field of artificial intelligence (AI), in conjunction with the availability of big-data processing tools, the concept of AI enabled self-organizing networks is emerging as a viable approach for mobile network automation [2]–[5]. In the context of mobile networks, AI based methods are often used for anomaly detection [6]–[9]. The next step is to embed intelligence in the control loop, facilitating both detection and corrective actions. Thus, it is particularly relevant to self-healing use-cases where the aim is automated detection of network issues and anomalies followed by appropriate remedial or healing actions. During the detection phase, it is important to identify the outage characteristics precisely. Key Performance Indicators (KPIs) are extensively used to monitor, analyze, and evaluate the network performance. The detection decisions regarding performance degradation are usually made on the basis of predefined thresholds. However, such approaches are inadequate in the case of highly dynamic networks, as thresholds may need to be modified on-the-fly, on the basis of updated information. A cell may be identified as abnormal if the throughput is not in the acceptable range [10], but the detection of cell outage is not always straightforward as no alarm is triggered for sleeping cells. One approach to the design of self-healing systems is to have a set of predefined solutions, each corresponding to a specific outage scenario. In contrast, AI/ML based systems are capable of learning not only about the environment, but also the impact of different recovery actions. Thus, it is possible to introduce some new solutions to heal the network while minimizing error over time. In the case of outage problems, the compensation actions mostly consist of adjustment of the cell's reference signal power and antenna tilt. For instance, the authors of [11] used control parameters including antenna tilt and reference signal power to compensate for the outage in an autonomous manner. Predefined thresholds for signal to interference plus noise ratio (SINR) and throughput are used to define the detection logic for network failure.
This work is supported by Business Finland 2017-2019.

Authorized licensed use limited to: Parul Institute of Engineering and Technology. Downloaded on February 07,2024 at 07:06:21 UTC from IEEE Xplore. Restrictions apply.
In previous studies, notably [8] and [12], control parameters such as remote electrical antenna tilt and pilot power have been used to recover the cells contributing to network performance degradation. However, these studies focus only on identifying faulty cells individually and updating them only during recovery actions. This leads to sub-optimal settings of the network. In this work, we develop an AI based self-healing framework that detects cell outages and enables quick healing via a coordinated response of a set of cells in a closed-loop manner. For instance, when a cell outage is detected, the downlink transmission power of relevant neighboring cells is increased during the recovery step.

The rest of the paper is organized as follows. In Section II-A, we present the overall framework, followed by a discussion of the individual modules, including outage detection, outage compensation, and continuous optimization. Section III focuses on the details of the system level simulation setup and results. We conclude in Section IV.

II. FRAMEWORK

A. Overview

As shown in Fig. 1, the framework comprises four main components, namely cell outage detection, cell outage compensation, continuous optimization, and configuration manager. The cell outage detection module makes use of minimization of drive test (MDT) reports. This module uses an anomaly detection algorithm to monitor the cells for abnormal behavior in the network and detects the defective cell, displaying an alarm in the alarm graphical user interface (GUI). A cell outage results in a coverage gap and throughput loss, which may lead to a situation where quality of service requirements are not met. The compensation component optimizes network parameters based on reference signal received power (RSRP) and reference signal received quality (RSRQ) data, to recover from the performance degradation to the extent possible.

Fig. 1. The proposed AI based self-healing framework.

B. Cell Outage Detection

In order to detect possible outages, data related to users' measurements is collected from MDT sources. The data is processed to produce a feature vector $f_m = [p_s, p_{n_1}, \ldots, p_{n_K}, q_s, q_{n_1}, \ldots, q_{n_K}]$ corresponding to each MDT measurement $m$. Here, $p$ and $q$ denote RSRP and RSRQ measurements, respectively. Likewise, subscript $s$ represents the serving cell, whereas $n$ is used for neighboring cells. Therefore, the feature vector includes the RSRP and RSRQ values belonging to the serving cell and the $K$ strongest neighboring cells. For a total of $M$ measurements, the data can be represented by an $M \times L$ matrix $F \in \mathbb{R}^{M \times L}$ whose rows are the feature vectors, where $L = 2K + 2$. The first step is to standardize the data, i.e. the mean of each feature is subtracted from its entries to make the mean value equal to zero, and the result is divided by the standard deviation of the feature to give the features unit variance.

Next, principal component analysis (PCA) is applied to the data, which converts the possibly correlated original $L$ variables into a set of $\tilde{L}$ uncorrelated variables. We calculate PCA using singular value decomposition (SVD). For the data matrix $F$, where each row is a feature vector, the SVD is $F = U \Sigma V^T$, where $U$ is a unitary matrix, $\Sigma$ is the diagonal matrix containing the singular values $\sigma_i$, and $V^T$ is the transpose of a matrix of eigenvectors of $F^T F$ (each column of $V$ is an eigenvector). These eigenvectors are the new axes for the transformed variables that we are interested in. The new feature values can now be calculated as $FV = U\Sigma$. This transforms the variables into a new coordinate system, but the dimension of each feature vector is the same. The features are, however, ordered by the amount of variance, and thus we can obtain the most significant features by selecting only the first $\tilde{L}$ columns of $U$ (denoted by $U_{\tilde{L}}$) and the $\tilde{L} \times \tilde{L}$ upper left submatrix of $\Sigma$ (denoted by $\Sigma_{\tilde{L}}$), and calculating the product of $U_{\tilde{L}}$ and $\Sigma_{\tilde{L}}$. The $\tilde{L}$ variables picked have the highest variance, meaning that each of them accounts for more of the variance in the data than any of the discarded variables. This ensures a lower dimensional representation of the data to increase efficiency, while still retaining most of the information. The processed feature vectors are used as input data for different machine learning algorithms that are subsequently compared to find the one best suited for outage detection. In the training phase of supervised methods, outages are identified and the corresponding feature vectors are labeled accordingly. Machine learning algorithms considered include three supervised classifiers, namely Random Forest
classifier, Support Vector Machine (SVM), and Decision Tree classifier, and an unsupervised classifier called Isolation Forest. In order to detect an outage, the machine learning models detect anomalies in the user measurement reports. However, for triggering compensation actions, we need to identify the root-cause cell. To obtain an indicator for the cells, we consider the z-score for each base station, as in [13]. In the training phase, a reference z-score is computed for each base station in the network using the formula $z_b = |A_b - \mu_n| / \sigma_n$, where $A_b$ is the number of anomalous user reports associated with base station $b$, $\mu_n$ is the mean of the number of anomalous reports per base station in the network, and $\sigma_n$ is the corresponding standard deviation. We apply cross-validation to each model to get an accurate measure of detection rate. Also, normalized confusion matrices are constructed for each classifier, displayed in Fig. 2. During operation, a selected detection algorithm is run, and a new z-score is computed for each cell every time new data is fetched. This score is then compared to the reference scores obtained during normal operation. If the current score is greater than the reference score, this signals a potential anomaly, corresponding to an outage. Once the outage cell is detected, the compensation module triggers remedial actions as explained in the following discussion.

C. Cell Outage Compensation

The first step is to identify candidate cells for compensation action. This is done by examining handover logs of the outage cell. The cells that have had handovers with the outage cell are considered as neighbors, and these are passed on as candidates for further processing. To determine the appropriate compensation action for each of the candidate cells, we employ a fuzzy logic controller [14], shown in Fig. 3. This approach enables decision making from inaccurate information and allows us to set soft thresholds for triggering actions. The ability to name each fuzzy set also makes it easier to use expert help in determining the appropriate inference rule set. We use a single controller to determine the appropriate change in transmission power. The main components are the fuzzifier, inference engine, and defuzzifier. The fuzzifier transforms the numerical inputs into fuzzy membership values for each defined fuzzy set using predefined membership functions, e.g. we might have fuzzy sets Low and High for RSRP measurements. Thus, it is possible to determine the degree of membership of a particular RSRP value by evaluating the membership functions of these sets with this RSRP value. The degree of membership is between 0 and 1. The next component is the inference engine, which calculates the fuzzy output values based on a predefined set of if-then rules. The degree of truth $\alpha_k$ of rule $k$ is calculated as the product $\alpha_k = \mu_{k,1}(x_1) \cdot \mu_{k,2}(x_2) \cdots \mu_{k,n}(x_n)$, where $\mu_{k,i}$ is the membership function of the $i$th condition of rule $k$ and $x_i$ is the $i$th numerical input. In addition to having a truth value, each rule has an output $\beta_k$, a constant value, which in our case is the change in transmission power that should be made in case the rule is satisfied. As a final result, the inference engine produces a fuzzy output $\alpha_k \cdot \beta_k$. Afterwards, the defuzzifier compiles the fuzzy outputs into a single crisp output $\gamma$ by calculating a weighted average $\gamma = \sum_{k=1}^{n} \alpha_k \beta_k / \sum_{k=1}^{n} \alpha_k$, where $n$ here denotes the number of rules.

We define two numerical features to be used as inputs to the fuzzy logic controller, namely block coverage (BC) and accessibility. For a given cell, block coverage $N_{bc}$ is calculated as $N_{bc} = N_b / N_m$, where $N_b$ is the number of users that have measured the cell as the one with the highest RSRP, but for which the RSRP is lower than a set threshold. These users are blocked due to lack of coverage. Likewise, $N_m$ is the total number of users that measured the cell as the one with the highest RSRP. Next, the accessibility feature $N_a$ tries to capture the available capacity of the cell, which can be calculated as $N_a = N_d / N_c$, where $N_d$ is the number of connected users that have RSRQ lower than a set threshold, i.e. users with degraded performance. The value $N_c$ is the total number of users connected to the cell. The threshold values for RSRP and RSRQ used in calculating these metrics were determined using the approach discussed in [15]. $N_{bc}$ can belong to three different fuzzy sets: Low, Medium, and High. $N_a$, on the other hand, has two fuzzy sets: Low and High. Each of these sets has a predefined membership function. The inference rules are based on these fuzzy values. These rules characterize the situations in which a cell should take part in compensation action. The rule set we used for inference is outlined in Table I. It is based on the cell outage compensation rules in [14]. Note that Rules 4 and 6 of Table I increase the power if the cell still has good signal quality, but there are users blocked due to a coverage hole. Rule 1 decreases power in the case that a cell has been overwhelmed by users and signal quality has degraded, to decrease the coverage area and hopefully induce handovers to other cells. Rules
2, 3, and 5 maintain the current status in situations where either not many users are blocked due to coverage, or the cell already has signal quality issues and should not take on more users. After the compensation system is invoked, the fuzzy logic controller is first run for every cell determined to be in the neighborhood of the outage cell. During this round, the changes in transmission power are larger, ±2 dB if selected for action by the controller. After this, the compensation system continues to run, periodically invoking the fuzzy logic controller, this time for each cell in the network. The changes in subsequent rounds are ±1 dB, with the aim that the system incrementally reaches a steady state while recovering the network performance to the previous value to the maximum possible extent.

Fig. 2. Confusion matrices of the classifiers analyzed for outage detection.

Fig. 3. Block diagram of the fuzzy logic controller.

TABLE I
FUZZY LOGIC CONTROLLER: INFERENCE RULES

No.   BC   Accessibility   ∆ Tx Power
1     L    L               Negative
2     L    H               Null
3     M    L               Null
4     M    H               Positive
5     H    L               Null
6     H    H               Positive

(L = Low, M = Medium, H = High)

D. Continuous Optimization

Next, we discuss the continuous optimization module, which periodically tunes cell parameters to optimize the dynamically changing network resulting from outages and compensation actions. As shown in Fig. 4, it tunes the transmission power. The module can be enabled or disabled as desired and it is independent of outage compensation. In this module, optimization consists of incremental changes to the cell transmission power according to heuristic rules. The possible actions are increase, decrease, or no change. The appropriate action is determined by ranking the cells by performance. To evaluate the performance, we use cell edge RSRP values. The cells are ranked by their cell edge RSRP values, and the transmission powers of the worst 20% are increased by 1 dB. On the other hand, the transmission powers of the top 10% are reduced by 1 dB. It is important to note that this change is recorded and rolled back in the next round if it results in a significant drop in network performance.

Fig. 4. Block diagram of the continuous optimization module.
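One round of the continuous optimization heuristic of Section II-D can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the rounding of the 20% and 10% shares with `int()`, the clamping range taken from the simulation setup, and the 1 dB threshold used to decide what counts as a "significant drop" are all assumptions.

```python
def continuous_optimization_step(cell_edge_rsrp, tx_power,
                                 worst_frac=0.2, top_frac=0.1,
                                 step_db=1.0, p_min=35.0, p_max=46.0):
    """Return new Tx powers after one ranking-based adjustment round.

    cell_edge_rsrp and tx_power map a cell id to a value. The worst-ranked
    share of cells gains step_db, the best-ranked share loses step_db.
    """
    ranked = sorted(cell_edge_rsrp, key=cell_edge_rsrp.get)  # worst first
    n = len(ranked)
    n_worst = int(n * worst_frac)  # assumption: round the share down
    n_top = int(n * top_frac)
    worst = ranked[:n_worst]
    top = ranked[n - n_top:] if n_top > 0 else []

    new_power = dict(tx_power)
    for cell in worst:
        new_power[cell] = min(p_max, tx_power[cell] + step_db)
    for cell in top:
        new_power[cell] = max(p_min, tx_power[cell] - step_db)
    return new_power


def rollback_if_degraded(prev_power, new_power, prev_metric, new_metric,
                         max_drop=1.0):
    """Keep new_power unless the network metric fell by more than max_drop.

    The metric is assumed to be "higher is better", e.g. mean cell edge RSRP.
    """
    if prev_metric - new_metric > max_drop:
        return dict(prev_power)
    return new_power


# Hypothetical example with 10 cells: c0 has the best cell edge RSRP, c9 the worst.
rsrp = {f"c{i}": -100.0 - i for i in range(10)}
tx = {f"c{i}": 40.0 for i in range(10)}
updated = continuous_optimization_step(rsrp, tx)
```

In this example the two worst cells (`c9`, `c8`) gain 1 dB and the best cell (`c0`) loses 1 dB, while all other cells keep their power.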

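The preprocessing and root-cause steps of Section II-B (standardization, PCA through SVD, and the per-base-station z-score) can be sketched with NumPy as below. The data is synthetic, and the function names, the guard for constant features, and the example anomaly counts are assumptions made for illustration only.

```python
import numpy as np


def standardize(F):
    """Scale every feature column to zero mean and unit variance."""
    mu = F.mean(axis=0)
    sigma = F.std(axis=0)
    sigma[sigma == 0] = 1.0  # assumption: leave constant features unscaled
    return (F - mu) / sigma


def pca_via_svd(F_std, n_components):
    """Return the leading principal-component scores (U times Sigma, truncated)."""
    U, s, Vt = np.linalg.svd(F_std, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, explained_ratio[:n_components]


def z_scores(anomaly_counts):
    """z_b = |A_b - mu_n| / sigma_n for per-base-station anomaly counts A_b."""
    a = np.asarray(anomaly_counts, dtype=float)
    sigma = a.std()
    if sigma == 0:
        return np.zeros_like(a)
    return np.abs(a - a.mean()) / sigma


# Synthetic stand-in for MDT data: M = 200 reports, K = 2 neighbors, L = 2K + 2 = 6.
rng = np.random.default_rng(0)
F = rng.normal(loc=-100.0, scale=8.0, size=(200, 6))
F_std = standardize(F)
scores, ratio = pca_via_svd(F_std, n_components=3)

# Hypothetical anomalous-report counts for 7 base stations; index 3 stands out.
counts = [2, 3, 2, 40, 3, 2, 3]
z = z_scores(counts)
suspect = int(np.argmax(z))
```

The columns of `scores` are mutually orthogonal, which is the "uncorrelated variables" property the text relies on, and `ratio` corresponds to the explained variance ratios reported in Section III.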
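The fuzzy logic controller of Section II-C can be sketched as follows, using the rule set of Table I, the product rule for the degree of truth, and the weighted-average defuzzifier. The membership function shapes and breakpoints are hypothetical, since the paper does not list them here. The sketch also assumes that accessibility is oriented so that High means good signal quality (values near 1 for a healthy cell), which is how the rule descriptions and Section III treat it.

```python
def ramp_up(x, a, b):
    """Membership that is 0 at or below a, 1 at or above b, linear in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def ramp_down(x, a, b):
    """Mirror image of ramp_up: 1 at or below a, 0 at or above b."""
    return 1.0 - ramp_up(x, a, b)


def triangle(x, a, b, c):
    """Triangular membership rising from a to a peak at b and falling to c."""
    return min(ramp_up(x, a, b), ramp_down(x, b, c))


# Table I: (BC set, accessibility set, sign of the Tx power change).
RULES = [
    ("L", "L", -1),  # Rule 1: Negative
    ("L", "H", 0),   # Rule 2: Null
    ("M", "L", 0),   # Rule 3: Null
    ("M", "H", +1),  # Rule 4: Positive
    ("H", "L", 0),   # Rule 5: Null
    ("H", "H", +1),  # Rule 6: Positive
]


def fuzzy_power_change(bc, accessibility, step_db=2.0):
    """Crisp Tx power change in dB for one cell.

    Breakpoints below are assumptions chosen only for illustration.
    """
    mu_bc = {
        "L": ramp_down(bc, 0.05, 0.15),
        "M": triangle(bc, 0.05, 0.15, 0.30),
        "H": ramp_up(bc, 0.15, 0.30),
    }
    mu_acc = {
        "L": ramp_down(accessibility, 0.90, 0.97),
        "H": ramp_up(accessibility, 0.90, 0.97),
    }
    numerator = 0.0
    denominator = 0.0
    for bc_set, acc_set, sign in RULES:
        alpha = mu_bc[bc_set] * mu_acc[acc_set]  # degree of truth (product)
        beta = sign * step_db                    # constant rule output
        numerator += alpha * beta
        denominator += alpha
    return numerator / denominator if denominator > 0 else 0.0


# First compensation round uses 2 dB steps, later rounds 1 dB (Section II-C).
first_round = fuzzy_power_change(bc=0.35, accessibility=1.0, step_db=2.0)
later_round = fuzzy_power_change(bc=0.35, accessibility=1.0, step_db=1.0)
```

Because the controller returns a weighted average, inputs that fall between two fuzzy sets yield intermediate power changes rather than a hard switch between actions, which is the "soft threshold" behavior the section describes.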
III. SIMULATION SETUP AND RESULTS

The LTE module of the ns-3 simulator, part of the LTE-EPC network simulator (LENA) project, is used for detailed system level simulations. The modified ns-3 simulator incorporates the network scenario, radio link failures, handover events [16], outage event generation, and the recovery process. In the network scenario, the user's reporting time interval is 200 ms. The gathered measurements are fetched for analysis every 10 seconds and given as input to the cell outage detection module discussed in Section II-B. In our case, the explained variance ratios for the 3 principal components produced were [0.549, 0.250, 0.121]. The transmission power values are limited to the range 35–46 dBm. The simulated network consists of 7 base stations, each having 3 cells. The stations are arranged in a hexagonal grid with 500 m spacing. The network has 105 simulated individual users moving at 60 km/h. During simulation, after some interval of normal operation, i.e. 10 s, the transmission power of a selected set of base stations is instantaneously reduced to create an outage in the network. The RSRP values of the cells associated with the base station gradually decrease to zero in a curve due to filtering after the outage is created in the network. In an LTE network, L1 (physical layer) and L3 (RRC layer) filtering [17] is carried out to obtain the RSRP measurements and handover measurements of the users at the base stations. These filtering layers compute the moving average of the measurements, making the signals smooth enough to alleviate the consequences of channel fluctuation. Moreover, the filtering layers ensure that a single low or high measurement does not trigger an undesired handover. As a result of the outage, users associated with the base station lose the connection to their respective cell and experience radio link failure. The simulation scenario we use for our experiments is based on the 3GPP specifications [18]. It comprises 105 users distributed randomly in the mobility area, constantly moving at 60 km/h. Every 200 ms, new user measurements are generated and stored in the database. Every 10 seconds, the outage detection module is executed and, if an anomaly is detected, the compensation module takes immediate action, i.e. increases the transmission power of selected cells. The compensation component identifies the neighboring cells and determines the appropriate action for each one with the fuzzy logic controller. At this step the transmission power increase will be 2 dB (if selected as a recovery action). After the initial compensation round, the compensation module continues to make incremental changes of 1 dB, targeting all of the network, not just the neighboring cells. This is continued until network performance close to normal operation is achieved, or the outage vanishes.

A. Results

The metrics used by the compensation module to determine actions, BC and accessibility averaged over the neighboring cells, as well as the average cell edge RSRP values during the simulation, are plotted in Fig. 5. The beginning of the outage is marked with a solid red line and the start of compensation is indicated by a dashed black line. From the figure we can see that the cell outage detection module detects the outage at the base station about 10 seconds after the downlink transmission power plummets to 0 dBm. The cell edge RSRP plot shows a sharp initial drop from the normal level of about −108 dBm, when users lose connection, and a quick recovery to a lower level of about −112 dBm resulting from the weakened coverage in the area previously serviced by the outage base station. After cell outage compensation is triggered, we can see a gradual rise back to a level close to normal, around −110 dBm. The BC spikes immediately after the outage to about 0.35, but quickly recovers to a lower level of about 0.1.

After the cell outage compensation is triggered, the value returns to its normal level of 0, with some periodic and lower spikes up to nearly 0.1 when users wander into still existing coverage holes. Accessibility plunges about 4 seconds after the outage has been triggered, as the initial wave of users tries to connect to new base stations. However, recovery is quick even before the compensation, as the value returns to 1 in under a second. After compensation is triggered, accessibility remains normal, with occasional drops to about 0.97, which also occurred before the outage. Note that the percentage drop in cell edge RSRP was −3.5% during the outage. After the cell outage compensation was triggered, the drop from normal was −1.5%. This means that 45% of the reduction was recovered by the compensation actions.

IV. CONCLUSIONS AND FUTURE WORK

We have presented an intelligent self-healing framework for outage detection and compensation in mobile networks. The system automatically detects cell outages and triggers recovery actions that compensate for outage cells in the network. The developed framework uses different artificial intelligence based techniques. These include machine
learning anomaly detection algorithms for outage detection, an outage compensation entity based on a fuzzy logic controller, and a heuristic continuous optimization component. Together the components effectively and efficiently improve the network performance by detecting and compensating for the failure cells. The cell edge reference signal received power is recovered by 45%, thereby significantly improving the network performance once the outage is detected. The possible directions for future work include investigation of online-learning methods and more challenging networking environments such as heterogeneous networks.

Fig. 5. Performance metrics during the simulation. Solid red line marks when outage was triggered. Dashed black line marks when cell outage compensation was triggered.

V. ACKNOWLEDGEMENT

This research is supported by Business Finland Funding (grant no. 1916/31/2017).

REFERENCES

[1] A. Osseiran, V. Braun, T. Hidekazu, P. Marsch, H. Schotten, H. Tullberg, M. A. Uusitalo, and M. Schellman, "The foundation of the mobile and wireless communication system for 2020 and beyond: Challenges, enablers and technology solutions," Vehicular Technology Conference (VTC Spring) IEEE 77th, pp. 1–5, 2013.
[2] M. Z. Asghar, M. Abbas, K. Zeeshan, P. Kotilainen, and T. Hämäläinen, "Assessment of deep learning methodology for self-organizing 5G networks," Applied Sciences, vol. 9, no. 15, 2019.
[3] S. Fortes, C. Baena, J. Villegas, E. Baena, M. Z. Asghar, and R. Barco, "Location-awareness for failure management in cellular networks: An integrated approach," Sensors, vol. 21, no. 4, 2021.
[4] J. Moysen, F. Ahmed, M. García-Lozano, and J. Niemelä, "Big data-driven automated anomaly detection and performance forecasting in mobile networks," in 2020 IEEE Globecom Workshops (GC Wkshps), 2020, pp. 1–5.
[5] F. Ahmed, M. Z. Asghar, and A. Imran, Combinatorial Optimization for Artificial Intelligence Enabled Mobile Network Automation. Cham: Springer International Publishing, 2021, pp. 663–690.
[6] M. Asghar, T. R. P. Nieminen, S. Hämäläinen, and M. Imran, "Cell degradation detection based on an inter-cell approach," International Journal of Digital Content Technology and its Applications, vol. 11, no. 1, 2017.
[7] M. Z. Asghar, P. Nieminen, S. Hämäläinen, T. Ristaniemi, M. A. Imran, and T. Hämäläinen, "Towards proactive context-aware self-healing for 5G networks," Computer Networks, vol. 128, pp. 5–13, 2017, Survivability Strategies for Emerging Wireless Networks.
[8] A. Gómez-Andrades, R. Barco, P. Muñoz, and I. Serrano, "Data analytics for diagnosing the RF condition in self-organizing networks," IEEE Transactions on Mobile Computing, vol. 16, no. 6, pp. 1587–1600, 2017.
[9] J. Moysen, F. Ahmed, M. García-Lozano, and J. Niemelä, "Unsupervised learning for detection of mobility related anomalies in commercial LTE networks," in 2020 European Conference on Networks and Communications (EuCNC), 2020, pp. 111–115.
[10] M. Z. Asghar, S. Hämäläinen, and T. Ristaniemi, "Self-healing framework for LTE networks," in 2012 IEEE 17th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), Sep. 2012, pp. 159–161.
[11] M. Amirijoo, L. Jorguseski, R. Litjens, and L. C. Schmelz, "Cell outage compensation in LTE networks: Algorithms and performance assessment," in 2011 IEEE 73rd Vehicular Technology Conference (VTC Spring), May 2011, pp. 1–5.
[12] F. Li, X. Qiu, Z. Tian, B. Wang, and L. Meng, "Adjusting electrical downtilt based mechanism of autonomous cell outage compensation," in IET International Conference on Communication Technology and Application (ICCTA 2011), Oct. 2011, pp. 389–394.
[13] O. Onireti, A. Zoha, J. Moysen, A. Imran, L. Giupponi, M. Ali Imran, and A. Abu-Dayya, "A cell outage management framework for dense heterogeneous networks," IEEE Transactions on Vehicular Technology, vol. 65, no. 4, pp. 2097–2113, April 2016.
[14] I. de la Bandera, P. Muñoz, I. Serrano, and R. Barco, "Adaptive cell outage compensation in self-organizing networks," IEEE Transactions on Vehicular Technology, vol. 67, no. 6, pp. 5231–5244, June 2018.
[15] A. Gómez-Andrades, P. Muñoz, I. Serrano, and R. Barco, "Automatic root cause analysis for LTE networks based on unsupervised techniques," IEEE Transactions on Vehicular Technology, vol. 65, no. 4, pp. 2369–2386, 2016.
[16] B. Herman, D. Petrov, J. Puttonen, and J. Kurjenniemi, "A3-based measurements and handover model for ns-3 LTE," Nov. 2013.
[17] K. Vasudeva, M. Simsek, and I. Guvenc, "Analysis of handover failures in HetNets with layer-3 filtering," in 2014 IEEE Wireless Communications and Networking Conference (WCNC), April 2014, pp. 2647–2652.
[18] 3GPP, "Radio measurement collection for Minimization of Drive Tests (MDT); Overall description; Stage 2 (Release 16)," 3GPP TS 37.320 V16.3.0 (2020-12).

