
Future Generation Computer Systems 85 (2018) 9–18


Research on anomaly detection algorithm based on generalization latency of telecommunication network
Yan Wang a, Zhensen Wu a,*, Yuanjian Zhu b, Pei Zhang b
a School of Physics and Optoelectronic Engineering, Xidian University, Xi’an 710071, China
b China Mobile Group Jiangsu Company Limited Wuxi Branch, Wuxi, 214400, China

highlights

• The concept of “generalization latency”, which takes user experience as the starting point, is proposed. Its main influencing factors are air latency and the latency from the core network to the application server.
• A mapping model between network wireless performance and both air signaling latency and downlink throughput rate is established in depth.
• A modified Gaussian mixture model that introduces temporal characteristics is studied in order to detect data services with abnormal latency. Introducing the temporal characteristic value improves detection accuracy.

article info

Article history:
Received 31 October 2017
Received in revised form 24 January 2018
Accepted 12 February 2018
Available online 9 March 2018

Keywords:
Anomaly detection
Wireless network
Generalization latency
User perception
Temporal characteristics

abstract

With the rapid development of the Mobile Internet and 4th Generation mobile communication technology, data service has overtaken voice service and has become an important means for mobile operators to increase their share of the communication market. The quality of data services therefore directly influences mobile users' perception of, and satisfaction with, the network. Because the data service process involves complicated networking and a long procedure, the root causes of problems are relatively difficult to locate. In voice communication over mobile networks, the important factors that affect user perception are relatively uniform, such as call drops, network congestion and signal interference. Users' perception of data services is somewhat different and is strongly associated with the usage scenarios of the various applications. For example, in the data browsing service, if the terminal connection fails, the background starts automatic repeated connections, during which latency increases and latently influences the user's perception of the data service. In the video service, initialization delay, stalling during playback and the number of stalls are also factors that affect video quality. The above analysis shows that the latency in the various data service processes and the usual network latency indicators, such as TCP three-way handshake and DNS, can be gathered and mapped into a total latency, which is the latency perceived from the perspective of user experience. In the current work, this is defined as generalization latency: the total latency covering the latency for users to establish a connection on the signaling control plane and the latency of the user plane.

The first innovation of this paper is a mapping model in which generalization latency, viewed from the perspective of user perception, is related to the performance indicators of the telecommunication network under different data service scenarios, so as to forecast the inflection point of network performance anomalies. The second innovation is an anomaly detection model for generalization latency, used to detect the performance stability of the application layer of the application service plane.

© 2018 Elsevier B.V. All rights reserved.

* Corresponding author.
E-mail address: wuzhs@mail.xidian.edu.cn (Z. Wu).

https://doi.org/10.1016/j.future.2018.02.022
0167-739X/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

In recent years, with the sustained development of high-speed LTE wireless connections, network data traffic has increased substantially. In addition, the consumption of data services, such as web browsing, video communication, streaming media and traditional mobile phone applications, has become more diversified. Compared with traditional broadband, surfing the Internet with mobile phones is characterized by a long process and complicated networks: traffic passes through the air interface, the transmission access network, the core network, and the segment from the core network export to the application server. In the existing research on user perception of data services, the various signaling indicators of the telecommunication network are studied to build a model, the indicators with greater influence on user perception are identified, and each indicator is endowed with a weight coefficient for quantified scoring of user perception. Nevertheless, this method has two limitations. First, the most immediate response of user perception to a data service in Internet surfing is latency. In this research system [1], the perceptual model is not mapped to the latency actually felt by the user, so the result is neither objective nor direct. Second, although this method provides an evaluation model for user perception of the network, it cannot pinpoint the problems encountered by users with poor perception; for example, it cannot determine whether the problem occurs in the air interface, the transmission access network, the core network, or the export from the core network to the application service platform server.

In this paper, the concept of generalization latency of the telecommunication network, which takes user experience as the starting point, is proposed. Generalization latency refers to the total latency of each data service process when users access the network, for example the latency that users feel intuitively when clicking on a web page. It is the total latency covering the latency of establishing connections on the signaling plane and the user-plane latency of the Internet application service. The innovations of this paper lie in the detailed analysis of how generalization latency is composed across the phases of a data service: the air interface, the transmission access network, the core network, and the segment from the core network export to the application server. The models are built on a large amount of sample data. This paper builds a mapping model between generalization latency, taking user perception as the starting point, and network performance indicators, and then builds an anomaly detection model, so as to solve the problems of difficult air latency data collection and difficult location of network anomalies.

This paper is organized as follows. In Section 2, the theoretical concept of generalization latency is derived. In Section 3, the mapping model between air latency and network performance indicators is presented in detail. In Section 4, the anomaly detection model for data services based on temporal characteristics is presented in detail. Finally, Section 5 gives the conclusions.

2. Research of the theory of generalization latency based on user perception evaluation

In the web page browsing experience of mobile users, latency from the perspective of user perception is the time from clicking on a web page to the presentation of the page. From the network level, it is a sequence of steps experienced by the terminal from setting up the request to successfully loading the page, namely Attachment, RRC Connection Setup, PDN Connection Setup, eRAB Setup, DNS Lookup, TCP Handshaking, Get and Post, etc. The total latency produced by all these steps is the factor that affects the quality of the web browsing service, and it is also the indicator that users perceive subjectively and directly. As shown in Fig. 1, the whole waiting process involves air latency (signaling plane/user plane), transmission network latency, core network latency (signaling plane/user plane) and the latency of the export from the core network to the application server. In this paper, this whole-process end-to-end (E2E) latency is defined as “generalization latency”. “Generalization latency” is the major standard that objectively reflects users' judgment of the quality of the wireless network. In terms of network nodes, the latencies of the air interface, transmission access network, core network and export are produced at different stages of the network connection, and the latency of each step has different characteristics. The influence of the performance of the transmission access network and the core network on generalization latency is basically constant. Taking core network latency as an example, LTE core network design generally requires the latency of the signaling plane to be 100 ms and the user-plane latency to be 5 ms, as shown in Fig. 1. Therefore, the main factors influencing the whole end-to-end latency of surfing online with mobile phones, namely “generalization latency”, are air latency and the latency of the export from the core network to the application server. These two latencies are analyzed and studied in detail below.

Fig. 1. Segmental description of generalization latency for surfing Internet with mobile phones.

2.1. Analysis of latency characteristics of air signaling/user plane

When the web page browsing service is triggered by a user at the UE, end-to-end latency is divided into three stages from the perspective of the wireless network. The first stage is the establishment of signaling connections by the air control plane signaling, including the establishment of the RRC connection, the service request, the establishment of the eRAB, the TCP (2 or 3 times) handshake and so on. The second stage is data transmission. When data are transmitted over the air interface, the complexity of the wireless environment means that downlink scheduling and uplink feedback are involved [2]. When data are in the core network transmission stage, latency and rate are relatively stable. Therefore, the latency of the first stage, as well as the latency and web page download rate caused by the air interface in the second stage, should have corresponding wireless performance indicators. The third stage is the loading and presentation of web pages at the UE, including the response time of the UE itself and the time taken to open the browser. Since the third stage is irrelevant to LTE network performance, it is not considered for the time being. Returning to the first and second stages: to establish the quantitative relationship between wireless performance and air control/user plane latency, we first deduce qualitatively which wireless performance indicators influence total air latency and how the mapping models are to be built. Table 1 shows the start and stop signaling of each segment of the air signaling control plane and user plane, which is used to compute the latency between the start and stop signaling of each process.

Besides affecting the latency with which the air control plane establishes a connection, wireless performance, through wireless network downlink scheduling, also has an important influence on the latency of the air user plane (reflected as downlink throughput). Appropriate IMCS can
also be found for the UE by using the modulation and coding rate in the CQI, together with the channel quality (SINR) and the MCS link curve. IMCS is mapped to a TBS index, and the TBS index is then mapped to the appropriate number of PRBs according to the buffer request. This PRB number is the number of resources allocated to the UE in the downlink direction per TTI. On the eNodeB side, the eNodeB allocates an MCS and RBs according to the CQI reported by the UE, so the physical-layer downlink throughput per TTI can also be obtained. Therefore, when building the mapping model between the wireless performance of the telecommunication network and total air latency, the overall user latency perception of the air signaling control plane and the user plane shall be considered comprehensively.

Table 1
Start and stop signaling of each segment of the air signaling control plane and user plane.
Signaling procedure              Start node   End node     Start of signaling               End of signaling
RRC connection                   UE           eNodeB       RRC connection request           RRC connection setup complete
Service request                  UE           MME          Service request                  Initial context setup request
eRAB establishment               MME          eNodeB       Initial context setup request    Initial context setup response
DNS                              UE           DNS server   DNS query                        DNS query response
TCP (1 or 2 times) handshake     UE           SP server    SYN                              SYN ACK
TCP (1 or 2 times) handshake     UE           SP server    SYN ACK                          ACK
Service response                 UE           SP server    First get                        First response
Service latency                  UE           SP server    First get                        Response

2.2. Analysis of the latency characteristics from core network to the application server (SP server)

A complete web page browsing process includes wireless access [3], bearer establishment, DNS, TCP, HTTP and so on. The latency belonging to the application server domain usually runs from the TCP establishment process to the end of the last HTTP response. As shown in Fig. 2, since the data of an HTTP process are split across several, or even dozens of, Get/Post processes, its latency is the time between the first HTTP request and the last packet response. The total business latency = (T2 − T1) + (T4 − T3).

A single data service usually comprises several domain names, which correspond to different SP IPs. Although long latency at an individual domain name server or application server will not affect the overall statistics, the user experience will be fairly poor; moreover, different content types under the same data service also have different requirements for latency. Taking the NetEase News app as an example, with the generalization latency theory and signaling depth detection technology, clustering analysis yields 39 domain names (hosts) for this app. Each domain name has a different data-service type, and the corresponding latency can be worked out for each subdivided item (see Table 2).

Fig. 2. Signaling process of surfing Internet with mobile phones.

3. The mapping model for wireless performance of telecommunication network and air latency

The theoretical analysis in Section 2 shows that the main factors influencing “generalization latency” are air latency and the export latency from the core network to the application service platform. In this section, the mapping model between air latency and network performance indicators is studied. When the mapping model between the wireless performance of the telecommunication network and total air latency is established, control-plane and user-plane latency shall both be considered from the perspective of user perception. Consequently, the mapping model between network wireless performance and both air signaling (control-plane) latency and downlink throughput rate is studied in depth in this section.

Air signaling (control-plane) latency can only be acquired with air signaling collection technology, but the software and hardware investment this technology requires is very high, which makes it difficult to deploy widely. Signaling data from the LTE core network, by contrast, can be collected simply and promptly. However, because the easy-to-access signaling data from the LTE core network above the S11 interface adopts ENODEC as the starting point for time statistics, it is quite difficult to calculate the air latency directly from it. In this section, sample data of air latency are collected from a small-range pilot area with the air signaling collection technology mentioned above, covering 63 cells over three months. Big data modeling is then carried out using the large-scale air measurement reports, network management statistics and core network signaling data, and the mapping model between network wireless performance indicators and air signaling latency is constructed. This essentially solves the difficulties of air latency collection, namely high investment and the difficulty of large-scale deployment.

Meanwhile, this mapping model shall also focus on the downlink throughput of the air user plane. In the data transmission stage, when data are transmitted on the wireless side, the complex wireless environment involves downlink scheduling and UE uplink feedback. Furthermore, downlink throughput is directly reflected in user-plane latency and is a critical influencing factor. Therefore, a mapping model between network wireless performance and downlink throughput needs to be established.

• Given that the first step of air signaling latency (the RRC connection setup phase) comes from the collection of air interface signaling, and that the total air signaling latency cannot be obtained directly from the current signaling data of the
core network deployed on a large scale, air signaling data shall be collected for model verification when building the mapping model between wireless performance indicators and air latency. In this paper, air signaling data collected from a small-range pilot area of a city in China are used for training and result verification.
• Not only the time taken by air signaling to establish the link connection but also the latency influence of the air user plane, namely the downlink throughput, shall be considered. However, since downlink throughput is affected both by the downlink scheduling of cellular wireless network resources and by the SP service server, the downlink rate of the same kind of application service, such as SohuNews, NetEase News and WeChat, may be selected for analysis when building the air model.

Table 2
The network indicators of various SP servers of NetEase News.
Host                   Service type   Indicator 1   Indicator 2   Indicator 3
mimg.127.net           Image          119.6880      76.5645       88.1134
comment.api.163.com    Text           123.2254      77.1528       76.3300
war.163.com            Gif            60.6628       79.3935       91.3400
c.m.163.com            Text           24.3160       76.4377       86.3451
mm.bst.126.net         Image          39.8235       76.6526       71.4561
imgsize.ph.126.net     Image          15.1923       84.7826       92.4322
c.3g.163.com           Text           135.0727      76.8205       87.8731
163.wrating.com        Gif            241.2105      85.9929       79.5471

3.1. Presentation of models

In this section, the mapping model between key network performance indicators and air latency is established with an ensemble machine learning method, on the basis of partial sample data of air signaling, core network signaling, wireless measurement reports (MR) and network management indicators. The air latency is then predicted from the key indicator data.

3.1.1. Definition of correlation equations

Let WI denote a wireless performance indicator of the telecommunication network, Latency the air latency, and Throughput the downlink throughput. The correlation equations relating air signaling latency (formula (1)) and air user latency (formula (2)) to the wireless performance indicators are established as follows.

$$\mathrm{Latency} = f(WI_i), \quad i = 1, 2, \ldots, 7 \tag{1}$$

$$\mathrm{Throughput} = f(WI_i), \quad i = 1, 2, \ldots, 7 \tag{2}$$

where $WI_i$ denotes a wireless performance indicator, whose data originate from full core network signaling and from system data including the wireless measurement report (MR) and network management indicators; the performance indicators include RSRP, RSRQ, SINR, CQI, MCS, PDSCH PRB and PUSCH PRB. Latency denotes air latency, whose data originate from core network signaling above interface S11 and from air signaling (for instance, when the RRC connection is set up, Latency originates from air signaling). Throughput denotes downlink throughput, whose data come from core network signaling above interface S11.

3.1.2. Ensemble machine learning

In network optimization, a certain network problem often depends on a few influencing factors; that is, a dependence relationship exists between one dependent variable and a few independent variables [4]. Furthermore, it is often hard to tell which of the influencing factors are primary, or the effects of some secondary factors cannot be neglected. Such problems are jointly named “regression”. “Regression analysis” [5] is a predictive modeling technique that studies the relationship between dependent and independent variables. However, existing regression algorithms have limitations and deficiencies in both their range of applicability and their regression performance. How to avoid these deficiencies so as to reduce errors and improve prediction accuracy therefore remains a problem to be solved urgently. In view of this, this paper proposes a modeling approach that applies ensemble machine learning to the correlation equations.

Ensemble learning is a technique that uses multiple learners to solve the same problem and can significantly improve the generalization ability and stability of learning systems. Its greatest advantage lies in fully considering and using the differences in how different learners perceive the same problem. Through the joint decision of multiple learners, the problem can be understood more comprehensively. The key to the success of ensemble learning is the balance between the accuracy and the diversity of the individual learners. Thus, the major difference between ensemble learning and individual learning is that ensemble learning involves not only the generation of individual learners but also the possible interaction effects between individuals and the combination of individual prediction results.

Step 1: The construction process of individual learner

In research on the framework of ensemble regression learning algorithms, researchers have proposed different construction processes for building several learners for integration from different perspectives, which can be mainly divided into the following four kinds:

• Sequential construction process: in the sequential construction of individual learners, the learners are trained successively in order, and the performance of the previous learner directly or indirectly influences the training of the next. This construction process is therefore quite efficient in handling some specific patterns. However, it is less reliable: when an error occurs in a certain learner during construction, the following learners may be affected by that error. The ensemble learning algorithm based on Boosting is the most typical case.
• Parallel construction process: in the parallel construction process, each learner independently learns the sample space in parallel, and a uniform ensemble is formed only at the final output stage. This method has the most theoretical and experimental research and the broadest application in the framework of ensemble learning algorithms. Its major advantages are that the construction of each learner is independent of the others, which gives strong robustness and makes parallelization easy. One of the most typical cases is the bagging algorithm.
• Selective construction process: in this construction process, a subset of learners is selected from the initial set of individual learners through optimization and used for the ensemble, so as to reduce calculation time and improve generalization ability.
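The parallel construction process described above can be illustrated with a minimal bagging-style sketch. This is not the authors' implementation: the base learner (a one-variable least-squares line), the function names and the toy data relating a single wireless indicator to air latency are all assumptions made for illustration.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        # Degenerate resample (all x identical): fall back to a constant model.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def build_parallel_learners(points, n_learners, seed=0):
    """Train each learner independently on its own bootstrap resample."""
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        sample = [rng.choice(points) for _ in points]  # sample with replacement
        learners.append(fit_line(sample))
    return learners


def predict_uniform(learners, x):
    """Uniform ensemble: average the individual predictions at the output stage."""
    return sum(a + b * x for a, b in learners) / len(learners)


# Hypothetical, noise-free toy data: latency = 2 + 3 * indicator.
data = [(float(x), 2.0 + 3.0 * x) for x in range(10)]
learners = build_parallel_learners(data, n_learners=5)
prediction = predict_uniform(learners, 4.0)
```

Because no learner depends on another, the loop in `build_parallel_learners` could be distributed across workers without changing the result, which is the property the parallel construction process relies on.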
The model proposed in this section applies the parallel construction process and the selective construction process to the construction of individual learners, which include the following six kinds of regression algorithms:

(1) Linear Regression

Linear regression is one of the most familiar modeling techniques and is usually among the first techniques people choose when learning predictive modeling. In this technique [6], the dependent variable is continuous, while the independent variables can be continuous or discrete, and the regression line is linear in nature. In linear regression, the best-fitting straight line (that is, the regression line) is used to establish a relationship between the dependent variable (Y) and one or more independent variables (X). It is expressed by the simple equation Y = a + bX + e, where “a” is the intercept, “b” is the slope of the straight line and “e” is the error term. The equation can be used to predict the value of the target variable from the given predictor variables (X).

(2) Polynomial Regression

Although linear regression captures the overall tendency, under-fitting can appear because linear regression seeks the unbiased estimate with minimum mean square error. As the data fluctuate around the straight line, polynomial regression is introduced, allowing some bias in the estimate so as to reduce the prediction mean square error; the fit finally appears as a broken line. The formula of the polynomial algorithm is as follows:

$$\hat{w} = (X^{T} W X)^{-1} X^{T} W y \tag{3}$$

W is a matrix that assigns a weight to each data point. Kernel functions are used to give higher weight to neighboring points, and the formula is as follows:

$$w(i, i) = \exp\left[\frac{|x^{(i)} - x|}{-2k^{2}}\right] \tag{4}$$

(3) Stepwise Regression

This form of regression can be used when dealing with multiple independent variables. In this technique, the selection of independent variables is carried out by an automatic process without manual intervention. Important variables are identified through observed statistics such as R-square, t-stats and the AIC indicator. Stepwise regression fits the model by adding or removing covariates one at a time according to a specified criterion. This modeling technique aims to maximize prediction ability with the smallest number of predictor variables, and it is also one of the methods for handling high-dimensional datasets.

(4) Ridge Regression

Ridge regression [7] is an algorithm for data with multicollinearity (high correlation between independent variables). Under multicollinearity, although the ordinary least squares (OLS) estimates are unbiased, their variances are large, so the estimates can deviate far from the true values. Ridge regression is a biased estimation regression method, which is in essence an improved least squares estimation. By abandoning the unbiasedness of the least squares method, it obtains more practical and reliable regression coefficients at the price of losing some information and reducing accuracy, and it fits ill-conditioned data better than the least squares method. In a linear equation, Y = a + b1 x1 + · · · + bn xn + e, the error e can be divided into two subcomponents, namely bias and variance. Prediction errors may result from both subcomponents or from either of them. Through the shrinkage parameter λ, ridge regression solves the multicollinearity problem, as shown in the formula below:

$$\hat{\beta} = \underset{\beta \in \mathbb{R}^{p}}{\operatorname{argmin}} \; \|y - X\beta\|_{2}^{2} + \lambda \|\beta\|_{2}^{2} \tag{5}$$

The formula has two components: a least squares term and the penalty term $\lambda \|\beta\|_{2}^{2}$, where β denotes the regression coefficients. The shrinkage penalty is added to the least squares term in order to obtain a very low variance.

(5) ElasticNet Regression

Similar to ridge regression, Lasso penalizes the regression coefficients, in its case through their absolute values [8]. It can also reduce variability and improve the accuracy of the linear regression model. ElasticNet is the combination of the Lasso and Ridge regression techniques: it is trained with both L1 and L2 penalties as regularizers. ElasticNet is useful when there are several correlated features. Lasso tends to select one of them at random, while ElasticNet tends to keep both.

$$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \left( \|y - X\beta\|^{2} + \lambda_{2} \|\beta\|^{2} + \lambda_{1} \|\beta\|_{1} \right) \tag{6}$$

(6) Bayesian Linear Regression

There are large differences between the Bayesian linear regression model and the classic linear regression model. The latter regards the regression coefficients as fixed unknown parameters, while the former regards them as following an unknown probability distribution, which can then be inferred from the available samples. To calculate the distribution of the variable to be predicted, the distribution of the regression coefficients given the independent variables must be sampled. Thus, generally speaking, the computational cost of training and prediction with a Bayesian model [8] is larger than that of ordinary linear regression.

Step 2: Ensemble combination

Ensemble combination [9] is accomplished through coordination and mutual compensation between individual learners, and the various combination methods all follow this basic principle. When ensemble learning is applied to regression problems, the most commonly used combination is linear combination, which can be divided into simple mean combination and weighted mean combination. When it is applied to classification problems, the most commonly used combination [10] is majority voting, which includes relative majority voting and absolute majority voting.

Besides the commonly used ensemble combinations mentioned above, there are various other ensemble combinations in the existing literature. Some scholars use learning algorithms, such as neural networks, to re-learn on the new input space formed by the prediction results of the individual learners, so as to realize a trainable nonlinear combination. The Bayesian method and the Bayesian network method are also used for ensemble combination. Additionally, a mixture-of-experts system allocates different learners to different local regions of the problem space; a gating network determines which learners to choose each time, and both the learners and the gating network must be trained. In this paper, weighted combination and dynamic weighted combination are used separately for ensemble combination, and the effects of the two combinations are evaluated and compared in the following sections.

In the stage of ensemble combination (combining the prediction results of the individual learners), if the target output corresponding to input $x_i$ is $y_i$, a mapping $f: x_i \to y_i$ exists between input and output. Using a certain individual generating method, the given training dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$ is used to train T learners $\{f_1, f_2, \ldots, f_T\}$, which constitute the set of individual learners $F_0 = \{f_t\}_{t=1}^{T}$. Each learner in $F_0$ is an approximation of the function f. The output of the ensemble can be expressed by the following equation:

$$\hat{f}(x_i) = g\big(w_t, f_t(x_i, y_i)\big) \quad (t = 1, 2, \ldots, T) \tag{7}$$

where T refers to the number of learners, $g(\cdot)$ refers to the method used to combine the outputs of the individual learners, and $w_t$ refers to the combination weight of the t-th individual learner.
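For regression, the combination method g in Eq. (7) is most commonly a linear combination. The following minimal sketch shows the simple mean and the weighted mean of individual learner outputs; the function name `combine` and the numeric predictions and weights are hypothetical values chosen only to illustrate the arithmetic.

```python
def combine(predictions, weights=None):
    """Linear combination g of individual learner outputs.

    With weights=None this is the simple mean; otherwise it is the
    weighted mean, normalised by the sum of the weights.
    """
    if weights is None:
        return sum(predictions) / len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("one weight is required per learner")
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total


# Hypothetical outputs f_t(x_i) of T = 3 individual learners for one input.
preds = [40.0, 50.0, 60.0]
simple = combine(preds)
weighted = combine(preds, [0.5, 0.25, 0.25])
```

Here the weighted mean moves the ensemble output toward the first learner, which is the effect the weight $w_t$ is meant to have when one learner is trusted more than the others.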
(1) Weighted combination method

In the weighted mean method, the mean square error (MSE) between the prediction output $\hat{f}(x_i)$ and the target output $y_i$ is frequently used to calculate the combination weights of the learners. The MSE is calculated as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\big(y_i - \hat{f}(x_i)\big)^{2} = \frac{1}{N}\sum_{i=1}^{N}\Big(y_i - \sum_{t=1}^{T} w_t f_t(x_i)\Big)^{2} \tag{8}$$

If the sum of weighted values is limited to 1, the least and Fig. 3. The error rate of training and testing sets of eight kinds of algorithms.
optimal weighted value of MSE is:
T
1 ∑ Ctj−1
wt = ∑T ∑T (9)
N
j=1 k=1 j=1 Ckj−1

where,
N
1 ∑
Ctj = (yi − ft (x)) (yi − fj (xi )). (10)
N
I =1

If the weighted values are not limited, the least and optimal
weighted value of MSE should be defined as:
)−1
wt = hTit hit hTit yi .
(
(11)

where,

hit = ht (xi ) (1 ≤ i ≤ N , 1 ≤ t ≤ T ) , yi = f (x − i). (12) Fig. 4. The nonlinear relationship between some key indicators and Air Latency.
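The constrained weights of Eqs. (9) and (10) can be sketched as follows, assuming NumPy. The sketch uses the normalised form in which the row sums of $C^{-1}$ are divided by the sum of all its entries, so that the weights add up to 1. The function name `optimal_weights` and the three noisy learners are hypothetical and serve only as an illustration.

```python
import numpy as np

def optimal_weights(preds, y):
    """preds: (T, N) array of learner outputs f_t(x_i); y: (N,) targets."""
    preds = np.asarray(preds, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = y - preds                       # row t holds y_i - f_t(x_i)
    C = errors @ errors.T / y.size           # error correlation matrix, Eq. (10)
    C_inv = np.linalg.inv(C)
    return C_inv.sum(axis=1) / C_inv.sum()   # normalised row sums, Eq. (9)

# Hypothetical learners: the target plus noise of different strength
rng = np.random.default_rng(1)
y = rng.normal(size=500)
preds = np.stack([y + s * rng.normal(size=500) for s in (0.2, 0.5, 1.0)])

w = optimal_weights(preds, y)
mse_ensemble = np.mean((y - w @ preds) ** 2)
mse_single = np.mean((y - preds) ** 2, axis=1)
```

Because a single learner corresponds to a weight vector with one entry equal to 1, the in-sample MSE of the weighted ensemble cannot exceed that of the best individual learner.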

(2) Dynamic weighted combination
For dynamic weighted combination, the given training dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$ is usually divided further into the training dataset $D_{\mathrm{train}}$ and the validation dataset $D_{\mathrm{validation}}$. $D_{\mathrm{train}}$ is used to train the individual learners; $D_{\mathrm{validation}}$ is used to judge the terminating condition of individual learner training and for the subsequent selective ensemble. First, a group of individual learners $F_0 = \{f_t\}_{t=1}^{T}$ is trained on $D_{\mathrm{train}}$, where $T$ is the number of learners. For a given new sample $x_{\mathrm{new}}$, the final output of the ensemble can be expressed as:

$$y_{\mathrm{new}} = \sum_{t=1}^{T} w_t f_t(x_{\mathrm{new}}). \tag{13}$$

In dynamic weighted combination, the samples nearest to the new sample $x_{\mathrm{new}}$ are selected from the validation dataset $D_{\mathrm{validation}}$. The prediction errors of each learner $f_t$ in $F_0$ on these nearest neighbor samples are taken as the basis for determining the combination weights. Under the assumption that the prediction errors of the learners on the nearest neighbor samples are similar to their prediction errors on $x_{\mathrm{new}}$, the combination weight can be expressed as:

$$w_t = h\big(D_{\mathrm{validation}}, x_{\mathrm{new}}, D_{\mathrm{near}}(x_{\mathrm{new}})\big) \tag{14}$$

where $D_{\mathrm{near}}(x_{\mathrm{new}})$ is the nearest neighbor sample set of $x_{\mathrm{new}}$ in $D_{\mathrm{validation}}$. The combination method based on dynamic weights mainly consists of three steps: (1) determine the nearest neighbor sample set of the new sample $x_{\mathrm{new}}$ in $D_{\mathrm{validation}}$; (2) evaluate the prediction error of each learner in $F_0$ on $D_{\mathrm{near}}(x_{\mathrm{new}})$; (3) determine the combination weights according to the prediction error of each learner on $D_{\mathrm{near}}(x_{\mathrm{new}})$.

3.2. Model testing

Based on the MR wireless measurement reports and signaling data of cells in a certain region from March 16, 2016 to May 16, 2016, an ensemble machine learner is developed by integrating six kinds of regression algorithms. The error rates of the six single sub-learners and of the two ensemble learners (weighted combination and dynamic weighted combination) on the training and testing sets are displayed in Fig. 3.

Fig. 3 indicates that the error rates of the two ensemble machine learners are lower than those of the other six algorithms. The weighted combination algorithm performs better than the dynamic weighted combination algorithm and shows a more significant learning effect. By applying the ensemble machine learner, the nonlinear relationship between some key indicators and Air Latency is obtained.

Fig. 4 indicates that the turning point of the curve lies in the interval [−5, −3]. When the RSRQ value surpasses −4.5, the Air Latency indicator declines sharply. Thus, if the RSRQ indicator surpasses a certain threshold, users' Internet perception will be influenced.

4. Anomaly detection model for data services based on temporal characteristics

The research and analysis of Section 2 shows that, in addition to air latency, the main influential factors of "generalization latency" also include the export latency from the core network to the application server. Latency fluctuation of the application servers of data-based services will directly affect users' experience of accessing

data-based services. Service consumption is becoming more diversified, with data-based services such as web browsing, video communication and streaming, and the different content types in these services have different latency requirements. Moreover, because the traffic load differs across time periods, the latency differs as well. When big data of telecommunications is collected, both the traffic load and the latency of data services are collected at all times; therefore, temporal information is usually collected simultaneously. For example, a value that is normal during peak hours may be abnormal in other periods and yet go undetected. When time characteristics are added to a model, periodic behavior along the time axis can be observed. Because both the weight of different data-service indicators and the trend of the indicators over time matter, anomaly detection algorithms based on a single indicator with a designated threshold are no longer applicable.

Telecommunication data services are affected by time, physical and data-service scenarios, and various other factors. Based on these characteristics, a modified Gaussian mixture model that introduces temporal characteristics is researched and developed in order to mine data services with abnormal latency. The algorithm not only makes full use of the ability of mixtures of Gaussians to adapt to complex scenarios, but also introduces the temporal characteristic value into the model, which improves detection accuracy.

4.1. Presentation of models

The anomaly detection model [11] studied in this paper is a new model that, based on the Gaussian mixture model, introduces the correlation between data features and time properties. The relationship between the latent cluster and the time axis is also introduced. Meanwhile, only the correlation among the variables is considered. In other words, once the latent cluster Z is known, the data no longer depend on time. To facilitate calculation, the data and the time axis are assumed to be conditionally independent given the cluster category Z. The detailed model steps are as follows.

Network access indicator data of a telecom operator from December 15, 2015 to July 26, 2016 is taken as the example. Effective sample data is obtained after removing the zero and missing values of page display latency (feature 1), http success ratio (feature 2) and tcp response success ratio (feature 3). The first fifty thousand records of the three indicators from June 29 to July 26 are taken as the modeling sample. X denotes the indicator dataset, which contains N values indexed by i. Each value is a p-dimensional vector, where p is the number of features, and each feature value is assumed to be continuous. D denotes the time-axis category, which also contains N values. Because the period is one day and each value $d_i$ corresponds to an hour of the day, the values are taken from {0, ..., 23} (see Table 3).

Table 3
The start-stop signaling of the air signaling control plane and user plane.
Time          Feature 1   Feature 2   Feature 3   X                      D
06/29 0:00    340         80.85       60.87       (340, 80.85, 60.87)    0
06/29 0:30    321         84.54       89.21       (321, 84.54, 89.21)    0
...           ...         ...         ...         ...                    ...
07/26 23:30   368         76.38       82.70       (368, 76.38, 82.70)    23

The anomaly detection model proposed in this paper is based on the Gaussian mixture model, which is an extension of a single Gaussian probability density function. Each point is generated by a single Gaussian distribution, and the whole group of data is generated by M single Gaussian models [12]; which single Gaussian model a specific data point belongs to is unknown, and the proportion $\alpha_k$ of each single Gaussian model in the mixture is also unknown. Because the data points from different distributions are mixed together, the distribution is called a Gaussian mixture model. The EM algorithm is generally used to estimate the parameters of the Gaussian mixture model [13].

The model in this paper is a new model that introduces the correlation between the data features and the time feature into the traditional Gaussian mixture model. For each $s \in \{0, \ldots, 23\}$, if $D_i = s$, the introduction of the time feature $D_i$ creates a correlation between the time feature $s$ and the probability that each record belongs to the cluster $Z_i = k \in \{1, \ldots, K\}$; this probability is defined as $\alpha_{k,s}$. Once the latent cluster Z is known, the data no longer depend on time. To simplify calculation, the network access indicator data and time are assumed to be conditionally independent given the cluster category Z. Each variable $(X_i \mid Z_i = k)$ follows a Gaussian distribution with mean $\mu_k$ and covariance $\Sigma_k$, and for all $i$, $P(X_i \mid D_i, Z_i) = P(X_i \mid Z_i)$.

In order to solve the problem, the following decomposition is used (the independence hypothesis given the category Z is applied to the first factor of the sum):

$$P(X_i \mid D_i = s) = \sum_{k} P(X_i \mid Z_i = k)\, P(Z_i = k \mid D_i = s). \tag{15}$$

The EM algorithm is generally used for parameter estimation. Because the time feature $D_i = s$ is introduced into the Gaussian mixture model, the model parameter $\alpha_{k,s}$ satisfies:

$$\sum_{k} \alpha_{k,s} = 1. \tag{16}$$

The probability density function can be expressed as a weighted sum:

$$P(X_i \mid D_i = s) = \sum_{k} \alpha_{k,s}\, N_k(X_i; \mu_k, \Sigma_k). \tag{17}$$

$N(x; \mu, \Sigma)$ denotes the Gaussian density with parameters $\mu$ and $\Sigma$. $I_s$ denotes the set of subscripts $i$ for which the time feature $d_i = s$.

$$N_k(X_i; \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{p/2} |\Sigma_k|^{1/2}} \exp\Big[-\frac{1}{2}(X_i - \mu_k)^{T} \Sigma_k^{-1} (X_i - \mu_k)\Big]. \tag{18}$$

In order to obtain the final parameters, the algorithm is described as below:

Step 1: for all $k$ and $i$, given $X_i = x_i$ and $D_i = d_i$, calculate the posterior probability that $Z_i = k$. Denoting this posterior probability by $\beta_{k,i}$,

$$\beta_{k,i} = \frac{N_k(X_i; \mu_k, \Sigma_k)\, \alpha_{k,d_i}}{\sum_{l=1}^{K} N_l(X_i; \mu_l, \Sigma_l)\, \alpha_{l,d_i}}. \tag{19}$$

Step 2: for all $k$ and $s$, calculate $S_{k,s}$:

$$S_{k,s} = \sum_{j=1}^{\# I_s} \beta_{k, I_s(j)}. \tag{20}$$

Step 3: for all $k$ and $s$, update the probability $\alpha_{k,s}$:

$$\alpha_{k,s} = \frac{S_{k,s}^{(t)}}{\sum_{l=1}^{K} S_{l,s}^{(t)}}. \tag{21}$$

Step 4: for all $k$, update the mean value $\mu_k$:

$$\mu_k = \frac{\sum_{i=1}^{N} \beta_{k,i}\, x_i}{\sum_{i=1}^{N} \beta_{k,i}}. \tag{22}$$

Step 5: for all $k$, update the covariance matrix:

$$\Sigma_k = \frac{\sum_{i=1}^{N} \beta_{k,i}\, (x_i - \mu_k)(x_i - \mu_k)^{T}}{\sum_{i=1}^{N} \beta_{k,i}}. \tag{23}$$

The algorithm based on the Gaussian mixture model [14] that introduces the time feature has three kinds of parameters. Because the mean value $\mu_k$ $(k = 1, \ldots, K)$ is the same for each D, there are K parameters in total; because the variance $\Sigma_k$ $(k = 1, \ldots, K)$ is the same for each D, there are K parameters in total; because the weights of each category, $\alpha_{k,s} = P(Z_i = k \mid D_i = s)$, satisfy $\sum_k \alpha_{k,s} = 1$, there are $D(K - 1)$ parameters in total. The model therefore has $2K + DK - D$ parameters. A Gaussian mixture model that is fitted separately for each value of D has $D(3K - 1)$ parameters. When $D > 1$, $3DK - D > 2K + DK - D$, so the model proposed in this paper has fewer parameters.
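Steps 1 to 5 above can be sketched in code as follows. This is a minimal illustration, assuming NumPy, not the authors' implementation: the function names, the random initialisation, the small regularisation term `reg` and the toy data are all assumptions. The mean and covariance updates use the standard responsibility-weighted mean and outer product.

```python
import numpy as np

def gaussian_pdf(X, mu, cov):
    """Multivariate normal density N(x; mu, cov) for every row of X."""
    p = X.shape[1]
    diff = X - mu
    expo = -0.5 * np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return np.exp(expo) / np.sqrt((2.0 * np.pi) ** p * np.linalg.det(cov))

def fit_time_gmm(X, d, K, S, n_iter=50, seed=0, reg=1e-6):
    """EM with shared (mu_k, Sigma_k) and time-dependent weights alpha[k, s]."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    mu = X[rng.choice(N, size=K, replace=False)].copy()   # assumed initialisation
    cov = np.stack([np.cov(X.T).reshape(p, p) + reg * np.eye(p)] * K)
    alpha = np.full((K, S), 1.0 / K)
    for _ in range(n_iter):
        # Step 1: posterior beta[k, i] that record i belongs to cluster k
        dens = np.stack([gaussian_pdf(X, mu[k], cov[k]) for k in range(K)])
        beta = dens * alpha[:, d]
        beta /= beta.sum(axis=0, keepdims=True)
        # Steps 2-3: sum posteriors inside each time slot and renormalise
        for s in range(S):
            in_slot = d == s
            if in_slot.any():
                S_ks = beta[:, in_slot].sum(axis=1)
                alpha[:, s] = S_ks / S_ks.sum()
        # Steps 4-5: update the shared means and covariance matrices
        for k in range(K):
            w = beta[k]
            mu[k] = w @ X / w.sum()
            diff = X - mu[k]
            cov[k] = (w[:, None] * diff).T @ diff / w.sum() + reg * np.eye(p)
    return alpha, mu, cov

def likelihood(X, d, alpha, mu, cov):
    """Mixture density of each record given its own time slot (Eq. 17)."""
    dens = np.stack([gaussian_pdf(X, mu[k], cov[k]) for k in range(len(mu))])
    return (dens * alpha[:, d]).sum(axis=0)

# Hypothetical toy data: slot 0 is centred at (0, 0), slot 1 at (6, 6)
rng = np.random.default_rng(42)
d = rng.integers(0, 2, size=400)
X = np.where(d[:, None] == 0, [0.0, 0.0], [6.0, 6.0]) + rng.normal(size=(400, 2))
alpha, mu, cov = fit_time_gmm(X, d, K=2, S=2)
scores = likelihood(X, d, alpha, mu, cov)
```

With such data, a record that looks ordinary in one time slot would be expected to receive a much lower likelihood when it appears in the other slot, which is the context-dependent behaviour the model is designed to capture.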

4.2. Calculation and results

Data originates from telecom operators, and network access indicator data from December 15, 2015 to July 26, 2016 is taken as the example. Effective sample data is obtained after removing all zero and missing values of page display air latency (feature 1), http success ratio (feature 2) and tcp response success ratio (feature 3). In data processing, the records with missing values are deleted first. Moreover, only the numerical values and the time axis are considered in the calculation; the cell ID is not considered. Some feature values are non-negative and have a skewed distribution, so pre-treatment is conducted on some features with algorithms. In order to maintain the interpretability of the variables, feature normalization is not applied.

4.2.1. Anomaly detection effect based on the model

Taking the two-dimensional data consisting of http success rate and page display latency as the example, the samples are grouped into six categories and time is divided into three parts: (0, 7), (8, 15) and (16, 23). The six categories separately include all the points in the section of (0, 100), in the section of {0} × [0, 100), in the section of (0, 100] × {100}, in the section of [2, 7] × [70, 95], in the section of [2, 9] × [60, 75] and in the section of [2, 9] × [20, 60]. Fig. 5 shows the algorithm results for three sets of two-dimensional variables. Fig. 6 shows the algorithm results for three-dimensional variables.

The results show that the algorithm presented in this paper can detect anomalous values in a time-dependent setting. Both global anomalies and context-dependent anomalies (orange marks) can be detected. The accuracy of non-peak-hour anomaly detection is also improved, and the anomalous values corresponding to a given period can be detected. In practical application, detection can be improved directly and effectively through likelihood calculation. Within one cell, anomalous values can be detected from the repeated emergence of low likelihood scores.

As far as model computation is concerned, anomaly detection relies only on the likelihood value. Dynamic detection over consecutive likelihood values not only enhances the reliability of alarms, but also reduces the number of false alarms. In addition, models trained on fixed datasets can be extended to real-time data flows in an online mode. As a result, models can be updated quickly, thereby improving the response rate and the efficiency of anomaly detection.

Fig. 5. Results of three sets of two-dimensional variables: (a) tcprespsucpercent & webdelaytime; (b) tcprespsucpercent & httpsucpercent; (c) httpsucpercent & webdelaytime.
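The idea of raising an alarm only when low likelihood scores occur repeatedly in one cell can be sketched as follows. The function name, the threshold and the run length `min_run` are illustrative assumptions; the paper does not specify them.

```python
def repeated_low_likelihood(scores, threshold, min_run=3):
    """Return the start index of every run of at least `min_run`
    consecutive scores that fall below `threshold`."""
    alarms = []
    run = 0
    for i, score in enumerate(scores):
        if score < threshold:
            run += 1
            if run == min_run:          # report each run once, at its start
                alarms.append(i - min_run + 1)
        else:
            run = 0
    return alarms

# Hypothetical likelihood scores of one cell over consecutive time steps
cell_scores = [0.30, 0.01, 0.28, 0.02, 0.01, 0.005, 0.31, 0.02]
alarms = repeated_low_likelihood(cell_scores, threshold=0.05, min_run=3)
```

In this example the isolated low score at index 1 is ignored, while the run starting at index 3 triggers a single alarm, which illustrates how requiring consecutive low scores can reduce false alarms.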

4.2.2. Comparison with results of K-MEANS, Gaussian mixture algorithms

The K-MEANS model, the Gaussian mixture model and the algorithm presented in this paper are applied to the sample data. The three models are trained separately and the likelihood value of each model is calculated. In the sample set, three anomalous values are expected to be discovered, and the three points whose likelihood values are the lowest in each model are defined as anomaly points. In the clustering process, the number of clusters K is 5.

The results are shown in Fig. 7. In Fig. 7(a), modeling is conducted on the whole dataset according to the K-MEANS model and no anomaly points are found; in Fig. 7(b), according to the Gaussian mixture model, each time period is trained and only the anomaly point at 20:00 is discovered; in Fig. 7(c), with the algorithm presented in this paper, four anomaly values are successively detected.

Although the Gaussian mixture model and the model introduced in this paper achieve similar detection results, they differ in their anomaly detection methods. The model introduced in this paper takes into account both the time axis and the data values, that is, it estimates all parameters simultaneously. Therefore, consecutive dates have similar clustering features. The Gaussian mixture model, in contrast, trains the parameters of each short period of time separately and identifies no relations between the clusters of different time periods. The model introduced in this paper allows users to adjust the number of clusters per period. For example, the number of clusters specified can differ among time periods. In the Gaussian mixture model, however, the number of clusters for each short period of time is specified [15]. This model, which has been applied in practice by telecom operators, is mainly used to detect the quality of various data services. Take a data service with a total of 4 million users as an example. After de-duplication, 50 million different URL addresses are visited every day, which involves a large amount of data. When this model is applied, telecom operators can set a warning threshold for anomaly detection [16]. The detection method is static and simple, and it improves detection efficiency directly and effectively through likelihood calculation.

Fig. 6. Results of three-dimensional model algorithm.

Fig. 7. The anomaly detection result diagram of the three model algorithms: (a) the result of K-MEANS model; (b) the result of Gaussian mixture model; (c) the result of algorithm modeling presented in this paper.
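The selection rule used in this comparison, where the points with the lowest likelihood values are declared anomalies, can be sketched as follows. The function name and the example scores are hypothetical.

```python
def lowest_likelihood_points(scores, n_anomalies=3):
    """Return the indices of the `n_anomalies` samples with the
    smallest likelihood values, lowest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[:n_anomalies]

# Hypothetical likelihood values produced by one of the fitted models
scores = [0.21, 0.004, 0.18, 0.25, 0.0007, 0.19, 0.012, 0.22]
anomalies = lowest_likelihood_points(scores, n_anomalies=3)
```

Applying the same rule to the likelihood values of each model keeps the comparison fair, since every model is asked for the same number of candidate anomalies.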
5. Conclusions

Based on user perception, this paper has proposed the theory of generalization latency of telecommunication networks and has analyzed in depth its main influential factors, which are mainly air latency and the export latency from the core network to the application server. A mapping model is worked out that associates generalization latency with the performance indicators of the telecommunication network under different application service scenarios. The model can effectively map and forecast air latency while saving the large-scale investment required to collect air signaling, and can consequently predict the abnormal inflection point of network performance. In addition, an anomaly detection model for data services that introduces temporal characteristics is worked out. It can effectively detect anomalies in the performance stability of the application layer under the interleaved influence of temporal scenarios, application service scenarios and various other factors. The experimental results indicate that the proposed algorithm has high efficiency and correlation.

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant 61571355.

References

[1] R. Cohen, G. Grebla, Joint scheduling and fast cell selection in OFDMA wireless networks, IEEE/ACM Trans. Netw. 23 (1) (2015) 114–125.
[2] Alexander Engels, Michael Reyer, Xiang Xu, Rudolf Mathar, Jietao Zhang, Hongcheng Zhuang, Autonomous self-optimization of coverage and capacity in LTE cellular networks, IEEE Trans. Veh. Technol. 62 (5) (2013) 1989–2004.
[3] David Hsu, Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data, Appl. Energy 160 (3) (2015) 153–163.
[4] Song He, Haochen He, ICM: a web server for integrated clustering of multi-dimensional biomedical data, Nucleic Acid Res. 44 (W1) (2016) W154.
[5] L. Akoglu, H. Tong, D. Koutra, Graph based anomaly detection and description: a survey, Data Min. Knowl. Discov. 29 (3) (2014) 626–688.
[6] R. Li, Z. Wang, C. Gu, F. Li, H. Wu, A novel time-of-use tariff design based on Gaussian mixture model, Appl. Energy 162 (2016) 1530–1536.
[7] D. Agarwal, Detecting anomalies in cross-classified streams: a Bayesian approach, Knowl. Inf. Syst. 11 (1) (2007) 29–44.
[8] L. Chen, J. Zheng, Selective transfer learning for cross domain recommendation, in: SDM, 2013, pp. 641–649.
[9] M.T. Chiang, B. Mirkin, Intelligent choice of the number of clusters in k-means clustering: an experimental study with different cluster spreads, J. Classification 27 (1) (2010) 3–40.
[10] Alexi Delgado, I. Romero, Environmental conflict analysis using an integrated grey clustering and entropy-weight method, 77(C), 2016, 108–121.
[11] A. Patcha, J.M. Park, An overview of anomaly detection techniques: Existing solutions and latest technological trends, Comput. Netw. 51 (12) (2007) 3448–3470.
[12] A.K. Jain, Data clustering: 50 years beyond K-means, Pattern Recognit. Lett. 31 (8) (2010) 651–666.
[13] D. Hsu, Identifying key variables and interactions in statistical models of building energy consumption using regularization, Energy 83 (4) (2015) 144 5.
[14] D. Wang, Integrated dynamic evaluation of depletion-drive performance in naturally fractured-vuggy carbonate reservoirs using DPSOFCM clustering, Fuel 181 (2016) 996–1010.
[15] J.D. Banfield, A.E. Raftery, Model-based Gaussian and non-Gaussian clustering, Biometrics 49 (3) (1993) 803–821.
[16] Asiya Khan, Lingfen Sun, QoE prediction model and its application in video quality adaptation over UMTS networks, IEEE Trans. Multimedia 14 (2) (2012) 431–442.

Yan Wang received the Bachelor's degree in communication engineering from Xidian University and the M.S. degree in communication and information systems from Huazhong University of Science and Technology, China, in 2004 and 2007, respectively. Since 2015, she has been working towards the Ph.D. degree at Xidian University, Xi'an, China. Her current work concerns the big data analysis of mobile communication.

Zhensen Wu (M'97–SM'04) received the B.Sc. degree in applied physics from Xi'an Jiaotong University, Xi'an, China, in 1969 and the M.Sc. degree in space physics from Wuhan University, Wuhan, China, in 1981. He is currently a Professor at Xidian University, Xi'an, China. From 1995 to 2001, he was invited multiple times as a Visiting Professor to Rouen University, France, for implementing joint study of two projects supported by the Sino-France Program for Advanced Research. His research interests include electromagnetic and optical waves in random media, optical wave propagation and scattering, and ionospheric radio propagation.

Yuanjian Zhu received the Bachelor's degree in information engineering from Jilin University and the M.S. degree in electromagnetic and microwave technology from South East University, China, in 2007 and 2010, respectively.

Pei Zhang received the Bachelor's degree in communication engineering and the M.S. degree in communication and information systems from Nanjing University of Posts and Telecommunications, China, in 2009 and 2012, respectively.
