
JID: NEUCOM

ARTICLE IN PRESS [m5G;January 3, 2020;16:18]


Neurocomputing xxx (xxxx) xxx

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Evaluating QoE in VoIP networks with QoS mapping and machine learning algorithms
ZhiGuo Hu a,b,∗, HongRen Yan a, Tao Yan a,b, HaiJun Geng a,c, GuoQing Liu a
a Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
b School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
c School of Software Engineering, Shanxi University, Taiyuan 030006, China

Article info

Article history:
Received 26 February 2019
Revised 19 October 2019
Accepted 16 December 2019
Available online xxx

Communicated by Dr F.A Khan

Keywords:
Quality of Service
VoIP
Association Test
Quality of Experience
Machine Learning

Abstract

The quality of experience (QoE) of end-users is a critical measurement criterion in VoIP (Voice over Internet Protocol) systems for both technical and commercial purposes. We investigate how quality of service (QoS) influences QoE and assess the QoE in VoIP communication. Our contributions are three-fold. First, the impacts of QoS on QoE are comprehensively analyzed by experimental means and an association test method, instead of studying each parameter independently. Second, an algorithm is proposed to integrate the effects of QoS parameters with spatial or temporal characteristics on QoE. Third, we apply machine learning regression algorithms with QoS impairments, noise and echo impairments to nonintrusive voice quality prediction in different network environments. The results from numerous experiments show that fairly accurate prediction can be obtained from these models. Our work achieves a more accurate evaluation of the QoE in VoIP by using QoS parameters, clarifies the influence of IP network environments, noise and echo impairments on the quality and reliability of VoIP traffic, and provides QoS parameter requirements for VoIP applications that run at the desired QoE level.

© 2019 Elsevier B.V. All rights reserved.

1. Introduction

VoIP (Voice over IP) is the transmission of voice over IP (Internet Protocol) networks and is one of the main applications of the Internet. High-quality VoIP services are prevalent in IP networks; they boost users' working efficiency and facilitate their daily communication [1]. However, the IP network offers only best-effort service, and quality of service (QoS) impairments such as packet loss, delay, and jitter have substantial impacts on VoIP quality. Since QoS cannot directly characterize a user's subjective perception, the quality of experience (QoE) has been proposed to measure the quality of a particular service or network [2].

QoE subjectively indicates a user's perception and satisfaction [3]. It can be expressed by the 5-level mean opinion score (MOS): excellent, good, fair, poor, and bad (scored 5, 4, 3, 2, and 1, respectively). For example, an MOS of 5 indicates that an application or service provides QoE at the "excellent" level for end users; equivalently, the impairments of the application or service remain at an "imperceptible" level. Table 1 lists the MOS scores and their correspondence to end-user satisfaction.

QoS, by contrast, is a technical concept whose statistics tell end-users very little about the level of QoE. Nevertheless, QoS strongly influences the performance of IP network applications and services: good QoS generally means less packet loss and jitter and more bandwidth, and it is one of the prerequisites for guaranteeing the desired QoE. ITU-T Y.1541, which defines QoS classes for multiple categories of applications (multimedia conferencing, digital video, interactive data transfer, etc.) as shown in Table 2, is intended to serve as a standard for SLAs (service level agreements) between end-users and network service providers [5]. It can be used to judge whether QoS is good or bad, but it does not establish a direct mapping to QoE.

Even though a series of studies have been carried out to bridge QoS and QoE [6–9], the impact of QoS on QoE for VoIP is not well understood [10].

The majority of previous studies have concentrated on the impairments brought to QoE by two parameters, packet loss and delay. However, these studies disregard the role of jitter and bandwidth; the temporal and spatial characteristics of network impairments have not been clearly stated in these studies; the importance of each impairment that affects user satisfaction has not

∗ Corresponding author at: Institute of Big Data Science and Industry, Shanxi University, No.92 Wucheng Rd., Taiyuan, Shanxi province, China.
E-mail address: huzhiguotj@sxu.edu.cn (Z. Hu).

https://doi.org/10.1016/j.neucom.2019.12.072
0925-2312/© 2019 Elsevier B.V. All rights reserved.

Please cite this article as: Z. Hu, H. Yan and T. Yan et al., Evaluating QoE in VoIP networks with QoS mapping and machine learning
algorithms, Neurocomputing, https://doi.org/10.1016/j.neucom.2019.12.072

been distinguished, and the respective threshold ranges of distinct parameters for a specific QoE level have not been provided; and the models that map QoS parameters to QoE do not treat these parameters holistically. Employing machine learning algorithms to evaluate QoE is a common method, but the selection of an appropriate learning method remains an unresolved issue.

Table 1
MOS and its correspondence to perception of end users [3,4].

Score   Quality     Perception (description of impairment)
5       Excellent   Imperceptible
4       Good        Perceptible, but not annoying
3       Fair        Slightly annoying
2       Poor        Annoying
1       Bad         Very annoying

In this paper, instead of checking each QoS parameter independently, the authors analyze the joint impact of these impairments on QoE and determine the threshold ranges of different parameters for each QoE level. Applying the maximal information coefficient (MIC) [11] and the distance correlation algorithm [12], we find that the jitter parameter has a strong nonlinear correlation with QoE. The temporal and spatial characteristics of QoS are then introduced to obtain a more reasonable integration of the parameter effects on QoE, which provides a perceptually accurate and nonintrusive voice quality prediction that avoids the need for time-consuming subjective tests. We identify the best regression learning algorithm and machine learning algorithm for predicting or screening voice quality.

The rest of the paper is organized as follows. Section 2 reviews the literature on VoIP QoE evaluation. In Section 3, an experiment platform is designed to emulate VoIP traffic, and the spatial and temporal characteristics of QoS impairments are described. Section 4 presents our method to estimate the listening voice quality and the conversational voice quality. In Section 5, machine learning regression algorithms are developed for predicting the QoE in VoIP applications. Finally, we draw conclusions and outline prospective work in Section 6.

2. Related work

In real-time voice communications, the MOS test has been extensively regarded as the QoE rating standard [13]. The test has five score levels: bad-1, poor-2, fair-3, good-4, and excellent-5. MOS values are the result of objective or subjective measurement [14]. However, subjective MOS measurements are time-consuming, expensive, and irreproducible, which makes objective methods attractive. Examples of intrusive objective methods are the Perceptual Speech Quality Measure (PSQM) [15], Measuring Normalizing Blocks (MNB) [16], the Perceptual Analysis Measurement System (PAMS) [17], and Perceptual Evaluation of Speech Quality (PESQ) [18]. Although intrusive objective methods correlate highly with subjective tests, they evaluate speech quality by comparison with the original speech and are therefore not practical tools for monitoring live traffic. The E-Model is the most extensively applied nonintrusive objective assessment method; it was originally designed for conventional network planning [19]. In [19], impairments of several types are fused by the E-Model into the rating factor R, a perceptual impairment scale derived as

$$ R = R_0 - I_s - I_d - I_{e\text{-}eff} + A \tag{1} $$

where $R_0$ is the basic signal-to-noise ratio, with the noise referring to circuit noise and background noise; $I_s$ is the impairment that occurs more or less simultaneously with the voice signal, caused by a collection of factors; $I_d$ is the impairment caused by delay; $I_{e\text{-}eff}$ (or $I_e$) is the effect of information loss due to the encoding scheme and packet loss; and $A$ is the advantage factor that adjusts the quality value [19]. A higher R indicates better voice quality. The E-Model is a combination of empirically determined formulae that apply only to restricted network conditions and a limited number of codecs. Moreover, the impairment caused by jitter is not taken into consideration in this model.

Most studies are primarily concerned with the effect of packet loss [20–26]. The method proposed in [20] evaluated the influence of packet loss on Skype quality by a set of formulae derived from subjective MOS measurements and a simplified E-Model. In [21], voice quality for the Thai language is estimated using ACR listening opinion tests, where the G.726 and G.729 codecs and random packet loss were employed. In [22,23], the authors addressed the effect of bursty losses on VoIP; in particular, [23] derived a method of adjusting the conditional loss probability of the Gilbert loss model as the packet interval varies. In [24], the authors primarily scrutinized the effect of packet dispersion (noticeable loss rate) on the quality of VoIP applications. In [25,26], by considering the Weber-Fechner Law (WFL) and the IQX hypothesis (exponential interdependency of QoE and QoS), the authors use logarithmic regression

$$ QoE = \log(a \cdot QoS + b) \tag{2} $$

and exponential regression

$$ QoE = a\,e^{-b \cdot QoS} + \gamma \tag{3} $$

to describe the relation between QoE and packet loss. However, WFL and IQX have only been employed with a single input parameter/metric.

Other studies examined the effect of packet loss and delay on QoE using ITU's E-Model [27–29]. In [27], after assessing the voice quality of the PCM and G.728 codecs under varying packet loss and delay, the authors observed that the correlation between packet loss and delay was not linear. In [28], the performance of several Internet backbone links was evaluated; the analysis indicated that VoIP quality depends substantially not only on the link quality of the provider but also on the playout buffer scheme. An approach that considers the interactivity of voice communications was developed by Sun and Ifeachor [29], where the E-Model and PESQ are combined to evaluate voice quality, and a nonlinear regression model of voice quality is proposed:

$$ MOS_C = a + bx + cy + dx^2 + ey^2 + fxy + gx^3 + hy^3 + ixy^2 + jx^2y \tag{4} $$

where x is the packet loss rate and y is the end-to-end delay. However, no assessment was performed in [29] of how jitter and the jitter buffer relate to VoIP QoE. In

Table 2
IP network QoS class definitions and network performance objectives [5].

QoS parameter   Class 0    Class 1    Class 2    Class 3    Class 4    Class 5
IPTD            100 ms     400 ms     100 ms     400 ms     1 s        U
IPDV            50 ms      50 ms      U          U          U          U
IPLR            1 × 10⁻³   1 × 10⁻³   1 × 10⁻³   1 × 10⁻³   1 × 10⁻³   U

Note: "U" means "unspecified" or "unbounded"; IPTD: IP packet transfer delay; IPDV: IP packet delay variation; IPLR: IP packet loss ratio.


[30], the voice quality of a cloud-based trading communication system was evaluated with the simplified E-Model, showing that delay impairment has less negative impact on voice quality than jitter and packet loss rate do. In [31], based on native Thai users, a simplified E-Model was proposed for VoIP quality assessment of the G.729 codec; the authors suggested that the jitter impairment factor should be studied further.

In [32], the equipment impairment factor of the E-Model was extended by using an artificial neural network (ANN). In addition, using machine learning algorithms to establish the mapping between QoS and QoE is an important direction of current research. For example, genetic programming [33], deep belief networks [34], active learning [35] and fuzzy evidence theory [36] have been applied. In [37], for a web browsing service, the authors analyze the relationship between QoS and web QoE, and predict user expectations of the network state using several machine learning algorithms.

Charonyktakis et al. proposed the MLQoE method, which applies multiple machine learning algorithms (ANN, Support Vector Machine, Decision Trees, etc.) and nested cross-validation to evaluate VoIP QoE [11]. Although MLQoE outperforms other methods, the precision of the data set used in the experiment deviates considerably from the accuracy suggested by the ITU-T standard [38,39]. In [40], the authors point out that a clear and comprehensive manual on the available parametric models and the critical QoE performance parameters per service type is currently missing. In [41], the authors analyzed the relationship between QoS and QoE in a geostationary satellite system, but focused more on the design and use of the experimental platform than on the QoE evaluation model itself.

As mentioned above, several limitations exist, which we intend to overcome in this paper. First, most work concentrates on the effect of packet loss on QoE, and few studies explore the influence of jitter and jitter buffer variation on QoE. Second, there are hardly any mapping models that can be used directly to monitor QoE from multiple QoS parameters simultaneously.

3. Experiment description

To enable a more controlled comparative analysis, our first step is to build a VoIP test bed. Although experiments performed under real network conditions can be convincing, they are uncontrollable, unrepeatable and costly, since they inevitably involve a large number of pieces of equipment and service providers. To circumvent these disadvantages, the test bed described in this section is specifically designed to provide the necessary network conditions and to simplify the collection of the reference and degraded voice signals for listening/conversational voice quality measurement.

Fig. 1. Conceptual diagram of VoIP system and test-bed framework.

3.1. Experiment platform

Fig. 1 illustrates the framework of the experimental platform. The lower part of the figure illustrates various impairment factors of the VoIP components. The middle part shows the end-to-end VoIP components from sender to receiver: at the sender, the original signals are periodically sampled by the encoder, which produces a constant-bit-rate stream, and the packetizer then packs the stream into IP packets; in the network transmission component, IP packets carry the voice data across the network; at the receiver, the playback buffer provides smooth playout to alleviate delay variations (jitter), and the received packets are passed to the depacketizer and then decoded by the decoder. The top part evaluates the voice quality with PESQ and converts the result to MOS.

To generate experimental data sets that correspond to different network parameters, two main software tools (the NIST Net network emulator [42] and the OpenPhone VoIP application [43]) are employed in our tests. The VoIP application runs on the end PCs (personal computers, typically equipped with a microphone and loudspeakers); it sends voice samples from the source to the destination based on an input voice recording file and controls the encoding algorithm and the packetization interval. NIST Net is deployed on the Linux router to emulate different network transmission environments. The voice signal is sent through NIST Net, which enables us to study the application quality under various network impairments.

As shown in Fig. 1, if we do not consider the echo impairment due to devices (microphone and loudspeaker), the key impairment factors of VoIP can be divided into two types: the digital-to-analog (D/A) transformation impairment incurred when the voice signal is encoded/decoded and packetized/depacketized in a VoIP system, and the network impairment, which includes packet loss, delay, jitter, bandwidth, etc.

3.2. Network impairments and their spatial and temporal characteristics

3.2.1. Network impairments

Generally, in a VoIP system, the primary QoE impairments include packet loss, delay, jitter (or delay variation) and bandwidth restrictions. The measurement and analysis of these parameters have long been a focus of network performance research [44–47].

The packet loss is calculated as follows. Let $P_S = \{p_{s,1}, p_{s,2}, \ldots, p_{s,n}\}$ be the set of packets sent from sender to receiver, arranged in ascending order of their sending sequence. Let $P_R = \{p_{r,1}, p_{r,2}, \ldots, p_{r,m}\}$ (with $m \le n$) denote the corresponding set of packets received at the receiver. The packet loss is then $PL = 1 - |P_R|/|P_S|$. The delay is commonly called end-to-end delay, which encompasses (a) encoding and packetization delay at the sender; (b) depacketization, decoding and buffering delay at the receiver; and (c) propagation time, transmission delay, and queuing delay through a network link or network element. The encoding/decoding and packetization/depacketization delays originate from coder-related processing. Table 3 summarizes the codec-related delay of G.711 and G.729A. According to the recommendation in [48], a


de-jitter buffer adds one half of its peak delay to the end-to-end delay.

Table 3
Codec-related delay for different codecs [48].

Standard          Codec type   Frame size (ms)   Look-ahead (ms)   Mean one-way delay of coder-related processing (ms)
G.711 (64 kb/s)   PCM          0.125             0                 0.375
G.729A (8 kb/s)   CS-ACELP     10                5                 35

Jitter indicates the statistical variance of packet inter-arrival time. Different descriptions of jitter exist: (a) the standard deviation of the delay; (b) the mean deviation of the packet spacing change according to RFC 1889 [49]; and (c) the inter-packet delay variation according to RFC 3393 [50]. In NIST Net, jitter is simulated as the standard deviation of the delay. Although the bandwidth requirements of a single VoIP application are relatively low (usually below 64 kbps of voice data), bandwidth is an important factor that cannot be disregarded, because packet loss, jitter, and delay are intrinsically related to the bandwidth. The required bandwidth can be controlled by an appropriate choice of codec. In an IP network, the bandwidth sufficient to carry a voice stream depends primarily on the encoding type, the sample period and the IP/UDP/RTP/Ethernet headers. Consider the G.711 codec as an example. Each packet is sent in one Ethernet frame; the payload for G.711 encoding with a 20 ms sample period is 160 octets; and the IP/UDP/RTP/Ethernet headers add a fixed 66 octets to the payload. Thus, a total of 226 bytes per packet is required; at 50 packets per second (one every 20 ms), the transmission requires a one-way bandwidth of 226 × 8 × 50 bit/s = 90.4 kbps. Similarly, the bandwidth requirement for G.729A coding is 34.4 kbps.

3.2.2. Spatial and temporal-related impairments of QoS parameters

In VoIP, the speech stream is divided into small frames, which are transported in IP packets. These frames are then reassembled into a continuous stream at the destination according to their specific spatial and temporal order. In practice, jitter and packet loss change the distribution of the voice packets instead of preserving the constant inter-packet gap and orderly sequence.

Fig. 2. Spatial and temporal-related impairments.

Fig. 2.a–c illustrates how packet loss and jitter change the spatial distribution of packets. In the figures, $s_i$ and $r_i$ are the sending and arrival times of packets, respectively, and $p_i$ is the playout time of packets in the jitter buffer. Two cases exist: in the first case, packets are discarded during network transmission due to bit errors, congestion, etc. (Fig. 2.a); in the second case, packets are dropped by the jitter buffer because they arrive later than the scheduled playout deadline (Fig. 2.b). As a consequence of jitter in a stream of packets, packet reordering may occur (Fig. 2.c). Suppose the packet stream $p_1, p_2, p_3, p_4, p_5, p_6, p_7$ is sent in that order. If $p_2$, $p_5$, and $p_6$ do not arrive in the expected order while the others do, the received stream may become $\{p_1, p_3, p_2, p_4, p_7, p_5, p_6\}$. In this case, $p_2$, $p_5$, and $p_6$, or even $p_3, p_2, p_4, p_7, p_5$, and $p_6$, are discarded by the jitter buffer algorithm. The spatial characteristics of the packets are destroyed, which means that less information is delivered and user-perceived quality is reduced. This phenomenon is termed the spatial-related impairment of QoS parameters. Since modified jitter buffer algorithms can enhance the overall user-perceived quality by handling subsequent packets and out-of-order packets differently, they have been studied intensively [51–54].

Temporal-related impairments are related to the end-to-end delay in a VoIP system (Fig. 2.d). Longer end-to-end delay increases the potential for voice echo and the response time, and lowers conversational quality.

3.3. Mean opinion score and metrics

The VoIP QoE can be estimated by subjective or objective measurement, but a subjective test according to ITU-T P.800 must be conducted with at least 100 interviewees and under special lab conditions regarding room size, noise level, and microphone position. Although subjective methods measure VoIP QoE with the best accuracy, they are expensive, lack repeatability, and are often impractical. Instead, objective methods such as PESQ and the E-Model are the most commonly employed models. The resulting values of PESQ and the E-Model can be converted to MOS under ITU-T Rec. P.862.1 and ITU-T Rec. G.107, respectively.
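As a minimal sketch of the two conversions just mentioned, the Python functions below transcribe the PESQ-to-MOS mapping of ITU-T P.862.1 and the R-to-MOS mapping of ITU-T G.107, exactly as they are written in Eqs. (5) and (6) of Section 3.3. The function names `pesq_to_mos` and `r_to_mos` are hypothetical helpers of our own, not part of any ITU reference implementation, and the sample inputs are illustrative only (R = 93.2 is the value usually quoted for the E-Model with all G.107 parameters at their defaults).

```python
import math


def pesq_to_mos(pesq: float) -> float:
    """Map a raw PESQ (P.862) score to MOS with Eq. (5) (ITU-T P.862.1)."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * pesq + 4.6607))


def r_to_mos(r: float) -> float:
    """Map an E-Model rating factor R to MOS with Eq. (6) (ITU-T G.107)."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    # Linear term plus a cubic correction that vanishes at R = 0, 60 and 100.
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    for x in (1.0, 2.5, 4.5):
        print(f"PESQ {x:.1f} -> MOS {pesq_to_mos(x):.2f}")
    for r in (50.0, 70.0, 93.2):
        print(f"R {r:.1f} -> MOS {r_to_mos(r):.2f}")
```

Neither mapping can reach an MOS of 5: Eq. (5) only approaches 4.999 asymptotically, and Eq. (6) is capped at 4.5 for R above 100.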


ITU-T P.862.1 defines the mapping function from the P.862 PESQ score to the MOS as

$$ y = 0.999 + \frac{4}{1 + e^{-1.4945x + 4.6607}} \tag{5} $$

where x is the PESQ score and y is the corresponding MOS score. ITU-T Rec. G.107 describes MOS in terms of the R factor:

$$ MOS = \begin{cases} 1, & R < 0 \\ 1 + 0.035R + R(R - 60)(100 - R) \cdot 7 \cdot 10^{-6}, & 0 \le R \le 100 \\ 4.5, & R > 100 \end{cases} \tag{6} $$

Figs. 3 and 4 show the transformation curves from PESQ to MOS and from the E-Model rating R to MOS, respectively.

Fig. 3. MOS and its relation to PESQ [18].
Fig. 4. MOS and its relation to the E-Model rating R [19].

3.4. Accuracy analysis of experimental platform

As analyzed in Section 3.1, the accuracy of the experimental platform is determined by the degree of digital-to-analog (D/A) transformation impairment during voice signal encoding/decoding and by the performance of NIST Net. To evaluate the degree of D/A impairment, we perform multiple experiments to gain statistically significant data without network impairment. Taking G.711 coding as an example, its experimental results are illustrated in Fig. 5. As shown in Fig. 5, all MOS scores of G.711 are near 4.4, which coincides with the ITU standard. This platform therefore guarantees a controllable range of D/A impairment. Although NIST Net is a well-known tool for simulating network conditions, we inspected its source code and conducted several experiments to ensure correct simulation under given network conditions. In the experiments, the measured values of packet loss and delay are obtained by averaging ten individual measurement runs. Comparing the preset (theoretical) values of NIST Net with the values measured by the Wireshark tool [55], we find that the packet loss error usually does not exceed 0.15% and the delay error remains below 1 ms (Figs. 6 and 7). Further experimental analysis of the accuracy and stability of NIST Net is provided in [26,42]. NIST Net is thus a stable and reliable platform for simulating network impairments.

Fig. 5. Voice quality of G.711.
Fig. 6. Theoretical and measured value of packet loss.

Furthermore, we conducted a series of experiments to analyze the impact of terminal equipment on voice quality. As mentioned in Section 3.1, we use PCs (typically equipped with a built-in microphone and loudspeaker) as the sender and the receiver of the VoIP


system. When different PCs serve as VoIP terminals, we are effectively using different microphones in the experiments. The experimental results show that different terminals influence voice quality differently; moreover, to acquire a desired voice quality, the sound-related parameter settings must be adjusted for each device. The parameters most crucial to voice quality are the master volume and the left/right channel ratio of Windows Sound, the volume and the left/right channel ratio of Media Player, and the volume and the left/right channel ratio of the Recording function in the Windows operating system. Table 4 lists the parameter settings for several devices that all yield desirable voice quality (an MOS of 4.4).

Table 4
Parameter settings of different terminals for desirable voice quality (MOS = 4.4).

PC type: Dell Inspiron 3443 laptop computer, Microsoft Windows 10 operating system, 4GB memory, Realtek@Inter High Definition Audio Controller
Parameter settings: Windows Sounds (Volume 35, left and right channel ratio 50), Windows Media Player (Volume 60, left and right channel ratio 50), Recording (Volume 60, left and right channel ratio 50)

PC type: Dell Latitude 7380 laptop computer, Microsoft Windows 10 operating system, 8GB memory, Realtek@Inter High Definition Audio Controller
Parameter settings: Windows Sounds (Volume 60, left and right channel ratio 50), Windows Media Player (Volume 50, left and right channel ratio 50), Recording (Volume 80, left and right channel ratio 50)

PC type: Lenovo 90DSCTO1WW desktop PC, Microsoft Windows 7 operating system (Service Pack 1), 16GB memory, Realtek ALC662 @Inter High Definition Audio Controller, Danyin/DT2112 Headset
Parameter settings: Windows Sounds (Volume 80, left and right channel ratio 50), Windows Media Player (Volume 80, left and right channel ratio 50), Recording (Volume 100, left and right channel ratio 50)

Fig. 7. Theoretical and measured value of delay.

4. Results and discussion

The influence of different network impairments can be measured by the metrics introduced in Section 3.2.1. Following the recommendation of [56,57], we use a 20 ms packetization interval as the default packet size of VoIP in our experiments. According to Figs. 8, 10, 11, and 13, different nonlinear relationships exist between voice quality and different network impairments. A polynomial or exponential mapping from each factor (packet loss, delay, jitter, and bandwidth) to MOS is established for VoIP applications. In this paper, the coefficient of determination $R^2$ indicates the goodness of fit over all samples:

$$ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - f(x_i))^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \tag{7} $$

where $x_i$ represents the QoS values, $y_i$ the measured MOS values, and $f$ the fitting function. An $R^2$ value close to one means a near-perfect fit. The goodness of fit and the parameters of the fitting functions for different codecs and different network environments are listed in each figure.

4.1. Voice quality versus QoS parameter

4.1.1. Voice quality vs. packet loss

The packet loss was varied from 0% to 20% in the experiments. Fig. 8 shows the measurement results for G.711 and G.729A. The plot of MOS against packet loss rate shows a distinct exponential curve for both G.711 and G.729A, in line with the IQX hypothesis ($f(x) = \alpha e^{-\beta x} + \gamma$) from [25,26]; the model parameters α, β, and γ are also labeled in Fig. 8. For G.711, MOS equals 4.4 under zero packet loss, 4 under 2.0% loss, and 3 under 11% loss. For G.729A, we obtain MOS 3.86 under zero packet loss, 3 under 4.0% loss, and 2 under 14% loss. Note that G.711 provides higher quality than G.729A under the same network impairment.

Fig. 8. MOS vs. packet loss.

4.1.2. Voice quality vs. delay

Depending on whether delay is considered, voice quality is divided into the listening quality MOS ($MOS_L$) and the conversational quality MOS ($MOS_C$) [29]. PESQ reflects no effect of end-to-end delay on voice quality because its psychoacoustic model contains a time-delay identification technique. Thus, PESQ can only measure listening-only voice quality. Using the basic formula of the E-Model (Eq. (1)) and its default parameter table, the reduction in MOS caused by delay can be calculated (Fig. 9), and the influence of delay on the MOS can be deduced (Fig. 10). As shown in Figs. 9 and 10, delay affects voice quality only slightly below 170 ms, but significantly once it reaches 170 ms. For G.711, $MOS_C$ is 4.4 without any delay, 4 for 245 ms delay, and 3 for 440 ms delay. In addition, we find that the IQX hypothesis cannot quantify the relationship between QoE and delay impairment. The relationships between end-to-end delay and QoE for the G.711 and G.729A codecs are shown in Fig. 10 (curve labeled 5th order polynomial function).
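To make the fitting step and Eq. (7) concrete, here is a small, dependency-free Python sketch that fits the IQX form $f(x) = \alpha e^{-\beta x} + \gamma$ used in Section 4.1.1 and reports $R^2$. The paper does not describe its fitting code, so this is not the authors' procedure: as an assumption, it grid-searches β and, for each candidate, solves the remaining linear least-squares problem for α and γ in closed form. The names `fit_iqx` and `r_squared` are hypothetical, and the demo data are synthetic, generated from made-up parameters rather than the measurements behind Fig. 8.

```python
import math
from typing import Sequence, Tuple


def r_squared(y: Sequence[float], y_fit: Sequence[float]) -> float:
    """Coefficient of determination as defined in Eq. (7)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_fit))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def fit_iqx(x: Sequence[float], y: Sequence[float],
            beta_min: float = 0.001, beta_max: float = 2.0,
            steps: int = 2000) -> Tuple[float, float, float, float]:
    """Fit y ~ alpha * exp(-beta * x) + gamma; return (alpha, beta, gamma, R^2).

    For a fixed beta the model is linear in alpha and gamma, so both come
    from the 2x2 normal equations; beta is chosen by a uniform grid search.
    """
    n = len(x)
    best = None
    for k in range(steps + 1):
        beta = beta_min + (beta_max - beta_min) * k / steps
        u = [math.exp(-beta * xi) for xi in x]
        su, sy = sum(u), sum(y)
        suu = sum(ui * ui for ui in u)
        suy = sum(ui * yi for ui, yi in zip(u, y))
        det = n * suu - su * su
        if abs(det) < 1e-12:  # u nearly constant: alpha is not identifiable
            continue
        alpha = (n * suy - su * sy) / det
        gamma = (sy - alpha * su) / n
        sse = sum((yi - alpha * ui - gamma) ** 2 for ui, yi in zip(u, y))
        if best is None or sse < best[0]:
            best = (sse, alpha, beta, gamma)
    if best is None:
        raise ValueError("no identifiable fit in the given beta range")
    _, alpha, beta, gamma = best
    y_fit = [alpha * math.exp(-beta * xi) + gamma for xi in x]
    return alpha, beta, gamma, r_squared(y, y_fit)


if __name__ == "__main__":
    # Synthetic, noise-free curve from hypothetical parameters (not Fig. 8 data).
    loss = [float(p) for p in range(0, 21)]  # packet loss in percent
    mos = [1.5 * math.exp(-0.2 * p) + 2.9 for p in loss]
    a, b, g, r2 = fit_iqx(loss, mos)
    print(f"alpha={a:.3f} beta={b:.3f} gamma={g:.3f} R^2={r2:.4f}")
```

Because β enters the model nonlinearly while α and γ enter linearly, a one-dimensional search plus a closed-form solve keeps the sketch free of external libraries; with SciPy available, `scipy.optimize.curve_fit` would be the more usual choice.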


Fig. 9. MOS reduction vs. delay.
Fig. 10. MOS vs. delay.

4.1.3. Voice quality vs. jitter

As a consequence of jitter in a stream of packets, packets may be reordered, or discarded if their playback time has expired (as shown in Fig. 2). To limit the effect of jitter, a de-jittering buffer is employed in all VoIP applications; however, it can cause supplementary playback delay and additional de-jittering loss. Some studies map the effect of jitter directly into packet loss [58,59]. Nonetheless, the curve characteristics of jitter are completely different from those of packet loss (as shown in Figs. 8 and 11), and no study has explained the relation between them. In this paper, we therefore treat jitter as an important factor comparable to packet loss. In the following experiments, the standard deviation of delay ranges from 0 ms to 20 ms for different jitter buffers. Fig. 11 shows the relationships between jitter and QoE for three jitter buffers (curves labeled 5th order polynomial function).

We find that jitter affects QoE more strongly than expected. To explain this, we analyze the jitter distributions. According to the artificial delay algorithm of NIST Net, the delay variation ranges from −4.0 to +4.0 standard deviations. Fig. 12.a shows the delay distribution for an average delay of 200 ms and a standard deviation (jitter) of 20 ms. Fig. 12.b shows a histogram of the delay distribution, which enables the proportion of packets whose delay exceeds a certain limit to be determined (in our case, 32.5% of packets have latencies that exceed 200 ms). We also compute the delay difference between consecutive packets; its histogram, shown in Fig. 12.c, reveals that 34.67% of the delay differences exceed 10 ms and 11.55% exceed 20 ms. This finding indicates that 34.67% of the packets may be lost or arrive out of order when the jitter is 10 ms and the jitter buffer is 20 ms, which prevents these packets from being played properly.

4.1.4. Voice quality vs. bandwidth

The influence of bandwidth on QoE and the resulting polynomial fitting function are plotted in Fig. 13. The bandwidth is varied in steps of 10 kbps. The experimental results demonstrate that as long as the bandwidth satisfies the minimum requirement for voice transmission, the voice quality remains high and unchanged; when the available bandwidth falls below the required threshold, packet losses occur and packet delay increases. For G.711, the MOS is 4.4 when the bandwidth is greater than 94.3 kbps, 4 at 80 kbps, and 3 at 57 kbps. Note that when the bandwidth falls below 80% of the bandwidth requirement, the voice quality decreases sharply and the VoIP call cannot easily be established. In our research, the bandwidth is set above 94.3 kbps for the G.711 codec, which ensures that bandwidth need not be weighed as a factor in our VoIP estimation model.

4.2. Voice quality versus echo and noise impairment

In this paper, we also analyze the impact of echo and noise impairments on VoIP QoE.

4.2.1. Voice quality vs. echo impairment

Echo occurring in different situations is a major source of degradation in voice communication quality. When the end-to-end delay is shorter than 10 ms, the echo of the voice cannot be perceived, whereas with a longer delay it can be very disturbing. The impairments arising from listener echo and talker echo can be calculated with the E-Model formulas: the listener-echo impairment is governed by the parameter WEPL (Weighted Echo Path Loss; default value 110 dB, permitted range 5 dB to 110 dB), and the talker-echo impairment by the parameter TELR (Talker Echo Loudness Rating; default value 65 dB, permitted range 5 dB to 65 dB). We found that the MOS drops rapidly when TELR < 35 dB and WEPL < 10 dB. The MOS degradations due to echo, computed with the E-Model, are shown in Figs. 14 and 15.

4.2.2. Voice quality vs. noise impairment

In this paper, we take into account the effect of different noise sources on voice quality, such as electric circuit noise and room noise at both the sender side and the receiver side. In the E-Model, the factor Nc represents the impairment due to electric circuit noise (default value −70 dBm0p, permitted range −80 dBm0p to −40 dBm0p), the factor Ps represents the impairment due to room noise at the sender (default value 35 dB(A), permitted range 35 dB(A) to 85 dB(A)), and Pr represents the impairment due to room noise at the receiver (default value 35 dB(A), permitted range 35 dB(A) to 85 dB(A)). Figs. 16 and 17 illustrate the degradation of MOS due to electric circuit noise and due to room noise at the sender and at the receiver.

4.3. Association test of network impairment

An association test aims to find a dependence relationship between network conditions and QoE. In this work, we employ
the maximal information coefficient (MIC) [11] and the distance correlation algorithm [12] to identify and classify relationships among variables. Compared with other state-of-the-art association test methods (e.g., the Fisher z-test [60] and the specialized test [61]), the MIC and the distance correlation algorithm are better suited to identifying both linear and nonlinear relationships among variables [11,12].

The MIC makes no distributional assumptions about the measured data. The experiments of [11] showed that the MIC has generality and equitability, i.e., it has no bias towards any particular type of association. The MIC captures a relationship, if one exists, by drawing grids on the scatter plot of the two variables to partition the data. A maximal grid resolution is set for the given sample size of the two-variable data set, and for each resolution, i.e., each pair of integers (x, y), the maximal mutual information over all x-by-y grids is computed. These mutual information values are normalized to the range [0,1] for comparison, and the normalized scores form a characteristic matrix M = (m_{x,y}), whose entries are the highest normalized mutual information achieved by any grid at the given resolution. The entry m_{x,y} is defined as

$$ m_{x,y} = \frac{\max_{G} I_G}{\log \min\{x, y\}} \quad (8) $$

where I_G denotes the mutual information of the probability distribution induced on the cells of the grid G, and the maximum is taken over all grids G with x columns and y rows. The MIC is then the maximal m_{x,y} over all pairs (x, y) that satisfy xy < B, where B is a function of the sample size n; in practice, B = n^{0.6}.

Fig. 11. MOS vs. jitter.
Fig. 12. Characteristic of jitter distribution.
Fig. 13. Bandwidth vs MOS.
Fig. 14. Degradation in MOS due to listener echo.
Fig. 16. Degradation in MOS due to circuit noise.
Fig. 17. Degradation in MOS due to room noise.
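The grid-based idea behind Eq. (8) can be illustrated with a deliberately simplified estimator. Assumption: instead of the optimized partitions searched by the algorithm of [11], this sketch uses only equal-frequency (rank-based) grids on both axes, so it can underestimate the MIC as defined in Eq. (8); the function names are hypothetical. It keeps the normalization by log min{x, y} and the constraint xy < B with B = n^0.6 from the text.

```python
import math

import numpy as np


def equal_frequency_bins(values, k):
    """Assign each sample to one of k equal-frequency bins using its rank."""
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return (ranks * k) // len(values)


def mutual_information(bx, by, nx, ny):
    """Mutual information (in nats) of the empirical joint distribution on an nx-by-ny grid."""
    counts = np.zeros((nx, ny))
    np.add.at(counts, (bx, by), 1.0)
    p = counts / counts.sum()
    expected = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    nz = p > 0  # empty cells contribute nothing
    return float(np.sum(p[nz] * np.log(p[nz] / expected[nz])))


def approx_mic(x, y, alpha=0.6):
    """Crude MIC estimate: max normalized MI over equal-frequency grids with nx * ny < n**alpha."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    limit = len(x) ** alpha  # B in the text
    best = 0.0
    nx = 2
    while nx * 2 < limit:
        bx = equal_frequency_bins(x, nx)
        ny = 2
        while nx * ny < limit:
            by = equal_frequency_bins(y, ny)
            score = mutual_information(bx, by, nx, ny) / math.log(min(nx, ny))
            best = max(best, score)
            ny += 1
        nx += 1
    return best
```

For example, with 1000 samples of x drawn uniformly from [−1, 1], the score for (x, x²) should be high even though Pearson's correlation is close to zero, while the score for x against independent noise should stay small.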
In [12], Székely examined the nonlinear correlation between two random vectors by considering the distance between their characteristic functions, which provides an effective measure for nonlinear correlation analysis. The distance correlation R lies in the interval [0,1], and it is zero only for independent random vectors.

Fig. 15. Degradation in MOS due to talker echo.
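The distance correlation of [12] (formalized in Eq. (9)) can be computed directly from double-centered distance matrices. The following is a minimal NumPy sketch, assuming the biased V-statistic estimator and Euclidean distances; the function names are hypothetical, and the O(n²) memory use makes it suitable only for a few thousand samples.

```python
import numpy as np


def double_centered_distances(v):
    """Pairwise Euclidean distance matrix with row, column, and grand means removed."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation R(X, Y) (V-statistic form)."""
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    dcov2_xy = float(np.mean(a * b))  # squared distance covariance
    dvar2_x = float(np.mean(a * a))   # squared distance variance of X
    dvar2_y = float(np.mean(b * b))   # squared distance variance of Y
    if dvar2_x * dvar2_y <= 0.0:
        return 0.0
    r2 = max(dcov2_xy, 0.0) / np.sqrt(dvar2_x * dvar2_y)
    return float(np.sqrt(r2))
```

Any exact linear relation gives R = 1, a symmetric quadratic relation gives a clearly positive R even though its Pearson correlation is near zero, and independent samples give values close to zero.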


Fig. 18. Association test by MIC algorithm.
Fig. 19. Association test by distance covariance algorithm.

The correlation coefficient of distance is defined as

$$ R^2(X,Y) = \begin{cases} \dfrac{\mathcal{V}^2(X,Y)}{\sqrt{\mathcal{V}^2(X)\,\mathcal{V}^2(Y)}}, & \mathcal{V}^2(X)\,\mathcal{V}^2(Y) > 0 \\ 0, & \mathcal{V}^2(X)\,\mathcal{V}^2(Y) = 0 \end{cases} \quad (9) $$

where $\mathcal{V}^2(X,Y)$ is the distance covariance, and $\mathcal{V}^2(X) = \mathcal{V}^2(X,X)$ and $\mathcal{V}^2(Y) = \mathcal{V}^2(Y,Y)$ are the distance variances. Details of the distance correlation algorithm are provided in [12]. Figs. 18 and 19 show the association test results between each network impairment and the corresponding MOS value, using the MIC and the distance correlation, respectively. Packet loss, delay, jitter, and bandwidth show an almost equally strong dependence with QoE. As shown in Fig. 19, the smaller the jitter buffer, the stronger the association between jitter and QoE. Considering only the effect of packet loss is therefore not adequate when evaluating voice quality.

4.4. Measurement of listening voice quality and conversational voice quality

Classically, listening quality measurements emphasize the impact of packet loss, while conversational measurements are concerned with the combined action of packet loss and delay [29]. However, the analysis in Section 4.3 shows that packet loss, jitter, and delay are almost equally important, so disregarding jitter is inappropriate. Measurements of a single impairment (packet loss or jitter) clearly mismatch real network scenarios, since packet loss and jitter do not occur in isolation within a given time interval. One study parametrizes jitter independently into the E-Model [62]; this approach is not well supported by any theory or experiment. Generally, jitter may cause packet loss, and packet loss in transmission may alter the jitter distribution. These interactions yield a complex nonlinear relation between QoE and the parameters (the experiments in Section 4.4.1 provide evidence). We claim that packet loss and jitter act together to change the spatial distribution of voice packets, which impairs the final listening quality. For listening quality, packet loss and jitter have to be considered simultaneously; for conversational quality, they are equally indispensable.

4.4.1. Listening voice quality measurement

Listening-only voice quality considers the distortion of the voice signal, that is, the spatially related impairment of voice packets (as shown in Fig. 2.a–c). The PESQ algorithm yields the listening voice quality MOSL by comparing the impaired speech with the original; the detailed steps of this process are depicted in Fig. 20. Figs. 21 and 22 portray the relation between MOSL and packet loss and jitter under different jitter buffer conditions, where the packet loss ranges from 0% to 18%, the jitter ranges from 0 ms to 18 ms, and the jitter buffer size ranges from 10 ms to 40 ms. The experimental results show a significant nonlinear relationship between MOSL and the packet loss and jitter impairments.

4.4.2. Conversational voice quality measurement

As previously mentioned, the conversational voice quality MOSC must account not only for packet loss and jitter but also for delay; that is, both spatial and temporal impairments need to be considered. Following [29,30], we use a combined PESQ/E-Model system for MOSC estimation (as shown in Fig. 23). In Fig. 23, the listening MOS is obtained by the PESQ algorithm and transformed into the rating factor R, and then into the modified equipment impairment value Ie. The conversational voice quality is obtained by combining Ie with Id, the effect of end-to-end delay.

4.4.2.1. Conversational voice quality versus packet loss, jitter, and delay. We derive the MOSC value directly from packet loss, jitter, and delay based on the concept of Fig. 23. The procedure is as follows.

Step 1. Obtain the modified Ie from the packet loss and jitter impairments

The first step is to calculate the PESQ score for each combination of packet loss and jitter and then convert the PESQ scores into MOS values:

1. Compute the PESQ score for each packet loss rate and jitter value for the given codec;
2. Convert the PESQ scores to MOS values using Eq. (5);
3. Map MOS to R. For 6.5 ≤ R ≤ 100, R can be calculated from the MOS with Eq. (10), which inverts the R-to-MOS mapping of Eq. (6) [19]:

$$ R = \frac{20}{3}\left(8 - \sqrt{226}\,\cos\left(h + \frac{\pi}{3}\right)\right) \quad (10) $$

with

$$ h = \frac{1}{3}\,\operatorname{atan2}\left(18566 - 6750\,\mathrm{MOS},\; 15\sqrt{-903522 + 1113960\,\mathrm{MOS} - 202500\,\mathrm{MOS}^2}\right) $$

and

$$ \operatorname{atan2}(x, y) = \begin{cases} \arctan\left(\dfrac{y}{x}\right), & x \ge 0 \\ \pi - \arctan\left(\dfrac{-y}{x}\right), & x < 0 \end{cases} $$

4. Calculate the modified Ie from R. Considering only the impairments from packet loss, jitter, and codec, Ie can be written simply in terms of R:

$$ I_e = R_0 - R \quad (11) $$

with the default value R0 = 93.2.

The resulting curves of Ie versus packet loss and jitter for three different buffer conditions are shown in Figs. 24 and 25.
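Items 3 and 4 of Step 1 can be sketched as follows. The sketch implements Eq. (10), with the paper's atan2(x, y) expressed through Python's `math.atan2(y, x)`, and Eq. (11); for round-trip checking it also includes the ITU-T G.107 R-to-MOS mapping, which we assume is the paper's Eq. (6). The PESQ computation and Eq. (5) are not reproduced, and the function names are hypothetical.

```python
import math

R0_DEFAULT = 93.2  # default R0 quoted for Eq. (11)


def r_to_mos(r):
    """ITU-T G.107 R-to-MOS mapping, assumed here to be the paper's Eq. (6)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


def mos_to_r(mos):
    """Eq. (10): rating factor R from MOS, valid for 1 <= MOS <= 4.5 (6.5 <= R <= 100)."""
    if not 1.0 <= mos <= 4.5:
        raise ValueError("Eq. (10) is only defined for 1 <= MOS <= 4.5")
    x = 18566.0 - 6750.0 * mos
    y = 15.0 * math.sqrt(-903522.0 + 1113960.0 * mos - 202500.0 * mos ** 2)
    # The paper's atan2(x, y) equals math.atan2(y, x) here because y >= 0.
    h = math.atan2(y, x) / 3.0
    return 20.0 / 3.0 * (8.0 - math.sqrt(226.0) * math.cos(h + math.pi / 3.0))


def modified_ie(listening_mos, r0=R0_DEFAULT):
    """Eq. (11): modified equipment impairment Ie = R0 - R."""
    return r0 - mos_to_r(listening_mos)
```

Because Eq. (10) is the closed-form inverse of the cubic R-to-MOS polynomial, `mos_to_r(r_to_mos(r))` returns r up to rounding in the published constants, and an unimpaired listening MOS (R = R0) yields Ie ≈ 0.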


Fig. 20. Measurement of listening voice quality using the PESQ.

Fig. 21. MOSL versus packet loss and jitter for G.711.

Fig. 22. MOSL versus packet loss and jitter for G.729A.

Fig. 23. Measurement of conversational voice quality using a combined PESQ and E-Model.

Step 2. Obtain Id from end-to-end delay

When evaluating the delay impairment on voice quality, the delay impairment factor Id is usually introduced on the basis of the E-Model. Id is calculated either with the simplified model derived from the ITU-T G.107 equations (Eqs. (12) and (13)) [63] or with the 6th-order polynomial function (Eq. (14)) [29]:

$$ I_d = 0.024\,d + 0.11\,(d - 177.3)\,H(d - 177.3) \quad (12) $$

where d is the one-way delay (in milliseconds) and H(x) is the Heaviside function [63]:

$$ H(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases} \quad (13) $$

$$ I_d = -2.468 \times 10^{-14} d^6 + 5.062 \times 10^{-11} d^5 - 3.903 \times 10^{-8} d^4 + 1.344 \times 10^{-5} d^3 - 0.001802\,d^2 + 0.103\,d - 0.1698 \quad (14) $$

The curve of the polynomial fit (a 6th-order polynomial function) is shown in Fig. 26.

Step 3. Obtain conversational voice quality from Id and Ie

Once Ie and Id are known, the rating factor R of the E-Model follows directly:

$$ R = R_0 - I_d - I_e \quad (15) $$

Thus, by converting R to MOS using Eq. (6), the relationships among the conversational voice quality and the three
impairments can be obtained. The MOSC versus packet loss, jitter, and end-to-end delay for the G.711 and G.729A codecs is shown in Figs. 27 and 28, respectively. The relationships are nonlinear.

Fig. 24. Modified Ie (for G.711 codec).
Fig. 25. Modified Ie (for G.729A codec).
Fig. 26. Id vs end-to-end delay.
Fig. 27. MOSc versus packet loss, delay, jitter for G.711.

4.4.2.2. Conversational voice quality versus packet loss, jitter, delay, noise and echo. We further derive the MOSC value directly from packet loss, jitter, delay, noise, and echo based on the results of Sections 4.2 and 4.4.1. The procedure is detailed below.

Step 1. Compute the basic signal-to-noise ratio R0, taking into account the effects of electric circuit noise (Nc), room noise at the sender side (Ps), and room noise at the receiver side (Pr).
The E-Model provides formulas for calculating the impairment from Nc, Ps, and Pr. Partial plots of the degradation in R0 as the different noise levels increase are shown in Fig. 29.

Step 2. Obtain the delay impairment factor Id, which represents the impairments due to talker echo (Idte), listener echo (Idle), and the absolute delay from sender to receiver (Idd).
In the E-Model, the Id factor is calculated as Id = Idte + Idle + Idd, where Idte estimates the impairment due to talker echo (TELR), Idle reflects the impairment due to listener echo (WEPL), and Idd is the impairment caused by the one-way (end-to-end) delay. Id therefore captures the degradation in voice quality due to delay, talker echo, and listener echo. Fig. 30 shows Id and its relationship with delay, talker echo, and listener echo.

Step 3. Calculate the modified Ie from R, in which the impairments from packet loss, jitter, and codec have been incorporated.
The calculation steps and methods for the modified Ie are the same as in Step 1 of Section 4.4.2.1.

Step 4. Use the E-Model to fuse QoS impairments with noise and echo impairments.
The E-Model combines different impairments into a single rating R based on the principle that the perceived effects of impairments are additive. We calculate the overall rating using Eqs. (1) and (15), and then translate R to MOS using Eq. (6). The resulting MOS reflects both kinds of impairment, as shown in Fig. 31.

4.5. Quality thresholds

Using the experimental data, we deduce the QoS thresholds that correspond to specific QoE levels. Table 5 lists several thresholds within which the QoS parameters must be kept to attain a required quality level. For conversational voice quality, the thresholds are given for a fixed delay of 150 ms, following the conclusions of [10,64]. For example, when the jitter buffer is 40 ms, G.711 provides satisfactory listening voice quality when the packet loss rate is below 2% or the average jitter does not exceed 12 ms, and the listening voice quality is fair when the packet loss rate is between 2% and 11% or the jitter is between 12 ms and 18 ms. When the jitter buffer is 40 ms and the delay is 150 ms, the conversational voice quality of G.711 can be excellent because the packet loss rate is less than 11% or the average jitter does not exceed 18 ms, and so on. Outside these bounds, the quality will be poor. Note that G.729A cannot achieve "excellent" quality even in a perfect network environment; therefore, no excellent-to-good threshold is given for G.729A.

Table 5
Codec comparison from the point of view of quality thresholds.

                  Excellent-to-Good quality threshold      Good-to-Fair quality threshold
                  Listening          Conversational        Listening          Conversational
                  voice quality      (delay = 150 ms)      voice quality      (delay = 150 ms)
Codec   Buffer    Loss[%]  Jitter[ms]Loss[%]  Jitter[ms]   Loss[%]  Jitter[ms]Loss[%]  Jitter[ms]
G.711   10 ms     <2       <2        <1.5     <1.5         <11      <8        <7       <7
G.711   20 ms     <2       <8        <1.5     <6           <11      <11       <7       <11
G.711   40 ms     <2       <12       <1.5     <10          <11      <18       <7       <17
G.729A  10 ms     –        –         –        –            <4       <4        <3       <3
G.729A  20 ms     –        –         –        –            <4       <10       <3       <8
G.729A  40 ms     –        –         –        –            <4       <15       <3       <14

Based on the experimental data of Section 4.2, we infer the noise and echo thresholds that correspond to specific QoE levels. Table 6 lists several thresholds above or below which the noise and echo parameters must be kept to attain "Good" or "Fair" quality.


Fig. 28. MOSc versus packet loss, delay, jitter for G.729A.

Fig. 29. Degradation in RO due to electric circuit noise and room noise.

Fig. 30. Id vs. delay impairment, talker and listener echo impairment.
Fig. 31. Combined MOS (including both loss, jitter, delay, noise and echo impairment).

Table 6
Codec comparison from the point of view of quality thresholds (noise and echo impairment).

                            Noise                               Echo: TELR (dB) at delay Echo: WEPL (dB) at delay
Threshold          Codec    Nc (dBm0p)  Ps (dB(A))  Pr (dB(A))  50 ms  150 ms  300 ms    50 ms  150 ms  300 ms
Excellent-to-Good  G.711    <−53        <55         <60         >44    >54     –         >22    >33     –
Excellent-to-Good  G.729A   –           –           –           –      –       –         –      –       –
Good-to-Fair       G.711    <−38        <67         <70         >33    >44     >55       >13    >20     >36
Good-to-Fair       G.729A   <−50        <58         <63         >41    >51     –         >20    >29     –

Fig. 32. System structure for voice quality prediction based on machine learning regression model.

5. Machine learning-based models for VoIP QoE prediction

If only one or two QoS metrics are used to predict the voice quality, the nonlinear regression method is a suitable choice, as demonstrated in the literature [26,29]. However, when the predictors are multi-parameter and include several discrete variables, nonlinear regression models are not applicable. In this paper, we use machine learning algorithms to build the mapping models, because machine learning methods reduce model complexity, enhance accuracy, and can be easily extended. From a
practical point of view, machine learning has a great advantage in handling large-scale information, so a model built by machine learning can be transferred from simulation research to an actual VoIP system without difficulty. In addition, the theory of machine learning has developed continuously in recent years, and new methods keep emerging; with them, the modelling of QoE and QoS still has plenty of room for improvement in accuracy and efficiency.

We employ several classical algorithms (K-nearest neighbors (KNN), regression tree, ANN, bagging, and SVM) in our work. Although other machine learning algorithms are available, human auditory perception and MOS values are coarse-grained and do not require high precision. Provided that time efficiency and evaluation accuracy are adequate, we believe that classical machine learning algorithms are sufficient to satisfy the requirements of voice quality prediction. Fig. 32.a presents an overview of our system, in which the predictor takes network and codec metrics, such as delay, jitter, packet loss, jitter buffer, and coding type, as input and outputs the MOS value. The system structure for predicting the MOS value from QoS impairments and noise and echo impairments using the machine learning algorithms is shown in Fig. 32.b, which is a revision of Scheme I of Fig. 32.a. The predicted MOSC is obtained by the listed machine learning algorithms, and the performance of our system is measured by the absolute error between the predicted and the measured MOS scores.

5.1. Parameter setting

Through experiments, we obtained 19 datasets. Each dataset is randomly divided into a training set and a test set at a ratio of 7:3. For each division, we run every method 10 times to study the average performance. We evaluate the methods by the MAE (Mean Absolute Error) and RMSE (Root Mean Square Error). Tables 7 and 8 list the mean, median, and standard deviation in terms of these three values. In the KNN method, we heuristically determine the optimal number K of nearest neighbors by cross-validation. The results show that the best distance metric for the variables is the cityblock distance, with K = 3. The KNN method achieves the best performance under the MAE and median absolute error measurements. In bagging, T bootstrap samples are drawn randomly from the training data with replacement, a base regressor is trained on each of the T samples, and the outputs are combined by averaging. The base regressors are implemented as unpruned regression trees in MATLAB, and T is set to 100. The LIBSVM library version 3.14 was selected for the SVM, with the hyperparameters chosen as follows: the radial basis function (RBF) kernel was used (the default), the trade-off parameter C was set to 2^8, and the parameter gamma g was selected from {2^0, 2^0.5, 2^1, 2^1.5, 2^2, 2^2.5, 2^3}. We train a neural network with a single hidden layer. The sizes of the hidden layer and the output layer are 10 and 3, respectively. The activation functions of the hidden layer and the output layer are the hyperbolic tangent function and the linear function, respectively. The performance function is MSE. We use the resilient back-propagation (Rprop) algorithm to train the model. The regression tree is a decision tree with binary splits for regression, and the split criterion is MSE.

Fig. 33. Listening Voice Quality Prediction.

Table 7
Performance evaluation of prediction of MOSL.

Dataset                                           KNN              Regression tree  ANN              Bagging          SVM
                                                  MAE     RMSE     MAE     RMSE     MAE     RMSE     MAE     RMSE     MAE     RMSE
MOSL of G.711 (jitter buffer 10 ms)               0.0603  0.0171   0.1447  0.0408   0.0969  0.0241   0.1239  0.0362   0.0628  0.0147
MOSL of G.711 (jitter buffer 20 ms)               0.0802  0.0223   0.1952  0.0503   0.1067  0.0259   0.1653  0.0450   0.0651  0.0156
MOSL of G.711 (jitter buffer 40 ms)               0.0968  0.0249   0.1929  0.0458   0.1076  0.0251   0.1567  0.0392   0.0706  0.0163
MOSL of G.711 (jitter buffer 10+20+40 ms)         0.0831  0.0140   0.1565  0.0241   0.0864  0.0124   0.1830  0.0267   0.0625  0.0083
MOSL of G.729A (jitter buffer 10 ms)              0.0523  0.0165   0.1104  0.0324   0.0935  0.0229   0.1245  0.0380   0.0659  0.0144
MOSL of G.729A (jitter buffer 20 ms)              0.0858  0.0234   0.1733  0.0433   0.1124  0.0274   0.1565  0.0435   0.0605  0.0134
MOSL of G.729A (jitter buffer 40 ms)              0.1035  0.0285   0.2032  0.0512   0.1129  0.0288   0.1694  0.0435   0.0756  0.0183
MOSL of G.729A (jitter buffer 10+20+40 ms)        0.0784  0.0137   0.1596  0.0264   0.0856  0.0125   0.1895  0.0283   0.0617  0.0083
MOSL of G.711+G.729A (jitter buffer 10+20+40 ms)  0.2240  0.0195   0.1677  0.0193   0.0836  0.0086   0.1338  0.0150   0.0638  0.0060
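The evaluation protocol of Section 5.1 (random 7:3 split, repeated runs, MAE and RMSE) and the KNN setting found by cross-validation (cityblock distance, K = 3) can be sketched in NumPy. This is not the authors' MATLAB pipeline: the data below are hypothetical toy values, not any of the 19 experimental datasets, the features are used without scaling, and the helper names are hypothetical.

```python
import numpy as np


def knn_regress(x_train, y_train, x_query, k=3):
    """K-nearest-neighbour regression with the cityblock (L1) distance; predicts the neighbours' mean."""
    dist = np.abs(x_query[:, None, :] - x_train[None, :, :]).sum(axis=2)
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return y_train[nearest].mean(axis=1)


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def evaluate_knn(features, target, runs=10, train_ratio=0.7, k=3, seed=0):
    """Repeat a random 7:3 train/test split `runs` times; return mean MAE and mean RMSE."""
    rng = np.random.default_rng(seed)
    n = len(target)
    n_train = int(round(train_ratio * n))
    maes, rmses = [], []
    for _ in range(runs):
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        pred = knn_regress(features[tr], target[tr], features[te], k=k)
        maes.append(mae(target[te], pred))
        rmses.append(rmse(target[te], pred))
    return float(np.mean(maes)), float(np.mean(rmses))


# Hypothetical toy data (NOT the paper's datasets): packet loss [%], jitter [ms], jitter buffer [ms].
rng = np.random.default_rng(42)
loss = rng.uniform(0, 18, 300)
jitter = rng.uniform(0, 18, 300)
buffer_ms = rng.choice([10.0, 20.0, 40.0], 300)
features = np.column_stack([loss, jitter, buffer_ms])
target = np.clip(4.4 - 0.12 * loss - 0.6 * jitter / buffer_ms, 1.0, 4.5)  # made-up MOS surface

mean_mae, mean_rmse = evaluate_knn(features, target)
print(f"KNN (cityblock, K=3): MAE = {mean_mae:.4f}, RMSE = {mean_rmse:.4f}")
```

For any vector of errors, RMSE is at least as large as MAE, which gives a quick sanity check on reported error figures.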


Fig. 34. Conversation Voice Quality Prediction.

5.2. Evaluation racy: the majority of the MAE is less than 0.1 and the median abso-
lute error is less than 0.23 in 19 datasets (Fig. 33.a–i; Fig. 34.a–k;).
We evaluate the performance of these algorithms for different In all datasets of the listening test (Fig. 33.a–i), the SVM is the
conditions of parameters. The results show that the machine learn- finest with respect to the median absolute error (MAE, 0.06539
ing regression algorithms can predict the VoIP QoE with fair accu- and 0.06593) and the RMSE (0.01281) followed by the KNN, ANN,

Please cite this article as: Z. Hu, H. Yan and T. Yan et al., Evaluating QoE in VoIP networks with QoS mapping and machine learning
algorithms, Neurocomputing, https://doi.org/10.1016/j.neucom.2019.12.072
JID: NEUCOM
ARTICLE IN PRESS [m5G;January 3, 2020;16:18]

Z. Hu, H. Yan and T. Yan et al. / Neurocomputing xxx (xxxx) xxx 19

Table 8
Performance evaluation of prediction of MOSL .

Algorithm/metric\Item KNN Regression tree ANN Bagging SVM

MAE RMSE MAE RMSE MAE RMSE MAE RMSE MAE RMSE

MOSC of G.711 (jitter buffer 10 ms) 0.0666 0.0038 0.0499 0.0022 0.0737 0.0029 0.0860 0.0037 0.0417 0.0014
MOSC of G.711 (jitter buffer 20 ms) 0.0802 0.0040 0.0594 0.0024 0.0748 0.0029 0.0946 0.0038 0.0440 0.0015
MOSC of G.711 (jitter buffer 40 ms) 0.1025 0.0050 0.0649 0.0026 0.0747 0.0028 0.1019 0.0039 0.0463 0.0016
MOSC of G.711 (jitter buffer 10+20+40 ms) 0.0820 0.0024 0.0568 0.0013 0.0747 0.0017 0.0382 0.0010 0.0440 0.0009
MOSC of G.729A (jitter buffer 10 ms) 0.0494 0.0032 0.0346 0.0017 0.0737 0.0029 0.0643 0.0031 0.0446 0.0015
MOSC of G.729A (jitter buffer 20 ms) 0.0672 0.0034 0.0416 0.0018 0.0746 0.0029 0.0757 0.0032 0.0420 0.0014
MOSC of G.729A (jitter buffer 40 ms) 0.0866 0.0044 0.0490 0.0020 0.0770 0.0029 0.0867 0.0034 0.0475 0.0016
MOSC of G.729A (jitter buffer 10+20+40 ms) 0.0678 0.0021 0.0413 0.0011 0.0747 0.0017 0.0317 0.0008 0.0446 0.0009
MOSC of G.711+ G.729A (jitter buffer 0.1836 0.0027 0.0498 0.0009 0.0765 0.0012 0.0588 0.0010 0.0439 0.0006
10+20+40 ms)
MOSC of G.711+ G.729A (packet loss, delay, 0.2225 0.0011 0.03534 0.00017 0.09502 0.00047 0.03466 0.00017
jitter, jitter buffer, noise, echo)

bagging and regression tree (Table 7). The ascending order of these Declaration of Competing Interest
algorithms with respect to the standard deviation of the absolute
error is SVM, ANN, KNN, regression tree and bagging. In mostly The authors declare that they have no known competing finan-
datasets of the conversation test (Fig. 34.a–i), the SVM also outper- cial interests or personal relationships that could have appeared to
forms the other algorithms with respect to the MAE and median influence the work reported in this paper.
absolute error (0.04429 and 0.0443 in that order) followed by the
regression tree, bagging, ANN and KNN (as illustrated in Table 8).
Acknowledgments
In dataset of conversation test that combined QoS impairments,
noise and echo impairments (Fig. 34.k), we find that the traditional
This work is supported by the National Natural Science Founda-
training algorithms for SVMs, such as chunking and SMO cannot
tion of China (Grant nos. 61872226, 61702315), the Natural Science
run properly because large training sets. In this case, Bagging out-
Foundation of Shanxi Province, China (Grant nos. 201701D121052,
performs the other algorithms(as illustrated in last line of Table 8).
201901D211169), the Key R&D Program (International Science and
When the training set is bigger, our algorithm predicts more
Technology Cooperation Project) of Shanxi Province, China (Grant
accurate. For all the datasets, the prediction quality of the SVM,
no. 201903D421003) and the 1331 Engineering Project of Shanxi
ANN, regression tree and bagging for conversation quality is higher
Province, China.
than that for listening quality. We also determined that if train-
ing dataset includes discrete input parameters, the accuracy of the
KNN rapidly decreases (as shown in Fig. 33.i, Fig. 34.i). References
Although the training phases of machine learning regression al-
[1] J. Barakovič Husič, S. Baraković, S. Muminović, Is there any impact of human
gorithms have relatively high computational complexity, the com- influence factors on quality of experience? in: Proceedings of 40th Interna-
putational complexity of the prediction phase is negligible. For ex- tional Convention on Information and Communication Technology, Electron-
ample, in Thinkpad T460 portable laptop computer and Matlab ics and Microelectronics (MIPRO), 2017, pp. 434–439, doi:10.23919/MIPRO.2017.
7973464.
2015b programming environment, the time cost of predicting a
[2] D. Tsolkas, E. Liotou, N. Passas, L. Merakos, A survey on parametric QoE es-
single item sample of KNN, regression tree, ANN, bagging, and SVM timation for popular services, J. Netw. Comput. Appl. 77 (2017) 1–17, doi:10.
is 2.07 ms, 16.66 ms, 62.80 ms, 69.81 ms, and 1.01 ms, respectively. 1016/j.jnca.2016.10.016.
[3] ITU-T Recommendation P.10/G.100, Vocabulary and effects of transmission pa-
Thus, they can work online.
rameters on customer opinion of transmission quality, amendment 1, June,
2019. Available: https://www.itu.int/rec/T- REC- P.10- 201906- I!Amd1.
6. Conclusions and prospects

We analyzed the impacts of QoS parameters on QoE through experiments and an association test method. Both the theoretical analysis and the experiments show a strong nonlinear relationship between jitter and QoE. Then, based on the spatial and temporal characteristics of network impairments, we developed a new methodology to measure voice quality without time-consuming subjective tests. Furthermore, we applied several classical regression algorithms to predict voice quality and compared their performance. The work improves the evaluation of VoIP QoE from network performance parameters, explains how IP network environments affect the quality and reliability of VoIP traffic, and calculates typical network performance requirements for a VoIP application to run at a desired QoE level. The effects of noise and echo impairments on voice quality are also analyzed. This research can also be applied to VoIP systems so that playout buffer control and adaptive codec selection achieve the best possible end-to-end perceived voice quality. In future work, we will explore the impacts of other parameters (e.g., burst packet loss and packetization interval) and build an automatic VoIP quality monitoring system based on network measurement algorithms.
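Pearson correlation can miss a nonlinear dependence such as the one observed between jitter and QoE, while a nonlinear association measure detects it. As an illustration, the sketch below computes the sample distance correlation of Székely, Rizzo and Bakirov. We present it as one such measure and do not claim it is the exact association test used in this work. The data are synthetic: `y` is a deterministic but non-monotonic function of `x`.

```python
import math


def _double_centered(values):
    """Pairwise |a - b| distances, double-centred: d_ij - mean_i - mean_j + grand mean."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand = sum(row_means) / n
    # d is symmetric, so column means equal row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
            for i in range(n)]


def _dcov2(a, b):
    """Squared sample distance covariance from two double-centred matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D samples (0 to 1)."""
    a, b = _double_centered(x), _double_centered(y)
    denom = math.sqrt(_dcov2(a, a) * _dcov2(b, b))
    if denom == 0:
        return 0.0
    # max() guards against a tiny negative value from floating-point rounding.
    return math.sqrt(max(0.0, _dcov2(a, b)) / denom)


def pearson(x, y):
    """Ordinary Pearson correlation coefficient, for comparison."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic example: y depends on x exactly, but not monotonically.
x = [(i - 10) / 10 for i in range(21)]  # symmetric grid on [-1, 1]
y = [v * v for v in x]
print(f"Pearson r = {pearson(x, y):+.3f}")                          # close to zero
print(f"distance correlation = {distance_correlation(x, y):.3f}")   # clearly positive
```

Distance correlation is zero only under independence, which is why it flags the parabolic relation that Pearson reports as uncorrelated. This direct implementation is O(n²) in time and memory.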

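Deriving a network performance requirement from a target QoE level can be illustrated with the ITU-T G.107 E-model in place of the learned regression models above. The sketch is that substitution, not the paper's procedure. It uses the G.107 conversion from rating factor R to MOS and the G.107 effective equipment impairment under packet loss. It then bisects for the largest packet-loss percentage that still meets a target MOS. The base value of about 93.2 is R with all G.107 default parameters. The codec values `ie` and `bpl`, the delay impairment and the target MOS are assumed example inputs, not values from this study.

```python
def mos_from_r(r):
    """ITU-T G.107 conversion from rating factor R to estimated MOS."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def r_factor(loss_pct, ie=0.0, bpl=25.1, burst_r=1.0, delay_impairment=0.0,
             r_default=93.2):
    """Simplified E-model: R = r_default - Id - Ie,eff, with advantage factor A = 0.

    r_default (about 93.2) is R with all G.107 default parameter values.
    ie, bpl, burst_r and delay_impairment (Id) are assumed example inputs.
    """
    # G.107 effective equipment impairment under packet loss (loss_pct in %).
    ie_eff = ie + (95 - ie) * loss_pct / (loss_pct / burst_r + bpl)
    return r_default - delay_impairment - ie_eff


def max_loss_for_target(target_mos, **params):
    """Largest packet-loss percentage in [0, 100] whose estimated MOS >= target_mos.

    For target_mos > 1 the condition is monotone in loss (R falls as loss rises,
    and MOS >= target only holds above a single R threshold), so bisection works.
    """
    def meets(loss):
        return mos_from_r(r_factor(loss, **params)) >= target_mos

    if not meets(0.0):
        return None  # target unreachable even with zero loss
    lo, hi = 0.0, 100.0
    if meets(hi):
        return hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if meets(mid):
            lo = mid
        else:
            hi = mid
    return lo


budget = max_loss_for_target(3.6, ie=0.0, bpl=25.1, delay_impairment=5.0)
print(f"packet-loss budget for MOS >= 3.6: {budget:.2f} %")
```

The example inputs are Ie = 0, Bpl = 25.1, Id = 5 and a target MOS of 3.6. With these, the search gives a loss budget of roughly 5.9%. Real codec and delay parameters would be needed before such a figure could serve as an actual requirement.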

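Playout buffer control, named above as an application of this work, trades buffering delay against late loss. Late loss means packets that arrive after their playout deadline and are therefore lost to the listener. The sketch below illustrates that trade-off for a fixed playout delay on a hypothetical delay trace. The functions `late_loss` and `smallest_playout_delay` are our own illustration, not a buffer algorithm from this paper. Practical buffers adapt the delay over time, for example per talkspurt.

```python
def late_loss(delays_ms, playout_delay_ms):
    """Fraction of packets whose one-way delay exceeds a fixed playout delay."""
    late = sum(1 for d in delays_ms if d > playout_delay_ms)
    return late / len(delays_ms)


def smallest_playout_delay(delays_ms, max_late_loss):
    """Smallest fixed playout delay, chosen among observed delays, whose late loss
    does not exceed max_late_loss. A larger delay lowers late loss but adds latency."""
    if not delays_ms:
        raise ValueError("delays_ms must not be empty")
    if not 0.0 <= max_late_loss <= 1.0:
        raise ValueError("max_late_loss must be a fraction in [0, 1]")
    for candidate in sorted(set(delays_ms)):
        if late_loss(delays_ms, candidate) <= max_late_loss:
            return candidate
    return max(delays_ms)  # defensive; the loop always returns by this point


# Hypothetical one-way delay trace in ms, with two jitter-induced spikes.
trace = [40, 42, 41, 45, 80, 43, 44, 120, 42, 41]
for target in (0.2, 0.1, 0.0):
    buf = smallest_playout_delay(trace, target)
    print(f"late-loss target {target:.0%}: playout delay {buf} ms, "
          f"late loss {late_loss(trace, buf):.0%}")
```

Late loss adds to network packet loss, while the chosen playout delay adds to one-way delay. A quality model such as those discussed above could therefore be used to pick the point on this curve with the highest predicted QoE.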
[15] ITU-T Recommendation P.861, Objective quality measurement of telephone- [42] M. Carson, D. Santay, NIST NET - A Linux-based network emulation tool, ACM
band (30 0-340 0 Hz) speech codecs, February 1998. Available: https://www.itu. SIGCOMM Comput. Commun. Rev. 33 (2003), 111–126, doi:10.1145/956993.
int/rec/T- REC- P.861- 199802- W/en. 957007.
[16] S. Voran, Objective estimation of perceived speech quality. I. Development of [43] OpenPhone. https://www.VoIP-info.org/openphone.
the measuring normalizing block technique. IEEE Trans. Speech Audio Process. [44] B.P. Padhy, Adaptive latency compensator considering packet drop and packet
7 (1999) 383–390, doi:10.1109/89.771259. disorder for wide area damping control design, Int. J. Electr. Power Energy Syst.
[17] A.W. Rix, M.P. Hollier, The perceptual analysis measurement system for robust 106 (2019), 477–487, doi:10.1016/j.ijepes.2018.10.015.
end-to-end speech quality assessment, in: Proceedings of the IEEE Conference [45] Y. Cao, Bifurcations in an internet congestion control system with distributed
on Acoustics, Speech and Signal Processing, Vol. 3, 20 0 0, pp. 1515–1518, doi:10. delay, Appl. Math. Comput. 347 (2019) 54–63, doi:10.1016/j.amc.2018.10.093.
1109/ICASSP.20 0 0.861935. [46] K. Bidaj, J.B. Begueret, J. Deroo, Jitter definition, measurement, generation,
[18] ITU-T Recommendation P.862, Perceptual evaluation of speech quality (PESQ), analysis, and decomposition, Int. J. Circuit Theory Appl. 46 (2018), 2171–2188,
an objective method for end-to-end speech quality assessment of narrow- doi:10.1002/cta.2559.
band telephone networks and speech codecs, February 2001. Available: https: [47] X.Q. Li, K.L. Yeung, Bandwidth-efficient network monitoring algorithms based
//www.itu.int/rec/T- REC- P.862- 200102- I/en. on segment routing, Comput. Netw. 147 (2018) 236–245, doi:10.1016/j.comnet.
[19] ITU-T Recommendation G.107, The E-Model, a computational model for use in 2018.10.010.
transmission planning, June 2015. Available: https://www.itu.int/rec/T- REC- G. [48] ITU-T Recommendation G.114. One-way transmission time, May 2003. Avail-
107-201506-I/en. able: https://www.itu.int/rec/T- REC- G.114- 200305- I/en.
[20] P. Wuttidittachotti, T. Daengsi, Subjective MOS model and simplified E-Model [49] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: a transport proto-
enhancement for skype associated with packet loss effects: a case using col for real-time applications, RFC 1889, IETF. January 1996. Available: https:
conversation-like tests with Thai users, Multimed. Tools Appl. 76 (2017) //tools.ietf.org/html/rfc1889.
16163–16187, doi:10.1007/s11042- 016- 3901- 5. [50] C. Demichelis, P. Chimento, IP packet delay variation metric for IP performance
[21] P. Wuttidittachotti, P. Khaoduang, T. Daengsi, MOS estimation model develop- metrics, RFC 3393, IETF. November 2002. Available: https://tools.ietf.org/html/
ment using ACR listening-opinion tests with Thai users referring to loss ef- rfc3393.
fects: a case of G.726 and G.729, Multimed. Syst. 24 (2018) 285–295, doi:10. [51] P. Imputato, S. Avallone, An analysis of the impact of network device buffers on
10 07/s0 0530- 017- 0549- 6. packet schedulers through experiments and simulations, Simul. Modell. Pract.
[22] S. Jelassi, G. Rubino, A perception-oriented Markov model of loss incidents ob- Theory, 80 (2018), 1–18, doi:10.1016/j.simpat.2017.09.008.
served over VoIP networks, Comput. Commun. 128 (2018) 80–90, doi:10.1016/ [52] K. Hammad, A. Moubayed, A. Shami, S. Primak, Analytical approximation of
j.comcom.2018.06.009. packet delay jitter in simple queues, 5(2016), pp. 564–567, IEEE Wirel. Com-
[23] W. Jiang, H. Schulzrinne, Perceived quality of packet audio under bursty losses, mun. Lett. doi:10.1109/LWC.2016.2601609.
in: Proceedings of IEEE INFOCOM, 2002. [53] E. Kim, T. Kim, C. Lee, An adaptive buffering scheme for P2P live and time-
[24] H. Zlatokrilov, H. Levy, The effect of packet dispersion on voice applications in shifted streaming, Appl. Sci. 7 (2017) 1–14, doi:10.3390/app7020204.
IP networks, IEEE/ACM Trans. Netw. 14 (2006) 277–288, doi:10.1109/tnet.2006. [54] Z.Z. Qiao, R.K. Venkatasubramanian, L.F. Sun, E.C. Ifeachor, A new buffer algo-
872543. rithm for speech quality improvement in VoIP systems, Wirel. Pers. Commun.
[25] P. Reichl, B. Tuffin, R. Schatz, Logarithmic laws in service quality percep- 45 (2008) 189–207, doi:10.1007/s11277-007- 9408- 7.
tion: where microeconomics meets psychophysics and quality of experience, [55] Wreshark, https://www.wireshark.org/
Telecommun. Syst. 52 (2013), 587–600, doi:10.1007/s11235-011-9503-7. [56] H. Schulzrinne, RTP profile for audio and video conferences with minimal con-
[26] T. Hossfeld, D. Hock, P. Tran-Gia, K. Tutschku, M. Fiedler, Testing the iqx trol, RFC 3551, IETF, July 2003. Available: https://tools.ietf.org/html/rfc3551.
hypothesis for exponential interdependency between QoS and QoE of voice [57] http://www.linkedin.com/answers/technology/information-technology/
codecs iLBC and G.711, Proceedings of the ITC Specialist Seminar on Quality telecommunications/TCH_ITS_TCI/491659-28591248
of Experience, 2008. [58] M. Voznak, A. Kovac, M. Halas, Effective packet loss estimation on VoIP jitter
[27] L. Roychoudhuri, E. Al-Shaer, G.B. Brewster, On the impact of loss and delay buffer, in: Proceedings of the 2012 International Conference on Networking,
variation on internet packet audio transmission, Comput. Commun. 29 (2006) vol. 7291, 2012, pp. 157–162, doi:10.1007/978- 3- 642- 30039- 4_21.
1578–1589, doi:10.1016/j.comcom.20 06.04.0 04. [59] S. Tao, K. Xu, A. Estepa, Improving VoIP quality through path switching, in:
[28] A.P. Markopoulou, F.A. Tobagi, M.J. Karam, Assessing the quality of voice com- Proceedings of IEEE INFOCOM, 2005, pp. 2268–2278.
munications over internet backbones, IEEE/ACM Trans. Netw. 11 (2003), 747– [60] P. Spirtes, C.N. Glymour, R. Scheines, Causation, Prediction, and Search,
760, doi:10.1109/TNET.2003.818179. 81(20 0 0), Cambridge, MA, USA: MIT Press
[29] L. Sun, E. Ifeachor, Voice quality prediction models and their applications in [61] M. Ahdesmäki, H. Lähdesmäki, R. Pearson, H. Huttunen, O. Yli-Harja, Ro-
VoIP networks, IEEE Trans. Multimed. 8 (2006) 809–820, doi:10.1109/TMM. bust detection of periodic time series measured from biological systems, BMC
2006.876279. Bioinformatics, 6 (2005), 1-18, doi:10.1186/1471-2105-6-117.
[30] D. Aklilu, L. Vicky, F. Ernest, C. Bill, QoE estimation model for a secure real- [62] H.L. Zhang, Z.M. Gu, Z.Q. Tian, QoS evaluation based on extend E-Model in
time voice communication system in the cloud, ACSW (2019) 1–10, doi:10. VoIP, in: Proceedings of 13th International Conference on Advanced Communi-
1145/3290688.3290705. cation Technology, 2011.
[31] T. Daengsi, P. Wuttidittachotti, QoE modeling for voice over IP: simplified E- [63] R.G. Cole, J.H. Rosenbluth, Voice over IP performance monitoring„ ACM SIG-
Model enhancement utilizing the subjective mos prediction model: a case of COMM Comput. Commun. Rev. 31 (2001) 9–24, doi:10.1145/505666.505669.
G.729 and Thai users, J. Netw. Syst. Manag. (29) (2019) 837–859, doi:10.1007/ [64] ITU-T Recommendation G.113, Transmission impairments due to speech
s10922- 018- 09487- 4. processing, November 2007. Available: https://www.itu.int/rec/T- REC- G.
[32] M. Al-Akhras, H. Zedan, R. John, I. Almomani, Non-intrusive speech quality pre- 113-200711-I/en.
Zhiguo Hu received the Ph.D. degree in computer science from Tongji University, China, in 2012. He is now working in the School of Computer and Information Technology, Shanxi University. His research interests include network measurement, data mining and machine learning.

Hongren Yan received the M.A. degree from Arizona State University in 2016. He is now a Ph.D. student and research assistant in the Institute of Big Data Science and Industry, Shanxi University. He specializes in information geometry, machine learning theory, and complex analysis mining.


Tao Yan received the Ph.D. degree from the Chengdu Institute of Computer Applications, Chinese Academy of Sciences, in 2017. He is now a lecturer at Shanxi University. His research interests include image processing and evolutionary computation.

Haijun Geng received the B.E., M.E., and Ph.D. degrees from Yantai University, Capital Normal University and Tsinghua University in 2008, 2011, and 2015, respectively. He is now working in the School of Software Engineering, Shanxi University. His research interests include future Internet architecture and large-scale Internet routing.

Guoqing Liu received the B.E. degree from Shanxi University in 2016. She is now studying in the School of Computer and Information Technology, Shanxi University. Her research interest is reinforcement learning.
