JUNQIANG GUO, FASHENG LIU, ZHIQIANG ZHU
College of Information and Electrical Engineering, Shandong University of Science and Technology, Qingdao, 266510, China
Guojunqiang2008@163.com
Abstract
This article analyses the call duration probability density function and its parameters in a GSM system. The sample data are obtained from an actual system, and the observed call durations are as long as 1800 seconds. The KL divergence method is introduced to compare the goodness of fit of candidate call duration probability density functions and to estimate the parameters of three candidate probability density functions. Using these parameters, the performance of the GSM system in a daytime hour is predicted. The comparison shows that the lognormal distribution of call duration is more precise than the exponential or Erlang distribution for performance prediction in the GSM system.
Keywords: entropy, KL Divergence, GSM, call duration, lognormal distribution
1. Introduction
Call duration is a very important parameter in mobile telephony systems. The call duration probability density function and its parameters are often required in system design, simulation and analysis. In system management and network optimization, and especially in charging policy, they are key to ensuring the correctness of prediction and analysis. In a mobile telephony system, call duration plays a very important role in determining cell size, handover properties and channel allocation strategy. Charging strategy depends on call duration, too.
Estimating the call duration probability density function and its parameters is often a nontrivial problem. In most situations, we expect a simple, analytic and relatively precise function. Call duration in mobile telephony systems is traditionally assumed to be exponential, just as in fixed telephony networks. However, many empirical studies have examined the probability distribution of call duration in both mobile and fixed telephony systems, and have found that the lognormal distribution fits the empirical distribution better than the exponential or Erlang distribution does [1][2][4][5]. In [3], it is said that the length of sentences fits the lognormal distribution; maybe that is the reason that call duration in GSM telephony systems fits the lognormal distribution better. In [2], the authors tested the following distributions: lognormal, shifted exponential, Erlang-k and Erlang-jk. They found that the lognormal distribution fitted the empirical distribution better. In their studies [1][2], the authors mainly used statistical goodness-of-fit tests. The data samples they used are shorter than 6 minutes. In [4], the authors argue that it is critical to select a sample of calls that is not truncated, i.e., to look at all calls that started in a given time period, not only those that both started and ended in a given time interval.
The Kullback-Leibler divergence was enunciated by Kullback and underlies the principle of minimum cross entropy (MinxEnt), which aims to find a probability distribution that is as close as possible to another distribution [6]. The MinxEnt method has been applied to a wide range of problems in signal processing, pattern recognition, economics, statistics, communication systems, etc. [7-9]. Estimating the call duration distribution parameters in a GSM system with the KL divergence method is a meaningful engineering application of this methodology. It shows the strengths of the Kullback-Leibler divergence method: it is simple and precise.
This paper presents a field study of the call duration distribution in a GSM mobile telephony system. The KL divergence is used as a measure to evaluate three candidate distributions: exponential, Erlang and lognormal. The conclusions are that the lognormal distribution has the smallest KL divergence from the statistical probability distribution, and that the minimum KL divergence method is simple and precise in estimating the call duration distribution parameters in a GSM system.
1424413125/07/$25.00 © 2007 IEEE
2. Data obtaining
The data used in this paper are from a mobile switch centre in the GSM system of Qingdao, China. The call start time and call duration are extracted from signaling commands. In one whole hour, all calls to and from the mobile switch centre are recorded. All calls starting within this hour were used to construct a data sample. The traffic distributions of the mobile switch centre are shown in Figure 1 and Figure 2. Figure 1 shows a one-day traffic distribution; Figure 2 shows the weekday traffic distributions averaged over two months. Two data samples are constructed. Data sample 1 was recorded from 13:17:00 to 14:16:59 on Wednesday, Jan. 24, 2007. Data sample 2 was obtained on Thursday, Jan. 25, 2007, from 10:22:00 to 11:21:59.
Some values of the original samples are eliminated. The values under 3 seconds are considered to be caused by noise or interference, and are not normal calls. The values over 1800 seconds are all recorded as 1800 seconds, so they are also eliminated in our analysis. Tables 1 and 2 show the statistical values of the two data samples respectively. From the tables we can see that the eliminated data have a negligible effect on our analysis. So the call duration data samples used in this article range from 3 s to 1799 s. Data sample 1 has a mean of 84.44 seconds and a standard deviation of 123.32 seconds; data sample 2 has a mean of 81.95 seconds and a standard deviation of 115.86 seconds. The mean call duration is much shorter than the value in [10], which studied a fixed telephony system and showed a mean call duration of 200-300 seconds. Perhaps this is one of the main differences between mobile and fixed telephone systems.
Table 1. Call duration statistical values of data set 1
Set | Number of calls | Percentage of total calls | Traffic | Percentage of total traffic
Total | 87415 | | 2065.46 |
0–2 s | 1093 | 1.25 | 0.305 | 0.015
3–1799 s | 86237 | | 2022.65 |
1800 s | 85 | | 42.5 | 2.06
Table 2. Call duration statistical values of data set 2
Set | Number of calls | Percentage of total calls | Traffic | Percentage of total traffic
Total | 104326 | | 2370.37 |
0–2 s | 1256 | 1.20 | 0.32 | 0.0135
3–1799 s | 103020 | 98.75 | 2345.05 | 98.932
1800 s | 50 | 0.048 | 25 | 1.055
3. Calculating Based on KL Divergence Method
According to E.T. Jaynes [11], for a continuous variable x, the entropy of a probability distribution function p(x) is defined as:
$$ H(p) = -k \int p(x) \ln p(x) \, dx \tag{1} $$
For distributions over a continuous variable x, the KL divergence (Kullback-Leibler divergence) between two probability distribution functions p(x) and q(x) is defined by Kullback and Leibler [12] as:
$$ K(p \,\|\, q) = \int p(x) \ln \frac{p(x)}{q(x)} \, dx \tag{2} $$
The KL divergence is a natural distance measure from a probability distribution p(x) to a probability distribution q(x). One can show that it is always nonnegative, and zero only if p(x)=q(x) [13]. The larger the difference between p and q, the higher the value of K; the more similar p and q are, the lower K is; in the limiting case p=q, K=0.
In our problem, the goal is to find an analytical function q(x) with proper parameters. It can be written as an optimization problem:
$$ \min_{q} K(p \,\|\, q) = \int p(x) \ln \frac{p(x)}{q(x)} \, dx $$
$$ \text{s.t.} \quad q(x) \ge 0, \quad p(x) \ge 0, \quad \int q(x) \, dx = \int p(x) \, dx = 1 \tag{3} $$
To solve problem (3) numerically, the continuous functions p(x) and q(x) must be transformed into discrete forms, and the continuous integral must be approximated by a discrete summation, as in equation (4). The numerical approximation is based on sample values of f(x). The range x ∈ [a, b] is divided into n equal intervals. Let (x_1, x_2, …, x_n) be the representatives of these intervals, and {f(x_1), f(x_2), …, f(x_n)} be the values corresponding to the points {x_1, x_2, …, x_n}. The continuous integral of f(x) is approximated by
$$ \int_a^b f(x) \, dx \approx \frac{b-a}{n} \sum_{i=1}^{n} f(x_i) = \sum_{i=1}^{n} p_i \tag{4} $$
with
$$ p_i = \frac{b-a}{n} f(x_i) \tag{5} $$
The probability distribution p(x) is calculated from the frequency distribution of the data sample. Let the sample data be {x_1, x_2, …, x_N}, and let x_min and x_max be the minimum and maximum values of x. The interval [x_min, x_max] is partitioned into n intervals of equal length. Based on this partitioning, the frequency distribution of the sample data is obtained as {f_1, f_2, …, f_n}. The probability distribution p(x) is given as {f_1/N, f_2/N, …, f_n/N}.
In our problem, call duration is the variable x, with x ∈ [3, 1799], and the partitioning interval is 1 second. The sample points of q(x) are the integers in [3, 1799]. Three candidate functions of q(x) are used: lognormal as in equation (6), exponential as in equation (7) and Erlang-2 as in equation (8).
$$ q(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left( -\frac{(\ln x - \ln m)^2}{2\sigma^2} \right) \tag{6} $$
$$ q(x) = \frac{1}{\alpha} \exp\left( -\frac{x}{\alpha} \right) \tag{7} $$
$$ q(x) = \beta^{-2} x \exp\left( -\frac{x}{\beta} \right) \tag{8} $$
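As an illustration of equations (6) to (8), the three candidate densities can be evaluated on the integer grid [3, 1799] in a few lines of Python. This is only a sketch: the function names are hypothetical, and the lognormal normalising constant and the symbol sigma are assumed from the standard form of the distribution, since they are not legible in the source.

```python
import math


def lognormal_pdf(x, m, sigma):
    """Equation (6), assumed standard form: lognormal with median m, shape sigma."""
    z = (math.log(x) - math.log(m)) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * x)


def exponential_pdf(x, alpha):
    """Equation (7): exponential density with mean alpha."""
    return math.exp(-x / alpha) / alpha


def erlang2_pdf(x, beta):
    """Equation (8): Erlang-2 density with scale beta (mean 2 * beta)."""
    return x * math.exp(-x / beta) / beta ** 2


def discretize(pdf, lo=3, hi=1799):
    """Evaluate a density at the integer points lo..hi (interval width 1 s)."""
    return [pdf(float(x)) for x in range(lo, hi + 1)]


if __name__ == "__main__":
    # Parameter values are the estimates reported in Table 3.
    candidates = {
        "lognormal": lambda x: lognormal_pdf(x, 49.133, 1.0041),
        "exponential": lambda x: exponential_pdf(x, 84.270),
        "Erlang-2": lambda x: erlang2_pdf(x, 42.135),
    }
    for name, pdf in candidates.items():
        # The sum approximates the probability mass falling inside [3, 1799].
        print(name, round(sum(discretize(pdf)), 4))
```

Because the grid is truncated at 3 s and 1799 s, the discrete sums fall slightly short of 1; whether the original computation renormalised q(x) on the grid is not stated in the paper.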
Minimize K(p‖q) in problem (3) by searching for the parameters of q(x) with an optimization algorithm; the results are described in section 4.
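The procedure of this section can be sketched end to end in Python: build p(x) from the frequency distribution, evaluate the discrete KL divergence, and search for the parameters of q(x). The paper does not name its optimization algorithm, so the compass (pattern) search below is a stand-in chosen for simplicity. The sample is synthetic, all function names are hypothetical, the lognormal form is the assumed standard one, and q(x) is not renormalised on the truncated grid.

```python
import math
import random


def lognormal_pdf(x, m, sigma):
    """Lognormal density with median m and shape sigma (assumed form of eq. 6)."""
    z = (math.log(x) - math.log(m)) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * x)


def empirical_distribution(durations, lo=3, hi=1799):
    """Return (grid, p): the integer grid lo..hi and relative frequencies f_i / N."""
    grid = list(range(lo, hi + 1))
    counts = [0] * len(grid)
    for d in durations:
        second = int(round(d))
        if lo <= second <= hi:  # values outside the analysed range are dropped
            counts[second - lo] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no durations inside the analysed range")
    return grid, [c / total for c in counts]


def kl_divergence(p, q):
    """Discrete K(p || q) = sum p_i ln(p_i / q_i); terms with p_i = 0 add 0."""
    k = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i <= 0.0:
                return math.inf
            k += p_i * math.log(p_i / q_i)
    return k


def compass_search(objective, start, steps, tol=1e-4, max_iter=10000):
    """Derivative-free search: try +/- one step per coordinate, halve on failure."""
    point, steps = list(start), list(steps)
    best = objective(point)
    for _ in range(max_iter):
        if max(steps) < tol:
            break
        improved = False
        for i in range(len(point)):
            for direction in (1.0, -1.0):
                trial = list(point)
                trial[i] += direction * steps[i]
                value = objective(trial)
                if value < best:
                    point, best, improved = trial, value, True
                    break
        if not improved:
            steps = [s / 2.0 for s in steps]
    return point, best


def fit_lognormal(grid, p, start=(80.0, 1.5), steps=(10.0, 0.25)):
    """Minimise K(p || q) over (m, sigma); interval width is 1 s as in eq. (5)."""
    def objective(params):
        m, sigma = params
        if m <= 0.0 or sigma <= 0.0:
            return math.inf
        return kl_divergence(p, [lognormal_pdf(float(x), m, sigma) for x in grid])
    return compass_search(objective, start, steps)


if __name__ == "__main__":
    rng = random.Random(0)  # synthetic durations, not the paper's measurements
    sample = [rng.lognormvariate(math.log(50.0), 1.0) for _ in range(20000)]
    grid, p = empirical_distribution(sample)
    (m_hat, sigma_hat), k_min = fit_lognormal(grid, p)
    print(f"m = {m_hat:.3f}, sigma = {sigma_hat:.4f}, K = {k_min:.5f}")
```

The same `compass_search` could be reused for the exponential and Erlang-2 candidates by swapping the density inside the objective; a library optimizer would serve equally well.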
4. Results of calculation and parameters estimation
Table 3 lists the calculated KL divergence of p(x) from each of the three candidate q(x), together with the estimated parameters of q(x). Figure 3 depicts the probability distributions vs. call duration. Four probability density functions are compared: the actual frequency distribution of data sample 1 and the three candidate pdfs (probability density functions), which are lognormal, exponential and Erlang-2.
From Table 3, it can be seen that the KL divergence of p(x) from the lognormal pdf is the smallest of the three values. It is closer to 0, and is much smaller than the other two values. From Figure 3, it is clear that the lognormal pdf fits the actual frequency distribution of data sample 1 better than the exponential or Erlang pdf.
Table 3. Results of KL divergence calculation and estimated parameters
Density function | KL divergence | Parameters
Erlang | 0.277 | β=42.135, k=2
Exponential | 0.124 | α=84.270
Lognormal | 0.00787 | m=49.133, σ=1.0041
From the results, it can be concluded that the lognormal pdf with m=49.133 and σ=1.0041 can be considered as the real call duration pdf of this GSM system.
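A quick plausibility check on the estimated parameters is to compare the mean and standard deviation each fitted distribution implies with the sample statistics reported in section 2 (mean 84.44 s, standard deviation 123.32 s for data sample 1). The sketch below uses the standard moment formulas for the untruncated distributions, so it ignores the cut at 3 s and 1799 s; the function names are hypothetical and the symbol names sigma, alpha and beta are assumed.

```python
import math


def lognormal_moments(m, sigma):
    """Mean and standard deviation of a lognormal with median m and shape sigma."""
    mean = m * math.exp(sigma ** 2 / 2.0)
    std = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, std


def exponential_moments(alpha):
    """Mean and standard deviation of an exponential with mean alpha."""
    return alpha, alpha


def erlang_moments(k, beta):
    """Mean and standard deviation of an Erlang-k with scale beta."""
    return k * beta, beta * math.sqrt(k)


if __name__ == "__main__":
    # Parameter values are the estimates reported in Table 3.
    fitted = {
        "lognormal": lognormal_moments(49.133, 1.0041),
        "exponential": exponential_moments(84.270),
        "Erlang-2": erlang_moments(2, 42.135),
    }
    for name, (mean, std) in fitted.items():
        print(f"{name}: mean = {mean:.2f} s, std = {std:.2f} s")
```

All three fits imply a mean in the low 80s of seconds, but only the lognormal implies a standard deviation larger than its mean, which is the pattern seen in both data samples.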
5. Results of Prediction
Using the parameters obtained in section 4, some prediction results are depicted in Figures 4 to 8. The total call number of data set 2 was used as a known parameter for the three candidate pdfs. Each prediction result was compared with the actual value of data sample 2. From Figures 4 to 7, it can be seen that the lognormal pdf with m=49.133 and σ=1.0041 predicted the call number and traffic distributions very well. The predictions using the lognormal pdf are more precise than those using the exponential or Erlang pdf. Figure 8 depicts the relationships between call number, traffic and call duration in this GSM system. It shows that a large proportion of calls with short call duration occupies a small proportion of traffic. For example, it can be read from Figure 8 that 78% of all calls have a call duration of less than 100 s, and they account for only 40% of traffic.
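The relationship between call number and traffic can also be approximated analytically from the fitted lognormal pdf. The sketch below is an illustration, not the authors' computation: it assumes the standard lognormal form, ignores the truncation to [3, 1799] s, and uses hypothetical function names. The call share is the lognormal CDF, and the traffic share uses the partial-expectation identity E[X; X < t] / E[X] = Φ((ln t − ln m − σ²)/σ).

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def call_share_below(t, m, sigma):
    """Fraction of calls shorter than t seconds: P(X < t) for lognormal X."""
    return normal_cdf((math.log(t) - math.log(m)) / sigma)


def traffic_share_below(t, m, sigma):
    """Fraction of total holding time from calls shorter than t seconds."""
    return normal_cdf((math.log(t) - math.log(m) - sigma ** 2) / sigma)


if __name__ == "__main__":
    m, sigma = 49.133, 1.0041  # lognormal estimates from Table 3
    threshold = 100.0          # seconds, the example used with Figure 8
    print(f"calls below {threshold:.0f} s:   {call_share_below(threshold, m, sigma):.1%}")
    print(f"traffic below {threshold:.0f} s: {traffic_share_below(threshold, m, sigma):.1%}")
```

Under these assumptions the fitted model gives roughly 76% of calls and 38% of traffic below 100 s, in the same neighbourhood as the 78% and 40% read from Figure 8.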
6. Conclusions
From the calculation of KL divergence in section 4 and the prediction results in section 5, some conclusions can be drawn. First, KL divergence is a simple but effective measure for testing the goodness of fit of a pdf. Second, using the KL divergence method, the parameters of a candidate probability density function can be estimated precisely. Third, the lognormal pdf is more precise than the exponential or Erlang pdf in the performance prediction of the GSM system.
Figure 1. Traffic density distribution on Jan. 25, 2007
Figure 2. Average weekday traffic density distribution from Dec. 1, 2006 to Jan. 31, 2007
Figure 3. Call duration probability distributions comparison
Figure 4. Predictions of call number distribution vs. call duration
Figure 5. Predictions of cumulated call number vs. call duration
Figure 6. Predictions of traffic distribution vs. call duration
Figure 7. Predictions of cumulated traffic vs. call duration
Figure 8. The relationships between call number, traffic and call duration
References
[1] G. Boggia, P. Camarda, A. D’Alconzo, A. De Biasi and M. Siviero, “Drop Call Probability in Established Cellular Networks: from data Analysis to Modelling”, Proc. of IEEE VTC 2005, spring, May 2005.
[2] J. Jordan and F. Barcelo, “Statistical modeling of channel occupancy in trunked PAMR systems”, Proc. 15th Int. Teletraffic Conf., V. Ramaswami and P. E. Wirth, Eds., Elsevier Science B.V., 1997, pp. 1169–1178.
[3] L. Eckhard, A.S. Werner, A. Markus, “Lognormal Distributions across the Sciences: Keys and Clues”, BioScience, Vol. 51, No. 5, May 2001, pp. 341–351.
[4] D.E. Duffy, A.A. McIntosh, M. Rosenstein, W. Willinger, “Statistical Analysis of CCSN/SS7 Traffic Data from Working CCS Subnetworks”, IEEE Journal on Selected Areas in Comm., Vol. 12, April 1994, pp. 544-551.
[5] V. Bolotin, “Modeling call holding time distributions for CCS network design and performance analysis”, IEEE J. Sel. Areas in Commun., Vol. 12, No. 3, Apr. 1994, pp. 433-438.
[6] S. Kullback, Information Theory and Statistics, Wiley, New York, 1959, pp. 37.
[7] I. Csiszár, “Information theoretic methods in probability and statistics”, IEEE Information Theory Society Newsletter 48, 1998, pp. 21-30.
[8] S.A Laddin, G.Çigdem, U. Ilhan, M.K. Yeliz, “Determining Probability Distribution by Minimum Cross Entropy”, Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization, Lisbon, Portugal, September 22-24, 2006, pp. 644-647.
[9] M. Srikanth, H.K. Kesavan, H.R. Peter, “Probability Density Function Estimation Using the MinMax Measure”, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 30, No. 1, February 2000, pp. 17.
[10] V. Bolotin, “Telephone Circuit Holding Time Distributions”, Proc. of 14th ITC, Elsevier Science B.V., 1994, pp. 125-134.
[11] E.T. Jaynes, “Information Theory and Statistical Mechanics”, Reprinted from the Physical Review, Vol. 106, May 15, 1957, pp. 620-630.
[12] S. Kullback and R. A. Leibler, “On Information and Sufficiency”, Annals of Mathematical Statistics, Vol. 22, No. 1 (Mar., 1951), pp. 79-86.
[13] T. Cover and J. Thomas, Elements of Information Theory, Wiley, New York, 1991.