Estimate the Call Duration Distribution Parameters in GSM System Based on K-L Divergence Method

JUNQIANG GUO, FASHENG LIU, ZHIQIANG ZHU

College of Information and Electrical Engineering, Shandong University of Science and Technology, Qingdao, 266510, China

Guojunqiang2008@163.com

Abstract

This article analyses the probability density function of call duration and its parameters in a GSM system. The sample data are obtained from an operational system, and observed call durations are as long as 1800 seconds. The K-L divergence method is introduced to compare the goodness of fit of candidate call duration probability density functions and to estimate the parameters of three candidate functions. Using these parameters, the performance of the GSM system during a daytime hour is predicted. The comparison shows that the lognormal distribution of call duration is more precise than the exponential or Erlang distribution for performance prediction in the GSM system.

Keywords: entropy, K-L Divergence, GSM, call duration, lognormal distribution

1. Introduction

Call duration is a very important parameter in a mobile telephony system. The call duration probability density function and its parameters are often required in system design, simulation and analysis. In system management and network optimization, and especially in charging policy, they are key inputs for correct prediction and analysis. Call duration also plays an important role in determining cell size, handover properties, and channel allocation strategy.

Estimating the call duration probability density function and its parameters is often a nontrivial problem. In most situations, we expect a simple, analytic and relatively precise function. Call duration in mobile telephony systems is traditionally assumed to be exponential, just as in fixed telephony networks. However, many empirical studies have examined the probability distribution of call duration in both mobile and fixed telephony systems, and they found that the lognormal distribution fits the empirical distribution better than the exponential or Erlang distribution does [1][2][4][5]. In [3], it is said that the length of sentences fits the log-normal distribution. This may be the reason why call duration in a GSM telephony system fits the lognormal distribution better.

In [2], the authors tested the following distributions: lognormal, shifted exponential, Erlang-k and Erlang-jk. They found that the lognormal distribution fitted the empirical distribution better. In their studies [1][2], the authors mainly used statistical goodness-of-fit tests, and the data samples they used are shorter than 6 minutes. In [4], the authors argue that it is critical to select a sample of calls that is not truncated, i.e., to look at all calls that started in a given time period, not only those that both started and ended in that interval.

The Kullback-Leibler (K-L) divergence was introduced by Kullback; minimizing it is known as the principle of minimum cross-entropy (MinxEnt), which aims to find a probability distribution that is as close as possible to another distribution [6]. The MinxEnt method has been applied to a wide range of problems in signal processing, pattern recognition, economics, statistics, and communication systems [7-9]. Estimating the call duration distribution parameters of a GSM system with the K-L divergence method is a meaningful engineering application of this methodology, and it shows the strength of the method: it is simple and precise.

This paper presents a field study of the call duration distribution in a GSM mobile telephony system. The K-L divergence is used as a measure to evaluate three candidate distributions: exponential, Erlang and lognormal. The conclusions are that the lognormal distribution has the smallest K-L divergence from the empirical probability distribution, and that the minimum K-L divergence method is simple and precise in estimating the call duration distribution parameters of a GSM system.

2. Data obtaining

The data used in this paper come from a mobile switching centre in the GSM system of Qingdao, China. The call start time and call duration are extracted from the signaling commands. During one whole hour, all calls to and from the mobile switching centre are recorded, and all calls starting within this hour are used to construct a data sample. The traffic distributions of the mobile switching centre are shown in Figure 1 and Figure 2. Figure 1 shows the traffic distribution over one day; Figure 2 shows two monthly averaged weekday traffic distributions.

1-4244-1312-5/07/$25.00 © 2007 IEEE

Two data samples are constructed. Data sample 1 was recorded from 13:17:00 to 14:16:59 on Wednesday, Jan. 24, 2007. Data sample 2 was obtained on Thursday, Jan. 25, 2007, from 10:22:00 to 11:21:59.

Some values of the original samples are eliminated. Values under 3 seconds are considered to be caused by noise or interference rather than normal calls. Values over 1800 seconds are all recorded as 1800 seconds, so they are also eliminated from our analysis.

Tables 1 and 2 show the statistical values of the two data samples respectively. From the tables we can see that the eliminated data have a negligible effect on our analysis. The call duration data samples used in this article therefore range from 3 s to 1799 s. Data sample 1 has a mean of 84.44 seconds and a standard deviation of 123.32 seconds; data sample 2 has a mean of 81.95 seconds and a standard deviation of 115.86 seconds. The mean call duration is much shorter than the value in [10], which studied a fixed telephony system and reported a mean call duration of 200-300 seconds. Perhaps this is one of the main differences between mobile and fixed telephone systems.

Table 1. Call duration statistical values of data set 1

| Call duration | Number of calls | Percentage of total calls | Traffic | Percentage of total traffic |
|---|---|---|---|---|
| Total | 87415 | | 2065.46 | |
| 0–2 s | 1093 | 1.25 | 0.305 | 0.015 |
| 3–1799 s | 86237 | 98.65 | 2022.65 | 97.93 |
| 1800 s | 85 | 0.097 | 42.5 | 2.06 |

Table 2. Call duration statistical values of data set 2

| Call duration | Number of calls | Percentage of total calls | Traffic | Percentage of total traffic |
|---|---|---|---|---|
| Total | 104326 | | 2370.37 | |
| 0–2 s | 1256 | 1.20 | 0.32 | 0.0135 |
| 3–1799 s | 103020 | 98.75 | 2345.05 | 98.932 |
| 1800 s | 50 | 0.048 | 25 | 1.055 |

3. Calculating Based on K-L Divergence Method

According to E. T. Jaynes [11], for a continuous variable x, the entropy of a probability distribution function p(x) is defined as:

$$H(p) = -k \int p(x) \ln p(x)\,dx \tag{1}$$

For distributions over a continuous variable x, the K-L divergence (Kullback-Leibler divergence) between two probability distribution functions p(x) and q(x) is defined by Kullback and Leibler [12] as:

$$K(p\|q) = \int p(x) \ln \frac{p(x)}{q(x)}\,dx \tag{2}$$

The K-L divergence is a natural distance measure from a probability distribution p(x) to a probability distribution q(x). One can show that it is always nonnegative and zero only if p(x)=q(x) [13]. The larger the difference between p and q, the higher the value of K; the more similar p and q are, the lower K will be; in the limiting case, if p=q, then K=0.

In our problem, the goal is to find an analytical function q(x) with proper parameters. It can be written as an optimization problem:

$$\min\ K(p\|q) = \int p(x) \ln \frac{p(x)}{q(x)}\,dx \quad \text{s.t.} \quad p(x) \ge 0,\ \ q(x) \ge 0,\ \ \int q(x)\,dx = \int p(x)\,dx = 1 \tag{3}$$

To solve problem (3) by computer, the continuous functions p(x) and q(x) must be transformed into discrete forms, and the continuous integral must be approximated by a discrete summation, as in equation (4). The numerical approximation is based on sample values of f(x). The range of x, [a, b], is divided into n equal intervals. Let (x_1, x_2, …, x_n) be the representatives of these intervals, and {f(x_1), f(x_2), …, f(x_n)} be the values corresponding to the points {x_1, x_2, …, x_n}. The continuous integral of f(x) is approximated by

$$\int_a^b f(x)\,dx \approx \frac{b-a}{n} \sum_{i=1}^{n} f(x_i) = \sum_{i=1}^{n} p_i \tag{4}$$

$$p_i = \frac{b-a}{n} f(x_i) \tag{5}$$

The probability distribution p(x) is calculated from the frequency distribution of the data sample. Let the sample data be {x_1, x_2, …, x_N}, and let x_min and x_max be the minimum and maximum values of x. The interval [x_min, x_max] is partitioned into n intervals of equal length. Based on this partitioning, the frequency distribution of the sample data is obtained as {f_1, f_2, …, f_n}. The probability distribution p(x) is given as

$$\left\{ \frac{f_1}{N}, \frac{f_2}{N}, \ldots, \frac{f_n}{N} \right\}$$

In our problem, call duration is the variable x, with x ∈ [3, 1799], and the partitioning interval is 1 second. The sample points of q(x) are the integers in [3, 1799]. Three candidate functions of q(x) are used: lognormal as in equation (6), exponential as in equation (7) and Erlang-2 as in equation (8).

$$q(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left( -\frac{(\ln x - \ln m)^2}{2\sigma^2} \right) \tag{6}$$

$$q(x) = \frac{1}{\alpha} \exp\left( -\frac{x}{\alpha} \right) \tag{7}$$

$$q(x) = \frac{x}{\beta^2} \exp\left( -\frac{x}{\beta} \right) \tag{8}$$
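As an illustration of how the three candidates can be put into discrete form, the following minimal Python sketch evaluates equations (6) to (8) at the integer durations 3 to 1799 s. The parameter values are those reported later in Table 3; the helper name `discretize` and the renormalization step (so that the discrete values sum to 1, as problem (3) requires) are assumptions of this sketch, not details given in the paper.

```python
import math

# Parameter values as reported in Table 3 of this paper.
M, SIGMA = 49.133, 1.0041   # lognormal: m and sigma
ALPHA = 84.270              # exponential: alpha
BETA = 42.135               # Erlang-2: beta


def lognormal_pdf(x, m=M, sigma=SIGMA):
    """Equation (6)."""
    z = (math.log(x) - math.log(m)) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * x)


def exponential_pdf(x, alpha=ALPHA):
    """Equation (7)."""
    return math.exp(-x / alpha) / alpha


def erlang2_pdf(x, beta=BETA):
    """Equation (8)."""
    return x / beta**2 * math.exp(-x / beta)


def discretize(pdf, lo=3, hi=1799):
    """Sample a pdf at the integers lo..hi (1 s intervals, so p_i = f(x_i)).

    Returns the renormalized values q_i and the raw mass before
    renormalization. Renormalizing is an assumption of this sketch.
    """
    raw = [pdf(x) for x in range(lo, hi + 1)]
    mass = sum(raw)
    return [v / mass for v in raw], mass


for name, pdf in (("lognormal", lognormal_pdf),
                  ("exponential", exponential_pdf),
                  ("Erlang-2", erlang2_pdf)):
    q, mass = discretize(pdf)
    print(f"{name}: {len(q)} points, raw mass on [3, 1799] = {mass:.4f}")
```

The raw mass printed for each candidate indicates how much probability the continuous function places outside the analysed range, which is one way to judge whether the renormalization matters.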

K(p‖q) in problem (3) is minimized by searching for the parameters of q(x) with an optimization algorithm, and the results are described in section 4.
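The paper does not name the optimization algorithm used. The sketch below shows one possible way to carry out the search for the lognormal candidate, under several assumptions: the data are synthetic (drawn with Python's `random.lognormvariate`, not the Qingdao measurements), q(x) is renormalized over the integer grid, and a simple shrinking-step pattern search stands in for the unspecified optimizer. All function names are hypothetical.

```python
import math
import random


def lognormal_grid(m, sigma, lo=3, hi=1799):
    """Equation (6) sampled at integer durations, renormalized to sum to 1."""
    mu = math.log(m)
    raw = [math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma**2)) / (x * sigma)
           for x in range(lo, hi + 1)]
    total = sum(raw)
    return [v / total for v in raw]


def kl_divergence(p, q):
    """Discrete form of equation (2); bins with p_i = 0 contribute nothing.

    The tiny floor on q_i only guards against floating-point underflow.
    """
    return sum(pi * math.log(pi / max(qi, 1e-300))
               for pi, qi in zip(p, q) if pi > 0)


def empirical_distribution(durations, lo=3, hi=1799):
    """Relative frequencies over 1 s bins, dropping values outside [lo, hi]."""
    counts = [0] * (hi - lo + 1)
    for d in durations:
        k = int(round(d))
        if lo <= k <= hi:
            counts[k - lo] += 1
    n = sum(counts)
    return [c / n for c in counts]


def fit_lognormal(p, m0=60.0, s0=0.8, rounds=40):
    """Pattern search over (m, sigma): try a step in each direction,
    keep improvements, and halve the steps when nothing improves."""
    best = (m0, s0)
    best_k = kl_divergence(p, lognormal_grid(*best))
    step_m, step_s = 16.0, 0.4
    for _ in range(rounds):
        improved = False
        for dm, ds in ((step_m, 0), (-step_m, 0), (0, step_s), (0, -step_s)):
            m, s = best[0] + dm, best[1] + ds
            if m <= 1.0 or s <= 0.2:
                continue  # stay inside a safe parameter region
            k = kl_divergence(p, lognormal_grid(m, s))
            if k < best_k:
                best, best_k, improved = (m, s), k, True
        if not improved:
            step_m, step_s = step_m / 2, step_s / 2
    return best, best_k


# Synthetic call durations with known parameters m = 50 s, sigma = 1.0.
random.seed(0)
sample = [random.lognormvariate(math.log(50.0), 1.0) for _ in range(20000)]
p = empirical_distribution(sample)
(m_hat, sigma_hat), k_min = fit_lognormal(p)
print(f"m = {m_hat:.2f}, sigma = {sigma_hat:.3f}, K = {k_min:.4f}")
```

On synthetic data of this kind the search should recover values close to the parameters used to generate the sample. The same `kl_divergence` function could be applied to the exponential and Erlang-2 candidates with a one-parameter search.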

4. Results of calculation and parameters estimation

Table 3 lists the K-L divergence of p(x) from each of the three candidate q(x), together with the estimated parameters of q(x). Figure 3 depicts the probability distributions vs. call duration. Four probability density functions (pdfs) are compared: the actual frequency distribution of data sample 1 and the three candidate pdfs, which are lognormal, exponential and Erlang-2.

From Table 3, it can be seen that the K-L divergence of p(x) from the lognormal pdf is the smallest of the three values. It is closer to 0 and much smaller than the other two values. From Figure 3, it is clear that the lognormal pdf fits the actual frequency distribution of data sample 1 better than the exponential or Erlang pdf.

Table 3. Results of K-L divergence calculation and estimated parameters

| Density function | K-L divergence | Parameters |
|---|---|---|
| Erlang | 0.277 | β=42.135, k=2 |
| Exponential | 0.124 | α=84.270 |
| Lognormal | 0.00787 | m=49.133, σ=1.0041 |

From the results, it can be concluded that, of the three candidates, the lognormal pdf with m=49.133 and σ=1.0041 best represents the actual call duration pdf of this GSM system.
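As a rough plausibility check on the estimated parameters, the mean call duration implied by each fitted pdf can be computed from standard formulas. These are means over the whole positive axis rather than over the analysed range of 3 s to 1799 s, so they are only indicative:

$$E[X]_{\text{lognormal}} = m\,e^{\sigma^2/2} = 49.133 \times e^{1.0041^2/2} \approx 81.3\ \text{s}, \qquad E[X]_{\text{exponential}} = \alpha = 84.270\ \text{s}, \qquad E[X]_{\text{Erlang-2}} = 2\beta = 84.270\ \text{s}$$

All three implied means lie close to the sample mean of 84.44 seconds of data sample 1, so the candidates differ mainly in the shape of the distribution rather than in its mean.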

5. Results of Prediction

Using the parameters obtained in section 4, some prediction results are depicted in Figures 4 to 8. The total call number of data set 2 was used as a known parameter for the three candidate pdfs. Each prediction result was compared with the actual value of data sample 2. From Figures 4 to 7, it can be seen that the lognormal pdf with m=49.133 and σ=1.0041 predicts the call number and traffic distributions very well. The predictions using the lognormal pdf are more precise than those using the exponential or Erlang pdf. Figure 8 depicts the relationships between call number, traffic and call duration of this GSM system. It shows that a large proportion of calls with short call duration occupy a small proportion of traffic. For example, it can be read from Figure 8 that 78% of all calls last less than 100 s, yet they account for only 40% of the traffic.
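The relationship between call share and traffic share can be sketched from the fitted lognormal pdf alone. The following Python example, assuming the Table 3 parameters and the same 1 s grid on [3, 1799], weights each duration by q(x) for the call share and by x·q(x) for the traffic share. The function name `shares_below` is hypothetical, and the paper's own figures were produced from the measured data, so the output is only a model-based counterpart to the values read from Figure 8.

```python
import math

M, SIGMA = 49.133, 1.0041  # lognormal parameters from Table 3


def lognormal_pdf(x, m=M, sigma=SIGMA):
    """Equation (6)."""
    z = (math.log(x) - math.log(m)) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * x)


def shares_below(threshold, lo=3, hi=1799):
    """Fraction of calls and fraction of traffic from calls shorter than
    `threshold` seconds, using a 1 s grid on [lo, hi]."""
    xs = range(lo, hi + 1)
    weights = [lognormal_pdf(x) for x in xs]
    total_calls = sum(weights)
    total_traffic = sum(x * w for x, w in zip(xs, weights))
    calls = sum(w for x, w in zip(xs, weights) if x < threshold)
    traffic = sum(x * w for x, w in zip(xs, weights) if x < threshold)
    return calls / total_calls, traffic / total_traffic


call_share, traffic_share = shares_below(100)
print(f"calls under 100 s: {call_share:.1%} of calls, "
      f"{traffic_share:.1%} of traffic")
```

Multiplying the call share by the known total call number of a data set gives a predicted cumulative call count, which is the kind of quantity compared in Figures 4 to 7.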

6. Conclusions

From the calculation of the K-L divergence in section 4 and the prediction results in section 5, some conclusions can be drawn. First, the K-L divergence is a simple but effective measure of the goodness of fit of a pdf. Second, using the K-L divergence method, the parameters of a candidate probability density function can be estimated precisely. Third, the lognormal pdf is more precise than the exponential or Erlang pdf in the performance prediction of the GSM system.

Figure 1. Traffic density distribution on JAN.25, 2007

Figure 2. Average weekday traffic density distribution from DEC.1, 2006 TO JAN.31, 2007

Figure 3. Call duration probability distributions comparison

Figure 4. Predictions of call number distribution vs. call duration

Figure 5. Predictions of cumulated call number vs. call duration

Figure 6. Predictions of traffic distribution vs. call duration

Figure 7. Predictions of cumulated traffic vs. call duration

Figure 8. The relationships between call number, traffic and call duration

References

[1] G. Boggia, P. Camarda, A. D'Alconzo, A. De Biasi and M. Siviero, "Drop Call Probability in Established Cellular Networks: from data Analysis to Modelling", Proc. of IEEE VTC 2005, spring, May 2005.

[2] J. Jordan and F. Barcelo, "Statistical modeling of channel occupancy in trunked PAMR systems", Proc. 15th Int. Teletraffic Conf., V. Ramaswami and P. E. Wirth, Eds., Elsevier Science B.V., 1997, pp. 1169–1178.

[3] L. Eckhard, A.S. Werner, A. Markus, "Log-normal Distributions across the Sciences: Keys and Clues", BioScience, Vol. 51, No. 5, May 2001, pp. 341–351.

[4] D.E. Duffy, A.A. McIntosh, M. Rosenstein, W. Willinger, "Statistical Analysis of CCSN/SS7 Traffic Data from Working CCS Subnetworks", IEEE Journal on Selected Areas in Comm., Vol. 12, April 1994, pp. 544-551.

[5] V. Bolotin, "Modeling call holding time distributions for CCS network design and performance analysis", IEEE J. Sel. Areas in Commun., Vol. 12, No. 3, Apr. 1994, pp. 433-438.

[6] S. Kullback, Information Theory and Statistics, Wiley, New York, 1959, pp. 37.

[7] I. Csiszár, "information theoretic methods in probability and statistics", IEEE Information Theory Society Newsletter 48, 1998, pp. 21-30.

[8] S.A Laddin, G.Çigdem, U. Ilhan, M.K. Yeliz, "Determining Probability Distribution by Minimum Cross Entropy", Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization, Lisbon, Portugal, September 22-24, 2006, pp. 644-647.

[9] M. Srikanth, H.K. Kesavan, H.R. Peter, "Probability Density Function Estimation Using the MinMax Measure", IEEE transactions on systems, man, and cybernetics, Vol. 30, No. 1, February 2000, pp. 1-7.

[10] V. Bolotin, "Telephone Circuit Holding Time Distributions", Proc. of 14th ITC, Elsevier Science B.V., 1994, pp. 125-134.

[11] E.T. Jaynes, "Information Theory and Statistical Mechanics", Reprinted from the Physical Review, Vol. 106, May 15, 1957, pp. 620-630.

[12] S. Kullback and R. A. Leibler, "On Information and sufficiency", Annals of Mathematical Statistics, Vol. 22, No. 1 (Mar. 1951), pp. 79-86.

[13] T. Cover and J. Thomas, Elements of Information Theory, Wiley, New York, 1991.
