
A Novel Approach to Parameter Estimation in Markov-modulated Poisson Processes
Larry N. Singh, G. R. Dattatreya Department of Computer Science The University of Texas at Dallas Richardson, Texas 75083-0688 Phone: (972) 883-2189 Fax: (972) 883-2349 Email: {lns,datta}@utdallas.edu

Abstract— The Markov-modulated Poisson Process (MMPP) is employed as a network traffic model in some applications. In this model, if traffic from each node is a Poisson process, the final sequence of service requirements is a hyperexponential renewal process, a special case of the MMPP. This paper solves the estimation of the parameters of such an MMPP as follows. Two novel algorithms for estimating the parameters of the hyperexponential density are formulated. In the first algorithm, equations are developed for the M mixing probabilities in terms of the component means of the hyperexponential density, reducing the number of unknown parameters from 2M to M. An objective function is constructed as a function of the unknown component means and an estimate of the cumulative distribution function (cdf) of the hyperexponential density. The component means are obtained by minimizing this objective function using quasi-Newton techniques; the mixing probabilities are then computed from these known means using linear least squares analysis. In the second algorithm, an objective function of the unknown component means, the mixing probabilities, and an estimate of the cdf is constructed, and all 2M parameters are computed by minimizing this objective function using quasi-Newton techniques. The merits of each algorithm are discussed. The algorithms developed are computationally efficient and easily implemented. Simulation results presented here demonstrate that both algorithms work well in practical situations.

I. INTRODUCTION

The Markov-modulated Poisson Process (MMPP) is a doubly stochastic Poisson process in which the current rate is determined, or modulated, by a continuous-time Markov chain. This process is a special case of the Markovian Arrival Process, or MAP (Trivedi [1]). There are a number of reasons for the growing popularity of the MMPP in certain Computer Science applications, particularly in the area of computer networks. In recent years, the volume of traffic on large-scale networks and across the Internet has increased tremendously, necessitating serviceable statistical models for traffic flow and analysis. Statistical models are required for the design of underlying IP-based transport layer protocols, efficient data gathering and evaluation, and statistical analysis of the corresponding random processes and variables (Markovitch and Krieger [2]). The work of Leland et al. [3] demonstrates that classical Poisson-based methods are insufficient for large-scale networks. Typical Poisson methods predict early fluctuations in network traffic and hinge on the assumption that these anomalies smooth out over a long period of time. In reality, these fluctuations occur over a wide range of time scales, generating high variability and self-similar behaviour. Self-similar processes are structurally similar over many different time scales, a phenomenon that leads to long-range dependence in network traffic. Consequently, an alternative to classical Poisson methods is sought. To this end, long-tailed distributions have shown much success. A distribution is long-tailed (also referred to as heavy-tailed) if its ccdf decays more slowly than exponentially, i.e., if

\lim_{x \to \infty} e^{\alpha x} F^c(x) = \infty, \qquad (1)

for all α > 0 (Feldmann and Whitt [4]). The complementary cumulative distribution function (ccdf) is defined as the complement of the cumulative distribution function (cdf),

F^c(x) = 1 - F(x), \qquad (2)

where F(x) is the cdf. Conversely, a distribution is short-tailed if its ccdf decays exponentially, i.e., if

\lim_{x \to \infty} e^{\alpha x} F^c(x) = 0, \qquad (3)

for some α > 0. Of note is a special case of long-tailed distributions, the power-tail distribution. A distribution is said to be power-tailed if

F^c(x) \sim \alpha x^{-\beta} \quad \text{as } x \to \infty, \qquad (4)

where α and β are positive constants and the operator ∼ is defined such that f(x) ∼ g(x) implies

\lim_{x \to \infty} \frac{f(x)}{g(x)} = 1. \qquad (5)
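As a quick numerical illustration of definitions (1), (3) and (4) (ours, not part of the original development), the product e^{αx} F^c(x) can be tabulated for a power-tail ccdf and an exponential ccdf; it eventually blows up in the first case and vanishes in the second. The parameter choices below are arbitrary.

```python
import numpy as np

alpha = 0.1                                # any fixed alpha > 0
x = np.array([1.0, 10.0, 100.0, 1000.0])

pareto_ccdf = x ** -2.0    # Pareto-type tail: F^c(x) = x^(-2) for x >= 1
exp_ccdf = np.exp(-x)      # exponential tail: F^c(x) = exp(-x)

print(np.exp(alpha * x) * pareto_ccdf)   # diverges: long-tailed, eq. (1)
print(np.exp(alpha * x) * exp_ccdf)      # decays to 0: short-tailed, eq. (3)
```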

Two common examples of long-tailed distributions that are used widely in network performance analysis are the Pareto and Weibull distributions. The Pareto distribution is also a power-tail distribution; the Weibull distribution is not. Recent studies have demonstrated that long-tailed distributions model many network characteristics well. For example, long-tailed distributions have been valuable in modeling World Wide Web (WWW) traffic, since such traffic often originates from heterogeneous sources [2]. Long-tailed distributions have also been shown to be successful in modeling file transfer protocol (FTP) connections and intervals between connection requests [4].

As efficacious as the long-tailed distribution has been in capturing the statistical characteristics of large-scale networks, the model has some deficiencies. Most notably, long-tailed distributions are generally very difficult to analyze. For example, analyzing the performance figures of the basic M/G/1 queue becomes quite involved if the service-time distribution is Pareto. Furthermore, unlike those of many short-tailed distributions, the Laplace transforms of long-tailed distributions are quite complex; Laplace transforms are generally useful for analyzing distributions by numerical transform inversion. Standard non-parametric estimators such as the histogram, projections and kernel estimators are not suitable for long-tailed distributions, mostly because long-tailed distributions do not have compact support, whereas short-tailed distributions do. It is well known that kernel estimators suffer from spurious noise appearing in the tail of the estimate (Silverman [5]). Markovitch and Krieger [2] have explored some non-parametric procedures for estimating and approximating long-tailed distributions, including transformation functions that map a long-tailed density into a pdf with compact support, and polygrams (histograms with variable bin widths).

Hyperexponential densities have exhibited much success in approximating long-tailed distributions and in constructing network performance models ([4], [2]). Moreover, analysis of hyperexponential densities is tractable, and their Laplace transforms are simple expressions. The MMPP is a relatively simple model and is able to approximate network traffic activity well without the accompanying analysis becoming intractable (Fischer and Meier-Hellstern [6]). In addition, the MMPP models arrival streams and bursty traffic more precisely than other models (Bolch et al. [7]). As a consequence, the MMPP has been widely used to model network traffic and in queuing theory. For instance, Heffes [8] demonstrates how the MMPP may be utilized to model the statistical multiplexing of a finite number of voice sources and data. Muscariello et al. [9] show how the MMPP approximates the long-range dependence (LRD) characteristics of Internet traffic traces. Scott [10] discusses the application of the MMPP to web traffic modeling.

Consider also the following type of sensor network. Each sensor transmits a sequence of bursts of data to a central server for processing. Each burst may consist of several packets of requests, and bursts from multiple nodes are merged, resulting in a single sequence of packets. An appropriate model for the sequence of sensor node identifications (IDs) of the packets in the merged sequence is a Markov chain. The service requirement of a packet from each node is exponentially distributed, with different nodes having different means, and the sequence of service requirements results in an MMPP. If traffic from each node is a Poisson process, the final sequence of service requirements is a hyperexponential renewal process, a special case of an MMPP. Applications of the MMPP are not limited to computer networks and may also be found in environmental, medical, industrial and sociological research [10]. Indeed, Onof et al. [11] utilize the MMPP to study rainfall patterns. A further exposition of the MMPP is found in [6].

A. Definition of the MMPP

The MMPP has a finite number, M, of states and operates as a Poisson process with state-dependent rate λi, where 1 ≤ i ≤ M. Define qij to be the transition rate from state i to state j, and Q to be the M × M matrix with element qij at row i and column j. The steady-state probabilities are defined as π = [π1, . . . , πM]^T and satisfy the matrix equation

\pi^T Q = \mathbf{0}^T. \qquad (6)

From this definition of the model, it is clear that the output of the MMPP may be modeled as the outcome of a mixture of exponential random variables with probability density function (pdf)

f(x) = \sum_{i=1}^{M} \pi_i \lambda_i e^{-\lambda_i x}. \qquad (7)

This mixture distribution is commonly referred to as the hyperexponential distribution.
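To make the mixture concrete, the sketch below (ours, not from the paper) evaluates the pdf of equation (7) and draws iid samples by first selecting a component with probabilities π and then drawing an exponential variate with the selected rate; the parameter values are the ones used later in the simulation section.

```python
import numpy as np

rng = np.random.default_rng(0)

def hyperexp_pdf(x, pi, lam):
    """Hyperexponential pdf of eq. (7): f(x) = sum_i pi_i lam_i exp(-lam_i x)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    return np.sum(pi * lam * np.exp(-lam * x), axis=1)

def hyperexp_sample(n, pi, lam):
    """n iid samples: choose a component by pi, then draw an Exp(lam_i) variate."""
    comps = rng.choice(len(pi), size=n, p=pi)
    return rng.exponential(scale=1.0 / lam[comps])

pi = np.array([0.28, 0.14, 0.38, 0.20])    # mixing probabilities, sum to 1
lam = np.array([1.0, 2.0, 3.0, 4.0])       # component rates
data = hyperexp_sample(1000, pi, lam)
```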

Given N independent, identically distributed (iid) samples of data, the problem dealt with here is to estimate the parameters λ = [λ1, . . . , λM]^T and π of the hyperexponential density. Thus, all the parameters of the corresponding MMPP, except for the transition rates Q, may be determined from the hyperexponential density.

In much of the literature, it is assumed that the parameters of the MMPP model are known, so treatment of the parameter estimation problem is sparse [10]. However, some algorithms have been presented that tackle this estimation task, the majority of them based on maximum-likelihood (ML) techniques. Scott [10] demonstrates a procedure wherein an MMPP may be expressed as a hidden Markov model (HMM) and the corresponding parameters estimated using either the expectation-maximization (EM) algorithm or a rapidly mixing Markov chain Monte Carlo (MCMC) algorithm. Meier-Hellstern [12] presents a technique for estimating the parameters using the EM algorithm in the special case where the number of states M is two. Techniques that make use of the EM algorithm inherently suffer from its well-known problems: the EM algorithm has a slow rate of convergence and may converge to values on the boundary of the parameter space.

In this paper, we present two computationally efficient, tractable and easily implemented algorithms for estimating the parameters of a hyperexponential distribution. These techniques may be extended to estimate the transition rates Q as well. The first algorithm, hereafter referred to as Algorithm 1, is a two-step procedure. The first step estimates the component means λ of the hyperexponential distribution; the approach is to develop equations that express π in terms of λ, and these expressions are then substituted into an objective function that is a function of λ and an estimate for the cdf of the hyperexponential distribution.


Minimizing this objective function yields the required estimates of λ. The second step estimates the steady-state probabilities π, given the now-known component means, by making use of linear least squares analysis. To our knowledge, there are no other similar techniques for parameter estimation of hyperexponential distributions. The second algorithm, Algorithm 2, estimates π and λ in one phase by minimizing an objective function of π, λ and an estimate for the cdf of the hyperexponential distribution. The relative merits of each algorithm are also compared and discussed. Once the parameters of the hyperexponential density are obtained, the algorithm developed by Dattatreya [13] may be employed to compute the transition probabilities of the MMPP.

B. Organization of paper

Algorithm 1 is developed in section II, along with the required supporting expressions for the associated objective function. Likewise, the corresponding objective function and expressions for Algorithm 2 are constructed in section III. Simulation results and analysis are described in section IV. Finally, section V concludes the paper.

II. ALGORITHM 1

Most ML techniques for computing the parameters of hyperexponential densities intrinsically require estimation of 2M parameters. Likewise, any optimization procedure that uses or evaluates the hyperexponential density directly also involves estimation of 2M parameters. In this section, equations expressing π in terms of λ and of functions of the samples of data are derived, thus reducing the number of unknown parameters to M. Making use of these equations, an objective function is formulated, and the values of λ are calculated by minimizing it. It is assumed that each mixing proportion is strictly positive, ensuring that every component plays a role in influencing the resulting pdf. Algorithm 1 essentially reduces the number of unknown parameters by incorporating additional information from the samples of data.

A. Expressions for π

The Laplace transform of the hyperexponential density, evaluated at M distinct points, gives tractable equations connecting the transforms and the unknown parameters. Define α = [α1, . . . , αM]^T, where each αi is a distinct, real, positive value. The Laplace transform of the pdf in equation (7), evaluated at the point 1/αi and scaled by a factor of 1/αi, is
\mathrm{E}\!\left[\frac{1}{\alpha_i} e^{-X/\alpha_i}\right] = \sum_{j=1}^{M} \pi_j \frac{1}{\frac{1}{\lambda_j} + \alpha_i}. \qquad (8)

Let A be the M × M matrix with element 1/((1/λj) + αi) at row i and column j, and let a = [a1, . . . , aM]^T be the vector of expectations, where ai = E[(1/αi) e^{-X/αi}]. In matrix notation, equation (8) is

A\pi = a, \qquad (9)

and hence

\pi = A^{-1} a, \qquad (10)
provided that A is nonsingular. Matrix A is a Cauchy matrix and hence has certain nice properties (Boras [14]). For instance, the inverse of a Cauchy matrix can be represented by an explicit expression (Knuth [15]); thus B = A^{-1} is an M × M matrix with element

b_{ij} = \frac{\prod_{k=1}^{M}\left(\frac{1}{\lambda_j} + \alpha_k\right)\left(\frac{1}{\lambda_k} + \alpha_i\right)}{\left(\frac{1}{\lambda_j} + \alpha_i\right)\prod_{k \neq i}(\alpha_i - \alpha_k)\prod_{k \neq j}\left(\frac{1}{\lambda_j} - \frac{1}{\lambda_k}\right)}, \qquad (11)

at row i and column j, where 1 ≤ k ≤ M, the λi are distinct and nonzero, and the αi are distinct. Each λi is distinct and nonzero by the assumption that the mixing proportions of all components are non-zero, and each αi is distinct by construction. Hence, A is invertible. From equations (10) and (11), the steady-state probabilities may be expressed as

\pi_i(\lambda) = \sum_{j=1}^{M} b_{ij} a_j. \qquad (12)

This gives a means of computing π whenever the component means and the expectations in equation (8) are provided.
B. Determination of λ

The expressions for the steady-state probabilities developed in the previous subsection afford a means of reducing the number of unknown variables to just M. In this section, an algorithm to obtain these M component means is formulated by fitting a candidate cdf to the given exact cdf. The cdf of a hyperexponential density is

F(x) = 1 - \sum_{k=1}^{M} \pi_k e^{-\lambda_k x}. \qquad (13)

Let λ̃ = [λ̃1, . . . , λ̃M]^T and π̃(λ̃) = [π̃1(λ̃), . . . , π̃M(λ̃)]^T be the current approximations of λ and π, respectively, in an iterative sequence of approximations for fitting the cdfs, and define â to be the estimate of a from the samples of data. Hence, an approximate or candidate cdf is given as

\tilde{F}(x, \tilde{\lambda}) = 1 - \sum_{k=1}^{M} \tilde{\pi}_k(\tilde{\lambda}) e^{-\tilde{\lambda}_k x}. \qquad (14)

Notice that the only unknown in this candidate cdf is λ̃. The error of the fit of this candidate cdf at a single point x is defined to be

\left(F(x) - \tilde{F}(x, \tilde{\lambda})\right)^2. \qquad (15)

This result can be extended over the entire domain of x, i.e., the set of all nonnegative real numbers R+, to give the total error, by integrating equation (15) over all x ∈ R+. Unfortunately, this integral does not furnish a simple, tractable expression for the total error; in practice, the integral would have to be evaluated numerically, which is a somewhat expensive task. For practical purposes, however, the entire domain R+ need not be considered.


A viable approximation of the total error may be obtained by computing the error at a finite number of points, m, over the region of interest and summing the errors:

d(\tilde{\lambda}) = \sum_{k=1}^{m} \left(F(x_k) - \tilde{F}(x_k, \tilde{\lambda})\right)^2. \qquad (16)

Observe that d(λ̃) ≥ 0 for all λ̃i > 0, 1 ≤ i ≤ M. Therefore, d(λ̃) is bounded from below and has a global minimizer; moreover, this minimum is known to be zero, and for an ideal candidate cdf, d(λ̃) = 0. Obviously, a better approximation of the total error is obtained if m is made large; however, a lower bound on m is also desirable. The family of hyperexponential densities is identifiable (Yakowitz and Spragins [16]), and hence, by [17], a minimum of M data points {x1, . . . , xM} is required in order for a set of parameters to uniquely determine the data. Therefore, in theory, it is necessary that m ≥ M.

The component means are obtained by minimizing (16) with respect to λ̃. It is worth noting that the objective function in question is not, in general, a convex function of λ̃; therefore, Newton methods for minimization cannot be applied directly. Nonetheless, the problem of finding λ̃ is posed as a constrained nonlinear optimization problem. The first set of constraints ensures that λ̃k > 0 for all 1 ≤ k ≤ M. The second set results from requiring that the π be valid, non-zero probabilities, i.e.,

\sum_{k=1}^{M} \tilde{\pi}_k(\tilde{\lambda}) = 1, \qquad (17)

0 < \tilde{\pi}_k(\tilde{\lambda}) < 1 \quad \text{for all } 1 \le k \le M. \qquad (18)

C. Estimation of λ from statistical data

In order to minimize the objective function developed above, several quantities need to be estimated from the samples of data; in this section, estimators for these quantities are devised. Given n samples of data {x1, . . . , xn} and the assumed distinct constants α, an estimator for ai is

\hat{a}_i = \frac{1}{n\alpha_i} \sum_{k=1}^{n} e^{-x_k/\alpha_i}. \qquad (19)
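The estimator of equation (19) is a one-line sample average. A minimal sketch, under our reading of equations (8) and (19), i.e. that ai = E[(1/αi) e^{-X/αi}]:

```python
import numpy as np

def estimate_a(data, alpha):
    """Estimate a_i = E[(1/alpha_i) exp(-X/alpha_i)] by the sample mean, eq. (19)."""
    data = np.asarray(data, dtype=float)
    # One row per alpha_i, one column per sample x_k; average over the samples.
    return np.exp(-data[None, :] / alpha[:, None]).mean(axis=1) / alpha
```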

The equations derived previously implicitly assume that the cdf F(x) is exact and available. Of course, this is not the case, and an estimate of the cdf must be obtained from the samples of data. The cdf is a good choice for function-fitting, and for estimation in general, in the case of the hyperexponential density, for a number of reasons. First, the cdf is a smooth, monotonically increasing function. Second, the cdf can be estimated easily and accurately from a finite number of samples of data. Third, given the chosen implementation of the cdf, a lookup of a value runs in O(log n) time, which is quite fast.

A piecewise continuous estimate of the cdf is obtained as follows. Sort the observations {x1, . . . , xn} to produce the values {y1, . . . , yn} such that y1 ≤ y2 ≤ . . . ≤ yn and {y1, . . . , yn} is a permutation of {x1, . . . , xn}. The estimate of the cdf is then defined as

\hat{F}(x) = \begin{cases} 0, & x < y_1, \\ \frac{i-1}{n-1}, & x = y_i, \\ \frac{i-1}{n-1} + \frac{x - y_i}{(y_{i+1} - y_i)(n-1)}, & y_i < x < y_{i+1}, \\ 1, & x > y_n. \end{cases} \qquad (20)

Define λ̂ as the estimate of λ and substitute equation (20) into (16), giving the new objective function

\hat{d}(\hat{\lambda}) = \sum_{k=1}^{m} \left(\hat{F}(x_k) - \tilde{F}(x_k, \hat{\lambda})\right)^2. \qquad (21)

Minimizing this objective function produces the estimates λ̂.
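Equation (20) is linear interpolation through the sorted sample points, which numpy provides directly, and equation (21) is a sum of squared cdf residuals. A sketch under our reading (helper names are ours; pi_of_lam stands for the mapping of equation (12)):

```python
import numpy as np

def empirical_cdf(samples):
    """Piecewise-linear cdf estimate of eq. (20) over the sorted samples y_i."""
    y = np.sort(np.asarray(samples, dtype=float))
    grid = np.arange(len(y)) / (len(y) - 1.0)   # (i-1)/(n-1) at y_i (i is 1-based)
    return lambda x: np.interp(x, y, grid)      # 0 left of y_1, 1 right of y_n

def objective(lam, pi_of_lam, F_hat, pts):
    """Objective of eq. (21): squared cdf error at the m points in pts."""
    pi = pi_of_lam(lam)                                  # eq. (12)
    F_cand = 1.0 - np.exp(-np.outer(pts, lam)) @ pi      # candidate cdf, eq. (14)
    return np.sum((F_hat(pts) - F_cand) ** 2)
```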

D. Estimation of π from statistical data

In the preceding section, a method was developed for determining estimates of λ given an estimate of the cdf. A naive approach to attaining values for π is to exploit equation (12). However, due to the nature of the hyperexponential density and the equations involved, the vector a cannot be estimated from data accurately enough for computing π: the values attained for π from equation (12) are very sensitive to the values derived for λ̃, so small inaccuracies in λ̃ translate into large errors in π. This is mainly due to the inability of â to estimate a with sufficient accuracy. In contrast, the estimate F̂(x) provides a very accurate representation of the exact cdf, which is yet another advantage of using the estimate of the cdf for function-fitting.

The cdf of the hyperexponential density is linear in π. In addition, the estimated cdf can be expressed in terms of the estimates λ̂ and the unknown parameters π as

\sum_{k=1}^{M} \pi_k e^{-\hat{\lambda}_k x} = 1 - \hat{F}(x), \qquad (22)

for all 0 < x < ∞. Let z = {z1, . . . , zS} be an arbitrary set of positive, real constants such that

\inf_{1 \le i \le S} z_i = \inf_{1 \le j \le M} x_j, \qquad (23)

\sup_{1 \le i \le S} z_i = \sup_{1 \le j \le M} x_j. \qquad (24)

Define F̂(z) = [F̂(z1), . . . , F̂(zS)]^T and π̂ = [π̂1, . . . , π̂M]^T, and let Ĉ be the S × M matrix with element Ĉij = e^{-λ̂j zi} at the ith row and jth column. From these definitions, equation (22) can be written in matrix notation as

\hat{C}\hat{\pi} = 1 - \hat{F}(z), \qquad (25)

leading to the following theorem.

Theorem 1: Equation (25) has a unique solution for the mixing proportions, given that the component means are known.

Proof: Equation (25) can be solved using linear least squares regression analysis, and the associated least squares objective is convex. Therefore, it has a global minimizer, and equation (25) has a unique solution for π.

From this theorem, π̂ is obtained by solving equation (25) using linear least squares regression analysis.
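Given λ̂, equation (25) is an ordinary linear least squares problem. The sketch below (ours) uses numpy's unconstrained solver; the paper's implementation uses constrained least squares (MATLAB's lsqlin), which would additionally keep the recovered proportions in (0, 1).

```python
import numpy as np

def estimate_pi(lam_hat, F_hat, z):
    """Solve C pi = 1 - F(z) of eq. (25) in the least squares sense."""
    C = np.exp(-np.outer(z, lam_hat))       # C_ij = exp(-lambda_j z_i), S x M
    rhs = 1.0 - F_hat(z)
    pi_hat, *_ = np.linalg.lstsq(C, rhs, rcond=None)
    return pi_hat
```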


E. Summary of Algorithm 1

The following presents a summary of Algorithm 1 for obtaining estimates of the parameters of the hyperexponential density, given n samples of data.
1) Choose values for α such that each αi is distinct and positive.
2) Obtain an initial estimate for λ̂.
3) Compute â using α and the samples of data.
4) Minimize d̂(λ̂) to obtain a new estimate for λ̂.
5) Using the new estimate of λ̂, compute Ĉ and, using linear least squares regression, obtain an estimate for π.

A sketch of these steps in code is given below.
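Assembled from the sketches above, a possible end-to-end driver for these five steps might look as follows. This is our illustration only: the paper's implementation uses MATLAB's fminunc and lsqlin, and the choices of α, of the m evaluation points, and of the initial range [0.5, 4.5] are assumptions taken from the simulation section.

```python
import numpy as np
from scipy.optimize import minimize

def algorithm1(data, M, seed=0):
    rng = np.random.default_rng(seed)
    alpha = np.linspace(0.5, 2.0, M)                  # step 1: distinct, positive
    lam0 = rng.uniform(0.5, 4.5, size=M)              # step 2: initial estimate
    a_hat = estimate_a(data, alpha)                   # step 3: eq. (19)
    F_hat = empirical_cdf(data)                       # eq. (20)
    pts = np.linspace(data.min(), data.max(), 4 * M)  # m >= M evaluation points
    pi_of_lam = lambda lam: mixing_probs_from_means(lam, alpha, a_hat)
    res = minimize(lambda lam: objective(lam, pi_of_lam, F_hat, pts),
                   lam0, method="BFGS")               # step 4: minimize eq. (21)
    lam_hat = res.x
    pi_hat = estimate_pi(lam_hat, F_hat, pts)         # step 5: eq. (25)
    return lam_hat, pi_hat
```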
[Fig. 1. Sample plot of the pdf generated from Algorithm 1, using 100 samples of data and after 10-20 iterations. The figure plots f(x) against x over [0, 4] and compares the actual pdf, the Algorithm 1/2 pdf, and the EM pdf.]

III. ALGORITHM 2

The second algorithm developed is similar to Algorithm 1 of the previous section. The essential difference is that in Algorithm 2 all 2M parameters are estimated through the developed objective function. In the following subsection, the required objective function is developed.

A. Development of the objective function

As in the previous section, an approximate cdf is constructed as follows:

\tilde{F}(x, \tilde{\lambda}, \tilde{\pi}) = 1 - \sum_{k=1}^{M} \tilde{\pi}_k e^{-\tilde{\lambda}_k x}. \qquad (26)

Notice that this cdf has three arguments as opposed to two, and that π̃ is no longer described as a function of λ̃. Following the procedure of the previous section, the error of the fit of this candidate function at a single point x is expressed as

\left(F(x) - \tilde{F}(x, \tilde{\lambda}, \tilde{\pi})\right)^2. \qquad (27)

Similarly, the approximate total error of the fit is denoted

d(\tilde{\lambda}, \tilde{\pi}) = \sum_{k=1}^{m} \left(F(x_k) - \tilde{F}(x_k, \tilde{\lambda}, \tilde{\pi})\right)^2, \qquad (28)

where m ≥ M. The function d(λ̃, π̃) has properties similar to those of d(λ̃): d(λ̃, π̃) ≥ 0 for all λ̃i > 0 and 1 ≤ i ≤ M. Hence, d(λ̃, π̃) is also bounded below and has a global minimizer. The required parameters are obtained by minimizing the objective function in equation (28). As in the previous section, this is a constrained nonlinear optimization problem, and the constraints are the same: the component means must be positive and the mixing proportions must be valid probabilities. The latter constraint can, however, be relaxed by introducing the softmax function (Bishop [18]). Let γ = [γ1, . . . , γM]^T be a vector of real constants, and define

\pi_i(\gamma) = \frac{e^{\gamma_i}}{\sum_{j=1}^{M} e^{\gamma_j}}, \qquad (29)

for all 1 ≤ i ≤ M. The number of free parameters can be further reduced to 2M - 1 by fixing one of the γi at an arbitrary constant; for instance, let γM = 0, and the values of the remaining γi are then translated appropriately. Using this definition of πi ensures that the constraints on the mixing proportions are met. Thus, the only remaining constraint is that λi > 0 for all 1 ≤ i ≤ M. Observe that the hyperexponential cdf is strictly monotonic, and thus the value of the objective function becomes large if any λi turns negative; the constraint λi > 0 is therefore implicitly enforced, and we have an unconstrained nonlinear optimization problem. Let γ̂ be the estimate of γ and introduce the new objective function

\hat{d}_u(\hat{\lambda}, \hat{\gamma}) = \sum_{k=1}^{m} \left(\hat{F}(x_k) - \tilde{F}(x_k, \hat{\lambda}, \pi(\hat{\gamma}))\right)^2. \qquad (30)
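In code, the softmax reparameterization of equation (29) and the objective of equation (30) might look as follows (our sketch; the helper names and the choice of evaluation points are assumptions, and empirical_cdf is the estimator sketched in section II). The objective is minimized with a quasi-Newton method, as discussed next.

```python
import numpy as np
from scipy.optimize import minimize

def softmax(gamma):
    """Eq. (29): map unconstrained gamma to valid mixing probabilities."""
    e = np.exp(gamma - gamma.max())          # shifted for numerical stability
    return e / e.sum()

def algorithm2(data, M, theta0):
    """Minimize d_u of eq. (30) over theta = (lambda, gamma), with gamma_M = 0."""
    F_hat = empirical_cdf(data)
    pts = np.linspace(data.min(), data.max(), 4 * M)  # m >= M points
    target = F_hat(pts)

    def d_u(theta):
        lam = theta[:M]
        gamma = np.append(theta[M:], 0.0)             # 2M - 1 free parameters
        F_cand = 1.0 - np.exp(-np.outer(pts, lam)) @ softmax(gamma)  # eq. (26)
        return np.sum((target - F_cand) ** 2)

    res = minimize(d_u, theta0, method="BFGS")        # quasi-Newton update
    return res.x[:M], softmax(np.append(res.x[M:], 0.0))

# e.g. theta0 = np.concatenate([initial_lambda_guess, np.zeros(M - 1)])
```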

It is worth noting that (30) is not a convex function of λ̂ and γ̂, so the Newton method for optimization cannot be applied. Instead, the minimization should be performed using either quasi-Newton methods with the BFGS update and inexact line searches, or the Levenberg-Marquardt method for nonlinear regression analysis [19]. For Algorithm 2, the only quantity that needs to be estimated from data is the cdf, and the approach for estimating the cdf is the same in both algorithms 1 and 2. In the next subsection, a summary of the algorithm is presented.

B. Summary of Algorithm 2

The following presents a summary of Algorithm 2. Estimates of the parameters of the hyperexponential density are obtained, given n samples of data.
1) Obtain initial estimates for λ̂ and π̂.
2) Minimize d̂u(λ̂, γ̂) to give new estimates for λ̂ and π̂.

IV. SUMMARY OF SIMULATION EXPERIMENTS AND RESULTS

The algorithms discussed in the previous sections were implemented and tested through simulation using different values of M, λ and π; a subset of the simulation trials is discussed here. Both algorithms were tested on a four-component hyperexponential density using synthetically generated iid mixture samples. In addition, the results of both algorithms were compared to a basic implementation of the EM algorithm.


Evaluation of the objective function d̂(λ̂) was performed by computing the approximate total error at equally spaced points over a meaningful domain. In Algorithm 1, the first phase of estimating λ̂ is accomplished through quasi-Newton methods, with the BFGS method for the Hessian update and a safeguarded mixed quadratic and cubic polynomial interpolation and extrapolation method for the line searches; the implementation chosen is the fminunc MATLAB function [20]. For the second phase of Algorithm 1, constrained linear regression was performed using SQP methods, specifically the lsqlin MATLAB function.

Sample results of the three algorithms are given in figure 1. The pdfs generated by Algorithms 1 and 2 were very similar, so only one plot is shown; the figure compares the generated pdf of the corresponding algorithm to the actual pdf. Synthetic data with the following characteristics was generated and used for each of the algorithms: λ = [1.0, 2.0, 3.0, 4.0]^T and π = [0.28, 0.14, 0.38, 0.2]^T. For each algorithm, starting points for λ̂ were chosen randomly in the range [0.5, 4.5]. Algorithm 1 was executed using 100 data samples and terminated after 10-20 iterations in all the simulation runs. Algorithm 2 was executed using 1000 data samples and terminated after a maximum of 100 iterations. The EM algorithm was run on 1000 data samples and terminated after 250 iterations.

From the simulation results, it is evident that Algorithms 1 and 2 produce results of similar quality. Our results for the basic EM algorithm are not very accurate unless the starting point is quite close to the solution. Algorithm 1 appears to be superior to Algorithm 2, since it uses much less data (10% of that used by Algorithm 2) and terminates in far fewer iterations (10%-20% of the iterations required by Algorithm 2). Since Algorithm 1 has M unknown parameters, versus 2M parameters in Algorithm 2, the objective function of Algorithm 2 is expected to have more minima; this explains why more data samples are required in Algorithm 2 for a sufficiently accurate solution. Realize also that the computations for Algorithm 1 are somewhat more complex than those of Algorithm 2, and that Algorithm 1 has two distinct steps as opposed to one, so the iterations of Algorithm 1 are slightly more expensive in terms of computation time than those of Algorithm 2. There is no simulation evidence to suggest that there are operating regions in which one algorithm is superior to the other, given the current approaches taken for optimization. The general observation is that Algorithm 1 is better than Algorithm 2 in terms of computational expense and accuracy of solution, given that all input parameters to both algorithms are identical.

V. CONCLUSION

The major contributions presented here are algorithms for computing the parameters of a hyperexponential density. Our algorithms are easily implemented yet computationally very efficient. There are numerous potential applications of these algorithms, particularly in the areas of network traffic modeling and queuing theory.

In addition, evidence is presented to suggest that the algorithms presented are superior to the EM algorithm in terms of robustness and computation speed. Algorithm 1 appears to be superior to Algorithm 2 in terms of computation speed and the number of data samples required, whereas Algorithm 2 is conceptually somewhat simpler to implement and execute. Nevertheless, both algorithms are efficient and easily implemented using standard tools of numerical optimization. In addition, simulation results indicate that both algorithms give remarkably accurate estimates of the hyperexponential pdf.

There are possible areas for improvement of the algorithms presented here under certain conditions. The use of simulated annealing may overcome the need for initial estimates of the component means, and may also allow a global minimum of the objective function to be obtained. Boras [14] demonstrates techniques for improving the numerical precision of computing the inverse of a Cauchy matrix; these techniques may improve the accuracy and quality of the estimates.

REFERENCES
[1] K. S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, John Wiley and Sons, NY, USA, 2002.
[2] N. M. Markovitch and U. R. Krieger, "Nonparametric estimation of long-tailed density functions and its application to the analysis of the World Wide Web traffic," Performance Evaluation, Vol. 42, Iss. 2-3, pp. 205-222, 2000.
[3] W. E. Leland, M. S. Taqqu, W. Willinger and V. Wilson, "On the Self-Similar Nature of Ethernet Traffic (Extended Version)," IEEE/ACM Trans. on Networking, Vol. 2, No. 1, pp. 1-15, Jan. 1994.
[4] A. Feldmann and W. Whitt, "Fitting Mixtures of Exponentials to Long-Tail Distributions to Analyze Network Performance Models," Performance Evaluation, Vol. 31, Iss. 3-4, pp. 245-279, Jan. 1998.
[5] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall, Ltd., 1986.
[6] W. Fischer and K. S. Meier-Hellstern, "The Markov-modulated Poisson process (MMPP) cookbook," Performance Evaluation, No. 18, pp. 149-171, 1992.
[7] G. Bolch, S. Greiner, H. de Meer and K. S. Trivedi, Queuing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications, John Wiley and Sons, NY, USA, 1998.
[8] H. Heffes and D. M. Lucantoni, "A Markov Modulated Characterization of Packetized Voice and Data Traffic and Related Statistical Multiplexer Performance," IEEE Journal Sel. Areas Commun., Vol. SAC-4, No. 6, Sept. 1986.
[9] L. Muscariello, M. Mellia, M. Meo, M. Marsan and R. Lo Cigno, "An MMPP-Based Hierarchical Model of Internet Traffic," ICC 2004.
[10] S. L. Scott and P. Smyth, "The Markov Modulated Poisson Process and Markov Poisson Cascade with Applications to Web Traffic Modeling," Bayesian Statistics 7, 2003.
[11] C. Onof, B. Yameundjeu, J. P. Paoli and N. Ramesh, "A Markov modulated Poisson process model for rainfall increments," Water Science and Technology, Vol. 45, No. 2, pp. 91-97, 2002.
[12] K. S. Meier-Hellstern, "A fitting algorithm for Markov-modulated Poisson processes," Euro. Jour. Oper. Res., No. 29, pp. 370-377, 1987.
[13] G. R. Dattatreya, "Estimation of prior and transition probabilities in multi-class finite Markov mixtures," IEEE Trans. on Sys., Man, and Cyber., Vol. 21, Iss. 2, pp. 418-426, Mar. 1991.
[14] T. Boras, "Studies in Displacement Structure Theory," Ph.D. Dissertation, Stanford University, CA, USA, 1996.
[15] D. E. Knuth, Fundamental Algorithms: The Art of Computer Programming, Vol. 1, Second Edition, Addison-Wesley, MA, USA, 1973.
[16] S. J. Yakowitz and J. D. Spragins, "On the Identifiability of Finite Mixtures," Ann. Math. Stat., Vol. 39, No. 1, pp. 209-214, 1968.
[17] H. Teicher, "Identifiability of Finite Mixtures," Ann. Math. Stat., Vol. 34, No. 4, pp. 1265-1269, Dec. 1963.
[18] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, NY, USA, 1995.
[19] J. Nocedal and S. J. Wright, Numerical Optimization, Springer-Verlag, NY, USA, 1999.
[20] MATLAB Software, http://www.mathworks.com