
Properties, Parameter Estimators and Some Applications

Rev. 3.1

Doug Hollingshead

doug_hollingshead@yahoo.com

November 2009

Abstract

This note is intended to provide some statistical concepts and results related to the Maxwell

(or Maxwell-Boltzmann) distribution. This distribution can be derived from the three degree-

of-freedom Chi-squared distribution. The Maxwell distribution is the three dimensional

counterpart of the Rayleigh distribution when interpreted as sums of physical quantities.

Many standard probability texts discuss statistical properties of the Rayleigh distribution. To

date, the author has not found an expanded discussion of the Maxwell distribution properties,

or formulas pertaining to parameter estimation.

This note is primarily aimed at deriving relevant statistical properties of the one-parameter

Maxwell distribution and presenting formulae for various point estimators for this parameter.

Maximum likelihood, method of moments and quantile point estimation procedures are

discussed. Cramer-Rao lower bound relative efficiency comparison of the three estimators is

included. The discussion of methods proceeds from general concepts to specifics related to

the Maxwell distribution. A summary of important formulae is provided. Appendix A

provides discussion of Bayesian estimation for the Maxwell distribution parameter. Two-

parameter Maxwell distributions are discussed in Appendix B. Additional information

pertaining to the Maxwell distribution statistics is provided in the remaining appendices.

An attempt is made to identify some situations where knowledge of the Maxwell distribution

and its estimators is helpful. These include three dimensional miss distance, and detection of

mismatch between the assumed and actual covariance matrix associated with an alpha-beta

filter. The Mahalanobis distance for tri-variate normally distributed variables is introduced as

the primary analytical tool for this application.

DRAFT

Density Function:
f(y) = \sqrt{\frac{2}{\pi}}\,\frac{y^2}{a^3}\exp\left(\frac{-y^2}{2a^2}\right), \qquad 0 \le y < \infty, \; a > 0

CDF:
F(y) = \mathrm{erf}\left(\frac{y}{\sqrt{2}\,a}\right) - \sqrt{\frac{2}{\pi}}\,\frac{y}{a}\exp\left(\frac{-y^2}{2a^2}\right)

Mean, Mode and Variance:
\mu_1' = \sqrt{\frac{8}{\pi}}\,a \qquad y_{mo} = \sqrt{2}\,a \qquad \sigma_Y^2 = \frac{3\pi - 8}{\pi}\,a^2

Minimum Variance Bound:
\mathrm{CRLB} = \frac{a^2}{6n}

MLE estimator:
\hat{a}_{MLE} = \sqrt{\frac{1}{3n}\sum_{i=1}^{n} y_i^2} \qquad MSE(\hat{a}_{MLE}) = \mathrm{CRLB}

MOM estimator:
\hat{a}_{MOM} = \sqrt{\frac{\pi}{8}}\,\frac{1}{n}\sum_{i=1}^{n} y_i

Quantile Estimator:
\hat{a}_{QUAN} = 0.49344\,Y_{3/4} \;(Y_{3/4} = \text{upper quartile}) \qquad MSE(\hat{a}_{QUAN}) = \mathrm{CRLB}/0.6450

f(y; a, b) = \sqrt{\frac{2}{\pi}}\,\frac{(y-b)^2}{a^3}\exp\left(\frac{-(y-b)^2}{2a^2}\right), \qquad b \le y < \infty, \; a > 0, \; -\infty < b < \infty

F(y; a, b) = \mathrm{erf}\left(\frac{y-b}{\sqrt{2}\,a}\right) - \sqrt{\frac{2}{\pi}}\,\frac{y-b}{a}\exp\left(\frac{-(y-b)^2}{2a^2}\right)

\hat{a}_{MOM} = 1.4849\sqrt{\frac{1}{n}\sum_{i=1}^{n} y_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)^2} \qquad MSE(\hat{a}) = 0.5270\,\frac{a^2}{n}

\hat{b}_{MOM} = \frac{1}{n}\sum_{i=1}^{n} y_i - 2.3695\sqrt{\frac{1}{n}\sum_{i=1}^{n} y_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)^2} \qquad MSE(\hat{b}) = 1.2737\,\frac{a^2}{n}

List of Appendices


References


Properties, Parameter Estimators and Some Applications

The Maxwell (or more correctly, the Maxwell-Boltzmann) distribution was originally derived

using statistical mechanics, to explain the most probable energy distribution of molecules in a

system. The kinetic theory of gases rests on this distribution. Later developments in

probability theory showed that this distribution can be derived from the more general Chi-

squared distribution, which in turn is a special case of the Gamma distribution.

The Maxwell distribution is related to the Chi-squared distribution, with three degrees of

freedom. This distribution is essentially the “3-dimensional” equivalent of the Rayleigh

distribution. The Chi-Squared distribution expresses the probability distribution of the sum of

the squares of random variables drawn from a Normal distribution. The Rayleigh and

Maxwell distributions define a random variable in terms of the root-mean-square (RMS) of

these sums. It is important to note that the underlying variables are normally distributed and share a common variance.

This paper is intended to provide some technical background on the properties of the Maxwell

distribution including distribution function, moments and other properties. A second goal is to

discuss estimation methods for the single parameter of this distribution as well as discuss

mean square error of the various statistical estimators. The statistical efficiency concept for

estimators is introduced. Calculation of efficiency relies on general expressions for the variance of the sample moments.

In this regard, three methods of parameter estimation will be discussed: method of moments

(MOM), method of quantiles, and maximum likelihood (MLE). The Cramer-Rao Lower

Bound (CRLB) for this distribution will be derived. The MLE estimator is shown to achieve

this lower bound.

Quantile estimators are generally very easy to calculate and are quite robust when

outliers are present in the data set. They are however less efficient than either MOM or MLE

estimators. A simple expression for the Maxwell minimum variance quantile estimator is

provided.

Bayesian estimators for the Maxwell distribution may be useful when using the Maxwell

distribution but are often limited by lack of knowledge of a suitable “a priori” distribution.

Bayesian estimation is discussed in Appendix A. Because of the structure of the likelihood

function for this distribution, a closed form solution exists only for special prior distributions.

The concept of sufficient statistics is introduced here. One such prior is discussed and the

Bayes’ estimate (i.e., mean) of the posterior distribution is provided. Comparison with MLE

estimators is made. An alternative “maximum a posteriori” or MAP (i.e., mode) estimator is also

mentioned.


The last goal of this paper is to indicate two potential applications of the Maxwell distribution

as applied to models or experiments encountered in missile design and analysis. The

application motivating this paper was the need to provide a simple metric for evaluation of the

output from a specific “alpha-beta” filter. The “capture probability” is defined as the

proportion of “weighted error magnitudes” that lie inside the volume contained within the

“2.5-sigma” covariance ellipsoid.

The capture probability involves calculation of the Mahalanobis distance ($d_M$) for each of the

data points. This distance will be introduced, and it will be shown that even when the various

error components are correlated, the Mahalanobis distance still has a Maxwell distribution.

In some cases, the Maxwell distribution may not have the lower bound at zero. The

distribution will then have a location parameter as well as the scale parameter. This two

parameter Maxwell distribution is discussed in Appendix B, along with derivation of

properties of method-of-moments parameter estimators.

Derivation of both MLE and MOM estimates depends on the variance of the raw moment

estimators. Appendix C contains additional derivations pertaining to the variance and

covariance of moment statistics. The approximate variance of functions of the raw moments is

also discussed where the Delta Method is introduced for this purpose.

Appendix D develops some necessary results for order statistics of a sample, including the

expression for the expected value of the quantiles. The asymptotic mean and variance for

order statistics themselves are also derived, and an application for the Maxwell distribution is

provided.

Appendix E derives the distribution for the sum of two Maxwell random variables. The sum

of N independent Maxwell random variables and the asymptotic normal distribution are also

discussed.

Appendix F addresses goodness-of-fit tests and tests for outliers. The appendix contains Monte Carlo derived tables for testing goodness

of fit via the Anderson-Darling statistic and using correlation statistics for the Maxwell

distribution. Tables are also provided for testing for k outliers in samples assumed to be

Maxwell distributed. The author is unaware of tables being published elsewhere for either of

these statistics from hypothesized Maxwell distributions.

Appendix G discusses and provides MATLAB code for development of Maxwell distribution

“probability paper.” These plots are useful for initial evaluation of data.

Appendix H provides a brief overview of Chi distributions for k > 3. Here, k is the number of

squared normal variables included in the root-mean-square sum. A generalized form of this

distribution is the Nakagami. Parameter estimation methods for these distributions are also

discussed.


Distribution Derivation

The sum of squares of “n” independent random variables, each with a standard normal

distribution, results in a random variable that has a Chi-squared distribution with “n” degrees-

of-freedom. References [1] or [2] contain details. We are currently interested in the sum of

three independent variables which are normally distributed with zero mean. The three

component variables have a common variance, and each variable is normalized by dividing

the square of each ( X i ) by variance (σ ) . The resulting Chi-squared random variable, W,

2

W 1 / 2 exp ( −W / 2) 3

X i2

f (W ) = W =∑

Γ(n / 2) ⋅ 23 / 2 1 σ2

The gamma function is expressed as an integral, but is constant for a fixed number of degrees

of freedom. For three degrees of freedom, Γ( 3 / 2 ) ⋅ 2 = π / 2 .

The variable of interest is the square root of W. Consider the transformation $y = a\sqrt{W}$, where

“a” is a scaling factor. It will be shown subsequently that this factor is the standard deviation

of the normal random variables. The distribution of variable y is required.

The inverse of the transformation is $W = y^2/a^2$. The support of W is the positive real axis

and as such, the support of y is also the positive real axis. Further, a one-to-one transformation

exists between y and W, so the distribution of y can be found from the standard procedure

(Reference [4]):

f_Y(y) = f_W\left(W(y)\right)\left|\frac{dW}{dy}\right| = f_W\left(\frac{y^2}{a^2}\right)\cdot\frac{2y}{a^2}

This distribution is in the general family of “Chi distributions.” The three degree of freedom

Chi distribution is the Maxwell density function. Substituting the transformed variable, and

the value for Γ(3 / 2 ) , provides the Maxwell probability density function:

f(y) = \sqrt{\frac{2}{\pi}}\,\frac{y^2}{a^3}\exp\left(\frac{-y^2}{2a^2}\right) \qquad \text{Equation (1)}

The Cumulative Distribution Function (CDF) is derived by direct integration of the density over the interval $[0, y]$, where $0 \le y < \infty$.

F(y) = P[Y \le y] = \mathrm{erf}\left(\frac{y}{\sqrt{2}\,a}\right) - \sqrt{\frac{2}{\pi}}\,\frac{y}{a}\exp\left(\frac{-y^2}{2a^2}\right) \qquad \text{Equation (2)}

The error function is a tabulated integral function defined as:

\mathrm{erf}(u) = \frac{2}{\sqrt{\pi}}\int_0^u \exp(-t^2)\,dt \qquad \mathrm{erf}(\infty) = 1, \quad \mathrm{erf}(0) = 0
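As a numerical sanity check on Equations 1 and 2, the density and CDF can be coded directly. The following sketch (Python is used here for illustration; the code accompanying this note is MATLAB) verifies that F(0) = 0, F(y) → 1 for large y, and that dF/dy reproduces f(y):

```python
import math

def maxwell_pdf(y, a):
    # Equation (1): f(y) = sqrt(2/pi) * y^2/a^3 * exp(-y^2/(2 a^2))
    return math.sqrt(2.0 / math.pi) * y * y / a**3 * math.exp(-y * y / (2.0 * a * a))

def maxwell_cdf(y, a):
    # Equation (2): F(y) = erf(y/(sqrt(2) a)) - sqrt(2/pi) * (y/a) * exp(-y^2/(2 a^2))
    return (math.erf(y / (math.sqrt(2.0) * a))
            - math.sqrt(2.0 / math.pi) * (y / a) * math.exp(-y * y / (2.0 * a * a)))

a = 1.5
assert abs(maxwell_cdf(0.0, a)) < 1e-12          # F(0) = 0
assert abs(maxwell_cdf(50.0, a) - 1.0) < 1e-9    # F(y) -> 1 for large y
h = 1e-6                                         # central difference: dF/dy should equal f(y)
deriv = (maxwell_cdf(2.0 + h, a) - maxwell_cdf(2.0 - h, a)) / (2.0 * h)
assert abs(deriv - maxwell_pdf(2.0, a)) < 1e-6
```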

The parameter “a” is a scale parameter for the Maxwell distribution. This is seen by writing Equation 1 in terms of the standardized variable $z = y/a$:


f_Z(z) = f_Y(z \cdot a)\left|\frac{dy}{dz}\right| = a \cdot f_Y(z\,a) = \sqrt{\frac{2}{\pi}}\,z^2\exp(-z^2/2)

Figure 1 shows the Maxwell distribution for various values of the scale parameter:

[Figure 1: Maxwell density f(y) versus y for scale parameter values a = 1, 2, 3 and 5.]

It is interesting to note that the distribution has an approximate “bell shape” which implies

this distribution is close in some sense to the normal distribution.


Calculation of the distribution moment (about the origin) can in theory be done using the

moment generating function (MGF). The MGF does exist for this distribution but is quite

complicated. Moments about the origin are more easily evaluated by direct integration. For

the mean (first moment):

\mu_1' = \int_0^\infty y \cdot \sqrt{\frac{2}{\pi}}\,\frac{y^2}{a^3}\exp\left(\frac{-y^2}{2a^2}\right)dy = \sqrt{\frac{2}{\pi}}\,\frac{1}{a^3}\left(2a^4\right) = \sqrt{\frac{8}{\pi}}\,a

\mu_2' = \int_0^\infty y^2 \cdot \sqrt{\frac{2}{\pi}}\,\frac{y^2}{a^3}\exp\left(\frac{-y^2}{2a^2}\right)dy = \sqrt{\frac{2}{\pi}}\,\frac{1}{a^3}\left(3a^5\sqrt{\frac{\pi}{2}}\right) = 3a^2

The third and fourth moments about the origin can be calculated similarly:

\mu_3' = 8\sqrt{\frac{2}{\pi}}\,a^3 \qquad \mu_4' = 15\,a^4

The variance follows from the first two moments:

\sigma_Y^2 = \mu_2 = \mu_2' - \left(\mu_1'\right)^2 = \frac{3\pi - 8}{\pi}\,a^2
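The moment formulas above can be checked by simple quadrature. The sketch below (Python for illustration; the integration limit and step count are arbitrary choices) integrates $y^r f(y)$ numerically and compares against $\mu_1' = \sqrt{8/\pi}\,a$ and $\mu_2' = 3a^2$:

```python
import math

def maxwell_pdf(y, a):
    return math.sqrt(2.0 / math.pi) * y * y / a**3 * math.exp(-y * y / (2.0 * a * a))

def raw_moment(r, a, upper=60.0, steps=100_000):
    # Midpoint-rule quadrature of integral_0^upper y^r f(y) dy
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y**r * maxwell_pdf(y, a) * h
    return total

a = 2.0
m1 = raw_moment(1, a)
m2 = raw_moment(2, a)
assert abs(m1 - math.sqrt(8.0 / math.pi) * a) < 1e-3   # mean = sqrt(8/pi) a
assert abs(m2 - 3.0 * a * a) < 1e-3                    # second raw moment = 3 a^2
var = m2 - m1 * m1
assert abs(var - (3.0 * math.pi - 8.0) / math.pi * a * a) < 1e-3
```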

Skewness of the distribution involves the third moment about the mean. It should be positive

as indicated in Figure 1. Skewness and kurtosis can be shown to be constant:

\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2\left(\mu_1'\right)^3 \qquad \mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6\left(\mu_1'\right)^2\mu_2' - 3\left(\mu_1'\right)^4

\text{Skewness} = \frac{\mu_3}{\sigma^3} = \frac{2\sqrt{2}\,(16 - 5\pi)}{(3\pi - 8)^{3/2}} = 0.4857 > 0

\text{Kurtosis excess} = \frac{\mu_4}{\sigma^4} - 3 = \frac{15\pi^2 + 16\pi - 192}{(3\pi - 8)^2} - 3 = 0.1082

This is not a highly skewed distribution, as Figure 1 confirms. The mode ( ymo ) of the

Maxwell distribution can be found by differentiating equation 1 and setting the result to zero

(note: a > 0):

\frac{df}{dy} = \sqrt{\frac{2}{\pi}}\,\frac{1}{a^3}\exp\left(\frac{-y^2}{2a^2}\right)\left(2y - \frac{y^3}{a^2}\right) = 0 \;\Rightarrow\; y_{mo} = \sqrt{2}\,a


Various methods exist to estimate parameter “a” when samples from the Maxwell distribution

are available. Since the estimate â depends on this data, it is a random variable and has an

associated sampling distribution, $f_T(t; a)$. The notation implies the estimator $T = \hat{a}$. The

distribution of this random variable depends on the true parameter value “a”. The sampling

distribution generally is unknown, but the variance of the estimator can often be calculated, at

least approximately. Thus, we desire to find the variance of this estimator, denoted var ( â ) .

For an unbiased estimator, the variance equals the Mean Square Error (MSE) of the estimator. The square root of the MSE is referred to as the Standard Error (SE) of the estimator.

The first question regarding estimators is: does a theoretical lower bound exist for the MSE,

and if so is there an estimator that achieves this bound? The Cramer-Rao Lower Bound

(CRLB), if it exists, provides an expression for the minimum MSE. It is a function of the

particular distribution and sample size. An estimator that attains this lower bound may not

exist. If the CRLB exists, it is derived from the “likelihood function” of the sample.

The likelihood function of a sample is the product of the probabilities, or joint distribution, of

the sample values. The Maxwell distribution, with sample size n, has the following likelihood

function:

L(y; a) = \prod_{i=1}^{n} f_Y(y_i; a) = \left(\frac{2}{\pi}\right)^{n/2}\left(\prod_{i=1}^{n}\frac{y_i^2}{a^3}\right)\exp\left(\frac{-1}{2a^2}\sum_{i=1}^{n} y_i^2\right) \qquad \text{Equation (3)}

The likelihood function is generally more useful in logarithmic form:

\ln L(y; a) = \sum_{i=1}^{n}\ln f_Y(y_i; a) = \frac{n}{2}\ln\frac{2}{\pi} + \sum_{i=1}^{n}\ln\frac{y_i^2}{a^3} - \frac{1}{2a^2}\sum_{i=1}^{n} y_i^2 \qquad \text{Equation (4)}

Note that the logarithmic function is monotonic increasing, so the operations on the likelihood

function can be replaced with operations on the “log likelihood” function in most cases.

The CRLB derivation can be found in References [1] or [2]. In order for the CRLB to exist,

the support of the density function cannot depend on the parameters, and the first two

derivatives of the log likelihood function, with respect to the parameters, must exist. These

“regularity conditions” are both met for the Maxwell distribution. The CRLB is:

\mathrm{CRLB} = \frac{1}{E\left[\left(\frac{\partial \ln L}{\partial a}\right)^2\right]} = \frac{1}{n \cdot E\left[\left(\frac{\partial \ln f_Y(y)}{\partial a}\right)^2\right]} = \frac{1}{n \cdot I(a)}

In the previous expression, E denotes expectation and I(a) is called the “Fisher information.”


\frac{\partial \ln f(y)}{\partial a} = \frac{-3}{a} + \frac{y^2}{a^3}

I(a) = \sqrt{\frac{2}{\pi}}\int_0^\infty \left(\frac{9y^2}{a^5} - \frac{6y^4}{a^7} + \frac{y^6}{a^9}\right)\exp\left(\frac{-y^2}{2a^2}\right)dy = \sqrt{\frac{2}{\pi}}\,(I_1 - I_2 + I_3)

The first integral is evaluated as:

I_1 = \frac{9}{a^5}\int_0^\infty y^2\exp\left(\frac{-y^2}{2a^2}\right)dy = \frac{9}{a^5}\left(\sqrt{\frac{\pi}{2}}\,a^3\right) = \frac{9}{a^2}\sqrt{\frac{\pi}{2}}

The remaining two integrals are similar, with one term involving the error function and the

second term involving a polynomial in y, multiplied by the exponential term. The Fisher

information for the Maxwell distribution is:

I(a) = \sqrt{\frac{2}{\pi}}\,\sqrt{\frac{\pi}{2}}\left(\frac{9}{a^2} - \frac{18}{a^2} + \frac{15}{a^2}\right) = \frac{6}{a^2}

The CRLB is:

\mathrm{CRLB} = \frac{a^2}{6n} \qquad \text{Equation (5)}
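The result $I(a) = 6/a^2$ can be confirmed numerically by integrating the squared score against the density. A quadrature sketch in Python (the grid limit and step count are arbitrary choices):

```python
import math

def maxwell_pdf(y, a):
    return math.sqrt(2.0 / math.pi) * y * y / a**3 * math.exp(-y * y / (2.0 * a * a))

def fisher_info(a, upper=60.0, steps=100_000):
    # I(a) = E[(d ln f / d a)^2] with score = -3/a + y^2/a^3, by midpoint quadrature
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        score = -3.0 / a + y * y / a**3
        total += score * score * maxwell_pdf(y, a) * h
    return total

a = 1.5
assert abs(fisher_info(a) - 6.0 / (a * a)) < 1e-3   # I(a) = 6/a^2
n = 100
crlb = 1.0 / (n * fisher_info(a))                   # Equation (5): a^2/(6 n)
assert abs(crlb - a * a / (6.0 * n)) < 1e-5
```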

Moment Estimators

Given that the CRLB exists, the second question is: how close does the MSE of a chosen

estimator function get to the minimum variance provided by this lower bound? The ratio of

the CRLB to the MSE for a particular estimator is referred to as the estimator efficiency.

We will address three kinds of estimators: Maximum Likelihood (MLE), Method of

Moments (MOM) and method of quantiles. All of these methods allow for relatively direct

calculation of the MSE of the estimators. The last one requires some results from order

statistics of a sample. The MSE of quantile estimators is addressed in the next section.

MLE and MOM estimators depend on the sample moments. As such, an expression for

variance of the sample moments is required. The “r-th’ sample moment about the origin is

given by the following computational formula, for random variable Y:

M_r' = \frac{1}{n}\sum_{i=1}^{n} Y_i^{\,r}

Note that sample moments are random variables, because they depend on the random sample

values. It is shown in Appendix C that the sample moments are unbiased.

E\left[M_r'\right] = \mu_r'


Appendix C contains a general derivation of the variance and covariance of these sample

moments. Because the Maxwell distribution contains only one parameter, only one sample

moment is required and no covariance is involved. The expression of interest is:

\mathrm{var}\left(M_r'\right) = \frac{1}{n}\left[\mu_{2r}' - \left(\mu_r'\right)^2\right] \qquad \text{Equation (6)}

MLE and MOM estimators are functions of sample moments as will be shown. An expression

for the variance of a function of these sample moments is therefore required. The exact

expression for the variance of a function of these random variables is generally not known. In

such cases, an approximation can be developed via the “Delta Method,” which is discussed in

Appendix C.

\hat{a} = g\left(M_r'\right) \quad \text{where } r = 1 \text{ or } 2

The variance of this estimator is expressed in terms of the derivatives of the function (see

Appendix C):

\mathrm{var}(\hat{a}) = \left(\frac{\partial \hat{a}}{\partial M_r'}\right)^2 \mathrm{var}\left(M_r'\right) \qquad \text{Equation (7)}

The partial derivative is evaluated at the respective mean value: $M_r' = \mu_r'$. The MSE of parameter $\hat{a}$ can be calculated using Equations 6 and 7.

The maximum likelihood principle is based on the notion that parameter values occurring in

the likelihood (or log likelihood) function should maximize that function. In this application,

the likelihood function is no longer considered as a joint probability function as it was for the

CRLB development. In this case, the data values are fixed, and the likelihood is viewed as a

function of the parameter value(s). The parameter is chosen such that the probability of

observing the actual data values is maximized.

For the Maxwell distribution, the derivative of Equation 4 is taken and equated to zero for

maximization:

\frac{\partial \ln L(y; a)}{\partial a} = 0 = \frac{1}{a^3}\sum_{i=1}^{n} y_i^2 - \frac{3n}{a}

Solving this last equation provides the MLE for the parameter “a”:

\hat{a}_{MLE} = \sqrt{\frac{1}{3n}\sum_{i=1}^{n} y_i^2} \qquad \text{Equation (8)}

To calculate the MSE of this estimator, note that the statistic considered is:


T = \hat{a}_{MLE} = \sqrt{\frac{M_2'}{3}} \qquad \frac{\partial T}{\partial M_2'} = \frac{1}{\sqrt{12\,\mu_2'}} \qquad \text{Equation (9)}

The variance of the second moment is calculated by inserting values for the second and fourth

moments into Equation 6:

\mathrm{var}\left(M_2'\right) = \frac{1}{n}\left[\mu_4' - \left(\mu_2'\right)^2\right] = \frac{6a^4}{n}

Squaring Equation 9, using the previous result, and plugging into Equation 7 provides MLE

estimator variance:

\mathrm{var}\left(\hat{a}_{MLE}\right) = \frac{a^2}{6n} \qquad \text{Equation (10)}

\mathrm{eff}(T) = \frac{\mathrm{CRLB}}{\mathrm{var}(T)}

When this ratio is 1.0, the estimator is called “efficient.” The MLE estimator of “a” is

therefore efficient since Equation 5 and Equation 10 are identical.

The method of moments is based on the idea that if two distributions have the same moments,

they should be (approximately) the same. The method therefore consists of setting the first

“k” sample moments equal to the corresponding moments of the assumed distribution. The

number of equations then equals the number of parameters being estimated.

For the Maxwell distribution, only one parameter is available; thus, we set the sample mean to

the distribution mean:

M_1' = \sqrt{\frac{8}{\pi}}\,a \;\Rightarrow\; \hat{a}_{MOM} = \sqrt{\frac{\pi}{8}}\,M_1'

Using the previous procedure to derive the MSE of this estimator:

\frac{\partial T}{\partial M_1'} = \sqrt{\frac{\pi}{8}}

\mathrm{var}\left(M_1'\right) = \frac{1}{n}\left[\mu_2' - \left(\mu_1'\right)^2\right] = \frac{1}{n}\left(3a^2 - \frac{8}{\pi}a^2\right)

\mathrm{var}\left(\hat{a}_{MOM}\right) = 0.178097 \cdot \frac{a^2}{n} \qquad \text{Equation (11)}

The efficiency of the MOM estimator is thus:

\mathrm{eff}\left(\hat{a}_{MOM}\right) = \frac{1/6}{0.178097} = 0.935818
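The efficiency comparison can be illustrated by Monte Carlo. The sketch below (Python; the seed, sample size and replicate count are arbitrary choices) draws Maxwell samples as $a\sqrt{z_1^2 + z_2^2 + z_3^2}$, mirroring the derivation from three standard normals, and compares the empirical MSE of the MLE and MOM estimators against the CRLB:

```python
import math, random

def maxwell_sample(n, a, rng):
    # A Maxwell variate is "a" times the root-sum-square of three standard normals
    return [a * math.sqrt(rng.gauss(0, 1)**2 + rng.gauss(0, 1)**2 + rng.gauss(0, 1)**2)
            for _ in range(n)]

def a_mle(ys):
    # Equation (8)
    return math.sqrt(sum(y * y for y in ys) / (3.0 * len(ys)))

def a_mom(ys):
    # sqrt(pi/8) times the sample mean
    return math.sqrt(math.pi / 8.0) * sum(ys) / len(ys)

rng = random.Random(1)
a, n, reps = 1.5, 100, 2000
mse_mle = mse_mom = 0.0
for _ in range(reps):
    ys = maxwell_sample(n, a, rng)
    mse_mle += (a_mle(ys) - a)**2
    mse_mom += (a_mom(ys) - a)**2
mse_mle /= reps
mse_mom /= reps
crlb = a * a / (6.0 * n)
assert abs(mse_mle / crlb - 1.0) < 0.2   # MLE attains the CRLB (to Monte Carlo noise)
assert mse_mom > mse_mle                 # MOM is less efficient (theory: ~7% larger MSE)
```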


The MOM estimator is seen to have a variance approximately 7% greater than the MLE

estimator of â ; consequently it is less efficient. It is of interest to note that if we had used the

second moments and equated the sample and distribution moments, the resulting MOM

estimator would have been identical to the MLE. This shows that in some cases, judicious

selection of the sample moments to use in MOM might have some advantage.

The above examples indicate that MLE estimators are more efficient, and this is generally

true. MLE estimators have several other advantages. The first is that for “large” sample sizes,

the MLE estimator approaches a normal distribution with mean equal to the true parameter,

and the variance equal to the CRLB.

The second advantage of an MLE estimator is that it has an “invariance” property. This means

that if an estimate of some function of a parameter is required, we can insert the MLE

estimate of the parameter into the function to obtain the MLE of that function. This is

expressed mathematically as:

\widehat{g(a)}_{MLE} = g\left(\hat{a}_{MLE}\right)

It is noted that MLE estimators are often biased; this is not the case for the Maxwell distribution, since an estimator that attains the CRLB is necessarily unbiased.


Quantile estimators are similar to MOM estimators in that they are not based on a

maximization procedure. Quantile estimators are generally easy to compute, but often the

MSE of such estimators is difficult to ascertain. The efficiency of these estimators is generally

less than MOM estimators. One application of such estimators is when the data has been

censored. Censored data implies that not all samples have been evaluated (e.g., in life testing). The method depends on some results for order statistics of a sample. The pertinent

results are discussed in Appendix D.

In some cases, unexpectedly large (or small) values may be present in a sample. These ‘outliers’ can have a disproportionate influence on the parameter estimates. If

the outlier needs to be accounted for in a sample, but not unduly affect the parameter estimate,

then quantile estimates are quite “robust” in this sense. The relation between quantile and

MLE (or MOM) estimators is analogous to the sample mean and median estimates of the

population central tendency. The order statistics are just the sample values arranged in ascending order:

Y_{(1)} < Y_{(2)} < \cdots < Y_{(n-1)} < Y_{(n)}

The r-th order statistic is denoted $Y_{(r)}$. Each order statistic of the sample has an associated probability value, referred to as the quantile of the order statistic:

q_r = E\left[F\left(Y_{(r)}\right)\right] = \frac{r}{n+1}

The idea behind quantile estimators is to pick a number of quantiles equal to the number of

parameters to be estimated, and the associated sample order statistics for those quantiles.

Inserting these values into the distribution function provides a set of equations equal to the

number of unknown parameters. For the Maxwell distribution, there is only one parameter,

hence, only one quantile is needed to estimate the unknown parameter.

Several questions arise. First, does the mean square error of the parameter estimate depend on

which quantile is selected? Secondly, if a particular quantile gives the minimum MSE, how is

it identified? A third question is: does the MSE depend on the parameter value itself? The

following development shows that for the Maxwell distribution, the MSE does depend on the

selected quantile, but the minimum MSE quantile does not depend on the parameter value.

Quantile estimators in general are functions of the sample order statistics (just as moment

estimators are functions of the sample moments). The delta method procedure can again be

applied, and for a single parameter case the MSE of the estimator is:

\mathrm{var}(T \equiv \hat{a}) = \left(\frac{\partial a}{\partial Y_{(r)}}\right)^2 \mathrm{var}\left(Y_{(r)}\right) \qquad \text{Equation (12)}

The “large sample” formula for the variance of any order statistic is derived in Appendix D:


\mathrm{var}\left(Y_{(r)}\right) = \frac{q_r\,(1 - q_r)}{n \cdot \left[f(y_{(r)}; a)\right]^2} \qquad \text{Equation (13)}

In this equation, y( r ) is the value of the “r-th” order statistic corresponding to the sample

quantile qr . The notation Y( r ) implies this quantity is a random variable. The function f (•)

is the underlying density function of the random variable y.

The minimum variance quantile is determined using the following procedure. We first need to

transform y( r ) into a more computationally convenient form:

t_r = \frac{y_r}{\sqrt{2}\,a} \qquad \text{Equation (14a)}

Equation 2 is then expressed in terms of this variable, which shows that t r depends only on

qr :

q_r = \mathrm{erf}(t_r) - \frac{2}{\sqrt{\pi}}\,t_r\exp\left(-t_r^2\right) \qquad \text{Equation (14b)}

The derivative term required in Equation 12 is obtained directly from Equation 14a:

\frac{\partial a}{\partial Y_r} = \frac{1}{\sqrt{2}\,t_r} \qquad \text{Equation (15a)}

A plot of the relationship between $q_r$ and $t_r$ from Equation 14b is shown in the left side of

Figure 2. The Maxwell density function can also be expressed in terms of t r :

f(t_r; a) = \sqrt{\frac{8}{\pi}}\,\frac{t_r^2}{a}\exp\left(-t_r^2\right) \qquad \text{Equation (15b)}

Note that this expression involves parameter “a”. For selected value of “a”, Equation 12 can

be evaluated over the range of quantiles 0 < qr < 1 , by calculating t r via Equation 14b, and

then using Equations 15a and 15b.

\mathrm{var}(\hat{a}) = \frac{q_r\,(1 - q_r)}{n\,\left[f(t_r; a)\right]^2\,2\,t_r^2} \qquad \text{Equation (15c)}

The right hand side of Figure 2 shows the relative variation of Equation 15c as a function of

the quantile value qr for various choices of parameter “a”. The sample size n is a scaling

variable that does not affect the shape of the curves.

For parameter “a” close to unity, the relative variance appears rather insensitive to the

quantile value, but as the parameter increases, a minimum close to qr = 0.75 (upper quartile)

is evident. In fact, the minimum of Equation 15c occurs at qr = 0.7576 , and is independent

of the parameter as can be verified by direct calculation.
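The location of the minimum-variance quantile can be reproduced numerically: solve Equation 14b for $t_r$ by bisection, evaluate Equation 15c over a grid of quantiles, and take the minimizer. A Python sketch (the grid spacing and bisection depth are arbitrary choices):

```python
import math

def q_of_t(t):
    # Equation (14b): q_r as a monotone increasing function of t_r = y_r/(sqrt(2) a)
    return math.erf(t) - 2.0 / math.sqrt(math.pi) * t * math.exp(-t * t)

def t_of_q(q):
    # Invert Equation (14b) by bisection
    lo, hi = 1e-9, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if q_of_t(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rel_var(q):
    # Equation (15c) with a = n = 1 (both are pure scale factors)
    t = t_of_q(q)
    f = math.sqrt(8.0 / math.pi) * t * t * math.exp(-t * t)   # Equation (15b)
    return q * (1.0 - q) / (f * f * 2.0 * t * t)

qs = [0.50 + 0.0001 * i for i in range(4500)]   # scan quantiles 0.50 .. 0.95
q_min = min(qs, key=rel_var)
assert 0.72 < q_min < 0.80                  # minimum-variance quantile near the upper quartile
assert abs(t_of_q(0.75) - 1.4330) < 2e-3    # the "t" value quoted in the text
```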


[Figure 2: Left panel: t value versus quantile value (Equation 14b). Right panel: estimate variance (n = 1) versus quantile value for a = 1, 2, 3 and 5.]

For ease of calculation, the upper quartile Y3 / 4 value is recommended for the Maxwell

distribution estimator (a more exact analysis shows that the minimum quantile is 0.7676). At

the upper quartile, variable “t” equals 1.4330, and the quantile parameter estimate is:

\hat{a}_{QUAN} = \frac{Y_{q_r = 0.75}}{\sqrt{2}\,(1.4330)} = 0.49344 \cdot Y_{3/4} \qquad \text{Equation (16)}

The efficiency of this quantile estimator is calculated using these values for the minimum

MSE:

\left[f(t_r; a)\right]^2 = \frac{0.1767}{a^2} \quad \text{at } t_r = 1.4330

\mathrm{var}\left(\hat{a}_{QUAN}\right) = \frac{(0.75)(0.25)}{n\,(0.1767/a^2)\,2\,(1.4330)^2} = 0.2584 \cdot \frac{a^2}{n}

\mathrm{eff}\left(\hat{a}_{QUAN}\right) = \frac{a^2/6n}{0.2584\,a^2/n} = 0.6450 \qquad \text{Equation (17)}


As an example of the use of the parameter estimators, assume a sample of size 100 from a

Maxwell distribution is available. In this example, the parameter value is a = 1.5.

Computation was done using MATLAB functions. The sample is:

data = [ 3.7743 2.6739 1.1872 0.7470 2.7742 1.7389 2.1289 0.3800 1.7851 2.9057

2.2847 2.1543 1.1863 3.3807 1.0166 2.4348 3.0516 2.4945 0.7600 4.1672

4.3509 3.3549 2.2589 0.9332 1.5142 1.4671 2.2576 4.3885 1.7236 1.9492

0.7190 1.6172 1.6940 1.1957 1.9209 2.6369 1.4716 2.4191 3.0460 2.4330

3.5561 1.7868 2.2678 2.2190 2.7345 0.6198 2.3071 3.0629 4.0377 1.8466

4.9367 2.3712 2.6957 1.1412 2.3771 0.6530 1.9543 0.2878 2.0164 3.0145

4.4819 5.7881 1.2017 1.8738 2.7212 1.5086 3.1833 1.7552 0.8323 2.6708

3.3421 4.6603 2.9550 2.0458 1.5950 3.9521 2.4211 1.7068 1.4631 2.7573

4.0250 3.0648 3.5964 1.7052 3.4843 4.1598 2.4427 1.6287 4.2384 1.9250

4.1387 0.9586 3.7827 1.2290 2.8417 1.4812 0.6498 3.3351 3.3007 2.9664 ]

\sum_{i=1}^{100} y_i = 240.133 \qquad \sum_{i=1}^{100} y_i^2 = 704.720

\hat{a}_{MLE} = \sqrt{\frac{704.72}{300}} = 1.5327 \qquad SE_{MLE} = \sqrt{\mathrm{CRLB}} = \sqrt{\frac{1.5^2}{600}} = 0.06123

\hat{a}_{MOM} = \sqrt{\frac{\pi}{8}} \cdot \frac{240.133}{100} = 1.5048 \qquad SE_{MOM} = \sqrt{\frac{\mathrm{CRLB}}{\mathrm{eff}(\hat{a}_{MOM})}} = \sqrt{\frac{0.00375}{0.93581}} = 0.06330

\hat{a}_{QUAN} = 0.49344 \cdot Y_{3/4} = 1.5118 \quad (Y_{3/4} = 3.0639) \qquad SE_{QUAN} = \sqrt{\frac{\mathrm{CRLB}}{\mathrm{eff}(\hat{a}_{QUAN})}} = 0.0762
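The point estimates above can be reproduced from the quoted sample sums. A short Python check (using only the sums, the upper quartile and the constants given in the text):

```python
import math

n = 100
sum_y = 240.133            # sample sums quoted above
sum_y2 = 704.720
y_upper_quartile = 3.0639  # Y_{3/4} quoted above

a_mle = math.sqrt(sum_y2 / (3.0 * n))            # Equation (8)
a_mom = math.sqrt(math.pi / 8.0) * sum_y / n     # MOM estimator
a_quan = 0.49344 * y_upper_quartile              # Equation (16)
assert abs(a_mle - 1.5327) < 1e-3
assert abs(a_mom - 1.5048) < 1e-3
assert abs(a_quan - 1.5118) < 1e-3

a_true = 1.5                                     # value used in the example
crlb = a_true * a_true / (6.0 * n)               # Equation (5)
se_mle = math.sqrt(crlb)
se_mom = math.sqrt(crlb / 0.935818)              # Equation (11) efficiency
se_quan = math.sqrt(crlb / 0.6450)               # Equation (17) efficiency
assert abs(se_mle - 0.06124) < 1e-4
assert abs(se_mom - 0.06330) < 1e-4
assert abs(se_quan - 0.0762) < 1e-3
```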

A check on the calculated standard errors was made by running 200 replicates (of sample size 100 each, with a = 1.5). The standard deviation of the 200 MLE, MOM and quantile estimates was calculated and compared to the theoretical values above.

In order to check the quantile estimator, estimates and standard errors were made at the lower

quartile (Y1 / 4 ) and median (Y1 / 2 ) also. Results showed the upper quartile estimator does in

fact produce minimum variance among these.


The Mahalanobis distance generalizes the notion of distance between points in r-dimensional space. We confine discussion to Euclidean three-dimensional space. The Mahalanobis distance is employed when the distance between r-dimensional random variables is required, but each component of the distance is weighted inversely by the “spread,” or variance, of that variable. Components of the random vector y may be correlated.

d_M = \sqrt{(\mathbf{y} - \boldsymbol{\mu})^T\,C^{-1}\,(\mathbf{y} - \boldsymbol{\mu})} \qquad \text{Equation (18)}

where $\boldsymbol{\mu}$ is the mean vector and C is the covariance matrix.

Suppose the three dimensional vector random components are from uncorrelated normal

distributions. The variance of each component is in general different. For simplicity assume

all of the component variables are zero mean. The covariance matrix inverse is just the

inverse of the diagonal terms in this case, and the square of the Mahalanobis distance is:

d_M^2 = \frac{y_1^2}{\sigma_1^2} + \frac{y_2^2}{\sigma_2^2} + \frac{y_3^2}{\sigma_3^2}

This squared value is seen to be the sum of squares of three standard normal random variables. As such, the distribution of $d_M^2$ is Chi-squared with 3 degrees-of-freedom. The square root of this random variable therefore is Maxwell distributed as shown previously, where the distribution parameter “a” is equal to unity because $d_M^2$ is the sum of three squared standard normal variables.

The next issue is: what happens to the distribution of $d_M^2$ when the components are correlated?

In the following discussion, we consider tri-variate jointly distributed random variables.

Assume that all components have different standard deviations and different means. The

vector for the three components is [ x1 x2 x3 ] .

Examine the general covariance matrix for these variables, where non-zero correlations are

present:

$$C = \begin{bmatrix} \sigma_1^2 & \rho_{12}\,\sigma_1 \sigma_2 & \rho_{13}\,\sigma_1 \sigma_3 \\ \rho_{12}\,\sigma_1 \sigma_2 & \sigma_2^2 & \rho_{23}\,\sigma_2 \sigma_3 \\ \rho_{13}\,\sigma_1 \sigma_3 & \rho_{23}\,\sigma_2 \sigma_3 & \sigma_3^2 \end{bmatrix}$$


This matrix is positive definite and therefore has positive eigenvalues. The eigenvectors corresponding to the eigenvalues allow a spectral decomposition of the covariance matrix, which can be expressed as:

$$C = \Gamma' \Lambda \Gamma = \left(\Gamma' \Lambda^{1/2} \Gamma\right)\left(\Gamma' \Lambda^{1/2} \Gamma\right)$$

The matrix $\Gamma$ contains the eigenvectors, $\Lambda$ is a diagonal matrix of the eigenvalues, and the factoring shown can be accomplished because the eigenvector matrix is orthonormal. Define the "square root" of the covariance matrix as:

$$C^{1/2} = \Gamma' \Lambda^{1/2} \Gamma$$

Because the eigenvalues are all positive, the inverse of the above matrix is:

$$C^{-1/2} = \Gamma' \Lambda^{-1/2} \Gamma$$
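As a numerical sketch of this factoring (the covariance values below are hypothetical, and numpy's `eigh` convention $C = G \Lambda G^T$ stands in for the $\Gamma' \Lambda \Gamma$ notation above):

```python
import numpy as np

# Hypothetical positive definite covariance matrix with correlated components
C = np.array([[4.0, 1.2, 0.6],
              [1.2, 9.0, 2.1],
              [0.6, 2.1, 16.0]])

# Spectral decomposition: eigh returns eigenvalues and orthonormal eigenvectors
eigvals, G = np.linalg.eigh(C)

# "Square root" and inverse square root of the covariance matrix
C_half = G @ np.diag(np.sqrt(eigvals)) @ G.T
C_neg_half = G @ np.diag(1.0 / np.sqrt(eigvals)) @ G.T

# Checks: C_half @ C_half recovers C; C_neg_half whitens C to the identity
assert np.allclose(C_half @ C_half, C)
assert np.allclose(C_neg_half @ C @ C_neg_half, np.eye(3))
```

The second check is exactly the whitening property used next to define Z from X.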

Now, let Z be a random vector that has a multivariate normal distribution with zero mean and

independent components with unity variance, or Z → N 3 ( 0, I 3 ) . The subscript 3 implies Z is

tri-variate. Define a random vector X such that:

$$X = C^{1/2} Z + \mu \qquad\qquad Z = C^{-1/2} (X - \mu)$$

$$W = Z' Z = (X - \mu)' C^{-1} (X - \mu) = d_M^2 = \sum_{i=1}^{3} z_i^2 \;\rightarrow\; \chi_3^2$$

The previous equation states that the Mahalanobis distance squared is still Chi-squared, with

three degrees of freedom, as a result of the zi being independent standard normal variables,

even though the underlying variables (x) are correlated. It is again noted that the value of

parameter “a” is unity.

Application of the Mahalanobis distance measure is provided in the second example below.

The fact that the parameter value is unity provides a metric for evaluation of consistency of a

covariance matrix and a set of random variables.
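A minimal sketch of this consistency check (the mean vector and covariance below are hypothetical): draw correlated normal vectors, compute $d_M$ per Equation 18, and verify that the estimated Maxwell parameter is near unity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mean and correlated covariance
mu = np.array([1.0, -2.0, 0.5])
C = np.array([[4.0, 1.2, 0.6],
              [1.2, 9.0, 2.1],
              [0.6, 2.1, 16.0]])

# Correlated tri-variate normal samples
X = rng.multivariate_normal(mu, C, size=20000)

# Mahalanobis distance (Equation 18) for each sample
Cinv = np.linalg.inv(C)
diff = X - mu
d = np.sqrt(np.einsum('ij,jk,ik->i', diff, Cinv, diff))

# d is Maxwell(a = 1); the MLE of "a" uses the second raw moment
a_hat = np.sqrt(np.mean(d**2) / 3.0)
print(a_hat)   # close to 1.0 when the data and covariance are consistent
```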


This section provides two examples of missile systems analysis where the Maxwell

distribution applies.

One measure of tactical missile performance is the distribution of miss distance, measured as the Euclidean distance from the missile to the target center at the closest approach of the weapon.

For air-to-ground missiles, the two-dimensional impact point is described in terms of orthogonal "x-y" coordinates centered at the target. The usual assumption is that the coordinate positions are independent, identically distributed normal variables with zero mean. These assumptions lead to the well-known result that the distribution of the miss distance in the two-dimensional plane follows a Rayleigh distribution.

For air-to-air or surface-to-air missiles, the three-dimensional Euclidian miss distance

distribution is sometimes used to describe performance. If the same assumptions are made

regarding each component (independent, identically distributed, zero-mean normal variables), then the resulting

joint probability distribution is:

$$f(x, y, z; a) = \frac{1}{(2\pi a^2)^{3/2}} \exp\left(\frac{-(x^2 + y^2 + z^2)}{2a^2}\right) \qquad \text{where } a^2 \equiv \text{common variance}$$

To obtain the Euclidean miss distance r, transform the orthogonal Cartesian coordinates to spherical coordinates:

$$r = \sqrt{x^2 + y^2 + z^2} \qquad \theta = \tan^{-1}\frac{y}{x} \qquad \phi = \sin^{-1}\frac{z}{r}$$

$$x = r \cos\phi \cos\theta \qquad\qquad 0 \le r < \infty$$
$$y = r \cos\phi \sin\theta \qquad\quad\; -\pi \le \theta \le \pi$$
$$z = r \sin\phi \qquad\qquad -\pi/2 \le \phi \le \pi/2$$

$$f(r, \theta, \phi; a) = \left| J\!\left(\frac{x, y, z}{r, \theta, \phi}\right) \right| \frac{1}{(2\pi a^2)^{3/2}} \exp\left(\frac{-r^2}{2a^2}\right)$$

The Jacobian for the transformation follows from the coordinate transform equations:

$$J\!\left(\frac{x, y, z}{r, \theta, \phi}\right) = \begin{vmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} & \dfrac{\partial x}{\partial \phi} \\[6pt] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} & \dfrac{\partial y}{\partial \phi} \\[6pt] \dfrac{\partial z}{\partial r} & \dfrac{\partial z}{\partial \theta} & \dfrac{\partial z}{\partial \phi} \end{vmatrix} = \begin{vmatrix} \cos\phi\cos\theta & -r\cos\phi\sin\theta & -r\sin\phi\cos\theta \\ \cos\phi\sin\theta & r\cos\phi\cos\theta & -r\sin\phi\sin\theta \\ \sin\phi & 0 & r\cos\phi \end{vmatrix} = r^2 \cos\phi$$


$$f(r, \theta, \phi; a) = \frac{r^2 \cos\phi}{(2\pi a^2)^{3/2}} \exp\left(\frac{-r^2}{2a^2}\right) \qquad \text{Equation (19)}$$

The marginal distribution of each variable is obtained by integrating out the other two:

$$f(r) = \frac{r^2 \exp\left(\dfrac{-r^2}{2a^2}\right)}{(2\pi a^2)^{3/2}} \int_{-\pi/2}^{\pi/2} \cos\phi \, d\phi \int_{-\pi}^{\pi} d\theta = \sqrt{\frac{2}{\pi}}\, \frac{r^2}{a^3} \exp\left(\frac{-r^2}{2a^2}\right) \qquad \text{Equation (20a)}$$

Equation 20a shows that the 3-dimensional miss distance is distributed as a Maxwell random variable, with the parameter equal to the common standard deviation of the underlying normal distributions.

For reference, the other two marginal distributions are easily calculated, noting that:

$$\int_0^\infty r^2 \exp\left(\frac{-r^2}{2a^2}\right) dr = a^3 \sqrt{\frac{\pi}{2}}$$

$$f(\theta) = \frac{a^3 \sqrt{\pi/2}}{(2\pi a^2)^{3/2}} \int_{-\pi/2}^{\pi/2} \cos\phi \, d\phi = \frac{1}{2\pi} \qquad \text{Equation (20b)}$$

$$f(\phi) = \frac{a^3 \sqrt{\pi/2}}{(2\pi a^2)^{3/2}} \cos\phi \int_{-\pi}^{\pi} d\theta = \frac{\cos\phi}{2} \qquad \text{Equation (20c)}$$

Thus, the distribution of the azimuth angle (θ) is uniform over an interval of 2π, while the elevation angle (φ) has a cosine distribution. It is of interest to note that the product of the three marginal densities recovers the joint distribution of Equation 19. This result shows that these variables are independent of each other.

If a set of miss distance data is available, the standard deviation of the underlying normal

random variables can be estimated using one of the methods discussed previously.

In some ground tracking systems, prediction and smoothing of the target track is implemented

via an alpha-beta (α − β ) filter. Reference 6 contains an excellent description of these

trackers, and their implementation. Trackers of this type generally filter target position in

Cartesian coordinates, although measurements are made in a different “sensor” coordinate

system. The errors associated with the sensor coordinates (usually, range, azimuth and

elevation) are generally independent, but become correlated when they are transformed to the

Cartesian system.

The covariance matrix associated with the target track is used for a variety of purposes,

including uplinks to a missile seeker for search and acquisition by the onboard system. A

potential problem can occur because the alpha-beta (α− β ) filter does not directly produce a

covariance matrix estimate, although one can be derived from the particular structure of the


filter. In applications, the covariance matrix is supplied external to the α− β filter and should

be “matched” to the actual sensor error measurement characteristics.

One consistent way of accomplishing this is to develop the respective variance terms from the

implemented filter, in the sensor coordinate system, and transform to the Cartesian system.

This requires that statistics of the error components (assumed normal) be known ahead of

time. It also assumes that the "lag error" of the filter is negligible compared to the noise error.

Another approach is to develop the covariance matrix from some other algorithm, such as a

related sensor or filter and make necessary adjustments to be compatible with the alpha-beta

filter.

In either case, consistency of the implemented filter with the supplied covariance matrix

should be assessed. One way of doing this is by comparing the theoretical capture ratio of the

measurements with the actual (or simulated) data error values. Capture ratio is defined as the

proportion of “weighted error magnitudes” that lie inside the volume contained within the

“2.5-sigma” error (position) ellipsoid. For equal component variance, the error ellipsoid is a

sphere.

As shown previously, the Mahalanobis distance of the weighted errors is Maxwell distributed with parameter a = 1. This fact can be used to check consistency of the measurement errors and covariance matrix. The theoretical value is found from Equation 2:

$$\text{Capture Ratio} = P[Y \le 2.5] = \operatorname{erf}\!\left(\frac{2.5}{\sqrt{2}}\right) - \sqrt{\frac{2}{\pi}} \cdot 2.5 \cdot \exp\left(\frac{-2.5^2}{2}\right) = 0.899939$$

The "data" value of the capture ratio is computed by evaluating the Mahalanobis distance according to Equation 18 at each update point while the filter is tracking. The number of calculated points not exceeding 2.5, divided by the total number of data points, provides the data capture ratio. This should be close to the theoretical value if the filter, measurements, and covariance data are consistent.
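Both capture ratios can be sketched numerically; the simulated weighted errors below are illustrative stand-ins for actual filter residuals:

```python
import math
import numpy as np

# Theoretical capture ratio from the Maxwell CDF with a = 1
y = 2.5
theory = math.erf(y / math.sqrt(2)) - math.sqrt(2 / math.pi) * y * math.exp(-y**2 / 2)
print(round(theory, 4))   # 0.8999

# "Data" capture ratio: fraction of Mahalanobis distances not exceeding 2.5.
# The weighted errors are simulated here as standard normals for illustration.
rng = np.random.default_rng(0)
z = rng.standard_normal((50000, 3))
d = np.sqrt((z**2).sum(axis=1))     # Maxwell(a = 1) distances
data_ratio = float(np.mean(d <= y))
print(data_ratio)                   # should be close to 0.8999
```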

A second, equivalent consistency check can be made by calculating the MLE and the associated MSE of the parameter "a" from the data values. If the computed value (â − 1) is more than twice the square root of the MSE, this is a strong indication that a mismatch exists. Possible causes include:

• The supplied covariance matrix is in fact based on a different model than the alpha-beta filter.

• The actual noise error components have different variances than those assumed in

development of the covariance matrix.

• The actual noise error components are not from normal distributions, or are not zero

mean.


• Large target maneuvers during tracking have introduced large lag error in the filter

output which is not included in the theoretical development.

Miscellaneous Topics

This section covers some additional topics related to the Maxwell distribution and its

estimator properties.

Simulation work may require generation of Maxwell distributed random variables. This can be accomplished in either of two ways. The first is by direct use of the cumulative distribution function. As noted previously, we can write this in the form:

$$F(t_r) = q_r = \operatorname{erf}(t_r) - \frac{2}{\sqrt{\pi}}\, t_r \exp\left(-t_r^2\right) \qquad \text{where } t_r = \frac{y_r}{\sqrt{2}\, a}$$

The $q_r$ variables have a uniform distribution on the unit interval. Given a particular random value on this interval, the corresponding value of $t_r$ can be found by a numerical procedure using the above function. For a particular analysis, the parameter value "a" is known, so values of the random variables $y_r$ are obtained from the $t_r$ values.

Various root finding procedures are available as discussed in Reference [7]. When applicable,

the Newton-Raphson method is desirable since it has quadratic convergence. This method

does require the derivatives to be “well behaved” in that the derivative value does not vanish

anywhere over the range of function values. Unfortunately, the Maxwell CDF has a zero

derivative at zero and the derivative approaches zero as the variable becomes large. As such,

the Newton-Raphson method does not converge in general, and some other method, not involving derivatives, should be used. Bisection is one such method, though it has slow convergence. Brent's method (Reference [7]) has "super-linear" convergence and is considered one of the best "non-derivative" root finding algorithms.
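The inverse-CDF approach can be sketched as follows; scipy's `brentq` (an implementation of Brent's method) serves as the derivative-free root finder, and the helper names are illustrative:

```python
import math
import numpy as np
from scipy.optimize import brentq

def maxwell_cdf_t(t):
    """Maxwell CDF in the reduced variable t = y / (sqrt(2) * a)."""
    return math.erf(t) - (2.0 / math.sqrt(math.pi)) * t * math.exp(-t * t)

def maxwell_inverse(q, a, t_hi=10.0):
    """Invert F(t) = q with Brent's method, then rescale to y = sqrt(2)*a*t."""
    t = brentq(lambda t: maxwell_cdf_t(t) - q, 0.0, t_hi)
    return math.sqrt(2.0) * a * t

# Generate deviates from uniform draws on the unit interval
rng = np.random.default_rng(2)
a = 1.5
samples = [maxwell_inverse(q, a) for q in rng.uniform(size=1000)]

# Sample mean should be near the Maxwell mean 2*a*sqrt(2/pi)
print(np.mean(samples))
```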

The second way to generate Maxwell random variables is to use the results derived for the

miss distance distribution. In this method, three separate normal random deviates are

generated, each with a standard deviation corresponding to parameter “a”. These three

deviates are squared and summed. The square root of the sum then provides the desired

Maxwell random deviate.

$$Y_{Maxwell} = \sqrt{X_1^2 + X_2^2 + X_3^2} \qquad X_i \rightarrow N(0, a), \quad i = 1, 2, 3$$

This approach is attractive because iterative root finding is not required (although a normal random number algorithm is needed). The results derived in Appendix F use this generation method and MATLAB random variate generators. This implementation executes an order of magnitude faster than the root finding implementation.
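A sketch of this generation method (in Python rather than the MATLAB of Appendix F), with an illustrative comparison against the theoretical mean and variance:

```python
import math
import numpy as np

def maxwell_rvs(a, n, rng):
    """n Maxwell(a) deviates: root-sum-square of three N(0, a) deviates."""
    x = rng.normal(0.0, a, size=(n, 3))
    return np.sqrt((x**2).sum(axis=1))

rng = np.random.default_rng(3)
a = 1.5
y = maxwell_rvs(a, 100000, rng)

# Compare sample moments with the theoretical mean and variance
mean_theory = 2 * a * math.sqrt(2 / math.pi)      # 2a*sqrt(2/pi)
var_theory = a**2 * (3 - 8 / math.pi)             # a^2*(3*pi - 8)/pi
print(y.mean(), mean_theory)
print(y.var(), var_theory)
```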


The minimum variance quantile estimate of parameter “a” was derived previously. A question

is: “if all of the quantile data points were used to estimate the parameter, by minimizing the

square of the error between estimated and predicted values, would this produce a better

estimate?” This proposed estimation procedure is a least squares problem.

The parameter is related to the quantile values and the data points through the CDF. As such,

the least squares problem can be expressed as:

$$\varepsilon^2 = \sum_{r=1}^{n} \left[ q_r - F(y_r; a) \right]^2 = \sum_{r=1}^{n} \left[ q_r - \operatorname{erf}\!\left(\frac{y_r}{\sqrt{2}\, a}\right) + \sqrt{\frac{2}{\pi}}\, \frac{y_r}{a} \exp\!\left(\frac{-y_r^2}{2a^2}\right) \right]^2$$

The non-linear least squares problem is to minimize this squared error by setting its derivative with respect to the parameter equal to zero:

$$\frac{\partial \varepsilon^2}{\partial a} = 0$$

This derivative still involves the error function and requires numerical solution. A computationally more direct approach is to use the fact that a single parameter is involved. The function can be minimized directly using an optimization procedure (such as golden section search, Reference [7]).

A difficulty in determining the efficiency of this non-linear estimator is that neither the distribution nor the mean square error can be calculated analytically. This non-linear least squares (NLS) estimator requires numerical evaluation for any particular case of interest.

As an example, take a Maxwell distribution with a = 1.5 and sample size n = 100. The approach is to take a large number of Monte Carlo samples of size n and compute the MLE and the NLS estimator for each sample. The mean and standard deviation of these estimates are then calculated and compared. The MLE estimators should have variance very close to the CRLB. In this particular example, 5000 samples of size 100 were generated, and the NLS and MLE estimates were made for each. The quantiles were estimated as:

$$q_r = \frac{r}{n+1}$$
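The Monte Carlo comparison can be sketched for a single sample as below; scipy's bounded scalar minimizer stands in for the golden-section search of Reference [7], and the helper names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import erf

def maxwell_cdf(y, a):
    t = y / (np.sqrt(2.0) * a)
    return erf(t) - (2.0 / np.sqrt(np.pi)) * t * np.exp(-t**2)

def nls_estimate(y):
    """Minimize the sum of squared CDF residuals over the single parameter a."""
    y = np.sort(y)
    n = len(y)
    q = np.arange(1, n + 1) / (n + 1)            # q_r = r / (n + 1)
    sse = lambda a: float(np.sum((q - maxwell_cdf(y, a))**2))
    return minimize_scalar(sse, bounds=(0.1, 10.0), method='bounded').x

# One Monte Carlo sample of size 100 from Maxwell(a = 1.5)
rng = np.random.default_rng(4)
y = np.sqrt((rng.normal(0.0, 1.5, size=(100, 3))**2).sum(axis=1))

a_nls = nls_estimate(y)
a_mle = float(np.sqrt(np.mean(y**2) / 3.0))      # MLE from the second raw moment
print(a_nls, a_mle)                              # both near 1.5
```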

Figure 3 shows the results for the first 500 samples. The results apply only to the case

considered, but they are indicative of the relative performance of the NLS and MLE

estimators. The figure shows that the two estimates are highly correlated as would be

expected ( rCORR = 0.8976 ) .

Calculated results for the means and standard deviations are:

$$s_{MLE} = 0.0613 \qquad\qquad s_{NLS} = 0.0688$$


[Figure 3: Scatter plot of NLS estimates versus MLE estimates of parameter "a" for the first 500 samples]

The efficiency of the NLS estimator (for this example) is the ratio of the CRLB to the estimator variance:

$$\text{eff}(\hat{a}_{NLS}) \approx \frac{a^2/(6n)}{(0.0682)^2} = 0.8022 \approx r_{CORR}^2$$

$$\text{eff}(\hat{a}_{MLE}) = \frac{a^2/600}{(0.0613)^2} = 0.991 \approx 1.0$$

This example indicates that using all of the quantiles in a sample does provide a higher

relative efficiency than the single quantile estimator. However, both the MLE and MOM

estimators have smaller MSE and are thus superior to this estimator.

The nonlinear least squares estimator is related to the ideas of correlation-based goodness-of-fit tests. Appendix F provides a discussion of tests of this type.


Appendix A

Development

Bayesian estimation is based on a different point of view than the other point estimation methods discussed. Whereas MOM, MLE, and quantile estimates assume the parameter is fixed, the Bayesian approach treats this quantity as a random variable. The method has the advantage of incorporating probabilistic information that is known prior to taking a sample. It is based on conditional probabilities and is an expression of Bayes' rule as applied to density functions. It is expressed as:

$$f(a \mid y) = \frac{f(y \mid a) \cdot h(a)}{g(y)} \qquad \text{Equation (A1)}$$

In this equation, h (a ) represents statistical knowledge about variable “a” prior to obtaining

a data sample. It is a probability density and is called the “prior density.” The conditional

density f (y | a ) is the joint density of the sample (y), given a value of the parameter. Since

the data values are independent, this density is the product of the probabilities of each data

point, which is just the likelihood function of Equation 3.

The function g (y ) is the probability density of the data. This density can be calculated in

terms of the likelihood and the prior density using the Theorem of Total Probability.

Integration is over the support of the prior density (i.e., the range of values that parameter “a”

can take on):

$$g(y) = \int f(y \mid a) \cdot h(a) \, da \qquad \text{Equation (A2)}$$

This density is essentially a normalization factor, so that Equation A1 satisfies the

requirements of a probability function.

The density $f(a \mid y)$ is called the "posterior density" and represents the statistical knowledge about variable "a" after the current data sample is incorporated. This posterior density, if it can be evaluated, provides a complete description of the estimator (for the given data), since the mean, variance, and other parameters can in theory be calculated.

The actual estimate â can be chosen in several ways. The Bayes' estimator most commonly used is the expectation of the posterior distribution:

$$\hat{a}_{BAYES} = E(a \mid y) = \int a \cdot f(a \mid y) \, da = \frac{\int a \cdot f(y \mid a) \cdot h(a) \, da}{g(y)} \qquad \text{Equation (A3)}$$


The Bayesian procedure can be thought of as a "weighted average" of the prior distribution

mean and the data sample mean. Most textbooks such as References 1-4 show simple

examples of this for normal and binomial distribution parameters. Generally as the data

sample size n increases, more weight is placed on the sample and less on the prior

distribution. This will be demonstrated in an example for the Maxwell distribution.

There are two issues involved with Bayesian estimation that complicate the procedure. The

first is selection of the prior distribution for the parameter, which is subjective in nature. This

distribution can be based on prior data samples if they are available. This distribution may be

assumed by the analyst, based simply on intuition. The second issue is the evaluation of the data density (normalization factor) and of the integrals in Equations A2 and A3. For single parameter distributions, the integrals can usually be evaluated numerically.

The likelihood function for the Maxwell distribution data sample is:

$$L(y; a) = f(y \mid a) = \prod_{i=1}^{n} f_Y(y_i; a) = \left(\frac{2}{\pi}\right)^{n/2} \left[ \prod_{i=1}^{n} y_i^2 \right] \exp\left( \frac{-1}{2a^2} \sum_{i=1}^{n} y_i^2 \right) \cdot \frac{1}{a^{3n}}$$

It is noted that all terms in the likelihood function that are not functions of the parameter can

be brought outside the integrals, and thus cancel. This includes terms involving only the y i

since these are fixed for a particular sample.

Evaluation of integrals involving this likelihood function presents particular problems because of its structure. If the sample size n is even moderately large, the product of the negative exponential and the term $a^{-3n}$ becomes extremely small. In numerical evaluation procedures the resulting integrals evaluate to a "0/0" situation (to computational accuracy on most computers).

The likelihood function is the distribution of the data sample. At any fixed value of the parameter, the likelihood is a function of the data only, and as such it is a statistic. The question arises as to whether a simpler statistic exists that contains all of the statistical information of the sample. The answer is affirmative if, and only if, a sufficient statistic exists for the data sample.

A statistic that is sufficient summarizes all of the information contained in the sample about the value of the parameters to be estimated. The direct way of determining whether a sufficient statistic exists is via the Factorization Theorem (Reference 8). This theorem says that if the likelihood function can be separated into two factors, one involving only data and constants and the other involving only the parameter and a statistic, then that statistic is sufficient.

$$L(y; a) = f(y \mid a) = \left(\frac{2}{\pi}\right)^{n/2} \left[ \prod_{i=1}^{n} y_i^2 \right] \exp\left( \frac{-1}{2a^2} \sum_{i=1}^{n} y_i^2 \right) \cdot \frac{1}{a^{3n}} = d(y_i) \, t(a; T_s)$$


The function $d(y_i)$ depends only on the data, while the function $t(a; T_s)$ depends only on the parameter and the (sufficient) statistic $T_s$, where:

$$T_s = \sum_{i=1}^{n} y_i^2 = n \cdot M_2'$$

In other words, the second moment of the data about the origin is a sufficient statistic for the

Maxwell distribution. It should be noted that any function of a sufficient statistic is also a

sufficient statistic. The fact that the second raw moment is sufficient should be no surprise

because this statistic is what determines the MLE. Sufficient statistics that arise from applying

the Factorization Theorem often produce estimators on which an MLE is based.

In terms of the sufficient statistic, the posterior density becomes:

$$f(a \mid T_s) = \frac{f_T(T_s \mid a) \cdot h(a)}{\int f_T(T_s \mid a) \cdot h(a) \, da} \qquad \text{Equation (A4)}$$

The distribution function $f_T(T_s \mid a)$ is the sampling distribution of the sufficient statistic $T_s$ for a fixed parameter value. To determine this distribution, recall the expressions for the mean and variance of the sample moments of the Maxwell distribution:

$$E(T_s) = E\left(n M_2'\right) = n \cdot E\left(M_2'\right) = 3 a^2 n$$

$$\text{var}(T_s) = \text{var}\left(n M_2'\right) = n^2 \cdot \text{var}\left(M_2'\right) = 6 a^4 n$$

Since the sample moment is the sum of n independent random variables, we can invoke the Central Limit Theorem, or CLT (Reference 9). This theorem states that the distribution of the sum of a "large" number of random variables approaches a normal distribution. In order to proceed further, we will assume that the second moment is distributed normally, with the preceding mean and variance. If the constants and terms involving only the data values are collected into a proportionality factor "k", the Bayes estimate is the expectation of Equation A4 over the support of "a". The sample data, as contained in $T_s$, are fixed for the Bayes' estimate.

$$\hat{a}_{BAYES} = \frac{\displaystyle k \int a \cdot \frac{1}{a^2} \exp\left( -\frac{1}{2}\, \frac{\left(T_s - 3a^2 n\right)^2}{6 n a^4} \right) h(a) \, da}{\displaystyle k \int \frac{1}{a^2} \exp\left( -\frac{1}{2}\, \frac{\left(T_s - 3a^2 n\right)^2}{6 n a^4} \right) h(a) \, da} \qquad \text{Equation (A5)}$$

Equation A5 is as far as the theoretical development can go, other than canceling out common

terms in the exponent. The next step is to arrive at some prior distribution and then perform

the integration as shown. This will be done via a hypothetical example.


Suppose we have an electronic device that has inputs from three sources. Each signal is first

passed through a squaring circuit, and these are fed into a summing circuit. The circuits are

fast enough that dynamics of the system do not need to be considered. The output signal is the

sum of the three squared signals. The mean value of each input is known. The problem is to

estimate RMS power of input noise signals for a given device.

From previous experience with these devices, the RMS noise is known to vary from 3 V to 5 V, with the data being approximately bell shaped around 4 V. The beta distribution is a quite flexible statistical model when the data are known to have finite support (i.e., the values lie within a finite interval of the real axis). The symmetrical beta distribution, for random variable a, has the following density function over the finite interval [b, c]:

$$h(z; \gamma) = \frac{\Gamma(2\gamma)}{\Gamma(\gamma) \cdot \Gamma(\gamma)}\, z^{\gamma - 1} (1 - z)^{\gamma - 1} \qquad \text{where } z = \frac{a - b}{c - b}$$

If the parameter γ is unity, the beta distribution becomes the uniform distribution. A graph of

the symmetrical beta function for [3, 5] is shown in Figure A1 for various values of γ .

[Figure A1: Symmetrical beta distribution priors on [3, 5] for γ = 2, 3, 4, and 5, plotted against parameter "a"]

For this example, assume the beta parameter value has been established at 3. Once a sample is

taken and the second moment of the sample is calculated, the numerical integration of

Equation A5 can be carried out. Note again that constant factors in the prior distribution

cancel out.
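A sketch of this numerical integration (the grid-based quadrature, sample values, and function names are illustrative; the normal approximation to the sampling density of Ts follows the CLT argument above):

```python
import numpy as np

def bayes_estimate(Ts, n, gamma=3.0, lo=3.0, hi=5.0, m=2001):
    """Evaluate Equation A5 on a grid, with a symmetric beta(gamma) prior
    on [lo, hi]. Constant factors cancel in the ratio."""
    a = np.linspace(lo + 1e-6, hi - 1e-6, m)
    z = (a - lo) / (hi - lo)
    prior = z**(gamma - 1.0) * (1.0 - z)**(gamma - 1.0)
    # Normal (CLT) approximation to the sampling density of Ts given a
    like = (1.0 / a**2) * np.exp(-0.5 * (Ts - 3.0 * a**2 * n)**2 / (6.0 * n * a**4))
    w = like * prior
    return float(np.sum(a * w) / np.sum(w))      # posterior expectation

# Example: sample of size 25 whose MLE is 4.0, so Ts = 3 * n * a_mle^2
n, a_mle = 25, 4.0
print(bayes_estimate(Ts=3 * n * a_mle**2, n=n))  # near 4.0, the prior mean
```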

We might be interested in how the Bayes estimate varies with different values of the sample

second moment, and with different sample sizes n. We can do the integration of Equation A5


for a range of values of M 2′ and n, and display the results graphically. In order to see how

the Bayes’ estimate compares with the MLE estimate, write the sufficient statistic, as:

$$T_s = n \cdot M_2' = 3 n \cdot \hat{a}_{MLE}^2$$

Figure A2 shows the relationship between the MLE estimate and the Bayesian estimate,

assuming a symmetrical beta prior, with parameter equal to three. Note that n occurs

elsewhere in the integrals of Equation A5, independent of â MLE , and therefore is not a simple

scale factor. Several points are noted regarding these results.

[Figure A2: Bayesian parameter estimate $\hat{a}_{BAY}$ versus MLE parameter estimate $\hat{a}_{MLE}$ for n = 10, 25, and 100, with the prior distribution support [3, 5] indicated]


First, if the data indicate the parameter is not included in the support of the assumed prior, the resulting Bayes estimate should not be used. In the case shown, if the data indicate that the parameter is either greater than 5 or less than 3, then either this is a different device than the one the historical data are based on, or the selection of the prior distribution was not adequate.

Second, the MLE and Bayes estimates are equal only at the mean of the prior. If the prior

density were not symmetrical, then in general the two estimates would never be the same.

Also, the variation of the Bayes estimate is approximately linear with the MLE estimate, over

the prior distribution support.

The third observation is that as sample size increases, the slope of the linear relationship

between the MLE and Bayes becomes greater. As mentioned earlier, the Bayes estimate is a

weighted mean of the prior and sample data. The prior tends to “pull” the estimate toward the

mean value. The data tends to “pull” the estimate toward the MLE, which is the best estimate

in the absence of knowledge other than the data itself.

For sample sizes less than about 10, application of the CLT may be questionable. In this case, use of the likelihood function in Equation A3 generally allows for sufficiently accurate numerical computation of the required integrals. Substituting the likelihood function for the sampling distribution of the statistic does not affect the choice of the prior distribution.

Another Bayesian estimate that may be chosen in some cases is the "Maximum a Posteriori" (MAP) estimate, which corresponds to the maximum value (mode) of the posterior density. The MAP has some appeal, since the function g(y) is a fixed constant for the data sample. In this case, all that is required is maximizing the numerator of Equation A1, and no integration is involved. In fact, if the prior density is chosen as uniform over an interval (c, d), the MAP is just the MLE solution. As with the Bayes estimate, the MAP estimate must be within the support of the prior.


Appendix B

The left side of Figure B1 shows the projection of a (hypothetical) sample of random data

values which represent three dimensional miss distances from a target. The third dimension is

perpendicular to the page. The miss distance magnitude is shown on the right hand side.

These graphs indicate a bias in miss distance. Assuming the lower bound as zero would not

provide the best estimation of this density function. The miss distance bias can be accounted

for by considering a two-parameter Maxwell distribution as the applicable statistical model.

[Figure B1: Left, 200 sample miss distances projected onto the Axis 1-Axis 2 plane about the target; right, empirical CDF P(d < D) of the 3-D miss distance d]

The general two parameter density function is now a "location-scale" type distribution:

$$f(y; a, b) = \sqrt{\frac{2}{\pi}}\, \frac{(y - b)^2}{a^3} \exp\left(\frac{-(y - b)^2}{2a^2}\right), \qquad a > 0, \; -\infty < b < \infty \qquad \text{Equation (B1)}$$

The corresponding CDF is again found by direct integration on the interval [b, ∞] :

y −b 2 ( y − b) − ( y − b) 2

F ( y; a, b) = erf − exp Equation (B2)

2a π a 2a

These function forms allow for a negative location parameter; however, based on physical

considerations, it is expected that b ≥ 0 for almost all real world problems.

If parameter b is known, then a simple change of variable shows that all of the previous

results apply for estimators. If b is considered as an unknown parameter, a quite different

situation occurs. In this latter case, the support of the random variable y depends on the


parameter itself. This situation violates one of the “regularity conditions” for determining the

CRLB; and in fact a lower bound for the estimators of “a” and “b” cannot be found.

In Appendix A, the concept of a “sufficient statistic” was introduced. The existence of

sufficient statistics is generally the starting point for determining MLE estimators. In general

estimation of two parameters requires two jointly sufficient statistics. Recall that the

Factorization Theorem provides a convenient way to derive sufficient statistics. The

likelihood function for the two parameter Maxwell distribution is:

$$L(y; a, b) = \left(\frac{2}{\pi}\right)^{n/2} \left[\prod_{i=1}^{n} \frac{(y_i - b)^2}{a^3}\right] \exp\left(\frac{-1}{2a^2} \sum_{i=1}^{n} (y_i - b)^2\right)$$

Because of the product terms, this function cannot be factored into one term that includes only statistics and parameters and a second factor that includes only data values. This implies that the entire data set is required to define the parameters. This agrees with intuition, since we know that $b \le Y_{(1)}$, where $Y_{(1)}$ is the smallest order statistic. This smallest value is not known unless all of the data values are known. Hence, the data cannot be summarized into a function of the data that contains the same information about the parameters.

A straightforward procedure for estimation of the parameters for this distribution is the

method of moments. This allows for relatively simple computations and also allows for

computation of the standard error of estimate for the parameters.

In order to calculate moments about the origin for Equation B1, the simple substitution u = y − b is made. The first and second moments are:

$$m_1' = \int_0^\infty (u + b)\, f(u; a) \, du = \mu_1' + b = \sqrt{\frac{8}{\pi}}\, a + b$$

$$m_2' = \int_0^\infty (u + b)^2 f(u; a) \, du = \mu_2' + 2 b \mu_1' + b^2 = 3 a^2 + 2 b \sqrt{\frac{8}{\pi}}\, a + b^2$$

The following notation will be used in the discussion of MOM estimators for the two parameter Maxwell distribution:

$$m_r' \equiv r\text{-th population moment}$$
$$\mu_r' \equiv r\text{-th moment about } b \text{ (moment of the variable } u)$$
$$M_r' = \frac{1}{n} \sum_{i=1}^{n} y_i^r \equiv r\text{-th sample (data) moment}$$

Note that the moments $\mu_r'$ are the same as the corresponding moments of the one parameter distribution (i.e., when b = 0).


The variance of the random variable y is unaffected by the location change: $\sigma_y^2 = \dfrac{3\pi - 8}{\pi}\, a^2$.

The MOM estimates for “a” and “b” are determined by equating first and second sample

moments, M 1′ and M 2′ to the theoretical expressions and solving the simultaneous

equations.

$$\hat{b} = M_1' - \mu_1'$$

$$M_2' = \mu_2' + 2 \mu_1' \left(M_1' - \mu_1'\right) + \left(M_1' - \mu_1'\right)^2 = \mu_2' - \mu_1'^2 + M_1'^2$$

$$\hat{a} = c_1 \sqrt{M_2' - M_1'^2} \qquad \text{Equation (B3a)}$$

$$\hat{b} = M_1' - c_2 \sqrt{M_2' - M_1'^2} \qquad \text{Equation (B3b)}$$

The constants in these equations are $c_1 = \left(\dfrac{\pi}{3\pi - 8}\right)^{1/2}$ and $c_2 = \left(\dfrac{8}{3\pi - 8}\right)^{1/2}$.
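Equations B3a and B3b can be sketched as follows; the simulated biased miss distances are illustrative:

```python
import math
import numpy as np

def mom_two_param(y):
    """Method-of-moments estimates (a_hat, b_hat) for the two-parameter
    Maxwell distribution, per Equations B3a and B3b."""
    c1 = math.sqrt(math.pi / (3.0 * math.pi - 8.0))
    c2 = math.sqrt(8.0 / (3.0 * math.pi - 8.0))
    M1 = float(np.mean(y))
    M2 = float(np.mean(y**2))
    s = math.sqrt(M2 - M1**2)
    return c1 * s, M1 - c2 * s

# Simulated miss distances: Maxwell(a = 1.5) shifted by a bias b = 2.0
rng = np.random.default_rng(5)
y = 2.0 + np.sqrt((rng.normal(0.0, 1.5, size=(5000, 3))**2).sum(axis=1))

a_hat, b_hat = mom_two_param(y)
print(a_hat, b_hat)     # near 1.5 and 2.0, respectively
```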

Both of the estimated parameters depend on M 1′ and M 2′ . The mean square error for each

of these MOM estimates can be derived using the formulas developed in Appendix C for the

mean square error of a function of moment statistics. The relevant equations for two

parameter Maxwell distributions are:

$$\text{var}(\hat{a}) = \left(\frac{\partial \hat{a}}{\partial M_1'}\right)^2 \text{var}\left(M_1'\right) + \left(\frac{\partial \hat{a}}{\partial M_2'}\right)^2 \text{var}\left(M_2'\right) + 2\, \frac{\partial \hat{a}}{\partial M_1'} \frac{\partial \hat{a}}{\partial M_2'}\, \text{cov}\left(M_1', M_2'\right) \qquad \text{Equation (B4a)}$$

$$\text{var}(\hat{b}) = \left(\frac{\partial \hat{b}}{\partial M_1'}\right)^2 \text{var}\left(M_1'\right) + \left(\frac{\partial \hat{b}}{\partial M_2'}\right)^2 \text{var}\left(M_2'\right) + 2\, \frac{\partial \hat{b}}{\partial M_1'} \frac{\partial \hat{b}}{\partial M_2'}\, \text{cov}\left(M_1', M_2'\right) \qquad \text{Equation (B4b)}$$

The partial derivatives in these equations are evaluated at the mean values of the respective moments. In order to evaluate the variance of the moment estimators, the third and fourth moments of the density function are required:

$$m_3' = \mu_3' + 3 \mu_2' b + 3 \mu_1' b^2 + b^3 \qquad \text{Equation (B5a)}$$

$$m_4' = \mu_4' + 4 \mu_3' b + 6 \mu_2' b^2 + 4 \mu_1' b^3 + b^4 \qquad \text{Equation (B5b)}$$

The variance and covariance of the moment estimators are also derived in Appendix C. These

quantities are required in terms of the population moments.


$$\text{var}\left(M_1'\right) = \frac{1}{n}\left(m_2' - m_1'^2\right)$$

$$\text{var}\left(M_2'\right) = \frac{1}{n}\left(m_4' - m_2'^2\right)$$

$$\text{cov}\left(M_1', M_2'\right) = \frac{1}{n}\left(m_3' - m_1' m_2'\right)$$

Plugging the respective population moments into the above expressions and then substituting

the parameters “a” and “b” into the expressions for the moments µr ′ , provides (after some

algebra) the following:

$$\text{var}\left(M_1'\right) = \frac{1}{n}\, \frac{a^2}{c_1^2} \qquad \text{Equation (B6a)}$$

$$\text{var}\left(M_2'\right) = \frac{a^2}{n} \left( 6 a^2 + \frac{4 b^2}{c_1^2} + 4 a b\, \frac{c_2}{c_1} \right) \qquad \text{Equation (B6b)}$$

$$\text{cov}\left(M_1', M_2'\right) = \frac{a^2}{n} \left( \frac{c_2}{c_1}\, a + \frac{2 b}{c_1^2} \right) \qquad \text{Equation (B6c)}$$

The partial derivative terms are evaluated by differentiation of Equations B3a and B3b. In each case, the final expression results from evaluating the derivatives at the moment mean values.

$$\frac{\partial \hat{a}}{\partial M_1'} = \frac{-c_1 M_1'}{\sqrt{M_2' - M_1'^2}} = \frac{-c_1^2 \left( (c_2/c_1)\, a + b \right)}{a} \qquad \text{Equation (B7a)}$$

$$\frac{\partial \hat{a}}{\partial M_2'} = \frac{c_1}{2\sqrt{M_2' - M_1'^2}} = \frac{c_1^2}{2a} \qquad \text{Equation (B7b)}$$

$$\frac{\partial \hat{b}}{\partial M_1'} = 1 + \frac{c_2 M_1'}{\sqrt{M_2' - M_1'^2}} = 1 + \frac{c_1 c_2 \left( (c_2/c_1)\, a + b \right)}{a} \qquad \text{Equation (B7c)}$$

$$\frac{\partial \hat{b}}{\partial M_2'} = \frac{-c_2}{2\sqrt{M_2' - M_1'^2}} = \frac{-c_1 c_2}{2a} \qquad \text{Equation (B7d)}$$

The remaining two terms required for evaluation of Equations B4a and B4b are the products

of the respective partial derivatives:


$$\frac{\partial \hat{a}}{\partial M_1'} \frac{\partial \hat{a}}{\partial M_2'} = \frac{-c_1^2 M_1'}{2\left(M_2' - M_1'^2\right)} = \frac{-c_1^4 \left( (c_2/c_1)\, a + b \right)}{2 a^2} \qquad \text{Equation (B7e)}$$

$$\frac{\partial \hat{b}}{\partial M_1'} \frac{\partial \hat{b}}{\partial M_2'} = \frac{-c_2^2 M_1'}{2\left(M_2' - M_1'^2\right)} - \frac{c_2}{2\sqrt{M_2' - M_1'^2}} = \frac{-c_1 c_2}{2a} - \frac{c_1^2 c_2^2 \left( (c_2/c_1)\, a + b \right)}{2 a^2} \qquad \text{Equation (B7f)}$$

Substituting all of these terms into Equations B4a and B4b provides a remarkably simple result for each of the two standard errors.

\mathrm{var}(\hat{a}) = \frac{a^2}{n} \left[ 1.5\,c_1^4 + c_2^2 - (c_1 c_2)^2 \right]

\mathrm{var}(\hat{b}) = \frac{a^2}{n} \left[ 1.5\,(c_1 c_2)^2 + \left( 1 + c_2^2 \right) \left\{ \left( 1 + c_2^2 \right)/c_1^2 - c_2^2 \right\} \right]

Inserting the constant values results in:

\mathrm{var}(\hat{a}) = 0.5270\,\frac{a^2}{n}    Equation (B8a)

\mathrm{var}(\hat{b}) = 1.2737\,\frac{a^2}{n}    Equation (B8b)

Neither of the standard errors depends on the parameter "b". Figure B2 shows plots for each of the mean square errors, where it is noted that sample size is just a scale factor (assumed 1.0 in the figure). The result agrees with our intuition that the "location" of the distribution should not affect the standard errors, but the scale parameter does have a significant effect.
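The bracketed constants can be checked numerically. A minimal sketch, assuming c1 = 1/√(3 − 8/π) and c2 = √(8/π)·c1 (values inferred here from the moment relations above; the exact definitions accompany Equations B3):

```python
import math

# Assumed reconstruction of the constants c1 and c2 used in Appendix B:
# c1 scales the sample standard deviation into "a", and c2 = sqrt(8/pi)*c1
# scales it into the mean offset, consistent with the one-parameter Maxwell
# moments E(Y) = a*sqrt(8/pi) and var(Y) = a^2*(3 - 8/pi).
c1 = 1.0 / math.sqrt(3.0 - 8.0 / math.pi)
c2 = math.sqrt(8.0 / math.pi) * c1

# Bracketed factors in the var(a-hat) and var(b-hat) expressions above
var_a_coeff = 1.5 * c1**4 + c2**2 - (c1 * c2)**2
var_b_coeff = 1.5 * (c1 * c2)**2 + (1 + c2**2) * ((1 + c2**2) / c1**2 - c2**2)

print(var_a_coeff, var_b_coeff)
```

The two printed factors reproduce the 0.5270 and 1.2737 of Equations B8a and B8b.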

[Figure B2. Variance of â (left) and variance of b̂ (right) as functions of the parameter a, for n = 1.]

In practical inference applications, we do not know the actual values of the parameters as

these are what we are trying to estimate. Usual statistical practice is to first compute the


estimates â and b̂ , and use these estimates in the standard error formulas. One should note

that this introduces some additional errors into these error estimates.

A second issue that can arise in applying Equations B3a and B3b to actual data is that, in

some cases, the estimate bˆ >Y(1) , where Y(1) is the smallest sample value. If this occurs, the

following procedure is recommended:

1. Calculate the standard error of b̂.

2. Determine if the inequality | Y(1) − b̂ | < 2·√(var(b̂)) is true.

3. If this inequality is true, take the minimum value to be Y(1), as the "two-sigma" error of estimate includes this value.

4. If the inequality is false, a likely reason is that the sample is not from a Maxwell distribution.

5. Check the condition: | Y(2) − b̂ | < 2·√(var(b̂))

6. If the inequality is false for the second smallest order statistic, one can assume the sample is not from a Maxwell distribution.

7. If only the smallest order statistic violates the "two-sigma" error, but does so by a large amount, the analyst may choose to treat this value as an outlier.

8. In any case, it is suggested that the minimum of b̂ and Y(1) be used in any calculations involving the accepted distribution.
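The steps above can be sketched in code. This is a minimal illustration, assuming the MoM estimators take the forms â = c1·s and b̂ = M1′ − c2·s implied by the derivatives in Equations B7a-B7d (s being the standard deviation computed from the first two sample moments); the function name and data are hypothetical:

```python
import math

C1 = 1.0 / math.sqrt(3.0 - 8.0 / math.pi)        # assumed definition of c1
C2 = math.sqrt(8.0 / math.pi) * C1               # assumed definition of c2

def maxwell_mom_fit(y):
    """Method-of-moments fit of the two-parameter Maxwell, with the
    two-sigma check of b-hat against the smallest sample value."""
    n = len(y)
    m1 = sum(y) / n                              # first sample moment
    m2 = sum(v * v for v in y) / n               # second sample moment
    s = math.sqrt(m2 - m1 * m1)                  # sample standard deviation
    a_hat = C1 * s
    b_hat = m1 - C2 * s
    y1 = min(y)
    if b_hat > y1:                               # constraint violated
        sd_b = math.sqrt(1.2737 * a_hat**2 / n)  # Equation B8b
        if abs(y1 - b_hat) < 2.0 * sd_b:         # within two-sigma: accept Y(1)
            b_hat = y1
        # otherwise the sample may not be from a Maxwell distribution
    return a_hat, min(b_hat, y1)                 # step 8: enforce the minimum

a_hat, b_hat = maxwell_mom_fit([2.1, 2.8, 3.0, 3.4, 3.9, 4.4, 5.2])
print(a_hat, b_hat)
```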


As noted previously, the two parameter Maxwell distribution does not have a sufficient

statistic and the CRLB cannot be calculated. The ideas behind MLE estimation are still

applicable. Recall that the likelihood is a function of the parameters, with data values fixed.

MLE estimates are the values of the parameters that maximize this function. The two

parameter Maxwell log likelihood is:

\ln L(y; a, b) = \frac{n}{2} \ln\!\left( \frac{2}{\pi} \right) + 2 \sum_{i=1}^{n} \ln (y_i - b) - 3n \ln a - \sum_{i=1}^{n} \frac{(y_i - b)^2}{2a^2}

\frac{\partial \ln L}{\partial a} = -\frac{3n}{a} + \frac{1}{a^3} \sum_{i=1}^{n} (y_i - b)^2 = 0    Equation (B9a)

\frac{\partial \ln L}{\partial b} = -2 \sum_{i=1}^{n} \frac{1}{y_i - b} + \frac{1}{a^2} \sum_{i=1}^{n} (y_i - b) = 0    Equation (B9b)

\frac{2}{3n} \left[ \sum_{i=1}^{n} (y_i - b)^2 \right] \cdot \sum_{i=1}^{n} \frac{1}{y_i - b} = \sum_{i=1}^{n} (y_i - b)    Equation (B9c)

Equation (B9c) provides the estimate b̂MLE and equation (B9a) then provides âMLE . Note

that the value of b̂MLE is again constrained such that bˆMLE ≤ y(1) where y(1) is the smallest

sample value. The same rules suggested for MOM estimators when bˆMLE >Y(1) should

generally be applicable.
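Equation B9c has no closed-form solution for b, but its left-minus-right residual changes sign below y(1), so it can be solved by bisection; Equation B9a then gives â. A sketch under these assumptions (illustrative, not the author's program):

```python
import math

def maxwell_mle_fit(y, iters=200):
    """Solve Equation B9c for b-hat by bisection, then a-hat from B9a."""
    n, y1 = len(y), min(y)

    def residual(b):
        # Equation B9c rearranged to left-minus-right form
        s2 = sum((v - b) ** 2 for v in y)
        sinv = sum(1.0 / (v - b) for v in y)
        s1 = sum(v - b for v in y)
        return (2.0 / (3.0 * n)) * s2 * sinv - s1

    # residual -> +inf as b -> y(1) from below, and -> -inf as b -> -inf
    lo = y1 - 10.0 * (max(y) - y1)
    hi = y1 - 1e-9 * (max(y) - y1)
    for _ in range(iters):                      # bisection on b
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    b_hat = min(0.5 * (lo + hi), y1)            # enforce b-hat <= Y(1)
    a_hat = math.sqrt(sum((v - b_hat) ** 2 for v in y) / (3.0 * n))  # B9a
    return a_hat, b_hat

a_hat, b_hat = maxwell_mle_fit([1.2, 1.5, 1.9, 2.3, 2.8, 3.1])
print(a_hat, b_hat)
```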

Monte Carlo generated experiments have been used to test both the MOM and the MLE

estimators. The basic test set consisted of 10,000 replications of samples of various sizes.

Some conclusions drawn from these experiments are:

1. Both the MOM and MLE estimators can produce estimates of b̂ that are greater than the minimum sample value YMIN.

2. The MOM estimate of b̂ is virtually always within twice the standard deviation implied by the variance of Equation B8b (1 exception in 10,000).


3. The MOM estimate of â shows very little sensitivity to not replacing b̂MOM with YMIN. It is nevertheless recommended that the constraint be enforced so that no negative values occur for the random variable (Y − b̂). The MSE of these estimates is essentially equal to the values given by Equations B8.

4. MLE estimates show large sensitivity to not replacing b̂MLE with YMIN . If this

replacement is not made, both a larger variance and a bias is introduced into the

estimates. For MLE estimates it is imperative that the constraint be enforced.

5. Variance of the MLE parameter estimates is less than that of the corresponding MOM estimates, if bias is not considered. The MLE estimates do appear to be biased, however, even when the minimum constraint is enforced. The experiments show b̂MLE is positively biased (i.e., larger than the population value) while âMLE is negatively biased.

6. The reason for the large sensitivity of the MLE estimates, when bˆMLE > YMIN , is that

the assumption of positive values in the derivation of Equation B9c is violated. The

resulting solution does not maximize the likelihood in this case. Enforcing the

constraint implies that the log-likelihood of Equation B9a is valid, but the function is

not necessarily maximized.


Appendix C

Calculation of standard error formulae for the MLE and MOM estimators requires knowing the variance (and, for the two-parameter distribution, the covariance) of the sample moments. The r-th sample moment is calculated as:

M_r' = \frac{1}{n} \sum_{i=1}^{n} x_i^r

These moments are unbiased, as seen by taking the expectation of this statistic:

E(M_r') = E\!\left[ \frac{1}{n} \sum_{i=1}^{n} x_i^r \right] = \frac{1}{n} \sum_{i=1}^{n} E\!\left( x_i^r \right) = \frac{1}{n}\, n\, \mu_r' = \mu_r'

Before calculating the variance and covariance, two mathematical facts are reviewed. The

first is that the expectation of the product of functions of two independent variables is the

product of the expectations. Assume x and y are independent variables, and g(x) and h(y) are

functions of these variables, respectively:

E[g(x) \cdot h(y)] = \iint g(x)\, h(y)\, f(x, y)\, dx\, dy = \int g(x)\, f_X(x)\, dx \cdot \int h(y)\, f_Y(y)\, dy = E[g(x)] \cdot E[h(y)]

The second useful fact is algebraic. Consider the product of two expressions, each composed of a sum of n values:

( a_1 + a_2 + \cdots + a_n ) \cdot ( b_1 + b_2 + \cdots + b_n )

Upon expansion, this product consists of n^2 terms: n terms of the form a_i b_i, and n(n-1) cross terms of the form a_i b_j with i \neq j.

\mathrm{var}(M_r') = E\!\left[ \left( M_r' - \mu_r' \right)^2 \right] = E\!\left[ M_r'^2 \right] - 2 E\!\left[ \mu_r' M_r' \right] + \mu_r'^2 = E\!\left[ M_r'^2 \right] - \mu_r'^2


E\!\left[ M_r'^2 \right] = \frac{1}{n^2} E\!\left[ \left( x_1^r + x_2^r + \cdots + x_n^r \right) \left( x_1^r + x_2^r + \cdots + x_n^r \right) \right] = \frac{1}{n^2} E\!\left[ \sum_{i=1}^{n} x_i^{2r} + \mathop{\sum\sum}_{i \neq j} x_i^r x_j^r \right] = \frac{1}{n^2} \left[ n \cdot E\!\left( x_i^{2r} \right) + \mathop{\sum\sum}_{i \neq j} E\!\left( x_i^r x_j^r \right) \right]

Because each sample value is independent, the following holds:

E\!\left( x_i^r x_j^r \right) = E\!\left( x_i^r \right) E\!\left( x_j^r \right) = \left( \mu_r' \right)^2, \quad i \neq j

\mathrm{var}(M_r') = \frac{1}{n^2} \left[ n \mu_{2r}' + n(n-1) \left( \mu_r' \right)^2 - n^2 \left( \mu_r' \right)^2 \right] = \frac{1}{n} \left[ \mu_{2r}' - \left( \mu_r' \right)^2 \right]    Equation (C1)

Derivation of the expression for the covariance is similar. The covariance for any two sample

moments is:

\mathrm{cov}(M_r', M_s') = E\!\left[ \left( M_r' - \mu_r' \right) \left( M_s' - \mu_s' \right) \right] = E\!\left[ M_r' M_s' \right] - \mu_r' \mu_s'

Expanding the expectation, and again using independence of the sample values:

E\!\left[ M_r' M_s' \right] = \frac{1}{n^2} E\!\left[ \left( x_1^r + \cdots + x_n^r \right) \left( x_1^s + \cdots + x_n^s \right) \right] = \frac{1}{n^2} E\!\left[ \sum_{i=1}^{n} x_i^{r+s} + \mathop{\sum\sum}_{i \neq j} x_i^r x_j^s \right] = \frac{1}{n^2} \left[ n \mu_{r+s}' + n(n-1) \mu_r' \mu_s' \right]

Combining the above expressions results in the formula for the covariance:

\mathrm{cov}(M_r', M_s') = \frac{1}{n} \left[ \mu_{r+s}' - \mu_r' \mu_s' \right]    Equation (C2)

In actual calculation, the population parameters (and thus moments) are generally not known

as these are the quantities being estimated. The sample moments themselves are substituted

into Equation C1 and Equation C2 in these cases. As such, the variance and covariance are

only approximations.
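Equations C1 and C2 are easy to sanity-check by simulation. A sketch using Exp(1) samples, chosen only because their population moments μr′ = r! are known exactly:

```python
import random, statistics

random.seed(1)
n, reps = 50, 4000
m1_vals, m2_vals = [], []
for _ in range(reps):
    x = [random.expovariate(1.0) for _ in range(n)]  # Exp(1): mu_r' = r!
    m1_vals.append(sum(x) / n)                       # M_1'
    m2_vals.append(sum(v * v for v in x) / n)        # M_2'

# Equation C1 with r = 1 and r = 2, Equation C2 with r = 1, s = 2,
# using the exact Exp(1) moments mu_1' = 1, mu_2' = 2, mu_3' = 6, mu_4' = 24
var_m1_theory = (2 - 1 ** 2) / n     # 1/n
var_m2_theory = (24 - 2 ** 2) / n    # 20/n
cov_theory = (6 - 1 * 2) / n         # 4/n

mean1, mean2 = statistics.mean(m1_vals), statistics.mean(m2_vals)
var_m1, var_m2 = statistics.variance(m1_vals), statistics.variance(m2_vals)
cov_12 = sum((u - mean1) * (v - mean2)
             for u, v in zip(m1_vals, m2_vals)) / (reps - 1)
print(var_m1, var_m1_theory, var_m2, var_m2_theory, cov_12, cov_theory)
```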


The delta method is a general procedure to find estimates of the mean and variance of a

function of random variables, when the mean and variance of the random variables is known.

The method rests on Taylor’s Theorem for representing functions that are sufficiently

differentiable. We will consider a function of two variables only, but the method can be

extended to a function of any number of random variables.

Taylor’s Theorem (Reference 2) states that a sufficiently differentiable function can be expanded around a fixed value. For a function of two variables, z = f(x, y), we can expand around the point (x_0, y_0), retaining only linear terms:

f(x, y) = f(x_0, y_0) + \left. \frac{\partial f}{\partial x} \right|_{x_0} (x - x_0) + \left. \frac{\partial f}{\partial y} \right|_{y_0} (y - y_0) + R_n

The term Rn is the remainder and the key point of Taylor’s theorem is that it approaches zero

as n approaches infinity. In many applications only the linear terms are retained, and this will

be the procedure here.

In our statistical applications, we will choose the means of the random variables and expand around that point:

x_0 = E(x) = \mu_X \qquad y_0 = E(y) = \mu_Y

So if we take the expectation of this expansion, noting the derivative values are constants:

E(z) = E[f(\mu_X, \mu_Y)] + \left. \frac{\partial f}{\partial x} \right|_{\mu_X} E(x - \mu_X) + \left. \frac{\partial f}{\partial y} \right|_{\mu_Y} E(y - \mu_Y)

E(z) = f(\mu_X, \mu_Y)    Equation (C3)

Hence the expected value of a function of random variables is just the function evaluated at the expected values of the variables. Figure C1 provides an illustration of the assumption involved, for a function of one variable.

[Figure C1. Linear approximation of f(y) near y_0 = \mu_Y, with the probability density of y concentrated around the mean.]


The approximation will be very good if the probability density of the underlying variable is

concentrated enough such that most of the “probability” lies within an interval where the

linear approximation does not deviate significantly from the actual function.

E\!\left[ \left( z - E(z) \right)^2 \right] = E\!\left[ \left( \left. \frac{\partial f}{\partial x} \right|_{\mu_X} (x - \mu_X) + \left. \frac{\partial f}{\partial y} \right|_{\mu_Y} (y - \mu_Y) \right)^2 \right]

= \left( \left. \frac{\partial f}{\partial x} \right|_{\mu_X} \right)^2 E\!\left[ (x - \mu_X)^2 \right] + \left( \left. \frac{\partial f}{\partial y} \right|_{\mu_Y} \right)^2 E\!\left[ (y - \mu_Y)^2 \right] + 2\, \left. \frac{\partial f}{\partial x} \right|_{\mu_X} \left. \frac{\partial f}{\partial y} \right|_{\mu_Y} E\!\left[ (x - \mu_X)(y - \mu_Y) \right]

Note that x and y are random variables and in general are correlated. This last expression becomes:

E\!\left[ \left( z - E(z) \right)^2 \right] = \left( \left. \frac{\partial f}{\partial x} \right|_{\mu_X} \right)^2 \mathrm{var}(X) + \left( \left. \frac{\partial f}{\partial y} \right|_{\mu_Y} \right)^2 \mathrm{var}(Y) + 2\, \left. \frac{\partial f}{\partial x} \right|_{\mu_X} \left. \frac{\partial f}{\partial y} \right|_{\mu_Y} \mathrm{cov}(X, Y)

This basic formula is used often in deriving the MSE of various types of statistics (estimators)

that are functions of other statistics with known variances. Examples are estimators that are

functions of the sample moments. The variance and covariance of sample moments is

calculated, or approximated, as discussed above.
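As a concrete illustration (not an example from the paper), take z = x/y with correlated x and y and compare the delta-method variance against simulation:

```python
import random, statistics

random.seed(2)
mu_x, mu_y, sd = 10.0, 5.0, 0.3
xs, ys = [], []
for _ in range(100000):
    e = random.gauss(0.0, sd)
    xs.append(mu_x + e + random.gauss(0.0, sd))  # x and y share the component e,
    ys.append(mu_y + e)                          # so cov(x, y) = sd**2

var_x, var_y, cov_xy = 2 * sd**2, sd**2, sd**2   # population values by construction
fx = 1.0 / mu_y                                  # d(x/y)/dx at the means
fy = -mu_x / mu_y**2                             # d(x/y)/dy at the means
delta_var = fx**2 * var_x + fy**2 * var_y + 2 * fx * fy * cov_xy

sim_var = statistics.variance(x / y for x, y in zip(xs, ys))
print(delta_var, sim_var)
```

The two values agree to within a few percent, since the densities are well concentrated relative to the curvature of x/y.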


Appendix D

Start by assuming a random sample of size n, from some distribution specified by f (•) . The

sample values are arranged in ascending order:

u_1 \leq u_2 \leq u_3 \leq \cdots \leq u_n

Each observation is now the r-th order statistic, U ( r ) where r refers to the index of the

observed value. U (1) is the smallest, and U ( n ) is the largest value. The process of ordering

the observations has introduced dependence between the order statistics.

Reference [5] provides a discussion of order statistics which includes a derivation of their

distribution and other properties of these statistics.

If the underlying density function is f (u;θ) and the corresponding CDF is F (u;θ) , the

density function, g (u( r ) ;θ) , for the r-th order statistic is:

g(u_{(r)}; \theta) = \frac{n!}{(r-1)!\,(n-r)!} \left[ F(u; \theta) \right]^{r-1} f(u; \theta) \left[ 1 - F(u; \theta) \right]^{n-r}    Equation (D1)

The parameter θ can be either a scalar or vector quantity. Subsequent discussion assumes the

support of the underlying distribution is ( 0 ≤ u ≤ ∞) . The area to the left of any order statistic

U r is G (U r ) . The expected value of this area (or, equivalently, the probability) is:

E\!\left[ G(U_{(r)}) \right] = \int_0^{\infty} G(u_r)\, g(u_r)\, du_r

Substituting z = G(u), the CDF of the underlying distribution, transforms the integral expression:

E\!\left[ G(U_{(r)}) \right] = \int_0^1 \frac{n!}{(r-1)!\,(n-r)!}\, z^r (1 - z)^{n-r}\, dz

This is a "beta-like" integral, and carrying out the integration results in:

E\!\left[ G(U_{(r)}) \right] = \frac{n!}{(r-1)!\,(n-r)!} \cdot \frac{\Gamma(r+1)\,\Gamma(n-r+1)}{\Gamma(n+2)}

Invoking the gamma function property for integer arguments provides a simple expression for

the expected value of the probability of any order statistic (with a continuous underlying

distribution):

E\!\left[ G(U_{(r)}) \right] = \frac{r}{n+1} = q_r    Equation (D2)


We now seek an expression for the variance of the r-th order statistic. The variance of an order statistic will only be approximate; the derivation utilizes the "Delta Method" discussed in Appendix C.

Begin by considering a uniform distribution on the unit interval [0,1]. Equation D1 becomes:

g(u_{(r)}) = \frac{n!}{(r-1)!\,(n-r)!}\, u^{r-1} (1 - u)^{n-r}

The expectation of this function is seen again to involve “beta like” integrals, and as expected,

direct integration provides the same answer as Equation D2 (which applies for any

distribution).

E(U_{(r)}) = \frac{r}{n+1}

The variance of the uniformly distributed variable can also be found by direct integration. We

obtain:

\mathrm{var}(U_{(r)}) = \frac{1}{n+2} \cdot \frac{r}{n+1} \cdot \left( 1 - \frac{r}{n+1} \right)    Equation (D3)
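Equations D2 and D3 can be verified by simulation for the uniform case:

```python
import random, statistics

random.seed(3)
n, r, reps = 9, 3, 20000
# r-th order statistic of a uniform(0,1) sample of size n, replicated
samples = [sorted(random.random() for _ in range(n))[r - 1] for _ in range(reps)]

q = r / (n + 1)                      # Equation D2
var_theory = q * (1 - q) / (n + 2)   # Equation D3
mean_hat = statistics.mean(samples)
var_hat = statistics.variance(samples)
print(mean_hat, q, var_hat, var_theory)
```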

For any continuous density function, f ( y; a ) , and CDF F ( y; a ) , define the random variable

u:

u = F(y; a)

This random variable is uniform on the interval [0,1]. The order statistic Y( r ) therefore has

the distribution:

Y( r ) = F −1 (u( r ) ; a )

Recall that if variable y is a function of variable x, i.e., y = g(x), and the mean and variance of variable x are known, the delta method provides a way to approximate the mean and variance of variable y.

E(Y) \approx g(\mu_X) \qquad \mathrm{var}(Y) \approx \left( \left. \frac{dg}{dx} \right|_{x = \mu_X} \right)^2 \mathrm{var}(X)

In the current case, x = u and g = F^{-1}. The required derivative is:

\frac{dF^{-1}}{du} = \frac{dy}{du}

It is useful to note that, because y is a monotonic function of the variable u:

\frac{du}{dy} = f(y; a) \quad \text{so that} \quad \frac{dy}{du} = \frac{1}{f(y; a)}

Since we know both the mean and variance of U_{(r)}, the variance of the order statistic Y_{(r)} follows from the above relations:

E(Y_{(r)}) = F^{-1}\!\left( \frac{r}{n+1} \right) = y_{(r)}


\mathrm{var}(Y_{(r)}) = \frac{1}{n+2} \cdot \frac{r}{n+1} \left( 1 - \frac{r}{n+1} \right) \cdot \frac{1}{f(y_{(r)}; a)^2}

The last equation for the variance of the order statistic implies that for moderately large n,

\mathrm{var}(Y_{(r)}) = \frac{1}{n} \cdot \frac{q_r (1 - q_r)}{f(y_{(r)}; a)^2}    Equation (D4)

As n becomes large, the order statistic Y( r ) , has an asymptotic normal distribution with

variance and mean given by the preceding equations. Reference [5] provides a detailed

derivation of this Theorem.

An application of Equation D4 is to find the relative efficiency of the sample mean and the

sample median for the one parameter Maxwell distribution.

E(\bar{y}) = \sqrt{\frac{8}{\pi}}\, a \qquad \mathrm{var}(\bar{y}) = \frac{\mathrm{var}(y)}{n} = \frac{1}{n} \frac{3\pi - 8}{\pi} a^2 = 0.4535\, \frac{a^2}{n}

The median value of the Maxwell distribution can be evaluated from Figure 2 in the main report, where q_r = 0.5. At this point y_{med} = 1.5380\, a.

f(y_{med}; a) = \sqrt{\frac{2}{\pi}}\, \frac{y_{med}^2}{a^3} \exp\!\left( \frac{-y_{med}^2}{2a^2} \right) = \frac{0.5784}{a}

\mathrm{var}(y_{med}) = \frac{0.5^2}{(0.5784)^2}\, \frac{a^2}{n} = 0.7473\, \frac{a^2}{n}

\mathrm{eff}_{REL} = \frac{\mathrm{var}(\bar{y})}{\mathrm{var}(y_{med})} = 0.6069

This shows that the sample mean has about 61% of the variability of the sample median. (For comparison, the relative efficiency of sample mean to sample median for the normal distribution is about 0.64.) The sample mean is preferable for estimating the parameter of the Maxwell distribution if one of these simple statistics is to be used rather than the MLE, MOM, or quantile estimators.
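The relative-efficiency figure can be reproduced by simulation. A Maxwell variate with parameter a is the norm of a 3-D normal vector with per-axis standard deviation a (a standard fact, not derived in this appendix), which makes sampling straightforward:

```python
import random, statistics, math

def maxwell_sample(a, n):
    """One-parameter Maxwell sample: norm of a 3-D N(0, a^2 I) vector."""
    return [math.sqrt(sum(random.gauss(0.0, a) ** 2 for _ in range(3)))
            for _ in range(n)]

random.seed(4)
a, n, reps = 1.0, 25, 4000
means = [statistics.mean(maxwell_sample(a, n)) for _ in range(reps)]
medians = [statistics.median(maxwell_sample(a, n)) for _ in range(reps)]

eff_rel = statistics.variance(means) / statistics.variance(medians)
print(eff_rel)  # close to the asymptotic value 0.6069
```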


Appendix E

It may be of interest in some applications to determine the distribution of the sum of two or

more independent identically distributed (IID) random variables drawn from a population that

follows the Maxwell distribution. Moment generating functions or characteristic functions are often employed when closed-form expressions for these functions exist and when their products can be inverted. This approach does not provide expressions that can be inverted for the Maxwell distribution, so a direct integration approach will be used.

The joint distribution of two IID random variables, X and Y, is provided by the product of the

density functions. Define the sum of these as random variable Z. Then:

Z = X +Y or X = Z −Y

The distribution of Z can be found by substituting into the joint density, and integrating out

variable Y. Because both X and Y are confined to non-negative values, integration is over the

interval [0, z], which limits X to non-negative values.

f(z) = \int_0^z f(y)\, f(z - y)\, dy

f(z) = \frac{2}{\pi a^6} \int_0^z y^2 (z - y)^2 \exp\!\left( \frac{-y^2 - (z - y)^2}{2a^2} \right) dy

Evaluation is carried out by expanding the terms in the exponent and the polynomial, integrating term by term, and finally collecting terms. The result is:

f(z) = \frac{1}{8 \pi a^5} \left[ \sqrt{\pi}\, \exp\!\left( \frac{-z^2}{4a^2} \right) \mathrm{erf}\!\left( \frac{z}{2a} \right) \left( 12a^4 - 4a^2 z^2 + z^4 \right) + 2za \left( z^2 - 6a^2 \right) \exp\!\left( \frac{-z^2}{2a^2} \right) \right]    Equation (E1)

As before, erf (•) denotes the error function. The density function of equation E1 cannot be

integrated in closed form to obtain the CDF, or moments of this distribution. This density can

be integrated using numerical techniques to find the desired quantities, for a given value of

the parameter “a”.
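A sketch of such a numerical evaluation (plain trapezoidal integration with a = 1), checking that Equation E1 integrates to one and that its mean matches Equation E2 with N = 2:

```python
import math

def maxwell_sum_pdf(z, a=1.0):
    """Density of the sum of two IID Maxwell variables (Equation E1)."""
    term1 = (math.sqrt(math.pi) * math.exp(-z**2 / (4 * a**2))
             * math.erf(z / (2 * a)) * (12 * a**4 - 4 * a**2 * z**2 + z**4))
    term2 = 2 * z * a * (z**2 - 6 * a**2) * math.exp(-z**2 / (2 * a**2))
    return (term1 + term2) / (8 * math.pi * a**5)

# Trapezoidal integration over [0, 15]; the tail beyond 15 is negligible
h = 0.001
zs = [i * h for i in range(15001)]
vals = [maxwell_sum_pdf(z) for z in zs]
total = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))      # should be ~1
mean_z = h * sum(z * v for z, v in zip(zs, vals))         # should be ~2*sqrt(8/pi)
print(total, mean_z)
```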

The mean and variance for a sum of Maxwell variables follows from the basic theorem for the

addition of independent variables. This theorem can be stated mathematically for the sum of

N variables, Yi :


E(Z_N) = \sum_{i=1}^{N} E(Y_i) = N \cdot E(Y_i) = N \sqrt{\frac{8}{\pi}}\, a    Equation (E2)

\mathrm{var}(Z_N) = \sum_{i=1}^{N} \mathrm{var}(Y_i) = N \cdot \frac{3\pi - 8}{\pi}\, a^2    Equation (E3)

Performing numerical integration for various values of the parameter confirms this result for

N = 2. Also numerical integration shows that this distribution has constant skewness:

\mathrm{Skew}(z) = \frac{E\!\left[ (z - \mu_Z)^3 \right]}{\left( \sigma_Z^2 \right)^{3/2}} = 0.3434

Recall that skewness of the Maxwell distribution is 0.4857, so the sum of Maxwell variables

produces a more symmetrical distribution. The Central Limit Theorem implies that as N

becomes “large,” the distribution of the sum should approach the normal distribution with

mean and variance as indicated in equations E2 and E3, respectively. The skewness

calculation confirms this for the sum of two variables.

The size required by the CLT for a “large number” of terms will depend on the underlying

distributions involved. This requirement can be investigated for the Maxwell distribution

using the sum of only two variables.

Figure E1 is a plot of equation E1, for the exact distribution of the sum of two Maxwell

variables. Also shown is the normal approximation for this distribution, using Equations E2

and E3, where N = 2. The parameter “a” is 1.0 in both cases. The underlying Maxwell

distribution for each variable is shown for comparison. The figure implies that the normal

distribution is a reasonable approximation of the sum, except for the “tails” of the distribution,

even for two variables.


[Figure E1. Probability density versus the random variables Z and Y: the Maxwell distribution (a = 1), the exact distribution of the sum for N = 2 (Equation E1), and the normal approximation for N = 2.]

If sums of Maxwell random variables are required, it appears that the normal distribution is

adequate for N ≥ 3 in most cases, with mean and variance given in equations E2 and E3.

Evaluation of the exact distributions is possible only through numerical procedures. If N = 2 ,

equation E1 is recommended along with numerical evaluation of the CDF or other quantities

of interest.


Appendix F

Hypothesis Tests

Discussion of point estimation methods has been based on the assumption that a data sample

is in fact from a Maxwell distribution. Given a set of data, an analyst would probably want to

get some indication of the veracity of this assumption prior to drawing conclusions about the

statistical model. The procedures used to determine if a set of data is from a particular

distribution, or family of distributions, fall under the category of statistical hypothesis tests.

Hypothesis tests are essentially a tool for making decisions under uncertainty. These tests consist of a dichotomous decision specified by a null hypothesis, H0, and an alternative hypothesis, H1. Often, the null hypothesis is specified with the intent to reject it. If the null hypothesis is rejected, this implies the alternative hypothesis is accepted. The converse is not true: failure to reject the null hypothesis does not mean that it is true. Rather, failure to reject the null hypothesis means only that the data are insufficient to reject it. Reference [11] covers the theory of hypothesis tests in more detail.

In the decision framework, four possible outcomes can occur, depending on which decision is made and what the true "state of nature" is. If the null hypothesis is true (i.e., the actual state of nature) and the analyst does not reject it, or if the alternative hypothesis is true and the analyst rejects the null hypothesis, then the correct decision is made. However, if the state of nature is such that the null hypothesis is true, but the analyst rejects it, a Type I error has occurred. If the alternative hypothesis is true, but the analyst fails to reject the null hypothesis, then a Type II error has occurred. Useful statistical tests generally provide the analyst with a quantitative way to assess the Type I error probability, which is referred to as α. The Type II error probability, referred to as β, is generally much harder to assess for a given statistical test. In general, if the size of a sample is fixed, the Type I and Type II errors have an inverse relationship. Decreasing both α and β requires increasing sample size. Most hypothesis tests specify the required probability α and do not explicitly specify β. This approach assumes that making a Type I error is more serious than making a Type II error. The Type I error probability is called the "significance level" of the test.

Obviously, hypothesis tests require that a relevant statistic be available. Since a statistic

depends on the sample, it is a random variable, and has an associated distribution. Different

hypothesis tests use different statistics, each with a distribution that can be derived exactly or

asymptotically (i.e., by large-sample approximations). The percentage points for a significance test are the values of the statistic that are exceeded with a specified probability when the null hypothesis is in fact true.

A simple hypothesis is one where the distribution is completely specified, which includes not

only the type of distribution, but also any parameters of that distribution. An example of a

simple hypothesis is for a Maxwell type distribution with parameter a = 1 . A composite

hypothesis specifies the type of distribution (e.g., Maxwell) but allows for one of more free

parameters that are not specified. Most goodness-of-fit tests require a simple null hypothesis


The analyst should understand the assumptions behind any goodness-of-fit test that is being used to provide support for the analysis.

Before proceeding further, it is noted that some controversy arises regarding hypothesis tests as applied to simulation experiments (Reference [14]). The essence of the argument involves the "power" of a test. Power is defined as the probability of rejecting the null hypothesis. In practical terms:

As the size of the sample increases, the Type II error decreases, and eventually, enough

samples will be taken that the null hypothesis will be rejected. In simulation experiments, this

means that unless we absolutely know the data comes from the null hypothesis distribution,

any distributional hypothesis will be rejected if enough Monte Carlo samples are obtained.

The analyst will need to decide if results based on goodness of fit tests are of value in

defending results based on the assumed distribution.

The power of a test is determined by the probabilities of the test statistic under all alternatives to the null hypothesis. In other words, the power of a test is a function of the alternative hypothesis, as well as of the test statistic itself. Statistical theory for hypothesis tests deals with deriving test statistics and "most powerful" tests for given situations. References [1], [2], [5] and [11] discuss this topic in more detail.

In practical application, the power function generally is not known. The approach is to specify the null hypothesis and then determine, for a particular statistic, the critical region in which the null hypothesis is rejected, with the rejection probability computed under the condition that the null hypothesis is true. This approach is used in the following to derive statistics relevant to the specification of the null hypothesis as a Maxwell distribution. The critical regions for the specified statistics are derived through Monte Carlo simulation.

Hypothesis tests for distributions, referred to as goodness-of-fit tests, essentially try to show

that the data sample does not come from the distribution of the null hypothesis. In this sense it

is never proved that the data comes from the test distribution. Either the hypothesis is rejected

which “proves” the data does not come from that distribution; or the hypothesis is not

rejected. To emphasize: failure to reject the null hypothesis does not mean the data is from the

specified distribution, rather it merely means that we can’t reject that hypothesis.

Tests for outliers in distributions are similar in approach. The null hypothesis is that data

comes from a given distribution. A test statistic must be chosen in some way, and the critical

region for that statistic is again calculated using Monte Carlo simulation.


Prior to performing any type of hypothesis test regarding the null distribution, one may want to perform an Exploratory Data Analysis (EDA) on the sample data. EDA is a set of techniques used to obtain a qualitative understanding of the data and as an aid in formulating hypothesis tests. EDA is a data-driven procedure intended to "let the data speak for itself" rather than give primary consideration to prespecified models. The only technique addressed in this paper is the use of probability plots, which are directly related to the formulation of hypotheses about probability distributions.

Probability plots consist of taking the sampled values, placing them in ascending order, and

calculating the empirical probability associated with each of these order statistics. The

empirical probability is sometimes referred to as plotting position. The suggested plotting

position formula for a data sample of size n is:

p_i = \frac{i}{n+1}

The empirical probability is plotted against the corresponding ordered sample value on one or

more specially constructed “probability papers.” These papers are most easily constructed

when the particular distribution can be put into a standardized “location-scale” form. Normal

probability paper is the most common example.

A plot of the sample data against the empirical probability should produce an approximately straight line, if the data is in fact from the assumed distribution. For the Maxwell distribution,

there is no generally available probability paper. As such, a MATLAB function was written to

produce Maxwell probability paper. Appendix G provides more discussion, as well as a listing

of the code for this development.

In EDA, the probability plot is often used to obtain a preliminary indication of whether the

data is in fact from the selected distribution. If the plot is markedly different from a straight line, then proceeding with a formal goodness-of-fit test is not required. Rather, a

different distribution should be considered.

In many cases, a generally good fit can be obtained, but a few of the data points deviate

considerably at either the lower or upper end of the distribution. These values are potential

outliers. The probability plot is quite helpful in indicating if outliers may be present in the

sample. When possible outliers are identified, then some appropriate statistical test for

identification is warranted. Figure F0 shows Maxwell probability plots where goodness-of-fit

is questionable (panel a), and a plot where there appear to be outliers (panel b).

The following sections discuss relevant statistics and hypothesis tests for goodness-of-fit

(GoF) testing. This is followed by a discussion of selecting a relevant statistic for testing outliers for the Maxwell distribution, along with tables of percentage points. Unless noted, a

specific test applies to the one parameter Maxwell distribution.


[Figure F0. Maxwell probability plots (probability in % versus data values): (a) a sample where goodness-of-fit is questionable; (b) a sample that appears to contain outliers.]

A large number of goodness-of-fit tests have been devised over the last century. A good

survey is provided in Reference [12]. This paper discusses only two of these: methods based

on the Empirical Distribution Function (EDF) and those based on correlation.

One of the oldest tests for goodness-of-fit is Pearson’s Chi-squared test. These tests do allow

for a composite null hypothesis, where only the distribution type is required. The Chi-squared

test is accurate only when the sample size is large enough that the test statistic is in fact Chi-

square distributed. Small samples or too few cells (or “bins”) can invalidate this test. The

process of “binning” the observations leads to a loss of information, and these tests are

generally less powerful than those based on order statistics or regression. A discussion and

procedures for implementation of the Chi-Squared test are provided in Reference [10].

EDF statistical tests are divided into two classes: those using supremum statistics and those

using quadratic statistics. The statistic Dn is the supremum (or maximum value) of the

absolute difference between the empirical CDF of size n, Fn ( y ) , and the hypothesized CDF,

F ( y) :

D_n = \sup_{y} \left| F_n(y) - F(y) \right|


The empirical distribution function of the sample, F_n(y), consists of the probability values for the sample order statistics, where for the r-th order statistic:

F_n(y_{(r)}) = \frac{r}{n}
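Computing Dn from a sample is straightforward: at each ordered value the EDF jumps from (r − 1)/n to r/n, so both sides of the jump must be compared with F. A generic sketch, where F is any fully specified CDF:

```python
def ks_statistic(y, F):
    """Supremum distance D_n between the EDF of sample y and CDF F."""
    ys = sorted(y)
    n = len(ys)
    d = 0.0
    for r, v in enumerate(ys, start=1):
        cdf = F(v)
        # check both sides of the EDF jump at the r-th order statistic
        d = max(d, abs(r / n - cdf), abs((r - 1) / n - cdf))
    return d

# Example with a fully specified (simple) null: uniform on [0, 1]
d_n = ks_statistic([0.1, 0.3, 0.35, 0.6, 0.9], lambda u: u)
print(d_n)
```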

Supremum statistics are very useful because they are distribution free. This implies that the test can be applied to any continuous distribution, without having to derive particular values of the statistic for every distribution. The non-parametric nature of the test rests on the fact that random variables from an arbitrary continuous distribution can be transformed, through their CDF, to uniform random variables on the unit interval [0,1].

The most common application of a supremum test is the Kolmogorov-Smirnov (K-S)

procedure. A potential difficulty with the K-S test is that it requires a simple null hypothesis.

As discussed, this requires that not only the distribution form be specified but also the

parameters be specified. Estimating parameters from the data and then applying the K-S test is

not correct and can lead to serious errors, particularly for small samples. The reason is that

when the parameters are estimated from the data, the resulting order statistics may no longer be uniformly distributed as assumed by the K-S test. References [10] and [12] contain additional

details on use of this test.

The distribution of the K-S statistic is derived in Reference [5]. Tables for percentage points

at selected levels of significance (α) as a function of sample size (n) are provided for this

statistic. The MATLAB statistics toolbox provides an implementation of this test, with any

choice of sample size and level of significance.

The K-S test is very good at assessing a shift between the hypothesized distribution and the

data. This is because the supremum statistic is a measure of distance between the two

distributions. On the other hand, scale differences between distributions often are most

evidenced in the tail of the distribution, and the K-S statistic is least sensitive there. Recall

that the variance of an order statistic is proportional to p ⋅ (1 − p ) , where p is the quantile

probability. This has a maximum at p = 0.5, and decreases to zero in the tail of the sample

distribution. As such, the power of the test is highest when differences occur between central

portions of the distributions. To alleviate this situation, the supremum statistics could be

derived using a weighted average. This concept is discussed in the next section as applied to

quadratic statistics.

Quadratic Tests

Quadratic statistics are formulated on the basis of a weighted average of the squares of the

deviations from the empirical distribution Fn ( y ) and the hypothesized distribution F ( y ) .

The general form is:

T = n ∫_{−∞}^{∞} [ F_n(y) − F(y) ]² · Ψ(y) dF(y)

This approach also requires a simple null hypothesis specification in order to derive an analytic

54

DRAFT

expression for the distribution of T. If Ψ(y) = 1, the resulting statistic is called the Cramer-von Mises statistic.

The following weight function leads to the Anderson-Darling statistic, Reference [13]:

Ψ(y) = 1 / ( F(y) · [ 1 − F(y) ] )        Equation (F1)

This weight function tends to place a higher “value” on the tails of the distribution, and as

such is a more powerful test for detecting changes in scale between two distributions. The

computational formula for this statistic, denoted as A2 , is:

A² = −n − (1/n) ∑_{i=1}^{n} ( 2i − 1 ) · [ ln Z_(i) + ln( 1 − Z_(n+1−i) ) ]        Equation (F2)

The Z_(i) values are computed from the EDF data as Z_(i) = F( Y_(i) ; â ), where the Y_(i) are the ordered sample data values.
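Equation (F2) translates directly into a few lines of code. The Python sketch below is illustrative (the function name is an assumption); it takes the Z_(i) already evaluated from the hypothesized CDF and sorted in ascending order:

```python
import math

def anderson_darling(z):
    # Equation (F2): A^2 = -n - (1/n) sum_{i=1}^{n} (2i - 1) *
    #                      [ ln Z_(i) + ln(1 - Z_(n+1-i)) ]
    # z: sorted CDF values, each strictly inside (0, 1).
    n = len(z)
    s = 0.0
    for i in range(1, n + 1):
        s += (2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
    return -n - s / n
```

For a well-fitting sample the Z_(i) are close to uniformly spaced and A² is small; large A² values indicate departure, especially in the tails.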

The idea behind Monte Carlo development for goodness-of-fit tests rests on being able to

generate random samples from the distribution that is being assumed (i.e., the null hypothesis

distribution). Samples drawn from the actual distribution are random variables. If a very large

number of samples are drawn, and the desired statistic is evaluated for each sample,

percentage points can be calculated based on these results. The assumption is that the number

of samples is large enough to adequately approximate the actual population.

This approach applies for either K-S or quadratic type tests, as well as for correlation and

regression goodness-of-fit statistics. Reference [10] discusses this issue and states that when

the distribution is of “location-scale” type, the distribution of the test statistic does not depend on

the actual parameter values. Heuristically, this seems reasonable since the location-scale

random variable can be standardized. As noted in the main report, both the one and two

parameter Maxwell distributions are of this type.

Quadratic statistics have the property that they converge very quickly to their asymptotic

limits. For other than small sample sizes, the percentage points therefore depend asymptotically only on the level of significance chosen. These statistics have also been found to have higher power than the Chi-squared or K-S tests because the distribution tail carries more weight in the sample.

One point to keep in mind when using asymptotic results: the estimator used for

the unknown parameters must be asymptotically efficient. For the one parameter Maxwell

distribution, this implies using the maximum likelihood estimator, âMLE . The MOM

estimates for the two parameter distribution are not asymptotically efficient. If we are to use

these moment estimates, the limiting distribution for the quadratic statistic would be a


function of the sample size (as well as a function of the estimation method). The following

results apply only to the single parameter Maxwell distribution.

The methodology provided in Reference [12] was used to derive the Anderson-Darling

asymptotic percentage points for the Maxwell distribution. These results, along with some

very small sample value results, are given in Table F1 for selected levels of significance. The

results are derived for 50,000 Monte Carlo samples, each of the size specified. The

methodology is applied as follows:

1. Generate a Monte Carlo sample (size n) of standardized values “t” by obtaining the

root-mean-square of three standard normal random variables. Note: the sample values

are drawn from a population with parameter a = 1. Arrange values in ascending

order.

2. The MLE estimate of “a” is next calculated using the sample data values. This is

accomplished as:

â_MLE = √2 · √( ∑_{i=1}^{n} t_i² / (3n) )

Note: the factor √2 appears because the standardized variable of the Maxwell distribution includes this term ( √2 · a ).

3. The n sample values must reflect that an estimated parameter, rather than the actual population parameter, is being used. The sample standardized value t_i is then:

t_i = t_i / â_MLE

4. The empirical distribution function for the sample is calculated from the CDF:

F_n( t_i ) = erf( t_i ) − (2/√π) · t_i · exp( −t_i² )

5. Equation F2 is then calculated for each sample. This is done 50,000 times for the

current results. The values of the statistic A2 are sorted and quantile values selected

corresponding to the selected significance level.
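The five steps above can be condensed into a short Monte Carlo routine. The Python sketch below is illustrative only: the report used MATLAB and 50,000 replications, and the function names and the smaller replication count here are assumptions for brevity. Samples are generated as the root-sum-square of three standard normals (population a = 1):

```python
import math
import random

def maxwell_sample(n, rng):
    # Step 1: root-sum-square of three standard normals, ascending order.
    return sorted(math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)))
                  for _ in range(n))

def a2_for_sample(y):
    n = len(y)
    # Step 2: MLE of "a"; the sqrt(2) enters because the standardized
    # Maxwell variable includes that factor.
    a_hat = math.sqrt(2.0) * math.sqrt(sum(v * v for v in y) / (3.0 * n))
    # Steps 3 and 4: standardize and evaluate the hypothesized CDF.
    z = [math.erf(t) - (2.0 / math.sqrt(math.pi)) * t * math.exp(-t * t)
         for t in (v / a_hat for v in y)]
    # Equation (F2):
    s = sum((2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

def percentage_points(n, reps, alphas, seed=1):
    # Step 5: repeat, sort the statistics, and read off upper quantiles.
    rng = random.Random(seed)
    stats = sorted(a2_for_sample(maxwell_sample(n, rng)) for _ in range(reps))
    return {a: stats[min(reps - 1, int((1.0 - a) * reps))] for a in alphas}
```

With a large replication count the resulting points can be compared against the entries of Table F1.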


Table F1. Anderson-Darling percentage points for the Maxwell distribution (MLE parameter estimate):

Size     0.50    0.25    0.10    0.05    0.02    0.01
5       .4589   .6646   .9358  1.1493  1.4458  1.7076
10      .4723   .6868   .9728  1.2041  1.5304  1.7822
25      .4744   .6913   .9929  1.2239  1.5488  1.7797
≥ 30    .4805   .7070  1.0094  1.2390  1.5643  1.8162

It is emphasized that when using the percentage points in Table F1, the statistic A2 must be

calculated using the MLE estimate of the parameter. If the value of the parameter “a” is

known (or assumed) independently, a table with different values at each significance level

would result.

A somewhat different approach can be taken for testing goodness-of-fit when the distribution

is of “location-scale” type. The idea uses the fact that the quantiles of order statistics from a

sample should be distributed uniformly on the unit interval. The development follows

Reference [12].

The procedure uses the form of the standard definition of the correlation coefficient, defined

between a random variable Yi and some set of constants, Ti .

R² = [ (1/n) ∑_{i=1}^{n} (Y_i − Ȳ)(T_i − T̄) ]² / ( s_Y² · s_T² ) ;   s_Y² = (1/n) ∑ (Y_i − Ȳ)² ,   s_T² = (1/n) ∑ (T_i − T̄)²

The usual correlation coefficient is defined between two random variables. The above

definition extends this definition to the case where one set of variables is not random.

The distribution of the statistic is developed from Monte Carlo samples drawn from the selected distribution. In general, the distribution of R² will depend on the underlying distribution form, as well as the specific parameters. In the case of “location-scale” distributions, the random variables can be standardized, and the resulting correlation depends only on the distribution form.

The values from each sample are arranged in ascending order (i.e., as order statistics). The

most representative constant associated with each order statistic is the expected value of that

order statistic. Although the exact distributions of the order statistics are known (Appendix D),

the moments of these distributions cannot be calculated in closed form for most probability

densities, including the Maxwell distribution.

An alternative to the mean value of the order statistic is the mean value of the quantile

associated with that order statistic. This value is known in a simple form:


E( q_r ) = r / ( n + 1 )

As an approximation, select constants Ti such that:

T_i = F⁻¹( i / ( n + 1 ) )

Using the Delta Method, it can be shown that this is an approximation to the mean of the

actual order statistic, for a one term Taylor expansion of the inverse function. As sample size

increases, the approximation becomes more exact (Reference [5]).

Hence, for sample size n, and a specified location-scale distribution, a set Ti ; for 1 ≤ i ≤ n ,

can be calculated using the distribution inverse function. A large number of (Monte Carlo)

samples are drawn from the specified distribution, and the regression with the fixed Ti is

computed for each sample. The resulting sample statistics are ordered, and percentage points

determined.
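The procedure can be sketched in Python as follows (the function names are assumptions, and the standardized Maxwell CDF is inverted by simple bisection rather than a library routine):

```python
import math

def maxwell_std_cdf(t):
    # Standardized Maxwell CDF, t = (y - b) / (sqrt(2) a).
    return math.erf(t) - (2.0 / math.sqrt(math.pi)) * t * math.exp(-t * t)

def maxwell_std_inv(q):
    # Invert the monotone standardized CDF by bisection on [0, 10].
    lo, hi = 0.0, 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if maxwell_std_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def z_statistic(y):
    # Z_n = n (1 - R^2), with constants T_i = F^-1( i / (n + 1) ).
    y = sorted(y)
    n = len(y)
    t = [maxwell_std_inv(i / (n + 1.0)) for i in range(1, n + 1)]
    ybar, tbar = sum(y) / n, sum(t) / n
    cov = sum((yi - ybar) * (ti - tbar) for yi, ti in zip(y, t)) / n
    vy = sum((yi - ybar) ** 2 for yi in y) / n
    vt = sum((ti - tbar) ** 2 for ti in t) / n
    return n * (1.0 - cov * cov / (vy * vt))
```

Because R² is invariant under an affine transformation of the data, the same statistic (and the same percentage points) applies to both the one and two parameter distributions, as the text notes.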

As the statistic R 2 decreases, it is less likely that the sample belongs to the specified

distribution (null hypothesis). In accordance with standard practice, the following related

statistic is generally used:

Z n = n ⋅ (1 − R 2 )

This statistic has the property that it increases as the quality of fit decreases.

The Extreme Value distribution correlation results of Reference [12] were checked, using

50,000 rather than 10,000 samples. The method was then applied to the Maxwell distribution.

Note that these results can be used for either the one or two parameter distribution. The

distribution of Z_n is quite different from a normal distribution.

Figure F-1 shows the distribution of the Z_n statistic for 50,000 Monte Carlo runs, and a

sample size of 100 plotted on normal probability paper.

Percentage points for the Z_n statistic are given in Table F2, for the indicated levels of significance and sample sizes.


[Figure F-1: Normal probability plot of the Z_n statistic (ordinate: probability, 0.001 to 0.999; abscissa: data values).]

Table F2. Percentage points of Z_n for the Maxwell distribution:

Size     0.50    0.25    0.10    0.05    0.02    0.01
25      .7601  1.1414  1.6714  2.1169  2.7207  3.2321
50      .8960  1.3301  1.9313  2.4214  3.1496  3.7543
75      .9699  1.4302  2.0813  2.6271  3.4239  4.0950
100    1.0213  1.4964  2.1690  2.7294  3.5346  4.2037
150    1.0894  1.5831  2.2870  2.8972  3.7203  4.4158
200    1.1373  1.6498  2.3622  2.9590  3.8189  4.5451
400    1.2345  1.7753  2.5359  3.1263  4.0064  4.6605
750    1.3251  1.8966  2.6751  3.3299  4.2250  4.8762
1000   1.3611  1.9446  2.7354  3.3592  4.2418  4.9649

Figure F2 is a plot of the percentage points of Table F2. The plot shows that, for a given level

of significance, an asymptote is approached when the sample is sufficiently large. For small

sample sizes, the relationship is quite non-linear. This is a characteristic of all goodness-of-fit

statistics although definition of “large sample” differs depending on the test.


[Figure F2: Percentage points Z_n = n(1 − R²) versus sample size (0 to 1000), for α = 0.5, 0.25, 0.10, 0.05, 0.02, 0.01.]

Regression Statistics

Regression statistics are related to correlation statistics, and can be derived in the same way. For location-scale distributions, the regression uses the ordered sample values and the constants T_i to perform an ordinary least-squares linear fit:

Y_(i) = b_0 + b_1 · T_i + ε_i

The test statistic is a function of the residuals of the least-squares fit because these tests are

closely related to correlation tests. Also, because the power functions have not been

determined, the choice of regression vs. correlation test is not evident.

For the single parameter Maxwell distribution, the Anderson-Darling statistic discussed above is

believed to be more powerful than either regression or correlation statistics. The interested

reader can consult Reference [12] for further discussion.


Outlier Tests

Panel (b) of Figure F0 provides an example of a data set that appears to follow a Maxwell

distribution quite closely, except for the three upper values. Applying one of the previous

goodness-of-fit tests may or may not result in rejecting the null hypothesis of the distribution

type. In either case, it may be of interest to determine if these extreme values can be identified

as outliers.

Outliers have a long history, and a certain degree of empiricism is involved in selecting a

relevant statistic for identifying and rejecting outliers as discussed in Reference [15]. Much of

the work done on outliers has assumed that the samples are drawn from normal distributions.

When distributions other than normal are involved, one should approach the historical criteria

for outlier detection with caution.

Detection of outliers is a hypothesis testing problem as noted previously. The null hypothesis

is that the data are from a specified distribution. The alternative hypothesis is that “k” outliers

are present, where often k equals 1. In any case, the alternative hypothesis must state if the

outliers are at the upper tail, the lower tail or possibly at either tail of the distribution. Any

useful test for outliers will depend on this alternative hypothesis. Different decision statistics

are usually required to test different alternatives. The present paper deals only with the null

hypothesis of a Maxwell distribution and an alternative that includes outliers on the upper tail.

As noted in the general discussion, an outlier test involves a composite hypothesis. For a

decision statistic T, the test takes the form:

Null hypothesis H0: All data values come from the same Maxwell distribution

Alternative hypothesis H1: (n − k) of the values come from a Maxwell distribution, while the k largest values come from a distribution extended toward the upper tail

The first question is: how can we come up with a useful statistic for testing for upper outliers? To that end, we consider basing our choice on the likelihood ratio for the null hypothesis and a selected alternative. Likelihood ratio tests are a tool used to handle composite hypotheses. This type of test compares the likelihood of the alternative hypothesis to the likelihood of the null hypothesis, given the particular sample data x:

LR = Alternative Hypothesis Likelihood / Null Hypothesis Likelihood = L1( x ; θ1 ) / L0( x ; θ0 )

As the likelihood of the alternative increases, the probability that the null hypothesis is true decreases. As such, the likelihood ratio increases as the probability of rejecting the null hypothesis increases. The idea is to find a suitable statistic T that reflects this behavior. The approach is again to use Monte Carlo simulation to determine percentage points at specified significance levels for the statistic, under the assumption that the null hypothesis is true.


The specific type of alternative considered for the Maxwell outlier test statistic is the “slippage” alternative. This means that we consider one or more of the data values to have

come from the same type of distribution (i.e., Maxwell), but to have different parameters than

the null distribution.

Although the slippage alternative assumes the same distribution type for the outliers, it should remain reasonable when the outlier data are from a different distribution family (Reference [15]). In particular, alternate distributions that are of the “exponential family,” such as the Normal or

Gamma distributions should have essentially the same test statistic. A definition and

discussion of exponential families of distributions is provided in References [1], [2] and [3].

In the development that follows, consideration is restricted to the hypothesis of a single outlier

at the upper tail. Development of a test statistic for k outliers is very similar. Results will be

stated for multiple outliers. The approach is to calculate the likelihood ratio as discussed

above and determine a statistic that increases as the likelihood increases. As discussed in the

main text, we will actually use the log-likelihood for convenience of manipulation.

The log-likelihood function for the null hypothesis, with n data values, is:

ln L0( y ; a ) = (n/2) · ln(2/π) + 2 ∑_{i=1}^{n} ln y_i − 3n · ln a − (1/(2a²)) ∑_{i=1}^{n} y_i²

The alternative hypothesis assumes that ( n −1) data values are from the same Maxwell

distribution, but one observation is from a different distribution. This different distribution is

characterized by a scale parameter ( a/λ ), where λ is the “slippage parameter.” The likelihood function takes the form:

L1( y ; a, λ ) = (2/π)^{(n−1)/2} · [ ∏_{i=1}^{n−1} ( y_i² / a³ ) ] · exp( − ∑_{i=1}^{n−1} y_i² / (2a²) ) · √(2/π) · ( y_n² λ³ / a³ ) · exp( − y_n² λ² / (2a²) )

Note that the slippage parameter λ should be less than 1.0 in order for the tail of this “outlier”

distribution to be extended. The data point y_n represents the largest sample value. The log-likelihood for the alternative hypothesis is:

ln L1 = (n/2) · ln(2/π) + 2 ∑_{i=1}^{n} ln y_i − 3(n−1) · ln a − (1/(2a²)) ∑_{i=1}^{n−1} y_i² + 3 ln λ − 3 ln a − y_n² λ² / (2a²)

The parameter values that maximize each of these likelihood functions can be found in the

usual way by zeroing the derivative and solving for the various terms. For the null hypothesis,

the MLE solution is the same as derived in previous sections of this paper:

â² = (1/(3n)) ∑_{i=1}^{n} y_i² = ⟨y²⟩ / 3


where the mean square over all n values and over the (n−1) smallest values are defined as:

⟨y²⟩ = (1/n) ∑_{i=1}^{n} y_i²   and   ⟨y²⟩′ = (1/(n−1)) ∑_{i=1}^{n−1} y_i²

Substituting “a” into the log-likelihood for L0 provides the maximum value of the null hypothesis function, for the given sample data:

ln L0,MAX = (n/2) · ln(2/π) + 2 ∑_{i=1}^{n} ln y_i − (3n/2) · [ ln ⟨y²⟩ − ln 3 ] − 3n/2

The alternative likelihood function has two parameters. Note that parameter “a” in the

alternative likelihood has a different value than it does for the null likelihood. Taking

derivatives with respect to λ and a, and equating to zero and solving provides:

λ̂² = 3 â² / y_n² = ⟨y²⟩′ / y_n²   and   â² = (1/(3(n−1))) ∑_{i=1}^{n−1} y_i² = ⟨y²⟩′ / 3

Substituting these expressions into the alternative likelihood function, and after some combining of terms, the maximum function value is:

ln L1,MAX = (n/2) · ln(2/π) + 2 ∑_{i=1}^{n} ln y_i − (3n/2) · [ ln ⟨y²⟩′ − ln 3 ] + (3/2) · ln( ⟨y²⟩′ / y_n² ) − (1/(2â²)) · [ ∑_{i=1}^{n−1} y_i² + (1/(n−1)) ∑_{i=1}^{n−1} y_i² ]

Also, substituting for “a”, the last term of this expression becomes:

− (1/(2â²)) · [ ∑_{i=1}^{n−1} y_i² + (1/(n−1)) ∑_{i=1}^{n−1} y_i² ] = − 3n/2

The maximized log-likelihood ratio is the difference of these functions:

ln LR = ln L1,MAX − ln L0,MAX = (3n/2) · ln( ⟨y²⟩ / ⟨y²⟩′ ) + (3/2) · ln( ⟨y²⟩′ / y_n² )

Now define the statistic T1 = y_n² / ⟨y²⟩′ , and the above expression reduces to:

ln LR = (3/2) · [ n · ln( ( (n−1) + T1 ) / n ) − ln T1 ]        Equation (F3)

The relevant statistic T1 is the ratio of the square of the largest data value to the average of the squares of the remaining data values. The following comments apply to this formulation:


1. The statistic T1 is always greater than 1.0. As such, it can be seen from Equation (F3) that as T1 increases, the log-likelihood ratio increases. This implies that as T1 becomes larger, the likelihood of rejecting the null hypothesis increases.

2. The statistic T1 is called an “exclusive” statistic since the average does not include the extreme value. An alternative “inclusive” statistic could have been chosen as U1 = y_n² / ⟨y²⟩. T1 and U1 are functionally related, as can easily be shown; as such, the power of the test is the same for either statistic. In the following tables, the “exclusive” form of the statistic has been used.

3. The test statistic for k outliers, based on the slippage alternative, is:

T_k = [ (1/k) ∑_{j=n−k+1}^{n} y_j² ] / [ (1/(n−k)) ∑_{i=1}^{n−k} y_i² ]   where   y_1 , y_2 , … , y_{n−k} < y_{n−k+1} , … , y_n

4. Strictly speaking, the likelihood ratio as used in the derivation of these statistics is not

correct, because selecting the k largest values violates the assumption of randomness

in the likelihood function. Reference [15] addresses this issue. Essentially, what is

being done is to recognize that there are n! permutations of the sample data. If a

hypothesis test was performed for each permutation, the one resulting in the maximum

of the likelihood ratio would be chosen. Since this is the only one of interest, the

above procedure can be used.
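The exclusive statistic is straightforward to compute from a sample. A small Python sketch (the function name is an assumption):

```python
def outlier_statistic(y, k=1):
    # T_k ("exclusive" form): mean square of the k largest values divided by
    # the mean square of the remaining (n - k) values.
    y = sorted(y)
    n = len(y)
    top = sum(v * v for v in y[n - k:]) / k
    rest = sum(v * v for v in y[:n - k]) / (n - k)
    return top / rest
```

For example, outlier_statistic([1, 1, 1, 1, 5]) gives 25.0, which would then be compared against the tabulated percentage point for the chosen significance level and sample size.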

The following tables provide percentage points for the specified levels of significance. Each table is for a different sample size. The exclusive form of the statistic is used. For outlier rejection, the significance level should be small; the tables reflect this consideration. All tables reflect results of 50,000 Monte Carlo simulation samples.


Outlier Significance Level α

s .10 .05 .02 .01 .005 .001

1 4.8145 5.4052 6.2223 6.8581 7.4441 ----

2 4.4494 4.8802 5.4402 5.8519 6.2644 ----

Outlier Significance Level α

s .10 .05 .02 .01 .005 .001

1 5.1436 5.7111 6.4765 6.9823 7.6112 8.9510

2 4.6623 5.0379 5.5728 5.9360 6.3128 7.0560

3 4.4140 4.7270 5.1355 5.4232 5.7602 6.3415

4 4.2591 4.5401 4.8972 5.1420 5.4380 5.9670

Outlier Significance Level α

s .10 .05 .02 .01 .005 .001

1 5.3799 5.9265 6.6109 7.1600 7.7189 9.0112

2 4.8484 5.2052 5.6900 6.0244 6.3659 7.1285

3 4.5690 4.8665 5.2383 5.5123 5.7930 6.4035

4 4.3980 4.6556 4.9654 5.2079 5.4433 5.9793

Outlier Significance Level α

s .10 .05 .02 .01 .005 .001

1 5.5275 6.0788 6.7843 7.3133 7.8463 9.1031

2 4.9945 5.3513 5.8099 6.1483 6.4609 7.2427

3 4.6902 4.9838 5.3575 5.6042 5.8680 6.5090

4 4.4938 4.7507 5.0644 5.2797 5.4996 5.9863

5 4.3535 4.5879 4.8676 5.0629 5.2502 5.6770


Outlier Significance Level α

s .10 .05 .02 .01 .005 .001

1 5.9591 6.4758 7.1402 7.6411 8.1419 9.4174

2 5.3773 5.7278 6.1531 6.4487 6.7565 7.5001

3 5.0492 5.3298 5.6506 5.8891 6.1371 6.6655

4 4.8270 5.0670 5.3452 5.5360 5.7442 6.1896

5 4.6628 4.8747 5.1289 5.2989 5.4698 5.8298

Outlier Significance Level α

s .10 .05 .02 .01 .005 .001

1 6.4135 6.9305 7.5920 8.0892 8.5975 9.7696

2 5.8003 6.1378 6.5539 6.8654 7.1519 7.8258

3 5.4567 5.7216 6.0462 6.2671 6.4962 7.0009

4 5.2221 5.4428 5.7136 5.9018 6.0728 6.4955

5 5.0431 5.2376 5.4770 5.6349 5.7883 6.1710


Appendix G

The first step in many Exploratory Data Analyses (EDA) of a data set is plotting the data on one or more types of probability paper. This allows for rapid qualitative assessment of whether the data appear to come from the selected distribution. The probability plot is also useful in identifying possible outliers, as discussed in Appendix F.

MATLAB provides Normal distribution plots directly. Maxwell probability plots can be

generated using appropriate MATLAB code and plotting utilities. The resulting probability

paper can be used for either the one or two parameter distributions because of the “location-

scale” nature of this distribution.

The figure on the next page is the general purpose Maxwell probability paper which can be

copied and used for hand plotting if required. Following this figure is code for a MATLAB

function that takes a data vector as input and develops the probability plot axes, plots the input

data vector, and provides an estimate of the “best fit” Maxwell distribution for the given data.

Essentially, the probability plot is just plotting standardized values on both the ordinate and

abscissa, both being on a linear scale. Various probability values, which correspond to fixed

values of the standardized variable, are superimposed on the ordinate. This axis is always the

same regardless of the data. The abscissa scale is linearly adjusted to fit the actual data input.

The best fit line is generated by obtaining estimates of the parameters â and b̂ using the

method of moments. The MOM estimators are used for simplicity since they can be calculated

directly. The MOM and MLE estimators provide essentially the same information for qualitative evaluation.


[Maxwell probability paper: ordinate “Probability %” with tick labels 10 through 99; abscissa “Data Values”.]


function maxwellplot(Y)
% (Function name assumed; it is not given in the original listing.)
%
% This function takes in a data vector and develops a plot of the values on
% probability axes corresponding to a general two parameter Maxwell
% distribution. A fitted line, based on moment parameter estimates, is also
% drawn on the plot.
%
% The Maxwell CDF in terms of the standardized variable t is:
%
%   F(t) = erf(t) - (2/sqrt(pi))*t*exp(-t^2)
%
%   t = (Y - b)/(sqrt(2)*a)
%
% First sort the data array for plotting purposes
Y = sort(Y);
%
% Check for at least four data values
nsamp = length(Y);
if (nsamp < 4)
    disp('Error - Input requires at least 4 data values')
    return
end
%
% Prepare the basic probability plot axes. This amounts to plotting the
% standardized Maxwell variable on both the X and Y axes. The Y axis then
% has tick marks superimposed corresponding to probabilities.
qv = (0:0.01:0.99);
for i = 1:length(qv)
    q = qv(i);
    T(i) = fzero(@(t) maxinv(t,q), 0.5);
end
T(1) = 0;  % rounding error may cause a nonzero value, so set to zero
%
Q = [0 0.01 0.04 0.1 0.2 0.3 0.5 0.7 0.8 0.9 0.95 0.98 0.99];
for i = 1:length(Q)
    q = Q(i);
    L(i) = fzero(@(t) maxinv(t,q), 0.5);
end
L(1) = 0;
%
plot(T, T, 'LineStyle', 'none'), grid
set(gca, 'YTick', L)
set(gca, 'YTickLabel', {'0';'1';'4';'10';'20';'30';'50';'70';'80';'90' ...
    ;'95';'98';'99'})
title('Maxwell Probability Plot')
ylabel('Probability %')
xlabel('Data Values')
hold on
%
% Now compute the empirical probability and the standardized variables
% associated with the empirical values.
for i = 1:nsamp
    r = i/(1 + nsamp);
    et(i) = fzero(@(t) maxinv(t,r), 0.5);
end
%
% Plot the standardized empirical values on the Y-axis (which are
% equivalent to the empirical probability) and the actual values on the
% X-axis.
plot(Y, et, 'ro')
%
% Compute MOM estimates of a and b for the Maxwell distribution.
% See document Appendix B for derivation.
m2 = sum(Y.*Y)/nsamp;
m1 = sum(Y)/nsamp;
mdiff = m2 - m1^2;
%
ahat = 1.4849*sqrt(mdiff);
bhat = m1 - 2.3695*sqrt(mdiff);
% Keep the location estimate below the smallest observation
if (bhat > Y(1))
    bhat = Y(1);
end
TFit = L*sqrt(2)*ahat + bhat;
%
plot(TFit, L, 'LineStyle', '--')
hold off
%
return
%
% Solution for standardized variates requires the root-finding function
% contained below.
function f = maxinv(t,q)
f = erf(t) - (2/sqrt(pi))*t*exp(-t^2) - q;
return


Appendix H

It has been noted that the Rayleigh and Maxwell distributions can be derived from the Chi-

squared distribution, with 2 and 3 degrees-of-freedom, respectively. These are useful because

certain physical phenomena give rise to such distributions. Two and three dimensional miss

distance distributions are examples. There may be physical situations where more than three

terms are involved in the root-mean-square variable. It was noted in the main report that these

give rise to Chi-distributions.

The Chi distributions can be further generalized so that the shape parameter k is no longer an

integer value, but can take on any continuous value greater than zero. These are called

Nakagami distributions. This distribution has been proposed in physical applications as a

model for fading of communication signals in multi-path.

Consider the following random variable, where the X i are normal random variables with a

common variance “a”.

y = a · √( ∑_{i=1}^{k} X_i² ) = a · √W_k

The variable W_k has a Chi-squared distribution with k degrees-of-freedom:

f(W) = W^{(k−2)/2} · exp( −W/2 ) / ( Γ(k/2) · 2^{k/2} )

The distribution of the random variable y follows from using the standard Jacobian transformation method:

f_{Y,k}( y ) = f_W( y²/a² ) · |dW/dy| = [ 1 / ( Γ(k/2) · 2^{(k−2)/2} ) ] · ( y^{k−1} / a^k ) · exp( −y² / (2a²) ) ;   y ≥ 0        Equation (H1)

F_k( y ) = [ 1 / ( Γ(k/2) · 2^{(k−2)/2} ) ] ∫₀^y ( u^{k−1} / a^k ) · exp( −u²/(2a²) ) du = γ( k/2 ; y²/(2a²) ) / Γ(k/2)        Equation (H2)


The last function is the (lower) incomplete gamma function, and is defined as:

γ( n ; x ) = ∫₀^x t^{n−1} · e^{−t} dt

The incomplete gamma function has a recursive property that makes evaluation easier

(Reference [16]):

γ( k+1 ; x ) = k · γ( k ; x ) − x^k · e^{−x}

with starting values:

γ( 1 ; x ) = 1 − e^{−x}   and   γ( 1/2 ; x ) = √π · erf( √x )
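The recursion makes the CDF of Equation (H2) easy to evaluate for the integer and half-integer arguments that arise for Chi-k distributions. A Python sketch (function names are assumptions):

```python
import math

def lower_incomplete_gamma(s, x):
    # gamma(s; x) via the recursion gamma(s; x) = (s-1) gamma(s-1; x) - x^(s-1) e^(-x),
    # started from gamma(1; x) = 1 - exp(-x) or gamma(1/2; x) = sqrt(pi) erf(sqrt(x)).
    # Valid for s a positive integer or half-integer (the Chi-k cases).
    if s == 1.0:
        return 1.0 - math.exp(-x)
    if s == 0.5:
        return math.sqrt(math.pi) * math.erf(math.sqrt(x))
    t = s - 1.0
    return t * lower_incomplete_gamma(t, x) - x ** t * math.exp(-x)

def chi_k_cdf(y, k, a):
    # Equation (H2): F_k(y) = gamma(k/2; y^2/(2 a^2)) / Gamma(k/2)
    x = y * y / (2.0 * a * a)
    return lower_incomplete_gamma(k / 2.0, x) / math.gamma(k / 2.0)
```

For k = 4 this reproduces the closed form F_4(y) given below, and for k = 3 it reduces to the Maxwell CDF.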

These distributions, for a given k, are collectively referred to as “Chi-k distributions.” The above relations show that when k is even, the distribution function involves only exponential terms, whereas when k is odd, the CDF involves the error function.

As an example, the above procedure can be applied when k = 4 to obtain:

F_4( y ) = 1 − [ 1 + y²/(2a²) ] · exp( −y²/(2a²) )

It is noted that regardless of the value of k, the distribution can be written in terms of the

standardized variable used for the Maxwell distribution:

t = y / ( √2 · a )

The raw moments of the Chi-k distribution follow from the same integral form:

m_k^p = [ 1 / ( Γ(k/2) · 2^{(k−2)/2} ) ] ∫₀^∞ ( u^{k+p−1} / a^k ) · exp( −u²/(2a²) ) du = [ 2^{p/2} · Γ( (k+p)/2 ) / Γ(k/2) ] · a^p        Equation (H3)
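Equation (H3) gives all raw moments directly, and the entries of Table H1 can be reproduced from it. A Python sketch (the function name is an assumption):

```python
import math

def chi_k_raw_moment(p, k, a=1.0):
    # Equation (H3): E[y^p] = 2^(p/2) * Gamma((k + p)/2) / Gamma(k/2) * a^p
    return 2.0 ** (p / 2.0) * math.gamma((k + p) / 2.0) / math.gamma(k / 2.0) * a ** p

# Maxwell case (k = 3, a = 1), cf. the first row of Table H1:
mean3 = chi_k_raw_moment(1, 3)              # approximately 1.5958
var3 = chi_k_raw_moment(2, 3) - mean3 ** 2  # approximately 0.4535
```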

A plot of several Chi distributions (a = 1) is given in Figure H1. As k increases, the mode of

the distribution shifts to the right as would be expected since the root-mean-square sum

involves more terms. Differentiating Equation (H1) with respect to y and equating to zero

gives the mode of the distribution as:

y_Mode = a · √( k − 1 )

The figure also appears to show that the variance and mode probability value change very

slowly as k increases.


[Figure H1: Chi-k probability densities for k = 3, 4, 6, 9 (a = 1); ordinate: probability density (0 to 0.7), abscissa: data value (0 to 5).]

Table H1 provides values of the distribution mean, variance, skewness and kurtosis excess for

several of the Chi-k distributions. The raw moments were computed using Equation (H3).

k    Mean       E[y²]   Variance    Skewness   Kurtosis Excess
3    1.5958 a   3 a²    0.4535 a²   0.4857     0.1082
4    1.8800 a   4 a²    0.4657 a²   0.4057     0.0593
5    2.1277 a   5 a²    0.4729 a²   0.3542     0.0370
6    2.3500 a   6 a²    0.4777 a²   0.3179     0.0251
9    2.9180 a   9 a²    0.4854 a²   0.2519     0.0106

The table confirms that variance of the distribution changes little, but the skewness and

kurtosis excess indicate as k increases, the distributions approach the normal distribution.

Approximating the Chi distribution as normal when k becomes “large” may be adequate for

some applications.


All of the procedures discussed for the Maxwell distribution can be applied to Chi-k

distributions. As an example, the log-likelihood function for a sample of size n is:

ln L_k( n ; a ) = −n · ln C(k) + ( k − 1 ) ∑_{i=1}^{n} ln y_i − k·n · ln a − (1/(2a²)) ∑_{i=1}^{n} y_i²

Differentiating with respect to “a” and equating to zero gives the MLE estimate for the parameter:

â²_MLE = ( ∑_{i=1}^{n} y_i² ) / ( k·n ) = M2′ / k

The CRLB could be derived as was done for the Maxwell distribution using the appropriate

density function. A simpler alternate argument can be used however. If we apply the

Factorization theorem discussed in Appendix A to the above likelihood function, it is clear

that the sufficient statistic for all k is the same:

T_s = ∑_{i=1}^{n} y_i²

The MLE estimate is a function of this sufficient statistic and as was the case for k = 3, the

MLE is an efficient estimator. Thus, the CRLB is the mean square error for this estimator, and

the necessary relation is (see main body or this paper):

2

∂ aˆ 1 ′ ′ 2

MSE ( aˆ MLE ) = CRLB ( k ) = MLE var M ′ = 1

µ −µ

∂M ′ 2 4 k M ′ n 4 2

2 2

The relation between the relevant moments turns out to have a simple form:

M2′ ≈ μ2′ = k·a²   and   μ4′ − (μ2′)² = 2·k·a⁴

so that:

CRLB( k ) = a² / ( 2·k·n )        Equation (H4)

The variance of the MLE estimate can be used to compare efficiencies with the MOM or

quantile estimator MSE if desired. It is interesting that the MSE of the estimator decreases

with increasing k. Noting that “a” is the variance of underlying normal variables, it would be

expected that as more variables are included in the summing (i.e., as k increases), more

information is available for a given sample size n, and the estimate MSE would decrease.

The MOM estimate for parameter “a” is the same as the MLE estimate when using the second

moment. MOM estimates are discussed below in general terms for the Nakagami distribution.


The distribution function of Equation [H2] becomes more complex as k increases. Generation

of random variables from Chi distributions is easily accomplished by taking the root-mean-

square of the number of normal random variables. This is the recommended procedure for

developing Monte Carlo samples if statistics for outliers or goodness of fit are needed.
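Generating Chi-k variates this way requires only a normal generator. A Python sketch (illustrative names; note the generation is a root-sum-square, with the scale “a” applied outside the square root):

```python
import math
import random

def chi_k_variate(k, a, rng):
    # y = a * sqrt(X_1^2 + ... + X_k^2), with X_i standard normal.
    return a * math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)))

# Quick check against Equation (H3): the second raw moment is k * a^2.
rng = random.Random(7)
sample = [chi_k_variate(5, 2.0, rng) for _ in range(20000)]
m2 = sum(v * v for v in sample) / len(sample)  # should be close to 5 * 4 = 20
```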

Calculation of Chi-k random variates at specified quantile values will require inversion of the

distribution function. Examples where this is needed are for construction of probability paper

or Monte Carlo evaluation of the distribution of certain statistics. When k gets large, deriving

the exact distribution function using Equation (H2) and the recursive relations becomes

unwieldy. Depending on the problem requirements, the normal distribution might be an

alternative, with mean and variance calculated from Equation (H3).

It is noted that if the normal component distributions do not have the same underlying

variance, and/or if there are correlations present, then the Mahalanobis distance should be

used. In this case the Chi-k parameter for the distribution is unity.

Nakagami Distribution

As noted in the introduction, Chi-k distributions are related to even more general types of

Nakagami distributions. The connection is established in the following discussion.

The Nakagami density function is:

f( y ) = [ 2 / Γ(α) ] · ( α/λ )^α · y^{2α−1} · exp( −(α/λ) · y² )        Equation (H5)

Here α is a shape parameter and λ is a scale parameter. If these parameters are chosen:

\alpha = \frac{k}{2} \qquad\text{and}\qquad \lambda = k \cdot a

With these choices, the Nakagami distribution becomes a Chi-k distribution. It is apparent that

this former distribution is a generalization of the latter, where the shape parameter is not

restricted to integers or an integer plus one-half.

If one is dealing with a set of data that is suspected of coming from a Chi-k distribution, but

the value of k is uncertain, this more general form may be useful for estimating the value of

the shape parameter. The likelihood function for the Nakagami distribution is:

L(y;\alpha,\lambda) = \frac{2^{n}\,\alpha^{n\alpha}}{\Gamma(\alpha)^{n}\,\lambda^{n\alpha}} \prod_{1}^{n} y_i^{\,2\alpha-1}\, \exp\!\left(-\frac{\alpha}{\lambda}\sum_{1}^{n} y_i^{2}\right)

Since two parameters need to be estimated, we ask if there are two sufficient statistics for this

likelihood function. We can apply the Factorization theorem for multiple parameters. In the

case of the above likelihood function note that:


\left(\prod_{1}^{n} y_i\right)^{2\alpha-1} = \prod_{1}^{n} y_i^{\,2\alpha-1}

Thus, the data are represented by the two sufficient statistics:

T_{s1} = \sum_{1}^{n} y_i^{2} = n \cdot M_2' \qquad\text{and}\qquad T_{s2} = \prod_{1}^{n} y_i

The first of these is the usual summation (moment) statistic for the squares of the data values. The second is a product statistic.

In order to determine the mean square error of the estimators, it would be necessary to know the variance of these two statistics. Appendix C and the main report discuss how the variance of summation statistics can be calculated. Unfortunately, product statistics are much more difficult to deal with.

The maximum likelihood estimators of the two parameters can, however, be developed by the usual method. Consider the log-likelihood function:

\ln L(y;\alpha,\lambda) = n\left[\ln 2 + \alpha\{\ln\alpha - \ln\lambda\} - \ln\Gamma(\alpha)\right] + (2\alpha-1)\sum_{1}^{n}\ln y_i - \frac{\alpha}{\lambda}\sum_{1}^{n} y_i^{2}

Differentiating with respect to λ and equating to zero produces:

-\frac{n\alpha}{\lambda} + \frac{\alpha}{\lambda^{2}}\sum_{1}^{n} y_i^{2} = 0 \qquad\text{or}\qquad \hat{\lambda}_{MLE} = \frac{1}{n}\sum_{1}^{n} y_i^{2} = M_2'

Differentiating with respect to α and equating to zero results in:

n\left[\ln\alpha - \ln\lambda + 1 - \frac{d\ln\Gamma(\alpha)}{d\alpha}\right] + 2\sum_{1}^{n}\ln y_i - \frac{1}{\lambda}\sum_{1}^{n} y_i^{2} = 0

Substituting the expression for λ into the last term reduces that term to n. The derivative of the natural logarithm of the gamma function is called the "psi" or digamma function. This function does not have a simple closed form and is usually represented by the symbol Ψ(x).

This last equation can be solved numerically using a MATLAB function which implements

evaluation of the digamma function. The resulting equation for the solution of α is:

\Psi(\alpha) - \ln\alpha = \frac{2}{n}\sum_{1}^{n}\ln y_i - \ln M_2'

As a practical note, the equation solver may extrapolate to values α ≤ 0. The parameter must always be positive, so any such values need to be reset to a small positive value. Generally, we would expect parameter α ≥ 1.
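A minimal numerical solution can be sketched as follows (stdlib Python for illustration rather than the MATLAB mentioned in the text; the digamma series and the geometric bisection are implementation choices of this sketch, and bracketing the root keeps α strictly positive):

```python
import math, random

def digamma(x):
    """Psi(x) via the recurrence psi(x) = psi(x+1) - 1/x plus an
    asymptotic series, accurate for x > 0."""
    r = 0.0
    while x < 6.0:          # shift the argument up until the series applies
        r -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))

def nakagami_alpha_mle(y, lo=1e-6, hi=1e6, iters=200):
    """Solve Psi(a) - ln(a) = (2/n)*sum(ln y_i) - ln(M2') by bisection."""
    n = len(y)
    rhs = 2.0 * sum(math.log(v) for v in y) / n - math.log(sum(v * v for v in y) / n)
    g = lambda a: digamma(a) - math.log(a) - rhs
    for _ in range(iters):
        mid = math.sqrt(lo * hi)    # geometric midpoint suits the wide bracket
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Chi-4 data (alpha = k/2 = 2) as a sanity check
random.seed(1)
y = [math.sqrt(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(4))) for _ in range(20000)]
alpha_hat = nakagami_alpha_mle(y)   # near 2
```

Because Ψ(α) − ln α increases monotonically from −∞ toward 0 as α grows, a sign change is always bracketed and the iteration cannot wander into α ≤ 0.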

Solving the previous equation gives α̂MLE. The analyst could use this, along with the estimate λ̂MLE, in the Nakagami distribution of Equation (H5). The mean square error


for the estimate of λ is easily found since this involves only the data second moment. Mean

square error of the estimate of α does not have a simple expression.

The method of moments may be applied to the Nakagami distribution to obtain parameter

estimates in terms of summation statistics. The general expression for the p-th moment of this

distribution is found in terms of the gamma function as:

m_{Nak}^{\,p} = \frac{\Gamma(\alpha + p/2)}{\Gamma(\alpha)}\left(\frac{\lambda}{\alpha}\right)^{p/2}

The second moment using this equation, and noting Γ(α+1) = α Γ(α), provides:

m_{Nak}^{\,2} = \frac{\Gamma(\alpha+1)}{\Gamma(\alpha)}\,\frac{\lambda}{\alpha} = \lambda = \hat{\lambda}_{MOM} \cong M_2'

The MOM estimator for λ is the same as the MLE estimator. The first moment could be used for the remaining parameter; however, for simplicity the so-called "inverse normalized variance" estimator is generally used. It results from considering the fourth moment:

m_{Nak}^{\,4} = \frac{\Gamma(\alpha+2)}{\Gamma(\alpha)}\left(\frac{\lambda}{\alpha}\right)^{2} = (1+\alpha)\,\frac{\lambda^{2}}{\alpha}

On rearrangement and substitution:

\hat{\alpha}_{MOM} = \frac{\left(m_{Nak}^{2}\right)^{2}}{m_{Nak}^{4} - \left(m_{Nak}^{2}\right)^{2}} \cong \frac{(M_2')^{2}}{M_4' - (M_2')^{2}}

If required, the mean square error of these estimators can be derived using methods discussed

in Appendix C for functions of moment estimators.

If the data are required to have a Chi-k distribution, it seems reasonable to choose k as the integer closest to 2α̂. The estimate of parameter "aChi" would then be updated using this rounded value in the estimator equation: âChi = M2′ / k_round.
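The two MOM estimates, plus the rounding step for a presumed Chi-k parent, can be sketched as (illustrative Python; the function names are assumptions of this sketch):

```python
import math, random

def nakagami_mom(y):
    """Inverse normalized variance estimator: lambda_hat = M2',
    alpha_hat = M2'^2 / (M4' - M2'^2)."""
    n = len(y)
    m2 = sum(v ** 2 for v in y) / n
    m4 = sum(v ** 4 for v in y) / n
    return m2 ** 2 / (m4 - m2 ** 2), m2      # (alpha_hat, lambda_hat)

def chi_k_from_mom(y):
    """Round 2*alpha_hat to the nearest integer and update a_hat = M2'/k."""
    alpha_hat, lam_hat = nakagami_mom(y)
    k = max(1, round(2.0 * alpha_hat))
    return k, lam_hat / k

# Chi-3 data with a = 1 (so alpha = 1.5, lambda = 3)
rng = random.Random(2)
y = [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))) for _ in range(50000)]
k, a_hat = chi_k_from_mom(y)    # k recovers 3, a_hat near 1
```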


Appendix I

The Inverse Maxwell (IM) distribution is related to the Maxwell distribution in the following

way. If random variable y is Maxwell distributed, random variable ( x =1 / y ) is inverse

Maxwell distributed.

The distribution of x can be derived in the usual manner for single variable function

transformation:

x = \frac{1}{y}, \qquad J = \frac{dx}{dy} = \frac{1}{y^{2}}, \qquad f_x(x) = \frac{f_y(y)}{|J|}, \qquad f_y(y) = \sqrt{\frac{2}{\pi}}\,\frac{y^{2}}{a^{3}}\exp\!\left(-\frac{y^{2}}{2a^{2}}\right)

Substituting the transformed variables in the Maxwell density results in the inverse Maxwell

density function:

f_x(x) = \sqrt{\frac{2}{\pi}}\,\frac{1}{a^{3}x^{4}}\exp\!\left(\frac{-1}{2a^{2}x^{2}}\right) \qquad Equation (I-1)

The distribution function is obtained from direct integration of this equation:

F_x(x) = 1 - \operatorname{erf}\!\left(\frac{1}{\sqrt{2}\,a\,x}\right) + \sqrt{\frac{2}{\pi}}\,\frac{1}{a\,x}\,\exp\!\left(\frac{-1}{2a^{2}x^{2}}\right) \qquad Equation (I-2)

The symbol “erf ” denotes the error function as before. The IM distribution involves only a

“scale” parameter, which leads to some simplification, as noted previously when the Maxwell

distribution was discussed.
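Equations (I-1) and (I-2) are straightforward to code. A sketch (Python for illustration), with a consistency check against the Maxwell distribution function through the identity P(X ≤ x) = 1 − F_Y(1/x):

```python
import math

def im_pdf(x, a):
    """Inverse Maxwell density, Equation (I-1)."""
    return math.sqrt(2.0 / math.pi) / (a ** 3 * x ** 4) * math.exp(-1.0 / (2.0 * a * a * x * x))

def im_cdf(x, a):
    """Inverse Maxwell distribution function, Equation (I-2)."""
    return (1.0 - math.erf(1.0 / (math.sqrt(2.0) * a * x))
            + math.sqrt(2.0 / math.pi) / (a * x) * math.exp(-1.0 / (2.0 * a * a * x * x)))

def maxwell_cdf(y, a):
    """Standard Maxwell distribution function, for cross-checking."""
    return (math.erf(y / (math.sqrt(2.0) * a))
            - math.sqrt(2.0 / math.pi) * (y / a) * math.exp(-y * y / (2.0 * a * a)))

# im_cdf(x, a) agrees with 1 - maxwell_cdf(1/x, a) for any x > 0
check = im_cdf(0.8, 1.0) - (1.0 - maxwell_cdf(1.0 / 0.8, 1.0))
```

A numerical derivative of the distribution function also recovers the density, which is a convenient sanity check on both equations.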

The behavior of the density at the origin can be determined by rewriting the expression and using L'Hôpital's rule twice.

Let z \equiv \frac{1}{x}; then z \to \infty as x \to 0, and

f_x(0) = \lim_{z\to\infty}\sqrt{\frac{2}{\pi}}\,\frac{z^{4}}{a^{3}\exp\!\left(\frac{z^{2}}{2a^{2}}\right)} = \lim_{z\to\infty}\sqrt{\frac{2}{\pi}}\,\frac{4\,z^{2}}{a\,\exp\!\left(\frac{z^{2}}{2a^{2}}\right)} = \lim_{z\to\infty}\sqrt{\frac{2}{\pi}}\,\frac{8\,a}{\exp\!\left(\frac{z^{2}}{2a^{2}}\right)} = 0

A comparison of the IM and the Maxwell distributions is shown in Figure I1 for a = 1. Applications for the IM distribution are not readily available, but the shape of the density function suggests that it may be useful when the data are highly skewed, which often occurs when considering extreme value events.


[Figure I1: Maxwell and inverse Maxwell densities (left) and distribution functions (right) for a = 1.]

The mean and variance of the IM distribution are easily obtained by taking the required

expectation operations:

\mu_1' = E[x] = \left[\frac{1}{a}\sqrt{\frac{2}{\pi}}\,\exp\!\left(\frac{-1}{2a^{2}x^{2}}\right)\right]_{0}^{\infty} = \frac{1}{a}\sqrt{\frac{2}{\pi}}

\mu_2' = E[x^{2}] = \left[-\frac{1}{a^{2}}\operatorname{erf}\!\left(\frac{1}{\sqrt{2}\,a\,x}\right)\right]_{0}^{\infty} = \frac{1}{a^{2}}

\sigma^{2} = \mu_2' - {\mu_1'}^{2} = \frac{1}{a^{2}}\,\frac{\pi-2}{\pi}

The "inverse" effect of the value of the parameter a can be seen from the respective means and variances. As "a" increases in the Maxwell distribution, the mean and variance increase, whereas the IM distribution mean and variance decrease.
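These two moments can be verified numerically by integrating on the Maxwell side, since E[X^p] = E[Y^(−p)] there and the integrand decays like a Gaussian. A rough midpoint-rule check in Python (the quadrature settings are arbitrary choices of this sketch):

```python
import math

def maxwell_pdf(y, a=1.0):
    return math.sqrt(2.0 / math.pi) * y * y / a ** 3 * math.exp(-y * y / (2.0 * a * a))

def im_moment(p, a=1.0, hi=12.0, steps=120000):
    """E[X^p] for the IM distribution, computed as E[Y^(-p)] for Maxwell Y.
    Note the p = 3 integrand blows up like 1/y at the origin, consistent
    with the nonexistent third moment."""
    h = hi / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += maxwell_pdf(y, a) / y ** p * h
    return total

# im_moment(1) is near sqrt(2/pi)/a and im_moment(2) is near 1/a^2
```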

\mu_3' = E[x^{3}] = \left[-\frac{1}{\sqrt{2\pi}\,a^{3}}\,Ei\!\left(\frac{-1}{2a^{2}x^{2}}\right)\right]_{0}^{\infty}

where Ei(·) is the exponential integral. At the upper bound (i.e., as x → ∞), the argument approaches zero, where the exponential integral diverges. Thus, the third moment does not exist. Proceeding in the same fashion, it can be shown that only the first and second moments exist.

The mode of the inverse Maxwell distribution is found by setting the derivative of the density

function to zero:


\frac{d f_x}{dx} = 0 \;\Rightarrow\; \frac{-4}{x^{5}} + \frac{1}{a^{2}x^{7}} = 0 \qquad\text{or}\qquad x_{mo} = \frac{1}{2a}

The CRLB for this distribution can be calculated in a similar manner to that for the Maxwell

distribution. Recall:

CRLB = \frac{1}{n\,E\!\left[\left(\frac{\partial \ln f_X(x)}{\partial a}\right)^{2}\right]}

\ln f_x(x) = \frac{1}{2}\ln\frac{2}{\pi} - 3\ln a - 4\ln x - \frac{1}{2a^{2}x^{2}} \qquad\text{and}\qquad \frac{\partial \ln f_X(x)}{\partial a} = -\frac{3}{a} + \frac{1}{a^{3}x^{2}}

E\!\left[\left(\frac{\partial \ln f_X(x)}{\partial a}\right)^{2}\right] = \sqrt{\frac{2}{\pi}}\int_{0}^{\infty}\left[\frac{9}{a^{5}x^{4}} - \frac{6}{a^{7}x^{6}} + \frac{1}{a^{9}x^{8}}\right]\exp\!\left(\frac{-1}{2a^{2}x^{2}}\right)dx = \sqrt{\frac{2}{\pi}}\,[I_1 + I_2 + I_3]

The first integral is (invoking L'Hôpital's rule for the evaluation at the lower limit):

I_1 = \frac{9}{a^{5}}\int_{0}^{\infty} x^{-4}\exp\!\left(\frac{-1}{2a^{2}x^{2}}\right)dx = \sqrt{\frac{\pi}{2}}\,\frac{9}{a^{2}}

Likewise:

I_2 = -\sqrt{\frac{\pi}{2}}\,\frac{18}{a^{2}} \qquad I_3 = \sqrt{\frac{\pi}{2}}\,\frac{15}{a^{2}}

The CRLB for the inverse Maxwell distribution is:

CRLB = \frac{a^{2}}{6\,n} \qquad Equation (I-3)

The maximum likelihood estimator for the IM density is found from the log-likelihood:

\ln L(x;a) = \frac{n}{2}\ln\frac{2}{\pi} - 3n\ln a - 4\sum_{1}^{n}\ln x_i - \frac{1}{2a^{2}}\sum_{1}^{n}\frac{1}{x_i^{2}}

Differentiating with respect to parameter “a” and equating to zero results in:

-\frac{3n}{a} + \frac{1}{a^{3}}\sum_{1}^{n}\frac{1}{x_i^{2}} = 0 \qquad\text{or}\qquad \hat{a}_{MLE}^{\,2} = \frac{1}{3n}\sum_{1}^{n}\frac{1}{x_i^{2}}

For this distribution, the statistic T_s = \sum_{1}^{n} x_i^{-2} is seen to be sufficient. It was proved that for the Maxwell distribution, this estimator is efficient for any sample size n. The method there


relied on the fact that the MLE estimator was a function of the second sample moment of the

data. For the inverse Maxwell, this is not the case. The statistic cannot be expressed in terms

of any sample moment.

Although an exact expression for the MSE is not available, likelihood theory shows that MLEs are asymptotically efficient. Thus, for large enough sample sizes, the CRLB can be used as an estimate of the mean square error of this estimator.
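A quick sketch of the MLE alongside its asymptotic CRLB variance (illustrative Python; the variates are generated through the reciprocal root-sum-square of three normals, the inverse relationship noted later in this appendix):

```python
import math, random

def im_mle(x):
    """a_hat^2 = (1/(3n)) * sum(1/x_i^2), from the likelihood equation above."""
    return math.sqrt(sum(1.0 / v ** 2 for v in x) / (3.0 * len(x)))

# Generate IM variates as 1/sqrt(Z1^2 + Z2^2 + Z3^2), Z_i ~ N(0, a)
rng = random.Random(5)
a = 1.0
x = [1.0 / math.sqrt(sum(rng.gauss(0.0, a) ** 2 for _ in range(3))) for _ in range(50000)]
a_hat = im_mle(x)               # close to a
crlb = a ** 2 / (6 * len(x))    # asymptotic variance bound for the estimator
```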

Moment Estimator

It is tempting to use the method of moments to estimate the parameter "a". It would seem that the second data moment would provide the estimate:

\hat{a}_{MOM} = \frac{1}{\sqrt{M_2'}} \qquad\text{and}\qquad MSE(\hat{a}_{MOM}) = \left(\frac{\partial \hat{a}}{\partial M_2'}\right)^{2}\frac{\mu_4' - {\mu_2'}^{2}}{n}

The MSE of this estimator is infinite because the fourth moment of the density is not defined.

Using the first data moment instead:

\hat{a}_{MOM} = \sqrt{\frac{2}{\pi}}\,\frac{1}{M_1'} \qquad\text{and}\qquad MSE(\hat{a}_{MOM}) = \left(\frac{\partial \hat{a}}{\partial M_1'}\right)^{2}\frac{1}{n}\left(\mu_2' - {\mu_1'}^{2}\right)

The MSE does exist for this estimator. Carrying out the evaluation results in:

MSE(\hat{a}_{MOM}) = \frac{2}{\pi}\,\frac{1}{{M_1'}^{4}}\,\frac{1}{n\,a^{2}}\left(1 - \frac{2}{\pi}\right) \approx \frac{a^{2}}{1.7519\,n}

eff_{MOM} = \frac{CRLB}{MSE(\hat{a}_{MOM})} = \frac{(a^{2}/6n)}{(a^{2}/1.7519\,n)} = 0.2919

This is a very low efficiency estimator. The MLE should be used when estimating the

parameter of an inverse Maxwell distribution.
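The efficiency figure can be checked by simulation. A rough sketch (Python; the sample size and replication count are arbitrary choices of this sketch) comparing the empirical MSEs of the two estimators:

```python
import math, random

def im_sample(a, n, rng):
    # 1/sqrt(Z1^2 + Z2^2 + Z3^2), Z_i ~ N(0, a), is inverse-Maxwell distributed
    return [1.0 / math.sqrt(sum(rng.gauss(0.0, a) ** 2 for _ in range(3)))
            for _ in range(n)]

def a_mle(x):
    return math.sqrt(sum(1.0 / v ** 2 for v in x) / (3.0 * len(x)))

def a_mom(x):
    return math.sqrt(2.0 / math.pi) / (sum(x) / len(x))  # sqrt(2/pi) / M1'

rng, a, n, reps = random.Random(7), 1.0, 100, 2000
mse_mle = mse_mom = 0.0
for _ in range(reps):
    x = im_sample(a, n, rng)
    mse_mle += (a_mle(x) - a) ** 2 / reps
    mse_mom += (a_mom(x) - a) ** 2 / reps
ratio = mse_mle / mse_mom   # typically lands near the 0.29 efficiency above
```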

Quantile estimators depend only on the distribution and the relation between quantiles and the

estimator. As such, these estimators could be derived if required. Further, Bayesian parameter

estimation could be done if desired.

When a large number of IM-distributed random variates are required, it is faster to use the inverse relationship:

X_{IM} = \frac{1}{\sqrt{Z_1^{2} + Z_2^{2} + Z_3^{2}}} \qquad\text{where}\quad Z_i \sim N(0,a), \quad i = 1, 2, 3


Since the distribution function involves only a scale parameter, it can be expressed in standard form by defining t \equiv \sqrt{2}\,a\,x:

F_x(t) = 1 - \operatorname{erf}\!\left(\frac{1}{t}\right) + \frac{2}{\sqrt{\pi}\,t}\,\exp\!\left(\frac{-1}{t^{2}}\right)

A general inversion algorithm for the scaled variates can be developed from this. Random

variates from specific distributions are calculated from the scaled values.
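One such inversion scheme can be sketched as a bisection on the standard-form distribution function (Python for illustration; the bracket and iteration count are arbitrary choices), unscaling the result via x = t/(√2·a):

```python
import math

def im_cdf_standard(t):
    """Standard-form IM distribution function in t = sqrt(2)*a*x."""
    return (1.0 - math.erf(1.0 / t)
            + 2.0 / (math.sqrt(math.pi) * t) * math.exp(-1.0 / (t * t)))

def im_quantile(p, a=1.0, lo=1e-6, hi=1e6, iters=100):
    """Invert the scaled CDF by bisection, then unscale to the a-parameterized x."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)   # geometric midpoint for the wide bracket
        if im_cdf_standard(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / (math.sqrt(2.0) * a)

median = im_quantile(0.5)   # feeding this back through the CDF recovers 0.5
```

The same inversion supplies the quantile grid needed for probability paper or for Monte Carlo evaluation of order-statistic-based tests.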

The ability to make the distribution function non-dimensional allows for development of

goodness-of-fit tests such as the Anderson-Darling or correlation type if desired. Also, tests

for outlier rejection can be developed using the same likelihood ratio simulation techniques as

applied for the Maxwell distribution.

Probability plotting paper can be developed easily for this distribution, using the same

procedure as applied for the Maxwell distribution.


