
Determination of Probability Distribution Models

• Distribution may be obtained based on properties of the
  physical process underlying a random phenomenon
  – e.g. ocean wave heights
• Distribution may be determined based on available
observational data
– e.g. based on a frequency diagram for a set of data, the
distribution model may be inferred by visually comparing
the frequency diagram with a particular PDF
• An assumed or predicted probability distribution may be
  verified or disproved using a goodness-of-fit test

Goodness of Fit
When a theoretical distribution has been assumed (e.g.
based on the general shape of histogram, or physical nature
of the problem), the validity of the assumed distribution may
be verified or disproved statistically by goodness of fit tests
- Chi-square method (we will cover this)
- Kolmogorov-Smirnov test (not covered here)
- Anderson-Darling test (not covered here)

Compare the theoretical (expected) distribution with the
observed frequency distribution, and judge whether the
differences can be attributed to chance.

Hypothesis testing
During 400 5-min intervals the air traffic control of an airport
received 0, 1, 2, …, or 13 radio messages with respective frequencies
of 3, 15, 47, 76, 68, 74, 46, 39, 15, 9, 5, 2, 0, and 1. We want to
check whether these data substantiate the claim that the number of
radio messages received during a 5-min interval may be regarded as
a r.v. having a Poisson distribution with λ = 4.6.

A general rule of thumb is that expected frequencies should be > 5.
We can achieve this by combining some of the data. (But in the exam,
don't worry about this; we will tell you how to combine if needed.)
Note: observed frequencies can be < 5.

Poisson Distribution

P(X = x in t) = ((νt)^x / x!) e^(−νt),  x = 0, 1, 2, …

λ = 4.6 = νt

ν – mean rate of occurrence, t – duration

Poisson Probability

For x = 0, P = 0.01
For x = 1, P = 0.046
For x = 2, P = 0.107
…
For x = 13, P should be calculated as P(X ≥ 13):
P(X ≥ 13) = 1 − P(X ≤ 12) = 1 − P(X = 0) − P(X = 1) − … = 0.001
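As a check, these probabilities can be reproduced directly from the Poisson pmf. A minimal Python sketch (note the slide's values 0.01, 0.046, 0.107 and 0.001 are table-rounded, so exact values differ slightly in the third decimal):

```python
from math import exp, factorial

LAM = 4.6  # Poisson parameter from the example


def poisson_pmf(x, lam=LAM):
    """P(X = x) for a Poisson random variable with mean lam."""
    return (lam ** x) * exp(-lam) / factorial(x)


p = [poisson_pmf(x) for x in range(13)]  # P(X = 0) ... P(X = 12)
tail = 1.0 - sum(p)                      # P(X >= 13) = 1 - P(X <= 12)
```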
Statistic for Testing Goodness of Fit

χ² = Σ_{i=1}^{k} (o_i − e_i)² / e_i

The sampling distribution of this statistic is approximately the
χ² distribution with (k − 1 − m) d.o.f.

k – the number of terms in the formula (intervals/categories)

m – the number of parameters that are unknown and
estimated from the data.

Critical region: χ² ≥ χ²_α, at level of significance α

χ² < χ²_α : small χ² value, good fit → H0 not rejected

χ² ≥ χ²_α : large χ² value, poor fit → reject H0

Example (cont’d)
Test at the 0.01 level of significance whether the data can be viewed
as values of a random variable having the Poisson distribution
with λ = 4.6.
1. H0: Random variable follows a Poisson distribution with λ = 4.6
   H1: Random variable does not follow a Poisson distribution
   with λ = 4.6
2. α = 0.01
3. Criterion: Reject H0 if χ² ≥ χ²_0.01 = 21.666.
   Here, the parameter is given (λ = 4.6), so the number of parameters
   estimated from the data is m = 0.
   d.o.f. = k − 1 − m = 10 − 1 − 0 = 9, where χ² = Σ_{i=1}^{k} (o_i − e_i)² / e_i
4. Calculation:
   χ² = (18 − 22.4)²/22.4 + … + (8 − 8)²/8 = 6.749
5. Decision: χ² < χ²_0.01, so H0 cannot be rejected.
Example (cont’d)
Test at the 0.01 level of significance whether the data can be viewed
as values of a random variable having the Poisson distribution
with λ = 4.6.
However, if we wanted to test whether the data could have
arisen from a Poisson distribution, without specifying λ,
and we estimated λ from the data, then
m = 1, and d.o.f. = k − 1 − m = 10 − 1 − 1 = 8.

How to calculate λ from the data

λ̂ = x̄ = Σ_{i=1}^{N} x_i p_X(x_i)
 = 0(3/400) + 1(15/400) + 2(47/400) + … + 13(1/400)
 = 4.535, which is close to, but not exactly equal to, 4.6.
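This sample mean is a one-line check using the frequencies from the slide:

```python
# Observed frequencies for x = 0..13 messages (400 intervals total)
freq = [3, 15, 47, 76, 68, 74, 46, 39, 15, 9, 5, 2, 0, 1]
n = sum(freq)  # 400

# Sample mean = sum of x times its relative frequency
lam_hat = sum(x * f for x, f in enumerate(freq)) / n
```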

Example

The figure shows the crushing strength of concrete cubes. The
normal and lognormal pdfs are fitted to the data, and the
parameters are estimated using the method of moments.

Perform a chi-squared test to determine the validity of the
pdfs at the α = 0.05 level of significance.
Example

For both the normal and lognormal distributions, m = 2 parameters
are estimated; k = 8 (number of intervals).

Hence, d.o.f. = 8 − 1 − 2 = 5. At α = 0.05, χ²_0.05 = 11.07.

Note: in the exam, the intervals will be provided. Don't worry about
expected frequencies > 5, or having to combine intervals.

Example

Normal distribution:
χ² = Σ_{i=1}^{k} (o_i − e_i)²/e_i = 10.73 < 11.07 → OK, but marginal

Lognormal distribution:
χ² = Σ_{i=1}^{k} (o_i − e_i)²/e_i = 7.97 < 11.07 → clearly OK

Hence, neither the normal nor the lognormal distribution can be
rejected at α = 0.05. However, the test shows that the lognormal
fits the data better than the normal distribution.
Appendix

How to calculate theoretical frequencies for normal and
lognormal distributions

Appendix
Normal distribution
Parameters: μ = 7.5, σ = 0.53 (estimated from raw data, which
is not shown in this example)

Bin 1: P(X ≤ 6.75) = Φ((6.75 − 7.5)/0.53) = Φ(−1.42)
 = 1 − 0.9222 = 0.0778
Theoretical frequency, e1 = 0.0778 × 143 = 11.1

Bin 2: P(6.75 < X ≤ 7.00) = Φ((7 − 7.5)/0.53) − Φ((6.75 − 7.5)/0.53)
 = 0.1736 − 0.0778 = 0.0958
Theoretical frequency, e2 = 0.0958 × 143 = 13.7

Slightly different from 13.2 due to round-off error when
using statistical tables, but don't worry about this.
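These bin probabilities can be reproduced with the exact normal CDF instead of tables. A sketch using Python's math.erf (the exact values, e1 ≈ 11.2 and e2 ≈ 13.5, differ slightly from the table-based 11.1 and 13.7):

```python
from math import erf, sqrt


def phi(z):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


mu, sigma, n = 7.5, 0.53, 143  # parameters and sample size from the slide

# Bin 1: P(X <= 6.75)
p1 = phi((6.75 - mu) / sigma)
e1 = n * p1

# Bin 2: P(6.75 < X <= 7.00)
p2 = phi((7.00 - mu) / sigma) - p1
e2 = n * p2
```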
Appendix

Lognormal distribution

First derive the lognormal parameters λ and ζ from μ and σ:

ζ² = ln(1 + σ²/μ²) = ln(1 + 0.53²/7.5²) → ζ = 0.0706
λ = ln(μ) − 0.5ζ² = ln(7.5) − 0.5 × 0.0706² = 2.012

Bin 1: P(X ≤ 6.75) = Φ((ln(6.75) − 2.012)/0.0706) = Φ(−1.45)
 = 1 − 0.9265 = 0.0735

Theoretical frequency, e1 = 0.0735 × 143 = 10.5

Appendix

Lognormal distribution

Bin 2:
P(6.75 < X ≤ 7.00) = Φ((ln(7) − 2.012)/0.0706) − Φ((ln(6.75) − 2.012)/0.0706)
 = 0.1736 − 0.0735 = 0.100
Theoretical frequency, e2 = 0.100 × 143 = 14.3

Again, e1 and e2 are slightly different from the actual values due to
round-off errors. Remember that the statistical tables only use 2
decimal places for z (actually quite crude).
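The lognormal parameters and bin frequencies can likewise be checked numerically. A sketch with exact CDF values (these give e1 ≈ 10.4 and e2 ≈ 14.4, close to the slide's table-based 10.5 and 14.3):

```python
from math import erf, log, sqrt


def phi(z):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


mu, sigma, n = 7.5, 0.53, 143  # sample moments and size from the slide

# Lognormal parameters from the method-of-moments relations above
zeta = sqrt(log(1.0 + (sigma / mu) ** 2))  # ~0.0706
lam = log(mu) - 0.5 * zeta ** 2            # ~2.012

# Bin 1: P(X <= 6.75); Bin 2: P(6.75 < X <= 7.00)
p1 = phi((log(6.75) - lam) / zeta)
p2 = phi((log(7.00) - lam) / zeta) - p1
e1, e2 = n * p1, n * p2
```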
