
Determination of Probability Distribution Models

• Distribution may be obtained based on properties of the
  physical process underlying a random phenomenon
  – e.g. ocean wave heights
• Distribution may be determined based on available
observational data
– e.g. based on a frequency diagram for a set of data, the
distribution model may be inferred by visually comparing
the frequency diagram with a particular PDF
• An assumed or predicted probability distribution may be
  verified or disproved using a goodness-of-fit test

Goodness of Fit
When a theoretical distribution has been assumed (e.g.
based on the general shape of histogram, or physical nature
of the problem), the validity of the assumed distribution may
be verified or disproved statistically by goodness of fit tests
- Chi-square method (we will cover this)
- Kolmogorov-Smirnov test (not covered here)
- Anderson-Darling test (not covered here)

Compare the theoretical (expected) distribution with the
observed frequency distribution, and judge whether the
differences can be attributed to chance.

Hypothesis testing
During 400 5-min intervals the air traffic control of an airport
received 0, 1, 2, …, or 13 radio messages with respective frequencies
of 3, 15, 47, 76, 68, 74, 46, 39, 15, 9, 5, 2, 0, and 1. We want to
check whether these data substantiate the claim that the number of
radio messages received during a 5-min interval may be regarded as
a r.v. having a Poisson distribution with λ = 4.6.

A general rule of thumb is that expected frequencies should be > 5.
We can achieve this by combining some of the data. (But in the exam,
don't worry about this; we will tell you how to combine if needed.)
Note: observed frequencies can be < 5.

Poisson Distribution

P(X = x in t) = ((νt)^x / x!) e^(−νt),  x = 0, 1, 2, …

λ = 4.6 = νt

ν – mean rate of occurrence, t – duration

Poisson Probability

For x = 0, P = 0.01
For x = 1, P = 0.046
For x = 2, P = 0.107
…
For x = 13, P should be calculated as P(X ≥ 13):
P(X ≥ 13) = 1 − P(X ≤ 12) = 1 − P(X = 0) − P(X = 1) − … = 0.001
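As a check, these probabilities can be reproduced directly from the Poisson pmf. A minimal Python sketch (note the slide's values 0.01, 0.046, 0.107 and 0.001 are table-rounded, so exact values differ slightly in the third decimal):

```python
from math import exp, factorial

LAM = 4.6  # Poisson parameter from the example


def poisson_pmf(x, lam=LAM):
    """P(X = x) for a Poisson random variable with mean lam."""
    return (lam ** x) * exp(-lam) / factorial(x)


p = [poisson_pmf(x) for x in range(13)]  # P(X = 0) ... P(X = 12)
tail = 1.0 - sum(p)                      # P(X >= 13) = 1 - P(X <= 12)
```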
Statistic for Testing Goodness of Fit

χ² = Σ_{i=1}^{k} (o_i − e_i)² / e_i

The sampling distribution of this statistic is approximately the
χ² distribution with (k − 1 − m) d.o.f.

k – the number of terms in the formula (intervals/categories)

m – the number of parameters that are unknown and
estimated from the data.

Critical region: χ² ≥ χ²_α, at level of significance α

χ² < χ²_α : small χ² value, good fit → H0 not rejected

χ² ≥ χ²_α : large χ² value, poor fit → reject H0

Example (cont’d)
Test at the 0.01 level of significance whether the data can be viewed
as values of a random variable having the Poisson distribution
with λ = 4.6.
1. H0: Random variable follows a Poisson distribution with λ = 4.6
   H1: Random variable does not follow a Poisson distribution
   with λ = 4.6
2. α = 0.01
3. Criterion: Reject H0 if χ² ≥ χ²_0.01 = 21.666.
   Here, the parameter is given (λ = 4.6), so the number of parameters
   estimated from the data is m = 0.
   d.o.f. = k − 1 − m = 10 − 1 − 0 = 9, where χ² = Σ_{i=1}^{k} (o_i − e_i)² / e_i
4. Calculation:
   χ² = (18 − 22.4)²/22.4 + … + (8 − 8)²/8 = 6.749
5. Decision: χ² < χ²_0.01, so H0 cannot be rejected.
Example (cont’d)
Test at the 0.01 level of significance whether the data can be viewed
as values of a random variable having the Poisson distribution
with λ = 4.6.
However, if we wanted to test whether the data could have
arisen from a Poisson distribution, without specifying λ,
and we estimated λ from the data, then
m = 1, and d.o.f. = k − 1 − m = 10 − 1 − 1 = 8.

How to calculate λ from the data

λ̂ = x̄ = Σ_{i=1}^{N} x_i p_X(x_i)
 = 0(3/400) + 1(15/400) + 2(47/400) + … + 13(1/400)
 = 4.535, which is close to, but not exactly equal to, 4.6.
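This sample mean is a one-line check using the frequencies from the slide:

```python
# Observed frequencies for x = 0..13 messages (400 intervals total)
freq = [3, 15, 47, 76, 68, 74, 46, 39, 15, 9, 5, 2, 0, 1]
n = sum(freq)  # 400

# Sample mean = sum of x times its relative frequency
lam_hat = sum(x * f for x, f in enumerate(freq)) / n
```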

Example

The figure shows the crushing strength of concrete cubes. The
normal and lognormal pdfs are fitted to the data, and the
parameters are estimated using the method of moments.

Perform a chi-squared test to determine the validity of the
pdfs at the α = 0.05 level of significance.
Example

For both the normal and lognormal distributions, m = 2 parameters
are estimated; k = 8 (number of intervals).

Hence, d.o.f. = 8 − 1 − 2 = 5. At α = 0.05, χ²_0.05 = 11.07.

Note: in the exam, the intervals will be provided. Don't worry about
expected frequencies > 5, or having to combine intervals.

Example

Normal distribution:
χ² = Σ_{i=1}^{k} (o_i − e_i)²/e_i = 10.73 < 11.07 → OK, but marginal

Lognormal distribution:
χ² = Σ_{i=1}^{k} (o_i − e_i)²/e_i = 7.97 < 11.07 → clearly OK

Hence, neither the normal nor the lognormal distribution can be
rejected at α = 0.05. However, the test shows that the lognormal
fits the data better than the normal distribution.
Appendix

How to calculate theoretical frequencies for normal and
lognormal distributions

Appendix
Normal distribution
Parameters: μ = 7.5, σ = 0.53 (estimated from raw data, which
is not shown in this example)

Bin 1: P(X ≤ 6.75) = Φ((6.75 − 7.5)/0.53) = Φ(−1.42)
 = 1 − 0.9222 = 0.0778
Theoretical frequency, e1 = 0.0778 × 143 = 11.1

Bin 2: P(6.75 < X ≤ 7.00) = Φ((7 − 7.5)/0.53) − Φ((6.75 − 7.5)/0.53)
 = 0.1736 − 0.0778 = 0.0958
Theoretical frequency, e2 = 0.0958 × 143 = 13.7

Slightly different from 13.2 due to round-off error when
using statistical tables, but don't worry about this.
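These bin probabilities can be reproduced with the exact normal CDF instead of tables. A sketch using Python's math.erf (the exact values, e1 ≈ 11.2 and e2 ≈ 13.5, differ slightly from the table-based 11.1 and 13.7):

```python
from math import erf, sqrt


def phi(z):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


mu, sigma, n = 7.5, 0.53, 143  # parameters and sample size from the slide

# Bin 1: P(X <= 6.75)
p1 = phi((6.75 - mu) / sigma)
e1 = n * p1

# Bin 2: P(6.75 < X <= 7.00)
p2 = phi((7.00 - mu) / sigma) - p1
e2 = n * p2
```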
Appendix

Lognormal distribution

First derive the lognormal parameters λ and ζ from μ and σ:

ζ² = ln(1 + σ²/μ²) = ln(1 + 0.53²/7.5²) → ζ = 0.0706
λ = ln(μ) − 0.5ζ² = ln(7.5) − 0.5 × 0.0706² = 2.012

Bin 1: P(X ≤ 6.75) = Φ((ln(6.75) − 2.012)/0.0706) = Φ(−1.45)
 = 1 − 0.9265 = 0.0735

Theoretical frequency, e1 = 0.0735 × 143 = 10.5

Appendix

Lognormal distribution

Bin 2:
P(6.75 < X ≤ 7.00) = Φ((ln(7) − 2.012)/0.0706) − Φ((ln(6.75) − 2.012)/0.0706)
 = 0.1736 − 0.0735 = 0.100
Theoretical frequency, e2 = 0.100 × 143 = 14.3

Again, e1 and e2 are slightly different from the actual values due to
round-off errors. Remember that the statistical tables only use 2
decimal places for z (actually quite crude).
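The lognormal parameters and bin frequencies can likewise be checked numerically. A sketch with exact CDF values (these give e1 ≈ 10.4 and e2 ≈ 14.4, close to the slide's table-based 10.5 and 14.3):

```python
from math import erf, log, sqrt


def phi(z):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


mu, sigma, n = 7.5, 0.53, 143  # sample moments and size from the slide

# Lognormal parameters from the method-of-moments relations above
zeta = sqrt(log(1.0 + (sigma / mu) ** 2))  # ~0.0706
lam = log(mu) - 0.5 * zeta ** 2            # ~2.012

# Bin 1: P(X <= 6.75); Bin 2: P(6.75 < X <= 7.00)
p1 = phi((log(6.75) - lam) / zeta)
p2 = phi((log(7.00) - lam) / zeta) - p1
e1, e2 = n * p1, n * p2
```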
