
• Review of probability and statistics

• Input data analysis (input distribution modeling)

Purpose & Overview

Steps:

Data Collection
Data Collection and Input Modeling
• Information is gathered in the form of samples, or collections of
observations.
• Samples are collected from populations that are collections of all
individuals or individual items of a particular type.

Copyright © 2010 Pearson Addison-Wesley. All rights reserved.


Simple Random Sampling
“Sample size” simply means the number of elements in the sample.

Simple Random Sampling:


• Is a procedure for sampling from a population in which the selection of a
sample unit is based on chance.
• Implies that any particular sample of a specified sample size has the same
chance of being selected as any other sample with the same sample size.
• The importance of proper sampling revolves around the degree of confidence
with which the analyst is able to answer the questions being asked.

Identifying the distribution
• Review of common distribution
• Histograms
• Scatter diagrams

Simulation Modeling and Analysis – Chapter 1 – Basic Simulation Modeling


Review of Common Discrete Probability Distributions
• Commonly Used Discrete Distributions
– Discrete Uniform Distribution
– Binomial
– Poisson
– Other Distributions
Discrete Uniform Distribution
Poisson Distribution
Summary of Discrete Probability Distributions
Review of Common Continuous Probability Distributions
• Continuous distributions:
– Uniform
– Normal
– Lognormal
– Exponential
– Other Continuous Probability Distributions
  • Triangular, Gamma, Rayleigh and Beta
  • Statistics: Chi-square, Student-t and F distributions
  • Extreme Value Distributions
Continuous Uniform Distribution
Normal Distribution
Exponential Distribution
Triangular Distribution
Identifying the distribution
• Review of common distribution
• Histograms
• Scatter diagrams



Histogram/Frequency Distribution
• A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data.
• A histogram is a graphical representation of the distribution of numerical data and can be used as an estimate of the probability distribution of a continuous variable.
Histogram/Frequency Distribution (discrete)
• Discrete data comprise a listing of the observed values.
• An example will be used to describe how to construct a frequency distribution.
Histogram/Frequency Distribution (discrete)
[Figure: discrete histogram labeled "Poisson Distribution"]
Building a Histogram (Discrete): Example
Raw Data (arrivals per period)
10 8 5 1 6 0 4 6 2 3
2 3 5 9 2 0 2 4 2 3
5 1 8 9 1 9 3 7 4 0
2 6 3 1 4 5 0 3 3 2
2 10 0 3 6 0 6 5 7 0
8 2 3 7 0 2 2 1 0 4
0 2 4 1 2 5 1 5 3 2
8 6 3 4 6 11 3 2 8 0
2 4 2 4 1 3 1 2 1 2
3 10 0 7 3 5 3 7 3 4
Building a Histogram (Discrete)

Arrivals per period   Frequency
0                     12
1                     10
2                     19
3                     17
4                     10
5                     8
6                     7
7                     5
8                     5
9                     3
10                    3
11                    1

[Figure: Histogram of Arrivals per Period (x: arrivals per period, 0 to 11; y: frequency, 0 to 20), labeled "Poisson Distribution"]
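The counting step behind the table above can be sketched in a few lines of Python: tally how often each value occurs in the raw data and print one text bar per value. This is a minimal sketch; the names `arrivals`, `frequency_table` and `freq` are illustrative, and the text bars only stand in for the plotted histogram.

```python
from collections import Counter

# Raw data: number of arrivals in each of 100 periods (as listed above)
arrivals = [
    10, 8, 5, 1, 6, 0, 4, 6, 2, 3,
    2, 3, 5, 9, 2, 0, 2, 4, 2, 3,
    5, 1, 8, 9, 1, 9, 3, 7, 4, 0,
    2, 6, 3, 1, 4, 5, 0, 3, 3, 2,
    2, 10, 0, 3, 6, 0, 6, 5, 7, 0,
    8, 2, 3, 7, 0, 2, 2, 1, 0, 4,
    0, 2, 4, 1, 2, 5, 1, 5, 3, 2,
    8, 6, 3, 4, 6, 11, 3, 2, 8, 0,
    2, 4, 2, 4, 1, 3, 1, 2, 1, 2,
    3, 10, 0, 7, 3, 5, 3, 7, 3, 4,
]

def frequency_table(data):
    """Return {value: frequency} for every integer from min(data) to max(data)."""
    counts = Counter(data)
    return {x: counts.get(x, 0) for x in range(min(data), max(data) + 1)}

freq = frequency_table(arrivals)
for value, count in freq.items():
    # One '*' per observation gives a text version of the histogram
    print(f"{value:>2} {count:>3} {'*' * count}")
```

Listing every integer between the smallest and largest value, including any with zero frequency, keeps gaps visible in the histogram instead of silently dropping them.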
Step 2.1: Identify the Probability Distribution
• Histogram with Continuous Data
Raw Data (component life, days; 50 observations)
79.919 3.081 0.062 1.961 5.845
3.027 6.505 0.021 0.013 0.123
6.769 59.899 1.192 34.760 5.009
18.387 0.141 43.565 24.420 0.433
144.695 2.663 17.967 0.091 9.003
0.941 0.878 3.148 2.157 7.579
0.624 5.380 3.371 7.078 23.960
0.590 1.928 0.300 0.002 0.543
7.004 31.764 1.005 1.147 0.219
3.217 14.382 1.008 2.336 4.562

Component Life (days)   Frequency
[0-3)                   23
[3-6)                   10
[6-9)                   5
[9-12)                  1
[12-15)                 1
[15-18)                 2
[18-21)                 0
[21-24)                 1
[24-27)                 1
[27-30)                 0
[30-33)                 1
[33-36)                 1
...                     ...
[42-45)                 1
...                     ...
[57-60)                 1
...                     ...
[78-81)                 1
...                     ...
[144-147)               1

[Figure: Histogram of Component Life (x: component life in days, 3 to 36; y: frequency, 0 to 25), labeled "exponential Distribution"]
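For continuous data the extra step is choosing class intervals. A minimal Python sketch, assuming equal-width half-open intervals [3k, 3k + 3) as in the table above (the names `life`, `bin_counts` and `counts` are illustrative). Applied to the 50 values exactly as transcribed here, it gives 24 in [0-3), 9 in [3-6) and 1 in each of [15-18) and [18-21), slightly different from the tabulated 23, 10, 2 and 0, so the slide's counts may rest on a slightly different version of the data.

```python
import math

# Component life (days), 50 observations as transcribed above
life = [
    79.919, 3.081, 0.062, 1.961, 5.845,
    3.027, 6.505, 0.021, 0.013, 0.123,
    6.769, 59.899, 1.192, 34.760, 5.009,
    18.387, 0.141, 43.565, 24.420, 0.433,
    144.695, 2.663, 17.967, 0.091, 9.003,
    0.941, 0.878, 3.148, 2.157, 7.579,
    0.624, 5.380, 3.371, 7.078, 23.960,
    0.590, 1.928, 0.300, 0.002, 0.543,
    7.004, 31.764, 1.005, 1.147, 0.219,
    3.217, 14.382, 1.008, 2.336, 4.562,
]

def bin_counts(data, width):
    """Count observations in half-open intervals [k*width, (k+1)*width)."""
    n_bins = math.floor(max(data) / width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor(x / width)] += 1
    return counts

counts = bin_counts(life, 3.0)
for k, c in enumerate(counts):
    if c:  # print only non-empty intervals, like the "..." rows in the table
        print(f"[{3 * k}-{3 * k + 3}) {c}")
```

The long right tail (a single value above 144 days) is what makes an exponential shape plausible here; a different interval width can change the visual impression, so it is worth trying more than one.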
Scatter Diagram
• A scatter diagram is a graphical presentation of the relationship between two variables. One variable, which is usually the controllable one, is placed on the x axis and the other, or dependent, variable is placed on the y axis.
Selecting the Family of Distributions
Estimators for Obtaining Distribution Parameters


Sampling Statistics
• Using sample data to estimate population parameters
• Sample statistics → population parameters
• Point estimation: point estimation is concerned with the calculation of a single number from a set of observed data to represent a parameter of the underlying population.
• The decision process must also be capable of reflecting the risk of making an incorrect decision.
• This decision making can be done using hypothesis testing, but the statistic must first be represented by its sampling distribution.
Measures of Location and Variability: Grouped and Ungrouped Data
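The formulas on this slide appear only in a figure, so the following Python sketch assumes the usual textbook forms: for ungrouped data, $\bar{X} = \sum_i X_i / n$ and $S^2 = (\sum_i X_i^2 - n\bar{X}^2)/(n-1)$; for data grouped as a frequency table of distinct values $x_j$ with frequencies $f_j$, the sums become $\sum_j f_j x_j$ and $\sum_j f_j x_j^2$ (the form used in the parameter-estimation example). The function names are hypothetical; the arrivals table from the discrete histogram example serves as input.

```python
def ungrouped_stats(data):
    """Sample mean and variance (n - 1 divisor) from raw observations."""
    n = len(data)
    mean = sum(data) / n
    var = (sum(x * x for x in data) - n * mean ** 2) / (n - 1)
    return mean, var

def grouped_stats(freq):
    """Same statistics from a frequency table {value: frequency}."""
    n = sum(freq.values())
    mean = sum(f * x for x, f in freq.items()) / n
    var = (sum(f * x * x for x, f in freq.items()) - n * mean ** 2) / (n - 1)
    return mean, var

# Arrivals-per-period frequency table from the discrete histogram example
arrivals_freq = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
                 6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}
grouped = grouped_stats(arrivals_freq)

# Expanding the table back into raw values must give the same results
raw = [x for x, f in arrivals_freq.items() for _ in range(f)]
ungrouped = ungrouped_stats(raw)
print(grouped, ungrouped)
```

For the arrivals data both versions give a mean of 3.64 and a variance of about 7.63. Grouping into class intervals with midpoints (as for continuous data) would only approximate the raw-data values.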
Distribution Parameters
• Relationships for the Method of Moments – Discrete Case

Distribution Type   Parameters
Bernoulli           p
Binomial            p
Geometric           p
Poisson             λ
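The relationship column of this table was an image, so the Python sketch below uses standard method-of-moments relationships under assumed parameterizations: Bernoulli $E[X] = p$, binomial $E[X] = np$ with $n$ known, geometric counted from 1 with $E[X] = 1/p$, and Poisson $E[X] = \lambda$. Each estimator sets the population mean equal to the sample mean and solves for the parameter. All function names are hypothetical.

```python
def bernoulli_p(data):
    # E[X] = p for 0/1 data
    return sum(data) / len(data)

def binomial_p(data, n_trials):
    # E[X] = n p, with the number of trials n assumed known
    return sum(data) / len(data) / n_trials

def geometric_p(data):
    # Assumes support 1, 2, ... (trials up to and including the first success): E[X] = 1/p
    return len(data) / sum(data)

def poisson_lambda(data):
    # E[X] = lambda
    return sum(data) / len(data)

# Arrivals-per-period data from the discrete histogram example, rebuilt from its table
arrivals_freq = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
                 6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}
raw = [x for x, f in arrivals_freq.items() for _ in range(f)]
print(poisson_lambda(raw))  # 3.64
```

If a geometric variable is instead defined as the number of failures before the first success (support 0, 1, ...), then $E[X] = (1-p)/p$ and the estimator becomes $1/(1 + \bar{X})$.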
Distribution Parameters
• Relationships for the Method of Moments – Continuous Case

Distribution Type   Parameters
Uniform             a, b
Normal              μ, σ
Lognormal           μY, σY (or μ, σ)
Exponential         λ
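As with the discrete case, the relationships themselves were in an image. The Python sketch below uses standard ones under assumed parameterizations: uniform mean $(a+b)/2$ and variance $(b-a)^2/12$, so $\hat{a}, \hat{b} = \bar{X} \mp \sqrt{3S^2}$; normal $\hat{\mu} = \bar{X}$, $\hat{\sigma}^2 = S^2$; exponential with rate $\lambda$, so $\hat{\lambda} = 1/\bar{X}$. For the lognormal it swaps in the common shortcut of estimating $\mu_Y$ and $\sigma_Y^2$ from $\ln x$, which is not strictly the method of moments. Function and key names are hypothetical.

```python
import math

def sample_mean_var(data):
    """Sample mean and variance with the n - 1 divisor."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return mean, var

def mom_continuous(data):
    """Moment-style estimates for common continuous distributions (assumed forms)."""
    mean, var = sample_mean_var(data)
    half_width = math.sqrt(3 * var)       # (b - a)/2 = sqrt(3 var) for a uniform
    est = {
        "uniform_a": mean - half_width,
        "uniform_b": mean + half_width,
        "normal_mu": mean,
        "normal_sigma2": var,
        "exponential_lambda": 1 / mean,   # rate parameterization: E[X] = 1/lambda
    }
    if all(x > 0 for x in data):
        # Lognormal shortcut: normal estimates of Y = ln X (not strictly moments)
        log_mean, log_var = sample_mean_var([math.log(x) for x in data])
        est["lognormal_mu_y"], est["lognormal_sigma2_y"] = log_mean, log_var
    return est

print(mom_continuous([1.0, 2.0, 3.0]))
```

Note that the uniform example that follows takes a and b directly from the smallest and largest observed values rather than from these moment relationships.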
Distribution Parameter Estimation Example
• Example: Uniform Distribution
Consider the following histogram for a random variable X. Use point estimates to check whether the data follow a uniform distribution.
Distribution Parameter Estimation Example
• Example (cont’d)

Value   Frequency
5       3
6       2
7       4
8       0
9       3
10      4
11      2
12      6
13      2
14      2
15      2

[Figure: Uniform Probability Function Fit, Histogram of Density Value (x: Value, 5 to 15; y: density, 0 to 0.25)]
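Distribution Parameter Estimation Example
Sample mean: $\bar{X} = \frac{\sum_j f_j x_j}{n} = \frac{300}{30} = 10$

Sample variance: $S^2 = \frac{\sum_j f_j x_j^2 - n\bar{X}^2}{n-1} = \frac{3272 - 30 \cdot 10^2}{29} = 9.379$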


Distribution Parameter Estimation Example
• Example (cont’d)
The parameters are a = 5, b = 15.
Population mean = (b + a)/2 = 20/2 = 10
Population variance = (b − a)²/12 = 100/12 = 8.333
From the point estimates, it seems that the data can follow a uniform distribution, because the sample mean equals the population mean (10) and the sample variance (9.379) is nearly equal to the population variance (8.333).
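The point-estimate comparison above can be reproduced with a short Python sketch that computes the grouped sample mean and variance from the frequency table and compares them with the uniform mean $(a+b)/2$ and variance $(b-a)^2/12$. As on the slide, a and b are taken as the smallest and largest observed values, and the continuous-uniform variance formula is used; variable names are illustrative.

```python
# Frequency table from the uniform parameter-estimation example (n = 30)
freq = {5: 3, 6: 2, 7: 4, 8: 0, 9: 3, 10: 4, 11: 2, 12: 6, 13: 2, 14: 2, 15: 2}

n = sum(freq.values())
sum_fx = sum(f * x for x, f in freq.items())        # 300
sum_fx2 = sum(f * x * x for x, f in freq.items())   # 3272
sample_mean = sum_fx / n
sample_var = (sum_fx2 - n * sample_mean ** 2) / (n - 1)

# Uniform on [a, b], with a and b taken from the smallest and largest values
a, b = min(freq), max(freq)
pop_mean = (a + b) / 2
pop_var = (b - a) ** 2 / 12

print(f"sample mean {sample_mean:.3f} vs uniform mean {pop_mean:.3f}")
print(f"sample variance {sample_var:.3f} vs uniform variance {pop_var:.3f}")
```

Agreement of point estimates is only a first screen; whether a difference such as 9.379 versus 8.333 matters is what the goodness-of-fit test decides.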
Parameter estimation example

After selecting the distribution and estimating the parameters, we need to TEST whether they are valid.

We will use hypothesis testing and goodness-of-fit testing.


Sampling distributions for the mean
Sampling distributions for the variance
• Chi-squared (single variance)
• F (ratio of two variances)
Testing a Statistical Hypothesis
• Statement of the hypothesis
• The test statistic is what we base our decision on.
• The critical region defines the rejection area.
• The critical value is the boundary value of the test statistic at which we pass into the critical region.
Hypothesis Testing
• Hypothesis testing is a formal procedure for using statistical concepts and measures in performing decision making.
• In the scientific method, a hypothesis is an idea that is proposed for the sake of argument so that it can be tested to see if it might be true.
• The hypothesis is constructed before any applicable research has been done, apart from a basic background review.
• The following six steps can be used to make a statistical analysis of a hypothesis:

Procedure
1. Formulate hypotheses.
2. Select the appropriate statistical model that identifies the test statistic.
3. Specify the level of significance.
4. Collect a sample of data and compute an estimate of the test statistic.
5. Define the region of rejection for the test statistic.
6. Determine the appropriate hypothesis (accept or reject H0).


Formulating Hypotheses
• The Null Hypothesis
– The null hypothesis is denoted by H0.
– It is formulated as an equality, thus indicating that a difference does not exist.
• The Alternative Hypothesis
– The alternative hypothesis is denoted by HA.
– It is formulated to indicate that a difference does exist.
Select Statistic
• Select the appropriate statistical model that identifies the test statistic
– Determine the test statistic and its sampling distribution
– The test statistic is a random variable that has a distribution
– Example test statistics:
Level of Significance and Errors
• The significance level α provides a probabilistic framework for accepting or rejecting the null hypothesis, and consequently for making a decision.
• Errors in decision making:
– Type I error (probability α): rejecting H0 when in fact H0 is true
– Type II error (probability β): accepting H0 when in fact H0 is false
• Fortunately, the probabilities of committing both types of error can be reduced by increasing the sample size.
Define the region of rejection for the test statistic.
The critical value of the test statistic depends on:
1. The statement of the alternative hypothesis.
2. The distribution of the test statistic.
3. The level of significance, and
4. The characteristics of the sample data.

[Figure: Rejection Region]
Example
A manufacturer of car batteries claims that the life of the company’s batteries is approximately normally distributed with a standard deviation equal to 0.9 year. If a random sample of 10 of these batteries has a standard deviation of 1.2 years, do you think that σ > 0.9 year? Use a 0.05 level of significance.
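The battery question is a one-sided test on a single variance, following the six-step procedure: H0: σ² = 0.9² against HA: σ² > 0.9², test statistic $\chi^2 = (n-1)s^2/\sigma_0^2$ with $n - 1$ degrees of freedom, rejection region in the upper tail at α = 0.05. The Python sketch below is stdlib-only, so the helper names `chi2_cdf` and `chi2_ppf` are hypothetical stand-ins for a statistics library (with SciPy one would normally use `scipy.stats.chi2`); the CDF uses the power series for the regularized lower incomplete gamma function.

```python
import math

def chi2_cdf(x, k):
    """P(chi-square with k df <= x), using the power series for the
    regularized lower incomplete gamma function P(k/2, x/2)."""
    if x <= 0:
        return 0.0
    s, z = k / 2.0, x / 2.0
    term = 1.0 / s          # n = 0 term of the sum z^n / (s (s+1) ... (s+n))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (s + n)
        total += term
        n += 1
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total

def chi2_ppf(p, k):
    """Inverse of chi2_cdf found by bisection (adequate for a worked example)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Battery example: H0: sigma^2 = 0.9^2 versus HA: sigma^2 > 0.9^2 (upper-tail test)
n, s, sigma0, alpha = 10, 1.2, 0.9, 0.05
chi2_stat = (n - 1) * s ** 2 / sigma0 ** 2   # (n - 1) s^2 / sigma0^2, with n - 1 = 9 df
critical = chi2_ppf(1 - alpha, n - 1)        # upper 5% point of chi-square(9)
p_value = 1 - chi2_cdf(chi2_stat, n - 1)
print(f"chi2 = {chi2_stat:.2f}, critical = {critical:.3f}, "
      f"p-value = {p_value:.3f}, reject H0: {chi2_stat > critical}")
```

On these numbers the statistic is 16.0, just below the tabulated critical value χ²0.05,9 = 16.919 (p-value between 0.05 and 0.10), so H0 is not rejected at the 0.05 level.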
Hypothesis testing for Distribution Type
“Goodness of Fit”

The tests are based on how good a fit we have between the frequency of occurrence of
observations in an observed sample and the expected frequencies obtained from the
hypothesized distribution.
– Three tests are available:
• The chi-square test
• The Kolmogorov-Smirnov (K-S) test
• The Anderson-Darling test
Chi-square Test for Goodness of Fit

• Formulate hypothesis
– H0: The random variable has the specified population distribution with
the parameters indicated.
– HA: The random variable is not distributed as specified.
Chi-square Test for Goodness of Fit
Test Statistic
– The chi-square goodness-of-fit test compares the observed frequencies O1, O2, …, Ok of k values (k intervals) with the corresponding frequencies E1, E2, …, Ek from an assumed or theoretical distribution.
– The test statistic is:

$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

$\chi_0^2$ is the computed value of a random variable having approximately a chi-square distribution with k − 1 − (no. of parameters estimated from the sample) degrees of freedom; Oi and Ei are the observed and expected frequencies in cell (or interval) i, and k is the number of discrete cells (intervals) into which the data were separated.
Guidance: k > 3 and Ei > 5
– Or
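Chi-square Test for Goodness of Fit
The region of rejection
• The assumed distribution yields the expected frequencies Ei, from which $\chi_0^2$ is computed.

This theoretical distribution is an acceptable model if

$$\chi_0^2 < \chi^2_{\alpha,\,\nu}, \qquad \nu = k - 1 - (\text{no. of parameters estimated from the sample})$$

Otherwise, it is not acceptable at the α significance level.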
Chi-square Test for Goodness of Fit
The region of rejection

This theoretical distribution is an acceptable model if

p-value > α

Otherwise, it is not acceptable.
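A general version of the test can be sketched in Python: compute $\chi_0^2$ from the observed and expected cell frequencies, set the degrees of freedom to k − 1 − (number of estimated parameters), and accept the model when the p-value exceeds α, which is equivalent to $\chi_0^2 < \chi^2_{\alpha,\nu}$. The function names are hypothetical, and the p-value helper is a stdlib-only series approximation rather than a library routine.

```python
import math

def chi2_sf(x, k):
    """Upper-tail probability P(chi-square with k df > x), computed as
    1 - P(k/2, x/2) from the incomplete-gamma power series.
    Adequate for moderate x; very small p-values lose relative precision."""
    if x <= 0:
        return 1.0
    s, z = k / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (s + n)
        total += term
        n += 1
    return 1.0 - math.exp(s * math.log(z) - z - math.lgamma(s)) * total

def chi_square_gof(observed, expected, n_estimated=0, alpha=0.05):
    """Chi-square goodness-of-fit test.

    observed, expected: frequencies O_i and E_i for the k cells
    n_estimated: number of distribution parameters estimated from the sample
    Returns (chi2_0, degrees of freedom, p-value, acceptable at level alpha).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of cells")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - n_estimated
    p_value = chi2_sf(stat, df)
    # p-value > alpha is equivalent to chi2_0 < chi2_{alpha, df}
    return stat, df, p_value, p_value > alpha

# Hypothetical two-cell example: 30 and 10 observed where 20 and 20 were expected
print(chi_square_gof([30, 10], [20, 20]))
```

In the hypothetical example $\chi_0^2 = 10$ with 1 degree of freedom, giving a p-value of about 0.0016, so the assumed model would not be acceptable at α = 0.05.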


Chi-square test for the Poisson distribution (Example 1)

The number of rainstorms per year at a station was recorded, together with the number of years in which each count was observed. Do the data follow a Poisson distribution? Use a 0.05 significance level.

Number of rainstorms annually   Number of years observed
0                               20
1                               23
2                               15
3                               6
4                               2
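Chi-square test for the Poisson distribution (Example 1)

First: estimate λ from the sample mean, $\hat{\lambda} = \bar{X} = \frac{0(20) + 1(23) + 2(15) + 3(6) + 4(2)}{66} = \frac{79}{66} \approx 1.197$

Second: compute the Poisson probabilities $p(x) = \frac{e^{-\hat{\lambda}}\hat{\lambda}^{x}}{x!}$ and the expected frequencies $E_i = 66\,p(x)$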
Chi-square test for the Poisson distribution (Example 1)

No. of storms per year (x)   Observed frequency Oi   p(x)      Expected frequency Ei = 66 p(x)
0                            20                      0.302     19.94
1                            23                      0.3616    23.87
2                            15                      0.2164    14.29
3                            6                       0.08635   5.70
4                            2                       0.02584   1.71
Total                        66                                65.51
Chi-square test for the Poisson distribution (Example 1)

x       Oi    pi                              Ei = 66 pi   (Oi − Ei)²   (Oi − Ei)²/Ei
0       20    0.302                           19.94        0.0036       0.0002
1       23    0.3616                          23.87        0.7569       0.0317
2       15    0.2164                          14.29        0.5041       0.0353
≥3      8     1 − (p0 + p1 + p2) ≈ 0.120      7.90         0.0100       0.0013
Total   66                                    66.00                     χ0² = 0.0685
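The whole Example 1 calculation can be reproduced with a short Python sketch: estimate λ by the sample mean, compute Poisson probabilities for x = 0, 1, 2, merge x ≥ 3 into one cell (the expected count for x = 4 alone is about 1.71, below the Ei > 5 guidance), and compare $\chi_0^2$ with the table value $\chi^2_{0.05,2} = 5.991$, using 4 − 1 − 1 = 2 degrees of freedom because λ is estimated from the data. Variable names are illustrative.

```python
import math

# Example 1: storms per year (x) -> number of years observed
observed = {0: 20, 1: 23, 2: 15, 3: 6, 4: 2}
n = sum(observed.values())                             # 66 years
lam = sum(x * o for x, o in observed.items()) / n      # 79/66, about 1.197

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Cells x = 0, 1, 2 and a merged last cell x >= 3 (E for x = 4 alone is about 1.71 < 5)
probs = [poisson_pmf(x, lam) for x in range(3)]
probs.append(1.0 - sum(probs))                         # P(X >= 3)
obs_cells = [observed[0], observed[1], observed[2], observed[3] + observed[4]]
expected = [n * p for p in probs]

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(obs_cells, expected))
df = len(obs_cells) - 1 - 1      # k - 1 - (one estimated parameter, lambda)
critical = 5.991                 # chi-square table value, alpha = 0.05, 2 df

for label, o, e in zip(["0", "1", "2", ">=3"], obs_cells, expected):
    print(f"{label:>3}  O = {o:>2}  E = {e:6.2f}")
print(f"chi2_0 = {chi2_stat:.4f}, df = {df}, Poisson acceptable: {chi2_stat < critical}")
```

Using the tail probability P(X ≥ 3) for the merged cell makes the expected frequencies sum to exactly 66, which is how the table's 7.90 arises.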


Chi-square test for the Poisson
distribution (Example 1)
ProModel (Stat::Fit) procedure:
• Tools: Stat::Fit; enter the data
• Input: Options
• Input: Graph
• Statistics: Descriptive
• Fit: Auto::Fit
• Fit: Setup (interval type: equal length)
• Fit: Goodness of Fit
Chi-square test for the Poisson distribution (Example 2)
ProModel (Stat::Fit) steps:
• Input: Options
• Input: Graph
• Statistics: Descriptive
• Fit: Setup
• Fit: Auto::Fit
• Fit: Goodness of Fit
Chi-square test for the uniform distribution (Example 3)
Data: the uniform parameter-estimation example (values 5 to 15, n = 30).
ProModel (Stat::Fit) steps:
• Statistics: Descriptive
• Input: Options
• Input: Graph
• Fit: Auto::Fit
• Fit: Setup
• Fit: Goodness of Fit
Stat::Fit (ProModel)
Example
• A six-sided die is rolled 120 times with the following distribution of outcomes. Do the data follow a uniform distribution? Use a 0.05 significance level.

Outcome     1    2    3    4    5    6
Frequency   15   13   28   25   12   27
Example
• Using point estimates:
• X̄ = (1×15 + 2×13 + 3×28 + 4×25 + 5×12 + 6×27)/120 = 447/120 = 3.725
• S² = (1991 − 120 × 3.725²)/119 = 325.925/119 = 2.739
Example
• For a discrete uniform distribution with a = 1, b = 6:
• µX = (a + b)/2 = (6 + 1)/2 = 3.5
• V(X) = ((b − a + 1)² − 1)/12 = (7 × 5)/12 = 2.917
• The point estimates (X̄ = 3.725, S² = 2.739) suggest that the data may follow a uniform distribution with a = 1, b = 6.
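The die comparison can be checked with a short Python sketch. The frequencies are those read from the X̄ calculation above. Unlike the earlier continuous-uniform example, the variance here uses the discrete-uniform formula $((b - a + 1)^2 - 1)/12$ for equally likely integers a, …, b. Variable names are illustrative.

```python
# Die example: outcome -> frequency (120 rolls)
freq = {1: 15, 2: 13, 3: 28, 4: 25, 5: 12, 6: 27}
n = sum(freq.values())
mean = sum(x * f for x, f in freq.items()) / n                      # 447/120
var = (sum(x * x * f for x, f in freq.items()) - n * mean ** 2) / (n - 1)

# Discrete uniform on the integers a..b
a, b = 1, 6
pop_mean = (a + b) / 2
pop_var = ((b - a + 1) ** 2 - 1) / 12                               # 35/12
print(f"mean {mean:.3f} vs {pop_mean:.3f}; variance {var:.3f} vs {pop_var:.3f}")
```

Using the continuous formula (b − a)²/12 here would give 25/12 ≈ 2.083 and understate the spread of a fair die.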
Example
• The following hypothesis test has been set up:
• H0: the data follow a uniform distribution with a = 1, b = 6
• H1: the data do not follow a uniform distribution with a = 1, b = 6
• Expected frequency for each outcome (constant): Ei = np = 120 × 1/6 = 20
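With Ei = 20 for every face, the chi-square statistic takes only a few lines of Python. This sketch assumes no parameters are estimated from the data (a = 1 and b = 6 are fixed by the die), so there are 6 − 1 = 5 degrees of freedom and the table value is $\chi^2_{0.05,5} = 11.070$. Variable names are illustrative.

```python
# Die example: outcome -> observed frequency (120 rolls)
freq = {1: 15, 2: 13, 3: 28, 4: 25, 5: 12, 6: 27}
n = sum(freq.values())
expected = n / 6                   # E_i = n p = 20 for every face

chi2_stat = sum((o - expected) ** 2 / expected for o in freq.values())
df = len(freq) - 1                 # a = 1 and b = 6 are fixed by the die, not estimated
critical = 11.070                  # chi-square table value, alpha = 0.05, 5 df
print(f"chi2 = {chi2_stat:.2f}, critical = {critical}, reject H0: {chi2_stat > critical}")
```

Here $\chi_0^2 = 276/20 = 13.8$, which exceeds 11.070, so on these numbers the test rejects H0 at the 0.05 level even though the point estimates looked close to the uniform values.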

Example
Using point estimates:

The Poisson distribution seems to be an appropriate model for the data.
Histogram
Using a test of hypothesis:
