Prop Final 4

• Review of probability and statistics
• Input data analysis (input distribution modeling)
Purpose & Overview
Steps:
Data Collection
Data Collection and Input Modeling
• Information is gathered in the form of samples, or collections of
observations.
• Samples are collected from populations that are collections of all
individuals or individual items of a particular type.
Copyright © 2010 Pearson Addison-Wesley. All rights reserved.

Simple Random Sampling
• “Sample size” simply means the number of elements in the sample.

• Is a procedure for sampling from a population in which the selection of a
sample unit is based on chance.
• Implies that any particular sample, of a specified sample size, has the same
chance of being selected as any other sample with the same sample size.
• The importance of proper sampling revolves around the degree of confidence
with which the analyst is able to answer the questions being asked.
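Python's standard library can draw a simple random sample directly. The sketch below is a minimal illustration, assuming a hypothetical population of 500 numbered items; `random.Random.sample` draws without replacement so that every subset of size n is equally likely, which is the property described above.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n, without replacement.

    random.Random.sample picks uniformly among all selections of n distinct
    items, so every subset of size n has the same chance of being selected.
    """
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical population of 500 numbered items (illustration only)
population = list(range(1, 501))
sample = simple_random_sample(population, 10, seed=42)
print("sample size:", len(sample))
print("sample:", sorted(sample))
```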
Identifying the distribution
• Review of common distributions
• Histograms
• Scatter diagrams
Simulation Modeling and Analysis – Chapter 1 – Basic Simulation Modeling Slide 6 of 51

Review of Common Discrete Probability Distributions
• Commonly Used Discrete Distributions
– Discrete Uniform Distribution
– Binomial
– Poisson
– Other Distributions
Discrete Uniform Distribution
Poisson Distribution
Summary of discrete probability distributions
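The PMF formulas for these distributions were on slide images that did not survive extraction. As a minimal sketch, assuming the standard textbook forms, the functions below evaluate the discrete uniform, binomial and Poisson PMFs; the function and parameter names (`a`, `b`, `n`, `p`, `lam`) are illustrative choices, not taken from the slides.

```python
import math


def discrete_uniform_pmf(x, a, b):
    """P(X = x) for X uniform on the integers a, a+1, ..., b."""
    return 1.0 / (b - a + 1) if a <= x <= b else 0.0


def binomial_pmf(x, n, p):
    """P(X = x): successes in n independent trials, each with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson count with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Each PMF should sum to 1 over its support
print(sum(discrete_uniform_pmf(x, 1, 6) for x in range(1, 7)))
print(sum(binomial_pmf(x, 10, 0.3) for x in range(11)))
print(poisson_pmf(0, 1.197))
```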
Review of Common Continuous Probability Distributions
• Continuous distributions:
– Uniform
– Normal
– Lognormal
– Exponential
– Other Continuous Probability Distributions
   ▪ Triangular, Gamma, Rayleigh, and Beta
   ▪ Statistics: Chi-square, Student-t, and F distributions
   ▪ Extreme Value Distributions
Continuous Uniform Distribution
Normal Distribution
Exponential Distribution
Triangular Distribution
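The density formulas for these four distributions were also on slide images. A minimal sketch, assuming the usual parameterizations (uniform on [a, b], normal with mean `mu` and standard deviation `sigma`, exponential in rate form with mean 1/`lam`, triangular with minimum `a`, mode `c` and maximum `b`), is shown below; the midpoint-rule `integrate` helper is a hypothetical addition used only to check that each density has total area 1.

```python
import math


def uniform_pdf(x, a, b):
    """Continuous uniform on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def normal_pdf(x, mu, sigma):
    """Normal with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(x, lam):
    """Exponential in rate form: mean 1/lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def triangular_pdf(x, a, c, b):
    """Triangular with minimum a, mode c, maximum b (a <= c <= b, a < b)."""
    if x < a or x > b:
        return 0.0
    if x < c:
        return 2.0 * (x - a) / ((b - a) * (c - a))
    if x == c:
        return 2.0 / (b - a)
    return 2.0 * (b - x) / ((b - a) * (b - c))


def integrate(f, lo, hi, n=20_000):
    """Midpoint-rule integral, used only to check that each PDF has area 1."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


print(integrate(lambda x: uniform_pdf(x, 2, 5), 2, 5))
print(integrate(lambda x: normal_pdf(x, 10, 2), -10, 30))
print(integrate(lambda x: exponential_pdf(x, 0.5), 0, 100))
print(integrate(lambda x: triangular_pdf(x, 0, 2, 5), 0, 5))
```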

Histogram/frequency Distribution
• A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data.
• A histogram is an approximate representation of the distribution of numerical data and can be used as an estimate of the probability distribution of a continuous variable.
Histogram/frequency Distribution (discrete)
• Discrete data comprise a listing of the observed values.
• An example will be used to describe how to construct a frequency distribution.
Histogram/frequency Distribution (discrete)
[Figure: discrete histogram with the shape of a Poisson distribution]
Building a histogram (discrete): example
Raw Data
10 8 5 1 6 0 4 6 2 3
2 3 5 9 2 0 2 4 2 3
5 1 8 9 1 9 3 7 4 0
2 6 3 1 4 5 0 3 3 2
2 10 0 3 6 0 6 5 7 0
8 2 3 7 0 2 2 1 0 4
0 2 4 1 2 5 1 5 3 2
8 6 3 4 6 11 3 2 8 0
2 4 2 4 1 3 1 2 1 2
3 10 0 7 3 5 3 7 3 4
Building a histogram (discrete)
Arrivals per period   Frequency
0                     12
1                     10
2                     19
3                     17
4                     10
5                     8
6                     7
7                     5
8                     5
9                     3
10                    3
11                    1
[Figure: Histogram of Arrivals per Period (frequency vs. arrivals per period, 0 to 11); the shape suggests a Poisson distribution]
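Building a discrete histogram only requires counting how often each value occurs. This minimal sketch tallies the 100 raw observations with `collections.Counter` and prints a text histogram; it reproduces the frequency column of the table above.

```python
from collections import Counter

# Arrivals per period: the 100 observations from the raw-data table
raw = [
    10, 8, 5, 1, 6, 0, 4, 6, 2, 3,
    2, 3, 5, 9, 2, 0, 2, 4, 2, 3,
    5, 1, 8, 9, 1, 9, 3, 7, 4, 0,
    2, 6, 3, 1, 4, 5, 0, 3, 3, 2,
    2, 10, 0, 3, 6, 0, 6, 5, 7, 0,
    8, 2, 3, 7, 0, 2, 2, 1, 0, 4,
    0, 2, 4, 1, 2, 5, 1, 5, 3, 2,
    8, 6, 3, 4, 6, 11, 3, 2, 8, 0,
    2, 4, 2, 4, 1, 3, 1, 2, 1, 2,
    3, 10, 0, 7, 3, 5, 3, 7, 3, 4,
]

freq = Counter(raw)  # value -> number of periods with that many arrivals

# Text histogram: one row per possible count, including zero-frequency values
for arrivals in range(min(raw), max(raw) + 1):
    print(f"{arrivals:>2} | {freq[arrivals]:>2} {'*' * freq[arrivals]}")
```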
Step 2.1: Identify the Probability Distribution
• Histogram with Continuous Data
Raw Data (component life, days)
79.919   3.081    0.062    1.961    5.845
3.027    6.505    0.021    0.013    0.123
6.769    59.899   1.192    34.760   5.009
18.387   0.141    43.565   24.420   0.433
144.695  2.663    17.967   0.091    9.003
0.941    0.878    3.148    2.157    7.579
0.624    5.380    3.371    7.078    23.960
0.590    1.928    0.300    0.002    0.543
7.004    31.764   1.005    1.147    0.219
3.217    14.382   1.008    2.336    4.562

Component Life (days)   Frequency
[0, 3)                  23
[3, 6)                  10
[6, 9)                  5
[9, 12)                 1
[12, 15)                1
[15, 18)                2
[18, 21)                0
[21, 24)                1
[24, 27)                1
[27, 30)                0
[30, 33)                1
[33, 36)                1
...                     ...
[42, 45)                1
...                     ...
[57, 60)                1
...                     ...
[78, 81)                1
...                     ...
[144, 147)              1
[Figure: Histogram of Component Life (frequency vs. component life in days); the shape suggests an exponential distribution]
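For continuous data the values are first grouped into class intervals of equal width. This minimal sketch assumes a width of 3 days with left-closed intervals [0, 3), [3, 6), ..., as in the table, and counts the listed observations per interval. Applied to the 50 values exactly as listed, it puts 24 values in [0, 3), 9 in [3, 6), and one each in [15, 18) and [18, 21), whereas the table shows 23, 10, 2 and 0, so the raw listing and the table may differ by a transcription slip.

```python
import math
from collections import Counter

# Component life in days: the 50 observations listed above
life = [
    79.919, 3.081, 0.062, 1.961, 5.845,
    3.027, 6.505, 0.021, 0.013, 0.123,
    6.769, 59.899, 1.192, 34.760, 5.009,
    18.387, 0.141, 43.565, 24.420, 0.433,
    144.695, 2.663, 17.967, 0.091, 9.003,
    0.941, 0.878, 3.148, 2.157, 7.579,
    0.624, 5.380, 3.371, 7.078, 23.960,
    0.590, 1.928, 0.300, 0.002, 0.543,
    7.004, 31.764, 1.005, 1.147, 0.219,
    3.217, 14.382, 1.008, 2.336, 4.562,
]

WIDTH = 3.0  # class width in days

# Interval index i covers [i*WIDTH, (i+1)*WIDTH); floor keeps the left end closed
counts = Counter(math.floor(x / WIDTH) for x in life)

for i in sorted(counts):
    lo, hi = i * WIDTH, (i + 1) * WIDTH
    print(f"[{lo:g}, {hi:g})  {counts[i]}")
```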
Scatter Diagram
• A scatter diagram is a graphical presentation of the relationship between two variables. One variable, which is usually the controllable one, is placed on the x axis and the other, or dependent variable, is placed on the y axis.
[Figures: scatter diagram examples]
Selecting the Family of Distributions
Estimators for Obtaining Distribution Parameters

Sampling Statistics
• Using sample data to estimate population parameters
• Sample statistics → population parameters
• Point Estimation: Point estimation is concerned with the calculation of a single number from a set of observed data to represent a parameter of the underlying population.
• The decision process must also be capable of reflecting the risk of making an incorrect decision.
• This decision can be made using hypothesis testing, but the statistic must first be represented by its sampling distribution.
Measures of location and variability (grouped and ungrouped data)
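The location and variability measures for ungrouped and grouped data can be sketched as follows. Here "grouped" is assumed to mean a frequency table of distinct values (for class intervals one would typically substitute class midpoints, which only approximates the ungrouped result); the demo data are the die outcomes used later in these notes, and the function names are illustrative.

```python
def ungrouped_stats(data):
    """Sample mean and variance (n - 1 divisor) from a list of observations."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return mean, var


def grouped_stats(values, freqs):
    """Same statistics from a frequency table: value x_j observed f_j times.

    Uses the computing form s^2 = (sum f_j x_j^2 - n * xbar^2) / (n - 1).
    """
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    sum_sq = sum(f * x * x for x, f in zip(values, freqs))
    var = (sum_sq - n * mean ** 2) / (n - 1)
    return mean, var


# Die outcomes 1..6 observed 15, 13, 28, 25, 12, 27 times
faces = [1, 2, 3, 4, 5, 6]
counts = [15, 13, 28, 25, 12, 27]
print(grouped_stats(faces, counts))

# Expanding the table back to raw observations gives the same answer
expanded = [x for x, f in zip(faces, counts) for _ in range(f)]
print(ungrouped_stats(expanded))
```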
Distribution Parameters
• Relationships for the Method of Moments – Discrete Case
Distribution Type   Parameter(s)
Bernoulli           p
Binomial            p
Geometric           p
Poisson             λ
Distribution Parameters
• Relationships for the Method of Moments – Continuous Case
Distribution Type   Parameter(s)
Uniform             a, b
Normal              μ, σ
Lognormal           μY, σY (or μ, σ)
Exponential         λ
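The "Relationships" column of these tables did not survive extraction. A minimal sketch of the standard method-of-moments relationships is given below: set the sample mean x̄ (and standard deviation s where needed) equal to the population moments and solve for the parameters. Assumptions, not taken from the slides: the geometric distribution counts trials starting at 1, the exponential is in rate form, and the lognormal parameters are taken as the mean and standard deviation of Y = ln X (one common choice). All function names are hypothetical.

```python
import math
import statistics

# Method of moments: equate sample moments to population moments, solve for parameters.
# xbar = sample mean, s = sample standard deviation (n - 1 divisor).


def bernoulli_p(xbar):            # data coded 0/1; E[X] = p
    return xbar


def binomial_p(xbar, n_trials):   # E[X] = n p, with n known
    return xbar / n_trials


def geometric_p(xbar):            # assumes support 1, 2, 3, ...; E[X] = 1/p
    return 1.0 / xbar


def poisson_lambda(xbar):         # E[X] = lambda
    return xbar


def uniform_a_b(xbar, s):         # E[X] = (a+b)/2, Var[X] = (b-a)^2/12
    half_width = math.sqrt(3.0) * s
    return xbar - half_width, xbar + half_width


def normal_mu_sigma(xbar, s):     # E[X] = mu, SD[X] = sigma
    return xbar, s


def lognormal_muY_sigmaY(data):   # moments of Y = ln X (all data must be > 0)
    logs = [math.log(x) for x in data]
    return statistics.mean(logs), statistics.stdev(logs)


def exponential_lambda(xbar):     # rate form; E[X] = 1/lambda
    return 1.0 / xbar


# Uniform example from these notes: xbar = 10, s^2 = 9.379
a_hat, b_hat = uniform_a_b(10.0, math.sqrt(9.379))
print(f"a ~ {a_hat:.3f}, b ~ {b_hat:.3f}")
```

Under these assumptions the uniform example gives estimates close to the a = 5, b = 15 used on the following slides.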
Distribution Parameter Estimation
Example
• Example: Uniform Distribution
Consider the following histogram for a random variable X. Use point estimates to determine whether the data follow a uniform distribution.
Example
• Example (cont’d)
Value   Frequency
5       3
6       2
7       4
8       0
9       3
10      4
11      2
12      6
13      2
14      2
15      2
[Figure: Uniform Probability Function Fit (histogram of density vs. x value, 5 to 15)]
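Example
Sample mean: $\bar{x} = \frac{\sum_j f_j x_j}{n} = 300/30 = 10$
Sample variance: $s^2 = \frac{\sum_j f_j x_j^2 - n\bar{x}^2}{n-1} = (3272 - 30 \cdot 10^2)/29 = 9.379$
where $f_j$ is the frequency of value $x_j$ and n = 30.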

Example
The parameters are a = 5, b = 15.
Population mean = (b + a)/2 = 20/2 = 10
Population variance = (b - a)^2/12 = 100/12 = 8.333
From the point estimates, the data appear to follow a uniform distribution: the sample mean equals the population mean, and the sample variance (9.379) is close to the population variance (8.333).
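The arithmetic of this example can be checked in a few lines. This sketch recomputes the sample moments from the frequency table and the continuous-uniform population moments for a = 5, b = 15.

```python
# Frequency table from the uniform example
values = list(range(5, 16))                       # x = 5, 6, ..., 15
freqs = [3, 2, 4, 0, 3, 4, 2, 6, 2, 2, 2]

n = sum(freqs)                                    # 30 observations
sum_fx = sum(f * x for x, f in zip(values, freqs))          # 300
sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))     # 3272

sample_mean = sum_fx / n
sample_var = (sum_fx2 - n * sample_mean ** 2) / (n - 1)

# Continuous uniform on [a, b]
a, b = 5, 15
pop_mean = (a + b) / 2
pop_var = (b - a) ** 2 / 12

print(f"mean:     sample {sample_mean:.3f} vs population {pop_mean:.3f}")
print(f"variance: sample {sample_var:.3f} vs population {pop_var:.3f}")
```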
Parameter estimation example
After selecting the distribution and estimating its parameters, we need to test whether they are valid.
We will use hypothesis testing and goodness-of-fit testing.

Sampling distributions for the mean
Sampling distributions for the variance
– Chi-squared
– F
Testing a Statistical Hypothesis
• Statement of the hypothesis
• The test statistic is what we base our decision on
• The critical region defines the rejection area
• The critical value is the last value that we observe in passing into the critical region.
Hypothesis Testing
• Hypothesis testing is a formal procedure for using statistical concepts and measures in performing decision making.
• In the scientific method, a hypothesis is an idea that is proposed for the sake of argument so that it can be tested to see whether it might be true.
• The hypothesis is constructed before any applicable research has been done, apart from a basic background review.
• The following six steps (procedure) can be used to make a statistical analysis of a hypothesis:
1. Formulate hypotheses.
2. Select the appropriate statistical model that identifies the test statistic.
3. Specify the level of significance.
4. Collect a sample of data and compute an estimate of the test statistic.
5. Define the region of rejection for the test statistic.
6. Determine the appropriate hypothesis.

Formulating Hypothesis
• The Null Hypothesis
– The null hypothesis is denoted by H0.
– It is formulated as an equality, thus indicating that a difference does not exist.
• The Alternative Hypothesis
– The alternative hypothesis is denoted by HA.
– It is formulated to indicate that a difference does exist.
Select Statistic
• Select the appropriate statistical model that identifies the test statistic
– Determine the test statistic and its sampling distribution
– The test statistic is a random variable that has a distribution
– Example test statistics:
Level of Significance and errors
• Significance level α provides a probabilistic framework for accepting or rejecting the null hypothesis, and consequently making a decision.
• Errors in decision making:
– Type I error (probability α): reject H0 when in fact H0 is true.
– Type II error (probability β): accept H0 when in fact H0 is false.
• Fortunately, the probability of committing both types of error can be reduced by increasing the sample size.
Define the region of rejection for the test statistic.
The critical value of the test statistic depends on:
1. The statement of the alternative hypothesis.
2. The distribution of the test statistic.
3. The level of significance, and
4. The characteristics of the sample data.
[Figure: rejection region]
Example
A manufacturer of car batteries claims that the life of the company’s batteries is approximately normally distributed with a standard deviation equal to 0.9 year. If a random sample of 10 of these batteries has a standard deviation of 1.2 years, do you think that σ > 0.9 year? Use a 0.05 level of significance.
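The slide leaves the computation open. One standard way to work it, assumed here, is the one-sided chi-square test for a single variance, with H0: σ² = 0.81 against HA: σ² > 0.81 and statistic χ² = (n − 1)s²/σ0². This stdlib-only sketch uses hypothetical helpers `chi2_cdf` (series for the regularized incomplete gamma function) and `chi2_critical` (bisection); a library routine such as `scipy.stats.chi2` would normally be used instead. With n = 10 and s = 1.2 the statistic is 9(1.44)/0.81 = 16.0, just below the tabulated χ²(0.05, 9) = 16.919, so H0 would not be rejected at α = 0.05.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF: regularized lower incomplete gamma P(df/2, x/2), by series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    for k in range(1, 1000):
        term *= z / (a + k)
        total += term
        if term < 1e-15 * total:
            break
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_critical(alpha, df):
    """Upper-tail critical value chi2(alpha, df), found by bisection on the CDF."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Battery example: H0: sigma^2 = 0.81 vs HA: sigma^2 > 0.81 (upper tail)
n, s, sigma0, alpha = 10, 1.2, 0.9, 0.05
df = n - 1
stat = df * s ** 2 / sigma0 ** 2          # (n - 1) s^2 / sigma0^2
critical = chi2_critical(alpha, df)
p_value = 1 - chi2_cdf(stat, df)

print(f"chi2 = {stat:.3f}, critical = {critical:.3f}, p = {p_value:.4f}")
print("reject H0" if stat > critical else "do not reject H0")
```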
Hypothesis testing for Distribution Type
“Goodness of Fit”
• The tests are based on how good a fit we have between the frequency of occurrence of observations in an observed sample and the expected frequencies obtained from the hypothesized distribution.
– Three tests are available:
   ▪ The chi-square test
   ▪ The Kolmogorov-Smirnov (K-S) test
   ▪ The Anderson-Darling test
Chi-square Test for Goodness of Fit
• Formulate hypothesis
– H0: The random variable has the specified population distribution with
the parameters indicated.
– HA: The random variable is not distributed as specified.
Test Statistic
– The chi-square goodness-of-fit test compares the observed frequencies O1, O2,
…, Ok of k values (k intervals) with the corresponding frequencies E1, E2,…,Ek
from an assumed or theoretical distribution.
– The test statistic is:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where χ² is the computed value of a random variable having a chi-square distribution with k − 1 − (no. of parameters estimated from the sample) degrees of freedom; Oi and Ei are the observed and expected frequencies in cell (or interval) i, and k is the number of discrete cells (intervals) into which the data were separated.
Guidance: k > 3 and Ei > 5
– Or
The region of rejection
• The assumed distribution yields
This theoretical distribution is an acceptable model if
$\chi^2 < \chi^2_{\alpha,\nu}$, where ν is the number of degrees of freedom.
Otherwise, it is not acceptable at the α significance level.
The region of rejection
This theoretical distribution is an acceptable model if
p-value > α
Otherwise, it is not acceptable.
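Both acceptance rules can be combined in one helper. This is a stdlib-only sketch; `chi_square_gof`, `chi2_cdf` and `chi2_critical` are hypothetical helper names, and the CDF is computed from the series for the regularized incomplete gamma function (a library routine such as `scipy.stats.chi2` would normally be used instead). The statistic and degrees of freedom follow the definitions above.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF: regularized lower incomplete gamma P(df/2, x/2), by series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    for k in range(1, 1000):
        term *= z / (a + k)
        total += term
        if term < 1e-15 * total:
            break
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_critical(alpha, df):
    """Upper-tail critical value chi2(alpha, df), found by bisection on the CDF."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chi_square_gof(observed, expected, n_estimated=0, alpha=0.05):
    """Chi-square goodness-of-fit test.

    observed, expected: frequencies per cell (same number of cells k)
    n_estimated: number of distribution parameters estimated from the sample
    Returns (statistic, df, critical value, p-value, accept_H0).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - n_estimated
    critical = chi2_critical(alpha, df)
    p_value = 1 - chi2_cdf(stat, df)
    accept = stat < critical          # equivalently, p_value > alpha
    return stat, df, critical, p_value, accept


print(chi_square_gof([18, 22, 20, 40], [25] * 4))
```

For example, with the hypothetical counts `[18, 22, 20, 40]` against an expected 25 per cell and no estimated parameters, the statistic is 308/25 = 12.32 with 3 degrees of freedom, above the critical value of about 7.815, so the model would be rejected.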

Chi-square test for the Poisson distribution (Example 1)
The number of rainstorms per year was recorded at a station, together with the number of years in which each count was observed. Does the data follow a Poisson distribution? Use a 0.05 significance level.
Number of rainstorms annually   Number of years observed
0                               20
1                               23
2                               15
3                               6
4                               2
First:
second:
Storms per year   Observed, Oi   Probability, p   Expected, Ei = Np = 66p
0                 20             0.302            19.94
1                 23             0.3616           23.87
2                 15             0.2164           14.29
3                 6              0.08635          5.70
4                 2              0.02584          1.71
Total             66                              ~66

The expected frequency for 4 storms (1.71) is below 5, so the last two cells are pooled into a single cell for 3 or more storms, with P(X ≥ 3) = 1 - (0.302 + 0.3616 + 0.2164) ≈ 0.120:
Storms per year   Oi   p        Ei = 66p   (Oi - Ei)^2   (Oi - Ei)^2/Ei
0                 20   0.302    19.94      0.0036        0.0002
1                 23   0.3616   23.87      0.7569        0.0317
2                 15   0.2164   14.29      0.5041        0.0353
≥3                8    0.120    7.90       0.0100        0.0013
Total             66            66.00                    χ² = 0.0685
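The whole example can be reproduced in a few lines. This sketch assumes λ is estimated by the sample mean (79/66 ≈ 1.197, which matches the probabilities in the table) and that the pooled cell takes the full upper tail P(X ≥ 3). It recomputes the expected frequencies and the statistic; the critical value 5.991 is the tabulated χ²(0.05, 2) for ν = 4 − 1 − 1 = 2, and since the statistic (about 0.069) is far below it, H0 would not be rejected.

```python
import math

storms = [0, 1, 2, 3, 4]          # rainstorms per year
years = [20, 23, 15, 6, 2]        # number of years with that many storms
n = sum(years)                    # 66 years of record

# One parameter estimated from the sample: lambda = sample mean
lam = sum(k * f for k, f in zip(storms, years)) / n


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Cells 0, 1, 2 and a pooled cell ">= 3" that takes the whole upper tail
probs = [poisson_pmf(k, lam) for k in (0, 1, 2)]
probs.append(1.0 - sum(probs))
observed = [20, 23, 15, 6 + 2]
expected = [n * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1        # k - 1 - (one estimated parameter)
critical = 5.991                  # chi-square table value for alpha = 0.05, df = 2

for label, o, e in zip(["0", "1", "2", ">=3"], observed, expected):
    print(f"{label:>3}  O={o:>2}  E={e:6.2f}")
print(f"chi2 = {chi2:.4f}, df = {df}, critical = {critical}")
print("do not reject H0" if chi2 < critical else "reject H0")
```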

ProModel: Stat::Fit
• Tools: Stat::Fit
• Enter data
• Input: Options
• Input: Graph
• Statistics: Descriptive
• Fit: Auto::Fit
• Fit: Setup (interval type: equal length)
• Fit: Goodness of Fit
Example
Value   Frequency
5       3
6       2
7       4
8       0
9       3
10      4
11      2
12      6
13      2
14      2
15      2
[Figure: Uniform Probability Function Fit (histogram of density vs. x value, 5 to 15)]
Chi-square test for the uniform
ProModel:
• Statistics: Descriptive
• Input: Options
• Input: Graph
• Fit: Auto::Fit
• Fit: Setup
• Fit: Goodness of Fit
[Figures: Stat::Fit (ProModel) output screens]
Example
• A six-sided die is rolled 120 times with the following distribution of outcomes. Does the data follow a uniform distribution at a 0.05 significance level?
Outcome     1    2    3    4    5    6
Frequency   15   13   28   25   12   27
Example
• By using point estimates:
• X bar = (1*15 + 2*13 + 3*28 + 4*25 + 5*12 + 6*27)/120 = 447/120 = 3.725
• S^2 = (1991 – 120*3.725^2)/119 = 325.925/119 = 2.739, where 1991 = 1^2*15 + 2^2*13 + 3^2*28 + 4^2*25 + 5^2*12 + 6^2*27
Example
• For the discrete uniform distribution:
• a = 1, b = 6
• µx = (6 + 1)/2 = 3.5
• V(x) = ((b - a + 1)^2 - 1)/12 = (7*5)/12 ≈ 2.917
• The data seem to follow a uniform distribution with a = 1, b = 6.
Example
• The following hypothesis test has been set up:
• H0: the data follow a uniform distribution with a = 1, b = 6
• H1: the data do not follow a uniform distribution with a = 1, b = 6
• E = constant = np = 120 * 1/6 = 20
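The test itself can be sketched as follows, assuming the six faces are the six cells, each with E = 20, and that no parameter is estimated (a and b are fixed by the die), so ν = 6 − 1 = 5. The critical value 11.070 is the tabulated χ²(0.05, 5). Under these assumptions the statistic is 276/20 = 13.8, which exceeds 11.070, so the test would reject H0 at α = 0.05 even though the point estimates looked close.

```python
observed = [15, 13, 28, 25, 12, 27]   # counts for faces 1..6
n = sum(observed)                      # 120 rolls
expected = [n / 6] * 6                 # 20 per face under H0 (a = 1, b = 6)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                 # a and b fixed by the die: nothing estimated
critical = 11.070                      # chi-square table value for alpha = 0.05, df = 5

print(f"chi2 = {chi2:.2f}, df = {df}, critical = {critical}")
print("reject H0" if chi2 > critical else "do not reject H0")
```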
By using point estimates:
A Poisson distribution seems to be an appropriate model for the data.
Histogram
By using test of hypothesis:
