Comparing Systems
Lecture Note
Introduction
In many cases, simulations are conducted to
compare two or more alternative designs of a
system with the goal of identifying the superior
system relative to some performance measure.
Comparing alternative system designs requires
careful analysis to ensure that the differences
being observed are attributable to actual
differences in performance and not to statistical
variation.
Hypothesis Testing
A null hypothesis, denoted H0, is drafted to
state that the value of μ1 is not significantly
different from the value of μ2 at the α level of
significance.
An alternative hypothesis, denoted H1, is
drafted to oppose the null hypothesis H0. For
example, H1 could state that μ1 and μ2 are
different.
Formally,
H0: μ1 = μ2, or equivalently H0: μ1 − μ2 = 0
H1: μ1 ≠ μ2, or equivalently H1: μ1 − μ2 ≠ 0
Hypothesis Testing
The level of significance α refers to the
probability of making a Type I error.
A Type I error occurs when H0 is rejected but
H0 is in fact true.
A Type II error occurs when we fail to reject
H0 but H1 is in fact true.
Hypothesis testing methods are designed
such that the probability of making a Type II
error, β, is as small as possible for a given
value of α.
Comparing Two Alternative Designs
Hypothesis:
H0: μ1 − μ2 = 0
H1: μ1 − μ2 ≠ 0
Welch Confidence Interval for
Comparing Two Systems
The Welch confidence interval method requires
that the observations drawn from each population
(simulated system) be normally distributed and
independent within a population and between
populations.
The Welch confidence interval method does not
require that the number of samples drawn from
one population (n1) equal the number of samples
from the other population (n2).
This approach does not require that the two
populations have equal variances (σ1² = σ2² = σ²).
Welch Confidence Interval for
Comparing Two Systems
The Welch confidence interval for an α level of
significance:

$$P\left(\bar{x}_1 - \bar{x}_2 - hw \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + hw\right) = 1 - \alpha$$

with half-width

$$hw = t_{df,\,\alpha/2}\,\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

and estimated degrees of freedom

$$df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
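As an illustration, the half-width and degrees-of-freedom formulas above can be computed directly. This is a minimal sketch: the sample data are hypothetical, and the critical value `t_crit` must be looked up from a t-table once df is known (here it is simply passed in).

```python
import math
from statistics import mean, variance

def welch_ci(sample1, sample2, t_crit):
    """Welch confidence interval for mu1 - mu2.

    t_crit is the critical value t_{df, alpha/2}, obtained from a
    t-table after df has been estimated (passed in for simplicity).
    """
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1), variance(sample2)   # sample variances s1^2, s2^2
    se2 = v1 / n1 + v2 / n2                         # s1^2/n1 + s2^2/n2
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    hw = t_crit * math.sqrt(se2)                    # half-width
    diff = mean(sample1) - mean(sample2)            # point estimate of mu1 - mu2
    return df, (diff - hw, diff + hw)

# Hypothetical output observations from two simulated designs.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8]
df, (lo, hi) = welch_ci(a, b, t_crit=2.776)  # t for df ~ 4, alpha = 0.05
# The interval contains 0, so H0 (mu1 = mu2) cannot be rejected here.
```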
Welch Confidence Interval for
Comparing Two Systems
Reject H0
Paired-t Confidence Interval for
Comparing Two Systems
Paired-t confidence interval method requires
that the observations drawn from each
population (simulated systems) be normally
distributed and independent within a
population.
The paired-t confidence interval method does
not require that the observations between
populations be independent.
The paired-t confidence interval method
requires that the populations have equal
variances (σ1² = σ2² = σ²).
Paired-t Confidence Interval for
Comparing Two Systems
The mean of the paired differences:

$$\bar{x}_{(1-2)} = \frac{\sum_{j=1}^{n} x_{(1-2)j}}{n}$$

The sample standard deviation of the differences:

$$s_{(1-2)} = \sqrt{\frac{\sum_{j=1}^{n}\left(x_{(1-2)j} - \bar{x}_{(1-2)}\right)^2}{n - 1}}$$

The half-width:

$$hw = t_{n-1,\,\alpha/2}\,\frac{s_{(1-2)}}{\sqrt{n}}$$

The paired-t confidence interval for an α level of
significance is

$$P\left(\bar{x}_{(1-2)} - hw \le \mu_{(1-2)} \le \bar{x}_{(1-2)} + hw\right) = 1 - \alpha$$
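The paired-t computation is short enough to sketch directly. The replication data and the critical value `t_crit` below are hypothetical; in practice t_{n−1, α/2} comes from a t-table.

```python
import math
from statistics import mean, stdev

def paired_t_ci(sample1, sample2, t_crit):
    """Paired-t confidence interval for mu_(1-2).

    Observations are paired by replication (e.g. replication j of
    both designs); t_crit = t_{n-1, alpha/2} from a t-table.
    """
    d = [x1 - x2 for x1, x2 in zip(sample1, sample2)]  # paired differences
    n = len(d)
    center = mean(d)                                   # x-bar_(1-2)
    hw = t_crit * stdev(d) / math.sqrt(n)              # half-width
    return center - hw, center + hw

# Hypothetical paired outputs from two designs over n = 5 replications.
design1 = [12, 15, 18, 21, 24]
design2 = [10, 11, 12, 13, 14]
lo, hi = paired_t_ci(design1, design2, t_crit=2.776)   # t_{4, 0.025}
# The interval lies entirely above 0, so reject H0: mu1 - mu2 = 0.
```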
Paired-t Confidence Interval for
Comparing Two Systems
Reject H0
Comparing More Than Two
Alternative Designs
Methods:
Bonferroni approach
Advanced statistical models (ANOVA)
The Bonferroni Approach for
Comparing More Than Two Systems
The Bonferroni approach is useful when there
are more than two alternative system designs to
compare with respect to some performance
measure.
Given K alternative system designs to compare,
the null hypothesis H0 and alternative
hypothesis H1 become
H0: μ1 = μ2 = … = μK = μ for the K alternative systems
H1: μi ≠ μi′ for at least one pair i ≠ i′
The Bonferroni Approach for
Comparing More Than Two Systems
The null hypothesis H0 states that the means
of the K populations (the mean outputs of the K
different simulation models) are not different.
The alternative hypothesis H1 states that at
least one pair of the means differ.
The Bonferroni approach is very similar to the
two confidence interval methods above in that it
is based on computing confidence intervals to
determine whether the true mean performance of
one system (μi) is significantly different from the
true mean performance of another system (μi′).
The Bonferroni Approach for
Comparing More Than Two Systems
Number of pairwise comparisons for K
candidate designs:

$$\frac{K(K-1)}{2}$$
The Bonferroni Approach for
Comparing More Than Two Systems
The overall significance level α is split across
the individual comparisons:

$$\alpha = \sum_i \alpha_i, \qquad i = 1, \ldots, K(K-1)/2$$

commonly with each αi = α / [K(K − 1)/2]. The
confidence interval for a pair of systems (p, q) is
then

$$\bar{x}_{(p-q)} \pm t_{n-1,\,\alpha_i/2}\,\frac{s_{(p-q)}}{\sqrt{n}}$$
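A minimal sketch of the level-splitting step, assuming the common equal split αi = α/[K(K−1)/2]; the function name is illustrative, not from any library.

```python
def bonferroni_levels(K, alpha):
    """Split an overall significance level across all pairwise comparisons.

    Returns the number of pairwise comparisons m = K(K-1)/2 and the
    per-comparison level alpha_i = alpha / m, so the alpha_i sum back
    to the overall alpha.
    """
    m = K * (K - 1) // 2          # K(K-1) is always even
    return m, alpha / m

m, alpha_i = bonferroni_levels(K=4, alpha=0.05)
# 4 designs -> 6 pairwise comparisons, each tested at level 0.05/6.
```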
Advanced Statistical Models for Comparing
More Than Two Alternative Systems
Analysis of variance (ANOVA) in conjunction with a
multiple comparison test provides a means for
comparing a much larger number of alternative
designs.
Given K alternative system designs to compare, the
null hypothesis H0 and alternative hypothesis H1
become
H0: μ1 = μ2 = … = μK = μ for the K alternative systems
H1: μi ≠ μi′ for at least one pair i ≠ i′
Advanced Statistical Models for Comparing
More Than Two Alternative Systems
H0: τ1 = τ2 = τ3 = 0 (no treatment effect)
ANOVA
Analysis of variance (ANOVA) allows us to
partition the total variation in the output
response from the simulated system into two
components:
– Variation due to the effect of the treatments
– Variation due to experimental error (the
inherent variability in the simulated system)
For this problem, we are interested in knowing
whether the variation due to the treatments is
sufficient to conclude that the performance of
one strategy is significantly different from the
others with respect to mean throughput of the
system.
ANOVA
Assumptions:
– The observations are drawn from normally
distributed populations.
– The observations are independent within a
strategy and between strategies.
The variance reduction technique based on
common random numbers (CRN) cannot be
used with this method, since CRN makes the
observations between strategies dependent.
ANOVA
The fixed-effects model:

$$x_{ij} = \mu + \tau_i + \epsilon_{ij}, \qquad i = 1, \ldots, K;\; j = 1, \ldots, n$$
ANOVA
Sum of squares for treatment i:

$$SS_i = \sum_{j=1}^{n} x_{ij}^2 - \frac{\left(\sum_{j=1}^{n} x_{ij}\right)^2}{n}, \qquad i = 1, \ldots, K$$

Grand total:

$$x_{..} = \sum_{i=1}^{K}\sum_{j=1}^{n} x_{ij} = \sum_{i=1}^{K} x_{i.}$$

Overall mean:

$$\bar{x}_{..} = \frac{\sum_{i=1}^{K}\sum_{j=1}^{n} x_{ij}}{N} = \frac{x_{..}}{N}, \qquad N = nK$$
ANOVA
Degrees of freedom:

$$df(\text{treatment}) = K - 1$$
$$df(\text{error}) = N - K$$
ANOVA
Sum of squares error:

$$SSE = \sum_{i=1}^{K} SS_i$$
ANOVA
Mean square treatment:

$$MST = \frac{SST}{df(\text{treatment})}$$

Mean square error:

$$MSE = \frac{SSE}{df(\text{error})}$$

Calculated F statistic:

$$F_{CALC} = \frac{MST}{MSE}$$
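The sums of squares and the F statistic above can be computed in a few lines. This is a sketch with made-up data; `one_way_anova` is a hypothetical helper, not a library function.

```python
def one_way_anova(groups):
    """One-way fixed-effects ANOVA following the slide formulas.

    groups: K lists of n observations each (balanced design).
    Returns (SST, SSE, F_calc).
    """
    K, n = len(groups), len(groups[0])
    N = n * K
    grand_total = sum(sum(g) for g in groups)                    # x..
    # SS_i = sum(x_ij^2) - (sum x_ij)^2 / n  (within-group sum of squares)
    ss_i = [sum(x * x for x in g) - sum(g) ** 2 / n for g in groups]
    sse = sum(ss_i)                                              # SSE
    # Total sum of squares, then SST (treatment) by subtraction.
    ss_total = sum(x * x for g in groups for x in g) - grand_total ** 2 / N
    sst = ss_total - sse
    mst = sst / (K - 1)                                          # MST
    mse = sse / (N - K)                                          # MSE
    return sst, sse, mst / mse                                   # F = MST/MSE

# Hypothetical output from K = 3 strategies, n = 3 replications each.
sst, sse, f_calc = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```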
Multiple Comparison Test
The hypothesis test suggests that not all
designs are the same with respect to a
particular response, but it does not identify
which designs perform differently.
Fisher’s least significant difference
(LSD) test is used to identify which designs
perform differently.
It is generally recommended to conduct a
hypothesis test prior to the LSD test to
determine if one or more pairs of treatments
are significantly different.
36
Multiple Comparison Test
The LSD test requires the calculation of a test
statistic used to evaluate all pairwise
comparisons of the sample means from each
population.
Number of pairwise comparisons for K
candidate designs = K(K − 1)/2
The LSD test statistic:

$$LSD = t_{df(\text{error}),\,\alpha/2}\,\sqrt{\frac{2\,MSE}{n}}$$
Multiple Comparison Test
The decision rule: if the difference in the
sample mean response values exceeds the
LSD test statistic, the population mean
response values are significantly different at
the given level of significance.
Multiple Comparison Test
For this problem, the LSD test statistic is
determined at the α = 0.05 level of significance:

$$LSD(0.05) = t_{27,\,0.025}\sqrt{\frac{2\,MSE}{n}} = 2.052\sqrt{\frac{2(1.23)}{10}} = 1.02$$
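Using the numbers from this example (t_{27, 0.025} = 2.052, MSE ≈ 1.23, n = 10) and the strategy means reported in the SPSS descriptives later in these notes, the pairwise decisions can be checked in a short sketch:

```python
import math

def lsd_statistic(t_crit, mse, n):
    """Fisher's LSD test statistic: t * sqrt(2*MSE/n)."""
    return t_crit * math.sqrt(2 * mse / n)

lsd = lsd_statistic(t_crit=2.052, mse=1.23, n=10)

# Sample mean throughputs of the three strategies; any pair whose mean
# difference exceeds LSD is significantly different at alpha = 0.05.
means = {1: 56.302, 2: 54.633, 3: 57.392}
significant = {
    (i, j): abs(means[i] - means[j]) > lsd
    for i in means for j in means if i < j
}
# Every pairwise difference (1.669, 1.090, 2.759) exceeds LSD ~ 1.02,
# so all three strategies differ significantly.
```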
One-way ANOVA with SPSS
One-way ANOVA with SPSS
Descriptives (OBS)

STRA    N    Mean      Std. Dev.  Std. Error  95% CI Lower  95% CI Upper  Minimum  Maximum
1.000   10   56.30200  1.37371    0.43441     55.31931      57.28469      54.480   58.330
2.000   10   54.63300  1.16549    0.36856     53.79926      55.46674      52.140   56.010
3.000   10   57.39200  0.65794    0.20806     56.92134      57.86266      56.110   58.300
Total   30   56.10900  1.57266    0.28713     55.52176      56.69624      52.140   58.330
One-way ANOVA with SPSS
Test of Homogeneity of Variances (OBS)

Levene Statistic  df1  df2  Sig.
3.195             2    27   0.057
One-way ANOVA with SPSS
ANOVA (OBS)

                 Sum of Squares  df  Mean Square  F       Sig.
Between Groups   38.619          2   19.310       15.749  0.000
Within Groups    33.105          27  1.226
Total            71.724          29
One-way ANOVA with SPSS
Multiple Comparisons

(I) STRA  (J) STRA  Mean Difference (I-J)  Std. Error  Sig.   95% CI Lower  95% CI Upper
1.000     2.000      1.66900*              0.495       0.002   0.65294       2.68506
1.000     3.000     -1.09000*              0.495       0.036  -2.10606      -0.07400
2.000     1.000     -1.66900*              0.495       0.002  -2.68506      -0.65294
2.000     3.000     -2.75900*              0.495       0.000  -3.77506      -1.74294
3.000     1.000      1.09000*              0.495       0.036   0.07390       2.10606
3.000     2.000      2.75900*              0.495       0.000   1.74294       3.77506

* The mean difference is significant at the .05 level.
Factorial Designs and Optimization
In simulation experiments, the goal is sometimes
to find out how different decision variable
settings impact the response of the system
rather than simply to compare one candidate
system to another.
There are often many decision variables of
interest for complex systems.
Rather than run hundreds of experiments for
every possible combination of variable settings,
experimental design techniques can be used as a
shortcut for finding those decision variables of
greatest significance.
Factorial Designs and Optimization
Using experimental design terminology,
decision variables are referred to as factors
and the output measures are referred to as
responses.
Once the response of interest has been
identified, and the factors suspected of
having an influence on this response have
been defined, a factorial design method is
used that prescribes how many runs to make
and what level or value to use for each factor.
Factorial Designs and Optimization
One type of experiment that looks at the
combined effect of multiple factors on system
response is referred to as a two-level, full
factorial design.
For experiments in which a large number of
factors are considered, a fractional-factorial
design is used to strategically select a subset of
combinations to test in order to “screen-out”
factors with little or no impact on system
performance.
When there are many combinations,
optimization techniques are used to search for
the combination that produces the most
desirable response.
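The run count of a two-level full factorial design grows as 2^k in the number of factors k. A small sketch with hypothetical factor names and levels:

```python
from itertools import product

def two_level_full_factorial(factors):
    """Enumerate all runs of a two-level full factorial design.

    factors: dict mapping factor name -> (low, high) level pair.
    Returns one dict per run; a k-factor design yields 2^k runs.
    """
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in product(*(factors[n] for n in names))]

# Hypothetical factors for a service-system simulation model.
runs = two_level_full_factorial({
    "servers":      (2, 4),
    "queue_cap":    (10, 20),
    "arrival_rate": (0.8, 1.2),
})
# 3 factors at 2 levels each -> 2^3 = 8 simulation runs.
```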
Variance Reduction Techniques
The variance of a performance measure
computed from the output of simulations can be
reduced.
Reducing the variance allows us to estimate the
mean value of a random variable within a
desired level of precision and confidence with
fewer replications (independent observations).
The reduction in the required number of
replications is achieved by controlling how
random numbers are used to “drive” the events
in the simulation model.
The use of common random number (CRN) is
perhaps one of the most popular variance
reduction techniques.
53
Common Random Numbers
The common random number (CRN) technique
was invented for comparing alternative system
designs.
The CRN technique provides a means for
comparing alternative system designs under
more nearly equal experimental conditions.
This is helpful in ensuring that the observed
differences in the performance of two system
designs are due to the differences in the designs
and not to differences in experimental conditions.
The goal is to evaluate each system under the
exact same circumstances to ensure a fair
comparison.
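A toy sketch of the idea, with an entirely hypothetical model: each paired replication seeds both designs with the same random stream, so both see identical job workloads and any observed difference is due to the designs alone.

```python
import random

def simulate(design_speedup, seed, n_jobs=100):
    """Toy simulation: total processing time for n_jobs jobs.

    Job workloads come from the stream seeded with `seed`;
    design_speedup scales how fast each job is processed.
    """
    rng = random.Random(seed)            # dedicated stream per replication
    workloads = [rng.expovariate(1.0) for _ in range(n_jobs)]
    return sum(w / design_speedup for w in workloads)

# Replication j of BOTH designs uses the same seed, so both see the
# same job workloads; the difference isolates the design effect.
diffs = [simulate(1.0, seed=j) - simulate(1.2, seed=j) for j in range(10)]
# Design B (speedup 1.2) is faster in every paired replication, so each
# difference is positive; with independent seeds this need not hold.
```

The paired differences produced this way feed directly into the paired-t confidence interval described earlier, which is why CRN pairs naturally with that method.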