
COMPARING SYSTEMS
Lecture Note
Introduction
• In many cases, simulations are conducted to compare two or more alternative designs of a system, with the goal of identifying the superior system relative to some performance measure.
• Comparing alternative system designs requires careful analysis to ensure that the observed differences are attributable to actual differences in performance and not to statistical variation.

Hypothesis Testing
• A null hypothesis, denoted H0, is drafted to state that the value of μ1 is not significantly different from the value of μ2 at the α level of significance.
• An alternative hypothesis, denoted H1, is drafted to oppose the null hypothesis H0. For example, H1 could state that μ1 and μ2 are different.
• Formally,
H0: μ1 = μ2, or equivalently H0: μ1 − μ2 = 0
H1: μ1 ≠ μ2, or equivalently H1: μ1 − μ2 ≠ 0

Hypothesis Testing
• The α level of significance refers to the probability of making a Type I error.
• A Type I error occurs when H0 is rejected but H0 is in fact true.
• A Type II error occurs when we fail to reject H0 but H1 is in fact true.
• Hypothesis testing methods are designed such that the probability of making a Type II error, β, is as small as possible for a given value of α.

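The meaning of α can be checked empirically. The sketch below (not from the lecture; the population parameters and sample sizes are assumed) repeatedly tests two samples drawn from the same normal population, so H0 (μ1 = μ2) is true and every rejection is a Type I error; the observed rejection rate should be close to α:

```python
# Estimating the Type I error rate by simulation. Both samples come from
# the SAME population, so any rejection of H0 is a Type I error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
trials = 2000
rejections = 0

for _ in range(trials):
    x1 = rng.normal(loc=50.0, scale=5.0, size=10)
    x2 = rng.normal(loc=50.0, scale=5.0, size=10)
    _, p = stats.ttest_ind(x1, x2)
    if p < alpha:            # reject H0 even though it is true
        rejections += 1

print(rejections / trials)   # should be close to alpha = 0.05
```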
Comparing Two Alternative Designs
Methods used to compare two alternative system designs:
• Welch confidence interval for comparing two systems
• Paired-t confidence interval for comparing two systems

Comparing Two Alternative Designs
• Hypothesis:
H0: μ1 − μ2 = 0
H1: μ1 − μ2 ≠ 0

Welch Confidence Interval for Comparing Two Systems
• The Welch confidence interval method requires that the observations drawn from each population (simulated system) be normally distributed and independent within a population and between populations.
• The Welch confidence interval method does not require that the number of samples drawn from one population (n1) equal the number of samples drawn from the other population (n2).
• This approach does not require that the two populations have equal variances (σ1² = σ2² = σ²).

Welch Confidence Interval for Comparing Two Systems
• The Welch confidence interval for an α level of significance:

P[(x̄1 − x̄2) − hw ≤ μ1 − μ2 ≤ (x̄1 − x̄2) + hw] = 1 − α

hw = t(df, α/2) √(s1²/n1 + s2²/n2)

df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]

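The Welch interval and its estimated degrees of freedom can be sketched in code as follows (a minimal sketch; the sample data are assumed, not from the lecture):

```python
# Welch confidence interval for the difference of two means, with the
# Welch-Satterthwaite degrees-of-freedom estimate from the slide.
import numpy as np
from scipy import stats

def welch_ci(x1, x2, alpha=0.05):
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)          # s1^2, s2^2
    # estimated degrees of freedom (need not be an integer)
    df = (v1/n1 + v2/n2)**2 / ((v1/n1)**2/(n1 - 1) + (v2/n2)**2/(n2 - 1))
    hw = stats.t.ppf(1 - alpha/2, df) * np.sqrt(v1/n1 + v2/n2)
    diff = x1.mean() - x2.mean()
    return diff - hw, diff + hw

lo, hi = welch_ci([54.5, 56.1, 55.2, 57.0, 55.8], [52.3, 53.9, 53.1, 54.2])
# If the interval excludes zero, reject H0: mu1 - mu2 = 0
print(lo, hi)
```

Note that n1 ≠ n2 here, which the Welch method allows.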
Welch Confidence Interval for Comparing Two Systems
(figure: worked Welch interval example)
• Conclusion: reject H0.

Paired-t Confidence Interval for Comparing Two Systems
• The paired-t confidence interval method requires that the observations drawn from each population (simulated system) be normally distributed and independent within a population.
• The paired-t confidence interval method does not require that the observations between populations be independent.
• The paired-t confidence interval method requires that the populations have equal variances (σ1² = σ2² = σ²).

Paired-t Confidence Interval for Comparing Two Systems

x̄(1−2) = ( Σ_{j=1}^{n} x(1−2)j ) / n

s(1−2) = √( Σ_{j=1}^{n} (x(1−2)j − x̄(1−2))² / (n − 1) )

hw = t(n−1, α/2) s(1−2) / √n

The paired-t confidence interval for an α level of significance is

P[x̄(1−2) − hw ≤ μ(1−2) ≤ x̄(1−2) + hw] = 1 − α

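The paired-t computation can be sketched as follows (the data are assumed, not from the lecture: one output per replication from each design, paired by replication):

```python
# Paired-t confidence interval: work on the per-replication differences
# x(1-2)j = x1j - x2j, then build a one-sample t interval on their mean.
import numpy as np
from scipy import stats

def paired_t_ci(x1, x2, alpha=0.05):
    d = np.asarray(x1, float) - np.asarray(x2, float)   # x(1-2)j
    n = len(d)
    mean_d = d.mean()                                   # xbar(1-2)
    s_d = d.std(ddof=1)                                 # s(1-2)
    hw = stats.t.ppf(1 - alpha/2, n - 1) * s_d / np.sqrt(n)
    return mean_d - hw, mean_d + hw

lo, hi = paired_t_ci([56.3, 54.8, 57.1, 55.9, 56.6],
                     [54.6, 53.2, 55.4, 54.1, 54.9])
# An interval excluding zero means: reject H0: mu(1-2) = 0
print(lo, hi)
```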
Paired-t Confidence Interval for Comparing Two Systems
(figure: worked paired-t interval example)
• Conclusion: reject H0.

Comparing More Than Two Alternative Designs
Methods:
• Bonferroni approach
• Advanced statistical models (ANOVA)

The Bonferroni Approach for Comparing More Than Two Systems
• The Bonferroni approach is useful when there are more than two alternative system designs to compare with respect to some performance measure.
• Given K alternative system designs to compare, the null hypothesis H0 and alternative hypothesis H1 become
H0: μ1 = μ2 = … = μK = μ for the K alternative systems
H1: μi ≠ μi′ for at least one pair i ≠ i′

The Bonferroni Approach for Comparing More Than Two Systems
• The null hypothesis H0 states that the means of the K populations (the mean outputs of the K different simulation models) are not different.
• The alternative hypothesis H1 states that at least one pair of means is different.
• The Bonferroni approach is very similar to the two confidence interval methods above in that it is based on computing confidence intervals to determine whether the true mean performance of one system (μi) is significantly different from the true mean performance of another system (μi′).

The Bonferroni Approach for Comparing More Than Two Systems
• Number of pairwise comparisons for K candidate designs:

K(K − 1)/2

• Probability of m confidence intervals being simultaneously correct (Bonferroni inequality):

P[all m confidence interval statements are correct] ≥ 1 − α = 1 − Σ_{i=1}^{m} αi

The Bonferroni Approach for Comparing More Than Two Systems
• Split the overall significance level α equally across the comparisons:

αi = α / [K(K − 1)/2],  i = 1, …, K(K − 1)/2

• Each pairwise confidence interval is then built at level αi:

x̄(p−q) ± t(n−1, αi/2) s(p−q) / √n

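The Bonferroni split of α and one pairwise paired-t interval can be sketched as follows (K, α, and the sample data are assumed example values, not from the lecture):

```python
# Bonferroni adjustment: divide the overall alpha across all K(K-1)/2
# pairwise paired-t confidence intervals.
import numpy as np
from scipy import stats

K = 3                      # candidate designs
alpha = 0.06               # overall significance level
m = K * (K - 1) // 2       # number of pairwise comparisons
alpha_i = alpha / m        # per-comparison significance level

def bonferroni_paired_ci(x_p, x_q, n_comparisons, alpha=0.06):
    """Paired-t interval for designs p and q at the adjusted level."""
    d = np.asarray(x_p, float) - np.asarray(x_q, float)   # x(p-q)j
    n = len(d)
    hw = (stats.t.ppf(1 - (alpha / n_comparisons) / 2, n - 1)
          * d.std(ddof=1) / np.sqrt(n))
    return d.mean() - hw, d.mean() + hw

lo, hi = bonferroni_paired_ci([10.2, 9.8, 10.5], [9.1, 9.0, 9.4], m)
print(m, alpha_i)          # 3 comparisons at alpha_i = 0.02 each
print(lo, hi)
```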
Advanced Statistical Models for Comparing More Than Two Alternative Systems
• Analysis of variance (ANOVA) in conjunction with a multiple comparison test provides a means for comparing a much larger number of alternative designs.
• Given K alternative system designs to compare, the null hypothesis H0 and alternative hypothesis H1 become
H0: μ1 = μ2 = … = μK = μ for the K alternative systems
H1: μi ≠ μi′ for at least one pair i ≠ i′

Advanced Statistical Models for Comparing More Than Two Alternative Systems
H0: τ1 = τ2 = τ3 = 0

FCRITICAL/TABLE = F(df(treatment), df(error); α)

F(2, 27; 0.05) = 3.35 < Fcalc ⇒ Reject H0

ANOVA
• Analysis of variance (ANOVA) allows us to partition the total variation in the output response from the simulated system into two components:
– variation due to the effect of the treatments
– variation due to experimental error (the inherent variability in the simulated system)
• For this problem, we are interested in knowing whether the variation due to the treatments is sufficient to conclude that the performance of one strategy is significantly different from the others with respect to the mean throughput of the system.

ANOVA
• Assumptions:
– The observations are drawn from normally distributed populations.
– The observations are independent within a strategy and between strategies.
• The variance reduction technique based on common random numbers (CRN) therefore cannot be used with this method.

ANOVA
• The fixed-effects model:

xij = μ + τi + εij,  i = 1, …, K;  j = 1, …, n

τi: the effect of the ith treatment (ith strategy), expressed as a deviation from the overall population mean μ (common to all treatments).
εij: the error associated with the observation (in the context of simulation, εij is the random variation of the response xij that occurred during the jth replication of the ith treatment).

ANOVA
• Assumptions for the fixed-effects model:
– The sum of all τi equals zero.
– The error terms εij are independent and normally distributed with a mean of zero and common variance.

ANOVA
• Sum of squares for treatment i:

SSi = Σ_{j=1}^{n} xij² − ( Σ_{j=1}^{n} xij )² / n,  i = 1, …, K

• Grand total:

x.. = Σ_{i=1}^{K} Σ_{j=1}^{n} xij = Σ_{i=1}^{K} xi.

• Overall mean:

x̄.. = ( Σ_{i=1}^{K} Σ_{j=1}^{n} xij ) / N = x.. / N,  where N = nK

ANOVA
• Degrees of freedom total (corrected): df(total corrected) = N − 1
• Degrees of freedom treatment: df(treatment) = K − 1
• Degrees of freedom error: df(error) = N − K

ANOVA
• Sum of squares error:

SSE = Σ_{i=1}^{K} SSi

• Sum of squares treatment:

SST = (1/n) [ Σ_{i=1}^{K} xi.² − x..²/K ]

• Sum of squares total (corrected):

SSTC = SST + SSE

ANOVA
• Mean square treatment: MST = SST / df(treatment)
• Mean square error: MSE = SSE / df(error)
• Calculated F statistic: FCALC = MST / MSE

ANOVA

Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | Calculated F Statistic
Treatment           | K − 1              | SST            | MST         | MST/MSE
Error               | N − K              | SSE            | MSE         |
Total               | N − 1              | SSTC           |             |

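The sums of squares and the F statistic above can be sketched directly in code for a balanced design (the data matrix below is assumed, not from the lecture):

```python
# Fixed-effects one-way ANOVA computed from the slide formulas:
# SSi, SSE, SST, MST, MSE and F = MST/MSE.
import numpy as np
from scipy import stats

x = np.array([            # rows: treatments i=1..K, cols: replications j=1..n
    [56.3, 54.8, 57.1, 55.9],
    [54.6, 53.2, 55.4, 54.1],
    [57.4, 56.9, 57.8, 57.2],
])
K, n = x.shape
N = n * K

row_tot = x.sum(axis=1)                       # x_i.
grand = x.sum()                               # x_..
SSi = (x**2).sum(axis=1) - row_tot**2 / n     # within-treatment SS_i
SSE = SSi.sum()                               # sum of squares error
SST = (row_tot**2).sum() / n - grand**2 / N   # equals (1/n)[sum xi.^2 - x..^2/K]
MST = SST / (K - 1)
MSE = SSE / (N - K)
F_calc = MST / MSE
p = stats.f.sf(F_calc, K - 1, N - K)          # right-tail p-value
print(round(F_calc, 3), round(p, 4))
```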
Multiple Comparison Test
• The hypothesis test suggests that not all designs are the same with respect to a particular response, but it does not identify which designs perform differently.
• Fisher's least significant difference (LSD) test is used to identify which designs perform differently.
• It is generally recommended to conduct the hypothesis test prior to the LSD test to determine whether one or more pairs of treatments are significantly different.

Multiple Comparison Test
• The LSD test requires the calculation of a test statistic used to evaluate all pairwise comparisons of the sample means from each population.
• Number of pairwise comparisons for K candidate designs = K(K − 1)/2
• The LSD test statistic:

LSD = t(df(error), α/2) √(2MSE/n)

Multiple Comparison Test
• The decision rule: if the difference in the sample mean response values exceeds the LSD test statistic, the population mean response values are significantly different at the given level of significance:

If |x̄i − x̄i′| > LSD, then μi and μi′ are significantly different at the α level of significance.

Multiple Comparison Test
• For this problem, the LSD test statistic is determined at the α = 0.05 level of significance:

LSD(0.05) = t(27, 0.025) √(2MSE/n) = 2.052 √(2(1.23)/10) = 1.02

• With 95 percent confidence, we conclude that each pair of means is different (μ1 ≠ μ2, μ1 ≠ μ3, and μ2 ≠ μ3).

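The LSD value above, and the pairwise conclusions, can be reproduced from the numbers in the SPSS output later in these notes (MSE ≈ 1.23, n = 10 replications, df(error) = 27, sample means 56.302, 54.633, 57.392):

```python
# Recomputing Fisher's LSD and applying the decision rule to each pair.
import itertools
import math
from scipy import stats

MSE, n, df_error, alpha = 1.23, 10, 27, 0.05
lsd = stats.t.ppf(1 - alpha / 2, df_error) * math.sqrt(2 * MSE / n)
print(round(lsd, 2))  # → 1.02

means = {1: 56.302, 2: 54.633, 3: 57.392}   # sample means from the output
for i, j in itertools.combinations(means, 2):
    differ = abs(means[i] - means[j]) > lsd  # LSD decision rule
    print(f"strategy {i} vs {j}: differ = {differ}")
```

All three pairwise differences exceed 1.02, matching the slide's conclusion.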
One-way ANOVA with SPSS
(screenshot annotations: Factor (candidate design); Observation)

One-way ANOVA with SPSS
(screenshot annotation: Multiple comparison test)

One-way ANOVA with SPSS

Descriptives (OBS)

STRA   N   Mean      Std. Deviation  Std. Error  95% CI Lower  95% CI Upper  Minimum  Maximum
1      10  56.30200  1.37371         .43441      55.31931      57.28469      54.480   58.330
2      10  54.63300  1.16549         .36856      53.79926      55.46674      52.140   56.010
3      10  57.39200  .65794          .20806      56.92134      57.86266      56.110   58.300
Total  30  56.10900  1.57266         .28713      55.52176      56.69624      52.140   58.330

One-way ANOVA with SPSS

Test of Homogeneity of Variances (OBS)

Levene Statistic  df1  df2  Sig.
3.195             2    27   .057

One-way ANOVA with SPSS

ANOVA (OBS)

                Sum of Squares  df  Mean Square  F       Sig.
Between Groups  38.619          2   19.310       15.749  .000
Within Groups   33.105          27  1.226
Total           71.724          29

One-way ANOVA with SPSS

Multiple Comparisons (LSD), dependent variable: OBS

(I) STRA  (J) STRA  Mean Difference (I−J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
1         2          1.66900*              .495        .002    .65294       2.68506
1         3         −1.09000*              .495        .036  −2.10606      −.07390
2         1         −1.66900*              .495        .002  −2.68506      −.65294
2         3         −2.75900*              .495        .000  −3.77506     −1.74294
3         1          1.09000*              .495        .036    .07390       2.10606
3         2          2.75900*              .495        .000   1.74294       3.77506
* The mean difference is significant at the .05 level.

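The Std. Error and Sig. columns of the SPSS LSD table can be recovered (approximately, given rounding of MSE) from the ANOVA output above. A minimal sketch:

```python
# Reproducing the LSD table's standard error and two-sided p-values from
# MSE = 1.226, n = 10 replications per strategy, df(error) = 27.
import math
from scipy import stats

MSE, n, df_error = 1.226, 10, 27
se = math.sqrt(2 * MSE / n)          # std. error of a mean difference

pvals = []
for diff in (1.669, 1.090, 2.759):   # |mean differences| from the table
    t_stat = diff / se
    p = 2 * stats.t.sf(t_stat, df_error)   # two-sided p-value
    pvals.append(p)
    print(round(p, 3))
```

The computed values match the SPSS Sig. column (.002, ≈.036, .000) up to rounding.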
Factorial Designs and Optimization
• In simulation experiments, the aim is sometimes to find out how different decision variable settings impact the response of the system, rather than simply to compare one candidate system to another.
• For complex systems there are often many decision variables of interest.
• Rather than run hundreds of experiments covering every possible variable setting, experimental design techniques can be used as a shortcut for finding the decision variables of greatest significance.

Factorial Designs and Optimization
• In experimental design terminology, decision variables are referred to as factors and the output measures are referred to as responses.
• Once the response of interest has been identified, and the factors suspected of influencing this response have been defined, a factorial design method is used that prescribes how many runs to make and what level (value) to use for each factor.

Factorial Designs and Optimization
(figure: relationship between factors (decision variables) and output responses)

Factorial Designs and Optimization
• One type of experiment that looks at the combined effect of multiple factors on system response is referred to as a two-level, full-factorial design.
• For experiments in which a large number of factors are considered, a fractional-factorial design is used to strategically select a subset of combinations to test, in order to "screen out" factors with little or no impact on system performance.
• When there are many combinations, optimization techniques are used to search for the combination that produces the most desirable response.

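A two-level full-factorial design simply enumerates every low/high combination of the factors. A minimal sketch (the factor names and levels are hypothetical, not from the lecture):

```python
# Enumerating a two-level full factorial design: 2**k runs for k factors.
import itertools

factors = {                       # hypothetical low/high settings
    "buffer_size": (5, 20),
    "num_machines": (2, 4),
    "conveyor_speed": (1.0, 2.0),
}
design = list(itertools.product(*factors.values()))
print(len(design))                # 2**3 = 8 runs

for run in design:                # one simulation run per combination
    print(dict(zip(factors, run)))
```

Each of the 8 rows would be one simulation experiment; a fractional-factorial design would select a structured subset of these rows instead.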
Variance Reduction Techniques
• The variance of a performance measure computed from the output of simulations can be reduced.
• Reducing the variance allows us to estimate the mean value of a random variable within a desired level of precision and confidence using fewer replications (independent observations).
• The reduction in the required number of replications is achieved by controlling how random numbers are used to "drive" the events in the simulation model.
• The use of common random numbers (CRN) is perhaps the most popular variance reduction technique.

Common Random Numbers
• The common random numbers (CRN) technique was invented for comparing alternative system designs.
• The CRN technique provides a means for comparing alternative system designs under more nearly equal experimental conditions.
• This helps ensure that the observed differences in the performance of two system designs are due to differences in the designs and not to differences in experimental conditions.
• The goal is to evaluate each system under exactly the same circumstances to ensure a fair comparison.

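The CRN idea can be sketched with a toy model (the model and parameters are assumed, not from the lecture): give both designs the same seed in each replication, so each pair of observations sees identical random conditions and the paired difference reflects only the design change:

```python
# Common random numbers: drive both designs with the same seeded stream
# per replication, then compare the paired differences.
import numpy as np

def simulate(design_speedup, seed):
    """Toy model (assumed): total service time for 100 jobs."""
    rng = np.random.default_rng(seed)        # common stream per replication
    service = rng.exponential(scale=1.0, size=100)
    return (service / design_speedup).sum()

seeds = range(10)                            # one seed per replication
diffs = [simulate(1.0, s) - simulate(1.2, s) for s in seeds]
# Same seed -> same job workload for both designs, so every paired
# difference isolates the effect of the 20% speedup.
print(np.mean(diffs))
```

Because both designs see the same workload, the slower design is worse in every replication, which is exactly the reduced-noise comparison CRN is meant to provide (and which the paired-t interval can then exploit).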
Common Random Numbers
(screenshot annotation: a unique seed value assigned for each replication)

Common Random Numbers
(screenshot annotation: a unique random number stream assigned to each stochastic element in the system)
