CHAPTER 10
COMPARING SYSTEMS
“The method that proceeds without analysis is like the groping of a blind man.”
—Socrates
10.1 Introduction
In many cases, simulations are conducted to compare two or more alternative de-
signs of a system with the goal of identifying the superior system relative to some
performance measure. Comparing alternative system designs requires careful
analysis to ensure that differences being observed are attributable to actual differ-
ences in performance and not to statistical variation. This is where running either
multiple replications or batches is required. Suppose, for example, that method A
for deploying resources yields a throughput of 100 entities for a given time period
while method B results in 110 entities for the same period. Is it valid to conclude
that method B is better than method A, or might additional replications actually
lead to the opposite conclusion?
You can evaluate alternative configurations or operating policies by perform-
ing several replications of each alternative and comparing the average results
from the replications. Statistical methods for making these types of comparisons
are called hypothesis tests. For these tests, a hypothesis is first formulated (for
example, that methods A and B both result in the same throughput) and then a test
is made to see whether the results of the simulation lead us to reject the hypothe-
sis. The outcome of the simulation runs may cause us to reject the hypothesis that
methods A and B both result in equal throughput capabilities and conclude that the
throughput does indeed depend on which method is used.
This chapter extends the material presented in Chapter 9 by providing statis-
tical methods that can be used to compare the output of different simulation mod-
els that represent competing designs of a system. The concepts behind hypothesis
testing are introduced in Section 10.2. Section 10.3 addresses the case when two
alternative system designs are to be compared, and Section 10.4 considers the
case when more than two alternative system designs are to be compared. Addi-
tionally, a technique called common random numbers is described in Section 10.5
that can sometimes improve the accuracy of the comparisons.
FIGURE 10.1 Production system with four workstations and three buffer storage areas.
Suppose that Strategy 1 and Strategy 2 are the two buffer allocation strategies
proposed by the production control staff. We wish to identify the strategy that
maximizes the throughput of the production system (number of parts completed
per hour). Of course, the possibility exists that there is no significant difference in
the performance of the two candidate strategies. That is to say, the mean through-
put of the two proposed strategies is equal. A starting point for our problem is to
formulate our hypotheses concerning the mean throughput for the production
system under the two buffer allocation strategies. Next we work out the details of
setting up our experiments with the simulation models built to evaluate each strat-
egy. For example, we may decide to estimate the true mean performance of each
strategy (µ1 and µ2) by simulating each strategy for 16 days (24 hours per day)
past the warm-up period and replicating the simulation 10 times. After we run
experiments, we would use the simulation output to evaluate the hypotheses
concerning the mean throughput for the production system under the two buffer
allocation strategies.
In general, a null hypothesis, denoted H0, is drafted to state that the value of µ1
is not significantly different than the value of µ2 at the α level of significance. An
alternate hypothesis, denoted H1, is drafted to oppose the null hypothesis H0. For
example, H1 could state that µ1 and µ2 are different at the α level of significance.
Stated more formally:
H0: µ1 = µ2    or equivalently    H0: µ1 − µ2 = 0
H1: µ1 ≠ µ2    or equivalently    H1: µ1 − µ2 ≠ 0
In the context of the example problem, the null hypothesis H0 states that the
mean throughputs of the system due to Strategy 1 and Strategy 2 do not differ. The
alternate hypothesis H1 states that the mean throughputs of the system due to
Strategy 1 and Strategy 2 do differ. Hypothesis testing methods are designed such
that the burden of proof is on us to demonstrate that H0 is not true. Therefore, if
our analysis of the data from our experiments leads us to reject H0, we can be con-
fident that there is a significant difference between the two population means. In
our example problem, the output from the simulation model for Strategy 1 repre-
sents possible throughput observations from one population, and the output from
the simulation model for Strategy 2 represents possible throughput observations
from another population.
The α level of significance in these hypotheses refers to the probability of
making a Type I error. A Type I error occurs when we reject H0 in favor of H1
when in fact H0 is true. Typically α is set at a value of 0.05 or 0.01. However,
the choice is yours, and it depends on how small you want the probability of
making a Type I error to be. A Type II error occurs when we fail to reject H0 in
favor of H1 when in fact H1 is true. The probability of making a Type II error is
denoted as β. Hypothesis testing methods are designed such that the probability
of making a Type II error, β, is as small as possible for a given value of α. The
relationship between α and β is that β increases as α decreases. Therefore, we
should be careful not to make α too small.
We will test these hypotheses using a confidence interval approach to
determine if we should reject or fail to reject the null hypothesis in favor of the
alternative hypothesis. The reason for using the confidence interval method is that
it is equivalent to conducting a two-tailed test of hypothesis with the added bene-
fit of indicating the magnitude of the difference between µ1 and µ2 if they are in
fact significantly different. The first step of this procedure is to construct a confi-
dence interval to estimate the difference between the two means (µ1 − µ2 ). This
can be done in different ways depending on how the simulation experiments are
conducted (we will discuss this later). For now, let’s express the confidence inter-
val on the difference between the two means as
P[(x̄1 − x̄2 ) − hw ≤ µ1 − µ2 ≤ (x̄1 − x̄2 ) + hw] = 1 − α
where hw denotes the half-width of the confidence interval. Notice the similari-
ties between this confidence interval expression and the one given on page 227 in
Chapter 9. Here we have replaced x̄ with x̄1 − x̄2 and µ with µ1 − µ2 .
If the two population means are the same, then µ1 − µ2 = 0, which is our
null hypothesis H0. If H0 is true, our confidence interval should include zero with
a probability of 1 − α. This leads to the following rule for deciding whether to re-
ject or fail to reject H0. If the confidence interval includes zero, we fail to reject H0
and conclude that the value of µ1 is not significantly different than the value of µ2
at the α level of significance (the mean throughput of Strategy 1 is not signifi-
cantly different than the mean throughput of Strategy 2). However, if the confi-
dence interval does not include zero, we reject H0 and conclude that the value of
µ1 is significantly different than the value of µ2 at the α level of significance
(throughput values for Strategy 1 and Strategy 2 are significantly different).
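This decision rule is mechanical, so it can be expressed as a small Python sketch (ours, for illustration only; the interval endpoints passed in at the end are the ones computed later in Section 10.3 for the buffer allocation example):

def decide(ci_lower, ci_upper):
    # Confidence interval decision rule for H0: mu1 - mu2 = 0.
    # If the interval on mu1 - mu2 contains zero, fail to reject H0;
    # otherwise reject H0, and the interval's sign tells us which mean is larger.
    if ci_lower <= 0.0 <= ci_upper:
        return "fail to reject H0 (no significant difference detected)"
    if ci_upper < 0.0:
        return "reject H0 (evidence that mu1 < mu2)"
    return "reject H0 (evidence that mu1 > mu2)"

print(decide(0.47, 2.87))   # the interval found in Section 10.3 -> mu1 > mu2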
Figure 10.2(a) illustrates the case when the confidence interval contains zero,
leading us to fail to reject the null hypothesis H0 and conclude that there is no sig-
nificant difference between µ1 and µ2. The failure to obtain sufficient evidence to
pick one alternative over another may be due to the fact that there really is no dif-
ference, or it may be a result of the variance in the observed outcomes being too
high to be conclusive. At this point, either additional replications may be run or one
of several variance reduction techniques might be employed (see Section 10.5).
FIGURE 10.2 Three possible positions of a confidence interval relative to zero: (a) the interval contains zero, so we fail to reject H0; (b) the interval lies entirely to the left of zero, so we reject H0; (c) the interval lies entirely to the right of zero, so we reject H0.

Figure 10.2(b) illustrates the case when the confidence interval is completely to the
left of zero, leading us to reject H0. This case suggests that µ1 − µ2 < 0 or, equiv-
alently, µ1 < µ2 . Figure 10.2(c) illustrates the case when the confidence interval
is completely to the right of zero, leading us to also reject H0. This case suggests
that µ1 − µ2 > 0 or, equivalently, µ1 > µ2 . These rules are commonly used in
practice to make statements about how the population means differ
(µ1 > µ2 or µ1 < µ2 ) when the confidence interval does not include zero (Banks
et al. 2001; Hoover and Perry 1989).
TABLE 10.1 Simulated throughput (parts completed per hour) for 10 replications of each buffer allocation strategy

Replication (A)    Strategy 1 Throughput (B)    Strategy 2 Throughput (C)
 1                 54.48                        56.01
 2                 57.36                        54.08
 3                 54.81                        52.14
 4                 56.20                        53.49
 5                 54.83                        55.49
 6                 57.69                        55.00
 7                 58.33                        54.88
 8                 57.19                        54.47
 9                 56.84                        54.93
10                 55.29                        55.84

Sample mean x̄i, for i = 1, 2                   56.30                        54.63
Sample standard deviation si, for i = 1, 2     1.37                         1.17
Sample variance si², for i = 1, 2              1.89                         1.36
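The summary rows of Table 10.1 can be checked directly from the replication data; a minimal sketch using Python's standard statistics module:

import statistics

strategy1 = [54.48, 57.36, 54.81, 56.20, 54.83, 57.69, 58.33, 57.19, 56.84, 55.29]
strategy2 = [56.01, 54.08, 52.14, 53.49, 55.49, 55.00, 54.88, 54.47, 54.93, 55.84]

for name, data in (("Strategy 1", strategy1), ("Strategy 2", strategy2)):
    # statistics.stdev and statistics.variance use the n - 1 (sample)
    # denominator, matching the definitions of s and s^2 given in Chapter 9.
    print(name,
          f"mean={statistics.mean(data):.2f}",
          f"s={statistics.stdev(data):.2f}",
          f"s^2={statistics.variance(data):.2f}")
# Strategy 1 mean=56.30 s=1.37 s^2=1.89
# Strategy 2 mean=54.63 s=1.17 s^2=1.36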
The 10 observations in column B (Strategy 1 Throughput) are independent because a unique segment (stream) of random numbers from the random number
generator was used for each replication. The same is true for the 10 observations in
column C (Strategy 2 Throughput). The use of random number streams is dis-
cussed in Chapters 3 and 9 and later in this chapter. At this point we are assuming
that the observations are also normally distributed. The reasonableness of assum-
ing that the output produced by our simulation models is normally distributed is
discussed at length in Chapter 9. For this data set, we should also point out that two
different sets of random numbers were used to simulate the 10 replications of each
strategy. Therefore, the observations in column B are independent of the observa-
tions in column C. Stated another way, the two columns of observations are not
correlated. Therefore, the observations are independent within a population (strat-
egy) and between populations (strategies). This is an important distinction and will
be employed later to help us choose between different methods for computing the
confidence intervals used to compare the two strategies.
From the observations in Table 10.1 of the throughput produced by each strat-
egy, it is not obvious which strategy yields the higher throughput. Inspection of the
summary statistics indicates that Strategy 1 produced a higher mean throughput
for the system; however, the sample variance for Strategy 1 was higher than for
Strategy 2. Recall that the variance provides a measure of the variability of the data
and is obtained by squaring the standard deviation. Equations for computing the
sample mean x̄, sample variance s², and sample standard deviation s are given in
Chapter 9. Because of this variation, we should be careful when making conclu-
sions about the population of throughput values (µ1 and µ2) by only inspecting the
point estimates (x̄1 and x̄2 ). We will avoid the temptation and use the output from
the 10 replications of each simulation model along with a confidence interval to
make a more informed decision.
We will use an α = 0.05 level of significance to compare the two candidate
strategies using the following hypotheses:
H0: µ1 − µ2 = 0
H1: µ1 − µ2 ≠ 0
where the subscripts 1 and 2 denote Strategy 1 and Strategy 2, respectively. As
stated earlier, there are two common methods for constructing a confidence
interval for evaluating hypotheses. The first method is referred to as the Welch
confidence interval (Law and Kelton 2000; Miller 1986) and is a modified two-
sample-t confidence interval. The second method is the paired-t confidence inter-
val (Miller et al. 1990). We’ve chosen to present these two methods because their
statistical assumptions are more easily satisfied than are the assumptions for other
confidence interval methods.
As established earlier, the observations in Table 10.1 are independent and are assumed normal. However, the Welch confi-
dence interval method does not require that the number of samples drawn from
one population (n1) equal the number of samples drawn from the other population
(n2) as we did in the buffer allocation example. Therefore, if you have more ob-
servations for one candidate system than for the other candidate system, then by
all means use them. Additionally, this approach does not require that the two pop-
ulations have equal variances (σ1² = σ2² = σ²) as do other approaches. This is
useful because we seldom know the true value of the variance of a population.
Thus we are not required to judge the equality of the variances based on the sam-
ple variances we compute for each population (s1² and s2²) before using the Welch
confidence interval method.
The Welch confidence interval for an α level of significance is
P[(x̄1 − x̄2 ) − hw ≤ µ1 − µ2 ≤ (x̄1 − x̄2 ) + hw] = 1 − α
where x̄1 and x̄2 represent the sample means used to estimate the population
means µ1 and µ2; hw denotes the half-width of the confidence interval and is
computed by
hw = tdf,α/2 √(s1²/n1 + s2²/n2)

where df (degrees of freedom) is estimated by

df ≈ [s1²/n1 + s2²/n2]² / { [s1²/n1]² / (n1 − 1) + [s2²/n2]² / (n2 − 1) }
and tdf,α/2 is a factor obtained from the Student’s t table in Appendix B based on
the value of α/2 and the estimated degrees of freedom. Note that the degrees of
freedom term in the Student’s t table is an integer value. Given that the estimated
degrees of freedom will seldom be an integer value, you will have to use interpo-
lation to compute the tdf,α/2 value.
For the example buffer allocation problem with an α = 0.05 level of signifi-
cance, we use these equations and data from Table 10.1 to compute
df ≈ [1.89/10 + 1.36/10]² / { [1.89/10]² / (10 − 1) + [1.36/10]² / (10 − 1) } ≈ 17.5

and

hw = t17.5,0.025 √(1.89/10 + 1.36/10) = 2.106 √0.325 = 1.20 parts per hour
where tdf,α/2 = t17.5,0.025 = 2.106 is determined from Student’s t table in Appen-
dix B by interpolation. Now the 95 percent confidence interval is
(x̄1 − x̄2 ) − hw ≤ µ1 − µ2 ≤ (x̄1 − x̄2 ) + hw
(56.30 − 54.63) − 1.20 ≤ µ1 − µ2 ≤ (56.30 − 54.63) + 1.20
0.47 ≤ µ1 − µ2 ≤ 2.87
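The whole calculation can be scripted. The sketch below assumes SciPy is available; scipy.stats.t.ppf accepts fractional degrees of freedom directly, so no manual interpolation of the t table is needed:

import math
from scipy import stats

def welch_ci(x1, x2, alpha=0.05):
    # Welch confidence interval on mu1 - mu2; no equal-variance or
    # equal-sample-size assumptions are required.
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    v1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)   # sample variance s1^2
    v2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)   # sample variance s2^2
    se2 = v1 / n1 + v2 / n2
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    hw = stats.t.ppf(1 - alpha / 2, df) * math.sqrt(se2)
    return (m1 - m2) - hw, (m1 - m2) + hw

strategy1 = [54.48, 57.36, 54.81, 56.20, 54.83, 57.69, 58.33, 57.19, 56.84, 55.29]
strategy2 = [56.01, 54.08, 52.14, 53.49, 55.49, 55.00, 54.88, 54.47, 54.93, 55.84]
print(welch_ci(strategy1, strategy2))   # approximately (0.47, 2.87)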
The Bonferroni inequality states that the probability that all m confidence interval statements are simultaneously correct is at least 1 − α, where α = α1 + α2 + · · · + αm is the overall level of significance and m = K(K − 1)/2 is the number of confidence interval statements.
If, in this example for comparing four candidate designs, we set α1 = α2 =
α3 = α4 = α5 = α6 = 0.05, then the overall probability that all our conclusions
are correct is as low as (1 − 0.30), or 0.70. Being as low as 70 percent confident in
our conclusions leaves much to be desired. To combat this, we simply lower the val-
ues of the individual significance levels (α1 = α2 = α3 = · · · = αm ) so their sum
is not so large. However, this does not come without a price, as we shall see later.
One way to assign values to the individual significance levels is to first es-
tablish an overall level of significance α and then divide it by the number of pair-
wise comparisons. That is,
αi = α / [K(K − 1)/2]    for i = 1, 2, 3, . . . , K(K − 1)/2
Note, however, that it is not required that the individual significance levels be as-
signed the same value. This is useful in cases where the decision maker wants to
place different levels of significance on certain comparisons.
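A quick calculation shows how fast the individual significance levels shrink as designs are added; a simple sketch (ours, not from the text):

def bonferroni_split(K, overall_alpha):
    # Equal split of the overall significance level across all pairwise comparisons.
    m = K * (K - 1) // 2            # number of pairwise comparisons
    return m, overall_alpha / m

for K in (3, 4, 5, 10):
    m, alpha_i = bonferroni_split(K, 0.05)
    print(f"K={K:2d}: m={m:2d} comparisons, alpha_i={alpha_i:.4f}")
# K=10 already forces alpha_i down to about 0.0011, widening every interval.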
Practically speaking, the Bonferroni inequality limits the number of system de-
signs that can be reasonably compared to about five designs or less. This is because
controlling the overall significance level α for the test requires the assignment
of small values to the individual significance levels (α1 = α2 = α3 = · · · = αm )
if more than five designs are compared. This presents a problem because the width
of a confidence interval quickly increases as the level of significance is reduced.
Recall that the width of a confidence interval provides a measure of the accuracy
of the estimate. Therefore, we pay for gains in the overall confidence of our test by
reducing the accuracy of our individual estimates (wide confidence intervals).
When accurate estimates (tight confidence intervals) are desired, we recommend
not using the Bonferroni approach when comparing more than five system designs.
For comparing more than five system designs, we recommend that the analysis of
variance technique be used in conjunction with perhaps the Fisher’s least signifi-
cant difference test. These methods are presented in Section 10.4.2.
Let’s return to the buffer allocation example from the previous section and
apply the Bonferroni approach using paired-t confidence intervals. In this case,
the production control staff has devised three buffer allocation strategies to com-
pare. And, as before, we wish to determine if there are significant differences
between the throughput levels (number of parts completed per hour) achieved
by the strategies. Although we will be working with individual confidence
intervals, the hypotheses for the overall α level of significance are
H0: µ1 = µ2 = µ3 = µ
H1: µ1 ≠ µ2 or µ1 ≠ µ3 or µ2 ≠ µ3
where the subscripts 1, 2, and 3 denote Strategy 1, Strategy 2, and Strategy 3,
respectively.
To evaluate these hypotheses, we estimated the performance of the three
strategies by simulating the use of each strategy for 16 days (24 hours per day)
past the warm-up period. And, as before, the simulation was replicated 10 times
for each strategy. The average hourly throughput achieved by each strategy is
shown in Table 10.3.
The evaluation of the three buffer allocation strategies (K = 3) requires
that three [3(3 − 1)/2] pairwise comparisons be made. The three pairwise
comparisons are shown in columns E, F, and G of Table 10.3. Also shown in Table
10.3 are the sample means x̄(i−i′) and sample standard deviations s(i−i′) for each
pairwise comparison.
Let’s say that we wish to use an overall significance level of α = 0.06 to eval-
uate our hypotheses. For the individual levels of significance, let’s set α1 = α2 =
α3 = 0.02 by using the equation
αi = α/3 = 0.06/3 = 0.02    for i = 1, 2, 3
The computation of the three paired-t confidence intervals using the method out-
lined in Section 10.3.2 and data from Table 10.3 follows:
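As a minimal sketch of how each such interval is formed, the helper below computes a paired-t interval from a column of within-replication differences. The differences shown are the Strategy 1 minus Strategy 2 values from Table 10.1, used only as stand-in data; the actual comparisons here use the Table 10.3 columns E, F, and G:

import math
import statistics
from scipy import stats

def paired_t_ci(differences, alpha):
    # Paired-t interval on the mean within-replication difference mu_(i-i').
    n = len(differences)
    mean_d = statistics.mean(differences)
    s_d = statistics.stdev(differences)           # s_(i-i')
    hw = stats.t.ppf(1 - alpha / 2, n - 1) * s_d / math.sqrt(n)
    return mean_d - hw, mean_d + hw

# Stand-in differences (Table 10.1, Strategy 1 minus Strategy 2), one per replication.
d = [-1.53, 3.28, 2.67, 2.71, -0.66, 2.69, 3.45, 2.72, 1.91, -0.55]
print(paired_t_ci(d, alpha=0.02))   # individual significance level alpha_i = 0.02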
The alternate hypothesis H1 states that the mean throughputs due to the application of treatments (Strategies 1,
2, and 3) differ among at least one pair of strategies.
We will use a balanced completely randomized (CR) design to help us conduct this test of hypothesis.
In a balanced design, the same number of observations are collected for each fac-
tor level. Therefore, we executed 10 simulation runs to produce 10 observations
of throughput for each strategy. Table 10.4 presents the experimental results and
summary statistics for this problem. The response variable (xi j ) is the observed
throughput for the treatment (strategy). The subscript i refers to the factor level
(Strategy 1, 2, or 3) and j refers to an observation (output from replication j) for
that factor level. For example, the mean throughput response of the simulation
model for the seventh replication of Strategy 2 is 54.88 in Table 10.4. Parameters
for this balanced CR design are
Number of factor levels = number of alternative system designs = K = 3
Number of observations for each factor level = n = 10
Total number of observations = N = nK = (10)(3) = 30
Inspection of the summary statistics presented in Table 10.4 indicates that
Strategy 3 produced the highest mean throughput and Strategy 2 the lowest.
Again, we should not jump to conclusions without a careful analysis of the
experimental data. Therefore, we will use analysis of variance (ANOVA) in con-
junction with a multiple comparison test to guide our decision.
Analysis of Variance
Analysis of variance (ANOVA) allows us to partition the total variation in the out-
put response from the simulated system into two components—variation due to
the effect of the treatments and variation due to experimental error (the inherent
variability in the simulated system). For this problem case, we are interested in
knowing if the variation due to the treatment is sufficient to conclude that the per-
formance of one strategy is significantly different than the other with respect to
mean throughput of the system. We assume that the observations are drawn from
normally distributed populations and that they are independent within a strategy
and between strategies. Therefore, the variance reduction technique based on
common random numbers (CRN) presented in Section 10.5 cannot be used with
this method.
The fixed-effects model is the underlying linear statistical model used for the
analysis because the levels of the factor are fixed and we will consider each pos-
sible factor level. The fixed-effects model is written as
xij = µ + τi + εij    for i = 1, 2, 3, . . . , K and j = 1, 2, 3, . . . , n
where τi is the effect of the ith treatment (ith strategy in our example) as a devia-
tion from the overall (common to all treatments) population mean µ and εi j is the
error associated with this observation. In the context of simulation, the εi j term
represents the random variation of the response xi j that occurred during the jth
replication of the ith treatment. Assumptions for the fixed-effects model are that
the sum of all τi equals zero and that the error terms εi j are independent and nor-
mally distributed with a mean of zero and common variance. There are methods
for testing the reasonableness of the normality and common variance assump-
tions. However, the procedure presented in this section is reported to be somewhat
insensitive to small violations of these assumptions (Miller et al. 1990). Specifi-
cally, for the buffer allocation example, we are testing the equality of three
treatment effects (Strategies 1, 2, and 3) to determine if there are statistically sig-
nificant differences among them. Therefore, our hypotheses are written as
H0: τ1 = τ2 = τ3 = 0
H1: τi ≠ 0 for at least one i, for i = 1, 2, 3
Basically, the previous null hypothesis that the K population means are
all equal (µ1 = µ2 = µ3 = · · · = µ K = µ) is replaced by the null hypothesis
τ1 = τ2 = τ3 = · · · = τ K = 0 for the fixed-effects model. Likewise, the alterna-
tive hypothesis that at least two of the population means are unequal is replaced
by τi ≠ 0 for at least one i. Because only one factor is considered in this problem,
a simple one-way analysis of variance is used to determine FCALC, the test statis-
tic that will be used for the hypothesis test. If the computed FCALC value exceeds
a threshold value called the critical value, denoted FCRITICAL, we shall reject the
null hypothesis that states that the treatment effects do not differ and conclude that
there are statistically significant differences among the treatments (strategies in
our example problem).
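To see the fixed-effects model as a data-generating process, the following Python sketch simulates xij = µ + τi + εij; the µ, τi, and σ values are illustrative assumptions of ours, not estimates taken from the chapter's data:

import random

random.seed(42)
mu = 56.0                    # overall mean (illustrative)
tau = [0.5, -1.0, 0.5]       # treatment effects; note they sum to zero
sigma = 1.2                  # common error standard deviation (illustrative)
n = 10                       # replications per treatment

# x_ij = mu + tau_i + eps_ij, with eps_ij ~ Normal(0, sigma^2), independent
data = [[mu + t + random.gauss(0.0, sigma) for _ in range(n)] for t in tau]
for i, level in enumerate(data, start=1):
    print(f"Strategy {i}: sample mean = {sum(level) / n:.2f}")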
To help us with the hypothesis test, let’s summarize the experimental results
shown in Table 10.4 for the example problem. The first summary statistic that we
will compute is called the sum of squares (SSi) and is calculated for the ANOVA
for each factor level (Strategies 1, 2, and 3 in this case). In a balanced design
where the number of observations n for each factor level is a constant, the sum of
squares is calculated using the formula
SSi = Σ(j=1 to n) xij² − [Σ(j=1 to n) xij]² / n    for i = 1, 2, 3, . . . , K
The grand total of the N observations (N = n K ) collected from the output re-
sponse of the simulated system is computed by
Grand total = x.. = Σ(i=1 to K) Σ(j=1 to n) xij = Σ(i=1 to K) xi.

where xi. denotes the total of the n observations for factor level i.
The overall mean of the N observations collected from the output response of
the simulated system is computed by
Overall mean = x̄.. = [Σ(i=1 to K) Σ(j=1 to n) xij] / N = x.. / N
Using the data in Table 10.4 for the buffer allocation example, these statistics are
Grand total = x.. = Σ(i=1 to 3) xi. = 563.02 + 546.33 + 573.92 = 1,683.27

Overall mean = x̄.. = x.. / N = 1,683.27 / 30 = 56.11
Our analysis is simplified because a balanced design was used (equal obser-
vations for each factor level). We are now ready to define the computational
formulas for the ANOVA table elements (for a balanced design) needed to
conduct the hypothesis test. As we do, we will construct the ANOVA table for the
buffer allocation example. The computational formulas for the ANOVA table
elements are
Degrees of freedom total (corrected) = df(total corrected) = N − 1
= 30 − 1 = 29
Degrees of freedom treatment = df(treatment) = K − 1 = 3 − 1 = 2
Degrees of freedom error = df(error) = N − K = 30 − 3 = 27
and
Sum of squares error = SSE = Σ(i=1 to K) SSi = 16.98 + 12.23 + 3.90 = 33.11

Sum of squares treatment = SST = (1/n) Σ(i=1 to K) xi.² − x..² / N

SST = (1/10)[(563.02)² + (546.33)² + (573.92)²] − (1,683.27)² / 30 = 38.62

Mean square error = MSE = SSE / df(error) = 33.11 / 27 = 1.23

Mean square treatment = MST = SST / df(treatment) = 38.62 / 2 = 19.31

and finally

Calculated F statistic = FCALC = MST / MSE = 19.31 / 1.23 = 15.70
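These ANOVA quantities can be replayed mechanically from the level totals and SSi values just given; a short sketch:

K, n = 3, 10
N = K * n
level_totals = [563.02, 546.33, 573.92]   # x_i. for Strategies 1, 2, 3
ss_levels = [16.98, 12.23, 3.90]          # SS_i for each factor level

grand_total = sum(level_totals)                                    # x.. = 1683.27
sse = sum(ss_levels)                                               # 33.11
sst = sum(t * t for t in level_totals) / n - grand_total ** 2 / N  # 38.62
mse = sse / (N - K)                                                # 1.23
mst = sst / (K - 1)                                                # 19.31
# Prints about 15.75; the chapter's 15.70 comes from rounding MST and MSE
# to two decimals before dividing.
print(f"F_CALC = {mst / mse:.2f}")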
Table 10.5 presents the ANOVA table for this problem. We will compare the value
of FCALC with a value from the F table in Appendix C to determine whether to
reject or fail to reject the null hypothesis H0 : τ1 = τ2 = τ3 = 0. The values
obtained from the F table in Appendix C are referred to as critical values and
are determined by F(df(treatment), df(error); α). For this problem, F(2,27; 0.05) = 3.35 =
FCRITICAL , using a significance level (α) of 0.05. Therefore, we will reject H0 since
FCALC > FCRITICAL at the α = 0.05 level of significance. If we believe the data in
Table 10.4 satisfy the assumptions of the fixed-effects model, then we would con-
clude that the buffer allocation strategy (treatment) significantly affects the mean
throughput of the system. We now have evidence that at least one strategy produces
better results than the other strategies. Next, a multiple comparison test will be
conducted to determine which strategy (or strategies) causes the significance.
One possible explanation is that the LSD test is considered to be more liberal in that it will in-
dicate a difference before the more conservative Bonferroni approach. Perhaps if
the paired-t confidence intervals had been used in conjunction with common
random numbers (which is perfectly acceptable because the paired-t method
does not require that observations be independent between populations), then the
Bonferroni approach would have also indicated a difference. We are not sug-
gesting here that the Bonferroni approach is in error (or that the LSD test is in
error). It could be that there really is no difference between the performances of
Strategy 1 and Strategy 3 or that we have not collected enough observations to
be conclusive.
There are several multiple comparison tests from which to choose. Other
tests include Tukey’s honestly significant difference (HSD) test, Bayes LSD
(BLSD) test, and a test by Scheffe. The LSD and BLSD tests are considered to be
liberal in that they will indicate a difference between µi and µi′ before the more
conservative Scheffé test. A book by Petersen (1985) provides more information
on multiple comparison tests.
FIGURE 10.3 Relationship between factors (decision variables) and output responses: the factors (X1, X2, . . . , Xn) are inputs to the simulation model, which produces the output responses.
The natural inclination when experimenting with multiple factors is to test
the impact that each individual factor has on system response. This is a simple and
straightforward approach, but it gives the experimenter no knowledge of how fac-
tors interact with each other. It should be obvious that experimenting with two or
more factors together can affect system response differently than experimenting
with only one factor at a time and keeping all other factors the same.
One type of experiment that looks at the combined effect of multiple factors on
system response is referred to as a two-level, full-factorial design. In this type of ex-
periment, we simply define a high and a low setting for each factor and, since it is a
full-factorial experiment, we try every combination of factor settings. This means
that if there are five factors and we are testing two different levels for each factor,
we would test each of the 2⁵ = 32 possible combinations of high and low factor
levels. For factors that have no range of values from which a high and a low can be
chosen, the high and low levels are arbitrarily selected. For example, if one of the
factors being investigated is an operating policy (like first come, first served or last
come, last served), we arbitrarily select one of the alternative policies as the high-
level setting and a different one as the low-level setting.
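Enumerating such a design is straightforward; a sketch in Python, where the factor names and settings are hypothetical, chosen only for illustration:

from itertools import product

# Five two-level factors: numeric factors get (low, high) settings; the purely
# categorical factor gets two arbitrarily assigned "low" and "high" policies.
factors = {
    "buffer_size":      (2, 8),
    "num_operators":    (1, 3),
    "conveyor_speed":   (0.5, 1.5),
    "lot_size":         (10, 50),
    "queue_discipline": ("FCFS", "LCFS"),
}

design_points = list(product(*factors.values()))
print(len(design_points))          # 2**5 = 32 combinations to simulate
for point in design_points[:3]:    # first few design points
    print(dict(zip(factors, point)))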
For experiments in which a large number of factors are considered, a two-
level, full-factorial design would result in an extremely large number of combina-
tions to test. In this type of situation, a fractional-factorial design is used to strate-
gically select a subset of combinations to test in order to “screen out” factors with
little or no impact on system performance. With the remaining reduced number of
factors, more detailed experimentation such as a full-factorial experiment can be
conducted in a more manageable fashion.
After fractional-factorial experiments and even two-level, full-factorial ex-
periments have been performed to identify the most significant factor level com-
binations, it is often desirable to conduct more detailed experiments, perhaps over
the entire range of values, for those factors that have been identified as being the
most significant. This provides more precise information for making decisions re-
garding the best, or optimal, factor values or variable settings for the system. For
a more detailed treatment of factorial design in simulation experimentation, see
Law and Kelton (2000).
In many cases, the number of factors of interest prohibits the use of even
fractional-factorial designs because of the many combinations to test. If this is
the case and you are seeking the best, or optimal, factor values for a system, an
alternative is to employ an optimization technique to search for the best combina-
tion of values. Several optimization techniques are useful for searching for the
combination that produces the most desirable response from the simulation model
without evaluating all possible combinations. This is the subject of simulation op-
timization and is discussed in Chapter 11.
The goal is to use the exact random number from the stream for the exact
purpose in each simulated system. To help achieve this goal, the random number
stream can be seeded at the beginning of each independent replication to keep it
synchronized across simulations of each system. For example, in Figure 10.4, the
first replication starts with a seed value of 9, the second replication starts with a
seed value of 5, the third with 3, and so on. If the same seed values for each repli-
cation are used to simulate each alternative system, then the same stream of ran-
dom numbers will drive each of the systems. This seems simple enough. How-
ever, care has to be taken not to pick a seed value that places us in a location on
the stream that has already been used to drive the simulation in a previous repli-
cation. If this were to happen, the results from replicating the simulation of a sys-
tem would not be independent because segments of the random number stream
would have been shared between replications, and this cannot be tolerated.
Therefore, some simulation software provides a CRN option that, when selected, automatically manages the assignment of seed values across replications. In ProModel, if you do not specify an initial seed value for a stream that is used, ProModel will use the
same seed number as the stream number (stream 3 uses the third seed). A detailed
explanation of how random number generators work and how they produce
unique streams of random numbers is provided in Chapter 3.
Complete synchronization of the random numbers across different models is
sometimes difficult to achieve. Therefore, we often settle for partial synchro-
nization. At the very least, it is a good idea to set up two streams with one stream
of random numbers used to generate an entity’s arrival pattern and the other
stream of random numbers used to generate all other activities in the model.
That way, activities added to the model will not inadvertently alter the arrival
pattern because they do not affect the sample values generated from the arrival
distribution.
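The sketch below illustrates both ideas in miniature: re-seeding dedicated streams the same way for each replication, and keeping the arrival stream separate from the stream that drives everything else. The toy model and seed offsets are placeholders of ours, not ProModel behavior:

import random

SEEDS = [9, 5, 3]   # one seed per replication, as in the Figure 10.4 example

def simulate(policy, arrival_rng, other_rng):
    # Toy stand-in for a simulation run (illustrative only, not ProModel).
    arrivals = [arrival_rng.expovariate(1.0) for _ in range(100)]
    factor = 0.9 if policy == "B" else 1.0
    service = [other_rng.uniform(0.5, 1.5) * factor for _ in range(100)]
    return sum(min(a, s) for a, s in zip(arrivals, service))   # dummy response

for rep, seed in enumerate(SEEDS, start=1):
    # Both policies are driven by identically seeded streams within a
    # replication (common random numbers); the arrival stream is kept apart
    # from the "everything else" stream so added activities cannot disturb
    # the arrival pattern.
    a = simulate("A", random.Random(seed), random.Random(seed + 1000))
    b = simulate("B", random.Random(seed), random.Random(seed + 1000))
    print(f"replication {rep}: A={a:.1f}  B={b:.1f}  diff={a - b:.1f}")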
Applying the paired-t method of Section 10.3.2 to these results, the half-width is

hw = (t9,0.025) s(1−2) / √n = (2.262)(1.16) / √10 = 0.83 parts per hour
where tn−1,α/2 = t9,0.025 = 2.262 is determined from the Student's t table in Appendix B.
10.6 Summary
An important point to make here is that simulation, by itself, does not solve a
problem. Simulation merely provides a means to evaluate proposed solutions by
estimating how they behave. The user of the simulation model has the responsi-
bility to generate candidate solutions either manually or by use of automatic
optimization techniques and to correctly measure the utility of the solutions based
on the output from the simulation. This chapter presented several statistical
methods for comparing the output produced by simulation models representing
candidate solutions or designs.
When comparing two candidate system designs, we recommend using either
the Welch confidence interval method or the paired-t confidence interval.
References
Banks, Jerry; John S. Carson; Barry L. Nelson; and David M. Nicol. Discrete-Event Sys-
tem Simulation. Englewood Cliffs, NJ: Prentice Hall, 2001.
Bateman, Robert E.; Royce O. Bowden; Thomas J. Gogg; Charles R. Harrell; and Jack
R. A. Mott. System Improvement Using Simulation. Orem, UT: PROMODEL Corp.,
1997.
Goldsman, David, and Barry L. Nelson. “Comparing Systems via Simulation.” Chapter 8
in Handbook of Simulation. New York: John Wiley & Sons, 1998.
Hines, William W., and Douglas C. Montgomery. Probability and Statistics in Engineering
and Management Science. New York: John Wiley & Sons, 1990.
Hoover, Stewart V., and Ronald F. Perry. Simulation: A Problem-Solving Approach.
Reading, MA: Addison-Wesley, 1989.
Law, Averill M., and W. David Kelton. Simulation Modeling and Analysis. New York:
McGraw-Hill, 2000.
Miller, Irwin R.; John E. Freund; and Richard Johnson. Probability and Statistics for
Engineers. Englewood Cliffs, NJ: Prentice Hall, 1990.
Miller, Rupert G. Beyond ANOVA: Basics of Applied Statistics. New York: John Wiley & Sons, 1986.
Montgomery, Douglas C. Design and Analysis of Experiments. New York: John Wiley &
Sons, 1991.
Petersen, Roger G. Design and Analysis of Experiments. New York: Marcel Dekker, 1985.