Week 8 Notes

ICS 810 MODELLING AND SIMULATION NOTES
Mainly based on Stewart Robinson (2004). Simulation: The Practice of Model Development and Use
Week 8
Data related issues and model coding
✔Data are central to the development and use of simulation models.
✔Accurate data are needed to design and populate the model; only then can the results from the model be accurate.
SCI-UON, June/July 2011, E. Opiyo
✔We consider issues in the collection and analysis of data.
✔We also consider how to handle
incorrect and missing data.
✔Data modeling is called input modeling.

Data Requirements
Data can be quantitative (numeric), e.g. data on cycle (service) times, breakdown frequencies or arrival patterns.
Data can be qualitative, i.e. non-numeric facts and beliefs about a system, expressed in pictures or words.
E.g. the queuing behaviour of customers as described in words.
Data Requirements
Information: data with some interpretation, i.e. data that have been analyzed for some purpose.
The modeler should check the adequacy of the data and information available for the purpose of the simulation study.
Data that are generated should be analyzed to provide useful information for the simulation.
The types of data that are required [Pidd 2003]:
Preliminary or contextual data
Such data are needed to develop a thorough understanding of the problem situation.
Examples
A layout diagram;
Basic data on process capability and beliefs about the cause of the problems being experienced.
Does not require extensive data collection.
Forms part of conceptual modeling.
Data for the computer model
Examples
➢Data on cycle times;
➢Data on breakdowns;
➢Customer arrival patterns;
➢Descriptions of customer types;
➢Scheduling and processing rules.
Obtained by a detailed data collection exercise.
Can be identified from the conceptual model, based on the components of the model and the details associated with them.
Data for model validation
Needed to ensure that each part of the model, as well as the model as a whole, represents the real world system with sufficient accuracy.
Compare the model results with data from the real system, if the real world system exists.
Obtaining Data
➢Sometimes some data are immediately available while others must be collected.
➢There are different kinds of data that may be available.
➢These can be grouped as:
● Category A
● Category B
● Category C
Obtaining Data
Category A data
➔Are known or have been collected earlier;
➔Collected for some other reasons;
➔Automatically collected electronically.
Examples
●Physical layout of a manufacturing plant;
●The cycle times of the machines;
●Service times and arrival rates in a bank from a survey of staffing levels;
●Transaction data at service points.
Obtaining Data
Category B data
These are data that are not yet available but can be collected.
Examples
➔Service times;
➔Arrival patterns;
➔Machine failure rates and repair times;
➔Nature of human decision-making.

Obtaining Data
Category B data - Data collection
➢Use direct observation where necessary;
➢Use questionnaires or interviews with subject matter experts such as staff, equipment suppliers or customers.
Ensure that the data obtained are both accurate and in the right format.

Obtaining Data
Category C data
These are data items that are not available
and cannot be collected.
●Occur where the real world system does not yet exist,
making it impossible to observe it in operation;
●Occur where there is no time to collect meaningful data.
Unfortunately, category C data occur in many situations.
Some means must be put in place for dealing with these data, as well as with data that are available but inaccurate.

Dealing with unobtainable (category C) data
Category C data can be handled by:
● Estimating the data;
● Treating the data as an experimental factor rather than a fixed parameter.
Estimating data
Collect data from a similar system in the same, or even another, organization;
Use data from processes for which standardized data exist;
Discuss with subject matter experts, such as staff and equipment suppliers.
Estimating data
Estimating introduces some uncertainty about the validity of the model, so its credibility is reduced.
Sensitivity analysis is useful for measuring the effect of the estimated data.
Treating data as an experimental factor
Question: what levels must the data values reach?
E.g. if data on machine failures are missing: what level of machine failures needs to be achieved to meet the target throughput?
The machine failures are therefore treated as an experimental factor.
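A minimal sketch of this idea, assuming a closed-form stand-in for the simulation run (availability = MTBF / (MTBF + MTTR)) rather than a real model. The function names, the repair time, the machine rate and the target throughput are all hypothetical.

```python
def throughput_per_hour(mtbf_h, mttr_h, rate_per_hour):
    """Stand-in for a simulation run: long-run hourly output of one machine."""
    availability = mtbf_h / (mtbf_h + mttr_h)  # fraction of time the machine is up
    return rate_per_hour * availability


def levels_meeting_target(mtbf_levels, mttr_h, rate_per_hour, target):
    """Return the failure levels (mean hours between failures) that meet the target."""
    return [m for m in mtbf_levels
            if throughput_per_hour(m, mttr_h, rate_per_hour) >= target]


# Hypothetical experiment: the unknown failure level is varied as a factor.
levels = [5, 10, 20, 40, 80]
ok = levels_meeting_target(levels, mttr_h=2.0, rate_per_hour=60.0, target=55.0)
print("Failure levels that meet the target:", ok)
```

In a real study each call to the stand-in function would be replaced by one or more runs of the simulation model at that factor level.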
Treating data as an experimental factor
Limitation
Can only be applied when there is some control over the data in question.
●These approaches for handling category C data suffice in most circumstances.
When these approaches are not satisfactory:
✗Revise the conceptual model so that the need for the data is engineered out of the model;
✗Change the modeling objectives so that such data are no longer needed;
✗Abandon the simulation study altogether!

Data accuracy
Ensure that the data being used are accurate.
Category A data
➢Check that the source is reliable;
➢Ensure that the collection exercise was done properly;
➢Plot the data on a graph to check for unusual patterns.
If the data are considered too inaccurate for the simulation model, an alternative source should be sought.
Data accuracy
Ensure that the data used are accurate.
Category B data
➔Use an adequate sample size;
➔Ensure that the data collection staff have no vested interest in the data;
➔Put mechanisms in place to monitor and avoid inaccuracies creeping into the data collection;
➔For critical data, arrange for two sets of observations and cross-check them.

Data format
Data should be accurate, and they should also be in the right format for the simulation.
Understand how the computer model, particularly the underlying software, interprets the data that are input.
Example
Interpreting the time between component failures: this can be read as the time from the start of one breakdown to the start of the next, or as the time from the end of one breakdown to the start of the next.
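The two interpretations differ by the repair time, so data recorded one way can be converted to the other. A small sketch, assuming each gap is paired with the repair time of the breakdown that precedes it; the function name and the sample values are hypothetical.

```python
def start_to_start(end_to_start_gaps, repair_times):
    """Convert 'end of one breakdown to start of the next' gaps into
    'start of one breakdown to start of the next' gaps by adding the repair time."""
    if len(end_to_start_gaps) != len(repair_times):
        raise ValueError("need one repair time per gap")
    return [repair + gap for repair, gap in zip(repair_times, end_to_start_gaps)]


# Hypothetical records, in hours.
repairs = [2.0, 1.5, 3.0]
gaps = [10.0, 12.0, 8.0]
converted = start_to_start(gaps, repairs)
print(converted)
```

Which form is needed depends on how the chosen simulation software defines the time between failures.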
Data format
✔Know the format of the data being supplied or collected and ensure that it is appropriate for the simulation model.
✔Treat any data that are not in the proper format as inaccurate, and take action to improve the data or find an alternative source.
Representing Unpredictable Variability
Modeling variability, especially unpredictable (or random) variability, is important in simulation modeling.
Several aspects of an operations system are subject to variability, such as:
✔Customer arrivals;
✔Service times;
✔Processing times;
✔Routing decisions.
Variability can be modeled using:
✗Traces;
✗Empirical distributions;
✗Statistical distributions.

Traces
A trace is a stream of data that describes a
sequence of events.
A trace holds data about:
●The time at which the events occur
●Data about the events such as:
● the type of part to be processed (part
arrival event)
● the nature of the fault (machine
breakdown event).
Traces
➔The trace is read by the simulation from a file as it runs, and the events are recreated in the model as described by the trace.
➔Traces can be obtained by collecting data from the real system, for example by automatic monitoring systems.
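A minimal sketch of reading a trace, assuming a hypothetical CSV layout with `time`, `event` and `detail` columns. The trace is held in a string here so that the example is self-contained; a real trace would be read from a file.

```python
import csv
import io

# Hypothetical trace: time of each event plus data about the event.
TRACE_TEXT = """time,event,detail
0.0,part_arrival,type_A
1.5,part_arrival,type_B
4.2,machine_breakdown,jammed_feeder
"""


def read_trace(handle):
    """Return the trace as a list of (time, event, detail) tuples."""
    reader = csv.DictReader(handle)
    return [(float(row["time"]), row["event"], row["detail"]) for row in reader]


events = read_trace(io.StringIO(TRACE_TEXT))
for time, event, detail in events:
    # A simulation would recreate each event at its recorded time.
    print(f"t={time:5.1f}  {event}  ({detail})")
```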
Traces
[Example of a trace; see Robinson 2004, p. 101]
Empirical distributions
✔Show the frequency with which data values, or ranges of data values, occur;
✔Are represented by histograms or frequency charts;
✔Usually arise from historical data;
✔Can be summaries of the data held in a trace.
➢When a simulation runs, values are sampled from empirical distributions by using random numbers.
➢Most simulation software, however, enables the user to enter empirical distribution data directly.
➢The sampling process is then hidden from the user except for, in some cases, the need to specify a pseudo random number stream.
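The hidden sampling step can be sketched as follows: a random number between 0 and 1 is looked up against the cumulative frequencies. The values and frequencies are hypothetical, and `make_sampler` is an illustrative name, not a function from any simulation package.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Hypothetical empirical distribution: calls per minute and observed frequencies.
values = [0, 1, 2, 3]
freqs = [10, 40, 30, 20]


def make_sampler(values, freqs):
    """Build a function that samples a value in proportion to its frequency."""
    total = sum(freqs)
    cumulative = [c / total for c in accumulate(freqs)]  # 0.1, 0.5, 0.8, 1.0

    def sample(rng):
        u = rng.random()                      # random number in [0, 1)
        idx = bisect_right(cumulative, u)     # first cell whose cumulative share exceeds u
        return values[min(idx, len(values) - 1)]

    return sample


sample = make_sampler(values, freqs)
rng = random.Random(1)                        # seeded pseudo random number stream
draws = [sample(rng) for _ in range(10_000)]
print({v: draws.count(v) / len(draws) for v in values})
```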
[Figure: example of an empirical distribution, call arrivals at a call centre]
Statistical distributions
Usually defined by some mathematical function or probability density function (PDF).
There are many, but the best known is the normal distribution (ND).
The ND is specified by two parameters:
● mean (its location)
● standard deviation (its spread).
Uses: modeling errors in weight or dimension that occur in manufactured components.

The Normal Distribution
[Figure: example of a normal distribution]
The ND is, however, limited:
✗It can generate negative values, especially if the standard deviation is relatively large in comparison to the mean.
✗It can therefore generate invalid sample values where the context cannot handle negative data items, such as inter-arrival times.
Some other distribution can be used instead, e.g. Erlang or negative exponential.
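To illustrate the limitation, the sketch below computes the share of negative values for a normal distribution with a hypothetical mean of 5 and standard deviation of 4, and compares sampled values with a negative exponential distribution of the same mean, which cannot go below zero.

```python
import random
from statistics import NormalDist

mean, sd = 5.0, 4.0  # hypothetical parameters; sd is large relative to the mean

# Theoretical probability that a normal sample falls below zero.
p_negative = NormalDist(mu=mean, sigma=sd).cdf(0.0)

rng = random.Random(1)
normal_samples = [rng.gauss(mean, sd) for _ in range(10_000)]
expo_samples = [rng.expovariate(1.0 / mean) for _ in range(10_000)]

share_negative = sum(x < 0 for x in normal_samples) / len(normal_samples)
print(f"normal: P(X < 0) = {p_negative:.3f}, sampled share = {share_negative:.3f}")
print(f"negative exponential: smallest sample = {min(expo_samples):.4f}")
```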

The general categories of standard statistical distributions include:
Continuous distributions
Discrete distributions
Approximate distributions

The general categories of standard statistical
distributions
Continuous distributions
For sampling data that can take any value
across a range or an interval.
Examples include the normal, Erlang and negative exponential distributions.
See earlier sections.
Discrete distributions
For sampling data that can take only specific values across a range, e.g. integers or non-numeric values.
Examples
The binomial distribution, which may be used to describe the number of successes, or failures, in a specified number of trials;
The Poisson distribution.
Approximate distributions
➔Used in the absence of data;
➔They do not have strong theoretical underpinnings.
Example
Uniform distribution
Bootstrapping
➔It involves re-sampling data at random, with replacement, from an original trace.
➔The data in the trace are in effect re-shuffled; some values may repeat and others may be left out.
➔Bootstrapping is particularly useful when there is only a small sample of data available.
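A minimal sketch of bootstrapping with Python's standard `random.choices`, which samples with replacement. The trace values are hypothetical.

```python
import random


def bootstrap(trace, rng):
    """Draw a new sample of the same size, at random with replacement."""
    return rng.choices(trace, k=len(trace))


trace = [3.2, 5.1, 4.8, 7.5, 2.9]   # hypothetical small sample, e.g. service times
rng = random.Random(7)
resample = bootstrap(trace, rng)
print(resample)
```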

Selecting Statistical Distributions
A decision must be made on which statistical distributions are most appropriate for the model being developed.
This selection can be based on:
➔The known properties of the process being modeled;
➔Fitting a distribution to empirical data.

Selecting distributions from known
properties of the process
The appropriate distribution can be selected by considering the properties of the process being modeled.
Example
Modeling customer arrivals at a bank
✔Assuming that the arrivals occur at random, a negative exponential distribution can be used for the inter-arrival times.
✔Erlang, gamma and lognormal distributions are known to represent the properties of a service process.
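As a sketch of the arrival example, the code below generates random arrival times by summing negative exponential inter-arrival times. The mean gap of 2 minutes and the function name are hypothetical.

```python
import random


def arrival_times(mean_gap_min, n, rng):
    """Generate n arrival times with negative exponential gaps between them."""
    t = 0.0
    times = []
    for _ in range(n):
        t += rng.expovariate(1.0 / mean_gap_min)  # rate = 1 / mean gap
        times.append(t)
    return times


rng = random.Random(3)
times = arrival_times(mean_gap_min=2.0, n=10_000, rng=rng)
mean_gap = times[-1] / len(times)
print(f"first arrivals: {[round(t, 2) for t in times[:5]]}")
print(f"average gap: {mean_gap:.2f} minutes")
```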
Example
➢Modeling customer arrivals at a bank.
➢The gamma distribution provides a
greater range of shapes than the Erlang.
➢The lognormal distribution can have a greater peak (probability) around the modal average than the Erlang or the gamma distributions.
The appropriate distribution can be selected
by considering the properties of the process
being modeled.
Example
Modeling time between failures
It is probably reasonable to assume a Weibull distribution.
These distributions were outlined earlier (see week 2 notes).
Fitting statistical distributions to empirical data
Dependent on the availability of empirical data.
This process is performed in three stages:
1. Select a statistical distribution;
2. Determine the parameters;
3. Test the goodness-of-fit.
Advisable: try a series of distributions with different parameter values;
The process may take several iterations.
Example
➔Data collected on the repair time of a machine;
➔In total, 100 observations have been made and recorded in a histogram.
[Figure: histogram of 100 observations on repair time]

Selecting a statistical distribution
A useful statistical distribution can be selected by:
➔Inspecting the data using a histogram as a guide to the shape of the distribution of the empirical data;
➔Using the known properties of the process.
Selecting a statistical distribution
Example
➔The histogram of repair times suggests that the shape can be Erlang or gamma, or possibly some similar distribution such as lognormal or Weibull.
➔All of these can be used to represent the time to complete a task, which is appropriate for this situation.
➔We try fitting an Erlang distribution to the data.

Determine the parameters
This step follows the selection of a distribution.
The Erlang distribution requires an estimate of the mean and the k parameter.
The mean can be calculated from the histogram data.

Determine the parameters
In this case the estimated mean = 762/100 = 7.62 minutes, based on the data summarized in the histogram.
There is no way of estimating the k parameter; trial and error is the only option.
In this case the distribution is skewed to the right, with the hump near the left, which suggests that low values of k should be tried (one, three and five).
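A sketch of generating Erlang samples for the candidate k values, assuming the standard result that an Erlang distribution with mean m and shape k is a gamma distribution with shape k and scale m/k. `erlang_sample` is an illustrative name; the sample size is arbitrary.

```python
import random
from statistics import fmean


def erlang_sample(mean, k, rng):
    """Sample from an Erlang distribution via the gamma distribution (shape k, scale mean/k)."""
    return rng.gammavariate(k, mean / k)


rng = random.Random(11)
sample_means = {}
for k in (1, 3, 5):
    samples = [erlang_sample(7.62, k, rng) for _ in range(20_000)]
    sample_means[k] = fmean(samples)
    # Each candidate keeps the same mean; only the shape changes with k.
    print(f"k={k}: sample mean = {sample_means[k]:.2f}")
```

Samples like these can be binned into the same cell ranges as the empirical histogram to compare shapes.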
Test the goodness-of-fit
The goodness-of-fit is a measure of the extent to which the observed data are similar to data obtained from the selected distribution function.
The goodness-of-fit can be tested:
● Graphically
● Using statistical tests.

The graphical approach
Compare the histogram of the empirical data with a histogram of the proposed distribution.
Data from the proposed distribution can be created by taking many samples (say 10,000 or more) and placing them in the same cell ranges used for the histogram of the empirical data.

The graphical approach
Generating samples: use the distribution functions in the simulation software.
Limitation: this is a sampling procedure and hence not completely accurate, so take a large number of samples.
A more accurate approach: calculate the frequencies from the cumulative distribution functions of the proposed distributions.
This provides an exact value for the percentage of observations that should fall in each cell range.
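A sketch of the more accurate approach for an Erlang distribution, using its standard cumulative distribution function. The cell boundaries (0-3, 3-6, ..., 18 and over) are assumed for illustration, since the histogram's actual ranges are not given in these notes.

```python
import math


def erlang_cdf(x, mean, k):
    """Cumulative distribution function of an Erlang distribution with integer shape k."""
    if x <= 0:
        return 0.0
    lam_x = k * x / mean  # rate * x, where rate = k / mean
    tail = sum(math.exp(-lam_x) * lam_x ** n / math.factorial(n) for n in range(k))
    return 1.0 - tail


def expected_percentages(edges, mean, k):
    """Exact percentage of observations expected in each cell; the last cell is open-ended."""
    cdf = [erlang_cdf(e, mean, k) for e in edges] + [1.0]
    return [100.0 * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


edges = [0, 3, 6, 9, 12, 15, 18]  # assumed lower bounds of seven cell ranges
expected = expected_percentages(edges, mean=7.62, k=3)
for lower, pct in zip(edges, expected):
    print(f"from {lower:>2} min: {pct:5.2f}%")
```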
Test the goodness-of-fit- graphical method
Limitations
➢Inspecting histograms is difficult if only a small amount of data is available, say fewer than 30 samples;
➢With so few data, the shape of the histogram is unlikely to be smooth.
➢Graphical approaches based on cumulative probabilities can, however, overcome these limitations.
The chi-square test
Is probably the best known goodness-of-fit test.
The chi-square value is calculated as follows:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where:
$\chi^2$ = chi-square value;
$O_i$ = observed frequency in the ith range (empirical distribution);
$E_i$ = expected frequency in the ith range (proposed distribution);
$k$ = total number of ranges.
The chi-square test
Two other factors are needed:
➔The level of significance: typically, 5% is used.
➔The degrees of freedom, calculated as follows:
Degrees of freedom = Number of cell ranges − Number of estimated parameters − 1

The chi-square test
The repair time example

●The number of ranges is seven (see histogram);
●Two parameters have been estimated:
➢ The distribution mean
➢ The value of k.
There are therefore 7 − 2 − 1 = 4 degrees of freedom.



The chi-square test
The chi-square value is compared with a critical value:
● The critical value is read from chi-square tables or generated in a spreadsheet using the appropriate function (‘CHIINV’ in Excel);
● The critical value is obtained for the chosen level of significance and the number of degrees of freedom.
If the chi-square value exceeds the critical value, the proposed distribution is rejected as a fit at that level of significance.
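A sketch of the calculation for seven cell ranges and two estimated parameters. The observed and expected frequencies are hypothetical, not the repair time data from these notes; the critical value 9.488 is the standard table value for 4 degrees of freedom at the 5% level.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cell ranges."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies for seven cell ranges (each list sums to 100).
observed = [10, 37, 23, 14, 7, 5, 4]
expected = [12.0, 32.0, 26.0, 15.0, 8.0, 4.0, 3.0]

estimated_parameters = 2                      # the mean and k
df = len(observed) - estimated_parameters - 1
critical_value = 9.488                        # chi-square table: 5% level, 4 degrees of freedom

chi_sq = chi_square(observed, expected)
reject = chi_sq > critical_value
print(f"chi-square = {chi_sq:.3f}, df = {df}, reject fit: {reject}")
```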
Test the goodness-of-fit- Chi-square test
➔The tests performed on the repair time data reach the same conclusion: the Erlang (7.62, 3) provides the best fit.
➔This does not, however, mean that we should simply accept this distribution as being correct.
➔We should try other parameter values, particularly k values of two and four.
➔It is also worth trying other types of distribution, for instance a lognormal distribution.
Issues in distribution fitting
➢An important issue to consider is the difference between the best fit and the best distribution.
➢The best fit is not necessarily the best distribution. This is because the empirical data are just a sample.
➢The shape and parameters that are estimated from the data, such as the mean, are just estimates.
Issues in distribution fitting
➢It is never possible to be certain that the correct statistical distribution has been found.
➢Selecting a statistical distribution means making an assumption about the nature of the population distribution based on what is usually a small sample.
➢Such assumptions should be added to the description of the conceptual model.

Distribution fitting software
●Effective distribution fitting can be very time consuming.
●Some software packages automate this process, e.g. ExpertFit (www: Averill M. Law & Associates) and Stat::Fit (www: Geer Mountain Software).
●The user only enters the empirical data and the software automatically generates graphical and statistical reports, recommending the best fitting distributions.
Model coding
✔Changing from the conceptual model to the computer model.
✔Use either:
✔A programming language;
✔A specialized simulation software package (the usual choice).

Model coding
Structuring the Model (algorithm design)
Formulate the overall coding plan, keeping in mind the simulation software that is going to be used to develop the code.
The following points may be important:
Speed of coding: the speed with which the code can be written.
Transparency: the ease with which the code can be understood.
Flexibility: the ease with which the code can be changed.
Run-speed: the speed with which the code will execute.
Model coding
➔The model structure is typically a paper-based description of the model, outlining the constructs and logic of the simulation in terms of the software being used.
➔It normally entails some form of schematic outlining the components, variables, attributes and logic of the model.
➔It is not necessary to write out each line of code, but it is useful to describe the code in a natural language, for instance, ‘‘if call type is A, route to enquiries’’.
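For comparison, the natural-language rule quoted above might later be coded along these lines. Only the rule for call type A comes from the notes; the default route and the names used are hypothetical.

```python
# Routing table: only the entry for type "A" comes from the rule in the notes.
ROUTES = {"A": "enquiries"}
DEFAULT_ROUTE = "general_queue"  # hypothetical destination for all other call types


def route_call(call_type):
    """Return the destination for a call of the given type."""
    return ROUTES.get(call_type, DEFAULT_ROUTE)


print(route_call("A"))
print(route_call("B"))
```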
Model coding
✔The way in which the structure is
expressed depends very much on the
nature of the software being used.
✔One can use pseudo-code, or visual
diagrams.
✔Some standard notations such as UML
(the unified modeling language) can be
used.
Model coding
Coding the Model
The important activities in the production
of the computer model are:-
Coding: developing the code in the
simulation software.
Testing: verifying and validating the
model.
Documenting: recording the details of
the model.
Model coding
Coding the Model
➔Incremental code development is
advisable as it can facilitate testing.
➔Testing can be in-built in the code
production process.
➔Good code production behavior
should be observed.
Model coding
Separate the data from the code and from the results
➢It is good practice;
➢Keep important constants or data values in a place where they are referenced by formulae;
➢Inputs: experimental factors and general model data go into the simulation model, which produces the outputs as the results.
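A minimal sketch of this separation: the experimental factors are held as data (JSON here), the model code reads them, and the results are written out separately (CSV here). The factor names and the utilisation formula are a hypothetical stand-in for a real simulation model; strings are used instead of files so that the example is self-contained.

```python
import csv
import io
import json

# Input data kept apart from the code (would normally be a separate file).
FACTORS_JSON = '{"num_servers": 2, "mean_service_min": 4.0, "arrivals_per_hour": 20}'


def run_model(factors):
    """Stand-in for the simulation model: offered load per server."""
    load = factors["arrivals_per_hour"] * factors["mean_service_min"] / 60.0
    return {"utilisation": load / factors["num_servers"]}


factors = json.loads(FACTORS_JSON)
results = run_model(factors)

# Results written to their own output, ready for a spreadsheet.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["num_servers", "utilisation"])
writer.writerow([factors["num_servers"], round(results["utilisation"], 3)])
print(out.getvalue())
```

Changing a scenario then means editing the factors data only, not the model code.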
Model coding
Advantages
Familiarity: users do not need extensive training in the data input and results generation software.
Ease of use: in-depth understanding of the simulation code and simulation software is not required.
Model coding
Advantages
Presentation: ability to use specialist software (e.g. a spreadsheet) for presenting the data and the results.
Further analysis: ability to use specialist software facilities for further analysis of the data and the results.
Version control: enables maintaining a record of all experimental scenarios by holding separate versions of the experimental factors file and the results file.
Model coding
Use of pseudo random number streams
A pseudo random number stream is a stream of random numbers whose sequence can be replicated exactly because the stream always starts from the same seed.
A simulation run that is repeated with the same seed will therefore always give the same result.
See earlier sections on pseudo random numbers.
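A short sketch of this behaviour using Python's seeded generator. `run_once` is a hypothetical stand-in for a simulation run that samples five service times with an assumed mean of 4 minutes.

```python
import random


def run_once(seed, n=5):
    """Stand-in for a simulation run driven by one pseudo random number stream."""
    rng = random.Random(seed)                 # the stream starts from this seed
    return [rng.expovariate(1 / 4.0) for _ in range(n)]


first = run_once(seed=42)
repeat = run_once(seed=42)
other = run_once(seed=43)

print("same seed, same result:", first == repeat)
print("different seed, same result:", first == other)
```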

Model coding
Documenting the Model and the Simulation Project
Documentation may account for between 20 and 30% of the total development costs.
A comparable amount of effort should be put into documenting simulation models.
Useful for subsequent maintenance of the model.
Model coding
The important documents
●The conceptual model;
●A list of model assumptions and
simplifications;
●The model structure;
●The input data and experimental factors, including interpretation and sources of data;
Model coding
The important documents
●The results format: interpretation of results;
●Using meaningful names for
components, variables and attributes;

●Comments and notes in the code;
●The visual display of the model.

Model coding
The components of project documentation
➢The project specification
➢Minutes of meetings
➢Verification and validation performed
➢Experimental scenarios run
➢Results of experiments (associated with each scenario)
➢Final report
➢Project review

Model coding
✗Keep a record of all the experimental scenarios run and of the results associated with them;
✗Provide a user guide by the time everything is handed over to the users.
Model coding
The structure of the user guide
The project specification: providing background to
the project and the conceptual model.
Input data: interpretation and sources of data.
Experimental factors: their meanings, range and how to
change them.
Guide to running the model.
Results: accessing and interpreting results.

End of Week 8 Exercises
1.Discuss the importance of data in modeling and simulation.
2.Discuss how data is obtained and how problems are handled in a
simulation study.
3.A simulation model is being developed of the check-in area at an airport.
Different airlines follow slightly different procedures and within airlines the staff
have quite different levels of experience. Discuss the problems that might be
encountered in collecting the data on check-in times.
4.Which statistical distributions would be most appropriate for modeling the
following processes?
i. Process A: the weight of a bottle leaving a filling process
ii.Process B: the time between failure of a machine
iii.Process C: the check-in time at an airport
5.Explore the distributions in any selected simulation software or
spreadsheet.
6.Discuss how you would go about moving from a conceptual model to a
computer model.
7.Discuss the importance of documentation and different types of
documentation.
8.Select and then develop the simulation model and code in the software of your choice in an area of your choice.
