
ICS 810 MODELLING AND SIMULATION NOTES

Mainly based on Stewart Robinson (2004). Simulation: The Practice of Model Development and Use

Week 8
Data related issues and model coding

✔Data: central to the development and use of simulation models.

✔Accurate data are needed to design and populate the model, so that the results from the model are accurate.
SCI-UON, June/July 2011, E. Opiyo

✔We consider issues in the collection and analysis of data.
✔We also consider how to handle incorrect and missing data.
✔Modeling the input data is called input modeling.


Data Requirements

Data can be quantitative, i.e. numbers, e.g. data on cycle (service) times, breakdown frequencies or arrival patterns.

Data can be qualitative, i.e. non-numeric facts and beliefs about a system, expressed in pictures or words.
E.g. the queuing behaviour of customers, described in words.
Data Requirements
Information: data with some interpretation, i.e. data that have been analyzed for some purpose.

The modeler must check the adequacy of the data and information available for the purpose of the simulation study.

Data that are collected must be analyzed to provide useful information for the simulation.
The types of data that are required [Pidd
2003]:

Preliminary or contextual data

Such data are needed to develop a thorough understanding of the problem situation.
The types of data that are required [Pidd
2003]:
Preliminary or contextual data
Examples
A layout diagram;
Basic data on process capability and beliefs about the cause of the problems being experienced.

These data do not require extensive data collection.

They form part of conceptual modeling.
The types of data that are required [Pidd
2003]:
Data for the computer model
Examples
➢Data on cycle times;
➢Data on breakdowns;
➢Customer arrival patterns;
➢Descriptions of customer types;
➢Scheduling and processing rules.


The types of data that are required [Pidd
2003]:
Data for the computer model

Obtained by a detailed data collection exercise.

Can be identified from the conceptual model, based on the components of the model and the details associated with them.
The types of data that are required [Pidd
2003]:
Data for the model validation
Used to ensure that each part of the model, as well as the whole model, represents the real world system with sufficient accuracy.

Compare the model results with data from the real system, if the real world system exists.
Obtaining Data
➢Sometimes some data are immediately available while others must be collected.

There are different kinds of data that may be available.

These can be grouped as:
 Category A
 Category B
 Category C.
Obtaining Data
Category A data
➔Are known or have been collected earlier;
➔Were collected for some other reason;
➔Are automatically collected electronically.

Examples
●Physical layout of a manufacturing plant;
●The cycle times of the machines;
●Service times and arrival rates in a bank, from a survey of staffing levels;
●Transaction data at service points.
Obtaining Data
Category B data
The data that should be collected.
Examples
➔Service times;
➔Arrival patterns;
➔Machine failure rates and repair times;
➔Nature of human decision-making.


Obtaining Data
Category B data -Data collection
➢Direct observation, if necessary;
➢Questionnaires or interviews with subject matter experts such as staff, equipment suppliers or customers.

Ensure that the data obtained are both accurate and in the right format.


Obtaining Data
Category C data
These are data items that are not available and cannot be collected. They
●occur where the real world system does not yet exist, making it impossible to observe it in operation;
●occur where there is no time to collect meaningful data.

Unfortunately, category C data occur in many situations.

Some means must be put in place for dealing with these data, as well as with data that are available but inaccurate.
Dealing with unobtainable (category C)
data
The category C data can be
handled by:

Estimating the data;

Treating the data as an experimental factor rather than a fixed parameter.
Dealing with unobtainable (category C)
data
Estimating data
Collect data from a similar system in the same, or even another, organization;
Get data from processes for which standardized data exist;
Discuss with subject matter experts, such as staff and equipment suppliers.
Dealing with unobtainable (category C)
data
Estimating data
Estimating data introduces some uncertainty about the validity of the model and reduces its credibility.

Sensitivity analysis is useful for measuring the effect of the estimated data on the model results.
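The idea of sensitivity analysis on an estimated value can be sketched as follows. This is a minimal sketch under stated assumptions: a hypothetical single-server queue with negative exponential inter-arrival and service times, simulated with the Lindley recursion. The estimated mean service time is varied by ±20%, and the same random number stream is reused in every scenario so that output differences come from the estimate alone. All names and values are illustrative.

```python
import random


def mean_wait(mean_interarrival: float, mean_service: float,
              n_customers: int = 5000, seed: int = 1) -> float:
    """Mean queueing time in a hypothetical single-server FIFO queue.

    The same seed gives the same sequence of random draws in every call,
    so scenarios differ only in their parameter values (common random numbers).
    """
    rng = random.Random(seed)
    wait = 0.0
    total = 0.0
    for _ in range(n_customers):
        total += wait
        service = mean_service * rng.expovariate(1.0)
        gap = mean_interarrival * rng.expovariate(1.0)
        # Lindley recursion: W(n+1) = max(0, W(n) + S(n) - A(n+1))
        wait = max(0.0, wait + service - gap)
    return total / n_customers


def sensitivity(estimate: float, factors=(0.8, 1.0, 1.2),
                mean_interarrival: float = 5.0) -> dict:
    """Re-run the model with the estimated mean service time scaled by each factor."""
    return {f: mean_wait(mean_interarrival, estimate * f) for f in factors}


if __name__ == "__main__":
    for factor, w in sensitivity(estimate=4.0).items():
        print(f"service time x{factor}: mean wait = {w:.2f} minutes")
```

If the output changes a lot across the range, the estimate matters and is worth refining; if it barely changes, the estimate is less of a concern.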
Dealing with unobtainable (category C)
data
Treating data as an experimental factor
The question becomes: what do the data values need to be?
E.g. machine failure data are missing: what level of machine failures must be achieved to meet the target throughput?

The machine failures are therefore treated as an experimental factor.
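Treating missing failure data as an experimental factor can be sketched as a sweep over candidate failure levels, looking for the least demanding level that still meets the target. To keep the example short, this sketch replaces a full simulation with a steady-state approximation (throughput = availability × shift time / cycle time, with availability = MTBF / (MTBF + MTTR)). The cycle time, repair time, shift length, target and candidate levels are all hypothetical.

```python
def throughput_per_shift(mtbf: float, mttr: float,
                         cycle_time: float, shift_minutes: float) -> float:
    """Approximate parts produced per shift by a machine subject to failures.

    Steady-state approximation (an assumption, not a simulation):
    availability = MTBF / (MTBF + MTTR).
    """
    availability = mtbf / (mtbf + mttr)
    return availability * shift_minutes / cycle_time


def minimum_mtbf(target: float, candidates, mttr: float,
                 cycle_time: float, shift_minutes: float):
    """Smallest candidate mean time between failures that meets the target, or None."""
    for mtbf in sorted(candidates):
        if throughput_per_shift(mtbf, mttr, cycle_time, shift_minutes) >= target:
            return mtbf
    return None


if __name__ == "__main__":
    # Hypothetical experiment: 8-hour shift, 2-minute cycle, 30-minute mean repair.
    levels = [60, 120, 240, 480, 960]
    for mtbf in levels:
        print(mtbf, round(throughput_per_shift(mtbf, 30, 2.0, 480), 1))
    print("minimum MTBF for 200 parts:", minimum_mtbf(200, levels, 30, 2.0, 480))
```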
Dealing with unobtainable (category C)
data
Treating data as an experimental factor

Limitation

This approach can only be applied when there is some control over the data in question.
Dealing with unobtainable (category C)
data
●These approaches for handling category C data suffice in most circumstances.

When these approaches are not satisfactory:
✗Revise the conceptual model so that the need for the data is engineered out of the model;
✗Change the modeling objectives so that such data are no longer needed;
✗Abandon the simulation study altogether!


Data accuracy
Ensure that the data being used are accurate.
Category A data
➢Check that the source is reliable;
➢Ensure that the collection exercise was done properly;
➢Use a graph to check for unusual patterns.

If the data are considered too inaccurate for the simulation model, then an alternative source should be sought.
Data accuracy
Ensure that the data used are accurate.

Category B data
➔Use an adequate sample size;
➔Ensure that the data collection staff have no vested interest in the data;
➔Put mechanisms in place to monitor and avoid inaccuracies creeping into the data collection;
➔For critical data, arrange for two sets of observations and cross-check them.
Data format
Data should be accurate and should also be in the right format for the simulation.
Understand how the computer model, particularly the underlying software, interprets the data that are input.

Example
Interpreting the time between failures of a component. This can be interpreted as the time from the start of one breakdown to the start of the next breakdown or, alternatively, as the time from the end of one breakdown to the start of the next breakdown.
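The two interpretations of time between failures can be converted into each other if the start and end of each breakdown are recorded. The sketch below assumes a hypothetical log of (breakdown start, breakdown end) pairs in minutes and derives both measures; which one a given package expects has to be checked in its documentation.

```python
from typing import List, Tuple

Breakdown = Tuple[float, float]  # (start time, end time) of one breakdown, in minutes


def start_to_start(breakdowns: List[Breakdown]) -> List[float]:
    """Time from the start of one breakdown to the start of the next."""
    return [b[0] - a[0] for a, b in zip(breakdowns, breakdowns[1:])]


def end_to_start(breakdowns: List[Breakdown]) -> List[float]:
    """Time from the end of one breakdown to the start of the next (machine up-time)."""
    return [b[0] - a[1] for a, b in zip(breakdowns, breakdowns[1:])]


if __name__ == "__main__":
    log = [(100.0, 120.0), (300.0, 315.0), (610.0, 640.0)]  # hypothetical data
    print(start_to_start(log))  # [200.0, 310.0]
    print(end_to_start(log))    # [180.0, 295.0]
```

Supplying start-to-start values to software that expects end-to-start values would lengthen every simulated up-time by the preceding repair time, so the model would break down less often than the real machine.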
Data format
✔Know the format of the data that are being supplied or collected, and ensure that it is appropriate for the simulation model.

✔Treat any data that are not in the proper format as inaccurate, and take action to improve the data or find an alternative source.
Representing Unpredictable Variability
Modeling variability, in particular unpredictable (or random) variability, is important in simulation modeling.
Several aspects of an operations system are subject to variability, such as:
✔Customer arrivals;
✔Service times;
✔Processing times;
✔Routing decisions.
Representing Unpredictable Variability
Variability can be modeled using:

Traces;
Empirical distributions;
Statistical distributions.


Representing Unpredictable Variability
Traces
A trace is a stream of data that describes a sequence of events.
A trace holds data about:
●The time at which the events occur;
●The events themselves, such as:
 ● the type of part to be processed (part arrival event);
 ● the nature of the fault (machine breakdown event).
Representing Unpredictable Variability
Traces
➔The trace is read from a file by the simulation as it runs, and the events are recreated in the model as described by the trace.
➔Traces can be obtained by collecting data from the real system, for example through automatic monitoring systems.
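Reading a trace as the simulation runs can be sketched as a loop that takes each recorded event in time order, advances the clock to it and recreates it in the model. The CSV layout (time, event, detail) and the event names are hypothetical; a real trace format depends on the monitoring system that produced it.

```python
import csv
import io

# Hypothetical trace: one event per row, already sorted by time (minutes).
TRACE = """time,event,detail
0.0,part_arrival,type_A
2.5,part_arrival,type_B
4.0,machine_breakdown,motor_fault
7.5,part_arrival,type_A
"""


def replay_trace(trace_file):
    """Recreate the recorded events in order, returning a log of what the model did."""
    log = []
    for row in csv.DictReader(trace_file):
        clock = float(row["time"])  # advance the simulation clock to the event time
        if row["event"] == "part_arrival":
            log.append(f"{clock:5.1f}: part arrives ({row['detail']})")
        elif row["event"] == "machine_breakdown":
            log.append(f"{clock:5.1f}: machine breaks down ({row['detail']})")
        else:
            log.append(f"{clock:5.1f}: unknown event {row['event']}")
    return log


if __name__ == "__main__":
    # In a real study the trace would be read from a file, e.g. open("trace.csv").
    for line in replay_trace(io.StringIO(TRACE)):
        print(line)
```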
Representing Unpredictable Variability
Traces

[Figure: Example of a trace (Robinson 2004, p. 101)]
Empirical distributions
✔Show the frequency with which data values, or ranges of data values, occur;
✔Are represented by histograms or frequency charts;
✔Usually arise from historical data;
✔Can be summaries of the data held in a trace.
Empirical distributions
➢When a simulation runs, values are sampled from empirical distributions by using random numbers.
➢Most simulation software, however, enables the user to enter empirical distribution data directly.
➢The sampling process is then hidden from the user, except, in some cases, for the need to specify a pseudo random number stream.
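Sampling from an empirical distribution with random numbers can be sketched with the inverse-transform method: build the cumulative relative frequencies of the histogram cells, draw a uniform random number, pick the cell it falls in, then sample within that cell. The cell ranges and frequencies are hypothetical, and spreading values uniformly within a cell is one common assumption among several.

```python
import bisect
import random
from itertools import accumulate

# Hypothetical empirical distribution: inter-arrival time cells (minutes) and frequencies.
CELLS = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
FREQUENCIES = [10, 40, 30, 20]


def make_sampler(cells, frequencies, seed=None):
    """Return a function that samples from the empirical distribution."""
    total = sum(frequencies)
    cumulative = [c / total for c in accumulate(frequencies)]  # last value is 1.0
    rng = random.Random(seed)  # a dedicated pseudo random number stream

    def sample() -> float:
        u = rng.random()  # uniform in [0, 1)
        index = bisect.bisect_right(cumulative, u)  # first cell with cumulative > u
        low, high = cells[index]
        return rng.uniform(low, high)  # assume values spread evenly within the cell

    return sample


if __name__ == "__main__":
    sample = make_sampler(CELLS, FREQUENCIES, seed=42)
    print([round(sample(), 2) for _ in range(5)])
```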
Empirical distributions

[Figure: Example of an empirical distribution: call arrivals at a call centre]
Statistical distributions
Usually defined by a mathematical function such as a probability density function (PDF).

There are many, but the best known is the normal distribution (ND).

The ND is specified by two parameters:
 mean (its location);
 standard deviation (its spread).

Uses: modeling errors in weight or dimension that occur in manufactured components.
Statistical distributions
The Normal Distribution

[Figure: Example of a normal distribution]
Statistical distributions
The ND is, however, limited:
✗It can generate negative values, especially if the standard deviation is relatively large in comparison to the mean.

It can therefore generate invalid sample values where the context cannot handle negative data items, such as inter-arrival times.
In such cases another distribution can be used, e.g. the Erlang or the negative exponential.
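The risk of negative samples can be quantified directly from the normal cumulative distribution function. The sketch uses Python's `statistics.NormalDist`; the inter-arrival time figures are hypothetical. It also samples from the negative exponential (`random.expovariate`), which never returns a negative value.

```python
import random
from statistics import NormalDist


def probability_negative(mean: float, sd: float) -> float:
    """P(X < 0) for X ~ Normal(mean, sd): the chance of an invalid negative sample."""
    return NormalDist(mu=mean, sigma=sd).cdf(0.0)


if __name__ == "__main__":
    # Hypothetical inter-arrival time with a mean of 2 minutes.
    for sd in (0.5, 1.0, 2.0):
        print(f"sd={sd}: P(negative) = {probability_negative(2.0, sd):.4f}")

    rng = random.Random(7)
    normal_samples = [rng.gauss(2.0, 2.0) for _ in range(10_000)]
    exp_samples = [rng.expovariate(1 / 2.0) for _ in range(10_000)]  # mean 2, never negative
    print("negative normal samples:", sum(x < 0 for x in normal_samples))
    print("negative exponential samples:", sum(x < 0 for x in exp_samples))
```

With a standard deviation equal to the mean, roughly one normal sample in six is negative, which is why the normal is a poor choice for inter-arrival times.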
Statistical distributions
The general categories of standard
statistical distributions include:

Continuous distributions

Discrete distributions

Approximate distributions

SCI-UON, June/July 2011, E. Opiyo
Statistical distributions
The general categories of standard statistical
distributions
Continuous distributions
For sampling data that can take any value across a range or interval.

Examples include the normal, Erlang and negative exponential distributions.
See earlier sections.
Statistical distributions
The general categories of standard statistical
distributions
Discrete distributions
For sampling data that can take only specific values across a range, e.g. integers or non-numeric values.
Examples
The binomial distribution, which may be used to describe the number of successes, or failures, in a specified number of trials;
The Poisson distribution.
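Sampling from the two discrete distributions named above can be sketched with elementary methods: a binomial value as the count of successes in n independent trials, and a Poisson value with Knuth's multiplication method (adequate for small means). These are textbook techniques chosen for illustration, and the example parameters are hypothetical; simulation packages provide their own samplers.

```python
import math
import random


def binomial_sample(n: int, p: float, rng: random.Random) -> int:
    """Number of successes in n independent trials, each with success probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Knuth's method: multiply uniforms until the product drops to exp(-mean) or below."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


if __name__ == "__main__":
    rng = random.Random(3)
    # e.g. defective items in a batch of 20 with a 5% defect rate (hypothetical)
    print([binomial_sample(20, 0.05, rng) for _ in range(10)])
    # e.g. arrivals per 5-minute period with a mean of 4 (hypothetical)
    print([poisson_sample(4.0, rng) for _ in range(10)])
```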
Statistical distributions
The general categories of standard statistical
distributions

Approximate distributions
➔Used in the absence of data;
➔They do not have strong theoretical underpinnings.
Example
Uniform distribution
Bootstrapping
➔It involves re-sampling data at random, with replacement, from an original trace.
➔In effect, the data in the trace are re-shuffled; because sampling is with replacement, some values appear more than once and others not at all.
➔Bootstrapping is particularly useful when only a small sample of data is available.
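Bootstrapping from a trace can be sketched with `random.Random.choices`, which samples with replacement. The small repair-time trace below is hypothetical.

```python
import random


def bootstrap_sample(trace, n: int, rng: random.Random):
    """Draw n values at random, with replacement, from the original trace."""
    return rng.choices(trace, k=n)


if __name__ == "__main__":
    repair_times = [5.2, 7.9, 6.1, 11.4, 4.8, 8.3, 9.0, 6.6]  # hypothetical small trace
    rng = random.Random(11)
    resampled = bootstrap_sample(repair_times, 12, rng)
    print(resampled)  # only trace values appear, some of them repeated
```

Because every value comes from the trace, bootstrapping can never produce a value that was not observed, which is both its strength (no distribution is assumed) and its limitation (extremes outside the sample never occur).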


Selecting Statistical Distributions

A decision must be made on which statistical distributions are most appropriate for the model being developed.

This selection can be based on:
➔The known properties of the process being modeled;
➔Fitting a distribution to empirical data.


Selecting Statistical Distributions
Selecting distributions from known
properties of the process
The appropriate distribution can be selected by considering the properties of the process being modeled.
Example
Modeling customer arrivals at a bank
✔Assuming that the arrivals occur at random, a negative exponential distribution can be used for the inter-arrival times.
✔The Erlang, gamma and lognormal distributions are known to represent the properties of a service process.
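Random arrivals with negative exponential inter-arrival times can be generated as below. The mean inter-arrival time of 3 minutes and the 480-minute opening period are hypothetical; note that `random.expovariate` takes the arrival rate (1 / mean), not the mean.

```python
import random


def arrival_times(mean_interarrival: float, period: float, seed=None):
    """Arrival times in [0, period) with negative exponential inter-arrival times."""
    rng = random.Random(seed)
    rate = 1.0 / mean_interarrival
    times = []
    t = rng.expovariate(rate)
    while t < period:
        times.append(t)
        t += rng.expovariate(rate)
    return times


if __name__ == "__main__":
    arrivals = arrival_times(mean_interarrival=3.0, period=480.0, seed=5)
    print(len(arrivals), "customers (about 480 / 3 = 160 expected on average)")
```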
Selecting Statistical Distributions
Selecting distributions from known
properties of the process
Example
➢Modeling service times at a bank.
➢The gamma distribution provides a greater range of shapes than the Erlang.
➢The lognormal distribution can have a greater peak (probability) around the modal average than the Erlang or the gamma distributions.
Selecting Statistical Distributions
Selecting distributions from known
properties of the process
The appropriate distribution can be selected by considering the properties of the process being modeled.
Example
Modeling time between failures
It is probably reasonable to assume a Weibull distribution.
These distributions were outlined earlier (see week 2 notes).
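Sampling a time between failures from a Weibull distribution is available in Python's standard library as `random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` the shape parameter. The parameter values below are hypothetical; the theoretical mean, scale × Γ(1 + 1/shape), gives a quick check on the samples.

```python
import math
import random


def weibull_mean(scale: float, shape: float) -> float:
    """Theoretical mean of a Weibull(scale, shape) distribution."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def failure_times(scale: float, shape: float, n: int, seed=None):
    """n sampled times between failures."""
    rng = random.Random(seed)
    return [rng.weibullvariate(scale, shape) for _ in range(n)]


if __name__ == "__main__":
    scale, shape = 600.0, 1.5  # hypothetical, in minutes; shape > 1 gives an increasing failure rate
    samples = failure_times(scale, shape, 10_000, seed=9)
    print("theoretical mean:", round(weibull_mean(scale, shape), 1))
    print("sample mean:     ", round(sum(samples) / len(samples), 1))
```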
Selecting Statistical Distributions
Fitting statistical distributions to empirical
data

This approach depends on the availability of empirical data.

The process is performed in three stages:
Select a statistical distribution;
Determine the parameters;
Test the goodness-of-fit.


Selecting Statistical Distributions
Fitting statistical distributions to empirical
data
It is advisable to try a series of distributions with different parameter values;
the process may take several iterations.

Example
➔Data are collected on the repair time of a machine;
➔In total, 100 observations have been made and recorded in the histogram shown below.
Selecting Statistical Distributions
Fitting statistical distributions to empirical
data
[Figure: Histogram of 100 observations on repair time]


Selecting a statistical distribution

A useful statistical distribution can be selected by:
➔Inspecting the data using a histogram, which indicates the shape of the distribution of the empirical data;
➔Using the known properties of the process.
Selecting a statistical distribution
A useful statistical distribution can be
selected by:
Example
➔The histogram above suggests that the shape could be Erlang or gamma, or possibly a similar distribution such as the lognormal or Weibull.
➔All of these can be used to represent the time to complete a task, which is appropriate for this situation.
➔We try fitting an Erlang distribution to the data.


Determine the parameters
This step follows the selection of a distribution.
The Erlang distribution requires estimates of the mean and of the k parameter.

The mean can be calculated from the histogram data.


Determine the parameters

In this case the estimated mean = 762/100 = 7.62 minutes, based on the data summarized in the histogram.
There is no way of estimating the k parameter directly; trial and error is the only option.

Here the distribution is skewed to the right, with its hump near the left, which suggests trying low values of k (one, three and five).
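Estimating the mean from histogram data usually means weighting each cell's midpoint by its frequency, which is one plausible reading of the 762/100 calculation. The cell ranges and frequencies below are hypothetical, not the repair time data. The sketch also shows how the trial k values become Erlang parameters: an Erlang(mean, k) is a gamma distribution with integer shape k and scale mean / k, which `random.gammavariate(alpha, beta)` can sample.

```python
import random


def histogram_mean(cells, frequencies) -> float:
    """Estimate the mean from grouped data using cell midpoints."""
    total = sum(frequencies)
    weighted = sum((low + high) / 2 * f for (low, high), f in zip(cells, frequencies))
    return weighted / total


def erlang_sample(mean: float, k: int, rng: random.Random) -> float:
    """Erlang(mean, k) sampled as a gamma with shape k and scale mean / k."""
    return rng.gammavariate(k, mean / k)


if __name__ == "__main__":
    cells = [(0, 3), (3, 6), (6, 9), (9, 12)]  # hypothetical cells (minutes)
    freqs = [10, 30, 40, 20]                    # hypothetical frequencies
    m = histogram_mean(cells, freqs)
    print("estimated mean:", m)
    rng = random.Random(2)
    for k in (1, 3, 5):  # the trial values of k
        print(k, [round(erlang_sample(m, k, rng), 1) for _ in range(5)])
```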
Test the goodness-of-fit
The goodness-of-fit is a measure of the extent to which the observed data are similar to those obtained from a selected distribution function.

The goodness-of-fit can be tested:
 graphically;
 using statistical tests.


Test the goodness-of-fit
The graphical approach
Compare the histogram of the empirical data with a histogram of the proposed distribution.

Data from the proposed distribution can be created by taking many samples (say 10,000 or more) and placing them in the same cell ranges used for the histogram of the empirical data.


Test the goodness-of-fit
The graphical approach
Compare the histogram of the empirical data with a histogram of the proposed distribution.

Generating samples: use the distribution functions in the simulation software.
Limitation: this is a sampling procedure and hence not completely accurate, so take a large number of samples.
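The sampling version of the graphical approach can be sketched as: draw many samples from the proposed Erlang distribution, count them in the same cell ranges as the empirical histogram, and scale the counts to the number of observations. The mean 7.62 and the k values come from the example; the cell boundaries are hypothetical, and the last cell is treated as open-ended so that no samples are lost.

```python
import random


def sampled_frequencies(mean, k, boundaries, n_observations,
                        n_samples=10_000, seed=None):
    """Expected frequency per cell for an Erlang(mean, k), estimated by sampling.

    boundaries: sorted upper limits of each cell except the last, which is open-ended.
    """
    rng = random.Random(seed)
    counts = [0] * (len(boundaries) + 1)
    for _ in range(n_samples):
        x = rng.gammavariate(k, mean / k)  # Erlang = gamma with integer shape k
        cell = sum(1 for b in boundaries if x >= b)  # index of the cell x falls in
        counts[cell] += 1
    return [c * n_observations / n_samples for c in counts]


if __name__ == "__main__":
    boundaries = [3, 6, 9, 12, 15, 18]  # hypothetical: seven cells
    for k in (1, 3, 5):
        freqs = sampled_frequencies(7.62, k, boundaries, n_observations=100, seed=1)
        print(k, [round(f, 1) for f in freqs])
```

These frequencies would then be plotted next to the empirical histogram for each trial k.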
Test the goodness-of-fit
A more accurate approach: calculate the frequencies from the cumulative distribution functions of the proposed distributions.

This provides an exact value for the percentage of observations that should fall in each cell range.
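For the Erlang distribution the cumulative distribution function has a closed form, F(x) = 1 − Σ_{n=0}^{k−1} e^{−λx}(λx)^n / n! with rate λ = k / mean, so the exact expected percentage in a cell is 100 × (F(upper) − F(lower)). The cell boundaries are hypothetical, and the last cell is again open-ended.

```python
import math


def erlang_cdf(x: float, mean: float, k: int) -> float:
    """P(X <= x) for an Erlang distribution with the given mean and integer k."""
    if x <= 0:
        return 0.0
    rate = k / mean
    term = 1.0   # (rate * x)^n / n! for n = 0
    total = 1.0
    for n in range(1, k):
        term *= rate * x / n
        total += term
    return 1.0 - math.exp(-rate * x) * total


def expected_percentages(mean: float, k: int, boundaries) -> list:
    """Exact percentage of observations expected in each cell (last cell open-ended)."""
    edges = [0.0] + list(boundaries) + [math.inf]
    cdf = [1.0 if e == math.inf else erlang_cdf(e, mean, k) for e in edges]
    return [100.0 * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


if __name__ == "__main__":
    for k in (1, 3, 5):
        pct = expected_percentages(7.62, k, [3, 6, 9, 12, 15, 18])
        print(k, [round(p, 1) for p in pct])
```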
Test the goodness-of-fit: graphical method
[Figure: graphical goodness-of-fit test]


Test the goodness-of-fit- graphical method
Limitations
➢Inspecting histograms is difficult when only a small amount of data is available, say fewer than 30 samples;
➢In that case the shape of the histogram is unlikely to be smooth.
➢Graphical approaches based on cumulative probabilities can, however, overcome these limitations.
The chi-square test is probably the best known goodness-of-fit test.
Test the goodness-of-fit
The chi-square test
Is probably the best known goodness-of-fit test.
The chi-square value is calculated as follows:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where: $\chi^2$ = chi-square value;
$O_i$ = observed frequency in the $i$th range (empirical distribution);
$E_i$ = expected frequency in the $i$th range (proposed distribution);
$k$ = total number of ranges.
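The chi-square value is a direct translation of the formula. The observed and expected frequencies in the usage example are hypothetical, not the repair time results; the expected frequencies must be in the same units as the observed ones (counts, not percentages).

```python
def chi_square(observed, expected) -> float:
    """Chi-square value: sum over ranges of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of ranges")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    observed = [8, 20, 25, 20, 12, 9, 6]                # hypothetical counts, total 100
    expected = [9.0, 21.0, 23.0, 19.0, 13.0, 8.0, 7.0]  # hypothetical, total 100
    print(round(chi_square(observed, expected), 3))
```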
Test the goodness-of-fit
The chi-square test
Two other factors are needed:
➔The level of significance: typically, 5% is used.
➔The degrees of freedom, calculated as follows:
Degrees of freedom = Number of cell ranges − Number of estimated parameters − 1


Test the goodness-of-fit
The chi-square test

The repair time example
●The number of ranges is seven (see histogram);
●Two parameters have been estimated:
➢ the distribution mean;
➢ the value of k.

There are therefore 7 − 2 − 1 = 4 degrees of freedom.


Test the goodness-of-fit
The chi-square test
The chi-square value is compared with a critical value;
the critical value is read from chi-square tables or generated in a spreadsheet using the appropriate function (‘CHIINV’ in Excel);
the critical value is obtained for the correct level of significance and number of degrees of freedom.
If the chi-square value is less than the critical value, the proposed distribution cannot be rejected as a fit to the data.
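The comparison with the critical value can be sketched without tables for the repair time example, because for an even number of degrees of freedom the chi-square tail probability has a closed form, P(X > x) = e^{−x/2} Σ_{j=0}^{df/2−1} (x/2)^j / j!. The sketch inverts it by bisection and only handles even degrees of freedom; in practice a table, Excel's CHIINV or a statistics library would be used. The chi-square value of 2.1 is hypothetical.

```python
import math


def degrees_of_freedom(n_cells: int, n_estimated_parameters: int) -> int:
    """Number of cell ranges - number of estimated parameters - 1."""
    return n_cells - n_estimated_parameters - 1


def chi_square_tail(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable; this closed form is valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only supports positive, even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def critical_value(significance: float, df: int) -> float:
    """Value x with P(X > x) = significance, found by bisection (the tail is decreasing)."""
    low, high = 0.0, 1.0
    while chi_square_tail(high, df) > significance:
        high *= 2.0
    for _ in range(100):
        mid = (low + high) / 2.0
        if chi_square_tail(mid, df) > significance:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    df = degrees_of_freedom(n_cells=7, n_estimated_parameters=2)  # repair time example
    crit = critical_value(0.05, df)
    chi_sq = 2.1  # hypothetical chi-square value
    print(df, round(crit, 3))  # 4 9.488
    print("cannot reject the fit" if chi_sq < crit else "reject the fit")
```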
Test the goodness-of-fit: chi-square test
[Figure: chi-square test for the repair time data]


Test the goodness-of-fit- Chi-square test
➔The tests performed on the repair time data reach the same conclusion: the Erlang (7.62, 3) provides the best fit.
➔This does not, however, mean that we should simply accept this distribution as correct.
➔We should try other parameter values, particularly k values of two and four.
➔It is also worth trying other types of distribution, for instance a lognormal distribution.
Issues in distribution fitting
➢An important issue to consider is the difference between the best fit and the best distribution.
➢The best fit is not necessarily the best distribution, because the empirical data are just a sample.
➢The shape and parameters estimated from the data, such as the mean, are just estimates.
Issues in distribution fitting
➢It is never possible to be certain that the correct statistical distribution has been found.
➢Selecting a statistical distribution means making an assumption about the nature of the population distribution based on what is usually a small sample.
➢Such assumptions should be added to the description of the conceptual model.
Distribution fitting software
●Effective distribution fitting can be very time-consuming.
●Some software packages automate this process, e.g. ExpertFit (www: Averill M. Law & Associates) and Stat::Fit (www: Geer Mountain Software).
●The user only enters the empirical data, and the software automatically generates graphical and statistical reports, recommending the best fitting distributions.
Model coding
✔Moving from the conceptual model to the computer model.

Use:
✔A programming language; or
✔A specialized simulation software package (the usual choice).


Model coding
Structuring the Model (algorithm design)
Formulate the overall coding plan, bearing in mind the simulation software that is going to be used to develop the code.
The following points may be important:
Speed of coding: the speed with which the code can be written.
Transparency: the ease with which the code can be understood.
Flexibility: the ease with which the code can be changed.
Run-speed: the speed with which the code will execute.
Model coding
Structuring the Model (algorithm design)
➔The model structure is typically a paper-based description of the model, outlining the constructs and logic of the simulation in terms of the software being used.
➔It normally entails some form of schematic outlining the components, variables, attributes and logic of the model.
➔It is not necessary to write out each line of code, but it is useful to describe the code in natural language, for instance, ‘if call type is A, route to enquiries’.
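A natural-language rule from the model structure, such as ‘if call type is A, route to enquiries’, eventually becomes a line of model logic. A minimal sketch, with hypothetical call types, queue names and fallback queue:

```python
from dataclasses import dataclass


@dataclass
class Call:
    call_id: int
    call_type: str  # hypothetical attribute set when the call arrives


# Routing rules held as data, so they are easy to document and change.
ROUTES = {"A": "enquiries", "B": "orders"}
DEFAULT_ROUTE = "general"  # hypothetical fallback queue


def route(call: Call) -> str:
    """If call type is A, route to enquiries; other types as listed in ROUTES."""
    return ROUTES.get(call.call_type, DEFAULT_ROUTE)


if __name__ == "__main__":
    for c in (Call(1, "A"), Call(2, "B"), Call(3, "C")):
        print(c.call_id, "->", route(c))
```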
Model coding
Structuring the Model (algorithm design)
✔The way in which the structure is expressed depends very much on the nature of the software being used.
✔One can use pseudo-code or visual diagrams.
✔Standard notations such as UML (the Unified Modeling Language) can be used.
Model coding
Coding the Model
The important activities in the production of the computer model are:
Coding: developing the code in the simulation software.
Testing: verifying and validating the model.
Documenting: recording the details of the model.
Model coding
Coding the Model
➔Incremental code development is advisable, as it facilitates testing.
➔Testing can be built into the code production process.
➔Good coding practice should be observed.
Model coding
Separate the data from the code and from the results
➢It is good practice;
➢Keep important constants and data values in one place, where they are referenced by formulae;
Inputs (experimental factors and general model data) go into the simulation model, which provides the outputs as the results.
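Separating data, code and results can be sketched as a model function that reads its experimental factors and general data from one file and writes its results to another. The file names, keys and the toy capacity calculation standing in for the simulation are hypothetical; a spreadsheet export would play the same role as the JSON and CSV files.

```python
import csv
import json
from pathlib import Path


def run_model(factors: dict, data: dict) -> dict:
    """Toy stand-in for the simulation: every value comes from the inputs."""
    capacity = factors["servers"] * data["minutes_per_day"] / data["mean_service_time"]
    return {"scenario": factors["name"], "daily_capacity": round(capacity, 1)}


def main(input_path: Path, results_path: Path) -> None:
    """Read inputs from one file, run each scenario, write results to another file."""
    inputs = json.loads(input_path.read_text())
    rows = [run_model(f, inputs["model_data"]) for f in inputs["scenarios"]]
    with results_path.open("w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["scenario", "daily_capacity"])
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    example = {
        "model_data": {"minutes_per_day": 480, "mean_service_time": 4.0},
        "scenarios": [{"name": "base", "servers": 3}, {"name": "extra", "servers": 4}],
    }
    Path("inputs.json").write_text(json.dumps(example, indent=2))
    main(Path("inputs.json"), Path("results.csv"))
    print(Path("results.csv").read_text())
```

Keeping one inputs file and one results file per scenario set also gives the version-control benefit listed below.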
Model coding
Separate the data from the code and from
the results
Advantages
Familiarity: users do not need extensive training in the data input and results generation software.
Ease of use: an in-depth understanding of the simulation code and simulation software is not required.
Model coding
Separate the data from the code and from
the results
Advantages
Presentation: ability to use specialist software (e.g. a spreadsheet) for presenting the data and the results.
Further analysis: ability to use specialist software facilities for further analysis of the data and the results.
Version control: a record of all experimental scenarios can be maintained by holding separate versions of the experimental factors file and the results file.
Model coding
Use of pseudo random number streams
A pseudo random number stream is a sequence of random numbers that can be replicated exactly, because the stream always starts from the same seed.

A simulation run that is repeated with the same streams and inputs will therefore always give the same result.
See earlier sections on pseudo random numbers.
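Reproducible pseudo random number streams can be sketched with separate `random.Random` objects, one per source of variability, each started from its own seed. Giving arrivals and service their own streams keeps one process's samples unchanged when another is altered; the seeds and means are hypothetical.

```python
import random


def run(seed_arrivals: int, seed_service: int, n: int = 5):
    """Generate inter-arrival and service times from two independent, seeded streams."""
    arrivals = random.Random(seed_arrivals)
    service = random.Random(seed_service)
    inter_arrival_times = [arrivals.expovariate(1 / 3.0) for _ in range(n)]
    service_times = [service.expovariate(1 / 2.5) for _ in range(n)]
    return inter_arrival_times, service_times


if __name__ == "__main__":
    first = run(101, 202)
    print(first == run(101, 202))  # True: same seeds, same streams, same results
    # Changing the service seed leaves the arrival stream untouched.
    print(run(101, 999)[0] == first[0])  # True
```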


Model coding
Documenting the Model and the Simulation Project
Documentation is estimated to account for between 20 and 30% of the total development costs of software.
A comparable amount of effort should be put into documenting simulation models.
Documentation is useful for subsequent maintenance of the model.
Model coding
Documenting the Model and the Simulation Project
The important documents
●The conceptual model;
●A list of model assumptions and simplifications;
●The model structure;
●The input data and experimental factors, including the interpretation and sources of data;
Model coding
Documenting the Model and the Simulation Project
The important documents
●The results format, including the interpretation of results;
●Meaningful names for components, variables and attributes;
●Comments and notes in the code;
●The visual display of the model.


Model coding
Documenting the Model and the Simulation Project
The components of project documentation
➢The project specification
➢Minutes of meetings
➢Verification and validation performed
➢Experimental scenarios run
➢Results of experiments (associated with each scenario)
➢Final report
➢Project review


Model coding
Documenting the Model and the Simulation Project

✔Keep a record of all the experimental scenarios run and of the results associated with them;
✔Provide a user guide by the time the model is handed over to the users.
Model coding
Documenting the Model and the Simulation Project
The structure of the user guide
The project specification: providing background to the project and the conceptual model.
Input data: interpretation and sources of data.
Experimental factors: their meanings, ranges and how to change them.
Guide to running the model.
Results: accessing and interpreting results.


End of Week 8 Exercises
1. Discuss the importance of data in modeling and simulation.
2. Discuss how data are obtained and how data problems are handled in a simulation study.
3. A simulation model is being developed of the check-in area at an airport. Different airlines follow slightly different procedures, and within airlines the staff have quite different levels of experience. Discuss the problems that might be encountered in collecting the data on check-in times.
4. Which statistical distributions would be most appropriate for modeling the following processes?
i. Process A: the weight of a bottle leaving a filling process
ii. Process B: the time between failures of a machine
iii. Process C: the check-in time at an airport
5. Explore the distributions in any selected simulation software or spreadsheet.
6. Discuss how you would go about moving from a conceptual model to a computer model.
7. Discuss the importance of documentation and the different types of documentation.
8. Select an area of your choice, then develop the simulation model and code it in the software of your choice.
