
Nenu Anda Roxana

341C5

Simulation
Simulation is a technique for computer systems performance analysis. If we wish to predict some
aspect of the performance of a computer system and the real machine is not available, as is often the
case during the design or procurement stage, a simulation model provides an easy way to predict the
performance or compare several alternatives.
Common mistakes in simulation
Simulation allows a system to be studied in more detail than analytical modeling. Analysis requires
several simplifications and assumptions. In a simulation model, the level of detail is limited only by the
time available for simulation development. There is, in general, a trade-off between the accuracy of a
simulation and the time required to write the simulator and execute the necessary simulations. A more
detailed simulation requires more time to develop. The likelihood of bugs increases and it becomes
harder to spot them. The debugging time increases. A more detailed simulation also requires more
computer time to execute. When developing a simulator, it is important to explicitly consider the
consequences and trade-offs between the level of detail necessary in order to make the desired decision,
and the consequences of being wrong. It is generally assumed that a more detailed model is a better
model, since it makes fewer assumptions. This is not always true. A detailed model may require more
detailed knowledge of input parameters, which, if not available, may make the model inaccurate. It
could take too much time to develop. It is better to start with a less detailed model, get some results,
study sensitivities, and introduce details in the areas that have the highest impact on the results.
Selecting a proper language is probably the most important step in the process of developing a
simulation model. An incorrect decision during this step may lead to long development times,
incomplete studies, and failures. Simulation languages save the analyst considerable time when
developing a simulation. These languages have built-in facilities for time advancing, event scheduling,
entity manipulation, random-variate generation, statistical data collection, and report generation. They
allow analysts to spend more time on issues specific to the system being modeled and not worry about
issues that are general to all simulations. A general-purpose language is chosen for simulation primarily
because of an analyst's familiarity with the language. Most computer system designers and new
analysts are not familiar with simulation languages. Also, deadline requirements do not allow time for
them to learn a simulation language. Further, simulation languages are often not available on their
computer systems. This is why most people write their first simulation in a general-purpose language.
Even for beginners, the time trade-off between a simulation language and a general-purpose language is
really not what it appears. If they choose a simulation language, they have to spend time learning the
language. In some cases, they may even have to install it on their computer system and see that no
pieces are missing. If they choose a general-purpose language, they can get started right away. But they
spend time developing routines for event handling, random-number generation, and so forth.
Considerable time may be spent in learning about these issues and rediscovering known problems.
This is not to say that analysts should always use simulation languages. There are other considerations,
such as efficiency, flexibility, and portability, which may make a general-purpose language the only
choice. A model developed in a general-purpose language is more efficient and takes less CPU time. A

general-purpose language gives analysts more flexibility by allowing them to take short cuts prohibited
in a simulation language. Furthermore, a model developed in a general-purpose language can be easily
converted for execution on different computer systems. To make an objective choice between a
simulation language and a general-purpose language, it is suggested that the analyst learn at least one
simulation language so that other factors in addition to familiarity will help in the selection of the
language.
The first and foremost cause of failures of simulation models is that the model developers
underestimate the time and effort required to develop a simulation model. It is common for simulation
projects to start off as a one-week or one-month project and then continue for years. If a simulation
is successful and provides useful information, its users want more features, parameters, and details to
be added. On the other hand, if a simulation does not provide useful information, it is often expected
that adding more features, parameters, and details will probably make it useful. In either case, the
project continues far beyond initial projections. Among the three performance analysis techniques
(modeling, measurement, and simulation), simulation generally takes the longest time, particularly
if a complete new model has to be developed from scratch.
Another common mistake is an incomplete mix of essential skills. A simulation project requires at least the
following four areas of skills:
(a) Project Leadership: The ability to motivate, lead, and manage the members of the simulation team.
(b) Modeling and Statistics: The ability to identify the key characteristics of the system and model
them at the required level of detail.
(c) Programming: The ability to write a readable and verifiable computer program that implements the
model correctly.
(d) Knowledge of the Modeled System: The ability to understand the system, explain it to the modeling
team, and interpret the modeling results in terms of their impact on the system design. A simulation
team should have members with these skills and be ideally led by a member who has some knowledge
of all of the skills.
It is essential that the modeling team and the user organizations meet periodically and discuss the
progress, problems, and changes, if any, in the system. Most systems evolve or change with time, and a
model developed without end user participation is rarely successful. The periodic meetings help point
out the modeling bugs at an early stage and help keep the model in sync with changes in the system.
Most simulation models evolve over a long period of time and are continuously modified as the system
is modified or better understood. Documentation of these models often lags behind the development
and, unless special care is taken to keep it up to date, it soon becomes obsolete. The best strategy is to
include the documentation in the program itself and to use computer languages that are easier to read.
Types of simulations
Emulation
An emulator program is a simulation program that runs on some existing system to make the system
appear to be something else. Since the goal of emulation is to make one type of system appear to be
another type, an emulation program typically sacrifices performance for flexibility. A processor
emulator emulates an instruction set of one processor on another. The Java Virtual Machine is an
example of a processor emulator.

Monte Carlo Simulation


A static simulation or one without a time axis is called a Monte Carlo simulation. Such simulations are
used to model probabilistic phenomena that do not change characteristics with time. When you
develop a forecasting model (any model that plans ahead for the future), you make certain
assumptions. These might be assumptions about the investment return on a portfolio, the cost of a
construction project, or how long it will take to complete a certain task. Because these are projections
into the future, the best you can do is estimate the expected value. You can't know with certainty what
the actual value will be, but based on historical data, or expertise in the field, or past experience, you
can draw an estimate. While this estimate is useful for developing a model, it contains some inherent
uncertainty and risk, because it's an estimate of an unknown value. When you have a range of values as
a result, you are beginning to understand the risk and uncertainty in the model. The key feature of a Monte
Carlo simulation is that it can tell you, based on how you create the ranges of estimates, how likely
the resulting outcomes are. In a Monte Carlo simulation, a random value is selected for each of the
tasks, based on the range of estimates; the model is calculated based on this random value. The result of
the model is recorded, and the process is repeated. These results are used to describe the likelihood, or
probability, of reaching various results in the model. A typical Monte Carlo simulation calculates the
model hundreds or thousands of times, each time using different randomly-selected values.
As an example, consider the problem of numerically determining the value of pi. Since the area of a
circle with a radius of 1 is pi, the area of a quarter-circle within the first quadrant is pi/4. The area
contained within the unit square in this quadrant is simply 1. Thus, the ratio of the area of the
quarter-circle to the area of the square, which we denote R, is R = pi/4, so pi = 4R. We have now
transformed the problem of computing the numerical value of pi into the equivalent geometric problem
of determining the ratio of the two areas, R. A Monte Carlo simulation can be used to find R by
modeling an equivalent physical system. Imagine throwing darts randomly at our figure, such that
every dart hits within the unit square. After throwing a large number of darts, we count the number of
times a dart hit within the quarter-circle, n_circ, and the total number of darts thrown, n_total. Then
the desired ratio of the two areas is R = n_circ / n_total. We can simulate this dart-throwing
experiment by generating two random numbers, u1 and u2, for each dart thrown, such that u1 and u2
are both uniformly distributed (equally likely to be observed) between 0 and 1. If the distance from the
origin of the point defined by (x, y) = (u1, u2) is smaller than the radius of the circle, that is,
u1^2 + u2^2 < 1, then the simulated dart has hit within the quarter-circle. By repeating this process a
large number of times, we can theoretically compute the value of pi to any level of precision desired.
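The dart-throwing experiment described above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original text; the function name, seed, and sample count are arbitrary choices made here so the run is repeatable:

```python
import random

def estimate_pi(n_darts, seed=42):
    """Estimate pi by throwing n_darts random points at the unit square
    and counting those that land inside the quarter-circle."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    n_circ = 0
    for _ in range(n_darts):
        u1, u2 = rng.random(), rng.random()   # uniform on [0, 1)
        if u1 * u1 + u2 * u2 < 1.0:           # dart hit inside the quarter-circle
            n_circ += 1
    return 4.0 * n_circ / n_darts             # pi = 4 * R = 4 * n_circ / n_total

print(estimate_pi(100_000))
```

With 100,000 darts the estimate typically lands within a few hundredths of pi; precision improves only with the square root of the number of darts, which is why Monte Carlo runs repeat the experiment many times.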
Trace-Driven Simulation
A simulation using a trace as its input is a trace-driven simulation. A trace is a time-ordered record of
events on a real system. Trace-driven simulations are quite common in computer system analyses. They
are generally used in analyzing or tuning resource management algorithms. Paging algorithms, cache
analysis, CPU scheduling algorithms, deadlock prevention algorithms, and algorithms for dynamic
allocation of storage are examples of cases where trace-driven simulation has been successfully used

and documented in the literature. In these studies, a trace of the resource demand is used as input to the
simulation, which models different algorithms. For example, in order to compare different memory
management schemes, a trace of page reference patterns of key programs can be obtained on a system.
This trace can then be used to find the optimal set of parameters for a given memory management
algorithm or to compare different algorithms. It should be noted that the traces should be independent
of the system under study. For example, a trace of pages fetched from a disk depends upon the working
set size and page replacement policy used. This trace could not be used to study other page replacement
policies. For that, one would need a trace of pages referenced. Similarly, an instruction trace obtained
on one operating system should not be used to analyze another operating system.
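As an illustration of the idea, the following Python sketch replays a page-reference trace through two replacement policies and counts page faults. The trace here is a small hand-made example (a real study would use a trace captured on an actual system), and the function names are invented for this sketch:

```python
from collections import OrderedDict

def simulate_fifo(trace, n_frames):
    """Count page faults for a FIFO replacement policy on a reference trace."""
    frames, order, faults = set(), [], 0
    for page in trace:
        if page not in frames:
            faults += 1
            if len(frames) == n_frames:
                victim = order.pop(0)        # evict the oldest resident page
                frames.remove(victim)
            frames.add(page)
            order.append(page)
    return faults

def simulate_lru(trace, n_frames):
    """Count page faults for an LRU replacement policy on the same trace."""
    frames, faults = OrderedDict(), 0
    for page in trace:
        if page in frames:
            frames.move_to_end(page)         # mark as most recently used
        else:
            faults += 1
            if len(frames) == n_frames:
                frames.popitem(last=False)   # evict the least recently used page
            frames[page] = True
    return faults

# hypothetical page-reference trace; in practice this comes from a measured system
trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(simulate_fifo(trace, 3), simulate_lru(trace, 3))
```

Because both policies consume the same trace, the comparison between them is driven by identical input, which is exactly the strength of trace-driven simulation.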
Discrete-event simulation
A discrete-event simulator is used to model a system whose global state changes as a function of time.
The state may also be affected by events that are generated externally to the simulator as well as those
that are spawned within the simulator by the processing of other events. The basic idea is that the
global state is appropriately updated every time some event occurs. While the specific details of every
simulator will be unique, discrete-event simulators all share a similar overall structure. Each discrete-event simulator will require at least some of the following components:
an event scheduler: the heart of a discrete-event simulator. It keeps a linked list of events
waiting to happen. The scheduler allows the events to be manipulated in various ways. Some of these
manipulations are: schedule event X at time T, hold event X for a time interval dt, cancel a
previously scheduled event X, hold event X indefinitely (until it is scheduled by another event),
schedule an indefinitely held event.
simulation clock and a time-advancing mechanism: Each simulation has a global variable
representing simulated time. The scheduler is responsible for advancing this time. There are two ways
of doing this. The first, called the unit time approach, increments time by a small fixed amount and
then checks to see if there are any events that can occur. The second approach, called the event-driven
approach, increments the time automatically to the time of the next earliest occurring event.
system state variables: These are global variables that describe the state of the system. For
example, in the CPU scheduling simulation, the system state variable is the number of jobs in the
queue. This is a global variable that is distinct from local variables such as CPU time required for a job,
which would be stored in the data structure representing the job.
event routines: Each event is simulated by its routine. These routines update the system state
variables and schedule other events. For example, in simulating a CPU scheduling mechanism, one
might need routines to handle the three events of job arrivals, job scheduling, and job departure.
input routines: These get the model parameters, such as mean CPU demand per job, from the
user. It is better to ask for all input at the beginning of a simulation and then free the user, since
simulations generally take a long time to complete. The input routines typically allow a parameter to be
varied in a specified manner. For example, the simulation may be run with mean CPU demand varying
from 1 to 9 milliseconds in steps of 2 milliseconds. Each set of input values defines one iteration that
may have to be repeated several times with different seeds. Thus, each single execution of the
simulation consists of several iterations, and each iteration consists of several repetitions.
report generator: These are the output routines executed at the end of the simulation. They
calculate the final results and print them in a specified format.
initialization routines: These set the initial state of the system state variables and initialize
various random-number generation streams. It is suggested that there be separate routines to initialize
the state at the beginning of a simulation, at the beginning of an iteration, and at the beginning of a
repetition.

trace routines: These print out intermediate variables as the simulation proceeds. They help
debug the simulation program. It is advisable that the trace have an on/off feature so that it can be
turned off for final production runs of the model. A model may even allow the ability to interrupt the
execution of the model from the keyboard and turn the trace on or off.
dynamic memory management: The number of entities in a simulation changes continuously as
new entities are generated and old ones are destroyed. This requires periodic garbage collection. Most
simulation languages and many general-purpose languages provide this automatically. In other cases,
the programmer has the burden of writing codes for dynamic memory management.
main program: This brings all the routines together. It calls input routines, initializes the
simulation, executes various iterations, and finally, calls the output routines.
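The components above can be sketched together as a minimal event-driven simulator of a single-server FIFO queue. This is an illustrative Python sketch, not code from the source: the event list is a min-heap, the clock jumps to the next earliest event, the queue length and server state are the system state variables, and the two branches are the event routines. All names and parameter values here are invented for the example:

```python
import heapq
import random

def simulate_queue(arrival_rate, service_rate, n_jobs, seed=1):
    """Event-driven simulation of a single-server FIFO queue with exponential
    interarrival and service times. Returns the mean time a job spends in the
    system (waiting plus service)."""
    rng = random.Random(seed)
    events = []                               # event list: (time, kind) min-heap
    heapq.heappush(events, (rng.expovariate(arrival_rate), "arrival"))
    waiting = []                              # arrival times of queued jobs
    server_busy = False
    arrived = finished = 0
    total_time = 0.0                          # accumulated time-in-system
    in_service_since = 0.0                    # arrival time of the job being served

    while finished < n_jobs:
        clock, kind = heapq.heappop(events)   # advance clock to next earliest event
        if kind == "arrival":                 # event routine: job arrival
            arrived += 1
            waiting.append(clock)
            if arrived < n_jobs:              # schedule the next arrival
                heapq.heappush(events,
                               (clock + rng.expovariate(arrival_rate), "arrival"))
            if not server_busy:               # idle server starts the new job
                server_busy = True
                in_service_since = waiting.pop(0)
                heapq.heappush(events,
                               (clock + rng.expovariate(service_rate), "departure"))
        else:                                 # event routine: job departure
            finished += 1
            total_time += clock - in_service_since
            if waiting:                       # start the next queued job, if any
                in_service_since = waiting.pop(0)
                heapq.heappush(events,
                               (clock + rng.expovariate(service_rate), "departure"))
            else:
                server_busy = False
    return total_time / n_jobs

print(simulate_queue(arrival_rate=0.5, service_rate=1.0, n_jobs=10_000))
```

For these rates, queueing theory predicts a mean time in system of 1/(1.0 - 0.5) = 2.0, which gives a handy validation check for the simulator.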
Random-number generation
One of the key steps in developing a simulation is to have a routine to generate random values for
variables with a specified random distribution, for example, exponential and normal. This is done in
two steps. First, a sequence of random numbers distributed uniformly between 0 and 1 is obtained.
Then the sequence is transformed to produce random values satisfying the desired distribution. The
first step is called random-number generation and the second random-variate generation.
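As a sketch of the second step, the inverse-transform method maps a uniform random number u onto an exponential variate by inverting the exponential CDF F(x) = 1 - e^(-rate*x), giving x = -ln(1 - u)/rate. The Python below is an illustration written for this note; the function name and seed are arbitrary:

```python
import math
import random

def exponential_variate(u, rate):
    """Inverse-transform method: map a uniform [0, 1) number u to an
    exponentially distributed value by inverting F(x) = 1 - e^(-rate*x)."""
    return -math.log(1.0 - u) / rate

rng = random.Random(7)                      # fixed seed for repeatability
samples = [exponential_variate(rng.random(), rate=2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))          # sample mean, close to 1/rate = 0.5
```

The same two-step pattern (uniform stream first, transform second) works for other distributions whose CDF can be inverted.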
Sequences of random numbers are necessary to drive simulators. However, generating a random-number
sequence is not as easy as it might seem at first glance. For instance, there would seem to be an inherent
contradiction in using a deterministic algorithm to generate a sequence of numbers that is, by
definition, nondeterministic. In fact, to allow us to exactly repeat a simulation (for testing, for
example), we do not really want truly random numbers. Instead, a variety of techniques to generate
pseudorandom-number sequences has been developed. A good-pseudorandom-number generator
should have the following properties.
It should be efficient. Since a typical simulation will require a large number of random values,
the generator should be easy to compute efficiently.
It should have a long period. The sequence of random values generated by a finite
algorithm must necessarily be finite. That is, the sequence will repeat with some period k, such
that x_{n+k} = x_n, x_{n+k+1} = x_{n+1}, x_{n+k+2} = x_{n+2}, and so on. To make the sequence
appear as random as possible, we would like the period k to be as large as possible.
Its values should be independent and uniformly distributed. The values produced should
appear to be uniformly distributed in the interval [0, 1). That is, every value should have the
same likelihood of appearing in the sequence.
It should be repeatable. To facilitate testing of our simulator, and to allow the direct comparison
of different simulation configurations being driven by the same sequence of random values, we
would like the generator to be able to reproduce exactly the same sequence that it produced
at some previous time.
Linear-Congruential Generator (LCG)
The linear-congruential generator is one of the simplest generators that exhibits the above
desirable properties:
x_n = (a * x_{n-1} + b) mod m
where the x_n's are integers between 0 and m - 1. Constants a and b are nonnegative.

In general, the choice of a, b, and m affects the period and autocorrelation in the sequence. A number of

researchers have studied such generators, and the results of their studies can be summarized as follows:
1. The modulus m should be large. Since all x_n's must be between 0 and m - 1, the period can never be
more than m.
2. For the mod m computation to be efficient, m should be a power of 2, that is, m = 2^k. In this case,
mod m can be obtained by keeping only the k low-order bits of the result.
3. If b is nonzero, the maximum possible period m is obtained if and only if
(a) integers m and b are relatively prime, that is, have no common factors other than 1;
(b) every prime number that is a factor of m is also a factor of a - 1; and
(c) a - 1 is a multiple of 4, if integer m is a multiple of 4.
A generator that has the maximum possible period is called a full-period generator. Not all full-period
generators are equally good. Generators with lower autocorrelation between successive numbers
are preferable. If the increment b is zero, no addition is involved and the generator is called a
multiplicative LCG: x_n = a * x_{n-1} mod m. It is obvious that multiplicative LCGs are more efficient in
terms of processor time required for computation. Further efficiency can be obtained by choosing m to
be a power of 2 so that the mod operation is trivial.
Multiplicative LCG with m = 2^k
The key argument in favor of choosing m = 2^k is the ease of the mod operation. However, such
generators do not have a full period. The maximum possible period for a multiplicative LCG with
modulus m = 2^k is only one-fourth the full period, that is, 2^(k-2). This period is achieved if the
multiplier a is of the form 8i +/- 3 and the initial seed is an odd integer.
Consider the following multiplicative LCG:
x_n = 5 * x_{n-1} mod 2^5
Using a seed of x_0 = 1, we get the sequence 5, 25, 29, 17, 21, 9, 13, 1, 5, .... The period is 8, which is
one-fourth the maximum possible 32.
If we change the seed to x_0 = 2, the sequence is 10, 18, 26, 2, 10, .... Here, the period is only 4. Thus,
choosing an odd seed is important in this case.
To see what happens if the multiplier is not of the form 8i +/- 3, consider the LCG
x_n = 7 * x_{n-1} mod 2^5
Using a seed of x_0 = 1, we get the sequence 7, 17, 23, 1, 7, .... Again, the period is only 4. Thus, both
conditions are necessary to achieve the maximum period.
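The periods quoted above can be checked with a short brute-force sketch (illustrative Python written for this note; the function name is invented):

```python
def lcg_period(a, m, seed):
    """Length of the cycle generated by x_n = a * x_{n-1} mod m from seed."""
    x = seed
    steps = 0
    while True:
        x = (a * x) % m     # for m = 2**k this equals (a * x) & (m - 1)
        steps += 1
        if x == seed:
            return steps

print(lcg_period(5, 2**5, 1))   # a = 5 = 8*1 - 3, odd seed: period 2**(k-2) = 8
print(lcg_period(5, 2**5, 2))   # even seed: period drops to 4
print(lcg_period(7, 2**5, 1))   # multiplier not of form 8i +/- 3: period 4
```

Running it reproduces the three periods worked out by hand in the text.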
Although the maximum period possible from a multiplicative LCG with m = 2^k is only one-fourth
the maximum possible, the resulting period may not be too small for many applications. In such cases,
it may be more efficient to use a multiplicative generator than a mixed generator.
Testing Random-Number Generators
Analysts using a simulation should ensure that the random-number generator used in the model
produces a sufficiently random stream. The very first step in testing any random-number or
random-variate generation algorithm is to plot and look at the histogram and cumulative frequency
distributions.
Chi-Square Test
This is the most commonly used test to determine if an observed data set satisfies a specified

distribution. The test is general and can be used for any distribution. It can be used for testing random
numbers, that is, independently and identically distributed (IID) U(0, 1), as well as for testing
random-variate generators. A histogram of the observed data is prepared, and the observed frequencies are
compared with those obtained from the specified density function. Suppose the histogram consists of k
cells, and oi and e i are the observed and expected frequencies for the ith cell. Then the test
consists of computing
D = sum_{i=1}^{k} (o_i - e_i)^2 / e_i
For an exact fit, D should be zero. However, due to randomness, D will be nonzero.
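A minimal sketch of the statistic for testing U(0, 1) data, assuming k equal-width cells (illustrative Python; the function name and seed are choices made for this example, not from the source):

```python
import random

def chi_square_stat(samples, k):
    """Chi-square statistic D for testing whether samples look U(0, 1):
    bin the samples into k equal-width cells and compare observed
    frequencies o_i with the expected frequency e_i = n/k per cell."""
    observed = [0] * k
    for u in samples:
        observed[min(int(u * k), k - 1)] += 1   # cell index for u in [0, 1)
    expected = len(samples) / k                 # uniform: same e_i in every cell
    return sum((o - expected) ** 2 / expected for o in observed)

rng = random.Random(3)
D = chi_square_stat([rng.random() for _ in range(10_000)], k=10)
print(D)   # for a good U(0, 1) stream, D stays near its expected value k - 1
```

The computed D is then compared against a chi-square critical value with k - 1 degrees of freedom to decide whether to reject uniformity.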
Verification and validation of simulations
The quality of the result obtained from any simulation of a system is fundamentally limited by the
quality of the assumptions made in developing the simulation model, and the correctness of the actual
implementation. Throughout the development of the simulator, you should bear in mind how your
assumptions impact the reasonableness of your simulation model.
Validation
The validation process attempts to ensure that your simulator accurately models the desired system.
That is, validation attempts to determine how close the results of the simulation are to what would be
produced by an actual system. The types of questions that need to be addressed in this process include
the following: is this a good model, are the assumptions reasonable, are the input distributions a good
representation of what would be seen in practice, are the output results reasonable, are the results
explainable? There are three approaches that can be used: comparison with a real system, comparison
with an analytical model, and engineering judgment.
Verification
Verification is the process of determining that your simulator actually implements the desired model. It
is not concerned with whether the model is correct, but, rather, whether the given model is
implemented correctly.

Documentation
Measuring Computer Performance by David J. Lilja
http://catdir.loc.gov/catdir/samples/cam032/99057225.pdf
The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling by Raj Jain
http://www.ensenadamexico.net/hector/mest/Art_Of_Computer_Systems_Performance_Analysis_Techniques_For_Experimental_Measurements_Simulation_And_Modeling-Raj_Jain.pdf
