CPSC 420 -- Lab Exercise
Observing Randomness of Simulation Output
The purpose of this lab exercise is for you to make observations
about simulation output. You will run a C++ version of a simulation
of a single server queuing model and examine the results. A link to
this handout is at http://cs.roanoke.edu/Spring2005/CPSC420A.
You will be
able to get the files you need from there.
Systems one wishes to simulate are usually stochastic; that is they
have some random components. Thus the simulation models of the
systems are also usually stochastic. The randomness in the models
mean that each time you run the simulation (each replication) you
get different answers (just as each time you go to the fast food
restaurant you have to wait a different amount of time to get
served). We think of each run of a simulation as a statistical
experiment. Just as you don't roll a pair of dice once to estimate
the probability of snake eyes (or to try to determine if the dice
are fair as opposed to loaded), you don't run a simulation just
once to come up with an "answer" such as the average length of
a queue. One of the pitfalls of simulation listed on page 93 of
the text is treating the output of a single replication (single
run) as the "true answer." With any luck, after doing this
exercise you will believe that is indeed a pitfall!
A simple MM1 queuing system has an analytic, closed form solution: If
E(A) denotes the mean interarrival time and E(S) is the mean service time,
then the following formulas hold (we'll discuss this in class soon):
utilization: rho = E(S) / E(A)
average queue length: rho^2 / (1 - rho)
average delay in the queue: rho * E(S) / (1 - rho)
Your textbook generally uses a mean interarrival time of 1 minute and
a mean service time of 0.5 minutes in its examples. For those times
we get the following theoretical results:
utilization: rho = 0.5/1 = 0.5 (meaning the server is busy 50% of the time)
average queue length: (0.5)2 / (1 - 0.5) = 0.5
average delay: 0.5 * 0.5 / (1 - 0.5) = 0.5
For a mean interarrival time of 4 and a mean service time of 3 we get
the following theoretical results:
utilization: rho = 3/4 = .75 (meaning the server is busy 75% of the time)
average queue length: (0.75)2 / (1 - 0.75) = 2.25
average delay: 0.75 * 3 / (1 - 0.75) = 9
These numbers represent the "true" averages for the system but for
any given run of a simulation of the system you will not get these
numbers exactly (as for any given day in the system the averages for customers
that day will not be exactly these numbers). The question is how close
are the numbers you get and how do you use the simulation program to
get an accurate estimate of the "true" values.
Using the C++ Program: The version of the C++ program that
simulates the single server queuing system is written to run the
simulation just once (there is another version in the file driverE.cc that
adds a loop and will let you "replicate"
the simulation - that is run it several different times and write the
results to a file).
Different replications of the simulation differ only in the times
that customers come into the system and the amount of time each
needs for service. These times are determined by random numbers. As
you may recall, in a computer "random" numbers are actually generated
by some sort of formula (we'll learn more details later in the course).
The basic idea is to start with an initial value (a "seed"), plug it
into the formula to get the next random value, plug that in to get
the next and so on. This generates what we call a stream of
random numbers. It is very important that you start with a good
seed (to maximize the chance of a stream of numbers that does have
random-appearing properties). Hence, simulation programs don't leave
it up to the user to choose the seed -- "good" seeds are stored somewhere
in the program (in an array in this program) and the user has a choice
of which one to start with (which "stream" -- in this case an index
into the array).
The program initially requests five inputs:
- the mean interarrival time for
customers,
- a stream number for interarrival times,
- the mean service time for customers,
- a stream number for the service times,
- the number of customers to run through the system (the
simulation stops when this many customers have completed their delay in the
system).
Getting Set Up
Get into Linux, make a subdirectory
for this class, then do the following:
-
Copy the tar file into your subdirectory using one of
the following techniques:
- From Mozilla: Go to the course home page at
http://cs.roanoke.edu/Spring2005/CPSC420A,
click on the link to this exercise, then
click on the link mm1tar.tgz
to the tar file. Choose File, Save As, to
save it to your directory. OR
- Use the copy command: cp ~cpsc/public_html/Spring2005/CPSC420A/mm1tar.tgz .
- Uncompress the files using the command
tar xzf mm1tar.tgz
Look in your directory. You should see that you now have an mm1 subdirectory.
Get into this subdirectory and see what's there. It should be the .h and .cc
files plus a Makefile.
- The name of the executable in the Makefile is runmm1 so you
can compile and link the program with the command make runmm1
- To run the program use the command
./runmm1
Do the following:
- Run the program 10 times with a mean interarrival time of 1,
a mean service time of 0.5 and 480 customers. For each run choose
different random number streams. Record your results on the sheets
given out.
- Find the maximum, minimum and average for each performance
measure (average delay and average queue length) from the above data.
- Compare the numbers to the theoretical values computed above. How
do the results from a single simulation (replication) compare? How
about the average over the 10 replications?
- Now run the program 20 more times. Each
time use a mean interarrival time of 1.0 minutes and a mean service time
of 0.5 minutes. For the first 10 runs use 2000 customers and for the second
use 10000 customers.
- For the data generated by the 10 runs with 2000 customers, compute the
maximum, minimum, and average of both performance measures.
- Do the same calculations for your 10 runs with 10000 customers.
- In each case (2000 customers and 10000 customers), how do the
results from individual runs compare to the theoretical answers and
how does the average over all runs (for a given number
of customers) compare to the theoretical
answer?
- Often data is summarized using an average value but that single number
doesn't tell the whole story. For example, I could tell you that the average
test grade in a class of 25 students was 75. That sounds "normal" but
the average could have come about in many different ways, each of which
would give you a totally different idea about the class and/or the test.
For example, maybe all students made between 70 and 80; or
maybe the grades were fairly uniformly spread out between 50 and 100; or
maybe most of the grades were between 65 and 85 but there was one grade
of 20 and a couple of grades in the high 90s; or maybe most of the
grades were between 60 and 70 but there were several 100s. The point is that
the way the data is dispersed about the mean is needed to "understand"
the data. An indication of the "spread" of the data is given by
the standard deviation or the variance in statistics.
Describe the spread of the data for the three different cases above (480
customers vs. 2000 customers vs. 10000 customers). What differences
do you notice?
-
Compute the theoretical utilization, average queue
length, and average delay in the queue for a mean interarrival time
of 3 and a mean service time of 2.
-
Run the program 30 more times for the following input: Mean iat: 3 minutes,
Mean Service Time: 2 minutes. For the first 10 runs use 480 customers,
the second use 2000 and the last use 10000.
- For each of the above compute the maximum, minimum, and average
of both performance measures.
- Write a description of your observations (as you did above -- compare
the results in each case with the theoretical answers and compare the
results for different numbers of customers going through the system).
Hand In: The pages containing output plus your written
analyses.