CPSC 420 -- Lab Exercise

Observing Randomness of Simulation Output

The purpose of this lab exercise is for you to make observations about simulation output. You will run a C++ version of a simulation of a single server queuing model and examine the results. A link to this handout is at http://cs.roanoke.edu/Spring2005/CPSC420A. You will be able to get the files you need from there.

Systems one wishes to simulate are usually stochastic; that is, they have some random components. Thus the simulation models of the systems are also usually stochastic. The randomness in the models means that each time you run the simulation (each replication) you get different answers (just as each time you go to a fast food restaurant you wait a different amount of time to be served). We think of each run of a simulation as a statistical experiment. Just as you don't roll a pair of dice once to estimate the probability of snake eyes (or to decide whether the dice are fair as opposed to loaded), you don't run a simulation just once to come up with an "answer" such as the average length of a queue. One of the pitfalls of simulation listed on page 93 of the text is treating the output of a single replication (single run) as the "true answer." With any luck, after doing this exercise you will believe that is indeed a pitfall!

A simple M/M/1 queuing system has an analytic, closed-form solution: if E(A) denotes the mean interarrival time and E(S) denotes the mean service time, then the following formulas hold (we'll discuss this in class soon):

       utilization:           rho = E(S) / E(A)
       average queue length:  rho^2 / (1 - rho)
       average delay in the queue: rho * E(S) / (1 - rho)
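
If you want to check these hand computations (including the ones you will do in step 9 below), here is a small C++ sketch that simply evaluates the formulas above. The function and variable names are illustrative, not part of the lab's program:

    #include <iostream>

    // Evaluates the theoretical M/M/1 formulas above.
    //   meanIat = E(A), the mean interarrival time
    //   meanSvc = E(S), the mean service time
    // Assumes rho = E(S)/E(A) < 1 (a stable queue).
    void printTheoretical(double meanIat, double meanSvc)
    {
        double rho      = meanSvc / meanIat;             // utilization
        double avgQueue = rho * rho / (1.0 - rho);       // average queue length
        double avgDelay = rho * meanSvc / (1.0 - rho);   // average delay in queue
        std::cout << "utilization:          " << rho      << '\n'
                  << "average queue length: " << avgQueue << '\n'
                  << "average delay:        " << avgDelay << '\n';
    }

    int main()
    {
        printTheoretical(1.0, 0.5);   // should print 0.5, 0.5, 0.5
        printTheoretical(4.0, 3.0);   // should print 0.75, 2.25, 9
        return 0;
    }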

Your textbook generally uses a mean interarrival time of 1 minute and a mean service time of 0.5 minutes in its examples. For those times we get the following theoretical results:

    utilization:   rho = 0.5/1 = 0.5 (meaning the server is busy 50% of the time)
    average queue length:  (0.5)^2 / (1 - 0.5) = 0.5
    average delay:  0.5 * 0.5 / (1 - 0.5) = 0.5

For a mean interarrival time of 4 and a mean service time of 3 we get the following theoretical results:
    utilization:   rho = 3/4 = 0.75 (meaning the server is busy 75% of the time)
    average queue length:  (0.75)^2 / (1 - 0.75) = 2.25
    average delay:  0.75 * 3 / (1 - 0.75) = 9

These numbers represent the "true" averages for the system, but for any given run of a simulation of the system you will not get these numbers exactly (just as on any given day in the real system the averages for that day's customers will not be exactly these numbers). The questions are: how close are the numbers you get, and how do you use the simulation program to get an accurate estimate of the "true" values?

Using the C++ Program: The version of the C++ program that simulates the single server queuing system is written to run the simulation just once (there is another version in the file driverE.cc that adds a loop and will let you "replicate" the simulation -- that is, run it several different times and write the results to a file). Different replications of the simulation differ only in the times at which customers come into the system and the amount of time each needs for service. These times are determined by random numbers.

As you may recall, in a computer "random" numbers are actually generated by some sort of formula (we'll learn more details later in the course). The basic idea is to start with an initial value (a "seed"), plug it into the formula to get the next random value, plug that in to get the one after, and so on. This generates what we call a stream of random numbers. It is very important to start with a good seed, to maximize the chance that the stream has random-appearing properties. Hence, simulation programs don't leave it up to the user to choose the seed -- "good" seeds are stored somewhere in the program (in an array in this program), and the user chooses which one to start with (which "stream" -- in this case, an index into the array).
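
To make the idea concrete, here is a minimal sketch of such a generator in C++. It is purely illustrative: the formula, constants, and seed values below are assumptions for this example (a simple linear congruential generator with a made-up seed table), not the generator or seeds in the lab's program.

    #include <iostream>

    // Illustrative sketch only -- NOT the generator or seeds used in
    // the lab's program.  Idea: start from a stored "good" seed, then
    // repeatedly plug the current value into a formula.

    const unsigned long streamSeeds[] = {     // hypothetical seed table
        12345UL, 987654321UL, 555555555UL, 13579UL
    };

    unsigned long current;          // current state of the generator

    void selectStream(int stream)   // pick which seed ("stream") to start from
    {
        current = streamSeeds[stream];
    }

    double nextUniform()            // next pseudo-random value in [0, 1)
    {
        // linear congruential formula; arithmetic is modulo 2^32
        current = (current * 1664525UL + 1013904223UL) & 0xFFFFFFFFUL;
        return current / 4294967296.0;
    }

    int main()
    {
        selectStream(2);            // same stream ==> same sequence every run
        for (int i = 0; i < 5; ++i)
            std::cout << nextUniform() << '\n';
        return 0;
    }

Because each stream starts from its own seed, choosing a different stream index gives a different (but reproducible) sequence of interarrival and service times.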

The program initially requests five inputs, including the mean interarrival time, the mean service time, the number of customers, and which random number stream to use.

Getting Set Up

Get into Linux, make a subdirectory for this class, and copy the files you need for this exercise from the course page into it.

Do the following:

  1. Run the program 10 times with a mean interarrival time of 1 minute, a mean service time of 0.5 minutes, and 480 customers. For each run choose a different random number stream. Record your results on the sheets given out.

  2. Find the maximum, minimum, and average for each performance measure (average delay and average queue length) from the above data. (A C++ sketch for computing these summary statistics appears after this list.)

  3. Compare the numbers to the theoretical values computed above. How do the results from a single simulation (replication) compare? How about the average over the 10 replications?

  4. Now run the program 20 more times. Each time use a mean interarrival time of 1.0 minutes and a mean service time of 0.5 minutes. For the first 10 runs use 2000 customers and for the second 10 use 10000 customers.

  5. For the data generated by the 10 runs with 2000 customers, compute the maximum, minimum, and average of both performance measures.

  6. Do the same calculations for your 10 runs with 10000 customers.

  7. In each case (2000 customers and 10000 customers), how do the results from individual runs compare to the theoretical answers and how does the average over all runs (for a given number of customers) compare to the theoretical answer?

  8. Often data is summarized using an average value, but that single number doesn't tell the whole story. For example, I could tell you that the average test grade in a class of 25 students was 75. That sounds "normal," but the average could have come about in many different ways, each of which would give you a totally different idea about the class and/or the test. Maybe all students made between 70 and 80; or maybe the grades were fairly uniformly spread out between 50 and 100; or maybe most of the grades were between 65 and 85 but there was one grade of 20 and a couple of grades in the high 90s; or maybe most of the grades were between 60 and 70 but there were several 100s. The point is that to "understand" the data you also need to know how it is dispersed about the mean. In statistics, the standard deviation or the variance gives an indication of this "spread."

    Describe the spread of the data for the three different cases above (480 customers vs. 2000 customers vs. 10000 customers). What differences do you notice?

  9. Compute the theoretical utilization, average queue length, and average delay in the queue for a mean interarrival time of 3 and a mean service time of 2.

  10. Run the program 30 more times with the following input: mean interarrival time: 3 minutes; mean service time: 2 minutes. For the first 10 runs use 480 customers, for the second 10 use 2000, and for the last 10 use 10000.

  11. For each group of 10 runs above, compute the maximum, minimum, and average of both performance measures.

  12. Write a description of your observations (as you did above -- compare the results in each case with the theoretical answers and compare the results for different numbers of customers going through the system).
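
The following sketch shows one way to compute in C++ the summary statistics asked for in steps 2, 5, 6, 8, and 11. It assumes you have typed one performance measure from your runs (for example, the average delay from each of 10 replications) into an array; the names and the sample values are made up for illustration.

    #include <iostream>
    #include <cmath>

    // Minimum, maximum, mean, and standard deviation of the results of
    // n replications.  Uses the sample standard deviation (divide by n-1).
    void summarize(const double results[], int n)
    {
        double minVal = results[0], maxVal = results[0], sum = 0.0;
        for (int i = 0; i < n; ++i) {
            if (results[i] < minVal) minVal = results[i];
            if (results[i] > maxVal) maxVal = results[i];
            sum += results[i];
        }
        double mean = sum / n;

        double sumSqDev = 0.0;        // sum of squared deviations from the mean
        for (int i = 0; i < n; ++i)
            sumSqDev += (results[i] - mean) * (results[i] - mean);
        double stdDev = std::sqrt(sumSqDev / (n - 1));

        std::cout << "min: "     << minVal << "   max: "     << maxVal
                  << "   mean: " << mean   << "   std dev: " << stdDev << '\n';
    }

    int main()
    {
        // Made-up average delays from 10 replications -- replace these
        // with your own recorded results.
        double delays[10] = { 0.43, 0.57, 0.51, 0.62, 0.39,
                              0.48, 0.55, 0.46, 0.58, 0.41 };
        summarize(delays, 10);
        return 0;
    }

The smaller the standard deviation, the more tightly the individual runs cluster around their average -- something to look for as the number of customers per run grows.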
Hand In: The pages containing output plus your written analyses.