CPSC 420 -- Lab Exercise

Observing Randomness of Simulation Output

The purpose of this lab exercise is for you to make observations about simulation output. You will run a C++ version of a simulation of a single server queuing model and examine the results. A link to this handout is at http://cs.roanoke.edu/CPSC420A. You will be able to get the files you need from there.

Systems one wishes to simulate are usually stochastic; that is, they have some random components. Thus the simulation models of the systems are also usually stochastic. The randomness in the models means that each time you run the simulation (each replication) you get different answers (just as each time you go to a fast food restaurant you wait a different amount of time to be served). We think of each run of a simulation as a statistical experiment. Just as you don't roll a pair of dice once to estimate the probability of snake eyes (or to determine whether the dice are fair or loaded), you don't run a simulation just once to come up with an "answer" such as the average length of a queue. One of the pitfalls of simulation listed on page 93 of the text is treating the output of a single replication (single run) as the "true answer." With any luck, after doing this exercise you will believe that is indeed a pitfall!

A simple M/M/1 queuing system has an analytic, closed-form solution. If E(A) denotes the mean interarrival time and E(S) the mean service time, then the following formulas hold, provided E(S) < E(A) so that rho < 1 (we'll discuss this in class soon):

       utilization:           rho = E(S) / E(A)
       average queue length:  rho^2 / (1 - rho)
       average delay in the queue: rho * E(S) / (1 - rho)
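The three formulas translate directly into a small C++ helper, which is handy for checking hand calculations for any pair of means. This is a minimal sketch: `mm1Theory` and `MM1Result` are hypothetical names, not part of the lab's program, and the asserts enforce E(S) < E(A), since the formulas only apply when rho < 1.

```cpp
#include <cassert>

// Theoretical steady-state measures for an M/M/1 queue.
// meanInterarrival = E(A), meanService = E(S), both in the same time unit.
// Hypothetical helper for illustration; not part of the lab's program.
struct MM1Result {
    double utilization;     // rho = E(S) / E(A)
    double avgQueueLength;  // rho^2 / (1 - rho)
    double avgDelay;        // rho * E(S) / (1 - rho), average delay in the queue
};

MM1Result mm1Theory(double meanInterarrival, double meanService) {
    // The formulas only apply to a stable queue, i.e. rho < 1.
    assert(meanInterarrival > 0.0 && meanService > 0.0);
    assert(meanService < meanInterarrival);
    const double rho = meanService / meanInterarrival;
    return MM1Result{rho, rho * rho / (1.0 - rho), rho * meanService / (1.0 - rho)};
}
```

For example, `mm1Theory(1.0, 0.5)` returns 0.5 in all three fields, matching the textbook values worked out below.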

Your textbook generally uses a mean interarrival time of 1 minute and a mean service time of 0.5 minutes in its examples. For those times we get the following theoretical results:

    utilization:   rho = 0.5 / 1 = 0.5 (meaning the server is busy 50% of the time)
    average queue length:  (0.5)^2 / (1 - 0.5) = 0.5
    average delay in the queue:  0.5 * 0.5 / (1 - 0.5) = 0.5

These numbers represent the "true" averages for the system, but for any given run of a simulation of the system you will not get these numbers exactly (just as, on any given day, the averages for that day's customers will not be exactly these numbers). The questions are how close the numbers you get are to these values, and how to use the simulation program to get an accurate estimate of the "true" values.
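To see this run-to-run variation in a few lines of code, here is a minimal sketch that estimates only the average delay in the queue. It uses a swapped-in technique, Lindley's recursion for the delays of successive customers in a first-come, first-served single-server queue, together with `std::mt19937` and `std::exponential_distribution`. It is not the event-driven simulation or the random-number generator used by the lab's program; `replicateAvgDelay`, `replicate`, and the seed values are illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Average delay in queue for ONE replication of an M/M/1 queue with n customers.
// ASSUMPTION: uses Lindley's recursion D(i+1) = max(0, D(i) + S(i) - A(i+1))
// rather than the lab's event-driven program. The first customer finds the
// system empty, so D(1) = 0.
double replicateAvgDelay(double meanInterarrival, double meanService, int n, unsigned seed) {
    assert(n > 0);
    std::mt19937 gen(seed);
    // std::exponential_distribution takes a rate, i.e. 1 / mean.
    std::exponential_distribution<double> interarrival(1.0 / meanInterarrival);
    std::exponential_distribution<double> service(1.0 / meanService);
    double delay = 0.0;       // delay in queue of the current customer
    double totalDelay = 0.0;
    for (int i = 0; i < n; ++i) {
        totalDelay += delay;
        // Next customer's delay: current delay plus current service time,
        // minus the time until the next arrival (never below zero).
        const double s = service(gen);
        const double a = interarrival(gen);
        delay = std::max(0.0, delay + s - a);
    }
    return totalDelay / n;
}

// Independent replications differ only in their seed (arbitrary values here).
std::vector<double> replicate(double meanInterarrival, double meanService, int n, int reps) {
    std::vector<double> results;
    for (int r = 0; r < reps; ++r)
        results.push_back(replicateAvgDelay(meanInterarrival, meanService, n, 1000u + r));
    return results;
}
```

Calling `replicate(1.0, 0.5, 480, 20)` returns 20 different average delays scattered around the theoretical 0.5; no single one is guaranteed to equal it.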

Using the C++ Program: The version of the C++ program that simulates the single server queuing system is written to make it easy for you to analyze data from several replications of the simulation (simulating the system several different times). Different replications of the simulation differ only in the times that customers come into the system and the amount of time each needs for service. These times are determined by random numbers. As you may recall, in a computer "random" numbers are actually generated by some sort of formula (we'll learn more details later in the course). The basic idea is to start with an initial value (a "seed"), plug it into the formula to get the next random value, plug that in to get the next, and so on. This generates what we call a stream of random numbers. It is very important that you start with a good seed (to maximize the chance of getting a stream of numbers that has random-appearing properties). Hence, simulation programs don't leave it up to the user to choose the seed -- "good" seeds are stored somewhere in the program (in an array in this program) and the user chooses which one to start with (which "stream" -- in this case an index into the array).
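The seed-and-stream idea can be sketched as an array of stored seeds, each advanced by the same formula. ASSUMPTIONS: this sketch uses the Park-Miller "minimal standard" multiplicative linear congruential generator (z = 16807 * z mod (2^31 - 1)), which is not necessarily the generator in the lab's program, and the class name `StreamGenerator` and any seeds you pass it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Minimal sketch of seeded random-number streams.
// ASSUMPTION: Park-Miller multiplicative LCG, z(k+1) = 16807 * z(k) mod (2^31 - 1);
// the lab's program may use a different formula and different stored seeds.
class StreamGenerator {
public:
    explicit StreamGenerator(std::vector<std::int64_t> seeds) : state_(std::move(seeds)) {
        for (std::int64_t s : state_) assert(s >= 1 && s < kModulus);  // valid seeds only
    }

    // Next value in (0, 1) from stream `stream` (1-based index into the seed array).
    double next(std::size_t stream) {
        assert(stream >= 1 && stream <= state_.size());
        std::int64_t& z = state_[stream - 1];
        z = (kMultiplier * z) % kModulus;  // plug the previous value into the formula
        return static_cast<double>(z) / kModulus;
    }

private:
    static constexpr std::int64_t kModulus = 2147483647;  // 2^31 - 1, a prime
    static constexpr std::int64_t kMultiplier = 16807;
    std::vector<std::int64_t> state_;  // current value of each stream, starting at its seed
};
```

Starting the same stream from the same seed always reproduces the same sequence, which is why choosing a different stream number for each replication gives different, but repeatable, results.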

The program initially requests five inputs: the name of a file for the output (the data is written to a file so it can be analyzed later), the mean interarrival time for customers, the mean service time for customers, the number of customers to run through the system (the simulation stops when this many customers have completed their delay in the system), and finally the number of times to run the simulation (this is the number of replications of the "experiment"). Then for each replication, the program asks for two stream numbers -- valid stream numbers are in the range 1 - 100, inclusive. The accompanying handout shows a run of the program where the simulation is replicated two times.

The output file contains one row for each replication and four columns of numbers in each row -- the first two numbers are the stream numbers, the third is the mean delay, and the fourth is the mean queue length. See the accompanying handout. The numbers in that handout were produced using a mean interarrival time of 1.0 minutes, a mean service time of 0.5 minutes, and 480 customers completing their delay in the system.
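Several of the tasks below ask for the maximum, minimum, and average of each performance measure, which a short C++ routine can compute from an output file with this four-column layout. A minimal sketch, assuming the values in each row are whitespace-separated; `Summary` and `summarizeOutput` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <limits>
#include <sstream>  // std::istringstream, handy for testing without a file

// Running maximum, minimum, and sum of one performance measure.
struct Summary {
    double minimum = std::numeric_limits<double>::infinity();
    double maximum = -std::numeric_limits<double>::infinity();
    double sum = 0.0;
    int count = 0;

    void add(double x) {
        minimum = std::min(minimum, x);
        maximum = std::max(maximum, x);
        sum += x;
        ++count;
    }
    double average() const { assert(count > 0); return sum / count; }
};

// Reads rows of "stream1 stream2 meanDelay meanQueueLength"
// (ASSUMPTION: whitespace-separated) and summarizes the last two columns.
void summarizeOutput(std::istream& in, Summary& delay, Summary& queueLength) {
    int stream1 = 0, stream2 = 0;
    double d = 0.0, q = 0.0;
    while (in >> stream1 >> stream2 >> d >> q) {
        delay.add(d);
        queueLength.add(q);
    }
}
```

Passing a `std::ifstream` opened on one of your output files (from `<fstream>`) works the same way as the string stream.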

Do the following:

  1. Find the maximum, minimum, and average of each performance measure (average delay and average queue length) for the data in the accompanying handout.
  2. Compare the numbers to the theoretical values computed above. How do the results from a single simulation (replication) compare? How about the average over the 30 replications?
  3. Now you will run the program several times and compare the results to the theoretical values.

    Getting Set Up

    Get into Linux, make a subdirectory for this class, then do the following:
  4. Now run the program twice, generating two separate files. Each time use a mean interarrival time of 1.0 minutes and a mean service time of 0.5 minutes. For the first run use 2000 customers and for the second use 10000 customers. In each case perform 20 replications.
  5. For the data generated by each of the runs above, compute the maximum, minimum, and average of both performance measures.
  6. In general, how do the results of an individual replication compare to the theoretical answer for each of the three cases (480 customers, 2000 customers, 10000 customers)?
  7. How does the average over all runs compare to the theoretical answer?
  8. Often data is summarized using an average value, but that single number doesn't tell the whole story. For example, I could tell you that the average test grade in a class of 25 students was 75. That sounds "normal," but the average could have come about in many different ways, each of which would give you a totally different idea about the class and/or the test. For example, maybe all students made between 70 and 80; or maybe the grades were fairly uniformly spread out between 50 and 100; or maybe most of the grades were between 65 and 85 but there was one grade of 20 and a couple of grades in the high 90s; or maybe most of the grades were between 60 and 70 but there were several 100s. The point is that you need to know how the data is dispersed about the mean to "understand" it. In statistics, the "spread" of data is usually measured by the variance or by its square root, the standard deviation. Describe the spread of the data for the three different cases above (480 customers vs. 2000 customers vs. 10000 customers). What differences do you notice?
  9. Compute the theoretical utilization, average queue length, and average delay in the queue for a mean interarrival time of 3 and a mean service time of 2.
  10. Run the program 3 more times with the following inputs: mean interarrival time 3 minutes, mean service time 2 minutes. In the first case use 480 customers, in the second 2000, and in the last 10000. Run 20 replications each time.
  11. For each of the above runs compute the maximum, minimum, and average of both performance measures.
  12. Write a description of your observations (as you did above -- compare the results in each case with the theoretical answers and compare the results for different numbers of customers going through the system).
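For item 8, the spread of a measure across replications (for example, the 20 average delays from one run) can be quantified by the sample standard deviation. A minimal sketch; `Spread` and `sampleSpread` are hypothetical names, and the n - 1 divisor is one common convention that the lab does not prescribe.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sample mean and sample standard deviation of one performance measure
// across replications. Uses the n - 1 divisor (an assumed convention).
struct Spread {
    double mean;
    double stdDev;
};

Spread sampleSpread(const std::vector<double>& xs) {
    assert(xs.size() >= 2);  // need at least two replications for the n - 1 divisor
    double sum = 0.0;
    for (double x : xs) sum += x;
    const double mean = sum / xs.size();
    // Two passes: compute the mean first, then sum squared deviations from it.
    double sq = 0.0;
    for (double x : xs) sq += (x - mean) * (x - mean);
    return Spread{mean, std::sqrt(sq / (xs.size() - 1))};
}
```

Comparing `stdDev` for the 480-, 2000-, and 10000-customer files gives a single number per case to go with the maximum and minimum when you describe the spread.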
Hand In: The five files containing output plus your written analyses (which can be written on the printouts of the files).