CPSC/MATH 402 - Investigating Floating Point Representation and Arithmetic

Log into the Linux system to do this exercise. This handout and some programs are on-line -- go to http://cs.roanoke.edu/Spring2006/CPSC402A and follow the link.

  1. Recall that in a floating point system F(b,k,m,M) machine precision is b1-k if chopping is used to represent a real number and 1/2 * b1-k if rounding is used. This number is the maximum possible relative error due to roundoff when approximating a real number x by fl(x). The IEEE Standard Floating Point number system, the system used by the lab computers, is F(2, 24, -125, 128) for single precision and F(2, 53, -1021, 1024) for double precision.
    1. Calculate machine precision for each of the following:
      
      Single Precision (Chop)  ______________________________________
      
      Single Precision (Round) ______________________________________
      
      Double Precision (Chop)  ______________________________________
      
      Double Precision (Round) ______________________________________
      
      
    2. Based on your answers how many significant decimal digits (base 10) are there for each of the above?

    3. The file Precision.java is a Java program that determines approximations of machine precision (machine epsilon) for type float and type double using two different techniques. One is by computing the quantity
          | 3 * (4/3 - 1) - 1 |
      
      and the other is the alternate characterization of machine epsilon described on page 39, problem 3 (as the smallest positive number epsilon such that fl(1 + epsilon) > 1).
      Save the program to your directory (you can do this with the File/Save As feature in the browser).

    4. Now compile and run the program. The commands to do this are as follows:
                javac Precision.java      -- compiles to bytecode
                java Precision            -- interprets/runs the bytecode
      

    5. Compare the output of the program to your calculations.

    6. Try the 4/3 approximation above on your TI-89 calculator. What does it give as an approximation of machine precision? What does that say about the number of significant digits stored in your calculator?

    7. Explain why the approximation (using 4/3) works.

  2. The program Increment.java illustrates the hazards of testing for equality of two floating point numbers. Save it to your directory and read the documentation and the code to see what it does.
    1. Compile and run the program with the following input: (a) n = 8; (b) n = 64; (c) n = 1024; (d) n = 10. What happened and why was the behavior different for different input?

    2. Change the loop control to total > epsilon (epsilon is already computed -- it is an approximation to machine epsilon). Rerun the program for the input above plus other values. Does it work correctly?

  3. Smallest Postive IEEE Floating Point Number
    1. In theory what is the smallest positive IEEE floating point number in:
      Single Precision       __________________________________________
      
      Double Precision       __________________________________________
      
    2. The program UFL.java (underflow level) computes the smallest positive power of 2 in both single precision (using type float) and double precision (using type double) by starting at 1 and dividing by 2 until 0 is reached (remember that when underflow is reached the computer treats the number as 0). The program also prints the constants MIN_VALUE in the Double and Float wrapper classes. Save the program, compile, and run it.

    3. How do the results of the program compare to the theoretical results? There is an explanation for the discrepancy - you'll examine it in homework.

  4. Largest Positive IEEE Floating Point Number
    1. In theory, what is the largest positive IEEE floating point number in:
      Single Precision      _____________________________________________________
      
      Double Precision      _____________________________________________________
      
      
    2. The program OFL.java (Overflow Level) just prints out the constants MAX_VALUE in the Double and Float wrapper classes. Save, compile and run the program. Compare the answers to your calculations.

    3. Add a loop to the program after the values currently printed to compute and print double precision powers of 2 from 2^0 to 2^1040 (print both the exponent and the power of 2). Compile and run the program. What happens when overflow occurs?

    4. Did overflow occur at the power of 2 expected? Explain.

  5. The program Poly.java computes values of the polynomial p(x) = (x - 1)^6 in two different ways for values of x near 1 (specifically it starts at x = 0.995 and goes in increments of 0.0001 up to 1.005.
    1. Save the program, compile it and run it. Study the output. Notice the difference in answers for the two different calculations. Also notice the computed value of (x - 1)^6 for x = 1.

    2. The problem at x = 1 is from accumulated roundoff in computing the value of x - each new x is computed from the preceding x so roundoff error builds up. An alternate approach to generating n equally spaced points between numbers a and b is to compute the increment h = (b - a)/n, then the points are x = a + i*h for i = 0, 1, 2, ..., n. Modify the code in Poly.java to compute the values of x in this way. Compile and run the program and compare the results.

    3. The expanded form of the polynomial did not fare so well in the above calculations. In general, a polynomial should not be evaluated in expanded form but not all polynomials have nice factored forms. The recommended method of evaluation (in the absence of a nice factorization) is called Horner's method (we discussed this in problem 14 on page 18). Horner's method uses a nested evaluation. Add code to the program to evaluate the polynomial using nested evaluation (assign to poly3) and print out the result (as a fourth column).

    4. Compare the results of the three methods of calculating the value of the polynomial. NOTE: One way to get a printout of output to examine is to create a logfile (or script - the file will contain everything that appears on your xterm from the time you start the script until you stop it). To start a logfile type:
         script
      or
         script filename
      
      at the command prompt. In the first case a script file named typescript is created; in the second the name of the file is whatever you specify.

      Run your program - the output going to your screen is also going to the logfile. Type exit to close the script.

HOMEWORK DUE Friday, February 3, 2006:

  1. Complete the lab!
  2. Would the approximation of machine epsilon using 4/3 work in a machine that used base 3? Explain your answer.
  3. Do #3 on page 39. That is, explain why the other technique gives machine precision.
  4. The UFL.java program produced smaller values of the underflow level than theory predicts. That is because the IEEE Floating Point System uses a concept called subnormals or gradual underflow. Subnormals are non-normalized numbers - they have the smallest exponent but the mantissa is not normalized. For example in the floating point system F(10, 3, -1, 4), the smallest number in theory is 0.100 x 10^-1. But with subnormals numbers such as 0.010 x 10^-1 (which equals 10^-3), 0.058 x 10^-1, and 0.001 x 10^-1 (= 10^-4) can be represented. Calculate the smallest single precision and smallest double precision positive floating point numbers in IEEE Floating Point System with subnormals. (Show how you are doing this!) Do your results agree with the program's results?
  5. Why did the input 10 cause problems in the Increment.java program but 8, 64, and 1024 worked as expected?
  6. Discuss the behavior for the three different methods for computing (x - 1)^6 for x near 1. Compare the behavior of the methods for x really close to 1 versus a bit further away. That is, for which values of x is each method most accurate versus for which is each least accurate.
  7. Write a program that implements two iterative methods for approximating the square root of a positive number a. For the first use the algorithm on page 15. For the second (make this a separate loop), implement the recursive formula in problem #4 on page 16 (later revisited in problem #11 on page 28). Run your program with different values of a and a convergence parameter nearly as small as machine epsilon. Examine the output and determine whether or not it appears the order of convergence agrees with the theoretical order for each algorithm.
Also hand in a copy of your program that implements the iterative algorithms for approximating the square root of a number and a logfile (script) showing the output.