CPSC/MATH 402 - Investigating Floating Point
Representation and Arithmetic
Log into the Linux system to do this exercise.
This handout and some programs are on-line --
go to http://cs.roanoke.edu/Spring2006/CPSC402A and
follow the link.
- Recall that in a floating point system F(b,k,m,M) machine
precision is b1-k if chopping is used to represent a
real number and 1/2 * b1-k if rounding is used. This number
is the maximum possible relative error due to roundoff when
approximating a real number x by fl(x).
The IEEE Standard Floating Point number system, the system used by
the lab computers, is F(2, 24, -125, 128)
for single precision and F(2, 53, -1021, 1024) for double precision.
- Calculate machine precision for each of the following:
Single Precision (Chop) ______________________________________
Single Precision (Round) ______________________________________
Double Precision (Chop) ______________________________________
Double Precision (Round) ______________________________________
- Based on your answers how many significant decimal digits
(base 10) are there for each of the above?
- The file Precision.java
is a Java program that
determines approximations of machine precision
(machine epsilon) for type float
and type double using two different techniques.
One is by computing the quantity
| 3 * (4/3 - 1) - 1 |
and the other is the alternate characterization of
machine epsilon described on page 39, problem 3 (as the smallest positive
number epsilon such that fl(1 + epsilon) > 1).
Save the program to your directory (you can do this with
the File/Save As feature in the browser).
- Now compile and run the program. The commands to do this
are as follows:
javac Precision.java -- compiles to bytecode
java Precision -- interprets/runs the bytecode
- Compare the output of the program to your calculations.
- Try the 4/3 approximation above on your TI-89 calculator. What
does it give as an approximation of machine precision? What does that
say about the number of significant digits stored in your calculator?
- Explain why the approximation (using 4/3) works.
- The program Increment.java illustrates the
hazards of testing for equality of two floating point numbers. Save
it to your directory and read the documentation and the code
to see what it does.
- Compile and run the program with the following input: (a) n = 8;
(b) n = 64; (c) n = 1024; (d) n = 10. What happened and why was the behavior
different for different input?
- Change the loop control to total > epsilon (epsilon
is already computed -- it is an approximation to machine epsilon).
Rerun the program for the input above plus other values. Does it
work correctly?
- Smallest Postive IEEE Floating Point Number
- In theory what is the smallest positive IEEE floating point
number in:
Single Precision __________________________________________
Double Precision __________________________________________
- The program UFL.java (underflow level)
computes the smallest
positive power of 2 in both single precision (using type float) and
double precision (using type double) by starting at 1 and dividing
by 2 until 0 is reached (remember that when underflow is reached the
computer treats the number as 0). The program also prints the constants
MIN_VALUE in the Double and Float wrapper classes.
Save the program, compile, and run it.
- How do the results of the program compare to the theoretical
results? There is an explanation for the discrepancy - you'll
examine it in homework.
- Largest Positive IEEE Floating Point Number
- In theory, what is the largest positive IEEE floating point number
in:
Single Precision _____________________________________________________
Double Precision _____________________________________________________
- The program OFL.java (Overflow Level)
just prints out the constants MAX_VALUE in the Double and Float
wrapper classes. Save, compile and run the program. Compare the answers
to your calculations.
- Add a loop to the program after the values currently printed to
compute and print double precision powers of 2 from 2^0 to 2^1040
(print both the exponent and the power of 2).
Compile and run the program. What happens when overflow occurs?
- Did overflow occur at the power of 2 expected? Explain.
- The program Poly.java computes values of
the polynomial p(x) = (x - 1)^6 in two different ways for values of x
near 1 (specifically it starts at x = 0.995 and goes in increments of
0.0001 up to 1.005.
-
Save the program, compile it and run it. Study the output.
Notice the difference in answers for the two different
calculations. Also notice the computed value of (x - 1)^6 for x = 1.
- The problem at x = 1 is from accumulated roundoff in computing the
value of x - each new x is computed from the preceding x so roundoff
error builds up. An alternate approach to generating n equally
spaced points between numbers a and b is to compute the increment
h = (b - a)/n, then the points are x = a + i*h for i = 0, 1, 2, ..., n.
Modify the code in Poly.java to compute the values of x in this
way. Compile and run the program and compare the results.
- The expanded form of the polynomial did not fare so well in the
above calculations. In general, a polynomial should not be evaluated
in expanded form but not all polynomials have nice factored forms.
The recommended method of evaluation (in the absence of a nice
factorization) is called Horner's
method (we discussed this in problem 14 on page 18). Horner's method
uses a nested evaluation.
Add code to the program to evaluate the polynomial using
nested evaluation (assign
to poly3) and print out the result (as a fourth column).
- Compare the results of the three methods of calculating the
value of the polynomial. NOTE: One way to get a printout of
output to examine is to create a logfile (or script - the file will
contain everything that appears on your xterm from the time
you start the script until you stop it). To start
a logfile type:
script
or
script filename
at the command prompt. In the first case a script file named
typescript is created; in the second the name of the file
is whatever you specify.
Run your program - the output going to your screen is also going to
the logfile. Type exit to close the script.
HOMEWORK DUE Friday, February 3, 2006:
- Complete the lab!
- Would the approximation of machine epsilon using 4/3 work in a
machine that used base 3? Explain your answer.
- Do #3 on page 39. That is, explain why the other technique
gives machine precision.
- The UFL.java program produced smaller values of the underflow
level than theory predicts. That is because the IEEE Floating Point
System uses a concept called subnormals or gradual underflow.
Subnormals are non-normalized numbers - they have the smallest
exponent but the mantissa is not normalized. For example in the
floating point system F(10, 3, -1, 4), the smallest number in theory
is 0.100 x 10^-1. But with subnormals numbers such as 0.010 x 10^-1
(which equals 10^-3), 0.058 x 10^-1, and 0.001 x 10^-1 (= 10^-4)
can be represented.
Calculate the smallest single precision and smallest double
precision positive floating point numbers in IEEE Floating Point
System with subnormals. (Show how you are doing this!)
Do your results agree with the program's results?
- Why did the input 10 cause problems in the Increment.java
program but 8, 64, and 1024 worked as expected?
- Discuss the behavior for the three
different methods for computing (x - 1)^6 for x near 1.
Compare the behavior of the
methods for x really close to 1 versus a bit further away. That is,
for which values of x is each method most accurate versus for which
is each least accurate.
- Write a program that implements two iterative methods for
approximating the square root of a positive number a. For the
first use the algorithm on page 15. For the second (make this
a separate loop), implement the recursive formula in problem
#4 on page 16 (later revisited in problem #11 on page 28).
Run your program with different values of a and a convergence
parameter nearly as small as machine epsilon. Examine the output
and determine whether or not it appears the order of convergence
agrees with the theoretical order for each algorithm.
Also hand in a copy of your program that implements the
iterative algorithms for approximating the square root of a number
and a logfile (script) showing the output.