CPSC/MATH 402 - Computer Exercises
Log into the Linux system to do this exercise.
This handout and a program are on-line --
go to http://cs.roanoke.edu/~cpsc/CPSC402A and
follow the link.
Investigating Floating Point Representation and Arithmetic
Investigate the floating point representation on the
Linux system by doing the following:
- The program
floatRep.cc is a C++ program that
determines approximations of unit roundoff
(machine epsilon) for type float
and type double using two different techniques.
One is the technique described in Computer Problem 1.3 on page
44 of the text; the other is the alternate characterization of
machine epsilon described on page 20 (as the smallest positive
number epsilon such that fl(1 + epsilon) > 1). The program
FloatRep.java is a Java version of
the same thing.
Do the following:
- Save the programs to your directory (you can do this with
the File/Save As feature in the browser or copy and paste).
- Compile and run the C++ version. In case you forgot,
the commands are:
g++ -o epsilon floatRep.cc
to compile and put the executable in a file named epsilon.
Just type ./epsilon to run (you need the ./ to refer to
your current directory).
Alternately, use the command
g++ floatRep.cc
to compile then ./a.out to run.
- Study the output. Compare the different answers for the different
approximation techniques. How do they compare to theory (for IEEE
floating point representation)? NOTE: If you want a printed copy of
the output, redirect to a file then print the file. The
following commands do that if the name of the executable is epsilon
and the name of the file is eps.out:
./epsilon > eps.out
nenscript -2rG -p- eps.out | lpr -h
- Now compile and run the Java version. The commands to do this
are as follows:
javac FloatRep.java -- compiles to bytecode
java FloatRep -- interprets/runs the bytecode
- Compare these results to those of the C++ program.
- Add to one of the programs above (you choose) code to determine
the smallest power of two that is stored
in both double and single precision (UFL). Print both the decimal
form of the power and the exponent.
- Now add code to find the overflow level. (See Computer Problem
1.2 on page 44)
- The program tenth.cc illustrates the
hazards of testing for equality of two floating point numbers. Save
it to your directory and read the documentation and the code
to see what it does.
- Compile and run the program with the following input: (a) n = 8;
(b) n = 4096; (c) n = 10. What happened and why was the behavior
different for different input?
- Change the loop control to total > epsilon (epsilon
is already computed -- it is the approximation to machine epsilon).
Rerun the program for the input above plus other values. Does it
work correctly?
- Do computer problem #1.6 on page 45.
- The program poly.cc
is essentially Computer Problem 1.14
on page 47 (it just prints values rather than plotting them).
Save it to your directory.
- Read
the problem and the program then compile and run the program. Study
the output. Notice the difference in answers for the two different
calculations. Also notice the computed value of (x - 1)^6 for x = 1.
- The later problem is related to what you were supposed to understand
from doing #1.6. Modify the program using what you learned from #1.6
so the answer for x = 1 is correct.
- The expanded form of the polynomial did not fare so well in the
above calculations. In general, a polynomial should not be evaluated
in expanded form but not all polynomials have nice factored forms.
The recommended method of evaluation (in the absence of a nice
factorization) is called Horner's
method (which your book tells you about on page 316). Horner's method
uses a nested evaluation. See the general formula on page 316. For
this particular polynomial, Horner's method is
1 + x(-6 + x(15 + x(-20 + x(15+ x(-6 + x))))
Add code to the program to evaluate the polynomial this way (assign
to poly3) and print out the result (as a fourth column).
HOMEWORK DUE Thursday, January 31, 2002:
- Answer questions (a) and (c) in Computer Problem 1.3, page 44.
- Determine from the results of your calculation of UFL
whether or not gradual underflow is used on this
system? Give reasons. (HINT: IEEE Double precision
floating point representation is used, so you should
be able to compute theoretically what the smallest number would
be without subnormals and what the smallest
positive number would be without them.)
- What did you get for OFL?
- Why did the input 10 cause problems in the tenth.cc
program but 8 and 4096 worked as expected?
- Answer question (a) in Computer Problem 1.6.
- What did you use as an example in Computer Problem 1.6(b) to
illustrate the difference between the two methods? How different
were the results (compute the largest absolute error and the largest
relative error you got)?
- In Computer Problem #1.14, discuss the behavior for the three
different methods for computing (x - 1)^6 for x near 1. Explain
the behavior of the three methods. Compare the behavior of the
methods for x really close to 1 versus a bit further away. What
is the condition number of the function and does it help understand
what is going on?
Also hand in a copy of your program that computed UFL and OFL.