Fixed Point
In decimal, real numbers are represented, not always exactly, by numerals that contain a decimal point. For example, the number 1/4 is a real number that can be represented exactly by 0.25. The number π, however, can only be approximated with a finite number of digits, such as 3.14159. The position of each digit relative to the decimal point determines how much that digit contributes to the number. For example, in the number 12.34 the 1 represents ten because it is two places to the left of the decimal point, and the 3 represents three tenths because it is immediately to the right of the decimal point. Therefore, the decimal number 12.34 can be rewritten in the following way.
$$12.34 = 1 \cdot 10 + 2 \cdot 1 + 3 \cdot \frac{1}{10} + 4 \cdot \frac{1}{100}$$
Note that the multipliers in the above equation can be rewritten using exponents as follows:
$$12.34 = 1 \times 10^{1} + 2 \times 10^{0} + 3 \times 10^{-1} + 4 \times 10^{-2}$$
In binary, real numbers are represented by numerals that contain a binary point. For example, the binary number 10.01 is a real number. Just as with decimal real numbers, binary real numbers can be rewritten as a sum in which each digit is multiplied by a power of 2. For example, the binary number 10.01 can be rewritten in the following way.
$$10.01 = 1 \times 2^{1} + 0 \times 2^{0} + 0 \times 2^{-1} + 1 \times 2^{-2}$$
All of the numbers on the right side of the above equation are decimal numbers. Therefore, the decimal value of the binary number 10.01 can be computed by evaluating the right side of the above equation.
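$$10.01 = 1 \times 2^{1} + 0 \times 2^{0} + 0 \times 2^{-1} + 1 \times 2^{-2}$$
$$10.01 = 1 \times 2^{1} + 0 \times 2^{0} + 0 \times \frac{1}{2^{1}} + 1 \times \frac{1}{2^{2}}$$
$$10.01 = 1 \times 2 + 0 \times 1 + 0 \times \frac{1}{2} + 1 \times \frac{1}{4}$$
$$10.01 = 2 + 0 + 0 + \frac{1}{4}$$
$$10.01 = 2.25$$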
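The evaluation above can be sketched in a few lines of Python. This is only an illustration: the function name `fixed_point_binary_to_fraction` is hypothetical, and exact fractions are used so that no rounding hides the arithmetic.

```python
from fractions import Fraction


def fixed_point_binary_to_fraction(text: str) -> Fraction:
    """Evaluate a binary numeral such as '10.01' as a sum of powers of 2."""
    if set(text) - set("01."):
        raise ValueError("expected only the characters 0, 1 and '.'")
    integer_part, _, fraction_part = text.partition(".")
    total = Fraction(0)
    # Digits left of the binary point have exponents 0, 1, 2, ... (right to left).
    for position, digit in enumerate(reversed(integer_part)):
        total += int(digit) * Fraction(2) ** position
    # Digits right of the binary point have exponents -1, -2, ... (left to right).
    for position, digit in enumerate(fraction_part, start=1):
        total += int(digit) * Fraction(2) ** -position
    return total


value = fixed_point_binary_to_fraction("10.01")
print(value, float(value))  # 9/4 2.25
```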
A practical consequence of using a decimal point to represent real numbers is that not all real numbers can be represented exactly. The number 1/3, for example, cannot be represented with a finite number of decimal places because only numbers that are a finite sum of powers of 10 can be represented exactly. The number 1/3 must be approximated with a number such as 0.33333333333. Binary numbers can only represent numbers that are a finite sum of powers of 2. As a consequence, there are real numbers that can be represented exactly in decimal that must be approximated in binary, such as the decimal number 0.1. The decimal number 0.1 can be approximated by the binary number 0.00011001100110011, which is equivalent to the decimal number 0.0999984741.
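As a rough check of the 0.1 example, the 17 fractional bits quoted above can be summed exactly in Python. The variable names are illustrative only.

```python
from fractions import Fraction

# The fractional bits of the binary approximation 0.00011001100110011.
bits = "00011001100110011"

# Bit i (counting from 1) contributes bit * 2**-i.
approx = sum(Fraction(int(b), 2 ** i) for i, b in enumerate(bits, start=1))

print(float(approx))             # 0.09999847412109375
print(Fraction(1, 10) - approx)  # 1/655360, the error left by stopping at 17 bits

# Python's own float for 0.1 is also only the nearest binary fraction.
print(Fraction(0.1) == Fraction(1, 10))  # False
```

Adding more bits shrinks the error but never removes it, because 1/10 is not a finite sum of powers of 2.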
Floating Point
Representing very large or very small numbers using decimal points can be tedious. Scientific notation uses powers of 10 to simplify these numbers. For example, 1200000000000 is written as $1.2 \times 10^{12}$ in scientific notation. The same approach can be used with binary, where it is called floating point representation because the binary point floats to different locations depending on the exponent. The fixed point binary number 1000000000000000.0 is equivalent to the floating point number $1.0 \times 10^{1111}$, in which every digit, including the base 10 and the exponent 1111, is binary. The binary number $1.0 \times 10^{1111}$ can be converted to decimal in two different ways. The first is to convert each of the binary numbers to decimal and then evaluate the expression. For example:
$$1.0 \times 10^{1111} = 1.0 \times 10^{1111}$$
$$1.0 \times 10^{1111} = 1.0 \times 2^{15}$$
$$1.0 \times 10^{1111} = 1.0 \times 32768$$
$$1.0 \times 10^{1111} = 32768.0$$
The second way to convert a binary floating point number to decimal is to shift the binary point by the number of places specified by the exponent and then convert the resulting fixed point binary number to decimal. For example:
$$1.0 \times 10^{1111} = 1.0 \times 10^{1111}$$
$$1.0 \times 10^{1111} = 1000000000000000.0$$
$$1.0 \times 10^{1111} = 32768.0$$
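Both conversion methods can be sketched in Python as follows. The helper `shift_binary_point` is a hypothetical name introduced for illustration, and it assumes a non-negative exponent.

```python
def shift_binary_point(significand: str, exponent: int) -> str:
    """Move the binary point `exponent` places to the right (exponent >= 0)."""
    integer_part, _, fraction_part = significand.partition(".")
    # Pad with zeros so there are enough digits to shift across.
    fraction_part = fraction_part.ljust(exponent, "0")
    remaining = fraction_part[exponent:] or "0"
    return integer_part + fraction_part[:exponent] + "." + remaining


# The exponent 1111 is written in binary, so convert it first.
exponent = int("1111", 2)
print(exponent)  # 15

# Method 1: convert every part to decimal, then evaluate.
method_1 = 1.0 * 2 ** exponent
print(method_1)  # 32768.0

# Method 2: shift the binary point, then convert the fixed point result.
shifted = shift_binary_point("1.0", exponent)
print(shifted)  # 1000000000000000.0
whole, _, frac = shifted.partition(".")
method_2 = int(whole, 2) + int(frac, 2) / 2 ** len(frac)
print(method_2)  # 32768.0
```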
Computers often use floating point numbers to represent real numbers because they can represent a larger range of values with the same number of bits. For example, the largest number that can be represented with a 64-bit fixed point number is approximately $9.2 \times 10^{18}$ (for a signed value with no fractional bits). The largest number that can be represented with a 64-bit floating point number is approximately $1.8 \times 10^{308}$.
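The two ranges can be compared directly in Python. This sketch assumes the fixed point format is a signed 64-bit integer with no fractional bits, and that Python's `float` is a 64-bit floating point number, as it is on common platforms.

```python
import sys

# Assumption: signed 64-bit fixed point with the binary point at the far right.
largest_fixed = 2 ** 63 - 1
# Largest finite value of the platform's float type.
largest_float = sys.float_info.max

print(largest_fixed)  # 9223372036854775807
print(largest_float)  # 1.7976931348623157e+308
```

Moving the binary point of the fixed point format to the left to gain fractional bits would reduce its largest value further.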
Floating point numbers cannot, however, represent more distinct values than fixed point numbers with the same number of bits. This is because there are $2^{64}$ different patterns of 64 bits, and each pattern represents at most one number, in floating point just as in fixed point. As a result, floating point numbers have a limited number of significant digits. For example, a 64-bit floating point number with an 11-bit exponent has 53 bits to represent the significant digits. A 53-bit number can only represent about $9.0 \times 10^{15}$ different values, which means the 64-bit floating point number has between 15 and 16 significant decimal digits. This means that a number such as 12345678901234567.0, which has 17 significant digits, cannot be represented exactly with a 64-bit floating point number even though it is well below the maximum value of roughly $10^{308}$.
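The limit on significant digits can be observed in Python, again assuming that `float` is a 64-bit floating point number with 53 bits of precision.

```python
import sys

print(sys.float_info.mant_dig)  # 53 bits of precision
print(2 ** 53)                  # 9007199254740992, about 9.0 x 10^15

# Above 2**53, neighbouring integers collapse onto the same float.
print(float(2 ** 53) == float(2 ** 53 + 1))  # True

# A 17-digit integer is rounded to the nearest representable value.
print(int(float(12345678901234567)))  # 12345678901234568
```

The last line shows the practical effect: the value is stored without any error being raised, but its final digit has changed.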