University:
Whitman College
Course:
MATH-467 | Numerical Analysis
Academic year:

2022
Views:

394

Pages:

2
Author:

Nelson Lane

IEEE Floating Point Format 1. Definition: A floating point number consists of three parts: (i) The sign (+/-), (ii) The Mantissa (string of significant digits [in base 2, bits]); (iii) The exponent. A normalized floating point number is a number of the form: ±1.bbbbbb...b × 2p where the number of b’s will vary depending on the precision, and p is the exponent (from the floating pt number definition). 2. Levels of precision for floating point numbers (the integers below represent how many bits will be used for the representation): sign exponent mantissa single 1 8 23 1 11 52 double long double 1 15 64 Important Note: Unless otherwise stated, everything we do with floating point numbers will be done in double precision. 3. Definition: Machine epsilon, mach is the distance between the number 1 and the next smallest number greater than 1. In double precision (which is what Matlab will use), mach = 2−52 ≈ 2.2 × 10−16 4. IEEE rounding (to nearest neighbor): The bit of interest is in the 53d position (recall we’re using double precision). • If the bit is 0, round down (truncate after 52d bit) • If the bit is 1, check the bits to the right: – If they are not all zero (the usual choice), round up. – If they are all zero, round up iff bit 52 is 1. 5. Definition: f l(x) is the floating point representation of x. 6. Example: Compute f l(1/3) and find the error in this approximation. First, we see that 1 3 = 0.01 10 2 To get this base 2 number into (normalized) floating point form, 1.010101 . . . 10 |{z} 1 010101 . . . × 2−2 52d bit so that the error (or tail) is 2−52 · 0.01 × 2−2 = 1 3 · 2−54 . That leaves: f l(1/3) = 1.010101 . . . 101 × 2−2 With an error, in terms of mach : 1 3 · 2−54 = 1 12 mach

Description

Related Documents

Free up your schedule!

Take 5 seconds to unlock