Floating Point Numbers - PDF Free Download

Floating Point Numbers Summer 8 Fractional numbers Fractional numbers fixed point Floating point numbers the IEEE 7 floating point standard Floating point operations Rounding modes CMPE Summer 8 Slides by ADB

Positional representation of fractional numbers In base - - - - Multiplier Number - - - - Position Decimal point CMPE Summer 8 Slides by ADB Positional representation of fractional numbers In base - - - - Multiplier Number - - - - Position Binary point CMPE Summer 8 Slides by ADB

Fractional numbers fixed point Fixed-point representation How much information is necessary to store? How do you choose a format for the bits? Fixed-point operations Addition Align binary points, and add straight down Multiplication??? CMPE Summer 8 Slides by ADB Decimal to binary conversion Convert A =. to base CMPE Summer 8 Slides by ADB

Fixed-point number density CMPE Summer 8 Slides by ADB 7 Scientific notation In base Example:. 8 In base Example:. (= 8. ) The general form r = Sign significand base Exponent CMPE Summer 8 Slides by ADB 8

Single-precision IEEE 7 floating-point numbers 9 8 7 9 8 7 9 8 7 e x p o n e n t Sign significand One-bit sign Eight-bit exponent -bit significand That s the fractional part CMPE Summer 8 Slides by ADB 9 Single-precision IEEE 7 floating-point numbers Normalized numbers: only one non-zero bit to the left of the binary point Adjust the exponent as needed r = ( ) S F E Implicit leading in the significand (the hidden bit ) r = ( ) S ( + F) E Bias notation to represent the exponent With the bias B = 7 r = ( ) S ( + F) E-B CMPE Summer 8 Slides by ADB

How to convert a base- number into IEEE 7 single-precision floating point Convert the number to binary The big part And the fractional part Normalize Isolate the hidden one Remove the significand s hidden one Add bias to the exponent Represent the number CMPE Summer 8 Slides by ADB Example:. 9 8 7 9 8 7 9 8 7 Sign Convert to binary Normalize Remove hidden one Add bias to exponent The end significand CMPE Summer 8 Slides by ADB

Double-precision IEEE 7 floating-point numbers 9 8 7 9 8 7 9 8 7 e x p o n e n t Sign significand 9 8 7 9 8 7 9 8 7 significand, continued One-bit sign Eleven-bit exponent -bit significand That s the fractional part CMPE Summer 8 Slides by ADB Summary of IEEE 7 formats Precision Parameter Single Single Ext. Double Double Ext. Bits for F Bits for E 8 E max +7 + + +8 E min 8 Total bits 79 For every precision, there are reserved exponents, used for special quantities: E min (i.e., E=) is used for zero and denorms E max + (i.e., E= or E=7, with bias) is used for NaN and infinity CMPE Summer 8 Slides by ADB 7

Special quantities: Infinity This special quantity avoids halt on overflow Much safer than returning the largest possible number Representation: E = E max + E = in single precision with bias E = 7 in double precision with bias F = Sign (+ or ) 9 8 7 9 8 7 9 8 7 Sign significand CMPE Summer 8 Slides by ADB Special quantities: Infinity Examples of operations that return ±Inf Operation / / Inf Sqrt (+Inf) / Inf Result CMPE Summer 8 Slides by ADB 8

Special quantities: NaN (Not a Number) This special quantity avoids halt on invalid operations Representation: E = E max + E = in single precision with bias E = 7 in double precision with bias F 9 8 7 9 8 7 9 8 7 Sign significand CMPE Summer 8 Slides by ADB 7 Special quantities: NaN (Not a Number) Examples of operations that return NaN Operation Result +/ / /+ / / Inf log(+) log( ) CMPE Summer 8 Slides by ADB 8 9

Special quantities: Zero Representation: E = E min (i.e., E = ) F = Sign: + or 9 8 7 9 8 7 9 8 7 Sign significand CMPE Summer 8 Slides by ADB 9 Special quantities: Zero Examples of operations that involve ± Operation + / Sqrt NaN produced by CMPE Summer 8 Slides by ADB

Floating point numbers range What is the largest number we can represent in IEEE 7 single-precision floating point? What is the smallest number? 9 8 7 9 8 7 9 8 7 Sign significand CMPE Summer 8 Slides by ADB Floating point numbers range What is the largest number we can represent in IEEE 7 double-precision floating point? What is the smallest number? CMPE Summer 8 Slides by ADB

Floating point numbers: density Fact : Floats are not reals E.g., / Fact : Floats are not decimals E.g.,. (base ) =. (base ) Fact : Not even all the integers in the range are represented E.g.,,, (base ) = (base ) CMPE Summer 8 Slides by ADB Floating point numbers: density Close to : high density Far from : low density CMPE Summer 8 Slides by ADB

Special quantities: Denormals These are numbers smaller than ^(E min ) Fill the gap between ^(E min ) and (gradual underflow) Representation: E = E min (i.e., E = ) F The number represented is.f - f - f - ^(E min ) 9 8 7 9 8 7 9 8 7 Sign significand CMPE Summer 8 Slides by ADB Special quantities: Denormals From to ^(E min ) CMPE Summer 8 Slides by ADB

Summary of IEEE 7 numbers Single precision Exponent Significand Double precision Exponent Significand Object anything anything 7 7 CMPE Summer 8 Slides by ADB 7 Floating point operations: addition Addition: C = A + B Step : Align the significands Are the exponents different? Shift the smaller number to the right and adjust the exponent Step : Add the significands Don t forget the sign Step : Normalize the sum Check for overflow or underflow CMPE Summer 8 Slides by ADB 8

Floating point operations: addition A =. (base ) B =. (base ) Step : Align the significands Step : Add the significands Step : Normalize the sum CMPE Summer 8 Slides by ADB 9 Floating point operations: multiplication C = A B Step : Add the exponents Step : Multiply the significands Without the sign Step : Normalize the product Step : Set the sign of the product Both positive? Result is positive Both negative? Result is positive Different signs? Result is negative CMPE Summer 8 Slides by ADB

Floating point operations: multiplication A =. (base ) B =. (base ) Step : Add the exponents Step : Multiply the significands Step : Normalize the product Step : Set the sign of the product CMPE Summer 8 Slides by ADB Rounding What is rounding, and when do we need to round? Numbers that we can represent, and numbers that we can not represent: What is fair? CMPE Summer 8 Slides by ADB

Rounding modes IEEE 7 defines four rounding modes Towards +Inf Towards Inf Towards To nearest CMPE Summer 8 Slides by ADB Rounding modes We need some extra bits In IEEE-7, the internal representation uses at least more bits in the significand Guard Round Sticky CMPE Summer 8 Slides by ADB 7

Rounding modes CMPE Summer 8 Slides by ADB Recommended exercises Ex. A: Represent the binary number A =. in IEEE 7 single-precision and double-precision. Ex. B: Give the IEEE 7 single-precision representation of the number (thirty-four million and one). Give its rounded representation using all four rounding modes, and for each one, give the decimal value of the number actually represented. CMPE Summer 8 Slides by ADB 8