1 Floating-Point Data Representation and Manipulation. 198:231 Introduction to Computer Organization. Instructor: Nicole Hynes
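2 Fixed Point Numbers. A fixed point number has an integer part and a fractional part, with a fixed number of digits to the left and right of the radix point. In decimal, number the digit positions 0, 1, 2, ... going left from the radix point and -1, -2, ... going right; the digit in position $i$ is weighted by $10^{i}$. Similarly in binary, the bit in position $i$ is weighted by $2^{i}$. What about base $b$?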
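3 Converting Decimal Fraction to Binary Fraction. Algorithm: repeatedly multiply the fraction by 2; at each step record the integer part (0 or 1) and carry the fractional part into the next step. Stop when the fractional part is 0. Read off the integer parts in order: they are the bits to the right of the binary point.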
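The multiply-by-2 procedure can be sketched in a few lines of Python. The function name `fraction_to_binary`, the `max_bits` cutoff, and the value 0.8125 are illustrative choices, not taken from the slides; exact rational arithmetic is used so that no floating point error creeps into the illustration.

```python
from fractions import Fraction

def fraction_to_binary(value, max_bits=32):
    """Convert a decimal fraction 0 <= value < 1 (given as a string) to binary."""
    frac = Fraction(value)
    bits = []
    # Stop when the fractional part is 0 (or after max_bits bits, as a safety cutoff).
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        int_part = int(frac)        # integer part of the doubled value: 0 or 1
        bits.append(str(int_part))  # read off int parts in order
        frac -= int_part            # keep only the fractional part
    return "0." + "".join(bits) if bits else "0.0"

print(fraction_to_binary("0.8125"))  # 0.1101
```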
4 Decimal Fraction to Binary Fraction. Converting from decimal to binary may result in a nonterminating fraction: once a fractional part reappears during the multiply-by-2 steps, the bits produced from that point on form a repeating sequence. The result may need to be rounded to the desired number of fractional places.
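One way to see the repetition is to remember every fractional part produced so far and stop as soon as one recurs. The sketch below does this for 0.1, a value chosen here for illustration (the slide's own example did not survive extraction); `binary_expansion` is a hypothetical helper name.

```python
from fractions import Fraction

def binary_expansion(value):
    """Return (non-repeating bits, repeating bits) of a fraction 0 <= value < 1."""
    frac = Fraction(value)
    seen = {}   # fractional part -> index of the bit it produced
    bits = []
    while frac != 0 and frac not in seen:
        seen[frac] = len(bits)
        frac *= 2
        bit = int(frac)
        bits.append(str(bit))
        frac -= bit
    if frac == 0:                      # terminating fraction
        return "".join(bits), ""
    start = seen[frac]                 # the cycle begins where this value first appeared
    return "".join(bits[:start]), "".join(bits[start:])

prefix, cycle = binary_expansion("0.1")
print(f"0.{prefix}({cycle})")  # 0.0(0011), i.e. 0.000110011...
```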
5 Rounding. Because computers represent numbers using a fixed number of bits, both the range and the precision of the numbers that can be represented are limited. Precision is usually associated with the number of fractional bits allowed by the computer representation. If a number has more fractional bits than the representation allows, it must be rounded to the required precision. Example: how should a number with three or more fractional bits be rounded to 2 fractional bits? To the nearest representable value below it, or to the one above it?
6 Rounding: Rounding Modes. Let $a$ be the number and $\bar{a}$ its rounded value.
1. Round-toward-zero: round $a$ to the nearest number $\bar{a}$ of the desired precision such that $|\bar{a}| \le |a|$. Also called truncation because it simply drops the excess fractional bits.
2. Round-down: round $a$ to the nearest number $\bar{a}$ of the desired precision such that $\bar{a} \le a$. Also called round-toward-negative-infinity.
3. Round-up: round $a$ to the nearest number $\bar{a}$ of the desired precision such that $\bar{a} \ge a$. Also called round-toward-positive-infinity.
4. Round-to-even: round $a$ to the number $\bar{a}$ of the desired precision such that $|a - \bar{a}|$ is minimized. If there is a tie, choose the $\bar{a}$ whose least significant digit/bit is even. Also called round-to-nearest. This is the default mode in the IEEE floating point format, discussed next.
7 Rounding: Rounding Examples. Assume the precision is 2 fractional bits. Table columns: Number | Rounded value under round-toward-0 | round-down | round-up | round-to-even.
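The four modes can be tried out with a small Python sketch. The sample values (2.625, which is 10.101 in binary, and its negative) are assumptions made for illustration, not the rows of the original table; `round_fixed` is a hypothetical helper. The sketch relies on Python's built-in `round` resolving ties to even when applied to a `Fraction`.

```python
import math
from fractions import Fraction

def round_fixed(a, frac_bits, mode):
    """Round a (decimal string) to frac_bits fractional bits under the given mode."""
    scale = 2 ** frac_bits
    x = Fraction(a) * scale          # a measured in units of the last kept bit
    if mode == "toward_zero":
        n = math.trunc(x)            # drop the excess bits
    elif mode == "down":
        n = math.floor(x)            # toward negative infinity
    elif mode == "up":
        n = math.ceil(x)             # toward positive infinity
    elif mode == "to_even":
        n = round(x)                 # nearest; ties go to the even integer
    else:
        raise ValueError(f"unknown mode: {mode}")
    return Fraction(n, scale)

for value in ("2.625", "-2.625"):
    for mode in ("toward_zero", "down", "up", "to_even"):
        print(value, mode, float(round_fixed(value, 2, mode)))
```

With 2 fractional bits, 2.625 lies exactly halfway between 2.5 (10.10) and 2.75 (10.11), so round-to-even picks 2.5, whose last kept bit is 0.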
8 Fixed Point Arithmetic. Adapt the integer arithmetic algorithms; they are illustrated here for unsigned fixed point only. Addition and subtraction: the same as integer addition/subtraction once the radix (binary) points of the two operands are aligned.
9 Fixed Point Arithmetic: Multiplication.
1. Ignore the radix points; multiply as integers.
2. Insert the radix point in the product: number of fractional places = sum of the numbers of fractional places of the two operands.
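A minimal Python sketch of the two multiplication steps, operating on binary strings such as "10.1". The helper names `split_fixed` and `fixed_mul` and the operand values are illustrative assumptions.

```python
def split_fixed(s):
    """Return (integer formed by ignoring the radix point, number of fractional places)."""
    int_part, _, frac_part = s.partition(".")
    return int(int_part + frac_part, 2), len(frac_part)

def fixed_mul(a_bits, b_bits):
    """Multiply two unsigned binary fixed point strings."""
    a_int, a_places = split_fixed(a_bits)
    b_int, b_places = split_fixed(b_bits)
    product = a_int * b_int                 # step 1: multiply as integers
    places = a_places + b_places            # step 2: sum of fractional places
    digits = format(product, "b").zfill(places + 1)
    if places == 0:
        return digits
    return digits[:-places] + "." + digits[-places:]

print(fixed_mul("10.1", "1.01"))  # 11.001  (2.5 x 1.25 = 3.125)
```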
10 Fixed Point Arithmetic: Division.
1. Shift the radix point of the divisor right until the divisor is a whole integer.
2. Shift the radix point of the dividend right by the same number of positions.
3. Divide as in integer division.
4. The radix point of the quotient is in the same position as that of the (shifted) dividend.
The quotient may have a nonterminating fractional part; round it to the desired number of fractional places.
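The division steps can be sketched as follows. This is an assumption-laden illustration: the helper names and operands are made up, both operands are scaled to whole integers (a slight generalization of steps 1 and 2, which only require the divisor to be whole), and the quotient is truncated (round-toward-zero) to `frac_bits` fractional places.

```python
def split_fixed(s):
    """Return (integer formed by ignoring the radix point, number of fractional places)."""
    int_part, _, frac_part = s.partition(".")
    return int(int_part + frac_part, 2), len(frac_part)

def fixed_div(dividend, divisor, frac_bits):
    """Divide unsigned binary fixed point strings, truncating to frac_bits places."""
    n, n_places = split_fixed(dividend)
    d, d_places = split_fixed(divisor)
    # Scale so both operands are whole integers, then append frac_bits extra
    # zero bits to the dividend so integer division yields the fractional bits.
    numerator = n << (d_places + frac_bits)
    denominator = d << n_places
    q = numerator // denominator             # integer division
    digits = format(q, "b").zfill(frac_bits + 1)
    if frac_bits == 0:
        return digits
    return digits[:-frac_bits] + "." + digits[-frac_bits:]

print(fixed_div("11.01", "0.1", 2))  # 110.10  (3.25 / 0.5 = 6.5)
print(fixed_div("1", "11", 4))       # 0.0101  (1/3, nonterminating, truncated)
```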
11 Floating Point Numbers. Fixed point numbers can also be written in scientific notation, also referred to as floating point format: $\text{significand} \times \text{base}^{\text{exponent}}$, with base 10 in decimal and base 2 in binary. The significand (a.k.a. mantissa) is normalized: exactly one nonzero digit/bit to the left of the decimal/binary point. This allows a more compact representation of real numbers than fixed point format.
12 Floating Point Representation. Most computers support the IEEE 754 standard for encoding floating point numbers: single precision (32 bits), C type float; double precision (64 bits), C type double. Intel x86 processors also support an extended precision format (80 bits).
13 IEEE Single Precision FP Format. Normalized binary FP number: $\pm 1.\text{fraction} \times 2^{\text{exponent}}$, where 1.fraction is the significand. Single precision FP format (32 bits): s | b_exp | frac.
Field | # Bits | Value | Remarks
s | 1 | 0 if number is positive; 1 if negative |
b_exp | 8 | exponent + bias, where bias = $2^{8-1} - 1 = 2^{7} - 1 = 127$ | called the biased exponent
frac | 23 | fractional part of significand | the 1 to the left of the binary point is not stored (hidden bit)
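14 IEEE Single Precision FP Format. Problem: find the single precision FP representation of a given decimal number. Solution:
1. Convert the number to binary.
2. Normalize the binary FP number.
3. Map to single precision FP format: s = 1 (the number is negative); frac = the bits to the right of the binary point of the normalized significand (pad with zeros on the right to make 23 bits); b_exp = exponent + 127 = 5 + 127 = 132 = $10000100_2$.
Answer: the 32-bit pattern s | b_exp | frac.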
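The mapping can be checked with Python's standard `struct` module, which packs a value into IEEE single precision. The value -40.5 is an assumption chosen for illustration (the slide's own number was lost); it equals $-1.010001_2 \times 2^{5}$, so its biased exponent is also 132. `float32_fields` is a hypothetical helper name.

```python
import struct

def float32_fields(x):
    """Return (s, b_exp, frac) of x encoded in IEEE single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # 32-bit pattern as an integer
    s = word >> 31                 # 1 sign bit
    b_exp = (word >> 23) & 0xFF    # 8 biased exponent bits
    frac = word & 0x7FFFFF         # 23 fraction bits (hidden bit not stored)
    return s, b_exp, frac

s, b_exp, frac = float32_fields(-40.5)
print(s, format(b_exp, "08b"), format(frac, "023b"))
# 1 10000100 01000100000000000000000
```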
15 IEEE Single Precision FP Format. FP numbers that can be represented in IEEE single precision format:
1. Normalized values: numbers of the form $\pm 1.\text{fraction} \times 2^{\text{exponent}}$ with $-126 \le \text{exponent} \le 127$, i.e. $1 \le \text{b\_exp} \le 254$.
Most positive/negative number = $\pm 1.11\ldots1_2 \times 2^{127} = \pm (2 - 2^{-23}) \times 2^{127}$.
Least positive/negative number = $\pm 1.0 \times 2^{-126}$.
Observations on b_exp: it is always positive; 0 (all zeros) and 255 (all 1s) are not used, because these bit patterns are used to represent special values.
16 IEEE Single Precision FP Format.
2. Denormalized values:
a. b_exp = 0 and frac = 0 represents the value ±0.0. Note: there are two representations of zero, one for each value of s.
b. b_exp = 0 and frac ≠ 0 represents a binary number of the form $\pm 0.\text{fraction} \times 2^{-126}$.
Notes: the significand is < 1 (the bit to the left of the binary point is 0); the exponent of the binary number must be -126 (= 1 - bias); this allows representation of numbers smaller in magnitude than the least positive/negative normalized number, $\pm 1.0 \times 2^{-126}$.
17 IEEE Single Precision FP Format.
3. Special values:
a. b_exp = all 1s and frac = 0 represents the value ±∞. Typically used to represent results that overflow.
b. b_exp = all 1s and frac ≠ 0 represents NaN ("Not a Number"). Typically used to represent results that can't be represented as a real number (e.g., $\sqrt{-1}$).
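The three categories (normalized, denormalized, special) can be summarized as a decoder from a 32-bit pattern to its meaning. This is a sketch; the function name `decode_float32` and the return convention (a kind label plus an exact `Fraction` value or a sign) are assumptions made for illustration.

```python
from fractions import Fraction

BIAS = 127

def decode_float32(word):
    """Classify a 32-bit single precision pattern and return (kind, value)."""
    s = word >> 31
    b_exp = (word >> 23) & 0xFF
    frac = word & 0x7FFFFF
    sign = -1 if s else 1
    if b_exp == 0xFF:                        # all 1s: special values
        return ("infinity", sign) if frac == 0 else ("nan", None)
    if b_exp == 0:                           # all 0s: zero or denormalized
        if frac == 0:
            return ("zero", sign)
        value = sign * Fraction(frac, 2**23) * Fraction(2) ** (1 - BIAS)
        return ("denormalized", value)       # 0.fraction x 2^-126
    value = sign * (1 + Fraction(frac, 2**23)) * Fraction(2) ** (b_exp - BIAS)
    return ("normalized", value)             # 1.fraction x 2^(b_exp - 127)

print(decode_float32(0xC2220000))  # ('normalized', Fraction(-81, 2)), i.e. -40.5
print(decode_float32(0x7F800000))  # ('infinity', 1)
```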
18 Why Use a Biased Representation? The IEEE single precision FP format can be generalized to any number of exponent and fraction bits: $\pm 1.\text{fraction} \times 2^{\text{exponent}}$ is stored as s (1 bit) | b_exp ($k$ bits) | frac ($n$ bits). For a $k$-bit biased exponent field:
bias = $2^{k-1} - 1$ and b_exp = exponent + bias.
The exponent of a normalized FP number is limited to $[-(2^{k-1} - 2),\ (2^{k-1} - 1)]$.
As a result, $1 \le \text{b\_exp} \le 2^{k} - 2$.
As before, b_exp = all 0s and all 1s are used to represent denormalized values and special values.
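As a quick check of these formulas, the sketch below evaluates them for a given exponent width $k$; `exponent_parameters` is a hypothetical helper name.

```python
def exponent_parameters(k):
    """Bias and exponent ranges for a k-bit biased exponent field."""
    bias = 2 ** (k - 1) - 1
    return {
        "bias": bias,
        "min_exponent": -(2 ** (k - 1) - 2),  # equals 1 - bias
        "max_exponent": 2 ** (k - 1) - 1,     # equals bias
        "min_b_exp": 1,
        "max_b_exp": 2 ** k - 2,
    }

for k in (8, 11, 15):   # single, double, x86 extended
    print(k, exponent_parameters(k))
```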
19 Why Use a Biased Representation? By biasing the exponent, i.e. adding $(2^{k-1} - 1)$ to the true exponent, the resulting biased exponent is always nonnegative and hence can be treated as an unsigned integer. Comparing unsigned integers is easy: compare bitwise starting from the left (msb) and stop at the first bit position where the numbers differ; the number with a 1 bit there is larger. Two numbers in IEEE FP format with the same sign can be compared using the same algorithm (for two negative numbers, the larger bit pattern is the one with the larger magnitude).
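The comparison property can be observed directly by reinterpreting single precision bit patterns as unsigned integers. The sketch below covers non-negative values only; the helper names and sample values are illustrative assumptions.

```python
import struct

def float32_bits(x):
    """Bit pattern of x in IEEE single precision, as an unsigned integer."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word

def less_than_nonnegative(x, y):
    """Compare two non-negative values using only their bit patterns."""
    return float32_bits(x) < float32_bits(y)

print(format(float32_bits(1.5), "032b"))
print(format(float32_bits(2.0), "032b"))
print(less_than_nonnegative(1.5, 2.0))  # True
```

Because b_exp sits to the left of frac, a larger exponent always wins before the fraction bits are even examined, which is exactly the order in which magnitudes should be compared.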
20 IEEE Double Precision FP Format. Normalized binary FP number: $\pm 1.\text{fraction} \times 2^{\text{exponent}}$, where 1.fraction is the significand. Double precision FP format (64 bits): s | b_exp | frac.
Field | # Bits | Value | Remarks
s | 1 | 0 if number is positive; 1 if negative |
b_exp | 11 | exponent + bias, where bias = $2^{11-1} - 1 = 2^{10} - 1 = 1023$ | called the biased exponent
frac | 52 | fractional part of significand | the 1 to the left of the binary point is not stored (hidden bit)
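The same field extraction works for double precision with the wider field sizes. As before, the value -40.5 ($-1.010001_2 \times 2^{5}$) and the helper name `float64_fields` are assumptions made for illustration.

```python
import struct

def float64_fields(x):
    """Return (s, b_exp, frac) of x encoded in IEEE double precision."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))  # 64-bit pattern as an integer
    s = word >> 63                      # 1 sign bit
    b_exp = (word >> 52) & 0x7FF        # 11 biased exponent bits
    frac = word & ((1 << 52) - 1)       # 52 fraction bits
    return s, b_exp, frac

s, b_exp, frac = float64_fields(-40.5)
print(s, b_exp, b_exp - 1023)  # 1 1028 5
```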
21 x86 Extended Precision. Normalized binary FP number: $\pm 1.\text{fraction} \times 2^{\text{exponent}}$, where 1.fraction is the significand. Extended precision FP format (80 bits): s | b_exp | 1 frac.
Field | # Bits | Value | Remarks
s | 1 | 0 if number is positive; 1 if negative |
b_exp | 15 | exponent + bias, where bias = $2^{15-1} - 1 = 2^{14} - 1 = 16{,}383$ | called the biased exponent
frac | 64 | entire significand 1.fraction | no hidden bit!
22 Floating Point Arithmetic: Addition and Subtraction.
1. Make the exponents equal.
2. Add/subtract the significands.
3. Normalize the result.
Why? Let $A = a \times 2^{e_1}$ and $B = b \times 2^{e_2}$, and suppose $e_1 < e_2$. Then $A$ can be rewritten as $A = a \times 2^{e_2} \times 2^{-(e_2 - e_1)}$. Therefore $A + B = \left( (a \times 2^{-(e_2 - e_1)}) + b \right) \times 2^{e_2}$. In other words, shift $a$ right by $(e_2 - e_1)$ places, then add it to $b$.
23 Floating Point Arithmetic: Addition Example. Add two numbers given in IEEE single precision format (fields s, b_exp, frac). Don't forget the hidden bit! To simplify the illustration, the hidden bit is shown explicitly as the leading bit of the significand.
24 Floating Point Arithmetic: Addition Example, Cont. 1. Make the exponents equal. To leave the value unchanged, shifting the significand left by 1 bit requires decreasing the exponent by 1, and shifting the significand right by 1 bit requires increasing the exponent by 1. Increase the smaller exponent to equal the larger exponent. Why? The significand is then shifted right, losing only its least significant bits. Therefore, increase the exponent of the operand with the smaller exponent, shifting its significand right by the difference of the two exponents, $1000_2 = 8_{10}$ places.
25 Floating Point Arithmetic: Addition Example, Cont. Shift the significand of the operand with the smaller exponent right by 8 places, one place at a time (original value, then shifted right by 1, 2, ..., 8 places). Note that the hidden bit is shifted into the msb of the fraction.
26 Floating Point Arithmetic: Addition Example, Cont. 2. Add the significands. 3. Normalize the result (in this example the sum is already normalized; hide the hidden bit again when storing the result).
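The three addition steps can be sketched with integer significands that carry the hidden bit explicitly, as in the slides. This is a simplified illustration under stated assumptions: both operands are positive and normalized, bits shifted out are simply dropped (round-toward-zero rather than the IEEE default), and the operand values and the name `fp_add` are made up.

```python
FRAC_BITS = 23   # fraction bits in single precision

def fp_add(sig_a, exp_a, sig_b, exp_b):
    """Add two positive values sig * 2^(exp - 23), with sig in [2^23, 2^24)."""
    # Step 1: make exponents equal by raising the smaller one.
    if exp_a > exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_a >>= exp_b - exp_a            # shift right, losing only low-order bits
    # Step 2: add significands.
    total = sig_a + sig_b
    exp = exp_b
    # Step 3: normalize if the sum reached 2.0 or more.
    if total >= 1 << (FRAC_BITS + 1):
        total >>= 1
        exp += 1
    return total, exp

one_point_five = 3 << 22               # 1.1 in binary, hidden bit shown
one = 1 << 23                          # 1.0 in binary
sig, exp = fp_add(one_point_five, 3, one, 0)   # 12 + 1
print(sig / 2**FRAC_BITS, exp)         # 1.625 3, i.e. 1.625 x 2^3 = 13
```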
27 Floating Point Arithmetic: Multiplication.
1. Add the exponents.
2. Multiply the significands.
3. Normalize the result.
Why? Let $A = a \times 2^{e_1}$ and $B = b \times 2^{e_2}$. Then $A \times B = (a \times b) \times 2^{e_1 + e_2}$.
28 Floating Point Arithmetic: Multiplication Example. Multiply two numbers given in IEEE single precision format (fields s, b_exp, frac). As before, the hidden bit is shown explicitly as the leading bit of the significand.
29 Floating Point Arithmetic: Multiplication Example, Cont. 1. Add the true exponents. Note that the stored exponents are biased: $\text{b\_exp}_1 = \text{true\_exponent}_1 + 127$, so $\text{true\_exponent}_1 = \text{b\_exp}_1 - 127$; likewise $\text{true\_exponent}_2 = \text{b\_exp}_2 - 127$. Now $\text{true\_exponent}_{result} = \text{true\_exponent}_1 + \text{true\_exponent}_2$. Therefore $\text{b\_exp}_{result} = \text{true\_exponent}_{result} + 127 = (\text{b\_exp}_1 + \text{b\_exp}_2) - 127$.
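30 Floating Point Arithmetic: Multiplication Example, Cont. 2. Multiply the significands: $\text{significand}_{result} = \text{significand}_1 \times \text{significand}_2$. $\text{sign}_{result} = 1$. Why? The sign bit of a product is the XOR of the operands' sign bits. $\text{b\_exp}_{result}$ is taken from the previous slide. 3. Normalize the result: shift $\text{significand}_{result}$ right by 1 bit (hide the hidden bit in IEEE format!) and increase $\text{b\_exp}_{result}$ by 1.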
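The multiplication steps, including the bias correction, can be sketched on (s, b_exp, significand) triples. Assumptions in this illustration: significands are 24-bit integers with the hidden bit shown, low-order product bits are truncated rather than rounded, overflow and denormals are ignored, and the operands (-3 and 6) and the name `fp_mul` are made up.

```python
BIAS = 127
FRAC_BITS = 23

def fp_mul(a, b):
    """Multiply two normalized values given as (s, b_exp, sig) triples."""
    s_a, e_a, sig_a = a
    s_b, e_b, sig_b = b
    s = s_a ^ s_b                        # signs differ -> negative product
    b_exp = e_a + e_b - BIAS             # step 1: add exponents, remove one bias
    product = sig_a * sig_b              # step 2: 46 fraction bits, value in [1, 4)
    sig = product >> FRAC_BITS           # keep 23 fraction bits (truncating)
    if sig >= 1 << (FRAC_BITS + 1):      # step 3: product in [2, 4), shift right
        sig >>= 1
        b_exp += 1
    return s, b_exp, sig

minus_three = (1, 128, 3 << 22)          # -1.1 (binary) x 2^1
six = (0, 129, 3 << 22)                  #  1.1 (binary) x 2^2
s, b_exp, sig = fp_mul(minus_three, six)
print(s, b_exp - BIAS, sig / 2**FRAC_BITS)   # 1 4 1.125, i.e. -1.125 x 2^4 = -18
```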
31 Floating Point Arithmetic: Division.
1. Subtract the exponents.
2. Divide the significands.
3. Normalize the result.
Why? Let $A = a \times 2^{e_1}$ and $B = b \times 2^{e_2}$. Then $A / B = (a / b) \times 2^{e_1 - e_2}$.
32 Floating Point Arithmetic: Division Example. Divide two numbers given in IEEE single precision format (fields s, b_exp, frac). As before, the hidden bit is shown explicitly as the leading bit of the significand.
33 Floating Point Arithmetic: Division Example, Cont. 1. Subtract the true exponents. Note that the stored exponents are biased: $\text{b\_exp}_1 = \text{true\_exponent}_1 + 127$, so $\text{true\_exponent}_1 = \text{b\_exp}_1 - 127$; likewise $\text{true\_exponent}_2 = \text{b\_exp}_2 - 127$. Now $\text{true\_exponent}_{result} = \text{true\_exponent}_1 - \text{true\_exponent}_2$. Therefore $\text{b\_exp}_{result} = \text{true\_exponent}_{result} + 127 = (\text{b\_exp}_1 - \text{b\_exp}_2) + 127$.
34 Floating Point Arithmetic: Division Example, Cont. 2. Divide the significands: $\text{significand}_{result} = \text{significand}_1 / \text{significand}_2$. $\text{sign}_{result} = 0$. $\text{b\_exp}_{result}$ is taken from the previous slide. 3. Normalize the result: shift $\text{significand}_{result}$ left by 1 bit to obtain 1.01 (hide the hidden bit in IEEE format!) and decrease $\text{b\_exp}_{result}$ by 1.
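Division follows the same pattern as multiplication. The sketch below uses the same (s, b_exp, significand) triples and rests on similar assumptions: truncation instead of rounding, no handling of zero, overflow, or denormals, and made-up operands (18 and 3) and a hypothetical function name `fp_div`.

```python
BIAS = 127
FRAC_BITS = 23

def fp_div(a, b):
    """Divide two normalized values given as (s, b_exp, sig) triples."""
    s_a, e_a, sig_a = a
    s_b, e_b, sig_b = b
    s = s_a ^ s_b
    b_exp = e_a - e_b + BIAS                   # step 1: subtract exponents, restore bias
    quotient = (sig_a << FRAC_BITS) // sig_b   # step 2: value in (0.5, 2)
    if quotient < 1 << FRAC_BITS:              # step 3: quotient below 1.0, shift left
        quotient = (sig_a << (FRAC_BITS + 1)) // sig_b   # redo with one more bit
        b_exp -= 1
    return s, b_exp, quotient

eighteen = (0, 131, 9 << 20)                   # 1.001 (binary) x 2^4
three = (0, 128, 3 << 22)                      # 1.1 (binary) x 2^1
s, b_exp, sig = fp_div(eighteen, three)
print(s, b_exp - BIAS, sig / 2**FRAC_BITS)     # 0 2 1.5, i.e. 1.5 x 2^2 = 6
```

Here the significand quotient is 1.125 / 1.5 = 0.75, which is below 1, so it is shifted left by one bit to 1.5 and the biased exponent is decreased by 1.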
