Floating Point COE 308. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Similar documents
Floating Point Arithmetic

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction

9 Floating-Point. 9.1 Objectives. 9.2 Floating-Point Number Representation. Value = ± (1.F) 2 2 E Bias

Arithmetic. Chapter 3 Computer Organization and Design

CSCI 402: Computer Architectures. Arithmetic for Computers (4) Fengguang Song Department of Computer & Information Science IUPUI.

Computer Architecture and IC Design Lab. Chapter 3 Part 2 Arithmetic for Computers Floating Point

Written Homework 3. Floating-Point Example (1/2)

3.5 Floating Point: Overview

Chapter 3 Arithmetic for Computers (Part 2)

COMPUTER ORGANIZATION AND DESIGN

Arithmetic for Computers. Hwansoo Han

Floating Point Arithmetic

Floating Point Arithmetic. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Floating Point Arithmetic

Chapter 3. Arithmetic for Computers

Chapter 3. Arithmetic for Computers

Chapter 3. Arithmetic Text: P&H rev

Chapter 3. Arithmetic for Computers

Chapter 3 Arithmetic for Computers

Lecture 13: (Integer Multiplication and Division) FLOATING POINT NUMBERS

Thomas Polzer Institut für Technische Informatik

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Signed Multiplication Multiply the positives Negate result if signs of operand are different

Floating-Point Arithmetic

5DV118 Computer Organization and Architecture Umeå University Department of Computing Science Stephen J. Hegner. Topic 3: Arithmetic

CS222: Processor Design

Slide Set 11. for ENCM 369 Winter 2015 Lecture Section 01. Steve Norman, PhD, PEng

Floating Point Numbers

Review: MIPS Organization

Floating-Point Arithmetic

TDT4255 Computer Design. Lecture 4. Magnus Jahre

CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W4-W

Floating Point Numbers. Lecture 9 CAP

Computer Architecture. Chapter 3: Arithmetic for Computers

xx.yyyy Lecture #11 Floating Point II Summary (single precision): Precision and Accuracy Fractional Powers of 2 Representation of Fractions

CO Computer Architecture and Programming Languages CAPL. Lecture 15

Computer Architecture Chapter 3. Fall 2005 Department of Computer Science Kent State University

MIPS Integer ALU Requirements

Floating-point representations

Floating-point representations

Instruction Set Architecture of. MIPS Processor. MIPS Processor. MIPS Registers (continued) MIPS Registers

CS61C : Machine Structures

CS61C Floating Point Operations & Multiply/Divide. Lecture 9. February 17, 1999 Dave Patterson (http.cs.berkeley.edu/~patterson)

Review 1/2 Big Idea: Instructions determine meaning of data; nothing inherent inside the data Characters: ASCII takes one byte

IEEE Standard 754 for Binary Floating-Point Arithmetic.

Integer Multiplication and Division

Divide: Paper & Pencil

Floating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science)

Lecture 10: Floating Point, Digital Design

The MIPS R2000 Instruction Set

Precision and Accuracy

Chapter 3. Arithmetic for Computers

UC Berkeley CS61C : Machine Structures

COMP2611: Computer Organization. Data Representation

95% of the folks out there are completely clueless about floating-point.

UCB CS61C : Machine Structures

Chapter 2 Float Point Arithmetic. Real Numbers in Decimal Notation. Real Numbers in Decimal Notation

ECE232: Hardware Organization and Design

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754

Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Lecture 3

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. !

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

Chapter Three. Arithmetic

Module 2: Computer Arithmetic

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

UC Berkeley CS61C : Machine Structures

Floating-point Arithmetic. where you sum up the integer to the left of the decimal point and the fraction to the right.

The Sign consists of a single bit. If this bit is '1', then the number is negative. If this bit is '0', then the number is positive.

Arithmetic for Computers. Integer Addition. Chapter 3

Format. 10 multiple choice 8 points each. 1 short answer 20 points. Same basic principals as the midterm

Integer Subtraction. Chapter 3. Overflow conditions. Arithmetic for Computers. Signed Addition. Integer Addition. Arithmetic for Computers

Computer Architecture Set Four. Arithmetic

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers

Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition. Carnegie Mellon

REGISTERS INSTRUCTION SET DIRECTIVES SYSCALLS

Floating Point January 24, 2008

Data Representation Floating Point

EE260: Logic Design, Spring n Integer multiplication. n Booth s algorithm. n Integer division. n Restoring, non-restoring

Floating Point. CSE 238/2038/2138: Systems Programming. Instructor: Fatma CORUT ERGİN. Slides adapted from Bryant & O Hallaron s slides

Foundations of Computer Systems

IEEE Standard 754 for Binary Floating-Point Arithmetic.

Boolean Algebra. Chapter 3. Boolean Algebra. Chapter 3 Arithmetic for Computers 1. Fundamental Boolean Operations. Arithmetic for Computers

September, 2003 Saeid Nooshabadi

Floating Point Numbers

Floating Point Numbers

Numeric Encodings Prof. James L. Frankel Harvard University

CO212 Lecture 10: Arithmetic & Logical Unit

Data Representation Floating Point

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers

Systems I. Floating Point. Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties

Floating Point : Introduction to Computer Systems 4 th Lecture, May 25, Instructor: Brian Railing. Carnegie Mellon

CO Computer Architecture and Programming Languages CAPL. Lecture 13 & 14

Computer Arithmetic Floating Point

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers

VTU NOTES QUESTION PAPERS NEWS RESULTS FORUMS Arithmetic (a) The four possible cases Carry (b) Truth table x y

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754

CS61C : Machine Structures

CS429: Computer Organization and Architecture

IEEE Standard for Floating-Point Arithmetic: 754

Transcription:

Floating Point COE 38 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Floating-Point Numbers IEEE 754 Floating-Point Standard Floating-Point Addition and Subtraction Floating-Point Multiplication Extra Bits and Rounding MIPS Floating-Point Instructions Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 2

The World is Not Just Integers Programming languages support numbers with fraction Called floating-point numbers Examples: 3.459265 (π) 2.7828 (e). or. 9 (seconds in a nanosecond) 86,4,,, or 8.64 3 (nanoseconds in a day) last number is a large integer that cannot fit in a 32-bit integer We use a scientific notation to represent Very small numbers (e.g.. 9 ) Very large numbers (e.g. 8.64 3 ) Scientific notation: ± d. f f 2 f 3 f 4 ± e e 2 e 3 Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 3 Floating-Point Numbers Examples of floating-point numbers in base 5.34 3,.534 5, 2.3, 2.3 3 Examples of floating-point numbers in base 2. 2 23,. 2 25,. 2 3,. 2 6 Exponents are kept in decimal for clarity binary point decimal point The binary number (.) 2 = 2 3 +2 2 +2 +2 +2 3 = 3.625 Floating-point numbers should be normalized Exactly one non-zero digit should appear before the point In a decimal number, this digit can be from to 9 In a binary number, this digit should be Normalized FP Numbers: 5.34 3 and. 2 3 NOT Normalized:.534 5 and. 2 6 Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 4 2

Floating-Point Representation A floating-point number is represented by the triple S is the Sign bit ( is positive and is negative) Representation is called sign and magnitude E is the Exponent field (signed) Very large numbers have large positive exponents Very small close-to-zero numbers have negative exponents More bits in exponent field increases range of values F is the Fraction field (fraction after binary point) More bits in fraction field improves the precision of FP numbers S Exponent Fraction Value of a floating-point number = (-) S val(f) 2 val(e) Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 5 Next... Floating-Point Numbers IEEE 754 Floating-Point Standard Floating-Point Addition and Subtraction Floating-Point Multiplication Extra Bits and Rounding MIPS Floating-Point Instructions Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 6 3

IEEE 754 Floating-Point Standard Found in virtually every computer invented since 98 Simplified porting of floating-point numbers Unified the development of floating-point algorithms Increased the accuracy of floating-point numbers Single Precision Floating Point Numbers (32 bits) -bit sign + 8-bit exponent + 23-bit fraction S Exponent 8 Fraction 23 Double Precision Floating Point Numbers (64 bits) -bit sign + -bit exponent + 52-bit fraction S Exponent Fraction 52 (continued) Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 7 Normalized Floating Point Numbers For a normalized floating point number (S, E, F) S E F = f f 2 f 3 f 4 Significand is equal to (.F) 2 = (.f f 2 f 3 f 4 ) 2 IEEE 754 assumes hidden. (not stored) for normalized numbers Significand is bit longer than fraction Value of a Normalized Floating Point Number is ( ) S (.F) 2 2 val(e) ( ) S (.f f 2 f 3 f 4 ) 2 2 val(e) ( ) S ( + f 2 - + f 2 2-2 + f 3 2-3 + f 4 2-4 ) 2 2 val(e) ( ) S is when S is (positive), and when S is (negative) Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 8 4

Biased Exponent Representation How to represent a signed exponent? Choices are Sign + magnitude representation for the exponent Two s complement representation Biased representation IEEE 754 uses biased representation for the exponent Value of exponent = val(e) = E Bias(Bias is a constant) Recall that exponent field is 8 bits for single precision E can be in the range to 255 E = and E = 255 are reserved for special use (discussed later) E = to 254 are used for normalized floating point numbers Bias = 27 (half of 254), val(e) = E 27 val(e=) = 26, val(e=27) =, val(e=254) = 27 Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 9 Biased Exponent Cont d For double precision, exponent field is bits E can be in the range to 247 E = and E = 247 are reserved for special use E = to 246 are used for normalized floating point numbers Bias = 23 (half of 246), val(e) = E 23 val(e=) = 22, val(e=23) =, val(e=246) = 23 Value of a Normalized Floating Point Number is ( ) S (.F) 2 2 E Bias ( ) S (.f f 2 f 3 f 4 ) 2 2 E Bias ( ) S ( + f 2 - + f 2 2-2 + f 3 2-3 + f 4 2-4 ) 2 2 E Bias Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 5

Examples of Single Precision Float What is the decimal value of this Single Precision float? Solution: Sign = is negative Exponent = () 2 = 24, E bias= 24 27 = 3 Significand = (. ) 2 = + 2-2 =.25 (. is implicit) Value in decimal =.25 2 3 =.5625 What is the decimal value of? Solution: implicit Value in decimal = +(. ) 2 2 3 27 = (. ) 2 2 3 = (. ) 2 =.375 Floating Point COE 38 Computer Architecture Muhamed Mudawar slide Examples of Double Precision Float What is the decimal value of this Double Precision float? Solution: Value of exponent = () 2 Bias = 29 23 = 6 Value of double float = (. ) 2 2 6 (. is implicit) = (. ) 2 = 74.5 What is the decimal value of? Do it yourself! (answer should be.5 2 7 =.7875) Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 2 6

Converting FP Decimal to Binary Convert.825 to binary in single and double precision Solution: Fraction bits can be obtained using multiplication by 2.825 2 =.625.625 2 =.25.825 = (.).25 2 =.5 2 = ½ + ¼ + /6 = 3/6.5 2 =. Stop when fractional part is Fraction = (.) 2 = (.) 2 2 (Normalized) Exponent = + Bias = 26 (single precision) and 22 (double) Single Precision Double Precision Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 3 Largest Normalized Float What is the Largest normalized float? Solution for Single Precision: Exponent bias = 254 27 = 27 (largest exponent for SP) Significand = (. ) 2 = almost 2 Value in decimal 2 2 27 2 28 3.428 38 Solution for Double Precision: Value in decimal 2 2 23 2 24.79769 38 Overflow: exponent is too large to fit in the exponent field Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 4 7

Smallest Normalized Float What is the smallest (in absolute value) normalized float? Solution for Single Precision: Exponent bias = 27 = 26 (smallest exponent for SP) Significand = (. ) 2 = Value in decimal = 2 26 =.7549 38 Solution for Double Precision: Value in decimal = 2 22 = 2.2257 38 Underflow: exponent is too small to fit in exponent field Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 5 Zero, Infinity, and NaN Zero Exponent field E = and fraction F = + and are possible according to sign bit S Infinity Infinity is a special value represented with maximum E and F = For single precision with 8-bit exponent: maximum E = 255 For double precision with -bit exponent: maximum E = 247 Infinity can result from overflow or division by zero + and are possible according to sign bit S NaN (Not a Number) NaN is a special value represented with maximum E and F Result from exceptional situations, such as / or sqrt(negative) Operation on a NaN results is NaN: Op(X, NaN) = NaN Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 6 8

Denormalized Numbers IEEE standard uses denormalized numbers to Fill the gap between and the smallest normalized float Provide gradual underflow to zero Denormalized: exponent field E is and fraction F Implicit. before the fraction now becomes. (not normalized) Value of denormalized number ( S,, F ) Single precision: ( ) S (.F) 2 2 26 Double precision: ( ) S (.F) 2 2 22 Negative Overflow Negative Underflow Positive Underflow Positive Overflow - Normalized ( ve) Denorm Denorm Normalized (+ve) + -2 28-2 26 2 26 2 28 Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 7 Floating-Point Comparison IEEE 754 floating point numbers are ordered Because exponent uses a biased representation Exponent value and its binary representation have same ordering Placing exponent before the fraction field orders the magnitude Larger exponent larger magnitude For equal exponents, Larger fraction larger magnitude < (.F) 2 2 E min < (.F)2 2 E Bias < (E min = Bias) Because sign bit is most significant quick test of signed < Integer comparator can compare magnitudes X = (E X, F X ) Y = (E Y, F Y ) Integer Magnitude Comparator X < Y X = Y X > Y Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 8 9

Summary of IEEE 754 Encoding Single-Precision Normalized Number Denormalized Number Zero Infinity NaN Exponent = 8 to 254 255 255 Fraction = 23 Anything nonzero nonzero Value ±(.F) 2 2 E 27 ±(.F) 2 2 26 ± ± NaN Double-Precision Normalized Number Denormalized Number Zero Infinity NaN Exponent = to 246 247 247 Fraction = 52 Anything nonzero nonzero Value ±(.F) 2 2 E 23 ±(.F) 2 2 22 ± ± NaN Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 9 Next... Floating-Point Numbers IEEE 754 Floating-Point Standard Floating-Point Addition and Subtraction Floating-Point Multiplication Extra Bits and Rounding MIPS Floating-Point Instructions Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 2

Floating Point Addition Example Consider adding: (.) 2 2 + (.) 2 2 3 For simplicity, we assume 4 bits of precision (or 3 bits of fraction) Cannot add significands Why? Because exponents are not equal How to make exponents equal? Shift the significand of the lesser exponent right until its exponent matches the larger number (.) 2 2 3 = (.) 2 2 2 = (.) 2 2 Difference between the two exponents = ( 3) = 2 So, shift right by 2 bits. + Now, add the significands:. Carry. Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 2 Addition Example cont d So, (.) 2 2 + (.) 2 2 3 = (.) 2 2 However, result (.) 2 2 is NOT normalized Normalize result: (.) 2 2 = (.) 2 2 In this example, we have a carry So, shift right by bit and increment the exponent Round the significand to fit in appropriate number of bits We assumed 4 bits of precision or 3 bits of fraction Round to nearest: (.) 2 (.) 2 Renormalize if rounding generates a carry Detect overflow / underflow If exponent becomes too large (overflow) or too small (underflow) +.. Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 22

Floating Point Subtraction Example Consider: (.) 2 2 3 (.) 2 2 2 We assume again: 4 bits of precision (or 3 bits of fraction) Shift significand of the lesser exponent right Difference between the two exponents = 2 ( 3) = 5 Shift right by 5 bits: (.) 2 2 3 = (.) 2 2 2 Convert subtraction into addition to 2's complement Sign 2 s Complement +. 2 2. 2 2. 2 2. 2 2. 2 2 Since result is negative, convert result from 2's complement to sign-magnitude 2 s Complement. 2 2 Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 23 Subtraction Example cont d So, (.) 2 2 3 (.) 2 2 2 =. 2 2 2 Normalize result:. 2 2 2 =. 2 2 For subtraction, we can have leading zeros Count number z of leading zeros (in this case z = ) Shift left and decrement exponent by z Round the significand to fit in appropriate number of bits We assumed 4 bits of precision or 3 bits of fraction Round to nearest: (.) 2 (.) 2 Renormalize: rounding generated a carry. 2 2. 2 2 =. 2 2 2. +. Result would have been accurate if more fraction bits are used Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 24 2

Floating Point Addition / Subtraction Start. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent would match the larger exponent. 2. Add / Subtract the significands according to the sign bits. 3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent 4. Round the significand to the appropriate number of bits, and renormalize if rounding generates a carry Shift significand right by d = E X E Y Add significands when signs of X and Y are identical, Subtract when different X Y becomes X + ( Y) Normalization shifts right by if there is a carry, or shifts left by the number of leading zeros in the case of subtraction Overflow or underflow? no Done yes Exception Rounding either truncates fraction, or adds a to least significant fraction bit Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 25 Floating Point Adder Block Diagram E X E Y Exponent Subtractor sign F X Swap F Y d = E X E Y Shift Right add/sub S X S Y Sign Computation add / subtract sign Significand Adder/Subtractor max ( E X, E Y ) c z Detect carry, or Count leading s c z Shift Right / Left Inc / Dec c Rounding Logic S Z E Z F Z Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 26 3

Next... Floating-Point Numbers IEEE 754 Floating-Point Standard Floating-Point Addition and Subtraction Floating-Point Multiplication Extra Bits and Rounding MIPS Floating-Point Instructions Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 27 Floating Point Multiplication Example Consider multiplying:. 2 2 by. 2 2 2 As before, we assume 4 bits of precision (or 3 bits of fraction) Unlike addition, we add the exponents of the operands Result exponent value = ( ) + ( 2) = 3 Using the biased representation: E Z = E X + E Y Bias E X = ( ) + 27 = 26 (Bias = 27 for SP) E Y = ( 2) + 27 = 25 E Z = 26 + 25 27 = 24 (value = 3) Now, multiply the significands: (.) 2 (.) 2 = (.) 2 3-bit fraction 3-bit fraction 6-bit fraction.. Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 28 4

Multiplication Example cont d Since sign S X S Y, sign of product S Z = (negative) So,. 2 2. 2 2 2 =. 2 2 3 However, result:. 2 2 3 is NOT normalized Normalize:. 2 2 3 =. 2 2 2 Shift right by bit and increment the exponent At most bit can be shifted right Why? Round the significand to nearest:. 2. 2 (3-bit fraction) Result. 2 2 2 (normalized) Detect overflow / underflow.. No overflow / underflow because exponent is within range Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 29 + Floating Point Multiplication Start. Add the biased exponents of the two numbers, subtracting the bias from the sum to get the new biased exponent 2. Multiply the significands. Set the result sign to positive if operands have same sign, and negative otherwise 3. Normalize the product if necessary, shifting its significand right and incrementing the exponent 4. Round the significand to the appropriate number of bits, and renormalize if rounding generates a carry Biased Exponent Addition E Z = E X + E Y Bias Result sign S Z = S X xor S Y can be computed independently Since the operand significands.f X and.f Y are and < 2, their product is and < 4. To normalize product, we need to shift right by bit only and increment exponent Overflow or underflow? no Done yes Exception Rounding either truncates fraction, or adds a to least significant fraction bit Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 3 5

Next... Floating-Point Numbers IEEE 754 Floating-Point Standard Floating-Point Addition and Subtraction Floating-Point Multiplication Extra Bits and Rounding MIPS Floating-Point Instructions Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 3 Extra Bits to Maintain Precision Floating-point numbers are approximations for Real numbers that they cannot represent Infinite variety of real numbers exist between. and 2. However, exactly 2 23 fractions can be represented in SP, and Exactly 2 52 fractions can be represented in DP (double precision) Extra bits are generated in intermediate results when Shifting and adding/subtracting a p-bit significand Multiplying two p-bit significands (product can be 2p bits) But when packing result fraction, extra bits are discarded We only need few extra bits in an intermediate result Minimizing hardware but without compromising precision Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 32 6

Guard Bit Guard bit: guards against loss of a significant bit Only one guard bit is needed to maintain accuracy of result Shifted left (if needed) during normalization as last fraction bit Example on the need of a guard bit:. 2 5. 2-2 (subtraction). 2 5. 2 5 (shift right 7 bits). 2 5 Guard bit do not discard. 2 5 (2's complement). 2 5 (add significands) +. 2 4 (normalized) Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 33 Round and Sticky Bits Two extra bits are needed for rounding Just after normalizing a result significand Round bit: appears just after the normalized significand Sticky bit: appears after the round bit (OR of all additional bits) Reduce the hardware and still achieve accurate arithmetic As if result significand was computed exactly and rounded Consider the same example of previous slide:. OR-reduce 2 5. 2 5 (2's complement). 2 5 (sum) +. 2 4 (normalized) Round bit Sticky bit Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 34 7

Four Rounding Modes Normalized result has the form:. f f 2 f l rs The round bit r and sticky bit s appear after the last fraction bit f l IEEE 754 standard specifies four modes of rounding Round to Nearest Even: default rounding mode Increment result if: r s = or (r s = and f l = ) Otherwise, truncate result significand to. f f 2 f l Round toward + : result is rounded up Increment result if sign is positive and r or s = Round toward : result is rounded down Increment result if sign is negative and r or s = Round toward : always truncate result Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 35 Round following result using IEEE 754 rounding modes:. 2-7 Round Bit Sticky Round to Nearest Even: Bit Truncate result since r = Truncated Result:. 2-7 Round towards + : Truncate result since negative Round towards : Increment since negative and s = Incremented result:. 2-7 Renormalize and increment exponent (because of carry) Final rounded result:. 2-6 Round towards : Example on Rounding Truncate always Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 36 8

Advantages of IEEE 754 Standard Used predominantly by the industry Encoding of exponent and fraction simplifies comparison Integer comparator used to compare magnitude of FP numbers Includes special exceptional values: NaN and ± Special rules are used such as: / is NaN, sqrt( ) is NaN, / is, and / is Computation may continue in the face of exceptional conditions Denormalized numbers to fill the gap Between smallest normalized number. 2 E min and zero Denormalized numbers, values.f 2 E min, are closer to zero Gradual underflow to zero Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 37 Floating Point Complexities Operations are somewhat more complicated In addition to overflow we can have underflow Accuracy can be a big problem Extra bits to maintain precision: guard, round, and sticky Four rounding modes Division by zero yields Infinity Zero divide by zero yields Not-a-Number Other complexities Implementing the standard can be tricky See text for description of 8x86 and Pentium bug! Not using the standard can be even worse Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 38 9

Next... Floating-Point Numbers IEEE 754 Floating-Point Standard Floating-Point Addition and Subtraction Floating-Point Multiplication Extra Bits and Rounding MIPS Floating-Point Instructions Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 39 MIPS Floating Point Coprocessor Called Coprocessor or the Floating Point Unit (FPU) 32 separate floating point registers: $f, $f,, $f3 FP registers are 32 bits for single precision numbers Even-odd register pair form a double precision register Use the even number for double precision registers $f, $f2, $f4,, $f3 are used for double precision Separate FP instructions for single/double precision Single precision: add.s, sub.s, mul.s, div.s (.s extension) Double precision: add.d, sub.d, mul.d, div.d (.d extension) FP instructions are more complex than the integer ones Take more cycles to execute Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 4 2

FP Arithmetic Instructions Instruction add.s fd, fs, ft add.d fd, fs, ft sub.s fd, fs, ft sub.d fd, fs, ft mul.s fd, fs, ft mul.d fd, fs, ft div.s fd, fs, ft div.d fd, fs, ft sqrt.s fd, fs sqrt.d fd, fs abs.s fd, fs abs.d fd, fs neg.s fd, fs neg.d fd, fs Meaning (fd) = (fs) + (ft) (fd) = (fs) + (ft) (fd) = (fs) (ft) (fd) = (fs) (ft) (fd) = (fs) (ft) (fd) = (fs) (ft) (fd) = (fs) / (ft) (fd) = (fs) / (ft) (fd) = sqrt (fs) (fd) = sqrt (fs) (fd) = abs (fs) (fd) = abs (fs) (fd) = (fs) (fd) = (fs) x x x x x x x x x x x x x x Format fs 5 fs 5 fs 5 fs 5 fs 5 fs 5 2 2 3 3 4 4 5 5 7 7 Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 4 FP Load/Store Instructions Separate floating point load/store instructions lwc: load word coprocessor ldc: load double coprocessor swc: store word coprocessor sdc: store double coprocessor Instruction lwc $f2, 4($t) ldc $f2, 4($t) swc $f2, 4($t) sdc $f2, 4($t) Meaning ($f2) = Mem[($t)+4] ($f2) = Mem[($t)+4] Mem[($t)+4] = ($f2) Mem[($t)+4] = ($f2) Better names can be used for the above instructions l.s = lwc (load FP single), s.s = swc (store FP single), x3 x35 x39 x3d General purpose register is used as the base register Format $t $f2 im 6 = 4 $t $f2 im 6 = 4 $t $f2 im 6 = 4 $t $f2 im 6 = 4 l.d = ldc (load FP double) s.d = sdc (store FP double) Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 42 2

FP Data Movement Instructions Moving data between general purpose and FP registers mfc: move from coprocessor (to general purpose register) mtc: move to coprocessor (from general purpose register) Moving data between FP registers mov.s: move single precision float mov.d: move double precision float = even/odd pair of registers Instruction mfc $t, $f2 mtc $t, $f2 mov.s $f4, $f2 mov.d $f4, $f2 Meaning ($t) = ($f2) ($f2) = ($t) ($f4) = ($f2) ($f4) = ($f2) x x x x 4 Format $t $f2 $t $f2 $f2 $f2 $f4 $f4 6 6 Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 43 FP Convert Instructions Convert instruction: cvt.x.y Convert to destination format x from source format y Supported formats Single precision float =.s (single precision float in FP register) Double precision float =.d (double float in even-odd FP register) Signed integer word =.w (signed integer in FP register) Instruction cvt.s.w fd, fs cvt.s.d fd, fs cvt.d.w fd, fs cvt.d.s fd, fs cvt.w.s fd, fs cvt.w.d fd, fs Meaning to single from integer to single from double to double from integer to double from single to integer from single to integer from double x x x x x x Format fs 5 fs 5 fs 5 fs 5 fs 5 fs 5 x2 x2 x2 x2 x24 x24 Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 44 22

FP Compare and Branch Instructions FP unit (co-processor ) has a condition flag Set to (false) or (true) by any comparison instruction Three comparisons: equal, less than, less than or equal Two branch instructions based on the condition flag Instruction c.eq.s fs, ft c.eq.d fs, ft c.lt.s fs, ft c.lt.d fs, ft c.le.s fs, ft c.le.d fs, ft bcf Label bct Label Meaning cflag = ((fs) == (ft)) cflag = ((fs) == (ft)) cflag = ((fs) <= (ft)) cflag = ((fs) <= (ft)) cflag = ((fs) <= (ft)) cflag = ((fs) <= (ft)) branch if (cflag == ) branch if (cflag == ) x x x x x x x x 8 8 Format im 6 im 6 x32 x32 x3c x3c x3e x3e Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 45 Example : Area of a Circle.data pi:.double 3.45926535897924 msg:.asciiz "Circle Area = ".text main: ldc $f2, pi # $f2,3 = pi li $v, 7 # read double (radius) syscall # $f, = radius mul.d $f2, $f, $f # $f2,3 = radius*radius mul.d $f2, $f2, $f2 # $f2,3 = area la $a, msg li $v, 4 # print string (msg) syscall li $v, 3 # print double (area) syscall # print $f2,3 Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 46 23

Example 2: Matrix Multiplication void mm (int n, double x[n][n], y[n][n], z[n][n]) { for (int i=; i!=n; i=i+) for (int j=; j!=n; j=j+) { double sum =.; for (int k=; k!=n; k=k+) sum = sum + y[i][k] * z[k][j]; x[i][j] = sum; } } Matrices x, y, and z are n n double precision float Matrix size is passed in $a = n Array addresses are passed in $a, $a2, and $a3 What is the MIPS assembly code for the procedure? Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 47 Matrix Multiplication Procedure /3 Initialize Loop Variables mm: addu $t, $, $ # $t = i = ; for st loop L: addu $t2, $, $ # $t2 = j = ; for 2 nd loop L2: addu $t3, $, $ # $t3 = k = ; for 3 rd loop sub.d $f, $f, $f # $f = sum =. Calculate address of y[i][k] and load it into $f2,$f3 Skip i rows (i n) and add k elements L3: multu $t, $a # i*size(row) = i*n mflo $t4 # $t4 = i*n addu $t4, $t4, $t3 # $t4 = i*n + k sll $t4, $t4, 3 # $t4 =(i*n + k)*8 addu $t4, $a2, $t4 # $t4 = address of y[i][k] ldc $f2, ($t4) # $f2 = y[i][k] Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 48 24

Matrix Multiplication Procedure 2/3 Similarly, calculate address and load value of z[k][j] Skip k rows (k n) and add j elements multu $t3, $a # k*size(row) = k*n mflo $t5 # $t5 = k*n addu $t5, $t5, $t2 # $t5 = k*n + j sll $t5, $t5, 3 # $t5 =(k*n + j)*8 addu $t5, $a3, $t5 # $t5 = address of z[k][j] ldc $f4, ($t5) # $f4 = z[k][j] Now, multiply y[i][k] by z[k][j] and add it to $f mul.d $f6, $f2, $f4 # $f6 = y[i][k]*z[k][j] add.d $f, $f, $f6 # $f = sum addiu $t3, $t3, # k = k + bne $t3, $a, L3 # loop back if (k!= n) Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 49 Matrix Multiplication Procedure 3/3 Calculate address of x[i][j] and store sum multu $t, $a # i*size(row) = i*n mflo $t6 # $t6 = i*n addu $t6, $t6, $t2 # $t6 = i*n + j sll $t6, $t6, 3 # $t6 =(i*n + j)*8 addu $t6, $a, $t6 # $t6 = address of x[i][j] sdc $f, ($t6) # x[i][j] = sum Repeat outer loops: L2 (for j = ) and L (for i = ) addiu $t2, $t2, # j = j + bne $t2, $a, L2 # loop L2 if (j!= n) addiu $t, $t, # i = i + bne $t, $a, L # loop L if (i!= n) Return: jr $ra # return Floating Point COE 38 Computer Architecture Muhamed Mudawar slide 5 25