Computer Arithmetic: Floating Point


Chapter 3.6, EEC170, FQ 2005

About Floating Point Arithmetic

The basic arithmetic operations on floating point numbers are: Add, Subtract, Multiply, Divide. Transcendental operations (sine, cosine, log, exp, etc.) are synthesized from these basic four.

Floating Point Addition

Like addition using base-10 scientific notation: align the decimal points, add, then normalize the result.

Example: 9.998 x 10^2 + 5.0 x 10^-1

Align (make the smaller number's exponent match the larger's; why the smaller? so that no leading digits of the larger number are shifted away):
9.998 x 10^2 + 0.005 x 10^2

Add:
9.998 x 10^2 + 0.005 x 10^2 = 10.003 x 10^2

Normalize (the integer part must be >= 1 and <= 9):
10.003 x 10^2 = 1.0003 x 10^3

Observation: by hand, the precision is unlimited. In computer hardware, precision is limited to a fixed number of bits.

IEEE Floating Point Standard (754)

IEEE single precision format: 1 sign bit, 8 exponent bits, 23 mantissa bits. The exponent is biased by 127, and the mantissa has a hidden leading 1.

Binary Floating Point Add

Similar to add in base-10 scientific notation, but it must operate on the standard floating point representation, using basic computer operations.

Example: 100 + 0.25
0.25 = (0.01)2 = (1.0)2 x 2^-2, biased exponent -2 + 127 = 125 = (01111101)2
100 = (1100100)2 = (1.1001)2 x 2^6, biased exponent 6 + 127 = 133 = (10000101)2

Align: compare the exponents by subtracting them. The sign of the result tells which operand is larger; the magnitude tells how many places the smaller one must be shifted:

B: 10000101
A: 01111101
-----------
   00001000    B is larger by 1000 two (8 ten)
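The field layout above can be inspected directly. Here is a minimal sketch in Python (the helper name decode_float32 is mine, not from the slides) that unpacks a value into its single-precision sign, biased exponent, and mantissa fields:

```python
import struct

def decode_float32(x):
    """Unpack a float into IEEE 754 single-precision fields.
    Returns (sign, biased_exponent, mantissa_bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent (bias = 127)
    mantissa = bits & 0x7FFFFF       # 23 stored mantissa bits (hidden 1 not stored)
    return sign, exponent, mantissa

# 100.0 = (1.1001)2 x 2^6, so the biased exponent is 6 + 127 = 133
sign, exp, man = decode_float32(100.0)
print(sign, exp, exp - 127)   # prints: 0 133 6
```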

Binary Floating Point Add (cont.)

Shift the smaller number to the right by the magnitude of the exponent difference, including the hidden bit in the shift, and set the smaller exponent equal to the larger exponent:
(1.0000000)2 x 2^-2  becomes  (0.00000001)2 x 2^6

Add the mantissas (including the hidden bits):

  (0.00000001)2
+ (1.10010000)2
---------------
  (1.10010001)2

Normalize the result (get the hidden bit to be 1). This example is already normalized: (1.10010001)2 x 2^6 = 100.25. When would the result not be normalized? When the mantissa sum reaches 2 or more (requiring a right shift), or, after a subtraction, when leading bits cancel (requiring left shifts).

Floating Point Subtract

Align the binary points, as in addition. Then the algorithm for sign-magnitude numbers takes over: negate the second operand, then add. The sign bit of the result must be set to be consistent with the outcome.
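The align/add/normalize sequence can be sketched as integer arithmetic on mantissas. This is a simplified model (positive operands only, truncation instead of rounding; fp_add is an illustrative name), assuming mantissas are passed as 24-bit integers that include the hidden 1:

```python
def fp_add(exp_a, man_a, exp_b, man_b, mant_bits=23):
    """Add two positive floating point values given as
    (unbiased exponent, integer mantissa including the hidden 1)."""
    # 1. Compare exponents; make A the operand with the larger exponent
    if exp_a < exp_b:
        exp_a, man_a, exp_b, man_b = exp_b, man_b, exp_a, man_a
    # 2. Shift the smaller mantissa right (hidden bit included in the shift)
    man_b >>= exp_a - exp_b
    # 3. Add the mantissas
    man, exp = man_a + man_b, exp_a
    # 4. Normalize: if the sum reached 2 or more, shift right, bump exponent
    while man >= 1 << (mant_bits + 1):
        man >>= 1
        exp += 1
    return exp, man

# 100 + 0.25: 100 = (1.1001)2 x 2^6, 0.25 = (1.0)2 x 2^-2
exp, man = fp_add(6, 0xC80000, -2, 0x800000)
print(exp, hex(man), man / 2**23 * 2**exp)   # prints: 6 0xc88000 100.25
```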

[Slides: an animated flowchart of the floating point addition algorithm, stepping through: compare exponents, shift the smaller fraction, add the fractions, normalize the computed fraction, round, renormalize if needed, done.]

Floating Point Multiply

Similar to multiply in base-10 scientific notation: multiply the mantissas, add the exponents, then normalize.

Example: (3.0 x 10^2) x (5.0 x 10^-1) = 15.0 x 10^1 = 1.5 x 10^2

Binary Floating Point Multiply

Similar to base 10, but it must deal with the standard floating point representation. Using only 4 mantissa bits as an example: multiply the mantissas, and don't forget the hidden 1s:
(1.xxxx)2 x (1.xxxx)2 -> a product with up to 2 integer bits and 8 fraction bits

Add the exponents. The sum now has a double bias (one from each operand), so subtract 127:
e_result = e_A + e_B - 127

Compute the result sign bit as the XOR of the operand sign bits, then reconstruct the result.

Normalize the result: shift the binary point and increment the exponent if the integer part is 2 or more. Trim the excess low-order bits and trim the hidden bit.

Floating Point Division

The dual of multiplication: divide the mantissas (include the hidden bits) and subtract the exponents. The bias (127) must be added back in because the subtraction eliminated it. Normalize the result: the hidden bit must be 1, which may require left shifts and exponent decrements. Trim the excess bits.

Arithmetic Instructions

The MIPS processor has separate instructions for integer and floating point arithmetic. Floating point operations are relatively slow, so use integer operations when appropriate. Representative relative times:

Integer Add       1 time unit
Fl. Pt. Add       2 time units
Fl. Pt. Multiply  3 time units
Fl. Pt. Divide    20 time units
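The multiply recipe (multiply mantissas, add exponents, subtract the extra bias, normalize, trim) can be sketched the same way. fp_mul is an illustrative name, and this model truncates where real hardware would round:

```python
def fp_mul(exp_a, man_a, exp_b, man_b, mant_bits=23):
    """Multiply two positive values given as
    (biased exponent, integer mantissa including the hidden 1)."""
    # Multiply the mantissas (hidden 1s included): 2*mant_bits fraction bits
    man = man_a * man_b
    # Add the exponents; the sum is double-biased, so subtract one bias (127)
    exp = exp_a + exp_b - 127
    # Normalize: if the integer part is 2 or more, shift right, bump exponent
    if man >= 1 << (2 * mant_bits + 1):
        man >>= 1
        exp += 1
    # Trim the excess low-order bits (truncation)
    man >>= mant_bits
    return exp, man

# 3.0 x 5.0: 3.0 = (1.1)2 x 2^1 (biased 128), 5.0 = (1.01)2 x 2^2 (biased 129)
exp, man = fp_mul(128, 0xC00000, 129, 0xA00000)
print(exp - 127, man / 2**23 * 2**(exp - 127))   # prints: 3 15.0
```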

Floating Point Range: Maximum

Floating point can represent very large numbers. The largest normalized number is:
(1.11111111111111111111111)2 x 2^127
The maximum exponent field is 254, giving an exponent of 254 - 127 = 127, i.e. a weight of 2^127. The mantissa is:
1 + 1/2 + 1/4 + ... + 1/2^23 = 2 - 1/2^23, which is approximately 2
Thus the largest number is approximately 2^128, about 10^38.

Do we need larger numbers? Probably not:
Radius of the universe = age of the universe x speed of light
= (10^10 years) x (10^2.5 days/year) x (10^5 seconds/day) x (10^8.5 meters/sec) = 10^26 meters
Bill Gates' net worth is < 10^11 dollars.

Floating Point Range: Minimum

Floating point can represent very small positive numbers. Normally, the smallest positive number is:
(1.00000000000000000000000)2 x 2^-126
The minimum exponent field is 1, giving an exponent of 1 - 127 = -126, i.e. a weight of 2^-126. The mantissa is 1. Thus the smallest normalized number is approximately 2^-126, about 10^-38.

Do we need smaller numbers? Probably not: the mass of an electron is about 10^-27 grams. But somebody was small minded:

Denormalized Numbers

There is no hidden 1 when the exponent field is 0, i.e. the number is no longer normalized. An exponent field of 0 has the same meaning as 1, i.e. a weight of 2^-126. A denormalized number can have a leading 1 in any bit position, allowing even smaller numbers; however, smaller numbers have fewer bits of precision. The smallest denormalized number is:
(0.00000000000000000000001)2 x 2^-126
which represents 2^-149, about 10^-45, and has only 1 bit of precision. Denormalization complicates floating point hardware design, and its usefulness is questionable.

Overflow and Underflow

When a result is larger than the largest representable number, overflow occurs and the result becomes +infinity, represented with an all-1s exponent field and an all-0s mantissa; -infinity is the same, but with a 1 in the sign bit. Operations with infinity produce the expected result:
(any number) + infinity = infinity
Underflow occurs when the result is smaller than the smallest denormalized number; the result becomes 0.

Precision

The set of all real numbers is infinite. The set of floating point numbers is finite, at most 2^32 values. We must map a whole range of real numbers onto a single floating point number. The mapping cannot be precise; some precision is lost. How much?

Integer Precision

First, consider the precision of mapping real numbers to binary integers. The real numbers between two integers must be mapped to one of those integers. The simplest method is truncation: discard the fractional part.
[Figure: number line showing each real value between consecutive integers mapped down to the lower integer.]
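The extreme values quoted above can be checked by assembling the bit fields directly; bits_to_float32 is an illustrative helper name:

```python
import struct

def bits_to_float32(sign, exponent, mantissa):
    """Assemble IEEE 754 single-precision fields into a Python float."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x

largest  = bits_to_float32(0, 254, 0x7FFFFF)   # (2 - 2^-23) x 2^127, ~ 3.4e38
smallest = bits_to_float32(0, 1, 0)            # 1.0 x 2^-126, ~ 1.2e-38
denorm   = bits_to_float32(0, 0, 1)            # 2^-149, ~ 1.4e-45
print(largest, smallest, denorm)
```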

Truncation Precision

With truncation, assuming real numbers are randomly distributed between integers, the expected loss in precision for a given number is 0.5 LSB, and the worst case approaches 1 LSB.
[Figure: number line showing the truncation loss across each interval.]

Cumulative Truncation Error

Adding up a large list of numbers can quickly produce significant cumulative error: the expected roundoff error is about 0.5 LSB per addend, and the maximum is 1 LSB per addend. The fastest computers can add more than 10^9 numbers/second, so the cumulative error can reach 32 bits (4 x 10^9) in a few seconds.

Rounding Precision

With rounding to the nearest integer, assuming real numbers are randomly distributed between integers, the loss for a given number lies between -0.5 and +0.5 LSB, with an expected magnitude of 0.25 LSB.
[Figure: number line showing the rounding loss across each interval.]

Cumulative Roundoff Error

When adding up a large list of numbers, the individual rounding errors are signed and tend to cancel, so the expected cumulative error grows only with the square root of the count; the maximum error is still 0.5 LSB per addend.

Floating Point Precision

Just as the precision of an integer is relative to the weight of its LSB (1), the precision of a floating point number is relative to the weight of its mantissa LSB. The LSB weight is determined by the exponent: it is 2^-23 times 2^exp. Precision ranges from -0.5 to +0.5 LSB for rounding, and from 0 to 1 LSB for truncation.

Double Precision Floating Point

If computers can accumulate errors so quickly, what can be done? Use a larger representation: 2 words = 64 bits. IEEE double precision format: 1 sign bit, 11 exponent bits, 52 mantissa bits. The exponent is biased by 1023, and the mantissa has a hidden leading 1.
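The truncation-versus-rounding behavior can be seen with a small experiment (an illustrative simulation, not from the slides): truncating each addend loses about 0.5 per value, so the error grows linearly with the count, while rounding errors are signed and mostly cancel:

```python
import random

random.seed(42)
xs = [random.uniform(0, 1000) for _ in range(100_000)]
exact = sum(xs)

trunc_sum = sum(int(x) for x in xs)     # truncate each addend toward zero
round_sum = sum(round(x) for x in xs)   # round each addend to nearest

# Truncation: expected loss ~0.5 per value, so ~50,000 in total here.
# Rounding: the signed errors cancel, so the total stays near zero.
print(exact - trunc_sum, exact - round_sum)
```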

Cumulative Double Precision Error

Even the fastest computers will take a while to accumulate enough error to run out of precision using double precision FP:
(10^9 operations/sec) x (10^5 seconds/day) x (10^1.5 days/month) = 10^15.5 operations/month, roughly 2^52 operations/month
=> a couple of months before the maximum error magnitude approaches the magnitude of the mantissa.

Double Precision Range

Double precision allows much larger and smaller numbers because of its larger exponent field: the maximum exponent is 1023, giving roughly 2^1023, about 10^307. For comparison, there are an estimated 10^70 atoms in the universe. Increased precision is much more important than increased range; supercomputers have used quad precision floating point (128-bit) to avoid error accumulation in huge computations.

Rounding Methods

There are various methods for rounding, each useful in certain situations:
Truncation (rounding toward zero)
Round to nearest
Round toward +infinity: 1.22 -> 1.3, 1.28 -> 1.3, -2.81 -> -2.8, -2.89 -> -2.8
Round toward -infinity: 1.22 -> 1.2, 1.28 -> 1.2, -2.81 -> -2.9, -2.89 -> -2.9
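The four modes can be sketched for rounding to one decimal place, reproducing the examples above (round_to_tenths is an illustrative name; "nearest" here is round-half-up, whereas IEEE 754's default is round-half-to-even):

```python
import math

def round_to_tenths(x, mode):
    """Round x to one decimal place under the given rounding mode."""
    scaled = x * 10
    if mode == "toward_zero":          # truncation
        r = math.trunc(scaled)
    elif mode == "toward_plus_inf":
        r = math.ceil(scaled)
    elif mode == "toward_minus_inf":
        r = math.floor(scaled)
    else:                              # "nearest", with halves rounding up
        r = math.floor(scaled + 0.5)
    return r / 10

# The examples above:
assert round_to_tenths(1.22, "toward_plus_inf") == 1.3
assert round_to_tenths(-2.81, "toward_plus_inf") == -2.8
assert round_to_tenths(-2.89, "toward_plus_inf") == -2.8
assert round_to_tenths(1.22, "toward_minus_inf") == 1.2
assert round_to_tenths(-2.81, "toward_minus_inf") == -2.9
```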