Floating Point Representation. CS3220, Summer 2008. Jonathan Kaldor


Floating Point Numbers There is an infinite supply of real numbers, and some of them require infinite space to represent exactly. We need to be able to represent numbers in a finite (preferably fixed) amount of space.

Floating Point Numbers What do we need to consider when choosing our format? Space: using space efficiently. Ease of use: the representation should be easy to work with for adding, subtracting, multiplying, and comparing.

Floating Point Numbers What do we need to consider when choosing our format? Limits: we want to represent very large and very small numbers. Precision: we want to be able to represent neighboring but unequal numbers accurately.

Precision, Precisely Two ways of looking at precision: let x and y be two adjacent numbers in our representation, with x > y. Absolute precision: x - y, a.k.a. the epsilon of the number y. Relative precision: (x - y) / x.
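In IEEE double precision these two quantities can be measured directly; a minimal sketch using math.nextafter (available in Python 3.9+):

```python
import math

x = 1.0
nxt = math.nextafter(x, math.inf)  # adjacent representable number above x
abs_prec = nxt - x                 # absolute precision: the epsilon of x
rel_prec = (nxt - x) / nxt         # relative precision
print(abs_prec)                    # 2.220446049250313e-16, i.e. 2**-52
```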

What is a Floating Point Number? Examples: 7.423 x 10^3, 5.213 x 10^-2, etc. Floating point because the decimal point moves around as the exponent changes. Finite-length real numbers expressed in scientific notation. Only need to store the significant digits.

What is a Fixed Point Number? Compare to fixed point: a constant number of digits to the left and right of the decimal point. Examples: 150000.000000 and 000000.000015. Fixed absolute precision, but relative precision can vary (very poor relative precision for small numbers).

Floating Point Systems β: Base or radix of system (usually either 2 or 10) p: precision of system (number of significant digits available) [L,U]: lower and upper bounds of exponent

Floating Point Systems Given β, p, [L,U], a floating point number is: ±(d0 + d1/β + d2/β^2 + ... + d(p-1)/β^(p-1)) x β^e, where 0 ≤ di ≤ β-1 and L ≤ e ≤ U. The parenthesized sum of digits is the mantissa.

Example Floating Point System Let β = 10, p = 4, L = -99, U = 99. Then numbers look like 4.560 x 10^03 and -5.132 x 10^-26. This is a convenient system for exploring properties of floating point systems in general.

Example Numbers What are the largest and smallest numbers in absolute value that we can represent in this system? Largest: 9.999 x 10^99. Smallest: 0.001 x 10^-99. Note: we have shifted to denormalized numbers (the first digit can be zero).

Example Numbers Let's write zero: 0.000 x 10^0, or 0.000 x 10^1, or 0.000 x 10^10, or 0.000 x 10^x for any exponent x. We no longer have a unique representation for zero.

Denormalization In fact, we no longer have a unique representation for many of our numbers: 4.620 x 10^2 and 0.462 x 10^3. These are both the same number... almost. We have lost information in the second representation, however.

Normalized Numbers Usually best to require the first digit in the representation to be nonzero. This now requires a special format for zero. Double bonus: in binary (β=2), our normalized mantissa always starts with 1... so we can avoid writing it down.

Some Simple Computations (1.456 x 10^03) + (2.378 x 10^01) = 1.480 x 10^03. Note: we lose digits. Note also: the answer depends on the rounding strategy.

Rounding Strategies Easiest method: chop off the excess (a.k.a. round toward zero). More precise method: round to the nearest number; in case of a tie, round to the number with an even last digit. Examples: 2.449, 2.450, 2.451, 2.550, 2.551 rounded to two digits give 2.4, 2.4, 2.5, 2.6, 2.6.
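Both strategies can be tried with Python's decimal module (ROUND_DOWN is chopping, ROUND_HALF_EVEN is round-to-nearest-even); a small sketch over the five examples:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

for s in ('2.449', '2.450', '2.451', '2.550', '2.551'):
    x = Decimal(s)
    chop = x.quantize(Decimal('0.1'), rounding=ROUND_DOWN)
    near = x.quantize(Decimal('0.1'), rounding=ROUND_HALF_EVEN)
    print(s, 'chop ->', chop, ' round-to-even ->', near)
# the ties 2.450 and 2.550 go to the neighbor with an even last digit:
# 2.4 and 2.6 respectively
```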

Some Simple Computations (1.000 x 10^60) x (1.000 x 10^60). The result is not representable in our system (the exponent 120 is too large). This is called overflow.

Some Simple Computations (1.000 x 10^-60) x (1.000 x 10^-60). The result is not representable in our system (the exponent -120 is too small). This is called underflow.

Some Simple Computations 1.432 x 10^2 - 1.431 x 10^2. The answer is 0.001 x 10^2 = 1.000 x 10^-1. However, we have lost almost all precision in the answer. This is an example of catastrophic cancellation.
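The decimal module can mimic the β = 10, p = 4 toy system and reproduce this cancellation:

```python
from decimal import Decimal, getcontext

getcontext().prec = 4   # mimic the beta = 10, p = 4 toy system
x = Decimal('1.432E2')
y = Decimal('1.431E2')
print(x - y)            # 0.1: only one significant digit survives
```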

Catastrophic Cancellation In general, subtracting two nearly equal quantities (or adding quantities of opposite sign and nearly equal magnitude) is inadvisable; avoiding it can be important for accuracy. Consider the familiar quadratic formula...
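One standard remedy (a sketch, not taken from these slides): when solving a*x^2 + b*x + c = 0, evaluate only the non-cancelling sign of the formula and recover the other root from the product relation x1 * x2 = c/a:

```python
import math

def quadratic_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0, avoiding cancellation.

    The naive (-b +/- sqrt(b^2 - 4ac)) / (2a) subtracts nearly equal
    numbers when b^2 >> |4ac|; here we only add same-sign quantities,
    then use x1 * x2 = c / a for the small-magnitude root.
    Assumes a != 0 and a non-negative discriminant.
    """
    d = math.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + math.copysign(d, b))  # no cancellation in b + sign(b)*d
    return q / a, c / q

big, small = quadratic_roots(1.0, -1e8, 1.0)  # roots near 1e8 and 1e-8
print(big, small)
```

The naive formula would compute the small root as the difference of two numbers that agree to about 16 digits, losing nearly all of its relative accuracy.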

Some Simple Computations 1.000 x 10^03 - 1.000 x 10^03 + 1.000 x 10^-04. The answer depends on the order of operations. In real numbers, addition and subtraction are associative; in floating point numbers, they are NOT associative.
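The same effect is easy to see in IEEE doubles:

```python
# In real arithmetic these two sums are equal; in doubles they differ,
# because each intermediate result is rounded to 53 bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```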

Unexpected Results Because of its peculiarities, some results in floating point are unexpected. Take the sum of 1/n as n → ∞: unbounded in the real numbers, but finite in floating point (why?).
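The reason: the partial sums stop growing once 1/n drops below half an ulp of the running sum, at which point s + 1/n rounds right back to s. A tiny illustration (the values are hypothetical, chosen just to trigger absorption in doubles):

```python
s = 20.0           # a hypothetical partial harmonic sum reached so far
term = 2.0 ** -60  # a much later term 1/n, far below half an ulp of s
print(s + term == s)  # True: the term is absorbed; the sum has converged
```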

Precision As before, the epsilon of a floating point number x is defined as the absolute precision of x: the distance from x to the next representable number (note: x is first converted to a FP number). Relative precision depends (mostly) on p. Exception: denormalized numbers.
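Python 3.9+ exposes exactly this quantity as math.ulp; a quick sketch showing that absolute precision grows with magnitude while relative precision stays roughly constant, until the denormal range:

```python
import math

print(math.ulp(1.0))        # 2**-52: the epsilon at 1.0
print(math.ulp(2.0 ** 40))  # 2**-12: absolute spacing grows with magnitude
# relative spacing ulp(x)/x is ~2**-52 in both cases above, but in the
# denormalized range the spacing is a constant 5e-324, so relative
# precision degrades:
print(math.ulp(5e-324))     # 5e-324
```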

IEEE-754 Defines four floating point types (with β=2), but we're only interested in two of them. Single precision: 32 bits of storage, with p = 24, L = -126, U = 127. Double precision: 64 bits of storage, with p = 53, L = -1022, U = 1023. One bit is used for the sign.

IEEE-754 Sign: 1 bit. Exponent: 8 or 11 bits. Mantissa: 23 or 52 bits.

IEEE-754 Single precision: largest value ~3.4028234 x 10^38; smallest value ~1.4012985 x 10^-45 (2^-149). Double precision: largest value ~1.7976931 x 10^308; smallest value ~4.9406565 x 10^-324 (2^-1074).

Mantissa Recall that in β=2 our mantissa looks like 1.0101101 (we normalize it so that the first digit is always nonzero). The first digit is then always 1... so we don't need to store it. We gain an extra digit of precision (look back at the definition of p compared to the actual bit storage).

Exponent Rather than have a sign bit for the exponent, or represent it in two's complement, we bias the exponent by adding 127 (single) or 1023 (double) to the actual exponent. I.e., if the number is 1.0111 x 2^10, in single precision the exponent stored is 10 + 127 = 137.
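This can be checked by packing the slide's example 1.0111_2 x 2^10 (= 1.4375 x 1024) into single-precision bits with the struct module:

```python
import struct

x = 1.4375 * 2 ** 10  # 1.0111 (binary) x 2^10 from the slide
bits = struct.unpack('>I', struct.pack('>f', x))[0]
stored_exp = (bits >> 23) & 0xFF  # the 8 exponent bits
print(stored_exp)                 # 137 = 10 + 127 (the bias)
```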

Exponent Why do we bias the exponent? Makes comparisons between floating point numbers easy: can do bitwise comparison

Example What is 29.34375 in single precision?
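Worked answer, checked in code: 29.34375 = 11101.01011_2 = 1.110101011_2 x 2^4, so the sign is 0, the stored exponent is 4 + 127 = 131 = 10000011_2, and the fraction field is 1101010110 followed by zeros:

```python
import struct

x = 29.34375  # 11101.01011 in binary = 1.110101011 x 2^4
print(struct.pack('>f', x).hex())  # 41eac000
# 0x41EAC000 = 0 10000011 11010101100000000000000
#              sign  exp           fraction
```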

Zero Normalizing the mantissa creates a problem for storing 0. To get around this, we reserve the smallest exponent (-127, which when biased is a stored field of 0) for denormalized numbers (the implicit leading digit is 0 instead of 1). An exponent field of 0 is otherwise treated like an exponent field of 1 (in single precision, both scale by 2^-126).

Denormalized Numbers Thus, zero is the number consisting of all zeros in the exponent and mantissa fields (it can be signed). A zero exponent field with a nonzero mantissa gives the denormalized numbers, which let us express numbers smaller than the normal range, at reduced precision: graceful underflow.
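A quick look at graceful underflow in doubles (math.ulp needs Python 3.9+):

```python
import math
import sys

tiny = sys.float_info.min  # smallest normalized double, ~2.2e-308
print(tiny / 2)            # still nonzero: a denormalized number
print(5e-324)              # smallest positive denormal, 2**-1074
print(math.ulp(0.0))       # the spacing next to zero is that same denormal
print(-0.0 == 0.0)         # True: zero is signed but compares equal
```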

Special Numbers Similarly, the maximum exponent (an exponent field of all 1s) is reserved for special numbers. If the mantissa is all zeros, the number is either +∞ or -∞. If the mantissa is nonzero, the number is NaN (Not A Number).

Special Numbers If x is a finite number: ∞ ± x = ∞; -∞ ± x = -∞; ±x / 0 = ±∞ (x != 0); ±∞ / 0 = ±∞; ±x / ±∞ = ±0; ±0 / ±0 = NaN; ±∞ / ±∞ = NaN. Any computation with NaN produces NaN.
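Most of these rules can be exercised from Python (note one caveat: a literal x / 0.0 raises ZeroDivisionError in Python rather than returning ∞ as raw IEEE hardware would; the other rules behave as stated):

```python
import math

inf, nan = math.inf, math.nan
print(inf + 1.0)   # inf
print(-inf - 1.0)  # -inf
print(1.0 / inf)   # 0.0
print(inf - inf)   # nan
print(inf / inf)   # nan
print(nan == nan)  # False: NaN propagates and never compares equal
```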

Overflow and Underflow If a computation underflows, the result is 0 of the appropriate sign. If a computation overflows, the result is ∞ of the appropriate sign. This can be an issue, but catastrophic cancellation / precision issues are usually far more important.
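Both behaviors are visible in double-precision multiplication of Python floats:

```python
import math

print(1e308 * 10.0)     # inf: the product overflows
print(1e-308 * 1e-308)  # 0.0: the product underflows past even the denormals
print(math.isinf(1e308 * 10.0))  # True
```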

Example of System with Floating Point Error (Demo, also part of HW3)