Chapter 3. Errors and numerical stability

1 Representation of numbers

Binary system: a micro-transistor in state "off" represents 0, in state "on" represents 1. The smallest amount of stored data is the bit. An object in memory is a chain of 1s and 0s, e.g. 10011000110101001111010010100010. A byte is 8 bits.

On most micro-computers and workstations: a word is 4 bytes = 32 bits, a double-precision word is 8 bytes = 64 bits, and operations are carried out in double precision. On certain super-computers (Cray): a word is 8 bytes = 64 bits, so double precision on a Cray corresponds to quadruple precision on a PC or workstation!

Integers

An integer is stored in a word = 4 bytes = 32 bits. In an n-bit system:

    i = s_{n-1} 2^{n-1} + s_{n-2} 2^{n-2} + ... + s_2 2^2 + s_1 2^1 + s_0 2^0,    s_k = 0 or 1

and the chain [s_{n-1}, s_{n-2}, ..., s_2, s_1, s_0] is stored in memory. The largest representable integer is 2^n - 1; in a 32-bit system, 2^32 - 1 = 4 294 967 295. If i > 2^n - 1: overflow.
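The wraparound behaviour of fixed-width integers can be sketched as follows. Python's own integers are unbounded, so an unsigned 32-bit register is emulated here by masking with 2^32 - 1:

```python
# Minimal sketch: emulating unsigned 32-bit arithmetic by masking.
MASK32 = 2**32 - 1  # largest representable unsigned 32-bit integer: 4 294 967 295

def add_u32(a, b):
    """Add two unsigned 32-bit integers modulo 2**32, as the hardware does."""
    return (a + b) & MASK32

print(MASK32)              # 4294967295
print(add_u32(MASK32, 1))  # overflow: wraps around to 0
```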

Signed integers

What about negative integers? Binary arithmetic on words of fixed length is arithmetic on a finite cyclic group. Add 1 to the largest representable integer:

    111...111 + 1 = 1 000...000

The leading 1 is lost, leaving 0. In an n-bit system, addition is defined modulo 2^n.

Binary representation of a negative integer? Invert each bit (0 -> 1 and 1 -> 0), then add 1: the two's-complement representation.

Example, in a 3-bit system. Start from the integer 101; adding 010 gives 111, and adding 001 more gives 1000 = 0 (mod 2^3). So the opposite of 101 is 010 + 001 = 011.

Examples, in an 8-bit system:

    -2 : 11111110
    -1 : 11111111
     0 : 00000000
     1 : 00000001
     2 : 00000010

Conclusion: the first bit indicates the sign of the integer:
1. if the first bit is 0, the integer is >= 0 (0 <= i <= 2^{n-1} - 1);
2. if the first bit is 1, the integer is < 0 (-2^{n-1} <= i < 0).
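The invert-and-add-1 rule can be sketched in a few lines; masking with 2^n - 1 implements the arithmetic modulo 2^n:

```python
# Sketch: the n-bit two's-complement bit string of a signed integer.
def twos_complement(i, n):
    """Return the n-bit two's-complement representation of i as a bit string."""
    return format(i & (2**n - 1), f'0{n}b')  # i mod 2**n, zero-padded to n bits

for i in (-2, -1, 0, 1, 2):
    print(i, twos_complement(i, 8))
```

For the 3-bit example above, `twos_complement(-5, 3)` gives `'011'`, the opposite of `101`.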

Real numbers

Floating-point representation: scientific notation, e.g. 0.6022 × 10^24, with
1. base (10)
2. exponent (24)
3. mantissa (0.6022)

In computers, the base is always 2, so only the exponent and mantissa are stored:

    x = (-1)^s 2^e M,    M = sum_{k=1}^{p} f_k 2^{-k}

where s is the sign bit (0 for positive, 1 for negative), e is the exponent (an integer), f_1 = 1 for x ≠ 0 and f_1 = 0 for x = 0, and f_k = 0 or 1 for k > 1. A single-precision real occupies one word = 32 bits:

    | sign | exponent | mantissa |

8-bit exponent: -128 <= e <= 127 (2^7 = 128).
23-bit mantissa: 1/2 <= M <= 1 - (1/2)^23 ≈ 0.99999988.
Upper limit:

    M = 0.1111...111 (23 ones) = sum_{k=1}^{23} 2^{-k} = (1/2)(1 - (1/2)^23)/(1 - 1/2) = 1 - (1/2)^23

Largest representable real: (1 - (1/2)^23) × 2^127 ≈ 10^38.
Smallest non-zero representable real: (1/2) × 2^{-128} ≈ 10^{-39}.

In single precision, the accuracy on real numbers is about 6 digits ((1/2)^23 ≈ 10^{-7}). In double precision, a real number occupies 2 words = 64 bits, with an 11-bit exponent and a 52-bit mantissa: 10^{-308} <= |x| <= 10^{308} (2^{2^{10} - 1} = 2^{1023}), and the accuracy is about 15 digits.

N.B.: the number of bits in the exponent determines the range; the number of bits in the mantissa determines the precision.
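The three fields can be inspected directly. A sketch using only the standard library; note that the IEEE 754 layout used on actual hardware differs in details (a biased exponent and an implicit leading mantissa bit) from the normalized-mantissa convention above, but the sign/exponent/mantissa split is the same:

```python
# Sketch: unpack the bit fields of a number stored as IEEE 754 single precision.
import struct

def float32_fields(x):
    """Return (sign, biased_exponent, mantissa_bits) of x stored as float32."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]  # raw 32-bit pattern
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF       # 8-bit biased exponent (bias 127)
    mantissa = bits & ((1 << 23) - 1)    # 23-bit fraction field
    return sign, exponent, mantissa

print(float32_fields(1.0))    # (0, 127, 0): sign +, exponent 0 after bias
print(float32_fields(-2.0))   # (1, 128, 0)
```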

2 Consequences

1. Not every number is representable: rounding error.
   Example: (0.1)_10 = (0.0001100110011...)_2, an infinite repeating binary fraction.

2. Let ε > 0 be the smallest representable number. If 0 < x < ε, x is replaced by 0: underflow.
   Example: if a > 2/ε, then a^{-1} < ε/2 is replaced by 0, and therefore (a^{-1}) a = 0 instead of 1.
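Both effects are easy to observe in double precision. A minimal sketch (5e-324 is the smallest positive subnormal double on IEEE 754 machines):

```python
# Sketch: rounding error on 0.1, and underflow to zero.
from decimal import Decimal

# Decimal(0.1) shows the binary value actually stored for 0.1 -- not exactly 0.1.
print(Decimal(0.1))

tiny = 5e-324        # smallest positive representable double
print(tiny / 2)      # underflow: the result is 0.0
```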

3. Non-homogeneous distribution of the real numbers: higher density near 0.
   Example: a 16-bit system with a 7-bit mantissa and an 8-bit exponent. The distance between two successive values of the mantissa is (0.0000001)_2 = 2^{-7} ≈ (0.0078)_10.
   With a small exponent (e.g. -100), the distance between two successive reals in floating-point representation is (0.0000001)_2 × 2^{-100} ≈ 0.0078 × 10^{-30} = 7.8 × 10^{-33}.
   With a large exponent (e.g. +100), the distance between two successive reals is (0.0000001)_2 × 2^{100} ≈ 7.8 × 10^{27}.
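The same non-uniform spacing exists in double precision and can be measured with `math.ulp` (Python 3.9+), which gives the distance from x to the next representable double:

```python
# Sketch: the gap between successive doubles grows with the magnitude of x.
import math

print(math.ulp(1.0))      # 2**-52, about 2.2e-16
print(math.ulp(1e-100))   # far smaller spacing near 0
print(math.ulp(1e+100))   # far larger spacing away from 0
```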

3 Numerical errors

Arithmetic with a finite number of digits: let x̄ be an approximation of x.
Absolute error: ε_a = x̄ - x.
Relative error: ε_r = ε_a / x = x̄/x - 1.
Possible errors? Distributivity and associativity do not necessarily hold!!

Examples.

Associativity of multiplication, in a 2-digit representation:
    (0.56 × 0.65) × 0.54 = 0.36 × 0.54 = 0.19
    0.56 × (0.65 × 0.54) = 0.56 × 0.35 = 0.20

Associativity of addition, in a 6-digit representation:
    (0.243875×10^6 + 0.412648×10^1) - 0.243826×10^6 = 0.243879×10^6 - 0.243826×10^6 = 0.000053×10^6 = 0.530000×10^2
    (0.243875×10^6 - 0.243826×10^6) + 0.412648×10^1 = 0.000049×10^6 + 0.412648×10^1 = 0.531265×10^2

Distributivity, in a 6-digit representation:
    (0.152755 - 0.152732)/(0.910939×10^{-30}) = 0.252487×10^26
    0.152755/(0.910939×10^{-30}) - 0.152732/(0.910939×10^{-30}) = 0.167690×10^30 - 0.167664×10^30 = 0.260000×10^26
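The same failure of associativity appears in double precision at the 16th digit. A sketch using the standard 0.1/0.2/0.3 example rather than the 6-digit values above:

```python
# Sketch: floating-point addition is not associative.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                  # 0.6000000000000001
print(a + (b + c))                  # 0.6
print((a + b) + c == a + (b + c))   # False
```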

Three classes of errors:
1. Initial error
2. Truncation error
3. Rounding error

Examples of truncation error:
Calculate e^x using e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + O(x^5), where O(x^n) denotes a quantity of the same order as x^n.
Calculate a definite integral using a quadrature rule:

    ∫_a^b dx f(x) ≈ sum_{i=1}^{n} w_i f(x_i)
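The truncated exponential series can be sketched directly; the gap between the partial sum and math.exp is the truncation error:

```python
# Sketch: e^x from the partial sum 1 + x + x^2/2! + ... + x^n/n!;
# the neglected tail is the truncation error.
import math

def exp_taylor(x, n=4):
    """Partial sum of the exponential series up to the x**n / n! term."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 0.3333
print(exp_taylor(x))                      # partial sum
print(math.exp(x) - exp_taylor(x))        # truncation error, about 3.6e-5
```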

A numerical calculation comprises many steps! Examples: suppose x̄ = x + δ and ȳ = y + η.

Subtraction, z = x - y:

    z̄ = (x̄ - ȳ) = x + δ - (y + η) + ε,    ε a rounding error
    z̄ = z + δ - η + ε

Depending on the signs of δ, η and ε, the error can be large or small. If x and y are almost equal, z is almost zero and the relative error is large.

Division, z = x/y:

    z̄ = (x̄/ȳ) = (x + δ)/(y + η) + ε
    z̄ ≈ z + (1/y) δ - (x/y^2) η + ε

The error on z̄ includes the errors from x and y plus the rounding error ε. If y is small, the error is large.

It is important to understand how errors can propagate in a calculation:
1. Small variations in the initial data can give rise to big differences in the final results: an ill-conditioned problem. Examples: weather forecasting, the butterfly effect.
2. Truncation errors usually depend on a parameter N such that, as N → ∞, the calculated solution tends to the exact solution. The truncation error can be reduced by choosing N larger, but this might not be practical.
3. Rounding errors accumulate randomly and often cancel each other. In certain cases, however, they can increase rapidly: instability.

Example: calculate e^{1/3} in a 4-digit representation (exact value: e^{1/3} = 1.39561242508608951...).

Initial error: 1/3 ≈ 0.3333, error = 0.0000333...
Propagated error: e^{0.3333} - e^{1/3} = e^{0.3333}(1 - e^{0.0000333...}) ≈ -0.0000465196.

Calculate e^x using the expansion e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + O(x^5) with x = 0.3333. If the O(x^5) terms are neglected, the truncation error is

    0.3333^5/5! + 0.3333^6/6! + ... ≈ 0.0000362750

Summing the truncated quantities in 4-digit arithmetic:

    1 + 0.3333 + 0.0555 + 0.0062 + 0.0005 = 1.3955

In a 10-digit representation the same sum gives 1.3955296304.
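The propagated-error estimate above can be checked in double precision: replacing 1/3 by its 4-digit approximation 0.3333 shifts e^x by about 4.65×10^{-5}:

```python
# Sketch: the error propagated through e^x by the 4-digit initial error in 1/3.
import math

prop = math.exp(1/3) - math.exp(0.3333)
print(prop)   # about 4.65e-5, matching the estimate 0.0000465196
```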

An important source of rounding errors: cancellation, in particular when two very close numbers are subtracted.

Example, in a 3-digit representation:

    1/15 - 1/16 = 0.667×10^{-1} - 0.625×10^{-1} = 0.420×10^{-2}

Exact value: 0.417×10^{-2}.
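The 3-digit calculation can be reproduced with the decimal module, whose context precision counts significant digits:

```python
# Sketch: the cancellation example above, in 3-significant-digit arithmetic.
from decimal import Decimal, getcontext
getcontext().prec = 3

a = Decimal(1) / Decimal(15)   # 0.0667
b = Decimal(1) / Decimal(16)   # 0.0625
print(a - b)                   # 0.0042, versus the exact 0.00416666...
```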

Example: solve x^2 - 178x + 2 = 0 using

    x_± = (-b ± sqrt(b^2 - 4ac)) / (2a)

Since b^2 >> 4ac, computing x_- implies the subtraction of two very close numbers:

    x_+ = 1.779887634×10^2
    x_- = 1.123665×10^{-2}

Compare the numbers of significant digits! Accuracy is improved by using x_+ x_- = c/a:

    x_- = 2 / (1.779887634×10^2) = 1.123666439×10^{-2}
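Both routes can be sketched side by side. In double precision the cancellation costs only a few digits (the notes used a 10-digit representation), but the stable variant via the product of the roots avoids it entirely:

```python
# Sketch: naive versus stable evaluation of the small root of x^2 - 178x + 2 = 0.
import math

a, b, c = 1.0, -178.0, 2.0
d = math.sqrt(b*b - 4*a*c)
x_plus         = (-b + d) / (2*a)
x_minus_naive  = (-b - d) / (2*a)   # subtracts two nearly equal numbers
x_minus_stable = c / (a * x_plus)   # uses x_+ * x_- = c/a: no cancellation

print(x_plus, x_minus_naive, x_minus_stable)
```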

Example: recurrence relations, which are particularly sensitive to the propagation of initial and rounding errors. The Bessel functions J_n(x) satisfy the recurrence relation

    J_{n+1}(x) = (2n/x) J_n(x) - J_{n-1}(x)

If n > x, the factor 2n/x > 1 multiplies the errors in J_n: huge loss of accuracy. For example, in a 6-digit representation, starting from J_0(1) = 0.765198 and J_1(1) = 0.440051, the forward recurrence gives J_7(1) = 0.008605 instead of 0.000002!!

Remedy: set J_8(1) = 0 and J_7(1) = k (arbitrary), use the recurrence relation backwards, and renormalise the result using

    J_0(x) + 2 J_2(x) + 2 J_4(x) + ... = 1
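The backward-recurrence remedy (Miller's algorithm) can be sketched as follows; the starting value k is arbitrary because the normalisation sum removes it:

```python
# Sketch: backward recurrence for J_n(1) with renormalisation, as described above.
def bessel_j_backward(x, nmax=8):
    """J_0(x)..J_nmax(x) via backward recurrence from J_nmax = 0, J_{nmax-1} = k."""
    j = [0.0] * (nmax + 1)
    j[nmax - 1] = 1e-10                       # arbitrary small starting value k
    for n in range(nmax - 1, 0, -1):          # J_{n-1} = (2n/x) J_n - J_{n+1}
        j[n - 1] = (2 * n / x) * j[n] - j[n + 1]
    norm = j[0] + 2 * sum(j[n] for n in range(2, nmax + 1, 2))
    return [v / norm for v in j]              # enforce J_0 + 2 J_2 + 2 J_4 + ... = 1

j = bessel_j_backward(1.0)
print(j[0], j[1], j[7])   # close to 0.765198, 0.440051, 1.5e-6
```

Truncating the start at nmax = 8 limits the accuracy of the highest orders; starting further out improves all values.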