Jim Lambers
MAT 460/560
Fall Semester 2009-10
Lecture 4 Notes

These notes correspond to Sections 1.1 and 1.2 in the text.

Review of Calculus, cont'd

Taylor's Theorem, cont'd

We conclude our discussion of Taylor's Theorem with some examples that illustrate how the $n$th-degree Taylor polynomial $P_n(x)$ and the remainder $R_n(x)$ can be computed for a given function $f(x)$.

Example If we set $n = 1$ in Taylor's Theorem, then we have
$$f(x) = P_1(x) + R_1(x),$$
where
$$P_1(x) = f(x_0) + f'(x_0)(x - x_0).$$
This polynomial is a linear function that describes the tangent line to the graph of $f$ at the point $(x_0, f(x_0))$.

If we set $n = 0$ in the theorem, then we obtain
$$f(x) = P_0(x) + R_0(x),$$
where
$$P_0(x) = f(x_0), \qquad R_0(x) = f'(\xi(x))(x - x_0),$$
and $\xi(x)$ lies between $x_0$ and $x$. If we use the integral form of the remainder,
$$R_n(x) = \int_{x_0}^{x} \frac{f^{(n+1)}(s)}{n!}\,(x - s)^n\,ds,$$
then we have
$$f(x) = f(x_0) + \int_{x_0}^{x} f'(s)\,ds,$$
which is equivalent to the Total Change Theorem, part of the Fundamental Theorem of Calculus. Using the Mean Value Theorem for integrals, we can see how the first form of the remainder can be obtained from the integral form.
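To make the remainder concrete, here is a quick numerical check (our illustration, not part of the original notes): for $f(x) = e^x$ and $x_0 = 0$, the tangent-line approximation is $P_1(x) = 1 + x$, and the error $R_1(x)$ should shrink quadratically as $x \to x_0$, with $R_1(x)/x^2 \to f''(x_0)/2 = 1/2$.

```python
import math

# Tangent-line (n = 1) Taylor approximation of f(x) = e^x at x0 = 0:
# P_1(x) = 1 + x, so R_1(x) = e^x - (1 + x) should behave like (1/2) x^2.
for h in (0.1, 0.01, 0.001):
    r1 = math.exp(h) - (1.0 + h)          # remainder R_1 at x = x0 + h
    print(f"h = {h:>6}: R_1 = {r1:.3e}, R_1/h^2 = {r1 / h**2:.4f}")
# R_1/h^2 -> 0.5 = f''(0)/2, matching the Lagrange form of the remainder.
```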

Example Let $f(x) = \sin x$. Then
$$f(x) = P_3(x) + R_3(x),$$
where
$$P_3(x) = x - \frac{x^3}{3!} = x - \frac{x^3}{6},$$
and
$$R_3(x) = \frac{1}{4!}\,x^4 \sin \xi(x) = \frac{1}{24}\,x^4 \sin \xi(x),$$
where $\xi(x)$ is between 0 and $x$. The polynomial $P_3(x)$ is the 3rd Maclaurin polynomial of $\sin x$, or the 3rd Taylor polynomial with center $x_0 = 0$.

If $x \in [-1, 1]$, then
$$|R_3(x)| = \left| \frac{1}{24}\,x^4 \sin \xi(x) \right| = \frac{1}{24}\,|x|^4\,|\sin \xi(x)| \le \frac{1}{24},$$
since $|\sin x| \le 1$ for all $x$. This bound on $|R_3(x)|$ serves as an upper bound for the error in the approximation of $\sin x$ by $P_3(x)$ for $x \in [-1, 1]$.

Example Let $f(x) = e^x$. Then
$$f(x) = P_2(x) + R_2(x),$$
where
$$P_2(x) = 1 + x + \frac{x^2}{2},$$
and
$$R_2(x) = \frac{x^3}{6}\,e^{\xi(x)},$$
where $\xi(x)$ is between 0 and $x$. The polynomial $P_2(x)$ is the 2nd Maclaurin polynomial of $e^x$, or the 2nd Taylor polynomial with center $x_0 = 0$.

If $x > 0$, then $R_2(x)$ can become quite large, whereas its magnitude is much smaller if $x < 0$. Therefore, one method of computing $e^x$ using a Maclaurin polynomial is to use the $n$th Maclaurin polynomial $P_n(x)$ of $e^x$ when $x < 0$, where $n$ is chosen sufficiently large so that $R_n(x)$ is small for the given value of $x$. If $x > 0$, then we instead compute $e^{-x}$ using the $n$th Maclaurin polynomial for $e^{-x}$, which is given by
$$P_n(x) = 1 - x + \frac{x^2}{2} - \frac{x^3}{6} + \cdots + (-1)^n \frac{x^n}{n!},$$
and then obtain an approximation to $e^x$ by taking the reciprocal of our computed value of $e^{-x}$.
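The strategy in the last example translates directly into code. The sketch below is our illustration (the function name and the cutoff $n = 20$ are arbitrary choices): it always sums the series at $-|x|$, where the remainder stays well behaved, and takes a reciprocal when $x > 0$.

```python
import math

def exp_maclaurin(x, n=20):
    """Approximate e^x by summing the Maclaurin series of e^(-|x|)
    and taking a reciprocal when x > 0 (illustrative sketch)."""
    t = -abs(x)
    term, total = 1.0, 1.0           # k = 0 term of the series
    for k in range(1, n + 1):
        term *= t / k                # term is now t^k / k!
        total += term
    return 1.0 / total if x > 0 else total

for x in (-1.0, 0.5, 3.0):
    print(x, exp_maclaurin(x), math.exp(x))   # the two columns agree closely
```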

Example Let $f(x) = x^2$. Then, for any real number $x_0$,
$$f(x) = P_1(x) + R_1(x),$$
where
$$P_1(x) = x_0^2 + 2x_0(x - x_0) = 2x_0 x - x_0^2,$$
and
$$R_1(x) = (x - x_0)^2.$$
Note that the remainder does not include a mystery point $\xi(x)$, since the 2nd derivative of $x^2$ is only a constant. The linear function $P_1(x)$ describes the tangent line to the graph of $f(x)$ at the point $(x_0, f(x_0))$. If $x_0 = 1$, then we have
$$P_1(x) = 2x - 1, \qquad R_1(x) = (x - 1)^2.$$
We can see that near $x = 1$, $P_1(x)$ is a reasonable approximation to $x^2$, since the error in this approximation, given by $R_1(x)$, would be small in this case.

Roundoff Errors and Computer Arithmetic

In computing the solution to any mathematical problem, there are many sources of error that can impair the accuracy of the computed solution. The study of these sources of error is called error analysis, which will be discussed in the next lecture. In this lecture, we will focus on one type of error that occurs in all computation, whether performed by hand or on a computer: roundoff error. This error is due to the fact that in computation, real numbers can only be represented using a finite number of digits. In general, it is not possible to represent real numbers exactly with this limitation, and therefore they must be approximated by real numbers that can be represented using a fixed number of digits, which is called the precision. Furthermore, as we shall see, arithmetic operations applied to numbers that can be represented exactly using a given precision do not necessarily produce a result that can be represented using the same precision. It follows that if a fixed precision is used, then every arithmetic operation introduces error into a computation.

Given that scientific computations can have several sources of error, one would think that it would be foolish to compound the problem by performing arithmetic using fixed precision. As we shall see in this lecture, however, using a fixed precision is far more practical than other options, and, as long as computations are performed carefully, sufficient accuracy can still be achieved.
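Both phenomena are easy to observe in Python, whose floats are IEEE double-precision numbers. The snippet below is our illustration: the decimals 0.1, 0.2, and 0.3 are not machine numbers in base 2, and even a product of two exactly representable numbers can require more precision than is available.

```python
# 0.1, 0.2, and 0.3 have no finite binary expansion, so each is rounded on input:
print(0.1 + 0.2 == 0.3)      # False
print(0.1 + 0.2)             # 0.30000000000000004

# Even exactly representable operands can yield an unrepresentable result:
a = 2.0**27 + 1              # exactly representable (28 significant bits)
exact = (2**27 + 1)**2       # exact integer value: 2^54 + 2^28 + 1 (55 significant bits)
print(a * a == exact)        # False: the product is rounded to 53 bits
```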

Floating-Point Numbers

We now describe a typical system for representing real numbers on a computer.

Definition (Floating-point Number System) Given integers $\beta > 1$, $p \ge 1$, $L$, and $U \ge L$, a floating-point number system $F$ is defined to be the set of all real numbers of the form
$$x = \pm m \beta^E.$$
The number $m$ is the mantissa of $x$, and has the form
$$m = \sum_{j=0}^{p-1} d_j \beta^{-j},$$
where each digit $d_j$, $j = 0, \ldots, p-1$, is an integer satisfying $0 \le d_j \le \beta - 1$. The number $E$ is called the exponent or the characteristic of $x$, and it is an integer satisfying $L \le E \le U$. The integer $p$ is called the precision of $F$, and $\beta$ is called the base of $F$.

The term "floating-point" comes from the fact that as a number $x \in F$ is multiplied by or divided by a power of $\beta$, the mantissa does not change; only the exponent does. As a result, the decimal point shifts, or "floats," to account for the changing exponent.

Nearly all computers use a binary floating-point system, in which $\beta = 2$. In fact, most computers conform to the IEEE standard for floating-point arithmetic. The standard specifies, among other things, how floating-point numbers are to be represented in memory. Two representations are given, one for single precision and one for double precision. Under the standard, single-precision floating-point numbers occupy 4 bytes in memory, with 23 bits used for the mantissa, 8 for the exponent, and one for the sign. IEEE double-precision floating-point numbers occupy 8 bytes in memory, with 52 bits used for the mantissa, 11 for the exponent, and one for the sign.

Example Let $x = 117$. Then, in a floating-point number system with base $\beta = 10$, $x$ is represented as
$$x = (1.17) \times 10^2,$$
where 1.17 is the mantissa and 2 is the exponent. If the base $\beta = 2$, then we have
$$x = (1.110101)_2 \times 2^6,$$
where 1.110101 is the mantissa and 6 is the exponent. The mantissa should be interpreted as a string of binary digits, rather than decimal digits; that is,
$$1.110101 = 1 \cdot 2^0 + 1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 0 \cdot 2^{-3} + 1 \cdot 2^{-4} + 0 \cdot 2^{-5} + 1 \cdot 2^{-6} = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{16} + \frac{1}{64} = \frac{117}{64} = 117 \cdot 2^{-6}.$$
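We can verify this decomposition in Python (an illustration we add; `math.frexp` and the `struct` module are standard library). `math.frexp` returns $x = m \cdot 2^e$ with $m \in [0.5, 1)$, the same representation up to one shift of the binary point, and `struct` exposes the raw IEEE double-precision fields.

```python
import math
import struct

x = 117.0
m, e = math.frexp(x)             # x = m * 2**e with 0.5 <= m < 1
print(m, e)                      # 0.9140625 7, i.e. x = 1.828125 * 2**6

# Unpack the raw IEEE double: 1 sign bit, 11 exponent bits, 52 mantissa bits
bits = struct.unpack(">Q", struct.pack(">d", x))[0]
sign     = bits >> 63
exponent = ((bits >> 52) & 0x7FF) - 1023      # remove the exponent bias
mantissa = bits & ((1 << 52) - 1)
print(sign, exponent, bin(mantissa)[2:].zfill(52)[:8])
# 0 6 11010100 : the stored bits 110101 follow the implicit leading 1,
# matching x = (1.110101)_2 * 2^6
```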

Properties of Floating-Point Systems

A floating-point system $F$ can only represent a finite subset of the real numbers. As such, it is important to know how large in magnitude a number can be and still be represented, at least approximately, by a number in $F$. Similarly, it is important to know how small in magnitude a number can be and still be represented by a nonzero number in $F$; if its magnitude is too small, then it is most accurately represented by zero.

Definition (Underflow, Overflow) Let $F$ be a floating-point number system. The smallest positive number in $F$ is called the underflow level, and it has the value
$$\mathrm{UFL} = m_{\min} \beta^L,$$
where $L$ is the smallest valid exponent and $m_{\min}$ is the smallest mantissa. The largest positive number in $F$ is called the overflow level, and it has the value
$$\mathrm{OFL} = \beta^{U+1}(1 - \beta^{-p}).$$

The value of $m_{\min}$ depends on whether floating-point numbers are normalized in $F$; this point will be discussed later. The overflow level is the value obtained by setting each digit in the mantissa to $\beta - 1$ and using the largest possible value, $U$, for the exponent.

It is important to note that the real numbers that can be represented in $F$ are not equally spaced along the real line. Numbers having the same exponent are equally spaced, and the spacing between numbers in $F$ decreases as their magnitude decreases.

Normalization

It is common to normalize floating-point numbers by specifying that the leading digit $d_0$ of the mantissa be nonzero. In a binary system, with $\beta = 2$, this implies that the leading digit is equal to 1, and therefore need not be stored. Therefore, in the IEEE floating-point standard, $p = 24$ for single precision and $p = 53$ for double precision, even though only 23 and 52 bits, respectively, are used to store mantissas. In addition to the benefit of gaining one additional bit of precision, normalization also ensures that each floating-point number has a unique representation.

One drawback of normalization is that fewer numbers near zero can be represented exactly than if normalization is not used.
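These formulas are easy to check for IEEE double precision ($\beta = 2$, $p = 53$, $L = -1022$, $U = 1023$) against Python's `sys.float_info`. The snippet below is our illustration; its last line previews the smaller underflow level made possible by the gradual-underflow scheme described next.

```python
import sys
import math

beta, p = 2, 53          # IEEE double: base 2, precision 53 (52 stored bits + hidden bit)
L, U = -1022, 1023       # smallest and largest valid exponents

UFL = 2.0**L                              # smallest positive normalized number
OFL = (beta**p - 1) * beta**(U + 1 - p)   # beta^(U+1) * (1 - beta^-p), as an exact integer
print(UFL == sys.float_info.min)          # True
print(float(OFL) == sys.float_info.max)   # True

# Numbers with the same exponent are equally spaced; the spacing
# grows with magnitude (math.ulp gives the gap to the next float):
print(math.ulp(1.0), math.ulp(2.0**30))   # 2.220446049250313e-16  2.384185791015625e-07

# With gradual underflow (described next), UFL drops to beta^(L-p+1):
print(2.0**(L - p + 1))                   # 5e-324, the smallest subnormal double
```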

Therefore, the IEEE standard provides for a practice called gradual underflow, in which the leading digit of the mantissa is allowed to be zero when the exponent is equal to $L$, thus allowing smaller values of the mantissa. In such a system, the number UFL is equal to $\beta^{L-p+1}$, whereas in a normalized system, $\mathrm{UFL} = \beta^L$.

Rounding

A number that can be represented exactly in a floating-point system is called a machine number. Since only finitely many real numbers are machine numbers, it is necessary to determine how non-machine numbers are to be approximated by machine numbers. The process of choosing a machine number to approximate a non-machine number is called rounding, and the error introduced by such an approximation is called roundoff error. Given a real number $x$, the machine number obtained by rounding $x$ is denoted by $\mathrm{fl}(x)$. In most floating-point systems, rounding is achieved by one of two strategies:

- Chopping, or rounding toward zero, is the simplest strategy, in which the base-$\beta$ expansion of a number is truncated after the first $p$ digits. As a result, $\mathrm{fl}(x)$ is the unique machine number between 0 and $x$ that is nearest to $x$.
- Rounding to nearest, or rounding to even, sets $\mathrm{fl}(x)$ to be the machine number that is closest to $x$ in absolute value; if two numbers satisfy this property, then $\mathrm{fl}(x)$ is set to be the choice whose last digit is even.

Example Suppose we are using a floating-point system with $\beta = 10$ (decimal), with $p = 4$ significant digits. Then, if we use chopping, or rounding toward zero, we have $\mathrm{fl}(2/3) = 0.6666$, whereas if we use rounding to nearest, then we have $\mathrm{fl}(2/3) = 0.6667$.

If $p = 2$ and we use rounding to nearest, then $\mathrm{fl}(89.5) = 90$, because we choose to round so that the last digit is even when the original number is equally distant from two machine numbers. This is why the strategy of rounding to nearest is sometimes called rounding to even.
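Python's `decimal` module implements exactly this kind of base-10, fixed-precision system with selectable rounding modes, so the example can be reproduced directly (our illustration; `ROUND_DOWN` is chopping and `ROUND_HALF_EVEN` is rounding to nearest with ties to even).

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_EVEN

chop = Context(prec=4, rounding=ROUND_DOWN)        # chopping (round toward zero)
near = Context(prec=4, rounding=ROUND_HALF_EVEN)   # round to nearest, ties to even

print(chop.divide(Decimal(2), Decimal(3)))   # 0.6666
print(near.divide(Decimal(2), Decimal(3)))   # 0.6667

# With p = 2, ties go to the neighbor whose last digit is even:
two = Context(prec=2, rounding=ROUND_HALF_EVEN)
print(two.plus(Decimal("89.5")))             # 9.0E+1, i.e. 90
print(two.plus(Decimal("88.5")))             # 88
```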