CS321. Introduction to Numerical Methods


CS321 Introduction to Numerical Methods
Lecture 1: Number Representations and Errors
Professor Jun Zhang, Department of Computer Science, University of Kentucky, Lexington, KY 40506-0633
August 5, 2017

Number in Different Bases
Humans use base 10 because of the number of fingers. Computers use base 2 because of the on/off switch. Base 8 can be used to facilitate conversion between base 2 and base 10.
Base 10 expansion:
$76321 = 1 \times 10^0 + 2 \times 10^1 + 3 \times 10^2 + 6 \times 10^3 + 7 \times 10^4 = 1 + 20 + 300 + 6000 + 70000$
The general formula for a base 10 integer is
$a_n a_{n-1} \cdots a_2 a_1 a_0 = a_0 \times 10^0 + a_1 \times 10^1 + a_2 \times 10^2 + \cdots + a_{n-1} \times 10^{n-1} + a_n \times 10^n$

Binary Numbers

Number System

Definitions
There are some definitions for representing a real number in base 10 and in base 2.

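Nonintegers and Fractions
An example of a fractional number:
$0.6321 = 6 \times 10^{-1} + 3 \times 10^{-2} + 2 \times 10^{-3} + 1 \times 10^{-4} = \frac{6}{10} + \frac{3}{100} + \frac{2}{1000} + \frac{1}{10000}$
The general formula for a decimal fraction:
$0.b_1 b_2 b_3 \ldots = b_1 \times 10^{-1} + b_2 \times 10^{-2} + b_3 \times 10^{-3} + \cdots$
Decimal fractional numbers can be repeating or nonrepeating (rational or irrational numbers).
$(a_n a_{n-1} \cdots a_1 a_0 . b_1 b_2 b_3 \ldots)_{10} = \sum_{k=0}^{n} a_k 10^k + \sum_{k=1}^{\infty} b_k 10^{-k}$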

Number Conversion

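Base β Numbers
Convert an integer number from the base 8 system to the base 10 system:
$(7632)_8 = 2 \times 8^0 + 3 \times 8^1 + 6 \times 8^2 + 7 \times 8^3 = 2 + 8(3 + 8(6 + 8(7))) = (3994)_{10}$
Convert a fractional number between 0 and 1:
$(0.763)_8 = 7 \times 8^{-1} + 6 \times 8^{-2} + 3 \times 8^{-3} = 8^{-3}(3 + 8(6 + 8(7))) = \frac{499}{512} = (0.9746\ldots)_{10}$
General formula:
$(a_n a_{n-1} \cdots a_1 a_0 . b_1 b_2 b_3 \ldots)_\beta = \sum_{k=0}^{n} a_k \beta^k + \sum_{k=1}^{\infty} b_k \beta^{-k}$
Convert the integer and fractional parts separately.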
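The nested evaluation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `digits_to_value` and the digit-list interface are assumptions, and exact rational arithmetic is used so that no rounding hides the result.

```python
from fractions import Fraction

def digits_to_value(int_digits, frac_digits, base):
    """Evaluate (a_n ... a_0 . b_1 b_2 ...)_base exactly, using the nested form."""
    # Integer part: a_0 + base*(a_1 + base*(a_2 + ...)), digits given most significant first.
    integer = 0
    for d in int_digits:
        integer = integer * base + d
    # Fractional part: (b_1 + (b_2 + b_3/base)/base)/base, processed from the last digit.
    fraction = Fraction(0)
    for d in reversed(frac_digits):
        fraction = (fraction + d) / base
    return integer + fraction

print(digits_to_value([7, 6, 3, 2], [], 8))   # the base 8 integer from the slide
print(digits_to_value([], [7, 6, 3], 8))      # the base 8 fraction from the slide
```

The integer and fractional parts are handled by separate loops, mirroring the advice to convert them separately.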

Conversion between Systems
From system α to system β with α < β:
1.) express $(N)_\alpha$ in nested form using powers of α
2.) replace each digit by the corresponding base β number
3.) carry out the indicated arithmetic in base β
Two examples were given on the previous slide. Note that one needs to convert the integer and the fractional parts separately. The method can in theory be used to convert a number between any two bases, but the third step is not easy for humans if the base is not 10.

Division Algorithm
Use the remainder-quotient split method to convert an integer, best if α > β:
1.) divide the representation by β;
2.) take the remainder as the next digit in the base β representation;
3.) repeat the procedure on the quotient.
This procedure is easier to carry out by hand. Note that the first digit obtained is the one closest to the radix point.
$(N)_\alpha = (c_m c_{m-1} \cdots c_1 c_0)_\beta = c_0 + \beta(c_1 + \beta(c_2 + \cdots + \beta(c_m) \cdots))$

Repeated Division
To convert a base 10 integer to a base β < 10 number, we can use the repeated division algorithm:
2576 / 8 = 322, remainder 0
322 / 8 = 40, remainder 2
40 / 8 = 5, remainder 0
5 / 8 = 0, remainder 5
So we have $(2576.)_{10} = (5020.)_8$. Note that the first digit you obtain is the first digit next to the radix point.
It can be further converted to a binary number using the binary-octal table (in a later slide): $(5020.)_8 = (101\ 000\ 010\ 000.)_2$
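A minimal Python sketch of the repeated division procedure follows. The helper name `int_to_base` is hypothetical, and the function assumes a nonnegative integer input.

```python
def int_to_base(n, base):
    """Return the base-`base` digits of a nonnegative integer n, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)   # quotient feeds the next step
        digits.append(remainder)         # the first remainder sits next to the radix point
    return digits[::-1]                  # reverse so the last remainder leads

print(int_to_base(2576, 8))   # digits of the slide's example in base 8
```

Because the remainders come out least significant first, the list is reversed once at the end instead of inserting at the front on every step.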

Multiplication Algorithm
Use the integer-fraction split process for converting a fractional part
$x = \sum_{k=1}^{\infty} c_k \beta^{-k} = (0.c_1 c_2 c_3 \ldots)_\beta$
1.) multiply the (fractional) number by β;
2.) take the integer part as the first (next) digit;
3.) repeat the process on the remaining fractional part.
Again, the first digit obtained is the one closest to the radix point. A terminating fractional number may become nonterminating in a different base system, and vice versa.

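Integer Fraction Split
To convert a fractional base 10 number to a base β < 10 number, we can use the integer-fraction split algorithm (here β = 2):
0.372 × 2 = 0.744, integer part 0
0.744 × 2 = 1.488, integer part 1
0.488 × 2 = 0.976, integer part 0
0.976 × 2 = 1.952, integer part 1
0.952 × 2 = 1.904, integer part 1
0.904 × 2 = 1.808, integer part 1
So we have $(0.372)_{10} = (0.010111\ldots)_2$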
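The integer-fraction split can be sketched as below. The name `frac_to_base` is an assumption, and the input is passed as an exact `Fraction` so that the decimal value is not already rounded to binary before the conversion starts.

```python
from fractions import Fraction

def frac_to_base(x, base, ndigits):
    """Return the first `ndigits` base-`base` digits of a fraction 0 <= x < 1."""
    digits = []
    for _ in range(ndigits):
        x *= base
        d = int(x)        # integer part is the next digit
        digits.append(d)
        x -= d            # keep only the fractional part
    return digits

print(frac_to_base(Fraction(372, 1000), 2, 6))   # first six binary digits of 0.372
print(frac_to_base(Fraction(1, 10), 2, 8))       # 0.1 does not terminate in base 2
```

The second call illustrates the remark that a terminating decimal fraction may become nonterminating in another base.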

Use Intermediate Base
We can convert a base 10 number to a base 8 (octal) number, then to a base 2 (binary) number, and vice versa.
Binary-octal table: groups of three binary digits can be translated to an octal digit using the binary-octal table.
$(110\ 101\ 001.011\ 100\ 111)_2 = (651.347)_8 = (?)_{10}$
$(57.31)_{10} = (?)_8 = (?)_2$

Base 16 Numbers
Certain computers whose word length is a multiple of 4 use the hexadecimal system, which needs A, B, C, D, E, and F to represent 10, 11, 12, 13, 14, and 15.
Binary-hexadecimal table:
$(010\ 111\ 100.110\ 100\ 101\ 111)_2 = (1011\ 1100.1101\ 0010\ 1111)_2 = (BC.D2F)_{16}$
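Grouping bits around the radix point, as the binary-octal and binary-hexadecimal tables do, might be sketched as follows. The function `regroup_binary` and its string interface are assumptions made for illustration.

```python
def regroup_binary(int_bits, frac_bits, group):
    """Rewrite a binary number in base 2**group (group=3: octal, group=4: hexadecimal)."""
    symbols = "0123456789ABCDEF"

    def convert(bits):
        # Translate each block of `group` bits into one digit of the larger base.
        return "".join(symbols[int(bits[i:i + group], 2)]
                       for i in range(0, len(bits), group))

    # Pad the integer part on the left and the fractional part on the right,
    # so that the groups line up with the radix point.
    int_bits = int_bits.zfill(-(-len(int_bits) // group) * group)
    frac_bits = frac_bits.ljust(-(-len(frac_bits) // group) * group, "0")
    return (convert(int_bits).lstrip("0") or "0") + "." + convert(frac_bits)

print(regroup_binary("110101001", "011100111", 3))      # octal grouping
print(regroup_binary("010111100", "110100101111", 4))   # hexadecimal grouping
```

Padding direction matters: zeros added on the wrong side of either part would change the value.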

Definitions
There are some definitions for representing a real number in base 10 and in base 2.

Normalized Scientific Notation
Write a number in the form of normalized scientific notation:
$2718.359 = 0.2718359 \times 10^4$
A normalized floating point number has the form
$x = \pm 0.d_1 d_2 d_3 \ldots \times 10^n$, where $d_1 \neq 0$ and n is an integer.
In a simple notation,
$x = \pm r \times 10^n$, $\frac{1}{10} \le r < 1$
r is called the normalized mantissa and n is the exponent. Also, for binary representation,
$x = \pm q \times 2^m$, $\frac{1}{2} \le q < 1$
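As a rough illustration of both normalizations, the sketch below uses the standard library: `math.frexp` returns exactly the binary pair (q, m) with 1/2 ≤ |q| < 1, and `decimal.Decimal` gives the decimal pair (r, n) without binary rounding. The helper names and the sample inputs are assumptions, not values taken from the slide.

```python
import math
from decimal import Decimal

def normalize_binary(x):
    """Return (q, m) with x = q * 2**m and 1/2 <= |q| < 1, for nonzero x."""
    return math.frexp(x)

def normalize_decimal(text):
    """Return (r, n) with x = r * 10**n and 1/10 <= |r| < 1, for a nonzero decimal string."""
    d = Decimal(text)
    n = d.adjusted() + 1          # adjusted() is the exponent of the leading digit
    return d.scaleb(-n), n        # shift the decimal point n places

print(normalize_binary(52.234375))
print(normalize_decimal("732.5051"))
print(normalize_decimal("-0.005612"))
```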

Machine Numbers
The real numbers that can be represented exactly in a computer are called the machine numbers for this computer. Most real numbers are not machine numbers.
If a computer has word length of the form $0.d_1 d_2 d_3 d_4$, then 0.1011 is a machine number, but 0.10101 is not.
Machine numbers are machine dependent.
The use of normalized floating point numbers creates a phenomenon called the hole at zero: a bunch of numbers close to 0 are not representable. This is mainly caused by the underflow problem, i.e., small numbers close to zero will be treated as zero in a computer. (0.0100 cannot be stored in the above mentioned computer, why?)

A 32 bit Machine
Single precision floating point numbers with 32 bit word length. How to store a floating point number $x = \pm q \times 2^m$ with 32 bits?
Sign of q needs 1 bit
Integer |m| needs 8 bits
Number q needs 23 bits
Single precision IEEE standard floating point format:
$(-1)^s \times 2^{c-127} \times (1.f)_2$, $0 < c < 255$

32 bit Representation
The exponent number c is what is actually stored, so m = c − 127. This is an excess 127 code (it makes sure only positive numbers are stored, 0 < c < 255).
With normalized representation, the first bit is always 1 and need not be stored. The mantissa actually contains 24 binary digits with a hidden bit.
With a mantissa of 23 bits, a machine can have about six significant decimal digits of accuracy, since $2^{-23} \approx 1.2 \times 10^{-7}$.
The smallest positive number ε such that $1 + \varepsilon \neq 1$ is called the machine epsilon or unit roundoff error. An academic and a non-academic definition are in use, which give $2^{-23}$ or $2^{-24}$ for single precision.
For single precision, $2^{-23} \approx 1.2 \times 10^{-7}$ and $2^{-24} \approx 5.96 \times 10^{-8}$.
For double precision, $2^{-52} \approx 2.2 \times 10^{-16}$ and $2^{-53} \approx 1.11 \times 10^{-16}$.
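One way to observe the machine epsilon experimentally is the halving loop sketched below. It assumes IEEE arithmetic with round to nearest; the helper `to_single` (an illustrative name) emulates single precision by packing a Python double into 4 bytes with the standard `struct` module.

```python
import struct
import sys

def to_single(x):
    """Round a Python float (a double) to the nearest IEEE single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def machine_epsilon(fl):
    """Halve eps until fl(1 + eps/2) can no longer be told apart from 1."""
    eps = 1.0
    while fl(1.0 + eps / 2) != 1.0:
        eps /= 2
    return eps

eps_double = machine_epsilon(float)       # float() leaves a double unchanged
eps_single = machine_epsilon(to_single)

print(eps_single, eps_double, sys.float_info.epsilon)
```

Under these assumptions the loop stops at the gap between 1 and the next larger machine number, which is the larger of the two values quoted for each precision.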

32 bit Representation
IEEE 32 bit single precision

64 bit Representation
IEEE 32 bit single precision and 64 bit double precision
For each integer, single precision allocates 31 bits, double precision allocates 61 bits

More About the Exponent
The exponent c is stored, with $0 < c < 255 = (11\ 111\ 111)_2$.
The actual range of the exponent is $-126 \le c - 127 \le 127$.
c = 0 is reserved for the special number ±0, and c = 255 is for ±∞.
This strategy avoids the need to handle a sign for the exponent.
The largest number that can be stored is about $3.4 \times 10^{38}$.
The smallest positive normalized number that can be stored is about $1.2 \times 10^{-38}$.
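The two extreme values can be checked with a short sketch. The bit patterns used below (stored exponent 254 with an all-ones fraction, and stored exponent 1 with a zero fraction) follow from the format described above; the helper name `bits_to_single` is an assumption.

```python
import struct

def bits_to_single(hex_word):
    """Interpret 8 hexadecimal digits as an IEEE single precision number."""
    return struct.unpack(">f", bytes.fromhex(hex_word))[0]

largest = (2 - 2.0 ** -23) * 2.0 ** 127   # c = 254, fraction bits all ones
smallest = 2.0 ** -126                    # c = 1, fraction bits all zero

print(largest, bits_to_single("7F7FFFFF"))
print(smallest, bits_to_single("00800000"))
print(bits_to_single("7F800000"), bits_to_single("80000000"))   # c = 255 and c = 0
```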

Patriot Missile Defense System

Representation Procedure
How to represent a real number x?
1.) if x is zero, store it as a full word of zero bits (with a possible sign bit)
2.) for nonzero x, first consider the sign bit and then consider |x|
3.) convert both the integer and fractional parts of |x| from decimal to octal, then to binary
4.) one-plus normalize $(|x|)_2$ by shifting the binary point
5.) find the 24 bit one-plus normalized mantissa
6.) find the exponent of 2 by setting it equal to c − 127 and determine c
7.) denote the 32 bit representation as 8 hexadecimal digits

Summary and Examples
A 32 bit single precision pattern $b_1 b_2 b_3 \cdots b_9 b_{10} b_{11} \cdots b_{32}$ can be interpreted as the real number
$(-1)^{b_1} \times 2^{(b_2 b_3 \cdots b_9)_2} \times 2^{-127} \times (1.b_{10} b_{11} \cdots b_{32})_2$
Examples
Find the 32 bit representation of −52.234375.
Integer part: $(52.)_{10} = (64.)_8 = (110\ 100.)_2$
Fractional part: $(.234375)_{10} = (.17)_8 = (.001\ 111)_2$

Examples cont.
$(52.234375)_{10} = (110\ 100.001\ 111)_2 = (1.101\ 000\ 011\ 110)_2 \times 2^5$
The exponent is $(5)_{10}$; we need to write it as c − 127 = 5, so c = 132.
The stored exponent is $(132)_{10} = (204)_8 = (10\ 000\ 100)_2$
The representation of −52.234375 is
$(1\ 10\ 000\ 100\ 101\ 000\ 011\ 110\ 000\ 000\ 000\ 00)_2 = (1100\ 0010\ 0101\ 0000\ 1111\ 0000\ 0000\ 0000)_2 = (C250F000)_{16}$
What number has the representation $(45DE4000)_{16}$?
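The worked example can be cross-checked with Python's standard `struct` module, which packs a float into the IEEE single precision layout. The function names below are illustrative, and `decode_single` only handles normalized numbers (0 < c < 255).

```python
import struct

def single_to_hex(x):
    """Return the 32 bit IEEE single precision pattern of x as 8 hexadecimal digits."""
    return struct.pack(">f", x).hex().upper()

def decode_single(hex_word):
    """Split a pattern into sign s, stored exponent c and fraction f, then rebuild the value."""
    word = int(hex_word, 16)
    s = word >> 31                 # bit b_1
    c = (word >> 23) & 0xFF        # bits b_2 ... b_9
    f = word & 0x7FFFFF            # bits b_10 ... b_32
    value = (-1) ** s * 2.0 ** (c - 127) * (1 + f / 2 ** 23)
    return s, c, f, value

print(single_to_hex(-52.234375))
print(decode_single("C250F000"))
```

Calling `decode_single("45DE4000")` answers the closing question on the slide in the same way.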

Errors in Representing Numbers
Non-machine numbers are represented by a nearest machine number in the computer. Additional digits are either truncated (chopped) or correctly rounded, which produces the roundoff error.
For a 32 bit single precision machine with 23 bits for the mantissa, the relative error is at most $2^{-23}$ with truncation and at most $2^{-24}$ with correct rounding. The unit roundoff error is $2^{-24}$.
For a computer with a 4 binary digit word length, the number 0.11101011 will be stored as 0.1110 with rounding off (truncation), and as 0.1111 with correct rounding.
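The 4 binary digit example can be reproduced with exact fractions. This is a sketch for nonnegative inputs only; `chop` and `round_nearest` are hypothetical helper names, and ties in `round_nearest` are rounded up for simplicity.

```python
from fractions import Fraction

def chop(x, bits):
    """Keep the first `bits` binary digits after the point and drop the rest."""
    scale = 2 ** bits
    return Fraction(int(x * scale), scale)

def round_nearest(x, bits):
    """Round to the nearest number with `bits` binary digits after the point."""
    scale = 2 ** bits
    return Fraction(int(x * scale + Fraction(1, 2)), scale)

x = Fraction(0b11101011, 2 ** 8)             # (0.11101011)_2
print(chop(x, 4), round_nearest(x, 4))       # compare with 0.1110 and 0.1111
print(x - chop(x, 4), round_nearest(x, 4) - x)
```

The second line of output shows that the rounded value lies closer to x than the chopped one.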

Arithmetic Operations
Most computers use double length arithmetic operations. Numbers are extended to double length, arithmetic operations are performed, and the result is rounded to a single length number.
$0.1231 + 0.3456 \times 10^{-3} = 0.12310000 + 0.00034560 = 0.12344560$
On a machine with 4 digit word length, this number needs to be rounded to 0.1234, resulting in an error of 0.00004560.

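Notation fl(x)
Use fl(x) to denote the floating point machine number corresponding to a real number x:
$\mathrm{fl}(x) = x(1 + \delta)$, $|\delta| \le \varepsilon$
For the 32 bit machine (with correct rounding), we have $|\delta| \le 2^{-24}$.
It is easy to see that if $0 < \varepsilon < 2^{-24}$, then $\mathrm{fl}(1 + \varepsilon) = 1$. Note that such numbers are smaller than the unit roundoff: written in binary, they have more than 23 zeros after the binary point.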
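The relation fl(x) = x(1 + δ) can be checked numerically. In the sketch below, a Python double plays the role of the "real" number x and `fl_single` (an assumed helper built on the standard `struct` module) rounds it to single precision; δ is computed exactly with `Fraction`.

```python
import struct
from fractions import Fraction

def fl_single(x):
    """Round a Python float (a double) to the nearest IEEE single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def relative_error(x):
    """Return delta with fl(x) = x * (1 + delta), computed exactly, for nonzero x."""
    exact = Fraction(x)
    return (Fraction(fl_single(x)) - exact) / exact

delta = relative_error(0.1)
print(float(delta), float(abs(delta)) <= 2.0 ** -24)
print(fl_single(1 + 2.0 ** -25) == 1.0)    # a perturbation below 2**-24 is lost
```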

Inverse Error Analysis
More generally known as backward error analysis.
Denote by $\odot$ one of the basic arithmetic operations; then
$\mathrm{fl}(x \odot y) = (x \odot y)(1 + \delta)$, $|\delta| \le 2^{-24}$
Two interpretations, for $z = x + y$:
$\mathrm{fl}(z) = (x + y)(1 + \delta)$: perturbation of the sum
$\mathrm{fl}(z) = x(1 + \delta) + y(1 + \delta)$: sum of perturbations
These correspond to direct error analysis and reverse error analysis, also called forward error analysis and backward error analysis.

Loss of Significance
Subtraction of two nearly equal numbers may result in loss of significant digits on a finite precision machine.
Cure: reprogram or use higher precision arithmetic, which may be costly.
Use a 4 digit word length computer:
exact: $0.11111111 - 0.11101010 = 0.00010101 \approx 0.1010 \times 10^{-3}$
machine: $0.1111 - 0.1110 = 0.00010000 = 0.1000 \times 10^{-3}$
Loss of Precision Theorem
Let x and y be normalized floating point machine numbers with x > y > 0. If $2^{-p} \le 1 - y/x \le 2^{-q}$ for some positive integers p and q, then at most p and at least q significant binary digits are lost in the subtraction x − y.

Loss of Significance
Write $x = r \times 2^n$ and $y = s \times 2^m$, where $\frac{1}{2} \le r, s < 1$. Since y < x, the computer has to shift y before carrying out the subtraction, so
$y = (s \times 2^{m-n}) \times 2^n$ and $x - y = (r - s \times 2^{m-n}) \times 2^n$
Look at the mantissa:
$r - s \times 2^{m-n} = r\left(1 - \frac{s \times 2^m}{r \times 2^n}\right) = r\left(1 - \frac{y}{x}\right) < 2^{-q}$
Hence, to normalize the representation of x − y, a shift of at least q bits to the left is necessary. Thus at least q (spurious) zeros are supplied on the right hand side of the mantissa.

An Example from Book
x = 37.593621 and y = 37.584216
$1 - \frac{y}{x} = 0.0002501754$, which is between $2^{-12} = 0.000244$ and $2^{-11} = 0.000488$. At least 11 and at most 12 binary digits are lost when computing x − y.
Exactly how many digits are lost depends on the computer.
$x = 0.37593621 \times 10^2$ and $y = 0.37584216 \times 10^2$
Suppose a machine has 5 decimal digits of accuracy. We have
$\tilde{x} = 0.37594 \times 10^2$ and $\tilde{y} = 0.37584 \times 10^2$
The machine computes
$\tilde{x} - \tilde{y} = 0.00010 \times 10^2 = 0.10000 \times 10^{-1}$

An Example Cont.
Exact computation: $x - y = 0.009405 = 0.9405 \times 10^{-2}$
The relative error is
$\frac{|(\tilde{x} - \tilde{y}) - (x - y)|}{|x - y|} = \frac{0.595 \times 10^{-3}}{0.9405 \times 10^{-2}} \approx 0.0633$
This relative error is considered to be large since the machine has 5 decimal digits of accuracy.
Note that the double length computing operations are carried out after the numbers are stored.
If we use a machine with at least 8 decimal digits of accuracy, we can have the exact value as $0.94050000 \times 10^{-2}$.
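The example can be replayed with Python's `decimal` module, which lets a context with 5 significant digits stand in for the 5 digit machine. This is a sketch under that assumption; it takes the textbook operands x = 37.593621 and y = 37.584216, and the variable names are illustrative.

```python
import math
from decimal import Decimal, localcontext

x, y = Decimal("37.593621"), Decimal("37.584216")

# Bounds from the Loss of Precision Theorem: 2**-p <= 1 - y/x <= 2**-q.
gap = 1 - float(y) / float(x)
q = math.floor(-math.log2(gap))   # at least q binary digits are lost
p = math.ceil(-math.log2(gap))    # at most p binary digits are lost

with localcontext() as ctx:
    ctx.prec = 5                  # emulate 5 decimal digits of accuracy
    x_machine, y_machine = +x, +y # unary plus rounds to the context precision
    computed = x_machine - y_machine

exact = x - y                     # the default 28 digit context is exact here
rel_err = abs(computed - exact) / exact

print(p, q, x_machine, y_machine, computed, exact, float(rel_err))
```

The operands are rounded before the subtraction takes place, so higher precision in the subtraction itself cannot repair the result.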

Avoiding Loss of Significance
Analyze possible loss of significance, reschedule the computation to avoid subtraction of two nearly equal numbers, or modify the algorithm.
Example. Evaluating $f(x) = \sqrt{x^2 + 1} - 1$ at $x \approx 0$
Using 5 decimal digit arithmetic for $x = 10^{-3}$, we have f(x) = 0.
Rationalizing the numerator, we have
$f(x) = \frac{x^2}{\sqrt{x^2 + 1} + 1}$
Computing $\sqrt{(10^{-3})^2 + 1} + 1 = 0.2 \times 10^1$, we get $f(x) = 0.5 \times 10^{-6}$.
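Both forms of f can be compared under 5 digit decimal arithmetic using the `decimal` module. The function names `f_naive` and `f_stable` are assumptions introduced for this sketch.

```python
from decimal import Decimal, localcontext

def f_naive(x):
    """sqrt(x^2 + 1) - 1, which subtracts two nearly equal numbers for small x."""
    return (x * x + 1).sqrt() - 1

def f_stable(x):
    """Rationalized form x^2 / (sqrt(x^2 + 1) + 1), with no cancellation."""
    return x * x / ((x * x + 1).sqrt() + 1)

with localcontext() as ctx:
    ctx.prec = 5                      # 5 decimal digit arithmetic
    x = Decimal("1E-3")
    naive, stable = f_naive(x), f_stable(x)

print(naive, stable)
```

With only 5 digits, x² + 1 already rounds to 1, so the naive form returns zero while the rationalized form keeps the leading digits.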

More Examples
Evaluating $f(x) = x - \sin x$ at $x \approx 0$
Using the Taylor series for sin x,
$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$
Then
$f(x) = \frac{x^3}{3!} - \frac{x^5}{5!} + \frac{x^7}{7!} - \cdots$
Compute at x = 0.1 with four decimal digit arithmetic: $\sin(0.1) = 0.9983 \times 10^{-1}$, so $x - \sin x = 0.17 \times 10^{-3}$. But $x^3/3! = 0.1667 \times 10^{-3}$.
This strategy is not good for large x. For x = π, x − sin π = π, but $x^3/3! = 5.1677$.
Range reduction for periodic functions:
$\sin(12532.14) = \sin(3.47 + 1994 \times 2\pi) = \sin(3.47)$
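A short sketch of the series evaluation and of range reduction is given below. The function names, the default of four series terms, and the restriction of `reduce_angle` to nonnegative arguments are all assumptions made for illustration.

```python
import math

def x_minus_sin_series(x, terms=4):
    """x - sin x = x^3/3! - x^5/5! + x^7/7! - ..., intended for small |x|."""
    total, term = 0.0, x ** 3 / 6.0
    for k in range(terms):
        total += term
        n = 2 * k + 3                           # the current term is x^n / n!
        term *= -x * x / ((n + 1) * (n + 2))    # step to the next term of the series
    return total

def reduce_angle(x):
    """Return (k, r) with x = r + 2*pi*k and 0 <= r < 2*pi, for x >= 0."""
    k = math.floor(x / (2 * math.pi))
    return k, x - 2 * math.pi * k

print(x_minus_sin_series(0.1), 0.1 - math.sin(0.1))
print(reduce_angle(12532.14))
```

In double precision the direct difference at x = 0.1 is still accurate to many digits; the cancellation becomes visible when only a few digits are carried, as in the four digit computation on the slide.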