CS321 Introduction To Numerical Methods


CS321 Introduction To Numerical Methods

Fuhua (Frank) Cheng
Department of Computer Science
University of Kentucky
Lexington, KY

Table of Contents

1 Errors and Number Representations
  1.1 Error Types
  1.2 Review of Taylor Series
  1.3 Numbers on Computers
    1.3.1 Representation of Numbers
    1.3.2 Conversion
    1.3.3 Floating-Point Number System
    1.3.4 Arithmetic Operations
    1.3.5 Round-off Errors
    1.3.6 Control of Round-Off Errors
2 Numerical Linear Algebra
  2.1 System of Linear Equations
    2.1.1 Gaussian Elimination without Pivoting
    2.1.2 Gaussian Elimination with Partial Pivoting
    2.1.3 Errors
    2.1.4 Iterative Improvement
3 Eigenvalues and Eigenvectors
  3.1 The Power Method
  3.2 The QR Algorithm
4 Polynomial Interpolation
  4.1 Lagrange Form
  4.2 Newton Form
5 Solution of Nonlinear Equations
  5.1 Solving a Single Nonlinear Equation
  5.2 The Special Case of Polynomials
6 Numerical Integration
7 Piecewise Polynomial (Spline) Interpolation
  7.1 Hermite Interpolation
  7.2 Spline Interpolation
8 Numerical Differentiation
9 References
Index

1 Errors and Number Representations

1.1 Error Types

Two major errors exist in numerical computations: truncation error and round-off error.

Truncation error is caused by the approximations used in the mathematical formula of the scheme. For instance, when we use finitely many terms of the Taylor series to approximate a function, truncation error occurs.

Round-off error is caused by the finite-precision nature of a computer. The number of digits that can be used to represent a number in a computer is finite. In many cases, we simply cannot get a precise representation of a number in a computer. Consequently, round-off error occurs.

To study these errors, one needs to know how Taylor series are used to derive numerical schemes and how numbers are stored and operated on in a computer.

1.2 Review of Taylor Series

A function f(x) is said to be analytic at x = a if f(x) can be represented by a power series in powers of (x - a) within a radius of convergence D > |x - a| > 0. The power series is given by

    f(x) = f(a) + (x - a) f'(a) + (x - a)^2/2! f''(a) + ... = Σ_{i=0}^∞ (x - a)^i / i! * f^(i)(a).   (1)

The Taylor series of a function about x = 0 is called the Maclaurin series. The following are some familiar examples of Taylor series:

    e^x       = 1 + x + x^2/2! + x^3/3! + ...              = Σ_{i=0}^∞ x^i / i!                       (a = 0, D = ∞)   (2)
    sin x     = x - x^3/3! + x^5/5! - ...                  = Σ_{i=0}^∞ (-1)^i x^(2i+1) / (2i+1)!      (a = 0, D = ∞)   (3)
    cos x     = 1 - x^2/2! + x^4/4! - ...                  = Σ_{i=0}^∞ (-1)^i x^(2i) / (2i)!          (a = 0, D = ∞)   (4)
    ln x      = (x - 1) - (x - 1)^2/2 + (x - 1)^3/3 - ...  = Σ_{i=1}^∞ (-1)^(i-1) (x - 1)^i / i       (a = 1, D = 1)   (5)
    1/(1 - x) = 1 + x + x^2 + x^3 + ...                    = Σ_{i=0}^∞ x^i                            (a = 0, D = 1)   (6)

Usually, one uses finitely many terms of the Taylor series to approximate a function. This causes truncation error. For instance, in (1), if we use the first (n + 1) terms to approximate f(x), we get an error term E_(n+1) defined as follows:

    E_(n+1) = Σ_{i=n+1}^∞ (x - a)^i / i! * f^(i)(a).   (7)

The following theorem is useful in getting an estimate for E_(n+1).

Theorem 1 (Taylor's Theorem). If the function f(x) possesses continuous derivatives of order 0, 1, 2, ..., (n + 1) in an open interval I = (c, d), then for any a in I,

    f(x) = Σ_{i=0}^n (x - a)^i / i! * f^(i)(a) + R_(n+1),   (8)

where x is any value in I and

    R_(n+1) = (x - a)^(n+1) / (n + 1)! * f^(n+1)(ξ)   (9)

for some ξ between a and x. R_(n+1) is called the remainder.

Theorem 1 shows that estimating the truncation error (7) can be achieved by estimating (9), as far as one can get an upper bound on the value of f^(n+1)(ξ). For instance, in (2), if we use the first 5 terms to approximate the value of e^x at x = 1, then

    e ≈ 1 + 1 + 1^2/2! + 1^3/3! + 1^4/4!,

and the truncation error E_5 is

    E_5 = (1)^5/5! * e^ξ   for some ξ between 0 and 1.

By taking an upper bound for e^ξ in the above equation for E_5, say 3, one gets the following estimate for the truncation error:

    E_5 ≤ 3/5! = 0.025.
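The bound E_5 ≤ 3/5! can be checked numerically. The following Python sketch is an added illustration (it is not part of the notes): it sums the first n + 1 terms of (2) at x = 1 and compares the truncation error with the bound 3/(n + 1)! from Theorem 1.

import math

def exp_partial_sum(x, n):
    """Sum of the first n + 1 terms of the Maclaurin series (2) of e**x."""
    total, term = 0.0, 1.0
    for i in range(n + 1):
        total += term
        term *= x / (i + 1)          # next term x**(i+1) / (i+1)!
    return total

x = 1.0
for n in [4, 8, 16]:
    approx = exp_partial_sum(x, n)
    error = abs(math.exp(x) - approx)
    bound = 3.0 / math.factorial(n + 1)   # upper bound from (9) with e**xi <= 3
    print(n, approx, error, bound)

For n = 4 (the first 5 terms) the observed error is about 0.0099, comfortably below the bound 0.025.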

1.3 Numbers Used in Computers

1.3.1 Representation of Numbers

The number system that we are most familiar with is the so-called decimal number system. A positive number in a decimal number system can always be expressed as

    x = (a_n a_(n-1) ... a_1 a_0 . b_1 b_2 ...)_10
      = a_n * 10^n + a_(n-1) * 10^(n-1) + ... + a_1 * 10 + a_0 + b_1 * 10^(-1) + b_2 * 10^(-2) + ...
      = Σ_{i=0}^n a_i * 10^i + Σ_{i=1}^∞ b_i * 10^(-i),   (10)

where the a_i and b_i are integers between 0 and 9. a_n * 10^n + a_(n-1) * 10^(n-1) + ... + a_0 is called the integer part of x and b_1 * 10^(-1) + b_2 * 10^(-2) + ... is called the fractional part of x. The fractional part could contain infinitely many terms. 10 is called the base of the decimal number system. A negative decimal number is a positive decimal number with a negative sign in front of it.

However, it is not necessary to use 10 as a base. Any positive integer different from 1 can be used as a base. Actually, sometimes it is necessary to use positive integers different from 10 as bases. An obvious example is the base 2 number system used in computers. In general, if β > 1 is chosen as a base, then a number x in the base β number system is expressed as

    x = (a_n a_(n-1) ... a_1 a_0 . b_1 b_2 ...)_β
      = a_n * β^n + a_(n-1) * β^(n-1) + ... + a_1 * β + a_0 + b_1 * β^(-1) + b_2 * β^(-2) + ...
      = Σ_{i=0}^n a_i * β^i + Σ_{i=1}^∞ b_i * β^(-i),   (11)

where the a_i and b_i are integers between 0 and β - 1. Popular bases include 2, 8, 10, and 16 (see Table 1). When β is bigger than 10, it is usually convenient to use symbols other than the decimal digits to represent those a_i and b_i which are bigger than 9. For instance, when β = 16, we use A, B, C, D, E, and F to represent 10, 11, 12, 13, 14, and 15, respectively (see Table 1). Examples of numbers in different number systems are shown below:

    (57)_10,  (574)_10,  (75)_8,  (57)_8,  (FC)_16,  (A9C75E)_16.

Table 1. Popular number systems

    Base    Number system    Symbols
    2       binary           0, 1
    8       octal            0, 1, ..., 7
    10      decimal          0, 1, ..., 9
    16      hexadecimal      0, 1, ..., 9, A, B, C, D, E, F

1.3.2 Conversion

Converting a number from one representation to another is necessary when the number is to be processed in a representation different from its input form. The conversion is done for the integer part and the fractional part separately.

To convert the integer part of a decimal number to a base β (≠ 10) representation, simply divide the integer part repeatedly by β and take the remainders. The repeated division stops when the quotient becomes zero. For example, to convert (37)_10 to a binary integer, we divide (37)_10 by 2 repeatedly until the quotient becomes zero. The least significant digit of the binary representation is the first remainder, the second least significant digit is the second remainder, and so on.

Example. (37)_10 = (???)_2

    37 / 2 = 18  remainder 1
    18 / 2 =  9  remainder 0
     9 / 2 =  4  remainder 1
     4 / 2 =  2  remainder 0
     2 / 2 =  1  remainder 0
     1 / 2 =  0  remainder 1

Hence (37)_10 = (100101)_2.

The above process follows from the observation that if (37)_10 is expressed in the following binary form

    (37)_10 = a_n * 2^n + a_(n-1) * 2^(n-1) + ... + a_1 * 2 + a_0,

then by dividing both sides by 2, one gets

    (37)_10 / 2 = a_n * 2^(n-1) + a_(n-1) * 2^(n-2) + ... + a_1 + a_0/2.

q = (a_n * 2^(n-1) + a_(n-1) * 2^(n-2) + ... + a_1) is an integer and a_0/2 is a fraction. Therefore, q has to be the quotient and a_0 has to be the remainder. Similarly, by dividing q by 2 again, one gets (a_n * 2^(n-2) + a_(n-1) * 2^(n-3) + ... + a_2) as the quotient and a_1 as the remainder, and so on.

To convert a base β (≠ 10) integer to a decimal integer, simply evaluate its base β representation in the decimal number system.

Example. (532)_8 = (???)_10

    (532)_8 = 5 * 8^2 + 3 * 8 + 2 = (346)_10.

In general, if (a_n a_(n-1) ... a_1 a_0)_β is given, its decimal equivalent can be evaluated as follows:

    t := a_n;
    for i := n-1 downto 0 do
        t := t * β + a_i;

To convert the fractional part of a decimal number to a base β (≠ 10) representation, simply multiply the fractional part repeatedly by β and take the integer parts. This process stops when the fractional part becomes zero. However, the fractional part might never become zero; in that case, the fractional part of the base β representation has infinitely many terms.

Example. (0.75)_10 = (???)_2

    0.75 * 2 = 1.5   -> digit 1, fractional part 0.5
    0.5  * 2 = 1.0   -> digit 1, fractional part 0

Hence (0.75)_10 = (0.11)_2.

Example. (0.1)_10 = (???)_2

    0.1 * 2 = 0.2 -> 0
    0.2 * 2 = 0.4 -> 0
    0.4 * 2 = 0.8 -> 0
    0.8 * 2 = 1.6 -> 1
    0.6 * 2 = 1.2 -> 1
    0.2 * 2 = 0.4 -> 0
    0.4 * 2 = 0.8 -> 0
    0.8 * 2 = 1.6 -> 1
    0.6 * 2 = 1.2 -> 1
    ...

Hence (0.1)_10 = (0.000110011001100...)_2. (Note that the process repeats itself after the 5th step.)

To convert the fractional part of a base β (≠ 10) representation to a decimal representation, one can use a similar approach, i.e., repeatedly multiply the fractional part by 10 and take the integer parts. The multiplication has to be performed in the base β number system (10 has to be converted to a base β representation first). The integer part of each product is converted to a decimal integer first before being used in the decimal form.

Example. (0.11)_2 = (???)_10

    (0.11)_2 * (1010)_2 = (111.1)_2   -> integer part (111)_2 = 7
    (0.1)_2  * (1010)_2 = (101)_2     -> integer part (101)_2 = 5

Hence (0.11)_2 = (0.75)_10.
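Both procedures are easy to state in code. The following Python sketch is an added illustration (the function names are ours, not from the notes): it converts the integer part by repeated division and the fractional part by repeated multiplication.

def int_to_base(n, beta):
    """Convert a non-negative decimal integer to base beta by repeated division."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, r = divmod(n, beta)
        digits.append(r)              # remainders give the digits, least significant first
    return digits[::-1]

def frac_to_base(f, beta, max_digits=20):
    """Convert a fraction 0 <= f < 1 to base beta by repeated multiplication."""
    digits = []
    while f > 0 and len(digits) < max_digits:
        f *= beta
        d = int(f)                    # the integer part of the product is the next digit
        digits.append(d)
        f -= d
    return digits

print(int_to_base(37, 2))             # [1, 0, 0, 1, 0, 1]  ->  (100101)_2
print(frac_to_base(0.75, 2))          # [1, 1]              ->  (0.11)_2
print(frac_to_base(0.1, 2))           # 0001 1001 1001 ... , truncated at max_digits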

If the fractional part f in a base β representation has finitely many terms, say f = (0.b_1 b_2 ... b_n)_β, then it can also be converted to decimal form using a nested loop as follows:

    s := b_n / β;
    for i := n-1 downto 1 do
        s := (b_i + s) / β;

Example. (0.1101)_2 = (???)_10

    s_4 = 1/2 = 0.5
    s_3 = (0 + 0.5)/2 = 0.25
    s_2 = (1 + 0.25)/2 = 0.625
    s_1 = (1 + 0.625)/2 = 0.8125

Hence (0.1101)_2 = (0.8125)_10.
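In code, the nested loop looks as follows (an added Python illustration); with the digits 1, 1, 0, 1 and β = 2 it reproduces the value 0.8125 obtained above.

def frac_from_base(b, beta):
    """Evaluate (0.b1 b2 ... bn)_beta with the nested (Horner-like) loop."""
    s = b[-1] / beta                  # s_n = b_n / beta
    for bi in reversed(b[:-1]):       # i = n-1 downto 1
        s = (bi + s) / beta
    return s

print(frac_from_base([1, 1, 0, 1], 2))   # 0.8125
print(frac_from_base([1, 1], 2))         # 0.75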

1.3.3 Floating-Point Number System

Numbers used in scientific computation are usually assigned a unit of memory called a word or a double word. Each word consists of a fixed number of digits. For example, a word in a typical PC consists of 32 binary digits (bits) and a double word consists of 64 binary digits. Numbers are usually stored in these digits in two modes: the fixed-point (or integer) number mode and the floating-point (or real) number mode. We will focus on the floating-point number mode only. This is the representation scheme used by most computers (because it generates smaller relative errors, as will be seen later).

A floating-point number system is characterized by 4 parameters (integers):

    β     number base
    t     number of digits
    L, U  lower and upper bounds of the exponent

and is denoted by F(β, t, L, U). Each number in F(β, t, L, U) is called a floating-point number and is expressed as

    ±(0.d_1 d_2 ... d_t)_β * β^e   with L ≤ e ≤ U and 0 ≤ d_i ≤ β - 1 but d_1 ≠ 0.

e is called the exponent and ±(0.d_1 d_2 ... d_t)_β is called the mantissa. The first digit of the mantissa is always non-zero, with one exception: 0 is also a number of the floating-point number system, but it is expressed as +(0.00...0)_β * β^L with t zeros in the mantissa. F(β, t, L, U) contains 2(U - L + 1)(β - 1)β^(t-1) + 1 floating-point numbers. The numbers are not uniformly spaced.

Since a floating-point number system contains only finitely many floating-point numbers, not every decimal number can be precisely represented by a floating-point number. Given a decimal number x, if it cannot be precisely represented by a floating-point number, a floating-point number closest to x is selected to represent x in the floating-point number system. The process of replacing x by its nearest floating-point number is called rounding. This floating-point number is called the floating-point representation of x in F(β, t, L, U) and is denoted by fl(x). The difference between x and fl(x) is called the round-off error:

    round-off error = |x - fl(x)|.

The round-off error is an absolute error. The relative round-off error is defined by

    relative round-off error = |x - fl(x)| / |x|.
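A small system F(β, t, L, U) can be enumerated exactly with rational arithmetic. The following Python sketch is an added illustration (not part of the notes): with β = 2, t = 2, L = -1, U = 1 it produces the 13 numbers of the system discussed in the example that follows, and it finds the nearest floating-point number to a given x.

from itertools import product
from fractions import Fraction

def fp_numbers(beta, t, L, U):
    """All numbers of F(beta, t, L, U), including 0, as exact fractions."""
    nums = {Fraction(0)}
    for e in range(L, U + 1):
        for digits in product(range(beta), repeat=t):
            if digits[0] == 0:
                continue                               # first mantissa digit must be non-zero
            mant = sum(Fraction(d, beta**(i + 1)) for i, d in enumerate(digits))
            value = mant * Fraction(beta)**e
            nums.add(value)
            nums.add(-value)
    return sorted(nums)

F = fp_numbers(2, 2, -1, 1)
print(len(F), [str(v) for v in F])                     # 13 numbers, from -3/2 to 3/2

x = Fraction(24, 100)                                  # x = 0.24
fl_x = min(F, key=lambda v: abs(v - x))                # nearest floating-point number
print(fl_x, abs(x - fl_x), abs(x - fl_x) / x)          # 1/4, 1/100, 1/24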

Example. F(2, 2, -1, 1) contains 13 floating-point numbers:

    ±(0.11)_2 * 2^1    = ±3/2        ±(0.10)_2 * 2^1    = ±1
    ±(0.11)_2 * 2^0    = ±3/4        ±(0.10)_2 * 2^0    = ±1/2
    ±(0.11)_2 * 2^(-1) = ±3/8        ±(0.10)_2 * 2^(-1) = ±1/4
    +(0.00)_2 * 2^(-1) = 0

(Plotted on the real line, these 13 values -3/2, -1, -3/4, -1/2, -3/8, -1/4, 0, 1/4, 3/8, 1/2, 3/4, 1, 3/2 are not uniformly spaced.)

Example. If x = (0.24)_10, then fl(x) in F(2, 2, -1, 1) is

    fl(x) = +(0.10)_2 * 2^(-1) = 1/4,
    round-off error = |x - fl(x)| = 0.01,
    relative round-off error = |x - fl(x)| / |x| = 0.01/0.24 ≈ 0.0417 (about 4%).

Example. If x = (865.54)_10, then fl(x) in F(10, 4, L, 3) is

    fl(x) = +(0.8655)_10 * 10^3,
    round-off error = |x - fl(x)| = 0.04,
    relative round-off error = |x - fl(x)| / |x| ≈ 0.0000462 (about 0.005%).

Usually, floating-point number systems produce (relatively) large round-off errors but smaller relative round-off errors. Smaller relative round-off errors are preferred to smaller round-off errors because the smaller the relative round-off error, the more correct digits there are in fl(x). The relationship between the number of correct digits in fl(x) and the relative round-off error is stated in the following theorem.

Theorem 2. Let fl(x) be the floating-point representation of x in the floating-point number system F(β, t, L, U). If there are m (≤ t) zeros between the decimal point and the first non-zero digit of the relative round-off error of fl(x), i.e.,

    |x - fl(x)| / |x| = (0.r_1 r_2 ...)_β   with r_i = 0 for i = 1, ..., m, but r_(m+1) ≠ 0,

then the number of correct digits in fl(x) (starting from the left-most digit) is at least m.

Proof. Let x = (0.d_1 d_2 ...)_β * β^e and fl(x) = (0.g_1 g_2 ... g_t)_β * β^e. If the first n digits of fl(x) are correct, i.e., d_i = g_i, i = 1, ..., n, then

    |x - fl(x)| = (0.h_1 h_2 ...)_β * β^e

with n zeros between the decimal point and the first non-zero digit (h_1 = ... = h_n = 0). Hence,

    |x - fl(x)| / |x| = ((0.h_1 h_2 ...)_β * β^e) / ((0.d_1 d_2 ...)_β * β^e) = (0.h_1 h_2 ...)_β / (0.d_1 d_2 ...)_β.

Since d_1 ≠ 0, we have β^(-1) ≤ (0.d_1 d_2 ...)_β < 1, and it follows that

    (0.h_1 h_2 ...)_β ≤ |x - fl(x)| / |x| ≤ (0.h_1 h_2 ...)_β * β,

with n zeros between the decimal point and the first non-zero digit in the left term and n - 1 zeros in the right term. That is, if the number of correct digits in fl(x) is n, then the number of zeros between the decimal point and the first non-zero digit of the relative round-off error is either n or n - 1. Therefore, if the number of zeros between the decimal point and the first non-zero digit of the relative round-off error is m, then the number of correct digits in fl(x) is either m or m + 1, i.e., at least m.

1.3.4 Arithmetic Operations

We assume a floating-point number system F(β, t, L, U) is used in our computer. Note that the way a floating-point number is stored in a word defines a unique floating-point number system. For example, a floating-point number in a PC is stored in a word (single precision) as follows:

    sign bit (1 bit) | exponent (8 bits) | mantissa (23 bits)

Therefore, the floating-point number system F(2, 23, -128, 127) is used in a PC. If a double word is used, the second word is usually used as supplementary space for the mantissa. Some computers use a biased exponent. For example, 7 bits can represent unsigned integers in the range 0 to 127 (corresponding to exponents -64 to 63); the number stored is the true exponent plus 64.
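The single-precision layout can be inspected directly. Note that the IEEE 754 format used by current PCs differs in detail from the F(2, 23, -128, 127) model above: the exponent is stored with a bias of 127 and the leading mantissa bit is implicit. The following Python sketch is an added illustration, not part of the notes.

import struct

def float_fields(x):
    """Unpack an IEEE single-precision float into sign, biased exponent, fraction bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF        # stored exponent = true exponent + bias (127)
    fraction = bits & 0x7FFFFF            # 23 stored mantissa (fraction) bits
    return sign, exponent, fraction

print(float_fields(1.0))      # (0, 127, 0)
print(float_fields(-0.75))    # (1, 126, 4194304)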

Given a number x, the process of finding its floating-point representation in F(β, t, L, U) is performed as follows. The number is normalized first, i.e., written in the form

    x = ±(0.a_1 a_2 ... a_t a_(t+1) ...)_β * β^e,   a_1 ≠ 0.

We shall assume that L ≤ e ≤ U. The fraction is then replaced with a t-digit fraction by one of the following methods:

    chopping: digits beyond a_t are dropped;
    rounding: take the first t digits after adding β/2 to the digit a_(t+1) (i.e., add half a unit in the (t+1)-st place and then chop).

If e < L or e > U then we say an underflow or overflow has occurred. An overflow is a fatal error; generally, when an overflow occurs, the execution of the program stops. If an underflow occurs, usually the computer simply sets x to zero without interrupting the program.

Example. If F(β, t, L, U) = F(10, 4, -4, 3) and x = (0.86557)_10, then fl(x) = (0.8655)_10 by chopping and fl(x) = (0.8656)_10 by rounding.

Example. If F(β, t, L, U) = F(10, 4, -4, 3) and x = (0.…)_10, then fl(x) = (0.…)_10 * 10^… by chopping and fl(x) = (0.…)_10 * 10^… by rounding.

Four arithmetic operations, addition (+), subtraction (-), multiplication (*), and division (/), are used in F(β, t, L, U). The operations are usually performed in a double-length work area (registers two words long).

Addition/subtraction of two floating-point numbers is performed as follows. The smaller-in-magnitude operand is adjusted (shifted) first so that its exponent is the same as that of the larger one; this process is called "unnormalization". The operation is then performed, the result normalized if necessary, and then chopped or rounded to the correct length.

Multiplication/division of two floating-point numbers is performed as follows. Multiply/divide the fractions and add/subtract the exponents. The fraction is normalized if necessary (the fractional part will usually be longer than can be stored in a word or double word) and then chopped or rounded to the correct number of digits.
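A simple model of this normalize-then-chop-or-round process for β = 10 can be written as follows. This Python sketch is an added illustration; exponent bounds, and hence underflow and overflow, are not checked.

import math

def fl(x, t, mode="chop"):
    """Reduce x to t significant decimal digits by chopping or rounding.
    A simplified model of fl(x) in F(10, t, L, U); L and U are ignored."""
    if x == 0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    e = math.floor(math.log10(x)) + 1          # x = (0.a1 a2 ...) * 10**e
    mant = x / 10**e                           # normalized mantissa in [0.1, 1)
    scaled = mant * 10**t
    scaled = math.floor(scaled) if mode == "chop" else math.floor(scaled + 0.5)
    return sign * scaled * 10**(e - t)

print(fl(0.86557, 4, "chop"))     # 0.8655
print(fl(0.86557, 4, "round"))    # 0.8656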

Example. If F(β, t, L, U) = F(10, 4, -4, 3), a = (0.…)_10 * 10^… and b = (0.…)_10 * 10^…, then fl(a + b) = ? First unnormalize b to get b = (0.…)_10 * 10^…, then perform the addition in the double-length work area. Hence fl(a + b) = (0.…)_10 * 10^… for both chopping and rounding.

Example. For the same floating-point number system, if a = (0.…)_10 * 10^… and b = (0.…)_10 * 10^…, then fl(a + b) = ? No unnormalization is required. Simply perform the addition and normalize the result to get (0.…)_10 * 10^…. Hence fl(a + b) = (0.…)_10 * 10^… by both chopping and rounding.

Example. For a = (0.…)_10 * 10^… and b = (0.…)_10 * 10^… in F(10, 4, -4, 3), fl(a * b) = ? The product of the two fractions is formed in the double-length working register; no normalization is required. The result is then chopped or rounded to the correct number of digits. Hence fl(a * b) = (0.…)_10 * 10^… by chopping and fl(a * b) = (0.…)_10 * 10^… by rounding.

Example. For the same floating-point number system, if a = (0.…)_10 * 10^… and b = (0.…)_10 * 10^…, then fl(a / b) = ? The quotient is formed in the double-length working register; no normalization is required. The result is then chopped or rounded to the correct number of digits. Hence fl(a / b) = (0.…)_10 * 10^… by chopping and fl(a / b) = (0.…)_10 * 10^… by rounding.
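The unnormalize/add/normalize sequence can be imitated in a few lines. The Python sketch below is an added illustration with made-up 4-digit operands (the numerical values of the worked examples above did not survive transcription); it represents a number as an integer mantissa with an exponent, and it drops shifted-out digits immediately rather than holding them in a double-length register.

def add_fp(m1, e1, m2, e2, t=4, beta=10):
    """Add two floating-point numbers given as t-digit integer mantissas,
    value = (0.m) * beta**e.  Digits shifted out during unnormalization are
    simply dropped (a real unit would keep them in the double-length work
    area before chopping)."""
    if e1 < e2:                                   # make operand 1 the one with the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2_shifted = m2 // beta**(e1 - e2)            # unnormalization: shift right
    m = m1 + m2_shifted
    e = e1
    while m >= beta**t:                           # normalize if the sum overflows t digits
        m //= beta
        e += 1
    return m, e

# (0.1234)*10**1 + (0.5678)*10**(-1) = 1.234 + 0.05678 = 1.29078 -> (0.1290)*10**1
print(add_fp(1234, 1, 5678, -1))                  # (1290, 1)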

1.3.5 Round-off Errors

An important fact about the relative round-off error is the following: if F(β, t, L, U) is the floating-point number system used in the computer, then the relative round-off error is always less than or equal to a constant ε defined as follows:

    |x - fl(x)| / |x| ≤ ε,   where ε = β^(1-t) for chopping and ε = (1/2) β^(1-t) for rounding.   (12)

The proof of (12) can be sketched as follows. Let x = (0.x_1 x_2 ... x_t x_(t+1) ...)_β * β^r. If chopping is used in the rounding process, we have fl(x) = (0.x_1 x_2 ... x_t)_β * β^r. Hence,

    |x - fl(x)| = (0.x_(t+1) x_(t+2) ...)_β * β^(r-t) ≤ β^(r-t).

Consequently, since |x| ≥ (0.1)_β * β^r = β^(r-1),

    |x - fl(x)| / |x| ≤ β^(r-t) / β^(r-1) = β^(1-t).

If rounding is used in the rounding process, define a and b as follows:

    a = (0.x_1 x_2 ... x_t)_β * β^r,
    b = (0.x_1 x_2 ... x_t)_β * β^r + β^(r-t).

It is easy to see that if x_(t+1) < β/2, then fl(x) = a and x lies in the left half of the interval [a, b]; if x_(t+1) ≥ β/2, then fl(x) = b and x lies in the right half of the interval [a, b]. Hence, in either case,

    |x - fl(x)| ≤ (b - a)/2 = (1/2) β^(r-t).

But then

    |x - fl(x)| / |x| ≤ (1/2) β^(r-t) / β^(r-1) = (1/2) β^(1-t).

Example. For x = 12.467 in F(10, 4, -5, 5): if chopping is used, we have fl(x) = (0.1246)_10 * 10^2 and ε = 10^(-3), and indeed |x - fl(x)| / |x| = 0.007/12.467 ≈ 0.00056 ≤ ε. If rounding is used, we have fl(x) = (0.1247)_10 * 10^2 and ε = (1/2) * 10^(-3), and indeed |x - fl(x)| / |x| = 0.003/12.467 ≈ 0.00024 ≤ ε.

ε is usually referred to as machine precision, machine unit, or machine epsilon. It can also be defined as the smallest positive floating-point number x such that

    fl(1 + x) > 1.   (13)

It is not difficult to prove that these two definitions of ε are equivalent. In the following two examples we will show that the ε defined by (12) for the given floating-point number system is indeed the smallest positive floating-point number that satisfies (13).

Example. For F(10, 4, -5, 5) with chopping, ε = 10^(-3) and fl(1 + ε) = fl(1.001) = 1.001 > 1. If x is a positive number smaller than ε, then x = 0.000 t_1 t_2 ... for some digits 0 ≤ t_i ≤ 9, i = 1, 2, .... But then fl(1 + x) = fl(1.000 t_1 t_2 ...) = fl((0.1000 t_1 t_2 ...) * 10^1) = (0.1000) * 10^1 = 1. Therefore, ε is the smallest positive number such that (13) is true.
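The same fl(1 + x) > 1 test can be run on actual hardware. For IEEE double precision (β = 2, t = 53, with rounding), the following added Python sketch probes decreasing powers of two and recovers the value reported by the language itself.

import sys

# Find the smallest power of 2 such that fl(1 + x) > 1 in IEEE double precision.
x = 1.0
while 1.0 + x / 2 > 1.0:
    x /= 2
print(x)                          # 2.220446049250313e-16  (= 2**-52)
print(sys.float_info.epsilon)     # the same value, as reported by Python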

Example. For F(10, 4, -5, 5) with rounding, ε = (1/2) * 10^(-3) and fl(1 + ε) = fl(1.0005) = 1.001 > 1. If x is a positive number smaller than ε, then x = 0.000 t_1 t_2 ... for some t_1 ≤ 4 and 0 ≤ t_i ≤ 9, i = 2, 3, .... But then fl(1 + x) = fl(1.000 t_1 t_2 ...) = (0.1000) * 10^1 = 1 (due to the fact that t_1 ≤ 4). Therefore, ε is the smallest positive number such that (13) is true.

ε plays an important role in estimating the error in floating-point arithmetic operations. Note that if we set

    δ = (fl(x) - x) / x,   (14)

then the floating-point representation of a given non-zero number x can be expressed in the following form:

    fl(x) = x (1 + δ),  where |δ| ≤ ε,   (15)

i.e., the floating-point representation of x is actually a slight perturbation of x (if x is already a floating-point number, then fl(x) = x). Here x could be an expression. If a Θ b denotes the exact result of any of the four arithmetic operations (+, -, *, /) on two floating-point numbers a and b, then, following (15), we have

    fl(a Θ b) = (a Θ b)(1 + δ),  where δ = (fl(a Θ b) - (a Θ b)) / (a Θ b) and |δ| ≤ ε.

This shows that one can estimate the relative error generated in arithmetic operations using backward error analysis. For example, given three floating-point numbers x, y, and z in the floating-point number system F(10, 4, -64, 63), to estimate the relative round-off error in computing z * (x + y), one computes (x + y) first to produce fl(x + y), then computes z * fl(x + y) to produce fl(z * fl(x + y)). Note that

    fl(x + y) = (x + y)(1 + δ_1)   for some δ_1 with |δ_1| ≤ 10^(-3) (chopping) or (1/2) * 10^(-3) (rounding),

and

    fl(z * (x + y)) = fl(z * fl(x + y)) = (z * fl(x + y))(1 + δ_2)

for some δ_2 which satisfies the same upper bounds as δ_1. Consequently,

    fl(z * (x + y)) = z (x + y)(1 + δ_1)(1 + δ_2) = z (x + y)(1 + δ_1 + δ_2 + δ_1 δ_2).

Hence,

    |fl(z (x + y)) - z (x + y)| / |z (x + y)| ≤ |δ_1| + |δ_2| + |δ_1 δ_2| ≤ 2 * 10^(-3) + 10^(-6) for chopping, or 10^(-3) + (1/4) * 10^(-6) for rounding.

The second terms (10^(-6) and (1/4) * 10^(-6)) on the right-hand side of the above inequality can be ignored, since their values are small compared with the first terms.

While they hold in real-number arithmetic, the following axioms do not hold in floating-point arithmetic:

    (1) (a + (b + c)) = ((a + b) + c)
    (2) (a * (b * c)) = ((a * b) * c)
    (3) if a * b = a * c and a ≠ 0, then b = c
    (4) a * (b + c) = a * b + a * c
    (5) a * (b / a) = b

Examples showing the failure of the first three cases are given below. Examples for the last two cases can be found in Sterbenz's book [4].

Example. For F(16, 6, -64, 63) with chopping, there are values a, b, and c (with b = (0.55637…)_16 * 16^(-63) and c = (0.AD9…)_16 * 16^(-63)) for which

    fl(a + (b + c)) = fl(a + fl(b + c)) = (0.CF558…)_16 * 16^(-63)

while

    fl((a + b) + c) = fl(fl(a + b) + c) = (0.CCCCCC)_16 * 16^(-63).

A second choice of a, b, and c leads to the same conclusion. In either case, fl(a + (b + c)) ≠ fl((a + b) + c).

Example. For F(10, 6, -64, 63) with chopping, a, b, and c can be chosen so that fl((a * b) * c) = fl(fl(a * b) * c) has mantissa (0.999999) while fl(a * (b * c)) = fl(a * fl(b * c)) has mantissa (0.999998). Hence, fl((a * b) * c) ≠ fl(a * (b * c)).

Example. For F(10, 6, -64, 63) with chopping, a, b, and c can be chosen so that fl(a * b) = fl(a * c), but obviously b ≠ c.
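The failure of associativity is easy to reproduce in IEEE double precision as well. The following added Python sketch is not one of the F(16, 6, -64, 63) or F(10, 6, -64, 63) examples above; it just shows the same phenomenon on a modern machine.

# Floating-point addition is commutative but not associative.
a, b, c = 1.0e16, -1.0e16, 1.0
print((a + b) + c)      # 1.0
print(a + (b + c))      # 0.0   (b + c rounds back to -1.0e16)
print(0.1 + (0.2 + 0.3) == (0.1 + 0.2) + 0.3)    # False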

Nevertheless, the following axioms do hold in floating-point arithmetic:

    (1) a + b = b + a
    (2) a * b = b * a
    (3) a + 0 = 0 + a = a
    (4) a * 1 = 1 * a = a
    (5) a + (-a) = (-a) + a = 0

1.3.6 Control of Round-Off Errors

Round-off error occurs on many occasions, such as when adding (subtracting) a very small number to (from) a very large number. The most serious round-off error, however, occurs when a number is subtracted from another one of almost the same size. In such a case, called catastrophic cancellation, one loses many significant digits in the result and, consequently, would get a very large round-off error if the result is then used in other arithmetic operations.

Minimizing the effect of round-off errors is a major concern in handling floating-point arithmetic operations. It can be achieved through improving the hardware and software features of the computers and through careful programming.

    (1) Hardware features: use extra digits (bits) inside the arithmetic unit to get more accurate results.
    (2) Software features: design compilers so that addition is done before chopping or rounding of the result of a multiplication within a floating-point operation (flop).
    (3) Careful programming: use stable algorithms in floating-point arithmetic operations. An algorithm is said to be stable if errors introduced into the calculation do not grow too fast (meaning not at an exponential rate) as the computation proceeds. The moral here is to avoid situations in which cancellation (subtraction), especially catastrophic cancellation, occurs. Techniques that have been used include grouping [Nakamura], changing variables, using Taylor expansions, and rewriting the equation. If a subtraction is inevitable, then use double-precision arithmetic.

The hardware and software features are machine-dependent. The third approach, however, is under our complete control. In the following, we will show examples that use the techniques listed in (3) to reduce round-off error.

In the case of solving a quadratic equation a x^2 + b x + c = 0 for x, we know that the roots can be computed using the following formulas:

    x_1 = (-b + √(b^2 - 4ac)) / (2a),   (16)
    x_2 = (-b - √(b^2 - 4ac)) / (2a).   (17)

However, if the magnitude of b is much bigger than that of 4ac, then either x_1 or x_2 will be poorly determined since, in this case, the magnitude of b would be very close to that of √(b^2 - 4ac) and, consequently, one would get a catastrophic cancellation in either x_1 or x_2. For instance, suppose one is to solve the following quadratic equation in the floating-point number system F(10, 5, -5, 5) with rounded arithmetic:

    x^2 + 111.11 x + 1.2121 = 0.

Since b^2 = 12345 and b^2 - 4ac = 12340 in this arithmetic, it follows that

    √(b^2 - 4ac) = 111.09.

If (16) is used to calculate x_1, one gets

    x_1 = (-111.11 + 111.09) / 2 = -0.010000.

The true value of x_1 is -0.010910. This results in a relative round-off error of about 9%. Note that in the numerator of the above computation of x_1, most of the significant digits cancel; only the least significant one is left to determine the result. The other solution x_2 does not have this problem. Note that x_2 = (-111.11 - 111.09) / 2 = -111.10 and the exact solution is -111.09909. Hence, x_2 is almost correct.

A better approach is to rationalize the numerator of x_1 if b is positive and to rationalize the numerator of x_2 if b is negative, and use the new formulas to calculate x_1 and x_2, as follows:

    b > 0:   x_1 = -2c / (b + √(b^2 - 4ac)),   (18)
             x_2 = (-b - √(b^2 - 4ac)) / (2a);   (19)

    b < 0:   x_1 = (-b + √(b^2 - 4ac)) / (2a),
             x_2 = 2c / (-b + √(b^2 - 4ac)).   (20)

For instance, in the above example, if x_1 is computed using (18), we get

    x_1 = -2(1.2121) / (111.11 + 111.09) = -2.4242 / 222.20 = -0.010910.

The computed value is correct to five places now, with a relative round-off error of less than 0.008%.
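The rationalized formulas (18)-(20) translate directly into code. The Python sketch below is an added illustration (the function name is ours, and real roots are assumed); it always evaluates the well-conditioned form of each root.

import math

def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0, using the rationalized formulas (18)-(20)
    to avoid cancellation.  Assumes real roots (b*b - 4*a*c >= 0)."""
    d = math.sqrt(b * b - 4 * a * c)
    if b >= 0:
        x2 = (-b - d) / (2 * a)          # no cancellation when b > 0
        x1 = (2 * c) / (-b - d)          # rationalized form of (-b + d) / (2a)
    else:
        x1 = (-b + d) / (2 * a)          # no cancellation when b < 0
        x2 = (2 * c) / (-b + d)          # rationalized form of (-b - d) / (2a)
    return x1, x2

print(quadratic_roots(1.0, 111.11, 1.2121))   # approximately (-0.010910, -111.099)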

Given an x, one usually evaluates e^x by adding up terms in (2) until some condition is satisfied, such as when the size of a term is smaller than a given tolerance, or simply by adding up a certain number of terms in (2). Now consider the problem of evaluating e^(-5.5) in the floating-point number system F(10, 5, -5, 5) with rounded arithmetic. If we use the sum of the first 25 terms in (2) to approximate e^x, then by adding up the first 25 terms in (2) for x = -5.5, we get 0.0026363 for e^(-5.5). But actually e^(-5.5) = 0.0040868. The reason for this is that in the process of calculating the sum, the errors that occur are emphasized by cancellation: the terms x^n/n!, n = 0, 1, 2, ..., alternate in sign and grow to more than 40 in magnitude before they decay, so the partial sums involve heavy cancellation. We may use more digits; however, this is more costly.

A better approach is to use

    e^x = 1 / (1 + (-x) + (-x)^2/2! + (-x)^3/3! + ...)   (21)

to evaluate e^x if x is negative. Note that if x is negative, then there is no cancellation involved when (21) is used to evaluate e^x. Indeed, with (21), we get 0.0040865 for e^(-5.5), a result that is correct to four places, in only about 20 terms.
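In IEEE double precision the cancellation is far less damaging than in F(10, 5, -5, 5), but the reciprocal form (21) is still the safer way to sum the series. The following Python sketch is an added illustration, not part of the notes.

import math

def exp_series(x, n_terms=25):
    """Partial sum of the Maclaurin series (2) for e**x."""
    total, term = 0.0, 1.0
    for i in range(n_terms):
        total += term
        term *= x / (i + 1)
    return total

x = -5.5
naive = exp_series(x)             # alternating terms, magnitudes grow past 40 before decaying
better = 1.0 / exp_series(-x)     # formula (21): all terms positive, no cancellation
print(naive, better, math.exp(x))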

Other examples are shown below.

Example. Evaluate f(x) = x - sin x when x is close to zero. sin x is close to x when x is close to zero. To avoid cancellation, one should use (3) to replace sin x and use the new expression to evaluate f(x).

Example. Evaluate f(x) = e^(2x) - e^(-x) when x ≈ 0. To avoid cancellation, one should use the following alternative when x ≈ 0:

    f(x) = e^(-x) (e^(3x) - 1) = (3x + (3x)^2/2! + (3x)^3/3! + ...) e^(-x).

Example. Evaluate f(x) = cos^2 x - sin^2 x when x ≈ π/4. To avoid cancellation when x is close to π/4, one may use the alternative

    f(x) = cos(2x).

Example. Evaluate f(x) = ln x - 1 when x ≈ e. To avoid cancellation, use

    f(x) = ln x - ln e = ln(x/e).
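As an added numerical check of the x - sin x example above (Python, double precision, not part of the notes): for small x the direct formula loses all of its significant digits, while the series obtained from (3) does not.

import math

def f_naive(x):
    return x - math.sin(x)                 # catastrophic cancellation near 0

def f_series(x, n_terms=5):
    """x - sin(x) = x**3/3! - x**5/5! + ...  (from (3)); no cancellation."""
    total, term = 0.0, x**3 / 6.0
    for k in range(n_terms):
        total += term
        term *= -x * x / ((2 * k + 4) * (2 * k + 5))
    return total

x = 1.0e-8
print(f_naive(x))     # typically 0.0: every significant digit has cancelled
print(f_series(x))    # about 1.6667e-25, the reliable value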
