CHAPTER 2 SENSITIVITY OF LINEAR SYSTEMS; EFFECTS OF ROUNDOFF ERRORS


The two main concepts involved here are the condition (of a problem) and the stability (of an algorithm). Both of these concepts deal with the effects of small changes (perturbations) in the data. In this regard, the text uses the term "sensitivity" of a linear system to refer to how its solution is affected by small changes in the data.

Some notation regarding floating-point arithmetic:

  b denotes the base of the number system
  k denotes the floating-point precision (the number of significant digits in the mantissa or fractional part of a floating-point number)

Nonzero floating-point numbers are always stored in a normalized form:

    +/- 0.d1 d2 ... dk x b^e,

where d1 is nonzero and each di is a base-b digit (0 <= di <= b - 1 for 1 <= i <= k).

The floating-point representation fl(x) of a real number x is obtained by either chopping or rounding its (infinite) base-b expansion to k significant digits (and adjusting the exponent so that it is normalized). For example, if b = 10, k = 4 and x = 2/3, then

    fl(x) = +0.6666 x 10^0 with chopping arithmetic
    fl(x) = +0.6667 x 10^0 with rounding arithmetic.

The following material is from my CSc 49A notes.
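These representation rules are easy to experiment with. The sketch below uses Python's decimal module, whose contexts carry out base-10 arithmetic with a chosen number of significant digits and a chosen rounding mode; the names k, chop and rnd are illustrative, and this is an emulation of fl, not how any particular machine implements it:

```python
from decimal import Decimal, Context, ROUND_DOWN, ROUND_HALF_UP

k = 4  # floating-point precision: number of significant digits kept
chop = Context(prec=k, rounding=ROUND_DOWN)      # chopping arithmetic
rnd  = Context(prec=k, rounding=ROUND_HALF_UP)   # rounding arithmetic

x = Decimal(2) / Decimal(3)  # high-precision stand-in for the real number 2/3

print(chop.plus(x))  # 0.6666
print(rnd.plus(x))   # 0.6667
```

Context.plus simply rounds its argument to the context's precision, which is exactly the fl(x) operation described above.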

FLOATING-POINT ARITHMETIC -- a simulation of real arithmetic --

Notation: we'll use the symbol fl. For example, if x denotes a real number, then fl(x) denotes its floating-point representation. Similarly, if a and b are floating-point numbers, then fl(a + b), fl(a - b), fl(a x b), fl(a / b) denote the floating-point sum, difference, product and quotient, respectively, of a and b.

The implementation of these floating-point operations (in either software or hardware) depends on several factors, and includes, for example, a choice regarding

-- rounding or chopping
-- the number of significant digits used for floating-point addition and subtraction.

For simplicity, we'll consider only idealized floating-point arithmetic, which is defined as follows. Let (*) denote any one of the basic arithmetic operations +, -, x, /, and let x and y denote floating-point numbers. fl(x (*) y) is obtained by performing exact arithmetic on x and y, and then rounding or chopping this result to k significant digits.

Note: although no actual digital computers or calculators implement floating-point arithmetic this way (it's too expensive, as it requires a very long accumulator for doing addition and subtraction), idealized floating-point arithmetic

-- behaves very much like any actual implementation,
-- is very simple to use in hand calculations, and
-- has accuracy almost identical to that of any actual implementation.

Note. If fl is applied to an arithmetic expression containing more than one arithmetic operation, then each of the arithmetic operations must be replaced by its corresponding floating-point operation. For example,

    fl(x + y - z) means fl(fl(x + y) - z), and
    fl(xy + z / cos(x)) means fl(fl(x x y) + fl(z / fl(cos(x)))).

Each fl operation is computed according to the rules of idealized floating-point arithmetic; that is, the exact value of the result is rounded or chopped to k significant

digits before proceeding with the rest of the calculation. Note that we'll compute fl(cos x), fl(sqrt(x)), fl(e^x) and so on this way.

Note: with idealized floating-point arithmetic, the maximum relative error in fl(x (*) y) is the same as the maximum relative error in converting a real number z to floating-point form. Thus, for a single floating-point +, -, x or /, the relative error is very small: it is < b^(1-k) (with chopping) or < (1/2) b^(1-k) (with rounding). However, the relative error in a floating-point computation might be large if more than one floating-point operation is performed.

For example, compute fl(x + y + z) when

    x = +0.1234 x 10^0, y = -0.5508 x 10^-4, z = -0.1232 x 10^0

using base b = 10, precision k = 4, rounding idealized floating-point arithmetic.

    fl(x + y) = +0.1233 x 10^0 since x + y = 0.12334492
    fl(x + y + z) = +0.1000 x 10^-3 since 0.1233 - 0.1232 = 0.0001

Since the exact value is x + y + z = 0.00014492, the relative error is

    (0.00014492 - 0.0001) / 0.00014492 = 0.31, or 31%.

Note, however, that this large relative error can be avoided by changing the order in which these numbers are added together. Consider the evaluation of fl(x + z + y) = fl(fl(x + z) + y). We obtain

    fl(x + z) = +0.2000 x 10^-3 since 0.1234 - 0.1232 = 0.0002 exactly
    fl(fl(x + z) + y) = +0.1449 x 10^-3 since 0.0002 - 0.00005508 = 0.00014492,

which has a relative error of only 0.000138, or 0.0138%.
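Because a decimal context rounds after every individual operation, it can replay this reordering experiment directly. A sketch (fp plays the role of idealized b = 10, k = 4 rounding arithmetic):

```python
from decimal import Decimal, Context, ROUND_HALF_UP

fp = Context(prec=4, rounding=ROUND_HALF_UP)  # idealized b = 10, k = 4, rounding

x = Decimal("0.1234")
y = Decimal("-0.5508E-4")
z = Decimal("-0.1232")

s1 = fp.add(fp.add(x, y), z)  # fl(fl(x + y) + z): information about y is lost first
s2 = fp.add(fp.add(x, z), y)  # fl(fl(x + z) + y): the cancellation x + z is exact
print(s1)         # 0.0001
print(s2)         # 0.0001449
print(x + y + z)  # 0.00014492 (exact)
```

The same three numbers, added in a different order, give answers with 31% and 0.014% relative error respectively.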

ERRORS AND FLOATING-POINT ARITHMETIC

Example 1. Using b = 10, k = 4 chopping floating-point arithmetic, evaluate

    w = 1000x / (x - y - z)

for x = 0.1276, y = 0.0004001, z = 0.1267.

    fl(x - y) = +0.1271 x 10^0 (since x - y = 0.1271999)
    fl(fl(x - y) - z) = +0.4000 x 10^-3 (since 0.1271 - 0.1267 = 0.0004)
    fl(1000x) = +0.1276 x 10^3
    fl(w) = fl(0.1276 x 10^3 / 0.4000 x 10^-3) = +0.3190 x 10^6

Exact: x - y - z = 0.0004999, so

    w = 127.6 / 0.0004999 = 255,251.05...

So fl(w) = 319,000 has no correct significant digits.

Example 2. Approximate e^x for x = -5.5 using b = 10, k = 5 rounding floating-point arithmetic and the Taylor polynomial approximation (expanded about x0 = 0)

    e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + ... + x^n/n! + ...

The floating-point evaluation results in the summation of the following terms:

e^(-5.5) = 1.0000 - 5.5000 + 15.125 - 27.730 + 38.129 - 41.942 + 38.446 - 30.208 + 20.768 - 12.691 + 6.980 - 3.490 + 1.5997 - ... and so on.

Using rounding floating-point arithmetic with b = 10 and k = 5, this sum equals 0.0026363 (or +0.26363 x 10^-2) after summing 25 terms (that is, n = 24), and no further terms change this sum (as they are all < 10^-7). However, the exact value of e^(-5.5) is 0.00408677 (to 6 significant digits), so fl(e^(-5.5)) has no correct significant digits.
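A sketch of this summation with a 5-digit rounding context. The exact spot where each rounding lands differs slightly from the hand computation above, so the final garbage value differs too, but it is equally wrong:

```python
from decimal import Decimal, Context, ROUND_HALF_UP
import math

fp = Context(prec=5, rounding=ROUND_HALF_UP)  # idealized b = 10, k = 5, rounding

x = Decimal("-5.5")
term = total = Decimal(1)
for n in range(1, 30):
    # next Taylor term x**n / n!, with every operation rounded to 5 digits
    term = fp.divide(fp.multiply(term, x), Decimal(n))
    total = fp.add(total, term)

print(total)           # small positive garbage with no correct significant digits
print(math.exp(-5.5))  # true value 0.004086771...
```

The terms near n = 5 are around 40 in magnitude, so each carries an absolute rounding error of up to about 0.0005; the true answer 0.0041 is smaller than the accumulated rounding error, so the result is meaningless regardless of rounding details.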

CONDITION AND STABILITY

In analyzing the effects of round-off error on an inaccurate computed solution, we want to distinguish whether the algorithm (the procedure for computing the solution) is at fault, or whether the problem is such that no algorithm can be expected to solve the problem accurately. Concepts involved:

    stable / unstable algorithm
    well-conditioned / ill-conditioned problem

Definition. A problem whose (exact) solution can change greatly with small changes in the data defining the problem is called ill-conditioned.

    data {d_i} defining a problem --(exact arithmetic)--> exact solution {r_i}
    perturbed problem, data {d̂_i} = {d_i + ε_i} with |ε_i| / |d_i| small --(exact arithmetic)--> exact solution {r̂_i}

If there exist small ε_i such that {r̂_i} are not close to {r_i}, then the problem is ill-conditioned. If r̂_i is close to r_i for all small {ε_i}, then the problem is well-conditioned.

Example 3: Consider the system of linear equations Hx = b:

    [ 1     1/2   1/3 ] [x1]   [ 11/6  ]
    [ 1/2   1/3   1/4 ] [x2] = [ 13/12 ]
    [ 1/3   1/4   1/5 ] [x3]   [ 47/60 ]

Using the exact {d_i} given above in the matrix H and the vector b, the exact solution is x = [1, 1, 1]^T. However, if the entries of H and b are rounded to 3 significant decimal digits to give the following perturbed problem Ĥx̂ = b̂:

    [ 1.00   0.500  0.333 ] [x̂1]   [ 1.83  ]
    [ 0.500  0.333  0.250 ] [x̂2] = [ 1.08  ]
    [ 0.333  0.250  0.200 ] [x̂3]   [ 0.783 ]

then the exact solution (to 5 significant digits) is

    x̂ = [ 1.0895, 0.48797, 1.4910 ]^T.

Thus, the problem of solving Hx = b is ill-conditioned.

Note. The condition of a problem has nothing to do with floating-point arithmetic or round-off error; it is defined in terms of exact arithmetic. However, if a problem is ill-conditioned, it will be difficult (or impossible) to solve accurately using floating-point arithmetic.

Definition. An algorithm is said to be stable (for a class of problems) if it determines a computed solution (using floating-point arithmetic) that is close to the exact solution of some (small) perturbation of the given problem.

    given problem, specified by data {d_i} --(floating-point arithmetic)--> computed solution {r_i}
    perturbed problem, data {d̂_i} = {d_i + ε_i} with |ε_i| / |d_i| small --(exact arithmetic)--> exact solution {r̂_i}

If there exist data d̂_i close to d_i (small ε_i for all i) such that r̂_i is close to r_i (for all i), then the algorithm is said to be stable. Meaning: the effect of the floating-point arithmetic (the round-off error) is no worse than the effect of slightly perturbing the given problem, and solving the perturbed problem exactly.
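Since condition is a statement about exact arithmetic, the ill-conditioning in Example 3 can be verified with exact rational arithmetic. A sketch using Python's fractions module and textbook Gaussian elimination (solve3 is an illustrative helper; no pivoting is needed for these two systems):

```python
from fractions import Fraction as F

# Example 3: the 3x3 Hilbert system Hx = b, whose exact solution is (1, 1, 1),
# and the same system with every entry rounded to 3 significant decimal digits.
H  = [[F(1, 1), F(1, 2), F(1, 3)],
      [F(1, 2), F(1, 3), F(1, 4)],
      [F(1, 3), F(1, 4), F(1, 5)]]
b  = [F(11, 6), F(13, 12), F(47, 60)]

Hp = [[F("1.00"),  F("0.500"), F("0.333")],
      [F("0.500"), F("0.333"), F("0.250")],
      [F("0.333"), F("0.250"), F("0.200")]]
bp = [F("1.83"), F("1.08"), F("0.783")]

def solve3(A, rhs):
    """Gaussian elimination in exact rational arithmetic (no pivoting needed here)."""
    A = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for i in range(3):
        for j in range(i + 1, 3):
            m = A[j][i] / A[i][i]
            A[j] = [ajk - m * aik for ajk, aik in zip(A[j], A[i])]
    x = [F(0)] * 3
    for i in (2, 1, 0):  # back substitution
        x[i] = (A[i][3] - sum(A[i][j] * x[j] for j in range(i + 1, 3))) / A[i][i]
    return x

print(solve3(H, b))                        # exactly (1, 1, 1)
print([float(v) for v in solve3(Hp, bp)])  # about [1.0895, 0.48797, 1.4910]
```

Both systems are solved exactly, with no roundoff anywhere, so the large change in the solution is caused entirely by the rounding of the data.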

Note. If there exists no set of data {d̂_i} close to {d_i} such that r̂_i is close to r_i for all i, then the algorithm is said to be unstable.

Stability analysis of the computation of a solution to the problem considered above in Example 3. Suppose Gaussian elimination with partial pivoting (an algorithm considered in Chapter 6) is implemented in base 10, precision k = 3, chopping floating-point arithmetic, and is used to solve the system of linear equations Ĥx̂ = b̂:

    [ 1.00   0.500  0.333 ] [x̂1]   [ 1.83  ]
    [ 0.500  0.333  0.250 ] [x̂2] = [ 1.08  ]
    [ 0.333  0.250  0.200 ] [x̂3]   [ 0.783 ]

The computed solution r is not close to the exact solution, which is approximately x̂ = [1.0895, 0.48797, 1.4910]^T. However, this computation is stable, since there exists a perturbed problem (Ĥ + E)r̂ = b̂ + e, with perturbations E and e of small relative size, whose exact solution r̂ is close to the computed solution r. Since the size of the perturbations E and e is small, the computation of the computed solution r is stable.

Analysis of Example 1. The problem of computing

    w = 1000x / (x - y - z)

for x = 0.1276, y = 0.0004001, z = 0.1267 is ill-conditioned; that is, the value of w is very sensitive to small changes in the data {x, y, z}.

For example, consider the perturbed problem having data x̂ = 0.1275, ŷ = y and ẑ = 0.1268. The perturbations are small (in relative error) given that the floating-point precision is k = 4:

    |x̂ - x| / |x| = 0.00078,  |ŷ - y| / |y| = 0,  |ẑ - z| / |z| = 0.00079.
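This sensitivity can be confirmed in exact arithmetic. A sketch (Fraction keeps every quantity exact, so the huge change in w is due to the data alone; w is an illustrative helper name):

```python
from fractions import Fraction as F

def w(x, y, z):
    # the quantity of Example 1, evaluated in exact rational arithmetic
    return 1000 * x / (x - y - z)

w_given     = w(F("0.1276"), F("0.0004001"), F("0.1267"))
w_perturbed = w(F("0.1275"), F("0.0004001"), F("0.1268"))

print(float(w_given))      # 255251.05...
print(float(w_perturbed))  # 425141.71...: a tiny data change moved w enormously
```

A relative change of less than 0.1% in the data changed the exact answer by about 67%.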

    given problem, data {x, y, z} --(exact arithmetic)--> w = 255,251.05
    perturbed problem, data {x̂, ŷ, ẑ} --(exact arithmetic)--> ŵ = 425,141.7

Since {x̂, ŷ, ẑ} is close to {x, y, z} and ŵ is not close to w, the problem is ill-conditioned.

Note. This illustrates a general principle: division by a small, inaccurate number will create a large error. E.g., 1/0.0005 = 2000, but 1/0.00049 = 2040.8.

In the above calculation of w, there is cancellation and a loss of significant digits in computing x - y - z using floating-point arithmetic. Thus fl(x - y - z) is relatively small and may be inaccurate, implying that the computed w is likely inaccurate.

Stability analysis: The algorithm for computing w simply refers to the floating-point computation

    fl( fl(1000x) / fl( fl(x - y) - z ) ).

To show that this algorithm is stable, one needs to find perturbed data {x̂, ŷ, ẑ} so that the exact solution of this perturbed problem is close to the computed solution of the given problem.
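A sketch of both halves of that argument: the floating-point computation of w under b = 10, k = 4 chopping, and a perturbed data set (changing only z, in its seventh digit) whose exact solution reproduces the computed value:

```python
from decimal import Decimal, Context, ROUND_DOWN
from fractions import Fraction as F

fp = Context(prec=4, rounding=ROUND_DOWN)  # idealized b = 10, k = 4, chopping

x, y, z = Decimal("0.1276"), Decimal("0.0004001"), Decimal("0.1267")

num  = fp.multiply(Decimal(1000), x)      # fl(1000x) = 127.6
den  = fp.subtract(fp.subtract(x, y), z)  # fl(fl(x - y) - z) = 0.0004
w_fl = fp.divide(num, den)
print(w_fl)  # 3.19E+5, i.e. 319000

# Perturbing only z (to 0.1267999) gives a problem whose EXACT solution
# equals the floating-point result above, which is what stability requires.
w_hat = 1000 * F("0.1276") / (F("0.1276") - F("0.0004001") - F("0.1267999"))
print(w_hat == 319000)  # True
```

The computed 319,000 is far from the true 255,251.05, yet it is the exact answer to a nearby problem: an inaccurate but stable computation.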

    given problem, data {x, y, z} --(floating-point arithmetic)--> computed solution w = 319,000
    perturbed problem, with data x̂ = x, ŷ = y, ẑ = 0.1267999 --(exact arithmetic)--> exact solution ŵ = 319,000

Since {x̂, ŷ, ẑ} is close to {x, y, z} and ŵ is close to w (in fact ŵ = w in this case), the algorithm is stable.

Note from this example that a stable algorithm does not guarantee an accurate computed solution. If the problem is well-conditioned, then a stable algorithm will produce an accurate computed solution, but not necessarily if the problem is ill-conditioned.

Note also that a better algorithm (for this particular data) is to evaluate

    fl( fl(1000x) / fl( fl(x - z) - y ) ),

since x - z = 0.0009 and (x - z) - y = 0.0004999 are both computed exactly. This gives the best possible computed solution. However, since the problem is ill-conditioned, if your data is not exact, then the computed solution will likely be very inaccurate.

Analysis of Example 2. This is an example of what might be called catastrophic cancellation. The answer is computed as a sum of terms such as 38.129, -41.942, etc. that are accurate to at most 3 decimal places (approximately 0.001). But the correct value of e^(-5.5) is 0.00408677. Thus, the computed value of 0.0026363 is essentially a sum of round-off errors! The (correct) significant digits in most of the numbers added together cancel out.

Condition. The problem is well-conditioned; that is, it is not sensitive to small changes in the data:

    e^(-5.5 + ε) = e^(-5.5) e^ε = e^(-5.5) (1 + ε + ε^2/2 + ε^3/6 + ...).

So for all ε such that |ε| / 5.5 is small, since the ε^2 and higher-order terms in ε are much smaller than ε, we have

    e^(-5.5 + ε) = e^(-5.5) (1 + ε), approximately,

implying that the relative error

    | e^(-5.5) - e^(-5.5 + ε) | / e^(-5.5) = |ε|, approximately,

is small.

Stability. The algorithm used to compute e^(-5.5) is unstable:

    given problem, data x = -5.5 --(floating-point arithmetic)--> 0.0026363
    perturbed problem, data x̂ = -5.5 + ε with |ε| / 5.5 small --(exact arithmetic)--> solution near e^(-5.5) = 0.00408677

for all small ε. Explanation for this: we cannot find a small ε so that e^(-5.5 + ε) = 0.0026363 because e^x is a continuous function, and e^(-5.5 + ε) = 0.0026363 requires -5.5 + ε = ln(0.0026363), so that ε is about -0.44, which is not a small perturbation of -5.5.
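Both claims are easy to check numerically (a sketch; 0.0026363 stands in for the garbage value the unstable summation produced):

```python
import math

# Well-conditioned: a small perturbation eps of x = -5.5 changes e^x by
# a relative amount of about |eps|.
eps = 1e-3
rel = abs(math.exp(-5.5 + eps) - math.exp(-5.5)) / math.exp(-5.5)
print(rel)  # about 0.001

# Unstable algorithm: reproducing the garbage value 0.0026363 would require
# a perturbation of x that is NOT small.
eps_needed = math.log(0.0026363) - (-5.5)
print(eps_needed)  # about -0.44
```

No nearby input explains the computed output, which is precisely what it means for the algorithm to be unstable.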

Note. A stable algorithm for computing e^(-5.5) (and, in general, for computing e^x for x < 0):

    e^(-5.5) = 1 / e^(5.5) = 1 / (1 + 5.5 + (5.5)^2/2 + (5.5)^3/6 + ...).

E.g., using b = 10, k = 5 floating-point arithmetic, and summing terms of the Taylor polynomial until further terms no longer change the sum, this gives a computed solution of 0.0040865 (which is very accurate).
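A sketch of this stable variant with the same 5-digit rounding context as before: the series for e^(+5.5) has only positive terms, so there is no cancellation, and taking the reciprocal at the end preserves the accuracy:

```python
from decimal import Decimal, Context, ROUND_HALF_UP

fp = Context(prec=5, rounding=ROUND_HALF_UP)  # idealized b = 10, k = 5, rounding

x = Decimal("5.5")
term = total = Decimal(1)
for n in range(1, 30):
    term = fp.divide(fp.multiply(term, x), Decimal(n))
    total = fp.add(total, term)  # every term is positive: no cancellation

result = fp.divide(Decimal(1), total)
print(result)  # close to the true value 0.0040868
```

The same precision and the same Taylor coefficients, rearranged so that no subtraction of nearly equal quantities occurs, now give essentially full 5-digit accuracy.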