Numerical Methods for Eng [ENGR 391] [Lyes KADEM 2007]


CHAPTER I: Approximations and Round-off Errors

I. Introduction

The concept of error is central to the effective use of numerical methods. When an analytical solution is available, we can compare the numerical result against it directly. However, when the analytical solution is not available (which is usually the case), we have to estimate the errors.

The first step in minimizing error is to simplify the problem and use simple formulations that can be solved analytically. Sometimes, however, the simplified results are far from reality, and more complex formulations are needed; as a consequence, they are more difficult, or impossible, to solve analytically. Solving these problems is then only possible with numerical methods. The drawback of numerical methods is that they yield approximate results. It is therefore important to develop criteria to determine whether our approximation of the solution is acceptable.

II. Accuracy and precision

The errors associated with computations or measurements can be characterized by their accuracy and their precision.

Accuracy: how closely a computed or measured value agrees with the true value.
Precision: how closely individual computed or measured values agree with each other.

[Figure 1.2: accuracy and precision. (a) inaccurate and imprecise; (b) accurate and imprecise; (c) inaccurate and precise; (d) accurate and precise.]

In engineering problems, we try to minimize both imprecision and inaccuracy.
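To make the distinction concrete, here is a minimal Python sketch (my own illustration, with made-up measurement values, not from the notes) that quantifies inaccuracy as the bias of the mean from the true value and imprecision as the spread of the values:

```python
import statistics

true_value = 10.0

# Two sets of repeated "measurements" of the same quantity (assumed values).
accurate_imprecise = [9.2, 10.9, 9.5, 10.4]    # centered on 10, widely scattered
precise_inaccurate = [8.51, 8.50, 8.49, 8.50]  # tightly clustered, biased low

for name, values in [("accurate but imprecise", accurate_imprecise),
                     ("precise but inaccurate", precise_inaccurate)]:
    bias = statistics.mean(values) - true_value   # inaccuracy
    spread = statistics.stdev(values)             # imprecision
    print(f"{name}: bias = {bias:+.3f}, spread = {spread:.3f}")
```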

III. Error definitions

The errors encountered in numerical methods can be classified into:

Truncation errors: errors due to the fact that we use an approximation to solve the problem instead of solving it analytically.
Round-off errors: errors that appear when numbers with a limited number of significant figures are used to represent exact numbers (examples: π, e, ...).

When considering the errors due to the use of a numerical method, the true value of the solution can be written as:

True value = approximation + error

Hence, the error can be computed as:

Error (E_t) = true value − approximation

E_t is the true error, since we are comparing the approximation with the true value. To take the magnitude of the quantities into account, it is preferable to normalize the error by the true value:

True fractional relative error = true error / true value

We can express this as a percentage:

ε_t = (true error / true value) × 100%

where ε_t is the true relative error.

An important point to notice is that the definition of the true error uses the true value of the solution. However, the true value is not always available, and we then have to compute an approximation of the error. For that, we normalize the error by the best available estimate of the true value:

ε_a = (approximate error / approximation) × 100%

In real life, however, the approximate error itself is not directly known. What is the solution? Since several numerical methods involve an iterative process, we define the error as:

ε_a = ((current approximation − previous approximation) / current approximation) × 100%
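To see how the iterative estimate ε_a drives a stopping test, here is a minimal Python sketch (my own illustration, not from the notes; the Maclaurin series of e^x is only an assumed example):

```python
import math

def exp_series(x, eps_s=0.05):
    """Approximate e**x by its Maclaurin series, stopping once the
    approximate percent relative error |eps_a| drops below eps_s (%)."""
    term, total = 1.0, 1.0          # first term of the series: x**0 / 0! = 1
    n = 0
    while True:
        n += 1
        term *= x / n               # next term: x**n / n!
        previous, total = total, total + term
        eps_a = abs((total - previous) / total) * 100.0
        if eps_a < eps_s:
            return total, n, eps_a

approx, terms, eps_a = exp_series(0.5)
eps_t = abs((math.exp(0.5) - approx) / math.exp(0.5)) * 100.0
print(f"e^0.5 ~ {approx:.7f} after {terms} terms "
      f"(eps_a = {eps_a:.4f}%, true eps_t = {eps_t:.4f}%)")
```

Note that ε_a is computable during the iteration, while ε_t requires the true value, which is exactly why the approximate estimate is needed in practice.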

You can notice from the above formulation that the error can be negative or positive. In practice, what matters most is that the absolute value of the error falls below a prescribed limit ε_s (this limit depends strongly on the application and on the acceptable computational time):

|ε_a| < ε_s

III.1. Round-off errors

These errors originate from the fact that computers retain only a fixed number of significant figures during a calculation. They are therefore directly related to the manner in which numbers are stored in a computer. Remember that instead of the decimal (base-10) number system we use, a computer uses the binary (base-2) system. Why? Because this corresponds to the on/off states of electronic components. In a 16-bit computer word, a number is stored as a sign bit followed by 15 bits for the magnitude:

1 0 1 0 1 1 0 1 0 0 1 1 1 0 1 0
sign | number

III.2. Floating-point representation

The floating-point representation is used to store fractional quantities. The number is expressed in the form:

m × b^e

where m is the mantissa, b is the base of the number system used, and e is the exponent. As an example, the number 156.76 could be represented as 0.15676 × 10^3 in a floating-point base-10 system.

Usually, for the storage of fractional quantities, the first bit is reserved for the sign, then come the bits of the signed exponent, and the last bits hold the mantissa. For optimal storage, leading zero digits in the mantissa are removed and absorbed into the exponent. Consider:

1/34 = 0.0294117...

With a four-digit mantissa, this would be stored as:

0.0294 × 10^0

Because of the zero before the 2, we lose the digit 1. A better storage is the normalized form:

0.2941 × 10^-1

Floating-point representation allows both fractions and very large numbers to be stored. However, this has a computational cost, since floating-point numbers take more time to process than integers, and a precision cost, since only a finite number of figures can be stored in the mantissa.
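A minimal Python sketch (illustrative, not from the notes; the helper name is my own) of this normalization step for a base-10 machine with a four-digit chopped mantissa:

```python
import math

def normalize(x, t=4):
    """Represent x as m * 10**e with 0.1 <= |m| < 1, keeping a
    t-digit mantissa by chopping (truncation, not rounding)."""
    if x == 0.0:
        return 0.0, 0
    e = math.floor(math.log10(abs(x))) + 1   # exponent so that 0.1 <= |m| < 1
    m = x / 10**e
    m = math.trunc(m * 10**t) / 10**t        # chop the mantissa to t digits
    return m, e

m, e = normalize(1/34)
print(f"1/34 = {1/34:.7f} is stored as {m} x 10^{e}")   # 0.2941 x 10^-1
```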

In the 64-bit IEEE 754 standard, the word is divided as follows:

sign: 1 bit | exponent: 11 bits | mantissa: 52 bits | total = 64 bits

III.3. Limited range of quantities that may be represented

Because the number of bits is limited, some very large or very small numbers cannot be represented. If you try to store a number outside this range, you will generate an overflow error.

How to deal with the problem of π?

π = 3.14159265358979...

To store it on a base-10 system carrying seven significant figures, we can omit the figures after the seventh:

π ≈ 3.141592

This is called chopping, and it generates an error of about 0.00000065. Alternatively, we can round using the eighth figure:

π ≈ 3.141593

which generates an error of about −0.00000035. Rounding therefore reduces the error.

III.4. Comparison between two numbers

When comparing two numbers, it is wiser to test that their difference is less than an acceptably small tolerance than to test for strict equality. If you want to test whether a = b, the best solution is to write in your program:

If |a − b| ≤ ε

The machine epsilon can be used as the tolerance ε; this ensures a certain portability of the code, since it does not depend on the storage characteristics of the machine used:

ε = b^(1−t)

where b is the base and t is the number of digits in the mantissa.

III.5. Extended precision

It is also possible to increase the accuracy of a computation by assigning double precision to the variables. In this case, about 15 to 16 decimal digits of precision and a range of approximately 10^−308 to 10^308 are available. However, this increases both the execution time and the memory required.

Note: In almost all engineering problems, the precision provided by computers is sufficient. Computers using the IEEE format allow 52 bits to be used for the mantissa.
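A short Python sketch (my own illustration; the chop helper is an assumed name) that reproduces the chopping-versus-rounding comparison for π and prints the machine epsilon of the double-precision format:

```python
import math
import sys

def chop(x, sig):
    """Keep sig significant decimal figures by truncation."""
    e = math.floor(math.log10(abs(x))) + 1
    return math.trunc(x * 10**(sig - e)) / 10**(sig - e)

pi_chopped = chop(math.pi, 7)          # 3.141592
pi_rounded = float(f"{math.pi:.6f}")   # rounded at the seventh figure: 3.141593

print(f"chopping error: {math.pi - pi_chopped:.8f}")   # ~ 0.00000065
print(f"rounding error: {math.pi - pi_rounded:.8f}")   # ~ -0.00000035

# Machine epsilon of IEEE 754 double precision: b**(1-t) = 2**(1-53)
print(f"machine epsilon: {sys.float_info.epsilon}")    # 2.220446049250313e-16
```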

IV. Arithmetic manipulation of computer numbers

Basic arithmetic operations such as addition, subtraction, and multiplication can lead to significant round-off errors.

- Addition

The mantissa of the number with the smaller exponent is shifted so that the two exponents match. Consider a computer with a 4-digit mantissa and a 1-digit exponent, and suppose chopping is used. Adding 0.1557 × 10^1 and 0.4381 × 10^-1 proceeds as follows:

0.4381 × 10^-1 → 0.004381 × 10^1 → chopped to 0.0043 × 10^1

  0.1557 × 10^1
+ 0.0043 × 10^1
---------------
  0.1600 × 10^1

- Subtraction

The same alignment happens for subtraction:

  0.3641 × 10^2
− 0.2686 × 10^2
---------------
  0.0955 × 10^2

Because of the zero just before the 9, the result is normalized:

0.0955 × 10^2 → 0.9550 × 10^1

Note that a zero is appended to fill the fourth digit of the mantissa.

- Multiplication

0.1363 × 10^3 × 0.6423 × 10^-1 = 0.08754549 × 10^2
→ normalization: 0.8754549 × 10^1
→ chopping: 0.8754 × 10^1

The errors produced by these arithmetic manipulations may seem negligible, but several engineering methods require an iterative process to find the solution. The computations are then interdependent, and this can lead to a dramatic growth of the round-off errors.
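The 4-digit chopping machine above can be simulated directly with Python's decimal module, which lets us set the working precision and select truncation (ROUND_DOWN) as the rounding mode. A minimal sketch, assuming the same three examples:

```python
from decimal import Decimal, getcontext, ROUND_DOWN

# Simulate a machine with a 4-significant-digit mantissa that chops.
getcontext().prec = 4
getcontext().rounding = ROUND_DOWN

print(Decimal("1.557") + Decimal("0.04381"))   # 1.600  (0.1600 x 10^1)
print(Decimal("36.41") - Decimal("26.86"))     # 9.55   (normalized: 0.9550 x 10^1)
print(Decimal("136.3") * Decimal("0.06423"))   # 8.754  (0.8754 x 10^1)
```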

IV.1. Errors due to addition of large and small numbers

4000 + 0.0010 is computed as:

  0.4000 × 10^4
+ 0.0000001 × 10^4
------------------
  0.4000001 × 10^4 → chopping → 0.4000 × 10^4

The small number is completely ignored. This kind of problem usually occurs in the computation of infinite series whose first terms are large. To avoid it, compute the series in ascending order (smallest terms first).

IV.2. Subtractive cancellation

This error occurs when we subtract two nearly equal floating-point numbers; for example, calculate √9.01 − 3 on a 3-decimal-digit computer. To limit subtractive cancellation, use double precision (e.g. the function double(X) in Scilab or Matlab). A sketch illustrating both effects appears after the Additional information note below.

Single precision [32 bits]
- 24-bit mantissa (the first bit is assumed equal to 1 and not stored, so 23 bits are stored).
- 8-bit signed exponent.

Double precision [64 bits, IEEE 754]
- 53-bit mantissa (the leading bit is implicit, so 52 bits are stored; see the layout in section III.2).
- 11-bit signed exponent.

Additional information

On June 4, 1996, an unmanned Ariane 5 rocket launched by the European Space Agency exploded just forty seconds after lift-off from Kourou, French Guiana. The rocket was on its first voyage, after a decade of development costing $7 billion. The destroyed rocket and its cargo were valued at $500 million. A board of inquiry investigated the causes of the explosion and issued a report within two weeks. It turned out that the cause of the failure was a software error in the inertial reference system. Specifically, a 64-bit floating-point number giving the horizontal velocity of the rocket with respect to the platform was converted to a 16-bit signed integer. The number was larger than 32,767, the largest integer storable in a 16-bit signed integer, and thus the conversion failed.
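As promised above, a minimal Python sketch (my own illustration, with assumed example values) of both effects: summing single-precision terms in descending versus ascending order, and the cancellation in √9.01 − 3:

```python
import math
import numpy as np

# IV.1: summation order. In single precision (float32), each term 1e-4 is
# below half a unit in the last place of 4000, so adding the large term
# first throws every small term away; ascending order keeps them.
terms = np.array([4000.0] + [1e-4] * 10000, dtype=np.float32)

descending = np.float32(0.0)
for t in terms:                       # large term first
    descending += t
ascending = np.float32(0.0)
for t in terms[::-1]:                 # smallest terms first
    ascending += t
print(f"descending: {descending}, ascending: {ascending}")  # exact sum: 4001.0

# IV.2: subtractive cancellation. On a 3-decimal-digit machine, sqrt(9.01)
# is stored as 3.00, so the subtraction returns 0; the true value is ~1.666e-3.
print(f"3-digit machine: {3.00 - 3:.6f}")                # 0.000000
print(f"double precision: {math.sqrt(9.01) - 3:.6f}")    # 0.001666
```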

Round-off error on Detroit Edison bills (Jim Rees)

Detroit Edison's residential electric bill has a section titled "Energy Use Report." This section reports incorrect numbers due to improper integer round-off. One of the fields gives the average daily energy use for the month in kilowatt-hours, rounded to the nearest integer value. Another field gives the percent change against the same month of the previous year; the percent change is calculated using the rounded value for energy use. This can result in large errors. For example, my February 2005 use was 11.68 kWh/day, compared to 11.21 the previous year. After rounding this becomes 12 compared to 11, and the change is reported on the bill as 9 percent (12/11 − 1) instead of the correct 4 percent (11.68/11.21 − 1). I wrote to Detroit Edison about this. Their only response was an offer to "assist [you] in understanding how the percentage... is calculated." Since I already know how it is calculated (incorrectly), I declined the offer.

Rounding error changes parliament makeup (Debora Weber-Wulff, 7 Apr 1992)

We experienced a shattering computer error during a German election this past Sunday (5 April). The elections to the parliament for the state of Schleswig-Holstein were affected.

German elections are quite complicated to calculate. First, there is the 5% clause: no party with less than 5% of the vote may be seated in parliament; all the votes for such a party are lost. Seats are distributed by direct vote and by list. All persons winning a precinct vote (i.e. having more votes than any other candidate in the precinct) are seated. Then a complicated system (often D'Hondt; newer systems are now in use) is invoked that seats persons from the party lists according to the proportion of the votes for each party. Often quite a number of extra seats (and office space and salaries) are necessary so that the seat distribution reflects the vote percentages each party got.

On Sunday the votes were being counted, and it looked like the Green party was hanging on by its teeth to a vote percentage of exactly 5%. This meant that the Social Democrats (SPD) could not have anyone from their list seated, which was most unfortunate, as their candidate for minister president was number one on the list, and the SPD won all precincts: no extra seats needed. After midnight (and after the election results were published), someone discovered that the Greens actually had only 4.97% of the vote. The program that prints out the percentages uses only one place after the decimal, and had *rounded the count up* to 5%! This software had been used for *years*, and no one had thought to turn off the rounding in this very critical (and, IMHO, very undemocratic) region. So the 4.97% of the votes were thrown away, the seats were recalculated, the SPD got to seat one person from its list, and now has a one-seat majority in the parliament. And the newspapers are clucking about the "computers" making such a mistake.