SCILAB IS NOT NAIVE

This document has been written by Michaël Baudin from the Scilab Consortium.
December 2010. The Scilab Consortium - Digiteo. All rights reserved.

Abstract

Most of the time, the mathematical formula is directly used in the Scilab source code. But, in many algorithms, some additional work is performed, which takes into account the fact that the computer does not process mathematical real values, but performs computations with their floating point representation. The goal of this article is to show that, in many situations, Scilab is not naive and uses algorithms which have been specifically tailored for floating point computers. We analyze in this article the particular cases of the quadratic equation, the complex division and the numerical derivatives. In each example, we show that the naive algorithm is not sufficiently accurate, while the Scilab implementation is much more robust.

Contents

1 Introduction
    An open-source document
    Introduction
2 Quadratic equation
    Theory
    Experiments
        Massive cancellation
        Overflow
    Explanations
        Properties of the roots
        Floating-Point implementation: overview
        Floating-Point implementation: fixing massive cancellation
        Floating-Point implementation: fixing overflow problems
        Conditioning of the problem
    References
    Exercises
    Answers to exercises
3 Numerical derivatives
    Theory
    Experiments
    Explanations
        Floating point implementation
        Robust algorithm
        One more step
    References
4 Complex division
    Theory
    Experiments
    Explanations
        Algebraic computations
        Smith's method
        One more step
    References
    Exercises
    Answers to exercises
5 Conclusion
6 Acknowledgments
7 Appendix
    Why 0.1 is rounded
    Why sin(π) is rounded
    One more step
Bibliography
Index

1 Introduction

1.1 An open-source document

This document is an open-source project. The LaTeX sources are available on the Scilab Forge. The LaTeX sources are provided under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License. The Scilab scripts are provided on the Forge, inside the project, under the scripts sub-directory. The scripts are available under the CeCILL licence.

1.2 Introduction

As a practical example of the problem considered in this document, consider the following numerical experiments. The following session is an example of a Scilab session, where we compute the real number 0.1 by two different, but mathematically equivalent, ways.

    -->format(25)
    -->0.1
     ans  =
        0.1000000000000000055511
    -->1.0 - 0.9
     ans  =
        0.0999999999999999777955
    -->0.1 == 1.0 - 0.9
     ans  =
      F

I guess that for a person who has never heard of these problems, this experiment may be a shock. To make things clearer, let us check that the sine function is also approximated, in the sense that the value of sin(π) is not exactly zero.

    -->format(25)
    -->sin(0.0)
     ans  =
        0.
    -->sin(%pi)
     ans  =
        1.224646799147353D-16

With symbolic computation systems, such as Maple [32], Mathematica [41] or Maxima [2] for example, the calculations are performed with abstract mathematical symbols. Therefore, there is no loss of accuracy, as long as no numerical evaluation is performed. If a numerical solution is required as a rational number of the form p/q, where p and q are integers and q ≠ 0, there is still no loss of accuracy. On the other hand, in numerical computing systems, such as Scilab [7], Matlab [33] or Octave [3] for example, the computations are performed with floating point numbers. When a numerical value is stored, it is generally associated with a rounding error.
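Because a stored value generally carries such a rounding error, a direct equality test between two floating point results, as in the session above, is fragile. The following minimal sketch shows a common workaround based on a relative tolerance; it is only an illustration, not an excerpt from the document's scripts, and the function name approxequal and the tolerance 10*%eps are ad hoc choices.

    // A sketch of tolerance-based comparison; approxequal is not a Scilab built-in.
    function r = approxequal(x, y, tol)
        // Relative test: true when x and y differ by less than tol, relatively
        // to the largest magnitude of the two values.
        r = abs(x - y) <= tol * max(abs(x), abs(y));
    endfunction

    disp(approxequal(0.1, 1.0 - 0.9, 10 * %eps))   // displays T, while == returns F

The rest of the document explains where such discrepancies come from and how Scilab's own algorithms keep them under control.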
The difficulty of numerical computations is generated by the fact that, while mathematics deals with real numbers, the computer deals with their floating point representations. This is the difference between the naive, mathematical, approach and the numerical, floating-point, implementation. In this article, we will not present the floating point arithmetic in detail. Instead, we will show examples of floating point issues by using the following algebraic and experimental approach.

1. First, we will derive the basic theory of a mathematical formula.
2. Then, we will implement it in Scilab and compare with the result given by the equivalent function provided by Scilab. As we will see, some particular cases do not work well with our formula, while the Scilab function computes a correct result.
3. Finally, we will analyze the reasons of the differences.

Our numerical experiments will be based on Scilab. In order to measure the accuracy of the results, we will use two different measures of error: the relative error and the absolute error [20]. Assume that x_c ∈ R is a computed value and x_e ∈ R is the expected (exact) value. We are looking for a measure of the difference between these two real numbers. Most of the time, we use the relative error

    e_r = |x_c - x_e| / |x_e|,    (1)

where we assume that x_e ≠ 0. The relative error e_r is linked with the number of significant digits in the computed value x_c. For example, if the relative error is e_r = 10^-6, then the number of significant digits is 6. When the expected value is zero, the relative error cannot be computed, and we then use instead the absolute error

    e_a = |x_c - x_e|.    (2)

A practical way of checking the expected result of a computation is to compare the formula computed by hand with the result produced by a symbolic tool. Recently, Wolfram has launched a website which lets us access Mathematica with a classical web browser. Many examples in this document have been validated with this tool.

In the following, we make a brief overview of floating point numbers used in Scilab. Real variables in Scilab are stored in double precision floating point variables. Indeed, Scilab uses the IEEE 754 standard, so that real variables are stored with 64 bits floating point numbers, called doubles. The floating point number associated with a given x ∈ R will be denoted by fl(x). While the real numbers form a continuum, floating point numbers are both finite and bounded. Not all real numbers can be represented by a floating point number. Indeed, there is an infinite number of reals, while there is a finite number of floating point numbers. In fact, there are, at most, 2^64 different 64 bits floating point numbers. This leads to roundoff, underflow and overflow.

The double floating point numbers are associated with a machine epsilon equal to 2^-52, which is approximately equal to 2.2 × 10^-16. This parameter is stored in the %eps Scilab variable. Therefore, we can expect, at best, approximately 16 significant decimal digits. This parameter does not depend on the machine we use. Indeed, be it a Linux or a Windows system, Scilab uses IEEE doubles. Therefore, the value of the %eps variable is always the same in Scilab.

Negative normalized floating point numbers are in the range [-10^308, -10^-307] and positive normalized floating point numbers are in the range [10^-307, 10^308]. The limits given in the previous intervals are only decimal approximations.
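The value of %eps and its absorption effect can be checked directly in a session. The following minimal sketch is only an illustration (not an excerpt from the document's scripts): it prints %eps and verifies that adding %eps to 1 is visible, while adding %eps/2 is rounded away.

    // Quick check of the machine epsilon described above (illustrative session).
    format(25);
    disp(%eps)              // 2^-52, approximately 2.220D-16
    disp(1 + %eps > 1)      // T : 1 + %eps is the next double after 1
    disp(1 + %eps/2 == 1)   // T : %eps/2 is absorbed by rounding to nearest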
Any real number greater than 10^308 or smaller than -10^308 is not representable as a double and is stored with the infinite value: in this case, we say that an overflow occurred. A real number whose magnitude is smaller than approximately 10^-324 is not representable as a double and is stored as a zero: in this case, we say that an underflow occurred.

The outline of this paper is the following. In the first section, we compute the roots of a quadratic equation. In the second section, we compute the numerical derivatives of a function. In the final section, we perform a numerically difficult division, with complex numbers. The examples presented in this introduction are analyzed further in the appendix of this document.

2 Quadratic equation

In this section, we analyze the computation of the roots of a quadratic polynomial. As we shall see, there is a whole world from the mathematical formulas to the implementation of such computations. In the first part, we briefly report the formulas which allow us to compute the real roots of a quadratic equation with real coefficients. We then present the naive algorithm based on these mathematical formulas. In the second part, we make some experiments in Scilab and compare our naive algorithm with the roots Scilab function. In the third part, we analyze why and how floating point numbers must be taken into account when the roots of a quadratic are required.

2.1 Theory

In this section, we present the mathematical formulas which allow us to compute the real roots of a quadratic polynomial with real coefficients. We chose to begin with the example of a quadratic equation, because most of us know exactly (or think we know) how to solve such an equation with a computer.

Assume that a, b, c ∈ R are given coefficients and a ≠ 0. Consider the following quadratic equation [4, 1, 5]:

    ax² + bx + c = 0,    (3)

where x ∈ R is the unknown.

Let us define by Δ = b² - 4ac the discriminant of the quadratic equation. We consider the mathematical solution of the quadratic equation, depending on the sign of the discriminant Δ = b² - 4ac.

If Δ > 0, there are two real roots:

    x- = (-b - √Δ) / (2a),    (4)
    x+ = (-b + √Δ) / (2a).    (5)

If Δ = 0, there is one double root:

    x± = -b / (2a).    (6)

If Δ < 0, there are two complex roots:

    x± = -b / (2a) ± i √(-Δ) / (2a).    (7)

We now consider a simplified algorithm where we only compute the real roots of the quadratic, assuming that Δ ≥ 0. This naive algorithm is presented in figure 1.

    input : a, b, c
    output: x-, x+
    Δ := b² - 4ac;
    s := √Δ;
    x- := (-b - s)/(2a);
    x+ := (-b + s)/(2a);

Algorithm 1: Naive algorithm to compute the real roots of a quadratic equation. We assume that Δ ≥ 0.

2.2 Experiments

In this section, we compare our naive algorithm with the roots function. We begin by defining a function which naively implements the mathematical formulas. Then we use our naive function on two particular examples. In the first example, we focus on massive cancellation and in the second example, we focus on overflow problems.

The following Scilab function myroots is a straightforward implementation of the previous formulas. It takes as input the coefficients of the quadratic, stored in the polynomial variable p, and returns the two roots in the vector r.

    function r = myroots(p)
        c = coeff(p,0);
        b = coeff(p,1);
        a = coeff(p,2);
        r(1) = (-b + sqrt(b^2 - 4*a*c)) / (2*a);
        r(2) = (-b - sqrt(b^2 - 4*a*c)) / (2*a);
    endfunction
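Before moving to the difficult cases, a quick sanity check may help. The following session is only an illustration (not an excerpt from the document's scripts): it applies myroots to the well-conditioned polynomial x² - 3x + 2, whose exact roots are 1 and 2.

    // Illustrative check of myroots on an easy quadratic: (x - 1)(x - 2) = x^2 - 3x + 2.
    p = poly([2 -3 1], "x", "coeff");   // coefficients in increasing degree order
    disp(myroots(p))                    // displays 2. and 1.
    disp(roots(p))                      // the Scilab function agrees on this easy case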
2.2.1 Massive cancellation

We analyze the rounding errors which appear when the discriminant of the quadratic equation is such that b² ≫ 4ac. We consider the following quadratic equation

    εx² + (1/ε)x - ε = 0    (8)

with ε > 0. The discriminant of this equation is Δ = 1/ε² + 4ε². The two real solutions of the quadratic equation are

    x- = (-1/ε - √(1/ε² + 4ε²)) / (2ε),    x+ = (-1/ε + √(1/ε² + 4ε²)) / (2ε).    (9)

We are mainly interested in the case where the magnitude of ε is very small. The roots are approximated by

    x- ≈ -1/ε²,    x+ ≈ ε²,    (10)

when ε is close to zero. We now consider the limit of the two roots when ε → 0. We have

    lim_{ε→0} x- = -∞,    lim_{ε→0} x+ = 0.    (11)

In the following Scilab script, we compare the roots computed by the roots function and the roots computed by our naive function. Only the positive root x+ ≈ ε² is considered in this test. Indeed, the x- root is computed accurately (x- ≈ -1/ε²) in both implementations. We consider the special case ε = 10^-4. We begin by creating a polynomial with the poly function, which is given the coefficients of the polynomial. The variable e1 contains the expected value of the positive root x+ = ε². Then we compute the roots r1 and r2 with the two functions roots and myroots. We finally compute the relative errors error1 and error2.

    p = poly([-0.0001 10000.0 0.0001], "x", "coeff");
    e1 = 1e-8;
    roots1 = myroots(p);
    r1 = roots1(1);
    roots2 = roots(p);
    r2 = roots2(1);
    error1 = abs(r1-e1)/e1;
    error2 = abs(r2-e1)/e1;
    printf("Expected : %e\n", e1);
    printf("Naive method : %e ( error = %e )\n", r1, error1);
    printf("Scilab method : %e ( error = %e )\n", r2, error2);

The previous script produces the following output.

    Expected : 1.000000e-008
    Naive method : 9.094947e-009 ( error = 9.050530e-002 )
    Scilab method : 1.000000e-008 ( error = e-016 )

We see that the naive method produces a root which has no significant digit and a relative error which is 14 orders of magnitude greater than the relative error of the Scilab root.

This behavior is explained by the fact that the expression for the positive root x+ given by the equality 5 is numerically evaluated as follows. We first consider how the discriminant Δ = 1/ε² + 4ε² is computed. The term 1/ε² is equal to 100000000 and the term 4ε² is equal to 0.00000004. Therefore, the sum of these two terms is computed as 100000000.00000004470348. Hence, the square root of the discriminant is

    √(1/ε² + 4ε²) = 10000.000000000001818989.    (12)

As we see, the first digits are correct, but the last digits are subject to rounding errors. When the expression -1/ε + √(1/ε² + 4ε²) is evaluated, the following computations are performed:

    -1/ε + √(1/ε² + 4ε²) = -10000.000000000000000000 + 10000.000000000001818989    (13)
                         = 0.000000000001818989.    (14)

We see that the result is mainly driven by the cancellation of significant digits.

We may think that the result is extreme, but it is not. For example, consider the case where we reduce further the value of ε down to ε = 10^-11. We get the following output:

    Expected : 1.000000e-022
    Naive method : 0.000000e+000 ( error = 1.000000e+000 )
    Scilab method : 1.000000e-022 ( error = e-016 )

The relative error is this time 16 orders of magnitude greater than the relative error of the Scilab root. There is no significant decimal digit in the result. In fact, the naive implementation computes a false root x+ even for a value of epsilon equal to ε = 10^-3, where the relative error is 7 orders of magnitude greater than the relative error produced by the roots function.
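The loss of digits described by equations 12 to 14 can be reproduced interactively. The following minimal sketch is only an illustration (not one of the original scripts): it evaluates the intermediate quantities for ε = 10^-4 with 25 displayed digits.

    // Reproducing the cancellation of section 2.2.1 for e = 1e-4 (illustrative session).
    format(25);
    e = 1.e-4;
    s = sqrt(1/e^2 + 4*e^2);    // close to 1/e = 10000, last digits already rounded
    disp(s)
    disp(-1/e + s)              // only the rounded tail of s survives the subtraction
    disp((-1/e + s) / (2*e))    // the naive positive root, far from the expected 1.e-8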
2.2.2 Overflow

In this section, we analyze the overflow which appears when the discriminant of the quadratic equation is such that b² - 4ac is not representable as a double. We consider the following quadratic equation

    x² + (1/ε)x + 1 = 0    (15)

with ε > 0. We especially consider the case ε → 0. The discriminant of this equation is Δ = 1/ε² - 4. Assume that the discriminant is positive. Therefore, the roots of the quadratic equation are

    x- = (-1/ε - √(1/ε² - 4)) / 2,    x+ = (-1/ε + √(1/ε² - 4)) / 2.    (16)

These roots are approximated by

    x- ≈ -1/ε,    x+ ≈ -ε,    (17)

when ε is close to zero. We now consider the limit of the two roots when ε → 0. We have

    lim_{ε→0} x- = -∞,    lim_{ε→0} x+ = 0.    (18)

To create a difficult case, we search ε so that 1/ε² > 10^308, because we know that 10^308 is near the maximum representable double precision floating point number. Therefore, we expect that something should go wrong in the computation of the expression √(1/ε² - 4). We choose ε = 10^-155. In the following script, we compare the roots computed by the roots function and our naive implementation.

    e = 1.e-155
    a = 1;
    b = 1/e;
    c = 1;
    p = poly([c b a], "x", "coeff");
    expected = [-e; -1/e];
    roots1 = myroots(p);
    roots2 = roots(p);
    error1 = abs(roots1-expected)/norm(expected);
    error2 = abs(roots2-expected)/norm(expected);
    printf("Expected : %e %e\n", expected(1), expected(2));
    printf("Naive method : %e %e ( error = %e %e )\n", ...
        roots1(1), roots1(2), error1(1), error1(2));
    printf("Scilab method : %e %e ( error = %e %e )\n", ...
        roots2(1), roots2(2), error2(1), error2(2));

The previous script produces the following output.

    Expected : -1.000000e-155 -1.000000e+155
    Naive method : Inf Inf ( error = Nan Nan )
    Scilab method : -1.000000e-155 -1.000000e+155 ( error = 0.000000e+000 0.000000e+000 )

In this case, the discriminant Δ = b² - 4ac has been evaluated as 1/ε² - 4, which is approximately equal to 10^310. This number cannot be represented in a double precision floating point number. It therefore produces the IEEE Infinite number, which is displayed by Scilab as Inf. The Infinite number is associated with an algebra, and functions can perfectly take this number as input. Therefore, when the square root function must compute √Inf, it produces again Inf. This number is then propagated into the final roots.

2.3 Explanations

In this section, we suggest robust methods to compute the roots of a quadratic equation. The methods presented in this section are extracted from the quad routine of the RPOLY algorithm by Jenkins and Traub [24, 23]. This algorithm is used by Scilab in the roots function, where a special case is used when the degree of the equation is equal to 2, i.e. a quadratic equation.

2.3.1 Properties of the roots

In this section, we present elementary results, which will be used in the derivation of robust floating point formulas for the roots of the quadratic equation.

Let us assume that the quadratic equation 3, with real coefficients a, b, c ∈ R and a ≠ 0, has a positive discriminant Δ = b² - 4ac. Therefore, the two real roots of the quadratic equation are given by the equations 4 and 5. We can prove that the sum and the product of the roots satisfy the equations

    x- + x+ = -b/a,    x- · x+ = c/a.    (19)

Therefore, the roots are the solutions of the normalized quadratic equation

    x² - (x- + x+)x + x- · x+ = 0.    (20)

Another transformation leads to an alternative form of the roots. Indeed, the original quadratic equation can be written as a quadratic equation in the unknown 1/x. Consider the quadratic equation 3 and divide it by x², assuming that x ≠ 0. This leads to the equation

    c(1/x)² + b(1/x) + a = 0,    (21)

where we assume that x ≠ 0. The two real roots of the quadratic equation 21 are

    x- = 2c / (-b + √(b² - 4ac)),    (22)
    x+ = 2c / (-b - √(b² - 4ac)).    (23)
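Formulas 22 and 23 are the key to the fix presented below. On the cancellation example of section 2.2.1, the expression -b - √(b² - 4ac) does not cancel, so that formula 23 recovers the positive root accurately. The following minimal sketch checks this for ε = 10^-4; it is only an illustration, not one of the original scripts.

    // Computing x+ of equation (8) with formula (23) instead of formula (5),
    // for e = 1e-4 (illustrative check).
    e = 1.e-4;
    a = e;  b = 1/e;  c = -e;
    naive  = (-b + sqrt(b^2 - 4*a*c)) / (2*a);   // formula (5): massive cancellation
    better = (2*c) / (-b - sqrt(b^2 - 4*a*c));   // formula (23): no cancellation
    printf("naive  = %e\n", naive);              // about 9.1e-9, no correct digit
    printf("better = %e\n", better);             // close to the expected 1.0e-8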
The expressions 22 and 23 can also be derived directly from the equations 4 and 5. For that purpose, it suffices to multiply their numerator and denominator by -b + √(b² - 4ac) and -b - √(b² - 4ac), respectively.

2.3.2 Floating-Point implementation: overview

The numerical experiments presented in sections 2.2.1 and 2.2.2 suggest that the floating point implementation must deal with two different problems:

- massive cancellation when b² ≫ 4ac, because of the cancellation of the terms -b and ±√(b² - 4ac), which may have opposite signs,
- overflow in the computation of the square root of the discriminant √(±(b² - 4ac)), when b² - 4ac is not representable as a floating point number.

The cancellation problem occurs only when the discriminant is positive, i.e. only when there are two real roots. Indeed, the cancellation will not appear when Δ < 0, since the complex roots do not use the sum -b ± √(b² - 4ac). When Δ = 0, the double real root does not cause any trouble. Therefore, we must take the cancellation problem into account only in the equations 4 and 5. On the other hand, the overflow problem occurs whatever the sign of the discriminant, but does not occur when Δ = 0. Therefore, we must take this problem into account in the equations 4, 5 and 7. In section 2.3.3, we focus on the cancellation error, while the overflow problem is addressed in section 2.3.4.

2.3.3 Floating-Point implementation: fixing massive cancellation

In this section, we present the computation of the roots of a quadratic equation with protection against massive cancellation. When the discriminant is positive, the massive cancellation problem can be split in two cases:

- if b < 0, then -b - √(b² - 4ac) may suffer from massive cancellation, because -b is positive and -√(b² - 4ac) is negative,
- if b > 0, then -b + √(b² - 4ac) may suffer from massive cancellation, because -b is negative and √(b² - 4ac) is positive.

Therefore,

- if b < 0, we should use the expression -b + √(b² - 4ac),
- if b > 0, we should use the expression -b - √(b² - 4ac).

The solution consists in a combination of the following expressions of the roots, given by, on one hand, the equations 4 and 5, and, on the other hand, the equations 22 and 23. We pick the formula so that the sign of b is the same as the sign of the square root. The following choice allows us to solve the massive cancellation problem:

- if b ≤ 0, then compute x- from 22, else (if b > 0) compute x- from 4,
- if b ≤ 0, then compute x+ from 5, else (if b > 0) compute x+ from 23.

We can also consider the modified Fagnano formulas

    x1 = -2c / (b + sgn(b) √(b² - 4ac)),    (24)
    x2 = -(b + sgn(b) √(b² - 4ac)) / (2a),    (25)

where the sign function is defined by

    sgn(b) = 1 if b ≥ 0,  -1 if b < 0.    (26)

The roots x1,2 correspond to the roots x+,-. Indeed, on one hand, if b < 0, x1 = x- and, if b > 0, x1 = x+. On the other hand, if b < 0, x2 = x+ and, if b > 0, x2 = x-.

Moreover, we notice that the division by two (and the multiplication by 2) is exact with floating point numbers, so these operations cannot be a source of problem. But it is interesting to use b/2, which involves ...
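The sign trick of this section can be gathered into a small solver. The following Scilab sketch is only an illustration of formulas 24 and 25 under the assumptions of this section (real coefficients, a ≠ 0, non-negative discriminant); the function name robustroots is an arbitrary choice, and this is not the actual code of the roots/RPOLY routine.

    // Minimal sketch of the cancellation-free formulas (24)-(25); not the roots/RPOLY code.
    // Assumes real coefficients, a <> 0 and a non-negative discriminant.
    function r = robustroots(a, b, c)
        d = b^2 - 4*a*c;           // discriminant, assumed >= 0 (and no overflow)
        s = sqrt(d);
        if b >= 0 then
            q = -(b + s);          // b and s have the same sign: no cancellation
        else
            q = -(b - s);
        end
        r(1) = q / (2*a);          // formula (25)
        r(2) = (2*c) / q;          // formula (24); q = 0 only if b = 0 and d = 0
    endfunction

On the cancellation example of section 2.2.1 (a = ε, b = 1/ε, c = -ε with ε = 10^-4), this sketch returns a positive root close to 10^-8, as the roots function does; the overflow case of section 2.2.2 still requires the separate treatment announced for section 2.3.4.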