
Walter Gander · Martin J. Gander · Felix Kwok

Scientific Computing

An Introduction

using Maple and MATLAB

Editorial Board: T. J. Barth, M. Griebel, D. E. Keyes, R. M. Nieminen, D. Roose, T. Schlick



Texts in Computational

Science and Engineering 11

Editors

Timothy J. Barth, Michael Griebel, David E. Keyes, Risto M. Nieminen, Dirk Roose, Tamar Schlick

For further volumes:

http://www.springer.com/series/5151


Walter Gander

Martin J. Gander

Felix Kwok

Scientific Computing

An Introduction using Maple and MATLAB



ETH Zürich, Zürich, Switzerland

Section de Mathématiques, Université de Genève, Genève, Switzerland

ISSN 1611-0994

ISBN 978-3-319-04324-1 ISBN 978-3-319-04325-8 (eBook) DOI 10.1007/978-3-319-04325-8

Springer Cham Heidelberg New York Dordrecht London

Library of Congress Control Number: 2014937000

Mathematics Subject Classification (2010): 65-00, 65-01

© Springer International Publishing Switzerland 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


This book is dedicated to Professor Gene H. Golub

1932–2007

(picture by Jill Knuth)

The three authors represent three generations of mathematicians who have been enormously influenced by Gene Golub.

He shaped our lives and our academic careers through his advice, his leadership, his friendship and his care for younger scientists.

We are indebted and will always honor his memory.


Preface

We are conducting ever more complex computations built upon the assumption that the underlying numerical methods are mature and reliable.

When we bundle existing algorithms into libraries and wrap them into packages to facilitate easy use, we create de facto standards that make it easy to ignore numerical analysis.

John Guckenheimer, President of SIAM, in SIAM News, June 1998: Numerical Computation in the Information Age

When redrafting the book I was tempted to present the algorithms in ALGOL, but decided that the difficulties of providing procedures which were correct in every detail were prohibitive at this stage.

James Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, 1988.

This book is an introduction to scientific computing, the mathematical modeling in science and engineering and the study of how to exploit computers in the solution of technical and scientific problems. It is based on mathematics, numerical and symbolic/algebraic computations, parallel/distributed processing and visualization. It is also a popular and growing area — many new curricula in computational science and engineering have been, and continue to be, developed, leading to new academic degrees and even entire new disciplines.

A prerequisite for this development is the ubiquitous presence of computers, which are being used by virtually every student and scientist. While traditional scientific work is based on developing theories and performing experiments, the possibility to use computers at any time has created a third way of increasing our knowledge, which is through modeling and simulation.

The use of simulation is further facilitated by the availability of sophisticated, robust and easy-to-use software libraries. This has the obvious advantage of shielding the user from the underlying numerics; however, this also has the danger of leaving the user unaware of the limitations of the algorithms, which can lead to incorrect results when used improperly. Moreover, some algorithms can be fast for certain types of problems but highly inefficient for others. Thus, it is important for the user to be able to make an informed decision on which algorithms to use, based on the properties of the problem to be solved. The goal of this book is to familiarize the reader with the basic


concepts of scientific computing and algorithms that form the workhorses of many numerical libraries. In fact, we will also emphasize the effective implementation of the algorithms discussed.

Numerical scientific computing has a long history; in fact, computers were first built for this purpose. Konrad Zuse [154] built his first (mechanical) computer in 1938 because he wanted to have a machine that would solve systems of linear equations that arise, e.g., when a civil engineer designs a bridge. At about the same time (and independently), Howard H. Aiken wanted to build a machine that would solve systems of ordinary differential equations [17].

The first high quality software libraries contained indeed numerical algorithms. They were produced in an international effort in the programming language ALGOL60 [111], and are described in the handbook "Numerical Algebra" [148]. These fundamental procedures for solving linear equations and eigenvalue problems were developed further, rewritten in FORTRAN, and became the LINPACK [26] and EISPACK [47] libraries. They are still in use and available from Netlib at www.netlib.org. In order to help students to use this software, Cleve Moler created around 1980 a friendly interface to those subroutines, which he called Matlab (Matrix Laboratory).

Matlab was so successful that a company was founded: MathWorks. Today, Matlab is "the language of technical computing", a very powerful tool in scientific computing.

Parallel to the development of numerical libraries, there were also efforts to do exact and algebraic computations. The first computer algebra systems were created some 50 years ago: at ETH, Max Engeli created Symbal, and at MIT, Joel Moses created Macsyma. Macsyma is the oldest system that is still available. However, computer algebra computations require much more computer resources than numerical calculations. Therefore, only when computers became more powerful did these systems flourish. Today the market leaders are Mathematica and Maple.

Often, a problem may be solved analytically (“exactly”) by a computer algebra system. In general, however, analytical solutions do not exist, and numerical approximations or other special techniques must be used instead.

Moreover, computer algebra is a very powerful tool for deriving numerical algorithms; we use Maple for this purpose in several chapters of this book.

Thus, computer algebra systems and numerical libraries are complementary tools: working with both is essential in scientific computing. We have chosen Matlab and Maple as basic tools for this book. Nonetheless, we are aware that the difference between pure computer algebra systems and numerical Matlab-like systems is disappearing, and the two may merge and become indistinguishable by the user in the near future.


How to use this book

Prerequisites for understanding this book are courses in calculus and linear algebra. The content of this book is too much for a typical one semester course in scientific computing. However, the instructor can choose those sections that he wishes to teach and that fit his schedule. For example, for an introductory course in scientific computing, one can very well use the least squares chapter and teach only one of the methods for computing the QR decomposition. However, for an advanced course focused solely on least squares methods, one may also wish to consider the singular value decomposition (SVD) as a computational tool for solving least squares problems. In this case, the book also provides a detailed description on how to compute the SVD in the chapter on eigenvalues. The material is presented in such a way that a student can also learn directly from the book. To help the reader navigate the volume, we provide in Section 1.2 some sample courses that have been taught by the authors at various institutions.

The focus of the book is algorithms: we would like to explain to the students how some fundamental functions in mathematical software are designed. Many exercises require programming in Matlab or Maple, since we feel it is important for students to gain experience in using such powerful software systems. They should also know about their limitations and be aware of the issue addressed by John Guckenheimer. We tried to include meaningful examples and problems, not just academic exercises.

Acknowledgments

The authors would like to thank Oscar Chinellato, Ellis Whitehead, Oliver Ernst and Laurence Halpern for their careful proofreading and helpful suggestions.

Walter Gander is indebted to Hong Kong Baptist University (HKBU) and especially to its Vice President Academic, Franklin Luk, for giving him the opportunity to continue to teach students after his retirement at ETH.

Several chapters of this book have been presented and improved successfully in courses at HKBU. We are also thankful to the University of Geneva, where we met many times to finalize the manuscript.

Geneva and Zürich, August 2013

Walter Gander, Martin J. Gander, Felix Kwok


Contents

Chapter 1. Why Study Scientific Computing? . . . 1

1.1 Example: Designing a Suspension Bridge . . . 1

1.1.1 Constructing a Model . . . 1

1.1.2 Simulating the Bridge . . . 3

1.1.3 Calculating Resonance Frequencies . . . 4

1.1.4 Matching Simulations with Experiments . . . 5

1.2 Navigating this Book: Sample Courses . . . 6

1.2.1 A First Course in Numerical Analysis . . . 7

1.2.2 Advanced Courses . . . 8

1.2.3 Dependencies Between Chapters . . . 8

Chapter 2. Finite Precision Arithmetic . . . 9

2.1 Introductory Example . . . 10

2.2 Real Numbers and Machine Numbers. . . 11

2.3 The IEEE Standard . . . 14

2.3.1 Single Precision. . . 14

2.3.2 Double Precision . . . 16

2.4 Rounding Errors . . . 19

2.4.1 Standard Model of Arithmetic . . . 19

2.4.2 Cancellation . . . 20

2.5 Condition of a Problem . . . 24

2.5.1 Norms . . . 24

2.5.2 Big- and Little-O Notation . . . 27

2.5.3 Condition Number . . . 29

2.6 Stable and Unstable Algorithms. . . 33

2.6.1 Forward Stability. . . 33

2.6.2 Backward Stability . . . 36

2.7 Calculating with Machine Numbers: Tips and Tricks . . . 38

2.7.1 Associative Law . . . 38

2.7.2 Summation Algorithm by W. Kahan . . . 39

2.7.3 Small Numbers . . . 40

2.7.4 Monotonicity . . . 40

2.7.5 Avoiding Overflow . . . 41


2.7.6 Testing for Overflow . . . 42

2.7.7 Avoiding Cancellation . . . 43

2.7.8 Computation of Mean and Standard Deviation . . . . 45

2.8 Stopping Criteria . . . 48

2.8.1 Machine-independent Algorithms . . . 48

2.8.2 Test Successive Approximations. . . 51

2.8.3 Check the Residual. . . 51

2.9 Problems . . . 52

Chapter 3. Linear Systems of Equations . . . 61

3.1 Introductory Example . . . 62

3.2 Gaussian Elimination. . . 66

3.2.1 LU Factorization . . . 73

3.2.2 Backward Stability . . . 77

3.2.3 Pivoting and Scaling . . . 79

3.2.4 Sum of Rank-One Matrices . . . 82

3.3 Condition of a System of Linear Equations . . . 84

3.4 Cholesky Decomposition . . . 88

3.4.1 Symmetric Positive Definite Matrices. . . 88

3.4.2 Stability and Pivoting . . . 92

3.5 Elimination with Givens Rotations . . . 95

3.6 Banded matrices . . . 97

3.6.1 Storing Banded Matrices . . . 97

3.6.2 Tridiagonal Systems . . . 99

3.6.3 Solving Banded Systems with Pivoting. . . 100

3.6.4 Using Givens Rotations . . . 103

3.7 Problems . . . 105

Chapter 4. Interpolation . . . 113

4.1 Introductory Examples. . . 114

4.2 Polynomial Interpolation. . . 116

4.2.1 Lagrange Polynomials . . . 117

4.2.2 Interpolation Error . . . 119

4.2.3 Barycentric Formula . . . 121

4.2.4 Newton’s Interpolation Formula. . . 123

4.2.5 Interpolation Using Orthogonal Polynomials. . . 127

4.2.6 Change of Basis, Relation with LU and QR . . . 132

4.2.7 Aitken-Neville Interpolation . . . 139

4.2.8 Extrapolation . . . 142

4.3 Piecewise Interpolation with Polynomials . . . 144

4.3.1 Classical Cubic Splines. . . 145

4.3.2 Derivatives for the Spline Function . . . 147

4.3.3 Sherman–Morrison–Woodbury Formula . . . 155

4.3.4 Spline Curves . . . 157


4.4 Trigonometric Interpolation . . . 158

4.4.1 Trigonometric Polynomials . . . 160

4.4.2 Fast Fourier Transform (FFT) . . . 162

4.4.3 Trigonometric Interpolation Error . . . 164

4.4.4 Convolutions Using FFT. . . 168

4.5 Problems . . . 171

Chapter 5. Nonlinear Equations. . . 181

5.1 Introductory Example . . . 182

5.2 Scalar Nonlinear Equations . . . 184

5.2.1 Bisection . . . 185

5.2.2 Fixed Point Iteration. . . 187

5.2.3 Convergence Rates . . . 190

5.2.4 Aitken Acceleration and the ε-Algorithm . . . 193

5.2.5 Construction of One Step Iteration Methods . . . 199

5.2.6 Multiple Zeros . . . 205

5.2.7 Multi-Step Iteration Methods . . . 207

5.2.8 A New Iteration Formula . . . 210

5.2.9 Dynamical Systems. . . 212

5.3 Zeros of Polynomials . . . 215

5.3.1 Condition of the Zeros . . . 217

5.3.2 Companion Matrix . . . 220

5.3.3 Horner’s Scheme . . . 222

5.3.4 Number Conversions . . . 227

5.3.5 Newton’s Method: Classical Version . . . 230

5.3.6 Newton Method Using Taylor Expansions . . . 231

5.3.7 Newton Method for Real Simple Zeros . . . 232

5.3.8 Nickel’s Method . . . 237

5.3.9 Laguerre’s Method . . . 239

5.4 Nonlinear Systems of Equations . . . 240

5.4.1 Fixed Point Iteration. . . 242

5.4.2 Theorem of Banach . . . 242

5.4.3 Newton’s Method. . . 245

5.4.4 Continuation Methods . . . 251

5.5 Problems . . . 252

Chapter 6. Least Squares Problems . . . 261

6.1 Introductory Examples. . . 262

6.2 Linear Least Squares Problem and the Normal Equations . . . 266

6.3 Singular Value Decomposition (SVD). . . 269

6.3.1 Pseudoinverse . . . 274

6.3.2 Fundamental Subspaces . . . 275

6.3.3 Solution of the Linear Least Squares Problem . . . 277

6.3.4 SVD and Rank . . . 279


6.4 Condition of the Linear Least Squares Problem . . . 280

6.4.1 Differentiation of Pseudoinverses . . . 282

6.4.2 Sensitivity of the Linear Least Squares Problem. . . . 285

6.4.3 Normal Equations and Condition . . . 286

6.5 Algorithms Using Orthogonal Matrices . . . 287

6.5.1 QR Decomposition . . . 287

6.5.2 Method of Householder . . . 289

6.5.3 Method of Givens . . . 292

6.5.4 Fast Givens . . . 298

6.5.5 Gram-Schmidt Orthogonalization. . . 301

6.5.6 Gram-Schmidt with Reorthogonalization. . . 306

6.5.7 Partial Reorthogonalization . . . 308

6.5.8 Updating and Downdating the QR Decomposition . . . 311

6.5.9 Covariance Matrix Computations Using QR. . . 320

6.6 Linear Least Squares Problems with Linear Constraints . . . 323

6.6.1 Solution with SVD . . . 325

6.6.2 Classical Solution Using Lagrange Multipliers . . . 328

6.6.3 Direct Elimination of the Constraints . . . 330

6.6.4 Null Space Method. . . 333

6.7 Special Linear Least Squares Problems with Quadratic Constraint. . . 334

6.7.1 Fitting Lines . . . 335

6.7.2 Fitting Ellipses . . . 337

6.7.3 Fitting Hyperplanes, Collinearity Test . . . 340

6.7.4 Procrustes or Registration Problem. . . 344

6.7.5 Total Least Squares . . . 349

6.8 Nonlinear Least Squares Problems . . . 354

6.8.1 Notations and Definitions . . . 354

6.8.2 Newton’s Method. . . 356

6.8.3 Gauss-Newton Method. . . 360

6.8.4 Levenberg-Marquardt Algorithm . . . 361

6.9 Least Squares Fit with Piecewise Functions . . . 364

6.9.1 Structure of the Linearized Problem . . . 367

6.9.2 Piecewise Polynomials . . . 368

6.9.3 Examples . . . 372

6.10 Problems . . . 374

Chapter 7. Eigenvalue Problems . . . 387

7.1 Introductory Example . . . 388

7.2 A Brief Review of the Theory . . . 392

7.2.1 Eigen-Decomposition of a Matrix . . . 392

7.2.2 Characteristic Polynomial . . . 396

7.2.3 Similarity Transformations . . . 396


7.2.4 Diagonalizable Matrices . . . 397

7.2.5 Exponential of a Matrix . . . 397

7.2.6 Condition of Eigenvalues. . . 398

7.3 Method of Jacobi . . . 405

7.3.1 Reducing Cost by Using Symmetry . . . 414

7.3.2 Stopping Criterion . . . 417

7.3.3 Algorithm of Rutishauser . . . 417

7.3.4 Remarks and Comments on Jacobi . . . 420

7.4 Power Methods . . . 422

7.4.1 Power Method . . . 423

7.4.2 Inverse Power Method (Shift-and-Invert). . . 424

7.4.3 Orthogonal Iteration . . . 425

7.5 Reduction to Simpler Form . . . 429

7.5.1 Computing Givens Rotations . . . 429

7.5.2 Reduction to Hessenberg Form . . . 430

7.5.3 Reduction to Tridiagonal Form . . . 434

7.6 QR Algorithm . . . 436

7.6.1 Some History . . . 437

7.6.2 QR Iteration . . . 437

7.6.3 Basic Facts . . . 437

7.6.4 Preservation of Form. . . 438

7.6.5 Symmetric Tridiagonal Matrices . . . 439

7.6.6 Implicit QR Algorithm. . . 443

7.6.7 Convergence of the QR Algorithm . . . 445

7.6.8 Wilkinson’s Shift . . . 447

7.6.9 Test for Convergence and Deflation. . . 448

7.6.10 Unreduced Matrices have Simple Eigenvalues . . . 449

7.6.11 Specific Numerical Examples . . . 451

7.6.12 Computing the Eigenvectors. . . 453

7.7 Computing the Singular Value Decomposition (SVD) . . . 453

7.7.1 Transformations . . . 454

7.7.2 Householder-Rutishauser Bidiagonalization . . . 454

7.7.3 Golub-Kahan-Lanczos Bidiagonalization . . . 457

7.7.4 Eigenvalues and Singular Values . . . 457

7.7.5 Algorithm of Golub-Reinsch. . . 458

7.8 QD Algorithm . . . 464

7.8.1 Progressive QD Algorithm. . . 464

7.8.2 Orthogonal LR-Cholesky Algorithm . . . 468

7.8.3 Differential QD Algorithm. . . 472

7.8.4 Improving Convergence Using Shifts . . . 474

7.8.5 Connection to Orthogonal Decompositions. . . 478

7.9 Problems . . . 482


Chapter 8. Differentiation. . . 487

8.1 Introductory Example . . . 488

8.2 Finite Differences . . . 491

8.2.1 Generating Finite Difference Approximations . . . 494

8.2.2 Discrete Operators for Partial Derivatives . . . 496

8.3 Algorithmic Differentiation . . . 499

8.3.1 Idea Behind Algorithmic Differentiation . . . 499

8.3.2 Rules for Algorithmic Differentiation . . . 504

8.3.3 Example: Circular Billiard . . . 505

8.3.4 Example: Nonlinear Eigenvalue Problems . . . 509

8.4 Problems . . . 514

Chapter 9. Quadrature . . . 517

9.1 Computer Algebra and Numerical Approximations . . . 518

9.2 Newton–Cotes Rules . . . 521

9.2.1 Error of Newton–Cotes Rules . . . 525

9.2.2 Composite Rules . . . 527

9.2.3 Euler–Maclaurin Summation Formula . . . 531

9.2.4 Romberg Integration . . . 537

9.3 Gauss Quadrature . . . 541

9.3.1 Characterization of Nodes and Weights . . . 545

9.3.2 Orthogonal Polynomials . . . 547

9.3.3 Computing the Weights . . . 552

9.3.4 Golub–Welsch Algorithm . . . 555

9.4 Adaptive Quadrature. . . 561

9.4.1 Stopping Criterion . . . 563

9.4.2 Adaptive Simpson quadrature. . . 565

9.4.3 Adaptive Lobatto quadrature . . . 569

9.5 Problems . . . 577

Chapter 10. Numerical Ordinary Differential Equations . . . 583

10.1 Introductory Examples. . . 584

10.2 Basic Notation and Solution Techniques . . . 587

10.2.1 Notation, Existence of Solutions . . . 587

10.2.2 Analytical and Numerical Solutions . . . 589

10.2.3 Solution by Taylor Expansions . . . 591

10.2.4 Computing with Power Series . . . 593

10.2.5 Euler’s Method . . . 597

10.2.6 Autonomous ODE, Reduction to First Order System . . . 603

10.3 Runge-Kutta Methods . . . 604

10.3.1 Explicit Runge-Kutta Methods . . . 604

10.3.2 Local Truncation Error . . . 606

10.3.3 Order Conditions . . . 608

10.3.4 Convergence . . . 615


10.3.5 Adaptive Integration . . . 617

10.3.6 Implicit Runge-Kutta Methods . . . 625

10.4 Linear Multistep Methods . . . 631

10.4.1 Local Truncation Error . . . 635

10.4.2 Order Conditions . . . 636

10.4.3 Zero Stability . . . 638

10.4.4 Convergence . . . 643

10.5 Stiff Problems. . . 646

10.5.1 A-Stability . . . 650

10.5.2 A Nonlinear Example . . . 653

10.5.3 Differential Algebraic Equations . . . 655

10.6 Geometric Integration . . . 656

10.6.1 Symplectic Methods . . . 658

10.6.2 Energy Preserving Methods . . . 661

10.7 Delay Differential Equations. . . 664

10.8 Problems . . . 666

Chapter 11. Iterative Methods for Linear Systems . . . 673

11.1 Introductory Example . . . 675

11.2 Solution by Iteration . . . 677

11.2.1 Matrix Splittings . . . 677

11.2.2 Residual, Error and the Difference of Iterates . . . 678

11.2.3 Convergence Criteria. . . 680

11.2.4 Singular Systems . . . 683

11.2.5 Convergence Factor and Convergence Rate . . . 684

11.3 Classical Stationary Iterative Methods . . . 687

11.3.1 Regular Splittings and M-Matrices . . . 687

11.3.2 Jacobi . . . 691

11.3.3 Gauss-Seidel . . . 694

11.3.4 Successive Over-relaxation (SOR) . . . 695

11.3.5 Richardson . . . 702

11.4 Local Minimization by Nonstationary Iterative Methods . . . 704

11.4.1 Conjugate Residuals . . . 705

11.4.2 Steepest Descent . . . 705

11.5 Global Minimization with Chebyshev Polynomials . . . 708

11.5.1 Chebyshev Semi-Iterative Method . . . 719

11.5.2 Acceleration of SSOR . . . 724

11.6 Global Minimization by Extrapolation . . . 726

11.6.1 Minimal Polynomial Extrapolation (MPE) . . . 729

11.6.2 Reduced Rank Extrapolation (RRE) . . . 733

11.6.3 Modified Minimal Polynomial Extrapolation (MMPE) . . . 734

11.6.4 Topological ε-Algorithm (TEA) . . . 735

11.6.5 Recursive Topological ε-Algorithm . . . 737


11.7 Krylov Subspace Methods . . . 739

11.7.1 The Conjugate Gradient Method . . . 740

11.7.2 Arnoldi Process. . . 758

11.7.3 The Symmetric Lanczos Algorithm . . . 761

11.7.4 Solving Linear Equations with Arnoldi . . . 766

11.7.5 Solving Linear Equations with Lanczos . . . 769

11.7.6 Generalized Minimum Residual: GMRES . . . 773

11.7.7 Classical Lanczos for Non-Symmetric Matrices . . . . 780

11.7.8 Biconjugate Gradient Method (BiCG) . . . 793

11.7.9 Further Krylov Methods . . . 800

11.8 Preconditioning . . . 801

11.9 Problems . . . 804

Chapter 12. Optimization . . . 817

12.1 Introductory Examples. . . 818

12.1.1 How much daily exercise is optimal? . . . 818

12.1.2 Mobile Phone Networks . . . 821

12.1.3 A Problem from Operations Research . . . 828

12.1.4 Classification of Optimization Problems . . . 831

12.2 Mathematical Optimization . . . 832

12.2.1 Local Minima . . . 832

12.2.2 Constrained minima and Lagrange multipliers. . . 835

12.2.3 Equality and Inequality Constraints . . . 838

12.3 Unconstrained Optimization. . . 842

12.3.1 Line Search Methods. . . 842

12.3.2 Trust Region Methods . . . 856

12.3.3 Direct Methods . . . 859

12.4 Constrained Optimization . . . 862

12.4.1 Linear Programming . . . 862

12.4.2 Penalty and Barrier Functions . . . 872

12.4.3 Interior Point Methods. . . 873

12.4.4 Sequential Quadratic Programming. . . 877

12.5 Problems . . . 880

Bibliography . . . 887

Index . . . 895


Chapter 1. Why Study Scientific Computing?

Computational Science and Engineering (CS&E) is now widely accepted, along with theory and experiment, as a crucial third mode of scientific investigation and engineering design. Aerospace, automotive, biological, chemical, semiconductor, and other industrial sectors now rely on simulation for technical decision support.

Introduction to the First SIAM Conference on Computational Science and Engineering, September 21–24, 2000, Washington DC.

The emergence of scientific computing as a vital part of science and engineering coincides with the explosion in computing power in the past 50 years. Many physical phenomena have been well understood and have accurate models describing them since the late 1800s, but before the widespread use of computers, scientists and engineers were forced to make many simplifying assumptions in the models in order to make them solvable by pencil-and-paper methods, such as series expansion. With the increase of computing power, however, one can afford to use numerical methods that are computationally intensive but that can tackle the full models without the need to simplify them. Nonetheless, every method has its limitations, and one must understand how they work in order to use them correctly.

1.1 Example: Designing a Suspension Bridge

To get an idea of the kinds of numerical methods that are used in engineering problems, let us consider the design of a simple suspension bridge. The bridge consists of a pair of ropes fastened on both sides of the gorge, see Figure 1.1. Wooden supports going across the bridge are attached to the ropes at regularly spaced intervals. Wooden boards are then fastened between the supports to form the deck. We would like to calculate the shape of the bridge as well as the tension in the rope supporting it.

1.1.1 Constructing a Model

Let us construct a simple one-dimensional model of the bridge structure by assuming that the bridge does not rock side to side. To calculate the shape of the bridge, we need to know the forces that are exerted on the ropes by the supports. Let L be the length of the bridge and x be the distance from one


Figure 1.1. A simple suspension bridge (ropes, supports, and deck boards).

Figure 1.2. Force diagram for the bridge example.

end of the bridge. Assume that the supports are located at x_i, i = 1, ..., n, with h being the spacing between supports. Let w(x) be the force per unit distance exerted on the deck at x by gravity, due to the weight of the deck and of the people on it. If we assume that any weight on the segment [x_{i-1}, x_i] is exerted entirely on the supports at x_{i-1} and x_i, then the force f_i exerted on the rope by the support at x_i can be written as

f_i = \frac{1}{h}\left( \int_{x_{i-1}}^{x_i} w(x)(x - x_{i-1})\,dx + \int_{x_i}^{x_{i+1}} w(x)(x_{i+1} - x)\,dx \right).   (1.1)

We now consider the rope as an elastic string, which is stretched by the force exerted by the wooden supports. Let u_i be the height of the bridge at x_i, T_{i-1/2} be the tension of the segment of the rope between x_{i-1} and x_i, and θ_{i-1/2} be the angle it makes with the horizontal. Figure 1.2 shows the force diagram on the rope at x_i.

Since there is no horizontal displacement in the bridge, the horizontal forces must balance out, meaning

T_{i-1/2}\cos(\theta_{i-1/2}) = T_{i+1/2}\cos(\theta_{i+1/2}) = K,


where K is a constant. Vertical force balance then gives

T_{i+1/2}\sin(\theta_{i+1/2}) - T_{i-1/2}\sin(\theta_{i-1/2}) = f_i,

or

K\tan(\theta_{i+1/2}) - K\tan(\theta_{i-1/2}) = f_i.

But

\tan(\theta_{i+1/2}) = \frac{u_{i+1} - u_i}{h},

so we in fact have

\frac{K(u_{i+1} - 2u_i + u_{i-1})}{h} = f_i, \qquad i = 1, \ldots, n,   (1.2)

where u_0 and u_{n+1} are the known heights of the bridge at its ends and u_1, ..., u_n are the unknown heights.

1.1.2 Simulating the Bridge

Now, if we want to compute the shape of the bridge and the tensions T_{i-1/2}, we must first calculate the forces f_i from (1.1), and then solve the system of linear equations (1.2). To calculate the f_i, one must evaluate integrals, which may not be analytically feasible for certain weight distributions w(x).

Instead, one can approximate the integral numerically using a Riemann sum, for instance:

\int_{x_{i-1}}^{x_i} w(x)(x - x_{i-1})\,dx \;\approx\; \frac{h}{N}\sum_{j=1}^{N} w\!\left(x_{i-1} + \frac{jh}{N}\right)\frac{jh}{N}.

For large N, this converges to the exact value of the integral, but the error behaves like 1/N; this means that if we want to have three decimal digits of accuracy in the answer, we would need approximately 10^3 points. There are other formulas that give more accurate values with a smaller number of points; this is discussed in more detail in Chapter 9.
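For illustration, here is a rough Matlab sketch (not taken from the book) of this approximation. The weight distribution w, the bridge length L, the number of supports n and the number of sample points N are hypothetical choices, and the loads f_i are assembled as in (1.1).

w = @(x) 1 + 0*x;              % hypothetical weight per unit length
L = 10; n = 9; h = L/(n+1);    % n interior supports with spacing h (example values)
x = (0:n+1)'*h;                % support positions x_0, ..., x_{n+1}
N = 1000;                      % Riemann sum points per segment
t = (1:N)'/N*h;                % right-endpoint offsets within one segment
f = zeros(n,1);
for i = 1:n
  I1 = h/N*sum(w(x(i)+t).*t);        % right Riemann sum for the first integral in (1.1)
  I2 = h/N*sum(w(x(i+1)+t).*(h-t));  % right Riemann sum for the second integral in (1.1)
  f(i) = (I1 + I2)/h;                % load carried by support i
end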

The next step is to solve (1.2) for the u_i. This can be rewritten as Au = f, where A ∈ R^{n×n} is a matrix, u ∈ R^n is the vector of unknowns, and f is the vector of forces we just calculated. This system can be solved by Gaussian elimination, i.e., by row reducing the matrix, as taught in a basic linear algebra course. So for n = 4, a uniform distribution w(x) = 1, and


u_0 = u_{n+1} = 0, we can calculate

\left(\begin{array}{rrrr|r} -2 & 1 & 0 & 0 & 1 \\ 1 & -2 & 1 & 0 & 1 \\ 0 & 1 & -2 & 1 & 1 \\ 0 & 0 & 1 & -2 & 1 \end{array}\right)
\longrightarrow
\left(\begin{array}{rrrr|r} -2 & 1 & 0 & 0 & 1 \\ 0 & -\frac{3}{2} & 1 & 0 & \frac{3}{2} \\ 0 & 1 & -2 & 1 & 1 \\ 0 & 0 & 1 & -2 & 1 \end{array}\right)
\longrightarrow
\left(\begin{array}{rrrr|r} -2 & 1 & 0 & 0 & 1 \\ 0 & -\frac{3}{2} & 1 & 0 & \frac{3}{2} \\ 0 & 0 & -\frac{4}{3} & 1 & 2 \\ 0 & 0 & 1 & -2 & 1 \end{array}\right)
\longrightarrow
\left(\begin{array}{rrrr|r} -2 & 1 & 0 & 0 & 1 \\ 0 & -\frac{3}{2} & 1 & 0 & \frac{3}{2} \\ 0 & 0 & -\frac{4}{3} & 1 & 2 \\ 0 & 0 & 0 & -\frac{5}{4} & \frac{5}{2} \end{array}\right).

Back substitution gives u = -(h^2/K)(2, 3, 3, 2)^T. However, one often wishes to calculate the shape of the bridge under different weight distributions w(x), e.g., when people are standing on different parts of the bridge. So the matrix A stays the same, but the right-hand side f changes to reflect the different weight distributions. It would be a waste to have to redo the row reductions every time, when only f has changed! A much better way is to use the LU decomposition, which writes the matrix A in factored form and reuses the factors to solve equations with different right-hand sides. This is shown in Chapter 3.
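As a minimal Matlab sketch of this idea (not the book's code, and assuming the n = 4 model matrix above with the scaling absorbed into the right-hand sides), the matrix is factored once and the triangular factors are reused for each load:

n = 4;
A = diag(-2*ones(n,1)) + diag(ones(n-1,1),1) + diag(ones(n-1,1),-1);
[Lf, Uf, P] = lu(A);          % factor once: P*A = Lf*Uf
f1 = ones(n,1);               % uniform load
f2 = [1; 2; 2; 1];            % a hypothetical second load distribution
u1 = Uf\(Lf\(P*f1));          % only two triangular solves per right-hand side
u2 = Uf\(Lf\(P*f2));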

In the above row reduction, we can see easily that there are many zero entries that need not be calculated, but the computer has no way of knowing that in advance. In fact, the number of additions and multiplications required for solving the generic (i.e., full) linear system is proportional to n^3, whereas in our case, we only need about n additions and multiplications because of the many zero entries. To take advantage of the sparse nature of the matrix, one needs to store it differently and use different algorithms on it. One possibility is to use the banded matrix format; this is shown in Section 3.6.
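A small sketch of the banded idea in Matlab: if the tridiagonal matrix is stored in sparse form, the backslash solver automatically uses a banded factorization with O(n) work. The size n and the load are arbitrary example values.

n = 1000; e = ones(n,1);
A = spdiags([e -2*e e], -1:1, n, n);   % sparse tridiagonal model matrix
f = e;                                  % uniform load, scaling factored out
u = A\f;                                % fast solve exploiting the band structure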

Suppose now that the people on the bridge have moved, but only by a few meters. The shape of the bridge would have only changed slightly, since the weight distribution is not very different. Thus, instead of solving a new linear system from scratch, one could imagine using the previous shape as a first guess and making small corrections to the solution until it matches the new distribution. This is the basis of iterative methods, which are discussed in Chapter 11.
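The following sketch (a hypothetical illustration, not the book's code) shows the warm-start idea with Matlab's conjugate gradient solver pcg, one of the Krylov methods of Chapter 11. Since the model matrix here is negative definite, the negated system is passed to pcg; the load change is made up for the example.

n = 100; e = ones(n,1);
A = spdiags([e -2*e e], -1:1, n, n);
f_old = e;                 u_old = A\f_old;      % shape for the old load
f_new = f_old;             f_new(40:60) = 1.2;   % hypothetical small load change
u_new = pcg(-A, -f_new, 1e-10, 500, [], [], u_old);  % start the iteration from u_old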

1.1.3 Calculating Resonance Frequencies

A well-designed bridge should never collapse, but there have been spectacular bridge failures in history. One particularly memorable one was the collapse of the Tacoma Narrows bridge on November 7, 1940. On that day, powerful wind gusts excited a natural resonance mode of the bridge, setting it into a twisting motion that it was not designed to withstand. As the winds continued, the amplitude of the twisting motion grew, until the bridge


eventually collapsed1.

It turns out that one can study the resonance modes of the bridge by considering the eigenvalue problem

Au = \lambda u,

cf. [37]. Clearly, a two-dimensional model is needed to study the twisting motion mentioned above, but let us illustrate the ideas by considering the eigenvalues of the 1D model for n = 4. For this simple problem, one can guess the eigenvectors and verify that

u^{(k)} = (\sin(k\pi/5), \sin(2k\pi/5), \sin(3k\pi/5), \sin(4k\pi/5))^T, \qquad k = 1, 2, 3, 4,

are in fact eigenvectors with associated eigenvalues \lambda^{(k)} = -2 + 2\cos(k\pi/5).

However, for more complicated problems, such as one with varying mass along the bridge or for 2D problems, it is no longer possible to guess the eigenvectors. Moreover, the characteristic polynomial

P(\lambda) = \det(\lambda I - A)

is a polynomial of degree n, and it is well known that no general formula exists for finding the roots of such polynomials for n ≥ 5. In Chapter 7, we will present numerical algorithms for finding the eigenvalues of A. In fact, the problem of finding eigenvalues numerically also requires approximately n^3 operations, just like Gaussian elimination. This is in stark contrast with the theoretical point of view that linear systems are "easy" and polynomial root-finding is "impossible". To quote the eminent numerical analyst Nick Trefethen [139],

Abel and Galois notwithstanding, large-scale matrix eigenvalue problems are about as easy to solve in practice as linear systems of equations.
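As a quick numerical check of the formula above (a sketch, not part of the book's text), Matlab's general-purpose eig function reproduces the four eigenvalues of the n = 4 model matrix to rounding error:

n = 4;
A = diag(-2*ones(n,1)) + diag(ones(n-1,1),1) + diag(ones(n-1,1),-1);
k = (1:n)';
lambda_formula = sort(-2 + 2*cos(k*pi/(n+1)));  % -2 + 2*cos(k*pi/5)
lambda_numeric = sort(eig(A));                  % computed eigenvalues
disp([lambda_formula lambda_numeric])           % the two columns agree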

1.1.4 Matching Simulations with Experiments

When modeling the bridge in the design process, we must use many parameters, such as the weight of the deck (expressed in terms of the mass density ρ per unit length) and the elasticity constant K of the supporting rope. In reality, these quantities depend on the actual material used during construction, and may deviate from the nominal values assumed during the design process. To get an accurate model of the bridge for later simulation, one needs to estimate these parameters from measurements taken during experiments.

For example, we can measure the vertical displacements y_i of the constructed bridge at points x_i, and compare them with the displacements u_i predicted by the model, i.e., the displacements satisfying Au = f. Since both A and f

1http://www.youtube.com/watch?v=3mclp9QmCGs


depend on the model parameters ρ and K, the u_i also depend on these parameters; thus, the mismatch between the model and the experimental data can be expressed as a function of ρ and K:

F(\rho, K) = \sum_{i=1}^{n} |y_i - u_i(\rho, K)|^2.   (1.3)

Thus, we can estimate the parameters by finding the optimal parameters ρ and K that minimize F. There are several ways of calculating the minimum:

1. Using multivariate calculus, we know that

   \frac{\partial F}{\partial \rho}(\rho, K) = 0, \qquad \frac{\partial F}{\partial K}(\rho, K) = 0.   (1.4)

   Thus, we have a system of two nonlinear equations in two unknowns, which must then be solved to obtain ρ and K. This can be solved by many methods, the best known of which is Newton's method. Such methods are discussed in Chapter 5.

2. The above approach has the disadvantage that (1.4) is satisfied by all stationary points of F(ρ, K), i.e., both the maxima and the minima of F(ρ, K). Since we are only interested in the minima of the function, a more direct approach would be to start with an initial guess (ρ_0, K_0) (e.g., the nominal design values) and then find successively better approximations (ρ_k, K_k), k = 1, 2, 3, ..., that reduce the mismatch, i.e.,

   F(\rho_{k+1}, K_{k+1}) \le F(\rho_k, K_k).

   This is the basis of optimization algorithms, which can be applied to other minimization problems. Such methods are discussed in detail in Chapter 12.

3. The function F(ρ, K) in (1.3) has a very special structure in that it is a sum of squares of the differences. As a result, the minimization problem is known as a least-squares problem. Least-squares problems, in particular linear ones, often arise because they yield the best unbiased estimator in the statistical sense for linear models. Because of the prevalence and special structure of least-squares problems, it is possible to design specialized methods that are more efficient and/or robust for these problems than general optimization algorithms. One example is the Gauss–Newton method, which resembles a Newton method, except that second-order derivative terms are dropped to save on computation. This and other methods are presented in Chapter 6; a small illustrative sketch of the fitting idea follows this list.
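As a rough Matlab sketch of the fitting idea (none of this is the book's code): a purely hypothetical toy model model_u stands in for the displacements u_i(ρ, K) obtained by solving Au = f, synthetic measurements y are generated, and the mismatch (1.3) is minimized with the derivative-free routine fminsearch.

xi = linspace(0, 1, 20)';
model_u = @(rho, K) rho*sin(pi*xi) + K*sin(2*pi*xi);  % hypothetical toy model
y = model_u(2, 0.5) + 1e-3*randn(size(xi));           % synthetic measurements
F = @(p) sum((y - model_u(p(1), p(2))).^2);           % mismatch as in (1.3)
p = fminsearch(F, [1; 1])                             % start from nominal values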

1.2 Navigating this Book: Sample Courses

This book intentionally contains too many topics to be done from cover to cover, even for an intensive full-year course. In fact, many chapters contain


enough material for stand-alone semester courses on their respective topics.

To help instructors and students navigate through the volume, we provide some sample courses that can be built from its sections.

1.2.1 A First Course in Numerical Analysis

The following sections have been used to build the first year numerical analysis course at the University of Geneva in 2011–12 (54 hours of lectures).

1. Finite precision arithmetic (2.1–2.6)
2. Linear systems (3.2–3.4)
3. Interpolation and FFT (4.2.1–4.2.4, 4.3.1, 4.4)
4. Nonlinear equations (5.2.1–5.2.3, 5.4)
5. Linear and nonlinear least squares (6.1–6.8, 6.8.2, 6.8.3, 6.5.1, 6.5.2)
6. Iterative methods (11.1–11.2.5, 11.3.2–11.3.4, 11.7.1)
7. Eigenvalue problems (7.1, 7.2, 7.4, 7.5.2, 7.6)
8. Singular value decomposition (6.3)
9. Numerical integration (9.1, 9.2, 9.3, 9.4.1–9.4.2)
10. Ordinary differential equations (10.1, 10.3)

A first term course at Stanford for computer science students in 1996 and 1997 ('Introduction to Scientific Computing using Maple and Matlab', 40 hours of lectures) was built using

1. Finite precision arithmetic (2.2)

2. Nonlinear equations (5.2.1–5.2.3,5.2.5,5.2.7,5.4)

3. Linear systems (3.2.1, 3.2.2, 3.2.3, 11.2–11.2.3, 11.3.2, 11.3.3, 11.4, 11.7.1)

4. Interpolation (4.2.1–4.2.4, 4.3.1)
5. Least Squares (6.2, 6.5.1, 6.8.2)
6. Differentiation (8.2, 8.2.1)
7. Quadrature (9.2, 9.2.4, 9.3.1, 9.3.2, 9.4.1–9.4.2)
8. Eigenvalue problems (7.3, 7.4, 7.6)

9. Ordinary differential equations (10.1,10.3,10.4)


1.2.2 Advanced Courses

The following advanced undergraduate/graduate courses (38 hours of lectures each) have been taught at Baptist University in Hong Kong between 2010 and 2013. We include a list of chapters from which these courses were built.

1. Eigenvalues and Iterative Methods for Linear Equations (Chapters 7, 11)

2. Least Squares (Chapter 6)

3. Quadrature and Ordinary Differential Equations (Chapters 9 and 10)

At the University of Geneva, the following graduate courses (28 hours of lectures, and 14 hours of exercises) have been taught between 2004 and 2011:

1. Iterative Methods for Linear Equations (Chapter 11)
2. Optimization (Chapter 12)

1.2.3 Dependencies Between Chapters

Chapter 2 on finite precision arithmetic and Chapter 3 on linear equations are required for most, if not all, of the subsequent chapters. At the beginning of each chapter, we give a list of sections that are prerequisites to understanding the material. Readers who are not familiar with these sections should refer to them first before proceeding.


Chapter 2. Finite Precision Arithmetic

In the past 15 years many numerical analysts have progressed from being queer people in mathematics departments to being queer people in computer science departments!

George Forsythe, What to do till the computer scientist comes. Amer. Math. Monthly 75, 1968.

It is hardly surprising that numerical analysis is widely regarded as an unglamorous subject. In fact, mathematicians, physicists, and computer scientists have all tended to hold numerical analysis in low esteem for many years – a most unusual consensus.

Nick Trefethen, The definition of numerical analysis, SIAM news, November 1992.

The golden age of numerical analysis has not yet started.

Volker Mehrmann, round table discussion "Future Directions in Numerical Analysis," moderated by Gene Golub and Nick Trefethen at ICIAM 2007.

Finite precision arithmetic underlies all the computations performed numerically, e.g. in Matlab; only symbolic computations, e.g. in Maple, are largely independent of finite precision arithmetic. Historically, when the invention of computers allowed a large number of operations to be performed in very rapid succession, nobody knew what the influence of finite precision arithmetic would be on this many operations: would small rounding errors sum up rapidly and destroy results? Would they statistically cancel? The early days of numerical analysis were therefore dominated by the study of rounding errors, and made this rapidly developing field not very attractive (see the quote above). Fortunately, this view of numerical analysis has since changed, and nowadays the focus of numerical analysis is the study of algorithms for the problems of continuous mathematics1. There are nonetheless a few pitfalls every person involved in scientific computing should know, and this chapter is precisely here for this reason. After an introductory example in Section 2.1, we present the difference between real numbers and machine numbers in Section 2.2 on a generic, abstract level, and give for the more computer science oriented reader the concrete IEEE arithmetic standard in Section 2.3. We then discuss the influence of rounding errors on operations in

1 Nick Trefethen, The definition of numerical analysis, SIAM News, November 1992.


Section 2.4, and explain the predominant pitfall of catastrophic cancellation when computing differences. In Section 2.5, we explain in very general terms what the condition number of a problem is, and then show in Section 2.6 two properties of algorithms for a given problem, namely forward stability and backward stability. It is the understanding of condition numbers and stability that allowed numerical analysts to move away from the study of rounding errors, and to focus on algorithmic development. Sections 2.7 and 2.8 represent a treasure trove with advanced tips and tricks when computing in finite precision arithmetic.

2.1 Introductory Example

A very old problem already studied by ancient Greek mathematicians is the squaring of a circle. The problem consists of constructing a square that has the same area as the unit circle. Finding a method for transforming a circle into a square this way (quadrature of the circle) became a famous problem that remained unsolved until the 19th century, when it was proved, using the transcendence of π, that the problem cannot be solved with straightedge and compass.

We know today that the area of a circle is given by A = πr^2, where r denotes the radius of the circle. An approximation is obtained by drawing a regular polygon inside the circle, and by computing the surface of the polygon. The approximation is improved by increasing the number of sides. Archimedes managed to produce a 96-sided polygon, and was able to bracket π in the interval (3 10/71, 3 1/7). The enclosing interval has length 1/497 = 0.00201207243 — surely good enough for most practical applications in his time.

Figure 2.1. Squaring of a circle (isosceles triangle ABC with center angle α_n and r = 1).

To compute such a polygonal approximation of π, we consider Figure 2.1. Without loss of generality, we may assume that r = 1. Then the area F_n of the isosceles triangle ABC with center angle α_n := 2π/n is

F_n = \cos\frac{\alpha_n}{2}\,\sin\frac{\alpha_n}{2},


and the area of the associated n-sided polygon becomes

A_n = n F_n = \frac{n}{2}\left(2\cos\frac{\alpha_n}{2}\sin\frac{\alpha_n}{2}\right) = \frac{n}{2}\sin\alpha_n = \frac{n}{2}\sin\frac{2\pi}{n}.

Clearly, computing the approximation A_n using π would be rather contradictory. Fortunately, A_{2n} can be derived from A_n by simple algebraic transformations, i.e. by expressing sin(α_n/2) in terms of sin α_n. This can be achieved by using identities for trigonometric functions:

\sin\frac{\alpha_n}{2} = \sqrt{\frac{1 - \cos\alpha_n}{2}} = \sqrt{\frac{1 - \sqrt{1 - \sin^2\alpha_n}}{2}}.   (2.1)

Thus, we have obtained a recurrence for sin(α_n/2) from sin α_n. To start the recurrence, we compute the area A_6 of the regular hexagon. The length of each side of the six equilateral triangles is 1 and the angle is α_6 = 60°, so that sin α_6 = √3/2. Therefore, the area of the triangle is F_6 = √3/4 and A_6 = 3√3/2. We obtain the following program for computing the sequence of approximations A_n:

Algorithm 2.1. Computation of π, Naive Version

s=sqrt(3)/2; A=3*s; n=6;            % initialization
z=[A-pi n A s];                     % store the results
while s>1e-10                       % termination if s=sin(alpha) small
  s=sqrt((1-sqrt(1-s*s))/2);        % new sin(alpha/2) value
  n=2*n; A=n/2*s;                   % A=new polygon area
  z=[z; A-pi n A s];
end
m=length(z);
for i=1:m
  fprintf('%10d %20.15f %20.15f %20.15f\n',z(i,2),z(i,3),z(i,1),z(i,4))
end

The results, displayed in Table 2.1, are not what we would expect: initially, we observe convergence towards π, but for n > 49152, the error grows again and finally we obtain A_n = 0?! Although the theory and the program are both correct, we still obtain incorrect answers. We will explain in this chapter why this is the case.

2.2 Real Numbers and Machine Numbers

Every computer is a finite automaton. This implies that a computer can only store a finite set of numbers and perform only a finite number of operations. In mathematics, we are used to calculating with real numbers R covering the continuous interval (−∞, ∞), but on the computer, we must contend with a


         n                  A_n              A_n - π            sin(α_n)
         6   2.598076211353316  -0.543516442236477   0.866025403784439
        12   3.000000000000000  -0.141592653589794   0.500000000000000
        24   3.105828541230250  -0.035764112359543   0.258819045102521
        48   3.132628613281237  -0.008964040308556   0.130526192220052
        96   3.139350203046872  -0.002242450542921   0.065403129230143
       192   3.141031950890530  -0.000560702699263   0.032719082821776
       384   3.141452472285344  -0.000140181304449   0.016361731626486
       768   3.141557607911622  -0.000035045678171   0.008181139603937
      1536   3.141583892148936  -0.000008761440857   0.004090604026236
      3072   3.141590463236762  -0.000002190353031   0.002045306291170
      6144   3.141592106043048  -0.000000547546745   0.001022653680353
     12288   3.141592516588155  -0.000000137001638   0.000511326906997
     24576   3.141592618640789  -0.000000034949004   0.000255663461803
     49152   3.141592645321216  -0.000000008268577   0.000127831731987
     98304   3.141592645321216  -0.000000008268577   0.000063915865994
    196608   3.141592645321216  -0.000000008268577   0.000031957932997
    393216   3.141592645321216  -0.000000008268577   0.000015978966498
    786432   3.141592303811738  -0.000000349778055   0.000007989482381
   1572864   3.141592303811738  -0.000000349778055   0.000003994741190
   3145728   3.141586839655041  -0.000005813934752   0.000001997367121
   6291456   3.141586839655041  -0.000005813934752   0.000000998683561
  12582912   3.141674265021758   0.000081611431964   0.000000499355676
  25165824   3.141674265021758   0.000081611431964   0.000000249677838
  50331648   3.143072740170040   0.001480086580246   0.000000124894489
 100663296   3.137475099502783  -0.004117554087010   0.000000062336030
 201326592   3.181980515339464   0.040387861749671   0.000000031610136
 402653184   3.000000000000000  -0.141592653589793   0.000000014901161
 805306368   3.000000000000000  -0.141592653589793   0.000000007450581
1610612736   0.000000000000000  -3.141592653589793   0.000000000000000

Table 2.1. Unstable computation of π


discrete, finite set of machine numbers M = {ã_min, ..., ã_max}. Hence each real number a has to be mapped onto a machine number ã to be used on a computer. In fact a whole interval of real numbers is mapped onto one machine number, as shown in Figure 2.2.

Figure 2.2. Mapping of real numbers R onto machine numbers M

Nowadays, machine numbers are often represented in the binary system. In general, any base (or radix) B could be used to represent numbers. A real machine number or floating point number consists of two parts, a mantissa (or significand) m and an exponent e,

ã = ±m × B^e,   m = D.D···D (mantissa),   e = D···D (exponent),

where D ∈ {0, 1, ..., B−1} stands for one digit. To make the representation of machine numbers unique (note that e.g. 1.2345×10^3 = 0.0012345×10^6), we require for a machine number ã ≠ 0 that the first digit before the decimal point in the mantissa be nonzero; such numbers are called normalized. One defining characteristic for any finite precision arithmetic is the number of digits used for the mantissa and the exponent: the number of digits in the exponent defines the range of the machine numbers, whereas the number of digits in the mantissa defines the precision.

More specifically [100], a finite precision arithmetic is defined by four integer parameters: B, the base or radix, p, the number of digits in the mantissa, and l and u defining the exponent range: l ≤ e ≤ u.

The precision of the machine is described by the real machine number eps. Historically, eps is defined to be the smallest positive ã ∈ M such that ã + 1 > 1 when the addition is carried out on the computer. Because this definition involves details about the behavior of floating point addition, which are not easily accessible, a newer definition of eps is simply the spacing of the floating point numbers between 1 and B (usually B = 2). The current definition only relies on how the numbers are represented.
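A small sketch in Matlab's double precision arithmetic (B = 2) illustrating both definitions:

eps               % 2^(-52), the spacing of the machine numbers between 1 and 2
(1 + eps) - 1     % gives eps: 1 + eps is the next machine number after 1
(1 + eps/2) - 1   % gives 0: adding eps/2 to 1 is rounded back to 1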

Simple calculators often use the familiar decimal system (B = 10). Typically there are p = 10 digits for the mantissa and 2 for the exponent (l = −99 and u = 99). In this finite precision arithmetic, we have

eps = 0.000000001 = 1.000000000 × 10^{−9},

the largest machine number

ã_max = 9.999999999 × 10^{+99},


the smallest machine number

ã_min = −9.999999999 × 10^{+99},

the smallest (normalized) positive machine number

ã_+ = 1.000000000 × 10^{−99}.

Early computers, for example the MARK 1 designed by Howard Aiken and Grace Hopper at Harvard and built in 1944, or the ERMETH (Elektronische Rechenmaschine der ETH) constructed by Heinz Rutishauser, Ambros Speiser and Eduard Stiefel, were also decimal machines. The ERMETH, built in 1956, was operational at ETH Zurich from 1956–1963. The representation of a real number used 16 decimal digits: the first digit, the q-digit, stored the sum of the digits modulo 3. This was used as a check to see if the machine word had been transmitted correctly from memory to the registers. The next three digits contained the exponent. Then the next 11 digits represented the mantissa, and finally, the last digit held the sign. The range of positive machine numbers was 1.0000000000 × 10^{−200} ≤ ã ≤ 9.9999999999 × 10^{199}. The possibly larger exponent range in this setting from −999 to 999 was not fully used.

In contrast, the very first programmable computer, the Z3, which was built by the German civil engineer Konrad Zuse and presented in 1941 to a group of experts only, was already using the binary system. The Z3 worked with an exponent of 7 bits and a mantissa of 14 bits (actually 15, since the numbers were normalized). The range of positive machine numbers was the interval

[2^{−63}, 1.11111111111111 × 2^{62}] ≈ [1.08 × 10^{−19}, 9.22 × 10^{18}].

In Maple (a computer algebra system), numerical computations are performed in base 10. The number of digits of the mantissa is defined by the variable Digits, which can be freely chosen. The number of digits of the exponent is given by the word length of the computer — for 32-bit machines, we have a huge maximal exponent of u = 2^{31} = 2147483648.

2.3 The IEEE Standard

Since 1985 we have for computer hardware the ANSI/IEEE Standard 754 for Floating Point Numbers. It has been adopted by almost all computer manufacturers. The base is B = 2.

2.3.1 Single Precision

The IEEE single precision floating point standard representation uses a 32-bit word with bits numbered from 0 to 31 from left to right. The first bit S is


the sign bit, the next eight bits E are the exponent bits, e = EEEEEEEE, and the final 23 bits are the bits F of the mantissa m:

  S | e = EEEEEEEE | m = FFFFFFFFFFFFFFFFFFFFFFF
  0 | 1          8 | 9                        31

The value ã represented by the 32-bit word is defined as follows:

normal numbers: If 0 < e < 255, then ã = (−1)^S × 2^{e−127} × 1.m, where 1.m is the binary number created by prefixing m with an implicit leading 1 and a binary point.

subnormal numbers: If e = 0 and m ≠ 0, then ã = (−1)^S × 2^{−126} × 0.m. These are known as denormalized (or subnormal) numbers.
If e = 0 and m = 0 and S = 1, then ã = −0.
If e = 0 and m = 0 and S = 0, then ã = 0.

exceptions: If e = 255 and m ≠ 0, then ã = NaN (Not a Number).
If e = 255 and m = 0 and S = 1, then ã = −Inf.
If e = 255 and m = 0 and S = 0, then ã = Inf.

Some examples:

0 10000000 00000000000000000000000 = +1 x 2^(128-127) x 1.0   = 2
0 10000001 10100000000000000000000 = +1 x 2^(129-127) x 1.101 = 6.5
1 10000001 10100000000000000000000 = -1 x 2^(129-127) x 1.101 = -6.5
0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0
0 11111111 00000000000000000000000 = Inf
1 11111111 00000000000000000000000 = -Inf
0 11111111 00000100000000000000000 = NaN
1 11111111 00100010001001010101010 = NaN
0 00000001 00000000000000000000000 = +1 x 2^(1-127) x 1.0 = 2^(-126)
0 00000000 10000000000000000000000 = +1 x 2^(-126) x 0.1 = 2^(-127)
0 00000000 00000000000000000000001
  = +1 x 2^(-126) x 0.00000000000000000000001 = 2^(-149)
  = smallest positive denormalized machine number

In Matlab, real numbers are usually represented in double precision. The function single can however be used to convert numbers to single precision. Matlab can also print real numbers using the hexadecimal format, which is convenient for examining their internal representations:

>> format hex
>> x=single(2)
x =
   40000000
>> 2
ans =
   4000000000000000
>> s=realmin('single')*eps('single')
s =
   00000001
>> format long
>> s
s =
   1.4012985e-45
>> s/2
ans =
     0
% Exceptions
>> z=sin(0)/sqrt(0)
Warning: Divide by zero.
z =
   NaN
>> y=log(0)
Warning: Log of zero.
y =
  -Inf
>> t=cot(0)
Warning: Divide by zero.
> In cot at 13
t =
   Inf

We can see that x represents the number 2 in single precision. The functions realmin and eps with parameter 'single' compute the machine constants for single precision. This means that s is the smallest denormalized number in single precision. Dividing s by 2 gives zero because of underflow. The computation of z yields an undefined expression which results in NaN even though the limit is defined. The final two computations for y and t show the exceptions Inf and -Inf.
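As a complementary sketch (not from the book), the raw bit pattern of a single precision number can also be inspected with typecast and dec2bin; it matches the S, e, m layout described above:

x = single(-6.5);
bits = dec2bin(typecast(x, 'uint32'), 32)
% bits = '11000000110100000000000000000000'
%         S=1, e=10000001, m=101000...  ->  -1 x 2^(129-127) x 1.101 = -6.5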

2.3.2 Double Precision

The IEEE double precision floating point standard representation uses a 64-bit word with bits numbered from 0 to 63 from left to right. The first bit S is the sign bit, the next eleven bits E are the exponent bits for e, and the final 52 bits F represent the mantissa m:

  S | e = EEEEEEEEEEE | m = FFFFF···FFFFF
  0 | 1            11 | 12            63
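As a small sketch, the double precision machine constants that follow from this layout can be queried directly in Matlab:

realmax    % largest double, about 1.7977e+308
realmin    % smallest normalized positive double, 2^(-1022), about 2.2251e-308
eps        % 2^(-52), the relative spacing of the doubles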
