
Walter Gander · Martin J. Gander · Felix Kwok

Scientific Computing

An Introduction

using Maple and MATLAB

Editorial Board: T. J. Barth, M. Griebel, D. E. Keyes, R. M. Nieminen, D. Roose, T. Schlick



Texts in Computational

Science and Engineering 11

Editors

Timothy J. Barth, Michael Griebel, David E. Keyes, Risto M. Nieminen, Dirk Roose, Tamar Schlick

For further volumes:

http://www.springer.com/series/5151


Walter Gander

Martin J. Gander

Felix Kwok

Scientific Computing

An Introduction using Maple and MATLAB



ETH Zürich, Zürich, Switzerland

Section de Mathématiques, Université de Genève, Genève, Switzerland

ISSN 1611-0994

ISBN 978-3-319-04324-1 ISBN 978-3-319-04325-8 (eBook) DOI 10.1007/978-3-319-04325-8

Springer Cham Heidelberg New York Dordrecht London

Library of Congress Control Number: 2014937000

Mathematics Subject Classification (2010): 65-00, 65-01

© Springer International Publishing Switzerland 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


This book is dedicated to Professor Gene H. Golub

1932–2007

(picture by Jill Knuth)

The three authors represent three generations of mathematicians who have been enormously influenced by Gene Golub.

He shaped our lives and our academic careers through his advice, his leadership, his friendship and his care for younger scientists.

We are indebted and will always honor his memory.


Preface

We are conducting ever more complex computations built upon the assumption that the underlying numerical methods are mature and reliable.

When we bundle existing algorithms into libraries and wrap them into packages to facilitate easy use, we create de facto standards that make it easy to ignore numerical analysis.

John Guckenheimer, President of SIAM, in SIAM News, June 1998: Numerical Computation in the Information Age

When redrafting the book I was tempted to present the algorithms in ALGOL, but decided that the difficulties of providing procedures which were correct in every detail were prohibitive at this stage.

James Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, 1988.

This book is an introduction to scientific computing, the mathematical modeling in science and engineering and the study of how to exploit computers in the solution of technical and scientific problems. It is based on mathematics, numerical and symbolic/algebraic computations, parallel/distributed processing and visualization. It is also a popular and growing area — many new curricula in computational science and engineering have been, and continue to be, developed, leading to new academic degrees and even entire new disciplines.

A prerequisite for this development is the ubiquitous presence of computers, which are being used by virtually every student and scientist. While traditional scientific work is based on developing theories and performing experiments, the possibility to use computers at any time has created a third way of increasing our knowledge, which is through modeling and simulation.

The use of simulation is further facilitated by the availability of sophisticated, robust and easy-to-use software libraries. This has the obvious advantage of shielding the user from the underlying numerics; however, this also has the danger of leaving the user unaware of the limitations of the algorithms, which can lead to incorrect results when used improperly. Moreover, some algorithms can be fast for certain types of problems but highly inefficient for others. Thus, it is important for the user to be able to make an informed decision on which algorithms to use, based on the properties of the problem to be solved. The goal of this book is to familiarize the reader with the basic


concepts of scientific computing and algorithms that form the workhorses of many numerical libraries. In fact, we will also emphasize the effective implementation of the algorithms discussed.

Numerical scientific computing has a long history; in fact, computers were first built for this purpose. Konrad Zuse [154] built his first (mechanical) computer in 1938 because he wanted to have a machine that would solve systems of linear equations that arise, e.g., when a civil engineer designs a bridge. At about the same time (and independently), Howard H. Aiken wanted to build a machine that would solve systems of ordinary differential equations [17].

The first high quality software libraries contained indeed numerical algorithms. They were produced in an international effort in the programming language ALGOL60 [111], and are described in the handbook "Numerical Algebra" [148]. These fundamental procedures for solving linear equations and eigenvalue problems were developed further, rewritten in FORTRAN, and became the LINPACK [26] and EISPACK [47] libraries. They are still in use and available from Netlib at www.netlib.org. In order to help students to use this software, Cleve Moler created around 1980 a friendly interface to those subroutines, which he called Matlab (Matrix Laboratory).

Matlab was so successful that a company was founded: MathWorks. Today, Matlab is "the language of technical computing", a very powerful tool in scientific computing.

Parallel to the development of numerical libraries, there were also efforts to do exact and algebraic computations. The first computer algebra systems were created some 50 years ago: at ETH, Max Engeli created Symbal, and at MIT, Joel Moses created Macsyma. Macsyma is the oldest system that is still available. However, computer algebra computations require much more computer resources than numerical calculations. Therefore, only when computers became more powerful did these systems flourish. Today the market leaders are Mathematica and Maple.

Often, a problem may be solved analytically (“exactly”) by a computer algebra system. In general, however, analytical solutions do not exist, and numerical approximations or other special techniques must be used instead.

Moreover, computer algebra is a very powerful tool for deriving numerical algorithms; we use Maple for this purpose in several chapters of this book.

Thus, computer algebra systems and numerical libraries are complementary tools: working with both is essential in scientific computing. We have chosen Matlab and Maple as basic tools for this book. Nonetheless, we are aware that the difference between pure computer algebra systems and numerical Matlab-like systems is disappearing, and the two may merge and become indistinguishable by the user in the near future.


How to use this book

Prerequisites for understanding this book are courses in calculus and linear algebra. The content of this book is too much for a typical one semester course in scientific computing. However, the instructor can choose those sections that he wishes to teach and that fit his schedule. For example, for an introductory course in scientific computing, one can very well use the least squares chapter and teach only one of the methods for computing the QR decomposition. However, for an advanced course focused solely on least squares methods, one may also wish to consider the singular value decomposition (SVD) as a computational tool for solving least squares problems. In this case, the book also provides a detailed description on how to compute the SVD in the chapter on eigenvalues. The material is presented in such a way that a student can also learn directly from the book. To help the reader navigate the volume, we provide in Section 1.2 some sample courses that have been taught by the authors at various institutions.

The focus of the book is algorithms: we would like to explain to the students how some fundamental functions in mathematical software are designed. Many exercises require programming in Matlab or Maple, since we feel it is important for students to gain experience in using such powerful software systems. They should also know about their limitations and be aware of the issue addressed by John Guckenheimer. We tried to include meaningful examples and problems, not just academic exercises.

Acknowledgments

The authors would like to thank Oscar Chinellato, Ellis Whitehead, Oliver Ernst and Laurence Halpern for their careful proofreading and helpful suggestions.

Walter Gander is indebted to Hong Kong Baptist University (HKBU) and especially to its Vice President Academic, Franklin Luk, for giving him the opportunity to continue to teach students after his retirement at ETH.

Several chapters of this book have been presented and improved successfully in courses at HKBU. We are also thankful to the University of Geneva, where we met many times to finalize the manuscript.

Geneva and Zürich, August 2013

Walter Gander, Martin J. Gander, Felix Kwok


Contents

Chapter 1. Why Study Scientific Computing? . . . 1

1.1 Example: Designing a Suspension Bridge . . . 1

1.1.1 Constructing a Model . . . 1

1.1.2 Simulating the Bridge . . . 3

1.1.3 Calculating Resonance Frequencies . . . 4

1.1.4 Matching Simulations with Experiments . . . 5

1.2 Navigating this Book: Sample Courses . . . 6

1.2.1 A First Course in Numerical Analysis . . . 7

1.2.2 Advanced Courses . . . 8

1.2.3 Dependencies Between Chapters . . . 8

Chapter 2. Finite Precision Arithmetic . . . 9

2.1 Introductory Example . . . 10

2.2 Real Numbers and Machine Numbers. . . 11

2.3 The IEEE Standard . . . 14

2.3.1 Single Precision. . . 14

2.3.2 Double Precision . . . 16

2.4 Rounding Errors . . . 19

2.4.1 Standard Model of Arithmetic . . . 19

2.4.2 Cancellation . . . 20

2.5 Condition of a Problem . . . 24

2.5.1 Norms . . . 24

2.5.2 Big- and Little-O Notation . . . 27

2.5.3 Condition Number . . . 29

2.6 Stable and Unstable Algorithms. . . 33

2.6.1 Forward Stability. . . 33

2.6.2 Backward Stability . . . 36

2.7 Calculating with Machine Numbers: Tips and Tricks . . . 38

2.7.1 Associative Law . . . 38

2.7.2 Summation Algorithm by W. Kahan . . . 39

2.7.3 Small Numbers . . . 40

2.7.4 Monotonicity . . . 40

2.7.5 Avoiding Overflow . . . 41


2.7.6 Testing for Overflow . . . 42

2.7.7 Avoiding Cancellation . . . 43

2.7.8 Computation of Mean and Standard Deviation . . . . 45

2.8 Stopping Criteria . . . 48

2.8.1 Machine-independent Algorithms . . . 48

2.8.2 Test Successive Approximations. . . 51

2.8.3 Check the Residual. . . 51

2.9 Problems . . . 52

Chapter 3. Linear Systems of Equations . . . 61

3.1 Introductory Example . . . 62

3.2 Gaussian Elimination. . . 66

3.2.1 LU Factorization . . . 73

3.2.2 Backward Stability . . . 77

3.2.3 Pivoting and Scaling . . . 79

3.2.4 Sum of Rank-One Matrices . . . 82

3.3 Condition of a System of Linear Equations . . . 84

3.4 Cholesky Decomposition . . . 88

3.4.1 Symmetric Positive Definite Matrices. . . 88

3.4.2 Stability and Pivoting . . . 92

3.5 Elimination with Givens Rotations . . . 95

3.6 Banded matrices . . . 97

3.6.1 Storing Banded Matrices . . . 97

3.6.2 Tridiagonal Systems . . . 99

3.6.3 Solving Banded Systems with Pivoting. . . 100

3.6.4 Using Givens Rotations . . . 103

3.7 Problems . . . 105

Chapter 4. Interpolation . . . 113

4.1 Introductory Examples. . . 114

4.2 Polynomial Interpolation. . . 116

4.2.1 Lagrange Polynomials . . . 117

4.2.2 Interpolation Error . . . 119

4.2.3 Barycentric Formula . . . 121

4.2.4 Newton’s Interpolation Formula. . . 123

4.2.5 Interpolation Using Orthogonal Polynomials. . . 127

4.2.6 Change of Basis, Relation with LU and QR . . . 132

4.2.7 Aitken-Neville Interpolation . . . 139

4.2.8 Extrapolation . . . 142

4.3 Piecewise Interpolation with Polynomials . . . 144

4.3.1 Classical Cubic Splines. . . 145

4.3.2 Derivatives for the Spline Function . . . 147

4.3.3 Sherman–Morrison–Woodbury Formula . . . 155

4.3.4 Spline Curves . . . 157


4.4 Trigonometric Interpolation . . . 158

4.4.1 Trigonometric Polynomials . . . 160

4.4.2 Fast Fourier Transform (FFT) . . . 162

4.4.3 Trigonometric Interpolation Error . . . 164

4.4.4 Convolutions Using FFT. . . 168

4.5 Problems . . . 171

Chapter 5. Nonlinear Equations. . . 181

5.1 Introductory Example . . . 182

5.2 Scalar Nonlinear Equations . . . 184

5.2.1 Bisection . . . 185

5.2.2 Fixed Point Iteration. . . 187

5.2.3 Convergence Rates . . . 190

5.2.4 Aitken Acceleration and the ε-Algorithm . . . 193

5.2.5 Construction of One Step Iteration Methods . . . 199

5.2.6 Multiple Zeros . . . 205

5.2.7 Multi-Step Iteration Methods . . . 207

5.2.8 A New Iteration Formula . . . 210

5.2.9 Dynamical Systems. . . 212

5.3 Zeros of Polynomials . . . 215

5.3.1 Condition of the Zeros . . . 217

5.3.2 Companion Matrix . . . 220

5.3.3 Horner’s Scheme . . . 222

5.3.4 Number Conversions . . . 227

5.3.5 Newton’s Method: Classical Version . . . 230

5.3.6 Newton Method Using Taylor Expansions . . . 231

5.3.7 Newton Method for Real Simple Zeros . . . 232

5.3.8 Nickel’s Method . . . 237

5.3.9 Laguerre’s Method . . . 239

5.4 Nonlinear Systems of Equations . . . 240

5.4.1 Fixed Point Iteration. . . 242

5.4.2 Theorem of Banach . . . 242

5.4.3 Newton’s Method. . . 245

5.4.4 Continuation Methods . . . 251

5.5 Problems . . . 252

Chapter 6. Least Squares Problems . . . 261

6.1 Introductory Examples. . . 262

6.2 Linear Least Squares Problem and the Normal Equations . . . 266

6.3 Singular Value Decomposition (SVD). . . 269

6.3.1 Pseudoinverse . . . 274

6.3.2 Fundamental Subspaces . . . 275

6.3.3 Solution of the Linear Least Squares Problem . . . 277

6.3.4 SVD and Rank . . . 279


6.4 Condition of the Linear Least Squares Problem . . . 280

6.4.1 Differentiation of Pseudoinverses . . . 282

6.4.2 Sensitivity of the Linear Least Squares Problem. . . . 285

6.4.3 Normal Equations and Condition . . . 286

6.5 Algorithms Using Orthogonal Matrices . . . 287

6.5.1 QR Decomposition . . . 287

6.5.2 Method of Householder . . . 289

6.5.3 Method of Givens . . . 292

6.5.4 Fast Givens . . . 298

6.5.5 Gram-Schmidt Orthogonalization. . . 301

6.5.6 Gram-Schmidt with Reorthogonalization. . . 306

6.5.7 Partial Reorthogonalization . . . 308

6.5.8 Updating and Downdating the QR Decomposition . . . 311

6.5.9 Covariance Matrix Computations Using QR. . . 320

6.6 Linear Least Squares Problems with Linear Constraints . . . 323

6.6.1 Solution with SVD . . . 325

6.6.2 Classical Solution Using Lagrange Multipliers . . . 328

6.6.3 Direct Elimination of the Constraints . . . 330

6.6.4 Null Space Method. . . 333

6.7 Special Linear Least Squares Problems with Quadratic Constraint. . . 334

6.7.1 Fitting Lines . . . 335

6.7.2 Fitting Ellipses . . . 337

6.7.3 Fitting Hyperplanes, Collinearity Test . . . 340

6.7.4 Procrustes or Registration Problem. . . 344

6.7.5 Total Least Squares . . . 349

6.8 Nonlinear Least Squares Problems . . . 354

6.8.1 Notations and Definitions . . . 354

6.8.2 Newton’s Method. . . 356

6.8.3 Gauss-Newton Method. . . 360

6.8.4 Levenberg-Marquardt Algorithm . . . 361

6.9 Least Squares Fit with Piecewise Functions . . . 364

6.9.1 Structure of the Linearized Problem . . . 367

6.9.2 Piecewise Polynomials . . . 368

6.9.3 Examples . . . 372

6.10 Problems . . . 374

Chapter 7. Eigenvalue Problems . . . 387

7.1 Introductory Example . . . 388

7.2 A Brief Review of the Theory . . . 392

7.2.1 Eigen-Decomposition of a Matrix . . . 392

7.2.2 Characteristic Polynomial . . . 396

7.2.3 Similarity Transformations . . . 396


7.2.4 Diagonalizable Matrices . . . 397

7.2.5 Exponential of a Matrix . . . 397

7.2.6 Condition of Eigenvalues. . . 398

7.3 Method of Jacobi . . . 405

7.3.1 Reducing Cost by Using Symmetry . . . 414

7.3.2 Stopping Criterion . . . 417

7.3.3 Algorithm of Rutishauser . . . 417

7.3.4 Remarks and Comments on Jacobi . . . 420

7.4 Power Methods . . . 422

7.4.1 Power Method . . . 423

7.4.2 Inverse Power Method (Shift-and-Invert). . . 424

7.4.3 Orthogonal Iteration . . . 425

7.5 Reduction to Simpler Form . . . 429

7.5.1 Computing Givens Rotations . . . 429

7.5.2 Reduction to Hessenberg Form . . . 430

7.5.3 Reduction to Tridiagonal Form . . . 434

7.6 QR Algorithm . . . 436

7.6.1 Some History . . . 437

7.6.2 QR Iteration . . . 437

7.6.3 Basic Facts . . . 437

7.6.4 Preservation of Form. . . 438

7.6.5 Symmetric Tridiagonal Matrices . . . 439

7.6.6 Implicit QR Algorithm. . . 443

7.6.7 Convergence of the QR Algorithm . . . 445

7.6.8 Wilkinson’s Shift . . . 447

7.6.9 Test for Convergence and Deflation. . . 448

7.6.10 Unreduced Matrices have Simple Eigenvalues . . . 449

7.6.11 Specific Numerical Examples . . . 451

7.6.12 Computing the Eigenvectors. . . 453

7.7 Computing the Singular Value Decomposition (SVD) . . . 453

7.7.1 Transformations . . . 454

7.7.2 Householder-Rutishauser Bidiagonalization . . . 454

7.7.3 Golub-Kahan-Lanczos Bidiagonalization . . . 457

7.7.4 Eigenvalues and Singular Values . . . 457

7.7.5 Algorithm of Golub-Reinsch. . . 458

7.8 QD Algorithm . . . 464

7.8.1 Progressive QD Algorithm. . . 464

7.8.2 Orthogonal LR-Cholesky Algorithm . . . 468

7.8.3 Differential QD Algorithm. . . 472

7.8.4 Improving Convergence Using Shifts . . . 474

7.8.5 Connection to Orthogonal Decompositions. . . 478

7.9 Problems . . . 482


Chapter 8. Differentiation. . . 487

8.1 Introductory Example . . . 488

8.2 Finite Differences . . . 491

8.2.1 Generating Finite Difference Approximations . . . 494

8.2.2 Discrete Operators for Partial Derivatives . . . 496

8.3 Algorithmic Differentiation . . . 499

8.3.1 Idea Behind Algorithmic Differentiation . . . 499

8.3.2 Rules for Algorithmic Differentiation . . . 504

8.3.3 Example: Circular Billiard . . . 505

8.3.4 Example: Nonlinear Eigenvalue Problems . . . 509

8.4 Problems . . . 514

Chapter 9. Quadrature . . . 517

9.1 Computer Algebra and Numerical Approximations . . . 518

9.2 Newton–Cotes Rules . . . 521

9.2.1 Error of Newton–Cotes Rules . . . 525

9.2.2 Composite Rules . . . 527

9.2.3 Euler–Maclaurin Summation Formula . . . 531

9.2.4 Romberg Integration . . . 537

9.3 Gauss Quadrature . . . 541

9.3.1 Characterization of Nodes and Weights . . . 545

9.3.2 Orthogonal Polynomials . . . 547

9.3.3 Computing the Weights . . . 552

9.3.4 Golub–Welsch Algorithm . . . 555

9.4 Adaptive Quadrature. . . 561

9.4.1 Stopping Criterion . . . 563

9.4.2 Adaptive Simpson quadrature. . . 565

9.4.3 Adaptive Lobatto quadrature . . . 569

9.5 Problems . . . 577

Chapter 10. Numerical Ordinary Differential Equations . . . 583

10.1 Introductory Examples. . . 584

10.2 Basic Notation and Solution Techniques . . . 587

10.2.1 Notation, Existence of Solutions . . . 587

10.2.2 Analytical and Numerical Solutions . . . 589

10.2.3 Solution by Taylor Expansions . . . 591

10.2.4 Computing with Power Series . . . 593

10.2.5 Euler’s Method . . . 597

10.2.6 Autonomous ODE, Reduction to First Order System . . . 603

10.3 Runge-Kutta Methods . . . 604

10.3.1 Explicit Runge-Kutta Methods . . . 604

10.3.2 Local Truncation Error . . . 606

10.3.3 Order Conditions . . . 608

10.3.4 Convergence . . . 615


10.3.5 Adaptive Integration . . . 617

10.3.6 Implicit Runge-Kutta Methods . . . 625

10.4 Linear Multistep Methods . . . 631

10.4.1 Local Truncation Error . . . 635

10.4.2 Order Conditions . . . 636

10.4.3 Zero Stability . . . 638

10.4.4 Convergence . . . 643

10.5 Stiff Problems. . . 646

10.5.1 A-Stability . . . 650

10.5.2 A Nonlinear Example . . . 653

10.5.3 Differential Algebraic Equations . . . 655

10.6 Geometric Integration . . . 656

10.6.1 Symplectic Methods . . . 658

10.6.2 Energy Preserving Methods . . . 661

10.7 Delay Differential Equations. . . 664

10.8 Problems . . . 666

Chapter 11. Iterative Methods for Linear Systems . . . 673

11.1 Introductory Example . . . 675

11.2 Solution by Iteration . . . 677

11.2.1 Matrix Splittings . . . 677

11.2.2 Residual, Error and the Difference of Iterates . . . 678

11.2.3 Convergence Criteria. . . 680

11.2.4 Singular Systems . . . 683

11.2.5 Convergence Factor and Convergence Rate . . . 684

11.3 Classical Stationary Iterative Methods . . . 687

11.3.1 Regular Splittings and M-Matrices . . . 687

11.3.2 Jacobi . . . 691

11.3.3 Gauss-Seidel . . . 694

11.3.4 Successive Over-relaxation (SOR) . . . 695

11.3.5 Richardson . . . 702

11.4 Local Minimization by Nonstationary Iterative Methods . . . 704

11.4.1 Conjugate Residuals . . . 705

11.4.2 Steepest Descent . . . 705

11.5 Global Minimization with Chebyshev Polynomials . . . 708

11.5.1 Chebyshev Semi-Iterative Method . . . 719

11.5.2 Acceleration of SSOR . . . 724

11.6 Global Minimization by Extrapolation . . . 726

11.6.1 Minimal Polynomial Extrapolation (MPE) . . . 729

11.6.2 Reduced Rank Extrapolation (RRE) . . . 733

11.6.3 Modified Minimal Polynomial Extrapolation (MMPE) . . . 734

11.6.4 Topological ε-Algorithm (TEA) . . . 735

11.6.5 Recursive Topological ε-Algorithm . . . 737


11.7 Krylov Subspace Methods . . . 739

11.7.1 The Conjugate Gradient Method . . . 740

11.7.2 Arnoldi Process. . . 758

11.7.3 The Symmetric Lanczos Algorithm . . . 761

11.7.4 Solving Linear Equations with Arnoldi . . . 766

11.7.5 Solving Linear Equations with Lanczos . . . 769

11.7.6 Generalized Minimum Residual: GMRES . . . 773

11.7.7 Classical Lanczos for Non-Symmetric Matrices . . . . 780

11.7.8 Biconjugate Gradient Method (BiCG) . . . 793

11.7.9 Further Krylov Methods . . . 800

11.8 Preconditioning . . . 801

11.9 Problems . . . 804

Chapter 12. Optimization . . . 817

12.1 Introductory Examples. . . 818

12.1.1 How much daily exercise is optimal? . . . 818

12.1.2 Mobile Phone Networks . . . 821

12.1.3 A Problem from Operations Research . . . 828

12.1.4 Classification of Optimization Problems . . . 831

12.2 Mathematical Optimization . . . 832

12.2.1 Local Minima . . . 832

12.2.2 Constrained minima and Lagrange multipliers. . . 835

12.2.3 Equality and Inequality Constraints . . . 838

12.3 Unconstrained Optimization. . . 842

12.3.1 Line Search Methods. . . 842

12.3.2 Trust Region Methods . . . 856

12.3.3 Direct Methods . . . 859

12.4 Constrained Optimization . . . 862

12.4.1 Linear Programming . . . 862

12.4.2 Penalty and Barrier Functions . . . 872

12.4.3 Interior Point Methods. . . 873

12.4.4 Sequential Quadratic Programming. . . 877

12.5 Problems . . . 880

Bibliography . . . 887

Index . . . 895


Chapter 1. Why Study Scientific Computing?

Computational Science and Engineering (CS&E) is now widely accepted, along with theory and experiment, as a crucial third mode of scientific investigation and engineering design. Aerospace, automotive, biological, chemical, semiconductor, and other industrial sectors now rely on simulation for technical decision support.

Introduction to the First SIAM Conference on Computational Science and Engineering, September 21–24, 2000, Washington DC.

The emergence of scientific computing as a vital part of science and engineering coincides with the explosion in computing power in the past 50 years. Many physical phenomena have been well understood and have accurate models describing them since the late 1800s, but before the widespread use of computers, scientists and engineers were forced to make many simplifying assumptions in the models in order to make them solvable by pencil-and-paper methods, such as series expansion. With the increase of computing power, however, one can afford to use numerical methods that are computationally intensive but that can tackle the full models without the need to simplify them. Nonetheless, every method has its limitations, and one must understand how they work in order to use them correctly.

1.1 Example: Designing a Suspension Bridge

To get an idea of the kinds of numerical methods that are used in engineering problems, let us consider the design of a simple suspension bridge. The bridge consists of a pair of ropes fastened on both sides of the gorge, see Figure 1.1. Wooden supports going across the bridge are attached to the ropes at regularly spaced intervals. Wooden boards are then fastened between the supports to form the deck. We would like to calculate the shape of the bridge as well as the tension in the rope supporting it.

1.1.1 Constructing a Model

Let us construct a simple one-dimensional model of the bridge structure by assuming that the bridge does not rock side to side. To calculate the shape of the bridge, we need to know the forces that are exerted on the ropes by the supports. Let L be the length of the bridge and x be the distance from one


Figure 1.1. A simple suspension bridge (ropes, supports, and deck boards).

Figure 1.2. Force diagram for the bridge example.

end of the bridge. Assume that the supports are located at x_i, i = 1, ..., n, with h being the spacing between supports. Let w(x) be the force per unit distance exerted on the deck at x by gravity, due to the weight of the deck and of the people on it. If we assume that any weight on the segment [x_{i-1}, x_i] is exerted entirely on the supports at x_{i-1} and x_i, then the force f_i exerted on the rope by the support at x_i can be written as

f_i = \frac{1}{h}\left( \int_{x_{i-1}}^{x_i} w(x)(x - x_{i-1})\,dx + \int_{x_i}^{x_{i+1}} w(x)(x_{i+1} - x)\,dx \right).   (1.1)

We now consider the rope as an elastic string, which is stretched by the force exerted by the wooden supports. Let u_i be the height of the bridge at x_i, T_{i-1/2} be the tension of the segment of the rope between x_{i-1} and x_i, and θ_{i-1/2} be the angle it makes with the horizontal. Figure 1.2 shows the force diagram on the rope at x_i.

Since there is no horizontal displacement in the bridge, the horizontal forces must balance out, meaning

T_{i-1/2}\cos(\theta_{i-1/2}) = T_{i+1/2}\cos(\theta_{i+1/2}) = K,


where K is a constant. Vertical force balance then gives

T_{i+1/2}\sin(\theta_{i+1/2}) - T_{i-1/2}\sin(\theta_{i-1/2}) = f_i,

or

K\tan(\theta_{i+1/2}) - K\tan(\theta_{i-1/2}) = f_i.

But

\tan(\theta_{i+1/2}) = \frac{u_{i+1} - u_i}{h},

so we in fact have

\frac{K(u_{i+1} - 2u_i + u_{i-1})}{h} = f_i, \qquad i = 1, \ldots, n,   (1.2)

where u_0 and u_{n+1} are the known heights of the bridge at its ends and u_1, ..., u_n are the unknown heights.

1.1.2 Simulating the Bridge

Now, if we want to compute the shape of the bridge and the tensions T_{i-1/2}, we must first calculate the forces f_i from (1.1), and then solve the system of linear equations (1.2). To calculate the f_i, one must evaluate integrals, which may not be analytically feasible for certain weight distributions w(x).

Instead, one can approximate the integral numerically using a Riemann sum, for instance:

\int_{x_{i-1}}^{x_i} w(x)(x - x_{i-1})\,dx \;\approx\; \frac{h}{N}\sum_{j=1}^{N} w\!\left(x_{i-1} + \frac{jh}{N}\right)\frac{jh}{N}.

For large N, this converges to the exact value of the integral, but the error behaves like 1/N; this means that if we want to have three decimal digits of accuracy in the answer, we would need approximately 10^3 points. There are other formulas that give more accurate values with a smaller number of points; this is discussed in more detail in Chapter 9.
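For illustration, here is a rough Matlab sketch (not taken from the book) of this approximation. The weight distribution w, the bridge length L, the number of supports n and the number of sample points N are hypothetical choices, and the loads f_i are assembled as in (1.1).

w = @(x) 1 + 0*x;              % hypothetical weight per unit length
L = 10; n = 9; h = L/(n+1);    % n interior supports with spacing h (example values)
x = (0:n+1)'*h;                % support positions x_0, ..., x_{n+1}
N = 1000;                      % Riemann sum points per segment
t = (1:N)'/N*h;                % right-endpoint offsets within one segment
f = zeros(n,1);
for i = 1:n
  I1 = h/N*sum(w(x(i)+t).*t);        % right Riemann sum for the first integral in (1.1)
  I2 = h/N*sum(w(x(i+1)+t).*(h-t));  % right Riemann sum for the second integral in (1.1)
  f(i) = (I1 + I2)/h;                % load carried by support i
end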

The next step is to solve (1.2) for the u_i. This can be rewritten as Au = f, where A ∈ R^{n×n} is a matrix, u ∈ R^n is the vector of unknowns, and f is the vector of forces we just calculated. This system can be solved by Gaussian elimination, i.e., by row reducing the matrix, as taught in a basic linear algebra course. So for n = 4, a uniform distribution w(x) = 1, and


u_0 = u_{n+1} = 0, we can calculate

\left(\begin{array}{rrrr|r} -2 & 1 & 0 & 0 & 1 \\ 1 & -2 & 1 & 0 & 1 \\ 0 & 1 & -2 & 1 & 1 \\ 0 & 0 & 1 & -2 & 1 \end{array}\right)
\longrightarrow
\left(\begin{array}{rrrr|r} -2 & 1 & 0 & 0 & 1 \\ 0 & -\frac{3}{2} & 1 & 0 & \frac{3}{2} \\ 0 & 1 & -2 & 1 & 1 \\ 0 & 0 & 1 & -2 & 1 \end{array}\right)
\longrightarrow
\left(\begin{array}{rrrr|r} -2 & 1 & 0 & 0 & 1 \\ 0 & -\frac{3}{2} & 1 & 0 & \frac{3}{2} \\ 0 & 0 & -\frac{4}{3} & 1 & 2 \\ 0 & 0 & 1 & -2 & 1 \end{array}\right)
\longrightarrow
\left(\begin{array}{rrrr|r} -2 & 1 & 0 & 0 & 1 \\ 0 & -\frac{3}{2} & 1 & 0 & \frac{3}{2} \\ 0 & 0 & -\frac{4}{3} & 1 & 2 \\ 0 & 0 & 0 & -\frac{5}{4} & \frac{5}{2} \end{array}\right).

Back substitution gives u = -(h^2/K)(2, 3, 3, 2)^T. However, one often wishes to calculate the shape of the bridge under different weight distributions w(x), e.g., when people are standing on different parts of the bridge. So the matrix A stays the same, but the right-hand side f changes to reflect the different weight distributions. It would be a waste to have to redo the row reductions every time, when only f has changed! A much better way is to use the LU decomposition, which writes the matrix A in factored form and reuses the factors to solve equations with different right-hand sides. This is shown in Chapter 3.
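As a minimal Matlab sketch of this idea (not the book's code, and assuming the n = 4 model matrix above with the scaling absorbed into the right-hand sides), the matrix is factored once and the triangular factors are reused for each load:

n = 4;
A = diag(-2*ones(n,1)) + diag(ones(n-1,1),1) + diag(ones(n-1,1),-1);
[Lf, Uf, P] = lu(A);          % factor once: P*A = Lf*Uf
f1 = ones(n,1);               % uniform load
f2 = [1; 2; 2; 1];            % a hypothetical second load distribution
u1 = Uf\(Lf\(P*f1));          % only two triangular solves per right-hand side
u2 = Uf\(Lf\(P*f2));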

In the above row reduction, we can see easily that there are many zero entries that need not be calculated, but the computer has no way of knowing that in advance. In fact, the number of additions and multiplications required for solving the generic (i.e., full) linear system is proportional to n^3, whereas in our case, we only need about n additions and multiplications because of the many zero entries. To take advantage of the sparse nature of the matrix, one needs to store it differently and use different algorithms on it. One possibility is to use the banded matrix format; this is shown in Section 3.6.
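A small sketch of the banded idea in Matlab: if the tridiagonal matrix is stored in sparse form, the backslash solver automatically uses a banded factorization with O(n) work. The size n and the load are arbitrary example values.

n = 1000; e = ones(n,1);
A = spdiags([e -2*e e], -1:1, n, n);   % sparse tridiagonal model matrix
f = e;                                  % uniform load, scaling factored out
u = A\f;                                % fast solve exploiting the band structure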

Suppose now that the people on the bridge have moved, but only by a few meters. The shape of the bridge would have only changed slightly, since the weight distribution is not very different. Thus, instead of solving a new linear system from scratch, one could imagine using the previous shape as a first guess and making small corrections to the solution until it matches the new distribution. This is the basis of iterative methods, which are discussed in Chapter 11.
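The following sketch (a hypothetical illustration, not the book's code) shows the warm-start idea with Matlab's conjugate gradient solver pcg, one of the Krylov methods of Chapter 11. Since the model matrix here is negative definite, the negated system is passed to pcg; the load change is made up for the example.

n = 100; e = ones(n,1);
A = spdiags([e -2*e e], -1:1, n, n);
f_old = e;                 u_old = A\f_old;      % shape for the old load
f_new = f_old;             f_new(40:60) = 1.2;   % hypothetical small load change
u_new = pcg(-A, -f_new, 1e-10, 500, [], [], u_old);  % start the iteration from u_old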

1.1.3 Calculating Resonance Frequencies

A well-designed bridge should never collapse, but there have been spectacular bridge failures in history. One particularly memorable one was the collapse of the Tacoma Narrows bridge on November 7, 1940. On that day, powerful wind gusts excited a natural resonance mode of the bridge, setting it into a twisting motion that it was not designed to withstand. As the winds continued, the amplitude of the twisting motion grew, until the bridge


eventually collapsed1.

It turns out that one can study the resonance modes of the bridge by considering the eigenvalue problem

Au = \lambda u,

cf. [37]. Clearly, a two-dimensional model is needed to study the twisting motion mentioned above, but let us illustrate the ideas by considering the eigenvalues of the 1D model for n = 4. For this simple problem, one can guess the eigenvectors and verify that

u^{(k)} = (\sin(k\pi/5), \sin(2k\pi/5), \sin(3k\pi/5), \sin(4k\pi/5))^T, \qquad k = 1, 2, 3, 4,

are in fact eigenvectors with associated eigenvalues \lambda^{(k)} = -2 + 2\cos(k\pi/5).

However, for more complicated problems, such as one with varying mass along the bridge or for 2D problems, it is no longer possible to guess the eigenvectors. Moreover, the characteristic polynomial

P(\lambda) = \det(\lambda I - A)

is a polynomial of degree n, and it is well known that no general formula exists for finding the roots of such polynomials for n ≥ 5. In Chapter 7, we will present numerical algorithms for finding the eigenvalues of A. In fact, the problem of finding eigenvalues numerically also requires approximately n^3 operations, just like Gaussian elimination. This is in stark contrast with the theoretical point of view that linear systems are "easy" and polynomial root-finding is "impossible". To quote the eminent numerical analyst Nick Trefethen [139],

Abel and Galois notwithstanding, large-scale matrix eigenvalue problems are about as easy to solve in practice as linear systems of equations.
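As a quick numerical check of the formula above (a sketch, not part of the book's text), Matlab's general-purpose eig function reproduces the four eigenvalues of the n = 4 model matrix to rounding error:

n = 4;
A = diag(-2*ones(n,1)) + diag(ones(n-1,1),1) + diag(ones(n-1,1),-1);
k = (1:n)';
lambda_formula = sort(-2 + 2*cos(k*pi/(n+1)));  % -2 + 2*cos(k*pi/5)
lambda_numeric = sort(eig(A));                  % computed eigenvalues
disp([lambda_formula lambda_numeric])           % the two columns agree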

1.1.4 Matching Simulations with Experiments

When modeling the bridge in the design process, we must use many parameters, such as the weight of the deck (expressed in terms of the mass density ρ per unit length) and the elasticity constant K of the supporting rope. In reality, these quantities depend on the actual material used during construction, and may deviate from the nominal values assumed during the design process. To get an accurate model of the bridge for later simulation, one needs to estimate these parameters from measurements taken during experiments.

For example, we can measure the vertical displacements y_i of the constructed bridge at points x_i, and compare them with the displacements u_i predicted by the model, i.e., the displacements satisfying Au = f. Since both A and f

1http://www.youtube.com/watch?v=3mclp9QmCGs


depend on the model parameters ρ and K, the u_i also depend on these parameters; thus, the mismatch between the model and the experimental data can be expressed as a function of ρ and K:

F(\rho, K) = \sum_{i=1}^{n} |y_i - u_i(\rho, K)|^2.   (1.3)

Thus, we can estimate the parameters by finding the optimal parameters ρ and K that minimize F. There are several ways of calculating the minimum:

1. Using multivariate calculus, we know that

   \frac{\partial F}{\partial \rho}(\rho, K) = 0, \qquad \frac{\partial F}{\partial K}(\rho, K) = 0.   (1.4)

   Thus, we have a system of two nonlinear equations in two unknowns, which must then be solved to obtain ρ and K. This can be solved by many methods, the best known of which is Newton's method. Such methods are discussed in Chapter 5.

2. The above approach has the disadvantage that (1.4) is satisfied by all stationary points of F(ρ, K), i.e., both the maxima and the minima of F(ρ, K). Since we are only interested in the minima of the function, a more direct approach would be to start with an initial guess (ρ_0, K_0) (e.g., the nominal design values) and then find successively better approximations (ρ_k, K_k), k = 1, 2, 3, ..., that reduce the mismatch, i.e.,

   F(\rho_{k+1}, K_{k+1}) \le F(\rho_k, K_k).

   This is the basis of optimization algorithms, which can be applied to other minimization problems. Such methods are discussed in detail in Chapter 12.

3. The function F(ρ, K) in (1.3) has a very special structure in that it is a sum of squares of the differences. As a result, the minimization problem is known as a least-squares problem. Least-squares problems, in particular linear ones, often arise because they yield the best unbiased estimator in the statistical sense for linear models. Because of the prevalence and special structure of least-squares problems, it is possible to design specialized methods that are more efficient and/or robust for these problems than general optimization algorithms. One example is the Gauss–Newton method, which resembles a Newton method, except that second-order derivative terms are dropped to save on computation. This and other methods are presented in Chapter 6; a small illustrative sketch of the fitting idea follows this list.
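As a rough Matlab sketch of the fitting idea (none of this is the book's code): a purely hypothetical toy model model_u stands in for the displacements u_i(ρ, K) obtained by solving Au = f, synthetic measurements y are generated, and the mismatch (1.3) is minimized with the derivative-free routine fminsearch.

xi = linspace(0, 1, 20)';
model_u = @(rho, K) rho*sin(pi*xi) + K*sin(2*pi*xi);  % hypothetical toy model
y = model_u(2, 0.5) + 1e-3*randn(size(xi));           % synthetic measurements
F = @(p) sum((y - model_u(p(1), p(2))).^2);           % mismatch as in (1.3)
p = fminsearch(F, [1; 1])                             % start from nominal values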

1.2 Navigating this Book: Sample Courses

This book intentionally contains too many topics to be done from cover to cover, even for an intensive full-year course. In fact, many chapters contain


enough material for stand-alone semester courses on their respective topics.

To help instructors and students navigate through the volume, we provide some sample courses that can be built from its sections.

1.2.1 A First Course in Numerical Analysis

The following sections have been used to build the first year numerical analysis course at the University of Geneva in 2011–12 (54 hours of lectures).

1. Finite precision arithmetic (2.1–2.6)
2. Linear systems (3.2–3.4)
3. Interpolation and FFT (4.2.1–4.2.4, 4.3.1, 4.4)
4. Nonlinear equations (5.2.1–5.2.3, 5.4)
5. Linear and nonlinear least squares (6.1–6.8, 6.8.2, 6.8.3, 6.5.1, 6.5.2)
6. Iterative methods (11.1–11.2.5, 11.3.2–11.3.4, 11.7.1)
7. Eigenvalue problems (7.1, 7.2, 7.4, 7.5.2, 7.6)
8. Singular value decomposition (6.3)
9. Numerical integration (9.1, 9.2, 9.3, 9.4.1–9.4.2)
10. Ordinary differential equations (10.1, 10.3)

A first term course at Stanford for computer science students in 1996 and 1997 ('Introduction to Scientific Computing using Maple and Matlab', 40 hours of lectures) was built using

1. Finite precision arithmetic (2.2)

2. Nonlinear equations (5.2.1–5.2.3,5.2.5,5.2.7,5.4)

3. Linear systems (3.2.1, 3.2.2, 3.2.3, 11.2–11.2.3, 11.3.2, 11.3.3, 11.4, 11.7.1)

4. Interpolation (4.2.1–4.2.4, 4.3.1)
5. Least Squares (6.2, 6.5.1, 6.8.2)
6. Differentiation (8.2, 8.2.1)
7. Quadrature (9.2, 9.2.4, 9.3.1, 9.3.2, 9.4.1–9.4.2)
8. Eigenvalue problems (7.3, 7.4, 7.6)

9. Ordinary differential equations (10.1,10.3,10.4)


1.2.2 Advanced Courses

The following advanced undergraduate/graduate courses (38 hours of lectures each) have been taught at Baptist University in Hong Kong between 2010 and 2013. We include a list of chapters from which these courses were built.

1. Eigenvalues and Iterative Methods for Linear Equations (Chapters 7, 11)

2. Least Squares (Chapter 6)

3. Quadrature and Ordinary Differential Equations (Chapters 9 and 10)

At the University of Geneva, the following graduate courses (28 hours of lectures, and 14 hours of exercises) have been taught between 2004 and 2011:

1. Iterative Methods for Linear Equations (Chapter 11)
2. Optimization (Chapter 12)

1.2.3 Dependencies Between Chapters

Chapter 2 on finite precision arithmetic and Chapter 3 on linear equations are required for most, if not all, of the subsequent chapters. At the beginning of each chapter, we give a list of sections that are prerequisites to understanding the material. Readers who are not familiar with these sections should refer to them first before proceeding.


Chapter 2. Finite Precision Arithmetic

In the past 15 years many numerical analysts have progressed from being queer people in mathematics departments to being queer people in computer science departments!

George Forsythe, What to do till the computer scientist comes. Amer. Math. Monthly 75, 1968.

It is hardly surprising that numerical analysis is widely regarded as an unglamorous subject. In fact, mathematicians, physicists, and computer scientists have all tended to hold numerical analysis in low esteem for many years – a most unusual consensus.

Nick Trefethen, The definition of numerical analysis, SIAM news, November 1992.

The golden age of numerical analysis has not yet started.

Volker Mehrmann, round table discussion "Future Directions in Numerical Analysis," moderated by Gene Golub and Nick Trefethen at ICIAM 2007.

Finite precision arithmetic underlies all the computations performed numerically, e.g. in Matlab; only symbolic computations, e.g. in Maple, are largely independent of finite precision arithmetic. Historically, when the invention of computers allowed a large number of operations to be performed in very rapid succession, nobody knew what the influence of finite precision arithmetic would be on this many operations: would small rounding errors sum up rapidly and destroy results? Would they statistically cancel? The early days of numerical analysis were therefore dominated by the study of rounding errors, and made this rapidly developing field not very attractive (see the quote above). Fortunately, this view of numerical analysis has since changed, and nowadays the focus of numerical analysis is the study of algorithms for the problems of continuous mathematics1. There are nonetheless a few pitfalls every person involved in scientific computing should know, and this chapter is precisely here for this reason. After an introductory example in Section 2.1, we present the difference between real numbers and machine numbers in Section 2.2 on a generic, abstract level, and give for the more computer science oriented reader the concrete IEEE arithmetic standard in Section 2.3. We then discuss the influence of rounding errors on operations in

1 Nick Trefethen, The definition of numerical analysis, SIAM News, November 1992.


Section 2.4, and explain the predominant pitfall of catastrophic cancellation when computing differences. In Section 2.5, we explain in very general terms what the condition number of a problem is, and then show in Section 2.6 two properties of algorithms for a given problem, namely forward stability and backward stability. It is the understanding of condition numbers and stability that allowed numerical analysts to move away from the study of rounding errors, and to focus on algorithmic development. Sections 2.7 and 2.8 represent a treasure trove with advanced tips and tricks when computing in finite precision arithmetic.

2.1 Introductory Example

A very old problem already studied by ancient Greek mathematicians is the squaring of a circle. The problem consists of constructing a square that has the same area as the unit circle. Finding a method for transforming a circle into a square this way (quadrature of the circle) became a famous problem that remained unsolved until the 19th century, when it was proved, using the transcendence of π, that the problem cannot be solved with straightedge and compass.

We know today that the area of a circle is given by A = πr^2, where r denotes the radius of the circle. An approximation is obtained by drawing a regular polygon inside the circle, and by computing the surface of the polygon. The approximation is improved by increasing the number of sides. Archimedes managed to produce a 96-sided polygon, and was able to bracket π in the interval (3 10/71, 3 1/7). The enclosing interval has length 1/497 = 0.00201207243 — surely good enough for most practical applications in his time.

Figure 2.1. Squaring of a circle (isosceles triangle ABC with center angle α_n and r = 1).

To compute such a polygonal approximation of π, we consider Figure 2.1. Without loss of generality, we may assume that r = 1. Then the area F_n of the isosceles triangle ABC with center angle α_n := 2π/n is

F_n = \cos\frac{\alpha_n}{2}\,\sin\frac{\alpha_n}{2},


and the area of the associated n-sided polygon becomes

A_n = n F_n = \frac{n}{2}\left(2\cos\frac{\alpha_n}{2}\sin\frac{\alpha_n}{2}\right) = \frac{n}{2}\sin\alpha_n = \frac{n}{2}\sin\frac{2\pi}{n}.

Clearly, computing the approximation A_n using π would be rather contradictory. Fortunately, A_{2n} can be derived from A_n by simple algebraic transformations, i.e. by expressing sin(α_n/2) in terms of sin α_n. This can be achieved by using identities for trigonometric functions:

\sin\frac{\alpha_n}{2} = \sqrt{\frac{1 - \cos\alpha_n}{2}} = \sqrt{\frac{1 - \sqrt{1 - \sin^2\alpha_n}}{2}}.   (2.1)

Thus, we have obtained a recurrence for sin(α_n/2) from sin α_n. To start the recurrence, we compute the area A_6 of the regular hexagon. The length of each side of the six equilateral triangles is 1 and the angle is α_6 = 60°, so that sin α_6 = √3/2. Therefore, the area of the triangle is F_6 = √3/4 and A_6 = 3√3/2. We obtain the following program for computing the sequence of approximations A_n:

Algorithm 2.1. Computation of π, Naive Version

s=sqrt(3)/2; A=3*s; n=6;            % initialization
z=[A-pi n A s];                     % store the results
while s>1e-10                       % termination if s=sin(alpha) small
  s=sqrt((1-sqrt(1-s*s))/2);        % new sin(alpha/2) value
  n=2*n; A=n/2*s;                   % A=new polygon area
  z=[z; A-pi n A s];
end
m=length(z);
for i=1:m
  fprintf('%10d %20.15f %20.15f %20.15f\n',z(i,2),z(i,3),z(i,1),z(i,4))
end

The results, displayed in Table 2.1, are not what we would expect: initially, we observe convergence towards π, but for n > 49152, the error grows again and finally we obtain A_n = 0?! Although the theory and the program are both correct, we still obtain incorrect answers. We will explain in this chapter why this is the case.

2.2 Real Numbers and Machine Numbers

Every computer is a finite automaton. This implies that a computer can only store a finite set of numbers and perform only a finite number of operations. In mathematics, we are used to calculating with real numbers R covering the continuous interval (−∞, ∞), but on the computer, we must contend with a


         n                  A_n              A_n - π            sin(α_n)
         6   2.598076211353316  -0.543516442236477   0.866025403784439
        12   3.000000000000000  -0.141592653589794   0.500000000000000
        24   3.105828541230250  -0.035764112359543   0.258819045102521
        48   3.132628613281237  -0.008964040308556   0.130526192220052
        96   3.139350203046872  -0.002242450542921   0.065403129230143
       192   3.141031950890530  -0.000560702699263   0.032719082821776
       384   3.141452472285344  -0.000140181304449   0.016361731626486
       768   3.141557607911622  -0.000035045678171   0.008181139603937
      1536   3.141583892148936  -0.000008761440857   0.004090604026236
      3072   3.141590463236762  -0.000002190353031   0.002045306291170
      6144   3.141592106043048  -0.000000547546745   0.001022653680353
     12288   3.141592516588155  -0.000000137001638   0.000511326906997
     24576   3.141592618640789  -0.000000034949004   0.000255663461803
     49152   3.141592645321216  -0.000000008268577   0.000127831731987
     98304   3.141592645321216  -0.000000008268577   0.000063915865994
    196608   3.141592645321216  -0.000000008268577   0.000031957932997
    393216   3.141592645321216  -0.000000008268577   0.000015978966498
    786432   3.141592303811738  -0.000000349778055   0.000007989482381
   1572864   3.141592303811738  -0.000000349778055   0.000003994741190
   3145728   3.141586839655041  -0.000005813934752   0.000001997367121
   6291456   3.141586839655041  -0.000005813934752   0.000000998683561
  12582912   3.141674265021758   0.000081611431964   0.000000499355676
  25165824   3.141674265021758   0.000081611431964   0.000000249677838
  50331648   3.143072740170040   0.001480086580246   0.000000124894489
 100663296   3.137475099502783  -0.004117554087010   0.000000062336030
 201326592   3.181980515339464   0.040387861749671   0.000000031610136
 402653184   3.000000000000000  -0.141592653589793   0.000000014901161
 805306368   3.000000000000000  -0.141592653589793   0.000000007450581
1610612736   0.000000000000000  -3.141592653589793   0.000000000000000

Table 2.1. Unstable computation of π


discrete, finite set of machine numbers M = {ã_min, ..., ã_max}. Hence each real number a has to be mapped onto a machine number ã to be used on a computer. In fact a whole interval of real numbers is mapped onto one machine number, as shown in Figure 2.2.

Figure 2.2. Mapping of real numbers R onto machine numbers M

Nowadays, machine numbers are often represented in the binary system. In general, any base (or radix) B could be used to represent numbers. A real machine number or floating point number consists of two parts, a mantissa (or significand) m and an exponent e,

ã = ±m × B^e,   m = D.D···D (mantissa),   e = D···D (exponent),

where D ∈ {0, 1, ..., B−1} stands for one digit. To make the representation of machine numbers unique (note that e.g. 1.2345×10^3 = 0.0012345×10^6), we require for a machine number ã ≠ 0 that the first digit before the decimal point in the mantissa be nonzero; such numbers are called normalized. One defining characteristic for any finite precision arithmetic is the number of digits used for the mantissa and the exponent: the number of digits in the exponent defines the range of the machine numbers, whereas the number of digits in the mantissa defines the precision.

More specifically [100], a finite precision arithmetic is defined by four integer parameters: B, the base or radix, p, the number of digits in the mantissa, and l and u defining the exponent range: l ≤ e ≤ u.

The precision of the machine is described by the real machine number eps. Historically, eps is defined to be the smallest positive ã ∈ M such that ã + 1 > 1 when the addition is carried out on the computer. Because this definition involves details about the behavior of floating point addition, which are not easily accessible, a newer definition of eps is simply the spacing of the floating point numbers between 1 and B (usually B = 2). The current definition only relies on how the numbers are represented.
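A small sketch in Matlab's double precision arithmetic (B = 2) illustrating both definitions:

eps               % 2^(-52), the spacing of the machine numbers between 1 and 2
(1 + eps) - 1     % gives eps: 1 + eps is the next machine number after 1
(1 + eps/2) - 1   % gives 0: adding eps/2 to 1 is rounded back to 1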

Simple calculators often use the familiar decimal system (B = 10). Typically there are p = 10 digits for the mantissa and 2 for the exponent (l = −99 and u = 99). In this finite precision arithmetic, we have

eps = 0.000000001 = 1.000000000 × 10^{−9},

the largest machine number

ã_max = 9.999999999 × 10^{+99},


the smallest machine number

ã_min = −9.999999999 × 10^{+99},

the smallest (normalized) positive machine number

ã_+ = 1.000000000 × 10^{−99}.

Early computers, for example the MARK 1 designed by Howard Aiken and Grace Hopper at Harvard and built in 1944, or the ERMETH (Elektronische Rechenmaschine der ETH) constructed by Heinz Rutishauser, Ambros Speiser and Eduard Stiefel, were also decimal machines. The ERMETH, built in 1956, was operational at ETH Zurich from 1956–1963. The representation of a real number used 16 decimal digits: the first digit, the q-digit, stored the sum of the digits modulo 3. This was used as a check to see if the machine word had been transmitted correctly from memory to the registers. The next three digits contained the exponent. Then the next 11 digits represented the mantissa, and finally, the last digit held the sign. The range of positive machine numbers was 1.0000000000 × 10^{−200} ≤ ã ≤ 9.9999999999 × 10^{199}. The possibly larger exponent range in this setting from −999 to 999 was not fully used.

In contrast, the very first programmable computer, the Z3, which was built by the German civil engineer Konrad Zuse and presented in 1941 to a group of experts only, was already using the binary system. The Z3 worked with an exponent of 7 bits and a mantissa of 14 bits (actually 15, since the numbers were normalized). The range of positive machine numbers was the interval

[2^{−63}, 1.11111111111111 × 2^{62}] ≈ [1.08 × 10^{−19}, 9.22 × 10^{18}].

In Maple (a computer algebra system), numerical computations are performed in base 10. The number of digits of the mantissa is defined by the variable Digits, which can be freely chosen. The number of digits of the exponent is given by the word length of the computer — for 32-bit machines, we have a huge maximal exponent of u = 2^{31} = 2147483648.

2.3 The IEEE Standard

Since 1985 we have for computer hardware the ANSI/IEEE Standard 754 for Floating Point Numbers. It has been adopted by almost all computer manufacturers. The base is B = 2.

2.3.1 Single Precision

The IEEE single precision floating point standard representation uses a 32-bit word with bits numbered from 0 to 31 from left to right. The first bit S is


the sign bit, the next eight bits E are the exponent bits, e = EEEEEEEE, and the final 23 bits are the bits F of the mantissa m:

  S | e = EEEEEEEE | m = FFFFFFFFFFFFFFFFFFFFFFF
  0 | 1          8 | 9                        31

The value ã represented by the 32-bit word is defined as follows:

normal numbers: If 0 < e < 255, then ã = (−1)^S × 2^{e−127} × 1.m, where 1.m is the binary number created by prefixing m with an implicit leading 1 and a binary point.

subnormal numbers: If e = 0 and m ≠ 0, then ã = (−1)^S × 2^{−126} × 0.m. These are known as denormalized (or subnormal) numbers.
If e = 0 and m = 0 and S = 1, then ã = −0.
If e = 0 and m = 0 and S = 0, then ã = 0.

exceptions: If e = 255 and m ≠ 0, then ã = NaN (Not a Number).
If e = 255 and m = 0 and S = 1, then ã = −Inf.
If e = 255 and m = 0 and S = 0, then ã = Inf.

Some examples:

0 10000000 00000000000000000000000 = +1 x 2^(128-127) x 1.0   = 2
0 10000001 10100000000000000000000 = +1 x 2^(129-127) x 1.101 = 6.5
1 10000001 10100000000000000000000 = -1 x 2^(129-127) x 1.101 = -6.5
0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0
0 11111111 00000000000000000000000 = Inf
1 11111111 00000000000000000000000 = -Inf
0 11111111 00000100000000000000000 = NaN
1 11111111 00100010001001010101010 = NaN
0 00000001 00000000000000000000000 = +1 x 2^(1-127) x 1.0 = 2^(-126)
0 00000000 10000000000000000000000 = +1 x 2^(-126) x 0.1 = 2^(-127)
0 00000000 00000000000000000000001
  = +1 x 2^(-126) x 0.00000000000000000000001 = 2^(-149)
  = smallest positive denormalized machine number

In Matlab, real numbers are usually represented in double precision. The function single can however be used to convert numbers to single precision. Matlab can also print real numbers using the hexadecimal format, which is convenient for examining their internal representations:

>> format hex
>> x=single(2)
x =
   40000000
>> 2
ans =
   4000000000000000
>> s=realmin('single')*eps('single')
s =
   00000001
>> format long
>> s
s =
   1.4012985e-45
>> s/2
ans =
     0
% Exceptions
>> z=sin(0)/sqrt(0)
Warning: Divide by zero.
z =
   NaN
>> y=log(0)
Warning: Log of zero.
y =
  -Inf
>> t=cot(0)
Warning: Divide by zero.
> In cot at 13
t =
   Inf

We can see that x represents the number 2 in single precision. The functions realmin and eps with parameter 'single' compute the machine constants for single precision. This means that s is the smallest denormalized number in single precision. Dividing s by 2 gives zero because of underflow. The computation of z yields an undefined expression which results in NaN even though the limit is defined. The final two computations for y and t show the exceptions Inf and -Inf.
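As a complementary sketch (not from the book), the raw bit pattern of a single precision number can also be inspected with typecast and dec2bin; it matches the S, e, m layout described above:

x = single(-6.5);
bits = dec2bin(typecast(x, 'uint32'), 32)
% bits = '11000000110100000000000000000000'
%         S=1, e=10000001, m=101000...  ->  -1 x 2^(129-127) x 1.101 = -6.5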

2.3.2 Double Precision

The IEEE double precision floating point standard representation uses a 64-bit word with bits numbered from 0 to 63 from left to right. The first bit S is the sign bit, the next eleven bits E are the exponent bits for e, and the final 52 bits F represent the mantissa m:

  S | e = EEEEEEEEEEE | m = FFFFF···FFFFF
  0 | 1            11 | 12            63
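As a small sketch, the double precision machine constants that follow from this layout can be queried directly in Matlab:

realmax    % largest double, about 1.7977e+308
realmin    % smallest normalized positive double, 2^(-1022), about 2.2251e-308
eps        % 2^(-52), the relative spacing of the doubles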
