

8 Implications of the Finiteness of the Representation of Numbers

8.4 Convergence

Consider summing up a sequence of numbers. One would ordinarily assume that it makes no difference in what order one sums. One would be correct if this were done in an algorithm (that is, mathematically), but one would be very wrong if one held the same expectation for a program. The poster child of this phenomenon is the sequence 1, 1/2, 1/3, 1/4, 1/5,..., whose sum is well known (at least to mathematicians) to diverge. In other words, for any constant M there exists an integer n such that the sum of the first n terms of this sequence exceeds M. Yet if we program this summation, for example, as

sum := 0.0; for i := 1 to N do sum := sum + 1.0/i,

we will discover that there exists a value of N such that the sum S_N of the first N terms is equal to the sum S_{N+k} of the first N + k terms for any k > 0.
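The effect sets in surprisingly early because the exact partial sums grow only logarithmically (S_N ≈ ln N + 0.5772). In IEEE single precision, for example, once the sum has reached about 15, terms below half a unit in the last place (a few times 10^-7) are simply absorbed, so the computed sum typically stops changing after a couple of million terms, even though the true sum diverges; it does not even reach 20 until about 2.7 × 10^8 terms.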

This type of convergence is a special instance of the failure of associativity to hold. The explanation is quite simple: Adding the term 1.0/(N + 1) to the sum S_N of the first N terms presents us with a situation very much like 1.0 + ε_min = 1.0, except that both S_N and the term 1.0/(N + 1) must be normalized by S_N to yield 1.0 and ε_min. In other words, we have to solve

ε_min · S_N > 1.0/(N + 1)

for N. This yields a value s for N such that S_s = S_{s+1}. This value s can easily be obtained as follows; it depends on the word length used for the representation of sum:

sum := 0.0; i := 1.0;
while sum < sum + 1.0/i do
    { sum := sum + 1.0/i; i := i + 1.0 };
s := i
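For readers who want to reproduce this, here is a minimal C sketch of the same loop. It assumes IEEE single precision (C's float) so that the loop terminates within seconds, and it stores the trial sum in a float variable before comparing, so that the test is made at single precision even on platforms with wider intermediate arithmetic. The variable names are chosen to mirror the pseudocode.

#include <stdio.h>

int main(void) {
    /* Find the smallest s with S_s = S_{s+1}: keep adding 1/i until the
       addition no longer changes the single-precision sum. */
    float sum = 0.0f;
    long i = 1;
    for (;;) {
        float next = sum + 1.0f / (float)i;  /* assignment forces rounding to float */
        if (next == sum)                     /* 1/i was absorbed: the sum has stalled */
            break;
        sum = next;
        i++;
    }
    printf("s = %ld, S_s = %.7f\n", i, sum);
    return 0;
}

With double precision the same loop would run roughly 10^14 or more iterations before stalling, which is why the experiment is usually done in single precision.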

Let us now sum up from the back; here we assume we know the value of s from before:

sum := 0.0; for i := s to 1 by -1 do sum := sum + 1.0/i

Let us denote the result of this summation (as a function of s) by T_s. The first surprise is that T_s ≠ S_s; in other words, summing up backwards yields a different result than summing up from the beginning. The second surprise is that T_s < T_{s+1}. When summing up from the back, we can add many more terms that affect the total than when summing up from the front!

The reason for this surprising result is that first summing up the smaller terms allows them to amount to a quantity that is not as easily overwhelmed by the larger values toward the front. Starting at the front first adds up all the larger terms, so by the time we get to the smaller terms, they are wiped out à la 1.0 + ε_min = 1.0.
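A small C sketch of the comparison, again assuming single precision. The cut-off s used here is an arbitrary illustrative value of a few million, not the exact s determined above, but it is enough to make the forward and backward sums visibly disagree on most IEEE platforms.

#include <stdio.h>

int main(void) {
    const long s = 3000000;         /* illustrative cut-off; any value of this
                                       order shows the effect in single precision */
    float forward = 0.0f, backward = 0.0f;

    for (long i = 1; i <= s; i++)   /* S_s: largest terms first */
        forward += 1.0f / (float)i;
    for (long i = s; i >= 1; i--)   /* T_s: smallest terms first */
        backward += 1.0f / (float)i;

    printf("S_s = %.7f   T_s = %.7f   T_s - S_s = %.7g\n",
           forward, backward, (double)backward - (double)forward);
    return 0;
}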


One might get the impression that the difference between summing up sequences of simple numbers either forward or backward is just a parlor game that has no significance to general programming. This is by no means valid, even though this type of sum is usually not encountered when designing software. However, it does serve as a rather drastic warning about the assumptions that programmers make. These assumptions are generally rooted in their understanding of mathematics, and more specifically, arithmetic. Unfortunately, what holds unconditionally in mathematics, with its disregard of representational issues, does not necessarily hold for software, where the finiteness of the number representation can play surprisingly nasty tricks on the unsuspecting programmer.

8.5 What to Do about It

Programmers tend to dismiss issues related to the representation of numbers as relevant only to number crunching. This is not entirely incorrect. For example, bit complexity tends to be irrelevant for almost all practical programs; programmers will use whatever words are provided in a particular language. While heavy-duty numerical error analysis is clearly not needed for software outside of numerical analysis, every programmer should be aware of some of the differences between what mathematics teaches about numbers and what happens to numbers in programs. The fact that certain values cannot be exactly represented can have some impact on software that is far removed from number crunching. The same goes for mathematical identities that mysteriously cease to be valid in software. Finally, testing for equality is often not well understood when it involves floating point numbers. All of these problems can render software unreliable, if not outright wrong. While there are limits to a programmer's ability to control the errors caused by the finiteness of the number representation, the awareness of these problems may be very helpful in avoiding them. Certainly, using a test for equality as a termination condition, for iteration or for recursion, is generally not a good idea when floating point numbers are used; these tests should be replaced by safer ones. If testing for equality is directly required, it should be carried out using a margin of error that depends on the context of the test (magnitude of the numbers involved).
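One common way to carry out such a test, sketched here in C, is to compare against a relative tolerance scaled by the magnitudes involved. The function name nearly_equal and the particular tolerance are illustrative choices, not something prescribed by the text, and the right tolerance always depends on the context of the computation.

#include <math.h>
#include <float.h>

/* Approximate equality for doubles: the difference must be small relative to
   the magnitudes of the operands.  rel_eps is context-dependent, e.g. a small
   multiple of DBL_EPSILON for freshly computed values, or much looser for
   values that already carry accumulated rounding error. */
static int nearly_equal(double a, double b, double rel_eps) {
    double diff  = fabs(a - b);
    double scale = fmax(fabs(a), fabs(b));
    return diff <= rel_eps * scale      /* relative test for ordinary magnitudes */
        || diff <= DBL_MIN;             /* guards the case where both are (almost) zero */
}

A termination test such as while (x != y) would then become while (!nearly_equal(x, y, tol)) for a tolerance tol chosen from the problem at hand.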

Bibliographical Notes

Error analysis and error propagation for floating point numbers are comprehensively covered in standard numerical methods textbooks. Wilkinson's books dating back to the early 1960s are classics: Rounding Errors in Algebraic Processes and The Algebraic Eigenvalue Problem. Other texts are Golub and Van Loan: Matrix Computations and Higham: Accuracy and Stability of Numerical Algorithms. Focused on specific computer platforms are Startz: 8087/80287/80387 for the IBM PC and Compatibles and Asserrhine, Chesneaux, and Lamotte: Estimation of Round-Off Errors on Several Computers Architectures. Papers discussing the importance of error analysis for general computing are Moler: "Double-Rounding and Implications for Numeric Computations"; Goldberg: "Computer Arithmetic"; and, in particular, Goldberg: "What Every Computer Scientist Should Know about Floating-Point Arithmetic". The internal representation and its consequences for programs are also treated in textbooks on programming language concepts and compilers, for example, Aho, Sethi, and Ullman: Compilers: Principles, Techniques and Tools.

Exercises

Exercise 1

Using different word lengths (2 bytes, 4 bytes, and 8 bytes), determine experimentally the value of ε_min for your computing platform.

Exercise 2

Using different word lengths (2 bytes, 4 bytes, and 8 bytes), determine experimentally the smallest value of s such that S_s = S_{s+1}, where S_j = 1.0 + 1.0/2.0 + 1.0/3.0 + … + 1.0/j. Also determine experimentally the smallest value of t such that T_t = T_{t+1}, where T_j = 1.0/j + 1.0/(j − 1.0) + … + 1.0/2.0 + 1.0.

Exercise 3

a. Using different word lengths (2 bytes, 4 bytes, and 8 bytes), find fixed point numbers a, b, and c such that their associativity (distributivity) does not hold.

b. Using different word lengths (2 bytes, 4 bytes, and 8 bytes), find floating point numbers a, b, and c such that their associativity (distributivity) does not hold.

Exercise 4

In view of the problems with representing rational numbers a/b, where b is not a power of 2, one may consider representing an arbitrary rational number a/b as the pair [a,b] consisting of the two integers a and b, with b assumed to be positive. In order to obtain uniqueness of this representation, one assumes that a and b are relatively prime; if they are not, for example a = k·a' and b = k·b' with k ≥ 2 an integer, then a/b = a'/b', with a' and b' requiring fewer bits for their representation. This is a relatively simple approach that can easily be implemented. Note that the rational numbers are closed under the four basic arithmetic operations.

a. Formulate and implement methods for the addition, subtraction, multiplication, and division of the pairs [a,b] and [c,d].

There is, however, a problem with this approach, which otherwise affords absolute precision; in order to understand this problem, we must go to bit complexity.

b. Assume that the length of the integers a, b, c, and d is l. Show that the length of the integers f and g, where [f,g] represents the sum, difference, product, or quotient of a/b and c/d, may be 2l. Generalizing this, show that a sequence of m basic arithmetic operations may require integers of length m·l to represent the final rational number, for any m > 1. Hint: Assume that a, b, c, and d are all prime numbers.
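To make the pair representation of Exercise 4 concrete, here is a minimal C sketch showing the reduction to lowest terms and one of the four operations (addition); the type and function names are illustrative, and the remaining operations of part (a) follow the same pattern. The growth of the intermediate products is exactly the bit-length issue addressed in part (b).

#include <stdio.h>

/* A rational number a/b stored as the pair [a, b], with b > 0 and
   gcd(|a|, b) = 1 after reduction.  long may overflow for long chains of
   operations; this is the bit-length growth discussed in part (b). */
typedef struct { long a, b; } Rational;

static long gcd(long x, long y) {          /* Euclid's algorithm */
    while (y != 0) { long t = x % y; x = y; y = t; }
    return x < 0 ? -x : x;
}

static Rational reduce(Rational r) {
    long g = gcd(r.a, r.b);
    if (g != 0) { r.a /= g; r.b /= g; }
    if (r.b < 0) { r.a = -r.a; r.b = -r.b; }   /* keep the denominator positive */
    return r;
}

/* Sample operation: [a,b] + [c,d] = [a*d + c*b, b*d], reduced to lowest terms. */
static Rational rat_add(Rational x, Rational y) {
    Rational r = { x.a * y.b + y.a * x.b, x.b * y.b };
    return reduce(r);
}

int main(void) {
    Rational r = rat_add((Rational){1, 6}, (Rational){1, 10});
    printf("%ld/%ld\n", r.a, r.b);             /* 1/6 + 1/10 = 4/15 */
    return 0;
}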



9 Asymptotic Complexities and the Selection of Algorithms

About This Chapter

Asymptotics are the heart of the complexity analysis of algorithms, but their usefulness for software development is limited since by their very nature they ignore constant factors. When constant factors are taken into consideration, some overall bad algorithms may be competitive over a certain range of input size. If this range happens to include all practical cases, the bad algorithm may turn out to be superior to an asymptotically much better one.

How to determine this and how to apply this to practical situations is the goal of this chapter.

9.1 Introduction

The complexity analysis of an algorithm aims at categorizing algorithms into a few clearly defined complexity classes. For most practical applications, these classes are polynomial, meaning that the complexity (time or space) is bounded by a polynomial p(n) of the measure of the input employed (usually size of input):

p(n) = a_s·n^s + a_{s-1}·n^{s-1} + ... + a_2·n^2 + a_1·n + a_0, for s a positive integer and a_s ≠ 0.

This polynomial is then equivalent to n^s. Thus, p(n) = O(n^s) and p(n) ≠ O(n^{s-1}).
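For instance, p(n) = 3n^3 + 50n^2 + 1000 satisfies p(n) ≤ 1053·n^3 for all n ≥ 1, so p(n) = O(n^3); on the other hand, p(n)/n^2 ≥ 3n grows without bound, so p(n) ≠ O(n^2).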

Complexities are thus compared using the highest-order term only, ignoring the constant attached to that term as well as all lower-order terms. As we discussed in Chapter 1, we are primarily interested in eight complexity classes, defined by their highest-order terms ϕ_i, i = 1, ..., 8, where ϕ_1(n) = 1, ϕ_2(n) = log_2(n), ϕ_3(n) = √n, ϕ_4(n) = n, ϕ_5(n) = n·log_2(n), ϕ_6(n) = n^2, ϕ_7(n) = n^3, and ϕ_8(n) = 2^n.¹ Considering only the highest-order term is justified if one is primarily interested in what happens for very large values of n, since for those the lower-order terms become increasingly irrelevant; their contribution is completely obliterated by that of the highest-order term. The dismissal of the constant factor of the highest-order term reflects the desire to keep things simple; if two polynomials have the same degree, it is convenient for comparison purposes to consider them equivalent. This allows us to construct a nice, orderly hierarchy of complexities. However, for software, things are not quite that nice and orderly.

To obtain a methodology that is practically useful for measuring the performance of a program, obtaining the algorithm's complexity functions is only the first step. We must also pay considerable attention to the constants that were hidden during the complexity analysis. Additionally, programs have certain time and space requirements that are usually ignored when algorithms are being analyzed. Furthermore, when deciding which of two algorithms or programs is better (in some sense), we must look at crossover points in the complexity functions.

9.2 The Importance of Hidden Constants

We have already argued that for software, it is highly unusual to have bit complexity as a valid concept. Virtually everything of practical importance is based on word complexity. This holds for both time and space complexity.

Assume now that we have obtained f(n) as the complexity of an algorithm, for n some measure of the input. If this is the space complexity of the algorithm, then the memory requirements of a program implementing that algorithm are essentially f(n) + C_sp, where the constant C_sp accounts for the space required for the program, for the symbol tables, and for other information associated with the program. This constant C_sp is independent of the measure n of the input to the program. Thus, the space complexity of the program is closely related to that of the underlying algorithm, provided space is measured in words.² We reiterate that space requirements should always be based on worst-case analyses (see Section 6.2); average space complexity has a limited usefulness for software.

1 As pointed out, there are infinitely many complexity classes between (and beyond) these eight. For example, √n·n is strictly between ϕ_5(n) and ϕ_6(n), that is, √n·n = O(ϕ_6(n)), ϕ_6(n) ≠ O(√n·n), ϕ_5(n) = O(√n·n), and √n·n ≠ O(ϕ_5(n)). However, for most practical purposes, these eight are generally considered sufficient to categorize complexity functions.

2 If bit complexity is used for the algorithm, the actual space requirement of the program depends on the way these bits are represented. There is a good deal of variability, from using an entire word for each bit to using packed structures.



The relationship between the complexity of the algorithm and that of the corresponding program is not quite as clean when it comes to time. Recall that the time complexity of an algorithm is the statement count for the algorithm; in essence, each statement accounts for one unit of time. A program's time requirements are not quite that easily captured. By and large, we end up with c_1·f(n) + c_2, where the constant c_1 measures the duration³ of an average statement and the constant c_2 reflects the amount of time required to load the program and initialize the processes associated with it. Each of the two constants hides a good deal of work.

The difficulty with the constant c_1 is the assumption that we know what an average statement is. We can make some educated guesses or we can determine a range for this constant. The most systematic approach is to base the value of c_1 on some limited test runs of the program at hand. In practice, c_1 will also depend on the target platform (thus, it is related not just to the architecture, but also to the instruction set and the ability of the compiler to exploit the instruction set efficiently). Generally, a reasonably acceptable value for c_1 is acquired experimentally. Nevertheless, the precise value of this constant depends on the program to be executed. Realistically we can only hope for a reasonably small range.⁴
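As an illustration of such a test run, here is a hedged C sketch: it assumes the program's known complexity function is f(n) = n^2, uses a simple stand-in workload, and solves the two equations t_i = c_1·f(n_i) + c_2 obtained from two timed runs. In practice one would use several input sizes, repeat each measurement, and be wary of compiler optimizations and cache effects; the names and sizes here are purely illustrative.

#include <stdio.h>
#include <time.h>

static volatile double sink;                 /* keeps the workload from being optimized away */

static void run_program(long n) {            /* stand-in program with roughly n^2 "statements" */
    double s = 0.0;
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n; j++)
            s += 1.0;
    sink = s;
}

static double f(long n) { return (double)n * (double)n; }   /* assumed complexity f(n) = n^2 */

static double seconds(long n) {
    clock_t start = clock();
    run_program(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int main(void) {
    long n1 = 2000, n2 = 6000;
    double t1 = seconds(n1), t2 = seconds(n2);

    /* Two runs give two equations t_i = c1*f(n_i) + c2 in the unknowns c1 and c2. */
    double c1 = (t2 - t1) / (f(n2) - f(n1));
    double c2 = t1 - c1 * f(n1);
    printf("estimated c1 = %.3g seconds per unit, c2 = %.3g seconds fixed cost\n", c1, c2);
    return 0;
}

Note that a measurement taken inside the running process cannot see the load-time part of c_2; capturing that part would require timing the whole program invocation from outside.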

The constant c_2 is a measure of the fixed cost of program execution. In other words, even if virtually no statements are executed, the amount of time c_2 must always be expended. A typical situation where this might occur is a wrong user input that causes the program to abort. It is important to understand that c_2 is definitely not 0. In fact, its value can be quite substantial.

However, it is a time penalty that will always be incurred, so it may appear to be insignificant. While this is certainly not true for most programs, it does have the advantage that there are never any surprises; we always must take at least c_2 time, even if nothing happens.

In the next section we will discuss crossover points. These are particularly important when comparing two algorithms, and then the corresponding programs, according to some complexity measure. Here we explore a slightly different issue. Assume we have two programs with the same asymptotic (time) complexity ϕ_i(n). The decision of which program to use will first hinge on the constant factors for each program. However, let us assume that both have comparable factors. We may encounter the following situation. One algorithm assumes that n is a power of 2; the complexity analysis is based on that assumption and if n happens not to be a power of 2, the algorithm simply assumes that we pad the input so that the assumption is again satisfied. The other algorithm works for any value of n. In this case, it may

3 We are deliberately vague about the unit. One approach might be to use actual time, for example in nanoseconds. Another approach would be to assume that the unit involved is a synthetic one that allows us to maintain the simple idea of a unit statement. In this case, we would still be looking at some type of statement count, except that we now take into consideration the actual duration of this average statement.

4 In practice, one hopes that the execution time of a program is within a range of one half to double the predicted quantity.
