Bit and Word Complexity Revisited - Implications of the Finiteness of the Representation

Part 2 The Software Side: Disappointments and

8 Implications of the Finiteness of the Representation of Numbers

About This Chapter

Algorithms assume that numbers are not afflicted with any limitation of representation. Programs, however, must operate within a very different universe. All numbers in computer systems are represented in a way that invariably implies limitations on what can be represented internally. These limitations have important implications for programming, from the question of which numbers actually exist inside a program to the problem of testing for equality, the issues related to mathematical properties, and curious behaviors related to convergence.

The finiteness of the number representation in programs has several important implications. They are mainly related to floating point numbers, but even fixed point numbers occasionally display quirky behavior. We first revisit the distinction between bit and word complexity in the context of software development and point out that many numbers that we expect to be representable are not. Then we examine the implications for several issues related to arithmetic, in particular, the question of how to test for equality.

Related to this problem is the validity (or lack thereof) of various mathematical properties, identities that we take for granted in algorithms but that provide numerous surprises in programs. A further consequence is that a sum of a sequence of numbers that mathematically diverges may nevertheless converge when it is computed in a program.
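The convergence effect can be made visible with a small experiment. The sketch below is only an illustration under stated assumptions: the chapter overview does not name a particular sequence, so the harmonic series 1 + 1/2 + 1/3 + ... (which diverges mathematically) is chosen here, and a 16-bit floating point format is simulated with Python's `struct` module so that the effect appears after a few hundred terms. The helper names `to_half` and `harmonic_until_stagnation` are hypothetical.

```python
import struct


def to_half(x):
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def harmonic_until_stagnation(max_terms=100_000):
    """Add 1/1 + 1/2 + ... in simulated half precision.

    Returns (n, total), where n is the index of the first term that left
    the running total unchanged, or (None, total) if max_terms is reached.
    """
    total = 0.0
    for n in range(1, max_terms + 1):
        updated = to_half(total + to_half(1.0 / n))
        if updated == total:
            # The term is too small to change the rounded sum,
            # and every later term is smaller still.
            return n, total
        total = updated
    return None, total


n, total = harmonic_until_stagnation()
print(f"sum stopped growing at term {n}, value {total}")
```

Once a term is smaller than half the spacing between neighboring representable values around the running total, adding it changes nothing, so the computed sum settles on a finite value. With wider words the same thing happens, only much later.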

8.1 Bit and Word Complexity Revisited

Every item in a program necessarily has a finite representation. In almost all instances, this representation is not just finite, but bounded. For the most part, bit complexity does not arise naturally in software; programs may specify how many bytes are to be used to represent an item, but once this decision is made, the number of bits used for this item remains fixed. An unbounded representation inherently requires dynamic memory allocation techniques and is used only in exceptional circumstances.¹ For most practical purposes, complexity in programs is word complexity. The only question is how many bits that word has.

C6730_C008.fm Page 177 Friday, August 11, 2006 9:27 AM

Once a fixed-length representation (i.e., a word) is chosen, several limitations follow immediately. For one, only a finite number of different values can be represented using a fixed number of bits, certainly no more than 2^m if we have m bits. This implies that we will have to contend with issues such as overflow, underflow, and rounding errors. Before we address these, we must point out another aspect of software that differentiates it from algorithms.
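The 2^m bound and the resulting overflow can be sketched as follows. Python integers are themselves unbounded, so this example simulates an m-bit word by hand; it assumes a two's-complement signed interpretation, which the text does not specify, and the helper `wrap_signed` is a hypothetical name.

```python
def wrap_signed(value, bits):
    """Interpret value modulo 2**bits as a two's-complement signed integer."""
    mask = (1 << bits) - 1
    low = value & mask              # keep only the lowest `bits` bits
    if low >= 1 << (bits - 1):      # top bit set: the pattern denotes a negative number
        low -= 1 << bits
    return low


BITS = 8
print(2 ** BITS)                    # number of distinct bit patterns: 256
print(wrap_signed(127, BITS))       # largest signed 8-bit value: 127
print(wrap_signed(127 + 1, BITS))   # one more does not fit and wraps to -128
```

In an algorithm, 127 + 1 is simply 128; in an 8-bit signed word the result falls outside the representable range, and what happens next depends on the language and the hardware.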

Everybody is aware that certain real values do not have a finite representation, for example the square root of 2, √2. Most people also know that certain rational numbers do not have a finite decimal representation, for example 1/3.² However, very few programmers are aware that numbers with a perfectly good finite decimal representation, for example 0.3 or 1.3, do not have a finite representation in ordinary programs. This is because the internal representation of numbers in ordinary programs is binary, and many decimal numbers with a finite representation do not have a finite binary representation. To state it more pointedly, when we write the constant 1.3 in a program, this is, strictly speaking, not a number; it is an identifier that refers to a value that is close to the (decimal) value 1.3, but is never exactly 1.3. Even more confusing is that the value of this identifier 1.3 can change. Its value depends on the number of bits of the word used to represent this identifier. Since program portability does not address the question of the word length of a processor, the same program when run on different platforms (with different processors) may yield different values for the identifier 1.3. In other words, what seems to be the constant 1.3 is not a constant and does not have the value 1.3.³
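A short sketch makes the point about the "constant" 1.3 concrete. It assumes Python floats are 64-bit binary floating point values (true on common platforms) and uses the `struct` module to simulate a 32-bit word; `Decimal` is used only to print the exact value that is actually stored.

```python
import struct
from decimal import Decimal

# The literal 1.3 stored in a 64-bit word.
as_double = 1.3

# The same literal after being squeezed into a 32-bit word and read back.
as_single = struct.unpack("<f", struct.pack("<f", 1.3))[0]

# Decimal(float) converts the stored binary value exactly, digit for digit.
print(Decimal(as_double))
print(Decimal(as_single))

print(Decimal(as_double) == Decimal("1.3"))  # False: the stored value is not 1.3
print(as_single == as_double)                # False: the value depends on the word length
```

Neither stored value equals 1.3, and the two word lengths do not even agree with each other: the 64-bit value lies slightly above 1.3, the 32-bit value slightly below.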

1 In some instances extremely long representations of numbers are considered. Prominent examples are software packages that support operations on arbitrarily long integers. The need for arithmetic on extremely long integers arises, for example, in public-key cryptography; in particular, the RSA encryption method is based on prime numbers with several hundred digits and employs operations on integers of such lengths for the encryption and decryption of messages.

2 One might argue that 1/3 is a finite representation, but it is not a decimal representation. We know from mathematics that any rational number can be represented in the form a/b, where a and b are integers. However, ordinary programs do not use this representation of rational numbers (although some software packages do); moreover, no such finite representation exists for real numbers that are not rational.

3 We will not enter into a discussion of processor arithmetic. This is a complicated topic and well beyond the scope of this book. For our purposes, it suffices to know that many processors implement some standard, usually the IEEE standard. However, many processors also deviate in some instances from the standard they ostensibly implement. While in most cases the differences are negligible, it behooves programmers to be at least aware of the underlying problems.

It is obvious that in this respect, algorithms differ significantly from software. Nobody spends any time worrying about the representation of numbers in an algorithm; we essentially assume that all ordinary mathematical properties of numbers are satisfied. This implies that there is no overflow or underflow, there are no rounding errors, and every number we can write down exists exactly as we wrote it. The same holds, even more so, for the distinction between decimal and binary representation, which simply never arises in an algorithm.

The finiteness of the representation of numbers, coupled with the fact that the internal representation is binary, has a number of surprising consequences. In particular, testing for equality is suddenly a complicated matter and many mathematical identities no longer hold.
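As a hedged illustration of identities that fail, the following sketch checks two properties that hold for real numbers: associativity of addition, and (x + 1) - x = 1. The specific operands are chosen here for illustration and do not come from the text; the behavior shown assumes 64-bit IEEE 754 floats, as Python uses on common platforms.

```python
# Associativity of addition: (a + b) + c should equal a + (b + c).
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)       # False
print(left, right)         # the two results differ in the last digit

# (x + 1) - x should be 1, but at this magnitude 1.0 is below the
# spacing between neighboring representable values and is lost.
big = 1e16
print((big + 1.0) - big)   # 0.0
```

Both results are correct under the rounding rules of the number format; it is the mathematical identity that no longer applies.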

Some data items’ entire range of values can be represented exactly.⁴ This is usually the case whenever the range is finite. Prominent examples of this kind of item are booleans; strings also fall into this category.⁵ It is not difficult to verify that any item whose entire range is finite (and presumably not too large) presents no significant representational problems.

Any operation applied to such items will again yield exactly representable values, as long as it yields values of that type.⁶ Consequently, there are no differences between algorithms and software for items whose range is finite, and no problems owing to their representation can arise when algorithms are implemented in software.

Many programmers believe erroneously that problems owing to the finiteness of the number representation are irrelevant to their applications, indeed that these difficulties are only of concern when doing heavy-duty numerical analysis. However, representational issues arise in many contexts that have nothing to do with numerical analysis. A simple illustration comes from banking. Consider a program that calculates interest on bank balances. The interest is calculated monthly on the average balance during that month.

4 Clearly, integers and real numbers do not satisfy this. For reals, 1/3 is an example; for integers, the integer following the largest representable integer is an example.

5 The case of strings is perhaps not entirely obvious since strings could be of unbounded length (in contrast to the type character, which can be viewed as a string of length 1). However, given a string (however it may be represented), we can first determine its length, and for this length there is a finite number of different strings (assuming the strings are over a finite alphabet). Why does this argument not carry over to floating point numbers? Because by general agreement, two strings of the same length that differ anywhere are considered different, while two floating point numbers, if they differ by a very small quantity, may be considered equal (see the discussion of testing for equality in Section 8.2). Moreover, in many cases a floating point number is a finite representation of a real number that would require an infinite exact representation; therefore, a better approximation would have more bits or digits, and the length could be increased arbitrarily without ever providing an exact representation.

6 We assume that an operation applied to items of a certain type yields again an item of that type. This is somewhat tenuous when it comes to integers and division: the integers are not closed under division, but the reals are (division by zero aside). However, the integers are closed under addition, subtraction, and multiplication (as are the reals).

Number representation matters at two points in this interest calculation: in computing the average daily balance of an account and in applying the interest rate to that average balance. It is important in this application that all values be in terms of dollars and cents; in other words, there are no fractional cents. Thus, there should be exactly two decimal digits after the decimal point. If the balances are represented as floating point numbers, the average, once computed, must be converted into dollars and cents, since it is unlikely to be in this format already. The result of the interest calculation is even less likely to be; again, it must be explicitly converted into the requisite format. Care must be exercised here: for large balances, the interest computation may not be correct if the number of bits employed to represent the result is too small. While any differences are probably only on the order of a few cents, the bank is nevertheless unlikely to be forgiving about such errors.
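One way to keep such a calculation in dollars and cents is decimal arithmetic with an explicit conversion step, sketched below. Several things here are assumptions made for illustration and are not taken from the text: the use of Python's `decimal` module, rounding half up to the nearest cent, dividing an annual rate by 12 to obtain the monthly rate, and the names `to_cents` and `monthly_interest`. A real bank would prescribe its own rules.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(amount):
    """Convert a Decimal amount to exactly two digits after the decimal point."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def monthly_interest(daily_balances, annual_rate):
    """Return (average daily balance, interest for the month), both in cents.

    daily_balances: list of Decimal, one entry per day of the month.
    annual_rate: Decimal, e.g. Decimal("0.06") for 6 percent (assumed convention).
    """
    average = to_cents(sum(daily_balances, Decimal("0")) / len(daily_balances))
    interest = to_cents(average * annual_rate / 12)
    return average, interest


balances = [Decimal("1000.00")] * 30
print(monthly_interest(balances, Decimal("0.06")))

# Binary floating point can round a value written as 2.675 the "wrong" way,
# because the stored value is slightly below 2.675.
print(round(2.675, 2))               # 2.67
print(to_cents(Decimal("2.675")))    # 2.68
```

The explicit `to_cents` calls mark the two places identified in the text, the average and the interest, where the result has to be forced back into the required format.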

Another instance where ordinary programs may encounter unexpected problems that are ultimately due to the finiteness of the number representation relates to something seemingly trivial, namely testing whether two values are equal.
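A minimal sketch of the equality problem follows. It assumes 64-bit IEEE 754 floats; the comparison with `math.isclose` and the tolerance of 1e-9 are illustrative choices made here, not a method prescribed by the text.

```python
import math

# Mathematically, ten times 0.1 is exactly 1.
total = 0.0
for _ in range(10):
    total += 0.1

print(total)                                   # slightly less than 1.0
print(total == 1.0)                            # False
print(math.isclose(total, 1.0, rel_tol=1e-9))  # True: equal within a tolerance
```

Which tolerance is appropriate depends on the application; choosing it is part of the problem rather than a detail.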

From the document A Programmer’s Companion to Algorithm Analysis (pages 186-189)