Testing for Equality - Implications of the Finiteness of the Representation

Part 2 The Software Side: Disappointments and

8 Implications of the Finiteness of the Representation

8.2 Testing for Equality

In an algorithm, testing for equality is trivial and requires no further discussion. In a program, however, testing for equality can be quite complicated: two values that are mathematically identical may differ because they were computed in different ways, each of which introduced its own error owing to the finite representation. A fairly straightforward example is the test of whether the square of the square root of 2 is equal to 2:

(√2)² =? 2,

which is virtually guaranteed to fail in all programming languages.⁷ A more confusing example is the following test:

1.3 * 10 =? 13,

which again may fail, for the following reason. Recall that 1.3 does not have a finite binary representation; as a result, the exact product of its stored value with the number 10 is not exactly equal to 13, and whether the rounded product compares equal to 13 depends on the floating point format and on the rounding applied. This is true even if we were more careful with the question of type. In both examples, we are mixing reals and integers.

7 We could be more deferential to type concordance and ask whether (√2)² =? 2.0 is true; the test would likely fail, nevertheless.

C6730_C008.fm Page 180 Friday, August 11, 2006 9:27 AM

For algorithms, this is generally ignored; for programs, it is often frowned upon.⁸ So, even if we tested

1.3 * 10.0 =? 13.0,

the test could still fail.
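The two tests above can be tried directly. The following is a minimal sketch in Python, assuming the usual binary double-precision floats; the variable names are illustrative only. It prints the outcome of the product test rather than assuming it, since that outcome depends on how the product happens to be rounded.

```python
import math
from decimal import Decimal

# (sqrt(2))^2 versus 2: the square root is rounded, so its square is slightly off.
root = math.sqrt(2.0)
squared = root * root
sqrt_test = (squared == 2.0)

# The value actually stored for the literal 1.3, written out exactly.
stored_1_3 = Decimal(1.3)
is_exact = (stored_1_3 == Decimal("1.3"))

# The product is rounded once more; whether it lands on 13.0 depends on the
# floating point format and rounding mode, so the result is only reported.
product_test = (1.3 * 10.0 == 13.0)

print("sqrt test holds:", sqrt_test, "computed square:", squared)
print("1.3 stored exactly:", is_exact, "stored value:", stored_1_3)
print("product test holds:", product_test)
```

Converting the float to `Decimal` exposes the stored binary value digit for digit, which makes the gap between the literal 1.3 and its representation visible.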

The problem alluded to in these examples does not arise with integers or with any type where the entire range can be represented (booleans, characters, strings).⁹ Consequently, testing for equality for these types of items is not problematic.

Given the difficulty in obtaining the correct answer when testing for equality of two real values, how can we finesse this problem? We look at two different aspects of a test for equality, namely a test where we increment or decrement a value until some other value is reached and a general test of whether two values are equal.

The first type of testing for equality is a modified enumeration, where the step size is not an integer. This is frequently unsafe and should be avoided if at all possible. Here, the test for equality arises in the context of reaching a termination condition. Since it is imperative to obtain termination, we could simply replace the test for equality with a test of less (greater) than or equal to. For example, recall the skeleton of a recursive function discussed in Section 7.1:

F(x)
   if x = 0.0 do { basis case }
   else { statements; F(x-0.1); statements }

8 Some programming languages insist that only operands of like type may be combined. In this case operands would have to be converted explicitly to the more general type, in this case from integer to real. Failure to convert would result in an error.

One might question why conversion has to be from integer to real. If one were to convert from real to integer, the above test would hold. There are two arguments against this assertion. First, one always wants to go from a more restricted situation to a less restricted one; since the integers can be viewed as a subset of the reals (at least mathematically; the internal representation of the fixed point number 13 is very different from that of the floating point number 13.0), this would indicate conversion from integer to real (which would always preserve the value involved, in contrast to going the other direction). Second, even if one were to convert a real to an integer, this would not guarantee that one would obtain 13 when applying this conversion to 1.3*10. It would depend on the way in which a real is converted into an integer. There are at least three different standard ways of doing this: rounding down (⌊x⌋ is the largest integer such that ⌊x⌋ ≤ x), rounding up (⌈x⌉ is the smallest integer such that x ≤ ⌈x⌉), and rounding ([x] is the unique integer satisfying x − 0.5 < [x] ≤ x + 0.5). (To see that these three techniques are all different consider the values 3.9 and −3.9: ⌊3.9⌋ = 3, ⌊−3.9⌋ = −4, ⌈3.9⌉ = 4, ⌈−3.9⌉ = −3, [3.9] = 4, and [−3.9] = −4.) Consequently, if the actual representation of 1.3*10 is smaller than 13, ⌊1.3*10⌋ would yield 12, not 13. Similarly, if the actual representation of 1.3*10 is larger than 13, ⌈1.3*10⌉ would yield 14, not 13.
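The three conversions can be compared in a short sketch, here in Python. The helper `round_half_up` and the two sample values near 13 are hypothetical and introduced only for illustration; Python's built-in `round` follows a different tie rule, which is why it is not used here.

```python
import math

def round_half_up(x: float) -> int:
    """[x]: the unique integer with x - 0.5 < [x] <= x + 0.5."""
    return math.floor(x + 0.5)

# The values 3.9 and -3.9 separate the three conversions.
for x in (3.9, -3.9):
    print(x, math.floor(x), math.ceil(x), round_half_up(x))

# Hypothetical stand-ins for an inexact product close to 13.
slightly_low = 12.999999999999998    # a representable value just below 13
slightly_high = 13.000000000000002   # a representable value just above 13

print(math.floor(slightly_low))   # rounding down loses a whole unit
print(math.ceil(slightly_high))   # rounding up gains a whole unit
```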

9 This problem does not arise with integers either, even though they do not have a finite range, because it fundamentally derives from underflow. Integers do not suffer from underflow, only from overflow. Overflow, however, is not an issue in testing for equality, since we are considering only the test of whether two values are equal, and either both values are overflow or neither is. How these two values are obtained is a separate issue, which is taken up in the next section.

If this function is called with a positive integer as an actual parameter (ignoring again the mixing of integer and real types), say F(2), it is unlikely to terminate: the test for equality x = 0.0 will most likely fail, since 0.1, the decrement, does not have a finite binary floating point representation. Thus, the basis case will never be encountered. Were we to replace the test x = 0.0 with x ≤ 0.0, we would at least be guaranteed that the recursion terminates, but it is not clear that the result would be what we expect.

Why might the result not be correct? This depends on the actual value of the number 0.1. Since we have already established that that value cannot be exactly 0.1, it is either larger or smaller (admittedly by very little, but this very small quantity nevertheless trips us up). The obvious intent of the original algorithm is to invoke recursion 10 times for each unit; thus F(2) should give rise to 20 recursive calls, namely F(1.9), F(1.8), ..., F(0.2), F(0.1), and finally F(0.0), which then invokes the basis case. However, if the actually represented value of 0.1 is smaller than the real value 0.1, then the actually represented values of the intermediate quantities (1.9, 1.8, ..., 0.2, 0.1, and 0.0) will all be larger than the values of their identifiers.¹⁰ As a result, it will not be this last value that invokes the basis case (since it is objectively larger than 0.0), but the next value computed in this sequence, which is something quite close to –0.1. This is why the test for equality was such a bad idea, since in the original formulation, the basis case (x = 0.0) would never be reached.¹¹
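The behavior of the two termination tests can be observed with a small experiment. The following Python sketch mirrors the skeleton of F; the parameters `use_equality`, `depth`, and `limit` are assumptions added for illustration, with `limit` acting as a guard so that a recursion that misses the basis case is cut off instead of running away.

```python
from typing import Optional

def F(x: float, use_equality: bool, depth: int = 0, limit: int = 50) -> Optional[int]:
    """Return the number of recursive calls made before the basis case.

    `limit` is a guard added for this sketch: None is returned once the
    recursion has gone `limit` levels deep without reaching the basis case.
    """
    if depth > limit:
        return None
    at_basis = (x == 0.0) if use_equality else (x <= 0.0)
    if at_basis:
        return depth
    return F(x - 0.1, use_equality, depth + 1, limit)

# Report, for each starting value, the call count under each test.
for start in (1.0, 2.0):
    print(start, F(start, use_equality=True), F(start, use_equality=False))
```

With the test x ≤ 0.0, a start value of n can be expected to produce either 10n or 10n + 1 recursive calls, depending on how the rounding errors accumulate; which of the two occurs may differ from one start value to another.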

This brings up the second aspect of a test of equality, namely when we really want to know whether the two items are equal. Clearly, if the values are of a type whose entire range can be represented exactly, no problems exist. However, if the values are floating point type, it is surprisingly difficult to decide whether they are equal.

The obvious approach is to define a margin, say ε, and to decree that the value x is considered equal to the value y if x and y differ by no more than ε:

|x-y|≤ε.

10 This is not absolutely guaranteed, although it is extremely likely. We are assuming monotonicity here; that is, subtracting something smaller than 0.1 from 2 yields something larger than 1.9, and then subtracting something smaller than 0.1 from this value yields something larger than 1.8, and so on. Whether this is true depends on the specific implementation of arithmetic in the processor. (The reason it could fail to be true is related to the way in which the representation is determined, especially the rounding that occurs when one goes from one order of magnitude to another. Note that since the internal representation is binary, 1.99999 is an order of magnitude larger than 0.99999.)

11 We note that it is good programming practice to avoid testing for equality and to replace it by ≤ or ≥ if the values involved are considered safe, for instance integers.

The problem is how to determine ε. On the one hand, ε should be small enough to avoid considering two values as equal that should really be considered different; on the other hand, ε should be large enough that all values one might consider equal are in fact treated as equal. The problem is that there may not be a single fixed value of ε that would work in all instances. In many cases the size of ε depends on the context. Thus, the decision of how large to make ε generally lies with the programmer who is called upon to implement an algorithm in software.¹²
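Why a single fixed ε may not serve in all instances can be seen in a short Python sketch. The function `equal_abs` and the sample values are hypothetical; `math.isclose` is the standard library's relative-tolerance comparison, shown here as one common alternative and not as a method prescribed by the text.

```python
import math

def equal_abs(x: float, y: float, eps: float) -> bool:
    """The margin test: x and y count as equal if |x - y| <= eps."""
    return abs(x - y) <= eps

EPS = 1e-9  # a margin that seems reasonable for values near 1

# Near 1 the margin absorbs the rounding error, as intended.
near_one = equal_abs(0.1 + 0.2, 0.3, EPS)

# At large magnitudes even neighboring representable values are far apart,
# so the same margin rejects them.
big = 1e16
large_case = equal_abs(big + 2.0, big, EPS)

# At tiny magnitudes the same margin accepts values differing by a factor of 1000.
tiny_case = equal_abs(1e-12, 1e-15, EPS)

# Scaling the margin by the operands handles both cases, but the relative
# tolerance is still a choice the programmer has to make.
large_rel = math.isclose(big + 2.0, big, rel_tol=1e-9)
tiny_rel = math.isclose(1e-12, 1e-15, rel_tol=1e-9)

print(near_one, large_case, tiny_case, large_rel, tiny_rel)
```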

It should be clear that the difficulties with testing for equality carry over to other comparison operators, such as ≤ and ≥ (when applied to floating point numbers). However, tests for < and > are ordinarily safe.
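That a test for ≤ inherits the difficulty can be illustrated with two expressions that are mathematically equal. This is a minimal Python sketch, assuming binary double-precision floats; the names `a` and `b` are illustrative.

```python
a = 0.1 + 0.2   # mathematically 0.3
b = 0.3

# Mathematically a <= b and b >= a both hold, since the values are equal;
# with rounded operands the comparison can come out the other way.
a_le_b = a <= b
b_ge_a = b >= a

print(a, b, a_le_b, b_ge_a)
```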

From the document A Programmer’s Companion to Algorithm Analysis (pages 189-192)