Parallel Processes - Implications of Compiler and Systems Issues

Part 2 The Software Side: Disappointments and

6 Implications of Compiler and Systems Issues

6.8 Parallel Processes

In modern computing systems, many processes occur in parallel. Thus, a programmer may not be aware of processes that occur at the same time as the execution of his or her program; this in turn may influence the behavior of that program. These processes may be related to the program being exe-cuted (as in the case of asynchronous I/O, whereby the disk controller may execute while the program proceeds with its own instructions) or they may be entirely unrelated to the program executing. In both cases the programmer has virtually no influence on these processes.²⁶

In most cases parallel processes are used to improve the performance of a program. While some overhead is incurred, it is usually not significant enough to be noticeable. However, asynchronous I/O may improve the run-time behavior of a program measurably by allowing the retrieval of data and the execution of instructions to proceed in parallel. Ordinarily, this is controlled by the compiler and the run-time support system of the target platform on which the software is executed. The only way in which this may result in disappointments would be if one were to migrate from a platform supporting asynchronous I/O to one that does not. That such a migration negatively affects the run-time performance of the code is unavoidable.²⁷

An altogether different situation occurs if we are dealing with an overtly parallel program. In this case, the program itself would explicitly create parallel processes whose coordinated execution results in the solution of the overall problem. Determining the time complexity of parallel programs tends to be significantly harder. Parallel programs are outside of the realm of this book, so we refer readers interested in this topic to the literature.

6.9 What to Do about It

Several lessons can be distilled from the observations in this chapter. We mention the more salient points.

An important lesson concerns space complexity. While it may be of interest to know the average space complexity of an algorithm, for programs the average space requirement is not a very useful concept. Instead, the pro-grammer should concentrate on the worst-case space complexity. A program that crashes because of insufficient space is not very useful, even if this occurs

26 The second type of processes, unrelated to the program, can be excluded by running the pro-gram in an environment where no other processes may execute. This is an expensive proposition that is rarely available in practical situations. Thus, we will not consider it any further.

27 Supporting asynchronous I/O typically means a more sophisticated computing platform.

Such a migration would therefore regress from an advanced platform to a more primitive one.

Such a migration would be highly unusual and should be avoided if at all possible.

C6730_C006.fm Page 162 Friday, August 11, 2006 9:21 AM

Implications of Compiler and Systems Issues for Software 163 only rarely. This has also implications for the selection of algorithms to implement.

Programmers should pay particular attention to dynamic data structures and the memory management issues they engender. Allocation and deallo-cation of memory are important for the run-time behavior of software.

This includes hidden operations, especially those generated by the use of recursion.

Of particular interest is the question of garbage collection. While memory compaction is clearly outside of the control of the programmer in most contexts, awareness of this and related issues will at least remove the element of surprise when the behavior of a program appears unpredictable. It may also allow the programmer to avoid some hidden land mines, especially those related to the use of recursion.

A similar observation holds for parameters. The programmer may not have much choice once the programming language is selected, but she should at least be knowledgeable about the various ways of passing para-meters and their implications for time and space complexities. Moreover, an incorrect use of parameter passing mechanisms may result in erroneous code.

Most of the remaining issues enumerated and explained in this chapter are even less tangible. Initialization of variables is a language property, as is the absence of range checks and the treatment of packed arrays. Lazy evaluation may or may not be stipulated in the language definition; absent any statement in the definition, it becomes a compiler issue (with the obvious negative implications for software portability).

Optimization issues, however, are quite tangible. The programmer should always keep the program as simple as possible, resisting the temptation to fine-tune it to a particular target platform and thereby enabling a good optimizing compiler to hold up its end of the bargain.

Bibliographical Notes

Most of the background material for this chapter can be found in standard textbooks covering programming languages and compilers, for example, Aho, Sethi, and Ullman: Compilers: Principles, Techniques and Tools; Aho and Ullman: The Theory of Parsing, Translation, and Compiling; Muchnik: Advanced Compiler Design and Implementation; and Zima and Chapman: Supercompilers for Parallel and Vector Computers. Garbage collection is also discussed in conjunction with operating and run-time support systems. This also holds for the parallel processes briefly mentioned in Section 6.8. We refer to Sil-berschatz, Gavin, and Gagne: Operating Systems Concepts and other texts mentioned in this chapter. An interesting discussion (almost amounting to a mea culpa) can be found in “Good Ideas, through the Looking Glass”,

C6730_C006.fm Page 163 Friday, August 11, 2006 9:21 AM

164 A Programmer’s Companion to Algorithm Analysis written by Niklaus Wirth, author of Pascal and several other programming languages, in which he provides his view of a number of language constructs and their usefulness (or lack thereof). This echoes some of the criticisms listed in this chapter.

Exercises Exercise 1

Consider the following function:

CheckSums(A,B,C)

where A is a two-dimensional (2D) array of size [1:n, 1:n] and B and C are vectors of size [1:n], n being a positive integer. The body of CheckSums implements the following computations:

B[i] = A[i,1] - A[i,2] + A[i,3] - .... A[i,n]

C[i] = A[1,i] - A[2,i] + A[3,i] - .... A[n,i].

a. Give the complete function CheckSums (A,B,C), including the code of the function body and the method of passing the three parameters.

b. Determine the time and space complexity of your function (space complexity is the amount of memory space required by the function, in addition to the space for the actual parameters).

Warning: Your function must work correctly for calls such as CheckSums(D[1:n,1:n], D[1:n,1], D[1,1:n])!

where D[1:n,1] is the first column of the 2D array D and D[1,1:n] is the first row.

Exercise 2

In Section 3.2 we discussed search trees and their height-balanced version, AVL trees.

a. Determine the minimal prudent space requirements for insertion and deletion for general search trees and for height-balanced search

C6730_C006.fm Page 164 Friday, August 11, 2006 9:21 AM

Implications of Compiler and Systems Issues for Software 165 trees. Keep in mind that the use of recursion may not be the most appropriate approach.

b. Examine how the parameters must be passed for the functions for insertion and deletion with minimal prudent space requirements, both for general and for height-balanced search trees.

Exercise 3

Examine how the compiler for your favorite programming language handles space allocation and deallocation in conjunction with recursive calls. You may be able to obtain information from the compiler’s documentation, but you can also glean a good deal of information by conducting an experiment, for example as follows:

a. Create a recursive function RF(n) (it must not be tail recursive — otherwise a clever compiler might substitute iteration), where the parameter n indicates the depth of the recursion stack. (To keep RF efficient, it should have only one recursive call in its body.) Make sure the local variables of RF require sufficient space so that repeated executions of RF(n) do not only result in allocation requests, but also require releasing the memory upon return. Then call RF(n) repeat-edly, say 100 times, for several values of n, say 10, 100, 1,000, 10,000, and 100,000. Carefully instrument your program so you can measure the time each recursive call takes. If garbage collection occurs, it will introduce timing discrepancies that are not explainable by the pro-gram’s instructions alone. These discrepancies will indicate when and where garbage collection functions were invoked during the program’s execution.

b. A refinement of (a) would be to obscure the recursion as follows:

Instead of RF calling itself, RF might call a function RG (which is very similar in structure to RF, but the size of the local variables should be slightly different). This function RG then calls RF. These recursive functions RF and RG are also called coroutines. (There are only two coroutines in this case, but in general the chain of function calls connecting RF back to itself can be arbitrarily long: RF0 calls RF1, which calls RF2, which calls RF3, etc., until finally RFn calls RF0.)

C6730_C006.fm Page 165 Friday, August 11, 2006 9:21 AM

167

7

Implicit Assumptions

About This Chapter

In algorithms there are obvious assumptions that are tacitly presumed to be satisfied. However, a program must be able to handle exceptional situations that violate these tacit assumptions. Programs must also test for assumptions that are fundamental for the correct functioning of the approach. What may be quite obvious for the algorithm designer is nevertheless to be verified in a program.

Implicit assumptions occur frequently in algorithm design; they are less common and far less acceptable in software. This tension between reasonable assumptions and unreasonable attention to detail is an important aspect of the difference between algorithms and software. We examine exceptions as well as more fundamental issues related to this topic.

7.1 Handling Exceptional Situations

Algorithms presume that the reader has some intelligence. Therefore, they tend to be formulated without covering every possibility or aspect. In contrast, programs must be written so that even unexpected input and results do not cause them to crash. This implies that programs do not have the luxury of concentrating on the important aspects of a problem’s solution — all aspects must be covered comprehensively. Modern programming languages recognize this need and provide facilities for handling exceptions, but these facilities must be used by the programmer or they will not improve the programs. Another aspect an algorithm may ignore but a program must not is the initialization of function calls, especially recursive ones. Algorithms may be able to get away with assuming reasonable starting conditions, but programs must test for them.

Moreover, if the assumptions are not satisfied, some specific action must be taken so that the program does not crash!¹ Finally, problems related to incorrect

1 This assessment may be tempered by considerations related to the question of whether the pro-gram is a batch propro-gram or interactive. We will take up this issue later.

C6730_C007.fm Page 167 Monday, July 3, 2006 1:43 PM

168 A Programmer’s Companion to Algorithm Analysis input tend to be ignored in algorithms; programs, however, must ensure that the input is in the format required by the program.

7.1.1 Exception Handling

The need for exception handling can arise in various ways. A typical exception is division by zero. The occurrence of a division by zero may be the conse-quence of several events. It may be the result of incorrect input, the result of a sloppy algorithm², or the result of a rounding error. We will argue below that testing whether the input satisfies the requisite conditions is necessary for code, even though it may not be considered in an algorithm. We will deal with problems caused by the finite representation of numbers in programs in the next chapter. Here, we want to emphasize that exception handling is crucial for the correct functioning of programs. This implies in particular that code must be provided that specifies the action to be taken when an exception is thrown. Before one can do this, it is necessary that the programmer analyze where things can go wrong — where exceptions may occur.

This brings us to a serious weakness of general exception handling: Not every exception of a specific type should be treated in the same way. For example, consider division by zero. If one were to deal with this generally, the exception-handling mechanism could consist only of a generic notifica-tion of this event. It usually does not allow us to deal with it in a specific way. For example, we may want to determine the average salary of a group of employees. The algorithm may not make any provisions for the case where that group, defined in some way, contains no employees, resulting in a careless division by zero if no test was carried out for this special case.

Generic exception handling would merely notify us of a division by zero; it does not allow us to ignore the empty group and the effects this may have on the overall computation. To be able to do this requires a careful analysis of all implicit assumptions together with a careful design of code for each of the possible violations of them.³

2 While we do assume that the algorithm we start out with is correct, its designer may not have paid sufficient attention to all details to be directly translatable into code. For example, we may use BinarySearch to determine an index ind of an item in a sorted array. The value of ind may then be used to access information related to that item. This works only if the item is present in the array; otherwise ind has a value that signals that the item is not present, for example 0, and use of this value to access information related to this (nonexisting) item results in an error. In other words, the algorithm formulates a solution assuming the item is present and ignores the alternative. A program must specify explicitly what to do if the alternative is encountered.

3 Not all programming languages provide facilities for general exception handling. These com-ments suggest that this lack is not nearly as serious as one might assume. Many exceptions must be handled in specific ways, geared to the concrete instance where they arise, and this cannot be done through generic exception handling. By and large, generic exception handling allows con-tinued execution, that is, the program does not crash, but it does not allow actions to be taken that make sense in a specific situation. This means generic exception handling is essentially syn-tactic, not semantic, error handling. Semantic error handling requires an understanding of the meaning of the program and can therefore be carried out only by the programmer.

C6730_C007.fm Page 168 Monday, July 3, 2006 1:43 PM

Implicit Assumptions 169 7.1.2 Initializing Function Calls

The initialization of function calls, especially recursive ones, is another trouble spot that differentiates algorithms from programs. As with excep-tions, the problem arises from values that were computed or determined elsewhere but may now cause our function to misbehave. For general functions, the problem is often that the values of the actual parameters do not make sense for reasons that are too obvious to belabor in the formulation of the algorithm. Unfortunately, once we transition to a pro-gram, these nonsensical situations must be dealt with explicitly since otherwise the program may either produce incorrect results or crash. In the special case of recursive functions, we also have the problem of ensur-ing that every possible call will eventually end up in a basis case, thereby terminating the recursive calls.

Values that do not make sense (and are therefore not considered in the formulation of the algorithm) might be 0 for the number n of elements in a group, for example in the computation of the average salary. If this value is not positive, problems may arise (for example, division by 0). For an algo-rithm, this may be obvious and tacitly understood; for a program, the test n≥ 1 should be explicitly carried out, with additional code provided when the test is not satisfied. A similar situation occurs if we want to program matrix multiplication of two [1:n,1:n] matrices. It would not occur to anyone formulating the algorithm to test explicitly for n≥ 1, yet not doing so might result in code that crashes if this condition happens not to be satisfied. A last example is related to the range checks mentioned in the previous chapter.

If a function has a parameter that accesses an array element, it is highly desirable that this index be within the proper range. If the programming language does not test for this as a matter of course, it is probably advisable that the programmer carry out this test explicitly. Again, an algorithm may not bother with this, but a program should. It is important to realize that both the test and the additional code specifying what is to be done when the test fails are not optional, but are mandatory for good software.

It is not merely sufficient to test for these implicit assumptions. To ensure proper functioning of our programs we must also provide code that addresses the consequences of such a test not being satisfied. By and large, it is unsafe to assume that input to functions will always be as expected.

Surprises may occur because of a variety of issues, from incorrect user input to rounding errors and other exceptions.

For recursive functions, we must additionally ensure that every recursive call terminates. This requires that for any legal input combination (and these parameters must have been tested), eventually a basis case is reached, ter-minating the recursion. Again, there is a subtle gap between algorithm and program. This should be particularly of concern if the variable that governs the recursion is not an integer. For example, consider the following recursive skeleton:

C6730_C007.fm Page 169 Monday, July 3, 2006 1:43 PM

170 A Programmer’s Companion to Algorithm Analysis F(x)

if x=0.0 do { basis case }

else { statements; F(x-0.1); statements }

The idea is that we apply recursion, reducing the argument by 1/10, so if the actual parameter is an integer, the algorithm will terminate. As we pointed out above, this is playing with fire since at a minimum, we must test whether this assumption (that the actual parameter is an integer) is satisfied. However, even if this assumption holds, it is unlikely that a direct implementation of this algorithm will terminate. The problem lies in the representation of floating point numbers and the test for equality, which are taken up in the next chapter. Here, we merely want to state unequivocally that proper functioning of recursion requires that for all inputs, a basis case must be reached. This assumption is violated in this example program, as it would be in similar examples based on floating point numbers, even though the original algorithm does satisfy it.

Finally, users may simply provide incorrect input, possibly through care-lessness or because of transmission errors or an incompatibility of the input device with the receiving unit. Programs will differ depending on whether they are batch programs or interactive ones. For batch programs, an incorrect input usually means the program must terminate execution. There is no way to correct the erroneous input supplied. An interactive program would ordi-narily be expected to prompt the user for input; thus, if the input does not

Dans le document A ProgrAmmer’s ComPAnion to Algorithm AnAlysis (Page 173-0)