The Influence of Language Properties - Implications of Compiler and Systems Issues

Part 2 The Software Side: Disappointments and

6 Implications of Compiler and Systems Issues

6.6 The Influence of Language Properties

Several aspects of programming languages can negatively affect the behavior of a program. We will discuss three here: initialization of variables, packed data structures, and the problems caused by programming languages forcing a programmer to provide far more detail than is necessary to carry out a given operation. In a class of its own is the unfortunate tendency of programming languages (and their compilers) to forgo the test of whether the indices into structures are within their proper ranges.

6.6.1 Initialization

Algorithms tend to be somewhat cavalier in the way they deal with the initialization of variables. In many cases the assumption is that the reader of the algorithm is sufficiently intelligent to understand what is meant. This approach does not work with programs. Programs (or compilers) are emphatically not prepared to read the mind of the programmer. As a result, one has to pay much more attention to the question of initialization.

There are three ways of dealing with this question. For the most part, they are part of the language specification. Absent a statement describing this issue in the language definition, the question is left to the compiler to resolve.²⁰ These three approaches are undefined variables, predefined variables, and unspecified (or random) variables.

Undefined variables: This refers to requiring that every variable must be explicitly assigned a value before the variable can be used (in an expression, or as a parameter passed by value). Failure to do so will result in an error.²¹ These variables are called undefined.²² Many programmers are less than happy with this, mainly because they have become accustomed to a different approach. However, this is by far the most secure way of dealing with the initialization of variables. On the one hand, we would argue that a programmer should not expect a variable to have a value unless an (explicit) instruction was executed that assigned that value to the variable.²³ This clearly indicates that programmers should write code to initialize variables if they expect these variables to have values. On the other hand, preassigning values to variables may be inefficient, as we will see.²⁴ The only disadvantage of this method is that the representation of a variable must permit the determination of whether a value has ever been explicitly assigned to it. This may require some additional space (either in the symbol table or in the memory used for the structure).

20 This is generally quite undesirable, since it means different compilers for the same programming language may make differing assumptions. This has serious implications for the portability of programs written in that language, since the results of the same program may differ from one compiler to another.

21 This error is usually a run-time error, since it is undecidable in general to determine whether there is a way to reach an undefined variable.

22 This refers to the lack of a value for that variable. In contrast, variables that are not declared at the beginning of the program are called undeclared. Again, different programming languages adopt different approaches. Many programming languages permit implicit declaration of simple variables but require explicit declaration of complex structures, such as arrays. While important for software design, this issue is less important for us.

C6730_C006.fm Page 155 Friday, August 11, 2006 9:21 AM
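The undefined-variable approach can be made concrete with a minimal sketch. It uses Python, which is our choice for illustration and is not a language discussed in the text. Python happens to treat local names in this way: using one before any assignment raises an error at run time. The class `Cell` and the marker `_UNDEFINED` are hypothetical names; they only illustrate that the representation of a variable must record whether a value was ever assigned, at the cost of some extra space.

```python
_UNDEFINED = object()  # hypothetical marker meaning "no value assigned yet"


class Cell:
    """A variable whose representation records whether it was ever assigned."""

    def __init__(self):
        self._value = _UNDEFINED

    def store(self, value):
        self._value = value

    def load(self):
        # The extra check is the price paid for detecting the error.
        if self._value is _UNDEFINED:
            raise RuntimeError("variable used before a value was assigned")
        return self._value


def sum_with_missing_initialization(values):
    # Bug: 'total' is never assigned before its first use.
    for v in values:
        total = total + v
    return total


def is_detected(values):
    """Return True if the run-time system reports the missing initialization."""
    try:
        sum_with_missing_initialization(values)
    except UnboundLocalError:
        return True
    return False
```

Calling `is_detected([1, 2, 3])` reports the oversight instead of silently computing with an arbitrary value, which is the behavior argued for above.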

Predefined variables: This refers to the compiler assigning a specific, predefined value (typically 0) to all variables. While this may be convenient and is usually safe, it may also introduce inefficiencies. For example, if an array is initialized, the time complexity of this operation is proportional to the number of its elements. If this initialization turns out to be unnecessary (because the program explicitly assigns values to the array), the time required by the initialization is wasted. In general, it is impossible to determine whether there is a path through the program execution to the use of the array that avoids every explicit assignment of values to the array; this is the only case when this initialization would be of use. Therefore, it is unclear whether this approach will ever be useful. More importantly, it deprives the programmer of a useful tool for detecting logical errors (see the discussion in the previous paragraph, including the footnotes).
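The wasted work caused by predefinition can be counted in a small sketch. It is written in Python purely for illustration; the function names and the explicit `writes` counter are assumptions introduced here, standing in for the element assignments a compiler or run-time system would perform.

```python
def squares_with_predefinition(n):
    """Fill an array that was first set to 0, counting element writes."""
    writes = 0
    table = [0] * n          # predefinition: one write per element
    writes += n
    for i in range(n):
        table[i] = i * i     # the program assigns every element anyway
        writes += 1
    return table, writes


def squares_without_predefinition(n):
    """Fill the same array with no prior zeroing, counting element writes."""
    writes = 0
    table = []
    for i in range(n):
        table.append(i * i)
        writes += 1
    return table, writes
```

Both functions return the same table, but the first performs 2n element writes and the second n. The additional n writes are exactly the initialization that turned out to be unnecessary.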

Unspecified variables: No values are preassigned to variables, but the variables are not considered undefined either. Instead, the content of the memory location corresponding to a given variable is interpreted according to the type of that variable and used whenever the variable appears in a place that requires a value (for example, in an expression or in call by value). This is supremely unsafe since it is entirely unpredictable what this value is during a specific execution.

23 This may be a very useful way of detecting logical errors. If a variable has not been explicitly assigned a value, this is likely the result of an oversight in the program implementation. Such an error is usually a semantics error; nevertheless, this type of error can be detected by the compiler or the run-time support system. (As a general rule, semantics errors cannot be detected by compilers.) It is generally an excellent idea to employ methods that allow the detection of semantics errors by syntactical means, since the detection of true semantics errors (which are based on what the programmer did program instead of what she wanted to program) amounts to mind reading.

24 Or at least not any more efficient than having undefined variables (which are effectively predefined as undefined). In practice, the assignment of undefined can be carried out uniformly for all memory locations involved in the program when space is allocated to a program (at the very beginning of execution). Thus, this assignment may be more efficient than the setting to 0 of the space of an individual array, discussed in the next paragraph.


In this scenario that value can (and usually will) be different from one execution of the program to the next. This approach is clearly the most efficient, since effectively no initialization occurs. Also, it will be of no concern if the programmer explicitly defines the values of all variables. However, if the programmer fails, perhaps because of a programming error, to assign a value to a variable, this error will not be detected easily. This is highly undesirable since it can be a source of extremely subtle errors that could be easily avoided.²⁵

The choice between these three approaches is part of the programming language definition. It is therefore outside of the programmer’s influence. Nevertheless, programmers should at least be aware of the advantages and disadvantages of the three methods. They may have a choice of programming language, and the treatment of variable initialization could be one factor in deciding which language to use.
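The behavior of unspecified variables can only be simulated in a language that does not itself follow this approach. The following sketch, in Python and under that caveat, models a memory region as a `bytearray` and reads an integer variable that was never assigned; what it sees is whatever an earlier use left behind, reinterpreted as an integer. The function name and the 8-byte region are assumptions made for the illustration.

```python
import struct


def read_unassigned_int(previous_content):
    """Simulate reading a never-assigned 32-bit integer variable.

    'previous_content' is a float that an earlier computation stored in the
    same (simulated) memory region before the region was reused.
    """
    memory = bytearray(8)                       # simulated memory region
    memory[0:8] = struct.pack("<d", previous_content)
    # The region is now reused for an int variable; no assignment happens.
    # Its first four bytes are simply interpreted according to the new type.
    return struct.unpack_from("<i", memory, 0)[0]
```

The result depends entirely on the leftover bytes: `read_unassigned_int(2.5)` and `read_unassigned_int(3.14159)` yield different integers although the program text that reads the variable is identical. In a real execution the leftover content is not under the program's control at all.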

6.6.2 Packed Data Structures

Some programming languages recognize that certain data types do not require an entire word or byte for their representation. A typical example is the boolean or logical type, whose representation requires only one bit. As a result, these programming languages may offer the capability of packing data structures. In this approach an array of 1024 boolean variables would use only 128 bytes, instead of 1024 bytes. The difference becomes even more pronounced if the basic unit of memory access is the word (four bytes), in which case packing the array would require 32 words, instead of 1024. This saving in memory comes at a price: access to an individual element of the array becomes more complicated. In our packed array, access to any element requires that we first determine in which byte or word the value of the array element resides. Then this byte or word must be decoded (since it encodes numerous array elements). Only then can we access the desired element. For assignments, a similar process must be carried out.
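The access steps just described (locate the byte, then decode it) can be sketched as follows. This is a minimal hand-written illustration in Python, not the mechanism of any particular compiler; the class name `PackedBools` and its methods are hypothetical.

```python
class PackedBools:
    """An array of booleans stored one bit per element."""

    def __init__(self, n):
        self.n = n
        self.data = bytearray((n + 7) // 8)    # 8 elements per byte

    def _locate(self, i):
        if not 0 <= i < self.n:
            raise IndexError(f"index {i} outside 0..{self.n - 1}")
        return divmod(i, 8)                    # (byte index, bit position)

    def get(self, i):
        byte_index, bit = self._locate(i)
        # Decode: shift the wanted bit to position 0 and mask off the rest.
        return bool((self.data[byte_index] >> bit) & 1)

    def set(self, i, value):
        byte_index, bit = self._locate(i)
        if value:
            self.data[byte_index] |= 1 << bit
        else:
            self.data[byte_index] &= ~(1 << bit) & 0xFF


flags = PackedBools(1024)
flags.set(10, True)
```

Here `len(flags.data)` is 128, matching the figure in the text, while every `get` and `set` pays for a division, a shift, and a mask that an unpacked array would not need.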

It follows that in using packed structures, one trades time for space. (Note that this generally makes sense only for arrays, since other structures tend to be relatively small; only arrays have the property that a small instruction can specify a huge data structure.) In our example of the boolean array, it should be clear that the encoding and decoding process takes time that would not have to be expended if the array were not packed. At the same time, a packed array tends to use less space. One may conclude from this that packed structures should not be used if there is enough space available in main memory. Otherwise, one would have to determine how much time the I/O management for the packed structure requires, how much time the I/O management for the unpacked structure requires, and whether the difference justifies the additional time required for encoding and decoding. If using packed structures allows one to avoid the use of VMM (or out-of-core programming), it is always better to pack. If even the packed structure is too large to be accommodated in main memory, a more careful analysis of the program behavior is necessary to resolve this issue.

25 The only thing positive that can be said about this type of initialization is that it wastes no time, since no operation must be carried out. This is a terrible justification of an indisputably unsafe practice.

Unfortunately, some compilers ignore packing instructions; in other words, the programming language provides instructions that specify packed structures, but the compiler acts as if they were null statements. It is useful to know what one’s compiler does if one wishes to avoid surprises.

If the compiler ignores packing instructions and we are in the situation where packing would avoid VMM, it may be advantageous from a run-time point of view to carry out the packing instruction explicitly. However, this is almost as bad as doing one’s own memory mapping. It results in awful code that is difficult to debug and terrible to maintain. This is true even if one encapsulates these mechanisms in (properly documented) functions.

6.6.3 Overspecification of Execution Order

The vast majority of programming languages require a programmer to formulate aspects of an operation that are neither necessary nor useful. A simple example is the addition of two matrices. In an algorithm one might simply state that the two matrices are added. In most programming languages, this obvious operation ends up as two nested loops. This is necessitated by the absence of appropriate language constructs. This excessive specificity can be very harmful; one could reasonably argue that specifying matrix addition generically, that is, without giving any implementation details, places the burden of choosing an acceptable way of computing this operation on the compiler. As we have seen, coding two nested loops can go horribly wrong if the arrays are too large to fit into main memory and the loops clash with the memory-mapping function. The truth is that any order of traversing the elements of the three matrices involved will do, as long as each location [i,j] is visited exactly once.
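That the traversal order is immaterial for matrix addition can be checked directly. The sketch below, in Python and for illustration only, takes the visiting order as an explicit parameter; the function `add_matrices` and the three sample orders are assumptions introduced here.

```python
import random


def add_matrices(a, b, order):
    """Add two matrices, visiting the index pairs in the given order."""
    rows, cols = len(a), len(a[0])
    c = [[None] * cols for _ in range(rows)]
    for i, j in order:              # each (i, j) must appear exactly once
        c[i][j] = a[i][j] + b[i][j]
    return c


rows, cols = 3, 4
a = [[i * cols + j for j in range(cols)] for i in range(rows)]
b = [[10 * (i + j) for j in range(cols)] for i in range(rows)]

row_major = [(i, j) for i in range(rows) for j in range(cols)]
column_major = [(i, j) for j in range(cols) for i in range(rows)]
shuffled = row_major[:]
random.Random(0).shuffle(shuffled)  # an arbitrary but reproducible order

by_rows = add_matrices(a, b, row_major)
by_columns = add_matrices(a, b, column_major)
by_shuffle = add_matrices(a, b, shuffled)
```

All three results are identical. Two nested loops fix one of these orders in the program text, although nothing in the operation itself requires it.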

Programming languages frequently force the programmer to specify details that may impede efficient execution. This is in marked contrast to algorithms, where details usually are glossed over, in many cases deliberately, frequently assisting in a more efficient approach to execution. For example, if matrix addition were simply stated as such, the compiler could select the most efficient order of visiting the individual matrix elements. Instead, the stipulation of a specific execution order (which must then be followed) may result in excruciatingly slow programs. This is a direct consequence of the programming language’s failure to allow the specification of a generic “for all array elements” operation. While this example is simple enough to be immediately obvious, other instances may be subtler. Nevertheless, it is important to be aware that language constructs (or the lack thereof) can be sources of inefficiencies for a program written in that language. This can be particularly important when combined with the actions of a good optimizing compiler.

6.6.4 Avoiding Range Checks

Many programming languages, in an extremely ill-advised pursuit of efficiency, forgo explicit tests that determine whether the index into a structure, usually an array, is within the appropriate range for that index.

Because of problems discussed in the next chapter, sometimes even correct algorithms give rise to incorrect indices. Such problems are notoriously difficult to identify, since absent any meaningful range checks, no error is indicated when this occurs; instead, wrong values are used in calculations or, worse, values are assigned to memory locations that correspond to entirely different variables or structures. This is an egregious instance where a serious semantic error could be detected during run time, but many programming languages value a superficial efficiency (avoiding the test of whether the index is within its proper range) higher than the programmer’s time and effort, which must be expended on debugging once the program misbehaves. The avoidance of range checks might be blamed on a desire for program optimization, but the difference is that optimization can be turned off if the programmer wishes. In contrast, a compiler that does not implement range checks (perhaps because the language specification does not stipulate it) will fail to do so whether it applies optimization or not.
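The effect of a missing range check can be simulated with a flat block of memory that holds two arrays back to back. The sketch is in Python for illustration; since Python checks list indices itself, the unchecked access is modeled by plain address arithmetic. The class `FlatMemory`, its layout, and its method names are assumptions made for this example.

```python
class FlatMemory:
    """Two arrays, A and B, laid out back to back in one block of memory."""

    def __init__(self):
        self.cells = [0] * 8
        self.a_base, self.a_len = 0, 4   # A occupies cells 0..3
        self.b_base, self.b_len = 4, 4   # B occupies cells 4..7

    def store_a_unchecked(self, index, value):
        # Address arithmetic only; no test that index lies in 0..a_len-1.
        self.cells[self.a_base + index] = value

    def store_a_checked(self, index, value):
        if not 0 <= index < self.a_len:
            raise IndexError(f"index {index} outside 0..{self.a_len - 1}")
        self.cells[self.a_base + index] = value

    def load_b(self, index):
        return self.cells[self.b_base + index]


mem = FlatMemory()
mem.store_a_unchecked(5, 99)     # A has only indices 0..3
clobbered = mem.load_b(1)        # B[1] now holds the value meant for A
```

The unchecked store reports nothing and silently overwrites an element of B, whereas `store_a_checked(5, 99)` raises an `IndexError` at the point where the faulty index is used.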

From the document A Programmer’s Companion to Algorithm Analysis (pages 166-170)