


Parallel Algorithm Complexity

3.2. ALGORITHM OPTIMALITY AND EFFICIENCY

One way in which we use the big-oh and big-omega notations, introduced in Section 3.1, is as follows. Suppose that we have constructed a valid algorithm to solve a given problem of size n in g(n) time, where g(n) is a known function such as n log₂ n or n², obtained through exact or asymptotic analysis. A question of interest is whether or not the algorithm at hand is the best algorithm for solving the problem. Of course, algorithm quality can be judged in many different ways, with running time, resource requirements, simplicity (which affects the cost of development, debugging, and maintenance), and portability being some of the factors in this evaluation. Let us focus on running time for now. The question then becomes

What is the running time ƒ(n) of the fastest algorithm for solving this problem?

If we are interested in asymptotic comparison, then because an algorithm with running time g(n) is already known, ƒ(n) = O(g(n)); i.e., for large n, the running time of the best algorithm is upper bounded by cg(n) for some constant c. If, subsequently, someone develops an asymptotically faster algorithm for solving the same problem, say in time h(n), we conclude that ƒ(n) = O(h(n)). The process of constructing and improving algorithms thus contributes to the establishment of tighter upper bounds for the complexity of the best algorithm (Fig. 3.2).

Concurrently with the establishment of upper bounds as discussed above, we might work on determining lower bounds on a problem's time complexity. A lower bound is useful as it tells us how much room for improvement there might be in existing algorithms. Lower bounds can be established by a variety of methods. Examples include

1. Showing that, in the worst case, solution of the problem requires data to travel a certain distance or that a certain volume of data must pass through a limited-bandwidth interface. An example of the first method is the observation that any sorting algorithm on a p-processor square mesh needs at least 2√p − 2 communication steps in the worst case, this being the diameter of a √p × √p mesh (diameter-based lower bound). The second method is exemplified by the worst-case linear time required by any sorting algorithm on a binary tree architecture (bisection-based lower bound).

2. Showing that, in the worst case, solution of the problem requires that a certain number of elementary operations be performed. This is the method used for establishing the Ω(n log n) lower bound for comparison-based sequential sorting algorithms. Consider n distinct (unequal) keys. These n keys can be arranged in n! different ways, and the goal of sorting is to identify the one permutation (among the n!) that corresponds to the sorted order. Each comparison has only two possible outcomes, and thus narrows down our choice by at most a factor of 2. Thus, log₂(n!) = Θ(n log n) comparisons are needed in the worst case (a numerical check of this bound appears in the sketch after this list).

3. Showing that any instance of a previously analyzed problem can be converted to an instance of the problem under study, so that an algorithm for solving our problem can also be used, with simple pre- and postprocessing steps, to solve the previous problem. Any lower bound for the previous problem then becomes a lower bound for our new problem. For example, we saw in Section 2.4 that the carry computation problem can be converted to a parallel prefix computation. Thus, any lower bound established for carry computation is also a lower bound for general parallel prefix computation. Also, trivially, any upper bound for the prefix computation problem is an upper bound for the carry problem.
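As a quick numerical check of method 2 above, the following sketch (a hypothetical helper, not part of the text) computes log₂(n!) via the lgamma function and compares it with n log₂ n; both grow at the same Θ(n log n) rate, as Stirling's approximation predicts.

```python
import math

def sort_lower_bound(n):
    # Any comparison-based sort needs at least log2(n!) comparisons in the
    # worst case, since each comparison at best halves the n! candidate orders.
    return math.lgamma(n + 1) / math.log(2)   # lgamma(n+1) = ln(n!)

for n in (10, 100, 1000, 10_000):
    print(f"n={n:6d}  log2(n!)={sort_lower_bound(n):12.1f}  "
          f"n*log2(n)={n * math.log2(n):12.1f}")
```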

As shown in Fig. 3.2, a known lower bound can be viewed as a barrier against algorithmic speed improvements. When a wide gap exists between the best known lower and upper bounds, further efforts in raising the lower bound, or lowering the upper bound, might be warranted. The lower bound can be raised by applying the methods in the above list in novel ways. The upper bound can be lowered by designing new algorithms and showing them to be faster than the best previously known algorithms.

If and when the known upper bound and lower bound for a given problem converge, we say that we have an optimal algorithm. At this point, no asymptotic improvement is possible and the focus changes to improving the constant factors involved (e.g., reducing the algorithm's running time by a constant factor).

Figure 3.2. Upper and lower bounds may tighten over time.


Now, let us broaden our attention and consider the cost of the machine on which the algorithm runs in addition to the running time. Unfortunately, a simple, accurate, and time-invariant cost model for parallel machines does not exist, so in the interest of tractability, we often take the number p of processors used by a given algorithm as a very rough indicator of cost. If we are allowed to vary the number of processors used (either by choosing/designing our own parallel machine or by limiting the algorithm's execution to a subset of processors, i.e., a partition, of a larger parallel machine), then the running time will be a function of both the problem size n and the number p of processors used.

Now, because of the additional cost factor introduced, different notions of optimality can be entertained. Let T(n, p) be our algorithm’s running time when solving a problem of size n on a machine with p processors. The algorithm is said to be

Time-optimal if T(n, p) = g(n, p), where g(n, p) is an established time lower bound.

Cost-time optimal (cost-optimal for short) iff pT(n, p) = T(n, 1).

Redundancy = Utilization = 1

Cost-time efficient (efficient for short) iff pT(n, p) = Θ(T(n, 1)).

Redundancy = Utilization = Θ(1)
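As a concrete illustration (the running-time formula below is an assumption for this sketch, not an algorithm from the text), consider summing n numbers on p processors in T(n, p) = n/p + 2 log₂ p steps: n/p local additions followed by a log-depth combining of partial sums. The cost pT(n, p) = n + 2p log₂ p stays within a constant factor of T(n, 1) = n, and hence the algorithm remains cost-time efficient, only as long as p log₂ p grows no faster than n.

```python
import math

def T(n, p):
    # Hypothetical running time for summing n numbers on p processors:
    # n/p local additions followed by a log-depth combining of partial sums.
    return n / p + 2 * math.log2(p)

n = 1_000_000
T1 = T(n, 1)                     # sequential time (log2(1) = 0)
for p in (16, 1024, 65_536):
    cost = p * T(n, p)           # cost = processors x time
    print(f"p={p:6d}  T(n,p)={T(n, p):10.1f}  pT(n,p)={cost:12.0f}  "
          f"within 2x of T(n,1)? {cost <= 2 * T1}")   # rough stand-in for the Theta condition
```

The last line of output shows the cost blowing up to about 3T(n, 1) at p = 65,536, where p log₂ p already exceeds n.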

One final observation is in order. Just as we took a simplified view of cost by equating it with the number p of processors, we can simplify our view of time by counting computation and/or communication steps instead of measuring real time. So, rather than saying that a parallel matrix multiplication algorithm terminates in so many seconds, we may say that it executes so many floating-point operations and transmits so many messages between the processors. With this simplified view, one must be careful in comparing algorithm complexities across different machines. A speed-up of 5 in terms of step counts may correspond to a speed-up of 2 or 3, say, when real time is considered (Fig. 3.3).

Figure 3.3. Five times fewer steps does not necessarily mean five times faster.
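A toy calculation, with made-up step counts and durations, shows how the gap depicted in Fig. 3.3 can arise when communication steps are slower than computation steps:

```python
# Illustrative numbers only (not taken from the text): a sequential run of
# 1000 computation steps vs. a parallel run of 200 steps, 50 of which are
# communication steps, each costing 4x as much as a computation step.
comp, comm = 1.0, 4.0                     # relative step durations
sequential_time = 1000 * comp
parallel_time = 150 * comp + 50 * comm

print("speedup in step counts:", 1000 / 200)                        # 5.0
print("speedup in real time  :", sequential_time / parallel_time)   # ~2.9
```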

3.3. COMPLEXITY CLASSES

Complexity theory is a branch of computer science that deals with the ease or difficulty of solving various computational problems of interest. In complexity theory, problems are divided into several complexity classes according to their running times on a single-processor system (or a deterministic Turing machine, to be more exact). Problems whose running times are upper bounded by polynomials in n are said to belong to the P class and are generally considered to be tractable. Even if the polynomial is of a high degree, such that a large problem requires years of computation on the fastest available supercomputer, there is still hope that with improvements in the algorithm or in computer performance, a reasonable running time may be obtained.

On the other hand, problems for which the best known deterministic algorithm runs in exponential time are intractable. For example, if solving a problem of size n requires the execution of 2ⁿ machine instructions, the running time for n = 100 on a GIPS (giga IPS) processor will be around 400 billion centuries! A problem of this kind for which, when given a solution, the correctness of the solution can be verified in polynomial time, is said to belong to the NP (nondeterministic polynomial) class.
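The 400-billion-centuries figure follows from a back-of-the-envelope calculation:

```python
instructions = 2 ** 100                      # exponential-time algorithm at n = 100
ips = 10 ** 9                                # GIPS processor: a billion instructions per second
seconds = instructions / ips
centuries = seconds / (100 * 365.25 * 24 * 3600)
print(f"about {centuries:.2e} centuries")    # roughly 4e11, i.e., ~400 billion
```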

An example of an NP problem is the subset-sum problem: Given a set of n integers and a target sum s, determine if a subset of the integers in the given set adds up to s. This problem looks deceptively simple, yet no one knows how to solve it other than by trying practically all of the 2ⁿ subsets of the given set. Even if each of these trials takes only 1 ps, the problem is virtually unsolvable for n = 100. This does not mean that we cannot solve specific instances of the subset-sum problem, or even most instances of practical interest, efficiently. What it implies is that an efficient general algorithm for solving this problem is not known. Neither has anyone been able to prove that an efficient (polynomial-time) algorithm for the subset-sum problem does not exist.
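A minimal brute-force sketch of the subset-sum problem (hypothetical instance and helper name; this is the exhaustive search described above, not an efficient method):

```python
from itertools import combinations

def subset_sum(values, target):
    # Brute force: examines up to 2^n subsets in the worst case.  Checking a
    # *given* subset, in contrast, takes only a sum and a comparison.
    for r in range(len(values) + 1):
        for subset in combinations(values, r):
            if sum(subset) == target:
                return subset        # a certificate, verifiable in O(n) time
    return None

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # (4, 5)
```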

In fact, the P =? NP question is an open problem of complexity theory. A positive answer to this question would mean that the subset-sum and a host of other “hard” problems can be solved efficiently even if we do not yet know how to do it. A negative answer, on the other hand, would imply that there exist problems for which efficient algorithms can never be found. Despite a lack of proof in either direction, researchers in complexity theory believe that in fact P ≠ NP. Furthermore, they have defined the class of NP-complete problems, meaning that any problem in NP can be transformed, by a computationally efficient process, to any one of these problems. The subset-sum problem is known to be NP-complete. Thus, if one ever finds an efficient solution to the subset-sum problem, this is tantamount to proving P = NP. On the other hand, if one can prove that the subset-sum problem is not in P, then neither is any other NP-complete problem (leading to the conclusion P ≠ NP).

Figure 3.4 depicts the relationships of these classes as we currently understand them.

The details and subclasses shown inside the class P will be explained in Section 3.4. The class NP-hard is explained below.

Given the large class of problems of practical interest that are in NP and the vast amount of time and other resources spent over the years in trying to find efficient solutions to these problems, proving that a computational problem is NP-complete virtually removes any hope of ever finding an efficient algorithm for that problem. Thus, in a sense, NP-complete problems are the “hardest” problems in the NP class.


Figure 3.4. A conceptual view of complexity classes and their relationships.

Besides the subset-sum problem mentioned above, the following problems of practical interest (and many others) are known to be NP-complete:

1. Determining if there exists an assignment of truth values to the variables in a Boolean expression, written as the AND of several OR clauses, such that the resulting value of the expression is "true" (the satisfiability problem). The problem remains NP-complete even if each OR clause is restricted to have exactly 3 literals, i.e., true or complemented variables (see the sketch after this list).

2. Determining if there exists an assignment of 0s and 1s to the inputs of a logic circuit that makes the output 1 (the circuit satisfiability problem).

3. Deciding if a graph contains a cycle that passes through every one of its nodes exactly once (the Hamiltonian cycle problem).

4. Finding a lowest-cost or shortest-distance tour of a number of cities, given the travel cost or distance between all pairs of cities (the traveling salesman problem).
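To make the contrast between finding and checking concrete for problem 1, the sketch below uses a tiny made-up CNF instance: verifying a candidate truth assignment takes time linear in the formula size, while exhaustive search examines up to 2ⁿ assignments.

```python
from itertools import product

# Hypothetical 3-literal CNF instance: (x0 or not x1 or x2) and (not x0 or x1).
# Each literal is a pair (variable_index, negated?).
clauses = [[(0, False), (1, True), (2, False)],
           [(0, True), (1, False)]]

def satisfied(assignment, clauses):
    # Checking a candidate assignment is easy: linear in the formula size.
    return all(any(assignment[var] != negated for var, negated in clause)
               for clause in clauses)

# Finding an assignment by exhaustive search takes up to 2^n trials.
n = 3
solution = next((a for a in product([False, True], repeat=n)
                 if satisfied(a, clauses)), None)
print(solution)   # (False, False, False) satisfies both clauses
```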

One final bit of terminology: As difficult as NP problems may seem, there exist problems that are not even in NP, meaning that even verifying that a claimed solution to such a problem is correct is currently intractable. An NP-hard problem is one that we do not know to be in NP but do know that any NP problem can be reduced to it by a polynomial-time algorithm.

The name of this class implies that such problems are at least as hard as any NP problem.

Typically, the proof that a problem of interest is NP-complete consists of two parts: (1) proving that it is in NP by showing that a given solution for it can be verified in polynomial time and (2) proving that it is NP-hard by showing that some NP-complete (and thus any NP) problem can be reduced to it. For the latter part of the proof, we have a wide array of NP-complete problems to choose from. But how was this process bootstrapped; i.e., where did the first NP-complete problem come from? The first seed was the satisfiability problem, which was established to be NP-complete by Cook in 1971 [Cook71] using a rather tedious proof.
