A world of exponential problems - Algorithmic Foundations of the Internet

In our discussion of computational complexity we mentioned that there are problems whose solution requires exponential time. This is obvious if an exponential lower bound can be found, as for example when asking for an output of exponential size (e.g., printing all the permutations of a set). There are other problems, however, naturally defined and apparently harmless, for which the only known algorithms require exponential time, but for which only polynomial lower bounds have been proved. The gap between lower and upper bound is huge.

Nobody is able to solve these problems efficiently, i.e., in polynomial time, but neither can anybody prove that this is impossible. Everybody, however, is practically convinced that this is the case.

Needless to say, a wealth of interesting and useful problems fall into this set! To explain what is behind this challenging situation we start with a story of pure fiction.

Story. A criminal group decides to use the Web to exchange secret information and forbidden material. They decide that the best way to avoid attracting attention is to join a legitimate social network, establish relationships inside it and exchange encrypted secret messages with each other. After a while, some rumors of the illegal activity of the group reach the police. Being unable to decode the contents of the messages, but wishing to identify the members of the group, the police decide to analyze the graph of the social network to identify communities with common interests. The task is assigned to the youngest officer.

The social network includes one thousand people, and the criminal group is estimated to consist of ten of them. Therefore the officer has to find whether a complete sub-graph of size ten, or a 10-clique in graph terminology, is contained in the network. Being an expert in computer programming, he decides to organize the search systematically, but has no better idea than to consider all possible groups of ten nodes in turn and verify whether all possible connections between pairs of nodes are present in each group. If so, the group is a 10-clique, and the participants merit intensive screening. For this purpose he assigns a number from 1 to 1000 to each node of the graph and writes a program to generate all groups of ten nodes starting with (1, 2, 3, ..., 9, 10), checking whether the group is a clique, and then increasing the values of the elements of the group in unit steps, starting from the last one.

Therefore the groups should be generated in the order (1, 2, ..., 9, 10), (1, 2, ..., 9, 11), ..., (991, 992, ..., 1000), but there are so many of them that, after some hours, the program has not yet output a final result when the lieutenant arrives. We let you imagine his fury at such apparent inefficiency and the face of the worried programmer: neither of the two knows that essentially no better algorithm is known than the one used by the young officer, namely listing one after the other all the possible groups of ten to decide whether a 10-clique is hiding in the graph.

FIGURE 4.12: Searching for 4-cliques in two graphs of twelve nodes. One graph has no such cliques; the other has two of them.
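The officer's exhaustive search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `find_clique` and `make_graph`, the adjacency-set representation, and the eight-node toy network are all invented here and do not come from the story. The sketch relies on `itertools.combinations`, which yields groups in the same lexicographic order the officer uses.

```python
from itertools import combinations


def make_graph(n, edges):
    """Build an undirected graph on nodes 1..n as a dict of neighbour sets."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def find_clique(adj, k):
    """Return the first k-clique found (a tuple of nodes), or None.

    Every group of k nodes is examined in lexicographic order,
    e.g. (1, 2, 3), (1, 2, 4), ..., exactly as in the story.
    """
    for group in combinations(sorted(adj), k):
        # A group is a clique when every pair of its nodes is connected.
        if all(v in adj[u] for u, v in combinations(group, 2)):
            return group
    return None


# Hypothetical toy network: a ring of 8 people plus one hidden group of 4.
ring = [(i, i % 8 + 1) for i in range(1, 9)]
hidden = list(combinations((2, 4, 5, 7), 2))
network = make_graph(8, ring + hidden)

print(find_clique(network, 4))  # the hidden group
print(find_clique(network, 5))  # no 5-clique exists here
```

On eight nodes the search is instantaneous; the point of the story is that the same loop over groups of ten out of one thousand nodes does not finish in any reasonable time.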

To understand the meaning of the story we must consider how to build an algorithm from a more general point of view. The number of groups of ten elements out of one thousand equals the binomial coefficient $\binom{1000}{10} \approx 2.63 \times 10^{23}$. Recalling the definition of binomial coefficients, and applying Stirling's approximation once again, it can be easily seen that $(n/k)^k < \binom{n}{k} < n^k$, that is, the function is exponential in k and becomes exponential in n when k grows linearly with n. In our story we have n = 1000 and k = n/100, so the above inequalities indicate that the value of $\binom{n}{k}$ lies between 10²⁰ and 10³⁰. Unfortunately no efficient strategy is known for discovering a k-clique in a graph or reporting that there is no such sub-graph. This statement must be taken in a very general sense, because for particular graphs the answer may be easy or even obvious: e.g., since each node of a k-clique must be connected to k−1 other nodes in the clique, only the nodes of the original graph with degree at least k−1 must be considered. In general, however, the only known way of approaching this problem is to examine all possible groups of k nodes, as the police officer did, or to apply other equivalent methods based on complete enumeration. The reader may appreciate how difficult the search is by looking for 4-cliques in the two graphs of Figure 4.12.
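The arithmetic in this paragraph can be checked directly with Python's exact integer arithmetic. The following is a minimal sketch: the rate of one billion groups tested per second is a purely hypothetical assumption used to give a feeling for the running time, and the helper `degree_candidates` with its toy graph is an invented illustration of the degree remark, not a method taken from the text.

```python
from math import comb

n, k = 1000, 10

groups = comb(n, k)      # exact number of k-node groups out of n nodes
lower = (n // k) ** k    # (n/k)^k = 100^10 = 10^20
upper = n ** k           # n^k = 1000^10 = 10^30

print(f"C({n},{k}) is about {groups:.3e}")
print("bounds hold:", lower < groups < upper)

# Assumed speed (hypothetical): one billion groups tested per second.
GROUPS_PER_SECOND = 1e9
years = groups / GROUPS_PER_SECOND / (365 * 24 * 3600)
print(f"about {years:.1e} years at that rate")


def degree_candidates(adj, k):
    """Nodes that could belong to a k-clique: degree at least k - 1."""
    return [v for v in adj if len(adj[v]) >= k - 1]


# Toy graph: nodes 1-4 form a 4-clique, node 5 hangs off node 4.
toy = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5}, 5: {4}}
print(degree_candidates(toy, 4))
```

The degree filter may shrink the set of nodes to enumerate, but in a densely connected network most nodes can pass it, so the exponential count of groups remains.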

Problems for which only exponential time algorithms are known are called intractable to indicate that in practice they cannot be solved by a computer as the input size increases. In fact for many intractable problems there is no definitive proof that a polynomial time algorithm does not exist, so they must be considered intractable until the (albeit unlikely) discovery of such an algorithm. Let us look at this situation more deeply.

After the algorithmic community began to be aware of the existence of intractable problems and of some relations between them, deep theoretical studies started in the 1970s with a famous result often called the Cook-Levin theorem, independently proved by the American-Canadian Stephen Cook and the Russian Leonid Levin. The theorem is far too specialized to be discussed here. A fundamental consequence, however, was to show the existence of a large class of problems tightly linked to one another by an unexpected relationship: if we were able to solve even only one of them in polynomial time, all the problems of the class would also be solvable in polynomial time. On the other hand if we were able to find an exponential lower bound for one of them, all the problems of the class would provably require exponential time.

These problems are called NP-hard:¹⁴ they have in common a logical structure responsible for their difficulty.

Highly diverse problems belong to the class NP-hard: from finding solutions to particular algebraic equations, to replicating Web pages on different servers, to building shortest routing paths in a network, to, of course, graph problems like finding cliques or finding Hamiltonian cycles, as discussed in Chapter 2. However, NP-hard problems, including those like the existence of a k-clique, have an interesting property in common: it is difficult to find a solution, but it is easy to verify that a certain object is indeed a solution.

For these problems finding a solution appears exponentially difficult, but once a candidate solution is known it can be tested in polynomial time. For example, verifying that a group of k nodes in a graph is a k-clique amounts to checking that all the nodes in the group are pairwise connected, an operation requiring O(k²) time because k(k−1)/2 is the number of all possible connections in the group.
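The contrast between finding and verifying can be made concrete with a short sketch of the verification step alone. The function name `verify_clique` and the adjacency-set representation are assumptions made for this illustration. The function also counts the pairs it examines, so that the k(k−1)/2 figure can be observed directly.

```python
from itertools import combinations


def verify_clique(adj, group):
    """Check a proposed clique; return (is_clique, pairs_examined).

    adj maps each node to the set of its neighbours (undirected graph).
    """
    members = list(group)
    if len(set(members)) != len(members):
        return False, 0  # a repeated node cannot be part of a valid answer
    examined = 0
    for u, v in combinations(members, 2):
        examined += 1
        if v not in adj.get(u, set()):
            return False, examined  # one missing connection is enough
    return True, examined


# Toy graph: nodes 1-5 are pairwise connected, node 6 only knows node 1.
adj = {v: {u for u in range(1, 6) if u != v} for v in range(1, 6)}
adj[1].add(6)
adj[6] = {1}

print(verify_clique(adj, (1, 2, 3, 4, 5)))  # examines 5 * 4 / 2 pairs
print(verify_clique(adj, (1, 2, 6)))        # rejected: 2 and 6 not connected
```

With set-based adjacency each membership test takes constant expected time, so the whole check stays within the O(k²) bound stated above.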

This ability to test a candidate solution in polynomial time is in fact common to all NP-hard problems, and may be taken as a part of their definition.

Informally these are problems that we are not able to solve in polynomial time, but for which the soundness of a proposed solution can be checked in polynomial time.

It must be observed, however, that these problems are hard to solve in the “worst case,” that is, when their data occur in a most unfavorable combination. An exponential time algorithm would solve them in all cases, but a polynomial time algorithm could work in particular situations. If we take the computers of a local area network as being the nodes of a graph, this graph obviously has a Hamiltonian cycle if the network is a ring, and there is no cycle if the network is a tree. Both these configurations can be checked for in linear time, so if one of these two cases occurs the Hamiltonian cycle problem is solved immediately.¹⁵ But an algorithm valid for all possible graphs is exponential as far as we know today. Algorithmic techniques to solve NP-hard problems in polynomial time whenever their data allow for this are still to be invented.

14 It would take too long to motivate this name. We simply recall that NP stands for Nondeterministic Polynomial, to indicate that these problems are polynomially solvable only in a non-deterministic paradigm of computation that exists only as a theoretical abstraction.

A related term is NP-complete, which denotes a class of problems where only the existence of a solution with given properties is in question (e.g., the existence of a k-clique), without actually having to report the solution itself. The two classes are subtly related; see the bibliographic notes.

15 More interestingly, if the network is an n×m two-dimensional grid where each node is connected to its four neighbors (or to its three or two neighbors for nodes on the grid edges or corners), a Hamiltonian cycle exists if and only if n and/or m is even. The reader can spend a few moments proving this statement personally, treating the grid as a black and white chessboard.
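The ring and tree shortcuts discussed above can be sketched as follows. The function names are invented for this illustration, and the characterizations used (a ring is a connected graph on at least three nodes in which every node has exactly two neighbors; a tree is a connected graph with one edge fewer than its nodes) are standard graph facts rather than definitions given in the text. Both tests need only one traversal of the graph, hence linear time.

```python
from collections import deque


def is_connected(adj):
    """Breadth-first search: True when every node is reachable."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def easy_hamiltonian_answer(adj):
    """Return True for a ring, False for a tree, None when undecided.

    None means neither shortcut applies and a general
    (exponential, as far as is known) method would be needed.
    """
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2  # number of edges
    if not is_connected(adj):
        return None
    if n >= 3 and all(len(nb) == 2 for nb in adj.values()):
        return True   # the ring itself is the Hamiltonian cycle
    if m == n - 1:
        return False  # a tree contains no cycle at all
    return None


ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
chord = {i: set(nb) for i, nb in ring.items()}
chord[0].add(3)
chord[3].add(0)

print(easy_hamiltonian_answer(ring))   # ring
print(easy_hamiltonian_answer(star))   # tree
print(easy_hamiltonian_answer(chord))  # neither shortcut applies
```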

In conclusion, we underline once again that many interesting problems are NP-hard, and the hope of finding efficient algorithms for solving them in all cases is actually very low. Hence the next question is: how can we solve these problems in practice, when the size of the input is too big to use an exponential algorithm? A wise answer is: when the optimal solution is too difficult to find, an almost optimal solution will be good enough. In some cases we can use an approximation algorithm for finding a solution not too far from optimal. In other cases we can use a randomized algorithm (see the next chapter), thus finding an exact solution with an extremely low probability of error.
