
P: the class of all decision problems that are solvable in polynomial time by a deterministic Turing machine


Now, let us go back to our choice of using pseudocodes to describe algorithms.

From the above discussion, we may assume (and, in fact, prove) that the logarithmic cost measure of a pseudocode procedure and the bit-operation complexity of an equivalent Turing machine program are within a polynomial factor. Therefore, in order to demonstrate that a problem is tractable, we can simply present the algorithm in a pseudocode procedure and perform a simple time analysis of the procedure. On the other hand, to show that a problem is intractable, we usually use Turing machines as the computational model.

1.5 NP-Complete Problems

In the study of computational complexity, an optimization problem is usually formulated into an equivalent decision problem, whose answer is either YES or NO. For instance, we can formulate the problem KNAPSACK into the following decision problem:

KNAPSACK_D: Given 2n + 2 integers: S, K, s_1, s_2, . . . , s_n, c_1, c_2, . . . , c_n, determine whether there is a sequence (x_1, x_2, . . . , x_n) ∈ {0,1}^n such that ∑_{i=1}^n s_i x_i ≤ S and ∑_{i=1}^n c_i x_i ≥ K.

It is not hard to see that KNAPSACK and KNAPSACK_D are equivalent, in the sense that they are either both tractable or both intractable.
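To make the decision version concrete, the following short Python sketch (ours, not part of the text; the name knapsack_check is made up) spells out the condition that KNAPSACK_D asks about for one fixed candidate vector (x_1, . . . , x_n); the decision problem asks whether any of the 2^n candidate vectors satisfies it.

def knapsack_check(S, K, s, c, x):
    # x is a 0/1 list of length n; s and c are the lists (s_1, ..., s_n) and (c_1, ..., c_n).
    # The condition of KNAPSACK_D: total size at most S and total value at least K.
    return (sum(si * xi for si, xi in zip(s, x)) <= S and
            sum(ci * xi for ci, xi in zip(c, x)) >= K)

# For example, with sizes (3, 4, 5), values (4, 5, 6), S = 7, and K = 9,
# the vector (1, 1, 0) satisfies the condition:
# knapsack_check(7, 9, [3, 4, 5], [4, 5, 6], [1, 1, 0]) is True.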

Proposition 1.4 The optimization problem KNAPSACK is polynomial-time solvable if and only if the decision problem KNAPSACK_D is polynomial-time solvable.

Proof. Suppose the optimization problem KNAPSACK is polynomial-time solvable. Then, we can solve the decision problem KNAPSACK_D by finding the optimal solution opt of the corresponding KNAPSACK instance and then answering YES if and only if opt ≥ K.

Conversely, suppose KNAPSACK_D is solvable in polynomial time by a Turing machine M. Assume that M runs in time O(N^k), where N is the input size and k is a constant. Now, on input I = (S, s_1, . . . , s_n, c_1, . . . , c_n) to the problem KNAPSACK, we can binary search for the maximum K such that M answers YES on input (S, K, s_1, . . . , s_n, c_1, . . . , c_n). This maximum value K is exactly the optimal solution opt for input I of the problem KNAPSACK. Note that K satisfies K ≤ M_2 = ∑_{i=1}^n c_i. Thus, the above binary search needs to simulate M at most log M_2 + 1 = O(N) times, where N is the size of input I. So, we can solve KNAPSACK in time O(N^{k+1}).
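The converse direction of this proof can be pictured as the following Python sketch (ours; knapsack_d stands for any black-box decision procedure for KNAPSACK_D, such as the hypothetical polynomial-time machine M): a binary search over K that calls the decision procedure O(log M_2) = O(N) times.

def knapsack_opt(S, s, c, knapsack_d):
    # Returns opt = max sum(c_i * x_i) over x in {0,1}^n with sum(s_i * x_i) <= S,
    # using only the YES/NO answers of the decision procedure knapsack_d(S, K, s, c).
    lo, hi = 0, sum(c)                 # opt lies between 0 and M_2 = c_1 + ... + c_n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if knapsack_d(S, mid, s, c):   # some feasible x achieves value >= mid
            lo = mid
        else:                          # no feasible x achieves value >= mid
            hi = mid - 1
    return lo

If knapsack_d runs in time O(N^k), the whole search runs in time O(N^{k+1}), exactly as in the proof.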

From the discussion of the last section, in order to prove a problem intractable, we need to show that (the decision version of) the problem is not in P. Unfortunately, for a great number of optimization problems, there is strong evidence, both empirical and mathematical, suggesting that they are likely intractable, but no one is able to find a formal proof that they are not in P. Most of these problems, however, share a common property called NP-completeness. That is, they can be solved by nondeterministic algorithms in polynomial time and, furthermore, if any of these problems is proved to be not in P, then all of these problems are not in P.

A nondeterministic algorithm is an algorithm that can make nondeterministic moves. In a nondeterministic move, the algorithm can assign a value of either 0 or 1 to a variable nondeterministically, so that the computation of the algorithm after this step branches into two separate computation paths, each using a different value for the variable. Suppose a nondeterministic algorithm executes nondeterministic moves k times. Then it may generate 2^k different deterministic computation paths, some of which may output YES and some of which may output NO. We say the nondeterministic algorithm accepts the input (i.e., answers YES) if at least one of the computation paths outputs YES; and the nondeterministic algorithm rejects the input if all computation paths output NO. (Thus, the actions of accepting and rejecting an input by a nondeterministic algorithm A are not symmetric: If we change each answer YES of a computation path to answer NO, and each NO to YES, the collective answer of A does not necessarily change from accepting to rejecting.) On each input x accepted by a nondeterministic algorithm A, the running time of A on x is the length of the shortest computation path on x that outputs YES. The time complexity of algorithm A is defined as the function

t_A(n) = the maximum running time of A on any input x of length n that is accepted by the algorithm A.

For instance, the following is a nondeterministic algorithm for KNAPSACK (more precisely, for the decision problem KNAPSACK_D):

Algorithm 1.E (Nondeterministic Algorithm for KNAPSACK_D)

Input: Positive integers S, s_1, s_2, . . . , s_n, c_1, c_2, . . . , c_n, and an integer K > 0.

(1) For i ← 1 to n do
        nondeterministically select a value 0 or 1 for x_i.
(2) If ∑_{i=1}^n x_i s_i ≤ S and ∑_{i=1}^n x_i c_i ≥ K then output YES;
    else output NO.

It is clear that the above algorithm works correctly. Indeed, it contains 2^n different computation paths, each corresponding to one choice of (x_1, x_2, . . . , x_n) ∈ {0,1}^n. If one choice of (x_1, x_2, . . . , x_n) satisfies the condition of step (2), then the algorithm accepts the input instance; otherwise, it rejects. In addition, we note that in this algorithm, all computation paths have the same running time, O(n). Thus, this is a linear-time nondeterministic algorithm.
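Since we have no physical device for nondeterministic moves, the obvious way to run Algorithm 1.E on a real machine is to enumerate all 2^n computation paths. The Python sketch below (ours; it reuses the knapsack_check sketch given earlier) does exactly this and accepts if and only if at least one path outputs YES. It is, of course, an exponential-time simulation.

from itertools import product

def simulate_algorithm_1E(S, K, s, c):
    # Each tuple x in {0,1}^n is one computation path of Algorithm 1.E;
    # the nondeterministic algorithm accepts iff some path outputs YES.
    n = len(s)
    return any(knapsack_check(S, K, s, c, x) for x in product((0, 1), repeat=n))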

The nondeterministic Turing machine is the formalism of nondeterministic algorithms. Corresponding to the deterministic complexity class P is the following nondeterministic complexity class:

NP: the class of all decision problems that are computable by a nondeterministic Turing machine in polynomial time.

We note that in a single path of a polynomial-time nondeterministic algorithm, there can be at most a polynomial number of nondeterministic moves. It is not hard to see

that we can always move the nondeterministic moves to the beginning of the algorithm without changing its behavior. Thus, all polynomial-time nondeterministic algorithms M_N have the following common form:

Assume that the input x has n bits.

(1) Nondeterministically select a string y = y_1 y_2 · · · y_{p(n)} ∈ {0,1}^{p(n)}, where p is a polynomial function.

(2) Run a polynomial-time deterministic algorithm M_D on input (x, y).

Suppose M_D answers YES on input (x, y); then we say y is a witness of the instance x. Thus, a problem Π is in NP if there is a two-step algorithm for Π in which the first step nondeterministically selects a potential witness y of polynomial size, and the second step deterministically verifies that y is indeed a witness. We call such an algorithm a guess-and-verify algorithm.
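In the same spirit, the guess-and-verify form itself can be written down as a short (and again exponential-time) deterministic sketch, with the verifier and the polynomial p left as parameters; the function name and signature are ours.

from itertools import product

def guess_and_verify(x, verify, p):
    # Accept x iff some witness y in {0,1}^{p(|x|)} makes the deterministic
    # polynomial-time verifier accept the pair (x, y).
    return any(verify(x, y) for y in product((0, 1), repeat=p(len(x))))

Algorithm 1.E is the special case in which the witness y is the vector (x_1, . . . , x_n) and verify is the check performed in its step (2).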

As another example, let us show that the problem SAT is in NP.

Algorithm 1.F (Nondeterministic Algorithm for SAT)

Input: A Boolean formula φ over Boolean variables v_1, v_2, . . . , v_n.

(1) Guess n Boolean values b_1, b_2, . . . , b_n.
(2) Verify (deterministically) that the formula φ is TRUE under the assignment τ(v_i) = b_i, for i = 1, . . . , n. If so, output YES; otherwise, output NO.

The correctness of the above algorithm is obvious. To show that SAT is in NP, we only need to check that the verification of whether a Boolean formula containing no variables is TRUE can be done in deterministic polynomial time.
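The verification step of Algorithm 1.F is just the evaluation of a formula under a fixed assignment. The sketch below (ours) does this for formulas given in CNF, with each clause written as a list of signed variable indices (+i for v_i, −i for v̄_i); an arbitrary Boolean formula would be handled the same way by walking its parse tree.

def eval_cnf(clauses, b):
    # b[i-1] is the Boolean value assigned to v_i; a clause is satisfied
    # if at least one of its literals evaluates to TRUE.
    return all(any(b[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses)

# Example: (v1 + v2)(v̄1 + v2) under v1 = False, v2 = True:
# eval_cnf([[1, 2], [-1, 2]], [False, True]) is True.

Combined with an enumeration of all 2^n assignments b, this gives the brute-force deterministic simulation of Algorithm 1.F.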

We have seen that problems in NP, such as KNAPSACK and SAT, have simple polynomial-time nondeterministic algorithms. However, we do not know of any physical devices to implement the nondeterministic moves in the algorithms. So, what is the exact relationship between P and NP? This is one of the most important open questions in computational complexity theory. On the one hand, we do not know how to find efficient deterministic algorithms to simulate a nondeterministic algorithm. A straightforward simulation by a deterministic algorithm that runs the verification step over all possible guesses would take an exponential amount of time. On the other hand, though many people believe that some problems in NP have no polynomial-time deterministic algorithm, no one has yet found a formal proof of that.

Without a proof for P ≠ NP, how do we demonstrate that a problem in NP is likely to be intractable? The notion of NP-completeness comes to help.

For convenience, we write in the following x ∈ A to denote that the answer to the input x for the decision problem A is YES (that is, we identify the decision problem with the set of all input instances which have the answer YES). We say a decision problem A is polynomial-time reducible to a decision problem B, denoted by A ≤Pm B, if there is a polynomial-time computable function f from instances of A to instances of B (called the reduction function from A to B) such that x ∈ A if and only if f(x) ∈ B. Intuitively, a reduction function f reduces the membership problem of whether x ∈ A to the membership problem of whether f(x) ∈ B. Thus, if there is a polynomial-time algorithm to solve problem B, we can combine the function f with this algorithm to solve problem A.
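This combination is literally a two-line program. The sketch below (ours) decides A by first applying the reduction function f and then calling any decision procedure for B, which is the content of Proposition 1.5(a) below.

def solve_A(x, f, solve_B):
    # f: polynomial-time reduction function from A to B; solve_B: decision procedure for B.
    # Correctness: x in A  <=>  f(x) in B, so the answer for f(x) is the answer for x.
    return solve_B(f(x))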

Proposition 1.5 (a) If A ≤Pm B and B ∈ P, then A ∈ P.

(b) If A ≤Pm B and B ≤Pm C, then A ≤Pm C.

The above two properties justify the use of the notation ≤Pm between decision problems: It is a partial ordering for the hardness of the problems (modulo polynomial-time computability).

We can now define the term NP-completeness: We say a decision problem A is NP-hard if, for any B ∈ NP, B ≤Pm A. We say A is NP-complete if A is NP-hard and, in addition, A ∈ NP. That is, an NP-complete problem A is one of the hardest problems in NP with respect to the reduction ≤Pm. For an optimization problem A, we also say A is NP-hard (or NP-complete) if its (polynomial-time equivalent) decision version A_D is NP-hard (or, respectively, NP-complete).

It follows immediately from Proposition 1.5 that if an NP-complete problem is in P, then P = NP. Thus, in view of our inability to solve the P vs. NP question, the next best way to prove a problem intractable is to show that it is NP-complete (and so it is most likely not in P, unless P = NP).

Among all problems, SAT was the first problem proved NP-complete. It was proved by Cook [1971], who showed that for any polynomial-time nondeterministic Turing machine M, its computation on any input x can be encoded by a Boolean formula φ_x of polynomially bounded length such that the formula φ_x is satisfiable if and only if M accepts x. This proof is called a generic reduction, since it works directly with the computation of a nondeterministic Turing machine. In general, a generic reduction is not required to prove a new problem A to be NP-complete. Instead, by Proposition 1.5(b), we can use any problem B that is already known to be NP-complete and only need to prove that B ≤Pm A. For instance, we can prove that KNAPSACK_D is NP-complete by reducing the problem SAT to it.

Theorem 1.6 KNAPSACK_D is NP-complete.

Proof. We have already seen that KNAPSACK_D is in NP. We now prove that KNAPSACK_D is complete for NP. In order to do this, we introduce a subproblem 3-SAT of SAT. In a Boolean formula, a variable or the negation of a variable is called a literal. An elementary sum of literals is called a clause. A Boolean formula is in 3-CNF (conjunctive normal form) if it is a product of a finite number of clauses, each being the sum of exactly three literals. For instance, the following is a 3-CNF formula:

(v_1 + v̄_2 + v̄_3)(v̄_1 + v_3 + v_4)(v_2 + v̄_3 + v̄_4).

The problem 3-SAT asks whether a given 3-CNF Boolean formula is satisfiable.

This problem is a restrictive form of the problem SAT, but it is also known to be NP-complete. Indeed, there is a simple way of transforming a Boolean formula φ into a new 3-CNF formula ψ such that φ is satisfiable if and only if ψ is satisfiable.

We omit the proof and refer the reader to textbooks on complexity theory. In the following, we present a proof for 3-SAT ≤Pm KNAPSACK_D.

Let φ be a 3-CNF formula that is of the form C_1 C_2 · · · C_m, where each C_j is a clause with three literals. Assume that φ contains Boolean variables v_1, v_2, . . . , v_n. We are going to define a list of 2n + 2m integers c_1, c_2, . . . , c_{2n+2m}, plus an integer K. All of the integers c_i and the integer K have values between 0 and 10^{n+m}. These integers will satisfy the following property:

φ is satisfiable ⇐⇒ there is a sequence (x_1, x_2, . . . , x_{2n+2m}) ∈ {0,1}^{2n+2m} such that ∑_{i=1}^{2n+2m} c_i x_i = K,   (1.4)

that is, the instance of the problem KNAPSACK_D with S = K and s_i = c_i for all i has the answer YES. Therefore, this construction is a reduction function for 3-SAT ≤Pm KNAPSACK_D.

We now describe the construction of these integers and prove that they satisfy property (1.4). First, we note that each integer is between 0 and 10^{n+m}, and so it has a unique decimal representation of exactly n + m digits (with possible leading zeroes). We will define each integer digit by digit, with the kth digit indicating the kth most significant digit. First, we define the first n digits of K to be 1 and the last m digits to be 3. Next, for each i = 1, 2, . . . , n, the integers c_i and c_{n+i} both have their ith digit equal to 1; in addition, for each j = 1, 2, . . . , m, the (n+j)th digit of c_i is 1 if the clause C_j contains the literal v̄_i, and the (n+j)th digit of c_{n+i} is 1 if C_j contains the literal v_i (all other digits are 0). Finally, for each j = 1, 2, . . . , m, the two integers c_{2n+2j−1} and c_{2n+2j} have their (n+j)th digit equal to 1 and all other digits equal to 0. This completes the definition of the integers.

Now, we need to show that these integers satisfy property (1.4). First, we observe that for any k, 1 ≤ k ≤ n + m, there are at most five integers among the c_t's whose kth digit is nonzero, and each nonzero digit must be 1. Therefore, no carry occurs when we add up any subsequence of these integers. In particular, to get the sum K, we must choose, for each i = 1, 2, . . . , n, exactly one integer among the c_t's whose ith digit is 1, that is, exactly one of c_i and c_{n+i}.

First, assume that φ is satisfiable, and let τ be a Boolean assignment that satisfies φ. Define a sequence (x_1, x_2, . . . , x_{2n+2m}) ∈ {0,1}^{2n+2m} as follows:

(1) For each i = 1, 2, . . . , n, let x_{n+i} = τ(v_i) and x_i = 1 − x_{n+i}.

(2) For each j = 1, 2, . . . , m, define x_{2n+2j−1} and x_{2n+2j} as follows: If τ satisfies all three literals of C_j, then x_{2n+2j−1} = x_{2n+2j} = 0; if τ satisfies exactly two literals of C_j, then x_{2n+2j−1} = 1 and x_{2n+2j} = 0; and if τ satisfies exactly one literal of C_j, then x_{2n+2j−1} = x_{2n+2j} = 1.

Then it is easy to verify that ∑_{i=1}^{2n+2m} c_i x_i = K: in each of the first n digits, exactly one of the chosen integers contributes a 1, and in the (n+j)th digit, the chosen integers corresponding to the literals of C_j together with x_{2n+2j−1} and x_{2n+2j} contribute exactly 3.

Next, assume that there exists a sequence (x_1, x_2, . . . , x_{2n+2m}) ∈ {0,1}^{2n+2m} such that ∑_{i=1}^{2n+2m} c_i x_i = K. Then, from our earlier observation, we see that for each i = 1, 2, . . . , n, exactly one of x_i and x_{n+i} has value 1. Define τ(v_i) = x_{n+i}. We claim that τ satisfies each clause C_j, 1 ≤ j ≤ m. Since the (n+j)th digit of the sum ∑_{i=1}^{2n+2m} c_i x_i is equal to 3, and since there are at most two integers among the last 2m integers whose (n+j)th digit is 1, there must be an integer k ≤ 2n such that x_k = 1 and the (n+j)th digit of c_k is 1. Suppose 1 ≤ k ≤ n; then it means that τ(v_k) = 0 and C_j contains the literal v̄_k. Thus, τ satisfies C_j. On the other hand, if n + 1 ≤ k ≤ 2n, then we know that τ(v_{k−n}) = 1 and C_j contains the literal v_{k−n}; and so τ also satisfies C_j. This completes the proof of property (1.4).

Finally, we remark that the above construction of these integers from the formula φ is apparently polynomial-time computable. Thus, this reduction is a polynomial-time reduction.
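The construction in this proof is mechanical enough to write down directly. The Python sketch below (ours; the clause encoding by signed indices and the function name are our conventions) builds the integers c_1, . . . , c_{2n+2m} and K digit by digit as described above, returning a KNAPSACK_D instance with S = K and s_i = c_i.

def three_sat_to_knapsack(clauses, n):
    # clauses: list of m clauses, each a list of three signed indices in 1..n
    # (+i stands for the literal v_i, -i for its negation).
    m = len(clauses)
    width = n + m                                   # every number has n+m decimal digits

    def number(digit_values):                       # digit 1 is the most significant
        return sum(v * 10 ** (width - k) for k, v in digit_values.items())

    c = []
    # c_1..c_n encode the literals v̄_i, and c_{n+1}..c_{2n} the literals v_i:
    # the ith digit is 1, and the (n+j)th digit is 1 iff clause C_j contains the literal.
    for sign in (-1, +1):
        for i in range(1, n + 1):
            digit_values = {i: 1}
            for j, clause in enumerate(clauses, start=1):
                if sign * i in clause:
                    digit_values[n + j] = 1
            c.append(number(digit_values))
    # Two slack numbers per clause, each having a single 1 in the (n+j)th digit.
    for j in range(1, m + 1):
        c.extend([number({n + j: 1}), number({n + j: 1})])

    k_digits = {i: 1 for i in range(1, n + 1)}      # the first n digits of K are 1 ...
    k_digits.update({n + j: 3 for j in range(1, m + 1)})   # ... and the last m digits are 3
    K = number(k_digits)
    return K, K, list(c), c                         # instance (S, K, s_1, ..., c_1, ...) with S = K, s = c

For the 3-CNF formula displayed earlier in the proof, three_sat_to_knapsack([[1, -2, -3], [-1, 3, 4], [2, -3, -4]], 4) yields an instance on which the brute-force simulate_algorithm_1E sketch given earlier answers True, as that formula is satisfiable.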

In addition to the above two problems, thousands of problems from many seemingly unrelated areas have been proven to be NP-complete in the past four decades. These results demonstrate the importance and universality of the concept of NP-completeness. In the following, we list a few problems that are frequently used to prove that a new problem is NP-complete.

VERTEX COVER (VC): Given an undirected graph G = (V, E) and a positive integer K, determine whether there is a set C ⊆ V of size ≤ K such that, for every edge {u, v} ∈ E, C ∩ {u, v} ≠ ∅. (Such a set C is called a vertex cover of G.)

HAMILTONIAN CIRCUIT (HC): Given an undirected graph G = (V, E), determine whether there is a simple cycle that passes through each vertex exactly once. (Such a cycle is called a Hamiltonian circuit.)

PARTITION: Given n positive integers a_1, a_2, . . . , a_n, determine whether there is a partition of these integers into two parts that have equal sums. (This is a subproblem of KNAPSACK.)

SET COVER (SC): Given a family C of subsets of I = {1, 2, . . . , n} and a positive integer K, determine whether there is a subfamily C′ of C consisting of at most K subsets such that ⋃_{A∈C′} A = I.

For instance, from the problem HC, we can easily prove that (the decision versions of) the following optimization problems are also NP-complete. We leave their proofs as exercises.