• Aucun résultat trouvé

Exponentially hard problems are sometimes polynomial, a large deviation analysis of search algorithms for the random satisfiability problem, and its application to stop-and-restart resolutions

N/A
N/A
Protected

Academic year: 2022

Partager "Exponentially hard problems are sometimes polynomial, a large deviation analysis of search algorithms for the random satisfiability problem, and its application to stop-and-restart resolutions"

Copied!
4
0
0

Texte intégral

(1)

Exponentially hard problems are sometimes polynomial, a large deviation analysis of search algorithms for the random satisfiability problem, and its application to stop-and-restart resolutions

Simona Cocco1and Re´mi Monasson2

1CNRS–Laboratoire de Dynamique des Fluides Complexes, 3 rue de l’Universite´, 67000 Strasbourg, France

2CNRS–Laboratoire de Physique The´orique de l’ENS, 24 rue Lhomond, 75005 Paris, France 共Received 20 February 2002; published 19 September 2002兲

A large deviation analysis of the solving complexity of random 3-satisfiability instances slightly below threshold is presented. While finding a solution for such instances demands an exponential effort with high probability, we show that an exponentially small fraction of resolutions require a computation scaling linearly in the size of the instance only. This exponentially small probability of easy resolutions is analytically calcu- lated, and the corresponding exponent is shown to be smaller共in absolute value兲than the growth exponent of the typical resolution time. Our study therefore gives some theoretical basis to heuristic stop-and-restart solving procedures, and suggests a natural cutoff共the size of the instance兲for the restart.

DOI: 10.1103/PhysRevE.66.037101 PACS number共s兲: 89.20.Ff, 05.20.⫺y, 89.70.⫹c, 89.75.Hc

Computational problems are usually divided into two classes. They are easy if there exists a solving procedure whose running time grows at most polynomially with the size of the problem, or hard if no such algorithm is believed to exist, and the best avalaible procedures may require an exponentially growing time关1兴. The polynomial vs exponen- tial classification was enriched in the past decades through the derivation of quantitative bounds on resolution complex- ity, and the study of average performances of various reso- lution algorithms for computational tasks with model input distributions.

In this paper, we show that, though the polynomial/

exponential dichotomy certainly applies to the typical reso- lution complexity of computational problem, it may not be so for large deviations from typical behavior. Typically ex- ponentially hard problems may sometimes be solved in poly- nomial time, a phenomenon that we take advantage of to accelerate resolution drastically.

We concentrate here on random 3-Satisfiability共3-SAT兲, a paradigm of hard combinatorial problems recently studied using statistical physics tools and concepts 关2,3兴 as e.g., number partitioning关4兴, vertex cover关5兴, etc. to which com- puter scientists have devoted a great attention over the past years关6 – 8兴. An instance of random 3-SAT is defined by a set of M constraints 共clauses兲 on N boolean共i.e., true or false兲 variables. Each clause is the logical OR of three randomly chosen variables, or of their negations. The question is to decide whether there exists a logical assignment of the vari- ables satisfying all the clauses 共called solution兲. The best currently known algorithm to solve 3-SAT is the Davis- Putnam-Loveland-Logemann共DPLL兲procedure关6兴 共Fig. 1兲. The sequence of assignments of variables made by DPLL in the course of instance solving can be represented as a search tree共Fig. 2兲, whose size Q共number of nodes兲is a convenient measure of the complexity. For very large sizes ( M ,N→⬁at fixed ratio␣⫽M /N), some static and dynamical phase tran- sitions arise 关2,7–10兴. Instances with a ratio of clauses per variable␣⬎␣C⯝4.3 are almost surely unsatisfiable and ob- taining proofs of refutation require an exponential effort 关3,11兴. Below the static threshold ␣C, instances are almost

surely satisfiable, but finding a solution may be easy or hard, depending on the value of ␣. A dynamical transition关3,12兴 takes place at ␣L⯝3.003 共for the heuristic used by DPLL shown in Fig. 1兲 separating a polynomial regime (␣

⬍␣L:QN, search tree A on Fig. 2兲 关13,14兴from an expo- nential regime (␣⬎␣L:Q⬃2N, search tree B). This pat- tern of complexity, and the value of ␻(␣) were obtained through an analysis of DPLL dynamics, reminiscent of real- space renormalization in statistical physics关3兴. DPLL gener- ates some dynamical flow of the instance, whose trajectory lies in the phase diagram of the 2⫹p-SAT model 关2兴, an extension of 3-SAT, where p⭐1 is the fraction of 3-clauses 共Fig. 2兲.

FIG. 1. DPLL algorithm. When a variable has been chosen at step共1兲, e.g., xT, at step共2兲some clauses are satisfied, e.g., C

(x OR y OR zand eliminated, others are reduced, e.g., C

(not xORyORz)→C(y ORz). If some clauses include one variable only, e.g., Cy , the corresponding variable is automati- cally fixed to satisfy the clause (yT). This unit-clause 共UC兲 propagation 共2c兲 is repeated up to the exhaustion of all UC. Con- tradictions result from the presence of two opposite UC, e.g., C

(y ),C⬘⫽(not y ). A solution is found when no clauses are left.

The heuristic studied here is the generalized UC 共GUC兲 rule: a variable is chosen at step共1兲 from one of the 2-clauses, or from a 3-clause if no 2-clause is present, and fixed to satisfy the clause.

The search process of DPLL is represented by a tree共Fig. 2兲whose nodes correspond to 共1兲, and edges to共2兲. Branch extremities are marked with contradictions C共2b兲, or by a solution S共2a兲. PHYSICAL REVIEW E 66, 037101 共2002兲

1063-651X/2002/66共3兲/037101共4兲/$20.00 66 037101-1 ©2002 The American Physical Society

(2)

We focus hereafter on the large deviations of complexity in the upper SAT phase ␣L⬍␣⬍␣C. Using numerical ex- periments and analytical calculations, we show that, though complexity Q almost surely grows as 2N, there is a finite, but exponentially small, probability 2Nthat Q is bounded from above by N only. In other words, finding solutions to these SAT instances is almost always exponentially hard, and very rarely easy共polynomial time兲. Taking advantage of the fact that ␨ is smaller than␻, we show how systematic re- starts of the heuristic may decrease substantially the overall search cost. Our study therefore gives some theoretical basis to stop-and-restart 共SR兲 solving procedures empirically known to be efficient关15兴, and suggests a natural cutoff for the stop.

Distributions of resolution times Q for ␣⫽3.5 are re- ported in Fig. 3. The histogram of␻⫽(log2Q)/N essentially exhibits a narrow peak共left side兲followed by a wider bump 共right side兲. As N grows, the right peak acquires more and more weight, while the left peak progressively disappears.

The center of the right peak gets slightly shifted to the left, but reaches a finite value␻*0.035 as N→⬁ 关3兴. This right peak thus corresponds to the core of exponentially hard reso- lutions: resolutions of instances almost surely require a time scaling similar to 2N*as the size N gets large, in agreement with the above discussion.

On the contrary, the abscissa of the maximum of the left peak vanishes as log2N/N when the size N increases, indicat- ing that the left peak accounts for polynomial共linear兲 reso- lutions. Its maximum is located at Q/N⯝0.2⫺0.25, with weak dependence on N. The cumulative probability Plin to have a complexity Q less than, or equal to N, decreases ex- ponentially: Plin⫽2Nwith␨⯝0.011⫾0.001共Inset of Fig.

4兲. In the following, we will concentrate on linear resolutions only 共an analysis of the distribution of exponential resolu- tions for the problem of the vertex covering of random graphs关5兴can be found in关16兴兲.

FIG. 2. Phase diagram of 2⫹p-SAT and first branch trajectories for satisfiable instances. The threshold line␣C( p) 共bold full line兲 separates SAT共lower part of the plane兲 from unSAT共upper part兲 phases. Departure points for DPLL trajectories are located on the 3-SAT vertical axis共empty circles兲. Arrows indicate the direction of

‘‘motion’’ along trajectories 共dashed curves兲 parametrized by the fraction t of variables set by DPLL. For small ratios ␣⬍␣L

(⯝3.003 for the GUC heuristic兲, branch trajectories remain con- fined in the SAT phase, end in S of coordinates (1,0), where a solution is found 共with a search process reported on tree A). For ratios ␣L⬍␣⬍␣C, the branch trajectory intersects the threshold line at some point G. A contradiction almost surely arises before the trajectory crosses the dotted curve␣⫽1/(1⫺p)point D), and ex- tensive backtracking up to G permits to find a solution共search tree B). With exponentially small probability, the trajectory共dot-dashed curve, full arrow兲 is able to cross the ‘‘dangerous’’ region where contradictions are likely to occur共search tree similar to A); it then exits from this region共point D⬘) and ends up with a solution共low- est dashed trajectory兲.

FIG. 3. Histograms of the logarithm␻ of the complexity Qbase 2, and divided by N) for

3.5 and different sizes N. Many instances are drawn randomly, and for each sample, DPLL is run until a solution is found共very few unsatisfi- able instances can be present and are discarded兲. Because of the large N limit this instance-to- instance distribution is equivalent to the run-to- run distribution on the same instance, coming from the randomness in the assignments of vari- ables by DPLL; the latter is indeed almost surely independent of the particular instances. See Ref.

关7兴 and the inset of Fig. 4 for a proof of this equivalence.

BRIEF REPORTS PHYSICAL REVIEW E 66, 037101 共2002兲

037101-2

(3)

Further numerical investigations show that, in easy reso- lutions, the solution is essentially found at the end of the first branch, with a search tree of type A, and not B, in Fig. 2.

Easy resolution trajectories are able to cross the ‘‘dangerous’’

region extending beyond point D in Fig. 2, contrary to most trajectories which backtrack earlier. Beyond D, unit clauses 共UC兲 indeed accumulate. Their number C1 becomes of the order of N (C1/N⯝0.022 for ␣⫽3.5), and the probability that the branch survives, i.e., that no two contradictory UC are present, is exponentially small in N, in agreement with the scaling of the left peak weight in Fig. 3.

Calculation of␨ requires the analysis of the first descent in the search tree 共Fig. 2兲. Each time DPLL assigns a vari- able, some clauses are eliminated and others are reduced or left unchanged共Fig. 1兲. We thus characterize an instance by its state C(C1,C2,C3), where Cj is the number of j clauses it includes ( j1,2,3). Initially, C⫽(0,0,␣0N). Let us call P˜ (C;T) the probability that the assignment of T vari- ables has produced no contradiction and an instance with state C. P˜ obeys a Markovian evolution P˜ (C;T⫹1)

⫽兺CK(C,C

;T) P˜ (C

;T) where the entries of the transi- tion matrix K read

KC,C

;T兲⫽Bp

3 C3,3w

30

3

B1/23,w3z

20 C2⫺v

Bp

2 C2v,z2

w

20 z2

B1/2z2,w2

z

10 C11v

1 2z1Bp

1 C11v,z1

⫻␦z2⫺⌬2w3⫹vz1⫺⌬1w21⫺v 共1兲

where␦C denotes the Kronecker delta function:␦C1 if C

⫽0; 0 otherwise. Variables appearing in Eq. 共1兲 are as fol- lows. ⌬jC

jCj, v⬅C

1, zj (wj) is the number of j clauses which are satisfied 共reduced to j⫺1 clauses兲 when the (T⫹1)th variable is assigned. These are stochastic vari- ables drawn from several binomial distributions BpL,K

⬅(KL) pK(1⫺p)LK. Parameter pjj /(NT) equals the probability that a j clause contains the variable just assigned by DPLL.

The introduction of the generating function P(y;T)

⫽兺CeyCP˜ (C,T), allows us to express the evolution equa- tion for the state probabilities in a compact manner,

Py;T⫹1兲⫽eg1(y)Pgy;T…⫹共eg2(y)eg1(y)

P„⫺⬁, g2y, g3y;T…, 共2兲 where gj(y)yj⫹ln关1⫹␥j(y)/N兴, ␥j(y)⬅␥j(yj, yj1)

jeyj(1⫹eyi1)/2⫺1兴/(1⫺t) for j1,2,3 (y0⬅⫺⬁).

From Eq.共1兲, the Cjs undergo O(1) changes each time a variable is fixed. After TtN assignments, the densities cj

Cj/N of clauses have been modified by O(1). This trans- lates into large N ansatz for the state probability, P˜ (C;T)

eN(c;t), and for the generating function, P(y;T)

eN(y;t), up to non exponential in N terms.and˜ are simply related to each other through a Legendre transform.

In particular, ␸(0;t) is the logarithm of the probability共di- vided by N) that the first branch has not been hit by any contradiction after a fraction t of variables have been as- signed. The most probable values of the densities cj(t) of j clauses are equal to the partial derivatives of ␸ in y0.

When DPLL starts running on a 3-SAT instance, clauses are reduced and some UC generated. Next they are elimi- nated through UC propagation, and splits occur frequently 共Fig. 1兲. The number C1 of UC remains bounded with re- spect to the instance size N, and the density c1(t)⫽⳵␸/y1

identically vanishes. ␸ does not depend on y1, and

(y2,y3;t) obeys the following partial differential equation 共PDE兲

⳵␸

t ⫽⫺y22y2, y2;t兲⳵␸

y2

⫹␥3y2, y3;t兲⳵␸

y3

. 共3兲 We have solved analytically PDE 共3兲 with initial condition

(y;0)⫽␣0y3. The high probability scenario is obtained for y2y3⫽0: ␸(0,0;t)⫽0 indicates that the probability of sur- vival of the branch is not exponentially small in N 关14兴, and the partial derivatives c2(t),c3(t) give the typical densities of 2- and 3-clauses, in full agreement with Chao and Fran- co’s result 关13兴. We plot in Fig. 2 the corresponding resolu- tion trajectories for various initial ratios␣0, using the change of variables pc3/(c2c3), ␣⫽(c2c3)/(1⫺t). Further- more, our calculation provides a complete description of rare deviations of the resolution trajectory from its highly prob- able locus, giving access to the exponentially small prob- abilities that p,␣ differ from their most probable values at

‘‘time’’ t.

FIG. 4. Log of complexity using DPLL (␻: simulations, circles;

theory from 关3兴, dotted line兲 and SR (␨: simulations, squares;

theory, full line兲as a function of ratio ␣. Inset: minus log of the cumulative probability Plin of complexities QN as a function of the size for 100⭐N⭐400共full line兲; log of the number of restarts Nrest necessary to find a solution for 100⭐N⭐1000共dotted line兲 for␣⫽3.5. Slopes are␨⫽0.0011 and¯␨⫽0.00115, respectively.

BRIEF REPORTS PHYSICAL REVIEW E 66, 037101 共2002兲

037101-3

(4)

The assumption C1O(1) breaks down for the most probable trajectory at some fraction tD, e.g., tD⯝0.308 for

03.5 at which the trajectory hits point D on Fig. 2. Be- yond D, UC accumulate, and the probability of survival of the first branch becomes exponentially small in N. Variables are almost always assigned through unit propagation: c1

⬎0. ␸ now depends on y1 and, from Eq. 共1兲, obeys the following PDE:

⳵␸

t ⫽⫺y1j

31 jy;t⳵␸yj. 共4兲 We have solved PDE共4兲through an expansion of␸ in pow- ers of y, whose coefficients obey, from Eq. 共4兲, a set of coupled linear ordinary differential equations 共ODEs兲. The initial conditions for the ODEs are chosen to match the ex- pansion of the exact solution of Eq. 共3兲, that is, the typical trajectory and its large deviations, at time tD. The quality of the approximation improves rapidly with the order k of the expansion, and no difference was found between k⫽3 and k4 results. c1first increases, reaches its top value (c1)max, then decreases and vanishes at tDwhen the trajectory comes out from the dangerous region where contradictions almost surely occur 共Fig. 2兲. The probability of survival scales as 2N for large N, with ␨⫽⫺␸(0;tD⬘)/ln 2. The calculated values of ␨⯝0.01,(c1)max0.022 and Q/N⯝0.21 for ␣

⫽3.5 are in very good agreement with numerics. Figure 4 shows the agreement between theory and simulations over the whole range␣L⬍␣⬍␣C.

The existence of rare but easy resolutions suggests the use of a systematic SR procedure to speed up resolution: if a solution is not found before N splits, DPLL is stopped and rerun after some random permutations of the variables and clauses. The expected number Nrest of restarts necessary to find a solution being equal to the inverse probability 1/Plin of linear resolutions, the resulting complexity should scale as

N20.011Nfor␣⫽3.5, with an exponential gain with respect to DPLL one-run complexity, 20.035N. Results of SR experi- ments are reported in Fig. 4. The typical number Nrest

⫽2N¯of restarts grows indeed exponentially with the size N, with a rate¯␨⫽0.0115⫾0.001 equal to ␨. The equality be- tween␨ and¯ confirms the equivalence between sample-to-␨ sample共Fig. 3兲and run-to-run共at fixed sample兲distributions of complexities for large sizes 关8兴.

Performances are greatly enhanced by the use of SR 共see Fig. 4 for comparison between ␨ and␻). While with usual DPLL, we were able to solve instances with 500 variables in about one day of CPU for␣⫽3.5, instances with 1000 vari- ables were solved with SR in 15 minutes on the same com- puter.

Our work therefore provides some theoretical support to the use of SR 关15,16兴, and in addition suggests a natural cutoff at which the search is halted and restarted, the deter- mination of which is usually widely empirical and problem dependent. If a combinatorial problem is efficiently solved 共polynomial time兲 by a search heuristic for some values of the control parameter of the input distribution, there might be an exponentially small probability that the heuristic is still successful 共in polynomial time兲 in the range of parameters where resolution almost surely requires massive backtrack- ing and exponential effort. When the decay rate of the poly- nomial time resolution probability ␨ is smaller than the growth rate␻ of the typical exponential resolution time, SR with a cutoff in the search equal to a polynomial of the instance size will lead to an exponential speed up of resolu- tions.

We thank A. Montanari and R. Zecchina for discussions and communication of their results prior to publication, and the French Ministry of Research for partial financial support through the ACI Jeunes Chercheurs ‘‘Algorithmes d’Optimisation et Syste`mes De´sordonne´s Quantiques.’’

关1兴C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimiza- tion: Algorithms and Complexity 共Prentice-Hall, Englewood Cliffs, NJ, 1982兲.

关2兴R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman, and L.

Troyansky, Nature共London兲400, 133共1999兲.

关3兴S. Cocco and R. Monasson, Phys. Rev. Lett. 86, 1654共2001兲; Eur. Phys. J. B 22, 505共2001兲.

关4兴S. Mertens, Phys. Rev. Lett. 81, 4281共1998兲; 84, 1347共2000兲. 关5兴M. Weigt and A. Hartmann, Phys. Rev. Lett. 84, 6118共2000兲;

86, 1658共2001兲.

关6兴J. Gu, P.W. Purdom, J. Franco, and B.W. Wah, in DIMACS Series on Discrete Mathematics and Theoretical Computer Sci- ence共American Mathematical Society, Providence, 1997兲, Vol.

35, pp. 19–151.

关7兴D. Mitchell, B. Selman, and H. Levesque, in Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI- 92)共AAAI Press, Cambridge, MA, 1992兲, pp. 440– 446.

关8兴T. Hogg and C.P. Williams, Artif. Intel. 69, 359 共1994兲; I.P.

Gent and T. Walsh, ibid. 70, 335 共1994兲; B. Selman and S.

Kirkpatrick, ibid. 81, 273共1996兲.

关9兴B. Selman and S. Kirkpatrick, Science 264, 1297共1994兲. 关10兴E. Friedgut, J. Am. Math. Soc. 12, 1017共1999兲.

关11兴V. Chva`tal, E. Szmeredi, J. Assoc. Comput. Mach. 35, 759 共1988兲.

关12兴C. Coarfa, D.D. Dernopoulos, A. San Miguel Aguirre, D. Sub- ramanian, and M.Y. Vardi, Lect. Notes Comput. Sci. 1894, 143 共2000兲.

关13兴M.T. Chao and J. Franco, Inf. Sci.共N.Y.兲51, 289共1990兲. 关14兴A. Frieze and S. Suen, J. Algorithms 20, 312共1996兲. 关15兴O. Dubois, P. Andre´, Y. Boufkhad, and J. Carlier, in DIMACS

Series in Discrete Math and Computer Science共Ref.关6兴兲, pp.

415– 436; C.P. Gomes, B. Selman, N. Crato, and H. Kautz, J.

Automated Reasoning 24, 67共2000兲.

关16兴A. Montanari, R. Zecchina, Phys. Rev. Lett. 88, 178701 共2002兲.

BRIEF REPORTS PHYSICAL REVIEW E 66, 037101 共2002兲

037101-4

Références

Documents relatifs

As the left branch associated to each variable corresponds to the value assigned to this variable in the previous best solution, the selection rules determine the neighborhood of

Balls with a thick border belong to any minimum covering; any two circled points cannot be covered by a single ball. Note that this object and its decomposition are invariant

We measured interpretation preferences in a picture completion experiment in which participants had to complete dot pictures corresponding to their preferred interpretation of

Yet, even if supertree methods satisfy some desirable properties, the inferred supertrees often contain polytomies which actually intermix two distinct phenomena: either a lack

For example, many recent works deal with F I -modules (functors from the category of finite sets with injec- tions to abelian groups, see [1]); finitely generated F I -modules

Strictly montone functions are the easiest imaginable testbed for hillclimbers: The function f always prefers one-bits over zero-bits, so there are always short and clearly

The input files were grouped depending on their average processing time by SAT-solvers, and compared on several factors, including average clause length and percentage of Horn

On the other hand, researchers working in the area of SAT and SMT solvers observed that by combining the combinatorial search capabilities of SAT solvers with mathematical