
Automatic extraction of functional dependencies


HAL Id: hal-00397450

https://hal.archives-ouvertes.fr/hal-00397450

Submitted on 22 Jun 2009

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Automatic extraction of functional dependencies

Eric Gregoire, Richard Ostrowski, Bertrand Mazure, Lakhdar Sais

To cite this version:

Eric Gregoire, Richard Ostrowski, Bertrand Mazure, Lakhdar Sais. Automatic extraction of functional dependencies. Theory and Applications of Satisfiability Testing: 7th International Conference (SAT 2004), 2005, Vancouver, Canada. pp.122-132. hal-00397450


Éric Grégoire, Richard Ostrowski, Bertrand Mazure, and Lakhdar Saïs

CRIL CNRS, Université d'Artois

rue Jean Souvraz SP-18

F-62307 Lens Cedex, France

{gregoire,ostrowski,mazure,sais}@cril.univ-artois.fr

Abstract. In this paper, a new polynomial time technique for extracting functional dependencies in Boolean formulas is proposed. It makes an original use of the well-known Boolean constraint propagation technique (BCP) in a new preprocessing approach that extracts more hidden Boolean functions and dependent variables than previously published approaches on many classes of instances.

Keywords: SAT, Boolean function, propositional reasoning and search.

1 Introduction

Recent impressive progress in the practical resolution of hard and large SAT instances allows real-world problems that are encoded in propositional clausal normal form (CNF) to be addressed (see e.g. [11,7,18]). While there remains a strong competition about building more efficient provers dedicated to hard random k-SAT instances [6], there is also a real surge of interest in implementing powerful systems that solve difficult large real-world SAT problems. Many benchmarks have been proposed and regular competitions (e.g. [4,1,14,15]) are organized around these specific SAT instances, which are expected to encode structural knowledge, at least to some extent.

Clearly, encoding knowledge under the form of a conjunction of propositional clauses can flatten some structural knowledge that would be more apparent in more expressive propositional logic representation formalisms, and that could prove useful in the resolution step [13,8].

In this paper, a new pre-processing step is proposed in the resolution of SAT instances, that extracts and exploits some structural knowledge that is hidden in the CNF. The technique makes an original use of the well-known Boolean constraint propagation (BCP) process. Whereas BCP is traditionally used to produce implied and/or equivalent literals, in this paper it is shown how it can be extended so that it delivers a hybrid formula made of clauses together with a set of equations of the form $y = f(x_1, \dots, x_n)$ where $f$ is a standard connective operator among $\{\vee, \wedge\}$ and where $y$ and $x_i$ are Boolean variables of the initial SAT instance. These Boolean functions allow us to detect a subset of dependent variables, that can be exploited by SAT solvers.

This paper extends in a significant way the preliminary results that were published in [12] in that it describes a technique that allows more dependent variables and hidden functional dependencies to be detected in several classes of instances. We shall see that the set of functional dependencies can underlie cycles. Unfortunately, highlighting actual dependent variables taking part in these cycles can be time-consuming since it coincides with the problem of finding a minimal cycle cutset of variables in a graph, which is a well-known NP-hard problem. Accordingly, efficient heuristics are explored to cut these cycles and deliver the so-called dependent variables.

The paper is organized as follows. After some preliminary definitions, Boolean gates and their properties are presented. It is then shown how more functional dependencies than [12] can be deduced from the CNF, using Boolean constraint propagation. Then, a technique allowing us to deliver a set of dependent variables is presented, allowing the search space to be reduced in an exponential way. Experimental results showing the interest of the proposed approach


2 Technical preliminaries

Let $B$ be a Boolean (i.e. propositional) language of formulas built in the standard way, using usual connectives ($\vee$, $\wedge$, $\neg$, $\Rightarrow$, $\Leftrightarrow$) and a set of propositional variables.

A CNF formula $\Sigma$ is a set (interpreted as a conjunction) of clauses, where a clause is a set (interpreted as a disjunction) of literals. A literal is a positive or negated propositional variable. We note $V(\Sigma)$ (resp. $L(\Sigma)$) the set of variables (resp. literals) occurring in $\Sigma$. A unit clause is a clause formed with one unique literal. A unit literal is the unique literal of a unit clause.

In addition to these usual set-based notations, we define the negation of a set of literals ($\neg\{l_1, \dots, l_n\}$) as the set of the corresponding opposite literals ($\{\neg l_1, \dots, \neg l_n\}$).

An interpretation of a Boolean formula is an assignment of truth values $\{true, false\}$ to its variables. A model of a formula is an interpretation that satisfies the formula. Accordingly, SAT consists in finding a model of a CNF formula when such a model does exist or in proving that such a model does not exist.

Let $c_1$ be a clause containing a literal $a$ and $c_2$ a clause containing the opposite literal $\neg a$; one resolvent of $c_1$ and $c_2$ is the disjunction of all literals of $c_1$ and $c_2$ less $a$ and $\neg a$. A resolvent is called tautological when it contains opposite literals.
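The resolvent definition can be illustrated by a minimal Python sketch. The encoding is an assumption made for the example only: literals are non-zero integers (a negative integer stands for a negated variable), clauses are sets of literals, and the function names are hypothetical.

```python
def resolvent(c1, c2, a):
    """Resolvent of c1 (which contains a) and c2 (which contains -a) on literal a."""
    assert a in c1 and -a in c2
    return (frozenset(c1) - {a}) | (frozenset(c2) - {-a})


def is_tautological(clause):
    """A clause is tautological when it contains two opposite literals."""
    return any(-lit in clause for lit in clause)


# With y=1, x1=2, x2=3: resolve (y v -x1 v -x2) with (-y v x1) on y.
r = resolvent({1, -2, -3}, {-1, 2}, 1)
# With a=1, b=2, c=3: resolve (a v b) with (-a v c) on a.
s = resolvent({1, 2}, {-1, 3}, 1)
```

Here `r` still contains both `x1` and its negation, so it is tautological, whereas `s` is not.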

Let us recall here that any Boolean formula can be translated thanks to a linear time algorithm into CNF, equivalent with respect to SAT (but that can use additional propositional variables). Most satisfiability checking algorithms operate on clauses, where the structural knowledge of the initial formulas is thus flattened. In the following, CNF formulas will be represented as Boolean gates.

3 Boolean gates

A (Boolean) gate is an expression of the form $y = f(x_1, \dots, x_n)$, where $f$ is a standard connective among $\{\vee, \wedge, \Leftrightarrow\}$ and where $y$ and $x_i$ are propositional literals, that is defined as follows:

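- $y = \wedge(x_1, \dots, x_n)$ represents the set of clauses $\{y \vee \neg x_1 \vee \dots \vee \neg x_n,\ \neg y \vee x_1,\ \dots,\ \neg y \vee x_n\}$, translating the requirement that the truth value of $y$ is determined by the conjunction of the truth values of $x_i$ s.t. $i \in [1..n]$;
- $y = \vee(x_1, \dots, x_n)$ represents the set of clauses $\{\neg y \vee x_1 \vee \dots \vee x_n,\ y \vee \neg x_1,\ \dots,\ y \vee \neg x_n\}$;
- $y = \Leftrightarrow(x_1, \dots, x_n)$ represents the following equivalence chain (also called biconditional formula) $y \Leftrightarrow x_1 \Leftrightarrow \dots \Leftrightarrow x_n$, which is equivalent to the set of clauses $\{y \vee x_1 \vee \dots \vee x_n,\ y \vee \neg x_1 \vee \dots \vee \neg x_n,\ \neg y \vee x_1 \vee \neg x_2 \vee \dots \vee \neg x_n,\ \dots,\ \neg y \vee \neg x_1 \vee \dots \vee \neg x_{n-1} \vee x_n\}$.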
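The clause sets attached to and-gates and or-gates can be generated and checked exhaustively. The following is a minimal sketch, assuming an integer encoding of literals (negative means negated) and hypothetical helper names; it is not taken from the paper.

```python
from itertools import product


def and_gate_clauses(y, xs):
    """Clauses of y = AND(xs): (y v -x1 v ... v -xn) and (-y v xi) for each i."""
    return {frozenset([y] + [-x for x in xs])} | {frozenset([-y, x]) for x in xs}


def or_gate_clauses(y, xs):
    """Clauses of y = OR(xs): (-y v x1 v ... v xn) and (y v -xi) for each i."""
    return {frozenset([-y] + list(xs))} | {frozenset([y, -x]) for x in xs}


def satisfies(assignment, clauses):
    """assignment maps each variable to a bool; every clause needs a true literal."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)


def encodes_gate(clauses, y, xs, op):
    """Check by truth table that the clauses hold exactly when y == op(xs)."""
    for bits in product([False, True], repeat=len(xs) + 1):
        a = dict(zip([y] + list(xs), bits))
        if satisfies(a, clauses) != (a[y] == op(a[x] for x in xs)):
            return False
    return True


and_ok = encodes_gate(and_gate_clauses(4, [1, 2, 3]), 4, [1, 2, 3], all)
or_ok = encodes_gate(or_gate_clauses(4, [1, 2, 3]), 4, [1, 2, 3], any)
```

The truth-table check is exponential in the number of inputs and is only meant as a sanity check on small gates.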

In the following, we consider gates of the form $y = f(x_1, \dots, x_n)$ where $y$ is a variable or the Boolean constant true, only.

Indeed, any clause can be represented as a gate of the form $true = \vee(x_1, \dots, x_n)$. Moreover, a gate $\neg y = \wedge(x_1, \dots, x_n)$ (resp. $\neg y = \vee(x_1, \dots, x_n)$) is equivalent to $y = \vee(\neg x_1, \dots, \neg x_n)$ (resp. $y = \wedge(\neg x_1, \dots, \neg x_n)$). According to the well-known property of equivalence chains asserting that every equivalence chain with an even (resp. odd) number of negative literals is equivalent to the chain formed with the same literals, but all in positive (resp. all except one) form, every gate of the form $y = \Leftrightarrow(x_1, \dots, x_n)$ can always be rewritten into a gate where $y$ is a positive literal. For example, $\neg y = \Leftrightarrow(\neg x_1, x_2, x_3)$ is equivalent to $y = \Leftrightarrow(x_1, x_2, x_3)$ and $\neg y = \Leftrightarrow(\neg x_1, x_2, \neg x_3)$ is equivalent to e.g. $y = \Leftrightarrow(x_1, x_2, \neg x_3)$.
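The parity argument on equivalence chains can be checked by enumeration. In the sketch below (an illustration with hypothetical names), a chain is described by a tuple of signs, `True` for a positive literal and `False` for a negated one, listed in the order $y, x_1, \dots, x_n$.

```python
from functools import reduce
from itertools import product


def chain(values):
    """Truth value of the equivalence chain v1 <=> v2 <=> ... <=> vk."""
    return reduce(lambda p, q: p == q, values)


def same_chain(signs_a, signs_b):
    """True if the two signed chains over the same variables are equivalent."""
    n = len(signs_a)
    return all(
        chain([v == s for v, s in zip(vals, signs_a)])
        == chain([v == s for v, s in zip(vals, signs_b)])
        for vals in product([False, True], repeat=n)
    )


# -y = <=>(-x1, x2, x3) versus y = <=>(x1, x2, x3): two negations removed.
even_case = same_chain((False, False, True, True), (True, True, True, True))
# -y = <=>(-x1, x2, -x3) versus y = <=>(x1, x2, -x3): one negation remains.
odd_case = same_chain((False, False, True, False), (True, True, True, False))
# Removing a single negation changes the chain.
single = same_chain((False, True, True, True), (True, True, True, True))
```

Each negation flips the value of the chain, so only the parity of the number of negated literals matters.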

A propositional variable $y$ (resp. $x_1, \dots, x_n$) is an output variable (resp. are input variables) of a gate of the form $y = f(x'_1, \dots, x'_n)$, where $x'_i \in \{x_i, \neg x_i\}$.

A propositional variable $z$ is an output (dependent) variable of a set of gates iff $z$ is an output variable of at least one gate in the set. An input (independent) variable of a set of


A gate is satisfied under a given Boolean interpretation iff the left and right hand sides of the gate are simultaneously true or false under this interpretation. An interpretation satisfies a set of gates iff each gate is satisfied under this interpretation. Such an interpretation is called a model of this set of gates.

4 From CNF to gates

Practically, we want to find a representation of a CNF $\Sigma$ using gates that highlights a maximal number of dependent variables, in order to decrease the actual computational complexity of checking the satisfiability of $\Sigma$. Actually, we shall describe a technique that extracts gates that can be deduced from $\Sigma$, and that thus cover a subset of clauses of $\Sigma$. Remaining clauses of $\Sigma$ will be represented as or-gates of the form $true = \vee(x_1, \dots, x_n)$, in order to get a uniform representation.

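More formally, given a set $G$ of gates whose corresponding clauses $Cl(G)$ are logical consequences of a CNF $\Sigma$, the set $\Sigma_{uncovered(G)}$ of uncovered clauses of $\Sigma$ w.r.t. $G$ is the set of clauses of $\Sigma \setminus Cl(G)$.

Accordingly, $\Sigma \equiv \Sigma_{uncovered(G)} \cup Cl(G)$.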

Not trivially, we shall see that the additional clauses $Cl(G) \setminus \Sigma$ can play an important role in further steps of deduction or satisfiability checking.

Knowing output variables can play an important role in solving the consistency status of a CNF formula. Indeed, the truth value of an output variable $y$ of a gate depends on the truth value of the corresponding input variables $x_i$. The truth value of such output variables can be obtained by propagation, and they can be omitted by selection heuristics of DPLL-like algorithms [3]. In the general case, knowing $n'$ output variables of a gate-oriented representation of a CNF formula using $n$ variables allows the size of the set of interpretations to be investigated to decrease from $2^n$ to $2^{n-n'}$. Obviously, the reduction in the search space increases with the number of detected dependent variables.

Unfortunately, to obtain such a reduction in the search space, one might need to address the following problems:

- Extracting gates from a CNF formula can be a time-consuming process in the general case, unless some depth-limited search resources or heuristic criteria are provided. Indeed, showing that $y = f(x_1, \dots, x_i)$ (where $y, x_1, \dots, x_i$ belong to $\Sigma$) follows from a given CNF $\Sigma$ is coNP-complete.
- When the set of detected gates contains recursive definitions (like $y = f(x, t)$ and $x = g(y, z)$), assigning truth values to the set of independent variables is not sufficient to determine the truth values of all the dependent ones. Handling such recursive definitions coincides with the well-known NP-hard problem of finding a minimal cycle cutset in a graph.

In this paper, these two computationally heavy problems are addressed: the first one by restricting deduction to Boolean constraint propagation only, the second one by using graph-oriented heuristics.

Let us first recall some necessary definitions about Boolean constraint propagation.

5 Boolean constraint propagation (BCP)

Boolean constraint propagation, or unit resolution, is one of the most used and useful lookahead algorithms for SAT.

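Let $\Sigma$ be a CNF formula, $BCP(\Sigma)$ is the CNF formula obtained by propagating all unit literals of $\Sigma$. Propagating a unit literal $l$ of $\Sigma$ consists in suppressing all clauses $c$ of $\Sigma$ such that $l \in c$ and replacing all clauses $c'$ of $\Sigma$ such that $\neg l \in c'$ by $c' \setminus \{\neg l\}$. The CNF obtained in such a way is equivalent to $\Sigma$ with respect to satisfiability.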
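The propagation step just described can be sketched in a few lines of Python. The integer encoding of literals, the function name `bcp` and the convention of returning `None` on an empty clause are assumptions made for this illustration; the quadratic loop is kept for readability and is not the linear-time procedure used in actual solvers.

```python
def bcp(clauses):
    """Propagate unit literals until none is left.

    Returns (remaining_clauses, propagated_literals); remaining_clauses is
    None when an empty clause (a conflict) is produced.
    """
    clauses = {frozenset(c) for c in clauses}
    propagated = set()
    while True:
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            return clauses, propagated
        (lit,) = unit
        propagated.add(lit)
        simplified = set()
        for c in clauses:
            if lit in c:
                continue              # clause contains l: suppress it
            if -lit in c:
                c = c - {-lit}        # clause contains the opposite literal: shorten it
                if not c:
                    return None, propagated
            simplified.add(c)
        clauses = simplified


# Clause set of Example 2 (Section 6) with y=1, x1..x5 = 2..6, plus the unit clause y.
sigma2 = [[1, -2, -3, -4], [-1, 2], [-2, 5], [-5, 3], [-3, 6], [-5, -6, 4]]
remaining, units = bcp(sigma2 + [[1]])
```

In this sketch the returned set of propagated literals also contains the initial unit literal `y` itself.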

The set of propagated unit literals of $\Sigma$ using BCP is noted $UP(\Sigma)$. Obviously, we have


It is also complete for Horn formulas. In addition to its use in DPLL procedures, BCP is used in many SAT solvers as a processing step to deduce further interesting information such as implied [5] and equivalent literals [2][9]. Local processing based on BCP is also used to deliver promising branching variables (heuristic UP [10]).

In the sequel, it is shown that BCP can be further extended, allowing more general functional dependencies to be extracted.

6 BCP and functional dependencies

Actually, BCP can be used to detect hidden functional dependencies. The main result of the paper is the practical exploitation of the following original property: gates can be computed using BCP only, while checking whether a gate is a logical consequence of a CNF is coNP-complete in the general case.

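Property 1. Let $\Sigma$ be a CNF formula, $l \in L(\Sigma)$, and $c \in \Sigma$ s.t. $l \in c$. If $c \setminus \{l\} \subseteq \neg UP(\Sigma \wedge l)$ then $\Sigma \models l = \wedge(\neg\{c \setminus \{l\}\})$.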

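Proof. Let $c = \{l, \neg l_1, \neg l_2, \dots, \neg l_m\} \in \Sigma$ s.t. $c \setminus \{l\} = \{\neg l_1, \neg l_2, \dots, \neg l_m\} \subseteq \neg UP(\Sigma \wedge l)$. The Boolean function $l = \wedge(\neg\{c \setminus \{l\}\})$ can be written as $l = \wedge(l_1, l_2, \dots, l_m)$. To prove that $\Sigma \models l = \wedge(l_1, l_2, \dots, l_m)$, we need to show that every model of $\Sigma$ is also a model of $l = \wedge(l_1, l_2, \dots, l_m)$. Let $I$ be a model of $\Sigma$, then

1. either $l$ is true in $I$: $I$ is also a model of $\Sigma \wedge l$. As $\{\neg l_1, \neg l_2, \dots, \neg l_m\} \subseteq \neg UP(\Sigma \wedge l)$, we have $\{l_1, l_2, \dots, l_m\} \subseteq UP(\Sigma \wedge l)$, then $l_1, l_2, \dots, l_m$ are true in $I$. Consequently, $I$ is also a model of $l = \wedge(l_1, l_2, \dots, l_m)$;

2. or $l$ is false in $I$: as $c = \{l, \neg l_1, \neg l_2, \dots, \neg l_m\} \in \Sigma$, $I$ satisfies $c \setminus \{l\} = \{\neg l_1, \neg l_2, \dots, \neg l_m\}$. So, at least one of the literals $l_i$, $i \in \{1, \dots, m\}$, is false in $I$. Consequently, $I$ is also a model of $l = \wedge(l_1, l_2, \dots, l_m)$.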

Clearly, depending on the sign of the literal $l$, and-gates or or-gates can be detected. For example, the and-gate $\neg l = \wedge(l_1, l_2, \dots, l_n)$ is equivalent to the or-gate $l = \vee(\neg l_1, \neg l_2, \dots, \neg l_n)$. Let us also note that this property covers binary equivalence since $a = \wedge(b)$ is equivalent to $a \Leftrightarrow b$.

Actually, this property allows gates to be detected which were not in the scope of the technique described in [12]. Let us illustrate this by means of an example.

Example 1. Let $\Sigma_1 \supseteq \{y \vee \neg x_1 \vee \neg x_2 \vee \neg x_3,\ \neg y \vee x_1,\ \neg y \vee x_2,\ \neg y \vee x_3\}$.

According to [12], $\Sigma_1$ can be represented by a graph where each vertex represents a clause and where each edge corresponds to the existence of a tautological resolvent between the two corresponding clauses. Each connected component might be a gate. As we can see, the first four clauses belong to the same connected component. This is a necessary condition for such a subset of clauses to represent a gate. Such a restricted subset of clauses (namely, those appearing in the same connected component) is then checked syntactically to determine if it represents an and/or gate. Such a property can be checked in polynomial time. In the above example, we thus have $y = \wedge(x_1, x_2, x_3)$.

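Now, let us consider the following example.

Example 2. $\Sigma_2 \supseteq \{y \vee \neg x_1 \vee \neg x_2 \vee \neg x_3,\ \neg y \vee x_1,\ \neg x_1 \vee x_4,\ \neg x_4 \vee x_2,\ \neg x_2 \vee x_5,\ \neg x_4 \vee \neg x_5 \vee x_3\}$.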

Clearly, the graphical representation of this latter example is different and the above technique does not help us in discovering the $y = \wedge(x_1, x_2, x_3)$ gate. Indeed, the above necessary but not sufficient condition is not satisfied.
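The difference between the two examples under the graph criterion of [12] can be made concrete with a small sketch. It assumes that an edge links two (non-tautological) clauses as soon as one of their resolvents is tautological, which happens exactly when they clash on at least two literals; this reading of [12], the integer encoding and the function names are assumptions of the illustration.

```python
def tautological_resolvent(c1, c2):
    """True if some resolvent of the non-tautological clauses c1 and c2 is
    tautological, i.e. if they contain at least two pairs of opposite literals."""
    return sum(1 for lit in c1 if -lit in c2) >= 2


def components(clauses):
    """Connected components (lists of clause indices) of the clause graph."""
    n, seen, comps = len(clauses), set(), []
    for start in range(n):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            i = stack.pop()
            if i in comp:
                continue
            comp.add(i)
            stack.extend(j for j in range(n) if j not in comp
                         and tautological_resolvent(clauses[i], clauses[j]))
        seen |= comp
        comps.append(sorted(comp))
    return comps


# y=1, x1..x5 = 2..6
sigma1 = [{1, -2, -3, -4}, {-1, 2}, {-1, 3}, {-1, 4}]
sigma2 = [{1, -2, -3, -4}, {-1, 2}, {-2, 5}, {-5, 3}, {-3, 6}, {-5, -6, 4}]
comps1 = components(sigma1)
comps2 = components(sigma2)
```

Under these assumptions the four clauses of Example 1 end up in one component, while in Example 2 the long clause is only connected to $\neg y \vee x_1$.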

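Now, according to Property 1, both the and-gates behind Example 1 and Example 2 can be detected. Indeed, $UP(\Sigma_1 \wedge y) = \{x_1, x_2, x_3\}$ (resp. $UP(\Sigma_2 \wedge y) = \{x_1, x_2, x_3, x_4, x_5\}$) and $\exists c \in \Sigma_1$ (resp. $\exists c' \in \Sigma_2$), $c = (y \vee \neg x_1 \vee \neg x_2 \vee \neg x_3)$ (resp. $c' = (y \vee \neg x_1 \vee \neg x_2 \vee \neg x_3)$) such that $c \setminus \{y\} \subseteq \neg UP(\Sigma_1 \wedge y)$ (resp. $c' \setminus \{y\} \subseteq \neg UP(\Sigma_2 \wedge y)$).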
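A possible way of turning Property 1 into a detection procedure is sketched below. It is an illustration under stated assumptions (integer literals, hypothetical function names `up` and `and_gates`, a naive fixpoint loop), not the authors' implementation.

```python
def up(clauses, lit):
    """Literals derived by unit propagation from the clauses plus the unit
    clause {lit}; returns None if a clause becomes empty."""
    assigned = {lit}
    changed = True
    while changed:
        changed = False
        for c in clauses:
            if any(l in assigned for l in c):
                continue                      # clause already satisfied
            free = [l for l in c if -l not in assigned]
            if not free:
                return None                   # conflict
            if len(free) == 1:
                assigned.add(free[0])
                changed = True
    return assigned - {lit}


def and_gates(clauses):
    """Pairs (l, inputs) such that l = AND(inputs) follows by Property 1."""
    gates = []
    for l in sorted({l for c in clauses for l in c}):
        implied = up(clauses, l)
        if implied is None:
            continue
        for c in clauses:
            rest = set(c) - {l}
            # c \ {l} must be included in the negation of UP(clauses and l)
            if l in c and rest and all(-m in implied for m in rest):
                gates.append((l, sorted(-m for m in rest)))
    return gates


# y=1, x1..x5 = 2..6
sigma1 = [[1, -2, -3, -4], [-1, 2], [-1, 3], [-1, 4]]
sigma2 = [[1, -2, -3, -4], [-1, 2], [-2, 5], [-5, 3], [-3, 6], [-5, -6, 4]]
gates1 = and_gates(sigma1)
gates2 = and_gates(sigma2)
```

On both clause sets the sketch reports `(1, [2, 3, 4])`, i.e. $y = \wedge(x_1, x_2, x_3)$. On the second one it also reports `(2, [1])`, a binary equivalence between $x_1$ and $y$, which is the kind of gate that the remark on $a = \wedge(b)$ refers to.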

Accordingly, a preprocessing technique to discover gates consists in checking Property 1 for any literal occurring in $\Sigma$. A further step consists in finding dependent variables of the original formulas, as they can be recognised in the discovered gates. A gate clearly exhibits one dependent literal with respect to the inputs, which are considered independent as far as a single gate is considered. Now, when several gates share literals, such a characterisation of dependent variables does not apply anymore. Indeed, forms of cycle can occur, as shown in the following example.

Example 3. $\Sigma_3 \supseteq \{x = \wedge(y, z),\ y = \vee(x, \neg t)\}$.

Clearly, $\Sigma_3$ contains a cycle. Indeed, $x$ depends on the variables $y$ and $z$, whereas $y$ depends on the variables $x$ and $t$. When a single gate is considered, assigning truth values to input variables determines the truth value of the output, dependent, variable. As in Example 3, assigning truth values to input variables that are not output variables for other gates is not enough to determine the truth value of all involved variables. In the example, assigning truth values to $z$ and $t$ is not sufficient to determine the truth value of $x$ and $y$. However, in the example, when we assign a truth value to an additional variable ($x$, which is called a cycle cutset variable) in the cycle, the truth value of $y$ is determined. Accordingly, we need to cut such a form of cycle in order to determine a sufficient subset of variables that determines the values of all variables. Such a set is called a strong backdoor in [17]. In Example 3, the strong backdoor corresponds to the set $\{x\} \cup \{z, t\}$. In this context, a strong backdoor is the union of the set of independent variables and of the variables of the cycle cutset. Finding the minimal set of variables that cuts all the cycles in the set of gates is an NP-hard problem. This issue is investigated in the next section.
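The claim about Example 3 can be verified by enumerating the models of the two gates. The sketch below uses hypothetical helper names and simply counts, for a chosen subset of variables, how many models share each assignment of that subset.

```python
from collections import defaultdict
from itertools import product


def models():
    """Models of the gates x = AND(y, z) and y = OR(x, not t)."""
    for x, y, z, t in product([False, True], repeat=4):
        if x == (y and z) and y == (x or not t):
            yield dict(x=x, y=y, z=z, t=t)


def determines(keys):
    """True if every assignment of the variables in keys that extends to a
    model extends to exactly one model."""
    count = defaultdict(int)
    for m in models():
        count[tuple(m[k] for k in keys)] += 1
    return all(c == 1 for c in count.values())


inputs_only = determines(["z", "t"])           # independent variables alone
with_cutset = determines(["x", "z", "t"])      # plus the cycle cutset variable x
```

With $z$ and $t$ both true, the gates admit two models ($x = y = false$ and $x = y = true$), which is why the independent variables alone are not enough here.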

7 Searching for dependent variables

In the following, a graph representation of the interaction of gates is considered. More formally, a set of gates can be represented by a bipartite graph $G = (O \cup I, E)$ as follows:

- for each gate we associate two vertices: the first one $o \in O$ represents the output of the gate, and the second one $i \in I$ represents the set of its input variables. So the number of vertices is less than $2 \times \#gates$, where $\#gates$ is the number of gates;
- for each gate, an edge $(o, i)$ between the two vertices $o$ and $i$ representing the left and the right hand sides of a gate is created. Additional edges are created between $o \in O$ and $i \in I$ if one of the literals of the output variable associated to the vertex $o$ belongs to the set of input literals associated to the vertex $i$.
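The construction above can be sketched as follows. This is a simplified reading made for illustration: each gate gets its own pair of vertices (no sharing between gates), gates are given as hypothetical pairs (output variable, list of signed input literals), and the graph is stored as an adjacency dictionary.

```python
def gate_graph(gates):
    """Bipartite graph of a list of gates (output_var, input_literals).

    Vertex ("o", k) stands for the output of gate k and ("i", k) for its
    set of inputs; the result maps each vertex to its set of neighbours.
    """
    adj = {}
    for k in range(len(gates)):
        adj[("o", k)] = set()
        adj[("i", k)] = set()
    for k, (out, _) in enumerate(gates):
        for j, (_, inputs) in enumerate(gates):
            # own gate, or the output variable occurs among the inputs of gate j
            if j == k or out in {abs(l) for l in inputs}:
                adj[("o", k)].add(("i", j))
                adj[("i", j)].add(("o", k))
    return adj


# Example 3 with x=1, y=2, z=3, t=4: x = AND(y, z), y = OR(x, not t)
graph3 = gate_graph([(1, [2, 3]), (2, [1, -4])])
```

For Example 3 the four vertices form a single cycle, which reflects the mutual dependency between $x$ and $y$.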

Finding a smallest subset $V'$ of $O$ s.t. the subgraph $G' = (V' \cup O, E')$ is acyclic is a well-known NP-hard problem.

Actually, any subset $V'$ that makes the graph acyclic is the representation of the set of variables which, together with all the independent ones, allows all variables to be determined. When $V'$ is of size $c$, and the set of dependent variables is of size $d$, then the search space is reduced from $2^n$ to $2^{n-(d-c)}$, where $n$ is the number of variables occurring in the original CNF formula.

We thus need to find a trade-off between the size of $V'$, which influences the computational cost to find it, and the expected time gain in the subsequent SAT checking step.

In the following, two heuristics are investigated in order to find a cycle cutset $V'$. The first one is called Maxdegree. It consists in building $V'$ incrementally by selecting vertices with the highest degree first, until the remaining subgraph becomes acyclic.
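A possible reading of the Maxdegree heuristic is sketched below on an undirected graph stored as an adjacency dictionary. The function names, the leaf-pruning acyclicity test and the tie-breaking rule (first candidate in sorted order) are assumptions of this illustration; the sketch also assumes that every cycle goes through at least one candidate vertex.

```python
def has_cycle(adj):
    """Undirected graph {vertex: set of neighbours}: prune vertices of degree
    at most one repeatedly; something remains iff the graph has a cycle."""
    adj = {v: set(ns) for v, ns in adj.items()}
    leaves = [v for v in adj if len(adj[v]) <= 1]
    while leaves:
        v = leaves.pop()
        if v not in adj:
            continue
        for u in adj.pop(v):
            adj[u].discard(v)
            if len(adj[u]) <= 1:
                leaves.append(u)
    return bool(adj)


def maxdegree_cutset(adj, candidates):
    """Remove candidate vertices of highest degree until the graph is acyclic."""
    adj = {v: set(ns) for v, ns in adj.items()}
    cut = []
    while has_cycle(adj):
        remaining = [c for c in sorted(candidates) if c in adj]
        v = max(remaining, key=lambda c: len(adj[c]))
        cut.append(v)
        for u in adj.pop(v):
            adj[u].discard(v)
    return cut


# Graph of Example 3: o1, o2 are the outputs x and y; i1, i2 the input sets.
graph = {"o1": {"i1", "i2"}, "o2": {"i1", "i2"},
         "i1": {"o1", "o2"}, "i2": {"o1", "o2"}}
cut = maxdegree_cutset(graph, ["o1", "o2"])
```

On this small graph all vertices have degree two, so the tie-breaking rule picks `o1`, which corresponds to the cutset variable $x$ discussed for Example 3.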

The second one is called MaxdegreeCycle. It consists in building $V'$ incrementally by selecting first a vertex with the highest degree among the vertices that belong to a cycle. This
