HAL Id: hal-00397450
https://hal.archives-ouvertes.fr/hal-00397450
Submitted on 22 Jun 2009
Automatic extraction of functional dependencies
Eric Gregoire, Richard Ostrowski, Bertrand Mazure, Lahkdar Sais
To cite this version:
Eric Gregoire, Richard Ostrowski, Bertrand Mazure, Lahkdar Sais. Automatic extraction of functional dependencies. Theory and Applications of Satisfiability Testing: 7th International Conference (SAT 2004), 2005, Vancouver, Canada. pp.122-132. ⟨hal-00397450⟩
Éric Grégoire, Richard Ostrowski, Bertrand Mazure, and Lakhdar Saïs
CRIL CNRS, Université d'Artois
rue Jean Souvraz SP-18
F-62307 Lens Cedex France
{gregoire,ostrowski,mazure,sais}@cril.univ-artois.fr
Abstract. In this paper, a new polynomial time technique for extracting functional dependencies in Boolean formulas is proposed. It makes an original use of the well-known Boolean constraint propagation technique (BCP) in a new preprocessing approach that extracts more hidden Boolean functions and dependent variables than previously published approaches on many classes of instances.
Keywords: SAT, Boolean function, propositional reasoning and search.
1 Introduction
Recent impressive progress in the practical resolution of hard and large SAT instances allows real-world problems that are encoded in propositional clausal normal form (CNF) to be addressed (see e.g. [11,7,18]). While there remains a strong competition about building more efficient provers dedicated to hard random k-SAT instances [6], there is also a real surge of interest in implementing powerful systems that solve difficult large real-world SAT problems. Many benchmarks have been proposed and regular competitions (e.g. [4,1,14,15]) are organized around these specific SAT instances, which are expected to encode structural knowledge, at least to some extent.
Clearly, encoding knowledge under the form of a conjunction of propositional clauses can flatten some structural knowledge that would be more apparent in more expressive propositional logic representation formalisms, and that could prove useful in the resolution step [13,8].
In this paper, a new pre-processing step is proposed in the resolution of SAT instances, that extracts and exploits some structural knowledge that is hidden in the CNF. The technique makes an original use of the well-known Boolean constraint propagation (BCP) process. Whereas BCP is traditionally used to produce implied and/or equivalent literals, in this paper it is shown how it can be extended so that it delivers a hybrid formula made of clauses together with a set of equations of the form $y = f(x_1, \ldots, x_n)$ where $f$ is a standard connective operator among $\{\vee, \wedge\}$ and where $y$ and $x_i$ are Boolean variables of the initial SAT instance. These Boolean functions allow us to detect a subset of dependent variables, that can be exploited by SAT solvers.
This paper extends in a significant way the preliminary results that were published in [12] in that it describes a technique that allows more dependent variables and hidden functional dependencies to be detected in several classes of instances. We shall see that the set of functional dependencies can underlie cycles. Unfortunately, highlighting actual dependent variables taking part in these cycles can be time-consuming since it coincides with the problem of finding a minimal cycle cutset of variables in a graph, which is a well-known NP-hard problem. Accordingly, efficient heuristics are explored to cut these cycles and deliver the so-called dependent variables.
The paper is organized as follows. After some preliminary definitions, Boolean gates and their properties are presented. It is then shown how more functional dependencies than [12] can be deduced from the CNF, using Boolean constraint propagation. Then, a technique allowing us to deliver a set of dependent variables is presented, allowing the search space to be reduced in an exponential way. Experimental results showing the interest of the proposed approach [...]
2 Technical preliminaries
Let $\mathcal{B}$ be a Boolean (i.e. propositional) language of formulas built in the standard way, using usual connectives ($\vee$, $\wedge$, $\neg$, $\Rightarrow$, $\Leftrightarrow$) and a set of propositional variables.
A CNF formula $\Sigma$ is a set (interpreted as a conjunction) of clauses, where a clause is a set (interpreted as a disjunction) of literals. A literal is a positive or negated propositional variable. We note $V(\Sigma)$ (resp. $L(\Sigma)$) the set of variables (resp. literals) occurring in $\Sigma$. A unit clause is a clause formed with one unique literal. A unit literal is the unique literal of a unit clause.
In addition to these usual set-based notations, we define the negation of a set of literals ($\neg\{l_1, \ldots, l_n\}$) as the set of the corresponding opposite literals ($\{\neg l_1, \ldots, \neg l_n\}$).
An interpretation of a Boolean formula is an assignment of truth values $\{true, false\}$ to its variables. A model of a formula is an interpretation that satisfies the formula. Accordingly, SAT consists in finding a model of a CNF formula when such a model does exist or in proving that such a model does not exist.
Let $c_1$ be a clause containing a literal $a$ and $c_2$ a clause containing the opposite literal $\neg a$. One resolvent of $c_1$ and $c_2$ is the disjunction of all literals of $c_1$ and $c_2$ less $a$ and $\neg a$. A resolvent is called tautological when it contains opposite literals.
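The resolvent definition above can be illustrated with a minimal Python sketch. It assumes an integer encoding of literals (a negative integer stands for a negated variable) and set-based clauses; this encoding and the function names `resolvent` and `is_tautological` are illustrative choices, not notation from the paper.

```python
def resolvent(c1, c2, a):
    """Resolve c1 (containing literal a) with c2 (containing -a) on a.

    Follows the definition in the text: all literals of c1 and c2,
    less a and its opposite.
    """
    assert a in c1 and -a in c2
    return (set(c1) | set(c2)) - {a, -a}


def is_tautological(clause):
    """A clause is tautological when it contains two opposite literals."""
    return any(-lit in clause for lit in clause)


# Hypothetical numbering: y = 1, x1 = 2, x2 = 3.
# Resolving (y or not x1 or not x2) with (not y or x1) on y.
r = resolvent({1, -2, -3}, {-1, 2}, 1)
print(r, is_tautological(r))
```

Here the resolvent contains both `2` and `-2`, so it is tautological, whereas resolving `{1, 2}` with `{-1, 3}` on `1` gives the non-tautological clause `{2, 3}`.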
Let us recall here that any Boolean formula can be translated thanks to a linear time algorithm into CNF, equivalent with respect to SAT (but that can use additional propositional variables). Most satisfiability checking algorithms operate on clauses, where the structural knowledge of the initial formulas is thus flattened. In the following, CNF formulas will be represented as Boolean gates.
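3 Boolean gates
A (Boolean) gate is an expression of the form $y = f(x_1, \ldots, x_n)$, where $f$ is a standard connective among $\{\vee, \wedge, \Leftrightarrow\}$ and where $y$ and $x_i$ are propositional literals, that is defined as follows:
- $y = \wedge(x_1, \ldots, x_n)$ represents the set of clauses $\{y \vee \neg x_1 \vee \ldots \vee \neg x_n,\ \neg y \vee x_1,\ \ldots,\ \neg y \vee x_n\}$, translating the requirement that the truth value of $y$ is determined by the conjunction of the truth values of $x_i$ s.t. $i \in [1..n]$;
- $y = \vee(x_1, \ldots, x_n)$ represents the set of clauses $\{\neg y \vee x_1 \vee \ldots \vee x_n,\ y \vee \neg x_1,\ \ldots,\ y \vee \neg x_n\}$;
- $y = \Leftrightarrow(x_1, \ldots, x_n)$ represents the following equivalence chain (also called biconditional formula) $y \Leftrightarrow x_1 \Leftrightarrow \ldots \Leftrightarrow x_n$, which is equivalent to the set of clauses $\{y \vee x_1 \vee \ldots \vee x_n,\ y \vee \neg x_1 \vee \ldots \vee \neg x_n,\ \neg y \vee x_1 \vee \neg x_2 \vee \ldots \vee \neg x_n,\ \ldots,\ \neg y \vee \neg x_1 \vee \ldots \vee \neg x_{n-1} \vee x_n\}$.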
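The clausal encodings of and-gates and or-gates listed above can be checked with a short Python sketch. It assumes that literals are non-zero integers (negative for negated variables) and, for simplicity, that the output and the inputs are positive variables; the helper names are hypothetical.

```python
from itertools import product


def and_gate_clauses(y, xs):
    """Clauses of y = AND(xs): (y or not x1 or ... or not xn), (not y or xi)."""
    return [frozenset([y] + [-x for x in xs])] + [frozenset([-y, x]) for x in xs]


def or_gate_clauses(y, xs):
    """Clauses of y = OR(xs): (not y or x1 or ... or xn), (y or not xi)."""
    return [frozenset([-y] + list(xs))] + [frozenset([y, -x]) for x in xs]


def satisfies(assignment, clauses):
    """assignment maps a variable to a bool; literal l holds iff assignment[|l|] == (l > 0)."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)


def encodes(clauses, y, xs, op):
    """Brute-force check: the clauses hold exactly when y equals op over the inputs."""
    variables = [y] + list(xs)
    for values in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, values))
        if satisfies(a, clauses) != (a[y] == op(a[x] for x in xs)):
            return False
    return True


print(encodes(and_gate_clauses(1, [2, 3, 4]), 1, [2, 3, 4], all))
print(encodes(or_gate_clauses(1, [2, 3, 4]), 1, [2, 3, 4], any))
```

The enumeration is exponential in the number of inputs and is only meant as a sanity check on small gates.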
In the following, we consider gates of the form $y = f(x_1, \ldots, x_n)$ where $y$ is a variable or the Boolean constant true, only.
Indeed, any clause can be represented as a gate of the form $true = \vee(x_1, \ldots, x_n)$. Moreover, a gate $\neg y = \wedge(x_1, \ldots, x_n)$ (resp. $\neg y = \vee(x_1, \ldots, x_n)$) is equivalent to $y = \vee(\neg x_1, \ldots, \neg x_n)$ (resp. $y = \wedge(\neg x_1, \ldots, \neg x_n)$). According to the well-known property of equivalence chains asserting that every equivalence chain with an even (resp. odd) number of negative literals is equivalent to the chain formed with the same literals, but all in positive form (resp. all in positive form except one), every gate of the form $y = \Leftrightarrow(x_1, \ldots, x_n)$ can always be rewritten into a gate where $y$ is a positive literal. For example, $\neg y = \Leftrightarrow(\neg x_1, x_2, x_3)$ is equivalent to $y = \Leftrightarrow(x_1, x_2, x_3)$ and $\neg y = \Leftrightarrow(\neg x_1, x_2, \neg x_3)$ is equivalent to e.g. $y = \Leftrightarrow(x_1, x_2, \neg x_3)$.
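The two rewriting examples for equivalence chains can be verified by exhaustive enumeration. The sketch below is only a sanity check under an assumed representation: a chain over $(y, x_1, x_2, x_3)$ is described by a tuple of signs (`True` for a positive literal, `False` for a negated one), and the function names are hypothetical.

```python
from itertools import product


def chain(values):
    """Truth value of v1 <=> v2 <=> ... <=> vn, evaluated left to right."""
    result = values[0]
    for v in values[1:]:
        result = (result == v)
    return result


def equivalent(signs_a, signs_b):
    """True iff the two signed chains agree on every assignment."""
    n = len(signs_a)
    for assignment in product([False, True], repeat=n):
        a = chain([v if s else not v for v, s in zip(assignment, signs_a)])
        b = chain([v if s else not v for v, s in zip(assignment, signs_b)])
        if a != b:
            return False
    return True


# not y <=> not x1 <=> x2 <=> x3  versus  y <=> x1 <=> x2 <=> x3
print(equivalent((False, False, True, True), (True, True, True, True)))
# not y <=> not x1 <=> x2 <=> not x3  versus  y <=> x1 <=> x2 <=> not x3
print(equivalent((False, False, True, False), (True, True, True, False)))
```

Negating a single literal flips the value of the whole chain, so only the parity of the number of negative literals matters.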
A propositional variable $y$ (resp. $x_1, \ldots, x_n$) is an output variable (resp. are input variables) of a gate of the form $y = f(x'_1, \ldots, x'_n)$, where $x'_i \in \{x_i, \neg x_i\}$.
A propositional variable $z$ is an output (dependent) variable of a set of gates iff $z$ is an output variable of at least one gate in the set. An input (independent) variable of a set of [...]
A gate is satisfied under a given Boolean interpretation iff the left and right hand sides of the gate are simultaneously true or false under this interpretation. An interpretation satisfies a set of gates iff each gate is satisfied under this interpretation. Such an interpretation is called a model of this set of gates.
4 From CNF to gates
Practically, we want to find a representation of a CNF $\Sigma$ using gates that highlights a maximal number of dependent variables, in order to decrease the actual computational complexity of checking the satisfiability of $\Sigma$. Actually, we shall describe a technique that extracts gates that can be deduced from $\Sigma$, and that thus cover a subset of clauses of $\Sigma$. Remaining clauses of $\Sigma$ will be represented as or-gates of the form $true = \vee(x_1, \ldots, x_n)$, in order to get a uniform representation.
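More formally, given a set $G$ of gates whose corresponding clauses $Cl(G)$ are logical consequences of a CNF $\Sigma$, the set $\Sigma_{uncovered(G)}$ of uncovered clauses of $\Sigma$ w.r.t. $G$ is the set of clauses of $\Sigma \setminus Cl(G)$.
Accordingly, $\Sigma \equiv \Sigma_{uncovered(G)} \cup Cl(G)$.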
Not trivially, we shall see that the additional clauses $Cl(G) \setminus \Sigma$ can play an important role in further steps of deduction or satisfiability checking.
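The two set differences discussed above, the uncovered clauses $\Sigma \setminus Cl(G)$ and the additional clauses $Cl(G) \setminus \Sigma$, can be sketched in Python as follows. The CNF below is a small instance chosen for illustration, literals are assumed to be non-zero integers (negative for negation), and the function names are hypothetical.

```python
def and_gate_clauses(y, xs):
    """Clause set Cl(g) of the and-gate y = AND(xs)."""
    return {frozenset([y] + [-x for x in xs])} | {frozenset([-y, x]) for x in xs}


def uncovered(sigma, gate_clause_sets):
    """Return (sigma minus Cl(G), Cl(G) minus sigma) for a collection of gate clause sets."""
    cl_g = set().union(*gate_clause_sets)
    return set(sigma) - cl_g, cl_g - set(sigma)


# Illustrative CNF with x1..x5 = 1..5 and y = 6.
sigma = {frozenset(c) for c in [
    {6, -1, -2, -3}, {-6, 1}, {-1, 4}, {-4, 2}, {-2, 5}, {-4, -5, 3},
]}
# Assumption for this sketch: the gate y = AND(x1, x2, x3) has already been extracted.
remaining, additional = uncovered(sigma, [and_gate_clauses(6, [1, 2, 3])])
print(sorted(map(sorted, remaining)))
print(sorted(map(sorted, additional)))
```

In this instance two clauses of the gate, `{-6, 2}` and `{-6, 3}`, do not occur in the CNF itself: they are the additional clauses.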
Knowing output variables can play an important role in solving the consistency status of a CNF formula. Indeed, the truth value of an output variable $y$ of a gate depends on the truth value of the corresponding $x_i$ input variables. The truth value of such output variables can be obtained by propagation, and they can be omitted by selection heuristics of DPLL-like algorithms [3]. In the general case, knowing $n'$ output variables of a gate-oriented representation of a CNF formula using $n$ variables allows the size of the set of interpretations to be investigated to decrease from $2^n$ to $2^{n-n'}$. Obviously, the reduction in the search space increases with the number of detected dependent variables.
Unfortunately, to obtain such a reduction in the search space, one might need to address the following problems:
- Extracting gates from a CNF formula can be a time-consuming process in the general case, unless some depth-limited search resources or heuristic criteria are provided. Indeed, showing that $y = f(x_1, \ldots, x_i)$ (where $y, x_1, \ldots, x_i$ belong to $\Sigma$) follows from a given CNF $\Sigma$ is coNP-complete.
- When the set of detected gates contains recursive definitions (like $y = f(x, t)$ and $x = g(y, z)$), assigning truth values to the set of independent variables is not sufficient to determine the truth values of all the dependent ones. Handling such recursive definitions coincides with the well-known NP-hard problem of finding a minimal cycle cutset in a graph.
In this paper, these two computationally heavy problems are addressed: the first one by restricting deduction to Boolean constraint propagation only, the second one by using graph-oriented heuristics.
Let us first recall some necessary definitions about Boolean constraint propagation.
5 Boolean constraint propagation (BCP)
Boolean constraint propagation, or unit resolution, is one of the most used and useful lookahead algorithms for SAT.
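Let $\Sigma$ be a CNF formula. $BCP(\Sigma)$ is the CNF formula obtained by propagating all unit literals of $\Sigma$. Propagating a unit literal $l$ of $\Sigma$ consists in suppressing all clauses $c$ of $\Sigma$ such that $l \in c$ and replacing all clauses $c'$ of $\Sigma$ such that $\neg l \in c'$ by $c' \setminus \{\neg l\}$. The CNF obtained in such a way is equivalent to $\Sigma$ with respect to satisfiability.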
The set of propagated unit literals of $\Sigma$ using BCP is noted $UP(\Sigma)$. Obviously, we have [...]
It is also complete for Horn formulas. In addition to its use in DPLL procedures, BCP is used in many SAT solvers as a processing step to deduce further interesting information such as implied [5] and equivalent literals [2][9]. Local processing based on BCP is also used to deliver promising branching variables (heuristic UP [10]).
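The propagation step defined in this section can be sketched in a few lines of Python. This is a naive illustration rather than the implementation used in the paper: literals are assumed to be non-zero integers (negative for negation), the function name `bcp` and its return convention are hypothetical, and the loop is quadratic, whereas efficient solvers rely on dedicated data structures.

```python
def bcp(clauses):
    """Propagate all unit literals.

    Returns (simplified_clauses, propagated_literals, conflict), where
    propagated_literals plays the role of UP and conflict is True when
    an empty clause has been derived.
    """
    clauses = [frozenset(c) for c in clauses]
    propagated = set()
    while True:
        if any(len(c) == 0 for c in clauses):
            return clauses, propagated, True
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            return clauses, propagated, False
        (l,) = unit
        propagated.add(l)
        # Suppress clauses containing l, and remove the opposite literal elsewhere.
        clauses = [c - {-l} for c in clauses if l not in c]


simplified, up, conflict = bcp([{1}, {-1, 2}, {-2, 3, 4}])
print(simplified, up, conflict)
```

On this small input the unit literal `1` yields the new unit literal `2`, and the last clause is shortened to `{3, 4}`.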
In the sequel, it is shown that BCP can be further extended, allowing more general functional dependencies to be extracted.
6 BCP and functional dependencies
Actually, BCP can be used to detect hidden functional dependencies. The main result of the paper is the practical exploitation of the following original property: gates can be computed using BCP only, while checking whether a gate is a logical consequence of a CNF is coNP-complete in the general case.
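Property 1. Let $\Sigma$ be a CNF formula, $l \in L(\Sigma)$, and $c \in \Sigma$ s.t. $l \in c$. If $c \setminus \{l\} \subseteq \neg UP(\Sigma \wedge l)$ then $\Sigma \models l = \wedge(\neg\{c \setminus \{l\}\})$.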
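Proof. Let $c = \{l, \neg l_1, \neg l_2, \ldots, \neg l_m\} \in \Sigma$ s.t. $c \setminus \{l\} = \{\neg l_1, \neg l_2, \ldots, \neg l_m\} \subseteq \neg UP(\Sigma \wedge l)$. The Boolean function $l = \wedge(\neg\{c \setminus \{l\}\})$ can be written as $l = \wedge(l_1, l_2, \ldots, l_m)$. To prove that $\Sigma \models l = \wedge(l_1, l_2, \ldots, l_m)$, we need to show that every model of $\Sigma$ is also a model of $l = \wedge(l_1, l_2, \ldots, l_m)$. Let $I$ be a model of $\Sigma$, then
1. either $l$ is true in $I$: $I$ is also a model of $\Sigma \wedge l$. As $\{\neg l_1, \neg l_2, \ldots, \neg l_m\} \subseteq \neg UP(\Sigma \wedge l)$, we have $\{l_1, l_2, \ldots, l_m\} \subseteq UP(\Sigma \wedge l)$, then $l_1, l_2, \ldots, l_m$ are true in $I$. Consequently, $I$ is also a model of $l = \wedge(l_1, l_2, \ldots, l_m)$;
2. or $l$ is false in $I$: as $c = \{l, \neg l_1, \neg l_2, \ldots, \neg l_m\} \in \Sigma$, $I$ satisfies $c \setminus \{l\} = \{\neg l_1, \neg l_2, \ldots, \neg l_m\}$. So, at least one of the literals $l_i$, $i \in \{1, \ldots, m\}$, is false in $I$. Consequently, $I$ is also a model of $l = \wedge(l_1, l_2, \ldots, l_m)$.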
Clearly, depending on the sign of the literal $l$, and-gates or or-gates can be detected. For example, the and-gate $\neg l = \wedge(l_1, l_2, \ldots, l_n)$ is equivalent to the or-gate $l = \vee(\neg l_1, \neg l_2, \ldots, \neg l_n)$. Let us also note that this property covers binary equivalence since $a = \wedge(b)$ is equivalent to $a \Leftrightarrow b$.
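Property 1 can be turned into a direct test for a single literal, as in the following Python sketch. It is a simplified illustration and not the authors' implementation: literals are assumed to be non-zero integers (negative for negation), `up` is a naive unit propagation returning the propagated literals (including the assumed literal itself), and a conflict during propagation is simply treated as "no gate reported".

```python
def up(clauses, assumption):
    """Literals fixed by unit propagation on (clauses and assumption); None on conflict."""
    assigned = {assumption}
    changed = True
    while changed:
        changed = False
        for c in clauses:
            if any(l in assigned for l in c):
                continue  # clause already satisfied
            free = [l for l in c if -l not in assigned]
            if not free:
                return None  # every literal is falsified: conflict
            if len(free) == 1:
                assigned.add(free[0])
                changed = True
    return assigned


def gates_for_literal(clauses, l):
    """And-gates l = AND(inputs) obtained from Property 1 for the literal l."""
    propagated = up(clauses, l)
    if propagated is None:
        return []
    gates = []
    for c in clauses:
        if l in c and len(c) > 1:
            rest = set(c) - {l}
            # c \ {l} must be included in the negation of UP(clauses and l).
            if all(-m in propagated for m in rest):
                gates.append((l, sorted(-m for m in rest)))
    return gates


# Binary equivalence a <=> b with a = 1, b = 2: clauses (a or not b), (not a or b).
clauses = [frozenset({1, -2}), frozenset({-1, 2})]
print(gates_for_literal(clauses, 1))    # gate a = AND(b)
print(gates_for_literal(clauses, -1))   # gate not a = AND(not b), i.e. a = OR(b)
```

A gate whose output is a negative literal corresponds to an or-gate on the positive literal, as noted in the text.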
Actually, this property allows gates to be detected which were not in the scope of the technique described in [12]. Let us illustrate this by means of an example.
Example 1. Let $\Sigma_1 \supseteq \{y \vee \neg x_1 \vee \neg x_2 \vee \neg x_3,\ \neg y \vee x_1,\ \neg y \vee x_2,\ \neg y \vee x_3\}$.
According to [12], $\Sigma_1$ can be represented by a graph where each vertex represents a clause and where each edge corresponds to the existence of a tautological resolvent between the two corresponding clauses. Each connected component might be a gate. As we can see, the first four clauses belong to a same connected component. This is a necessary condition for such a subset of clauses to represent a gate. Such a restricted subset of clauses (namely, those appearing in the same connected component) is then checked syntactically to determine if it represents an and/or gate. Such a property can be checked in polynomial time. In the above example, we thus have $y = \wedge(x_1, x_2, x_3)$.
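Now, let us consider the following example.
Example 2. $\Sigma_2 \supseteq \{y \vee \neg x_1 \vee \neg x_2 \vee \neg x_3,\ \neg y \vee x_1,\ \neg x_1 \vee x_4,\ \neg x_4 \vee x_2,\ \neg x_2 \vee x_5,\ \neg x_4 \vee \neg x_5 \vee x_3\}$.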
Clearly, the graphical representation of this latter example is different and the above technique does not help us in discovering the $y = \wedge(x_1, x_2, x_3)$ gate. Indeed, the above necessary but not sufficient condition is not satisfied.
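Now, according to Property 1, both the and-gates behind Example 1 and Example 2 can be detected. Indeed, $UP(\Sigma_1 \wedge y) = \{x_1, x_2, x_3\}$ (resp. $UP(\Sigma_2 \wedge y) = \{x_1, x_2, x_3, x_4, x_5\}$) and $\exists c \in \Sigma_1$ (resp. $\exists c' \in \Sigma_2$), $c = (y \vee \neg x_1 \vee \neg x_2 \vee \neg x_3)$ (resp. $c' = (y \vee \neg x_1 \vee \neg x_2 \vee \neg x_3)$) such that $c \setminus \{y\} \subseteq \neg UP(\Sigma_1 \wedge y)$ (resp. $c' \setminus \{y\} \subseteq \neg UP(\Sigma_2 \wedge y)$).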
Accordingly, a preprocessing technique to discover gates consists in checking Property 1 for any literal occurring in $\Sigma$. A further step consists in finding dependent variables of the original formula, as they can be recognised in the discovered gates. A gate clearly exhibits one dependent literal with respect to the inputs, which are considered independent, as far as a single gate is considered. Now, when several gates share literals, such a characterisation of dependent variables does not apply anymore. Indeed, forms of cycle can occur, as shown in the following example.
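The preprocessing loop described above (checking Property 1 for every literal occurring in the CNF) can be sketched as follows and applied to the clauses of Example 2. This is a naive illustration under stated assumptions: the variables are numbered $x_1, \ldots, x_5 = 1, \ldots, 5$ and $y = 6$, negative integers stand for negated literals, and the function names are hypothetical.

```python
def up(clauses, assumption):
    """Literals fixed by unit propagation on (clauses and assumption); None on conflict."""
    assigned = {assumption}
    changed = True
    while changed:
        changed = False
        for c in clauses:
            if any(l in assigned for l in c):
                continue
            free = [l for l in c if -l not in assigned]
            if not free:
                return None
            if len(free) == 1:
                assigned.add(free[0])
                changed = True
    return assigned


def extract_gates(clauses):
    """Check Property 1 for every literal occurring in the CNF."""
    gates = []
    for l in sorted({lit for c in clauses for lit in c}):
        propagated = up(clauses, l)
        if propagated is None:
            continue  # conflict: no gate is reported in this sketch
        for c in clauses:
            rest = set(c) - {l}
            if l in c and rest and all(-m in propagated for m in rest):
                gates.append((l, sorted(-m for m in rest)))
    return gates


sigma2 = [frozenset(c) for c in [
    {6, -1, -2, -3}, {-6, 1}, {-1, 4}, {-4, 2}, {-2, 5}, {-4, -5, 3},
]]
for output, inputs in extract_gates(sigma2):
    print(output, "= AND", inputs)
```

The reported gates include `6 = AND [1, 2, 3]`, that is $y = \wedge(x_1, x_2, x_3)$, although the clauses $\neg y \vee x_2$ and $\neg y \vee x_3$ do not occur syntactically in the CNF.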
Example 3. $\Sigma_3 \supseteq \{x = \wedge(y, z),\ y = \vee(x, \neg t)\}$.
Clearly, $\Sigma_3$ contains a cycle. Indeed, $x$ depends on the variables $y$ and $z$, whereas $y$ depends on the variables $x$ and $t$. When a single gate is considered, assigning truth values to input variables determines the truth value of the output, dependent, variable. As in Example 3, assigning truth values to input variables that are not output variables for other gates is not enough to determine the truth value of all involved variables. In the example, assigning truth values to $z$ and $t$ is not sufficient to determine the truth value of $x$ and $y$. However, in the example, when we assign a truth value to an additional variable ($x$, which is called a cycle cutset variable) in the cycle, the truth value of $y$ is determined. Accordingly, we need to cut such a form of cycle in order to determine a sufficient subset of variables that determines the values of all variables. Such a set is called a strong backdoor in [17]. In Example 3, the strong backdoor corresponds to the set $\{x\} \cup \{z, t\}$. In this context, a strong backdoor is the union of the set of independent variables and of the variables of the cycle cutset. Finding the minimal set of variables that cuts all the cycles in the set of gates is an NP-hard problem. This issue is investigated in the next section.
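The claim about Example 3 can be checked by enumerating the models of the two gates. The Python sketch below is only an illustration; the helper names `models` and `determines` are hypothetical.

```python
from itertools import product


def models():
    """All assignments satisfying x = AND(y, z) and y = OR(x, not t)."""
    for x, y, z, t in product([False, True], repeat=4):
        if x == (y and z) and y == (x or not t):
            yield {"x": x, "y": y, "z": z, "t": t}


def determines(fixed, all_models):
    """True iff every assignment to the variables in `fixed` extends to at most one model."""
    seen = set()
    for m in all_models:
        key = tuple(m[v] for v in fixed)
        if key in seen:
            return False
        seen.add(key)
    return True


all_models = list(models())
print(determines(["z", "t"], all_models))       # independent variables only
print(determines(["x", "z", "t"], all_models))  # plus the cycle cutset variable x
```

With $z$ and $t$ both true, $x$ and $y$ can be both true or both false, so the independent variables alone do not fix the remaining ones; adding $x$ does.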
7 Searching for dependent variables
In the following, a graph representation of the interaction of gates is considered. More formally, a set of gates can be represented by a bipartite graph $G = (O \cup I, E)$ as follows:
- for each gate we associate two vertices: the first one, $o \in O$, represents the output of the gate, and the second one, $i \in I$, represents the set of its input variables. So the number of vertices is less than $2 \times \#gates$, where $\#gates$ is the number of gates;
- for each gate, an edge $(o, i)$ between the two vertices $o$ and $i$ representing the left and the right hand sides of a gate is created. Additional edges are created between $o \in O$ and $i \in I$ if one of the literals of the output variable associated to the vertex $o$ belongs to the set of input literals associated to the vertex $i$.
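The construction above can be sketched in Python as follows. Several choices are assumptions made for this illustration only: the graph is treated as undirected and stored as an adjacency dictionary, an output vertex is keyed by its variable, an input vertex by the index of its gate, and the signs of input literals are dropped since only variable occurrences matter for the edges.

```python
def build_graph(gates):
    """gates: list of (output_variable, [input_variables]).

    Returns an adjacency dict over vertices ("o", variable) and ("i", gate_index).
    """
    adj = {}
    # One edge per gate between its output vertex and its input vertex.
    for k, (out, _) in enumerate(gates):
        o, i = ("o", out), ("i", k)
        adj.setdefault(o, set()).add(i)
        adj.setdefault(i, set()).add(o)
    # Additional edges: an output variable occurring among the inputs of a gate.
    for out, _ in gates:
        for k, (_, ins) in enumerate(gates):
            if out in ins:
                adj[("o", out)].add(("i", k))
                adj[("i", k)].add(("o", out))
    return adj


# Gates of Example 3: x = AND(y, z) and y = OR(x, not t).
example3 = [("x", ["y", "z"]), ("y", ["x", "t"])]
for vertex, neighbours in sorted(build_graph(example3).items()):
    print(vertex, sorted(neighbours))
```

For Example 3, each of the four vertices has degree 2, and the edges form a single cycle through both output vertices.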
Finding a smallest subset $V'$ of $O$ s.t. the subgraph $G' = (V' \cup O, E')$ is acyclic is a well-known NP-hard problem.
Actually, any subset $V'$ that makes the graph acyclic is the representation of the set of variables which, together with all the independent ones, allows all variables to be determined. When $V'$ is of size $c$, and the set of dependent variables is of size $d$, then the search space is reduced from $2^n$ to $2^{n-(d-c)}$, where $n$ is the number of variables occurring in the original CNF formula.
We thus need to find a trade-off between the size of $V'$, which influences the computational cost to find it, and the expected time gain in the subsequent SAT checking step.
In the following, two heuristics are investigated in order to find a cycle cutset $V'$. The first one is called Maxdegree. It consists in building $V'$ incrementally by selecting vertices with the highest degree first, until the remaining subgraph becomes acyclic.
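A minimal Python sketch of the Maxdegree idea is given below. It rests on assumptions that the text does not state: the graph is undirected and given as an adjacency dictionary, only output vertices are candidates for removal, degrees are recomputed after each removal, and ties are broken by vertex order. The function names are hypothetical.

```python
def is_acyclic(adj):
    """An undirected graph is acyclic iff a DFS never reaches a visited vertex through a non-parent edge."""
    seen = set()
    for root in adj:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, None)]
        while stack:
            v, parent = stack.pop()
            for w in adj[v]:
                if w == parent:
                    continue
                if w in seen:
                    return False
                seen.add(w)
                stack.append((w, v))
    return True


def max_degree_cutset(adj, outputs):
    """Remove highest-degree output vertices until the remaining graph is acyclic."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    cutset = []
    while not is_acyclic(adj):
        candidates = sorted(o for o in outputs if o in adj)
        v = max(candidates, key=lambda o: len(adj[o]))
        cutset.append(v)
        for w in adj.pop(v):
            adj[w].discard(v)
    return cutset


# Graph of the gates x = AND(y, z) and y = OR(x, not t), with vertices named by hand.
graph = {
    "o_x": {"i_1", "i_2"},
    "o_y": {"i_1", "i_2"},
    "i_1": {"o_x", "o_y"},
    "i_2": {"o_x", "o_y"},
}
print(max_degree_cutset(graph, ["o_x", "o_y"]))
```

On this graph a single output vertex is enough to break the cycle. Since the graph is bipartite, every cycle goes through an output vertex, so the loop always terminates.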
The second one is called MaxdegreeCycle. It consists in building $V'$ incrementally by selecting first a vertex with the highest degree among the vertices that belong to a cycle. This