A trichotomy for regular simple path queries on graphs

HAL Id: hal-02435355

https://hal.inria.fr/hal-02435355

Submitted on 25 Sep 2020


Guillaume Bagan, Angela Bonifati, Benoit Groz

To cite this version:

Guillaume Bagan, Angela Bonifati, Benoit Groz. A trichotomy for regular simple path queries on graphs. Journal of Computer and System Sciences, Elsevier, 2020, 108, pp.29-48. ⟨10.1016/j.jcss.2019.08.006⟩. ⟨hal-02435355⟩


Guillaume Bagan (a,*), Angela Bonifati (a), Benoit Groz (b)

(a) Université Lyon 1, LIRIS UMR CNRS 5205, F-69622, Lyon, France
(b) Université Paris Sud, LRI UMR CNRS 8623, F-91405, Orsay, France

Article info

Article history:
Received 3 June 2018
Received in revised form 23 June 2019
Accepted 19 August 2019
Available online 20 September 2019

Keywords: Graphs; Paths; Regular simple paths; Complexity; Regular languages; Automata

Abstract

We focus on the computational complexity of regular simple path queries (RSPQs). We consider the following problem RSPQ(L) for a regular language L: given an edge-labeled digraph G and two nodes x and y, is there a simple path from x to y that forms a word belonging to L? We fully characterize the frontier between tractability and intractability for RSPQ(L). More precisely, we prove that RSPQ(L) is either in $AC^0$, NL-complete or NP-complete depending on the language L. We also provide a simple characterization of the tractable fragment in terms of regular expressions. Finally, we also discuss the complexity of deciding whether a language L belongs to the fragment above. We consider several alternative representations of L: DFAs, NFAs or regular expressions, and prove that this problem is NL-complete for the first representation and PSpace-complete for the other two.

© 2019 Elsevier Inc. All rights reserved.

1. Introduction

Graph databases have been investigated starting from the late 80s and are now again in vogue due to their wide application scenarios, ranging from social networks to biological and scientific databases (see [2] for a survey). Regular path queries (RPQs) are one of the most notable classes of queries on graph databases. They retrieve pairs of nodes connected by a path, where the path is described through a regular expression. Such regular path queries are computable in time polynomial in both query and data complexity (combined complexity). In this paper, we investigate the computational complexity of regular simple path queries (RSPQs), a variant of RPQ in which the path connecting the pair has to be simple, i.e., does not have repeated vertices. Given an edge-labeled graph G and a regular language L, an RSPQ selects pairs of vertices connected by a simple path whose edge labels form a word in L.

The evaluation of RSPQs is NP-complete even for fixed basic languages such as $(aa)^*$ or $a^*ba^*$ [20], in sharp contrast with RPQs. RSPQs are desirable in many application scenarios [17,23,6,15,13,30], such as transportation problems, VLSI design, metabolic networks, DNA matching and routing in wireless networks. Additionally, regular simple paths have been recently considered in SPARQL 1.1 queries exhibiting property paths. In particular, recent studies on the complexity of property paths in SPARQL [3,18] have highlighted the hardness of the semantics proposed by W3C to evaluate such paths in RDF graphs. Roughly speaking, according to the semantics considered in [18], the evaluation of expressions under Kleene-star closure should return a simple path, whereas the evaluation of the remaining expressions may traverse the same vertex multiple times. As such, the semantics studied in [18] is a hybrid between regular paths and regular simple paths semantics. RPQs have been recently found in practice within real-world SPARQL query logs, such as DBPedia and Wikidata query logs [8,7,9], thus showing the increasing interest of users manipulating real-world graph datasets with these queries. In particular, the majority of RPQs analyzed in large corpuses [8,9] belong to the tractable fragment $C_{tract}$ studied in this paper, further confirming the applicability and impact of our theoretical investigation.

This article is an extended version of [5].
* Corresponding author.
E-mail addresses: guillaume.bagan@liris.cnrs.fr (G. Bagan), angela.bonifati@univ-lyon1.fr (A. Bonifati), benoit.groz@lri.fr (B. Groz).

Contributions. In this paper, we address the long-standing open question [20,6] of exactly characterizing the maximal class of regular languages for which RSPQs are tractable. By "tractable" we mean computable in time polynomial in the size of the graph. Precisely, we establish a comprehensive classification of the complexity of RSPQs for a fixed regular language L. A first step towards this important issue has been made in [20]. They exhibit a tractable fragment: the class of languages closed under taking subword. However, their fragment is not maximal.

Our contributions can be detailed as follows. We introduce a class of languages, named $C_{tract}$, for which RSPQs can be evaluated in polynomial time (data complexity), and even in NL. We then show that RSPQ evaluation is NP-complete for every regular language that does not belong to $C_{tract}$. Consequently, the maximal tractable fragment $C_{tract}$ characterizes the frontier between tractability and intractability for this problem, under the hypothesis $NL \neq NP$. We further refine our results to show the following trichotomy: the evaluation of the RSPQ problem is either in $AC^0$, NL-complete or NP-complete. We note that we consider the language L as a fixed parameter, thus our results characterize the data complexity of RSPQ evaluation.

We also discuss the complexity of deciding, given a language L, whether the RSPQ problem for L is tractable. We consider several alternative representations of L: DFAs, NFAs or regular expressions. We prove that this problem of deciding tractability is NL-complete for the first representation and PSpace-complete for the two others.

Next, we give a characterization of the tractable fragment $C_{tract}$ for edge-labeled graphs in terms of regular expressions. We also show that $C_{tract}$ is closed by union and intersection and show that languages in $C_{tract}$ are aperiodic, i.e., can be expressed by first-order formulas [29].

We conclude with some minor results that identify further cases where RSPQs admit efficient solutions. We thus prove that RSPQs are FPT for the class of finite languages. Furthermore, we prove that the problem is also FPT for the class of all regular languages when the parameter is the size of the path. Finally, we prove that the problem RSPQ is polynomial w.r.t. combined complexity on graphs of bounded directed treewidth. This is actually a straightforward generalization of a result of [14].

A preliminary version of the present article has appeared in [5] without proofs of the main results. Here we provide detailed proofs.

Related work. A few papers deal with RSPQs or are related to them. Lapaugh et al. [16] prove that finding simple paths of even length is polynomial for non directed graphs and NP-complete for directed graphs. This study has been extended in [4] by considering paths of length i mod k. Similarly, finding k disjoint paths with extremities given as input is polynomial for non directed graphs [26] and NP-complete for directed graphs [10]. Using these results, Mendelzon and Wood prove that evaluating an RSPQ on an edge-labeled directed graph is NP-hard even for fixed languages [20]. However, they show that the problem can be decided in polynomial time for subword-closed languages. They also show that the problem becomes polynomial under some restrictions on the size of cycles of both graph and automaton. A subsequent paper [22] proves the polynomiality for the class of outerplanar graphs. Barrett et al. [6] extend this result, proving that the regular simple path problem is polynomial w.r.t. combined complexity for graphs of bounded treewidth. Barrett et al. [6] also show that the problem is NP-complete for the class of grid graphs even when the language is fixed.

We already mentioned that the complexity of evaluating SPARQL property paths has been investigated in previous studies [18,3]. As highlighted above, the semantics of SPARQL property paths blends the arbitrary paths and simple paths semantics. Losemann and Martens [18] and Arenas et al. [3] focus on the complexity of evaluating such property paths. They show that the evaluation is NP-complete in several cases, and exhibit cases in which it is polynomial. More precisely, Losemann and Martens [18] classify several fragments of property paths with respect to their complexity. Both the semantics and the query fragments are different from the ones in our paper. Counting the number of paths that match a regular expression (which is permitted for instance in SPARQL 1.1) is hard in many cases [18,3]. Recently, Martens and Trautner [19] studied the decision and enumeration problems concerning the evaluation of RPQs by considering several semantics: arbitrary paths, shortest paths, and simple paths. While we prove here a trichotomy for the data complexity of the decision problem, they focus on the data complexity of the polynomial delay enumeration problem.

Section 2 introduces and illustrates the problem. We then establish our trichotomy. We first show in Section 3 why languages outside our tractable fragment have NP-hard complexity. Section 4 discusses some properties of the tractable fragment, and shows how those properties (about strongly connected components) lead to a polynomial evaluation algorithm. And finally, Section 5 details our algorithm to deal with the tractable languages.

2. Preliminaries

Let $[n]$ denote the set of integers $\{1, \dots, n\}$ and $[n, m]$ denote the set of integers $\{n, \dots, m\}$.

2.1. Complexity

A TM refers to a Turing Machine and an NDTM refers to a nondeterministic Turing Machine. $AC^0$, L, NL, P, NP, PSpace refer to the classical complexity classes [24]. The relations $AC^0 \subseteq L \subseteq NL \subseteq P \subseteq NP \subseteq PSpace$ between these classes are well known.

FL is the class of functions computable by a deterministic log-space transducer. The class $L^{NL}$ is the set of decision problems computable by a deterministic log-space algorithm with an oracle in NL. The class $FL^{NL}$ is the set of functions computable by a deterministic log-space transducer with an oracle in NL. The class $FL^{NL}(NL)$ is the set of problems reducible to a problem in NL by a function in $FL^{NL}$.

Lemma 1. [24,11] $NL = L^{NL} = FL^{NL}(NL)$.

$NL^{NL}$ is the class of decision problems computable by a nondeterministic log-space algorithm with an oracle in NL. The next lemma is true only if we make some restrictions on the oracle machine model (see [27]): the TM must write on the oracle tape deterministically, i.e., it works deterministically from the moment it starts to write on the tape until it calls the oracle. The oracle tape is erased at the end of each call.

Lemma 2. [11] $NL = NL^{NL}$.

2.2. Graphs

In our paper, we essentially consider db-graphs. A db-graph is a tuple $G = (V, \Sigma, E)$ where V is a set of vertices, $\Sigma$ is a set of labels and $E \subseteq V \times \Sigma \times V$ is a set of edges labeled by symbols of $\Sigma$. Given a set $S \subseteq V$, the subgraph $G[S]$ of G induced by S is $(S, \Sigma, E \cap (S \times \Sigma \times S))$. A path p of a db-graph G from x to y is a sequence $(v_1 = x, a_1, \dots, v_m, a_m, v_{m+1} = y)$ such that for each $i \in [m+1]$, $v_i$ is a vertex in G and for each $i \in [m]$, $(v_i, a_i, v_{i+1})$ is an edge in G. A path p is simple if all vertices $v_i$ in p are distinct. Given a language $L \subseteq \Sigma^*$, p is L-labeled if $a_1 \dots a_m \in L$. Given a subset $S \subseteq V$, p is S-restricted if every intermediate vertex of p belongs to S. Given a simple path p and two vertices x and y in p, $p[x, y]$ denotes the subpath of p from x to y.

2.3. Languages and automata

Let L be a regular language. Given a word w and a language L, $w^{-1}L = \{w' : ww' \in L\}$. We denote by $A_L = (Q_L, i_L, F_L, \delta_L)$ the minimal DFA for L, and by $M_L$ the number of states $M_L = |Q_L|$ in $A_L$. Whenever the language is obvious from context, we drop the subscript and write M instead of $M_L$. We assume that $A_L$ is complete, i.e., $\delta_L$ is a total function, so that in general $A_L$ may have a sink state. For any state $q \in Q$ and word $w \in \Sigma^*$, $\delta_L(q, w)$ denotes the state obtained when reading w from q. For any state $q \in Q$ and set of words $S \subseteq \Sigma^*$, $\delta_L(q, S)$ denotes the set of states $q'$ such that there exists $w \in S$ with $q' = \delta_L(q, w)$. Finally, $L_q$ denotes the set of all words accepted from q. For every state q we denote by $\mathrm{Loop}(q)$ the set of all nonempty words that loop on q: $\mathrm{Loop}(q) = \{w \in \Sigma^+ \mid \delta_L(q, w) = q\}$. We say that a state $q'$ is reachable from a state q if $q' \in \delta_L(q, \Sigma^*)$. A (strongly connected) component of $A_L$ is a maximal set of states that are pairwise reachable.

The run of L (or $A_L$) over a path $p = (v_1, a_1, \dots, a_m, v_{m+1})$ is the mapping $\rho : \{v_1, \dots, v_{m+1}\} \to Q_L$ such that $\rho(v_1) = i_L$ and $\rho(v_{i+1}) = \delta_L(i_L, a_1 \dots a_i)$ for every $i \in [m]$. There are many characterizations of aperiodic languages [29]. A language L is aperiodic if and only if it satisfies $\delta_L(q, w^{M+1}) = \delta_L(q, w^M)$ for every state q and word w. Intuitively, this means that for any state $q_0$ and word w, the infinite sequence $(q_0, q_1, q_2, \dots)$ with $q_{i+1} = \delta_L(q_i, w)$ for every integer i stabilizes after at most M steps: $q_{M+1} = q_M$.
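The aperiodicity condition quantifies over all words, but $\delta_L(q, w^k)$ only depends on the state transformation induced by w, and there are finitely many such transformations. The following Python sketch (our own illustration; the function names and the dictionary encoding of $\delta_L$ are assumptions) enumerates these transformations and checks the condition on each. It assumes the input is a complete minimal DFA with states 0..n-1, and the enumeration may be exponential in n.

```python
def transition_monoid(delta, n, alphabet):
    """All maps q -> delta(q, w) induced by words w, as tuples of length n."""
    identity = tuple(range(n))            # transformation of the empty word
    letters = [tuple(delta[(q, a)] for q in range(n)) for a in alphabet]
    seen, todo = {identity}, [identity]
    while todo:
        t = todo.pop()
        for s in letters:
            u = tuple(s[t[q]] for q in range(n))   # read w, then one more letter
            if u not in seen:
                seen.add(u)
                todo.append(u)
    return seen


def power(t, k):
    """Transformation induced by w^k when t is the transformation of w."""
    r = tuple(range(len(t)))
    for _ in range(k):
        r = tuple(t[r[q]] for q in range(len(t)))
    return r


def is_aperiodic(delta, n, alphabet):
    """Check delta(q, w^(M+1)) == delta(q, w^M) for all q and w, with M = n."""
    return all(power(t, n + 1) == power(t, n)
               for t in transition_monoid(delta, n, alphabet))


# (aa)*: the letter a swaps the two states.
even_a = {(0, "a"): 1, (1, "a"): 0}
# a*ba*: state 2 is the sink.
one_b = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1,
         (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
```

On these two toy automata, the check rejects $(aa)^*$ and accepts $a^*ba^*$.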

2.4. Regular simple paths

Given a regular language L, we define the following problem:

RSPQ(L)
Input: A db-graph $G = (V, \Sigma, E)$, and two vertices $x, y \in V$.
Question: Is there a simple L-labeled path from x to y?

For this problem, L is fixed, so we focus on data complexity. Notice that the representation of L does not matter here. Although we consider the Boolean version of the problem, namely deciding the existence of a path, our algorithms actually also return a simple L-labeled path.

The main problem that we address in this paper is to distinguish cases when RSPQ(L) is tractable (i.e. decidable in polynomial time) and when it is not (i.e. NP-hard).

2.5. The class of tractable languages

We recall that M refers to the size of $Q_L$, here and henceforth. We next introduce the class $C_{tract}$ of languages. We will prove that it is exactly the class of regular languages for which RSPQ(L) is tractable.

Definition 1. A regular language L belongs to the class $C_{tract}$ if the following property is satisfied: for all pairs of states $q_1, q_2 \in Q_L$ and all words w with $\mathrm{Loop}(q_1) \neq \emptyset$, $\mathrm{Loop}(q_2) \neq \emptyset$, $q_2 \in \delta_L(q_1, \Sigma^*)$ and $w \in \mathrm{Loop}(q_2)$, it holds that $w^M L_{q_2} \subseteq L_{q_1}$.

This definition is merely a technical definition for $C_{tract}$, but we will provide in Theorem 6 a more intuitive characterization of the class.
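Definition 1 can be tested by brute force on small automata. The sketch below is our own illustration and not the decision procedure studied in this article. It relies on two observations: $w^M L_{q_2} \subseteq L_{q_1}$ is equivalent to $L_{q_2} \subseteq L_{q_3}$ with $q_3 = \delta_L(q_1, w^M)$, and w only matters through the state transformation it induces. It assumes a complete minimal DFA with states 0..n-1 (so that M = n); all names are hypothetical.

```python
from itertools import product


def word_maps(delta, n, alphabet):
    """Maps q -> delta(q, w) induced by NONEMPTY words w (tuples of length n)."""
    letters = [tuple(delta[(q, a)] for q in range(n)) for a in alphabet]
    seen, todo = set(letters), list(letters)
    while todo:
        t = todo.pop()
        for s in letters:
            u = tuple(s[t[q]] for q in range(n))
            if u not in seen:
                seen.add(u)
                todo.append(u)
    return seen


def included(p, r, delta, alphabet, accepting):
    """True iff L_p is a subset of L_r (search in the product automaton)."""
    seen, todo = {(p, r)}, [(p, r)]
    while todo:
        s, t = todo.pop()
        if s in accepting and t not in accepting:
            return False
        for c in alphabet:
            nxt = (delta[(s, c)], delta[(t, c)])
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True


def in_ctract(delta, n, alphabet, accepting):
    maps = word_maps(delta, n, alphabet)
    has_loop = [any(t[q] == q for t in maps) for q in range(n)]
    reach = [{q} | {t[q] for t in maps} for q in range(n)]
    for q1, q2 in product(range(n), repeat=2):
        if not (has_loop[q1] and has_loop[q2] and q2 in reach[q1]):
            continue
        for t in maps:
            if t[q2] != q2:              # w must belong to Loop(q2)
                continue
            q3 = q1
            for _ in range(n):           # q3 = delta(q1, w^M) with M = n
                q3 = t[q3]
            if not included(q2, q3, delta, alphabet, accepting):
                return False
    return True


def dfa(rows, alphabet):
    """Build a transition dictionary from one row of targets per state."""
    return {(q, a): row[i] for q, row in enumerate(rows) for i, a in enumerate(alphabet)}


# a*(bb^+ + eps)c*: state 4 is the sink, accepting states are 0, 2, 3.
d1 = dfa([(0, 1, 3), (4, 2, 4), (4, 2, 3), (4, 4, 3), (4, 4, 4)], "abc")
# a*bc*: state 2 is the sink, accepting state is 1.
d2 = dfa([(0, 1, 2), (2, 2, 1), (2, 2, 2)], "abc")
```

With these encodings, in_ctract(d1, 5, "abc", {0, 2, 3}) holds while in_ctract(d2, 3, "abc", {1}) fails: for $a^*bc^*$, taking $w = c$ gives $c^3 c^* \not\subseteq a^*bc^*$.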

Example 1. As an introductory example, consider the language $L = a^*(bb^+ + \varepsilon)c^*$. We observe that this language belongs to our class $C_{tract}$. We wish to decide RSPQ(L), i.e., whether there exists a simple path from x to y labeled by L, given two vertices x, y of a db-graph G. It is not absolutely trivial that RSPQ(L) can be solved efficiently: RSPQ($a^*bc^*$) has indeed been proved NP-complete. Yet we outline below a polynomial algorithm for L.

We distinguish two cases: there is a simple L-labeled path from x to y if and only if one of the following cases holds:

1: there exists a simple $a^*b^kc^*$-labeled path from x to y for some $k \in \{0, 2, 3\}$
2: case 1 does not hold and there exists a simple $a^*b^4b^*c^*$-labeled path from x to y.

The first case is the easiest to check. We first check whether y can be reached from x by a (not necessarily simple) $a^*c^*$-labeled path. If we find one, we obtain a simple $a^*c^*$-labeled path by eliminating its loops. Assume now there is no $a^*c^*$-labeled path from x to y. We then check as follows if there exists a simple $a^*b^kc^*$-labeled path from x to y for some $k \in \{2, 3\}$: we try every possible assignment for the k middle b-edges. For each combination of k b-edges, we check if the initial b-edge can be reached from x through an $a^*$-labeled path (avoiding the vertices of the other b-edges), and check if the final b-edge can lead to y through some $c^*$-labeled path (avoiding the vertices of the other b-edges). In the resulting $a^*b^kc^*$-labeled path the $a^*$-labeled prefix and $c^*$-labeled suffix cannot intersect (we assumed there is no $a^*c^*$ path). Consequently we obtain a simple $a^*b^kc^*$-labeled path by eliminating its loops. As the number of possible assignments for k edges ($k \le 3$) is polynomial, we have proved that we can find out in polynomial time whether case 1 holds.

Let us now assume w.l.o.g. that there is no $a^*b^kc^*$-labeled path from x to y for $k \in \{0, 2, 3\}$. We can show that in this second case there exists a simple L-labeled path from x to y if and only if there exist six vertices $v_1, v_2, v_3, v_4, v_5, v_6$, two integers $l_a, l_b$ and two sets $S_a$, $S_b$ satisfying all following conditions:

• the vertices $v_1, \dots, v_6$ are all distinct except that $v_3$ may equal $v_4$.
• there is a b-labeled edge from $v_1$ to $v_2$, from $v_2$ to $v_3$, from $v_4$ to $v_5$, and from $v_5$ to $v_6$.
• there is an $a^*$-labeled path from x to $v_1$ avoiding all other $v_i$'s ($i > 1$). The shortest possible such path has length $l_a$.
• $S_a$ is the set of all vertices reachable from x through an $a^*$-labeled path of length at most $l_a$ that avoids all $v_i$'s ($i > 1$).
• there is a $b^*$-labeled path from $v_3$ to $v_4$ of which all vertices (but the first and last) avoid $S_a$ and the $v_i$'s. The shortest possible such path has length $l_b$.
• $S_b$ is the set of all vertices reachable from $v_3$ through any $b^*$-labeled path of length at most $l_b$ that avoids $S_a$ and all other $v_i$'s ($i \neq 4$).
• there is a $c^*$-labeled path from $v_6$ to y of which all vertices (but the first) avoid $S_a$ and $S_b$ and all other $v_i$'s ($i < 6$).

The figure below summarizes all these conditions.

[Figure not reproduced: summary of the conditions on $v_1, \dots, v_6$, $S_a$ and $S_b$.]

These conditions can clearly be verified in time polynomial in G. It is also relatively clear that the path constructed above is an L-labeled simple path from x to y, so the conditions are sufficient to obtain an L-labeled simple path. To prove that our procedure is correct, we only have to prove that conversely, if there exists a simple L-labeled path we can find one satisfying our restrictions (the conditions above involving the $v_i$, $S_a$, $S_b$).

For every shortest L-labeled simple path p from x to y, let $v_1, \dots, v_6$ denote the vertices that delimit the first and last two b-edges of p. We next show that those vertices satisfy the conditions above. The last vertex v of p that belongs to $S_a$ cannot occur after $v_3$ in p. Otherwise, we could obtain a simple path $p'$ by replacing the prefix of p up to v with a shorter path through $S_a$. This resulting path $p'$ would still be L-labeled by definition of the $v_i$, which contradicts the minimality of p. A similar argument shows that the last occurrence of a vertex from $S_b$ cannot occur after $v_6$. We conclude that the paths connecting $v_3$ to $v_4$ in p (resp. $v_6$ to y) exclude respectively all vertices from $S_a$ (resp. $S_a \cup S_b$). As a consequence, the path from x to $v_1$ will only feature vertices from $S_a$ by minimality of p, which proves that vertices $v_1, \dots, v_6$ satisfy the conditions above.

The crux of our approach is to construct the $a^*$, $b^*$ and $c^*$ subpaths independently, lest we enumerate exponentially many paths. This is why we require that the $b^*$ subpath avoids $S_a$: this condition is stronger than necessary to guarantee the first two subpaths do not intersect, but the stronger requirement allows us to build the two subpaths independently, as $S_a$ is a superset of the vertices on the subpath from x to $v_1$. Our algorithm for tractable instances will generalize this idea.
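The loop-elimination step used in case 1 of Example 1 can be sketched as follows. This is an illustrative helper under our own path encoding (a list alternating vertices and labels); the function name is hypothetical. Each time a vertex repeats, the portion of the walk between its two visits is discarded, so the remaining labels form a subword of the original word. This is why the result stays $a^*c^*$-labeled in that case.

```python
import re


def eliminate_loops(walk):
    """Turn a walk [v1, a1, v2, ..., am, v_{m+1}] into a simple path.

    The kept edges are a subsequence of the edges of the walk.
    """
    path = [walk[0]]
    for i in range(1, len(walk), 2):
        label, v = walk[i], walk[i + 1]
        vertices = path[0::2]
        if v in vertices:
            # v was visited before: drop everything after its first visit
            path = path[: 2 * vertices.index(v) + 1]
        else:
            path += [label, v]
    return path


# An a*c*-labeled walk that visits u twice (illustrative data only).
walk = ["x", "a", "u", "a", "v", "c", "u", "c", "y"]
path = eliminate_loops(walk)
word = "".join(path[1::2])
still_in_language = re.fullmatch(r"a*c*", word) is not None
```

On this walk the loop u, a, v, c, u is removed, leaving the simple path x, a, u, c, y labeled ac.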

3. Hard languages for RSPQ

This section is devoted to the proof of a hardness result: RSPQ(L) is NP-hard for every regular language L that does not belong to $C_{tract}$. The first step toward that proof lies in the following characterization of $C_{tract}$.

Definition 2 (Witness of hardness). Let L be a regular language. A witness for hardness of L is a tuple $(w_l, w_m, w_r, w_1, w_2)$ where $w_l, w_r \in \Sigma^*$ and $w_m, w_1, w_2 \in \Sigma^+$ satisfying

• $w_l w_1^* w_m w_2^* w_r \subseteq L$
• $w_l (w_1 + w_2)^* w_r \cap L = \emptyset$.
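The two conditions of Definition 2 can be sanity-checked on bounded word lengths. The sketch below does so for the language $L = a^*b(cc)^*d$ and the witness $w_l = w_1 = a$, $w_m = b$, $w_2 = cc$, $w_r = d$ used later to illustrate the reduction. It is only a bounded test, not a proof; the bound and variable names are our own assumptions.

```python
import re
from itertools import product

L = re.compile(r"a*b(cc)*d")
wl, wm, wr, w1, w2 = "a", "b", "d", "a", "cc"
BOUND = 4   # arbitrary bound on the number of repetitions tested

# First condition: wl w1^i wm w2^j wr belongs to L for all tested i, j.
first_condition = all(
    L.fullmatch(wl + w1 * i + wm + w2 * j + wr) is not None
    for i in range(BOUND + 1)
    for j in range(BOUND + 1)
)

# Second condition: no word of wl (w1 + w2)^k wr belongs to L for tested k.
second_condition = not any(
    L.fullmatch(wl + "".join(middle) + wr) is not None
    for k in range(BOUND + 1)
    for middle in product([w1, w2], repeat=k)
)
```

Both conditions hold on the tested range: words of the first form always contain the single b required by L, while words of the second form contain no b at all.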

Lemma 3. Let L be a regular language that does not belong to $C_{tract}$. Then, L admits a witness for hardness.

Proof. Let L be a regular language that does not belong to $C_{tract}$. For convenience, we distinguish two cases, depending on whether or not L satisfies the following property: $L_{q_2} \subseteq L_{q_1}$ for every $q_1, q_2 \in Q_L$ such that $q_2 \in \delta_L(q_1, \Sigma^*)$ and $\mathrm{Loop}(q_1) \cap \mathrm{Loop}(q_2) \neq \emptyset$ (Property P).

Let L be a language that does not satisfy Property P. There exist $q, q_2, w_m, w, w_r$ such that $\delta_L(q, w_m) = q_2$, $w \in \mathrm{Loop}(q) \cap \mathrm{Loop}(q_2)$, and $w_r \in L_{q_2} \setminus L_q$. Let $w_l$ be such that $\delta_L(i_L, w_l) = q$. Then $w_l, w_m, w_r, w_1 = w_2 = w$ is a witness for hardness.

We next plan to exhibit a witness for hardness for the case where L satisfies Property P, but we first prove that every language satisfying Property P (whether in $C_{tract}$ or not) is aperiodic. Let L be a language satisfying Property P, $q \in Q_L$ and w a word in $\Sigma^+$. Let also $q'$ denote the state $q' = \delta_L(q, w^M)$. We denote by $q''$ the state $\delta_L(q', w)$. We want to prove that $q' = q''$. By the pigeonhole principle there exist some $k_0 < k_1 \le M$ such that $\delta_L(q, w^{k_0}) = \delta_L(q, w^{k_1})$. We then have $\delta_L(q', w^k) = q'$ for $k = k_1 - k_0$. Then $q'$ and $q''$ both loop on $w^k$, so that $L_{q'} = L_{q''}$ by definition of P, hence $q' = q''$ by minimality. Consequently, L is aperiodic.

Let L be a language that satisfies Property P (and so in particular is aperiodic), but that does not belong to $C_{tract}$. By definition of $C_{tract}$ there exist states $q, q_2$ and words $w_l, w_1, w_2, w_m, w_r'$ such that $\delta_L(i_L, w_l) = q$, $w_1 \in \mathrm{Loop}(q)$, $w_2 \in \mathrm{Loop}(q_2)$, $\delta_L(q, w_m) = q_2$, $w_r' \in L_{q_2}$ and $w_2^M w_r' \notin L_q$. W.l.o.g. we can suppose that $w_1 = (w_1')^M$ for some word $w_1'$. We then claim that $L_{q'} \subseteq L_q$ for every $q'$ in $\delta_L(q, \Sigma^* w_1)$. Indeed, for every $q' \in \delta_L(q, \Sigma^* w_1)$, there exists some $k > 0$ such that $\delta_L(q', (w_1')^k) = q'$, hence $q'$ loops over $w_1'$, and therefore over $w_1$, by aperiodicity of L. We thus have $w_1 \in \mathrm{Loop}(q) \cap \mathrm{Loop}(q')$ and therefore $L_{q'} \subseteq L_q$ due to Property P.

Let $w_r = w_2^M w_r'$. By definition, $w_m w_2^* w_r \subseteq L_q$ because $w_r' \in L_{q_2}$. We now prove that $(w_1 + w_2)^* w_r \cap L_q = \emptyset$. Any word in $(w_1 + w_2)^* w_r$ can be decomposed into $uv$ with $u \in \varepsilon + (w_1 + w_2)^* w_1$ and $v \in (w_2)^* w_r$. We recall that $w_r = w_2^M w_r' \notin L_q$ and L is aperiodic, so that $v \notin L_q$. Furthermore, we have just proved that $q' = \delta_L(q, u)$ satisfies $L_{q'} \subseteq L_q$. Consequently, $v \notin L_{q'}$ and $uv \notin L_q$. Thus, $w_l$, $w_m$, $w_r$, $w_1$, and $w_2$ provide a witness for hardness, which concludes the proof of Lemma 3.

We can now prove our hardness result, by reduction from Vertex-Disjoint-Path, a problem also used in [20] to prove hardness in the particular case of $a^*ba^*$.

Vertex-Disjoint-Path
Input: A directed graph $G = (V, E)$, four vertices $x_1, y_1, x_2, y_2 \in V$.
Question: Are there two disjoint paths, one from $x_1$ to $y_1$ and the other from $x_2$ to $y_2$?

Lemma 4. Let L be a regular language that does not belong to $C_{tract}$. Then, RSPQ(L) is NP-hard.

Proof. Let $L \notin C_{tract}$. We exhibit a reduction from the Vertex-Disjoint-Path problem to RSPQ(L). According to Lemma 3, L admits a witness for hardness $w_l, w_m, w_r, w_1, w_2$. By definition we get $w_l (w_1 + w_2)^* w_r \cap L = \emptyset$ and $w_l w_1^* w_m w_2^* w_r \subseteq L$.

We build from G a db-graph $G'$ whose edges are labeled by non empty words. This is actually a generalization of db-graphs. Nevertheless, by adding intermediate vertices, an edge labeled by a word w can be replaced with a path whose edges form the word w.

$G'$ is constructed as follows. The vertices of $G'$ are the same as the vertices of G. For each edge $(v_1, v_2)$ in G, we add two edges $(v_1, w_1, v_2)$ and $(v_1, w_2, v_2)$. Moreover, we add two new vertices x, y and three edges $(x, w_l, x_1)$, $(y_1, w_m, x_2)$ and $(y_2, w_r, y)$. We next prove that RSPQ(L) returns True for $(G', x, y)$ iff Vertex-Disjoint-Path returns True for $(G, x_1, y_1, x_2, y_2)$.

Assume there is a simple L-labeled path p from x to y in $G'$. By definition of $G'$, this path necessarily goes through the edge $(y_1, w_m, x_2)$ since $w_l (w_1 + w_2)^* w_r \cap L = \emptyset$. Since p is simple, the subpaths from $x_1$ to $y_1$ and $x_2$ to $y_2$ are disjoint, hence Vertex-Disjoint-Path returns True for $(G, x_1, y_1, x_2, y_2)$. Conversely, if Vertex-Disjoint-Path returns True for $(G, x_1, y_1, x_2, y_2)$, there exist disjoint paths from $x_1$ to $y_1$ and from $x_2$ to $y_2$. By definition these two paths match a word in $(w_1 + w_2)^*$. We can then obtain two disjoint simple paths, one from $x_1$ to $y_1$ matching a word in $w_1^*$ and one from $x_2$ to $y_2$ matching a word in $w_2^*$. To obtain those paths we keep the same vertices as the original paths, eliminate the loops if there are any, and switch $w_1$ and $w_2$ edges where needed: we can always replace a $w_1$ edge with a $w_2$ edge by construction of $G'$ since every pair of vertices is connected by both types of edges or none. Concatenating the edge $(x, w_l, x_1)$ with the first path, the edge $(y_1, w_m, x_2)$, the second path and the edge $(y_2, w_r, y)$ provides a simple L-labeled path p from x to y, which concludes our proof. We illustrate in Fig. 1 the reduction for $L = a^*b(cc)^*d$, on an instance $(G, x_1, y_1, x_2, y_2)$, choosing $w_l = w_1 = a$, $w_m = b$, $w_2 = cc$, and $w_r = d$.

Fig. 1. Reduction for $L = a^*b(cc)^*d$. [Figure not reproduced.]
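The construction of $G'$ in the proof of Lemma 4 can be sketched in Python as follows. This is an illustration under our own encoding (edges as triples, fresh intermediate vertices as tagged tuples); the function name and the names chosen for the two new vertices are hypothetical and assumed not to clash with the vertices of G.

```python
def build_reduction(edges, x1, y1, x2, y2, wl, wm, wr, w1, w2):
    """Return (labeled_edges, x, y) for the db-graph G' of the reduction.

    Every edge labeled by a nonempty word is expanded into a path of
    single-letter edges through fresh intermediate vertices.
    """
    assert all(len(w) > 0 for w in (wl, wm, wr, w1, w2)), "nonempty words expected"
    out = []
    counter = [0]

    def add_word_edge(u, word, v):
        prev = u
        for k, letter in enumerate(word):
            if k == len(word) - 1:
                nxt = v
            else:
                counter[0] += 1
                nxt = ("mid", counter[0])     # fresh intermediate vertex
            out.append((prev, letter, nxt))
            prev = nxt

    for (u, v) in edges:                      # each edge of G is doubled
        add_word_edge(u, w1, v)
        add_word_edge(u, w2, v)
    x, y = ("new", "x"), ("new", "y")         # the two new endpoints
    add_word_edge(x, wl, x1)
    add_word_edge(y1, wm, x2)
    add_word_edge(y2, wr, y)
    return out, x, y


# Toy instance with the witness used for L = a*b(cc)*d.
G_edges = [("x1", "y1"), ("x2", "y2")]
gprime, x, y = build_reduction(G_edges, "x1", "y1", "x2", "y2",
                               wl="a", wm="b", wr="d", w1="a", w2="cc")
```

On this toy instance each edge of G yields one a-edge and a two-edge cc-path, so $G'$ has 9 single-letter edges in total.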

This concludes our proof that languages outside $C_{tract}$ are intractable. After this negative result, we now focus on the positive result, namely that languages in $C_{tract}$ admit efficient algorithms.

4. Properties of languages in $C_{tract}$

The main result of this paper is that for every $L \in C_{tract}$, RSPQ(L) $\in$ NL. The algorithm to evaluate efficiently RSPQ(L) exploits a particular kind of pumping argument between strongly connected components of the automaton. This pumping argument proves that if we build carefully a path using the usual reachability algorithm inside the strongly connected components, then we need not care about possible intersections between subpaths corresponding to different components. In this section, we introduce and prove this pumping argument in Lemma 11 through a series of technical lemmas about the structure of automata that recognize $C_{tract}$ languages.

4.1. Alternative characterization of $C_{tract}$

To begin with, we prove that every language from $C_{tract}$ is aperiodic and deduce an alternative characterization of $C_{tract}$.

Lemma 5. Let L be a regular language in $C_{tract}$. Then L is aperiodic.

Proof. In the proof of Lemma 3 we defined a property P and showed that languages satisfying property P are aperiodic. We show that every $L \in C_{tract}$ satisfies property P. Let $L \in C_{tract}$, $q_1, q_2 \in Q_L$ and w satisfy $q_2 \in \delta_L(q_1, \Sigma^*)$ and $w \in \mathrm{Loop}(q_1) \cap \mathrm{Loop}(q_2)$. By definition of $C_{tract}$, $w^M L_{q_2} \subseteq L_{q_1}$, hence $L_{q_2} \subseteq L_{q_1}$ because $w \in \mathrm{Loop}(q_1)$.

We then exploit this aperiodicity property to establish the following characterization of $C_{tract}$, which strengthens the requirements from Definition 1 on the loops of $A_L$.

Lemma 6. Let L be a regular language. Then, L belongs to $C_{tract}$ iff for every pair of states $q_1, q_2 \in Q_L$ such that $\mathrm{Loop}(q_1) \neq \emptyset$, $\mathrm{Loop}(q_2) \neq \emptyset$ and $q_2 \in \delta_L(q_1, \Sigma^*)$, the following statement holds: $(\mathrm{Loop}(q_2))^M L_{q_2} \subseteq L_{q_1}$.

Proof. The (if) implication is immediate by Definition 1. Let us now prove the (only if) implication. Assume $L \in C_{tract}$. Let $q_1, q_2 \in Q_L$ satisfy $\mathrm{Loop}(q_1) \neq \emptyset$, $\mathrm{Loop}(q_2) \neq \emptyset$, $q_2 \in \delta_L(q_1, \Sigma^*)$, and let $w \in \mathrm{Loop}(q_2)$. Let also $q_3$ denote the state $\delta_L(q_1, w^M)$. Then Definition 1 implies $w^M L_{q_2} \subseteq L_{q_1}$. Thus, $L_{q_2} \subseteq L_{q_3}$. The crux of the proof is to choose carefully $q_1$, $q_2$ and w to exploit the constraints on $L_{q_3}$.

Let $q_1, q_2$ be two states such that $\mathrm{Loop}(q_1) \neq \emptyset$, $\mathrm{Loop}(q_2) \neq \emptyset$ and $q_2 \in \delta_L(q_1, \Sigma^*)$. Let $(v_1, \dots, v_M)$ be a sequence of words in $(\mathrm{Loop}(q_2))^M$ and $q_3 = \delta_L(q_1, v_1 \dots v_M)$. We wish to prove $L_{q_2} \subseteq L_{q_3}$.

For some $i, j$, $0 \le i < j \le M$, we get $\delta_L(q_1, v_1 \dots v_i) = \delta_L(q_1, v_1 \dots v_j)$, using the convention $\delta_L(q_1, v_1 \dots v_i) = q_1$ for $i = 0$. Let $u_1 = v_1 \dots v_i$, $u_2 = v_{i+1} \dots v_j$ and $u_3 = v_{j+1} \dots v_M$. Let $q_4 = \delta_L(q_1, u_1)$. We claim that $L_{q_2} \subseteq L_{q_4}$. The result then follows from $L_{q_2} = u_3^{-1} L_{q_2} \subseteq u_3^{-1} L_{q_4} = L_{q_3}$. To prove the claim, let $w = u_1 u_2^M$ and $q_5 = \delta_L(q_1, w^M)$. As $\delta_L(q_1, w^M) = q_5$ and $w \in \mathrm{Loop}(q_2)$, we get $L_{q_2} \subseteq L_{q_5}$ through Definition 1 with $q_1, q_2$ and w. Furthermore, $u_2$ belongs to $\mathrm{Loop}(q_5)$ because L is aperiodic. To conclude the proof, we observe that $L_{q_5} \subseteq L_{q_4}$, by Definition 1 with $q_5, q_4$ and $u_2$, and because $\delta_L(q_4, u_2^M) = q_4$ and $u_2 \in \mathrm{Loop}(q_5)$. (1)

(1) This last application of Definition 1 corresponds actually to observing that every language in $C_{tract}$ satisfies property P from Lemma 3.

4.2. Technical lemmas on the components of $A_L$

In this section, we show properties about the components of $C_{tract}$ languages. Notice that states in a component are mutually reachable, but not reachable from states in other components that they can reach themselves. From now on, and until the end of the section, we fix a language $L \in C_{tract}$. We introduce in Lemmas 9 and 11 the pumping argument that we exploit in the algorithm to compute a simple path. In the other lemmas we prove auxiliary results, based on the decomposition of the automaton in strongly connected components. We prove that components of languages in $C_{tract}$ are very particular, in the sense that every word staying long enough in the component is synchronizing. A preliminary lemma shows that two distinct states $q_1$ and $q_2$ in the same component cannot loop on the same word.

Lemma 7. Let $q_1$ and $q_2$ be two states belonging to the same component of $A_L$. If $\mathrm{Loop}(q_1) \cap \mathrm{Loop}(q_2) \neq \emptyset$, then $q_1 = q_2$.

Proof. Let $q_1, q_2$ be as above, and let w be a word in $\mathrm{Loop}(q_1) \cap \mathrm{Loop}(q_2)$. According to Definition 1, $w^M L_{q_2} \subseteq L_{q_1}$, hence $L_{q_2} \subseteq L_{q_1}$ since $w \in \mathrm{Loop}(q_1)$. By symmetry, $L_{q_2} = L_{q_1}$, which implies $q_2 = q_1$.

The next two lemmas characterize the internal language of a component.

Lemma 8. Let C be a component of $A_L$, $q_1, q_2 \in C$ and $a \in \Sigma$. Then $\delta_L(q_1, a) \in C$ iff $\delta_L(q_2, a) \in C$.

Proof. Let $q_1 \neq q_2$ be two states in the same component C. Let a satisfy $\delta_L(q_1, a) \in C$. Let also $w \in \mathrm{Loop}(q_1) \cap a\Sigma^*$ and $q_3 = \delta_L(q_2, w^M)$: a and w necessarily exist because C is the strongly connected component of $q_1$ and $q_2$. We next prove that $q_3$ belongs to C: by our definition of C, this implies $\delta_L(q_2, a) \in C$. As L is aperiodic, $w \in \mathrm{Loop}(q_3)$, and consequently, $w^M L_{q_3} \subseteq L_{q_1}$ by Definition 1. Furthermore, $w^M L_{q_1} \subseteq L_{q_2}$ also by Definition 1. Hence $L_{q_3} \subseteq L_{q_1}$ and $L_{q_1} \subseteq (w^M)^{-1} L_{q_2} = L_{q_3}$. Thus, $L_{q_1} = L_{q_3}$ and, by minimality of $A_L$, $q_1 = q_3$, so that $q_3 \in C$.

Notation 1. We denote the internal alphabet of a component C of $A_L$ by $\Sigma_C = \{a \in \Sigma : \exists q_1, q_2 \in C.\ \delta_L(q_1, a) = q_2\}$.

As a direct consequence of Lemma 8 we get:

Lemma 9. Let C be a component of $A_L$, $q \in C$ and $w \in \Sigma^*$. Then $\delta_L(q, w) \in C$ iff $w \in (\Sigma_C)^*$.
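The components of $A_L$ and their internal alphabets (Notation 1) are easy to compute on small automata. The sketch below is our own illustration: it uses a naive transitive closure rather than a linear-time algorithm, assumes a complete DFA with states 0..n-1 encoded as a dictionary, and uses hypothetical function names. The example automaton is a hand-built DFA for $a^*(bb^+ + \varepsilon)c^*$ with state 4 as sink.

```python
def components(delta, n, alphabet):
    """Strongly connected components: maximal sets of pairwise reachable states."""
    reach = [{q} for q in range(n)]          # every state reaches itself
    changed = True
    while changed:                           # naive transitive closure
        changed = False
        for q in range(n):
            new = {delta[(p, a)] for p in reach[q] for a in alphabet} - reach[q]
            if new:
                reach[q] |= new
                changed = True
    return {frozenset(p for p in reach[q] if q in reach[p]) for q in range(n)}


def internal_alphabet(comp, delta, alphabet):
    """Letters labeling at least one transition that stays inside comp."""
    return {a for a in alphabet for q in comp if delta[(q, a)] in comp}


# DFA for a*(bb^+ + eps)c*: one row of targets (on a, b, c) per state.
rows = [(0, 1, 3), (4, 2, 4), (4, 2, 3), (4, 4, 3), (4, 4, 4)]
delta = {(q, a): row[i] for q, row in enumerate(rows) for i, a in enumerate("abc")}
comps = components(delta, 5, "abc")
```

For this automaton every component is a singleton: the components of states 0, 2 and 3 have internal alphabets {a}, {b} and {c}, the component of state 1 has an empty internal alphabet, and the sink keeps the whole alphabet.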

Finally, we prove that inside a component, every word with length at least $M^2$ is synchronizing. This result is the core of our pumping argument between strongly connected components as exposed in Lemma 11.

Lemma 10. Let C be a component of $A_L$, $\Sigma_C$ be the internal alphabet of C, $q_1, q_2$ be two states of C and $w \in (\Sigma_C)^{M^2}$. Then, $\delta_L(q_1, w) = \delta_L(q_2, w)$.

Proof. Assume that $w = a_1 \dots a_{M^2}$. For each i from 0 to $M^2$ and $\alpha = 1, 2$, let $q_{\alpha,i} = \delta_L(q_\alpha, a_1 \dots a_i)$. Since there are at most $M^2$ distinct pairs $(q_{1,i}, q_{2,i})$, there exist $i, j$, with $i < j$, such that $q_{1,i} = q_{1,j}$ and $q_{2,i} = q_{2,j}$. By Lemma 9, $q_{1,i}, q_{2,i} \in C$. Let $w' = a_{i+1} \dots a_j$. We have $w' \in \mathrm{Loop}(q_{1,i}) \cap \mathrm{Loop}(q_{2,i})$, hence $q_{1,i} = q_{2,i}$ by Lemma 7. As a consequence, $\delta_L(q_1, w) = \delta_L(q_2, w)$.

Notice that the above lemma still holds for $w \in (\Sigma_C)^{M^2} \Sigma_C^*$. Here and thereafter, we fix the constant $N = 2M^2$.

Lemma 11. Let $q_1, q_2$ be two states such that $\mathrm{Loop}(q_1) \neq \emptyset$, $\mathrm{Loop}(q_2) \neq \emptyset$, and $q_2 \in \delta_L(q_1, \Sigma^*)$. Let C be the component that contains $q_2$ and $\Sigma_C$ be the internal alphabet of C. Then, $L_{q_2} \cap (\Sigma_C)^N \Sigma^* \subseteq L_{q_1}$.

Proof. Let $w \in L_{q_2} \cap (\Sigma_C)^N \Sigma^*$. There are some words $u, v \in (\Sigma_C)^{M^2}$, $w' \in \Sigma^*$ such that $w = u v w'$. By Lemma 9 and the Pigeonhole Principle, there exist a state $q_3 \in C$ and $M + 1$ non-empty words $v_1, \dots, v_{M+1}$ such that $v = v_1 \dots v_{M+1}$ and $\delta_L(q_2, u v_1 \dots v_i) = q_3$ for every $i \in [M]$. Therefore, $w \in u v_1 (\mathrm{Loop}(q_3))^{M-1} v_{M+1} w'$. By Lemma 10, $\delta_L(q_3, u v_1) = \delta_L(q_2, u v_1) = q_3$. Thus, w belongs to both $(\mathrm{Loop}(q_3))^M v_{M+1} w'$ and $L_{q_3}$. By Lemma 6, $w \in L_{q_1}$.

Our main result focuses on data complexity and therefore assumes the language (hence N) is constant. Yet the complexity will be exponential in N; therefore we next prove, for the sake of efficiency, that we can take $N = M$ in Lemma 11.

Lemma 12. Let $q_1, q_2$ be two states such that $\mathrm{Loop}(q_1) \neq \emptyset$, $\mathrm{Loop}(q_2) \neq \emptyset$, and $q_2 \in \delta_L(q_1, \Sigma^*)$. Let C be the component that contains $q_2$ and $\Sigma_C$ be the internal alphabet of C. Then, $L_{q_2} \cap (\Sigma_C)^M \Sigma^* \subseteq L_{q_1}$.
